[Clam-devel] [PATCH] std::vector operator[] and not calling frameSize()

Thu Jun 14 11:15:57 PDT 2007

Nice to see you sending patches and getting familiar with the chord extraction 
code. Optimizing is a good way of getting insight of the code. Perfect.

On Thursday 14 June 2007 17:46:31 Roman Goj wrote:
> Hi!
>
> Finally I managed to cook up something which should actually be useful
>
> :) (contrary to the previous "processing time improvement" patch...
>
> btw. is it not ok? it didn't get comitted... )
>
> I learned to use valgrind (Pau, thanks for the tip, I would have had
> doubts whether to use it or try anything else :) ) ... and it turns
> out that right after fftw_execute, the biggest memory hog is the []
> operator in std::vector and since _spectrum is a vector and it has to
> be assigned literally millions of times in ConstantQTransform.cxx,
> TonalAnalysis.cxx and FourierTransform.cxx I thought optimizing that
> was the way to go.
>
> The hack I did seems a bit dirty (at least to me)
> double * spectrumArray = &_spectrum[0];
> and then instead of calling _spectrum[i] I call spectrumArray[i]
> and this turns out to be something like 5 times faster... I guess
> iterators would have been a nicer way to do this, but I think they
> would also be slower...

&_spectrum[0] is not dirty at all if you use it wisely. And if it speedups so 
much is not dirty it is almost wonderful. But this must be done in a very 
controlled code (as you do) so that vector memory won't be invalidated. 

Anyway Pau might be right when says that we maybe missing some 'release' 
compilation flag. operator[]() is meant to be inline and unchecked versus 
at().

> This site: (quite interesting, a nice find :) )
> http://www.gotw.ca/gotw/050.htm
> says this method is actually quite valid :) (should I dwelve deeper in
> the site and maybe use the later tips also?)
>
> Plus there were two loops which called frameSize() like this:
> for(i=0; i<frameSize(); i++)
> and it turns out the compiler isn't intelligent enough to optimize
> this (I never cared about this, always thought gcc would take care of
> this for me, I'm disappointed :( ) and calls frameSize everytime the
> stopping condition is evaluated... so I put the result of frameSize()
> in a variable and used that (a couple percent of improvement)

Yep, the nice way.

> Conclusions:
> I checked out the svn code today (would have sent the patch much
> earlier but turns out there was something wrong with the way I was
> building CLAM and I had some trouble with that :( ), compiled it and
> issued:
> ./ChordExtractor -f .pool Debaser-CoffeeSmell.mp3
>
> which resulted in:
> Processing time: 585 seconds
>
> then I recompiled it using fftw3 (why isn't this turned on by
> default?), tried ChordExtractor again and got:
> Processing time: 294 seconds
>
> then I put the
> optimizations-vectors-into-arrays-frameSize-as-variable-not-function.patch
> to work and got:
> Processing time: 187 seconds
>
> So the patch gives 36% improvement compared to the fftw3 version :)
> (this doesn't take into account the improvement in TonalAnalysis.cxx,
> which is also audible, but I didn't benchmark it that well)

Wow. Do you keep the callgrind files? could you tell us which ones benefited 
the more of the optimization. (Disable the '%' mode and take the cycle 
estimation) It is a pity that the EfficiencyGuardian is still not deployed to 
keep track of CLAM optimizations. 

Are you enabling the cache simulation? Cache has a deep impact into callgrind 
numbers in the ChordExtraction algorithm.

> Also doing and exhaustive search for a plan helps:
> Processing time: 130 seconds
> but, as I previously posted, this requires some additional
> calculations (lots of them) beforehand, I'll try to catch you on #clam
> tomorrow to ask about this...

Computing wisdom seems to be too long. Hours!! i thought it wasn't so long.
Computing it on installation is something that must be managed, we'd better 
focus on other issues. Anyway, knowing that we can still speedup fftw3 a 30% 
is a nice thing to know. We can take the option for specific cases.

A compromise solution could be to check whether the wisdom file exists on some 
standard paths (documentation suggests /etc/fftw/wisdom and i would add 
~/.fftw/wisdom). If it exists, we can do EXHAUSTIVE if it doesn't, do 
ESTIMATE). This way a user may choose to compute the wisdom with 
the 'fftw-wisdom' command.

One problem with this approach is that the wisdom file may contain several 
configurations. If i understood it properly, any new fft size will compute 
the wisdom again. Do you know any method to know whether your configuration 
is in a given wisdom file, to ignore the WISDOM if you didn't precomputed it?

> So... I hope you accept these changes :) (And could anyone tell me how
> it works on a better computer? On my duron 700 with the changes
> TonalAnalysis in NetworkEditor works better then the cvs version on my
> sempron 2,4... so I guess that's nice ;) )

I'll commit the vector related changes tomorrow or during the weekend. By now, 
i won't commit the wisdom related ones, but i'll keep it in a safe place. ;-) 
You are wellcome to send a new conditional wisdom computing patch, but don't 
hesitate if you don't.

David.