[Clam-devel] [PATCH] std::vector operator[] and not calling frameSize()
David García Garzón
dgarcia at iua.upf.edu
Thu Jun 14 11:15:57 PDT 2007
Nice to see you sending patches and getting familiar with the chord extraction
code. Optimizing is a good way of getting insight of the code. Perfect.
On Thursday 14 June 2007 17:46:31 Roman Goj wrote:
> Hi!
>
> Finally I managed to cook up something which should actually be useful
>
> :) (contrary to the previous "processing time improvement" patch...
>
> btw. is it not ok? it didn't get comitted... )
>
> I learned to use valgrind (Pau, thanks for the tip, I would have had
> doubts whether to use it or try anything else :) ) ... and it turns
> out that right after fftw_execute, the biggest memory hog is the []
> operator in std::vector and since _spectrum is a vector and it has to
> be assigned literally millions of times in ConstantQTransform.cxx,
> TonalAnalysis.cxx and FourierTransform.cxx I thought optimizing that
> was the way to go.
>
> The hack I did seems a bit dirty (at least to me)
> double * spectrumArray = &_spectrum[0];
> and then instead of calling _spectrum[i] I call spectrumArray[i]
> and this turns out to be something like 5 times faster... I guess
> iterators would have been a nicer way to do this, but I think they
> would also be slower...
&_spectrum[0] is not dirty at all if you use it wisely. And if it speedups so
much is not dirty it is almost wonderful. But this must be done in a very
controlled code (as you do) so that vector memory won't be invalidated.
Anyway Pau might be right when says that we maybe missing some 'release'
compilation flag. operator[]() is meant to be inline and unchecked versus
at().
> This site: (quite interesting, a nice find :) )
> http://www.gotw.ca/gotw/050.htm
> says this method is actually quite valid :) (should I dwelve deeper in
> the site and maybe use the later tips also?)
>
> Plus there were two loops which called frameSize() like this:
> for(i=0; i<frameSize(); i++)
> and it turns out the compiler isn't intelligent enough to optimize
> this (I never cared about this, always thought gcc would take care of
> this for me, I'm disappointed :( ) and calls frameSize everytime the
> stopping condition is evaluated... so I put the result of frameSize()
> in a variable and used that (a couple percent of improvement)
Yep, the nice way.
> Conclusions:
> I checked out the svn code today (would have sent the patch much
> earlier but turns out there was something wrong with the way I was
> building CLAM and I had some trouble with that :( ), compiled it and
> issued:
> ./ChordExtractor -f .pool Debaser-CoffeeSmell.mp3
>
> which resulted in:
> Processing time: 585 seconds
>
> then I recompiled it using fftw3 (why isn't this turned on by
> default?), tried ChordExtractor again and got:
> Processing time: 294 seconds
>
> then I put the
> optimizations-vectors-into-arrays-frameSize-as-variable-not-function.patch
> to work and got:
> Processing time: 187 seconds
>
> So the patch gives 36% improvement compared to the fftw3 version :)
> (this doesn't take into account the improvement in TonalAnalysis.cxx,
> which is also audible, but I didn't benchmark it that well)
Wow. Do you keep the callgrind files? could you tell us which ones benefited
the more of the optimization. (Disable the '%' mode and take the cycle
estimation) It is a pity that the EfficiencyGuardian is still not deployed to
keep track of CLAM optimizations.
Are you enabling the cache simulation? Cache has a deep impact into callgrind
numbers in the ChordExtraction algorithm.
> Also doing and exhaustive search for a plan helps:
> Processing time: 130 seconds
> but, as I previously posted, this requires some additional
> calculations (lots of them) beforehand, I'll try to catch you on #clam
> tomorrow to ask about this...
Computing wisdom seems to be too long. Hours!! i thought it wasn't so long.
Computing it on installation is something that must be managed, we'd better
focus on other issues. Anyway, knowing that we can still speedup fftw3 a 30%
is a nice thing to know. We can take the option for specific cases.
A compromise solution could be to check whether the wisdom file exists on some
standard paths (documentation suggests /etc/fftw/wisdom and i would add
~/.fftw/wisdom). If it exists, we can do EXHAUSTIVE if it doesn't, do
ESTIMATE). This way a user may choose to compute the wisdom with
the 'fftw-wisdom' command.
One problem with this approach is that the wisdom file may contain several
configurations. If i understood it properly, any new fft size will compute
the wisdom again. Do you know any method to know whether your configuration
is in a given wisdom file, to ignore the WISDOM if you didn't precomputed it?
> So... I hope you accept these changes :) (And could anyone tell me how
> it works on a better computer? On my duron 700 with the changes
> TonalAnalysis in NetworkEditor works better then the cvs version on my
> sempron 2,4... so I guess that's nice ;) )
I'll commit the vector related changes tomorrow or during the weekend. By now,
i won't commit the wisdom related ones, but i'll keep it in a safe place. ;-)
You are wellcome to send a new conditional wisdom computing patch, but don't
hesitate if you don't.
David.
More information about the clam-devel
mailing list