[Clam-devel] [PATCH] std::vector operator and not calling frameSize()
David García Garzón
dgarcia at iua.upf.edu
Thu Jun 14 11:15:57 PDT 2007
Nice to see you sending patches and getting familiar with the chord extraction
code. Optimizing is a good way of getting insight of the code. Perfect.
On Thursday 14 June 2007 17:46:31 Roman Goj wrote:
> Finally I managed to cook up something which should actually be useful
> :) (contrary to the previous "processing time improvement" patch...
> btw. is it not ok? it didn't get comitted... )
> I learned to use valgrind (Pau, thanks for the tip, I would have had
> doubts whether to use it or try anything else :) ) ... and it turns
> out that right after fftw_execute, the biggest memory hog is the 
> operator in std::vector and since _spectrum is a vector and it has to
> be assigned literally millions of times in ConstantQTransform.cxx,
> TonalAnalysis.cxx and FourierTransform.cxx I thought optimizing that
> was the way to go.
> The hack I did seems a bit dirty (at least to me)
> double * spectrumArray = &_spectrum;
> and then instead of calling _spectrum[i] I call spectrumArray[i]
> and this turns out to be something like 5 times faster... I guess
> iterators would have been a nicer way to do this, but I think they
> would also be slower...
&_spectrum is not dirty at all if you use it wisely. And if it speedups so
much is not dirty it is almost wonderful. But this must be done in a very
controlled code (as you do) so that vector memory won't be invalidated.
Anyway Pau might be right when says that we maybe missing some 'release'
compilation flag. operator() is meant to be inline and unchecked versus
> This site: (quite interesting, a nice find :) )
> says this method is actually quite valid :) (should I dwelve deeper in
> the site and maybe use the later tips also?)
> Plus there were two loops which called frameSize() like this:
> for(i=0; i<frameSize(); i++)
> and it turns out the compiler isn't intelligent enough to optimize
> this (I never cared about this, always thought gcc would take care of
> this for me, I'm disappointed :( ) and calls frameSize everytime the
> stopping condition is evaluated... so I put the result of frameSize()
> in a variable and used that (a couple percent of improvement)
Yep, the nice way.
> I checked out the svn code today (would have sent the patch much
> earlier but turns out there was something wrong with the way I was
> building CLAM and I had some trouble with that :( ), compiled it and
> ./ChordExtractor -f .pool Debaser-CoffeeSmell.mp3
> which resulted in:
> Processing time: 585 seconds
> then I recompiled it using fftw3 (why isn't this turned on by
> default?), tried ChordExtractor again and got:
> Processing time: 294 seconds
> then I put the
> to work and got:
> Processing time: 187 seconds
> So the patch gives 36% improvement compared to the fftw3 version :)
> (this doesn't take into account the improvement in TonalAnalysis.cxx,
> which is also audible, but I didn't benchmark it that well)
Wow. Do you keep the callgrind files? could you tell us which ones benefited
the more of the optimization. (Disable the '%' mode and take the cycle
estimation) It is a pity that the EfficiencyGuardian is still not deployed to
keep track of CLAM optimizations.
Are you enabling the cache simulation? Cache has a deep impact into callgrind
numbers in the ChordExtraction algorithm.
> Also doing and exhaustive search for a plan helps:
> Processing time: 130 seconds
> but, as I previously posted, this requires some additional
> calculations (lots of them) beforehand, I'll try to catch you on #clam
> tomorrow to ask about this...
Computing wisdom seems to be too long. Hours!! i thought it wasn't so long.
Computing it on installation is something that must be managed, we'd better
focus on other issues. Anyway, knowing that we can still speedup fftw3 a 30%
is a nice thing to know. We can take the option for specific cases.
A compromise solution could be to check whether the wisdom file exists on some
standard paths (documentation suggests /etc/fftw/wisdom and i would add
~/.fftw/wisdom). If it exists, we can do EXHAUSTIVE if it doesn't, do
ESTIMATE). This way a user may choose to compute the wisdom with
the 'fftw-wisdom' command.
One problem with this approach is that the wisdom file may contain several
configurations. If i understood it properly, any new fft size will compute
the wisdom again. Do you know any method to know whether your configuration
is in a given wisdom file, to ignore the WISDOM if you didn't precomputed it?
> So... I hope you accept these changes :) (And could anyone tell me how
> it works on a better computer? On my duron 700 with the changes
> TonalAnalysis in NetworkEditor works better then the cvs version on my
> sempron 2,4... so I guess that's nice ;) )
I'll commit the vector related changes tomorrow or during the weekend. By now,
i won't commit the wisdom related ones, but i'll keep it in a safe place. ;-)
You are wellcome to send a new conditional wisdom computing patch, but don't
hesitate if you don't.
More information about the clam-devel