[Clam-devel] recall/precision for a simple song, and the road to getting more and better results...
roman.goj at gmail.com
Thu Aug 16 17:49:56 PDT 2007
I wanted to test the new recall and precision computing script
(committed yesterday, finished (more or less) today). Unfortunately we
were unable to find the repository of annotated Beatles' songs, which
would've been ideal for testing. But as I really wanted to see the
script work, I annotated by hand (well, by Annotator, to be precise :) )
a dead simple song by Leonard Cohen (One of us cannot be wrong).
And then I calculated recall and precision, using this ground truth
data, for the current ChordExtractor output and the new "improved"
version I made recently... heh... even as I annotated the song, I
realized what the results would be:
current svn ChordExtractor:
my "improved" segmentation algorithm:
The bigger the values, the better, of course... Heh... hooray for
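The recall/precision script itself isn't included in this mail. As a rough illustration of what such a comparison against ground truth can look like, here is a minimal C++ sketch. It assumes boundary-based evaluation: segment boundaries are times in seconds, and a computed boundary counts as a hit when it falls within a tolerance of a not-yet-matched ground-truth boundary. The struct, the function name and the tolerance parameter are hypothetical. The real script may well compare chord labels frame by frame instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch (not part of CLAM).
struct RecallPrecision
{
    double recall;     // fraction of ground-truth boundaries that were found
    double precision;  // fraction of computed boundaries that are correct
};

// Boundary-based recall/precision. Boundaries are times in seconds; a computed
// boundary is a hit if it lies within `tolerance` of a ground-truth boundary
// that has not been matched yet. Matching is greedy (first fit), which is
// simpler than, and can occasionally score below, an optimal one-to-one matching.
RecallPrecision boundaryRecallPrecision(const std::vector<double>& groundTruth,
                                        const std::vector<double>& computed,
                                        double tolerance)
{
    assert(tolerance >= 0.0);
    std::vector<bool> matched(groundTruth.size(), false);
    std::size_t hits = 0;
    for (std::size_t c = 0; c < computed.size(); ++c)
    {
        for (std::size_t g = 0; g < groundTruth.size(); ++g)
        {
            if (!matched[g] && std::fabs(groundTruth[g] - computed[c]) <= tolerance)
            {
                matched[g] = true;  // each ground-truth boundary counts once
                ++hits;
                break;
            }
        }
    }
    RecallPrecision result;
    result.recall = groundTruth.empty() ? 0.0 : double(hits) / groundTruth.size();
    result.precision = computed.empty() ? 0.0 : double(hits) / computed.size();
    return result;
}
```

Take made-up boundaries {0, 4, 8, 12} as ground truth and {0.1, 3.8, 6.0, 8.3, 10.0, 12.1} as computed output, with a 0.5 s tolerance. Four of the six computed boundaries hit, so recall is 1 and precision is 4/6 (about 0.67). Under this definition, producing fewer segments tends to raise precision but can lower recall, so the two numbers are worth reading together.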
So the question is: why did my new improved algorithm (and I stand by using
the word "improved" - if only because the number of segments is three
times smaller than in the svn version...) go wrong, and what can I do about
it? Well, a very interesting question, but to answer I'd need to have a
reasonable way of tweaking the algorithm... and for that it needs to be
So - what I plan to do now:
* ChordSegmentator will have a _method variable (now unsigned, later an
enum probably) for choosing the segmentation method
* doIt() will be a switch between the methods
* doItSimple() will hold the current algorithm, doItSimilarity() the new one
* ChordExtractor will get an additional constructor parameter to pass
down to the ChordSegmentator...
* ChordExtractor will get a command-line option for choosing the algorithm
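The plan above boils down to a dispatch inside doIt(). Here is a minimal C++ sketch of it. The class names, `_method`, `doIt()`, `doItSimple()` and `doItSimilarity()` come from the plan. The enum names, the `segment()` wrapper and the string return values are hypothetical, and this is not CLAM's actual Processing interface. The placeholder bodies only report which branch ran, so that the dispatch can be checked.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical sketch of the planned dispatch; not CLAM's actual code.
class ChordSegmentator
{
public:
    // The plan is an unsigned _method now and probably an enum later; the
    // enum values here map onto the unsigned that a command line would parse.
    enum Method { SimpleMethod = 0, SimilarityMethod = 1 };

    explicit ChordSegmentator(unsigned method = SimpleMethod)
        : _method(method) {}

    // doIt() only dispatches; each algorithm lives in its own method.
    std::string doIt()
    {
        switch (_method)
        {
            case SimpleMethod:     return doItSimple();
            case SimilarityMethod: return doItSimilarity();
            default: throw std::invalid_argument("unknown segmentation method");
        }
    }

private:
    // Placeholders: the real bodies would hold the current (svn) algorithm
    // and the new similarity-based one.
    std::string doItSimple() { return "simple"; }
    std::string doItSimilarity() { return "similarity"; }

    unsigned _method;
};

// ChordExtractor takes the extra constructor parameter and passes it down.
class ChordExtractor
{
public:
    explicit ChordExtractor(unsigned segmentationMethod = ChordSegmentator::SimpleMethod)
        : _segmentator(segmentationMethod) {}

    std::string segment() { return _segmentator.doIt(); }

private:
    ChordSegmentator _segmentator;
};
```

A command-line option would then only need to parse an unsigned (0 or 1 here) and hand it to the ChordExtractor constructor. An unknown value is rejected in doIt() rather than silently falling back to one of the algorithms.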
This way it should be easy to test the different algorithms... and
hopefully this is not just a stupid temporary solution? I saw something
similar used in SMSSynthesis, so I'm hoping it's ok...
Any objections, advice, David? :) Any GSoC related objections/advice?
Brr... the date (the 20th instead of the 31st) was a cold shower :-/
Anyway, I will be committing the changes as I go (mostly tomorrow, I
guess...). Meanwhile, here are the annotations for the song by Cohen (links to
box.net: I forgot the files are so big, sent them to the ML and got a rejection
and an "approval needed" e-mail because of the size... so, box.net links):
Cohen.poolGROUNDTRUTH - the ground truth (or rather: more or less the
way cohenchords.com and I perceive it)
Cohen.poolCOMPUTEDOLD - the current simple svn algorithm results
Cohen.poolCOMPUTEDNEW - the new "in my sandbox" "sort of improved"