abe.kazemzadeh at gmail.com
Tue Jun 12 14:21:29 PDT 2007
If you need to calculate acoustic similarity, you can use dynamic time
warping (DTW, http://en.wikipedia.org/wiki/Dynamic_time_warping ). You
window the speech from the target and test files into frames, extract
features for each frame, and let DTW find the lowest-cost alignment
between the two frame sequences. The algorithm defines a cost for
inserting and deleting frames and for the dissimilarity of the aligned
features, so the overall cost (or the cost normalized by the length of
the files) provides a good measure of difference.
There are other refinements to the algorithm.
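To make the idea concrete, here is a minimal Python sketch of the alignment step only. It assumes the features have already been extracted as one vector per frame. The names frame_distance, dtw_cost and step_penalty are made up for illustration, and the Euclidean distance and the normalization by the summed lengths are just one reasonable choice, not the only one (and not what my old Perl script necessarily did).

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_cost(target, test, step_penalty=0.0):
    """Align two feature sequences and return (total_cost, normalized_cost).

    target, test: non-empty lists of feature vectors, one per frame.
    step_penalty: hypothetical extra cost charged for every insertion or
                  deletion step (0.0 gives plain DTW).
    """
    n, m = len(target), len(test)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")

    inf = float("inf")
    # cost[i][j] is the cheapest alignment of target[:i] with test[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(target[i - 1], test[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],             # match: advance in both files
                cost[i - 1][j] + step_penalty,  # deletion: extra target frame
                cost[i][j - 1] + step_penalty,  # insertion: extra test frame
            )

    total = cost[n][m]
    # Normalize by the summed lengths so long files are not penalized
    # just for being long (an assumption; other normalizations exist).
    return total, total / (n + m)
```

You would call dtw_cost(features_a, features_b) and compare the normalized cost against a threshold that you tune on your own recordings. Two sequences that differ only in timing (one frame repeated, say) align with zero cost when step_penalty is 0.0, while sequences with different content give a clearly larger cost.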
CLAM can do the feature extraction, but I'm not sure whether it includes
a DTW implementation (I'm still learning CLAM, so I'm not an expert), so
that might be something you'll have to write yourself. I've used this
methodology for comparing accents (native speaker vs non-native speaker
reading the same sentences). Back then I used HTK for the feature
extraction and a Perl script for the DTW.
Hope this helps and maybe stimulates more ideas for doing it with clam...
David García Garzón wrote:
> Hi, Liviu.
> Do you mean speech recognition? Speaker recognition? Just sound
> classification? Which is the concrete use case? I am not sure of what you
> mean but your statement seems too general to be achieved. Depending on your
> purpose you might be at the bleeding edge of research, especially if you go to the
> semantic level.
> CLAM currently has no sound classification system but it has many of the
> building blocks such systems use. That's better than starting from scratch.
> A new document we added to the wiki gives you an overview of the steps to
> get started with CLAM:
>  http://clam.iua.upf.edu/wikis/clam/index.php/Approaching_CLAM
> If you need more help, just ask.
> On Tuesday 12 June 2007 16:44:20 Liviu Macoviciuc wrote:
>> I am a newbie to CLAM and I don't understand much.
>> However, I need to write a program that says if 2 audio files are distinct.
>> For example, a file might contain a voice saying "I am John", and another
>> file the same voice or another voice saying "I am Bill"
>> Can anybody help me to get started ?!
>> Best regards,