[CLAM] Questions about SMSTools (Frame Segmentation).

Mon Feb 20 15:28:36 PST 2006

On Mon, 20 Feb 2006, Matthias Geier wrote:

> Thanks for the link. I wouldn't have found it by myself ...

It might be a bit difficult to find, but it is linked from the
documentation section of the website, under Doxygen.

> But now to the initial problem with the number of frames:
> 
> It occurred when I was analyzing a soundfile with 19200 samples.
> I would have split that (using the standard values Hopsize=256 and
> FrameLength=2049) into 75 frames, but SMSTools splits it into 76
> frames!
> Wouldn't that mean that the sample in the center of the 76th frame is zero?
> A soundfile with 19201 samples, however, is split into 76 frames by
> both me and SMSTools. We also agree on splitting 19199 samples into 75
> frames.
> I also tried that on 2048 samples, and maybe it's the same for all
> integer multiples of the HopSize?
> 
> Now please tell me:
> Am I calculating correctly?
> Is there a bug in SMSTools?
> Or a feature which I don't recognize as such?
> 
> Maybe the relevant part is in the very same function as mentioned
> above around line 154:
> 
> if(frameCenterTime>in.GetAudio().GetDuration()*0.001)
>   return false;
> 
> why doesn't it say the following?
> 
> if(frameIndex*step>in.GetAudio().GetSize())
>   return false;
> 
> Or something similar? I don't really know C++, so I don't know if that
> would work.

Looking at the code (and your explanation), I certainly think this is a
bug caused by unnecessary rounding operations. At first sight your
alternative termination condition looks much better to me, and I can't
think of any reason why the one in the code would be preferred.

This termination condition is not very relevant in most cases, because your
audio does not usually run right up to the last sample (there are usually
zeros at both the beginning and the end). To tell you the truth, I still
don't know which would be the better strategy:

- The one we use now (once the one-sample rounding error is corrected) gives
an output file with the same number of samples as the input one (and this
is good).
- But if you do have a soundfile with usable samples right up to the last
one and you are using quite a lot of overlap, this procedure would not be
the most appropriate one, because you would be cancelling the effect of the
overlap on the last samples. In this case it might make sense to continue
the analysis procedure until the window is completely outside of the
original audio (not only its center).

In any case, I guess the most sensible strategy is the first one, combined
with telling users to make sure there are sufficient zeros at the end of
the sound file.