[CLAM] Questions about SMSTools (Frame Segmentation).

David Garcia Garzon dgarcia at iua.upf.es
Tue Feb 21 11:51:59 PST 2006


On Saturday 18 February 2006 08:31, xavier at create.ucsb.edu wrote:
> > 1. Does the time tag of a frame mark the beginning or the middle (or
> > even something else) of the frame?
>
> The time tag marks the center of the frame.
>
> > 2. What does the first frame include? The samples 0 to FrameLength-1?
> > Or first some zeros and then the samples 0 to (FrameLength-1)/2? Or
> > maybe first some zeros and then the samples 0 to HopSize? Or something
> > else?
>
> The first frame fed to the analysis process contains zeros and only
> samples from 0 to HopSize. This means that this frame has usually a
> negative centerTime. These frames with negative center times are never
> used in the synthesis process so the first valid frame for synthesis is
> the one centered at t=0 having zeros in the first half and samples in the
> second.

Xavi, I am also having frame timing offset problems. Harte's chord extractor
uses a huge hop size and an even larger window size, so any error of this kind
is very noticeable. I think I am doing something wrong, as most chords are
detected with some fixed offset along the song. I am currently assuming that
the first frame I read from the MonoAudioFileReader starts at sample 0, so the
first center is at:
	(WindowSize/2)/samplingRate
But if what you say is true, and the first frame contains zeros followed by
the first 'hopSize' samples, the first frame center should be at:
	(step-windowSize/2)/samplingRate
But then the formula:
> TTime frameCenterTime=frameIndex*step/samplingRate;
is wrong, as it doesn't include such an offset for the first frame, is it?
Well, I am just puzzled.

I am just using the MonoAudioFileReader without any standard analysis
processing, so is the zero padding something the spectral analysis does? Or is
AudioFileIn the one that inserts the zeros?

Currently I am running some tests to locate the offset empirically. Any clue
or hint would be appreciated.


> > 3. How are the time-tags calculated? When I multiply the time-tags by
> > the sample rate, I get more or less what I expect (integer multiples
> > of the HopSize), but the values have a little error. This error could
> > be just a round-off error, but I don't know. The strange thing is,
> > that with bigger frame number this error increases. At about frame
> > number 100 it had a value of about 10^-3.
>
> The center times should be multiples of the HopSize indeed. If you look at
> the SMSAnalysis.cxx file (around line 146) this is the formulation:
>
> TTime frameCenterTime=frameIndex*step/samplingRate;
>
> (http://clam.iua.upf.es/CLAM-doxygen/SMSAnalysis_8cxx-source.html)
>
> So yes there could be a rounding error but what you explain is weird.

While hunting for the chord offset I noticed that MonoAudioFileReader
computes the beginTime incrementally, so it could accumulate a large rounding
error when the deltaTime is very small compared to mCurrentBeginTime. (See
MonoAudioFileReader.cxx:146)

-- 
David García Garzón <david.garcia at removespam.iua.upf.es>
Phone: 034 93 542 21 99
Music Technology Group, Institut Universitari de l'Audiovisual
Universitat Pompeu Fabra




