[Clam-devel] residual spectrum line segment approximation?
Xavier Amatriain
xavier at amatriain.net
Sat Aug 9 13:46:05 PDT 2008
Hi again,
I think Yushen summarized the odd-window issue pretty well: we need an odd
size so that the window is symmetric about a single central sample, which
is what the zero-phase condition requires. This matters especially because,
when doing the circular shift, this central point becomes the sample at
time zero.
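
A minimal sketch of that zero-phase argument, not CLAM code: the function
names below (symmetricBH92, zeroPhaseBuffer, maxImagOfDFT) are hypothetical
and only meant for illustration. It assumes a symmetric 4-term
Blackman-Harris window of odd length M = 2h + 1 and an FFT buffer of size
N >= M. After the circular shift the central sample sits at index 0 and the
buffer is even in the circular sense, so its DFT is (numerically) real.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Symmetric 4-term Blackman-Harris (BH92) window of odd length M = 2h + 1.
// The central sample w[h] is the peak and w[h - k] == w[h + k].
std::vector<double> symmetricBH92(std::size_t M)
{
    assert(M % 2 == 1 && M >= 3);
    const double pi = std::acos(-1.0);
    std::vector<double> w(M);
    for (std::size_t n = 0; n < M; ++n) {
        const double t = 2.0 * pi * double(n) / double(M - 1);
        w[n] = 0.35875 - 0.48829 * std::cos(t)
             + 0.14128 * std::cos(2.0 * t) - 0.01168 * std::cos(3.0 * t);
    }
    return w;
}

// Circularly shift an odd-length frame into a zero-padded buffer of size N
// so that the central sample lands on index 0 (time zero).
std::vector<double> zeroPhaseBuffer(const std::vector<double>& frame, std::size_t N)
{
    const std::size_t M = frame.size();
    assert(M % 2 == 1 && N >= M);
    const std::size_t h = (M - 1) / 2;
    std::vector<double> buf(N, 0.0);
    for (std::size_t n = 0; n < M; ++n) {
        buf[(n + N - h) % N] = frame[n];  // right half at 0..h, left half at N-h..N-1
    }
    return buf;
}

// Largest |imaginary part| over all bins of a naive O(N^2) DFT.
// A value close to zero means the spectrum is real, i.e. zero phase.
double maxImagOfDFT(const std::vector<double>& x)
{
    const double pi = std::acos(-1.0);
    const std::size_t N = x.size();
    double worst = 0.0;
    for (std::size_t k = 0; k < N; ++k) {
        std::complex<double> X(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n) {
            const double angle = -2.0 * pi * double(k * n) / double(N);
            X += x[n] * std::polar(1.0, angle);
        }
        worst = std::max(worst, std::abs(X.imag()));
    }
    return worst;
}
```

With an even-length symmetric window there is no single central sample to
put at index 0, so the same shift cannot make the buffer exactly even.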
About the results, it is clear that you have an issue in the
interpolated files. But this is basically because of the procedure you
chose to pick the points in the residual spectrum: you can't simply
sub-sample the spectrum and then recover it with a line-segment
approximation. You need to somehow extract the relevant points in that
spectrum (i.e. compute the spectral envelope). However, there might be
something else...
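
To make the difference concrete, here is a rough sketch (again not CLAM
code; BreakPoint, envelopeBreakpoints and interpolateEnvelope are
hypothetical names) of the "local maximum in every section" idea from the
Serra/Smith quote further down the thread. It assumes a plain magnitude
array and a fixed section size in bins, and it holds the envelope flat
before the first and after the last breakpoint.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One envelope breakpoint: a bin index and the magnitude found there.
struct BreakPoint {
    std::size_t bin;
    double mag;
};

// Step through the magnitude spectrum in sections of sectionSize bins and
// keep the largest value of each section (instead of every k-th bin).
std::vector<BreakPoint> envelopeBreakpoints(const std::vector<double>& mag,
                                            std::size_t sectionSize)
{
    assert(sectionSize > 0);
    std::vector<BreakPoint> points;
    for (std::size_t start = 0; start < mag.size(); start += sectionSize) {
        const std::size_t stop = std::min(start + sectionSize, mag.size());
        std::size_t best = start;
        for (std::size_t k = start + 1; k < stop; ++k) {
            if (mag[k] > mag[best]) best = k;
        }
        points.push_back({best, mag[best]});
    }
    return points;
}

// Rebuild a full-length magnitude envelope by linear interpolation between
// breakpoints; values outside the first/last breakpoint are held constant.
std::vector<double> interpolateEnvelope(const std::vector<BreakPoint>& points,
                                        std::size_t numBins)
{
    std::vector<double> env(numBins, 0.0);
    if (points.empty()) return env;
    std::size_t seg = 0;
    for (std::size_t k = 0; k < numBins; ++k) {
        // Advance to the last breakpoint whose bin is <= k.
        while (seg + 1 < points.size() && points[seg + 1].bin <= k) ++seg;
        const BreakPoint& a = points[seg];
        if (k <= a.bin || seg + 1 == points.size()) {
            env[k] = a.mag;
        } else {
            const BreakPoint& b = points[seg + 1];
            const double t = double(k - a.bin) / double(b.bin - a.bin);
            env[k] = a.mag + t * (b.mag - a.mag);
        }
    }
    return env;
}
```

For a spectrum like {1,5,2,1, 0,1,7,2, 3,3,1,9} with sections of 4 bins,
taking every fourth bin would keep 1, 0 and 3 and miss all three peaks,
whereas the per-section maxima keep 5, 7 and 9.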
Xavier (signing off for a week of vacation away from email :-)
roumbaba wrote:
>
> Hi Han and thanks for your indications,
>
> So I put the examples in this online
> folder: http://www.drivehq.com/folder/p4466208.aspx (for some reason
> storeandserve did not work)
>
> 1_interp.conv.synth_res.wav: the interpolated noise spectrum resynth
> with convolution by bh92 in freq domain to compensate for the 'lost'
> window:
>
> 2_interp.noconv.synth_res.wav: the interpolated noise spectrum resynth
> without convolution:
>
> 3_originalSpec.synth_res.wav: the original noise spectrum resynth:
>
> 4_originalSpec.conv.synth_res.wav: the original noise spectrum resynth
> with convolution of bh92 in freq domain (window applied twice but in
> this case no clipping and sounds almost ok. amplitude reduced though)
>
>
> Thanks again
>
> Roumbaba
>
>
>
> On 8 Aug 08, at 21:15, Han, Yushen wrote:
>
>
>> Hi, Roumbaba, Xavier, and Greg
>>
>> I was thinking about the same question that you raised about the
>> window length.
>> As far as I understand, in spectral analysis we need a DFT-even window.
>> (The DFT-even window is not symmetric about its midpoint. )
>>
>> This is because "DFT essentially considers sequences to be periodic"
>> and "we can consider the missing end point to be the beginning of the
>> next period of the periodic extension of this sequence" (from Harris
>> paper in 1978).
>>
>> Here I just found some "modern" illustrations:
>> http://zone.ni.com/reference/en-XX/help/371361A-01/lvanlsconcepts/windows_spect_anls_coeff_dsgn/
>>
>> In my plugin, I need to do some SMS transform on the SMS-analyzed
>> audio stored in SDIF.
>> I believe the BH window in the SMS analysis should be of even length
>> (but not symmetric).
>> However, I did not understand why we should use BH window of odd length.
>>
>> Below is the code I inherited from Greg's plugin
>> "continuousExcitationSynthesizer".
>> It was in the SMS analysis (WAV->SDIF) part.
>>
>> // SMS Analysis configuration
>> CLAM::XMLStorage::Restore(AnalConfig, xmlconfig);
>> // If a window size is even, add one to make it odd.
>> if (AnalConfig.GetSinWindowSize() % 2 == 0)
>>     AnalConfig.SetSinWindowSize(AnalConfig.GetSinWindowSize() + 1);
>> if (AnalConfig.GetResWindowSize() % 2 == 0)
>>     AnalConfig.SetResWindowSize(AnalConfig.GetResWindowSize() + 1);
>> // If the hop size is negative, use half of the (odd) residual window.
>> if (AnalConfig.GetHopSize() < 0)
>>     AnalConfig.SetHopSize((AnalConfig.GetResWindowSize() - 1) / 2);
>>
>> It forced the window length to be odd in SMS analysis.
>> I keep it since it did not make any audible problem for me so far
>> (except for this question).
>> But I would like to know the reason why it was forced to be odd.
>> Maybe Greg could answer this.
>>
>>
>> For the audio examples by Roumbaba, here is a sharing website that I
>> like (you don't need to register) .
>> http://storeandserve.com/
>> I would like to hear the artifact in your examples.
>>
>>
>> Best regards,
>> Han, Yushen
>>
>>
>>
>>
>>
>> On Fri, Aug 8, 2008 at 8:16 PM, roumbaba <roumbaba at gmail.com> wrote:
>>> Hi Xavier, and thank you for your reply,
>>>
>>> I will ask this question on another forum as you suggest. One question,
>>> though, that another forum might not be able to address is how SMS
>>> internally deals with the analysis window size: when I specify an
>>> analysis window of 1024 I get 1STF frames of window size 1025 and FFT
>>> size 513. What happens internally?
>>>
>>> Anyhow, I have actually tried different input spectra:
>>>
>>> - the original spectrum with phase randomization
>>> - the original spectrum subsampled (in the frequency domain), then
>>> linearly interpolated to reconstruct it with its full number of values.
>>>
>>> I always get the same type of audio results, which I suspect might be due
>>> to the way I choose window sizes and what I do with that 513th sample.
>>> Following your recommendations, today I also did some tries where I
>>> "deconvolved" the effect of the original analysis window before doing
>>> the linear subsampling (or the phase randomization). Still I get the same
>>> type of artifacts, which actually are not "phase discontinuities" as I
>>> wrongly stated in my previous message. (What I meant by that was clicks
>>> caused by discontinuities in the synthesized audio signal.) In fact the
>>> artifact now is that the resynthesized signal seems to be composed of
>>> "packets" or "wavelet kernels", which seems to indicate that the
>>> overlap-add is wrong somewhere, or that my window shapes are off or
>>> something.
>>>
>>> I agree that a short sound can save lines of text. I have 4 audio examples
>>> (about 184Kb each) that I can send to you directly to avoid sending them
>>> to the whole list. (I don't have a place to post them unless you know of
>>> one.)
>>>
>>> 1- the interpolated noise spectrum resynth with convolution by bh92 in
>>> freq domain to compensate for the 'lost' window
>>> 2- the interpolated noise spectrum resynth without convolution
>>> 3- the original noise spectrum resynth: originalSpec.noconv.synth_res.wav
>>> 4- the original noise spectrum resynth with convolution of bh92 in spec
>>> domain (window applied twice but sounds more or less ok as opposed to 1
>>> and 2)
>>>
>>>
>>> Thank you again for your time,
>>>
>>> Roumbaba
>>>
>>>
>>>
>>>
>>> On 7 Aug 08, at 14:18, Xavier Amatriain wrote:
>>>
>>>> Hi Baba,
>>>>
>>>> Sorry for the late response, but I think that this discussion is getting
>>>> a bit off-topic for this mailing list, as it is more a discussion of DSP
>>>> issues than of CLAM itself. I encourage you to take the thread to the
>>>> music-dsp mailing list [1], where you will probably get much more (and
>>>> quicker) feedback on general DSP questions... Unfortunately I don't have
>>>> as much time as I would wish to get to these questions, which require
>>>> more thinking than writing ;-)
>>>>
>>>> In any case, I don't see anything fundamentally wrong in your procedure
>>>> except in the way you have decided on the input spectrum. The idea behind
>>>> applying the BH92 to the residual spectrum was that, when doing the line
>>>> approximation out of a few spectral points, you are "losing" the effect
>>>> of the analysis window. It is similar to what happens when you do the
>>>> peak detection process in the sinusoidal component. If you use the
>>>> original spectrum you are in fact applying the window twice, right? Or am
>>>> I missing something? As a quick test you could try doing a peak detection
>>>> + sinusoidal synthesis (without phase continuation) also on the residual
>>>> component. This should mimic the effect of what I was proposing... more
>>>> or less.
>>>>
>>>> Also, what exactly do you mean when you mention phase discontinuities?
>>>> Could you post some audio examples somewhere? Listening to the result can
>>>> sometimes save a few lines of email text :-)
>>>>
>>>> X
>>>>
>>>>
>>>> [1] http://music.columbia.edu/mailman/listinfo/music-dsp
>>>>
>>>> roumbaba wrote:
>>>>>
>>>>> So I have *not* managed to correctly apply the bh92 window to my
>>>>> modified residual spectrum and thus I have *not* eliminated phase
>>>>> discontinuities at resynth time.
>>>>>
>>>>> One thing I still do not understand is why SMS needs odd analysis window
>>>>> sizes and how I should handle this. I specify the analysis window size
>>>>> to be 1024 and internally it seems to become 1025, and my 1STF frames
>>>>> are 513 in size. The fact that I do not understand that issue might be
>>>>> one of the sources of what I am not doing right.
>>>>>
>>>>> Here is where I am at so far. Any hint on what I am doing wrong or
>>>>> should do otherwise is welcome of course:
>>>>>
>>>>> - For testing purposes the only modification I make to the original 513
>>>>> values of the noise spectrum is to randomize the phases.
>>>>> - Then I expand the 513 spectrum to a 1026 spectrum by an even symmetry
>>>>> across the 513.5 axis and complex conjugate of the last 513 values.
>>>>>
>>>>> - Then I do a circular convolution of my 1026 spectrum with the FFT of a
>>>>> 1026 bh92 time window.
>>>>> The way I compute the bh92 time window is (matlab code for now):
>>>>>
>>>>> w1Length = 1026;
>>>>> fConst=2*pi/(w1Length+1-1);
>>>>> w1=[1:w1Length];
>>>>> w1=.35875 -.48829*cos(fConst*w1)+.14128*cos(fConst*2*w1) ...
>>>>> -.01168*cos(fConst*3*w1);
>>>>>
>>>>> - When I check the real part (and the magnitude) of the ifft of the
>>>>> 1026-value spectrum resulting from the convolution, I do see that the
>>>>> windowing worked and that the resulting time signal smooths to 0 at the
>>>>> beginning and end.
>>>>>
>>>>> - Then I take the first 513 values of the resulting spectrum and replace
>>>>> the corresponding 1STF frame in the original sdif analysis file.
>>>>>
>>>>> Still I get phase discontinuities in the resynth signal.
>>>>>
>>>>> What am I missing?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Baba
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 15 Jul 08, at 14:55, Xavier Amatriain wrote:
>>>>>
>>>>>> Hi Roumbaba, and congrats for your progress!
>>>>>>
>>>>>> You are right on the source of your problem: SMSSynthesis expects your
>>>>>> residual to come with an analysis window, and if not, things are likely
>>>>>> to mess up.
>>>>>>
>>>>>> The lines that are "guilty" of that are around SMSSynthesis.cxx:252
>>>>>>
>>>>>>
>>>>>> http://clam.iua.upf.edu/doc/CLAM-doxygen/SMSSynthesis_8cxx-source.html#l00252
>>>>>>
>>>>>> First the peaks are synthesized into a sinusoidal spectrum. Then the two
>>>>>> spectrums are added. Already at that point the spectrums are supposed to
>>>>>> have the same analysis window (BH92) and size. The effect of that window
>>>>>> is undone in line 261 when the global spectral synthesis is performed.
>>>>>>
>>>>>> The issue here is that you need to guarantee that both spectrums come
>>>>>> from a similar place before adding them... The sinusoidal peaks are
>>>>>> reconstructed by convolving with the transform of the main lobe of the
>>>>>> window (BH92), but you are reconstructing the residual in a different
>>>>>> way. So... you either apply the BH92 transform to your spectrum or avoid
>>>>>> doing that in the peak synthesis (and then avoid multiplying by the
>>>>>> inverse in the global spectral synthesis). Neither of the two options is
>>>>>> immediate, but I'd say the first one should be easier to work out.
>>>>>>
>>>>>> Hope it helps... and if you get it to work don't forget to report
>>>>>> back.
>>>>>>
>>>>>> roumbaba wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hello all and thanks again for your previous help,
>>>>>>>
>>>>>>> So I have written some matlab script to perform noise spectrum line
>>>>>>> segment approximation.
>>>>>>>
>>>>>>> - As input the script takes an sdif file generated by analysis
>>>>>>> with
>>>>>>> SMSConsole.
>>>>>>> - It then reads all sdif frames, in particular the 1STF frames
>>>>>>> containing the noise spectrums in complex form.
>>>>>>> - It converts these complex spectrums into magPhase form
>>>>>>> - It performs line segment approximation on the amplitudes.
>>>>>>>
>>>>>>> To check the impact of the approximation on the quality of
>>>>>>> resynthesis
>>>>>>> the script does the following:
>>>>>>> - It reconstructs full noise magnitude spectrums from the line
>>>>>>> approximations (by linear interpolation)
>>>>>>> - It randomizes the phases
>>>>>>> - It converts the new "smoothed" magPhase spectrums back to complex
>>>>>>> spectrums
>>>>>>> - It writes back the sdif file with these new "smoothed" spectrums
>>>>>>> instead of the original raw noise spectrums.
>>>>>>>
>>>>>>> Then I run SMSConsole to synthesize that sdif file with the exact same
>>>>>>> parameters as for the original sdif file.
>>>>>>> My problem is that the resulting synthesized noise sounds like
>>>>>>> something is wrong in the synthesis overlap-add (like lots of
>>>>>>> discontinuities in the resynthesis).
>>>>>>> I think that this might be due to what is described in the Serra/Smith
>>>>>>> 1990 CMJ paper concerning line segment approximation noise resynthesis:
>>>>>>>
>>>>>>> " ...Since the [new] phase spectrum used is not the result of an
>>>>>>> analysis process (with windowing of a waveform, zero padding, and FFT
>>>>>>> computation), the resulting signal does not taper to 0 at the
>>>>>>> boundaries. This is because a phase spectrum with random values
>>>>>>> corresponds to a phase spectrum of a rectangular-windowed noise
>>>>>>> waveform of size N. In order to succeed in the overlap-add resynthesis
>>>>>>> (ie, to obtain smooth transitions between frames) we need a smoothly
>>>>>>> windowed waveform of size M, where M is the synthesis-window length.
>>>>>>> ...."
>>>>>>>
>>>>>>> So what might be happening is that by default SMSConsole assumes that
>>>>>>> the 1STF frames are *NOT* line segment approximations and therefore
>>>>>>> does *NOT* perform that last windowing at synthesis time. I have gone a
>>>>>>> little bit through the SMS/Clam code but I cannot find where I can
>>>>>>> change this behavior, or even whether that is the default behavior.
>>>>>>> Where should I look in the SMS/Clam code?
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Roumbaba
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 27 May 08, at 23:25, Xavier Amatriain wrote:
>>>>>>>
>>>>>>>> Hi Roumbaba,
>>>>>>>>
>>>>>>>> In the paper you cite it says "you can", which does not mean
>>>>>>>> "you have
>>>>>>>> to" :-) Doing an approximation of the residual model is indeed
>>>>>>>> an interesting thing to do, especially if you want to reduce the
>>>>>>>> amount of data in your transformed signal, however it is not a
>>>>>>>> must.
>>>>>>>> Note that there are many other ways to model the residual apart
>>>>>>>> from
>>>>>>>> the one mentioned in that paper.
>>>>>>>>
>>>>>>>> So far, in CLAM we are using the residual as is, with no modeling or
>>>>>>>> approximation. The "only" downside is that the transformed signal
>>>>>>>> (SMS Data) is in fact larger than the original audio when it could be
>>>>>>>> much smaller with not much loss in quality. If for whatever reason you
>>>>>>>> do need to do the residual modeling you can look at the
>>>>>>>> SpectralEnvelopeExtract processing. This processing generates a
>>>>>>>> spectral approximation (spectrum in bpf format), but from an array of
>>>>>>>> peaks; it would not be hard to modify it to work with an input
>>>>>>>> spectrum.
>>>>>>>>
>>>>>>>> X
>>>>>>>>
>>>>>>>>
>>>>>>>> roumbaba wrote:
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I am trying to understand how the residual spectrum gets modeled in
>>>>>>>>> clam/SMS. I have read the Serra/Smith 1990 CMJ paper and as I
>>>>>>>>> understand it, it describes two steps:
>>>>>>>>> 1- subtract the harmonic spectrum from the original spectrum
>>>>>>>>> 2- perform a line-segment approximation of the residual spectrum
>>>>>>>>> obtained in 1
>>>>>>>>>
>>>>>>>>> I have stepped through clam and SMS code and I think I can see
>>>>>>>>> where
>>>>>>>>> step 1 gets performed:
>>>>>>>>>
>>>>>>>>> SMSAnalysisCore::Do()
>>>>>>>>> {
>>>>>>>>>
>>>>>>>>> mSinSpectralAnalysis.Do();
>>>>>>>>> mResSpectralAnalysis.Do();
>>>>>>>>> ...
>>>>>>>>> ...
>>>>>>>>> ...
>>>>>>>>> mSynthSineSpectrum.Do();
>>>>>>>>> mSpecSubstracter.Do(); /* step 1 gets performed here I think*/
>>>>>>>>>
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> but I cannot find where step 2 (line approximation) gets
>>>>>>>>> performed.
>>>>>>>>> Where should I look in the code?
>>>>>>>>>
>>>>>>>>> Thank you very much,
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Roumbaba
>>>>>>>>>
>>>>>>>>> ps:
>>>>>>>>>
>>>>>>>>> Here is a quote from the paper I mentioned above:
>>>>>>>>>
>>>>>>>>> "Approximation of the Spectral Residual
>>>>>>>>>
>>>>>>>>> Assuming the residual signal is quasi-stochastic, each
>>>>>>>>> magnitude-spectrum residual can be approximated by its envelope since
>>>>>>>>> only its shape contributes to the sound characteristics. [...] The
>>>>>>>>> particular line-segment approximation performed here is done by
>>>>>>>>> stepping through the magnitude spectrum and finding local maxima in
>>>>>>>>> every section, ..."
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Clam-devel mailing list
>>>>>>>>> Clam-devel at llistes.projectes.lafarga.org
>>>>>>>>> <mailto:Clam-devel at llistes.projectes.lafarga.org>
>>>>>>>>>
>>>>>>>>> https://llistes.projectes.lafarga.org/cgi-bin/mailman/listinfo/clam-devel