[CLAM] RT and port sizes (was Re: AudioPorts Usage)

Thu Sep 15 10:12:00 PDT 2005

>
> Well, if we face such fat processing and we need really low latency,
> we would proceed as usual: do some profiling and improve the
> efficiency of the code.
>
> I was kind of kidding. But the truth is that I think that some
> spectral processing in CLAM (like peak-continuation) has plenty of
> room for efficiency improvements.
>
> Following your example, I cannot see how the "real" solution you
> proposed could possibly work:
>
>  - time t0 : (1st callback)  N frames are stored. The
> fat-processing can't execute. Silence is written to the output.
> (cpu is "wasted")
>  - time t1 : (2nd callback) Now the fat-processing can execute,
> but is not able to finish before the next callback (t2).
>
> Now, if only we could do some pre-work to take advantage of the
> unused cpu in the t0-t1 period... but this is not the case,
> because the fat-processing needs all the 2N frames to start its
> algorithm.
>
> I see no solution, apart, of course, from increasing the
> latency/callback buffer size.
>
> Pau
>
> Date: Thu, 15 Sep 2005 09:26:09 -0700
> From: Xavier Amatriain <xavier at create.ucsb.edu>
> To: Pau Arumi <parumi at iua.upf.es>
> CC: clam at iua.upf.es
> Subject: Re: [CLAM] RT and port sizes (was Re: AudioPorts Usage)
>
>
>> I was not meaning that there was any specific problem linked to
>> spectral processing. Just that processing a network with nodes
>> running with different buffer sizes, thus "called" at different
>> rates, has some consequences when activated in a real-time context.
>
> I don't see what the difference is between a process consuming 2N
> and taking 50% and another one consuming N and taking 100%; it is
> just a matter of algorithm optimization.
>
> To put it simply, when doing block by block processing, latency is the
> issue, not CPU usage of a single process. If you are doing spectral
> processing and you have a process that takes more CPU than the FFT,
> you had better take a look at your code because the FFT is pretty
> expensive. And we know that libraries such as FFTW do it in less than
> 50% CPU time up to pretty large buffer sizes. As a matter of fact, the
> individual process latency is rarely even comparable to the buffer
> latency.
>
> I would like to see more details of what they do in Jamin and why they
> do it. In my experience, multithreading is usually not a good idea.
> What you gain in CPU spreading you lose in context switching,
> especially on some operating systems.
>
> I am saying this acknowledging, as Pau says, that there are some
> Processing objects in CLAM that are not optimized at all. As a matter
> of fact some of them, such as the FundFreqDetect, need a whole lot of
> tweaking if you want them to run in acceptable conditions. But that is
> quite "easy" to do once you know the final application it will end up
> in.
>
>

I think I was not able to explain clearly what I meant... (-:

But after some googling, I found this in the CREATE CSL
documentation (http://www.create.ucsb.edu/CSL/CSL_ICMC_2003.pdf):

A ThreadedFrameStream uses a background
thread to compute samples. It caches some number of
buffers from its “producer” sub-graph and supplies
them to its “consumer” thread immediately on
demand. It controls the scheduling of the thread of its
producer. While this obviously introduces latency
within a DSP graph, it is a known latency with (ideally)
no latency jitter.

This is typically the kind of solution I was thinking of: use
another thread to do the main processing and "smooth" the CPU use, at
the cost of additional latency.
The same kind of design is also used in Jamin.
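
To make the idea a bit more concrete, here is a minimal sketch in Python
(chosen only for brevity). It is not the CSL or Jamin code: the class
BackgroundProcessor and its methods are names I made up for illustration.
The callback never waits for the heavy processing; it hands the input block
to a worker thread and returns whatever output is already available. The
output queue is pre-filled with silence, and that pre-fill is exactly the
known, fixed extra latency.

```python
import queue
import threading


class BackgroundProcessor:
    """Hypothetical helper: runs `process` in a worker thread."""

    def __init__(self, process, prefill_blocks, silence):
        self._process = process
        self._inbox = queue.Queue()
        self._outbox = queue.Queue()
        # Pre-filled silence = the extra latency, in blocks.
        for _ in range(prefill_blocks):
            self._outbox.put(silence)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Worker loop: process blocks in arrival order until told to stop.
        while True:
            block = self._inbox.get()
            if block is None:
                break
            self._outbox.put(self._process(block))

    def callback(self, block):
        """Called from the audio callback: must return quickly."""
        self._inbox.put(block)
        try:
            return self._outbox.get_nowait()
        except queue.Empty:
            return None  # underrun: the worker has fallen too far behind

    def close(self):
        # Ask the worker to stop and wait until it has finished.
        self._inbox.put(None)
        self._thread.join()
```

With prefill_blocks=2, the first two calls to callback() always return
silence, and processed blocks follow in order as long as the worker keeps up.
A real implementation would use a lock-free ring buffer rather than
queue.Queue inside the audio callback; the sketch only shows the structure.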

And I think doing multithreading correctly will be the next thing to
do, of course on MP machines, but it may help even on UP machines.
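
To illustrate why it could help even with a single CPU, here is a toy model
of the scenario discussed in this thread (a sketch; the function name, the
parameters and the time units are my own assumptions). A block of 2N frames
is complete at every second callback and costs block_cost callback periods of
CPU to process. With no extra latency the result is due before the next
callback; each unit of extra_latency gives the processing one more callback
period to finish.

```python
def count_deadline_misses(num_blocks, block_cost, extra_latency):
    """Count blocks whose processing finishes after their output is due.

    Time is measured in callback periods (N frames each). Block b is
    complete at time 2*b + 1, i.e. at every second callback. A single CPU
    processes the blocks one after another.
    """
    misses = 0
    cpu_free_at = 0.0
    for b in range(num_blocks):
        ready = 2 * b + 1                  # callback that completes the 2N frames
        start = max(ready, cpu_free_at)    # wait for data and for the CPU
        cpu_free_at = start + block_cost
        deadline = ready + 1 + extra_latency
        if cpu_free_at > deadline:
            misses += 1
    return misses


# Average load is 75% (1.5 periods of work every 2 periods).
print(count_deadline_misses(100, 1.5, 0))  # 100: every block is late
print(count_deadline_misses(100, 1.5, 1))  # 0: one more period is enough
```

The total CPU work is the same in both cases; only the deadline moves. If
block_cost exceeds 2 periods, the average load is above 100% and no amount of
extra latency helps.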

Stephane