[CLAM] RT and port sizes (was Re: AudioPorts Usage)
letz at grame.fr
Thu Sep 15 10:12:00 PDT 2005
> Well if we face such fat processing and we need really low latency,
> we would proceed as usual: do some profiling and improve the
> efficiency of the code.
> I was kind of kidding. But the truth is that I think that some
> spectral processing in CLAM (like peak-continuation) leaves a lot
> of room for efficiency improvements.
> Following your example, I cannot see how the "real" solution you
> proposed could possibly work:
> - time t0 : (1st callback) N frames are stored. The
> fat-processing can't execute. Silence is written to the output.
> (cpu is "wasted")
> - time t1 : (2nd callback) Now the fat-processing can execute,
> but it is not able to finish before the next callback (t2).
> Now, if only we could do some pre-work to take advantage of the
> unused cpu in the t0-t1 period... but this is not the case,
> because the fat-processing needs all the 2N frames to start its
> computation. I see no solution, apart, of course, from increasing
> the latency/callback buffer size.
> Message: 7
> Date: Thu, 15 Sep 2005 09:26:09 -0700
> From: Xavier Amatriain <xavier at create.ucsb.edu>
> To: Pau Arumi <parumi at iua.upf.es>
> CC: clam at iua.upf.es
> Subject: Re: [CLAM] RT and port sizes (was Re: AudioPorts Usage)
>> I was not meaning that there was any specific problem linked to
>> spectral processing. Just that processing a network with nodes
>> with different buffer sizes, thus "called" at different rates,
>> has consequences when activated in a real-time context.
> I don't see what the difference is between having a process
> consuming 2N frames and taking 50% CPU and another one consuming
> N frames and taking 100%; it is just a matter of algorithm
> optimization.
> To put it simply, when doing block-by-block processing, latency is
> the issue, not the CPU usage of a single process. If you are doing
> spectral processing and you have a process that takes more CPU
> than the FFT, you had better take a look at your code, because the
> FFT is pretty expensive. And we know that libraries such as FFTW
> do it in less than 50% CPU time up to pretty large buffer sizes.
> As a matter of fact, the individual process latency is rarely even
> comparable to the buffer latency.
> I would like to see more details of what they do in Jamin and why
> they do it. In my experience, multithreading is usually not a good
> idea: what you gain in CPU spreading you lose in context
> switching, especially on some operating systems.
> I am saying this acknowledging, as Pau says, that there are some
> Processing objects in CLAM that are not optimized at all. As a
> matter of fact some of them, such as FundFreqDetect, need a whole
> lot of tweaking if you want them to run in acceptable conditions.
> But that is quite "easy" to do once you know the final application
> it will end up in.
I think I was not able to explain clearly what I meant... (-:
But after doing some googling, I found this about the CREATE CSL:
A ThreadedFrameStream uses a background
thread to compute samples. It caches some number of
buffers from its “producer” sub-graph and supplies
them to its “consumer” thread immediately on
demand. It controls the scheduling of the thread of its
producer. While this obviously introduces latency
within a DSP graph, it is a known latency with (ideally)
no latency jitter.
This is typically the kind of solution I was thinking of: use
another thread to do the main process and "smooth" CPU use, at the
cost of additional latency.
The same kind of design is also done in Jamin.
And I think doing multithreading correctly will be the next thing
to do: obviously on MP machines, but it may even help on UP
machines.