[CLAM] RT and port sizes (was Re: AudioPorts Usage)
xavier at create.ucsb.edu
Thu Sep 15 09:26:09 PDT 2005
> I was not meaning that there was any specific problem linked to
> spectral processing. Just that processing a network with nodes running
> with different buffer sizes, thus "called" at different rates, has some
> consequences when activated in a real-time context.
I don't see what the difference is between a process consuming 2N frames
and taking 50% of the CPU and another one consuming N and taking 100%; it
is just a matter of algorithm optimization.
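To make the two viewpoints in this thread concrete, here is a minimal Python sketch of the per-callback CPU load when a node needs several callbacks' worth of frames before it can run. The function name, its parameters, and the assumption that the node's cost grows linearly with the number of frames are illustrative only; none of this is CLAM code.

```python
def callback_loads(load_per_n, block_multiple, callbacks):
    """Return the fraction of each callback period spent in the node.

    load_per_n     -- assumed cost of processing N frames, as a fraction
                      of one callback period (hypothetical parameter)
    block_multiple -- the node runs once it has block_multiple * N frames
    callbacks      -- number of callbacks to simulate
    """
    loads = []
    buffered = 0
    for _ in range(callbacks):
        buffered += 1
        if buffered == block_multiple:
            # All the work for block_multiple * N frames lands in this callback.
            loads.append(load_per_n * block_multiple)
            buffered = 0
        else:
            # Not enough frames yet: the node cannot run in this callback.
            loads.append(0.0)
    return loads


loads = callback_loads(load_per_n=0.6, block_multiple=2, callbacks=4)
average = sum(loads) / len(loads)   # long-run CPU usage
peak = max(loads)                   # load in the callbacks where the node fires
```

Under these assumptions the average load equals the per-N cost, while the peak load in the callbacks where the node fires is twice that and exceeds one callback period. With a per-N cost of 0.5 or less the peak stays within the period.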
To put it simply, when doing block-by-block processing, latency is the
issue, not the CPU usage of a single process. If you are doing spectral
processing and you have a process that takes more CPU than the FFT, you
had better take a look at your code, because the FFT is pretty expensive.
And we know that libraries such as FFTW do it in less than 50% CPU time
up to pretty large buffer
sizes. As a matter of fact, the individual process latency is rarely
even comparable to the buffer latency.
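For reference, the buffer latency discussed here is simply the buffer length divided by the sample rate. The sketch below assumes a 48 kHz sample rate and 480-frame callbacks purely for illustration; neither figure comes from this thread.

```python
def buffer_latency_ms(frames, sample_rate):
    """Time, in milliseconds, needed to fill a buffer of `frames` frames."""
    return 1000.0 * frames / sample_rate


SAMPLE_RATE = 48000   # assumed sample rate in Hz
N = 480               # assumed callback size in frames

latency_n = buffer_latency_ms(N, SAMPLE_RATE)        # callback buffer
latency_2n = buffer_latency_ms(2 * N, SAMPLE_RATE)   # node that needs 2N frames
```

With these assumed values the callback buffer takes 10 ms to fill and a node that needs 2N frames cannot emit anything before 20 ms have elapsed, whatever the threading scheme.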
I would like to see more details of what they do in Jamin and why they
do it. In my experience, multithreading is usually not a good idea. What
you gain in CPU spreading you lose in context switching, especially on
some operating systems.
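The scheme described in the quoted message (a lower-priority thread running alongside the RT thread, exchanging data through ring buffers) can be sketched roughly as follows. This is not Jamin's code: `worker`, `run_callbacks`, and `N` are hypothetical names, a plain loop stands in for the audio callbacks, and `queue.Queue` stands in for the ring buffers. Python threads have no priorities and `queue.Queue` takes a lock, so this shows only the structure, not real-time-safe code.

```python
import queue
import threading

N = 4  # frames per callback; illustrative value only


def worker(inbox, outbox, process):
    """Background side: gather 2N frames, process them, return two N-frame blocks."""
    pending = []
    while True:
        block = inbox.get()          # waits until the callback side delivers
        if block is None:            # sentinel: no more input
            return
        pending.extend(block)
        if len(pending) == 2 * N:
            result = process(pending)
            outbox.put(result[:N])
            outbox.put(result[N:])
            pending = []


def run_callbacks(blocks, process):
    """Simulate one audio callback per input block."""
    inbox, outbox = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox, process))
    thread.start()
    played = []
    for block in blocks:
        inbox.put(block)             # hand the fresh N frames to the worker
        try:
            played.append(outbox.get_nowait())
        except queue.Empty:
            played.append(None)      # underrun; a real callback would output silence
    inbox.put(None)
    thread.join()
    leftover = []                    # blocks the worker finished too late to be played
    while not outbox.empty():
        leftover.append(outbox.get_nowait())
    return played, leftover
```

The first callback always underruns, because the worker cannot start before it has 2N frames. How many later callbacks underrun depends on how quickly the worker finishes, which is the trade-off between CPU spreading and added latency discussed in this thread.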
I am saying this acknowledging, as Pau says, that there are some
Processing objects in CLAM that are not optimized at all. As a matter of
fact some of them, such as FundFreqDetect, need a whole lot of
tweaking if you want them to run in acceptable conditions. But that is
quite "easy" to do once you know the final application it will end up in.
Pau Arumi wrote:
> Stéphane Letz wrote:
>> On 15 Sept. 05 at 10:59, xavier at create.ucsb.edu wrote:
>>> On Thu, 15 Sep 2005, Stéphane Letz wrote:
>>>> But then you have the following issue: imagine a network which is
>>>> driven (in a thread + blocking I/O or callback-based model) by a
>>>> buffer size of N, but where some internal nodes in the network use 2N
>>>> buffers. Then the 2N nodes get called every 2 callbacks but are
>>>> supposed to handle 2N "tokens" (frames) in the duration of N to meet
>>>> the real-time deadline. Thus an algorithm that would use more than 50%
>>>> of CPU time would not run in this configuration if everything is
>>>> computed in the same RT thread.
>>>> This is a typical "problem" that the Jamin software (doing FFT-based
>>>> processing) also has, and the way they solve it is to use another
>>>> lower-priority thread that runs alongside the RT thread, with
>>>> ring-buffer-based data exchanges between the 2 threads.
>>> I don't quite get the problem (nor the solution). Let's see, if you have
>>> nodes in the network that need 2N buffers, the total "process latency" in
>>> the network will be at least 2N, no matter what you do and how many
>>> threads you have. Whether that is RT or not does obviously depend on N
>>> (and on what you consider real-time ;).
>>> You can think about feeding the N tokens faster than real time into the
>>> network, but that is obviously not possible under streaming conditions. You
>>> can also think about having the 2N node produce output before it has
>>> finished processing the 2N tokens. But you still have the 2N latency.
>>> That said, I do understand the 50% CPU algorithm limitation you mention,
>>> but I can't think that is a "typical" problem in spectral processing. I
>>> have done quite some profiling on spectral networks and never found that
>>> limitation. You might get that with very large window sizes, but then the
>>> overall latency is so bad that you shouldn't care about RT anymore.
>> I was not meaning that there was any specific problem linked to
>> spectral processing. Just that processing a network with nodes
>> running with different buffer sizes, thus "called" at different
>> rates has some consequences when activated in a real-time context.
>> Again with an example: let's take a node that uses N samples in an RT
>> calling scheme where N frames are consumed by the output at each
>> cycle. This node can take up to the duration of a buffer of N samples
>> to do its processing and deliver its output in time.
>> Now if a node uses 2N samples in an RT calling scheme where N
>> frames are consumed by the output at each cycle, it will be
>> activated every 2 cycles and is supposed to handle 2N samples in the
>> same duration of a buffer of N samples to deliver its output in time.
>> Thus a processing algorithm that would eat 60% of CPU (thus of the
>> duration of a buffer size of N samples) in the first setup would not
>> be able to process 2N frames in the duration of a buffer size of N
>> samples in the second setup.
>> So in the absence of any special strategy to better "spread" CPU use, I
>> don't see how it can work.
> Well, if we face such fat processing and we need truly low latency, we
> would proceed as usual: do some profiling and improve the
> efficiency of the code.
> I was kind of kidding. But the truth is that I think that some
> spectral processing in CLAM (like peak-continuation) has a lot of
> room for efficiency improvements.
> Following your example, I cannot see how the "real" solution you
> proposed could possibly work:
> - time t0 : (1st callback) N frames are stored. The
> fat-processing can't execute. Silence is written to the output.
> (cpu is "wasted")
> - time t1 : (2nd callback) Now the fat-processing can execute,
> but is not able to terminate before the next callback (t2).
> Now, if only we could do some pre-work to take advantage of the
> unused cpu in the t0-t1 period... but this is not the case,
> because the fat-processing needs all the 2N frames to start its work.
> I see no solution, apart, of course, from increasing the
> latency/callback buffer size.
* Xavier Amatriain *
* Research Director *
* CREATE *
* University of California Santa Barbara *