[CLAM] RT and port sizes (was Re: AudioPorts Usage)

Thu Sep 15 06:38:08 PDT 2005

Stéphane Letz wrote:

>
> On 15 Sep 05, at 10:59, xavier at create.ucsb.edu wrote:
>
>> On Thu, 15 Sep 2005, Stéphane Letz wrote:
>>
>>
>>>
>>> But then you have the following issue: imagine a network which is
>>> driven (in a thread + blocking I/O or callback-based model) by a
>>> buffer size of N, but where some internal nodes in the network use 2N
>>> buffers. Then the 2N nodes get called every 2 callbacks but are
>>> supposed to handle 2N "tokens" (frames) in the duration of N to meet
>>> the real-time deadline. Thus an algorithm that would use more than 50%
>>> of CPU time would not run in this configuration if everything is
>>> computed in the same RT thread.
>>>
>>> This is a typical "problem" that the Jamin software (which does
>>> FFT-based processing) also has, and the way they solve it is to use
>>> another lower-priority thread that runs alongside the RT thread, with
>>> ring-buffer-based data exchange between the two threads.
>>>
>>
>> I don't quite get the problem (nor the solution). Let's see: if you have
>> nodes in the network that need 2N buffers, the total "process latency" in
>> the network will be at least 2N, no matter what you do and how many
>> threads you have. Whether that is RT or not obviously depends on N
>> (and on what you consider real-time ;).
>>
>> You can think about feeding the N tokens faster than real time into the
>> network, but that is obviously not possible under streaming conditions.
>> You can also think about having the 2N node produce output before it has
>> finished processing the 2N tokens. But you still have the 2N latency
>> limitation.
>>
>> That said, I do understand the 50% CPU algorithm limitation you mention,
>> but I don't think that is a "typical" problem in spectral processing. I
>> have done quite some profiling on spectral networks and never found that
>> limitation. You might get it with very large window sizes, but then the
>> overall latency is so bad that you shouldn't care about RT anymore.
>>
>
>
> I did not mean that there was any specific problem linked to
> spectral processing. Only that processing a network with nodes
> running with different buffer sizes, and thus "called" at different
> rates, has some consequences when activated in a real-time context.
>
> Again with an example: let's take a node that uses N samples in an RT
> calling scheme where N frames are consumed by the output at each
> cycle. This node can take up to the duration of a buffer of N samples
> to do its processing and deliver its output in time.
> Now take a node that uses 2N samples in the same RT calling scheme,
> where N frames are consumed by the output at each cycle: it will be
> activated every 2 cycles and is supposed to handle 2N samples in the
> same duration of a buffer of N samples to deliver its output in time.
> Thus a processing algorithm that would eat 60% of the CPU (that is, of
> the duration of a buffer of N samples) in the first setup would not
> be able to process 2N frames in the duration of a buffer of N
> samples in the second setup.
>
> So in the absence of any special strategy to better "spread" CPU use, I
> don't see how it can work.
>
> Stephane
>
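
For reference, the ring-buffer exchange Stéphane mentions (the RT thread producing, a lower-priority worker thread consuming) can be sketched as below. This is only a minimal single-producer/single-consumer ring buffer under stated assumptions: the class name RingBuffer and its methods are hypothetical, not Jamin's or CLAM's code, and it is single-threaded Python for readability; a real RT implementation would be written in C or C++ with atomic read/write indices so that the audio thread never blocks.

```python
class RingBuffer:
    """Fixed-capacity single-producer/single-consumer FIFO of audio frames.

    Sketch only (hypothetical class): the producer would be the RT audio
    callback and the consumer a lower-priority worker thread. A real version
    would use atomic indices so that neither side ever takes a lock.
    """

    def __init__(self, capacity):
        # One slot is kept empty to tell "full" apart from "empty".
        self._buf = [0.0] * (capacity + 1)
        self._read = 0   # advanced only by the consumer
        self._write = 0  # advanced only by the producer

    def available(self):
        """Number of frames ready to be read."""
        return (self._write - self._read) % len(self._buf)

    def space(self):
        """Number of frames that can still be written."""
        return len(self._buf) - 1 - self.available()

    def write(self, frames):
        """Producer side: copy in as many frames as fit; return how many."""
        n = min(len(frames), self.space())
        for i in range(n):
            self._buf[(self._write + i) % len(self._buf)] = frames[i]
        self._write = (self._write + n) % len(self._buf)
        return n

    def read(self, count):
        """Consumer side: take up to `count` frames and return them."""
        n = min(count, self.available())
        out = [self._buf[(self._read + i) % len(self._buf)] for i in range(n)]
        self._read = (self._read + n) % len(self._buf)
        return out


N = 4                             # hypothetical callback size, in frames
rb = RingBuffer(capacity=4 * N)
rb.write([0.1] * N)               # 1st callback: only N frames so far
print(rb.available() >= 2 * N)    # False: the 2N worker keeps waiting
rb.write([0.2] * N)               # 2nd callback
block = rb.read(2 * N)            # the worker can now take a full 2N block
print(len(block))                 # 8
```

Each index is written by only one side, which is what makes a lock-free version possible. Note that the worker still only gets a full 2N block once two callbacks have delivered their frames.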

Well, if we face such fat processing and we need really low latency,
we would proceed as usual: do some profiling and improve the
efficiency of the code.

I was kind of kidding. But the truth is that I think some
spectral processing in CLAM (like peak continuation) has a lot of
room for efficiency improvements.

Following your example, I cannot see how the "real" solution you
proposed could possibly work:

 - time t0 (1st callback): N frames are stored. The
fat processing can't execute yet. Silence is written to the output
(CPU is "wasted").
 - time t1 (2nd callback): now the fat processing can execute,
but it is not able to finish before the next callback (t2).
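
The timeline above can be replayed with a tiny simulation. It is a sketch under stated assumptions: time is measured in callback periods, the fat processing runs inside the RT callback in which its 2N-th frame arrives, and `cost` (a hypothetical parameter) is how long it needs for one 2N block; a cost of 1.2 periods corresponds to Stéphane's 60%-per-N example.

```python
def simulate(callbacks, cost, period=1.0):
    """Replay the t0, t1, ... timeline for a node that needs 2N frames
    while each callback delivers N frames every `period` time units.

    Assumption: the node runs inside the callback where its block becomes
    complete, and `cost` is the time it needs to process one 2N block.
    """
    log = []
    buffered = 0  # number of N-frame buffers collected for the 2N node
    for k in range(callbacks):
        start = k * period
        buffered += 1
        if buffered < 2:
            log.append(f"t{k}: store N frames, write silence (CPU idle)")
            continue
        buffered = 0
        finish = start + cost
        deadline = start + period  # the next callback
        verdict = "meets" if finish <= deadline else "misses"
        log.append(f"t{k}: process 2N frames, done at {finish:g}, "
                   f"{verdict} deadline {deadline:g}")
    return log


for line in simulate(callbacks=2, cost=1.2):
    print(line)
# t0: store N frames, write silence (CPU idle)
# t1: process 2N frames, done at 2.2, misses deadline 2
```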

Now, if only we could do some pre-work to take advantage of the
unused CPU in the t0-t1 period... but this is not the case,
because the fat processing needs all 2N frames to start its
algorithm.

I see no solution, apart, of course, from increasing the
latency/callback buffer size.
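
The trade-off can be put in numbers. This is a sketch under explicit assumptions: the algorithm's cost is linear in block size (`load` is the fraction of real time it needs, 0.6 in Stéphane's example), the node runs entirely inside the callback that completes its block, and N = 256 frames at 44100 Hz are illustrative values, not figures from this thread. The function names are hypothetical.

```python
def required_fraction(load, node_block, callback_frames):
    """Fraction of one callback period the node needs when it processes its
    whole block inside a single callback.

    `load` is the algorithm's cost relative to real time (0.6 means that
    processing any number of frames takes 60% of their playback duration);
    the cost is assumed to scale linearly with the block size.
    """
    if node_block % callback_frames != 0:
        raise ValueError("node block must be a multiple of the callback size")
    return load * node_block / callback_frames


def buffer_latency_ms(frames, sample_rate):
    """Duration of a buffer of `frames` frames, in milliseconds."""
    return 1000.0 * frames / sample_rate


N, RATE = 256, 44100  # illustrative values
for callback in (N, 2 * N):
    need = required_fraction(0.6, 2 * N, callback)
    print(f"callback {callback:4d} frames ({buffer_latency_ms(callback, RATE):.2f} ms): "
          f"2N node needs {need:.0%} of the period -> {'OK' if need <= 1 else 'overrun'}")
# callback  256 frames (5.80 ms): 2N node needs 120% of the period -> overrun
# callback  512 frames (11.61 ms): 2N node needs 60% of the period -> OK
```

Doubling the callback size lets the 2N node run once per callback with the full 2N duration available, at the price of doubling the buffer latency.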

Pau