[Fwd: Re: [Fwd: Re: CSL's ThreadedFrameStream (was Re: [CLAM] RT and port sizes (was Re: AudioPorts Usage))]]

Thu Sep 22 11:03:51 PDT 2005

I'm forwarding this mail from Stephen Pope that didn't get to the list
(only subscribers can send to the list.)

By the way, Stephen, I'm curious about this networked feature of CSL.
Do you know jack-udp (a UDP transport client for the JACK routing
server)?
Do you think they both offer the same functionality? I'm interested to
learn how they compare.

Thanks!

Pau

-------- Original Message --------
Subject: 	Re: [Fwd: Re: CSL's ThreadedFrameStream (was Re: [CLAM] RT and 
port sizes (was Re: AudioPorts Usage))]
Date: 	Wed, 21 Sep 2005 11:49:29 -0700
From: 	Stephen Travis Pope <stp at create.ucsb.edu>
To: 	Xavier Amatriain <xavier at create.ucsb.edu>, letz at grame.fr
CC: 	Stephen Pope <stp at create.ucsb.edu>, parumi at iua.upf.es, clam at iua.upf.es
References: 	<43319646.4010000 at create.ucsb.edu>

Hello all,

This sounds like a very interesting thread (of discussion, that is),  
but I'm confused by some of the comments.

If your code is slow, breaking it up into multiple threads will only  
make it slower (on a monoprocessor, that is).

We use BlockResizer in cases where a graph running with a small block
size (for low latency, for example) wants to use a transform (e.g.,
FFT or FWT) with a larger block size. All it does is buffer the
output of the larger transform and "dribble" it out into the smaller
buffers. This has no run-time impact.
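
As a rough illustration of the buffering described above, here is a
minimal Python sketch. It is not CSL's code (CSL is C++; see
BlockResizer.h for the real thing), and the names BlockResizerSketch,
CounterSource and next_block are hypothetical, made up for this
example.

```python
from typing import Callable, List


class BlockResizerSketch:
    """Pulls large blocks from a source and hands them out as small blocks."""

    def __init__(self, pull_large: Callable[[], List[float]], small_size: int):
        self.pull_large = pull_large    # hypothetical: returns one large block per call
        self.small_size = small_size
        self.pending: List[float] = []  # samples already computed, not yet delivered

    def next_block(self) -> List[float]:
        # Refill only when the leftover samples cannot fill one small block.
        while len(self.pending) < self.small_size:
            self.pending.extend(self.pull_large())
        out = self.pending[:self.small_size]
        del self.pending[:self.small_size]
        return out


class CounterSource:
    """Stand-in for a large-block transform: emits consecutive sample indices."""

    def __init__(self, large_size: int):
        self.large_size = large_size
        self.next_index = 0
        self.calls = 0  # how many large blocks have been computed so far

    def __call__(self) -> List[float]:
        start = self.next_index
        self.next_index += self.large_size
        self.calls += 1
        return [float(i) for i in range(start, start + self.large_size)]


source = CounterSource(large_size=8)
resizer = BlockResizerSketch(source, small_size=2)
first_four = [resizer.next_block() for _ in range(4)]
```

In this sketch the four small blocks are served from a single call to
the large-block source; the source is only called again when the
pending samples run out.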

The ThreadedFrameStream runs its subgraph in a separate thread,
which is also a very dumb way to try to get better performance on a
monoprocessor; as Xavier said, we use this for our
RemoteFrameStreams, where a subgraph is running on a separate
computer and streaming samples over the net (the server side of this
is called UDP_IO).

Obviously, this introduces at least one block's worth of latency, but  
it does allow us to write graphs that are too large for a single  
processor. We don't use it heavily, but it is quite useful for  
several cases such as (1) when there's a lot of input processing and  
feature extraction to derive control signals (so the required inter- 
processor bandwidth is low) or (2) where several complex synthesis  
servers are to be mixed, spatialized and reverberated together.
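
To make the producer/consumer arrangement and its block of latency
concrete, here is a minimal Python sketch using only the standard
library. It is an illustration under stated assumptions, not CSL's
ThreadedFrameStream: the names ThreadedStreamSketch, CountingProducer,
num_cached and next_block are hypothetical, and a real audio callback
would avoid blocking calls like the ones used here.

```python
import queue
import threading
from typing import Callable, List


class ThreadedStreamSketch:
    """Runs a producer in a background thread and caches a few blocks."""

    def __init__(self, produce_block: Callable[[], List[float]], num_cached: int = 2):
        self.produce_block = produce_block           # hypothetical sub-graph stand-in
        self.cache = queue.Queue(maxsize=num_cached)  # bounded, so latency is bounded
        self.stop_flag = threading.Event()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self) -> None:
        while not self.stop_flag.is_set():
            block = self.produce_block()
            # Wait for room in the cache, waking regularly to check for stop.
            while not self.stop_flag.is_set():
                try:
                    self.cache.put(block, timeout=0.05)
                    break
                except queue.Full:
                    continue

    def next_block(self, timeout: float = 1.0) -> List[float]:
        # Consumer side: take the oldest cached block.
        return self.cache.get(timeout=timeout)

    def stop(self) -> None:
        self.stop_flag.set()
        self.worker.join()


class CountingProducer:
    """Stand-in producer: block k is filled with the value k."""

    def __init__(self, block_size: int):
        self.block_size = block_size
        self.k = 0

    def __call__(self) -> List[float]:
        block = [float(self.k)] * self.block_size
        self.k += 1
        return block


stream = ThreadedStreamSketch(CountingProducer(block_size=4), num_cached=2)
received = [stream.next_block() for _ in range(3)]
stream.stop()
```

The bounded queue is the design choice that matters here: the
producer can run ahead by at most num_cached blocks, so the added
latency is known in advance rather than growing without limit.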

See the examples in the file BlockResizer.h; the file  
Server_Tests.cpp shows the client and server set-up for using the  
RemoteStream and UDP_IO together.

Lastly, I should emphasize that CSL is a group effort, and especially  
that many of the good design ideas came from Sekhar Ramakrishnan.

stp

--
  Stephen Travis Pope -- http://create.ucsb.edu/~stp
  Center for Research in Electronic Art Technology, University of  
California, Santa Barbara
        Really—I don't know what the meaning or purpose of life is.
        But it looks exactly as if something were meant by it. — C.  
G. Jung

Begin forwarded message:

> From: Xavier Amatriain <xavier at create.ucsb.edu>
> Date: September 21, 2005 10:20:06 AM PDT
> To: Stephen Travis Pope <stp at create.ucsb.edu>
> Subject: [Fwd: Re: CSL's ThreadedFrameStream (was Re: [CLAM] RT and  
> port sizes (was Re: AudioPorts Usage))]
>
> Just so you know about how the discussion is going on.
>
> -------- Original Message --------
> Subject:     Re: CSL's ThreadedFrameStream (was Re: [CLAM] RT and  
> port sizes (was Re: AudioPorts Usage))
> Date:     Wed, 21 Sep 2005 11:18:45 +0200
> From:     Stéphane Letz <letz at grame.fr>
> To:     Xavier Amatriain <xavier at create.ucsb.edu>
> CC:     Pau Arumi <parumi at iua.upf.es>, clam at iua.upf.es
>
> On 20 Sept. 05 at 19:26, Xavier Amatriain wrote:
>
>> After talking to S.T. Pope about his ThreadedFrameStream, this is   
>> more or less what he told me:
>>
>> - The ThreadedFrameStream idea is ONLY used on distributed
>> graphs, when samples are being streamed onto a different
>> processor or, better still, to a different machine in the network.
>>
>
> OK.
>
>> - He basically agrees that, because of the cost of context
>> switching (hyperthreading aside), it is not worth threading
>> these things on a uniprocessor machine.
>> - For these cases they use a different idea that they call the
>> BlockResizer [1], which is basically the implementation of an
>> "eager" or "pull" scheduling scheme.
>>
>> [1] http://www.create.ucsb.edu/CSL/doxygen/_block_resizer_8h-source.html
>>
>
> OK, but I still maintain that when you have to block-resize to
> implement some kind of algorithm, you may end up with implementations
> that cannot meet RT constraints, and that could run only if you
> use an additional thread and (possibly) more latency.
> You actually have the same idea in a hard-disk system that uses an
> additional thread, allowing the costly disk access (in the context
> of the RT callback) to be "spread" and executed in the other
> thread, while having some frames in advance ... even on a
> uniprocessor machine.
>
> Stephane
>>
>> Stéphane Letz wrote:
>>>
>>> On 15 Sept. 05 at 19:16, Xavier Amatriain wrote:
>>>>
>>>> Good that everything stays at home! CSL has been designed by    
>>>> S.T.Pope here at CREATE.
>>>
>>> Yes I know (-:
>>>
>>>> He is actually in an office next to mine, so I will have a chat
>>>> with him to learn about the rationale behind that design.
>>>> Thanks for the pointer!
>>>
>>> I would be interested to know his answer, please share!
>>> There are some good ideas in the CSL design ...
>>>
>>> Stephane
>>>
>>>>
>>>>> I think I was not able to explain clearly what I meant... (-:
>>>>>
>>>>> But after some googling, I found this in the CREATE CSL
>>>>> documentation (http://www.create.ucsb.edu/CSL/CSL_ICMC_2003.pdf):
>>>>>
>>>>> A ThreadedFrameStream uses a background
>>>>> thread to compute samples. It caches some number of
>>>>> buffers from its “producer” sub-graph and supplies
>>>>> them to its “consumer” thread immediately on
>>>>> demand. It controls the scheduling of the thread of its
>>>>> producer. While this obviously introduces latency
>>>>> within a DSP graph, it is a known latency with (ideally)
>>>>> no latency jitter.
>>>>>
>>>>> This is typically the kind of solution I was thinking of:
>>>>> use another thread to do the main processing and "smooth" CPU
>>>>> use, at the cost of additional latency.
>>>>> The same kind of design is also used in Jamin.
>>>>>
>>>>> And I think doing multithreading correctly will be the next
>>>>> thing to do, on MP machines of course, but it may help even on
>>>>> UP machines.
>>>>>
>>>>> Stephane
>>>>>