[Fwd: Re: CSL's ThreadedFrameStream (was Re: [CLAM] RT and port sizes (was Re: AudioPorts Usage))]
Stephen Travis Pope
stp at create.ucsb.edu
Wed Sep 21 11:49:29 PDT 2005
This sounds like a very interesting thread (of discussion, that is),
but I'm confused by some of the comments.
If your code is slow, breaking it up into multiple threads will only
make it slower (on a monoprocessor, that is).
We use BlockResizer in cases where a graph running with a small block
size (for low latency, for example) wants to use a transform (e.g.,
FFT or FWT) with a larger block size. All it does is buffer the
output of the larger transform and "dribble" it out into the smaller
buffers. This has no run-time impact.
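
As an illustration only, the buffering described above could be sketched
in C++ roughly as follows. This is not the CSL code: the class name
BlockResizerSketch, the Producer callback and nextBuffer() are
hypothetical names chosen for the example, and it assumes mono float
samples.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch, NOT the CSL BlockResizer API.
// Runs a large-block producer and hands its output out in smaller pieces.
class BlockResizerSketch {
public:
    // The producer fills the whole large block it is given.
    using Producer = std::function<void(std::vector<float>&)>;

    BlockResizerSketch(std::size_t bigSize, Producer producer)
        : buffer_(bigSize),
          readPos_(bigSize),            // start "empty" so the first call refills
          producer_(std::move(producer)) {}

    // Fill `out` (of any size) from the cached large block,
    // refilling the cache whenever it runs dry.
    void nextBuffer(std::vector<float>& out) {
        for (std::size_t i = 0; i < out.size(); ++i) {
            if (readPos_ == buffer_.size()) {
                producer_(buffer_);     // one large-block computation
                readPos_ = 0;
            }
            out[i] = buffer_[readPos_++];
        }
    }

private:
    std::vector<float> buffer_;   // cached output of the large transform
    std::size_t readPos_;         // next unread sample in buffer_
    Producer producer_;
};
```

In this sketch, with a large block of 8 samples and requests of 2
samples, the producer runs on one request out of every four; the other
three only copy samples.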
The ThreadedFrameStream runs its sub-graph in a separate thread,
which is also a very dumb way to try to get better performance on a
monoprocessor. As Xavier said, we use this for our
RemoteFrameStreams, where a subgraph is running on a separate
computer and streaming samples over the net (the server side of this
is called UDP_IO).
Obviously, this introduces at least one block's worth of latency, but
it does allow us to write graphs that are too large for a single
processor. We don't use it heavily, but it is quite useful for
several cases such as (1) when there's a lot of input processing and
feature extraction to derive control signals (so the required
inter-processor bandwidth is low) or (2) where several complex synthesis
servers are to be mixed, spatialized and reverberated together.
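
For readers who want something concrete, the producer/consumer
arrangement behind a threaded stream could be sketched in C++ roughly as
follows. This is not the CSL implementation: ThreadedStreamSketch, its
Producer callback and nextBuffer() are hypothetical names, and it uses
standard C++11 threads. A mutex and condition variable keep the sketch
short; a real audio callback would more typically read from a lock-free
ring buffer.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch, NOT the CSL ThreadedFrameStream API.
// A background thread computes blocks ahead of time and caches up to
// maxCached of them; the consumer takes the oldest cached block.
class ThreadedStreamSketch {
public:
    using Block = std::vector<float>;
    using Producer = std::function<Block()>;

    // maxCached must be at least 1, otherwise the producer never runs.
    ThreadedStreamSketch(std::size_t maxCached, Producer producer)
        : maxCached_(maxCached),
          producer_(std::move(producer)),
          worker_([this] { run(); }) {}

    ~ThreadedStreamSketch() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            running_ = false;
        }
        changed_.notify_all();
        worker_.join();
    }

    // Consumer side: waits only if the cache is currently empty.
    Block nextBuffer() {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this] { return !cache_.empty(); });
        Block b = std::move(cache_.front());
        cache_.pop_front();
        lock.unlock();
        changed_.notify_all();          // there is room in the cache again
        return b;
    }

private:
    // Producer side: keep the cache topped up until told to stop.
    void run() {
        for (;;) {
            {
                std::unique_lock<std::mutex> lock(mutex_);
                changed_.wait(lock, [this] {
                    return !running_ || cache_.size() < maxCached_;
                });
                if (!running_) return;
            }
            Block b = producer_();      // compute outside the lock
            std::lock_guard<std::mutex> lock(mutex_);
            cache_.push_back(std::move(b));
            changed_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable changed_;
    std::deque<Block> cache_;
    bool running_ = true;
    std::size_t maxCached_;
    Producer producer_;
    std::thread worker_;   // declared last so it starts after the other members
};
```

Each cached block adds one block of latency between producer and
consumer, which is where the "at least one block" figure comes from when
the cache holds a single block.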
See the examples in the file BlockResizer.h; the file
Server_Tests.cpp shows the client and server set-up for using the
RemoteStream and UDP_IO together.
Lastly, I should emphasize that CSL is a group effort, and especially
that many of the good design ideas came from Sekhar Ramakrishnan.
Stephen Travis Pope -- http://create.ucsb.edu/~stp
Center for Research in Electronic Art Technology, University of
California, Santa Barbara
Really—I don't know what the meaning or purpose of life is.
But it looks exactly as if something were meant by it. — C.
Begin forwarded message:
> From: Xavier Amatriain <xavier at create.ucsb.edu>
> Date: September 21, 2005 10:20:06 AM PDT
> To: Stephen Travis Pope <stp at create.ucsb.edu>
> Subject: [Fwd: Re: CSL's ThreadedFrameStream (was Re: [CLAM] RT and
> port sizes (was Re: AudioPorts Usage))]
> Just so you know how the discussion is going.
> -------- Original Message --------
> Subject: Re: CSL's ThreadedFrameStream (was Re: [CLAM] RT and
> port sizes (was Re: AudioPorts Usage))
> Date: Wed, 21 Sep 2005 11:18:45 +0200
> From: Stéphane Letz <letz at grame.fr>
> To: Xavier Amatriain <xavier at create.ucsb.edu>
> CC: Pau Arumi <parumi at iua.upf.es>, clam at iua.upf.es
> On 20 Sept. 05 at 19:26, Xavier Amatriain wrote:
>> After talking to S.T. Pope about his ThreadedFrameStream, this is
>> more or less what he told me:
>> - The ThreadedFrameStream idea is ONLY used on distributed
>> graphs, when samples are being streamed onto a different
>> processor or, better still, to a different machine in the network.
>> - He basically agrees that, because of the cost of context
>> switching (hyperthreading aside), it is not worth it to thread
>> these things on a uniprocessor machine.
>> - For these cases they use a different idea that they call the
>> BlockResizer, which is basically the implementation of an
>> "eager" or "pull" scheduling scheme.
>>  http://www.create.ucsb.edu/CSL/doxygen/_block_resizer_8h-
> Ok, but I still maintain that when you have to block-resize to
> implement some kind of algorithm, you may end up with implementations
> that cannot meet RT constraints, and that could run only if you
> use an additional thread and (possibly) more latency.
> You actually have the same idea in a hard-disk system that uses an
> additional thread, allowing the costly disk access (in the context
> of the RT callback) to be "spread" and executed in the other
> thread, while keeping some frames in advance ... even on a
> uniprocessor machine.
>> Stéphane Letz wrote:
>>> On 15 Sept. 05 at 19:16, Xavier Amatriain wrote:
>>>> Good that everything stays at home! CSL has been designed by
>>>> S.T.Pope here at CREATE.
>>> Yes I know (-:
>>>> He is actually in an office next to mine so I will have a chat
>>>> with him to know about the rationale behind that design.
>>>> Thanks for the pointer !
>>> I would be interested to know his answer, please share!
>>> There are some good ideas in the CSL design ...
>>>>> I think I was not able to explain clearly what I meant... (-:
>>>>> But after some googling, I found this in the CREATE CSL
>>>>> documentation (http://www.create.ucsb.edu/CSL/CSL_ICMC_2003.pdf):
>>>>> A ThreadedFrameStream uses a background
>>>>> thread to compute samples. It caches some number of
>>>>> buffers from its “producer” sub-graph and supplies
>>>>> them to its “consumer” thread immediately on
>>>>> demand. It controls the scheduling of the thread of its
>>>>> producer. While this obviously introduces latency
>>>>> within a DSP graph, it is a known latency with (ideally)
>>>>> no latency jitter.
>>>>> This is typically the kind of solution I was thinking of:
>>>>> use another thread to do the main processing and "smooth" CPU
>>>>> use, at the cost of additional latency.
>>>>> The same kind of design is also used in Jamin.
>>>>> And I think doing multithreading correctly will be the next
>>>>> thing to do, on MP machines of course, but it may help even
>>>>> on UP machines.