Saturday, January 19, 2013

A Hypothetical Modern Audio Language

I have been thinking generally about how languages have progressed in their support for parallelism.  When C and Unix started to take over the world, parallelism meant processes.  The languages were very much single threaded, and used interprocess communication to get things done between processes.  Then languages like Java and C# came along and popularized threads with shared memory.  More recently, languages like Erlang and Go (and Lua to some degree) have taken large scale concurrency seriously, making coroutines mainstream.  Coroutines are like generators or iterators that pass values both ways between caller and callee.  They are a mechanism that allows code to be structured in a truly concurrent way.  What these languages are still missing is something that is becoming important for numerically intensive applications and mobile devices: SIMD parallelism.  SIMD (and SIMT) parallelism is what we get from GPUs.  GPUs have very high latency, but current CPUs have instructions that do SIMD on a smaller scale (NEON, SSE, etc).  Like coroutines, the parallelism SIMD provides isn't based on multiple independent threads of execution.  Instead, it's based on very wide registers that hold arrays of values, with instructions that operate on all elements of the array at once.  For branchless computations, this can give a large speedup with deterministic timing.
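
To make "generators that go both ways" concrete, here is a minimal sketch using a Python generator rather than an Erlang process or a goroutine.  The name running_mean is only an illustration.  The caller pushes a sample in with send() and gets the updated mean back from the same call.

```python
def running_mean():
    """Coroutine: receives samples via send() and hands the running mean back."""
    total = 0.0
    count = 0
    mean = None
    while True:
        # Yield the current mean to the caller, then wait for the next sample.
        sample = yield mean
        total += sample
        count += 1
        mean = total / count


co = running_mean()
next(co)  # prime the coroutine: run it up to the first yield
means = [co.send(x) for x in (2.0, 4.0, 6.0)]
print(means)
```

Control alternates between the two sides on a single thread, which is the property the paragraph above compares to SIMD: concurrency without independent thread executions.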

I have mainly been mulling over an ideal language for creating OSC controller patches.  All of the existing languages are a bit of a monster to build because of the amount of stuff they depend on.  Ideally, it would be a pure embeddable C library where callbacks are plugged into the target OS, rather than having the dependencies in the language runtime.  ChucK is almost an ideal fit, but that project does not seem to be going anywhere.  SuperCollider is interesting, but it has a very wide scope.  The vast majority of what it does (and therefore the vast majority of its dependencies) is irrelevant to what I want to do.  I originally thought that a LISP variant would be an ideal starting point for such a project because it dramatically simplifies the language aspect of it.  But the more I think about it, the more I want the capabilities of ChucK, combined with the message passing concurrency of Erlang, combined with a way to isolate the intense arithmetic so that it is easy to SIMD parallelize.

Specifically for audio apps, you need good soft real-time behavior.  One of the problems that needs to be handled is sample-accurate jitter handling.  Presume that when the client talks to the server, the client records a sample-accurate timestamp for when each gesture was made, and sends it to the server.  If the audio buffer size is 5ms, then an ideal zero-jitter audio representation will match client and server clocks so that incoming oscillator changes are always scheduled with exactly 5ms of latency, no more and no less.  If there is less latency, then there must be non-zero jitter as a trade-off, and jitter is worse than latency.
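
A rough sketch of that fixed-latency scheduling, under assumptions that are mine rather than part of any real system: a 48 kHz sample rate, clocks expressed in seconds, and the hypothetical helper names clock_offset and schedule_sample.

```python
SAMPLE_RATE = 48000       # assumed sample rate in Hz
LATENCY_SECONDS = 0.005   # fixed latency, equal to the 5ms buffer in the text


def clock_offset(controller_now, server_now):
    """How far the controller clock runs ahead of the server clock, in seconds."""
    return controller_now - server_now


def schedule_sample(offset, gesture_timestamp):
    """Map a controller-side gesture timestamp to a server-side sample index.

    Every gesture lands exactly LATENCY_SECONDS after it happened, so the
    spacing between gestures is preserved (zero jitter).
    """
    server_time = gesture_timestamp - offset
    return round((server_time + LATENCY_SECONDS) * SAMPLE_RATE)


offset = clock_offset(100.0, 40.0)
first = schedule_sample(offset, 100.010)
second = schedule_sample(offset, 100.011)
print(first, second, second - first)
```

Two gestures made 1ms apart on the controller are scheduled 48 samples apart on the server, no matter when their messages actually arrive, as long as they arrive within the 5ms window.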

So I started to fake up some pseudocode, and it looks like this:

//Load up a sample with rules about how it responds to phase changes

//This is the graph of oscillators
//No sound is made.  It's just data setup.

//tell the dac to start running, ultimately pulling data from oscillator through the chorus and reverb

//This is the osc listener
spawn fun osclistener
  recv o
    /rjf/t0,t = {controllernow}:
      timediff = controllernow-now
    /rjf/p0,iifft = {voice,phase,freq,amp,timestamp,phase=attack}:
      at latencycompute(timediff,timestamp) a{freq=freq,amp=amp,phase=attack}
    /rjf/p0,iifft = {voice,phase,freq,amp,timestamp,phase=sustain}:
      at latencycompute(timediff,timestamp) a{freq=freq,amp=amp,phase=sustain}
    /rjf/p0,iifft = {voice,phase,freq,amp,timestamp,phase=release}:
      at latencycompute(timediff,timestamp) a{freq=freq,amp=amp,phase=release}
    after 100ms:

x:oscil(expr= fun(t)
  t & (t%256)

Erlang and Go do message passing concurrency in a pretty straightforward way.  Erlang claims to do soft real-time well.  What Erlang does not do well is strings (normal string usage is something like 8 bytes per char, because a string is represented as a literal linked list of integers) and high intensity numerical computation.  But OSC messaging conceptually fits masterfully into Erlang's way of thinking.  Some things would need to be added, such as rendering branchless code to SIMD instructions where possible.  What would make sense is to use either Erlang or Go as the actual language for this purpose.  But ideally the language would also be a pure C library without a ton of dependencies (Lua actually fits these criteria very well).
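
To show the shape of the listener without inventing syntax, here is a loose Python approximation built on a thread and a queue.  It is not Erlang or Go semantics.  The "stop" message, the string-valued phase, and the injected now callable are assumptions made so the sketch runs on its own.

```python
import queue
import threading


def osc_listener(inbox, events, now):
    """Receive (address, args) messages until a "stop" message arrives."""
    timediff = 0.0
    while True:
        try:
            # Wait at most 100ms, like `after 100ms` in the pseudocode.
            address, args = inbox.get(timeout=0.1)
        except queue.Empty:
            continue  # nothing arrived in time; wait again
        if address == "/rjf/t0":
            (controller_now,) = args
            timediff = controller_now - now()
        elif address == "/rjf/p0":
            voice, phase, freq, amp, timestamp = args
            # Store the event against the server clock.
            events.append((timestamp - timediff, voice, phase, freq, amp))
        elif address == "stop":
            return


inbox = queue.Queue()
events = []
worker = threading.Thread(
    target=osc_listener, args=(inbox, events, lambda: 40.0)
)
worker.start()
inbox.put(("/rjf/t0", (100.0,)))
inbox.put(("/rjf/p0", (0, "attack", 440.0, 0.8, 100.5)))
inbox.put(("stop", ()))
worker.join()
print(events)
```

The listener owns its state (timediff) and only reacts to messages, which is the part of the Erlang model that lines up with OSC.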

BitWiz is an astonishingly simple example of what you can do with creative use of branchless code.  It's not completely clear to me yet how to apply all of those 8-bit lessons to 16-bit, but if you look carefully at BitWiz code, you can see that the entire audio buffer fill can be generated in parallel (per sample).  Those kinds of simple, branchless expressions (where values change only in between audio frames) should be part of any audio language.
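
Here is a small sketch of why such expressions parallelize, using the t & (t%256) expression from the pseudocode.  Python will not turn this into SIMD instructions, and the function names are mine.  The point is only that each sample depends on nothing but its own index, so the buffer can be filled in any order.

```python
def expr(t):
    """Branchless 8-bit expression from the pseudocode: t & (t % 256)."""
    return (t & (t % 256)) & 0xFF


def fill_buffer(start, frames):
    """Fill one audio buffer; no sample depends on any other sample."""
    return bytes(expr(t) for t in range(start, start + frames))


forward = fill_buffer(1000, 8)

# Compute the same samples back to front, then restore the order.
# The result is identical because the per-sample computations commute.
backward = bytes(reversed([expr(t) for t in reversed(range(1000, 1008))]))
print(list(forward), forward == backward)
```

A compiler for the hypothetical language could exploit exactly this independence to evaluate many values of t in one wide register.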

But if you want to run this code on iOS and WinRT simultaneously, there is a huge problem with dependencies.  You cannot depend on LLVM at runtime as an implementation strategy.  Even if you could do it on iOS, you couldn't do it on WinRT.  Lua has always been a problem for latency and jitter (I used it in Xstrument at one point).

But this language could just have a non-general-purpose VM as its basis.  The VM would be a domain-specific CISC instruction set (i.e. lots of domain-specific builtin primitives), and the language would talk directly to the server in terms of those primitives.  As with SuperCollider, that would put the compiler in the client, which sends raw instructions over OSC to the server.
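
As a toy illustration of that idea, here is a sketch of such a VM.  The opcodes (phasor, const, mul, add) are invented for the example.  Each instruction operates on a whole buffer at once, so the interpreter loop runs once per instruction rather than once per sample, and each primitive is a natural place for SIMD code.

```python
def run(program, frames):
    """Execute a list of (opcode, operand) instructions over whole buffers."""
    stack = []
    for opcode, operand in program:
        if opcode == "phasor":
            # Push the sample counter wrapped at `operand`.
            stack.append([t % operand for t in range(frames)])
        elif opcode == "const":
            stack.append([operand] * frames)
        elif opcode == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append([x * y for x, y in zip(a, b)])
        elif opcode == "add":
            b, a = stack.pop(), stack.pop()
            stack.append([x + y for x, y in zip(a, b)])
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


# What a client-side compiler might send to the server for (phasor(4) * 10) + 1.
program = [
    ("phasor", 4),
    ("const", 10),
    ("mul", None),
    ("const", 1),
    ("add", None),
]
out = run(program, 6)
print(out)
```

Because the program is plain data, it could be serialized into OSC messages by the client and interpreted by a small C server with no compiler dependency at runtime.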

Otherwise, this is a thought experiment in how good SuperCollider would really be for this task (other than the huge number of dependencies, GPL licensing issues, etc).  As good as SuperCollider is, I still think it's not so great as a language.  It could learn some lessons from Go, Erlang, Lua, and especially ChucK.


  1. Dear Mr. Fielding,

    What about the Io language (http://iolanguage.org)? While I am not advocating this language, it seems to match at least some of your requirements: it has coroutines and other parallel execution concepts, as well as some SIMD processing; it is embeddable and BSD licensed (good for iOS), and itself depends on 3 BSD-like licensed libraries. What is your opinion of it?

    Sincerely yours,

    Wincent Balin

    1. Hmm... Supercollider is also based on Smalltalk.

  2. Mr. Fielding,

    Have you considered the Rust language? It has message-passing parallel concurrency without sharing memory between threads to avoid race conditions, has a very light run-time, and is C-like enough :)

    I'm considering using Rust in an audio synthesis hobby project, to clone the Haskell music library Euterpea, and Rust seems a fun and interesting language for such a task.

    Rafael Castañeda

  3. I used Lua in a MIDI controller once (much like how Ableton Push uses Python for this purpose). But like with the Push, the garbage collection timeliness was an issue. Lua and Python are good choices for an embedded runtime (with Lua being more efficient). LISP and Smalltalk often get chosen for this reason. The reason why I considered a language was that vDSP instructions are SIMD instructions (whole array add/sub/mul/div/mod/scatter/gather,etc). To get SIMD/SIMT parallelism, which is totally unlike task parallelism, you either have to write in vector-based assembly language, or compile the synthesis loops from a custom language (LISP keeps it simple?).

    In theory, Go or Erlang look interesting, but the point of this would be fine-grained DATA parallelism, not task parallelism, because you are on a battery powered device rather than a multicore server farm. You want registers that are 64 floats long processed in one pass, rather than 64 threads computing a sample buffer, etc. The audio rendering has to be single-threaded for deterministic timing, but data-parallel wherever the sample compute order commutes.

    But looking like Erlang or Go at the level of messaging would be interesting, because it definitely fits in with how OSC works conceptually.

    But in any case... I moved on to other things. I am not interested in app dev right now. But I might get interested if something like OpenCL or CUDA came out for iOS devices with audio rates for filling buffers. (Graphics cards only need 60fps, while audio buffers are very tiny but need 200 to 400 fps, etc.)

  4. I've also been thinking about this
    I found 5 candidate languages....

    New, working on SIMD GPU compilation
    fairly fast already
    excellent concurrency primitives

    new, backed by Mozilla
    seems fast and clean
    good concurrency primitives

    CPC: Continuation Passing C
    CPS C extension library for very lightweight threads

    C-like framework for heterogeneous computing (CPU, GPU, DSP...)

    stack based concatenative (like Forth)

  5. This comment has been removed by the author.

  6. A Hypothetical Modern Audio Language
    type audio