Popular Posts

Tuesday, August 20, 2013

Types and Calling Orders


Assuming that when a blob of code reaches some length it is re-factored into smaller functions, having no type system means that you can't cut off the search for what could go wrong at the inputs, outputs, and very limited side-effects; which would be like being able to prove that something is right with O(n) code review rather than from O(n lg n) code review up to O(n^2) for code full of side-effects.

If an API has M methods that only have a small number of valid calling orders, then there might be O(M!) wrong orders that kind of work well enough to be ticking time bombs.  Types can be used to document/enforce calling orders almost up to a regular expression; but not a grammar, ie: provably balanced alloc/free, push/pop, etc.  (todo: Enforcing up to a grammar may be possible with a linear type system, ie: Rust.)  Documenting calling orders is possibly more important than input/output type enforcement for this reason; which is why people read examples rather than auto-generated docs.

This extra sensitivity to ordering is what makes shared memory multi-threaded code so much harder than share-nothing concurrent code; in a measurable exhaustive bug-search sense.  It almost always has to do with two things going on that 'dont commute'.  It's a common phenomenon (commutators) in mathematics as the launching point from something that's trivial suddenly becoming hard:

A * inverse(A) = 1, and B * inverse(B) = 1

Yet:

A * B * inverse(A) * inverse(B) != 1

Because A and B don't completely commute.  It's why a Rubiks cube is trivial to solve no matter how many times you scramble opposite faces, but suddenly becomes a real problem once you start mixing faces that share an edge.  A lot of important mathematical problems are rooted in this.  It's exactly this that causes two tasks that share resources to be more complicated than the two tasks in isolation.


In addition to the actual ordering, a lot of type systems are kind of lacking in not supporting subset types, with the most important one being that every pointer is a non-null value by default (and casting a nullable X ptr to a non-null X ptr throws an exception at cast time when null rather than ever allowing nullable X ptr de-reference attempts (at random locations!) in the code.)  Other examples would be simple ranges like [0..12] as types so that there need not be runtime checks (ie: cast 13 to one of these and it throws an exception...it's not possible to find an out of range value assigned to one).


Not all type systems are perfect, but I gotta shake my head when I hear that not having a type system is somehow more productive.  It's procrastination.  You end up revisiting whipped-up code later as it breaks during major re-factoring efforts and writing and maintaining unit tests that essentially check types, assertions, and calling orders.

Friday, April 19, 2013

Upsert Counters In Postgres

Upsert 


When faced with a situation in which you are inserting huge volumes of data ('log volumes of data', think append-only or circular queue tables) and need to keep some statistics, you would like to have counters.  In this situation, I am working with Postgres, but upsert concepts also exist in MySQL apparently.  This is related to a common situation in which you know the primary key and some initial values for a datapoint (to describe the object that the primary key refers to), along with data that will be updated constantly.  You don't want to look these up to see if they exist in order to do an update on them, because you want to do a huge number of inserts in a batch.  The simplest upsert is like this:

sql = "with upsert as
(update ourtable
  set mutable1 = ?
  where id=?
  returning id)
insert into ourtable
(id,constant1,mutable1)
select ?,?,?
where not exists (select 1 from upsert)"

#pseudocode usage like this...
def OurTableUpsert(statement,id,constant1,mutable1):
  #move on to the next item
  statement.item++
  statement.row[item].args[0] = mutable1
  statement.row[item].args[1] = id
  statement.row[item].args[2] = id
  statement.row[item].args[3] = constant1
  statement.row[item].args[4] = mutable1

#Make a giant batch of updates against empty table
statement = prepare(sql)
OurTableUpsert(statement,8,'Ovechkin Goals',1)
....
OurTableUpsert(statement,8,'Ovechkin Goals',2)
...
OurTableUpsert(statement,1003,'Planets',9)
...
OurTableUpsert(statement,1003,'Planets',8)
...
execute(statement)

Which lets us set mutable1 of ourtable to a value given an id, whether this row already exists or not.  This is important because we want to make huge batch updates, and don't have time to go figure out what already exists.  It prevents us from having to do a select on the items to generate inserts on the ones that don't exist and updates on the ones that do exist, and lets us do everything as one batch statement.  The constant fields like constant1 are never updated, only supplied with initial values, and the mutable fields mutable1 are constantly updated.


Counters


A more complicated example that's a collection of counters.  The counters should be sparse such that values of zero don't have a row in the database, and you only write back to the database to increment their values.

Updates alone don't let us do counter-like behavior.  The upsert is general enough that we can supply incremental values.  We change the statement to make the arguments supply increments, and partition the increments by day:

sql = "with upsert as
(update ourcounters 
  set counter = (? + counter)
  where id=? 
  and starttime <= now() and now() < stoptime
    returning id)
insert into ourcounters 
(id,counterdescription,counter,starttime,stoptime)
select ?,?,?,?,
  date_trunc('day',now()),
  date_trunc('day',now() + interval '1 day')
  where not exists (select 1 from upsert)"

#run something like this
def OurCounters(id,description,increment):
  statement.item++
  statement.row[item].args[0] = increment
  statement.row[item].args[1] = id

  statement.row[item].args[2] = description
  statement.row[item].args[3] = increment

#Increment a whole bunch of counters
statement = prepare(sql)
OurCounters(1,'United States',235)
OurCounters(44,'UK',53)

OurCounters(44,'UK',99)
OurCounters(1,'United States',23523)
...
execute(statement)

You can turn this into a stored procedure and supply it with a huge list of counter values to increment.  These are sparse counters, so there won't be a row in the database for them if the value is zero.  It increments the counter for the current day so that you can retrieve total counters over a window of days.  The trick here is that the raw counts get bucketed by day.  That way, when you startup the application from the beginning, you can retrieve a total count:

select id,description,sum(counter) from ourcounters

And then in memory, keep increments to the total that last came from (or was written to) the database.  Periodically you write them out.  But you write them out so that it places it into an increment for the current day.  This way you can keep incremental statistics on a large amount of data, where there is only a lightweight query on first startup, and an efficient way to write updates and filter out for an initial time window.

Summary


I have worked on relational database systems that generally deal with a fast ingest of events (where a mismatch between the insert and delete rate becomes a severe problem when you reach the maximum number of rows you would like to maintain in the database).  With excursions off into Hadoop, Cassandra, Solr, etc, I start to run into situations where relational databases are being used as ringbuffers or append-only immutable data, while the rest is boiling down to having a great system for counters (including nested counters). Upsert was something that I had not seen until recently.  It looks like it will come in handy in a lot of places.



Monday, March 25, 2013

Git Merge Versus Rebase

Git Merge Versus Rebase


You will see a lot of strange advice in Google searches on merge versus rebase, and how to deal with conflict resolution.    The advice conflicts, so it obviously isn't all good advice.  This link here has it right:

http://gitguru.com/2009/02/03/rebase-v-merge-in-git/

The biggest usability goof in the git interface is just the fact that "git pull" isn't a rebase by default.  Rebase causes the commits to appear such that incoming changes come in in between old origin state and our changes, where we make the edits to apply over top of the incoming changes.  (Read gitguru's article until it makes sense!)

The fact that rebase creates a rewrite of history is feared to be a problem by some developers, but it's not a problem if pre-rebased (and pre-squashed) commits are never pushed up into the master commit.  In other words, you can rebase that which is still local to your machine, or only cloned into a throw-away branch that somebody reviewing your code is using.

The Common Case


Your pull (from your master branch) should always be "git pull --rebase", or similarly "git fetch; git rebase", except for a few special cases.  Doing fetch, rebase separately lets you deal with rebase conflicts in the master (which you should not have when you follow this discipline).  It would also be helpful if git branches were forced to be named like file paths to reflect the history of how branches were created.  Consider this slightly verbose way of working:


#Alice has been asked to add Xmpp and Gopher protocol support
#so we make the tree of branches
git clone git://origina/ManhattanProject
cd ManhattanProject
git checkout master
git checkout -b masterProtocolSupport
git checkout -b masterProtocolSupportXmpp
git checkout masterProtocolSupport
git checkout -b masterProtocolSupportGopher

#Alice is asked to fix an SSL bug
git checkout master
git checkout -b masterFixSSLBug

Each bug is in its own branch, because only a subset of these branches will ultimately be approved to be pushed into the repository.  The clone creates a leaf off of the origin master branch.  Doing a checkout -b creates a leaf off of the currently checked out branch and changes over to that branch.  Alice gets to work, making commits into these branches:

masterProtocolSupportGopher
masterProtocolSupportXmpp
masterFixSSLBug

Alice is told that there were bug fixes checked into master and pushed to the origin.  We always follow the rule that we begin at the master and rebase before merge, so that all merges will fastforward.  If there are any conflicts, the conflicts are always resolved in rebases - not in merges.  (This rule is only violated in scenarios where we have to merge trees across different origin servers, where the common ancestor commit is too far in the past.).  This is the default scenario:

git checkout master
git pull --rebase
#we should only get conflicts when rebasing leaf branches - if you get a conflict here, you screwed up
#stop checking into your local copy of master after you fix this mess.
git checkout masterProtocolSupport
git rebase master
git checkout masterProtocolSupportXmpp
#may get conflicts on this rebase - fix them here
git rebase masterProtocolSupport
git commit

git checkout masterProtocolSupportGopher
#fix conflicts here
git rebase masterProtocolSupport

Assuming that the conflicts you get in the Xmpp versus Gopher branch are distinct, this is straightforward.  You then need to figure out what subset of your work will be approved to be merged into the local repository.  So, somebody pulls alice's masterProtocolSupportGopher and masterProtocolSupportXmpp branch.  The reviewer decides that the Gopher support isn't worth the trouble and tells Alice to discard the Gopher work, and push the Xmpp branch.  The SSL fixes are also approved without controversy.  So we need to make a test branch that includes all that is approved:

#start from the root of the tree and just ensure that we rebase before we push in all cases.
#if fastforward merge fails, then fix it in the leaf and merge it up rather than messing with non-leaf
git checkout master
git pull --rebase
git checkout masterProtocolSupport
git rebase master
git checkout masterProtocolSupportXmpp 
git rebase masterProtocolSupport #squash commits after this
git checkout masterFixSSLBug 
git rebase master #squash commits after this
#we are rebased

#make a disposable testmaster branch that merges all approved work
#that we can run tests on.  merge into test rather than master
git checkout master
git checkout -b testmaster
#merge everything down into test
git checkout masterProtocolSupport
git merge masterProtocolSupportXmpp
git checkout testmaster
git merge masterProtocolSupport #merge approved branches
git merge masterFixSSLBug #merge approved branches

#this proves that a push to origin master will 
#fast-forward if you do it now.
#run tests before you do that!
#if it works, then merge this into master, push it, 
#and discard this branch when done.


So we merge protocol support and ssl fixes into our test branch after we squash the commits for our leaf branches to include commentary from the reviewer (who it was, etc), to move out irrelevant noise from the detailed original commits.  Because we always rebase before merge, all the merges should fastforward.  This will give us a very clean and easy to read git log.

Testing


We run our tests and see that they pass.  We can do this test before we ask for a code review of our leaf branches as well, but ultimately we need to run tests on the subset of branches that was approved to ensure that we don't push breakage into the repository when an unapproved branch gets left out.  After the tests pass push the work up,

#the merge into testmaster proved that the push would fast-forward if we did it.
git checkout master
git merge testmaster #merge what we tested, equiv to merging approved branches
git push origin master
#we don't need this branch now
git branch -d testmaster

Exceptions


If you work like this, then you won't be dealing with merge problems in git.  You won't be pushing broken commits up into the master branch for others to figure out (because you can actually integration test before you even push!).  You will still have to resolve conflicts when you rebase, but that is exactly where the resolution needs to be done.  There will still be unusual scenarios where you have to merge in both directions, when you are not dealing with a straightforward tree of branches around the master.  It is possible to use merges for everything, and there is no permanent damage to the repository if you make a mistake.

There will also be scenarios where you want to commit into a non-leaf node like masterProtocolSupport for the cases where the same merge conflicts happen in both leaf branches.  In that case, Xmpp and Gopher would treat masterProtocolSupport like the remote.  The same rebase before merge rule still applies.

There are also cases where you might not rebase from a remote, violating the rebase from root towards leaf rule.  You may have to merge when you have multiple origins (origina,originb) that are evolving independently (because you want to rebase on one consistent parent branch).



Saturday, January 19, 2013

A Hypothetical Modern Audio Language

A Hypothetical Modern Audio Language

I have been thinking generally about the progression of languages with respect to supporting parallelism.  When C and Unix started to take over the world, parallelism meant processes.  This meant that the languages were very much single threaded, and used interprocess communication to get things done between them.  Then languages like Java and C# came along and popularized the usage of threads with shared memory.  Recently, languages like Erlang and Go (and Lua to some degree) came along to take large scale concurrency seriously; making coroutines mainstream.  Coroutines are like generators or iterators that go both ways between caller and called.  It is a mechanism to allow code to be structured in a truly concurrent way.  What these languages still are missing is what is becoming important for numerically intensive applications and mobile devices: SIMD parallelism.  SIMD (and SIMT) parallelism is what we get from GPUs.  GPUs have very high latency, but the current CPU chips have SIMD instructions that do SIMD on a smaller scale (NEON, SSE, etc).  Like coroutines, the parallelism it provides isn't based on multiple independent thread executions.  Instead, it's based on very wide registers which contain large arrays of values with instructions that can operate on all elements of the array at once.  For branchless computations, this can give a fast speedup, and provide deterministic timing when doing so.

I have been mainly mulling over an ideal language for creating OSC controller patches.  All existing languages are a bit of a monster to build because of the amount of stuff they depend on.  Ideally, it's a pure embeddable C library where callbacks are plugged into the target OS, rather than having the dependencies in the language runtime.  ChucK is almost an ideal fit, but that probject seems to not be going anywhere.  SuperCollider is interesting, but it has a very wide scope.  The vast majority of what it does (and therefore the vast majority of dependencies it has) is irrelevant to what I want to do.  I originally thought that a LISP variant would be an ideal language to try to start such a project because it dramatically simplifies the language aspect of it.  But the more I think of it, I want the capabilities of ChucK, combined with the message passing concurrency of Erlang, combined with a way to hide the intense arithmetic so that it is easy to SIMD parallelize.

Specifically for audio apps, you need good soft real-time behavior.  One of the problems that needs to be handled is sample-accurate jitter handling.  Presume that when client talks to server, the client gets sample-accurate timestamps for when the gesture was done, and sends it to the server.  If the audio buffer size is 5ms, then an ideal zero-jitter audio representation will match client and server clocks to schedule incoming oscillator changes to always have exactly 5ms of latency (no more, and no less.  if there is less, then there must be a non-zero jitter as a trade-off.  jitter is worse than latency.)

So I started to fake up some pseudocode, and it looks like this:

//Load up a sample with rules about how it responds to phase changes
soundfile:table{file="chime.wav",attack=0,sustain=256,loop=1024,end=2000}

//This is the graph of oscillators
//No sound is made.  It's just data setup.
a:oscil{table=soundfile}
b:chorus{in=a}
c:reverb{in=b}
d:dac{in=c}

//tell the dac to start running, ultimately pulling data from oscillator through the chorus and reverb
dac.start();

//This is the osc listener
spawn fun osclistener
  o:osclistener{port=9999}  
  recv o
    /rjf/t0,t = {controllernow}:
      timediff = controllernow-now
    /rjf/p0,iifft = {voice,phase,freq,amp,timestamp,phase=attack}:
      at latencycompute(timediff,timestamp) a{freq=freq,amp=amp,phase=attack}
    /rjf/p0,iifft = {voice,phase,freq,amp,timestamp,phase=sustain}:
      at latencycompute(timediff,timestamp) a{freq=freq,amp=amp,phase=sustain}
    /rjf/p0,iifft = {voice,phase,freq,amp,timestamp,phase=release}:
      at latencycompute(timediff,timestamp) a{freq=freq,amp=amp,phase=release}
    after 100ms:
      shutdownAllVoices()

x:oscil(expr= fun(t)
  t & (t%256)
)

Erlang and Go do message passing concurrency in a pretty straightforward way.  Erlang claims to do soft real-time well.  What Erlang does not do well is strings (normal string usage is like 8 bytes per char because it represents them as a literal linked list of integers), and high intensity numerical computation.  But OSC messaging conceptually fits masterfully into Erlang's way of thinking.  There needs to be things added, such as branchless code needs to be rendered to SIMD instructions if possible.  What would make sense is to either use Erlang or Go as the actual language for this purpose.  But it's also ideal that the language is a pure C library without a ton of dependencies (Lua actually fits this criteria very very well).

BitWiz is an astonishingly simple example of what you can do with creative use of branchless code.  It's not completely clear to me yet how to apply all of those 8-bit lessons to 16-bit yet, but if you look carefully at BitWiz code, you can see that the entire audio buffer fill can be generated in parallel (per sample).  Those kind of simple and branchless expressions (where values change only in between audio frames) should be part of any audio language.

But if you want to run this code on iOS and WinRT simultaneously, there is a huge problem with dependencies.  You cannot depend on LLVM at runtime as an implementation strategy.  If you could do it on iOS, you can't do it on WinRT.  Lua has always been a problem for latency and jitter (I used it in Xstrument at one point).

But this language could just have a non-general purpose VM as its basis.  The VM would be a domain specific CISC instruction set (ie: lots of domain specific builtin primitives), and the language directly talks to the server in terms of those primitives.  Like SuperCollider, that would put the compiler into the client, and it sends raw instructions over OSC to the server.

Otherwise, this is a thought experiment in how good SuperCollider would really be for this task (other than the huge number of dependencies, GPL licensing issues, etc).  As good as SuperCollider is, I still think it's not so great as a language.  It could learn some lessons from Go, Erlang, Lua, and especially ChucK.

Sunday, January 6, 2013

Audio Languages

Audio Languages and iOS, WinRT, Android, etc.

I have been examining the various ways to simplify development of music instruments for touch screens.  I wrote Mugician, Geo Synth (also known as Pythagoras in its early development), and Cantor (known as AlephOne early on).  Mugician and AlephOne are open source projects that I have posted into github.  Geo Synth is a commercial project that is released under Wizdom Music (Jordan Rudess, the Dream Theater keyboardist's company).  I built these apps because I believe that the instrument layout that they have on touchscreens will completely overtake the guitar and keyboard industry at some point when latency and pressure sense issues are dealt with.  I have a lot of videos posted about it, and there are a lot of pro musicians that use Geo in particular (and some using Mugician as well):

http://www.youtube.com/rrr00bb

The main reason I believe that this will happen is that this layout on a touch screen solves intonation and pitch problems that guitars and keyboards actually make worse.  It is even a good ergonomic setup because there are 10 fingers on top of the playing surface, with a layout that's familiar to guitarists, and easier than keyboards (even for actual keyboardists).  The regular layout makes it easier to play without having to have extensive feel feedback (ie: reaching proper sharps and flats on a piano where you can't feel where white/black key boundaries are).  This allows you to play very fast; significantly faster than real guitar playing even.

The Beginning and Background MIDI



After a few years of playing around with iOS controller code, a year into it, I came to the conclusion that I am doing something architecturally very wrong (along with the rest of the iOS community).  Before Audiobus came out, there was no way to contain the list of requirements for an iOS based audio app, to keep simple ideas from turning into time-burning monstrosities.  Back then, you could run another app in the background, and even send it MIDI messages.  But MIDI has so many problems that you still need to have an internal audio engine to have any guarantee about the user experience, and to ease setup.  (So, in my view; the existence of internal audio engines in any MIDI controller app means that MIDI doesn't actually work as it was intended.  I will have more on OSC later.  If MIDI did what it is supposed to do really well, then nobody would bother writing a synth to go with a controller, or vice versa.  It would not make sense financially to ever do it.  But the combination over MIDI sucks, so we end up doing both sides of it.)   Because audio was still isolated between apps there was a lot of pressure to make every instrument have:
  • A basic controller.  Most people imitate pianos, because that's what synthesis guys are used to looking at.  Though on the tablet, it's always a horribly unplayable imitation of the piano.  I don't understand why people keep doing this.  We have had 2 years of this, where there's little point in having the controller because it's not playable - but it's included anyway.  You can make a playable controller, but the first thing that has to happen is to drop the requirement that it resemble a piano; as this has been proven for many years to not work on a touch screen.
    • The two main jobs of the controller should be to proxy out to controls for the synths/DAWs in the background when required (ie: volume knobs, timbre sliders, etc), and to simply play voices with the correct amplitudes and pitches.  MIDI's very strong note orientation makes it an awful protocol for meeting these requirements.  OSC can trivially do all of this correctly, but it's a very vague standard.  It's so vague that if you make an OSC controller, you generally need to ship a script that is put into the front of the synth to unpack the OSC messages and do what it needs to do to the synth.  That's where audio languages come in later.
  • A synthesis engine.  This is a pretty steep electrical engineering task to do really well.  This is why Animoog is one of the best ones out there.  Any instrument that simply plays back samples is really unexpressive, and misses the point about music instruments and expressivity.  If you are making a MIDI controller, you can (and should) stop here if at all possible and just send MIDI messages.  When background MIDI came out, it was such a wonderful new thing to have, presuming that you could generate MIDI to do what you want and the majority of synths interpreted this correctly.  What should have happened as a result was that the app world should have split between people providing MIDI controllers and those providing synthesizers, and nobody wasting their time duplicating a mediocre version of whatever the app primarily was not.  This typically means mediocre controllers on apps designed to be synths.
  • A recording functionality, of audio copy and paste.  Actually, it's really a request to include some level of a DAW in every single music instrument.  Because this is tied into the synth, it's generally a mediocre DAW functionality.  You can't really use these instruments naturally because you have relatively short buffers to record audio into.  AudioCopyPaste is quite useless if the primary use case is somebody playing each track non-stop for 20 minutes.  That's precisely the kind of use case I cared about.
The iOS audio system wasn't designed primarily for real-time control (at the audio sample rate).  We are also dealing with ARM processors, because we run on battery power.  Because of this, it has always been a struggle to get instrument-quality latency on any instrument; let alone an instrument that can't stick to doing one thing well and throwing everything else overboard to solidly meet the performance guarantees.  Currently, audio buffers are between 5ms and 10ms for "good" apps, though iOS gives about 20ms as the default (under the assumption that you are just playing back sound files).  It should really get down to about 1ms audio buffers to meet the standards of professional audio quality.  Beyond even that, almost no instruments (including my own) will adjust the latency to reduce jitter to 0ms (by making all latency larger, but constant); because that's usually an expensive thing to implement in an audio engine.  Remember that there is no standard for generating audio, and there is a mix of professionals and amateurs doing the best that they can do while scribbling waveforms into raw audio buffers.  This means that we have a lot of crappy effects units, audio glitching, aliasing, etc.

For reference, there are these different kinds of latency that we have to deal with:
  • Graphics frame rate, about 60fps ("real-time", but a single frame miss will not result in a disaster).  The user interface can cause other stuff to stall when it's over budget.  This is especially true if the interface is in a high level language like C# or Lua.  Also, if you try to use the GPU to render audio frames, then you could be limited to the graphics frame-rate, or at least locked out from rendering audio while graphics rendering is in progress.
  • Audio frame out rate, 44.1khz (44000 "frames" per second, where a single frame miss results in a horrible popping noise.  It's hard-real time).
  • Audio frame in rate, 44.1khz or 22khz.  If you are making a guitar effects processor, then you have to add up the latency of incoming, and latency of outgoing, and latency of minimum time to compute a frame for output.  Because of this, just because you can build a wicked fast tablet instrument, doesn't mean that you can use that same effects chain for a fast guitar effects pedal.
  • Control rate, at least 200fps (256 audio frames).  But would like to quadruple performance to use 64 audio frames at 800fps, for 1ms latency and jitter.  This is somewhat tied into how fast the touchscreen can sense.  If the touch timestamp is sample accurate, then we can send the timestamp to the synth and synth can make the latency even to reduce jitter to zero.
  • MIDI/OSC packet latency/jitter.  It's very network dependent, and can either be negligible or fatal depending on how much of it is.
Latency issues are still a bit of a mess, and are ignored or not understood by a lot of people.  This is especially true on Android, where latency is very far beyond fatal for creating real-time music apps.

Audiobus



Then Audiobus came out.  Audiobus wonderfully fixes the huge problem that iOS had with audio isolation in apps.  It's existence is very necessary.

Audiobus lets every app (input app, effects unit app, output app) read and write to the Audiobus callbacks so that separate apps can be chained together.  So at about 200 times a second, Audiobus is in the background calling every participating app (an input app, and effects app, and an output app) to run all of their callbacks in order.  This has the effect of now having 4 apps have hard real-time requirements at audio rates(!).  AudioBus + controllerApp + fxApp + dawApp ... that is 4 apps.   At 200 times a second, that's 1000 callbacks getting filled out in a second.  Also, that's 4 apps with a lot of memory allocated to wrangling audio in real-time.  ControllerApp is going to have a user interface on it as well.  The user interface can hog resources and cause everything else to stall.  The user interface can easily be the thing that causes the whole pile of audio generating apps to not run properly in real-time.  It's hard to overemphasize the problem that gets created.  If there were only one hard real-time process responsible for generating audio, with everything else being a controller of some kind, then glitching and performance issues mostly go away.

Audiobus also creates a somewhat political issue with app development as well.  Audiobus app will list controls that are input, effects, or output capable.  It has nothing to say about controllers.  If you app is not a synth or a DAW, then realistically it should not be generating audio.  If you app is a controller, then it should implement MIDI or OSC.  But if you are such a controller, you are technically not in the "Audiobus" enabled category; which means that your app essentially doesn't exist for a lot of users.  So what do we do?  We pointlessly add in Audiobus support into controllers for no reason, just so we can get into that category.  If you unnecessarily generate audio, you just eat up memory and CPU; resources that actual Audiobus apps really need.  :-)  Controllers are essential to a chain of Audiobus apps, but controllers don't generate or deal with audio.  Controller to synth is one protocol, and synth to effects to daw is another protocol.  Note that, if Audiobus had a much requested feature to save setups, then it probably would have to include MIDI and OSC connections as well.

Controller -> MIDI/OSC -> Synth -> Audiobus -> Effects -> Audiobus -> DAW

It should be like that.

MIDI versus OSC



I have posted at length about all the technical problems that MIDI has on touch screens; a very long and technical read:

http://rrr00bb.blogspot.com/2012/04/ideal-midi-compatible-protocol.html

The problems are deep issues with what can and cannot be said in the MIDI protocol, especially in the subset that all synths are required to understand.  The main problem with MIDI is that it is oriented around notes, rather than around frequency/pitch.  MIDI's note bending is a hack that has all kinds of obvious corner cases that it can't represent; all of these corner cases don't show up in piano controllers (which is why they never got fixed), but they are essential cases for string instruments (which is why MIDI guitars are not standard, and are deficient in all kinds of various ways when people have tried).  Real oscillators don't know anything about notes, and are frequency oriented.

OSC can be made to handle all of this stuff very nicely.  OSC is just a remote procedure call syntax.  It's literally just named functions with parameters going back and forth, like:

  /guitar/string0,fff 1.0, 440.0, 0.5      #amplitude, frequency, timbre
  /guitar/string0,fff 0.9, 442.0, 0.53     #amplitude, frequency, timbre
  ...

The problem with it of course is that all controllers are essentially custom creations.  The messages going to the synth, and from synth to controller could be anything at all.  If you defined what a keyboard looks like, you could standardize it. 

Audio Languages



So, now this brings me to the heart of the problem I am facing.  I want to completely separate out the audio language from the controller.  I don't want the protocol that the controller speaks to make assumptions place unintended limits on the controller.  And I don't want the controller user interface to hurt the real-time audio engine.  So, I have an experiment here on Windows8 (a 27inch screen), where I have a C# program that sends UDP packets to an audio language ChucK:



A lot of audio language aficionados are fond of SuperCollider, Max/MSP/Pd, and CSound is an older language that is still used.  There are a few more audio languages, but those are the popular ones.  These languages have common characteristics:
  • open up a listening port for incoming OSC messages, and get them into a script that does something with the messages
  • because OSC is just a standard for sending messages, the synth front-end must have a script pushed into it to actually control oscillators and effects parameters, etc.
  • they all let the sound patch be defined completely in a script.
  • the script can be pushed into the synthesizer from the controller.  this means that the real-time synthesis engine is one app (ie: scsynth, csound, chuck), and the patch comes from the controller
  • in some of these languages, ChucK in particular, you setup the network of effects and advance time explicitly.  As an example, you create an oscillator at amplitude 1, and 440hz.  You then tell the engine to move forward 30 milliseconds.  When that happens, 30 milliseconds of audio is generated.  This is a very hard-real-time notion of how a language should work.  It is the main thing that the environments are missing when we try to write synthesizers.  This kind of environment is most critical when you try to do things like increasing latency of events to provide zero jitter; for when you want sample-accurate timing and want to delay every change by 5ms rather than at the beginning of a new audio buffer, which guarantees at least 5ms of jitter (ie: latency of 2.5ms with 5ms jitter vs 5ms latency with 0ms jitter).
  • you can inject an entire sequencer into the sound engine, and only send it control changes after that.
  • you can define effects units like reverbs and distortion units - in scripts that run on the tablet - and install them into the audio engine at runtime.  at this point, the mentality could not be any more different from MIDI (and Audiobus) than this.  This is where environments like Max/MSP make a lot of sense on tablet computers.

Audio Language Present



The current group of audio languages have features that don't make them ideal for what I am trying to do.  The current problem with them is that most of them are oriented around live coding, or in the case of CSound, around offline score rendering.  These are both a different perspective from the use of creating hard real-time OSC protocol synthesizers that are driven primarily by controllers.

Csound is a pretty fast environment.  It is well known in academic circles, where offline rendering of music is a plausible thing to do.  MIDI and OSC support are a horrible after-the-fact hack however.  The language is really low level, and will not appeal to a lot of people who would otherwise write patches for it.  It's designed mostly around building up a static graph of oscillators and filters. It builds for all the primary desktop environments.  CSound also has some pretty bizarre limitations that forced me to change my OSC messaging to break code that could simultaneously talk to SuperCollider and ChucK.

SuperCollider is very actively developed.  But currently, it's under a GPL3 license, though work is being done to detangle things to allow for GPL2 compliant builds.  Because of this, it almost doesn't matter what else is good about it; because this situation is currently fatal for all of the tablet environments.  The $0.99 app model depends on the use of Digital Rights Management (locking down hardware and preventing you from doing whatever you want on your own device), so DRM will never go away.  End users have spoken; and they will never give up the $0.99 app model where they can have the interface polish of commercial apps, and close to the prices of free apps at the same time.  The DRM conflicts with GPL licensing, and the SuperCollider devs seem pretty adamant about not just ignoring this issue and letting SuperCollider be embedded anyway.  GPL2 compliant builds may also have issues as well for user-facing projects that have a short tail, just cannot be open source projects (rock stars don't work for free for very long, it's not just developers required to make apps happen, etc).  But ignoring that huge issue, SuperCollider is very mature, and has a pretty healthy user and developer base.  It is based on a Smalltalk-like language.  The only major technical downsides seem to be that a lot of development is directed towards areas that are irrelevant to the main task of creating realtime OSC synthesizers driven by controllers.  Much work goes into user interface things that won't be useful on the tablet platforms, and on things that seem to be related to the live-coding use cases.

Pd (Max/MSP) is an interesting contender.  The underlying language is hidden behind a simple and standardized visual interface.  In some ways this is good, in that it's easy to do simple things.  In other ways, it's really terrible, in that when faced with doing real work, you can easily face a ball of unmaintainable synthesis code that would be simple with the tried-and-true abstractions available in standard programming languages.  It's BSD licensing is very compatible with commercial products.  Some people have contributed ARM specific improvements.

ChucK is the closest thing to a modern environment in this group.  It is a bit less capable than SuperCollider in some respects, but the language really makes a lot of sense, and because the project is run by Smule, these guys understand the tablet world pretty thoroughly.  Performance issues aside, it is the most interesting option that seems to be the least burdened by irrelevant features.  It isn't freely licensed on iOS however (though its license is not an impossible one for iOS like GPL3).  ChucK's applicability seems to have a lot of applicability outside of just audio synthesis as well.  It's an interesting lesson in how real-time languages should look in general. 

Audio Languages In an Ideal World



Dependencies: One of the more problematic things I encounter with these languages is the kinds of dependencies that they have.  The current languages were very much designed with a desktop world in mind.  When building for a new platform that they did not envision, these are not going to be designed as pure buffer generators that get hooked up into callbacks (ie: Audiobus, CoreAudio callbacks, WASAPI callbacks).  Ideally, the core project should not have branches per platform; but as projects that build around the core.  Any audio language project started from scratch should build without pulling in a lot of dependencies, and should ultimately just invoke callbacks that the language has filled in with audio data.  This is how Audiobus works, and presumably how The Amazing Audio Engine will work (though these projects are very iOS specific).  A real source of heartburn is that even "standard" frameworks will pose a problem.  OpenGL, OpenCL, OpenAL, etc, are the usual route to portability; then Microsoft uses WinRT and insists on DirectX and WASAPI, etc.  Using straight C code with a minimum of dependencies is generally the only way to avoid this problem.

SIMD: Few of these languages take advantage of SIMD in their implementations (single thread lockstep parallelism, the kind that you need for fast and short convolutions, filtering, for just rendering entire audio buffer in parallel, etc).  These are all in C or C++, and there is no good standard for doing this yet.   But it is typically necessary that per-platform, there needs to be SIMD optimized builds for the engine to be feasible on ARM processors.  Examples are vDSP, and ARM intrinsics.  OpenCL addresses these issues in theory, but it's unclear if GPUs can be used in practice for this kind of audio composting.  The SIMD work might be tied into the audio language VM, rather than compiling elemental functions to use SIMD at runtime.

The Language: Because these environments have hard-real-time requirements, there is a real conflict with having a dynamic environment as well.  These languages run in custom virtual machines.  They do real-time garbage collection.  Because of this, the language cannot get overly baroque without messing up the goals of the language.  The language should work well in a hard-real-time environment.  This generally means that memory allocation and de-allocations are much more conservative, and algorithms run in consistent times.

Language Simplicity: A variant of LISP that deals with arrays and SIMD directly seems to be the most obvious candidate to get started with.  There are existing audio languages that use LISP as their basis.  A virtual machine for running an audio language should at least start out very simple, and grow as needed.  The main issue with using LISP in this capacity would be to support actual arrays from the outset, and allow for array operations to be SIMD parallelized (ie: avoid a high garbage collection rate, locality issues, etc).

The OSC Protocol:  The most wonderful thing about SuperCollider is how the language environment (sclang) talks to the real-time engine (scsynth) over OSC.  It is an effective split between hard real-time and soft real-time.  It allows the compiler to simply be removed from environments where it isn't needed.  The controller should do something similar to sclang's equivalent, and use OSC as the protocol over which to inject patch definitions into the synthesizer. 

The VM: The virtual machine could be a traditional one written by hand.  It could also be LLVM output that is consumed and turned into native code.  LLVM is designed to run on many systems, but again I run into issues with standards possibly not being usable in certain places (WinRT?  How about generating CLR as a backend instead?).  OpenGL drivers on OSX already work like this.  They take shader code and generate the card specific assembly language; and this is for a pretty performance area critical part of the system.

Patch Generation

When I was working on Cantor (AlephOne), I had started to write a Python compiler to generate SIMD C code (vDSP - partially realized in my github project DSPCompiler and in AlephOne) for the audio engine from LISP input.  I ran into this problem because OpenCL wasn't available, and I had a similar problem.  When you try to generate wide SIMD code, your code turns completely "inside out" where the outer loop that existed in the serial version goes into every instruction of the "assembly language expansion" of the original C code.  For example:

//can't parallelize that
for(int i; i<N;i++){x[i]=a[i]+b[i]*c[i];}
Becomes like:

for(i : 0..N) mul b, c, x
for(i : 0..N) add x, a, x

But that doesn't support dynamic use cases.  The patches would need to be compiled into releases.  But a VM supporting SIMD instructions appropriately could provide a speedup even when the originating language is high level.

Wednesday, December 5, 2012

An OSC / ChucK Windows8 instrument

OSC


Up until now, I had avoided OSC because getting synths that understand it setup correctly was very inconsistent, if not difficult.  At the time, I was wringing all I could out of MIDI, or rather unhappily building internal audio engines - knowing that the result would not be as good as a battle hardened synth.  I have tinkered with Pd (hard to do non-trivial programming in brute-force visual spaghetti code), SuperCollider (a rather horrible language, but more complete programming capability), and ChucK (a little weird at first, but a great language - but performance is not necessarily great).  The other main issue was that before I found myself on an x86 tablet, the GPL license for SuperCollider and ChucK were problematic.  On iOS, you end up having to bake everything into a monolithic app.




But I really wanted to offload all signal processing into one of these environments somehow, and found out that ChucK does OSC really nicely.  It's pointless (and ridiculous) for me to spend my time implementing an entire windowing system because the native (or just common) toolkits have too much latency, and it's just stupid for me to try to compete with the greatest synthesizers out there.  So, I offloaded absolutely everything that's not in my area of expertise or interest.  The synthesizer behind it?

Here is a ChucK server that implements a 10 voice OSC synth with a timbre on the y axis (implemented a few minutes after the video above).  It's just sending tuples of (voice, amplitude, frequency, timbre):

//chucksrv.ck
//run like: chuck chucksrv.ck
"/rjf,ifff" => string oscAddress;
1973 => int oscPort;
10 => int vMax;
JCRev reverb;
SawOsc voices[vMax];
SawOsc voicesHi[vMax];
for( 0 => int vIdx; vIdx < vMax; vIdx++) {
  voices[vIdx] => reverb;
  voicesHi[vIdx] => reverb;
  0 => voices[vIdx].gain;
  440 => voices[vIdx].freq;
  0 => voicesHi[vIdx].gain;
  880 => voicesHi[vIdx].freq;
}
0.6 => reverb.mix;
reverb => dac;
OscRecv recv;
oscPort => recv.port;
recv.listen();
recv.event( oscAddress ) @=> OscEvent oe;
while( oe => now ) {
  if( oe.nextMsg() ) {
    oe.getInt() => int voice;
    oe.getFloat() => float gain;   0.5 * oe.getFloat() => float freq => voices[voice].freq;
    2 * freq => voicesHi[voice].freq;
    oe.getFloat() => float timbre;
    timbre * 0.025125 * gain => voices[voice].gain;
    (1 - timbre) * 0.0525 * gain => voicesHi[voice].gain;
   
    //<<< "voice=", voice, ",vol=", gain, ",freq=", freq, ",timbre", timbre >>>;
  }
  //me.yield();
}
while( true ) {
  1::second => now;
  me.yield();
}

The two main things about ChucK you need to decipher it are that assignment is syntactically backwards, "rhs => type lhs" rather than the traditional C "type lhs = rhs"; where the "@=>" operator is just assignment by reference.  The other main thing is the special variable "now".  Usually "now" is a read-only value.  But in ChucK, you setup a graph of oscillators and advance time forward explicitly (ie: move forward by 130ms, or move time forward until an event comes in).  So, in this server, I just made an array of oscillators such that incoming messages will use one per finger.  When the messages come in, they select the voice and set volume, frequency, and timbre.  It really is that simple.  Here is a client that generates a random orchestra that sounds like microtonal violinists going kind of crazy (almost all of the code is orthogonal to the task of simply understanding what it does; as the checks against random variables just create reasonable distributions for jumping around by fifths, octaves, and along scales):

//chuckcli.ck
//run like:  chuck chuckcli.ck
"/rjf,ifff" => string oscAddress;
1973 => int oscPort;
10 => int vMax;
"127.0.0.1" => string oscHost;
OscSend xmit;
float freq[vMax];
float vol[vMax];
for( 0 => int vIdx; vIdx < vMax; vIdx++ ) {
  220 => freq[vIdx];
  0.0 => vol[vIdx];
}

[1.0, 9.0/8, 6.0/5, 4.0/3, 3.0/2] @=> float baseNotes[];
float baseShift[vMax];
int noteIndex[vMax];
for( 0 => int vIdx; vIdx < vMax; vIdx++ ) {
  1.0 => baseShift[vIdx];
  0 => noteIndex[vIdx];
}
xmit.setHost(oscHost,oscPort);
while( true )
{
  Std.rand2(0,vMax-1) => int voice;
  //(((Std.rand2(0,255) / 256.0)*1.0-0.5)*0.1*freq[voice] + freq[voice]) => freq[voice];
  ((1.0+((Std.rand2(0,255) / 256.0)*1.0-0.5)*0.0025)*baseShift[voice]) => baseShift[voice];
  //Maybe follow leader
  if(Std.rand2(0,256) < 1) {
    0 => noteIndex[1];
    noteIndex[1] => noteIndex[voice];
    baseShift[1] => baseShift[voice];
  }
  if(Std.rand2(0,256) < 1) {
    0 => noteIndex[0];
    noteIndex[0] => noteIndex[voice];
    baseShift[0] => baseShift[voice];
  }
  //Stay in range
  if(vol[voice] < 0) {
    0 => vol[voice];
  }
  if(vol[voice] > 1) {
    1 => vol[voice];
  }
  if(baseShift[voice] < 1) {
    baseShift[voice] * 2.0 => baseShift[voice];
  }
  if(baseShift[voice] > 32) {
    baseShift[voice] * 0.5 => baseShift[voice];
  }
  //Maybe silent
  if(Std.rand2(0,64) < 1) {
    0 => vol[voice];
  }
  if(Std.rand2(0,3) < 2) {
    0.01 +=> vol[voice];
  }
  if(Std.rand2(0,1) < 1) {
    0.005 -=> vol[voice];
  }
  //Octave jumps
  if(Std.rand2(0,4) < 1) {
    baseShift[voice] * 2.0 => baseShift[voice];
  }
  if(Std.rand2(0,4) < 1) {
    baseShift[voice] * 0.5 => baseShift[voice];
  }
  //Fifth jumps
  if(Std.rand2(0,256) < 1) {
    baseShift[voice] * 3.0/2 => baseShift[voice];
  }
  if(Std.rand2(0,256) < 1) {
    baseShift[voice] * 2.0/3 => baseShift[voice];
  }
  //Walk scale
  if(Std.rand2(0,8) < 1) {
    0 => noteIndex[voice];
  }
  if(Std.rand2(0,16) < 1) {
    (noteIndex[voice] + 1) % 5 => noteIndex[voice];
  }
  if(Std.rand2(0,16) < 1) {
    (noteIndex[voice] - 1 + vMax) % 5 => noteIndex[voice];
  }
  //Make freq
  27.5 * baseShift[voice] * baseNotes[noteIndex[voice]] => freq[voice];
  xmit.startMsg(oscAddress);
  xmit.addInt(voice);
  xmit.addFloat(vol[voice]);
  xmit.addFloat(freq[voice]);
  xmit.addFloat(1.0);
  35::ms +=> now;
  <<< "voice=", voice, ",vol=", vol[voice], ",freq=", freq[voice] >>>;
}

What is important is the xmit code.  So, when I went to implement OSC manually in my windows instrument, I had to work out a few nits in the spec to get things to work.  The main thing is that OSC messages are just as simple as can be imagined (although a bit inefficient compared to MIDI).  The first thing to know is that all elements of OSC messages must be padded to be multiples of 4 bytes.  In combination with Writer stream APIs that don't null terminate for you, you just need to be careful to pad so that there is at least a null terminator with up to 3 extra useless null bytes to pad the data.  So, OSC is like a remote procedure call mechanism where the function name is an URL in ASCII, followed by a function signature in ASCII, followed by the binary data (big-endian 32bit ints and floats, etc).

"/foo"   //4 bytes
\0 \0 0 \0   //this means we need 4 null terminator bytes
",if"          //method signatures start with comma, followed by i for int, f for float (32bit bigendian)
\0              //there were 3 bytes in the string, so 1 null terminator makes it 4 byte boundary
[1234]    //literally, a 4 byte 32-bit big endian int, as the signature stated
[2.3434] //literally, a 4 byte 32-bit big endian float, as signature stated 

There is no other messaging ceremony required.  The set of methods defined is up to the client and server to agree on.

Note that the method signature and null terminators tell the parser exactly how many bytes to expect.  Note also, that the major synths generally use UDP(!!!).  So, you have to write out things as if messages are randomly dropped (they are.  they will be.).  For instance, you will get a stuck note if you only sent volume zero once to turn off voice, or would have leaks if you expected the other end to reliably follow all of the messages.  So, when you design your messages in OSC, you should make heartbeats double as mechanisms to explicitly zero out notes that are assumed to be dead (ie: infrequently send 'note off' to all voices to cover for packet losses).  If you think about it, this means that even though OSC is agnostic about the transport, in practice you will need to at least design the protocol as if UDP is the transport.

OSC Ambiguity

So, the protocol defines little more than a verbose RPC syntax, where the names look like little file paths (where parents are the scope and bottom most file in the directory is the method name to invoke).  You can make a dead simple instrument that only sends tuples to manipulate the basic voice spline changes (voiceid, volume, frequency, timbre).  It will work, and will be a million times easier than trying to do this in MIDI.  Everything, including units is up to you (literal hz frequencies? floating point 'midi numbers' which are log frequencies?, etc.).  That's where the madness comes in.

If you use OSC, you must be prepared to ship with a recommendation for a freeware synth (ie: ChucK or SuperCollider), instructions on exactly how to setup and run them, and an actual protocol specification for your synth (because some synth you don't have doesn't know your custom protocol).  This is the biggest stumbling block to shipping with OSC over MIDI.  But I have finally had enough of trying to make MIDI work for these kinds of instruments.  So, here is an OSC version.  It really is a custom protocol.  The OSC "specification" really just defines a syntax (like Json, etc).  Implementing it means nothing for compatibility, as it's just one layer of compatibility.  But if you plan on using SuperCollider or ChucK for the synth, it's a pretty good choice.  You can scale OSC implementations down to implement exactly and only what you need.

Monday, November 26, 2012

Thought Experiment: Forget MIDI and OSC

 

Thought Experiment: Forget MIDI and OSC

MIDI

Trying to wrangle MIDI into handling polyphonic bending is so difficult that almost no synths get it right.  If you disagree with this statement, then you are surely spending all of your time on discrete-key instruments; a small subset of the kinds of instruments you can make on a touch screen.  If you are using MIDI for something that is polyphonic and fretless, then you will notice that very few synths can do this correctly.  Most of the capability is there if the synthesizers were to simply behave like multi-timbral synths, even when there is one voice assigned to all channels; at that point, it's an issue that setting up bend width is problematic from the point of view of ubiquity.  But MIDI's biggest problem is that it thoroughly vexes end users when there is a requirement to span across all MIDI channels to make a single instrument.  I do nothing but deal with support email for the 90% of synths that don't handle much of anything outside of piano controllers correctly; even if I document in detail what is supposed to work and what you should not try to make work (ie: Arctic vs Animoog, etc.).  MIDI's biggest problem is that the first problem that any music protocol should solve is to simply express pitch and volume correctly.  Maddeningly, MIDI just can't do this because it's very much note and discrete-key oriented.  MIDI is strangling progress on touch screens just as much as it helps progress.  Music notes do not belong in protocols, as they are a premature rounding off of the data.  We must deal with frequencies directly, or everything turns into a mess when handling real-world scenarios from touchscreens.

http://rrr00bb.blogspot.com/2012/04/ideal-midi-compatible-protocol.html

Read that link if you want to know what an unholy mess MIDI can be when you try to do something as simple as get correct pitch handling; where the situation is untenable when going to microtonal and fretless MIDI.

OSC

OSC on the other hand could readily solve all of my technical issues because it can easily represent what I am trying to do.  The problem is that almost nothing implements it.  And where it is implemented, only the syntax is standardized.  It's very much like the situation where you open a *.xml file, and haven't got the faintest clue as to what its semantics are, or what program is supposed to consume and create this format.  Even worse, most "OSC" implementations transport over MIDI.  This is like tunneling floating point numbers over integers; doing things conceptually backwards.  It's a lot of useless indirection that simply guarantees that nobody implements anything correctly.

The Simplest Possible Protocol That Can Possibly Work


So, what happens if I just forget about all standard protocols, and design the simplest thing that could possibly work?  I am convinced that it would be easier to get that standardized than it would be to subset the complex protocols we already have.  Observe the following:
  • The music instrument industry currently has almost nothing to do with the new music instruments now.  The music instrument industry is mostly just chugging along in the same direction it has been going in, using tablet devices for little more than weak signal processing brains, or re-configurable knob/slider surfaces.  Everything they announce is just another piano-controller with a bunch of knobs/sliders, and a brain.  It isn't say... guitars that can do what all synths and all guitars can do (correctly!  MIDI can't do the basic pitch handling.)  It isn't say... violins... either.  It isn't microphones that can do the opposite of auto-tune and take a human voice and play instrument patches at the exact frequencies sung into the mic (even if none of the notes are close to the standard 12 notes).  MIDI is the root cause, because the protocol forces you into a discrete-note oriented mindset.  It's a mistake to think that the music instrument industry is really relevant here; we need to do what is best for tablet apps first.
  • Almost everybody using tablets is reluctant to deal with MIDI cables or Wireless connections anyhow.  The reasons vary from latency concerns, to setup complexity, to a kludgy CameraConnectionKit way of getting connected.  We are standardizing on MIDI only because that was an easily available low-latency pipe.  It's weird that you need to use MIDI protocol just to use the pipe.
  • Since the tablet is both the controller and the synthesizer, there is no reason to let obese hardware oriented specifications slow us down.  Presuming that you needed to fix something, you would get a result an order of magnitude faster if you simply get things working and publish the protocol and wait for the hardware vendors to follow the lead of a popular app that implements it, than to get the MIDI or OSC groups to make a necessary change for you.
So the main thing I need (what kills me about MIDI) is stupidly simple.  I just need to control independent voice splines, with continuous updates.  There is no need for a complex protocol to do this.  I write my own so that it's easy enough that I can describe it to any developer.  So 90% of it looks like this:

//When controller sends messages to the synth, the most basic method just
//sets voice properties.
//
//64 bits, all voice properties are set separately
struct SetProperty {
  int16 voice;  //there are no notes, just voices with multiple properties
  int16 property; //common stuff: phase,amplitude,pitch,timbre[i],etc.
  float32 val;
}

//Define standard mappings that everything is going to understand.
#define PHASE_property 0 //val==0 means sample begin, 0..1 is the period corresponding to freq
#define LOGFREQ_property 1 //val==33.5 would be pitch of 'midi' note's pitch 33.5
#define AMPLITUDE_property 3 //val==1.0 would be full amplitude for wave
#define TIMBRE_0_property 16 //first timbre control - assume it was mapped
#define TIMBRE_1_property 17 //next timbre control - assume it was mapped

This would handle everything that Geo Synthesizer and Cantor can do.  This is enough to handle polyphonic instruments that may be samplers or ADSR synths, because we explicitly set the phase.  This is because when a finger goes down, it maps to a voice (NOT a note!).  That voice will have its phase set to zero (SetProperty message 1), then its frequency set (next message), its timbres (next messages), then amplitudes (next message) set.  Then as the note is held, the pitch can change as the finger moves around or when amplitude must be changed.  Just send new SetProperty values to do aftertouch affects.  This is totally unlike MIDI, which treats aftertouch as special cases.

Note that timer stuff is not in the protocol.  That's because we presume to send the message at the time we want it interpreted.  Having timestamps in the protocol only adds latency or helps when things are in big batches (a different and more complex protocol that we should stay away from for now).

Negotiation

Building in a simple negotiation from the beginning helps to ensure that synth and controller are never sending unintelligible data to each other.  MIDI handles this situation very badly, where you end up having to enumerate synths in an ever growing list (assumption of central control).  As an example, presume that the controller and synth simply exchange lists of properties that they send and understand.  We re-use SetProperty, but use some well known values to note that we are simply exchanging agreement:

#define IUNDERSTAND_voice -1
#define ISEND_voice -2
//ex:  controller sends 64 bit SetProperty messages:
//  (-2,0,0.0),(-2,1,0.0),(-2,3,0.0),(-2,16,0.0),(-2,17,0.0)
//       which means "I send: phase,logfreq,amplitude,timbre0,timbre1"
//if we don't send any "I understand" messages, then the controller knows that this is
//a one-way conversation, and will not try to send messages to the controller.
//
//if we get "I understand" messages from the synth, then we must not send messages
//not in its list.
//

The whole point of this process is to simply recognize that rather than announcing a vendor or product id (which the other side may have never heard of), we announce what we expect to handle gracefully instead. 

Proxying Controls

The other thing besides controlling voices that we would need to do in a instrument controller is to have some mechanism to proxy knobs/sliders, etc of the synths in the background.  This is really important on iOS because we can't have the kind of setup mess that a typical hardware synthesizer user would deal with.  Because we have a negotiation mechanism, we can safely include any baroque sub-protocols that we need.  Presume that we have a message to say to start a blob of bytes.  We can use this to send a string.  The synth would like to name the controls (rather than moving them around...the controller uses fixed locations, and the synth may want to rename them):

#define BLOB_voice -3 // (-3,24,0.0) means to expect 24 bytes out of the stream (strings)
#define RENAME_voice -4 //(-4, 16, 0.0) means to expect a blob and use it to rename timbre_0

ex: Rename timbre_0 to 'Filter' (6 byte name):  (-4,16,0.0) (-3,6,0.0) 'Filter'

A synthesizer may want to expose the options to the controller.  So it would need to send a list of mappings to properties to strings, and a mechanism to remap standard properties to synth-specific ones.   Say that the synth set messages like:

#define IHAVE_voice -5 //(-5, 100,0.0) (-3,24,0.0) 'Filter'
#define REMAPFROM_voice -6 //(-6,16,0.0)(-7,100,0.0)
#define REMAPTO_voice -7

ex Remap standard timbre_0 to synth's 'Filter' at 100:
  (-6,16,0.0)(-7,100,0.0)

Where we knew to use 100, because earlier we got a IHAVE that gave us the name of the control that we showed to the user.  Once remapped, this;

  (3,16,0.5)  //voice three's timbre_0 set to 0.5

Has the same effect on the synth as:

  (3,100,0.5) //which is the same as 'Filter' set to 0.5