Monday, November 26, 2012

Thought Experiment: Forget MIDI and OSC



Trying to wrangle MIDI into handling polyphonic bending is so difficult that almost no synths get it right.  If you disagree with this statement, then you are surely spending all of your time on discrete-key instruments, which are a small subset of the kinds of instruments you can make on a touch screen.  If you are using MIDI for something that is polyphonic and fretless, then you will notice that very few synths can do this correctly.  Most of the capability would be there if synthesizers simply behaved like multi-timbral synths, even when one voice is assigned to all channels; at that point, the remaining issue is that bend width cannot be set up in a way that works everywhere.

From the user's side, MIDI's biggest problem is that it thoroughly vexes end users when a single instrument is required to span all MIDI channels.  I do nothing but deal with support email for the 90% of synths that don't handle much of anything outside of piano controllers correctly, even though I document in detail what is supposed to work and what you should not try to make work (e.g., Arctic vs Animoog, etc.).

From the protocol's side, MIDI's biggest problem is that the first thing any music protocol should solve is to simply express pitch and volume correctly.  Maddeningly, MIDI just can't do this because it's very much note and discrete-key oriented.  MIDI is strangling progress on touch screens just as much as it helps progress.  Music notes do not belong in protocols, as they are a premature rounding off of the data.  We must deal with frequencies directly, or everything turns into a mess when handling real-world scenarios from touchscreens.


Read that link if you want to know what an unholy mess MIDI can be when you try to do something as simple as get correct pitch handling; where the situation is untenable when going to microtonal and fretless MIDI.


OSC on the other hand could readily solve all of my technical issues because it can easily represent what I am trying to do.  The problem is that almost nothing implements it.  And where it is implemented, only the syntax is standardized.  It's very much like the situation where you open a *.xml file, and haven't got the faintest clue as to what its semantics are, or what program is supposed to consume and create this format.  Even worse, most "OSC" implementations transport over MIDI.  This is like tunneling floating point numbers over integers; doing things conceptually backwards.  It's a lot of useless indirection that simply guarantees that nobody implements anything correctly.

The Simplest Possible Protocol That Can Possibly Work

So, what happens if I just forget about all standard protocols, and design the simplest thing that could possibly work?  I am convinced that it would be easier to get that standardized than it would be to subset the complex protocols we already have.  Observe the following:
  • The music instrument industry currently has almost nothing to do with the new music instruments now.  The music instrument industry is mostly just chugging along in the same direction it has been going in, using tablet devices for little more than weak signal processing brains, or re-configurable knob/slider surfaces.  Everything they announce is just another piano-controller with a bunch of knobs/sliders, and a brain.  It isn't say... guitars that can do what all synths and all guitars can do (correctly!  MIDI can't do the basic pitch handling.)  It isn't say... violins... either.  It isn't microphones that can do the opposite of auto-tune and take a human voice and play instrument patches at the exact frequencies sung into the mic (even if none of the notes are close to the standard 12 notes).  MIDI is the root cause, because the protocol forces you into a discrete-note oriented mindset.  It's a mistake to think that the music instrument industry is really relevant here; we need to do what is best for tablet apps first.
  • Almost everybody using tablets is reluctant to deal with MIDI cables or wireless connections anyhow.  The reasons vary from latency concerns, to setup complexity, to a kludgy CameraConnectionKit way of getting connected.  We are standardizing on MIDI only because it was an easily available low-latency pipe.  It's weird that you need to use the MIDI protocol just to use the pipe.
  • Since the tablet is both the controller and the synthesizer, there is no reason to let obese hardware oriented specifications slow us down.  Presuming that you needed to fix something, you would get a result an order of magnitude faster if you simply get things working and publish the protocol and wait for the hardware vendors to follow the lead of a popular app that implements it, than to get the MIDI or OSC groups to make a necessary change for you.
So the main thing I need (what kills me about MIDI) is stupidly simple.  I just need to control independent voice splines, with continuous updates.  There is no need for a complex protocol to do this.  I write my own so that it's easy enough that I can describe it to any developer.  So 90% of it looks like this:

//When controller sends messages to the synth, the most basic method just
//sets voice properties.
//64 bits, all voice properties are set separately
struct SetProperty {
  int16 voice;  //there are no notes, just voices with multiple properties
  int16 property; //common stuff: phase,amplitude,pitch,timbre[i],etc.
  float32 val;
};

//Define standard mappings that everything is going to understand.
#define PHASE_property 0 //val==0 means sample begin, 0..1 is the period corresponding to freq
#define LOGFREQ_property 1 //val==33.5 would be pitch of 'midi' note's pitch 33.5
#define AMPLITUDE_property 3 //val==1.0 would be full amplitude for wave
#define TIMBRE_0_property 16 //first timbre control - assume it was mapped
#define TIMBRE_1_property 17 //next timbre control - assume it was mapped

This would handle everything that Geo Synthesizer and Cantor can do.  This is enough to handle polyphonic instruments that may be samplers or ADSR synths, because we explicitly set the phase.  When a finger goes down, it maps to a voice (NOT a note!).  That voice will have its phase set to zero (SetProperty message 1), then its frequency set (next message), then its timbres (next messages), then its amplitude (next message).  Then as the note is held, the pitch can change as the finger moves around, and the amplitude can change as needed.  Just send new SetProperty values to do aftertouch effects.  This is totally unlike MIDI, which treats aftertouch as a special case.
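The finger-down sequence described above can be sketched in C roughly as follows.  This is only an illustration under stated assumptions: `finger_down` and `bend` are hypothetical helper names that are not part of the protocol, the struct uses `<stdint.h>` types in place of `int16`/`float32`, and only one timbre control is sent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Property ids copied from the definitions above. */
#define PHASE_property     0
#define LOGFREQ_property   1
#define AMPLITUDE_property 3
#define TIMBRE_0_property  16

/* One message: two 16-bit ints and a 32-bit float (64 bits on typical platforms). */
typedef struct {
    int16_t voice;
    int16_t property;
    float   val;
} SetProperty;

/* Hypothetical helper: fills `out` with the messages for a finger going down
 * on `voice`, in the order the text gives: phase, pitch, timbre, amplitude.
 * Returns the number of messages written (always 4 in this sketch). */
size_t finger_down(SetProperty out[4], int16_t voice,
                   float logfreq, float timbre0, float amplitude)
{
    assert(out != NULL);
    out[0] = (SetProperty){ voice, PHASE_property,     0.0f };
    out[1] = (SetProperty){ voice, LOGFREQ_property,   logfreq };
    out[2] = (SetProperty){ voice, TIMBRE_0_property,  timbre0 };
    out[3] = (SetProperty){ voice, AMPLITUDE_property, amplitude };
    return 4;
}

/* Hypothetical helper: a finger sliding while held is just one more
 * SetProperty on the same voice; no special aftertouch message exists. */
SetProperty bend(int16_t voice, float logfreq)
{
    return (SetProperty){ voice, LOGFREQ_property, logfreq };
}
```

For example, `finger_down(msgs, 3, 33.5f, 0.25f, 1.0f)` followed later by `bend(3, 33.75f)` starts voice 3 at pitch 33.5 and then slides it up a quarter of a semitone, with no note numbers involved.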

Note that timer stuff is not in the protocol.  That's because we presume to send the message at the time we want it interpreted.  Having timestamps in the protocol only adds latency or helps when things are in big batches (a different and more complex protocol that we should stay away from for now).


Building in a simple negotiation from the beginning helps to ensure that synth and controller are never sending unintelligible data to each other.  MIDI handles this situation very badly, where you end up having to enumerate synths in an ever growing list (assumption of central control).  As an example, presume that the controller and synth simply exchange lists of properties that they send and understand.  We re-use SetProperty, but use some well known values to note that we are simply exchanging agreement:

#define IUNDERSTAND_voice -1
#define ISEND_voice -2
//ex:  controller sends 64 bit SetProperty messages:
//  (-2,0,0.0),(-2,1,0.0),(-2,3,0.0),(-2,16,0.0),(-2,17,0.0)
//       which means "I send: phase,logfreq,amplitude,timbre0,timbre1"
//if the synth doesn't send any "I understand" messages, then the controller knows that this is
//a one-way conversation, and that the synth will not try to send messages to the controller.
//if we get "I understand" messages from the synth, then we must not send messages
//not in its list.

The whole point of this process is to simply recognize that rather than announcing a vendor or product id (which the other side may have never heard of), we announce what we expect to handle gracefully instead. 

Proxying Controls

The other thing besides controlling voices that we would need to do in an instrument controller is to have some mechanism to proxy the knobs, sliders, etc. of the synths in the background.  This is really important on iOS because we can't have the kind of setup mess that a typical hardware synthesizer user would deal with.  Because we have a negotiation mechanism, we can safely include any baroque sub-protocols that we need.  Presume that we have a message that says to start a blob of bytes.  We can use this to send a string.  The synth would like to name the controls (rather than moving them around...the controller uses fixed locations, and the synth may want to rename them):

#define BLOB_voice -3 // (-3,24,0.0) means to expect 24 bytes out of the stream (strings)
#define RENAME_voice -4 //(-4, 16, 0.0) means to expect a blob and use it to rename timbre_0

ex: Rename timbre_0 to 'Filter' (6 byte name):  (-4,16,0.0) (-3,6,0.0) 'Filter'
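
To make the byte stream for that rename concrete, here is a hedged sketch of an encoder.  `encode_rename` is a hypothetical helper, and it copies each message in host byte order with `memcpy`; the protocol as described does not fix a byte order, so that choice is purely an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOB_voice   (-3)
#define RENAME_voice (-4)

typedef struct { int16_t voice; int16_t property; float val; } SetProperty;

/* Hypothetical helper: writes a RENAME message, a BLOB message, then the raw
 * name bytes into `buf`.  Returns the number of bytes written, or 0 if the
 * name is too long for an int16 length or the buffer is too small.
 * Assumption: host byte order; the protocol text does not specify one. */
size_t encode_rename(uint8_t *buf, size_t cap, int16_t property, const char *name)
{
    assert(buf != NULL && name != NULL);
    size_t len   = strlen(name);
    size_t total = 2 * sizeof(SetProperty) + len;
    if (len > INT16_MAX || total > cap) return 0;

    SetProperty rename = { RENAME_voice, property, 0.0f };
    SetProperty blob   = { BLOB_voice, (int16_t)len, 0.0f };
    memcpy(buf, &rename, sizeof rename);
    memcpy(buf + sizeof rename, &blob, sizeof blob);
    /* The name is sent without a terminating NUL; BLOB carries the length. */
    memcpy(buf + 2 * sizeof(SetProperty), name, len);
    return total;
}
```

Calling `encode_rename(buf, sizeof buf, 16, "Filter")` produces the two messages from the example followed by the six name bytes.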

A synthesizer may want to expose its options to the controller.  So it would need to send a list of mappings from properties to strings, and there would need to be a mechanism to remap standard properties to synth-specific ones.  Say that the synth sent messages like:

#define IHAVE_voice -5 //(-5, 100,0.0) (-3,6,0.0) 'Filter'
#define REMAPFROM_voice -6 //(-6,16,0.0)(-7,100,0.0)
#define REMAPTO_voice -7

ex: Remap standard timbre_0 to synth's 'Filter' at 100:  (-6,16,0.0) (-7,100,0.0)

Where we knew to use 100, because earlier we got an IHAVE that gave us the name of the control that we showed to the user.  Once remapped, this:

  (3,16,0.5)  //voice three's timbre_0 set to 0.5

Has the same effect on the synth as:

  (3,100,0.5) //which is the same as 'Filter' set to 0.5