
Monday, April 23, 2012

An Ideal MIDI-Compatible Protocol

A Torture Test For Correct Pitch Handling

The MIDI protocol is designed as a keyboard-extender. The actual pitches emitted aren't strictly specified, because it is supposed to faithfully capture the gestures coming out of a hardware keyboard. Under this model, instruments are a set of discrete keys with an initial velocity, and one pitch wheel per instrument (in practice, per channel). This means that any instrument that wants to send MIDI must fundamentally *be* a set of discrete keys with one pitch wheel per instrument.

But any instrument that fits that description can't do correct pitch handling, by which I mean correctly tracking simultaneous timelines of (pitch, volume, phase). In other words, you can't model a fretless instrument with it. We can, however, combine multiple channels to get the best approximation that is still backwards compatible with MIDI. Fair warning: the client generating fretless MIDI is VERY complicated at the level of MIDI bytes, so it should be done through a common API that is used by many people. Fortunately, it doesn't make the synth (the server) any more complicated. In fact, it simplifies a lot of things in the server, because explicitly enumerated modes and tuning tables can simply be removed.

If you think of the microtonal cases as something bizarre that won't be used in practice, then ignore those extreme cases. But realize that you can't properly model any string instrument unless you can model fretlessness correctly, if only because guitars can and do emit simultaneous pitches that are not exactly a whole number of semitones apart when bending notes and chording at the same time.

Touch Screens And Fretlessness

Background (pure MIDI engine vs ThumbJam with MIDI chorusing):

When using touch screens, fretlessness and per-finger bending are readily available to us. Everything is continuous in reality, and the notion of discrete keys being turned on and off is a major annoyance. In this space, we think more of a finger that goes down and can move to any pitch, volume, or expression value over its lifetime. Especially when modeling any kind of string instrument, this sort of polyphonic fretlessness is a basic requirement for faithfully reproducing realistic behavior. In any case, touchscreen instruments can trivially create this situation, and MIDI is quite terrible at modelling it.

  1. Every finger down can have its own completely arbitrary pitch
  2. Any pitch can bend to any other pitch over its lifetime, including from MIDI 0 to MIDI 127
  3. All pitch bending, no matter how wide, happens at the fullest pitch resolution
  4. A note can attack at any pitch, and therefore needs its own pitch wheel
  5. A note can release at any pitch, and therefore needs its pitch wheel to not be disturbed until release completes
  6. Due to 5, a note turned off is still reserved for some time after the note has been turned off.
  7. Due to 1, every note is in its own channel, of which there are only 16.
  8. Due to 7, we are limited to 16 polyphony, including releasing notes that are actually turned off already.
  9. Because of 4, 5 and 8, there is a speed limit beyond which we can't play without pitch problems. Past it, we simply have to steal the channel of the note that has been dead the longest, and hope that the releasing tail of that note is inaudible as we alter its pitch wheel for the new note.
  10. Because of 2, we can bend beyond the maximum pitch wheel depth. If we do so, we have no choice but to stop the note (at its current arbitrary pitch, so that it releases at this pitch) and restart it on a new channel, where we can place a note tie so that synths understand these are not two separate notes; the note is being *migrated* to a new note number and a new channel.
  11. The effect of a note tie is to move the state of the note to a new channel and note, skipping the attack phase of the note. This means that the note tie can be used to legato notes together. A straight up note-tie in support of a wide bend moves a note from one note/channel to another with 0 cents difference in pitch, where legatoing two notes does the same thing with some large number of cents different in the transition.
  12. Because of 3, we can't simply support absurd pitch bend sizes like +/- 48 semitones. We need bending to be at full resolution.
* My Item 12 is debatable. Conversations with the maintainer of the MIDI spec suggest that very large bends are not a problem in themselves, though many existing synths have a maximum of 12 or 24 semitones if they allow the bend width to change at all. As an example, using MIDI note 64 as a starting point with +/- 60 semitones gives a span of 120 MIDI notes while still meeting the 1 cent minimum resolution that microtonalists talk about (8192 steps up and down gives about 0.73 cents per step). You can also use a non-octave width of +/-64 semitones to ensure that you can hit MIDI notes 0 and 127 as well. So the idea of simply ensuring that very large bends are supported is a good one. There is still the issue of sending 1 MIDI stream to multiple devices and having to live with the minimum supported bend, or possibly setting bends to 1 semitone (half the default) so that pianos simply render chromatic passages. Note ties do more than handle the exceptional case where the bend width is exceeded, and you may want them for other reasons; but the complexity may not be worth it if you have very wide bends.
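The arithmetic behind Item 12 and the note above is small enough to sketch. The helpers below are hypothetical (they are not part of the Fretless API); they assume the 14-bit wheel has 8192 steps below center and 8191 above, and they answer two questions: how many cents one wheel step covers for a given bend width, and whether a note started on an integer MIDI note can still reach a target pitch without a note tie.

```c
#include <stdbool.h>

/* Cents covered by one pitch-wheel step for a bend width of +/-semis.
   The wheel has 8192 steps between center and either extreme. */
double centsPerBendStep(double semis)
{
    return semis * 100.0 / 8192.0;
}

/* True if a note started at integer MIDI note `note` can reach the
   fractional pitch `target` by bending alone. Valid offsets from center
   are -8192 .. +8191, so a bend of exactly +semis is just out of reach. */
bool bendFits(int note, double target, double semis)
{
    double steps = (target - note) * 8192.0 / semis;
    return steps >= -8192.0 && steps <= 8191.0;
}
```

With the default +/-2 semitones each step is about 0.024 cents, so resolution is not the issue; range is. At +/-60 a step is about 0.73 cents, and at +/-64 around note 64 every pitch from note 0 up to just below 128 fits on one channel.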

Pitches To MIDI And Backwards Compatibility
Warning: take the source code here as what I really mean. It has been pretty well vetted to work in practice, and should have fewer mistakes than this document, which is being written off the top of my head:


As long as there is no definite pitch assigned to MIDI notes, the setup will be too involved for this kind of instrument. Ultimately, the notion of setting bend width and how many channels to span over has to go away. Under iOS, every instrument can supply its own virtual MIDI device anyway (with 16 channels to itself).
  • c0 is the lowest note in MIDI, with note value zero.
  • MIDI Note n is defined as : mNote(n:float):float = c0 * 2^(n/12)
  • If MIDI note 69 (the A above middle C) is 440hz, then: 440hz = c0 * 2^(69/12), which puts c0 at about 8.18hz
  • Integer values of n cause mNote(n) to return the pitches of chromatic notes
  • bend 8192 is the center value representing zero cents adjustment
  • semis (channelBendSemis in the code below) is the bend width in semitones up or down. It is 2 by default.
#define BENDCENTER 8192 //bend value for zero cents adjustment

static void Fretless_fnoteToNoteBendPair(
    struct Fretless_context* ctxp,
    float fnote,
    int* notep,
    int* bendp)
{
    //Find the closest 12ET note
    *notep = (int)(fnote+0.5);
    //Remaining distance from that note in semitones (-0.5 to 0.5)
    float floatBend = (fnote - *notep);
    //Scale so that +/-channelBendSemis maps onto the full bend range
    *bendp = (int)(BENDCENTER + floatBend*BENDCENTER/ctxp->channelBendSemis);
}

This example code converts a floating point "MIDI" note into an integer note value and an integer bend value.
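To make the conversion easy to try in isolation, here is a self-contained copy of the same arithmetic. `struct DemoContext` and `demoNoteToNoteBendPair` are hypothetical stand-ins for `struct Fretless_context` and `Fretless_fnoteToNoteBendPair`; only the one field the function reads is modeled.

```c
#define BENDCENTER 8192

/* Hypothetical stand-in for the real context: only the bend width. */
struct DemoContext {
    float channelBendSemis;
};

/* Same rounding and scaling as Fretless_fnoteToNoteBendPair. */
static void demoNoteToNoteBendPair(const struct DemoContext* ctxp,
                                   float fnote, int* notep, int* bendp)
{
    *notep = (int)(fnote + 0.5f);             /* nearest 12ET note */
    float floatBend = fnote - (float)*notep;  /* -0.5 .. +0.5 semitones */
    *bendp = (int)(BENDCENTER + floatBend * BENDCENTER / ctxp->channelBendSemis);
}
```

With the default width of 2 semitones, the quartertone 39.5 becomes note 40 bent down to 6144, and 38.25 becomes note 38 bent up to 9216. Because the starting bend is never more than half a semitone from center, a freshly attacked note always leaves most of the wheel free for the finger to move.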

MultiTimbral synths that can set bend width

We include channelBendSemis as an input to gracefully handle synths that don't understand note ties. If note ties are not understood, then at least we can set the bend width high to minimize breaks in the note bending (because it's harder to exceed the bend width). To get this level of backwards compatibility, it is generally sufficient to set as many channels as the maximum required polyphony (ie: channels 1,2,3,4) to the exact same patch, and set the bend width to +/-12 semitones.
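Setting the bend width is itself a MIDI message. The sketch below writes the standard RPN 0 (Pitch Bend Sensitivity) sequence for one channel: select RPN 0 with CC 101 and CC 100, send semitones with Data Entry MSB (CC 6) and cents with Data Entry LSB (CC 38), then deselect with the null RPN (127/127). The function name `writeBendWidth` is hypothetical; whether a given synth honors RPN 0 varies.

```c
#include <stddef.h>

/* Write RPN 0 (Pitch Bend Sensitivity) for one channel (0..15) into buf.
   Each control change carries its own status byte (no running status).
   Returns the number of bytes written (always 18). */
size_t writeBendWidth(unsigned char* buf, int channel, int semis, int cents)
{
    unsigned char status = (unsigned char)(0xB0 | (channel & 0x0F));
    const unsigned char msgs[6][2] = {
        {0x65, 0x00},                           /* RPN MSB = 0 */
        {0x64, 0x00},                           /* RPN LSB = 0: bend sensitivity */
        {0x06, (unsigned char)(semis & 0x7F)},  /* Data Entry MSB: semitones */
        {0x26, (unsigned char)(cents & 0x7F)},  /* Data Entry LSB: cents */
        {0x65, 0x7F},                           /* null RPN, so later data */
        {0x64, 0x7F},                           /* entry is ignored */
    };
    size_t n = 0;
    for (int i = 0; i < 6; i++) {
        buf[n++] = status;
        buf[n++] = msgs[i][0];
        buf[n++] = msgs[i][1];
    }
    return n;
}
```

For the four-channel setup described above, call it for channels 0 through 3 (MIDI channels 1 to 4) with `semis = 12` and `cents = 0`.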

Mixing fretless synths with pianos

The other issue is that of playing a fretless voice with a piano in the background. Either two completely different MIDI streams need to be sent to the two devices (ie: fretless MIDI for the violin, and for the piano a chromatic rendition that inserts every chromatic note as the notes bend), or the bend width should be set to +/-1 semitone and only the violin should respect bends and note ties (and channeling for that matter).

The Biggest Weakness Of MIDI

Converting actual pitches to MIDI

When dealing with a device that takes a polyphonic signal (or a set of mono signals) and converts it to MIDI, there is a distinct possibility that the original audio will be mixed in with the MIDI. In my opinion, this is the biggest weakness of MIDI not taking on a frequency orientation. The output of the speakers will resonate with a guitar body that's generating MIDI and feed back in... think about the consequences of that! If a guitarist tunes a quartertone flat to match some un-tuneable instrument in the ensemble and runs through a device that generates MIDI matching the guitar pitches, then there is no reason for any of the MIDI pitches to come out of tune with the real audio under any circumstances. There is no good reason to make him do some kind of setup to make that happen either.

In the same way, if an a cappella singer is some number of cents off from standard tuning (perhaps due to a resonance in the environment, or singing in Just Intonation), then the MIDI should LEAVE IT ALONE! and render bytes that will create those exact pitches on any standard MIDI device, because the original pitch isn't wrong. Auto-tuning isn't up to the protocol, and it isn't up to the synth. You can write a controller that rounds off pitches or sends a re-tuned variant of the original audio if that's what you want.

And it can still be backwards compatible with existing MIDI if it can cycle over channels and tie notes together.

How Note Ties Work

Note ties are something that I added to Geo Synth and AlephOne in an attempt to fix this problem. Because we need to be backwards compatible without creating wrong pitches or stuck notes, we have to send valid MIDI for everything that synths already know, and allow synths that don't understand note ties to ignore them without major problems in the sound.

#Turn on a note with a 0% bend and turn it off
0xE1 0x00 0x40 #bend to center position (bend data is sent LSB first, then MSB)
0x91 0x21 0x7F #turn note on
0xE1 0x7F 0x7F #bend +100%
0x91 0x21 0x00 #turn note off

If we want to continue the bend, we need a new note. Because any note needs to be able to attack and release at any pitch, then it must be on its own channel when it does so, so there is a channel transition as would be in any other case:

#Turn on a note with a 0% bend and turn it off
0xE1 0x00 0x40 #bend to center position
0x91 0x21 0x7F #turn note on
0xE1 0x7F 0x7F #bend +100%
0x91 0x21 0x00 #turn note off
0xE2 0x00 0x40 #same pitch as the note releasing on channel 1 (notice the different bend)
0x92 0x23 0x7F #continue note at new channel

So, any multi-timbral synth will render this bend in some way. If the patch has no audible attack phase, it seems that we are done; it already sounds correct. But if there is an audibly different attack phase, then we hear the note break before it transitions to the new note (which is what usually happens). So we need to put an NRPN into the stream to warn the MIDI device to expect a note off that is actually a legato to the next note, even on a different channel! It basically just sends the number 1223 as an NRPN to warn the MIDI engine about what is coming. This is the code:

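void Fretless_noteTie(
    struct Fretless_context* ctxp,
    struct Fretless_fingerState* fsPtr)
{
    //Split the NRPN number 1223 into two 7-bit halves (msb = 9, lsb = 71)
    int lsb = 1223 & 0x7F;
    int msb = (1223 >> 7) & 0x7F;
    int channel = fsPtr->channel;
    int note = fsPtr->note;
    //Coarse parm: NRPN MSB (CC 99)
    ctxp->midiPutch(0xB0 + channel);
    ctxp->midiPutch(0x63);
    ctxp->midiPutch(msb);
    //Fine parm: NRPN LSB (CC 98)
    ctxp->midiPutch(0xB0 + channel);
    ctxp->midiPutch(0x62);
    ctxp->midiPutch(lsb);
    //Val parm: Data Entry MSB (CC 6) carries the note being tied
    ctxp->midiPutch(0xB0 + channel);
    ctxp->midiPutch(0x06);
    ctxp->midiPutch(note);
    //Reset the NRPN selection to null (127/127).
    //I am told that the reset is bad for some synths, so it may be left out.
    ctxp->midiPutch(0xB0 + channel);
    ctxp->midiPutch(0x63);
    ctxp->midiPutch(0x7F);
    ctxp->midiPutch(0xB0 + channel);
    ctxp->midiPutch(0x62);
    ctxp->midiPutch(0x7F);
}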

When this sequence is seen, the sound engine simply *remembers* which note is being turned off without actually turning it off. Then, when the next note-on arrives, it transfers the current phase, pitch, and volume to the new note, and the pitch legatos over to its new pitch and volume values as fast as it can.

#Turn on a note with a 0% bend, then tie it to a new note
# we have no idea what the future is once we turn the note on...
0xE1 0x00 0x40 #bend to center position
0x91 0x21 0x7F #turn note on
0xE1 0x7F 0x7F #bend +100% #a surprise event when the finger bent up really high
0xB1 0x63 0x09 #note tie warning: NRPN MSB (1223 >> 7 = 9)
0xB1 0x62 0x47 #note tie warning: NRPN LSB (1223 & 0x7F = 71)
0xB1 0x06 0x21 #data entry: the note being tied (33)
0x91 0x21 0x00 #turn note off
0xE2 0x00 0x40 #same pitch as the note releasing on channel 1 (notice the different bend)
0x92 0x23 0x7F #continue note at new channel
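The tie sequence can also be built as bytes in one place. This is a sketch rather than the Fretless code: `writeNoteTie` and `put3` are hypothetical helpers. Following Fretless_noteTie, the warning goes out on the old channel with the old note as its data value; it leaves out the optional NRPN reset mentioned in the code comment above, and bends are 14-bit values (0..16383, center 8192) sent LSB first.

```c
#include <stddef.h>

/* Append one 3-byte MIDI message at offset n; returns the new length. */
static size_t put3(unsigned char* buf, size_t n, int a, int b, int c)
{
    buf[n] = (unsigned char)a;
    buf[n + 1] = (unsigned char)b;
    buf[n + 2] = (unsigned char)c;
    return n + 3;
}

/* Migrate a sounding note from (oldCh, oldNote) to (newCh, newNote):
   NRPN 1223 warning, note off, bend on the new channel, note on.
   Returns the number of bytes written (always 18). */
size_t writeNoteTie(unsigned char* buf, int oldCh, int oldNote,
                    int newCh, int newNote, int newBend, int vel)
{
    size_t n = 0;
    n = put3(buf, n, 0xB0 | oldCh, 0x63, (1223 >> 7) & 0x7F); /* NRPN MSB */
    n = put3(buf, n, 0xB0 | oldCh, 0x62, 1223 & 0x7F);        /* NRPN LSB */
    n = put3(buf, n, 0xB0 | oldCh, 0x06, oldNote);            /* tied note */
    n = put3(buf, n, 0x90 | oldCh, oldNote, 0);               /* note off */
    n = put3(buf, n, 0xE0 | newCh, newBend & 0x7F, (newBend >> 7) & 0x7F);
    n = put3(buf, n, 0x90 | newCh, newNote, vel);             /* continue */
    return n;
}
```

Calling `writeNoteTie(buf, 1, 0x21, 2, 0x23, 8192, 0x7F)` produces the last nine lines of the listing above. A synth that ignores the NRPN still hears an ordinary note off followed by a correctly bent note on.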

The Lifecycle Of A Finger

I hide the actual channels and notes of MIDI behind an API for many reasons. The most important of them is that a (note,channel) pair is not a good 'primary key' for a finger. If you bend a note past its maximum bend width, then you certainly have to change the note, and also the channel, because of my choice to hop channels even on note-off to protect the release. So, an abstraction over MIDI looks more like this:

beginOn finger0
express finger0 11 127
express finger0 42 127
endOn finger0 pitch vol polyGroup0
move finger0 pitch vol
express finger0 43 127
move finger0 pitch vol
off finger0

On a touchscreen, the only stable identifiers over the course of a gesture are finger numbers. The actual note and channel are a MIDI concept that we hide.

When we intend to turn a note on, we do just enough work to allocate a MIDI channel for it. Once we have a MIDI channel and know the starting values for the various per-channel CC expression parameters, we can send them BEFORE the note is turned on. Then, when we have the pitch and are about to turn the note on, we send the bend value required to produce the right pitch, again BEFORE the note is turned on. The channel pressure's initial value is also sent before the note turns on (in my case it's the same as volume). Then the note is finally turned on. As the note moves around in pitch and volume, it can easily hop across channels many times because of exceeding the max bend width. Finally, the note is turned off by finger. The finger knows which channel and note were used, so this design makes it very hard to make mistakes that create stuck notes.

And of course, a part of that lifecycle not shown here is that a previous finger could already be down, which this note could legato to. In that case, the note-off part of the lifecycle simply transfers note state onto the next finger.
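The "finger is the primary key" idea can be sketched as a small table. Everything here (`struct FingerSlot`, `fingerAssign`, `fingerOff`, `MAX_FINGERS`) is hypothetical: each finger records where its note currently lives, that record is overwritten every time the note hops channels, and the note-off reads the record instead of recomputing anything.

```c
#include <stdbool.h>

#define MAX_FINGERS 16

/* The finger number is stable for the whole gesture; the MIDI
   (channel, note) pair may change many times along the way. */
struct FingerSlot {
    bool active;
    int channel;
    int note;
};

static struct FingerSlot fingers[MAX_FINGERS];

/* Record where a finger's note lives (on note-on, and again after a tie). */
void fingerAssign(int finger, int channel, int note)
{
    fingers[finger].active = true;
    fingers[finger].channel = channel;
    fingers[finger].note = note;
}

/* Turn a finger off using the channel and note it last used, so every
   note-off matches the most recent note-on. Returns false if idle. */
bool fingerOff(int finger, int* channelp, int* notep)
{
    if (!fingers[finger].active) return false;
    *channelp = fingers[finger].channel;
    *notep = fingers[finger].note;
    fingers[finger].active = false;
    return true;
}
```

If finger 0 starts as note 33 on channel 1 and is tied to note 35 on channel 2, the eventual `off finger0` sends its note off to channel 2, note 35, which is exactly what keeps notes from sticking.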

Synth Engine's State Machine

The synth engine is actually really simple under this scenario. The synth doesn't have to know anything about the various polyphony modes or whether to legato a note, or to track much. There is very little state to track.

  • Since we don't have the value for the finger (it's lost as a (note,channel) combination), we keep track of each (note,channel) that is somewhere between turning on and still releasing. If you want to simplify the engine further, you can simply keep track of channel and insist that there is *never* more than one note per channel. It isn't realistic with other controllers feeding input, but will work for your own controller; and it will dramatically simplify the engine.
  • For each finger (tracked as (note,channel)), keep track of the note phase, pitch, and volume. Note that these are *exactly* the main things that we want to have arbitrary control of over the life of a voice.
  • There is one pitch wheel per channel, and CC values are remembered per channel.
  • If we get a warning that a note tie is about to happen, then the next note off determines the source and next note on determines the destination. Once we have source and destination, we transfer all note state from source to destination. This includes aftertouch, CC values, current phase, current volume, and current pitch. There should be no audible change in voice characteristics, as we have basically only renamed the note. (Another approach could be that a voice independent of channel could have been used and the channel is simply re-assigned to that voice.) The new pitch and volume implied in the note on is something that we begin to ramp towards. What is most important of all is that the phase of the note matches as it is moved to the new channel. Other characteristics are kept the same to prevent impulsing.
  • We interpret pitch bending according to our current bend setting.
  • Note on and off all behave as normal.
  • Because polyphony was done in the controller, we don't need any rules for solo versus poly mode. There is a notion of polyphony groups in the C API that generates the MIDI messages, but it shows up as nothing more than notes turning on and off (possibly tied together) at the synthesizer end. We don't try to take advantage of note overlaps; the controller already figured this part out for us.
  • Because legato was done in the controller, we simply play notes with the attack unless it is a note on that's part of a note-tie. Similarly, we don't try to use note overlaps; the controller already told us exactly what to do. We only legato if the note has been tied to another.
  • Using CC per channel works fine, because if each finger is in its own channel then it's also per-voice by coincidence.
The internal engine is very simple. In my implementation it's just one function with a few parameters, and very little state being maintained in the engine. What state there is exists entirely in a per-voice way. (I simplified my own engine by only handling one note per channel, and require polyphony to span channels.). So, the complexity created by fixing these pitch problems is only created in the controller. It actually simplifies the synth.
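A minimal sketch of the engine-side tie handling described in the list above, assuming the one-note-per-channel simplification. The names (`struct Voice`, `struct Engine`, `engineTieWarning`, and so on) are hypothetical, and MIDI parsing, pitch-bend decoding, and the actual pitch/volume ramp are left out; the point is only the state transitions.

```c
#include <stdbool.h>

/* One voice per channel. */
struct Voice {
    bool on;
    int note;
    double phase;         /* oscillator phase, carried across a tie */
    double pitch;         /* current pitch (fractional MIDI note) */
    double volume;        /* current volume */
    double targetPitch;   /* pitch the render loop ramps toward */
    double targetVolume;  /* volume the render loop ramps toward */
};

struct Engine {
    struct Voice voice[16];
    bool tieExpected;  /* set when NRPN 1223 arrives */
    int tieSource;     /* channel whose note-off was deferred, or -1 */
};

void engineInit(struct Engine* e)
{
    *e = (struct Engine){0};
    e->tieSource = -1;
}

/* Called when the NRPN 1223 warning is decoded. */
void engineTieWarning(struct Engine* e)
{
    e->tieExpected = true;
    e->tieSource = -1;
}

void engineNoteOff(struct Engine* e, int ch)
{
    if (e->tieExpected && e->tieSource < 0) {
        e->tieSource = ch;  /* remember the source; keep it sounding */
        return;
    }
    e->voice[ch].on = false;  /* ordinary release */
}

/* pitch is the fractional note implied by note number plus current bend. */
void engineNoteOn(struct Engine* e, int ch, int note, double pitch, double vol)
{
    struct Voice* v = &e->voice[ch];
    if (e->tieExpected && e->tieSource >= 0) {
        struct Voice moved = e->voice[e->tieSource];
        e->voice[e->tieSource].on = false;  /* the source is only renamed */
        *v = moved;                         /* same phase, pitch, volume */
        v->on = true;
        v->note = note;
        v->targetPitch = pitch;             /* ramp toward these, no attack */
        v->targetVolume = vol;
        e->tieExpected = false;
        e->tieSource = -1;
        return;
    }
    /* ordinary note on: fresh phase; the envelope's attack applies */
    v->on = true;
    v->note = note;
    v->phase = 0.0;
    v->pitch = v->targetPitch = pitch;
    v->volume = v->targetVolume = vol;
}
```

Copying through a temporary before clearing the source keeps the transfer correct even if a controller ever ties a note onto its own channel.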

This is my internal engine, which is driven by the MIDI messaging.  I would have greatly preferred to ship a controller with no internal engine at all though.  The whole point of MIDI is co-existence.  I could free up all of the memory, get rid of all of the extra code, not get rated on the quality of the sound engine (which should be the job of a dedicated synth), and never get requests for sound tweaks.  The engine can literally add months of development to a controller that was long finished from the standpoint of MIDI.  The fact that every iOS app has its own internal engine in spite of MIDI support suggests that MIDI is too hard to setup and the results of connecting a controller to a synth are too unpredictable to leave it up to whatever combination the user picked.

* Note to self * - Maybe we can allow 'legato' without any tied note to let us start off in the sustain phase of a note, ramping up from zero volume without an attack. This could happen to a string that starts to resonate without being plucked, as happens with sympathetic strings. Perhaps a note could be started with the lowest possible volume (unfortunately, it can't be exactly zero due to the protocol treating vol 0 as note off!) or started off with a later than normal phase. In the same way that I am using MIDI to do manual chorusing in places, it would be useful to do sympathetic strings at the controller as well, rather than as some sitar post-processing effect. But note also how easily we could get far beyond 16 voice polyphony like that.

A Tetrachord Example

A backwards compatible re-interpretation of existing MIDI messages can give us what we need. I will use a basic tetrachord (Ajam Bayati) as an example of how these problems arise, and how to model them as MIDI messages. This tetrachord consists of a fundamental, a fourth, a minor third, and a note that falls roughly in the middle between the root and the minor third. In this notation, we have the notes D, E-quarterflat, F, G. But this notation is just an approximation to the real intonation. This is a likely real intonation:

  • If D is taken to be the fundamental, then its pitch ratio is 1/1
  • G is a pitch ratio of 4/3 with respect to D
  • F may be the just minor third 6/5 with respect to D
  • E-quarterflat has a few plausible choices, 13/12 is a plausible one, with respect to D
When changing modes (ex: phrase moves up by 4/3 or 3/2), the whole tetrachord is played in these ratios relative to the root. So, the exact pitches move around based on root notes. Scales are a fiction that just don't exist in this system. Any attempt to remap the 12 notes to keys that turn on and off will result in some kind of failure to do what is required because notes continually move around to fit the context.

If we take the simplest approximation that D is midi note 38.0, E-quarterflat is 39.5, F is 41, and G is 43, then the note 39.5 must be on its own channel to work at all. But what's even worse is that it doesn't really matter what these notes start as. Their future lifetimes are unknown, and they can all bend off in independent directions. Every note must be on its own channel always. Thus, we have 16 polyphony in MIDI if we want fretlessness. This is a reasonable limitation for 1 instrument. We can have multiple MIDI devices for multiple instruments on iOS.

And, NO, tuning tables don't help here. Only fretlessness really does the job. It is easy to intermix straight 12ET playing with quartertones, and at some points to adjust pitches to Just Intonation (so that the quartertones don't cause horrible clashing, etc.). This always happens when there is an ensemble of fretless instruments in which the singer is doing Just Intonation but fretted instruments like pianos and bass guitars have been mixed in. The band often copes by relegating the 12ET guys to pentatonic lines, or adjusting their pitches to hit Just Intervals versus the 12ET notes that are technically out of tune. It is actually better to think of a series of approximations, where it takes about 53 notes per octave to accurately locate the various third and fifth related intervals, of which the 12 tone pitches are just approximations, etc. In the fretless world, people that know what they are doing will move these pitches around at will. You can't just remap keys because the number of possible keys will change. If you insist on frets, you still need a moveable fret system where the number of frets can be changed, and the whole fretboard can be re-fretted at run-time. As a result, MIDI should just stay out of the music theory mess and just play the required pitches. Scales and pitch snapping are the controller's job. MIDI interfering here is no different than trying to impose diatonic scales onto unknown musical input.

The pitch wheel positions

The image to the right is a graphical representation of the pitch wheels. Straight up is MIDI channel 1, marked by the gold triangle at the top. Going around the circle, there is a gold triangle tick on channel 16 to show where the channel span ends. The blue triangles going in from the middle radius represent how far down the bend is for that channel's pitch wheel. The red triangles going out show a sharp pitch bend. The green rays sticking out show the channels that still have a note on in them. So, in this picture, we see six notes down in total. The pitch wheels are all over the place at slightly different values. This is because in this picture, we are creating a chorusing effect by doubling each note as two MIDI notes slightly bent out of tune from each other. When we run our fingers up and down the screen, exceeding the whole tone limit (what the bend happens to be set to), a single note will hop from channel to channel, leaving the pitch wheel in the position it was in when the note was turned off. We cycle channels clockwise looking for the least used channel (hopefully one with zero notes active) and pick a channel for our new note that way. As we play fretlessly, all of the channels get left in unusual positions. If I intentionally played everything about a quartertone high, most of these pitch wheels would be sticking out with a red triangle 25% of the way towards the larger radius to make all notes a quartertone sharp. It would match exactly what my internal audio engine is producing.
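The clockwise search for the least used channel can be sketched as follows. This is a hypothetical allocator, not the Fretless implementation: it scans the span starting just after the last channel used, prefers the channel with the fewest notes on, and breaks ties in favor of the channel whose last note was turned off longest ago (the "steal the note that has been dead the longest" rule from item 9).

```c
/* Per-channel bookkeeping for the allocator. */
struct ChannelState {
    int activeNotes;   /* notes currently on in this channel */
    long lastOffTime;  /* time its most recent note was turned off */
};

/* Pick a channel in 0..span-1, scanning in rotation after lastUsed. */
int pickChannel(const struct ChannelState ch[], int span, int lastUsed)
{
    int best = -1;
    for (int i = 1; i <= span; i++) {
        int c = (lastUsed + i) % span;
        if (best < 0 ||
            ch[c].activeNotes < ch[best].activeNotes ||
            (ch[c].activeNotes == ch[best].activeNotes &&
             ch[c].lastOffTime < ch[best].lastOffTime)) {
            best = c;
        }
    }
    return best;
}
```

Starting the scan after the last channel used spreads consecutive notes around the circle, which gives each releasing note as long as possible before its pitch wheel is disturbed.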

The idea that a note starts with bend zero and is only bent by the user later is a piano-ism that injects an unwarranted music theory constraint into a protocol that's supposed to be setting the pitches that I asked for.


Because the MIDI protocol at this level is considered as a fretless protocol, any fretting rules are outside the scope of this layer of the API. However, because it's fretless you can simply round off pitches in the controller to get back to strict 12ET pitches, or more subtly handle rules for:


Because we are not constrained to moving from discrete pitches, and have complete pitch freedom, we want to dispense with the concept of portamento. Portamento is a discrete key concept, and produces wrong pitches. If you start from MIDI note 0 and move towards MIDI note 127 at a rate of 1 MIDI note per second, you will land on an integer pitch value once every second (on pitch and on time), and the transition between notes will be smooth. More importantly, the pitch will represent where your finger actually *is*. It won't lag behind you as you move up 1 chromatic note per second, which is what portamento does while applying an arbitrary ramp between notes that doesn't add up to a perfectly smooth bend overall. This is really only doable in practice on a continuous surface instrument.

This is what I mean when I say that "Portamento is wrong". It's a discrete key concept that is not useful on string instruments. String instruments track where your finger actually *is*. In fact, at the controller there are three things to track with respect to pitch:

  • Where your finger actually is located (ie: pitch 33.001)
  • Where your pitch is tuning to (ie: pitch 33.0)
  • Where your pitch actually is, usually somewhere in between these two (ie: pitch 33.0005)
You need all of these pieces of information in the controller, because you send the last item, but use the first two items to render the user interface.
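One way to keep the three values together is shown below. The linear blend and the `snap` parameter are assumptions for illustration only; the text above does not say how the actual pitch is derived from the other two, only that it usually lies between them.

```c
/* Hypothetical per-finger pitch state, all as fractional MIDI notes. */
struct FingerPitch {
    double finger;  /* where the finger is, e.g. 33.001 */
    double target;  /* where the pitch is tuning to, e.g. 33.0 */
    double actual;  /* what actually gets sent to MIDI */
};

/* Blend the raw finger position toward the tuning target (assumed model).
   snap = 0 leaves the pitch exactly where the finger is (fretless);
   snap = 1 lands exactly on the target (fully fretted). */
void updatePitch(struct FingerPitch* p, double fingerPos, double target, double snap)
{
    p->finger = fingerPos;
    p->target = target;
    p->actual = fingerPos + (target - fingerPos) * snap;
}
```

Only `actual` goes into the note/bend conversion; `finger` and `target` stay in the controller to draw the finger position and the pitch it is pulling toward.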


The question of when to play the attack part of a note is separate from the rules about whether notes are turned on and off for mono/poly play. So legato refers only to when the attack portion of a note is played. In our API, we have a notion of polyphony groups, which are a sort of pseudo channel. The first note down in a polyphony group will definitely play the attack phase, and further notes down in the same polyphony group will do a legato continuation of the note. All notes in a poly group behave as in solo mode. Note that every note goes into its own MIDI channel, and poly groups specify how these notes get grouped if the note tie is a recognized message.


Polyphony rules include the standard solo mode (mono) that would be expected, and the full polyphony mode (poly) that is also expected. But in the spirit of doing away with enumerating special cases (which MIDI does a lot of), every note can be put into one of 16 polyphony groups. If every note is put into a different polyphony group, then we get "full polyphony". If they are all placed into the same group (ie: group 0), then it becomes solo mode.

If notes are grouped according to guitar strings then we have something in between. When multiple notes are chorded within the same poly group, they act like isolated solo modes. But since each string is its own group, then chording and hammer on/hammer off effects just happen as a side effect. By default, the legato rule is to have attack on first note and legato on others. This rule can be toggled off to attack on every note (thus legato and poly are not identical concepts), or a controller can have it set for every note (ie: via velocity, finger area, or some gesture to signify that the note should have an explicit "pick attack".) There is NO WAY to make an even moderately realistic string instrument rendition of MIDI notes without these poly and legato concepts. They are fundamental to functioning correctly.

For example:

//Put down one finger into poly group 0

beginOn finger0
express finger0 11 127
express finger0 42 127
endOn finger0 33 vol polyGroup0

//Put down another finger into poly group 0

beginOn finger1
express finger1 11 127
express finger1 22 127
endOn finger1 35 vol polyGroup0

//Put down another finger into a different poly group 1

beginOn finger2
express finger2 11 127
express finger2 33 127
endOn finger2 35 vol polyGroup1
move finger0 pitch vol
express finger0 43 127
move finger0 pitch vol

off finger1
off finger0
off finger2

We end up with chording and solo-mode-like tricks happening simultaneously.  finger0 and finger1 are in the same poly group, so the original pitch 33 has a finger with pitch 35 layered over it.  So when finger 1 goes down, finger 0 is silent.  When finger 1 comes up, finger 0 turns back on again, without a pick attack(!).  This is like solo mode.  But after finger 1 went down, finger 2 went down, and finger 2 stays on until the end.  So we have soloing and chording at the same time.  This always happens on guitar, where one string will be hammered on while other strings ring through.  Since Geo Synthesizer and AlephOne are two-handed instruments, it is a very common thing to do.  One hand will play riffs with trills and hammer-ons, while the other holds down or creates chords on other strings.
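The controller-side bookkeeping behind that example fits in a few functions. This is a hypothetical sketch (the group stack, `groupFingerDown`, `groupFingerUp`, `groupSounding` are not the Fretless API) of the default rule: attack on the first finger in a group, legato for later ones, and when the sounding finger lifts, the previously held finger resumes through a tie. The toggle for attacking on every note is not modeled.

```c
#include <stdbool.h>

#define MAX_GROUPS 16
#define MAX_HELD 16

/* Within a group only the most recently pressed finger sounds (solo);
   separate groups sound together (chording). */
struct PolyGroup {
    int held[MAX_HELD];  /* fingers held down in this group, oldest first */
    int depth;
};

static struct PolyGroup groups[MAX_GROUPS];

/* True if the new note plays its attack (first finger in its group),
   false if it should be tied (legato) from the finger that was sounding. */
bool groupFingerDown(int group, int finger)
{
    struct PolyGroup* g = &groups[group];
    bool attack = (g->depth == 0);
    if (g->depth < MAX_HELD) g->held[g->depth++] = finger;
    return attack;
}

/* The finger currently sounding in a group, or -1 if none. */
int groupSounding(int group)
{
    const struct PolyGroup* g = &groups[group];
    return g->depth > 0 ? g->held[g->depth - 1] : -1;
}

/* Lift a finger. Returns the finger that must now resume by a tie
   (no pick attack), or -1 if nothing needs to resume. */
int groupFingerUp(int group, int finger)
{
    struct PolyGroup* g = &groups[group];
    int i = 0;
    while (i < g->depth && g->held[i] != finger) i++;
    if (i == g->depth) return -1;  /* not held in this group */
    bool wasSounding = (i == g->depth - 1);
    for (; i < g->depth - 1; i++) g->held[i] = g->held[i + 1];
    g->depth--;
    return wasSounding ? groupSounding(group) : -1;
}
```

Replaying the example: finger0 attacks in group 0, finger1 ties over it, finger2 attacks in group 1; lifting finger1 returns finger0 to resume without an attack, while finger2 keeps ringing in its own group.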

The beauty of doing it this way is that the synth is completely unaware of modes of any kind.  They simply don't exist.  The synth is polyphonic and stays in that mode.  But we explicitly turn notes on and off and explicitly tie notes together to play a note without a pick attack, from the controller.  This is a crucial element of string instrument gesturing.

Note that a slightly different way to render this whole sequence would be to assume that we have a very large pitch bend that can cover any possible range, as in the example of every channel simply playing MIDI note 64 with a pitch bend of +/- 60 or perhaps +/-64.  Instead of playing new notes, we can simply bend 33 to 35 because they are in the same poly group.  This is a suggestion made by the MIDI Manufacturers Association.  It is simpler to implement if you don't want to add note ties, and my objections about the lower pitch resolution required to do it that way might not be as big a problem as I have always assumed.  The question is what the coarsest acceptable pitch resolution is, given that we need to be able to represent exact pitch intervals without beating, and be able to double voices to get chorusing effects.  (Is it within 1 cent for any given interval made by two notes?)  For note 64 as the center with +/- 64 semitones, a bend value n puts the pitch at MIDI note 64 + 64*(n-8192)/8192 = n/128, so the entire set of available pitches is:

c0 * 2^( (128*n/16384)/12)

for pitch bend n from 0 to 16383, where c0 is the pitch of MIDI note 0, with 16 independent voices per instrument that can do this. Each bend step is 1/128 of a semitone, about 0.78 cents.


This should not even be an issue, except that when a piano keyboard is taken as the model for what an instrument is, it is easy to make the mistake of assuming that notes with the same name (or even just the same pitch) are unique. It is completely expected to have multiple copies of the same exact note at multiple locations. This means that there should be no situations where we remember a note by its note number and look it up to do something with it later. We use the (note,channel) pair, and can treat *that* as unique only if every note can be placed into its own channel. But when mashing this fretless protocol down into 1 MIDI channel, we run into this problem again. We deal with it by sending a note off for a MIDI note before retriggering it. Ie:

#what we want to send
ch1 on 35 127
ch2 on 35 127
ch3 on 35 127

ch1 on 35 0
ch2 on 35 0
ch3 on 35 0

#what we have to send because it's all mashed down to 1 channel
ch1 on 35 127
ch1 on 35 0
ch1 on 35 127
ch1 on 35 0
ch1 on 35 127
ch1 on 35 0

Note that when we overfill MIDI channels with more than 1 note, we have multiple problems to deal with. The first is the obvious problem that there is only 1 pitch wheel per channel, so notes sharing a channel can't bend independently. The second is less obvious: we have to render messages in a different ORDER to get correct behavior. Supporting fewer channels than polyphony creates a lot of complications like this in the DSPCompiler code.
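The reordering in the listing above can be produced with a per-note reference count. This is a hypothetical sketch, not the DSPCompiler code: a repeated note-on first closes the copy that is already sounding, and only the last logical note-off for a note reaches the wire. Messages go to channel 1 (status 0x90), using note-on with velocity 0 as note off.

```c
#include <stddef.h>

/* How many logical voices currently hold each note number. */
static int noteCount[128];

/* Logical note-on; returns the number of bytes written to buf. */
size_t mashedOn(unsigned char* buf, int note, int vel)
{
    size_t n = 0;
    if (noteCount[note] > 0) {  /* already sounding: close it first */
        buf[n++] = 0x90; buf[n++] = (unsigned char)note; buf[n++] = 0;
    }
    buf[n++] = 0x90; buf[n++] = (unsigned char)note; buf[n++] = (unsigned char)vel;
    noteCount[note]++;
    return n;
}

/* Logical note-off; only the last holder actually sends a note off. */
size_t mashedOff(unsigned char* buf, int note)
{
    if (noteCount[note] == 0) return 0;
    if (--noteCount[note] > 0) return 0;  /* another voice still holds it */
    buf[0] = 0x90; buf[1] = (unsigned char)note; buf[2] = 0;
    return 3;
}
```

Feeding in the three note-ons and three note-offs for note 35 produces exactly the six messages of the mashed-down listing: on, off, on, off, on, off.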

CC Transfer

One of the problems with abusing channels to get pitch bend independence is that if you move a note from one channel to another, you also need to move all the per-channel values for the other expression parameters. So, just as the pitch bend is remembered and altered when it must be, the same must be done for channel pressure and for per-channel expression values (of which there may be dozens, unfortunately!). If a dozen CC values are known to differ between channel 5 and channel 6, then all of them need to be sent when there is a note tie from channel 5 to channel 6. The Fretless protocol is *not* doing this. It should be added, though in practice only one fixed CC is being used right now (CC 11).
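A minimal sketch of the missing CC transfer step, assuming the sender keeps a table of the last value it sent for every per-channel parameter. The parameter keys (`"bend"`, `"pressure"`, `"cc11"`) and the function name are hypothetical, not part of the Fretless protocol.

```python
from typing import Dict, List, Tuple

# Hypothetical per-channel state: parameter name -> last value sent on that channel.
ChannelState = Dict[str, int]


def cc_transfer(states: Dict[int, ChannelState], src: int, dst: int) -> List[Tuple[int, str, int]]:
    """Messages (channel, parameter, value) that make dst match src before a note tie."""
    messages: List[Tuple[int, str, int]] = []
    src_state = states.get(src, {})
    dst_state = states.setdefault(dst, {})
    for param, value in sorted(src_state.items()):
        if dst_state.get(param) != value:
            messages.append((dst, param, value))
            dst_state[param] = value  # remember what the destination now holds
    # Parameters only ever set on dst are left alone: src's value for them is unknown.
    return messages
```

Only the values that actually differ are sent, which keeps the burst small when most channels share the same settings.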

iOS Specifics

One thing specific to iOS that must be fixed about MIDI is that setup is a horrible experience, just as it is on hardware devices. The terrible VCR-programming-like experiences offered by hardware devices are not tolerated by iOS users at all. Specifically:

  • The user has to know far too much about how MIDI works to get it set up.
  • Correct pitch handling per finger (what almost all of this document is about) is not an out-of-the-box experience. It requires at least a bend width, a channel span, a channel number setting, and a switch to disable pitch bending for instruments that just can't handle it right. Most users don't have the faintest clue why they must span channels to get correct pitch handling, because it's a workaround for a deeply non-intuitive limitation. (At least 4 unnecessary controls.)
  • There is no real reason to stick a bunch of MIDI devices on a small number of channels, as VirtualMIDI can add multiple devices as easily as adding multiple channels. When talking to external hardware, it makes sense to span a small number of channels to make room for more instruments. But what really matters on iOS (the 90% case) are the issues that arise entirely inside one iOS device.
  • The inconsistency of things like channel pressure handling (i.e., is it a continuation of velocity change?) means that you can't ship an instrument one way. You need a switch to turn it off if it misbehaves with some synth (at least 1 more unnecessary control).
  • Modes are set in the synth for things that should be controller-driven. My examples here are the legato and polyphony issues. The controller has plenty of CPU power to handle these things, and they should be defined there rather than in the synth. The use of modes prevents the per-note expression types that arise on expressive controllers.
  • Capability negotiation should be one of the very first things to happen in a MIDI session. We should never be sending device- and vendor-specific identifiers unless the idea is that everything is so buggy that we must code to specific devices. Otherwise, a protocol for negotiating what each side expects of the other should have been one of the first things to go into the standard. If a controller and a synth can negotiate a small set of required things, then they know that things should work between them, and they can keep things simple internally. You need to know if something you send or expect to receive is not going to work.
  • Once there is capability negotiation, there should be a way to query for the knobs and sliders of the current patch. On iOS, you are not dealing with a hardware box plugged into a controller, where you can physically touch *both* of them at the same time. You end up with a controller in the foreground, and that foreground controller needs to present the controls of the synth running in the background, because the background synth is unreachable in real time. The names of knobs and sliders are going to be patch specific, and the knobs themselves will come and go depending on the patch and synth chosen. So the common MIDI practice of picking names from predefined lists won't work. In current MIDI, there is a chronic pattern of enumerating many known cases rather than coming up with a few general mechanisms that combine well. Those enumerations made sense at the time, but today's controllers are not dumb controllers. They are fully functional computers that should be taking on most of the complex tasks, while synthesizers are becoming specialized signal processors that don't really have their own interfaces anymore.
  • These patches might be loaded from the controller (i.e., pick a file in the foreground app) and pushed into the synthesizer. They will be defined against some standard, possibly something based on Pure Data (Pd, libpd, Max/MSP) or CSound, or something iOS specific (related to AudioBus?). In any case, the ability to get patches from the synth and push patches into it should be an obvious and consistent part of the standard. Currently, developers end up trying to create and hardcode all possible patches for users, rather than letting the user community build most of them.
  • On iOS, it's clear that synthesizers will start to get "cored out", so that the present situation does not continue. Ironically, every app defines its own internal synth (usually not the greatest synth engine) and a controller (usually a keyboard) in spite of having a MIDI implementation, in addition to an in-app recording facility (a DAW too!?). This represents an enormous waste of effort. It requires too much user interface code and too much synthesis knowledge, and it causes controllers to limit themselves to the most common forms (i.e., keyboards). When MIDI is actually doing its job correctly, controllers will stick to controlling, synths will stick to synthesizing, and DAWs will stick to recording (via AudioBus?). All of these things will evolve independently, and much faster as a result.
  • This is very similar to how OpenGL ES 2.0 "cored out" the graphics APIs: it stopped enumerating all possibilities and introduced lower-level primitives and a shading language. The MIDI primitives need to be much more strictly defined, with allowances made to use them to build unforeseen possibilities. This is the opposite of the current situation, which leaves many things up to the implementer and causes setup headaches because extremely basic things are not consistent among devices.
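The channel-spanning workaround that the second bullet says users must configure by hand (a base channel number and a channel span) is the kind of logic a controller could own entirely. Here is a hypothetical sketch of such an allocator, not ThumbJam's or the Fretless code: each new finger takes the channel that has been free the longest, so its pitch bend cannot disturb another sounding finger as long as polyphony stays within the span.

```python
from collections import deque
from typing import Deque, Dict


class ChannelCycler:
    """Hypothetical per-finger channel allocator over a span of MIDI channels."""

    def __init__(self, base_channel: int = 1, span: int = 16):
        if span < 1 or base_channel < 1 or base_channel + span - 1 > 16:
            raise ValueError("channel span must fit inside MIDI channels 1..16")
        self.free: Deque[int] = deque(range(base_channel, base_channel + span))
        self.by_finger: Dict[int, int] = {}

    def finger_down(self, finger_id: int) -> int:
        """Give a new finger its own channel, so it gets its own pitch wheel."""
        if not self.free:
            raise RuntimeError("polyphony exceeds the channel span")
        channel = self.free.popleft()   # the channel that has been free longest
        self.by_finger[finger_id] = channel
        return channel

    def finger_up(self, finger_id: int) -> int:
        """Release a finger's channel; it goes to the back of the queue."""
        channel = self.by_finger.pop(finger_id)
        self.free.append(channel)       # reused last, leaving room for release tails
        return channel
```

Reusing the least recently released channel is one reasonable choice; it keeps a fresh note from inheriting the bend of a note that is still ringing out.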

A Capabilities "Shell"

This is a hypothetical notion that I don't have code for at the moment.

A major part of the problem with MIDI is its combination of complexity, fragility, and ambiguity. Its ambiguity is sometimes touted as a source of flexibility, but the only guarantee is that the gestures coming out of the hardware are reproduced, with no guarantee that the same signals will produce a reasonable result on different synthesizers. Because there is no common mechanism to negotiate (or simply dictate to the synth) a "language" during a MIDI session, there is no guarantee that the bytes sent from controller to synth are interpreted as intended. The current workarounds are device-specific hacks that involve setup by the user, or proprietary messages defined between specific controller/synth pairs. So now I am going to define a hypothetical mechanism by which each side can deduce the minimum language that the other side will understand, without user setup in most cases. The other goal is to factor common sub-languages out into standard extensions, so that actual proprietary messaging is rarely needed.

So suppose that all new MIDI devices can listen for just one type of SysEx message used for this purpose. Devices that do not respond to these SysEx messages are taken to be old MIDI devices. Take this example:

c -> s: i can read [negotiation]
c -> s: i can send [negotiation,noteOnOff,bend[2..64],noteTie]
s -> c: i can read [negotiation,noteOnOff,bend[2..60]]
s -> c: i can send [negotiation]

In this case, the controller should note that there is no point in sending note ties, and should try to use a bend width of 60 to create legato phrasing. We have no idea what vendor, synth, or hardware is on the other side, and that's irrelevant. What we need to know is what we can *send* and expect to be understood, and what we can allow to be *sent* to us. It amounts to specifying a legal state machine for the traffic between the two devices. How about this:
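The deduction in the first exchange can be sketched as intersecting one side's "i can send" list with the other side's "i can read" list. This is a minimal model under assumptions of my own: a capability is either a plain feature or an inclusive range like `bend[2..64]`, represented here as a dictionary, and the function name is hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical capability model: name -> None for a plain feature,
# or an inclusive (low, high) range such as bend[2..64].
Caps = Dict[str, Optional[Tuple[int, int]]]


def negotiate(sender_can_send: Caps, receiver_can_read: Caps) -> Caps:
    """Capabilities the sender may use: those the receiver can also read."""
    agreed: Caps = {}
    for name, mine in sender_can_send.items():
        if name not in receiver_can_read:
            continue                      # e.g. noteTie: don't bother sending it
        theirs = receiver_can_read[name]
        if mine is None and theirs is None:
            agreed[name] = None
        elif mine is not None and theirs is not None:
            low, high = max(mine[0], theirs[0]), min(mine[1], theirs[1])
            if low <= high:               # the two ranges overlap
                agreed[name] = (low, high)
        # A plain feature on one side and a range on the other is treated as a mismatch.
    return agreed
```

For the example above, the controller's result is `negotiation`, `noteOnOff`, and `bend` narrowed to (2, 60): it would cap its bend width at 60 and, with `noteTie` absent, fall back to bending within a channel for legato.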

c -> s: i can send [negotiation, OSC-pitchcontrol,...]
s -> c: i can read [negotiation, OSC-pitchcontrol, ...]
c -> s: expect OSC

If we have a negotiation protocol, then we can use it to abandon MIDI completely after the negotiation phase. This is a way out of the backwards compatibility mess. Switching protocols like this isn't a problem, because it was *negotiated*: it is guaranteed to happen only if both sides agree. The command to expect OSC is "proprietary", but it is known to be handled because OSC-pitchcontrol says that it will be understood. Most importantly, each side can implement the minimum protocol required, plus the overhead of a negotiation shell, rather than having to implement the whole MIDI spec.
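The protocol switch can be pictured as a tiny controller-side state machine. In this hypothetical sketch (class and method names are made up), "expect OSC" is only emitted when both sides have listed OSC-pitchcontrol, and a peer that never answers the negotiation SysEx is treated as an old MIDI device.

```python
from enum import Enum
from typing import Iterable, Optional, Set


class Phase(Enum):
    NEGOTIATING = "negotiating"
    MIDI = "midi"   # fall back to plain MIDI
    OSC = "osc"     # both sides agreed to leave MIDI behind


class ControllerSession:
    """Hypothetical controller side of the negotiation shell."""

    def __init__(self, can_send: Iterable[str]):
        self.can_send: Set[str] = set(can_send)
        self.peer_can_read: Set[str] = set()
        self.phase = Phase.NEGOTIATING

    def on_peer_can_read(self, caps: Iterable[str]) -> None:
        # Record the synth's "i can read [...]" announcement.
        self.peer_can_read = set(caps)

    def finish_negotiation(self) -> Optional[str]:
        """Pick the protocol; return the switch command to send, if any."""
        if "OSC-pitchcontrol" in (self.can_send & self.peer_can_read):
            self.phase = Phase.OSC
            return "expect OSC"
        # No announcement (an old MIDI device) or no shared OSC capability.
        self.phase = Phase.MIDI
        return None
```

The switch command never reaches a device that did not advertise it, which is what makes it safe to leave MIDI after the handshake.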

Some standard capability namespaces that we could expect would be ones that expose the knobs and sliders of the synthesizer back to the controller, giving them names and default values and keeping them in sync. Under iOS, it is very important to be able to proxy a synth parameter in the controller, because the synth is sitting inaccessible in the background.
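A controller-side proxy for such a namespace might look like the sketch below. Everything here is hypothetical: the synth is assumed to declare each knob with a patch-specific name and default, report its own changes, and accept new values through a transport callback that stands in for whatever message the standard would define.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Param:
    name: str        # patch-specific label supplied by the synth
    default: float
    value: float


class ParamProxy:
    """Hypothetical foreground mirror of a background synth's knobs and sliders."""

    def __init__(self, send: Callable[[str, float], None]):
        self.params: Dict[str, Param] = {}
        self.send = send  # assumed transport to the synth, not a real API

    def on_synth_declares(self, name: str, default: float) -> None:
        # The synth announces a knob for the current patch.
        self.params[name] = Param(name, default, default)

    def on_synth_reports(self, name: str, value: float) -> None:
        # Keep the mirror in sync when the synth changes a value itself.
        if name in self.params:
            self.params[name].value = value

    def set_from_ui(self, name: str, value: float) -> None:
        # A move in the foreground UI updates the mirror and is forwarded to the synth.
        self.params[name].value = value
        self.send(name, value)

    def on_patch_change(self) -> None:
        # Knobs come and go with the patch, so the old set is discarded.
        self.params.clear()
```

Because the names arrive from the synth at run time, nothing has to be picked from a predefined list.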


What I describe sounds a lot like OSC in many ways. The main issue is how to be at least pseudo-compatible with MIDI, and how to take advantage of the fact that a background MIDI pipe exists on iOS but a background OSC pipe does not (nor are there many synths that are even pseudo-compatible with OSC). If trying to use MIDI this way becomes overly complicated, or attempts to fix it are too incompatible to matter, then something will emerge from the iOS world. If it isn't OSC, then it will probably be something similar smuggled over AudioBus. iOS spawning a pseudo-compatible-with-MIDI standard sounds like a bad thing, but realistically, there is no point in being compatible when the required scenarios can't be made to work. The only options would be to forget about MIDI compatibility completely, to make OSC finally take hold, or to make something entirely new happen.


  1. Wow! What a post! As an overly curious end-user I made my way through it and have been following the discussion elsewhere. I have really gained an understanding of how and why your channel cycling setup works, and this has allowed me to use Geo Synth's full capabilities with ThumbJam voices. I think you are right that most users will not have the patience to understand these arcane exploits and thus will not be able to set them up and properly enjoy them; this makes me sad because of how awesome it really is when you get it working right. I intend to spread the word, at least in a 'how do you set this up' sense: set x to on, y to on, leave z set to off.

    Regarding backwards compatibility: I think there will always be a group of people who are attached to their old analog keyboards, and this is an important group because so many of them are serious and devoted musicians. Perhaps in the end we will see iOS controllers with two MIDI modes; if so, the complications involved in setup will be OK for the 'backwards-compatibility'-mode users, because they are used to it anyway.

    Anyway, best wishes with the development efforts, and thank you for the in-depth explanation!

  2. I agree with Guaranath, and think that "something must happen" in iOS too.
    Just one question: do you have a plain, simple explanation of how to manage the Cantor intonation table?

    1. There are no tuning tables, and that's the whole point of the rant. :-) What this thing says (in a bazillion words) is that MIDI needs to stop being note oriented (which only works on pianos) and be *frequency* oriented to support a more general class of controllers. The irony of taking this viewpoint is that you can massively simplify the synthesizer in the process - but you take complexity out of the synth and move it into the controller. The MMA could learn a lesson from how OpenGL evolved.

      In OpenGL, they used to try to enumerate all the possible things (lighting scenarios, etc) you could do. The newer OpenGL designs stop doing that, and simply provide more general primitives and leave it up to the application to make the special cases (shaders, vertex buffers).

  3. Can it be OSC compatible with Audiobus and/or Jack?

    1. MIDI, OSC, and Audiobus are three completely different protocols. This article is about abusing MIDI spec ambiguities to get frequency-oriented behavior. Audiobus has a protocol (hidden inside an Objective-C library), but that protocol only moves audio buffers around; it has nothing to do with sending notes, amplitudes, or frequencies. OSC is usually used to send amplitudes and frequencies, but it is very open-ended and underspecified; OSC is just a simple remote procedure call framework that doesn't even define anything specific to music.

    2. Jack handles audio and MIDI routing. I don't know if it also handles OSC. But if you handle OSC, you also need to handle something like an audio language (i.e., Kyma, SuperCollider, Pd, Max, abletonPython, etc.), because something needs to decode the messages in the absence of standard semantics.

  4. Well, you can define semantics on top of it, much as HTTP and HTML are built on top of TCP/IP.

    That would be the best solution. OSC was really meant only as a protocol for sending timing events. Just define your semantics for the timing events on top of it.

    And you are right, MIDI really needs to go away...