
Monday, April 23, 2012

An Ideal MIDI-Compatible Protocol

A Torture Test For Correct Pitch Handling

The MIDI protocol is designed as a keyboard extender. The actual pitches emitted aren't strictly specified, because the protocol is supposed to faithfully capture the gestures coming out of a hardware keyboard. Under this model, an instrument is a set of discrete keys, each struck with an initial velocity, plus one pitch wheel per instrument. This means that any instrument that wants to send MIDI should fundamentally *be* a set of discrete keys sharing one pitch wheel.

But any instrument that fits that description can't do correct pitch handling, where correct pitch handling means being able to track simultaneous, independent timelines of (pitch, volume, phase). In other words, you can't model a fretless instrument with it. We can put together multiple channels to come up with the best approximation that is still backwards compatible with MIDI, though. But fair warning: the client generating fretless MIDI is VERY complicated at the level of MIDI bytes, and it should be done through a common API that is used by many people. Fortunately, it doesn't make the synth (the server) any more complicated. In fact, it simplifies a lot of things in the server by letting things like explicitly enumerated modes and tuning tables be gotten rid of entirely.

If you think of the microtonal cases as something bizarre that won't come up in practice, then ignore these extreme cases. But realize that you can't properly model any string instrument unless you can model fretlessness correctly, if only because guitars can and do emit simultaneous pitches that are not exactly some number of semitones apart when bending notes and chording at the same time.

Touch Screens And Fretlessness

Background video: pure MIDI engine vs ThumbJam with MIDI chorusing.



When using touch screens, fretlessness and per-finger bending are readily available to us. Everything is continuous in reality, and the notion of discrete keys being turned on and off is a major annoyance. In this space, we think instead of a finger that goes down and can move to any pitch, volume, or expression value over its lifetime. Especially when trying to model any kind of string instrument, this sort of polyphonic fretlessness is a basic requirement for faithfully modelling realistic behavior. In any case, touchscreen instruments can trivially create this situation, and MIDI is quite terrible at modelling it.

  1. Every finger down can have its own completely arbitrary pitch
  2. Any pitch can bend to any other pitch over its lifetime, including from MIDI 0 to MIDI 127
  3. All pitch bending, no matter how wide, happens at the full pitch resolution
  4. A note can attack at any pitch, and therefore needs its own pitch wheel
  5. A note can release at any pitch, and therefore needs its pitch wheel to not be disturbed until release completes
  6. Due to 5, a note's channel is still reserved for some time after the note has been turned off.
  7. Due to 1, every note is in its own channel, of which there are only 16.
  8. Due to 7, we are limited to 16-voice polyphony, including releasing notes that have already been turned off.
  9. Because of 4, 5, and 8, there is a speed limit beyond which we cannot play without pitch problems. We will have to simply steal the channel of the note that has been dead the longest, and hope that the releasing tail of that note is inaudible as we alter its pitch wheel for the new note.
  10. Because of 2, we can bend beyond the maximum pitch wheel depth. If we do so, we have no choice but to stop the note (at its current arbitrary pitch - and therefore release at this pitch) and restart it (on a new channel), where we can place a note-tie for synths to understand that these are not two separate notes; that the note is being *migrated* to a new name and a new channel.
  11. The effect of a note tie is to move the state of the note to a new channel and note, skipping the attack phase. This means that the note tie can also be used to legato notes together. A straight-up note-tie in support of a wide bend moves a note from one note/channel to another with 0 cents difference in pitch, whereas legatoing two notes does the same thing with some large number of cents of difference in the transition.
  12. Because of 3, we can't simply fix this with absurd pitch bend ranges like +/-48 semitones; we need bending to stay at full resolution.
* My item 12 is debatable. Conversations with the maintainer of the MIDI spec suggest that very large bends are not a problem in themselves, though many existing synths have a maximum of 12 or 24 semitones, if they allow the bend width to change at all. As an example, using MIDI note 64 as a starting point with +/-60 semitones gives a span of 120 of the MIDI notes, while still staying within the roughly 1-cent resolution that microtonalists talk about (8192 steps up and down). You can set a non-octave number of semitones, such as +/-64, to ensure that you can hit MIDI notes 0 and 127 as well (see the sketch below). So the idea of simply ensuring that very large bends are supported is a good one. There is still the issue of sending one MIDI stream to multiple devices and having to live with the minimum supported bend, or possibly setting bends to 1 semitone (half the default) so that pianos simply render chromatic passages. Note ties do more than handle the exception where the bend width is exceeded; you may want to use them for other reasons, but the complexity may not be worth it if you have very wide bends.
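As a rough sketch of that wide-bend alternative (assuming the hypothetical choice of MIDI note 64 as a fixed center and a +/-64 semitone bend range set via RPN 0; none of these names come from Fretless.c):

#include <stdio.h>

/* Sketch only: map an arbitrary floating point MIDI pitch onto a single
   14-bit pitch wheel value, assuming every note-on is sent as note 64 and
   the synth's bend range has been set to +/-64 semitones via RPN 0.
   The names here are hypothetical, not part of Fretless.c. */
#define CENTER_NOTE 64
#define BEND_SEMIS  64
#define BEND_CENTER 8192

static int wideBendForPitch(float fnote)
{
    float semisFromCenter = fnote - CENTER_NOTE;   /* -64 .. +63.99 */
    int bend = (int)(BEND_CENTER + semisFromCenter * BEND_CENTER / BEND_SEMIS);
    if (bend < 0)     bend = 0;
    if (bend > 16383) bend = 16383;
    return bend;
}

int main(void)
{
    /* Each bend step is 64/8192 semitones, about 0.78 cents. */
    printf("pitch 39.5 -> bend %d\n", wideBendForPitch(39.5f));  /* 5056  */
    printf("pitch 127  -> bend %d\n", wideBendForPitch(127.0f)); /* 16256 */
    return 0;
}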


Pitches To MIDI And Backwards Compatibility
Warning: take the source code here as what I really mean; it has been pretty well vetted to work in practice and should have fewer mistakes than this document, which is being written off the top of my head:

https://github.com/rfielding/DSPCompiler/blob/master/Fretless.c



As long as there is no definite pitch assigned to MIDI notes, the setup will be too involved for this kind of instrument. Ultimately, the notion of setting bend width and how many channels to span has to go away. Under iOS, every instrument can supply its own virtual MIDI device anyway (with 16 channels to itself).
  • c0 is the lowest note in MIDI, with note value zero.
  • MIDI note n is defined as: mNote(n: float): float = c0 * 2^(n/12)
  • If MIDI note 33 is 440hz, then: 440hz = c0 * 2^(33/12)
  • Integer values of n cause mNote(n) to return the pitches of the chromatic notes
  • Bend 8192 is the center value, representing zero cents of adjustment
  • semis is the number of semitones up or down covered by the pitch wheel. It is 2 by default.
static void Fretless_fnoteToNoteBendPair( 
    struct Fretless_context* ctxp, 
    float fnote,
    int* notep,
    int* bendp)
{
    //Find the closest 12ET note
    *notep = (int)(fnote+0.5);
    //Fractional distance from that note, in semitones (-0.5 .. +0.5)
    float floatBend = (fnote - *notep);
    //Scale into the 14-bit wheel range for the current bend width (BENDCENTER is 8192)
    *bendp = (BENDCENTER + floatBend*BENDCENTER/ctxp->channelBendSemis);
}

This example code converts a floating point "MIDI" note into an integer note value and an integer bend value.
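For a concrete feel for the numbers, here is a hedged usage sketch, assuming BENDCENTER is 8192 and channelBendSemis is at its default of 2; the context setup is elided:

/* Sketch: worked values for the function above, assuming BENDCENTER == 8192
   and channelBendSemis == 2 (the default). The ctxp setup is elided. */
void exampleConversions(struct Fretless_context* ctxp)
{
    int note, bend;

    Fretless_fnoteToNoteBendPair(ctxp, 39.5f, &note, &bend);
    /* 39.5 rounds to note 40, floatBend = -0.5 semitones,
       bend = 8192 + (-0.5 * 8192 / 2) = 6144 */

    Fretless_fnoteToNoteBendPair(ctxp, 33.0f, &note, &bend);
    /* exact chromatic note: note 33, bend stays at the center value 8192 */
}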

MultiTimbral synths that can set bend width

We include channelBendSemis as an input to gracefully handle synths that don't understand note ties. If note ties are not understood, then at least we can set the bend width high to minimize breaks in note bending (because it's harder to exceed the bend width). To get this level of backwards compatibility, it is generally sufficient to set a number of channels to the exact same patch, matching the maximum required polyphony (ie: channels 1,2,3,4), and to set the bend width to +/-12 semitones.

Mixing fretless synths with pianos

The other issue is that of playing a fretless voice with a piano in the background. Either two completely different MIDI streams need to be sent to the two devices (ie: fretless MIDI for the violin, and a chromatic rendition that inserts every chromatic note as notes bend for the piano), or the bend width should be set to +/-1 semitone and only the violin should respect bends and note ties (and channeling, for that matter).

The Biggest Weakness Of MIDI

Converting actual pitches to MIDI

When dealing with a device that takes a polyphonic signal (or a set of mono signals) and converts it to MIDI, there is the distinct possibility that the original audio is to be mixed in with the MIDI. In my opinion, this is the biggest weakness of not taking on a frequency orientation for MIDI. The output of the speakers will resonate with the guitar body that's generating MIDI and feed back in... think about the consequences of that! If a guitarist tunes a quartertone flat to match some un-tuneable instrument in the ensemble and runs through a device to generate MIDI that matches the guitar pitches, then there is no reason for any of the MIDI pitches to ever come out of tune with the real audio, under any circumstances. There is no good reason to make him do some kind of setup to make that happen either.

In the same way, if an a cappella singer is some number of cents off from standard tuning (perhaps due to a resonance in the environment, or because she is singing in Just Intonation), then the MIDI should LEAVE IT ALONE! and render bytes that will create those exact pitches on any standard MIDI device, because the original pitch isn't wrong. Auto-tuning isn't up to the protocol, and it isn't up to the synth. You can write a controller that rounds off pitches, or sends a re-tuned variant of the original audio, if that's what you want.

And it can still be backwards compatible with existing MIDI if it can cycle over channels and tie notes together.

How Note Ties Work

Note ties are something that I added to Geo Synth and AlephOne in an attempt to fix this problem. Because we need to be backwards compatible without creating wrong pitches or stuck notes, we have to emit valid MIDI for everything synths already understand, and allow synths that don't understand note ties to ignore them without major problems in the sound.

#Turn on a note at the center bend position, bend it up to +100%, then turn it off
0xE1 0x40 0x00 #bend to center position
0x91 0x21 0x7F #turn note on
0xE1 0x7F 0x7F #bend +100%
0x91 0x21 0x00 #turn note off

If we want to continue the bend, we need a new note. Because any note needs to be able to attack and release at any pitch, it must be on its own channel when it does so, so there is a channel transition as in any other case:

#Turn on a note, bend it up to +100%, turn it off, and continue the same pitch on a new channel
0xE1 0x40 0x00 #bend to center position
0x91 0x21 0x7F #turn note on
0xE1 0x7F 0x7F #bend +100%
0x91 0x21 0x00 #turn note off
0xE2 0x40 0x00 #same pitch as note releasing on channel 1 (notice its different bend)
0x92 0x23 0x7F #continue note at new channel

So, any multi-timbral synth will recognize this bend in some way. If there is no audible attack phase for the patch, it seems that we are done; it already sounds correct. But if there is an audibly different attack phase, then we can hear the note break before it transitions to the new value (which is what usually happens). So we need to put an NRPN into the stream to warn the MIDI device to expect a note off that is actually a legato into the next note, on a different channel even! It basically just sends the number 1223 as an NRPN, to warn the MIDI engine about what is coming. It is this code here:

void Fretless_noteTie( 
    struct Fretless_context* ctxp,
    struct Fretless_fingerState* fsPtr)
{
    int lsb;
    int msb;
    Fretless_numTo7BitNums(1223,&lsb,&msb);
    int channel = fsPtr->channel;
    int note = fsPtr->note;
    //Coarse parm
    ctxp->midiPutch(0xB0 + channel);
    ctxp->midiPutch(0x63);
    ctxp->midiPutch(msb);
    //Fine parm
    ctxp->midiPutch(0xB0 + channel);
    ctxp->midiPutch(0x62);
    ctxp->midiPutch(lsb);
    //Val parm
    ctxp->midiPutch(0xB0 + channel);
    ctxp->midiPutch(0x06);
    ctxp->midiPutch(note);
    ///* I am told that the reset is bad for some synths
    /*
    ctxp->midiPutch(0xB0 + channel);
    ctxp->midiPutch(0x63);
    ctxp->midiPutch(0x7f);
    ctxp->midiPutch(0xB0 + channel);
    ctxp->midiPutch(0x62);
    ctxp->midiPutch(0x7f);
    */
    //*/
}




When this sequence is seen, the sound engine will simply *remember* which note is being turned off without actually doing it. Then, when the next note-on arrives, it transfers the current phase, pitch, and volume to the new note, and the pitch legatos over to its new pitch and volume values as fast as it can.

#Turn on a note, bend it up to +100%, then tie it to a new note on a new channel
# we have no idea what the future is once we turn the note on...
0xE1 0x40 0x00 #bend to center position
0x91 0x21 0x7F #turn note on
0xE1 0x7F 0x7F #bend +100% #a surprise event when the finger bent up really high
0xB2 0x63 msb(1223) #Note tie warning (NRPN coarse)
0xB2 0x62 lsb(1223) #Note tie warning (NRPN fine)
0xB2 0x06 0x21 #value: the note being tied
0x91 0x21 0x00 #turn note off
0xE2 0x40 0x00 #same pitch as note releasing on channel 1 (notice its different bend)
0x92 0x23 0x7F #continue note at new channel

The Lifecycle Of A Finger

I hide the actual channels and notes of MIDI behind an API for many reasons, the most important of which is that a (note, channel) pair is not a good 'primary key' for a finger. If you bend a note past its maximum bend width, then you have to rewrite the note for certain, and also the channel, because of my choice to hop channels even on note-off to cover release time. So an abstraction over MIDI looks more like this:

beginOn finger0
express finger0 11 127
express finger0 42 127
...
endOn finger0 pitch vol polyGroup0
...
move finger0 pitch vol
...
express finger0 43 127
...
move finger0 pitch vol
...
off finger0

On a touchscreen, the only stable identifiers over the course of a gesture are finger numbers. The actual note and channel are a MIDI concept that we hide.

When we intend to turn a note on, we just do enough work to allocate a MIDI channel for it. Once we have a MIDI channel and know the starting values for the various per-channel CC expression parameters, we can send them - BEFORE the note is turned on. Then, when we have the pitch and are about to turn the note on, we send the bend value that will be required to make the right pitch - again, BEFORE the note is turned on. The channel pressure's initial value is also sent before the note turns on (in my case it's the same as volume). Then the note is finally turned on. As the note moves around in pitch and volume, it can easily hop around channels many times because of exceeding the maximum bend width. Then finally the note is turned off by finger; the finger knows which channel and note were used. This design makes it very hard to make mistakes that create stuck notes.
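A minimal sketch of that ordering, assuming a hypothetical putch() byte sink and a channel that has already been allocated to the finger (none of these names come from Fretless.c):

#include <stdio.h>

/* Sketch of the per-finger note-on ordering described above. putch() is a
   hypothetical stand-in for whatever writes a raw MIDI byte. */
static void putch(int byte) { printf("%02X ", byte); }

static void sketchNoteOn(int channel, int note, int bend, int cc11, int vol)
{
    /* 1. per-channel expression CCs, BEFORE the note exists */
    putch(0xB0 + channel); putch(11); putch(cc11);
    /* 2. the bend that makes the attack land on the right pitch */
    putch(0xE0 + channel); putch(bend & 0x7F); putch((bend >> 7) & 0x7F);
    /* 3. initial channel pressure (here the same as volume) */
    putch(0xD0 + channel); putch(vol);
    /* 4. only now, the note itself */
    putch(0x90 + channel); putch(note); putch(vol);
}

int main(void)
{
    sketchNoteOn(1, 33, 8192, 127, 100);  /* note 33 at the center bend */
    printf("\n");
    return 0;
}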

And of course, part of the lifecycle not shown is that a previous finger could already be down that the new one legatos to. In that case, the note-off part of the lifecycle simply transfers the note state onto the next finger.

Synth Engine's State Machine

The synth engine is actually really simple under this scenario. The synth doesn't have to know anything about the various polyphony modes, or whether to legato a note, or track much of anything. There is very little state to track.

  • Since we don't get the finger identifier (it arrives only as a (note,channel) combination), we keep track of each (note,channel) that is somewhere between turning on and still releasing. If you want to simplify the engine further, you can simply keep track of channels and insist that there is *never* more than one note per channel. That isn't realistic with other controllers feeding input, but it will work for your own controller, and it dramatically simplifies the engine.
  • For each finger (tracked as (note,channel)), keep track of the note phase, pitch, and volume. Note that these are *exactly* the main things that we want to have arbitrary control of over the life of a voice.
  • There is one pitch wheel per channel, and CC values are remembered per channel.
  • If we get a warning that a note tie is about to happen, then the next note off determines the source and next note on determines the destination. Once we have source and destination, we transfer all note state from source to destination. This includes aftertouch, CC values, current phase, current volume, and current pitch. There should be no audible change in voice characteristics, as we have basically only renamed the note. (Another approach could be that a voice independent of channel could have been used and the channel is simply re-assigned to that voice.) The new pitch and volume implied in the note on is something that we begin to ramp towards. What is most important of all is that the phase of the note matches as it is moved to the new channel. Other characteristics are kept the same to prevent impulsing.
  • We interpret pitch bending according to our current bend setting.
  • Note on and off all behave as normal.
  • Because polyphony was done in the controller, we don't need any rules for solo versus poly mode. There is a notion of polyphony groups in the C API that generates the MIDI messages, but it shows up as nothing more than notes turning on and off (possibly tied together) at the synthesizer end. We don't try to take advantage of note overlaps; the controller already figured this part out for us.
  • Because legato was done in the controller, we simply play notes with the attack unless it is a note on that's part of a note-tie. Similarly, we don't try to use note overlaps; the controller already told us exactly what to do. We only legato if the note has been tied to another.
  • Using CC per channel works fine, because if each finger is in its own channel then it's also per-voice by coincidence.
The internal engine is very simple. In my implementation it's just one function with a few parameters, and very little state is maintained in the engine; what state there is exists entirely per voice. (I simplified my own engine by handling only one note per channel and requiring polyphony to span channels.) So the complexity created by fixing these pitch problems lives only in the controller. It actually simplifies the synth.
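A hedged sketch of what that per-voice state and the note-tie transfer can look like on the synth side (the struct and handler names are mine, for illustration only; this is not my actual engine):

/* Sketch of the synth-side state described above: one voice per channel,
   a note-tie flag armed by NRPN 1223, and state transfer instead of a
   retrigger. All names are illustrative. */
typedef struct {
    int   active;     /* sounding or still releasing            */
    int   note;       /* last note-on number for this channel   */
    float pitch;      /* current rendered pitch (float MIDI)    */
    float volume;     /* current rendered volume                */
    float phase;      /* oscillator phase, must survive a tie   */
    int   cc[128];    /* per-channel controller values          */
} Voice;

static Voice voices[16];
static int tieArmed  = 0;    /* NRPN 1223 seen; the next note-off is a tie source */
static int tieSource = -1;   /* channel whose state will be transferred           */

static void onNoteTieNRPN(int channel) { (void)channel; tieArmed = 1; }

static void onNoteOff(int channel, int note)
{
    (void)note;
    if (tieArmed) { tieSource = channel; tieArmed = 0; return; } /* keep sounding */
    voices[channel].active = 0;   /* normal release */
}

static void onNoteOn(int channel, int note, int vel)
{
    Voice* v = &voices[channel];
    if (tieSource >= 0) {
        *v = voices[tieSource];   /* transfer phase, CCs, pitch, volume */
        voices[tieSource].active = 0;
        tieSource = -1;           /* no attack: just ramp to the new targets */
    } else {
        v->phase = 0;             /* fresh note: play the attack */
    }
    v->active = 1;
    v->note   = note;
    v->volume = vel / 127.0f;     /* ramp target for a tied note */
}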



My internal engine is driven by the MIDI messaging. I would have greatly preferred to ship a controller with no internal engine at all, though. The whole point of MIDI is co-existence. I could free up all of that memory, get rid of all of the extra code, not get rated on the quality of the sound engine (which should be the job of a dedicated synth), and never get requests for sound tweaks. The engine can literally add months of development to a controller that was long finished from the standpoint of MIDI. The fact that every iOS app has its own internal engine in spite of MIDI support suggests that MIDI is too hard to set up, and that the results of connecting a controller to a synth are too unpredictable to leave to whatever combination the user picked.

* Note to self * - Maybe we can allow 'legato' without any tied note, to let us start off in the sustain phase of a note, ramping up from zero volume without an attack. This could model a string that starts to resonate without being plucked, as happens with sympathetic strings. Perhaps a note could be started with the lowest possible volume (unfortunately, it can't be exactly zero, because the protocol treats velocity 0 as note off!) or started off with a later-than-normal phase. In the same way that I am using MIDI to do manual chorusing in places, it would be useful to do sympathetic strings at the controller as well, rather than as some sitar post-processing effect. But note also how easily we could get far beyond 16-voice polyphony like that.

A Tetrachord Example

A backwards compatible re-interpretation of existing MIDI messages can give us what we need. I will use a basic tetrachord (Ajam Bayati) as an example of how these problems arise, and of how to model them as MIDI messages. This tetrachord consists of a fundamental, a fourth, a minor third, and a note that falls roughly in the middle between the root and the minor third. In this notation we have the notes D, E-quarterflat, F, G. But this notation is just an approximation to the real intonation. It is likely that this is the real intonation that will be played:

  • If D is taken to be the fundamental, then its pitch ratio is 1/1
  • G is a pitch ratio of 4/3 with respect to D
  • F may be the perfect minor third 6/5 with respect to D
  • E-quarterflat has a few plausible choices, 13/12 is a plausible one, with respect to D
When changing modes (ex: a phrase moves up by 4/3 or 3/2), the whole tetrachord is played in these ratios relative to the new root. So the exact pitches move around based on root notes. Scales are a fiction that just doesn't exist in this system. Any attempt to remap the 12 notes to keys that turn on and off will result in some kind of failure to do what is required, because notes continually move around to fit the context.

If we take the simplest approximation that D is MIDI note 38.0, E-quarterflat is 39.5, F is 41, and G is 43, then the note 39.5 must be on its own channel to work at all. But what's even worse is that it doesn't really matter what these notes start as. Their future lifetimes are unknown, and they can all bend off in independent directions. Every note must always be on its own channel. Thus, we have 16-voice polyphony in MIDI if we want fretlessness. This is a reasonable limitation for one instrument, and we can have multiple MIDI devices for multiple instruments on iOS.
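A small sketch of how those ratios turn into the floating point notes used above, taking D as MIDI 38.0 (the helper name is mine):

#include <math.h>
#include <stdio.h>

/* Sketch: convert a just-intonation ratio over a floating point root note
   into another floating point MIDI note. */
static float ratioToFnote(float rootFnote, float num, float den)
{
    return rootFnote + 12.0f * log2f(num / den);
}

int main(void)
{
    float d = 38.0f;
    printf("E-quarterflat 13/12 -> %.2f\n", ratioToFnote(d, 13, 12)); /* ~39.39 */
    printf("F             6/5   -> %.2f\n", ratioToFnote(d, 6, 5));   /* ~41.16 */
    printf("G             4/3   -> %.2f\n", ratioToFnote(d, 4, 3));   /* ~42.98 */
    return 0;
}

Note that the just E-quarterflat lands around 39.39 rather than exactly 39.5, which is the point: the note names are only approximations to the pitches actually played.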

And, NO, tuning tables don't help here. Only fretlessness really does the job. It is easy to intermix straight 12ET playing with quartertones, and at some points to adjust pitches to Just Intonation (so that the quartertones don't cause horrible clashing, etc.). This always happens when there is an ensemble of fretless instruments in which the singer is doing Just Intonation but fretted instruments like pianos and bass guitars have been mixed in. The band often copes by relegating the 12ET players to pentatonic lines, or by adjusting their pitches to hit Just intervals instead of the 12ET notes that are technically out of tune. It is actually better to think of a series of approximations, where it takes about 53 notes per octave to accurately locate the various third- and fifth-related intervals, of which the 12-tone pitches are just approximations, and so on. In the fretless world, people who know what they are doing will move these pitches around at will. You can't just remap keys, because the number of possible keys will change. If you insist on frets, you still need a moveable fret system where the number of frets can be changed and the whole fretboard can be re-fretted at run-time. As a result, MIDI should just stay out of the music theory mess and play the required pitches. Scales and pitch snapping are the controller's job. MIDI interfering here is no different from trying to impose diatonic scales onto unknown musical input.

The Pitch Wheel Positions

The image to the right is a graphical representation of the pitch wheels. Straight up is MIDI channel 1, marked by the gold triangle on the top. Going around the circle, there is a gold triangle tick on channel 16 to show where the channel span ends. The blue triangles going in from the middle radius represent how far down the bend for that channel's pitch wheel is. The red triangles going out show a sharp pitch bend. The green rays sticking out show the channels that still have a note on in them. So, in this picture, we see six notes down in total. The pitch wheels are all over the place at slightly different values, because in this picture we are creating a chorusing effect by doubling each note as two MIDI notes slightly bent out of tune with each other. When we run our fingers up and down the screen, exceeding the whole-tone limit (what the bend happens to be set to), a single note will hop around from channel to channel, leaving each pitch wheel in the position it was in when the note was turned off. We cycle channels clockwise looking for the least used channel (hopefully one with zero notes active) and pick a channel for our new note that way. As we play fretlessly, all of the channels get left in unusual positions as play goes on. If I intentionally played everything about a quartertone high, most of these pitch wheels would be sticking out with a red triangle about 25% of the way towards the larger radius, to make all notes a quartertone sharp. It would match exactly what my internal audio engine is producing.
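A hedged sketch of that clockwise channel cycling (the names are mine, not from Fretless.c; a real allocator also has to respect the configured channel span and skip reserved channels):

/* Sketch: pick the channel for a new note by cycling clockwise and
   preferring the channel with the fewest active (or still releasing) notes. */
#define NUM_CHANNELS 16

static int notesOnChannel[NUM_CHANNELS];  /* active + releasing notes per channel */
static int lastChannel = 0;

static int allocateChannel(void)
{
    int best = (lastChannel + 1) % NUM_CHANNELS;
    for (int i = 0; i < NUM_CHANNELS; i++) {
        int c = (lastChannel + 1 + i) % NUM_CHANNELS;
        if (notesOnChannel[c] < notesOnChannel[best]) best = c;
        if (notesOnChannel[best] == 0) break;   /* a completely free channel */
    }
    lastChannel = best;
    return best;
}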

The idea that a note starts with bend zero and is only bent by the user later is a piano-ism that injects an unwarranted music theory constraint into a protocol that's supposed to be setting the pitches that I asked for.

Miscellaneous

Because the MIDI protocol at this level is considered as a fretless protocol, any fretting rules are outside the scope of this layer of the API. However, because it's fretless you can simply round off pitches in the controller to get back to strict 12ET pitches, or more subtly handle rules for:

Portamento

Because we are not constrained to moving between discrete pitches, and have complete pitch freedom, we want to dispense with the concept of portamento. Portamento is a discrete-key concept, and it produces wrong pitches. If you start from MIDI note 0 and increase towards MIDI note 127 at a rate of 1 MIDI note per second, you will fall on an integer pitch value once every second (on pitch and on time), and the transition between notes will be smooth. More importantly, the pitch will represent where your finger actually *is*. It won't lag behind as you move up one chromatic note per second, the way a portamento ramp does: an arbitrary ramp that is not a perfectly smooth bend overall. This is really only doable in practice on a continuous-surface instrument.

This is what I mean when I say that "portamento is wrong". It's a discrete-key concept that is not useful on string instruments. String instruments track where your finger actually *is*. In fact, at the controller there are three things to track with respect to pitch:

  • Where your finger actually is located (ie: pitch 33.001)
  • Where your pitch is tuning to (ie: pitch 33.0)
  • Where your pitch actually is, usually somewhere in between these two (ie: pitch 33.0005)
You need all of these pieces of information in the controller, because you send the last item, but use the first two items to render the user interface.
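A minimal sketch of keeping those three pitches separate per finger (the 12ET rounding and the fixed blend weight are placeholders, not the actual Geo/AlephOne tuning rules):

#include <math.h>

/* Sketch: the three pitches a controller tracks per finger. A real controller
   would use its own scale/fret rules and a time-based ramp instead of the
   plain rounding and fixed snapStrength used here. */
typedef struct {
    float fingerPitch;    /* where the finger actually is, e.g. 33.001    */
    float targetPitch;    /* what we are tuning toward,    e.g. 33.0      */
    float renderedPitch;  /* what we actually send,        e.g. 33.0005   */
} FingerPitch;

static void updatePitch(FingerPitch* fp, float rawPitch, float snapStrength)
{
    fp->fingerPitch   = rawPitch;
    fp->targetPitch   = roundf(rawPitch);   /* placeholder: strict 12ET snapping */
    fp->renderedPitch = rawPitch + snapStrength * (fp->targetPitch - rawPitch);
}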

Legato

The concept of when to play the attack portion of a note is separate from the rules for whether notes are turned on and off in mono/poly play. So legato refers only to when the attack portion of a note is played. In our API, we have a notion of polyphony groups, which are a sort of pseudo channel. The first note down in a polyphony group will definitely play the attack phase, and further notes down in this same polyphony group will do a legato continuation of the note. All notes in a poly group behave as in a solo mode. Note that every note still goes into its own MIDI channel; poly groups only specify how these notes get grouped, if the note-tie is a recognized message.

Polyphony

Polyphony rules include the standard solo mode (mono) that would be expected, and the full polyphony mode (poly) that is also expected. But in the spirit of doing away with enumerated special cases (which MIDI has a lot of), every note can be put into one of 16 polyphony groups. If every note is put into a different polyphony group, then we get "full polyphony". If they are all placed into the same group (ie: group 0), then it becomes solo mode.

If notes are grouped according to guitar strings, then we have something in between. When multiple notes are chorded within the same poly group, they act like isolated solo modes. But since each string is its own group, chording and hammer-on/hammer-off effects just happen as a side effect. By default, the legato rule is to play the attack on the first note and legato on the others. This rule can be toggled off to attack on every note (thus legato and poly are not identical concepts), or a controller can decide it per note (ie: via velocity, finger area, or some gesture to signify that the note should have an explicit "pick attack"). There is NO WAY to make an even moderately realistic string instrument rendition of MIDI notes without these poly and legato concepts. They are fundamental to functioning correctly.



For example:


//Put down one finger into poly group 0

beginOn finger0
express finger0 11 127
express finger0 42 127
...
endOn finger0 33 vol polyGroup0


//Put down another finger into poly group 0

beginOn finger1
express finger1 11 127
express finger1 22 127
...
endOn finger1 35 vol polyGroup0


//Put down another finger into a different poly group 1

beginOn finger2
express finger2 11 127
express finger2 33 127
endOn finger2 35 vol polyGroup1
...
move finger0 pitch vol
...
express finger0 43 127
...
move finger0 pitch vol
...

off finger1
off finger0
off finger2

We end up with chording and solo-mode-like tricks happening simultaneously. finger0 and finger1 are in the same poly group, so the original pitch 33 has a finger with pitch 35 buried over it. When finger1 goes down, finger0 is silenced. When finger1 comes up, finger0 turns back on again - without a pick attack(!). This is like solo mode. But after finger1 went down, finger2 went down, and finger2 stays on until the end. So we have soloing and chording at the same time. This happens all the time on guitar, where one string will be hammered on while other strings ring through. Since Geo Synthesizer and AlephOne are two-handed instruments, it is a very common thing to do: one hand will play riffs with trills and hammer-ons, while chords are held down or created on other strings.

The beauty of doing it this way is that the synth is completely unaware of modes of any kind. They simply don't exist. The synth is polyphonic and stays that way. But from the controller we explicitly turn notes on and off, and explicitly tie notes together to play a note without a pick attack. This is a crucial element of string instrument gesturing.

Note that a slightly different way to render this whole sequence would be to assume that we have a very large pitch bend that can cover any possible range, as in the example of every channel simply playing MIDI note 64 with a pitch bend of +/-60 or perhaps +/-64 semitones. Instead of playing new notes, we can simply bend 33 to 35, because they are in the same poly group. This is a suggestion made by the MIDI Manufacturers Association. It is simpler to implement if you don't want to add note-ties, and my objection about the lower pitch resolution required to do it that way might not be as big a problem as I have always assumed. The question is what the coarsest acceptable pitch resolution is, given that we need to be able to represent exact pitch intervals without beating, and to double voices to get chorusing effects. (Is it within 1 cent for any interval made by two notes?) For note 64 as the center with +/-64 semitones, the entire set of available pitches in that system is, I think (I haven't checked it):


c0 * 2^( (128*n/16384)/12)


For pitch bend n from 0 to 16383, where c0 is the pitch of MIDI note 0, with 16 independent voices per instrument that can do this.
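As a rough check on the resolution question above: with a +/-64 semitone range, each of the 8192 wheel steps in either direction is 64/8192 of a semitone, about 0.78 cents, so it does come in just under the 1-cent figure. A tiny sketch of the arithmetic:

#include <stdio.h>

int main(void)
{
    /* 14-bit pitch wheel: 8192 steps in each direction from the center. */
    float semis = 64.0f;                           /* assumed bend range    */
    float centsPerStep = semis * 100.0f / 8192.0f; /* ~0.78 cents per step  */
    printf("cents per bend step at +/-%.0f semitones: %.3f\n",
           semis, centsPerStep);
    return 0;
}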

Unison

This should not even be an issue, except that when starting with a piano keyboard as the model for what an instrument is, there is a mistaken assumption that notes of the same name (or even just the same pitch) are (or can be treated as) unique. It is completely expected to have multiple versions of the same exact note at multiple locations. So there should not be situations where we remember a note by its note number and look it up to do something with it later. We use the (note,channel) pair, and we can treat *that* as unique only if every note can be placed into its own channel. But when mashing this fretless protocol down into one MIDI channel, we run into the problem again. We deal with it by sending a note off for a MIDI note before retriggering it a second time. Ie:

#what we want to send
ch1 on 35 127
ch2 on 35 127
ch3 on 35 127

ch1 on 35 0
ch2 on 35 0
ch3 on 35 0

#what we have to send because it's all mashed down to 1 channel
ch1 on 35 127
ch1 on 35 0
ch1 on 35 127
ch1 on 35 0
ch1 on 35 127
ch1 on 35 0

Note that when we overfill MIDI channels with more than one note, we have multiple problems to deal with. The first is the obvious problem of having one pitch wheel per channel, so that notes are not independent. The second is less obvious: we have to render messages in a different ORDER to get correct behavior. Supporting fewer channels than the polyphony creates a lot of complications like this in the DSPCompiler code.

CC Transfer

One of the problems with abusing channels to get pitch bend independence is that if you move a note from one channel to another, you need to also move all the per-channel values for the other expression parameters. So, just as the pitch bend is remembered and altered if it must be, the same must be done for channel pressure and for per-channel expression values (of which there may be dozens, unfortunately!). If a dozen CC values are known to differ between channel 5 and channel 6, then when there is a note-tie from channel 5 to channel 6, all dozen differing CC values need to be sent. The Fretless protocol is *not* doing this. It is something that should be added, though right now only one fixed CC is being used in practice (CC 11).
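A hedged sketch of what that transfer could look like at tie time, assuming the sender remembers the last value it sent for each (channel, CC) pair; none of this exists in Fretless.c today:

/* Sketch: when a note migrates from oldChannel to newChannel, re-send any
   per-channel CC whose last-sent value differs on the destination channel.
   putch() is the hypothetical MIDI byte sink; a real version would skip the
   RPN/NRPN and data-entry CCs. */
static void putch(int byte);
static int lastSentCC[16][128];

static void transferCCs(int oldChannel, int newChannel)
{
    for (int cc = 0; cc < 128; cc++) {
        int v = lastSentCC[oldChannel][cc];
        if (v != lastSentCC[newChannel][cc]) {
            putch(0xB0 + newChannel);   /* control change on the new channel */
            putch(cc);
            putch(v);
            lastSentCC[newChannel][cc] = v;
        }
    }
}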

iOS Specifics

One of the things that is specific to iOS that must be fixed about MIDI is that setup is a horrible experience, just like on hardware devices. The kinds of terrible VCR-programming-like experiences offered by hardware devices are not tolerated by iOS users at all. Specifically:

  • The user has to know far too much about how MIDI works to get it setup
  • Correct pitch handling per finger is not an out of the box experience (what almost all of this document is about), which means at least: a bend width, channel span, channel number setting, and something to disable pitch bending for instruments that just can't handle it right. Most users don't have the faintest clue as to why they must span channels to get correct pitch handling, because it's a workaround for a deeply non-intuitive limitation. (4 unnecessary controls at least)
  • There is no real reason to stick a bunch of MIDI devices on a small number of channels, as VirtualMIDI can add multiple devices as easily as adding multiple channels. When talking to external hardware, it makes sense to span a small number of channels to make room for more instruments. But what really matters on iOS (the 90% case) is the issues that pertain to what can happen all inside of one iOS device.
  • The inconsistency of things like channel pressure handling (ie: is it a continuation of velocity change?) means that you can't ship an instrument one way. You need a switch to turn it off if it misbehaves versus some synth. (at least 1 more unnecessary control).
  • Modes are set in the synth for things that should be controller-driven. My example here is the legato and polyphony issues. The controller has plenty of CPU power to handle these things, and it should be defined there rather than in the synth. The use of modes prevents per-note expression types that arise on expressive controllers.
  • Capability negotiation should be one of the very first things to happen in a MIDI session. We should never be sending device and vendor specific identifiers unless the idea is that everything is so buggy that we must code to specific devices. Otherwise, a protocol for negotiating what the two sides expect of each other should have been one of the first things to go into the standard. If you can negotiate a small set of required things between controller and synth, then you know that things should work between them, and they can keep things simple internally. You need to know if something you send, or expect to receive, is not going to work.
  • Once there is capability negotiation, there should be a way to know that you can query for knobs and sliders for the current patch. On iOS, you are not dealing with a hardware box plugged into a controller where you can physically touch *both* of them at the same time. You end up with a controller in the foreground, and that foreground controller needs to present the synth's controls on the synth's behalf, because the background synth is unreachable in real time. The names of knobs and sliders are going to be patch specific, and the knobs themselves will come and go depending on the patch and synth chosen. So the common MIDI wisdom of picking names from pre-defined lists won't work. Current MIDI has a chronic pattern of enumerating many known cases rather than coming up with a few general mechanisms that combine well. That made sense at the time, but controllers are no longer dumb controllers. They are fully functional computers that should be taking on most of the complexity, while synthesizers are more specialized signal processors that don't really have their own interfaces now.
  • These patches might be loaded from the controller (ie: pick a file from foreground app) and pushed into the synthesizer. They will be defined versus some standard, possibly something based on Pure Data (Pd, libpd, Max/MSP) or CSound, or something iOS specific (related to AudioBus?). In any case, a capability to get and push patches into the synth should be an obvious and consistent part of the standard. Currently, developers end up trying to create and hardcode all possible patches for users, rather than letting the user community build most of that.
  • On iOS, it's clear that the synthesizers will start to get "cored out" so that the present situation doesn't continue as it is now. Ironically, every app defines its own internal synth (usually not the greatest synth engine) and a controller (usually a keyboard) in spite of having a MIDI implementation, in addition to having an in-app record facility (a DAW too!?). This represents an enormous waste of effort. It requires too much user interface code to be written, requires too much synthesis knowledge, and causes controllers to limit themselves to the most common forms (ie: keyboards). When MIDI is actually doing its job correctly, controllers will stick to controlling, synths will stick to synthesizing, and DAWs will stick to recording (via AudioBus?). All of these things will evolve independently, and much faster as a result.
  • This is very similar to how OpenGL ES 2.0 "cored out" the graphics APIs: it stopped enumerating all possibilities and introduced lower-level primitives and a shading language. The MIDI primitives need to be much more strictly defined, with allowances made to use those primitives to build unforeseen possibilities. This is the opposite of the current situation, which leaves many things up to the implementer, and which causes setup headaches because extremely basic things are not consistent among devices.
A Capabilities "Shell"

This is a hypothetical notion that I don't have code for at the moment.

A major part of the problem with MIDI is its combination of complexity, fragility, and ambiguity. The ambiguity is sometimes touted as a source of flexibility, but there is only a guarantee of reproducing the gestures coming out of the hardware, and no guarantee that the same signals will produce a reasonable result against different synthesizers. Because there isn't a common mechanism to negotiate (or simply dictate to the synth) a "language" during a MIDI session, there is no guarantee that the bytes being sent from controller to synth are interpreted as intended. The current way around this is to put in device-specific hacks involving setup by the user, or proprietary messages defined between specific controller/synth pairs. So, now I am going to define a hypothetical mechanism by which we can deduce the minimum language that will be understood by the other side, without user setup in most cases. The other goal is to factor common sub-languages out into standard extensions, so that actual proprietary messaging is rarely needed.

So suppose that all new MIDI devices can listen for just one type of SysEx message used for this purpose. Devices that do not respond to these SysEx messages are taken to be old MIDI devices. Take this example:

c -> s: i can read [negotiation]
c -> s: i can send [negotiation,noteOnOff,bend[2..64],noteTie]
s -> c: i can read [negotiation,noteOnOff,bend[2..60]]
s -> c: i can send [negotiation]

In this case, the controller should note that there is no point in sending note ties, and should try to use a bend width of 60 to create legato phrasing. We have no idea what vendor, synth, or hardware is on the other side; that's irrelevant. What we need to know is what we can *send* and expect to be understood, and what we can allow to be *sent* to us. It amounts to specifying a legal state machine for the traffic between the two devices. How about this:

c -> s: i can send [negotiation, OSC-pitchcontrol,...]
..
s -> c: i can read [negotiation, OSC-pitchcontrol, ...]
..
c -> s: expect OSC

If we have a negotiation protocol, then we can use it to abandon MIDI completely after the negotiation phase. This is a way out of the backwards compatibility mess. Switching protocols like this isn't a problem, because we *negotiated* it, so it is guaranteed to happen only if both sides agree. The command to expect OSC is "proprietary", but it is known to be handled, because OSC-pitchcontrol says that it will be understood. What is most important is that each side can implement the minimum protocol required, plus the overhead of a negotiation shell, rather than having to implement the whole MIDI spec.
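Purely as an illustration of the idea (I don't have code for this either), here is a sketch of what such a message could look like. The message type byte and the ASCII capability payload are invented, and the 0x7D manufacturer ID is used only because it is reserved for experimental, non-commercial use:

#include <string.h>

/* Hypothetical: one SysEx message carrying an ASCII capability list. */
static void sendCapabilities(void (*put)(int), int msgType, const char* caps)
{
    put(0xF0);               /* SysEx start                         */
    put(0x7D);               /* experimental manufacturer ID        */
    put(msgType & 0x7F);     /* e.g. 0x01 = "i can send" (invented) */
    for (size_t i = 0; i < strlen(caps); i++)
        put(caps[i] & 0x7F); /* keep every data byte 7-bit clean    */
    put(0xF7);               /* SysEx end                           */
}

/* e.g. sendCapabilities(putch, 0x01,
        "negotiation,noteOnOff,bend[2..64],noteTie"); */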

Some standard capability namespaces we could expect would be ones that expose the knobs and sliders of the synthesizer back to the controller, giving them names and default values and keeping them in sync. Under iOS, it is very important to be able to proxy a synth parameter in the controller, because the synth is sitting inaccessible in the background.

OSC

What I describe sounds a lot like OSC in many ways. The main issues are how to be at least pseudo-compatible with MIDI, and how to take advantage of the fact that a background MIDI pipe exists on iOS while a background OSC pipe does not (nor do many synths that are even pseudo-compatible with OSC). If trying to use MIDI this way becomes overly complicated, or the attempts to fix it are too incompatible to matter, then something will emerge from the iOS world. If it isn't OSC, then it will probably be something similar smuggled over AudioBus. iOS spawning a pseudo-compatible-with-MIDI standard sounds like a bad thing; but realistically, there is no point in being compatible when the required scenarios can't be made to work. The only options would be to completely forget about any kind of MIDI compatibility, to make OSC finally take hold, or to make something entirely new happen.

Wednesday, April 4, 2012

Tumbleweed

Meh

AlephOne had been coming along swell. See videos of it here:


Especially this one, which is AlephOne controlling ThumbJam over MIDI, where the chorusing effect is literally done by MIDI note duplication rather than as a post-processing effect:


It was designed to be a clean re-design of everything I learned doing Geo Synthesizer, Pythagoras (the initial Geo code), and Mugician, with all the icky parts rethought. It was largely successful in that regard, from the standpoint of cleaning up bugs and making the code much more understandable (specifically, isolating components nicely). But between worldly distractions having nothing to do with apps, the utter crash in end-user interest a few months after Geo was released, and watching what others release, I need to step back and do some soul searching.

First, I released some of the important parts of AlephOne (my private project that I only shared with a few people) on github, without any licensing terms written (I tried a license with Mugician, and they don't matter unless you really want to get lawyers involved at some point.) This code is at:


The Python vDSPCompiler

It's a mash of two ideas. The first was an abortive attempt at automating the process of turning my synthesis code into SIMD instructions that would greatly speed up AlephOne's internal engine. It was the beginning of a compiler, written in Python, that ingests a LISP-syntax language and generates vDSP instructions to render the entire audio buffer in parallel by sample. I had most of the timbre of AlephOne written by hand when this started. Until I figure out a few parts that would let me generate the entire effects chain (reverb, chorus, etc.), the DSPCompiler class isn't of practical use. For now, trying to automate it is a giant distraction; I might get back to it later.

Fretless MIDI

The second part is VERY important. It is the MIDI code that is used in AlephOne. Fretless.* and DeMIDI.* are the functions that generate and parse MIDI from input gestures. Fretless implements all of my rants about everything that needs to be fixed with MIDI. It lets you treat MIDI much like OSC and frees it from being stuck with twelve-tone scales and bent notes. MIDI's abstraction is appropriate for piano-like keyboards, and hideously, disastrously wrong for everything else. MIDI's design is liquid brain damage that gets injected into every effort to create a fine expressive instrument. It forces you to choose between stupid pitch handling and stupid polyphony. Attempts to fix it with MIDI HD look like they will be incompatible, yet still tied to MIDI's past - the worst of both worlds. I am bound to MIDI just because, right now, it's the only reasonable IPC between synths and controllers. You can abuse it and take advantage of ambiguities in the spec to get very close to full polyphonic pitch control, at the cost of moving all complexity into the controller and being willfully incompatible with a lot of stuff. So that's what Fretless.* does. It was explicitly designed against ThumbJam, SampleWiz, and Arctic.

So if you are designing an app that wants a fully bendy instrument with extremely natural pitch control, then this code should clear up what the hard issues are when rendering to MIDI. It boils down to creating a floating point representation of MIDI note numbers, at the cost of being limited to 16-note polyphony (by forcing one note per channel and using up to 16 of them). There is no notion of notes in this API. Rounding off pitches is the job of the controller (doing it anywhere else is premature rounding that loses important information). Generally, you need to keep separate the notions of what pitch the gesture actually implies (always slightly out of tune), what pitch you want to fret/autotune to (the "chromatic" notes and scales), and what pitch is actually rendered (somewhere in between these two things). Because touch screens draw the interface underneath the playing surface, all of these things must be available in the controller. The synth really only needs to know what is actually rendered. MIDI gets it backwards, because in 1980, controllers were dumb devices and the synths had the brains. It doesn't actually work like that any more. If nothing else, the new mobile device paradigm is to *install* the patch into the controller/synth device and avoid talking over the network, routing the audio instead.



I added a new notion of legato and polyphony handling, because it's absolutely necessary for string instruments. Polyphony isn't a mode for the instrument; it's a gestural phenomenon that depends on the instrument, and it is created at the controller. It is similar to channels. The instrument itself has to span many channels (to cover active and releasing notes) so that bends can be completely independent. But you can put notes into polyphony groups, which control when notes are silenced and re-enabled for solo-mode and string behavior such as trills. Because it is not a synthesizer/controller mode, it makes allowances so that chording and polyphony can be done at the same time. Related is legato, or whether the note attack is re-triggered. Generally, the first note down in a polyphony group plays the attack and every other note is a continuation of the current phase. But note that legato and polyphony are separate. On a real string instrument, the decision to pick or legato a note is made on every note - it's not a mode for the instrument that gets turned on or off.

Most importantly, it implements note-ties for MIDI. This is such a fundamental concept that, had it existed in the standard, most of the other brokenness of MIDI might never have happened. MIDI allows the definition of pitch bend width to be changed, because it assumes that you are on some kind of keyboard with a pitch wheel. But pitch wheels have a top, a bottom, and a center position, and the position is 14 bits. The standard interpretation is plus or minus a whole tone, and many synths let it be increased to 12 or 24 semitones up or down. This still means that you can't do arbitrary bend sizes (let alone *independent* bends of arbitrary size). If you have note ties, then you can dispense with all of this nonsense. If you bend a note A up to the A# position, you can do a note-tie between the bent-up A and a new note named A#. You can continue to do this for as many octaves as you want, at full pitch resolution. This is exactly how written music notation behaves as well. Standard synths that don't understand the note-tie will simply retrigger the note as it is bent beyond its full up or down position. ThumbJam, SampleWiz, and Arctic understand these note ties.

I think the internal MIDI engine against these synths is just awesome, especially on the phone. But experience with Geo suggests that maybe 10% of all people downloading the instrument can figure out how to use MIDI at all, and of those, maybe 10% ever get an understanding of why channel cycling is necessary. So if I release AlephOne as is, I will get pummeled with a lot of "why doesn't this just work!?" complaints from people who can't figure out the tool. (Every hardware MIDI synth is different, like VCR programming, so there is no manual you can just follow; you have to know what you are doing.) It's like trying to sell circular saws to the people who need them without getting sued by the people who should not have bought them. I am running out of UDID slots, so I have to either just release it as it is or shelve the project, maybe until AudioBus provides better options. (And Oh It Might!)

Original rant here:


Internal Audio Engine

I am here because I got a bit stuck and distracted when it came time to move away from just doing the MIDI part and to build my own internal engine to consume the MIDI. I had to do this because no existing MIDI engine really does the behavior I need 100% correctly. So I go back into the endless task of listening to minor changes with headphones on and tweaking parameters to get the internal engine code sounding good and performant. I have no idea when AlephOne will be released, but I have always had a small number of highly enthusiastic users. I am looking for inspiration at this point. I have fooled around with bringing libpd back in (which Pythagoras actually used at one point) and with trying CSound. I will see where it goes.

But until then, take a look at DSPCompiler if you are using MIDI and have read some of my rants about it. They aren't theoretical problems. I would move on to OSC if I could, but for talking to other apps in the background with low latency, MIDI (or an abuse of it) seems to be the only option right now.


Wednesday, February 1, 2012

Disconnected Procedure Calls


A few years ago, I had to write a remote procedure call (RPC) system to talk between two systems that needed to handle very spotty availability and high latency for the calls. I was still playing online Go at night during that period of my life. Go is a perfect information game with similarities to Chess. I had started a correspondence game, where each side makes a move only once every few days most of the time, but might have a burst of activity and make dozens of moves in one sitting. I had never actually finished a game, because the latency is absurdly high, given that you have to wait for the other side to get around to making a move. It can take a month to get through a game, which can run to 300 moves or so. A major flaw in this sort of system is that you cannot opportunistically upgrade to playing in real time for the few times when both players happen to be online at the same time. Chess has far fewer moves on average, but the problem is quite similar. This problem of making communications robust in the face of constant outages and absurd latencies struck me as being very similar to a correspondence game. It is precisely what happens when everything moves to mobile devices (including some that have no cellular access, like a wifi-only iPad).

I also work at an international company where this phenomenon can really kill us for cooperating on projects. Due to time zones and a different weekend schedule, it can be quite normal to come into work to an ambiguous email requesting a code change. Often it is best handled by simply implementing *both* interpretations of how to write it in a pair of branches, and wait for the reaction to a smoke test in the morning. When I get a reaction back on the two ways of doing it, I can commit one of them to the repository immediately. The benefit of this is that I can reorder getting clarification with getting the fix written, and avoid the artificial latency created by waiting for clarification.

A similar scenario would be getting driving directions from one location to another, and pulling as much of the expected data as possible before we find ourselves half-way through the trip with no connectivity. If we stay on course and were able to retrieve all of the data, we can make it through the trip with no more connectivity. If we stray off course, then we will have to sync with the server for more data.

This is all about designing for a minimum number of contacts with the remote system, and designing the procedure call mechanism so that this is the simplest and default case, rather than something that has to be painfully hand-crafted into each application in a specific manner.

Question and Answer Weaving To Speed Up Protocols

If I am to play Chess with a remote opponent, I will often have a high-level strategy in mind. It is not strictly necessary for both sides to take turns making single moves. This is an analogy to network packets going back and forth between host machines. When the game begins, both sides will have an opening plan in mind. The first player can submit a tree with the first move committed, plus a tree that covers all of the expected responses to the opponent's moves. When the second player gets this message, he will only see the first move. He will come up with his response, but also send a tree of all expected responses to the first player's responses. They are exchanging partial game trees. When a player gets a game tree from his opponent and makes a move against it, another move is made against him without any more contact with the remote player, and this continues until a place in the tree is reached where the remote player could not foresee the move. So, to both players, it may be hard to tell whether the opponent is actually online at the moment. They simply play a game of Chess with a remote opponent. While they are thinking, they can game out the whole tree of expected moves if the opponent is taking too long to respond.

At the point where an unforeseen move is made, the game stops progressing, and both sides go back to making their game tree responses at their leisure. This process continues until the game ends.

Remember that we assume that both opponents are almost never online at the same time. The beauty of this is that the game can proceed at a much faster rate than would be allowed if players only progressed at one move at a time. If player one responds at breakfast, and player two responds at lunch, then there would normally never be more than 2 moves in a 24hr period. But if both sides can predict responses at an average of 4 moves out, then the game proceeds at 4x the maximum rate imposed by moving one move at a time.

In a communications protocol, exactly this sort of thing is going on. A packet of data is sent to the other machine and we wait for a response. If we knew all possible responses that we would get back, then we could also send responses to those messages as well. Under this scenario, we assume that the latency between the machines is so high, that sending whole trees of responses (or some function that generates the tree) is trivial in cost.

Now Imagine Two Chess Engines Playing Each Other

If we had two humans cheating with Chess Engines (or simply taking a long time to come up with their game trees), then they could exchange very deep trees with each other and make a lot of progress. Boring sequences (ie: long predictable chains that aren't really avoidable) would be exchanged in large chunks, and the game would only have to wait when something surprising happens. If both sides had the same chess engine, then the trees of responses might be very easy to compress into small messages.

Now Imagine That The Network Is Down 99% Of The Time

The main motivation for this style of communication is to mask network outages or the unavailability of the opponent. Take the example of a cell phone in airplane mode, but with Wifi enabled. One of the opponents is traveling in a foreign country where normal connectivity is turned off to avoid high phone bills, but there will be bursts of connectivity (open Wifi) available for short periods of time. Presuming that there is some predictability to the conversation that will be had, boring sequences can be skipped over to speed things along.

If the network is almost always down, then the remote procedure call must be designed to work asynchronously. Furthermore, both sides must write to a local message queue. When the network is down, we work normally and pretend that we will eventually get a response to everything we are doing that has a question for the other side. When connectivity is found, the two sides reconcile the streams of messages they have for each other, making progress.

Every Remote Call Must Have A Timeout

When an asynchronous call is made, a timeout is mandatory, even if it is very long. The timeout must be within the time that we are guaranteed to remember that we made the request and are waiting on an answer. When a timeout happens, we will get an exception from our local queue. This exception will cause effects similar to what would happen if the remote end actually got the message and had to report an error.
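A rough sketch of what a queued request record and the local timeout sweep could look like in C; none of these names come from a real library, they are just illustration:

#include <stdbool.h>
#include <stddef.h>
#include <time.h>

typedef enum { PENDING, ANSWERED, TIMED_OUT } RequestState;

typedef struct {
    long         id;        /* persisted with the queue so it survives reboots */
    time_t       deadline;  /* absolute time by which an answer must arrive */
    RequestState state;
    void (*on_done)(long id, bool timed_out);  /* local callback / exception path */
} QueuedRequest;

/* Run this whenever we wake up or reconcile with the remote side.  A timeout
   is raised locally, from our own queue, with the same effects as if the
   remote end had received the message and reported an error. */
static void sweep_timeouts(QueuedRequest *q, size_t n, time_t now)
{
    for (size_t i = 0; i < n; i++) {
        if (q[i].state == PENDING && now > q[i].deadline) {
            q[i].state = TIMED_OUT;
            q[i].on_done(q[i].id, true);
        }
    }
}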

Servers Reboot Constantly and Take Forever To Respond

If all progress is written into a persistent queue, then we can take the persistent queues to be the major state of the system. As long as all parties sync with each other before passing any timeout thresholds, no errors are observed, and the system makes progress as designed. It should not matter how often the remote computer is turned on or talking to the network. It could be a hand-held device that is turned off when it is not actively being used.

What RPC Needs To Look Like In This World

Synchronous calls that fail because something remote is currently unavailable just produce lots of "errors" that aren't really errors. You can't design protocols for mobile systems like this without implicitly forcing them all to drain their batteries and run up phone bills polling the network. I am saying that the current design of the web is deeply flawed now that we aren't all sitting at desktops. If you visit a website from a mobile device, and the website can state that it doesn't expect the data to change for another 24hrs, then it is reasonable to expect that for the next 24hrs you can continue to view the data without a network connection. Currently, almost no web site and browser combination in existence behaves like this.

These systems must base their communications on persistent local queues that will drain data to (and reconcile with) the remote systems when opportunity arises. It is similar to a queue based replication scheme. Data already gotten needs to be smartly cached, and exchanges of a conversation need to be smartly predicted.

Communications API

I would expect that an in-browser JavaScript version of this sort of library would be highly useful, especially when designed for mobile apps: web browsers talking via an async API mediated by queues. (Ajax, except it's not an error if the remote end isn't available; it's only an error if the remote end can't respond in time. And the message queue and implicit app state need to survive reboots(!!), because it can take days to get a response in some cases.)

//Tell remote that we would like to run its makeMove with given args,
//and invoke makeMoveDone with its response, and consider the game over if we don't get
//a response in two days.
dpc.invoke( opponent, game0, makeMoveSequence("(e1 (c6 (e2)))"), makeMoveDone, TwoDaysFromNow() );
//When makeMoveDone is invoked, we may have made multiple moves of progress,
//all committed by the remote end at his leisure at different times (ie: an hour apart for each
//response.)

makeMoveDone might get an exception argument back, either due to the server being unable to handle the request, or simply from the time limit being missed. Besides guaranteeing that either an exception or an answer eventually comes back, the weaving together of sequences of moves to minimize latency is the important part.

Summary

I know that existing message queue libraries address some of these requirements, but I am not so sure that anything exists to weave together questions and answers in a general way. Very few internet systems are designed from the ground up to assume that the two parties are almost never online at the same time and that connectivity comes only in rare bursts, while still letting applications be built as if the other user were actually available (but very slow to respond). The benefit of doing it this way is that the periods in which both users are online suddenly allow rapid progress, while things still proceed smoothly when that is not the case.

Tuesday, December 27, 2011

MIDI and Over-Abstraction

Let me know specifically where this is wrong. I have actual code that works, so even if it's wrong, it's not theoretical - it works very well.

I am getting my thoughts together on the rationale for doing MIDI the way that I have. The feedback I get on what I am trying to do shows a divide between people that worry about keeping hardware simple versus those trying to keep software simple. It is instructive to first talk not about MIDI and sound rendering, but about OpenGL and drawing APIs.

Cooked Standards

The OpenGL that I am used to is the desktop 1.x version of it. It provides a state machine that feels a lot like an assembly language for some real hardware device. But it is not painfully low-level, because it provides the things that you are almost guaranteed to need in every app. The basic idea is that it does buffering and matrix transforms for you, and you just feed it what you want. For example, in pseudo code:

loadIdentityMatrix();
setCameraPointLookingAt(x,y,z, atX,atY,atZ);
turnOnLightSource(lightX,lightY,lightZ, lightR,lightG,lightB);
...
push(TRIANGLES);
vertex(x,y,z, r,g,b);
vertex(x+dx,y,z, r,g,b);
vertex(x+dx,y+dy,z, r,g,b);
...
pop(TRIANGLES);
...
render();

What is really cool about doing things this way is that you don't have to supply all of the equations for moving points around in a 3D projection. You don't have to build any kind of API to buffer data. These things are provided for you, because you will need this functionality in every app. But newer versions of OpenGL threw all of this away. The main reason is that the huge number of APIs needed to cover every scenario for lighting and coordinate transforms keeps growing. Their existence makes the spec unstable. These 'fundamental' things such as lighting, buffering, and coordinate transforms are really conveniences that should be out of the standard.

Raw Standards

The problem with this is that something like OpenGL needs to make the implementation as low-level as possible, while still allowing broad compatibility. So the requirement for programmable shaders does not mean that shaders should just be shoehorned into the existing specification. What they did was to introduce an even lower-level API as the new standard, where old code is essentially emulated on top of it. The new APIs are kind of like this:

setVerticesArray(arrayOf3DVertices);
setColorsArray(arrayOfColorsForThe3DVertices);
setNormalsArray(arrayOfNormalsForThe3DVertices);
...
setVertexShaderProgram(compiledVertexShader1);
setFragmentShaderProgram(compiledFragShader1);
...
doRendering();

So if you want the old-code look, you use an API that looks like the old code but ultimately turns into this underneath. If you need the new features, then you write a layer inside your framework that approximates the old APIs where it can, and goes its own way where it must. This keeps the specification from being larded with everything that the designers didn't think of. The hardware has a smaller burden of what it must recognize, as does the OpenGL graphics driver. But the burden has moved to the programmer, who uses (or writes) a library on top of it to get going. In short, they solved the stability problem by making the primitives more general.

Newer versions of OpenGL recognize that, in the end, all of this code configures shaders in the graphics pipeline and builds buffers of different kinds of vertices. So they don't even bother providing 'fundamental' things like lighting, buffering, and coordinate transforms. The burden is on the developer to create these things. This is not a hassle if OpenGL developers use a standard software library above the raw library. All that has happened is that complexity moved out of the hardware (or at least the API that hides the hardware) into the client software.

VRML and other SceneGraph APIs went nowhere. They provided really easy abstractions, at the expense of creating objects that were too high level. But in the end, the ability to simply make shaded/textured triangles won. Making the API too high level simplifies things for the casual user, at the expense of imposing limitations on anybody who needs to get closer to the hardware. The limitations need to be worked around, so garbage creeps into the spec because the spec is too high level. For OpenGL, staying very low level is the strategy for having the spec withstand change.

It makes applications more complex to build, but in exchange the parts of the spec that are subject to much variation are taken out, and only the primitives remain.

MIDI Is Too High Level, Making The Spec Too (F-ing) Large

MIDI seriously suffers from this over-abstraction problem. Because hardware devices speaking raw MIDI with very little computational power on-board are the primary focus, it is understandable that the model is of turning notes on and off and bending them. In this view, it isn't the controller's job to: 1) determine exact pitch, 2) manipulate note on/off to create solo mode, or 3) manipulate the envelope of a note. This view is tightly tied to the assumption of discrete keyboard keys that send a MIDI message when the key goes down. The idea of completely continuous controllers could be shoe-horned in as an incompatible addition to the spec, but that's not a good thing, because it takes an already complex spec and makes it more complicated without even being backwards compatible. What would be better is to make the primitive elements more general, and let controllers handle it themselves in a backwards compatible way.

MIDI isn't primitive enough. The abuses of the spec that are currently possible, but dicey, need to be standardized so that they are totally expected and legitimate. Note on/off are clearly fundamental. Bends are fundamental, but their semantics are underspecified. You should be able to compute a frequency given a note and a bend value, and work in actual frequencies. This is because frequencies are fundamental, even though MIDI doesn't quite get this concept. Envelopes on the sound are fundamental as well.

Bend And Legato

Because almost all MIDI devices reliably support only the on/off/bend messages, it's foolish to implement huge parts of the MIDI spec and simply demand that synths and controllers understand all of it. This is especially true on iOS, where such demands will simply result in incompatibility, as developers will implement only what they need to ship. I am sticking to what is known to work everywhere, and then using an NRPN to putty in the cracks in such a way that it is not a disaster if the synth doesn't understand it. This is analogous to rendering 3D graphics as simple triangles when a special shader is unusable. This principle is also why web pages are not usually completely broken when different browsers are used against non-portable pages. A wrong pitch due to an unrecognized message is far more debilitating than getting the right pitch with a note re-trigger at an inappropriate time, especially because many patches don't have a noticeable attack anyway.

Fundamental Elements

So, these are the fundamental elements to build up a correctly functioning MIDI synth that has control over pitch, polyphony, and legato:

0x90 - note on/off
0x80 - note off (I send 0x90, but a synth should recognize 0x80 as equivalent)
0xE0 - bend (the pitch bend range setting still applies)
0xB0 (NRPN) - a special 'note tie' NRPN that states that the next note off and note on pair are actually tied together

The first rule is that there are no real 'notes', only frequencies that we make with a combination of MIDI note number and bend. We don't try to re-tune the notes, but use pitch bend in combination with the note to get the exact pitch. We also place all current notes into unique channels, and try to behave reasonably when this is not possible (ie: channel bend reflects the last note down). This is the only really reasonable way to do this, because the note on that we choose is what we *name* the note. This note name is what a piano that doesn't understand bends will play if it's asked to play our note.
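As a sketch of what that naming means in code (this is not the actual Geo/AlephOne source, and it assumes A4 = 440Hz and that we name the nearest chromatic note):

#include <math.h>

/* Turn an arbitrary frequency into the (noteNumber, bendValue) pair that names
   it: the nearest MIDI note, plus a 14-bit bend that supplies the fractional
   remainder, given the bend range in semitones that the synth is using. */
static void frequencyToNoteAndBend(double freq, double bendRangeSemis,
                                   int *noteNumber, int *bendValue)
{
    double midiFloat = 69.0 + 12.0 * log2(freq / 440.0);  /* fractional MIDI note */
    *noteNumber = (int)floor(midiFloat + 0.5);            /* the note we *name* */
    double semisAway = midiFloat - *noteNumber;           /* roughly in [-0.5, 0.5) */
    /* 0x2000 is the centered pitch wheel; the full 14-bit range is 0..0x3FFF */
    *bendValue = 0x2000 + (int)floor((semisAway / bendRangeSemis) * 0x2000 + 0.5);
}

For example, a pitch 25 cents above A440 (about 446.4Hz) comes out as note 69 with a bend of 0x2400 when the bend range is the 2 semitone default.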

Because we can always exceed the bend width, the note tie says to 'continue the note in the same state from the note turning off to the next one turning on'. This note can, and usually does, change channels, because of the requirement for every note going down to go into its own channel. You have to hold off on reusing a channel for as long as possible, because when a note is turned off, it still responds to bends while it is releasing.
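The decision the controller makes on every pitch update can be sketched like this (sendBendForSemis and retriggerWithNoteTie are hypothetical stand-ins for the real output calls, like the retriggerNewMidiNote function in the older post further down):

#include <math.h>
#include <stdio.h>

/* Stand-ins for real MIDI output calls. */
static void sendBendForSemis(double semis)      { printf("bend %+.3f semitones\n", semis); }
static void retriggerWithNoteTie(double target) { printf("tie, off, on at %.3f\n", target); }

/* Called on every pitch update of a sounding finger.  If the target pitch still
   fits inside the bend range of the note we originally named, just move that
   channel's pitch wheel.  Otherwise the note has to be migrated: note-tie NRPN,
   note off, then a new note on (usually in a fresh channel) with a new name. */
static void updateFingerPitch(double namedNote, double targetMidiFloat,
                              double bendRangeSemis)
{
    double semisAway = targetMidiFloat - namedNote;
    if (fabs(semisAway) <= bendRangeSemis)
        sendBendForSemis(semisAway);
    else
        retriggerWithNoteTie(targetMidiFloat);
}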

Keeping It Simple In Software, Hardware Isn't Our Problem

We are not worried about requiring a full-blown brain in the client, as hardware vendors might object to. Moving complexity out of the synth and into the controller makes an incredible amount of sense on iOS. This is because the controller will need some of the low level details in order to render itself on the screen. We have the pitch implied by where the finger touches, the pitch that we are drifting to due to fretting rules, and the actual pitch being played. We need to know all of this information in the controller. The standard MIDI equivalent would simply have the controller knowing about where the fingers are, and being more-or-less ignorant of what the synth is doing with this information. So in our case, the controller manipulates the pitch wheel to create the fretting system, and the synth has no idea what intonation we are using. It's not the synth's business to know this.

Similarly with polyphony rules, the synth can't just have a 'solo mode' setting. AlephOne and Geo both have a per-string polyphony that essentially adds the concept of 'polyphony groups'. The polyphony groups act similar to channels in that the controller will turn notes on and off to get the right polyphony behavior. This way we can chord and do legato at the same time. It's a controller-dependent thing, and it's not the synth's business to know any of this.
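A tiny sketch of what a polyphony group could look like on the controller side (hypothetical names, with printf standing in for actual MIDI sends):

#include <stdio.h>

#define NO_NOTE (-1)

/* One polyphony group ("string"): at most one sounding note.  The controller,
   not the synth, enforces this by turning the previous note in the group off
   when a new one goes down. */
typedef struct { int soundingNote; int soundingChannel; } PolyGroup;

static void noteDownInGroup(PolyGroup *g, int newNote, int newChannel)
{
    if (g->soundingNote != NO_NOTE)
        printf("off ch%d %d\n", g->soundingChannel, g->soundingNote);
    printf("on ch%d %d\n", newChannel, newNote);
    g->soundingNote = newNote;
    g->soundingChannel = newChannel;
}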

Similarly with legato. Legato *usually* tracks polyphony by playing attack on the first note down in the polyphony group. But in reality, on a string instrument, whether to pick or legato a note is decided on a per note-basis. It's not a mode that is enabled or disabled for the whole controller.

Because almost nothing recognizes more than note on/off/bend, anything else that the MIDI spec states is quite irrelevant in practice. The note tie addresses something that nothing in the spec does, and doubles as the legato, and it's not a major problem if it's not implemented. To somebody implementing a synth, a small number of primitives (only one thing beyond the standard elements) gives good overall behavior.

There is also the issue of the same note being played multiple times. AlephOne does chorusing, and it doesn't do this with any post-processing effects; it works by playing the same note, microtonally displaced, twice everywhere. This is one of the reasons why simply picking note numbers and bending them around is a bad idea. On a guitar, the high E note is played simultaneously from 2 or 3 positions all the time. The assumption that you bend a key is rooted in the idea of a keyboard with one key per 'note'.
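A sketch of chorusing done purely with notes (the channel numbers and the 10 cent offset here are arbitrary choices of mine, and printf stands in for real MIDI sends):

#include <stdio.h>

/* Start the same note twice, on two channels, with the second copy detuned by
   10 cents through its own pitch wheel (assuming a +/-2 semitone bend range). */
static void startChorusedNote(int note, int velocity)
{
    int centerBend  = 0x2000;
    int detunedBend = 0x2000 + (int)((0.10 / 2.0) * 0x2000);
    printf("bend ch1 %d\n", centerBend);
    printf("on   ch1 %d %d\n", note, velocity);
    printf("bend ch2 %d\n", detunedBend);
    printf("on   ch2 %d %d\n", note, velocity);
}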

Hacks

So, yeah, this is a hack with current MIDI. OSC is too general (in the sense that OSC messages have no inherent semantics, just syntax - it's XML hell all over again). And what I have read of MIDI proposals that aren't made of on/off/bend seems unworkable in practice. If we are all on iOS, and MIDI becomes too complex and not compatible with existing hardware anyway, we will simply throw the protocol away entirely and use on-board synths.

Real World Implementation

This is actually implemented in Geo Synthesizer and AlephOne on the client end, and in SampleWiz, ThumbJam (where it is especially good, specifically on the JR Zendrix patch), and Arctic Synth on the server end (I believe - I don't have a new enough version of the OS to run it). But anyway, it's not a theoretical idea. It has been implemented multiple times, and is demoed here (switching intonations while playing and doing polyphonic bends - single polyphonic playing of this works fine against a lot of synths that have weak MIDI implementations):

Tuesday, September 20, 2011

A Complete Pitch Geometry




In previous posts, I explained how close 53ET is to being a 5-limit system, and how to create intuitive diagrams that do away with obscure lists of pitch ratios and vague explanations. Pitch geometry is a way of diagramming the actual harmonic series, as viewed on a Just Fourths tuned string instrument, so that things line up (or mis-align) based on the harmonic series. This gives us a practical way to view it without getting overly technical.

And best of all, it's baked directly into a real instrument that can be played. You can see simple ratios up to 13 with it. In particular, you can see everything from 2/1, 3/2, 4/3, 5/4, ..., down to 16/15 - which happens to be every "superparticular" ratio (meaning (N+1)/N) up to the commonly used Just Major Semitone, the Just Intonation version of the chromatic step. Click the picture to see the fine details!

I would like to mention this book by Cameron Powers:

http://www.amazon.com/Lost-Secrets-Perfect-Harmony-Indigenous/dp/1933983183

It is about Middle Eastern scales in particular, but it's also very much about the sort of future that touchscreen instruments are going to bring with them; Just Intonation becoming normal practice once again, assisted by the digital age.

Geo Synthesizer (aka Pythagoras, Geo Synth) is not just a fun toy for playing with virtuosity on iPhone and iPad. It was designed from the beginning to feed virtuosity in every way that it can. From allowing for very fast chromatic play, to learning the deep magic of the harmonic series (Just Intonation based microtonality), or just playing fretless; it is designed to teach you things you might not already know. This way you can learn the satisfaction of constantly improving your musicianship, rather than the short-lived satisfaction of an instrument that's merely easy to use.

Octaves, or 2-limit (2,1/2)
This is a trivial system of notes. It is everything you can create out of positive and negative powers of 2, meaning: 2^-1, 2^0, 2^1,... (ie: 1/2, 1, 2,...). It is obvious that octaves are one of the more basic equivalences that anybody can tune by ear alone.

Pythagorean Tuning, or 3-limit (3/2, 4/3 - blue lines)

Tuning a piano is not as straightforward as it seems, because the instrument is a real-world resonating body. In the real world, the 12 tone system is a numerological fiction that roughly approximates the behavior of the harmonic series. Like a pair of drummers who play different time signatures but have the same number of beats per minute, a pair of waves whose rates are in a simple ratio will regularly meet at the exact same point in the shortest time possible.

A simple way to tune a piano would be to start from one note that we will use as the reference note that is "in tune", and listen for the overtone a fifth (plus an octave) above it in the string. We then tune another string to that pitch, or an octave lower to stay in the same octave range. By the time we have done this for 5 strings, we have a Pythagorean tuned pentatonic scale. If we do this for 7 strings, then we have a Pythagorean tuned diatonic scale. These seven notes will obviously not be equally spaced when we play them in order, because there will be whole tones and half tones. But the unevenness goes even deeper than this: if we extend out to 12 strings, then we have a Pythagorean tuned chromatic scale. This scale will sound beautiful as long as you don't do anything that assumes that the 12 tones wrap around at the ends. This 12 tone scale overshoots the octave by about 1/9 of a Just whole tone.

The blue lines that go up and down by fourths and fifths are the 3-limit: every ratio that you can make with powers of primes up to the number 3. For example: 1/3, 2/3, 3/2, 9/8, etc. Because of how the 2-limit (powers of octaves!) and the 3-limit (powers of fifths, and therefore fourths as well) line up, this very nearly creates a 53 note per octave equal tempered scale. It isn't exact, but it is close enough that you can forget about the numbers not lining up at the extremities; it's a perfect Pythagorean scale in practice. It is the first usefully close approximation to a Just circle of fifths.
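You can check the arithmetic yourself; this little C snippet (my own check, not part of the instrument) shows how closely 31 steps of 53ET match a Just fifth:

#include <math.h>
#include <stdio.h>

int main(void)
{
    double stepCents  = 1200.0 / 53.0;            /* about 22.64 cents per 53ET step */
    double fifthSteps = 53.0 * log2(3.0 / 2.0);   /* about 31.003 steps */
    double errorCents = (fifthSteps - 31.0) * stepCents;
    printf("53ET step: %.2f cents, Just fifth: %.3f steps, error: %.3f cents\n",
           stepCents, fifthSteps, errorCents);
    return 0;
}

The Just fifth lands within roughly 0.07 cents of 31 steps, which is why the circle of fifths effectively closes in 53ET.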

Just Intonation, or 5-Limit (5/4, 6/5 - green lines)
Usually, when Just Intonation is spoken of, 5-limit is what is meant. This means the inclusion of perfect thirds, major and minor. It is a very fortunate coincidence that the 53 note per octave scale happens to line up almost exactly. Almost every world-music system is using some variant of this scale. Flexible intonation instruments like sitar, violin, voice, etc. will invariably use this system or some subset of it.

Higher Limits, 7-limit (yellow), 11-limit (turquoise), 13-limit (purple)

These lines were recently added so that I could locate limits that get used in special contexts. Arabic Maqam is quoted as using a few different variations on what a "quartertone" is. It is typically notated as the note that splits a 12ET minor third dead center (or described as a 3/4 tone). But this is not what actually happens. Those intervals taken literally are awful for chording. The Just intoned variations that can be found by ear and practice are what are actually used.

The scale families known as "Bayati" have quartertones in their bottom tetrachord: D, E quarterflat, F, G. If what is meant is the note that is very close to the chromatic quartertone, then it means the ratio 11/10. This ratio, as well as 12/11, is unreachable in 53ET, which is otherwise excellent for this type of music. If what is meant is the note that fits almost exactly into 53ET, then it means 13/12.

Count The Lines

In the picture above, I can go over the various pitch ratios counter-clockwise.

  • 3/2 - This is the Just fifth. It is a blue line
  • 4/3 - This is a Just fourth. It is blue because it's the dual of 3/2.
  • 5/4 - This is the Major Third. It is green. (to the left of F# on bottom row in the picture)
  • 6/5 - This is the Minor Third. It is green.
  • 7/6 - Yellow.
  • 8/7 - Yellow.
  • 9/8 - The Just Whole Tone. Gray. This is a scale building block.
  • 10/9 - The Minor Just Whole Tone. Gray. It arises often.
  • 11/10 - Turquoise. A high quartertone. It falls almost exactly between two 53ET frets.
  • 12/11 - Turquoise. The quartertone that matches a chromatic Quartertone closely. It falls almost exactly between two 53ET frets.
  • 13/12 - Purple. A low quartertone that almost falls in the 53ET scale.
  • 14/13 - Purple. A low quartertone. It falls almost exactly between two 53ET frets. Note: because of this pattern of quartertones, some people advocate splitting the way that 12ET is split to 24 to get quartertones. If you use 106ET, then you can specify these quartertones pretty exactly.
  • 15/14 - Gray.
  • 16/15 - The Just Major Semitone. A building block of scales.
  • ...
  • ... (they get smaller and smaller)
  • ...
  • 81/80 - It's not explicitly shown here, but it is very special. It almost coincides with 1 53ET fret, which is the distance between the vertical lines. It is the difference between two whole tones and a major third. Ie: (9/8 * 9/8) / (5/4) = 81/64 * 4/5 = 81/80. The 53ET fret is also almost the amount by which 6 whole tones overshoot an octave. Many tuning systems want two whole tones to equal a major third, so tempering is done to make this so.
The good thing about the way it is drawn out onto the instrument is that you don't have to think too hard about pitch ratios or anything like that. You just line up lines. The colors give an indication of what family the pitch ratio is in. This style of playing doesn't use frets to keep you in tune, but uses instruments with a lot of resonance, so that the resonance is what keeps you in tune; getting help from the laws of physics rather than from hard constraints.

Amazingly, it's not hard to play pretty accurately on the phone. You can certainly be more accurate than 12ET. You can turn the delay feedback up to the top and set the delay rate to 0.01 seconds to make the audio resonate with a drone pitch (which I can't fine tune, but it's fretless, so you can adjust!). What is important is knowing exactly what you are doing with pitches, and by exactly how much you are falling short.

Read my older post on 5-limit for a deeper explanation of Pitch Geometry for 53ET and Just Intonation. It's almost identical to the user interface for the instrument, but is much more clear.

http://rrr00bb.blogspot.com/2011/08/53et-5-limit-in-geo-explained.html

Sunday, September 4, 2011

MultiTouch MIDI Recommendation

MultiTouch MIDI

This is an extension of http://rrr00bb.blogspot.com/2011/08/midi-instruments-on-ios.html, of which this could be a more focused restatement.

In order to facilitate the correct interpretation of MIDI coming from iPad and iPhone based instruments, we need to be specific about some of the areas that MIDI leaves up to the implementer (to get wrong in most cases!). This is being written for iOS developers who are writing MIDI instruments.

Wizdom Music is involved with, or actually makes these instruments at the current time:

  • MorphWiz
  • MorphWizMidi
  • SampleWiz
  • Geo Synth
  • Haken Continuum (not owned by Wizdom, just very involved)
These instruments, and just about every iOS instrument that is not slavishly mimicking the limits of a piano controller, will encounter problems representing MIDI input and output in a way that matches common sense on non-piano controllers. They have more stringent pitch handling requirements than a piano controller. They all potentially have a pitch bend per finger, not per controller. They are a special case of Omni mode; actually something that's halfway between single channel and omni.

These new instruments need to be able to do smooth bends from the lowest renderable note to the highest, at the fullest pitch resolution. A simple use case is to just hand your iPad to a child. Putting one finger on the glass and dragging it around for the duration of an entire song, or doing so with two fingers bending in different directions: this is obvious behavior, and so it needs to be made to work within what MIDI standardizes.

They are also a special case in that we are not doing bending with any real pitch wheels, and these virtual pitch wheels have no real top and bottom position. So we include a definition of a Non-Registered Parameter Number (NRPN) that functions as a note tie. It allows bends to be as wide as you would like, independent of the bend width setting. It's best NOT to set the bend width away from the small default, because the bend value only has 14 bits of resolution, and a wide bend range spreads that resolution thin.

The NRPN is a standard feature of MIDI, but it's up to us to standardize any definitions that we create.

Frequency

MIDI specifies notes because it was originally tightly tied to discrete keys going up and down. The frequency of the note was implicit. But when pitch wheels become involved you can make the desired frequency explicit. Assume for a moment that we set the pitch wheel to mean plus or minus only one semitone:

frequency = baseFrequencyLowC * 2^((note + bend)/12)

Note is an integer value, and bend is a number between -1 and 1. This gives the exact intended frequency. The synth should not take any liberties; it should do exactly what is stated here.
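The same relationship, written out with the raw 14-bit bend value and an arbitrary bend range (a sketch with names of my own choosing; it uses A4 = 440Hz as the reference instead of the low C above, which is equivalent):

#include <math.h>

/* Frequency from a MIDI note number, a raw 14-bit bend value (0..0x3FFF, with
   0x2000 centered), and the bend range in semitones that the synth was told to
   use.  With a +/-1 semitone range this reduces to the formula above. */
static double midiToFrequency(int note, int bend14, double bendRangeSemis)
{
    double bendSemis = ((double)(bend14 - 0x2000) / 0x2000) * bendRangeSemis;
    return 440.0 * pow(2.0, ((double)note + bendSemis - 69.0) / 12.0);
}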

Pitch Wheels

Pitch wheels are assumed to be centered at zero when the synth starts. Once messages come in, the last value that the pitch wheel was set to for a channel should stay where it is until a new bend message comes in. This is important because if the pitch wheel is not centered and a note comes on, the initial pitch on attack includes the bend in its frequency, with no portamento; and when the note is released, it also releases at that exact frequency, which includes that channel's bend value.

Assume that the pitch wheel is set to the 2 semitone default here. We behave as if we had gotten this when there is a reset of MIDI:

bend ch1 0%
bend ch2 0%
....
bend ch16 0%

So if we know this...

off ch1 33
...
bend ch1 100%
on ch1 33
off ch1 33

Because the bend width is 2 semitones, this note strikes exactly the same pitch as note 35 (B) on attack and on release. There should be no audible difference from:

off ch1 33
...
on ch1 35
off ch1 35

If this isn't done right, then fretlessness doesn't work right. Fretlessness is the foundation of correct, *great* pitch handling. It's the only way to get microtonality working right as well, because real-world microtonality requires actual fretlessness (not MIDI tuning tables). Choral singers do NOT actually sing in 12ET, and violinists don't play in it; they tend toward Just and Pythagorean intervals (which arise from the laws of physics, not man-made music theory) in the absence of other instruments providing constant reference tones to keep that from happening. This is even more true with ethnic instruments like the sitar. It's also true that every instrument has characteristic intonation differences that help to identify it.

So, these pitch issues are not something to be taken lightly as 'it never happens in the real world'. This is part of the reason why electronic music is still having trouble getting past the 'uncanny valley' into being convincing. The ability to start solving this problem is one of the more exciting things about using an iPad (or any continuous surface) as an instrument.

MIDI Channels

Omni is provided as a way to have a synth render MIDI that was originally being sent out across multiple channels. But what typically gets done is to just substitute the original channel number with the synth's own channel before interpretation, as an easy hack that mostly works. Like this:

on ch1 33
bend ch1 +25%
on ch2 33
bend ch2 -25%
off ch1 33

The expected result is two instances of the same note, where one is bent up a quartertone and the other is bent down a quartertone. At the end of this, the note bent down a quartertone is still playing. A naive omni implementation just treats it all as if it were channel 1:

on ch1 33
bend ch1 +25%
on ch1 33
bend ch1 -25%
off ch1 33

This is wrong, given what the original channels were, and it's what most synths will do. First, we end up with silence at the end of the sequence, a dropped note (almost as bad as a stuck note), because we told the only note 33 on ch1 to be silent. Second, instead of chording two of the same note at detuned values (ie: chorusing), the first note 33 just got bent to the value of the second instance of 33.

So specifically, you must pass this test:

on ch1 33
bend ch1 +25%
on ch2 33
bend ch2 -25%
off ch1 33

The note should still be playing, detuned a quartertone here. This means that there are two separate instances of A, simultaneously played, chording, possibly out of phase, etc. This is no stranger than playing A and A# together. It should work, and this happens all the time on string instruments.

New Note - New Channel

You don't know if a note will be bent in the future, so it must have its own pitch wheel. If you run out of channels and have to share a channel with another note, then you will get artifacts if one of the notes on that channel needs to be bent.

As a compromise, you don't have to monopolize all 16 channels for this. You just need a reasonable number, which if you are emulating a string instrument, is generally going to be the number of strings, or maybe just the size of the largest playable chord.

Adding more channels does help if you can afford it in performance, because a note being turned off is not the end of its life; releases are still happening when the note is turned off.

Single Patch "Omni"

Just because there is only one patch at a time does not mean that the MIDI data should be misinterpreted as in the above situation. The channels specified in the original MIDI output exist to keep the various expression parameters separate. Keeping the pitch wheel separate is so important that this intent should never be violated for pitch, even if it has to be violated for other parameters.

As a hack to make the separate pitch bends work right, a typical way to do this is to set up a multi-timbral synth to simply duplicate the same exact patch and settings across all of the channels (or just a subset of them). But just because you have a synth that can only load one patch should not change the meaning of the bend messages.

Overlaps

One of the issues created by MIDI channeling is overlaps. You must not get a silence when note on/off pairs for the same note overlap. This happens when you try to play more notes than there are channels available to play them on:

on ch1 33
on ch1 33
off ch1 33

This should play two notes and then silence.

on ch1 33
on ch2 33
off ch1 33

This should play two notes, where you can hear two of the exact same note playing, then one of them cuts off. You should not hear silence yet. This happens because on a string instrument, you will trill on the exact same note to play very fast, like guitar picking:

on ch1 33
on ch2 33
off ch1 33
off ch2 33

Note: The MIDI specification states that it is up to the implementer what happens when note off and note on are not exactly balanced. But as with internet protocols "Be conservative in what you send and liberal in what you understand".

http://www.gweep.net/~prefect/eng/reference/protocol/midispec.html

"
If a device receives a Note On for a note (number) that is already playing (ie, hasn't been turned off yet), it is up the device whether to layer another "voice" playing the same pitch, or cut off the voice playing the preceding note of that same pitch in order to "retrigger" that note.
"
This is a disaster in the making. It is up to the implementer, but you will get a stuck note if the client and server choose different ways. Don't take that statement to mean that you should create a synth that will leave a stuck note when the notes don't balance.

Most synths take the (on, note, vol) tuple to be setting the state for a note, not pushing an instance of the note onto a stack, to be popped by a note off. If you are liberal in what you send, then you will only add to the number of synths that get stuck notes. If you need to play a voice twice, you should be using the channels. This is consistent with the practice of new note new channel that we are using here.

Think about the fact that (on, note, 0) is an alias for a note off, while (on, note, vol) leaves vol as a variable. You can have the balance get undone if vol happens to be zero, even though the intent wasn't to turn the note off.
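One way to keep that alias from un-balancing anything is to normalize it at the point of input; a sketch (the struct here is just for illustration):

#include <stdbool.h>

typedef struct { int status; int note; int velocity; } MidiEvent;  /* status = 0x80/0x90 + channel */

/* A note on with velocity zero is just another spelling of note off, so fold it
   in before any note bookkeeping ever sees it. */
static bool isNoteOff(const MidiEvent *e)
{
    int kind = e->status & 0xF0;
    return kind == 0x80 || (kind == 0x90 && e->velocity == 0);
}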

Release Time

When a note is turned off, that's not the end of its life; it's only the beginning of the release phase. The frequency that it releases on will change if you mess with the pitch wheel for that channel before the sound is gone. That is an annoying artifact for instruments like hammered dulcimers with special tunings (Indian music, etc).

Because every new note down picks a new channel of the 16 available, we need to carefully cycle through the available channels evenly. This will maximize the amount of time that a channel has been dead before we steal that channel to play a new note.
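One way to do that even cycling (a sketch, not the exact code used in Geo or AlephOne):

#include <time.h>

#define NUM_CHANNELS 16

typedef struct { time_t lastUse[NUM_CHANNELS]; } ChannelCycle;

/* Pick the channel that has been idle the longest; its releasing tail is the
   one most likely to be inaudible by now.  Call noteTurnedOff() at note off so
   the idle clock restarts from the release, not from the attack. */
static int allocateChannel(ChannelCycle *c, time_t now)
{
    int best = 0;
    for (int i = 1; i < NUM_CHANNELS; i++)
        if (c->lastUse[i] < c->lastUse[best])
            best = i;
    c->lastUse[best] = now;
    return best;
}

static void noteTurnedOff(ChannelCycle *c, int channel, time_t now)
{
    c->lastUse[channel] = now;
}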




Note Tie Non Registered Parameter Number (restated from previous post)

It is not impossible to bend MIDI notes to any width you want at the fullest possible resolution. The problem is that there is no de facto or de jure standard on how this is done. Imagine a piano player trying to simulate a bend, and it's on our channel-cycling instrument....

bend ch1 0%
bend ch2 0%
...
bend ch16 0%
on ch1 33
...
off ch1 33
on ch2 34
...
off ch2 34
on ch3 35
...
off ch3 35
on ch4 36
...

So he's playing chromatics to simulate the bend because that's the best he can do. But if we are on a synth that inserts bend messages, the synth can at least bend from one chromatic to the next like this:

bend ch1 0%
bend ch2 0%
...
bend ch16 0%
on ch1 33
bend ch1 20%
bend ch1 40%
bend ch1 60%
bend ch1 80%
bend ch1 100%
off ch1 33
on ch2 34
bend ch2 20%
bend ch2 40%
bend ch2 60%
bend ch2 80%
bend ch2 100%
off ch2 34
on ch3 35
bend ch3 20%
bend ch3 40%
bend ch3 60%
bend ch3 80%
bend ch3 100%
off ch3 35
on ch4 36
bend ch4 20%
bend ch4 40%
bend ch4 60%
bend ch4 80%
bend ch4 100%

So, this would be a smooth bend, except we hear the note retrigger every time we reach the next chromatic. So let's say that we have a special message that signals that a note tie is coming, and that the tie is completed when the next note on appears.

bend ch1 0%
bend ch2 0%
...
bend ch16 0%
on ch1 33
bend ch1 20%
bend ch1 40%
bend ch1 60%
bend ch1 80%
bend ch1 100%
tie ch1 33
off ch1 33
on ch2 34
bend ch2 20%
bend ch2 40%
bend ch2 60%
bend ch2 80%
bend ch2 100%
tie ch2 34
off ch2 34
bend ch3 0% #note that from tie to note on, we expect bends and a note off to happen
on ch3 35
bend ch3 20%
bend ch3 40%
bend ch3 60%
bend ch3 80%
bend ch3 100%
tie ch3 35
off ch3 35
on ch4 36
bend ch4 20%
bend ch4 40%
bend ch4 60%
bend ch4 80%
bend ch4 100%

We can continue this from the lowest note on the keyboard to the highest for a super-wide bend. It is at the full pitch resolution as well, because we aren't playing tricks with the MIDI bend width. It is also the case that if we broadcast this both to a piano that can't bend and to a synth that understands, we get a similar result: it degrades gracefully on the piano, and sounds perfect on the synth that understands. We can use this to track up to 16 fingers at arbitrary pitches (in MIDI range, of course!) bending in whatever wild directions they need.

The NRPN looks like this in our code:

// The NRPN number we chose for the note-tie ("transition") message.
#define TRANSITION 1223

static inline void sendNRPN(int ochannel, int msg, int val)
{
    // An NRPN is sent as three controller messages on the same channel:
    //   0xB0+ch 0x63 msb   (CC 99: NRPN number MSB)
    //   0xB0+ch 0x62 lsb   (CC 98: NRPN number LSB)
    //   0xB0+ch 0x06 val   (CC 6:  data entry MSB = the value)
    int lsb = msg & 0x7f;
    int msb = (msg >> 7) & 0x7f;
    //midiPlatform_sendMidiPacket7(0xB0+ochannel, 0x63, msb, 0x62, lsb, 6, val);

    midiPlatform_sendMidiPacket3(0xB0 + ochannel, 0x63, msb);
    midiPlatform_sendMidiPacket3(0xB0 + ochannel, 0x62, lsb);
    midiPlatform_sendMidiPacket3(0xB0 + ochannel, 6, val);
}

static inline void retriggerNewMidiNote(int finger, float midiFloat, int vol, int expr)
{
    int channel = midiFingerUsesChannel[finger];
    if (channel >= 0)
    {
        // Announce the tie on the channel that is about to be turned off, naming
        // the note being continued, so the synth knows that the next note off /
        // note on pair is one continuous note.
        int ochannel = midiChannelOChannelSent[channel];
        sendNRPN(ochannel, TRANSITION, midiChannelNote[channel]);
    }
    // Stop the old name and start the new one (usually on a different channel).
    stopMidiNote(finger);
    startNewMidiNote(finger, midiFloat, vol, expr);
}


Let us know if there is something unreasonable about that message. I haven't used NRPNs before, and since we write both ends of it, they could both be 'wrong' and work just fine between our synths.

Bend Width RPN

Just in case it isn't completely obvious what happens when we change the bend width slider, here's the code that tells the synth our assumed bend width when we move that slider:


static void bendRangeSliderMoved() {
    // Check the current bend range with the previous. If it has changed, send the appropriate RPN.
    static int previousBendRange = -1;
    int bendRange = bendRangeSlider.scaledValue;
    if (bendRange != previousBendRange)
    {
        NSLog(@"bend range has changed from %d to %d", previousBendRange, bendRange);

        previousBendRange = bendRange;

        // Activate the pitch-bend sensitivity (bend range) RPN.
        SoundEngine_sendMIDIPacket3(0xB0, 101, 0);
        SoundEngine_sendMIDIPacket3(0xB0, 100, 0);

        // Send the new bend range value.
        //BUG: earlier versions sent 2*bendRange, which was wrong
        SoundEngine_sendMIDIPacket3(0xB0, 6, bendRange);
        SoundEngine_sendMIDIPacket3(0xB0, 38, 0);

        // Deactivate the pitch-bend sensitivity RPN.
        SoundEngine_sendMIDIPacket3(0xB0, 101, 127);
        SoundEngine_sendMIDIPacket3(0xB0, 100, 127);

        // Flush the MIDI queue.
        midiPlatform_flush();
    }

    // Update the bend range with the sound engine.
    SoundEngine_midiBendsInSemitones(bendRange);

    // Flag that the texture needs to be updated.
    bendRangeSlider.needsNewTexture = YES;
}