
Tuesday, September 29, 2015

Correspondence Chess, Latency, and Offline Progress

Offline Applications

In server and desktop systems, it is normal to assume that connectivity works whenever both client and server are up and running, and to show the user an error as soon as an attempt to contact an unreachable party fails.  This is somewhat tolerable on desktops and servers, because connectivity there is generally wired and highly reliable.  But it is completely intolerable in just about every other situation.  TCP illustrates the distinction: it masks packet loss at the byte-stream level, hiding errors that don't result in missed deadlines.  Yet most TCP applications surface an error the moment the underlying connection resets, in spite of the fact that there is still plenty of time to try again.

Mobile applications face frequently bad connectivity.  Phones pass through subway tunnels, run low on battery, or simply move out of range.  It is a terrible user experience to be in the middle of web browsing, try to bring up something that was already loaded, and have it get lost because the network is down.  That is simply bad application design.  HTML5 supports offline apps, but it is still highly unusual to find even a mobile app that behaves correctly in the face of being mostly offline.

For servers, outages are rare, but they can be caused by maintenance mishaps or by outages of dependent services (such as the ISP).  If a client can specify a real-time deadline for an answer, then it doesn't matter what connectivity looks like, as long as answers always come back on time.  When deadlines are long (on the order of minutes or hours), machines can simply wake up and connect periodically to stay up to date.  This allows battery-powered machines under low demand to spend most of their lives hibernating.  Note that the latency implied by such long deadlines says nothing about the throughput of requests, which may still need to be high in some circumstances.

One thing a TCP stream cannot do is hibernate a session for a truly long period of time.  TCP streams cannot take out leases of hours or days, and cannot persist to survive process or operating system restarts.

Queued Transaction Processing

What is necessary is for machines that are executing a protocol (having a session) to be able to simply queue up outbound requests while there is nothing ready to receive them.  If all requests carry deadlines, then the local queue can report an error when a deadline passes without a response.  If a deadline passes while an unacknowledged request is being serviced, we must assume that the other party may have given up.  And for any unacknowledged task in progress, we must be able to send a cancellation request and get back an answer (with a definite deadline of its own) as to whether the cancellation succeeded.
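As a minimal sketch of what such a queue entry might track, here is one possible shape in C.  All names here are hypothetical, not taken from any particular system:

```c
#include <stdint.h>
#include <time.h>

/* A queued outbound request: the deadline travels with the request,
   so the local queue can time it out without any connectivity. */
struct request {
    uint64_t id;           /* identifies the request for cancellation */
    time_t   deadline;     /* absolute deadline for a response */
    int      acknowledged; /* has the other party confirmed receipt? */
};

/* The local queue reports an error once the deadline passes without an
   acknowledgment; any in-flight work then needs a cancellation request. */
int deadline_expired(const struct request *r, time_t now)
{
    return !r->acknowledged && now > r->deadline;
}
```

A request that has been acknowledged never times out locally; the response (or a cancellation answer) carries its own deadline.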

Even the notion of whether the other party is "online" is expressed merely in terms of deadlines.  If the other party doesn't respond to a request to exchange packets in a timely fashion, it is offline for all practical purposes.  A server that exchanges packets with us in a timely fashion can be considered online if it meets the deadline for that criterion, though it may actually have been asleep when the first attempt to contact it was made.

Correspondence Chess

Masking errors when the network is down is a rather obvious thing to do.  Something far less obvious is handling "phone tag" situations to dramatically reduce latency.  Consider a game of correspondence chess:

  1. e4 e5  2. Nf3 Nc6  3. Bc4 Bc5

Now assume that each player can only make one move per day, at 8:30am in their own timezone on opposite ends of the world, while having breakfast before going to work.  There is never a time when both parties are actually online.  Even with moves forwarded into a third-party queue that is always up, a 30-move game is going to take two weeks.  This is intolerable, particularly because almost all of the moves are predictable.  Play should only stop when surprises arise.

So, let's model correspondence chess as message-passing state machines.  The state is the combination of the board position and who moves next.  We have moves, and contingencies.  Contingencies are moves that we foresaw the other side making.  The first player, Alice, plays a game that starts in state A.

A -> B
B --> C
C -> D
B --> E
E -> F

A single arrow is a move being committed to (the actual move is not specified; we are only tracking state changes of the game).  From any state, only one single arrow leads to a new state.  But states with double arrows can have multiple possibilities: these are all the possibilities we foresaw, and after each one we will generally include our response to it.  Bob, who is playing the other side of the game, cannot see Alice's thought process.  But Bob thinks and makes predictions of his own:

A --> B
B -> C
C --> D
D -> G
A --> H
H -> I

Bob is expecting Alice to either move the game into state B or H.  If Alice gets Bob's messages, she can simply append them:

A -> B
B --> C
C -> D
B --> E
E -> F
A --> B
B -> C
C --> D
D -> G
A --> H
H -> I

Alice can use a naive algorithm similar to computing group permutations.  Knowing that the game is in state A, we see that it transitions to B.  If there were no prediction of this, the answer to the merge would simply be "B": the game resumes at state B, and Alice assumes that Bob will come to the same conclusion.  But walking through the list (looping around the top if we have to), we see "A --> B", which means the move was predicted.  So we search for "B -> " to figure out the next state; the answer is C.  We walk forward through the list and wrap around to find "B --> C", so we use one more move to reach D.  The transition to D was predicted, due to "C --> D", which gives us one more transition, to G.  "D --> G" is not in the list, so the walk stops moving forward.  The final state is now G, which Bob put the game into.  Alice now plays from G to J, and doesn't bother making a prediction yet.

G -> J

When Bob gets Alice's data, he appends Alice's transitions to his list as well.  The lists are in a different order, but the answer comes out the same: both agree that the game has progressed to state G.  In doing this, they agreed upon "A -> B -> C -> D -> G".  If they can continue to make 5 moves per exchange, the game completes 5 times faster.  Alice has decided to move from G to J, but hasn't communicated that to Bob yet.  So Alice insists that the game is at J, while Bob insists that it is at G.  This is still consistent, since Alice passed the game through G as well.  When they exchange again, they will both pass through J.
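The merge walk described above can be sketched in C.  The representation is illustrative (single characters for states, a flag distinguishing "->" from "-->"); nothing here comes from an actual implementation:

```c
/* One state change of the game.  "committed" distinguishes a move
   actually made ("->") from a prediction ("-->"). */
struct transition {
    char from;
    char to;
    int  committed;
};

/* Alice's list followed by Bob's, exactly as in the example above. */
static const struct transition example[] = {
    {'A','B',1}, {'B','C',0}, {'C','D',1}, {'B','E',0}, {'E','F',1},
    {'A','B',0}, {'B','C',1}, {'C','D',0}, {'D','G',1}, {'A','H',0},
    {'H','I',1},
};
static const int example_len = sizeof example / sizeof example[0];

/* The committed move out of `state`, or 0 if neither side has moved. */
static char committed_move(const struct transition *t, int n, char state)
{
    for (int i = 0; i < n; i++)
        if (t[i].committed && t[i].from == state)
            return t[i].to;
    return 0;
}

/* Did either side predict the move from -> to? */
static int predicted(const struct transition *t, int n, char from, char to)
{
    for (int i = 0; i < n; i++)
        if (!t[i].committed && t[i].from == from && t[i].to == to)
            return 1;
    return 0;
}

/* Replay the merged lists from `state`, taking each committed move
   and continuing only while that move was also predicted. */
char merge(const struct transition *t, int n, char state)
{
    for (;;) {
        char next = committed_move(t, n, state);
        if (!next)
            return state; /* no committed move: the game rests here */
        if (!predicted(t, n, state, next))
            return next;  /* a surprise: take the move, then stop */
        state = next;     /* predicted: keep replaying */
    }
}
```

On the combined list above, merge(example, example_len, 'A') walks A through B, C, and D, then stops at G, matching the walkthrough.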

Now imagine that when Alice and Bob play, they are offline almost all the time.  When one realizes that the other side is offline, they just keep making more predictions.  Within the trade-offs between sending huge predictions (and the power consumed simply computing them) and minimizing pointless round trips, they make as much progress per exchange as possible.  Other games, like Go, can be much longer (300 moves!) and have more boring sequences between the surprises that cause prediction to fail.  This length is what makes offline correspondence Go take unbearably long without prediction.

Notice that this is an O(n^2) algorithm.  For large exchanges, sorting the transitions so that each lookup becomes a binary search reduces it to O(n lg n).  The exchanged predictions can also be garbage collected, restarting the record at the last state both sides agree on.  However, for actual games, keeping the full record around is useful for analyzing how predictions failed.

Correspondence Protocols

Now imagine that we substitute general networking protocols for games.  The Paxos family of distributed protocols uses the distribution of a change log to keep multiple copies of a state machine synchronized.  When the responses within a protocol have only a few possible answers (e.g., booleans or enumerations), both sides can specify their predictions as far out into the future as possible.  This is conceptually similar to TCP windows moving the stream forward on both sides.

As with Paxos, there is no reason to limit the exchange to only two participants.  If a protocol has three or more parties taking turns, exchanging messages for correspondence gets all parties up to date.

Note that, in general, network protocols with a lot of chatty back and forth are often badly designed.  Relational database protocols (e.g., MySQL, Postgres) often have this problem.  When client and server are co-located, there is no obvious issue.  But put them over a high bandwidth-delay link, such as a satellite, and performance can become terrible even when throughput should be high.  Correspondence handles the foreseeable back-and-forth exchanges to minimize latency.  We actually observed this phenomenon: moving to an application server that happened to batch up queries to minimize round trips produced extremely dramatic performance improvements.

When working with people in other timezones, it may even be advantageous to handle requirements ambiguities by coding each of the possible solutions into its own git branch.  When the back and forth about what the requirements actually are happens later, we find out which branch we should be merging.

Delay/Disruption Tolerant Networking

Some of these ideas are not new; they are exactly what happens in the extreme case of space travel.  As nodes orbit and move around, connectivity is only possible during contact opportunities.  So the DTN protocols handle this at the level of routing.  The main idea is to queue data persistently until there is a send opportunity.  If the destination node is never reachable directly, then sending to intermediate nodes that queue up the data (and take custody of it) has to do.  The RFCs mention designing application protocols to minimize round trips, without ever detailing such a scenario; this implies that even though the problem is generic, it is treated as an application-protocol concern.  It is also not clear whether timeouts for messages are always put in at the DTN layer.  This matters because, if connectivity is out 95% of the time, the 5% window in which communication works must carry all the data that built up in the queues during the other 95%.

For terrestrial apps, DTN may be overkill in a mobile setting, because the unreliability isn't so much in the routing as in the clients' connections to the internet.  For offline correspondence apps, it may be perfectly fine to treat "the Internet" as up or down as a whole: the point is for clients to queue up outgoing requests and speculate to minimize round trips (i.e., playing out all expected responses in game trees).  When there is connectivity, it will probably be reliable and have enough bandwidth to drain the client queue, and the intermediate queue will almost always be up because it lives in a replicated data center.


Having an actual state machine might be too stringent.  In many cases, it may be better to think not in terms of exact states but of preconditions.  A precondition that we are in an exact state matches exactly one state; a weaker precondition could match a large number of states in which we are nonetheless certain of how we need to respond.  For instance, if our king goes into check during chess and there is exactly one way out of it, we might as well commit to that response ahead of time.  In Go endgames, the game may break down into independent subgames, and we may want to commit to our responses to moves in each independent subgame rather than computing all of the combinations of states exactly.

Thursday, February 12, 2015

Signed, Unsigned, and Circular


In doing networking work and related kernel work, we have seen every new developer run into a variation of a simple problem with counters that wrap around.  In TCP, it is sequence numbers.  In kernel work, it is jiffies.  It even happens when you are waiting for your two-digit number to be called at a fast food place.

Suppose that two events happen, named a and b, and that event a happened earlier in real time than b.  Records of these events are stamped with a limited-resolution timer, and the upper bits that would tell you the true ordering are discarded.  To make the problem easier to reason about, we will use an 8-bit timer.
struct Event { uint8_t sequenceNumber; /* ... */ };
We are assuming a two's complement representation of integers, the standard computer representation.  So, should these timestamps really be signed or unsigned values?  They are usually implemented using unsigned values, but they are actually neither.  People get bugs in their code by asking whether one event happened before another by comparing sequenceNumber with the less-than operator, which is wrong.
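For concreteness, here is the bug with hypothetical 8-bit stamps: the earlier event carries the larger stamp because the counter wrapped in between.

```c
#include <stdint.h>

/* Event a really happened first, but the 8-bit counter wrapped
   (250 + 10 = 260, which truncates to 4) before event b was stamped. */
uint8_t a_stamp = 250; /* earlier event */
uint8_t b_stamp = 4;   /* later event */

/* The naive less-than comparison claims b came first: the wrong answer. */
int naive_a_before_b(void)
{
    return a_stamp < b_stamp; /* 250 < 4 is false */
}
```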

Geometry Of Circular Numbers

Firstly, two's complement numbers are circular in some sense just by virtue of their limited number of bits, and addition and multiplication work the same way for all of them.  But whether a number is signed, unsigned, or circular depends on: the definition of bit extension when assigning to a larger number of bits, the definition of the comparators, and the definition of whether some arithmetic overflowed.  Beware of ignoring these distinctions!

Assume that events are stamped around a clock face going clockwise.  Event a is defined to be less than b when the diameter (a, a') contains b in the arc clockwise from a.  Likewise, the diameter (b, b') contains a only in the counterclockwise arc, so b is greater than a.  This is the same as saying that the distance from a to b is positive (as a signed integer).

The comparison is only valid if the real distance (including the lost upper bits of the sequence number) between a and b is less than half of the total distance around the circle.  Also, circular numbers cannot really be extended (i.e., assigned to a value with more bits) without invalidating the ordering relationships of the original values.

Because they can't really be sign extended (without changing their values), and have their own comparison operators, circular numbers really should have their own type.  (Don't get me started about network versus host ordered ints not being native types!)
circular int8 sequenceNumber;
In the Linux kernel headers, jiffies.h explicitly initializes the jiffies counter at bootup to negative five minutes, so that code which treats jiffies like ordinary unsigned integers will break five minutes after startup rather than weeks later.

You can tell circular numbers are in use when you see macros that redefine comparison like "((signed)b - (signed)a) > 0", where a and b usually have unsigned storage types.
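That macro pattern can be wrapped in a small helper for the 8-bit sequence numbers above (the name seq_before is ours, not from any header):

```c
#include <stdint.h>

/* True when a precedes b on the circle.  Valid only while the true
   distance between them is less than half the circle (128 ticks).
   The uint8_t cast keeps the subtraction well-defined before the
   reinterpretation as a signed 8-bit distance. */
int seq_before(uint8_t a, uint8_t b)
{
    return (int8_t)(uint8_t)(b - a) > 0;
}
```

This gives the right answer in the wraparound case that defeats the plain less-than operator: seq_before(250, 4) is true even though 250 < 4 is false.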

Geometry of Signed and Unsigned Numbers

A point a is considered (signed) less than b if, going clockwise along the arc from the maximum negative number m, we encounter a first.  Signed values are extended by copying the highest bit (the sign bit) into the new bytes; i.e., negative numbers are extended with 1s and positives with 0s.

A point b is considered (unsigned) less than a if, going clockwise along the arc from zero, we encounter b first.  Unsigned values are extended by padding with 0s.