Creating a Multi-Language, Serverless Chat Program - Part 1: Protocol Design
Written By: Nathan Baker
- 11 Aug 2006 -
Description: In order to explore how different languages approach GUI programming, network sockets, and object-oriented concepts, we will embark on a journey in creating a serverless chat protocol, and clients for it in many different languages. This is the first installment, where I describe the protocol and the project.
The Distributed Protocol
For our in-depth look, we will first examine the distributed protocol. Essentially, this is the part of the protocol used to make sure all the nodes are "on the same page"--i.e., that they all see the same number of users, and that chatting works seamlessly. This can be a little hard to get right if you're new to this area, so I'll try to explain things as I go. If you read over this and are still a little confused, don't worry. When you see some actual code it should make more sense.
Before I go into too much detail, let me give you another fantastic diagram to help illustrate what I'm going to be saying:
We have nodes A through G, each one connected to one other node. Nodes can be connected to multiple nodes (it doesn't matter which part of the node the line connects to), but there can be no loops. For example, node E cannot connect to node G, because node G is already connected to node C, so connecting to node G would create a loop.
Now, let's look at node A for a second. As far as it knows, all other nodes, B-G, are all living on node B. So if it wants to send a message, it sends a single message to B. Let's say it's a text message. At this point, B will send the text message to its client, but it still sees five other nodes, all living on node D. So it forwards the message to node D. Node D also displays the message, and forwards it to node C. Now, node C sees three other nodes, so it both displays the message (via the client), and then forwards the message to E, F, and G. This can all be done naïvely--since each node establishes exactly one connection (except the first node, which establishes none), there are no cycles. This means that most of the complexity of routing is removed. There should always be exactly one route between any two nodes, so all that is needed is to know which of the directly connected nodes each message should go out to.
Bootstrapping/Joins
So, how do we get this whole thingy set up? One pitfall when designing something like this is to ignore 'bootstrapping', which is a strange computer-word for 'setting up a system'. Robust distributed protocols use something called peer discovery, which involves one (or more) of several methods for finding out where nodes are and then connecting to them. This is just an undesirable bit of complexity, so we will say that the user must know the IP address and port of at least one node in order to connect. Thus, each node must be prepared to accept an incoming connection. When that connection is received, it will send a notification out on every other connection (for example, if a new node H connected to node D, node D would inform nodes B and C of this new user). When such a notification is received, it will be forwarded out on all other connections. Each server will also add the new user to its user list. This method is very simple, but effective. The primary problem, as you have no doubt already thought of, is the question of what happens when two users of the same name join at the same time but at opposite ends of the graph? We must refuse to let this happen. So when a join notification arrives for a user with a name that we already have in our list, we will send a message that says "no, I cannot accept this new user". In that case, the join will be rolled back along the same route it originally took. This will also entail sending out messages to nodes that were previously notified of the join. In order to make this less confusing (and reduce the possibility of a state screw-up (that's a technical term)), each join message will also have a join number with it. This number can be referred to in order to ensure that the correct user is being removed in a roll-back. This is a guard against border cases--usually a DDOS attack instead of something that might be encountered in legitimate operation--but also makes the operation of the protocol a bit easier to reason about.
Are you confused yet? If not, then good: that was the most complicated part. If so, then sorry. Re-read that ridiculously long paragraph, ignoring everything after the sentence that starts "The primary problem". That's a relatively minor detail that we will cover adequately in code later. Meanwhile, learning how node joining works is actually a bit important, so make sure you at least have that down. Also note that this works even for the case where there's just one node and a second one wants to join. However, it does not address the case of unification, where two networks join together (for example, in the above picture, if node D joined node C after both had already accrued their other nodes). It is possible to implement this explicitly, where a node says "I am joining, and so are my friends here". However, I don't want to clutter the protocol with something this obscure, so instead I'll just say that in this case, node D must first join, and then it must pretend like nodes A and B just joined up, and go through the normal protocol for this join. What happens if there is a name clash is up to the implementer, but the two obvious options are either to disconnect the offending node or to abort the join operation entirely. It would also be possible to leave the offending node connected, unaware of the fact that the two networks have now merged.
Other messages
Parts are handled the same as join, with the added benefit that now the issue of name clashes does not have to be handled. There is the obvious question of security here, which will be addressed next. But that's it for parts. Actually, that's it for most protocol messages, since every message is propagated this same way, except one, which is private messages.
Private messages are a bit of a problem, since if we allow intervening nodes to relay them then they aren't really private, and someone using an altered server could easily snoop on them. However, there is no other way for one node to reach another non-directly-connected node than through the intermediate nodes. So we compromise: if I want to send you a private message, I will send you a message telling you so. If you trust me and want to receive a message from me, you can choose to accept it. I will then directly connect to you, and we can exchange private messages through this direct link. Thus, all the intermediate nodes know is that we are talking privately, but they don't know about what. It is also possible to send private messages through normal channels, however, for simplicity's sake, if you're talking about something you don't mind others listening in on.
Security
But what about security? Specifically, distributed protocols need to worry about attackers. Or, rather, they should. Unfortunately, security adds an extra layer of complexity. I certainly don't want to give the opinion that security isn't important--because it is--but in order to keep this as simple as possible, I will ignore all security concerns except low-hanging fruit that I can address with little or no extra effort. In short, however, a robust distributed protocol needs to deal with attackers masquerading as legitimate nodes for the purpose of disrupting the system. There are lots of little things that malicious nodes could do to subvert the protocol--not reporting joins, for instance, or not relaying messages. Or it could function perfectly well until it is connected to a large number of nodes and then go silent, effectively partitioning the space into two. Or it could just send out spurious part notifications so that legitimately-connected nodes would be removed from the active list. Defending against these attacks would require an order of magnitude more complexity to do correctly, so we're not going to do them. Sorry.