Feb 13

Bencoding: WTF?

Posted by Colin

I’ve started writing a BitTorrent client needed for another project I have going (I looked at the Python client but it looked to be enough of a pain to integrate, and I can’t use a GPL licensed client).

The protocol itself doesn’t look too hard to implement. The one thing that has been bothering me though is BitTorrent’s odd use of a serialization format called Bencoding (a quick Google search pulls up a few academic papers on Bencoding that don’t seem to have much to do with BitTorrent’s Bencoding, besides that the rest of the results all have to do with BitTorrent crashes due to bad Bencding). Bencoding’s role in life seems to do about the same thing as XML (hint, hint), take a big dictionary of data and serialize it into a text file.

Bencoding did a good job at driving me up a wall last night. Here is how the BitTorrent docs define Bencoding:

Strings are length-prefixed base ten followed by a colon and the string. For example 4:spam corresponds to ’spam’.
Integers are represented by an ‘i’ followed by the number in base 10 followed by an ‘e’. For example i3e corresponds to 3 and i-3e corresponds to -3. Integers have no size limitation. i-0e is invalid. All encodings with a leading zero, such as i03e , are invalid, other than i0e , which of course corresponds to 0.
Lists are encoded as an ‘l’ followed by their elements (also bencoded) followed by an ‘e’. For example l4:spam4:eggse corresponds to ['spam', 'eggs'].
Dictionaries are encoded as a ‘d’ followed by a list of alternating keys and their corresponding values followed by an ‘e’. For example, d3:cow3:moo4:spam4:eggse corresponds to {’cow’: ‘moo’, ’spam’: ‘eggs’} and d4:spaml1:a1:bee corresponds to {’spam’: ['a', 'b']} . Keys must be strings and appear in sorted order (sorted as raw strings, not alphanumerics).

Seems pretty simple, right? It gets a little more complicated when you start having dictionaries within dictionaries. Notice that all types are prefixed by their length except for one, dictionaries. This makes harder to just call a recursive method because once the recursion finishes you have to skip past the dictionary you just parsed. But how do you know where to skip to? You don’t know how long the dictionary you just parsed was.

No problem! We’ll just scan for the terminating character of the dictionary and begin parsing again on the character after. Oh wait:

Dictionaries are encoded as a ‘d’ followed by a list of alternating keys and their corresponding values followed by an ‘e’.

That’s right! The terminating character for a dictionary is e. Not null. The letter e. And of course when you’re dealing with a dictionary which contains a bunch of hashes, and chances are very very good one of those hashes contains an e. So in the end I just had to have an integer pointer that is passed as part of the recursion containing the current index the parser is at. I’ve now good bencoded dictionaries transposing into NSDictionaries.

The documentation (http://www.bittorrent.com/protocol.html) is kind of… odd too. One of my favorite lines:

“Data transfer takes place whenever one side is interested and the other side is not choking. Interest state must be kept up to date at all times - whenever a downloader doesn’t have something they currently would ask a peer for in unchoked, they must express lack of interest, despite being choked. Implementing this properly is tricky, but makes it possible for downloaders to know which peers will start downloading immediately if unchoked.”

I think what the documentation means is this:

“If your client is currently choked and receives a request for a piece it does not currently have, it must reply to the request with a not interested, despite being choked. This lets the downloader that is requesting the piece know that you’re not going to send the piece simply because you are choked. Instead, it means that you don’t have the piece at all. This lets clients figure out which choked servers simply don’t have the piece, and which choked servers do have the piece and can supply it once they become unchoked.”

BitTorrent really needs better documentation.

Feb 9

What is White Magic Labs?

Posted by Colin

In early 2004 a partnership was forged between Carpe Stellarem and The Triage Group. The purpose of this partnership was to ship ProToys. Carpe Stellarem was no longer interested in working on it, as they worked too closely with Apple and it didn’t fit the company theme very well. The Triage Group agreed to work on the project, confident that they could build something better than what the competition had.

The name “White Magic Labs” is derived from several things. First off, bad hackers are usually referred to as “Black Hats”, while good ones are referred to as “White Hats” (and we like to think of ourselves as White Hats.) This is where you would make the leap to white magic vs. black magic. Calling ourselves a lab seemed to make a lot of sense. We play with a lot of concepts, and generally play with code in general. Just of a lot of experimenting. Hence, the name “White Magic Labs” was born.

The wonderful Cabbage Design has agreed to do our logo for us. As we get closer to launch I’m sure we’ll see something come of that. (While I’m at it, I’ll plug ifyouloveitsomuch.com. Isn’t that cool?).

I’m stuck in Seattle right now but usually I’m running between Seattle and Cupertino for perfectly innocent reasons, and I know we’ve picked up a few readers/coders from Cupertino/Santa Clara with the DJ Puzzles podcast, so give Bri a shout if you’d ever like to hang out (not that I won’t hang out with Seattle people, I’m headed to a dBug meeting tonight.) I’m always eager to talk to people and show off all my concept code. Don’t be shy now.