Monday, February 5, 2007

Requirements MUST be strict

To further echo some issues I brought up earlier, I think we need to start rejecting proposed standards that include grammars that are more complicated than required. For example, the SIP RFC is littered with examples like this:

  from-spec   =  ( name-addr / addr-spec )
               *( SEMI from-param )
  from-param  =  tag-param / generic-param
  tag-param   =  "tag" EQUAL token
 
For this example, just pay attention to the *( SEMI from-param ) and its fallout. This says that there can be zero or more occurrences of "SEMI" followed by a from-param, where an example of a "from-param" is "tag" EQUAL token.

When I first encountered this about 3 years ago I was annoyed, but it has recently become more annoying the more I think about it. The problem is in the definitions of "SEMI" and "EQUAL". They are:

   SEMI    =  SWS ";" SWS ; semicolon
   EQUAL   =  SWS "=" SWS ; equal
 
And "SWS" is:
   LWS  =  [*WSP CRLF] 1*WSP ; linear whitespace
   SWS  =  [LWS] ; sep whitespace
 
So, something as simple as:
  From: sip:me@example.com;tag=12345
 
can be written as:
  From: sip:me@example.com
    ; 
    tag
    =
    12345
 
and the two forms are expected to be equivalent.
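
To make the fallout concrete, here is a minimal Python sketch of what a parser has to do to treat those two forms as the same thing. The LWS and SWS patterns are transcribed from the ABNF above; the TOKEN character class and the extract_tag helper are my own assumptions for illustration, not anything taken from a real SIP stack.

```python
import re

# LWS = [*WSP CRLF] 1*WSP and SWS = [LWS], transcribed from the ABNF.
LWS = r"(?:[ \t]*\r\n)?[ \t]+"
SWS = rf"(?:{LWS})?"
# Assumed approximation of the SIP 'token' rule (illustrative only).
TOKEN = r"[A-Za-z0-9\-.!%*_+`'~]+"

# SEMI "tag" EQUAL token, with SEMI and EQUAL expanded to SWS ";" SWS etc.
TAG_PARAM = re.compile(rf"{SWS};{SWS}tag{SWS}={SWS}({TOKEN})", re.IGNORECASE)


def extract_tag(header_value: str):
    """Return the tag value from a From header value, or None if absent."""
    match = TAG_PARAM.search(header_value)
    return match.group(1) if match else None


compact = "sip:me@example.com;tag=12345"
folded = "sip:me@example.com\r\n    ; \r\n    tag\r\n    =\r\n    12345"

# Both spellings have to yield the same tag.
print(extract_tag(compact), extract_tag(folded))
```

None of this is hard, but every one of those SWS fragments is there only because the grammar allows whitespace and line folds around a single punctuation character.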

Am I insane to think that this flexibility in the input is insane and unreasonable? Now, it is not really that hard to parse those to be equivalent, but my question is this. Why make a standard that says that either form is acceptable? Certainly proponents of text-based protocols (and I am not one of them) would argue that it allows you to have better readability in some cases, but I think this type of flexibility is a little too much.

Sunday, January 28, 2007

Time, How I Loathe Thee

Grace Hopper once said that "The wonderful thing about standards is that there are so many of them to choose from." How true it is for time and dates as well. To name just a few off the top of my head:

  • RFC 822/2822
  • RFC 3339
  • ISO 8601
  • UTC
  • TAI
  • Unix Time
  • Java Time
And there are lots more.

I know that it is too much to ask that we all agree on how we want the time displayed on our screen, or how it is to be printed in a newspaper. But surely we, as a computing community, can come to agreement on a couple of things about data transmission and parsing:

  1. If it is machine readable, the machine can format it nicely for a human
  2. If it is machine readable, it should be easily machine readable. I don't want to have to build a compiler to parse a date/time.
  3. No redundancy. If it tells me the month is "Oct" but it lists it as "9", which one is right?
  4. Numbers only please, no localization issues
  5. No timezones. The machine doing the local formatting can also do the offset to the local time.
  6. Ability to encode sub-seconds
The best I've seen for this is the RFC 3339 spec. However, it too has its flaws. Within the ABNF, it is setting us up for the Y10k problem ;). The 'T' and 'Z' characters "may" be in lower case. Why? You are also allowed to -- "for the sake of readability" -- use a space instead of the 'T' as a separator. Why?

If you want to have optional parts of the format, fine. I'm open, for example, to using "2007-01-27" to represent today without having to specify a time. However, I think it is stupid to author a new standard that has the aforementioned flaws. The added complexity to any parser needed is not that great, but it is entirely unnecessary.
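
As a rough sketch of what I mean, here is one possible strict subset in Python: upper-case 'T' and 'Z' only, UTC only, numbers only, optional sub-seconds. The parse_strict name and the limit of six fractional digits (what Python's datetime can hold) are my own assumptions, not part of RFC 3339.

```python
import re
from datetime import datetime, timezone

# One fixed shape: YYYY-MM-DDTHH:MM:SS[.ffffff]Z, ASCII digits only.
STRICT = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})"
    r"T([0-9]{2}):([0-9]{2}):([0-9]{2})"
    r"(?:\.([0-9]{1,6}))?Z"
)


def parse_strict(text: str) -> datetime:
    """Parse a strict UTC timestamp; raise ValueError for anything else."""
    match = STRICT.fullmatch(text)
    if match is None:
        raise ValueError(f"not a strict timestamp: {text!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    fraction = match.group(7) or ""
    # Pad the fraction to microseconds: ".5" becomes 500000.
    microsecond = int(fraction.ljust(6, "0")) if fraction else 0
    # datetime() itself rejects out-of-range fields such as month 13.
    return datetime(year, month, day, hour, minute, second, microsecond,
                    tzinfo=timezone.utc)


print(parse_strict("2007-01-27T12:34:56.5Z"))
```

A lower-case 't', a space separator, or a numeric offset all fail the single pattern, so there is exactly one way to write any given instant. Note that this sketch also rejects leap seconds, because datetime does not accept a seconds value of 60.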

 

Wednesday, October 18, 2006

More RFID Passport Problems

A great post on the ACM Risks forum from September: http://catless.ncl.ac.uk/Risks/24.42.html#subj3.1 makes a great point about the security of passports before you even receive one:

Despite there being nothing blatantly obvious on the envelope to identify it as a passport the delivery driver knew that it was a passport. If this is the case then it seems to me that it would be fairly straightforward for a courier using a standard RFID reader to scan each passport, in its envelope, as he or she delivers it and hand the details on to an accomplice at some later time.
And yes, it's encrypted. But so are DVDs. That was "state of the art" at the time as well.

Sunday, October 15, 2006

The Problem with RFID Passports

Ever since I did my undergrad honours thesis in cryptography, I have had an interest in network security. This interest has widened to most things to do with security. The latest interest is RFID passports.

In the latest release of Bruce Schneier's Crypto-Gram, he urges us all to renew our passports to ensure we get a version without an RFID. This reminded me of the comment I posted on his earlier post on the subject, about a year ago. Here it is:

I've been following this topic for a long time now, especially since the Canadian Government was thinking of putting bio-metric information into passports. It doesn't belong there... _maybe_ somewhere else. I'll explain.

I think there is a fundamental flaw in putting the "same information" on the RFID chip that is printed on the page. That serves no purpose; certainly it would foil the more naive person who wishes to produce a fake passport, but certainly anyone with any amount of intelligence is going to produce a fake with fake information on the RFID as well as in print. It is folly to believe otherwise. Any kind of biometric information would be the same... If I'm going to create a fake and the validation of the passport is to compare what's on the passport to the person standing there, I'm going to put my own biometric information on the RFID as well. You really need a method of authenticating that the passport is REAL and VALID.

This brings me to my point. If the US State Department is now requiring that the passport be placed into a reader to get the encryption key that will decrypt the information in the RFID chip that will most certainly be identical to that which is printed on the page, doesn't that defeat the whole idea of the RFID in the first place? Wasn't it the intent to have a reader system that didn't have to come in contact with the passport? Doesn't this requirement make that impossible?

And, since that is impossible, why not try to implement a system that actually works? A system that will take into account that passports can be revoked, and that fakes are going to be really good. Wouldn't it be better to have a system that read some sort of serial number off the passport (this already exists) and queries a US State Department database of passports (which already exists)... then the information that the US State Department has on file as being associated with that passport number would then pop up on the screen of the customs official (or whomever else with proper access) and the information can be verified by looking at it. An automated system could do things like make sure all the text is correct... and the official could look at the two pictures, and look at the person, and see if they match... they could even... if they wanted... send an "update" to the picture and you could have not only a copy of the photo that is on the passport, but you could watch someone grow a beard in extreme slow motion :)

Why is this better? It means that only people with authenticated access to the US State Department system can get information about you from your passport automatically... everyone else would be limited to the information that is printed on the page. That's not that good for anyone. The only thing that might happen is that someone could create a copy of your passport and they try to make themselves look like you... they wouldn't be able to change the picture as that is validated in real-time with what is on file.

I think in this case the RFID is a bit of technology that is being applied where it shouldn't. It doesn't belong here... there are other ways that are less prone to problems that could solve the problem.
Further on, I say:
After reading some more of this, I think there is a fundamental misunderstanding of passports here. Passports were brought into being, partially, because up until quite recently we didn't have any sort of infrastructure to do real-time authentication of a person at the border. We can do that now; we have the technology. The idea of putting my picture, biometric ID, visas, revocations, and a list of previously visited countries on or in my passport is very unsettling. This information shouldn't be anywhere in my control.

We have the technology to build the infrastructure that will allow us to do better than a system designed wholly on the fact that "fakes" are hard to make. They aren't hard to make any more, and any system that relies only on information presented by the authenticatee will be much more open to intrusion.

I'm not saying that such a system is easy to make. Certainly not. However, it is a better direction.

Others then criticized this idea with arguments that the idea of having all this information on your passport was "Less unsettling than the idea of putting them in a central database!". To think that such a database does not already exist, or is not being deployed, is naive.

I don't have enough information to solve this problem, but I know that the current implementation is bad. There is no good reason to have the information that is printed on the passport be on the RFID. There is no good reason to have an RFID (especially with the "contact" requirements). There is no good reason to store any information electronically on your passport. Passports are a means by which people can verify your identity. I have used them in many places, and never has it been a problem for the information to be read by a person. Any enhancement to the system can only be made by allowing for the information to be validated in real-time to allow for the detection of fake, lost, stolen, or revoked credentials.

Wednesday, August 23, 2006

SIP = Simply Insane Protocol

Yet another of the subtle and not-very-thought-out details of SIP has shown its ugliness to me again today.

REGISTER

Great concept, extremely poor execution.

When you send a REGISTER request (which, for the uninitiated, maintains a binding for where you can be reached), you always have to send it with the same CallID and a monotonically increasing CSeq (sequence) number. Sounds simple. Except it kinda breaks a whole bunch of things about SIP.

You aren't supposed to use the same CallID unless you are inside a dialogue... unless it is a REGISTER. This basically means the protocol is asymmetric and any implementation must have special rules for special messages. In this case, we need a special rule that means that we can supply our own CallID and CSeq for a REGISTER message, but no other messages. The implication is that any correctness checking before sending off a transaction has to have a special case for REGISTER.
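
Here is a toy Python sketch of what that special case looks like in a pre-send check. Everything in it (the Request and OutgoingChecker names, the in_dialog flag, the exact rules) is hypothetical and simplified from the description above; it is not how any particular SIP stack does it.

```python
from dataclasses import dataclass


@dataclass
class Request:
    method: str
    call_id: str
    cseq: int
    in_dialog: bool = False


class OutgoingChecker:
    """Hypothetical sanity check run before a request goes on the wire."""

    def __init__(self):
        self._seen_call_ids = set()   # CallIDs already used out of dialogue
        self._register_cseq = {}      # CallID -> last CSeq sent in a REGISTER

    def check(self, req: Request) -> bool:
        if req.method == "REGISTER":
            # The special case: same CallID is expected, CSeq must increase.
            last = self._register_cseq.get(req.call_id)
            if last is not None and req.cseq <= last:
                return False
            self._register_cseq[req.call_id] = req.cseq
            return True
        # The general rule: outside a dialogue, a CallID must not be reused.
        if not req.in_dialog and req.call_id in self._seen_call_ids:
            return False
        self._seen_call_ids.add(req.call_id)
        return True


checker = OutgoingChecker()
print(checker.check(Request("REGISTER", "reg-1", 1)))
print(checker.check(Request("REGISTER", "reg-1", 2)))
```

The first branch exists only because of REGISTER. Take it out and the check is a single rule.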

Why do I care?

SIP isn't just some hobby protocol anymore. It is quickly on its way to replacing inter-carrier trunks and is basically the de facto new standard for phone calls. How can we expect the same level of reliability from the network if the protocols are insanely complex? We are now on the second version of SIP. The RFC has 269 pages, there are 26 other related RFCs and some 30 drafts for new RFCs (and counting). There has to be some consolidation on this, otherwise it will be meaningless to say that you are "SIP Compliant" or have "SIP Interoperability" etc.

A Better Way

REGISTER messages have to be re-sent every so often to maintain the bindings in the registrar. This is a reasonable method of determining if the client is still there, or if it crashed, or the network went away. When you want to remove the bindings (you close the client), you send a request to expire all of them. This sounds like a session to me.

Why wasn't the registration concept implemented similar to the INVITE? INVITE transactions expire after a while unless you refresh them (basically a ping/ack mechanism inside the dialogue to keep the signaling alive, and detect dead calls). Couldn't REGISTER have been modeled the same way? Wouldn't having REGISTER be a dialogue and using the same type of refresh model have made it easier? For that matter, couldn't you really have accomplished the same thing by sending an INVITE, but instead of having a session description in it to establish a call, or a conference, or a text chat, etc., you indicated that it was a registration?

Mirror, Mirror

When designing any protocol, the goal should be orthogonality. Special rules should be a red flag that you are doing something wrong. Go back, re-think, and hopefully come up with a better design. Sometimes special rules have to exist, but they should be avoided.

SIP is in no way orthogonal.

Thursday, August 17, 2006

RFC Speak

So, at the start of most modern RFCs there is a section called "Terminology" that refers back to RFC 2119. In there, there are definitions of "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY" and "OPTIONAL". I have to read a LOT of RFCs as I build the SIP/RTP stuff for the mothership, and these always anger me.

To simplify things a little bit, here's the equivalence list:

  • "MUST", "SHALL" and "REQUIRED" are all the same
  • "MUST NOT" and "SHALL NOT" are the same
  • "SHOULD" and "RECOMMENDED" are the same
  • "SHOULD NOT" and "NOT RECOMMENDED" are the same
  • "MAY" and "OPTIONAL" are the same
That part is pretty obvious.

I have basically come to the conclusion that "SHOULD", "RECOMMENDED", "SHOULD NOT", "NOT RECOMMENDED", "MAY" and "OPTIONAL" should be removed from the list of available terms that people can use when writing an RFC.

Why?

RFCs have basically become requirements specifications for protocols, extensions, etc. They are not really "Requests For Comments" anymore, as the RFC Draft process has replaced that. Once something has become an RFC it has been vetted by peers in the field and has likely had at least some experimental (or otherwise) implementation. If we are specifying the behavior of a system we can't depend on any of the optional bits, nor can we expect anything we are integrating with to pay attention to them. Any portion of an RFC that is specified as a "SHOULD" should perhaps be an extension. The specification for that extension could then use "MUST" to more clearly specify its requirements.

As a simple example, take the interpretation of the "Expires" header in a REGISTER request from the SIP rfc:

Implementations MAY treat values larger than 2**32-1 (4294967295 seconds or 136 years) as equivalent to 2**32-1. Malformed values SHOULD be treated as equivalent to 3600.
This is what I would LOVE to have seen:
Values larger than 2**32-1 and malformed values MUST result in a 400 (Bad Request)
or, "if you send garbage, I'm not going to process it". Why should every implementation out there have to "fix" things like this? This type of loose specification only serves to promote laziness, bad code, and unpredictable behavior.
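
The difference is easy to see side by side. This is a minimal Python sketch with function names of my own choosing: expires_lenient is one behaviour the quoted MAY/SHOULD text permits (the clamping is optional, so other implementations may differ), and expires_strict is the MUST rule I would have preferred, with the ValueError standing in for sending a 400 response.

```python
import re

MAX_EXPIRES = 2**32 - 1  # 4294967295 seconds


def _is_number(raw: str) -> bool:
    # ASCII digits only; anything else counts as malformed.
    return re.fullmatch(r"[0-9]+", raw) is not None


def expires_lenient(raw: str) -> int:
    """One permitted reading of the MAY/SHOULD text: repair bad input."""
    if not _is_number(raw):
        return 3600                      # malformed SHOULD become 3600
    return min(int(raw), MAX_EXPIRES)    # oversized MAY be clamped


def expires_strict(raw: str) -> int:
    """The MUST version: garbage in, rejection out."""
    if not _is_number(raw) or int(raw) > MAX_EXPIRES:
        raise ValueError("reject with a 400 response")
    return int(raw)


print(expires_lenient("banana"), expires_lenient("99999999999"))
```

With the lenient version, two compliant implementations can disagree about what "99999999999" means. With the strict version they cannot.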

Monday, August 7, 2006

Cheap, Fast, Good. Pick Any Two

Whenever you are doing anything, looking for a product or a service,  this basic axiom holds.  You can have it cheap and fast, but it won't be good (ex. McDonald's).  You can have it cheap and good, but it won't be fast (ex. making a gourmet meal at home).  And you can have it good and fast, but it won't be cheap (ex. going to a gourmet restaurant).

The arguments don't work in reverse.  You can also end up with only one of the three, or none.  There are lots of examples of things that are expensive, slow and good, expensive, slow and bad, cheap, slow and bad, etc.  But I have never seen an example of something that is truly cheap, truly fast, and truly good.

Sometimes we might be fooled into thinking that we have found something that fits all three constraints, but there is something we have missed.  Perhaps we have not accounted for some hidden cost and have violated the cheap constraint.  Perhaps we have missed a step or a detail and accounting for that violates the fast constraint.  Or perhaps we are fooled into thinking something is good and it ends up not being any good.

The main idea here is that there is no silver bullet.  There is no perpetual motion machine.  There is no money tree.  This is not a negative outlook on the world, but it is one that accepts the realities of it.  There are laws of physics.  There are rules of economies.  There are costs of innovation.