Getting tokens for verification

Recently I’ve been working on implementing SSL authentication in Okupy (as you can see from the previous post). The specifics of the chosen solution required the authentication to occur on a separate virtual host. Due to the specifics of Django, it was impossible to directly access the initial session from the dedicated vhost.

I had two possibilities. The supposedly simpler one involved passing the session identifier to the dedicated vhost so that it could access the session information and store the authentication result there. But it required hacking a fair bit of Django (since newer versions no longer give direct access to the session identifier) and starting the session early (it needn’t be started until the user is actually authenticated), and it could introduce security issues.

The other involved passing the authentication results outside of the session framework and using dedicated tokens to claim the results. Those tokens have similar requirements to the tokens used e.g. for e-mail address verification.

First of all, the tokens must be guaranteed to be unique. Otherwise, there would be a non-zero probability that two users are given the same token. In the case of e-mail address verification, this would mean that one user could confirm another user’s (possibly invalid) e-mail address. In the case of SSL authentication, one user would be able to claim another user’s login.

Secondly, the tokens need to be semi-random and hard to guess. A user who is able to obtain multiple valid authentication tokens in sequence (e.g. by requesting multiple valid e-mail account verifications) must not be able to predict the value of the token for another (possibly invalid) address, at least not with reasonable resources.

Plain random tokens

Likely the simplest and most common way of generating tokens is to use random strings. As you may guess, this is not the best solution there could be, and its security depends on the quality of the random number generator. That is, with a poor generator and low load, the attacker could supposedly obtain a sequence of random tokens and use it to guess further tokens.

The most important issue, however, is that purely random tokens usually don’t guarantee uniqueness (well, the idea of being random implies that they may randomly repeat). For this reason, it’s necessary either to pair the token with a unique identifier or to enforce uniqueness explicitly.

The former solution is quite simple but may look a bit unprofessional. You send the e-mail address or the unique (sequential) database identifier along with the verification token and you’re fine. Yet you have to send both.

The latter solution is used more commonly, though it’s rather a poor man’s solution. It requires that checking the token for uniqueness and adding it happen atomically. Otherwise, duplicate tokens could be added as a result of a race condition. This can be done by adding a uniqueness constraint to the database and retrying with new random tokens until one satisfies the constraint. On the downside, this makes the number of loop iterations indeterminate.

Both solutions involve storing the generated tokens and e-mail addresses in a backing store.
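The second variant (a uniqueness constraint plus retries) could be sketched roughly as follows. This is only an illustration, not the Okupy code: it uses an in-memory SQLite database instead of the Django ORM, and the `verification` table, the `create_token` helper and its `generate` and `max_attempts` parameters are all made-up names.

```python
import secrets
import sqlite3


def create_token(conn, email, generate=lambda: secrets.token_urlsafe(16),
                 max_attempts=5):
    """Store a random token for `email`, retrying on collisions.

    The PRIMARY KEY constraint makes "check and add" a single atomic
    step, so two concurrent requests cannot end up with the same token.
    """
    for _ in range(max_attempts):
        token = generate()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO verification (token, email) VALUES (?, ?)",
                    (token, email),
                )
        except sqlite3.IntegrityError:
            continue  # token already taken, try another one
        return token
    raise RuntimeError("could not find an unused token")


# Hypothetical backing store; a real project would use its own schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE verification (token TEXT PRIMARY KEY, email TEXT NOT NULL)"
)

token = create_token(conn, "user@example.com")
```

The `max_attempts` cap is my own addition. It bounds the otherwise indeterminate loop at the price of an occasional failure that the caller has to handle.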

Signed identifiers

This solution is somewhat similar to the first variant above (a random token paired with a unique identifier), after removing the random part. The token is made by pairing the unique identifier (the e-mail address or sequential database identifier) with a signature made using a secret key. The authenticity of the token is checked by verifying the signature.
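To show the idea without needing a configured Django project, here is a minimal sketch using only the standard library’s `hmac` module. In a real Django application the `django.core.signing` module would be the natural choice. The `SECRET_KEY` value and the `sign`/`unsign` helper names below are placeholders of mine.

```python
import hashlib
import hmac

# Placeholder secret; a real deployment would load this from its settings.
SECRET_KEY = b"hypothetical-secret-key"


def _mac(identifier):
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()


def sign(identifier):
    """Return the identifier paired with its signature."""
    return f"{identifier}:{_mac(identifier)}"


def unsign(token):
    """Return the identifier if the signature matches, otherwise None."""
    identifier, sep, mac = token.rpartition(":")
    if not sep:
        return None  # no separator at all, so not one of our tokens
    # compare_digest avoids leaking the match position through timing.
    if not hmac.compare_digest(mac.encode(), _mac(identifier).encode()):
        return None
    return identifier


token = sign("user@example.com")
```

Since the identifier travels inside the token in plain text, the signature only proves that the token was issued by whoever holds the key. It hides nothing.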

The major advantage of this solution is that it utilizes built-in Django mechanisms (signatures) with minimal overhead. There’s no need to generate random identifiers, and only the e-mail address needs to be stored in the database.

The disadvantage of this solution is that every token exposes a known identifier together with its signature, and a collection of such pairs can supposedly be used to attack the secret key.

Encrypted identifiers

The smartest solution I’ve found so far comes from Stack Overflow. It uses the secret key to encrypt the unique identifiers and obtain semi-random and unique tokens.

Since the unique identifiers themselves may be quite predictable, the security of this solution can be enhanced through placing additional random data inside the blocks. This data should successfully prevent the attacker from using the tokens to obtain the secret key.
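A rough sketch of the idea follows. To keep it runnable with the standard library alone, it uses a toy four-round Feistel construction built on HMAC-SHA256 as a stand-in for a real block cipher. That substitution is mine: actual code should use a vetted cipher (e.g. AES from a proper crypto library). The key and all function names are made up as well.

```python
import base64
import hashlib
import hmac
import os
import struct

SECRET_KEY = b"hypothetical-secret-key"  # placeholder secret
ROUNDS = 4


def _round(i, half):
    # Keyed round function: 8 bytes of HMAC-SHA256 over round number + half.
    return hmac.new(SECRET_KEY, bytes([i]) + half, hashlib.sha256).digest()[:8]


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _encrypt_block(block):
    left, right = block[:8], block[8:]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _round(i, right))
    return left + right


def _decrypt_block(block):
    left, right = block[:8], block[8:]
    for i in reversed(range(ROUNDS)):
        left, right = _xor(right, _round(i, left)), left
    return left + right


def make_token(ident):
    # 8 bytes of identifier plus 8 random bytes fill one 16-byte block.
    block = struct.pack(">Q", ident) + os.urandom(8)
    return base64.urlsafe_b64encode(_encrypt_block(block)).rstrip(b"=").decode()


def read_token(token):
    raw = base64.urlsafe_b64decode(token + "=" * (-len(token) % 4))
    if len(raw) != 16:
        raise ValueError("malformed token")
    # The random half is simply discarded after decryption.
    return struct.unpack(">Q", _decrypt_block(raw)[:8])[0]


token = make_token(42)
```

Because of the random half, two tokens made for the same identifier differ, while both decrypt back to it. Note that `read_token` happily returns some arbitrary 64-bit number for any well-formed but forged token.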

Disadvantages? A fair amount of additional logic. The simple solution doesn’t really verify the decrypted identifiers. That is, when an invalid token is passed, the code tries to access an invalid identifier. In our case, it often causes OverflowErrors due to the identifiers being outside the numeric range supported by the database.

To avoid the aforementioned issue and increase both the security and the performance a bit, the encrypted identifier may be stored in the database as well. This allows the system to check tokens without even needing to decrypt them, and to refuse invalid encrypted identifiers without ever looking up the identifier they decrypt to.