Using a sequentially increasing counter to generate an id token is easy. Database sequences and auto-number columns make it fairly trivial to implement. If that isn't available a simple file or shared memory counter can be implemented in minutes. Displaying such a number to a client however may give them more information than you would really like them to have about the number of ids you are allocating per unit time. We'd really like to obfuscate the id somehow while retaining the uniqueness of the original sequence.
One way to do this is to use a combination of multiplication and modulo arithmetic to map the sequence number into a constrained set. With careful choice of the multiplicative constant and the modulo value the resulting number can be made to wander rather effectively over the entire space of the target set.
The basic math looks like this: f(n) := (n * p) % q
n
:= input sequence valuep
:= step sizeq
:= maximum result size
p
and q
must be chosen such that:
p
<q
p
*q
< arithmetic limit (2^31, 2^32, 2^63, 2^64, ... depending on the precision of the underlying system)p
⊥q
(coprime or relatively prime)
With p := 5
and q := 12
our function will generate this output:
n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
---|---|---|---|---|---|---|---|---|---|---|---|
f(n) | 5 | 10 | 3 | 8 | 1 | 6 | 11 | 4 | 9 | 2 | 7 |
Change p
to 7 and you'll get:
n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
---|---|---|---|---|---|---|---|---|---|---|---|
f(n) | 7 | 2 | 9 | 4 | 11 | 6 | 1 | 8 | 3 | 10 | 5 |
The rational for keeping p * q < limit
is that as n
approaches q
the
initial multiplication will approach p * q
and if this calculation overflows
the available precision the result will wrap back into a previously traversed
space causing duplication. The same sort of thing will occur if p
and q
are not coprime. The result of the modulo will exhibit a period equivalent to
the GCD1 of p
and q
rather than mapping the entire range of q
evenly.
Careful choice of p
and q
are key to getting a good spread in the output
of the function and maintaining the uniqueness of the result. One easy way to
ensure that the chosen coefficients are coprime is to make them both be prime
powers of prime numbers (eg 9^17, 13^11, 13^15, 19^7, ...).
This method is a type of Linear congruential generator almost exactly equivalent to the Park–Miller random number generator.
Examples
PHP
<?php
/**
* Obfuscate an id generated from a linear sequence.
*
* @param int $n Input value
* @param int $p Random walk step size
* @param int $q Maximum result value
* @return int Obfuscated result
*/
function obfuscate_id ($n, $p, $q) {
return ($n * $p) % $q;
}
PL/SQL
FUNCTION obfuscate_id (n NUMBER, p NUMBER, q NUMBER) RETURN NUMBER IS
BEGIN
RETURN MOD(n * p, q);
END f;
Thanks to Tim for explaining all of this to me several times without becoming annoyed at the parts I wasn't getting.
-
Greatest Common Divisor ↩
Comments