Misapplied Math

Trading, Data Science, CS

Random From Afar but Far From Random

When Random is a Good Thing

The term "random" gets tossed around frequently when describing chaotic processes both physical and artificial. However, true randomness is rare, and a deep enough dive will usually tease out at least some latent structure. Brownian motion enjoys widespread use in academia as a model for "random" processes ranging from molecular diffusion to stock market returns. However, on a small enough scale the laws of quantum mechanics govern diffusion, and exchange microstructure governs the short term price evolution of capital markets.

Randomness plays an integral role in science, mathematics, and technology. High quality random numbers underpin everything from the cryptographic systems that secure e-commerce to Monte Carlo simulations that help us develop pharmaceuticals and predict the weather. Generating truly random numbers can't be done in software alone as code is inherently deterministic. In order to generate random numbers of cryptographic quality Linux and most operating systems rely on an entropy pool generated by stochastic processes such as disk access and thermal fluctuations. Generating entropy takes time, so instead of consuming from the pool directly the values generated are used to initialize a pseudo random number generator (PRNG) which can in turn produce longer streams of numbers efficiently. Starting with Ivy Bridge, Intel x86 processors provide an instruction to do this in hardware (the RDRAND opcode) but historically heavy users of random numbers relied on userspace code.

PRNG algorithms aim to generate values that appear chaotic and statistically independent. These numbers may exhibit all of the hallmarks of randomness, but the underlying process remains deterministic. Some generators are suitable for cryptographic applications (CSPRNG), and other faster algorithms such as Mersenne Twister are preferred for simulation. Regardless of the algorithm used, subtle flaws and deviations from true randomness may exist in any pseudo random number generator. The RANDU algorithm serves as an infamous example.

Random From Afar but Far From Random

As an early PRNG in the family of Linear Congruential Generators RANDU enjoyed widespread use; it's fast, easily implemented, and mathematically simplistic. Testing for randomness isn't an exact science and academic literature focuses largely on heuristics – there's no definitive method, and with the methods of the time RANDU appeared, well, random.

RANDU in two dimensions

In the image above, successive outputs of RANDU were chosen as (x, y) coordinates. Random, right? What happens if we instead choose successive outputs as (x, y, z) coordinates?

RANDU in three dimensions

Whoops – clearly not random. Values generated by RANDU fall into 15 two dimensional planes in a three dimensional space – bad news for Monte Carlo. Imagine a simulation generating a random mesh by choosing pseudo random x, y, and z coordinate in succession. RANDU won't even come close to equally representing all possible coordinates in 3 space. Hindsight being 20/20 the issue with RANDU was purely structural. A simple recurrence relationship exists in which the previous two values returned by the algorithm uniquely determine the next. As every value produced is a linear combination of the last two, numbers produced by RANDU "live" in a structured, three dimensional space. One notes that most PRNGs have this characteristic – just in much higher dimensions where the effects aren't as obvious or damaging to simulations of physical processes.

Myopic Views

While randomness is a fascinating topic in its own right the profound realization comes from the image above. Viewed in one or two dimensions the flaw in RANDU isn't obvious; structure only appears under the appropriate transformation to a higher dimensional space. System identification addresses the problem of estimating a state space model (which is itself an n dimensional space where n is the number of states) from noisy, transformed lower dimensional observations. So to what extent can seemingly chaotic data be viewed as a projection down from an orderly high dimensional space? Stay tuned for more on teasing high dimensional structure out of low dimensional data.

Comments