In cryptography, entropy refers to the randomness collected by a system for use in algorithms that require random data. The higher the entropy, the more unpredictable the data, which is crucial for cryptographic security.
How to Compute Entropy
Entropy is mathematically defined using Claude Shannon's information theory formula:
\( H = -\sum_{i} P(x_i) \log_2 P(x_i) \)
Where:
\(H\) is the entropy measured in bits
\(P(x_i)\) is the probability of character \(x_i\) appearing in the data stream
\(\sum\) represents the sum over all possible characters in the data
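The formula translates almost line-for-line into code. A minimal sketch in Python (the function name and sample inputs are illustrative, not from the original text):

```python
import math
from collections import Counter

def shannon_entropy(data) -> float:
    """H = -sum(P(x) * log2(P(x))), in bits per symbol."""
    n = len(data)
    counts = Counter(data)  # empirical symbol frequencies
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(shannon_entropy("mississippi"))       # repeated letters: relatively low entropy
print(shannon_entropy(bytes(range(256))))   # exactly 8.0 bits: uniform over all 256 byte values
```

Note that this measures the empirical entropy of a sample, not the true entropy of the source that produced it; short samples can over- or underestimate badly.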
Entropy of Normal Text vs Encrypted Text
Normal text has predictable patterns and character distributions:
English text typically has an entropy of only ~1.5-2 bits per character
Certain letters (e, t, a) appear frequently while others (z, q, x) are rare
Digrams like "th" and "er" are common, creating predictable patterns
Encrypted text aims to maximize entropy:
Well-encrypted text approaches the theoretical maximum entropy (8 bits per byte for arbitrary binary data)
Character distribution becomes uniform (all characters appear with roughly equal probability)
No discernible patterns or statistical biases remain
Examples of Entropy Calculation
Example 1: Consider the string "aaaaa" (5 identical characters)
Probability of 'a' = \(5/5 = 1\)
Entropy = \(-1 \times \log_2(1) = 0\) bits
This string has zero entropy because it's completely predictable.
Example 2: Consider the string "abcde" (5 unique characters)
Probability of each character = \(1/5\)
Entropy = \(-5 \times (1/5) \times \log_2(1/5) = \log_2 5 \approx 2.32\) bits
This is the maximum possible entropy for a five-character string because:
The distribution is perfectly uniform (each character appears exactly once)
It lacks the predictable patterns found in natural language
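Both worked examples can be checked numerically with a few lines of Python (a quick sketch, not from the original text):

```python
import math

# Example 1: "aaaaa", a single symbol with probability 1
h1 = -1 * math.log2(1)                  # 0 bits: completely predictable
# Example 2: "abcde", five symbols each with probability 1/5
h2 = 5 * (-(1 / 5) * math.log2(1 / 5))  # log2(5), about 2.32 bits

print(h1 == 0.0, round(h2, 2))  # True 2.32
```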
Min-Entropy: A More Conservative Measure
While Shannon entropy measures average unpredictability, min-entropy focuses on the worst-case scenario—the most predictable outcome:
\( H_{\infty} = -\log_2(\max_{i} P(x_i)) \)
Min-entropy is always less than or equal to Shannon entropy and provides a more conservative estimate of randomness. This makes it particularly valuable in cryptographic applications where we must assume adversaries exploit the most probable outcomes.
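A short sketch contrasting the two measures (Python; the example string is illustrative):

```python
import math
from collections import Counter

def min_entropy(data: str) -> float:
    # H_inf = -log2(max P(x)): determined solely by the most likely symbol
    p_max = max(Counter(data).values()) / len(data)
    return -math.log2(p_max)

# For "aab", Shannon entropy is about 0.92 bits, but min-entropy is only
# about 0.58 bits: an attacker guessing "a" is right 2/3 of the time
print(min_entropy("aab"))
```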
Pseudo-Random Number Generators (PRNGs)
Since computers are deterministic machines, generating true randomness is challenging. Instead, we use Pseudo-Random Number Generators (PRNGs), which are algorithms that produce sequences of numbers that approximate the properties of random numbers.
PRNGs use a seed value and mathematical formulas to generate sequences that appear random. However, these sequences are deterministic—given the same seed, a PRNG will always produce the same sequence.
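This determinism is easy to demonstrate with Python's standard (non-cryptographic) generator; the seed value below is arbitrary:

```python
import random

# Two generators seeded identically produce identical "random" sequences
a = random.Random(2024)
b = random.Random(2024)

seq_a = [a.randint(0, 999) for _ in range(5)]
seq_b = [b.randint(0, 999) for _ in range(5)]
print(seq_a == seq_b)  # True: the output is fully determined by the seed
```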
Examples of PRNGs
Cryptographically secure PRNGs:
HMAC-DRBG: Uses HMAC with a cryptographic hash function
ChaCha20: Stream cipher repurposed as a PRNG in many systems
/dev/urandom: Linux kernel's CSPRNG that pools entropy from various system events
CryptGenRandom: Windows cryptographic random number generator (superseded by BCryptGenRandom in modern Windows)
Insecure PRNGs and Their Weaknesses
Linear Congruential Generators (LCGs):
Used in functions like rand() in many programming languages, LCGs follow the formula:
\( X_{n+1} = (aX_n + c) \bmod m \)
Weaknesses:
Predictable patterns: Observing a few outputs allows predicting all future outputs
Short periods: The sequence repeats after a certain number of iterations
Lower bits have much shorter periods and very low entropy
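A minimal LCG sketch makes the low-bit weakness visible (parameters are the well-known Numerical Recipes constants, used here purely for illustration):

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Textbook LCG: X_{n+1} = (a*X_n + c) mod m. Not for cryptographic use."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

gen = lcg(seed=1)
lowest_bits = [next(gen) & 1 for _ in range(12)]
# With a and c odd and m a power of two, the lowest bit just alternates
print(lowest_bits)  # [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
```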
RANDU: A notorious LCG used on IBM mainframes in the 1960s:
\( X_{n+1} = 65539 \cdot X_n \bmod 2^{31} \)
It exhibited strong correlations between consecutive triplets of numbers, creating a distinct lattice pattern when plotted in 3D space.
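The lattice structure follows from the identity \(65539^2 \equiv 6 \cdot 65539 - 9 \pmod{2^{31}}\), so every consecutive triple obeys one fixed linear relation. A quick check in Python:

```python
M = 2**31

def randu(seed):
    x = seed
    while True:
        x = (65539 * x) % M
        yield x

g = randu(1)
xs = [next(g) for _ in range(1000)]

# Every consecutive triple satisfies x[n+2] - 6*x[n+1] + 9*x[n] = 0 (mod 2^31),
# confining all 3D points (x[n], x[n+1], x[n+2]) to just 15 planes
ok = all((xs[i + 2] - 6 * xs[i + 1] + 9 * xs[i]) % M == 0 for i in range(998))
print(ok)  # True
```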
Mersenne Twister:
While mathematically sophisticated and suitable for simulations, it's not cryptographically secure because:
Its internal state can be recovered after observing 624 consecutive outputs
Once the state is known, all future outputs can be predicted
Attacks on PRNGs
State recovery attacks:
Example: In 2008, hackers exploited a predictable PRNG in the Debian Linux distribution's OpenSSL package. A coding error reduced the entropy used for key generation to at most 15 bits (essentially just the process ID), allowing attackers to enumerate all possible SSH and SSL keys.
Impact: Thousands of SSL certificates and SSH keys had to be regenerated worldwide.
Seed prediction attacks:
Example: In 2012, researchers demonstrated that the PRNG in PHP (versions before 5.3.4) could be exploited to predict the entire sequence of "random" numbers, compromising session IDs and CSRF tokens.
Attack method: By observing the timestamp-based seed and a few outputs, attackers could reconstruct the internal state.
Backdoor attacks:
Example: The Dual EC DRBG algorithm (proposed by NIST) contained a suspected backdoor. The algorithm relied on parameters that could potentially allow anyone knowing a secret value to predict all future random numbers.
Real-world impact: After the Snowden revelations, it was widely believed this algorithm was intentionally weakened to allow intelligence agencies to predict "random" values.
Insufficient entropy during boot:
Example: IoT and embedded devices often generate keys immediately after booting when entropy pools are nearly empty.
Attack consequence: In 2015, researchers found thousands of IoT devices using identical "randomly generated" keys, making them vulnerable to impersonation attacks.
The Challenge of True Randomness
Achieving true randomness in computing systems is difficult because:
Computers are designed to be deterministic and predictable
Software-generated randomness is inherently algorithmic
Even when using physical phenomena as entropy sources (like timing between keystrokes, mouse movements, or atmospheric noise), collecting and processing this data introduces biases
Hardware Random Number Generators (HRNGs)
To address the limitations of algorithmic randomness, cryptographic systems often incorporate hardware-based entropy sources:
Electronic noise-based generators:
Thermal noise: Random electron movement in resistors due to heat produces voltage fluctuations that can be measured and digitized
Shot noise: Unpredictable variations in electrical current due to quantum effects
Avalanche noise: Random breakdown effects in reverse-biased semiconductor junctions
Intel's RDRAND instruction:
Modern Intel processors include a built-in hardware random number generator that:
Uses thermal noise from transistor operation
Implements a conditioning algorithm to remove bias
Provides cryptographic-quality random numbers directly through a CPU instruction
Passes extensive statistical testing for randomness quality
Quantum Random Number Generators (QRNGs)
Quantum mechanics offers the ultimate source of randomness through inherently probabilistic quantum phenomena:
Photonic QRNGs:
Exploit the quantum behavior of light photons
Use semi-transparent mirrors (beam splitters) where each photon has exactly 50% probability of being reflected or transmitted
Detection of the photon's path generates truly random bits
Commercial implementations achieve rates of hundreds of Mbits/second
Quantum vacuum fluctuations:
Measure electromagnetic field fluctuations in a vacuum
Based on Heisenberg's uncertainty principle
Provides theoretically perfect randomness
These quantum sources are provably unpredictable even to an adversary with unlimited computational power, providing the gold standard for randomness in cryptographic applications.
Entropy Pools and Operating System Management
Modern operating systems maintain an entropy pool—a collection of random bits gathered from various sources:
Linux's /dev/random and /dev/urandom:
Kernel maintains an entropy pool fed by hardware events (keyboard timing, mouse movements, disk I/O timing, etc.)
/dev/random: Historically blocked when its entropy estimate was depleted; since Linux 5.6 it blocks only until the pool has been initially seeded
/dev/urandom: Never blocks; uses a CSPRNG seeded from the entropy pool
Since Linux 5.6, both interfaces are backed by the same ChaCha20-based CRNG (Cryptographic Random Number Generator)
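In practice, applications rarely read these devices directly; language runtimes expose the kernel CSPRNG instead. A Python sketch (sizes are illustrative):

```python
import os
import secrets

# os.urandom and the secrets module both draw from the OS CSPRNG
# (getrandom()/ /dev/urandom on Linux)
aes_key = os.urandom(32)               # 256 bits of key material
session_token = secrets.token_hex(16)  # 16 random bytes as 32 hex characters

print(len(aes_key), len(session_token))  # 32 32
```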
Entropy estimation challenges:
Accurately measuring how much true randomness exists in input sources is difficult
Overestimation risks using predictable values for security operations
Different OS kernels use different estimation methods, leading to varying security properties
Why Randomness Matters in Cryptography
Strong randomness is essential in cryptography for several reasons:
Key generation: Cryptographic keys must be unpredictable to prevent attackers from guessing them
Initialization vectors: Many encryption algorithms require random values to ensure the same plaintext encrypts to different ciphertexts
Nonces: One-time values used in many protocols to prevent replay attacks
Session tokens: Random values that secure communications between parties
If randomness is weak or predictable, the entire cryptographic system may be compromised, regardless of how mathematically sound the algorithms are.
Measuring and Testing Entropy
Cryptographers use various statistical tests to evaluate the quality of random data:
Frequency analysis: Testing if all values appear with equal probability
Run tests: Checking for patterns in sequences of similar values
NIST Statistical Test Suite: A comprehensive collection of tests for evaluating randomness
Entropy estimation tools: Software like ent, dieharder, and TestU01 for measuring entropy quality
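As an illustration, the frequency (monobit) test from the NIST suite fits in a few lines (Python sketch; the sample size is arbitrary):

```python
import math
import os

def monobit_pvalue(bits):
    """NIST SP 800-22 frequency test: are 0s and 1s balanced overall?"""
    n = len(bits)
    s_obs = abs(sum(1 if b else -1 for b in bits))
    return math.erfc(s_obs / math.sqrt(2 * n))

data = os.urandom(1250)  # 10,000 bits
bits = [(byte >> i) & 1 for byte in data for i in range(8)]

# Random data should yield p >= 0.01 almost every time; biased data fails badly
print(monobit_pvalue(bits) >= 0.01)
print(monobit_pvalue([1] * 10000) < 1e-6)  # True: all-ones input clearly fails
```

Passing this one test is necessary but nowhere near sufficient; the full NIST suite applies many complementary tests.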
Improving Randomness in Cryptosystems
Cryptographically secure PRNGs use entropy pools fed by hardware random number generators (like thermal noise, radioactive decay, or quantum phenomena) to seed their algorithms. Modern systems typically combine multiple entropy sources to maximize unpredictability.
Best Practices for Developers
When implementing cryptographic systems that require randomness, developers should follow these guidelines:
DO:
Use cryptographically secure random number generators provided by your language or platform (e.g., secrets in Python, SecureRandom in Java, crypto.getRandomValues in JavaScript)
Generate sufficiently long random values for security-critical applications (typically 256 bits or more for keys)
Use entropy services like random.org as supplementary sources, not sole sources
Consider using specialized hardware when available (TPMs, HSMs, or dedicated random number generators)
DON'T:
Implement your own random number generator without deep cryptographic expertise
Use time-based seeds alone (like time() or currentTimeMillis())
Use standard PRNGs (rand(), Math.random()) for cryptographic purposes
Generate cryptographic keys immediately after system startup when entropy may be low
Reuse random values across different contexts or sessions
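In Python, the contrast between the two lists is one import away (a sketch; token sizes are illustrative):

```python
import random
import secrets

# DON'T: random uses the Mersenne Twister, whose state is recoverable
# from 624 outputs and which is often seeded from the time
weak = "%032x" % random.getrandbits(128)

# DO: secrets draws from the OS CSPRNG and is intended for security tokens
strong = secrets.token_hex(16)  # session token
key = secrets.token_bytes(32)   # 256-bit key material

print(len(strong), len(key))  # 32 32
```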
The Future of Entropy: Post-Quantum Considerations
As quantum computing advances, requirements for entropy in cryptographic systems are evolving:
Post-quantum cryptographic algorithms often require significantly more random bits than traditional algorithms
Some quantum-resistant algorithms are highly sensitive to biases in random number generation
Quantum random number generators may become standard components in high-security systems
Research continues on entropy extraction methods that remain secure even against quantum adversaries
The quest for perfect randomness remains one of the most fundamental challenges in cryptography, and its importance will only increase as attacks become more sophisticated and computing power grows.