compression

Reducing data size by eliminating redundancy. Encoding information in fewer bits than the original representation.

lossless

Every bit of original data is perfectly recoverable.

  • Huffman coding: variable-length codes based on symbol frequency (see the sketch after this list)
  • Lempel-Ziv (LZ77, LZ78, LZW): dictionary-based, replacing repeated patterns with back-references; LZ77 underlies the DEFLATE format used in gzip and PNG
  • arithmetic coding: encoding entire messages as single numbers in [0,1)
  • run-length encoding: replacing consecutive identical symbols with count + symbol
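
A minimal Huffman-coding sketch in Python, using only the standard library and assuming single-character symbols; the frequency count, the heap tiebreaker trick, and the toy string are illustrative choices, not any particular library's API.

    import heapq
    from collections import Counter

    def huffman_codes(text):
        """Build a prefix-free code table from empirical symbol frequencies."""
        freq = Counter(text)
        # Heap entries are (frequency, tiebreaker, tree); a tree is either a
        # symbol (leaf) or a [left, right] pair (internal node).
        heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        if len(heap) == 1:                       # degenerate single-symbol input
            return {heap[0][2]: "0"}
        tiebreak = len(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)    # merge the two least frequent subtrees
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, tiebreak, [left, right]))
            tiebreak += 1
        codes = {}
        def walk(node, prefix):                  # 0 = left branch, 1 = right branch
            if isinstance(node, list):
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:
                codes[node] = prefix
        walk(heap[0][2], "")
        return codes

    text = "abracadabra"
    codes = huffman_codes(text)
    bits = sum(len(codes[c]) for c in text)
    print(codes)
    print(f"{bits} bits with Huffman codes vs {8 * len(text)} bits at one byte per symbol")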

lossy

Discards information judged less perceptible to humans. Produces smaller files, but the loss is irreversible.

  • JPEG: discrete cosine transform, then coarser quantization of high-frequency coefficients in images (sketched after this list)
  • MP3/AAC: psychoacoustic models, discarding sounds humans perceive poorly
  • H.264/H.265: video compression, motion estimation, inter-frame prediction
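
A rough numpy sketch of the transform-and-quantize step on a single 8x8 grayscale block, in the spirit of JPEG: the 2-D DCT-II concentrates energy in low-frequency coefficients, and quantization zeroes out most of the rest. The uniform step Q is a hypothetical stand-in for JPEG's per-frequency luminance table.

    import numpy as np

    N = 8  # JPEG operates on 8x8 blocks

    def dct2(block):
        """Naive 2-D DCT-II of an NxN block with orthonormal scaling."""
        x = np.arange(N)
        # Basis matrix: C[u, i] = alpha(u) * cos((2i + 1) * u * pi / (2N))
        C = np.cos(np.pi * (2 * x[None, :] + 1) * x[:, None] / (2 * N))
        alpha = np.full(N, np.sqrt(2.0 / N))
        alpha[0] = np.sqrt(1.0 / N)
        C = alpha[:, None] * C
        return C @ block @ C.T

    # A smooth toy block (horizontal gradient), shifted to be zero-centred as JPEG does.
    block = np.tile(np.linspace(0, 255, N), (N, 1)) - 128

    coeffs = dct2(block)
    Q = 16  # hypothetical uniform quantization step
    quantized = np.round(coeffs / Q).astype(int)

    print("nonzero coefficients after quantization:",
          np.count_nonzero(quantized), "of", N * N)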

theoretical limit

Shannon entropy, H = -Σ p(x) log2 p(x), defines the minimum average bits per symbol for lossless compression. No algorithm can compress below this bound on average; the entropy itself is the irreducible information content of the source.
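
A small Python check of that bound, assuming the source is modelled as independent symbols with the empirical frequencies of a toy string: H times the length lower-bounds any lossless encoding under that model.

    import math
    from collections import Counter

    def entropy_bits_per_symbol(data):
        """Shannon entropy of the empirical symbol distribution, in bits."""
        counts = Counter(data)
        total = len(data)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    text = "abracadabra"
    h = entropy_bits_per_symbol(text)
    print(f"entropy: {h:.3f} bits/symbol")
    print(f"lossless lower bound: {h * len(text):.1f} bits "
          f"(vs {8 * len(text)} bits at one byte per symbol)")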

applications

  • storage: fitting more data on disk, databases use compression internally
  • transmission: faster transfer over networks, reduced bandwidth cost (demonstrated after this list)
  • IPFS: content-addressed storage benefits from deduplication, a form of compression
  • cyber: compressed representations in the knowledge graph
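
A quick illustration of the storage and transmission benefit, using Python's standard zlib module (the DEFLATE algorithm behind gzip); the payload is a made-up redundant blob chosen to compress well.

    import zlib

    # Highly redundant payload, e.g. a repetitive log or JSON-like blob.
    payload = b'{"status": "ok", "value": 42}\n' * 1000

    compressed = zlib.compress(payload, 9)
    restored = zlib.decompress(compressed)

    assert restored == payload  # lossless: the round trip is exact
    print(f"{len(payload)} bytes -> {len(compressed)} bytes "
          f"({len(compressed) / len(payload):.1%} of original)")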

connections

Rooted in information theory and Shannon entropy. Encryption and compression interact: well-encrypted data is statistically indistinguishable from maximum-entropy noise and is therefore effectively incompressible, which is why compression is applied before encryption (sketched below). Algorithms for compression are also studied in complexity theory.
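
A short sketch of that interaction, with os.urandom standing in for well-encrypted ciphertext (both should look like uniform noise to a compressor): DEFLATE finds no redundancy to exploit, so the output is no smaller.

    import os
    import zlib

    redundant = b"the same phrase over and over " * 200
    random_like = os.urandom(len(redundant))  # stand-in for ciphertext

    for name, data in [("redundant text", redundant),
                       ("random / encrypted-like", random_like)]:
        out = zlib.compress(data, 9)
        print(f"{name}: {len(data)} -> {len(out)} bytes")
    # Expected: the redundant text shrinks sharply; the random bytes do not,
    # and may even grow slightly due to framing overhead.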