Notes From The Comp.Compression FAQ
This page is derived from the notes on patents in the
comp.compression FAQ document (as it was at 21-Mar-1996).
Run Length Encoding Patents
Tsukiyama has two patents on run length encoding:
US Patent 4586027 and US Patent 4872009. These were
granted in 1986 and 1989 respectively. The first one covers run length
encoding in its most primitive form: a length byte followed by the
repeated byte. The second patent covers limiting the
run length to 16 bytes and thus the encoding of the length on 4 bits.
O'Brien has patented run length encoding followed by LZ77
(US Patent 4988998).
LZ77 Patents
Waterworth patented a LZ77 variant (US Patent 4701745).
This algorithm is generally referred to as as LZRW1,
because Ross Williams reinvented it later and posted it on
comp.compression on April 22, 1991.
The same algorithm has later been
patented by Gibson & Graybill (US Patent 5049881).
The patent office failed
to recognize that the same algorithm was patented twice, even though
the wording used in the two patents is very similar.
The Waterworth patent is now owned by Stac, which won a lawsuit
against Microsoft, concerning the compression feature of MS-DOS 6.0.
Damages awarded were $120 million. (Microsoft and Stac later
settled out of court.)
Fiala and Greene obtained in 1990 a patent
(US Patent 4906991) on all
implementations of LZ77 using a tree data structure. Claim 1 of the
patent is much broader than the algorithms published by Fiala and
Greene in Comm. ACM, April 1989. The patent covers the algorithm
published by Rodeh and Pratt in 1981 (J. of the ACM, vol 28, no 1,
pp 16-24). It also covers the algorithms used in LHARC, LHA and ZOO.
Notenboom (from Microsoft US Patent 4955066)
uses three levels of compression, starting with run length encoding.
The Gibson & Graybill patent (US Patent 5049881)
covers the LZRW1 algorithm
previously patented by Waterworth and reinvented by Ross Williams.
Claims 4 and 12 are very general and could be interpreted as
applying to any LZ algorithm using hashing (including all variants
of LZ78):
4. A compression method for compressing a stream of input data into
a compressed stream of output data based on a minimum number of
characters in each input data string to be compressed, said
compression method comprising the creation of a hash table, hashing
each occurrence of a string of input data and subsequently searching
for identical strings of input data and if such an identical string
of input data is located whose string size is at least equal to the
minimum compression size selected, compressing the second and all
subsequent occurrences of such identical string of data, if a string
of data is located which does not match to a previously compressed
string of data, storing such data as uncompressed data, and for each
input strings after each hash is used to find a possible previous
match location of the string, the location of the string is stored
in the hash table, thereby using the previously processed data to
act as a compression dictionary.
Claim 12 is identical, with 'method' replaced with 'apparatus'. Since
the 'minimal compression size' can be as small as 2, the claim could
cover any dictionary technique of the LZ family. However the text of the
patent and the other claims make clear that the patent should cover the
LZRW1 algorithm only. (In any case the Gibson & Graybill patent is likely
to be invalid because of the prior art in the Waterworth patent.)
Phil Katz, author of PKZIP, also has a patent on LZ77
(US Patent 5051745) but the claims only apply to sorted hash tables,
and when the hash table is substantially smaller than the window size.
IBM patented (US Patent 5001478) the idea of combining
a history buffer (the LZ77 technique) and a lexicon (as in LZ78).
Stac patented
(US Patent 5016009 and US Patent 5126739) yet another variation of
LZ77 with hashing. The '009 patent was used in the lawsuit against Microsoft
(see above). Stac also has a patent on LZ77 with parallel lookup in
hardware (US Patent 5003307).
Robert Jung, author of ARJ, has been granted patent
US Patent 5140321 for one variation of LZ77 with hashing. This patent
covers the LZRW3-A algorithm, also previously discovered by
Ross Williams. LZRW3-A was posted on comp.compression on
July 15, 1991. The patent was filed two months later
on Sept 4, 1991. (The US patent system allowed this because of the
"invention date" rule.)
Chambers US Patent 5155484 is yet another
variation of LZ77 with hashing. The hash function is just the juxtaposition
of two input bytes. The hash table is named 'direct lookup table'.
LZ78 Patents
One form of the original LZ78 algorithm was patented
(US Patent 4464650) by its authors Lempel, Ziv, Cohn and Eastman. This
patent is owned by Unisys.
The LZW algorithm used in 'compress' is patented by
IBM (US Patent 4814746) and Unisys
(US Patent 4558302). It is also used in the V.42bis compression
standard (see question 11 on V.42bis below), in Postscript Level 2, in
GIF and TIFF. Unisys sells the license to modem manufacturers for a
onetime fee (contact: Welch Patent Desk, Unisys Corp., P.O. Box 500,
Bluebell, PA 19424 Mailcode C SW 19). CompuServe is licensing the
usage of LZW in GIF products for 1.5% of the product price, of which
1% goes to Unisys; usage of LZW in non-GIF products must be licensed
directly from Unisys. For more information, see
http://www.unisys.com/ or email to
mailto:lzw_info@unisys.com.
The IBM patent application was first filed three weeks before that of
Unisys, but the US patent office failed to recognize that they
covered the same algorithm. (The IBM patent is more general, but its
claim 7 is exactly LZW.)
Klaus Holtz also claims that US Patent 4366551
for his "autosophy" data compression method covers LZ78 and LZW. According
to Holtz, most of the largest V.42bis modem manufacturers have paid for
patent licenses.
AP coding is patented by Storer (US Patent 4876541).
(Get the yabba package for source code, see question 2 above, file type .Y)
Storer also claims that his patent covers V.42bis.
Arithmetic Coding
IBM holds many patents on arithmetic coding
(US Patent 4286256,
US Patent 4295125,
US Patent 4463342,
US Patent 4467317,
US Patent 4633490,
US Patent 4652856,
US Patent 4891643,
US Patent 4905297,
US Patent 4935882).
It has
patented in particular the Q-coder implementation of arithmetic coding.
The JBIG standard, and the arithmetic coding option of the JPEG standard
requires use of the patented algorithm. No JPEG-compatible method is
possible without infringing the patent, because what IBM actually claims
rights to is the underlying probability model (the heart of an arithmetic
coder).
AT&T has 3 patents on arithmetic coding
(US Patent 4973961, US Patent 5023611, US Patent 5025258).
Predictor Algorithm
The "predictor" algorithm was first described in the paper:
Raita, T. and Teuhola, J. (1987), "Predictive text compression by hashing",
ACM Conference on Information Retrieval.
This algorithm was patented (US Patent 5229768)
by K. Thomas in 1993. It is used in the Internet Draft "PPP Predictor
Compression Protocol"
(see
ftp://venera.isi.edu/internet-drafts/draft-ietf-pppext-predictor-00.txt).
General Comments
As can be seen from the above list, some of the most popular compression
programs (COMPRESS, PKZIP, ZOO, LHZ, ASJ) are now covered by patents.
(This says nothing about the validity of these patents.)