Simple light-weight LZ compression library.

Dependents:   display-puck

Committer:
sigveseb
Date:
Tue Jul 29 14:02:53 2014 +0000
Revision:
1:0c9aeae76ab2
Change to cpp

Who changed what in which revision?

UserRevisionLine numberNew contents of line
sigveseb 1:0c9aeae76ab2 1 /*************************************************************************
sigveseb 1:0c9aeae76ab2 2 * Name: lz.c
sigveseb 1:0c9aeae76ab2 3 * Author: Marcus Geelnard
sigveseb 1:0c9aeae76ab2 4 * Description: LZ77 coder/decoder implementation.
sigveseb 1:0c9aeae76ab2 5 * Reentrant: Yes
sigveseb 1:0c9aeae76ab2 6 *
sigveseb 1:0c9aeae76ab2 7 * The LZ77 compression scheme is a substitutional compression scheme
sigveseb 1:0c9aeae76ab2 8 * proposed by Abraham Lempel and Jakob Ziv in 1977. It is very simple in
sigveseb 1:0c9aeae76ab2 9 * its design, and uses no fancy bit level compression.
sigveseb 1:0c9aeae76ab2 10 *
sigveseb 1:0c9aeae76ab2 11 * This is my first attempt at an implementation of a LZ77 code/decoder.
sigveseb 1:0c9aeae76ab2 12 *
sigveseb 1:0c9aeae76ab2 13 * The principle of the LZ77 compression algorithm is to store repeated
sigveseb 1:0c9aeae76ab2 14 * occurrences of strings as references to previous occurrences of the same
sigveseb 1:0c9aeae76ab2 15 * string. The point is that the reference consumes less space than the
sigveseb 1:0c9aeae76ab2 16 * string itself, provided that the string is long enough (in this
sigveseb 1:0c9aeae76ab2 17 * implementation, the string has to be at least 4 bytes long, since the
sigveseb 1:0c9aeae76ab2 18 * minimum coded reference is 3 bytes long). Also note that the term
sigveseb 1:0c9aeae76ab2 19 * "string" refers to any kind of byte sequence (it does not have to be
sigveseb 1:0c9aeae76ab2 20 * an ASCII string, for instance).
sigveseb 1:0c9aeae76ab2 21 *
sigveseb 1:0c9aeae76ab2 22 * The coder uses a brute force approach to finding string matches in the
sigveseb 1:0c9aeae76ab2 23 * history buffer (or "sliding window", if you wish), which is very, very
sigveseb 1:0c9aeae76ab2 24 * slow. I recon the complexity is somewhere between O(n^2) and O(n^3),
sigveseb 1:0c9aeae76ab2 25 * depending on the input data.
sigveseb 1:0c9aeae76ab2 26 *
sigveseb 1:0c9aeae76ab2 27 * There is also a faster implementation that uses a large working buffer
sigveseb 1:0c9aeae76ab2 28 * in which a "jump table" is stored, which is used to quickly find
sigveseb 1:0c9aeae76ab2 29 * possible string matches (see the source code for LZ_CompressFast() for
sigveseb 1:0c9aeae76ab2 30 * more information). The faster method is an order of magnitude faster,
sigveseb 1:0c9aeae76ab2 31 * but still quite slow compared to other compression methods.
sigveseb 1:0c9aeae76ab2 32 *
sigveseb 1:0c9aeae76ab2 33 * The upside is that decompression is very fast, and the compression ratio
sigveseb 1:0c9aeae76ab2 34 * is often very good.
sigveseb 1:0c9aeae76ab2 35 *
sigveseb 1:0c9aeae76ab2 36 * The reference to a string is coded as a (length,offset) pair, where the
sigveseb 1:0c9aeae76ab2 37 * length indicates the length of the string, and the offset gives the
sigveseb 1:0c9aeae76ab2 38 * offset from the current data position. To distinguish between string
sigveseb 1:0c9aeae76ab2 39 * references and literal strings (uncompressed bytes), a string reference
sigveseb 1:0c9aeae76ab2 40 * is preceded by a marker byte, which is chosen as the least common byte
sigveseb 1:0c9aeae76ab2 41 * symbol in the input data stream (this marker byte is stored in the
sigveseb 1:0c9aeae76ab2 42 * output stream as the first byte).
sigveseb 1:0c9aeae76ab2 43 *
sigveseb 1:0c9aeae76ab2 44 * Occurrences of the marker byte in the stream are encoded as the marker
sigveseb 1:0c9aeae76ab2 45 * byte followed by a zero byte, which means that occurrences of the marker
sigveseb 1:0c9aeae76ab2 46 * byte have to be coded with two bytes.
sigveseb 1:0c9aeae76ab2 47 *
sigveseb 1:0c9aeae76ab2 48 * The lengths and offsets are coded in a variable length fashion, allowing
sigveseb 1:0c9aeae76ab2 49 * values of any magnitude (up to 4294967295 in this implementation).
sigveseb 1:0c9aeae76ab2 50 *
sigveseb 1:0c9aeae76ab2 51 * With this compression scheme, the worst case compression result is
sigveseb 1:0c9aeae76ab2 52 * (257/256)*insize + 1.
sigveseb 1:0c9aeae76ab2 53 *
sigveseb 1:0c9aeae76ab2 54 *-------------------------------------------------------------------------
sigveseb 1:0c9aeae76ab2 55 * Copyright (c) 2003-2006 Marcus Geelnard
sigveseb 1:0c9aeae76ab2 56 *
sigveseb 1:0c9aeae76ab2 57 * This software is provided 'as-is', without any express or implied
sigveseb 1:0c9aeae76ab2 58 * warranty. In no event will the authors be held liable for any damages
sigveseb 1:0c9aeae76ab2 59 * arising from the use of this software.
sigveseb 1:0c9aeae76ab2 60 *
sigveseb 1:0c9aeae76ab2 61 * Permission is granted to anyone to use this software for any purpose,
sigveseb 1:0c9aeae76ab2 62 * including commercial applications, and to alter it and redistribute it
sigveseb 1:0c9aeae76ab2 63 * freely, subject to the following restrictions:
sigveseb 1:0c9aeae76ab2 64 *
sigveseb 1:0c9aeae76ab2 65 * 1. The origin of this software must not be misrepresented; you must not
sigveseb 1:0c9aeae76ab2 66 * claim that you wrote the original software. If you use this software
sigveseb 1:0c9aeae76ab2 67 * in a product, an acknowledgment in the product documentation would
sigveseb 1:0c9aeae76ab2 68 * be appreciated but is not required.
sigveseb 1:0c9aeae76ab2 69 *
sigveseb 1:0c9aeae76ab2 70 * 2. Altered source versions must be plainly marked as such, and must not
sigveseb 1:0c9aeae76ab2 71 * be misrepresented as being the original software.
sigveseb 1:0c9aeae76ab2 72 *
sigveseb 1:0c9aeae76ab2 73 * 3. This notice may not be removed or altered from any source
sigveseb 1:0c9aeae76ab2 74 * distribution.
sigveseb 1:0c9aeae76ab2 75 *
sigveseb 1:0c9aeae76ab2 76 * Marcus Geelnard
sigveseb 1:0c9aeae76ab2 77 * marcus.geelnard at home.se
sigveseb 1:0c9aeae76ab2 78 *************************************************************************/
sigveseb 1:0c9aeae76ab2 79
sigveseb 1:0c9aeae76ab2 80
sigveseb 1:0c9aeae76ab2 81 /*************************************************************************
sigveseb 1:0c9aeae76ab2 82 * Constants used for LZ77 coding
sigveseb 1:0c9aeae76ab2 83 *************************************************************************/
sigveseb 1:0c9aeae76ab2 84
sigveseb 1:0c9aeae76ab2 85 /* Maximum offset (can be any size < 2^31). Lower values give faster
sigveseb 1:0c9aeae76ab2 86 compression, while higher values gives better compression. The default
sigveseb 1:0c9aeae76ab2 87 value of 100000 is quite high. Experiment to see what works best for
sigveseb 1:0c9aeae76ab2 88 you. */
sigveseb 1:0c9aeae76ab2 89 #define LZ_MAX_OFFSET 100000
sigveseb 1:0c9aeae76ab2 90
sigveseb 1:0c9aeae76ab2 91 /*************************************************************************
sigveseb 1:0c9aeae76ab2 92 * INTERNAL FUNCTIONS *
sigveseb 1:0c9aeae76ab2 93 *************************************************************************/
sigveseb 1:0c9aeae76ab2 94
sigveseb 1:0c9aeae76ab2 95
sigveseb 1:0c9aeae76ab2 96 /*************************************************************************
sigveseb 1:0c9aeae76ab2 97 * _LZ_StringCompare() - Return maximum length string match.
sigveseb 1:0c9aeae76ab2 98 *************************************************************************/
sigveseb 1:0c9aeae76ab2 99
sigveseb 1:0c9aeae76ab2 100 static unsigned int _LZ_StringCompare( unsigned char * str1,
sigveseb 1:0c9aeae76ab2 101 unsigned char * str2, unsigned int minlen, unsigned int maxlen )
sigveseb 1:0c9aeae76ab2 102 {
sigveseb 1:0c9aeae76ab2 103 unsigned int len;
sigveseb 1:0c9aeae76ab2 104
sigveseb 1:0c9aeae76ab2 105 for( len = minlen; (len < maxlen) && (str1[len] == str2[len]); ++ len );
sigveseb 1:0c9aeae76ab2 106
sigveseb 1:0c9aeae76ab2 107 return len;
sigveseb 1:0c9aeae76ab2 108 }
sigveseb 1:0c9aeae76ab2 109
sigveseb 1:0c9aeae76ab2 110
sigveseb 1:0c9aeae76ab2 111 /*************************************************************************
sigveseb 1:0c9aeae76ab2 112 * _LZ_WriteVarSize() - Write unsigned integer with variable number of
sigveseb 1:0c9aeae76ab2 113 * bytes depending on value.
sigveseb 1:0c9aeae76ab2 114 *************************************************************************/
sigveseb 1:0c9aeae76ab2 115
sigveseb 1:0c9aeae76ab2 116 static int _LZ_WriteVarSize( unsigned int x, unsigned char * buf )
sigveseb 1:0c9aeae76ab2 117 {
sigveseb 1:0c9aeae76ab2 118 unsigned int y;
sigveseb 1:0c9aeae76ab2 119 int num_bytes, i, b;
sigveseb 1:0c9aeae76ab2 120
sigveseb 1:0c9aeae76ab2 121 /* Determine number of bytes needed to store the number x */
sigveseb 1:0c9aeae76ab2 122 y = x >> 3;
sigveseb 1:0c9aeae76ab2 123 for( num_bytes = 5; num_bytes >= 2; -- num_bytes )
sigveseb 1:0c9aeae76ab2 124 {
sigveseb 1:0c9aeae76ab2 125 if( y & 0xfe000000 ) break;
sigveseb 1:0c9aeae76ab2 126 y <<= 7;
sigveseb 1:0c9aeae76ab2 127 }
sigveseb 1:0c9aeae76ab2 128
sigveseb 1:0c9aeae76ab2 129 /* Write all bytes, seven bits in each, with 8:th bit set for all */
sigveseb 1:0c9aeae76ab2 130 /* but the last byte. */
sigveseb 1:0c9aeae76ab2 131 for( i = num_bytes-1; i >= 0; -- i )
sigveseb 1:0c9aeae76ab2 132 {
sigveseb 1:0c9aeae76ab2 133 b = (x >> (i*7)) & 0x0000007f;
sigveseb 1:0c9aeae76ab2 134 if( i > 0 )
sigveseb 1:0c9aeae76ab2 135 {
sigveseb 1:0c9aeae76ab2 136 b |= 0x00000080;
sigveseb 1:0c9aeae76ab2 137 }
sigveseb 1:0c9aeae76ab2 138 *buf ++ = (unsigned char) b;
sigveseb 1:0c9aeae76ab2 139 }
sigveseb 1:0c9aeae76ab2 140
sigveseb 1:0c9aeae76ab2 141 /* Return number of bytes written */
sigveseb 1:0c9aeae76ab2 142 return num_bytes;
sigveseb 1:0c9aeae76ab2 143 }
sigveseb 1:0c9aeae76ab2 144
sigveseb 1:0c9aeae76ab2 145
sigveseb 1:0c9aeae76ab2 146 /*************************************************************************
sigveseb 1:0c9aeae76ab2 147 * _LZ_ReadVarSize() - Read unsigned integer with variable number of
sigveseb 1:0c9aeae76ab2 148 * bytes depending on value.
sigveseb 1:0c9aeae76ab2 149 *************************************************************************/
sigveseb 1:0c9aeae76ab2 150
sigveseb 1:0c9aeae76ab2 151 static int _LZ_ReadVarSize( unsigned int * x, unsigned char * buf )
sigveseb 1:0c9aeae76ab2 152 {
sigveseb 1:0c9aeae76ab2 153 unsigned int y, b, num_bytes;
sigveseb 1:0c9aeae76ab2 154
sigveseb 1:0c9aeae76ab2 155 /* Read complete value (stop when byte contains zero in 8:th bit) */
sigveseb 1:0c9aeae76ab2 156 y = 0;
sigveseb 1:0c9aeae76ab2 157 num_bytes = 0;
sigveseb 1:0c9aeae76ab2 158 do
sigveseb 1:0c9aeae76ab2 159 {
sigveseb 1:0c9aeae76ab2 160 b = (unsigned int) (*buf ++);
sigveseb 1:0c9aeae76ab2 161 y = (y << 7) | (b & 0x0000007f);
sigveseb 1:0c9aeae76ab2 162 ++ num_bytes;
sigveseb 1:0c9aeae76ab2 163 }
sigveseb 1:0c9aeae76ab2 164 while( b & 0x00000080 );
sigveseb 1:0c9aeae76ab2 165
sigveseb 1:0c9aeae76ab2 166 /* Store value in x */
sigveseb 1:0c9aeae76ab2 167 *x = y;
sigveseb 1:0c9aeae76ab2 168
sigveseb 1:0c9aeae76ab2 169 /* Return number of bytes read */
sigveseb 1:0c9aeae76ab2 170 return num_bytes;
sigveseb 1:0c9aeae76ab2 171 }
sigveseb 1:0c9aeae76ab2 172
sigveseb 1:0c9aeae76ab2 173
sigveseb 1:0c9aeae76ab2 174
sigveseb 1:0c9aeae76ab2 175 /*************************************************************************
sigveseb 1:0c9aeae76ab2 176 * PUBLIC FUNCTIONS *
sigveseb 1:0c9aeae76ab2 177 *************************************************************************/
sigveseb 1:0c9aeae76ab2 178
sigveseb 1:0c9aeae76ab2 179
sigveseb 1:0c9aeae76ab2 180 /*************************************************************************
sigveseb 1:0c9aeae76ab2 181 * LZ_Compress() - Compress a block of data using an LZ77 coder.
sigveseb 1:0c9aeae76ab2 182 * in - Input (uncompressed) buffer.
sigveseb 1:0c9aeae76ab2 183 * out - Output (compressed) buffer. This buffer must be 0.4% larger
sigveseb 1:0c9aeae76ab2 184 * than the input buffer, plus one byte.
sigveseb 1:0c9aeae76ab2 185 * insize - Number of input bytes.
sigveseb 1:0c9aeae76ab2 186 * The function returns the size of the compressed data.
sigveseb 1:0c9aeae76ab2 187 *************************************************************************/
sigveseb 1:0c9aeae76ab2 188
sigveseb 1:0c9aeae76ab2 189 int LZ_Compress( unsigned char *in, unsigned char *out,
sigveseb 1:0c9aeae76ab2 190 unsigned int insize )
sigveseb 1:0c9aeae76ab2 191 {
sigveseb 1:0c9aeae76ab2 192 unsigned char marker, symbol;
sigveseb 1:0c9aeae76ab2 193 unsigned int inpos, outpos, bytesleft, i;
sigveseb 1:0c9aeae76ab2 194 unsigned int maxoffset, offset, bestoffset;
sigveseb 1:0c9aeae76ab2 195 unsigned int maxlength, length, bestlength;
sigveseb 1:0c9aeae76ab2 196 unsigned int histogram[ 256 ];
sigveseb 1:0c9aeae76ab2 197 unsigned char *ptr1, *ptr2;
sigveseb 1:0c9aeae76ab2 198
sigveseb 1:0c9aeae76ab2 199
sigveseb 1:0c9aeae76ab2 200 /* Do we have anything to compress? */
sigveseb 1:0c9aeae76ab2 201 if( insize < 1 )
sigveseb 1:0c9aeae76ab2 202 {
sigveseb 1:0c9aeae76ab2 203 return 0;
sigveseb 1:0c9aeae76ab2 204 }
sigveseb 1:0c9aeae76ab2 205
sigveseb 1:0c9aeae76ab2 206
sigveseb 1:0c9aeae76ab2 207 /* Create histogram */
sigveseb 1:0c9aeae76ab2 208 for( i = 0; i < 256; ++ i )
sigveseb 1:0c9aeae76ab2 209 {
sigveseb 1:0c9aeae76ab2 210 histogram[ i ] = 0;
sigveseb 1:0c9aeae76ab2 211 }
sigveseb 1:0c9aeae76ab2 212 for( i = 0; i < insize; ++ i )
sigveseb 1:0c9aeae76ab2 213 {
sigveseb 1:0c9aeae76ab2 214 ++ histogram[ in[ i ] ];
sigveseb 1:0c9aeae76ab2 215 }
sigveseb 1:0c9aeae76ab2 216
sigveseb 1:0c9aeae76ab2 217 /* Find the least common byte, and use it as the marker symbol */
sigveseb 1:0c9aeae76ab2 218 marker = 0;
sigveseb 1:0c9aeae76ab2 219 for( i = 1; i < 256; ++ i )
sigveseb 1:0c9aeae76ab2 220 {
sigveseb 1:0c9aeae76ab2 221 if( histogram[ i ] < histogram[ marker ] )
sigveseb 1:0c9aeae76ab2 222 {
sigveseb 1:0c9aeae76ab2 223 marker = i;
sigveseb 1:0c9aeae76ab2 224 }
sigveseb 1:0c9aeae76ab2 225 }
sigveseb 1:0c9aeae76ab2 226
sigveseb 1:0c9aeae76ab2 227 /* Remember the marker symbol for the decoder */
sigveseb 1:0c9aeae76ab2 228 out[ 0 ] = marker;
sigveseb 1:0c9aeae76ab2 229
sigveseb 1:0c9aeae76ab2 230 /* Start of compression */
sigveseb 1:0c9aeae76ab2 231 inpos = 0;
sigveseb 1:0c9aeae76ab2 232 outpos = 1;
sigveseb 1:0c9aeae76ab2 233
sigveseb 1:0c9aeae76ab2 234 /* Main compression loop */
sigveseb 1:0c9aeae76ab2 235 bytesleft = insize;
sigveseb 1:0c9aeae76ab2 236 do
sigveseb 1:0c9aeae76ab2 237 {
sigveseb 1:0c9aeae76ab2 238
sigveseb 1:0c9aeae76ab2 239
sigveseb 1:0c9aeae76ab2 240 /* Determine most distant position */
sigveseb 1:0c9aeae76ab2 241 if( inpos > LZ_MAX_OFFSET ) maxoffset = LZ_MAX_OFFSET;
sigveseb 1:0c9aeae76ab2 242 else maxoffset = inpos;
sigveseb 1:0c9aeae76ab2 243
sigveseb 1:0c9aeae76ab2 244 /* Get pointer to current position */
sigveseb 1:0c9aeae76ab2 245 ptr1 = &in[ inpos ];
sigveseb 1:0c9aeae76ab2 246
sigveseb 1:0c9aeae76ab2 247 /* Search history window for maximum length string match */
sigveseb 1:0c9aeae76ab2 248 bestlength = 3;
sigveseb 1:0c9aeae76ab2 249 bestoffset = 0;
sigveseb 1:0c9aeae76ab2 250 for( offset = 3; offset <= maxoffset; ++ offset )
sigveseb 1:0c9aeae76ab2 251 {
sigveseb 1:0c9aeae76ab2 252 /* Get pointer to candidate string */
sigveseb 1:0c9aeae76ab2 253 ptr2 = &ptr1[ -(int)offset ];
sigveseb 1:0c9aeae76ab2 254
sigveseb 1:0c9aeae76ab2 255 /* Quickly determine if this is a candidate (for speed) */
sigveseb 1:0c9aeae76ab2 256 if( (ptr1[ 0 ] == ptr2[ 0 ]) &&
sigveseb 1:0c9aeae76ab2 257 (ptr1[ bestlength ] == ptr2[ bestlength ]) )
sigveseb 1:0c9aeae76ab2 258 {
sigveseb 1:0c9aeae76ab2 259 /* Determine maximum length for this offset */
sigveseb 1:0c9aeae76ab2 260 maxlength = (bytesleft < offset ? bytesleft : offset);
sigveseb 1:0c9aeae76ab2 261
sigveseb 1:0c9aeae76ab2 262 /* Count maximum length match at this offset */
sigveseb 1:0c9aeae76ab2 263 length = _LZ_StringCompare( ptr1, ptr2, 0, maxlength );
sigveseb 1:0c9aeae76ab2 264
sigveseb 1:0c9aeae76ab2 265 /* Better match than any previous match? */
sigveseb 1:0c9aeae76ab2 266 if( length > bestlength )
sigveseb 1:0c9aeae76ab2 267 {
sigveseb 1:0c9aeae76ab2 268 bestlength = length;
sigveseb 1:0c9aeae76ab2 269 bestoffset = offset;
sigveseb 1:0c9aeae76ab2 270 }
sigveseb 1:0c9aeae76ab2 271 }
sigveseb 1:0c9aeae76ab2 272 }
sigveseb 1:0c9aeae76ab2 273
sigveseb 1:0c9aeae76ab2 274 /* Was there a good enough match? */
sigveseb 1:0c9aeae76ab2 275 if( (bestlength >= 8) ||
sigveseb 1:0c9aeae76ab2 276 ((bestlength == 4) && (bestoffset <= 0x0000007f)) ||
sigveseb 1:0c9aeae76ab2 277 ((bestlength == 5) && (bestoffset <= 0x00003fff)) ||
sigveseb 1:0c9aeae76ab2 278 ((bestlength == 6) && (bestoffset <= 0x001fffff)) ||
sigveseb 1:0c9aeae76ab2 279 ((bestlength == 7) && (bestoffset <= 0x0fffffff)) )
sigveseb 1:0c9aeae76ab2 280 {
sigveseb 1:0c9aeae76ab2 281 out[ outpos ++ ] = (unsigned char) marker;
sigveseb 1:0c9aeae76ab2 282 outpos += _LZ_WriteVarSize( bestlength, &out[ outpos ] );
sigveseb 1:0c9aeae76ab2 283 outpos += _LZ_WriteVarSize( bestoffset, &out[ outpos ] );
sigveseb 1:0c9aeae76ab2 284 inpos += bestlength;
sigveseb 1:0c9aeae76ab2 285 bytesleft -= bestlength;
sigveseb 1:0c9aeae76ab2 286 }
sigveseb 1:0c9aeae76ab2 287 else
sigveseb 1:0c9aeae76ab2 288 {
sigveseb 1:0c9aeae76ab2 289 /* Output single byte (or two bytes if marker byte) */
sigveseb 1:0c9aeae76ab2 290 symbol = in[ inpos ++ ];
sigveseb 1:0c9aeae76ab2 291 out[ outpos ++ ] = symbol;
sigveseb 1:0c9aeae76ab2 292 if( symbol == marker )
sigveseb 1:0c9aeae76ab2 293 {
sigveseb 1:0c9aeae76ab2 294 out[ outpos ++ ] = 0;
sigveseb 1:0c9aeae76ab2 295 }
sigveseb 1:0c9aeae76ab2 296 -- bytesleft;
sigveseb 1:0c9aeae76ab2 297 }
sigveseb 1:0c9aeae76ab2 298 }
sigveseb 1:0c9aeae76ab2 299 while( bytesleft > 3 );
sigveseb 1:0c9aeae76ab2 300
sigveseb 1:0c9aeae76ab2 301 /* Dump remaining bytes, if any */
sigveseb 1:0c9aeae76ab2 302 while( inpos < insize )
sigveseb 1:0c9aeae76ab2 303 {
sigveseb 1:0c9aeae76ab2 304 if( in[ inpos ] == marker )
sigveseb 1:0c9aeae76ab2 305 {
sigveseb 1:0c9aeae76ab2 306 out[ outpos ++ ] = marker;
sigveseb 1:0c9aeae76ab2 307 out[ outpos ++ ] = 0;
sigveseb 1:0c9aeae76ab2 308 }
sigveseb 1:0c9aeae76ab2 309 else
sigveseb 1:0c9aeae76ab2 310 {
sigveseb 1:0c9aeae76ab2 311 out[ outpos ++ ] = in[ inpos ];
sigveseb 1:0c9aeae76ab2 312 }
sigveseb 1:0c9aeae76ab2 313 ++ inpos;
sigveseb 1:0c9aeae76ab2 314 }
sigveseb 1:0c9aeae76ab2 315
sigveseb 1:0c9aeae76ab2 316 return outpos;
sigveseb 1:0c9aeae76ab2 317 }
sigveseb 1:0c9aeae76ab2 318
sigveseb 1:0c9aeae76ab2 319
sigveseb 1:0c9aeae76ab2 320 /*************************************************************************
sigveseb 1:0c9aeae76ab2 321 * LZ_CompressFast() - Compress a block of data using an LZ77 coder.
sigveseb 1:0c9aeae76ab2 322 * in - Input (uncompressed) buffer.
sigveseb 1:0c9aeae76ab2 323 * out - Output (compressed) buffer. This buffer must be 0.4% larger
sigveseb 1:0c9aeae76ab2 324 * than the input buffer, plus one byte.
sigveseb 1:0c9aeae76ab2 325 * insize - Number of input bytes.
sigveseb 1:0c9aeae76ab2 326 * work - Pointer to a temporary buffer (internal working buffer), which
sigveseb 1:0c9aeae76ab2 327 * must be able to hold (insize+65536) unsigned integers.
sigveseb 1:0c9aeae76ab2 328 * The function returns the size of the compressed data.
sigveseb 1:0c9aeae76ab2 329 *************************************************************************/
sigveseb 1:0c9aeae76ab2 330
sigveseb 1:0c9aeae76ab2 331 int LZ_CompressFast( unsigned char *in, unsigned char *out,
sigveseb 1:0c9aeae76ab2 332 unsigned int insize, unsigned int *work )
sigveseb 1:0c9aeae76ab2 333 {
sigveseb 1:0c9aeae76ab2 334 unsigned char marker, symbol;
sigveseb 1:0c9aeae76ab2 335 unsigned int inpos, outpos, bytesleft, i, index, symbols;
sigveseb 1:0c9aeae76ab2 336 unsigned int offset, bestoffset;
sigveseb 1:0c9aeae76ab2 337 unsigned int maxlength, length, bestlength;
sigveseb 1:0c9aeae76ab2 338 unsigned int histogram[ 256 ], *lastindex, *jumptable;
sigveseb 1:0c9aeae76ab2 339 unsigned char *ptr1, *ptr2;
sigveseb 1:0c9aeae76ab2 340
sigveseb 1:0c9aeae76ab2 341 /* Do we have anything to compress? */
sigveseb 1:0c9aeae76ab2 342 if( insize < 1 )
sigveseb 1:0c9aeae76ab2 343 {
sigveseb 1:0c9aeae76ab2 344 return 0;
sigveseb 1:0c9aeae76ab2 345 }
sigveseb 1:0c9aeae76ab2 346
sigveseb 1:0c9aeae76ab2 347 /* Assign arrays to the working area */
sigveseb 1:0c9aeae76ab2 348 lastindex = work;
sigveseb 1:0c9aeae76ab2 349 jumptable = &work[ 65536 ];
sigveseb 1:0c9aeae76ab2 350
sigveseb 1:0c9aeae76ab2 351 /* Build a "jump table". Here is how the jump table works:
sigveseb 1:0c9aeae76ab2 352 jumptable[i] points to the nearest previous occurrence of the same
sigveseb 1:0c9aeae76ab2 353 symbol pair as in[i]:in[i+1], so in[i] == in[jumptable[i]] and
sigveseb 1:0c9aeae76ab2 354 in[i+1] == in[jumptable[i]+1], and so on... Following the jump table
sigveseb 1:0c9aeae76ab2 355 gives a dramatic boost for the string search'n'match loop compared
sigveseb 1:0c9aeae76ab2 356 to doing a brute force search. The jump table is built in O(n) time,
sigveseb 1:0c9aeae76ab2 357 so it is a cheap operation in terms of time, but it is expensice in
sigveseb 1:0c9aeae76ab2 358 terms of memory consumption. */
sigveseb 1:0c9aeae76ab2 359 for( i = 0; i < 65536; ++ i )
sigveseb 1:0c9aeae76ab2 360 {
sigveseb 1:0c9aeae76ab2 361 lastindex[ i ] = 0xffffffff;
sigveseb 1:0c9aeae76ab2 362 }
sigveseb 1:0c9aeae76ab2 363 for( i = 0; i < insize-1; ++ i )
sigveseb 1:0c9aeae76ab2 364 {
sigveseb 1:0c9aeae76ab2 365 symbols = (((unsigned int)in[i]) << 8) | ((unsigned int)in[i+1]);
sigveseb 1:0c9aeae76ab2 366 index = lastindex[ symbols ];
sigveseb 1:0c9aeae76ab2 367 lastindex[ symbols ] = i;
sigveseb 1:0c9aeae76ab2 368 jumptable[ i ] = index;
sigveseb 1:0c9aeae76ab2 369 }
sigveseb 1:0c9aeae76ab2 370 jumptable[ insize-1 ] = 0xffffffff;
sigveseb 1:0c9aeae76ab2 371
sigveseb 1:0c9aeae76ab2 372 /* Create histogram */
sigveseb 1:0c9aeae76ab2 373 for( i = 0; i < 256; ++ i )
sigveseb 1:0c9aeae76ab2 374 {
sigveseb 1:0c9aeae76ab2 375 histogram[ i ] = 0;
sigveseb 1:0c9aeae76ab2 376 }
sigveseb 1:0c9aeae76ab2 377 for( i = 0; i < insize; ++ i )
sigveseb 1:0c9aeae76ab2 378 {
sigveseb 1:0c9aeae76ab2 379 ++ histogram[ in[ i ] ];
sigveseb 1:0c9aeae76ab2 380 }
sigveseb 1:0c9aeae76ab2 381
sigveseb 1:0c9aeae76ab2 382 /* Find the least common byte, and use it as the marker symbol */
sigveseb 1:0c9aeae76ab2 383 marker = 0;
sigveseb 1:0c9aeae76ab2 384 for( i = 1; i < 256; ++ i )
sigveseb 1:0c9aeae76ab2 385 {
sigveseb 1:0c9aeae76ab2 386 if( histogram[ i ] < histogram[ marker ] )
sigveseb 1:0c9aeae76ab2 387 {
sigveseb 1:0c9aeae76ab2 388 marker = i;
sigveseb 1:0c9aeae76ab2 389 }
sigveseb 1:0c9aeae76ab2 390 }
sigveseb 1:0c9aeae76ab2 391
sigveseb 1:0c9aeae76ab2 392 /* Remember the marker symbol for the decoder */
sigveseb 1:0c9aeae76ab2 393 out[ 0 ] = marker;
sigveseb 1:0c9aeae76ab2 394
sigveseb 1:0c9aeae76ab2 395 /* Start of compression */
sigveseb 1:0c9aeae76ab2 396 inpos = 0;
sigveseb 1:0c9aeae76ab2 397 outpos = 1;
sigveseb 1:0c9aeae76ab2 398
sigveseb 1:0c9aeae76ab2 399 /* Main compression loop */
sigveseb 1:0c9aeae76ab2 400 bytesleft = insize;
sigveseb 1:0c9aeae76ab2 401 do
sigveseb 1:0c9aeae76ab2 402 {
sigveseb 1:0c9aeae76ab2 403 /* Get pointer to current position */
sigveseb 1:0c9aeae76ab2 404 ptr1 = &in[ inpos ];
sigveseb 1:0c9aeae76ab2 405
sigveseb 1:0c9aeae76ab2 406 /* Search history window for maximum length string match */
sigveseb 1:0c9aeae76ab2 407 bestlength = 3;
sigveseb 1:0c9aeae76ab2 408 bestoffset = 0;
sigveseb 1:0c9aeae76ab2 409 index = jumptable[ inpos ];
sigveseb 1:0c9aeae76ab2 410 while( (index != 0xffffffff) && ((inpos - index) < LZ_MAX_OFFSET) )
sigveseb 1:0c9aeae76ab2 411 {
sigveseb 1:0c9aeae76ab2 412 /* Get pointer to candidate string */
sigveseb 1:0c9aeae76ab2 413 ptr2 = &in[ index ];
sigveseb 1:0c9aeae76ab2 414
sigveseb 1:0c9aeae76ab2 415 /* Quickly determine if this is a candidate (for speed) */
sigveseb 1:0c9aeae76ab2 416 if( ptr2[ bestlength ] == ptr1[ bestlength ] )
sigveseb 1:0c9aeae76ab2 417 {
sigveseb 1:0c9aeae76ab2 418 /* Determine maximum length for this offset */
sigveseb 1:0c9aeae76ab2 419 offset = inpos - index;
sigveseb 1:0c9aeae76ab2 420 maxlength = (bytesleft < offset ? bytesleft : offset);
sigveseb 1:0c9aeae76ab2 421
sigveseb 1:0c9aeae76ab2 422 /* Count maximum length match at this offset */
sigveseb 1:0c9aeae76ab2 423 length = _LZ_StringCompare( ptr1, ptr2, 2, maxlength );
sigveseb 1:0c9aeae76ab2 424
sigveseb 1:0c9aeae76ab2 425 /* Better match than any previous match? */
sigveseb 1:0c9aeae76ab2 426 if( length > bestlength )
sigveseb 1:0c9aeae76ab2 427 {
sigveseb 1:0c9aeae76ab2 428 bestlength = length;
sigveseb 1:0c9aeae76ab2 429 bestoffset = offset;
sigveseb 1:0c9aeae76ab2 430 }
sigveseb 1:0c9aeae76ab2 431 }
sigveseb 1:0c9aeae76ab2 432
sigveseb 1:0c9aeae76ab2 433 /* Get next possible index from jump table */
sigveseb 1:0c9aeae76ab2 434 index = jumptable[ index ];
sigveseb 1:0c9aeae76ab2 435 }
sigveseb 1:0c9aeae76ab2 436
sigveseb 1:0c9aeae76ab2 437 /* Was there a good enough match? */
sigveseb 1:0c9aeae76ab2 438 if( (bestlength >= 8) ||
sigveseb 1:0c9aeae76ab2 439 ((bestlength == 4) && (bestoffset <= 0x0000007f)) ||
sigveseb 1:0c9aeae76ab2 440 ((bestlength == 5) && (bestoffset <= 0x00003fff)) ||
sigveseb 1:0c9aeae76ab2 441 ((bestlength == 6) && (bestoffset <= 0x001fffff)) ||
sigveseb 1:0c9aeae76ab2 442 ((bestlength == 7) && (bestoffset <= 0x0fffffff)) )
sigveseb 1:0c9aeae76ab2 443 {
sigveseb 1:0c9aeae76ab2 444 out[ outpos ++ ] = (unsigned char) marker;
sigveseb 1:0c9aeae76ab2 445 outpos += _LZ_WriteVarSize( bestlength, &out[ outpos ] );
sigveseb 1:0c9aeae76ab2 446 outpos += _LZ_WriteVarSize( bestoffset, &out[ outpos ] );
sigveseb 1:0c9aeae76ab2 447 inpos += bestlength;
sigveseb 1:0c9aeae76ab2 448 bytesleft -= bestlength;
sigveseb 1:0c9aeae76ab2 449 }
sigveseb 1:0c9aeae76ab2 450 else
sigveseb 1:0c9aeae76ab2 451 {
sigveseb 1:0c9aeae76ab2 452 /* Output single byte (or two bytes if marker byte) */
sigveseb 1:0c9aeae76ab2 453 symbol = in[ inpos ++ ];
sigveseb 1:0c9aeae76ab2 454 out[ outpos ++ ] = symbol;
sigveseb 1:0c9aeae76ab2 455 if( symbol == marker )
sigveseb 1:0c9aeae76ab2 456 {
sigveseb 1:0c9aeae76ab2 457 out[ outpos ++ ] = 0;
sigveseb 1:0c9aeae76ab2 458 }
sigveseb 1:0c9aeae76ab2 459 -- bytesleft;
sigveseb 1:0c9aeae76ab2 460 }
sigveseb 1:0c9aeae76ab2 461 }
sigveseb 1:0c9aeae76ab2 462 while( bytesleft > 3 );
sigveseb 1:0c9aeae76ab2 463
sigveseb 1:0c9aeae76ab2 464 /* Dump remaining bytes, if any */
sigveseb 1:0c9aeae76ab2 465 while( inpos < insize )
sigveseb 1:0c9aeae76ab2 466 {
sigveseb 1:0c9aeae76ab2 467 if( in[ inpos ] == marker )
sigveseb 1:0c9aeae76ab2 468 {
sigveseb 1:0c9aeae76ab2 469 out[ outpos ++ ] = marker;
sigveseb 1:0c9aeae76ab2 470 out[ outpos ++ ] = 0;
sigveseb 1:0c9aeae76ab2 471 }
sigveseb 1:0c9aeae76ab2 472 else
sigveseb 1:0c9aeae76ab2 473 {
sigveseb 1:0c9aeae76ab2 474 out[ outpos ++ ] = in[ inpos ];
sigveseb 1:0c9aeae76ab2 475 }
sigveseb 1:0c9aeae76ab2 476 ++ inpos;
sigveseb 1:0c9aeae76ab2 477 }
sigveseb 1:0c9aeae76ab2 478
sigveseb 1:0c9aeae76ab2 479 return outpos;
sigveseb 1:0c9aeae76ab2 480 }
sigveseb 1:0c9aeae76ab2 481
sigveseb 1:0c9aeae76ab2 482
sigveseb 1:0c9aeae76ab2 483 /*************************************************************************
sigveseb 1:0c9aeae76ab2 484 * LZ_Uncompress() - Uncompress a block of data using an LZ77 decoder.
sigveseb 1:0c9aeae76ab2 485 * in - Input (compressed) buffer.
sigveseb 1:0c9aeae76ab2 486 * out - Output (uncompressed) buffer. This buffer must be large
sigveseb 1:0c9aeae76ab2 487 * enough to hold the uncompressed data.
sigveseb 1:0c9aeae76ab2 488 * insize - Number of input bytes.
sigveseb 1:0c9aeae76ab2 489 *************************************************************************/
sigveseb 1:0c9aeae76ab2 490
sigveseb 1:0c9aeae76ab2 491 void LZ_Uncompress( unsigned char *in, unsigned char *out,
sigveseb 1:0c9aeae76ab2 492 unsigned int insize )
sigveseb 1:0c9aeae76ab2 493 {
sigveseb 1:0c9aeae76ab2 494 unsigned char marker, symbol;
sigveseb 1:0c9aeae76ab2 495 unsigned int i, inpos, outpos, length, offset;
sigveseb 1:0c9aeae76ab2 496
sigveseb 1:0c9aeae76ab2 497 /* Do we have anything to uncompress? */
sigveseb 1:0c9aeae76ab2 498 if( insize < 1 )
sigveseb 1:0c9aeae76ab2 499 {
sigveseb 1:0c9aeae76ab2 500 return;
sigveseb 1:0c9aeae76ab2 501 }
sigveseb 1:0c9aeae76ab2 502
sigveseb 1:0c9aeae76ab2 503 /* Get marker symbol from input stream */
sigveseb 1:0c9aeae76ab2 504 marker = in[ 0 ];
sigveseb 1:0c9aeae76ab2 505 inpos = 1;
sigveseb 1:0c9aeae76ab2 506
sigveseb 1:0c9aeae76ab2 507 /* Main decompression loop */
sigveseb 1:0c9aeae76ab2 508 outpos = 0;
sigveseb 1:0c9aeae76ab2 509 do
sigveseb 1:0c9aeae76ab2 510 {
sigveseb 1:0c9aeae76ab2 511 symbol = in[ inpos ++ ];
sigveseb 1:0c9aeae76ab2 512 if( symbol == marker )
sigveseb 1:0c9aeae76ab2 513 {
sigveseb 1:0c9aeae76ab2 514 /* We had a marker byte */
sigveseb 1:0c9aeae76ab2 515 if( in[ inpos ] == 0 )
sigveseb 1:0c9aeae76ab2 516 {
sigveseb 1:0c9aeae76ab2 517 /* It was a single occurrence of the marker byte */
sigveseb 1:0c9aeae76ab2 518 out[ outpos ++ ] = marker;
sigveseb 1:0c9aeae76ab2 519 ++ inpos;
sigveseb 1:0c9aeae76ab2 520 }
sigveseb 1:0c9aeae76ab2 521 else
sigveseb 1:0c9aeae76ab2 522 {
sigveseb 1:0c9aeae76ab2 523 /* Extract true length and offset */
sigveseb 1:0c9aeae76ab2 524 inpos += _LZ_ReadVarSize( &length, &in[ inpos ] );
sigveseb 1:0c9aeae76ab2 525 inpos += _LZ_ReadVarSize( &offset, &in[ inpos ] );
sigveseb 1:0c9aeae76ab2 526
sigveseb 1:0c9aeae76ab2 527 /* Copy corresponding data from history window */
sigveseb 1:0c9aeae76ab2 528 for( i = 0; i < length; ++ i )
sigveseb 1:0c9aeae76ab2 529 {
sigveseb 1:0c9aeae76ab2 530 out[ outpos ] = out[ outpos - offset ];
sigveseb 1:0c9aeae76ab2 531 ++ outpos;
sigveseb 1:0c9aeae76ab2 532 }
sigveseb 1:0c9aeae76ab2 533 }
sigveseb 1:0c9aeae76ab2 534 }
sigveseb 1:0c9aeae76ab2 535 else
sigveseb 1:0c9aeae76ab2 536 {
sigveseb 1:0c9aeae76ab2 537 /* No marker, plain copy */
sigveseb 1:0c9aeae76ab2 538 out[ outpos ++ ] = symbol;
sigveseb 1:0c9aeae76ab2 539 }
sigveseb 1:0c9aeae76ab2 540 }
sigveseb 1:0c9aeae76ab2 541 while( inpos < insize );
sigveseb 1:0c9aeae76ab2 542 }