Simple light-weight LZ compression library.

Dependents:   display-puck

Committer:
sigveseb
Date:
Thu Jul 17 14:16:11 2014 +0000
Revision:
0:ce4be867658b
-

Who changed what in which revision?

UserRevisionLine numberNew contents of line
sigveseb 0:ce4be867658b 1 /*************************************************************************
sigveseb 0:ce4be867658b 2 * Name: lz.c
sigveseb 0:ce4be867658b 3 * Author: Marcus Geelnard
sigveseb 0:ce4be867658b 4 * Description: LZ77 coder/decoder implementation.
sigveseb 0:ce4be867658b 5 * Reentrant: Yes
sigveseb 0:ce4be867658b 6 *
sigveseb 0:ce4be867658b 7 * The LZ77 compression scheme is a substitutional compression scheme
sigveseb 0:ce4be867658b 8 * proposed by Abraham Lempel and Jakob Ziv in 1977. It is very simple in
sigveseb 0:ce4be867658b 9 * its design, and uses no fancy bit level compression.
sigveseb 0:ce4be867658b 10 *
sigveseb 0:ce4be867658b 11 * This is my first attempt at an implementation of a LZ77 code/decoder.
sigveseb 0:ce4be867658b 12 *
sigveseb 0:ce4be867658b 13 * The principle of the LZ77 compression algorithm is to store repeated
sigveseb 0:ce4be867658b 14 * occurrences of strings as references to previous occurrences of the same
sigveseb 0:ce4be867658b 15 * string. The point is that the reference consumes less space than the
sigveseb 0:ce4be867658b 16 * string itself, provided that the string is long enough (in this
sigveseb 0:ce4be867658b 17 * implementation, the string has to be at least 4 bytes long, since the
sigveseb 0:ce4be867658b 18 * minimum coded reference is 3 bytes long). Also note that the term
sigveseb 0:ce4be867658b 19 * "string" refers to any kind of byte sequence (it does not have to be
sigveseb 0:ce4be867658b 20 * an ASCII string, for instance).
sigveseb 0:ce4be867658b 21 *
sigveseb 0:ce4be867658b 22 * The coder uses a brute force approach to finding string matches in the
sigveseb 0:ce4be867658b 23 * history buffer (or "sliding window", if you wish), which is very, very
sigveseb 0:ce4be867658b 24 * slow. I recon the complexity is somewhere between O(n^2) and O(n^3),
sigveseb 0:ce4be867658b 25 * depending on the input data.
sigveseb 0:ce4be867658b 26 *
sigveseb 0:ce4be867658b 27 * There is also a faster implementation that uses a large working buffer
sigveseb 0:ce4be867658b 28 * in which a "jump table" is stored, which is used to quickly find
sigveseb 0:ce4be867658b 29 * possible string matches (see the source code for LZ_CompressFast() for
sigveseb 0:ce4be867658b 30 * more information). The faster method is an order of magnitude faster,
sigveseb 0:ce4be867658b 31 * but still quite slow compared to other compression methods.
sigveseb 0:ce4be867658b 32 *
sigveseb 0:ce4be867658b 33 * The upside is that decompression is very fast, and the compression ratio
sigveseb 0:ce4be867658b 34 * is often very good.
sigveseb 0:ce4be867658b 35 *
sigveseb 0:ce4be867658b 36 * The reference to a string is coded as a (length,offset) pair, where the
sigveseb 0:ce4be867658b 37 * length indicates the length of the string, and the offset gives the
sigveseb 0:ce4be867658b 38 * offset from the current data position. To distinguish between string
sigveseb 0:ce4be867658b 39 * references and literal strings (uncompressed bytes), a string reference
sigveseb 0:ce4be867658b 40 * is preceded by a marker byte, which is chosen as the least common byte
sigveseb 0:ce4be867658b 41 * symbol in the input data stream (this marker byte is stored in the
sigveseb 0:ce4be867658b 42 * output stream as the first byte).
sigveseb 0:ce4be867658b 43 *
sigveseb 0:ce4be867658b 44 * Occurrences of the marker byte in the stream are encoded as the marker
sigveseb 0:ce4be867658b 45 * byte followed by a zero byte, which means that occurrences of the marker
sigveseb 0:ce4be867658b 46 * byte have to be coded with two bytes.
sigveseb 0:ce4be867658b 47 *
sigveseb 0:ce4be867658b 48 * The lengths and offsets are coded in a variable length fashion, allowing
sigveseb 0:ce4be867658b 49 * values of any magnitude (up to 4294967295 in this implementation).
sigveseb 0:ce4be867658b 50 *
sigveseb 0:ce4be867658b 51 * With this compression scheme, the worst case compression result is
sigveseb 0:ce4be867658b 52 * (257/256)*insize + 1.
sigveseb 0:ce4be867658b 53 *
sigveseb 0:ce4be867658b 54 *-------------------------------------------------------------------------
sigveseb 0:ce4be867658b 55 * Copyright (c) 2003-2006 Marcus Geelnard
sigveseb 0:ce4be867658b 56 *
sigveseb 0:ce4be867658b 57 * This software is provided 'as-is', without any express or implied
sigveseb 0:ce4be867658b 58 * warranty. In no event will the authors be held liable for any damages
sigveseb 0:ce4be867658b 59 * arising from the use of this software.
sigveseb 0:ce4be867658b 60 *
sigveseb 0:ce4be867658b 61 * Permission is granted to anyone to use this software for any purpose,
sigveseb 0:ce4be867658b 62 * including commercial applications, and to alter it and redistribute it
sigveseb 0:ce4be867658b 63 * freely, subject to the following restrictions:
sigveseb 0:ce4be867658b 64 *
sigveseb 0:ce4be867658b 65 * 1. The origin of this software must not be misrepresented; you must not
sigveseb 0:ce4be867658b 66 * claim that you wrote the original software. If you use this software
sigveseb 0:ce4be867658b 67 * in a product, an acknowledgment in the product documentation would
sigveseb 0:ce4be867658b 68 * be appreciated but is not required.
sigveseb 0:ce4be867658b 69 *
sigveseb 0:ce4be867658b 70 * 2. Altered source versions must be plainly marked as such, and must not
sigveseb 0:ce4be867658b 71 * be misrepresented as being the original software.
sigveseb 0:ce4be867658b 72 *
sigveseb 0:ce4be867658b 73 * 3. This notice may not be removed or altered from any source
sigveseb 0:ce4be867658b 74 * distribution.
sigveseb 0:ce4be867658b 75 *
sigveseb 0:ce4be867658b 76 * Marcus Geelnard
sigveseb 0:ce4be867658b 77 * marcus.geelnard at home.se
sigveseb 0:ce4be867658b 78 *************************************************************************/
sigveseb 0:ce4be867658b 79
sigveseb 0:ce4be867658b 80
sigveseb 0:ce4be867658b 81 /*************************************************************************
sigveseb 0:ce4be867658b 82 * Constants used for LZ77 coding
sigveseb 0:ce4be867658b 83 *************************************************************************/
sigveseb 0:ce4be867658b 84
sigveseb 0:ce4be867658b 85 /* Maximum offset (can be any size < 2^31). Lower values give faster
sigveseb 0:ce4be867658b 86 compression, while higher values gives better compression. The default
sigveseb 0:ce4be867658b 87 value of 100000 is quite high. Experiment to see what works best for
sigveseb 0:ce4be867658b 88 you. */
sigveseb 0:ce4be867658b 89 #define LZ_MAX_OFFSET 100000
sigveseb 0:ce4be867658b 90
sigveseb 0:ce4be867658b 91
sigveseb 0:ce4be867658b 92
sigveseb 0:ce4be867658b 93 /*************************************************************************
sigveseb 0:ce4be867658b 94 * INTERNAL FUNCTIONS *
sigveseb 0:ce4be867658b 95 *************************************************************************/
sigveseb 0:ce4be867658b 96
sigveseb 0:ce4be867658b 97
sigveseb 0:ce4be867658b 98 /*************************************************************************
sigveseb 0:ce4be867658b 99 * _LZ_StringCompare() - Return maximum length string match.
sigveseb 0:ce4be867658b 100 *************************************************************************/
sigveseb 0:ce4be867658b 101
sigveseb 0:ce4be867658b 102 static unsigned int _LZ_StringCompare( unsigned char * str1,
sigveseb 0:ce4be867658b 103 unsigned char * str2, unsigned int minlen, unsigned int maxlen )
sigveseb 0:ce4be867658b 104 {
sigveseb 0:ce4be867658b 105 unsigned int len;
sigveseb 0:ce4be867658b 106
sigveseb 0:ce4be867658b 107 for( len = minlen; (len < maxlen) && (str1[len] == str2[len]); ++ len );
sigveseb 0:ce4be867658b 108
sigveseb 0:ce4be867658b 109 return len;
sigveseb 0:ce4be867658b 110 }
sigveseb 0:ce4be867658b 111
sigveseb 0:ce4be867658b 112
sigveseb 0:ce4be867658b 113 /*************************************************************************
sigveseb 0:ce4be867658b 114 * _LZ_WriteVarSize() - Write unsigned integer with variable number of
sigveseb 0:ce4be867658b 115 * bytes depending on value.
sigveseb 0:ce4be867658b 116 *************************************************************************/
sigveseb 0:ce4be867658b 117
sigveseb 0:ce4be867658b 118 static int _LZ_WriteVarSize( unsigned int x, unsigned char * buf )
sigveseb 0:ce4be867658b 119 {
sigveseb 0:ce4be867658b 120 unsigned int y;
sigveseb 0:ce4be867658b 121 int num_bytes, i, b;
sigveseb 0:ce4be867658b 122
sigveseb 0:ce4be867658b 123 /* Determine number of bytes needed to store the number x */
sigveseb 0:ce4be867658b 124 y = x >> 3;
sigveseb 0:ce4be867658b 125 for( num_bytes = 5; num_bytes >= 2; -- num_bytes )
sigveseb 0:ce4be867658b 126 {
sigveseb 0:ce4be867658b 127 if( y & 0xfe000000 ) break;
sigveseb 0:ce4be867658b 128 y <<= 7;
sigveseb 0:ce4be867658b 129 }
sigveseb 0:ce4be867658b 130
sigveseb 0:ce4be867658b 131 /* Write all bytes, seven bits in each, with 8:th bit set for all */
sigveseb 0:ce4be867658b 132 /* but the last byte. */
sigveseb 0:ce4be867658b 133 for( i = num_bytes-1; i >= 0; -- i )
sigveseb 0:ce4be867658b 134 {
sigveseb 0:ce4be867658b 135 b = (x >> (i*7)) & 0x0000007f;
sigveseb 0:ce4be867658b 136 if( i > 0 )
sigveseb 0:ce4be867658b 137 {
sigveseb 0:ce4be867658b 138 b |= 0x00000080;
sigveseb 0:ce4be867658b 139 }
sigveseb 0:ce4be867658b 140 *buf ++ = (unsigned char) b;
sigveseb 0:ce4be867658b 141 }
sigveseb 0:ce4be867658b 142
sigveseb 0:ce4be867658b 143 /* Return number of bytes written */
sigveseb 0:ce4be867658b 144 return num_bytes;
sigveseb 0:ce4be867658b 145 }
sigveseb 0:ce4be867658b 146
sigveseb 0:ce4be867658b 147
sigveseb 0:ce4be867658b 148 /*************************************************************************
sigveseb 0:ce4be867658b 149 * _LZ_ReadVarSize() - Read unsigned integer with variable number of
sigveseb 0:ce4be867658b 150 * bytes depending on value.
sigveseb 0:ce4be867658b 151 *************************************************************************/
sigveseb 0:ce4be867658b 152
sigveseb 0:ce4be867658b 153 static int _LZ_ReadVarSize( unsigned int * x, unsigned char * buf )
sigveseb 0:ce4be867658b 154 {
sigveseb 0:ce4be867658b 155 unsigned int y, b, num_bytes;
sigveseb 0:ce4be867658b 156
sigveseb 0:ce4be867658b 157 /* Read complete value (stop when byte contains zero in 8:th bit) */
sigveseb 0:ce4be867658b 158 y = 0;
sigveseb 0:ce4be867658b 159 num_bytes = 0;
sigveseb 0:ce4be867658b 160 do
sigveseb 0:ce4be867658b 161 {
sigveseb 0:ce4be867658b 162 b = (unsigned int) (*buf ++);
sigveseb 0:ce4be867658b 163 y = (y << 7) | (b & 0x0000007f);
sigveseb 0:ce4be867658b 164 ++ num_bytes;
sigveseb 0:ce4be867658b 165 }
sigveseb 0:ce4be867658b 166 while( b & 0x00000080 );
sigveseb 0:ce4be867658b 167
sigveseb 0:ce4be867658b 168 /* Store value in x */
sigveseb 0:ce4be867658b 169 *x = y;
sigveseb 0:ce4be867658b 170
sigveseb 0:ce4be867658b 171 /* Return number of bytes read */
sigveseb 0:ce4be867658b 172 return num_bytes;
sigveseb 0:ce4be867658b 173 }
sigveseb 0:ce4be867658b 174
sigveseb 0:ce4be867658b 175
sigveseb 0:ce4be867658b 176
sigveseb 0:ce4be867658b 177 /*************************************************************************
sigveseb 0:ce4be867658b 178 * PUBLIC FUNCTIONS *
sigveseb 0:ce4be867658b 179 *************************************************************************/
sigveseb 0:ce4be867658b 180
sigveseb 0:ce4be867658b 181
sigveseb 0:ce4be867658b 182 /*************************************************************************
sigveseb 0:ce4be867658b 183 * LZ_Compress() - Compress a block of data using an LZ77 coder.
sigveseb 0:ce4be867658b 184 * in - Input (uncompressed) buffer.
sigveseb 0:ce4be867658b 185 * out - Output (compressed) buffer. This buffer must be 0.4% larger
sigveseb 0:ce4be867658b 186 * than the input buffer, plus one byte.
sigveseb 0:ce4be867658b 187 * insize - Number of input bytes.
sigveseb 0:ce4be867658b 188 * The function returns the size of the compressed data.
sigveseb 0:ce4be867658b 189 *************************************************************************/
sigveseb 0:ce4be867658b 190
sigveseb 0:ce4be867658b 191 int LZ_Compress( unsigned char *in, unsigned char *out,
sigveseb 0:ce4be867658b 192 unsigned int insize )
sigveseb 0:ce4be867658b 193 {
sigveseb 0:ce4be867658b 194 unsigned char marker, symbol;
sigveseb 0:ce4be867658b 195 unsigned int inpos, outpos, bytesleft, i;
sigveseb 0:ce4be867658b 196 unsigned int maxoffset, offset, bestoffset;
sigveseb 0:ce4be867658b 197 unsigned int maxlength, length, bestlength;
sigveseb 0:ce4be867658b 198 unsigned int histogram[ 256 ];
sigveseb 0:ce4be867658b 199 unsigned char *ptr1, *ptr2;
sigveseb 0:ce4be867658b 200
sigveseb 0:ce4be867658b 201 /* Do we have anything to compress? */
sigveseb 0:ce4be867658b 202 if( insize < 1 )
sigveseb 0:ce4be867658b 203 {
sigveseb 0:ce4be867658b 204 return 0;
sigveseb 0:ce4be867658b 205 }
sigveseb 0:ce4be867658b 206
sigveseb 0:ce4be867658b 207 /* Create histogram */
sigveseb 0:ce4be867658b 208 for( i = 0; i < 256; ++ i )
sigveseb 0:ce4be867658b 209 {
sigveseb 0:ce4be867658b 210 histogram[ i ] = 0;
sigveseb 0:ce4be867658b 211 }
sigveseb 0:ce4be867658b 212 for( i = 0; i < insize; ++ i )
sigveseb 0:ce4be867658b 213 {
sigveseb 0:ce4be867658b 214 ++ histogram[ in[ i ] ];
sigveseb 0:ce4be867658b 215 }
sigveseb 0:ce4be867658b 216
sigveseb 0:ce4be867658b 217 /* Find the least common byte, and use it as the marker symbol */
sigveseb 0:ce4be867658b 218 marker = 0;
sigveseb 0:ce4be867658b 219 for( i = 1; i < 256; ++ i )
sigveseb 0:ce4be867658b 220 {
sigveseb 0:ce4be867658b 221 if( histogram[ i ] < histogram[ marker ] )
sigveseb 0:ce4be867658b 222 {
sigveseb 0:ce4be867658b 223 marker = i;
sigveseb 0:ce4be867658b 224 }
sigveseb 0:ce4be867658b 225 }
sigveseb 0:ce4be867658b 226
sigveseb 0:ce4be867658b 227 /* Remember the marker symbol for the decoder */
sigveseb 0:ce4be867658b 228 out[ 0 ] = marker;
sigveseb 0:ce4be867658b 229
sigveseb 0:ce4be867658b 230 /* Start of compression */
sigveseb 0:ce4be867658b 231 inpos = 0;
sigveseb 0:ce4be867658b 232 outpos = 1;
sigveseb 0:ce4be867658b 233
sigveseb 0:ce4be867658b 234 /* Main compression loop */
sigveseb 0:ce4be867658b 235 bytesleft = insize;
sigveseb 0:ce4be867658b 236 do
sigveseb 0:ce4be867658b 237 {
sigveseb 0:ce4be867658b 238 /* Determine most distant position */
sigveseb 0:ce4be867658b 239 if( inpos > LZ_MAX_OFFSET ) maxoffset = LZ_MAX_OFFSET;
sigveseb 0:ce4be867658b 240 else maxoffset = inpos;
sigveseb 0:ce4be867658b 241
sigveseb 0:ce4be867658b 242 /* Get pointer to current position */
sigveseb 0:ce4be867658b 243 ptr1 = &in[ inpos ];
sigveseb 0:ce4be867658b 244
sigveseb 0:ce4be867658b 245 /* Search history window for maximum length string match */
sigveseb 0:ce4be867658b 246 bestlength = 3;
sigveseb 0:ce4be867658b 247 bestoffset = 0;
sigveseb 0:ce4be867658b 248 for( offset = 3; offset <= maxoffset; ++ offset )
sigveseb 0:ce4be867658b 249 {
sigveseb 0:ce4be867658b 250 /* Get pointer to candidate string */
sigveseb 0:ce4be867658b 251 ptr2 = &ptr1[ -(int)offset ];
sigveseb 0:ce4be867658b 252
sigveseb 0:ce4be867658b 253 /* Quickly determine if this is a candidate (for speed) */
sigveseb 0:ce4be867658b 254 if( (ptr1[ 0 ] == ptr2[ 0 ]) &&
sigveseb 0:ce4be867658b 255 (ptr1[ bestlength ] == ptr2[ bestlength ]) )
sigveseb 0:ce4be867658b 256 {
sigveseb 0:ce4be867658b 257 /* Determine maximum length for this offset */
sigveseb 0:ce4be867658b 258 maxlength = (bytesleft < offset ? bytesleft : offset);
sigveseb 0:ce4be867658b 259
sigveseb 0:ce4be867658b 260 /* Count maximum length match at this offset */
sigveseb 0:ce4be867658b 261 length = _LZ_StringCompare( ptr1, ptr2, 0, maxlength );
sigveseb 0:ce4be867658b 262
sigveseb 0:ce4be867658b 263 /* Better match than any previous match? */
sigveseb 0:ce4be867658b 264 if( length > bestlength )
sigveseb 0:ce4be867658b 265 {
sigveseb 0:ce4be867658b 266 bestlength = length;
sigveseb 0:ce4be867658b 267 bestoffset = offset;
sigveseb 0:ce4be867658b 268 }
sigveseb 0:ce4be867658b 269 }
sigveseb 0:ce4be867658b 270 }
sigveseb 0:ce4be867658b 271
sigveseb 0:ce4be867658b 272 /* Was there a good enough match? */
sigveseb 0:ce4be867658b 273 if( (bestlength >= 8) ||
sigveseb 0:ce4be867658b 274 ((bestlength == 4) && (bestoffset <= 0x0000007f)) ||
sigveseb 0:ce4be867658b 275 ((bestlength == 5) && (bestoffset <= 0x00003fff)) ||
sigveseb 0:ce4be867658b 276 ((bestlength == 6) && (bestoffset <= 0x001fffff)) ||
sigveseb 0:ce4be867658b 277 ((bestlength == 7) && (bestoffset <= 0x0fffffff)) )
sigveseb 0:ce4be867658b 278 {
sigveseb 0:ce4be867658b 279 out[ outpos ++ ] = (unsigned char) marker;
sigveseb 0:ce4be867658b 280 outpos += _LZ_WriteVarSize( bestlength, &out[ outpos ] );
sigveseb 0:ce4be867658b 281 outpos += _LZ_WriteVarSize( bestoffset, &out[ outpos ] );
sigveseb 0:ce4be867658b 282 inpos += bestlength;
sigveseb 0:ce4be867658b 283 bytesleft -= bestlength;
sigveseb 0:ce4be867658b 284 }
sigveseb 0:ce4be867658b 285 else
sigveseb 0:ce4be867658b 286 {
sigveseb 0:ce4be867658b 287 /* Output single byte (or two bytes if marker byte) */
sigveseb 0:ce4be867658b 288 symbol = in[ inpos ++ ];
sigveseb 0:ce4be867658b 289 out[ outpos ++ ] = symbol;
sigveseb 0:ce4be867658b 290 if( symbol == marker )
sigveseb 0:ce4be867658b 291 {
sigveseb 0:ce4be867658b 292 out[ outpos ++ ] = 0;
sigveseb 0:ce4be867658b 293 }
sigveseb 0:ce4be867658b 294 -- bytesleft;
sigveseb 0:ce4be867658b 295 }
sigveseb 0:ce4be867658b 296 }
sigveseb 0:ce4be867658b 297 while( bytesleft > 3 );
sigveseb 0:ce4be867658b 298
sigveseb 0:ce4be867658b 299 /* Dump remaining bytes, if any */
sigveseb 0:ce4be867658b 300 while( inpos < insize )
sigveseb 0:ce4be867658b 301 {
sigveseb 0:ce4be867658b 302 if( in[ inpos ] == marker )
sigveseb 0:ce4be867658b 303 {
sigveseb 0:ce4be867658b 304 out[ outpos ++ ] = marker;
sigveseb 0:ce4be867658b 305 out[ outpos ++ ] = 0;
sigveseb 0:ce4be867658b 306 }
sigveseb 0:ce4be867658b 307 else
sigveseb 0:ce4be867658b 308 {
sigveseb 0:ce4be867658b 309 out[ outpos ++ ] = in[ inpos ];
sigveseb 0:ce4be867658b 310 }
sigveseb 0:ce4be867658b 311 ++ inpos;
sigveseb 0:ce4be867658b 312 }
sigveseb 0:ce4be867658b 313
sigveseb 0:ce4be867658b 314 return outpos;
sigveseb 0:ce4be867658b 315 }
sigveseb 0:ce4be867658b 316
sigveseb 0:ce4be867658b 317
sigveseb 0:ce4be867658b 318 /*************************************************************************
sigveseb 0:ce4be867658b 319 * LZ_CompressFast() - Compress a block of data using an LZ77 coder.
sigveseb 0:ce4be867658b 320 * in - Input (uncompressed) buffer.
sigveseb 0:ce4be867658b 321 * out - Output (compressed) buffer. This buffer must be 0.4% larger
sigveseb 0:ce4be867658b 322 * than the input buffer, plus one byte.
sigveseb 0:ce4be867658b 323 * insize - Number of input bytes.
sigveseb 0:ce4be867658b 324 * work - Pointer to a temporary buffer (internal working buffer), which
sigveseb 0:ce4be867658b 325 * must be able to hold (insize+65536) unsigned integers.
sigveseb 0:ce4be867658b 326 * The function returns the size of the compressed data.
sigveseb 0:ce4be867658b 327 *************************************************************************/
sigveseb 0:ce4be867658b 328
sigveseb 0:ce4be867658b 329 int LZ_CompressFast( unsigned char *in, unsigned char *out,
sigveseb 0:ce4be867658b 330 unsigned int insize, unsigned int *work )
sigveseb 0:ce4be867658b 331 {
sigveseb 0:ce4be867658b 332 unsigned char marker, symbol;
sigveseb 0:ce4be867658b 333 unsigned int inpos, outpos, bytesleft, i, index, symbols;
sigveseb 0:ce4be867658b 334 unsigned int offset, bestoffset;
sigveseb 0:ce4be867658b 335 unsigned int maxlength, length, bestlength;
sigveseb 0:ce4be867658b 336 unsigned int histogram[ 256 ], *lastindex, *jumptable;
sigveseb 0:ce4be867658b 337 unsigned char *ptr1, *ptr2;
sigveseb 0:ce4be867658b 338
sigveseb 0:ce4be867658b 339 /* Do we have anything to compress? */
sigveseb 0:ce4be867658b 340 if( insize < 1 )
sigveseb 0:ce4be867658b 341 {
sigveseb 0:ce4be867658b 342 return 0;
sigveseb 0:ce4be867658b 343 }
sigveseb 0:ce4be867658b 344
sigveseb 0:ce4be867658b 345 /* Assign arrays to the working area */
sigveseb 0:ce4be867658b 346 lastindex = work;
sigveseb 0:ce4be867658b 347 jumptable = &work[ 65536 ];
sigveseb 0:ce4be867658b 348
sigveseb 0:ce4be867658b 349 /* Build a "jump table". Here is how the jump table works:
sigveseb 0:ce4be867658b 350 jumptable[i] points to the nearest previous occurrence of the same
sigveseb 0:ce4be867658b 351 symbol pair as in[i]:in[i+1], so in[i] == in[jumptable[i]] and
sigveseb 0:ce4be867658b 352 in[i+1] == in[jumptable[i]+1], and so on... Following the jump table
sigveseb 0:ce4be867658b 353 gives a dramatic boost for the string search'n'match loop compared
sigveseb 0:ce4be867658b 354 to doing a brute force search. The jump table is built in O(n) time,
sigveseb 0:ce4be867658b 355 so it is a cheap operation in terms of time, but it is expensice in
sigveseb 0:ce4be867658b 356 terms of memory consumption. */
sigveseb 0:ce4be867658b 357 for( i = 0; i < 65536; ++ i )
sigveseb 0:ce4be867658b 358 {
sigveseb 0:ce4be867658b 359 lastindex[ i ] = 0xffffffff;
sigveseb 0:ce4be867658b 360 }
sigveseb 0:ce4be867658b 361 for( i = 0; i < insize-1; ++ i )
sigveseb 0:ce4be867658b 362 {
sigveseb 0:ce4be867658b 363 symbols = (((unsigned int)in[i]) << 8) | ((unsigned int)in[i+1]);
sigveseb 0:ce4be867658b 364 index = lastindex[ symbols ];
sigveseb 0:ce4be867658b 365 lastindex[ symbols ] = i;
sigveseb 0:ce4be867658b 366 jumptable[ i ] = index;
sigveseb 0:ce4be867658b 367 }
sigveseb 0:ce4be867658b 368 jumptable[ insize-1 ] = 0xffffffff;
sigveseb 0:ce4be867658b 369
sigveseb 0:ce4be867658b 370 /* Create histogram */
sigveseb 0:ce4be867658b 371 for( i = 0; i < 256; ++ i )
sigveseb 0:ce4be867658b 372 {
sigveseb 0:ce4be867658b 373 histogram[ i ] = 0;
sigveseb 0:ce4be867658b 374 }
sigveseb 0:ce4be867658b 375 for( i = 0; i < insize; ++ i )
sigveseb 0:ce4be867658b 376 {
sigveseb 0:ce4be867658b 377 ++ histogram[ in[ i ] ];
sigveseb 0:ce4be867658b 378 }
sigveseb 0:ce4be867658b 379
sigveseb 0:ce4be867658b 380 /* Find the least common byte, and use it as the marker symbol */
sigveseb 0:ce4be867658b 381 marker = 0;
sigveseb 0:ce4be867658b 382 for( i = 1; i < 256; ++ i )
sigveseb 0:ce4be867658b 383 {
sigveseb 0:ce4be867658b 384 if( histogram[ i ] < histogram[ marker ] )
sigveseb 0:ce4be867658b 385 {
sigveseb 0:ce4be867658b 386 marker = i;
sigveseb 0:ce4be867658b 387 }
sigveseb 0:ce4be867658b 388 }
sigveseb 0:ce4be867658b 389
sigveseb 0:ce4be867658b 390 /* Remember the marker symbol for the decoder */
sigveseb 0:ce4be867658b 391 out[ 0 ] = marker;
sigveseb 0:ce4be867658b 392
sigveseb 0:ce4be867658b 393 /* Start of compression */
sigveseb 0:ce4be867658b 394 inpos = 0;
sigveseb 0:ce4be867658b 395 outpos = 1;
sigveseb 0:ce4be867658b 396
sigveseb 0:ce4be867658b 397 /* Main compression loop */
sigveseb 0:ce4be867658b 398 bytesleft = insize;
sigveseb 0:ce4be867658b 399 do
sigveseb 0:ce4be867658b 400 {
sigveseb 0:ce4be867658b 401 /* Get pointer to current position */
sigveseb 0:ce4be867658b 402 ptr1 = &in[ inpos ];
sigveseb 0:ce4be867658b 403
sigveseb 0:ce4be867658b 404 /* Search history window for maximum length string match */
sigveseb 0:ce4be867658b 405 bestlength = 3;
sigveseb 0:ce4be867658b 406 bestoffset = 0;
sigveseb 0:ce4be867658b 407 index = jumptable[ inpos ];
sigveseb 0:ce4be867658b 408 while( (index != 0xffffffff) && ((inpos - index) < LZ_MAX_OFFSET) )
sigveseb 0:ce4be867658b 409 {
sigveseb 0:ce4be867658b 410 /* Get pointer to candidate string */
sigveseb 0:ce4be867658b 411 ptr2 = &in[ index ];
sigveseb 0:ce4be867658b 412
sigveseb 0:ce4be867658b 413 /* Quickly determine if this is a candidate (for speed) */
sigveseb 0:ce4be867658b 414 if( ptr2[ bestlength ] == ptr1[ bestlength ] )
sigveseb 0:ce4be867658b 415 {
sigveseb 0:ce4be867658b 416 /* Determine maximum length for this offset */
sigveseb 0:ce4be867658b 417 offset = inpos - index;
sigveseb 0:ce4be867658b 418 maxlength = (bytesleft < offset ? bytesleft : offset);
sigveseb 0:ce4be867658b 419
sigveseb 0:ce4be867658b 420 /* Count maximum length match at this offset */
sigveseb 0:ce4be867658b 421 length = _LZ_StringCompare( ptr1, ptr2, 2, maxlength );
sigveseb 0:ce4be867658b 422
sigveseb 0:ce4be867658b 423 /* Better match than any previous match? */
sigveseb 0:ce4be867658b 424 if( length > bestlength )
sigveseb 0:ce4be867658b 425 {
sigveseb 0:ce4be867658b 426 bestlength = length;
sigveseb 0:ce4be867658b 427 bestoffset = offset;
sigveseb 0:ce4be867658b 428 }
sigveseb 0:ce4be867658b 429 }
sigveseb 0:ce4be867658b 430
sigveseb 0:ce4be867658b 431 /* Get next possible index from jump table */
sigveseb 0:ce4be867658b 432 index = jumptable[ index ];
sigveseb 0:ce4be867658b 433 }
sigveseb 0:ce4be867658b 434
sigveseb 0:ce4be867658b 435 /* Was there a good enough match? */
sigveseb 0:ce4be867658b 436 if( (bestlength >= 8) ||
sigveseb 0:ce4be867658b 437 ((bestlength == 4) && (bestoffset <= 0x0000007f)) ||
sigveseb 0:ce4be867658b 438 ((bestlength == 5) && (bestoffset <= 0x00003fff)) ||
sigveseb 0:ce4be867658b 439 ((bestlength == 6) && (bestoffset <= 0x001fffff)) ||
sigveseb 0:ce4be867658b 440 ((bestlength == 7) && (bestoffset <= 0x0fffffff)) )
sigveseb 0:ce4be867658b 441 {
sigveseb 0:ce4be867658b 442 out[ outpos ++ ] = (unsigned char) marker;
sigveseb 0:ce4be867658b 443 outpos += _LZ_WriteVarSize( bestlength, &out[ outpos ] );
sigveseb 0:ce4be867658b 444 outpos += _LZ_WriteVarSize( bestoffset, &out[ outpos ] );
sigveseb 0:ce4be867658b 445 inpos += bestlength;
sigveseb 0:ce4be867658b 446 bytesleft -= bestlength;
sigveseb 0:ce4be867658b 447 }
sigveseb 0:ce4be867658b 448 else
sigveseb 0:ce4be867658b 449 {
sigveseb 0:ce4be867658b 450 /* Output single byte (or two bytes if marker byte) */
sigveseb 0:ce4be867658b 451 symbol = in[ inpos ++ ];
sigveseb 0:ce4be867658b 452 out[ outpos ++ ] = symbol;
sigveseb 0:ce4be867658b 453 if( symbol == marker )
sigveseb 0:ce4be867658b 454 {
sigveseb 0:ce4be867658b 455 out[ outpos ++ ] = 0;
sigveseb 0:ce4be867658b 456 }
sigveseb 0:ce4be867658b 457 -- bytesleft;
sigveseb 0:ce4be867658b 458 }
sigveseb 0:ce4be867658b 459 }
sigveseb 0:ce4be867658b 460 while( bytesleft > 3 );
sigveseb 0:ce4be867658b 461
sigveseb 0:ce4be867658b 462 /* Dump remaining bytes, if any */
sigveseb 0:ce4be867658b 463 while( inpos < insize )
sigveseb 0:ce4be867658b 464 {
sigveseb 0:ce4be867658b 465 if( in[ inpos ] == marker )
sigveseb 0:ce4be867658b 466 {
sigveseb 0:ce4be867658b 467 out[ outpos ++ ] = marker;
sigveseb 0:ce4be867658b 468 out[ outpos ++ ] = 0;
sigveseb 0:ce4be867658b 469 }
sigveseb 0:ce4be867658b 470 else
sigveseb 0:ce4be867658b 471 {
sigveseb 0:ce4be867658b 472 out[ outpos ++ ] = in[ inpos ];
sigveseb 0:ce4be867658b 473 }
sigveseb 0:ce4be867658b 474 ++ inpos;
sigveseb 0:ce4be867658b 475 }
sigveseb 0:ce4be867658b 476
sigveseb 0:ce4be867658b 477 return outpos;
sigveseb 0:ce4be867658b 478 }
sigveseb 0:ce4be867658b 479
sigveseb 0:ce4be867658b 480
sigveseb 0:ce4be867658b 481 /*************************************************************************
sigveseb 0:ce4be867658b 482 * LZ_Uncompress() - Uncompress a block of data using an LZ77 decoder.
sigveseb 0:ce4be867658b 483 * in - Input (compressed) buffer.
sigveseb 0:ce4be867658b 484 * out - Output (uncompressed) buffer. This buffer must be large
sigveseb 0:ce4be867658b 485 * enough to hold the uncompressed data.
sigveseb 0:ce4be867658b 486 * insize - Number of input bytes.
sigveseb 0:ce4be867658b 487 *************************************************************************/
sigveseb 0:ce4be867658b 488
sigveseb 0:ce4be867658b 489 void LZ_Uncompress( unsigned char *in, unsigned char *out,
sigveseb 0:ce4be867658b 490 unsigned int insize )
sigveseb 0:ce4be867658b 491 {
sigveseb 0:ce4be867658b 492 unsigned char marker, symbol;
sigveseb 0:ce4be867658b 493 unsigned int i, inpos, outpos, length, offset;
sigveseb 0:ce4be867658b 494
sigveseb 0:ce4be867658b 495 /* Do we have anything to uncompress? */
sigveseb 0:ce4be867658b 496 if( insize < 1 )
sigveseb 0:ce4be867658b 497 {
sigveseb 0:ce4be867658b 498 return;
sigveseb 0:ce4be867658b 499 }
sigveseb 0:ce4be867658b 500
sigveseb 0:ce4be867658b 501 /* Get marker symbol from input stream */
sigveseb 0:ce4be867658b 502 marker = in[ 0 ];
sigveseb 0:ce4be867658b 503 inpos = 1;
sigveseb 0:ce4be867658b 504
sigveseb 0:ce4be867658b 505 /* Main decompression loop */
sigveseb 0:ce4be867658b 506 outpos = 0;
sigveseb 0:ce4be867658b 507 do
sigveseb 0:ce4be867658b 508 {
sigveseb 0:ce4be867658b 509 symbol = in[ inpos ++ ];
sigveseb 0:ce4be867658b 510 if( symbol == marker )
sigveseb 0:ce4be867658b 511 {
sigveseb 0:ce4be867658b 512 /* We had a marker byte */
sigveseb 0:ce4be867658b 513 if( in[ inpos ] == 0 )
sigveseb 0:ce4be867658b 514 {
sigveseb 0:ce4be867658b 515 /* It was a single occurrence of the marker byte */
sigveseb 0:ce4be867658b 516 out[ outpos ++ ] = marker;
sigveseb 0:ce4be867658b 517 ++ inpos;
sigveseb 0:ce4be867658b 518 }
sigveseb 0:ce4be867658b 519 else
sigveseb 0:ce4be867658b 520 {
sigveseb 0:ce4be867658b 521 /* Extract true length and offset */
sigveseb 0:ce4be867658b 522 inpos += _LZ_ReadVarSize( &length, &in[ inpos ] );
sigveseb 0:ce4be867658b 523 inpos += _LZ_ReadVarSize( &offset, &in[ inpos ] );
sigveseb 0:ce4be867658b 524
sigveseb 0:ce4be867658b 525 /* Copy corresponding data from history window */
sigveseb 0:ce4be867658b 526 for( i = 0; i < length; ++ i )
sigveseb 0:ce4be867658b 527 {
sigveseb 0:ce4be867658b 528 out[ outpos ] = out[ outpos - offset ];
sigveseb 0:ce4be867658b 529 ++ outpos;
sigveseb 0:ce4be867658b 530 }
sigveseb 0:ce4be867658b 531 }
sigveseb 0:ce4be867658b 532 }
sigveseb 0:ce4be867658b 533 else
sigveseb 0:ce4be867658b 534 {
sigveseb 0:ce4be867658b 535 /* No marker, plain copy */
sigveseb 0:ce4be867658b 536 out[ outpos ++ ] = symbol;
sigveseb 0:ce4be867658b 537 }
sigveseb 0:ce4be867658b 538 }
sigveseb 0:ce4be867658b 539 while( inpos < insize );
sigveseb 0:ce4be867658b 540 }