Simple light-weight LZ compression library.
lz.c@0:ce4be867658b, 2014-07-17 (annotated)
- Committer:
- sigveseb
- Date:
- Thu Jul 17 14:16:11 2014 +0000
- Revision:
- 0:ce4be867658b
-
Who changed what in which revision?
User | Revision | Line number | New contents of line |
---|---|---|---|
sigveseb | 0:ce4be867658b | 1 | /************************************************************************* |
sigveseb | 0:ce4be867658b | 2 | * Name: lz.c |
sigveseb | 0:ce4be867658b | 3 | * Author: Marcus Geelnard |
sigveseb | 0:ce4be867658b | 4 | * Description: LZ77 coder/decoder implementation. |
sigveseb | 0:ce4be867658b | 5 | * Reentrant: Yes |
sigveseb | 0:ce4be867658b | 6 | * |
sigveseb | 0:ce4be867658b | 7 | * The LZ77 compression scheme is a substitutional compression scheme |
sigveseb | 0:ce4be867658b | 8 | * proposed by Abraham Lempel and Jakob Ziv in 1977. It is very simple in |
sigveseb | 0:ce4be867658b | 9 | * its design, and uses no fancy bit level compression. |
sigveseb | 0:ce4be867658b | 10 | * |
sigveseb | 0:ce4be867658b | 11 | * This is my first attempt at an implementation of a LZ77 code/decoder. |
sigveseb | 0:ce4be867658b | 12 | * |
sigveseb | 0:ce4be867658b | 13 | * The principle of the LZ77 compression algorithm is to store repeated |
sigveseb | 0:ce4be867658b | 14 | * occurrences of strings as references to previous occurrences of the same |
sigveseb | 0:ce4be867658b | 15 | * string. The point is that the reference consumes less space than the |
sigveseb | 0:ce4be867658b | 16 | * string itself, provided that the string is long enough (in this |
sigveseb | 0:ce4be867658b | 17 | * implementation, the string has to be at least 4 bytes long, since the |
sigveseb | 0:ce4be867658b | 18 | * minimum coded reference is 3 bytes long). Also note that the term |
sigveseb | 0:ce4be867658b | 19 | * "string" refers to any kind of byte sequence (it does not have to be |
sigveseb | 0:ce4be867658b | 20 | * an ASCII string, for instance). |
sigveseb | 0:ce4be867658b | 21 | * |
sigveseb | 0:ce4be867658b | 22 | * The coder uses a brute force approach to finding string matches in the |
sigveseb | 0:ce4be867658b | 23 | * history buffer (or "sliding window", if you wish), which is very, very |
sigveseb | 0:ce4be867658b | 24 | * slow. I recon the complexity is somewhere between O(n^2) and O(n^3), |
sigveseb | 0:ce4be867658b | 25 | * depending on the input data. |
sigveseb | 0:ce4be867658b | 26 | * |
sigveseb | 0:ce4be867658b | 27 | * There is also a faster implementation that uses a large working buffer |
sigveseb | 0:ce4be867658b | 28 | * in which a "jump table" is stored, which is used to quickly find |
sigveseb | 0:ce4be867658b | 29 | * possible string matches (see the source code for LZ_CompressFast() for |
sigveseb | 0:ce4be867658b | 30 | * more information). The faster method is an order of magnitude faster, |
sigveseb | 0:ce4be867658b | 31 | * but still quite slow compared to other compression methods. |
sigveseb | 0:ce4be867658b | 32 | * |
sigveseb | 0:ce4be867658b | 33 | * The upside is that decompression is very fast, and the compression ratio |
sigveseb | 0:ce4be867658b | 34 | * is often very good. |
sigveseb | 0:ce4be867658b | 35 | * |
sigveseb | 0:ce4be867658b | 36 | * The reference to a string is coded as a (length,offset) pair, where the |
sigveseb | 0:ce4be867658b | 37 | * length indicates the length of the string, and the offset gives the |
sigveseb | 0:ce4be867658b | 38 | * offset from the current data position. To distinguish between string |
sigveseb | 0:ce4be867658b | 39 | * references and literal strings (uncompressed bytes), a string reference |
sigveseb | 0:ce4be867658b | 40 | * is preceded by a marker byte, which is chosen as the least common byte |
sigveseb | 0:ce4be867658b | 41 | * symbol in the input data stream (this marker byte is stored in the |
sigveseb | 0:ce4be867658b | 42 | * output stream as the first byte). |
sigveseb | 0:ce4be867658b | 43 | * |
sigveseb | 0:ce4be867658b | 44 | * Occurrences of the marker byte in the stream are encoded as the marker |
sigveseb | 0:ce4be867658b | 45 | * byte followed by a zero byte, which means that occurrences of the marker |
sigveseb | 0:ce4be867658b | 46 | * byte have to be coded with two bytes. |
sigveseb | 0:ce4be867658b | 47 | * |
sigveseb | 0:ce4be867658b | 48 | * The lengths and offsets are coded in a variable length fashion, allowing |
sigveseb | 0:ce4be867658b | 49 | * values of any magnitude (up to 4294967295 in this implementation). |
sigveseb | 0:ce4be867658b | 50 | * |
sigveseb | 0:ce4be867658b | 51 | * With this compression scheme, the worst case compression result is |
sigveseb | 0:ce4be867658b | 52 | * (257/256)*insize + 1. |
sigveseb | 0:ce4be867658b | 53 | * |
sigveseb | 0:ce4be867658b | 54 | *------------------------------------------------------------------------- |
sigveseb | 0:ce4be867658b | 55 | * Copyright (c) 2003-2006 Marcus Geelnard |
sigveseb | 0:ce4be867658b | 56 | * |
sigveseb | 0:ce4be867658b | 57 | * This software is provided 'as-is', without any express or implied |
sigveseb | 0:ce4be867658b | 58 | * warranty. In no event will the authors be held liable for any damages |
sigveseb | 0:ce4be867658b | 59 | * arising from the use of this software. |
sigveseb | 0:ce4be867658b | 60 | * |
sigveseb | 0:ce4be867658b | 61 | * Permission is granted to anyone to use this software for any purpose, |
sigveseb | 0:ce4be867658b | 62 | * including commercial applications, and to alter it and redistribute it |
sigveseb | 0:ce4be867658b | 63 | * freely, subject to the following restrictions: |
sigveseb | 0:ce4be867658b | 64 | * |
sigveseb | 0:ce4be867658b | 65 | * 1. The origin of this software must not be misrepresented; you must not |
sigveseb | 0:ce4be867658b | 66 | * claim that you wrote the original software. If you use this software |
sigveseb | 0:ce4be867658b | 67 | * in a product, an acknowledgment in the product documentation would |
sigveseb | 0:ce4be867658b | 68 | * be appreciated but is not required. |
sigveseb | 0:ce4be867658b | 69 | * |
sigveseb | 0:ce4be867658b | 70 | * 2. Altered source versions must be plainly marked as such, and must not |
sigveseb | 0:ce4be867658b | 71 | * be misrepresented as being the original software. |
sigveseb | 0:ce4be867658b | 72 | * |
sigveseb | 0:ce4be867658b | 73 | * 3. This notice may not be removed or altered from any source |
sigveseb | 0:ce4be867658b | 74 | * distribution. |
sigveseb | 0:ce4be867658b | 75 | * |
sigveseb | 0:ce4be867658b | 76 | * Marcus Geelnard |
sigveseb | 0:ce4be867658b | 77 | * marcus.geelnard at home.se |
sigveseb | 0:ce4be867658b | 78 | *************************************************************************/ |
sigveseb | 0:ce4be867658b | 79 | |
sigveseb | 0:ce4be867658b | 80 | |
sigveseb | 0:ce4be867658b | 81 | /************************************************************************* |
sigveseb | 0:ce4be867658b | 82 | * Constants used for LZ77 coding |
sigveseb | 0:ce4be867658b | 83 | *************************************************************************/ |
sigveseb | 0:ce4be867658b | 84 | |
sigveseb | 0:ce4be867658b | 85 | /* Maximum offset (can be any size < 2^31). Lower values give faster |
sigveseb | 0:ce4be867658b | 86 | compression, while higher values gives better compression. The default |
sigveseb | 0:ce4be867658b | 87 | value of 100000 is quite high. Experiment to see what works best for |
sigveseb | 0:ce4be867658b | 88 | you. */ |
sigveseb | 0:ce4be867658b | 89 | #define LZ_MAX_OFFSET 100000 |
sigveseb | 0:ce4be867658b | 90 | |
sigveseb | 0:ce4be867658b | 91 | |
sigveseb | 0:ce4be867658b | 92 | |
sigveseb | 0:ce4be867658b | 93 | /************************************************************************* |
sigveseb | 0:ce4be867658b | 94 | * INTERNAL FUNCTIONS * |
sigveseb | 0:ce4be867658b | 95 | *************************************************************************/ |
sigveseb | 0:ce4be867658b | 96 | |
sigveseb | 0:ce4be867658b | 97 | |
sigveseb | 0:ce4be867658b | 98 | /************************************************************************* |
sigveseb | 0:ce4be867658b | 99 | * _LZ_StringCompare() - Return maximum length string match. |
sigveseb | 0:ce4be867658b | 100 | *************************************************************************/ |
sigveseb | 0:ce4be867658b | 101 | |
sigveseb | 0:ce4be867658b | 102 | static unsigned int _LZ_StringCompare( unsigned char * str1, |
sigveseb | 0:ce4be867658b | 103 | unsigned char * str2, unsigned int minlen, unsigned int maxlen ) |
sigveseb | 0:ce4be867658b | 104 | { |
sigveseb | 0:ce4be867658b | 105 | unsigned int len; |
sigveseb | 0:ce4be867658b | 106 | |
sigveseb | 0:ce4be867658b | 107 | for( len = minlen; (len < maxlen) && (str1[len] == str2[len]); ++ len ); |
sigveseb | 0:ce4be867658b | 108 | |
sigveseb | 0:ce4be867658b | 109 | return len; |
sigveseb | 0:ce4be867658b | 110 | } |
sigveseb | 0:ce4be867658b | 111 | |
sigveseb | 0:ce4be867658b | 112 | |
sigveseb | 0:ce4be867658b | 113 | /************************************************************************* |
sigveseb | 0:ce4be867658b | 114 | * _LZ_WriteVarSize() - Write unsigned integer with variable number of |
sigveseb | 0:ce4be867658b | 115 | * bytes depending on value. |
sigveseb | 0:ce4be867658b | 116 | *************************************************************************/ |
sigveseb | 0:ce4be867658b | 117 | |
sigveseb | 0:ce4be867658b | 118 | static int _LZ_WriteVarSize( unsigned int x, unsigned char * buf ) |
sigveseb | 0:ce4be867658b | 119 | { |
sigveseb | 0:ce4be867658b | 120 | unsigned int y; |
sigveseb | 0:ce4be867658b | 121 | int num_bytes, i, b; |
sigveseb | 0:ce4be867658b | 122 | |
sigveseb | 0:ce4be867658b | 123 | /* Determine number of bytes needed to store the number x */ |
sigveseb | 0:ce4be867658b | 124 | y = x >> 3; |
sigveseb | 0:ce4be867658b | 125 | for( num_bytes = 5; num_bytes >= 2; -- num_bytes ) |
sigveseb | 0:ce4be867658b | 126 | { |
sigveseb | 0:ce4be867658b | 127 | if( y & 0xfe000000 ) break; |
sigveseb | 0:ce4be867658b | 128 | y <<= 7; |
sigveseb | 0:ce4be867658b | 129 | } |
sigveseb | 0:ce4be867658b | 130 | |
sigveseb | 0:ce4be867658b | 131 | /* Write all bytes, seven bits in each, with 8:th bit set for all */ |
sigveseb | 0:ce4be867658b | 132 | /* but the last byte. */ |
sigveseb | 0:ce4be867658b | 133 | for( i = num_bytes-1; i >= 0; -- i ) |
sigveseb | 0:ce4be867658b | 134 | { |
sigveseb | 0:ce4be867658b | 135 | b = (x >> (i*7)) & 0x0000007f; |
sigveseb | 0:ce4be867658b | 136 | if( i > 0 ) |
sigveseb | 0:ce4be867658b | 137 | { |
sigveseb | 0:ce4be867658b | 138 | b |= 0x00000080; |
sigveseb | 0:ce4be867658b | 139 | } |
sigveseb | 0:ce4be867658b | 140 | *buf ++ = (unsigned char) b; |
sigveseb | 0:ce4be867658b | 141 | } |
sigveseb | 0:ce4be867658b | 142 | |
sigveseb | 0:ce4be867658b | 143 | /* Return number of bytes written */ |
sigveseb | 0:ce4be867658b | 144 | return num_bytes; |
sigveseb | 0:ce4be867658b | 145 | } |
sigveseb | 0:ce4be867658b | 146 | |
sigveseb | 0:ce4be867658b | 147 | |
sigveseb | 0:ce4be867658b | 148 | /************************************************************************* |
sigveseb | 0:ce4be867658b | 149 | * _LZ_ReadVarSize() - Read unsigned integer with variable number of |
sigveseb | 0:ce4be867658b | 150 | * bytes depending on value. |
sigveseb | 0:ce4be867658b | 151 | *************************************************************************/ |
sigveseb | 0:ce4be867658b | 152 | |
sigveseb | 0:ce4be867658b | 153 | static int _LZ_ReadVarSize( unsigned int * x, unsigned char * buf ) |
sigveseb | 0:ce4be867658b | 154 | { |
sigveseb | 0:ce4be867658b | 155 | unsigned int y, b, num_bytes; |
sigveseb | 0:ce4be867658b | 156 | |
sigveseb | 0:ce4be867658b | 157 | /* Read complete value (stop when byte contains zero in 8:th bit) */ |
sigveseb | 0:ce4be867658b | 158 | y = 0; |
sigveseb | 0:ce4be867658b | 159 | num_bytes = 0; |
sigveseb | 0:ce4be867658b | 160 | do |
sigveseb | 0:ce4be867658b | 161 | { |
sigveseb | 0:ce4be867658b | 162 | b = (unsigned int) (*buf ++); |
sigveseb | 0:ce4be867658b | 163 | y = (y << 7) | (b & 0x0000007f); |
sigveseb | 0:ce4be867658b | 164 | ++ num_bytes; |
sigveseb | 0:ce4be867658b | 165 | } |
sigveseb | 0:ce4be867658b | 166 | while( b & 0x00000080 ); |
sigveseb | 0:ce4be867658b | 167 | |
sigveseb | 0:ce4be867658b | 168 | /* Store value in x */ |
sigveseb | 0:ce4be867658b | 169 | *x = y; |
sigveseb | 0:ce4be867658b | 170 | |
sigveseb | 0:ce4be867658b | 171 | /* Return number of bytes read */ |
sigveseb | 0:ce4be867658b | 172 | return num_bytes; |
sigveseb | 0:ce4be867658b | 173 | } |
sigveseb | 0:ce4be867658b | 174 | |
sigveseb | 0:ce4be867658b | 175 | |
sigveseb | 0:ce4be867658b | 176 | |
sigveseb | 0:ce4be867658b | 177 | /************************************************************************* |
sigveseb | 0:ce4be867658b | 178 | * PUBLIC FUNCTIONS * |
sigveseb | 0:ce4be867658b | 179 | *************************************************************************/ |
sigveseb | 0:ce4be867658b | 180 | |
sigveseb | 0:ce4be867658b | 181 | |
sigveseb | 0:ce4be867658b | 182 | /************************************************************************* |
sigveseb | 0:ce4be867658b | 183 | * LZ_Compress() - Compress a block of data using an LZ77 coder. |
sigveseb | 0:ce4be867658b | 184 | * in - Input (uncompressed) buffer. |
sigveseb | 0:ce4be867658b | 185 | * out - Output (compressed) buffer. This buffer must be 0.4% larger |
sigveseb | 0:ce4be867658b | 186 | * than the input buffer, plus one byte. |
sigveseb | 0:ce4be867658b | 187 | * insize - Number of input bytes. |
sigveseb | 0:ce4be867658b | 188 | * The function returns the size of the compressed data. |
sigveseb | 0:ce4be867658b | 189 | *************************************************************************/ |
sigveseb | 0:ce4be867658b | 190 | |
sigveseb | 0:ce4be867658b | 191 | int LZ_Compress( unsigned char *in, unsigned char *out, |
sigveseb | 0:ce4be867658b | 192 | unsigned int insize ) |
sigveseb | 0:ce4be867658b | 193 | { |
sigveseb | 0:ce4be867658b | 194 | unsigned char marker, symbol; |
sigveseb | 0:ce4be867658b | 195 | unsigned int inpos, outpos, bytesleft, i; |
sigveseb | 0:ce4be867658b | 196 | unsigned int maxoffset, offset, bestoffset; |
sigveseb | 0:ce4be867658b | 197 | unsigned int maxlength, length, bestlength; |
sigveseb | 0:ce4be867658b | 198 | unsigned int histogram[ 256 ]; |
sigveseb | 0:ce4be867658b | 199 | unsigned char *ptr1, *ptr2; |
sigveseb | 0:ce4be867658b | 200 | |
sigveseb | 0:ce4be867658b | 201 | /* Do we have anything to compress? */ |
sigveseb | 0:ce4be867658b | 202 | if( insize < 1 ) |
sigveseb | 0:ce4be867658b | 203 | { |
sigveseb | 0:ce4be867658b | 204 | return 0; |
sigveseb | 0:ce4be867658b | 205 | } |
sigveseb | 0:ce4be867658b | 206 | |
sigveseb | 0:ce4be867658b | 207 | /* Create histogram */ |
sigveseb | 0:ce4be867658b | 208 | for( i = 0; i < 256; ++ i ) |
sigveseb | 0:ce4be867658b | 209 | { |
sigveseb | 0:ce4be867658b | 210 | histogram[ i ] = 0; |
sigveseb | 0:ce4be867658b | 211 | } |
sigveseb | 0:ce4be867658b | 212 | for( i = 0; i < insize; ++ i ) |
sigveseb | 0:ce4be867658b | 213 | { |
sigveseb | 0:ce4be867658b | 214 | ++ histogram[ in[ i ] ]; |
sigveseb | 0:ce4be867658b | 215 | } |
sigveseb | 0:ce4be867658b | 216 | |
sigveseb | 0:ce4be867658b | 217 | /* Find the least common byte, and use it as the marker symbol */ |
sigveseb | 0:ce4be867658b | 218 | marker = 0; |
sigveseb | 0:ce4be867658b | 219 | for( i = 1; i < 256; ++ i ) |
sigveseb | 0:ce4be867658b | 220 | { |
sigveseb | 0:ce4be867658b | 221 | if( histogram[ i ] < histogram[ marker ] ) |
sigveseb | 0:ce4be867658b | 222 | { |
sigveseb | 0:ce4be867658b | 223 | marker = i; |
sigveseb | 0:ce4be867658b | 224 | } |
sigveseb | 0:ce4be867658b | 225 | } |
sigveseb | 0:ce4be867658b | 226 | |
sigveseb | 0:ce4be867658b | 227 | /* Remember the marker symbol for the decoder */ |
sigveseb | 0:ce4be867658b | 228 | out[ 0 ] = marker; |
sigveseb | 0:ce4be867658b | 229 | |
sigveseb | 0:ce4be867658b | 230 | /* Start of compression */ |
sigveseb | 0:ce4be867658b | 231 | inpos = 0; |
sigveseb | 0:ce4be867658b | 232 | outpos = 1; |
sigveseb | 0:ce4be867658b | 233 | |
sigveseb | 0:ce4be867658b | 234 | /* Main compression loop */ |
sigveseb | 0:ce4be867658b | 235 | bytesleft = insize; |
sigveseb | 0:ce4be867658b | 236 | do |
sigveseb | 0:ce4be867658b | 237 | { |
sigveseb | 0:ce4be867658b | 238 | /* Determine most distant position */ |
sigveseb | 0:ce4be867658b | 239 | if( inpos > LZ_MAX_OFFSET ) maxoffset = LZ_MAX_OFFSET; |
sigveseb | 0:ce4be867658b | 240 | else maxoffset = inpos; |
sigveseb | 0:ce4be867658b | 241 | |
sigveseb | 0:ce4be867658b | 242 | /* Get pointer to current position */ |
sigveseb | 0:ce4be867658b | 243 | ptr1 = &in[ inpos ]; |
sigveseb | 0:ce4be867658b | 244 | |
sigveseb | 0:ce4be867658b | 245 | /* Search history window for maximum length string match */ |
sigveseb | 0:ce4be867658b | 246 | bestlength = 3; |
sigveseb | 0:ce4be867658b | 247 | bestoffset = 0; |
sigveseb | 0:ce4be867658b | 248 | for( offset = 3; offset <= maxoffset; ++ offset ) |
sigveseb | 0:ce4be867658b | 249 | { |
sigveseb | 0:ce4be867658b | 250 | /* Get pointer to candidate string */ |
sigveseb | 0:ce4be867658b | 251 | ptr2 = &ptr1[ -(int)offset ]; |
sigveseb | 0:ce4be867658b | 252 | |
sigveseb | 0:ce4be867658b | 253 | /* Quickly determine if this is a candidate (for speed) */ |
sigveseb | 0:ce4be867658b | 254 | if( (ptr1[ 0 ] == ptr2[ 0 ]) && |
sigveseb | 0:ce4be867658b | 255 | (ptr1[ bestlength ] == ptr2[ bestlength ]) ) |
sigveseb | 0:ce4be867658b | 256 | { |
sigveseb | 0:ce4be867658b | 257 | /* Determine maximum length for this offset */ |
sigveseb | 0:ce4be867658b | 258 | maxlength = (bytesleft < offset ? bytesleft : offset); |
sigveseb | 0:ce4be867658b | 259 | |
sigveseb | 0:ce4be867658b | 260 | /* Count maximum length match at this offset */ |
sigveseb | 0:ce4be867658b | 261 | length = _LZ_StringCompare( ptr1, ptr2, 0, maxlength ); |
sigveseb | 0:ce4be867658b | 262 | |
sigveseb | 0:ce4be867658b | 263 | /* Better match than any previous match? */ |
sigveseb | 0:ce4be867658b | 264 | if( length > bestlength ) |
sigveseb | 0:ce4be867658b | 265 | { |
sigveseb | 0:ce4be867658b | 266 | bestlength = length; |
sigveseb | 0:ce4be867658b | 267 | bestoffset = offset; |
sigveseb | 0:ce4be867658b | 268 | } |
sigveseb | 0:ce4be867658b | 269 | } |
sigveseb | 0:ce4be867658b | 270 | } |
sigveseb | 0:ce4be867658b | 271 | |
sigveseb | 0:ce4be867658b | 272 | /* Was there a good enough match? */ |
sigveseb | 0:ce4be867658b | 273 | if( (bestlength >= 8) || |
sigveseb | 0:ce4be867658b | 274 | ((bestlength == 4) && (bestoffset <= 0x0000007f)) || |
sigveseb | 0:ce4be867658b | 275 | ((bestlength == 5) && (bestoffset <= 0x00003fff)) || |
sigveseb | 0:ce4be867658b | 276 | ((bestlength == 6) && (bestoffset <= 0x001fffff)) || |
sigveseb | 0:ce4be867658b | 277 | ((bestlength == 7) && (bestoffset <= 0x0fffffff)) ) |
sigveseb | 0:ce4be867658b | 278 | { |
sigveseb | 0:ce4be867658b | 279 | out[ outpos ++ ] = (unsigned char) marker; |
sigveseb | 0:ce4be867658b | 280 | outpos += _LZ_WriteVarSize( bestlength, &out[ outpos ] ); |
sigveseb | 0:ce4be867658b | 281 | outpos += _LZ_WriteVarSize( bestoffset, &out[ outpos ] ); |
sigveseb | 0:ce4be867658b | 282 | inpos += bestlength; |
sigveseb | 0:ce4be867658b | 283 | bytesleft -= bestlength; |
sigveseb | 0:ce4be867658b | 284 | } |
sigveseb | 0:ce4be867658b | 285 | else |
sigveseb | 0:ce4be867658b | 286 | { |
sigveseb | 0:ce4be867658b | 287 | /* Output single byte (or two bytes if marker byte) */ |
sigveseb | 0:ce4be867658b | 288 | symbol = in[ inpos ++ ]; |
sigveseb | 0:ce4be867658b | 289 | out[ outpos ++ ] = symbol; |
sigveseb | 0:ce4be867658b | 290 | if( symbol == marker ) |
sigveseb | 0:ce4be867658b | 291 | { |
sigveseb | 0:ce4be867658b | 292 | out[ outpos ++ ] = 0; |
sigveseb | 0:ce4be867658b | 293 | } |
sigveseb | 0:ce4be867658b | 294 | -- bytesleft; |
sigveseb | 0:ce4be867658b | 295 | } |
sigveseb | 0:ce4be867658b | 296 | } |
sigveseb | 0:ce4be867658b | 297 | while( bytesleft > 3 ); |
sigveseb | 0:ce4be867658b | 298 | |
sigveseb | 0:ce4be867658b | 299 | /* Dump remaining bytes, if any */ |
sigveseb | 0:ce4be867658b | 300 | while( inpos < insize ) |
sigveseb | 0:ce4be867658b | 301 | { |
sigveseb | 0:ce4be867658b | 302 | if( in[ inpos ] == marker ) |
sigveseb | 0:ce4be867658b | 303 | { |
sigveseb | 0:ce4be867658b | 304 | out[ outpos ++ ] = marker; |
sigveseb | 0:ce4be867658b | 305 | out[ outpos ++ ] = 0; |
sigveseb | 0:ce4be867658b | 306 | } |
sigveseb | 0:ce4be867658b | 307 | else |
sigveseb | 0:ce4be867658b | 308 | { |
sigveseb | 0:ce4be867658b | 309 | out[ outpos ++ ] = in[ inpos ]; |
sigveseb | 0:ce4be867658b | 310 | } |
sigveseb | 0:ce4be867658b | 311 | ++ inpos; |
sigveseb | 0:ce4be867658b | 312 | } |
sigveseb | 0:ce4be867658b | 313 | |
sigveseb | 0:ce4be867658b | 314 | return outpos; |
sigveseb | 0:ce4be867658b | 315 | } |
sigveseb | 0:ce4be867658b | 316 | |
sigveseb | 0:ce4be867658b | 317 | |
sigveseb | 0:ce4be867658b | 318 | /************************************************************************* |
sigveseb | 0:ce4be867658b | 319 | * LZ_CompressFast() - Compress a block of data using an LZ77 coder. |
sigveseb | 0:ce4be867658b | 320 | * in - Input (uncompressed) buffer. |
sigveseb | 0:ce4be867658b | 321 | * out - Output (compressed) buffer. This buffer must be 0.4% larger |
sigveseb | 0:ce4be867658b | 322 | * than the input buffer, plus one byte. |
sigveseb | 0:ce4be867658b | 323 | * insize - Number of input bytes. |
sigveseb | 0:ce4be867658b | 324 | * work - Pointer to a temporary buffer (internal working buffer), which |
sigveseb | 0:ce4be867658b | 325 | * must be able to hold (insize+65536) unsigned integers. |
sigveseb | 0:ce4be867658b | 326 | * The function returns the size of the compressed data. |
sigveseb | 0:ce4be867658b | 327 | *************************************************************************/ |
sigveseb | 0:ce4be867658b | 328 | |
sigveseb | 0:ce4be867658b | 329 | int LZ_CompressFast( unsigned char *in, unsigned char *out, |
sigveseb | 0:ce4be867658b | 330 | unsigned int insize, unsigned int *work ) |
sigveseb | 0:ce4be867658b | 331 | { |
sigveseb | 0:ce4be867658b | 332 | unsigned char marker, symbol; |
sigveseb | 0:ce4be867658b | 333 | unsigned int inpos, outpos, bytesleft, i, index, symbols; |
sigveseb | 0:ce4be867658b | 334 | unsigned int offset, bestoffset; |
sigveseb | 0:ce4be867658b | 335 | unsigned int maxlength, length, bestlength; |
sigveseb | 0:ce4be867658b | 336 | unsigned int histogram[ 256 ], *lastindex, *jumptable; |
sigveseb | 0:ce4be867658b | 337 | unsigned char *ptr1, *ptr2; |
sigveseb | 0:ce4be867658b | 338 | |
sigveseb | 0:ce4be867658b | 339 | /* Do we have anything to compress? */ |
sigveseb | 0:ce4be867658b | 340 | if( insize < 1 ) |
sigveseb | 0:ce4be867658b | 341 | { |
sigveseb | 0:ce4be867658b | 342 | return 0; |
sigveseb | 0:ce4be867658b | 343 | } |
sigveseb | 0:ce4be867658b | 344 | |
sigveseb | 0:ce4be867658b | 345 | /* Assign arrays to the working area */ |
sigveseb | 0:ce4be867658b | 346 | lastindex = work; |
sigveseb | 0:ce4be867658b | 347 | jumptable = &work[ 65536 ]; |
sigveseb | 0:ce4be867658b | 348 | |
sigveseb | 0:ce4be867658b | 349 | /* Build a "jump table". Here is how the jump table works: |
sigveseb | 0:ce4be867658b | 350 | jumptable[i] points to the nearest previous occurrence of the same |
sigveseb | 0:ce4be867658b | 351 | symbol pair as in[i]:in[i+1], so in[i] == in[jumptable[i]] and |
sigveseb | 0:ce4be867658b | 352 | in[i+1] == in[jumptable[i]+1], and so on... Following the jump table |
sigveseb | 0:ce4be867658b | 353 | gives a dramatic boost for the string search'n'match loop compared |
sigveseb | 0:ce4be867658b | 354 | to doing a brute force search. The jump table is built in O(n) time, |
sigveseb | 0:ce4be867658b | 355 | so it is a cheap operation in terms of time, but it is expensice in |
sigveseb | 0:ce4be867658b | 356 | terms of memory consumption. */ |
sigveseb | 0:ce4be867658b | 357 | for( i = 0; i < 65536; ++ i ) |
sigveseb | 0:ce4be867658b | 358 | { |
sigveseb | 0:ce4be867658b | 359 | lastindex[ i ] = 0xffffffff; |
sigveseb | 0:ce4be867658b | 360 | } |
sigveseb | 0:ce4be867658b | 361 | for( i = 0; i < insize-1; ++ i ) |
sigveseb | 0:ce4be867658b | 362 | { |
sigveseb | 0:ce4be867658b | 363 | symbols = (((unsigned int)in[i]) << 8) | ((unsigned int)in[i+1]); |
sigveseb | 0:ce4be867658b | 364 | index = lastindex[ symbols ]; |
sigveseb | 0:ce4be867658b | 365 | lastindex[ symbols ] = i; |
sigveseb | 0:ce4be867658b | 366 | jumptable[ i ] = index; |
sigveseb | 0:ce4be867658b | 367 | } |
sigveseb | 0:ce4be867658b | 368 | jumptable[ insize-1 ] = 0xffffffff; |
sigveseb | 0:ce4be867658b | 369 | |
sigveseb | 0:ce4be867658b | 370 | /* Create histogram */ |
sigveseb | 0:ce4be867658b | 371 | for( i = 0; i < 256; ++ i ) |
sigveseb | 0:ce4be867658b | 372 | { |
sigveseb | 0:ce4be867658b | 373 | histogram[ i ] = 0; |
sigveseb | 0:ce4be867658b | 374 | } |
sigveseb | 0:ce4be867658b | 375 | for( i = 0; i < insize; ++ i ) |
sigveseb | 0:ce4be867658b | 376 | { |
sigveseb | 0:ce4be867658b | 377 | ++ histogram[ in[ i ] ]; |
sigveseb | 0:ce4be867658b | 378 | } |
sigveseb | 0:ce4be867658b | 379 | |
sigveseb | 0:ce4be867658b | 380 | /* Find the least common byte, and use it as the marker symbol */ |
sigveseb | 0:ce4be867658b | 381 | marker = 0; |
sigveseb | 0:ce4be867658b | 382 | for( i = 1; i < 256; ++ i ) |
sigveseb | 0:ce4be867658b | 383 | { |
sigveseb | 0:ce4be867658b | 384 | if( histogram[ i ] < histogram[ marker ] ) |
sigveseb | 0:ce4be867658b | 385 | { |
sigveseb | 0:ce4be867658b | 386 | marker = i; |
sigveseb | 0:ce4be867658b | 387 | } |
sigveseb | 0:ce4be867658b | 388 | } |
sigveseb | 0:ce4be867658b | 389 | |
sigveseb | 0:ce4be867658b | 390 | /* Remember the marker symbol for the decoder */ |
sigveseb | 0:ce4be867658b | 391 | out[ 0 ] = marker; |
sigveseb | 0:ce4be867658b | 392 | |
sigveseb | 0:ce4be867658b | 393 | /* Start of compression */ |
sigveseb | 0:ce4be867658b | 394 | inpos = 0; |
sigveseb | 0:ce4be867658b | 395 | outpos = 1; |
sigveseb | 0:ce4be867658b | 396 | |
sigveseb | 0:ce4be867658b | 397 | /* Main compression loop */ |
sigveseb | 0:ce4be867658b | 398 | bytesleft = insize; |
sigveseb | 0:ce4be867658b | 399 | do |
sigveseb | 0:ce4be867658b | 400 | { |
sigveseb | 0:ce4be867658b | 401 | /* Get pointer to current position */ |
sigveseb | 0:ce4be867658b | 402 | ptr1 = &in[ inpos ]; |
sigveseb | 0:ce4be867658b | 403 | |
sigveseb | 0:ce4be867658b | 404 | /* Search history window for maximum length string match */ |
sigveseb | 0:ce4be867658b | 405 | bestlength = 3; |
sigveseb | 0:ce4be867658b | 406 | bestoffset = 0; |
sigveseb | 0:ce4be867658b | 407 | index = jumptable[ inpos ]; |
sigveseb | 0:ce4be867658b | 408 | while( (index != 0xffffffff) && ((inpos - index) < LZ_MAX_OFFSET) ) |
sigveseb | 0:ce4be867658b | 409 | { |
sigveseb | 0:ce4be867658b | 410 | /* Get pointer to candidate string */ |
sigveseb | 0:ce4be867658b | 411 | ptr2 = &in[ index ]; |
sigveseb | 0:ce4be867658b | 412 | |
sigveseb | 0:ce4be867658b | 413 | /* Quickly determine if this is a candidate (for speed) */ |
sigveseb | 0:ce4be867658b | 414 | if( ptr2[ bestlength ] == ptr1[ bestlength ] ) |
sigveseb | 0:ce4be867658b | 415 | { |
sigveseb | 0:ce4be867658b | 416 | /* Determine maximum length for this offset */ |
sigveseb | 0:ce4be867658b | 417 | offset = inpos - index; |
sigveseb | 0:ce4be867658b | 418 | maxlength = (bytesleft < offset ? bytesleft : offset); |
sigveseb | 0:ce4be867658b | 419 | |
sigveseb | 0:ce4be867658b | 420 | /* Count maximum length match at this offset */ |
sigveseb | 0:ce4be867658b | 421 | length = _LZ_StringCompare( ptr1, ptr2, 2, maxlength ); |
sigveseb | 0:ce4be867658b | 422 | |
sigveseb | 0:ce4be867658b | 423 | /* Better match than any previous match? */ |
sigveseb | 0:ce4be867658b | 424 | if( length > bestlength ) |
sigveseb | 0:ce4be867658b | 425 | { |
sigveseb | 0:ce4be867658b | 426 | bestlength = length; |
sigveseb | 0:ce4be867658b | 427 | bestoffset = offset; |
sigveseb | 0:ce4be867658b | 428 | } |
sigveseb | 0:ce4be867658b | 429 | } |
sigveseb | 0:ce4be867658b | 430 | |
sigveseb | 0:ce4be867658b | 431 | /* Get next possible index from jump table */ |
sigveseb | 0:ce4be867658b | 432 | index = jumptable[ index ]; |
sigveseb | 0:ce4be867658b | 433 | } |
sigveseb | 0:ce4be867658b | 434 | |
sigveseb | 0:ce4be867658b | 435 | /* Was there a good enough match? */ |
sigveseb | 0:ce4be867658b | 436 | if( (bestlength >= 8) || |
sigveseb | 0:ce4be867658b | 437 | ((bestlength == 4) && (bestoffset <= 0x0000007f)) || |
sigveseb | 0:ce4be867658b | 438 | ((bestlength == 5) && (bestoffset <= 0x00003fff)) || |
sigveseb | 0:ce4be867658b | 439 | ((bestlength == 6) && (bestoffset <= 0x001fffff)) || |
sigveseb | 0:ce4be867658b | 440 | ((bestlength == 7) && (bestoffset <= 0x0fffffff)) ) |
sigveseb | 0:ce4be867658b | 441 | { |
sigveseb | 0:ce4be867658b | 442 | out[ outpos ++ ] = (unsigned char) marker; |
sigveseb | 0:ce4be867658b | 443 | outpos += _LZ_WriteVarSize( bestlength, &out[ outpos ] ); |
sigveseb | 0:ce4be867658b | 444 | outpos += _LZ_WriteVarSize( bestoffset, &out[ outpos ] ); |
sigveseb | 0:ce4be867658b | 445 | inpos += bestlength; |
sigveseb | 0:ce4be867658b | 446 | bytesleft -= bestlength; |
sigveseb | 0:ce4be867658b | 447 | } |
sigveseb | 0:ce4be867658b | 448 | else |
sigveseb | 0:ce4be867658b | 449 | { |
sigveseb | 0:ce4be867658b | 450 | /* Output single byte (or two bytes if marker byte) */ |
sigveseb | 0:ce4be867658b | 451 | symbol = in[ inpos ++ ]; |
sigveseb | 0:ce4be867658b | 452 | out[ outpos ++ ] = symbol; |
sigveseb | 0:ce4be867658b | 453 | if( symbol == marker ) |
sigveseb | 0:ce4be867658b | 454 | { |
sigveseb | 0:ce4be867658b | 455 | out[ outpos ++ ] = 0; |
sigveseb | 0:ce4be867658b | 456 | } |
sigveseb | 0:ce4be867658b | 457 | -- bytesleft; |
sigveseb | 0:ce4be867658b | 458 | } |
sigveseb | 0:ce4be867658b | 459 | } |
sigveseb | 0:ce4be867658b | 460 | while( bytesleft > 3 ); |
sigveseb | 0:ce4be867658b | 461 | |
sigveseb | 0:ce4be867658b | 462 | /* Dump remaining bytes, if any */ |
sigveseb | 0:ce4be867658b | 463 | while( inpos < insize ) |
sigveseb | 0:ce4be867658b | 464 | { |
sigveseb | 0:ce4be867658b | 465 | if( in[ inpos ] == marker ) |
sigveseb | 0:ce4be867658b | 466 | { |
sigveseb | 0:ce4be867658b | 467 | out[ outpos ++ ] = marker; |
sigveseb | 0:ce4be867658b | 468 | out[ outpos ++ ] = 0; |
sigveseb | 0:ce4be867658b | 469 | } |
sigveseb | 0:ce4be867658b | 470 | else |
sigveseb | 0:ce4be867658b | 471 | { |
sigveseb | 0:ce4be867658b | 472 | out[ outpos ++ ] = in[ inpos ]; |
sigveseb | 0:ce4be867658b | 473 | } |
sigveseb | 0:ce4be867658b | 474 | ++ inpos; |
sigveseb | 0:ce4be867658b | 475 | } |
sigveseb | 0:ce4be867658b | 476 | |
sigveseb | 0:ce4be867658b | 477 | return outpos; |
sigveseb | 0:ce4be867658b | 478 | } |
sigveseb | 0:ce4be867658b | 479 | |
sigveseb | 0:ce4be867658b | 480 | |
sigveseb | 0:ce4be867658b | 481 | /************************************************************************* |
sigveseb | 0:ce4be867658b | 482 | * LZ_Uncompress() - Uncompress a block of data using an LZ77 decoder. |
sigveseb | 0:ce4be867658b | 483 | * in - Input (compressed) buffer. |
sigveseb | 0:ce4be867658b | 484 | * out - Output (uncompressed) buffer. This buffer must be large |
sigveseb | 0:ce4be867658b | 485 | * enough to hold the uncompressed data. |
sigveseb | 0:ce4be867658b | 486 | * insize - Number of input bytes. |
sigveseb | 0:ce4be867658b | 487 | *************************************************************************/ |
sigveseb | 0:ce4be867658b | 488 | |
sigveseb | 0:ce4be867658b | 489 | void LZ_Uncompress( unsigned char *in, unsigned char *out, |
sigveseb | 0:ce4be867658b | 490 | unsigned int insize ) |
sigveseb | 0:ce4be867658b | 491 | { |
sigveseb | 0:ce4be867658b | 492 | unsigned char marker, symbol; |
sigveseb | 0:ce4be867658b | 493 | unsigned int i, inpos, outpos, length, offset; |
sigveseb | 0:ce4be867658b | 494 | |
sigveseb | 0:ce4be867658b | 495 | /* Do we have anything to uncompress? */ |
sigveseb | 0:ce4be867658b | 496 | if( insize < 1 ) |
sigveseb | 0:ce4be867658b | 497 | { |
sigveseb | 0:ce4be867658b | 498 | return; |
sigveseb | 0:ce4be867658b | 499 | } |
sigveseb | 0:ce4be867658b | 500 | |
sigveseb | 0:ce4be867658b | 501 | /* Get marker symbol from input stream */ |
sigveseb | 0:ce4be867658b | 502 | marker = in[ 0 ]; |
sigveseb | 0:ce4be867658b | 503 | inpos = 1; |
sigveseb | 0:ce4be867658b | 504 | |
sigveseb | 0:ce4be867658b | 505 | /* Main decompression loop */ |
sigveseb | 0:ce4be867658b | 506 | outpos = 0; |
sigveseb | 0:ce4be867658b | 507 | do |
sigveseb | 0:ce4be867658b | 508 | { |
sigveseb | 0:ce4be867658b | 509 | symbol = in[ inpos ++ ]; |
sigveseb | 0:ce4be867658b | 510 | if( symbol == marker ) |
sigveseb | 0:ce4be867658b | 511 | { |
sigveseb | 0:ce4be867658b | 512 | /* We had a marker byte */ |
sigveseb | 0:ce4be867658b | 513 | if( in[ inpos ] == 0 ) |
sigveseb | 0:ce4be867658b | 514 | { |
sigveseb | 0:ce4be867658b | 515 | /* It was a single occurrence of the marker byte */ |
sigveseb | 0:ce4be867658b | 516 | out[ outpos ++ ] = marker; |
sigveseb | 0:ce4be867658b | 517 | ++ inpos; |
sigveseb | 0:ce4be867658b | 518 | } |
sigveseb | 0:ce4be867658b | 519 | else |
sigveseb | 0:ce4be867658b | 520 | { |
sigveseb | 0:ce4be867658b | 521 | /* Extract true length and offset */ |
sigveseb | 0:ce4be867658b | 522 | inpos += _LZ_ReadVarSize( &length, &in[ inpos ] ); |
sigveseb | 0:ce4be867658b | 523 | inpos += _LZ_ReadVarSize( &offset, &in[ inpos ] ); |
sigveseb | 0:ce4be867658b | 524 | |
sigveseb | 0:ce4be867658b | 525 | /* Copy corresponding data from history window */ |
sigveseb | 0:ce4be867658b | 526 | for( i = 0; i < length; ++ i ) |
sigveseb | 0:ce4be867658b | 527 | { |
sigveseb | 0:ce4be867658b | 528 | out[ outpos ] = out[ outpos - offset ]; |
sigveseb | 0:ce4be867658b | 529 | ++ outpos; |
sigveseb | 0:ce4be867658b | 530 | } |
sigveseb | 0:ce4be867658b | 531 | } |
sigveseb | 0:ce4be867658b | 532 | } |
sigveseb | 0:ce4be867658b | 533 | else |
sigveseb | 0:ce4be867658b | 534 | { |
sigveseb | 0:ce4be867658b | 535 | /* No marker, plain copy */ |
sigveseb | 0:ce4be867658b | 536 | out[ outpos ++ ] = symbol; |
sigveseb | 0:ce4be867658b | 537 | } |
sigveseb | 0:ce4be867658b | 538 | } |
sigveseb | 0:ce4be867658b | 539 | while( inpos < insize ); |
sigveseb | 0:ce4be867658b | 540 | } |