Dependencies:   emwin_lib

Fork of DMemWin by Embedded Artists

Committer: destinyXfate
Date: Thu Jun 02 04:52:54 2016 +0000
Revision: 2:0e2ef1edf01b

Who changed what in which revision?

User | Revision | Line number | New contents of line
destinyXfate 2:0e2ef1edf01b 1
destinyXfate 2:0e2ef1edf01b 2 /* pnggccrd.c - mixed C/assembler version of utilities to read a PNG file
destinyXfate 2:0e2ef1edf01b 3 *
destinyXfate 2:0e2ef1edf01b 4 * For Intel x86 CPU (Pentium-MMX or later) and GNU C compiler.
destinyXfate 2:0e2ef1edf01b 5 *
destinyXfate 2:0e2ef1edf01b 6 * See http://www.intel.com/drg/pentiumII/appnotes/916/916.htm
destinyXfate 2:0e2ef1edf01b 7 * and http://www.intel.com/drg/pentiumII/appnotes/923/923.htm
destinyXfate 2:0e2ef1edf01b 8 * for Intel's performance analysis of the MMX vs. non-MMX code.
destinyXfate 2:0e2ef1edf01b 9 *
destinyXfate 2:0e2ef1edf01b 10 * Last changed in libpng 1.2.15 January 5, 2007
destinyXfate 2:0e2ef1edf01b 11 * For conditions of distribution and use, see copyright notice in png.h
destinyXfate 2:0e2ef1edf01b 12 * Copyright (c) 1998-2007 Glenn Randers-Pehrson
destinyXfate 2:0e2ef1edf01b 13 * Copyright (c) 1998, Intel Corporation
destinyXfate 2:0e2ef1edf01b 14 *
destinyXfate 2:0e2ef1edf01b 15 * Based on MSVC code contributed by Nirav Chhatrapati, Intel Corp., 1998.
destinyXfate 2:0e2ef1edf01b 16 * Interface to libpng contributed by Gilles Vollant, 1999.
destinyXfate 2:0e2ef1edf01b 17 * GNU C port by Greg Roelofs, 1999-2001.
destinyXfate 2:0e2ef1edf01b 18 *
destinyXfate 2:0e2ef1edf01b 19 * Lines 2350-4300 converted in place with intel2gas 1.3.1:
destinyXfate 2:0e2ef1edf01b 20 *
destinyXfate 2:0e2ef1edf01b 21 * intel2gas -mdI pnggccrd.c.partially-msvc -o pnggccrd.c
destinyXfate 2:0e2ef1edf01b 22 *
destinyXfate 2:0e2ef1edf01b 23 * and then cleaned up by hand. See http://hermes.terminal.at/intel2gas/ .
destinyXfate 2:0e2ef1edf01b 24 *
destinyXfate 2:0e2ef1edf01b 25 * NOTE: A sufficiently recent version of GNU as (or as.exe under DOS/Windows)
destinyXfate 2:0e2ef1edf01b 26 * is required to assemble the newer MMX instructions such as movq.
destinyXfate 2:0e2ef1edf01b 27 * For djgpp, see
destinyXfate 2:0e2ef1edf01b 28 *
destinyXfate 2:0e2ef1edf01b 29 * ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/bnu281b.zip
destinyXfate 2:0e2ef1edf01b 30 *
destinyXfate 2:0e2ef1edf01b 31 * (or a later version in the same directory). For Linux, check your
destinyXfate 2:0e2ef1edf01b 32 * distribution's web site(s) or try these links:
destinyXfate 2:0e2ef1edf01b 33 *
destinyXfate 2:0e2ef1edf01b 34 * http://rufus.w3.org/linux/RPM/binutils.html
destinyXfate 2:0e2ef1edf01b 35 * http://www.debian.org/Packages/stable/devel/binutils.html
destinyXfate 2:0e2ef1edf01b 36 * ftp://ftp.slackware.com/pub/linux/slackware/slackware/slakware/d1/
destinyXfate 2:0e2ef1edf01b 37 * binutils.tgz
destinyXfate 2:0e2ef1edf01b 38 *
destinyXfate 2:0e2ef1edf01b 39 * For other platforms, see the main GNU site:
destinyXfate 2:0e2ef1edf01b 40 *
destinyXfate 2:0e2ef1edf01b 41 * ftp://ftp.gnu.org/pub/gnu/binutils/
destinyXfate 2:0e2ef1edf01b 42 *
destinyXfate 2:0e2ef1edf01b 43 * Version 2.5.2l.15 is definitely too old...
destinyXfate 2:0e2ef1edf01b 44 */
destinyXfate 2:0e2ef1edf01b 45
destinyXfate 2:0e2ef1edf01b 46 /*
destinyXfate 2:0e2ef1edf01b 47 * TEMPORARY PORTING NOTES AND CHANGELOG (mostly by Greg Roelofs)
destinyXfate 2:0e2ef1edf01b 48 * =====================================
destinyXfate 2:0e2ef1edf01b 49 *
destinyXfate 2:0e2ef1edf01b 50 * 19991006:
destinyXfate 2:0e2ef1edf01b 51 * - fixed sign error in post-MMX cleanup code (16- & 32-bit cases)
destinyXfate 2:0e2ef1edf01b 52 *
destinyXfate 2:0e2ef1edf01b 53 * 19991007:
destinyXfate 2:0e2ef1edf01b 54 * - additional optimizations (possible or definite):
destinyXfate 2:0e2ef1edf01b 55 * x [DONE] write MMX code for 64-bit case (pixel_bytes == 8) [not tested]
destinyXfate 2:0e2ef1edf01b 56 * - write MMX code for 48-bit case (pixel_bytes == 6)
destinyXfate 2:0e2ef1edf01b 57 * - figure out what's up with 24-bit case (pixel_bytes == 3):
destinyXfate 2:0e2ef1edf01b 58 * why subtract 8 from width_mmx in the pass 4/5 case?
destinyXfate 2:0e2ef1edf01b 59 * (only width_mmx case) (near line 1606)
destinyXfate 2:0e2ef1edf01b 60 * x [DONE] replace pixel_bytes within each block with the true
destinyXfate 2:0e2ef1edf01b 61 * constant value (or are compilers smart enough to do that?)
destinyXfate 2:0e2ef1edf01b 62 * - rewrite all MMX interlacing code so it's aligned with
destinyXfate 2:0e2ef1edf01b 63 * the *beginning* of the row buffer, not the end. This
destinyXfate 2:0e2ef1edf01b 64 * would not only allow one to eliminate half of the memory
destinyXfate 2:0e2ef1edf01b 65 * writes for odd passes (that is, pass == odd), it may also
destinyXfate 2:0e2ef1edf01b 66 * eliminate some unaligned-data-access exceptions (assuming
destinyXfate 2:0e2ef1edf01b 67 * there's a penalty for not aligning 64-bit accesses on
destinyXfate 2:0e2ef1edf01b 68 * 64-bit boundaries). The only catch is that the "leftover"
destinyXfate 2:0e2ef1edf01b 69 * pixel(s) at the end of the row would have to be saved,
destinyXfate 2:0e2ef1edf01b 70 * but there are enough unused MMX registers in every case,
destinyXfate 2:0e2ef1edf01b 71 * so this is not a problem. A further benefit is that the
destinyXfate 2:0e2ef1edf01b 72 * post-MMX cleanup code (C code) in at least some of the
destinyXfate 2:0e2ef1edf01b 73 * cases could be done within the assembler block.
destinyXfate 2:0e2ef1edf01b 74 * x [DONE] the "v3 v2 v1 v0 v7 v6 v5 v4" comments are confusing,
destinyXfate 2:0e2ef1edf01b 75 * inconsistent, and don't match the MMX Programmer's Reference
destinyXfate 2:0e2ef1edf01b 76 * Manual conventions anyway. They should be changed to
destinyXfate 2:0e2ef1edf01b 77 * "b7 b6 b5 b4 b3 b2 b1 b0," where b0 indicates the byte that
destinyXfate 2:0e2ef1edf01b 78 * was lowest in memory (e.g., corresponding to a left pixel)
destinyXfate 2:0e2ef1edf01b 79 * and b7 is the byte that was highest (e.g., a right pixel).
destinyXfate 2:0e2ef1edf01b 80 *
destinyXfate 2:0e2ef1edf01b 81 * 19991016:
destinyXfate 2:0e2ef1edf01b 82 * - Brennan's Guide notwithstanding, gcc under Linux does *not*
destinyXfate 2:0e2ef1edf01b 83 * want globals prefixed by underscores when referencing them--
destinyXfate 2:0e2ef1edf01b 84 * i.e., if the variable is const4, then refer to it as const4,
destinyXfate 2:0e2ef1edf01b 85 * not _const4. This seems to be a djgpp-specific requirement.
destinyXfate 2:0e2ef1edf01b 86 * Also, such variables apparently *must* be declared outside
destinyXfate 2:0e2ef1edf01b 87 * of functions; neither static nor automatic variables work if
destinyXfate 2:0e2ef1edf01b 88 * defined within the scope of a single function, but both
destinyXfate 2:0e2ef1edf01b 89 * static and truly global (multi-module) variables work fine.
destinyXfate 2:0e2ef1edf01b 90 *
destinyXfate 2:0e2ef1edf01b 91 * 19991023:
destinyXfate 2:0e2ef1edf01b 92 * - fixed png_combine_row() non-MMX replication bug (odd passes only?)
destinyXfate 2:0e2ef1edf01b 93 * - switched from string-concatenation-with-macros to cleaner method of
destinyXfate 2:0e2ef1edf01b 94 * renaming global variables for djgpp--i.e., always use prefixes in
destinyXfate 2:0e2ef1edf01b 95 * inlined assembler code (== strings) and conditionally rename the
destinyXfate 2:0e2ef1edf01b 96 * variables, not the other way around. Hence _const4, _mask8_0, etc.
destinyXfate 2:0e2ef1edf01b 97 *
destinyXfate 2:0e2ef1edf01b 98 * 19991024:
destinyXfate 2:0e2ef1edf01b 99 * - fixed mmxsupport()/png_do_read_interlace() first-row bug
destinyXfate 2:0e2ef1edf01b 100 * This one was severely weird: even though mmxsupport() doesn't touch
destinyXfate 2:0e2ef1edf01b 101 * ebx (where "row" pointer was stored), it nevertheless managed to zero
destinyXfate 2:0e2ef1edf01b 102 * the register (even in static/non-fPIC code--see below), which in turn
destinyXfate 2:0e2ef1edf01b 103 * caused png_do_read_interlace() to return prematurely on the first row of
destinyXfate 2:0e2ef1edf01b 104 * interlaced images (i.e., without expanding the interlaced pixels).
destinyXfate 2:0e2ef1edf01b 105 * Inspection of the generated assembly code didn't turn up any clues,
destinyXfate 2:0e2ef1edf01b 106 * although it did point at a minor optimization (i.e., get rid of
destinyXfate 2:0e2ef1edf01b 107 * mmx_supported_local variable and just use eax). Possibly the CPUID
destinyXfate 2:0e2ef1edf01b 108 * instruction is more destructive than it looks? (Not yet checked.)
destinyXfate 2:0e2ef1edf01b 109 * - "info gcc" was next to useless, so compared fPIC and non-fPIC assembly
destinyXfate 2:0e2ef1edf01b 110 * listings... Apparently register spillage has to do with ebx, since
destinyXfate 2:0e2ef1edf01b 111 * it's used to index the global offset table. Commenting it out of the
destinyXfate 2:0e2ef1edf01b 112 * input-reg lists in png_combine_row() eliminated compiler barfage, so
destinyXfate 2:0e2ef1edf01b 113 * ifdef'd with __PIC__ macro: if defined, use a global for unmask
destinyXfate 2:0e2ef1edf01b 114 *
destinyXfate 2:0e2ef1edf01b 115 * 19991107:
destinyXfate 2:0e2ef1edf01b 116 * - verified CPUID clobberage: 12-char string constant ("GenuineIntel",
destinyXfate 2:0e2ef1edf01b 117 * "AuthenticAMD", etc.) placed in ebx:ecx:edx. Still need to polish.
destinyXfate 2:0e2ef1edf01b 118 *
destinyXfate 2:0e2ef1edf01b 119 * 19991120:
destinyXfate 2:0e2ef1edf01b 120 * - made "diff" variable (now "_dif") global to simplify conversion of
destinyXfate 2:0e2ef1edf01b 121 * filtering routines (running out of regs, sigh). "diff" is still used
destinyXfate 2:0e2ef1edf01b 122 * in interlacing routines, however.
destinyXfate 2:0e2ef1edf01b 123 * - fixed up both versions of mmxsupport() (ORIG_THAT_USED_TO_CLOBBER_EBX
destinyXfate 2:0e2ef1edf01b 124 * macro determines which is used); original not yet tested.
destinyXfate 2:0e2ef1edf01b 125 *
destinyXfate 2:0e2ef1edf01b 126 * 20000213:
destinyXfate 2:0e2ef1edf01b 127 * - when compiling with gcc, be sure to use -fomit-frame-pointer
destinyXfate 2:0e2ef1edf01b 128 *
destinyXfate 2:0e2ef1edf01b 129 * 20000319:
destinyXfate 2:0e2ef1edf01b 130 * - fixed a register-name typo in png_do_read_interlace(), default (MMX) case,
destinyXfate 2:0e2ef1edf01b 131 * pass == 4 or 5, that caused visible corruption of interlaced images
destinyXfate 2:0e2ef1edf01b 132 *
destinyXfate 2:0e2ef1edf01b 133 * 20000623:
destinyXfate 2:0e2ef1edf01b 134 * - Various problems were reported with gcc 2.95.2 in the Cygwin environment,
destinyXfate 2:0e2ef1edf01b 135 * many of the form "forbidden register 0 (ax) was spilled for class AREG."
destinyXfate 2:0e2ef1edf01b 136 * This is explained at http://gcc.gnu.org/fom_serv/cache/23.html, and
destinyXfate 2:0e2ef1edf01b 137 * Chuck Wilson supplied a patch involving dummy output registers. See
destinyXfate 2:0e2ef1edf01b 138 * http://sourceforge.net/bugs/?func=detailbug&bug_id=108741&group_id=5624
destinyXfate 2:0e2ef1edf01b 139 * for the original (anonymous) SourceForge bug report.
destinyXfate 2:0e2ef1edf01b 140 *
destinyXfate 2:0e2ef1edf01b 141 * 20000706:
destinyXfate 2:0e2ef1edf01b 142 * - Chuck Wilson passed along these remaining gcc 2.95.2 errors:
destinyXfate 2:0e2ef1edf01b 143 * pnggccrd.c: In function `png_combine_row':
destinyXfate 2:0e2ef1edf01b 144 * pnggccrd.c:525: more than 10 operands in `asm'
destinyXfate 2:0e2ef1edf01b 145 * pnggccrd.c:669: more than 10 operands in `asm'
destinyXfate 2:0e2ef1edf01b 146 * pnggccrd.c:828: more than 10 operands in `asm'
destinyXfate 2:0e2ef1edf01b 147 * pnggccrd.c:994: more than 10 operands in `asm'
destinyXfate 2:0e2ef1edf01b 148 * pnggccrd.c:1177: more than 10 operands in `asm'
destinyXfate 2:0e2ef1edf01b 149 * They are all the same problem and can be worked around by using the
destinyXfate 2:0e2ef1edf01b 150 * global _unmask variable unconditionally, not just in the -fPIC case.
destinyXfate 2:0e2ef1edf01b 151 * Reportedly earlier versions of gcc also have the problem with more than
destinyXfate 2:0e2ef1edf01b 152 * 10 operands; they just don't report it. Much strangeness ensues, etc.
destinyXfate 2:0e2ef1edf01b 153 *
destinyXfate 2:0e2ef1edf01b 154 * 20000729:
destinyXfate 2:0e2ef1edf01b 155 * - enabled png_read_filter_row_mmx_up() (shortest remaining unconverted
destinyXfate 2:0e2ef1edf01b 156 * MMX routine); began converting png_read_filter_row_mmx_sub()
destinyXfate 2:0e2ef1edf01b 157 * - to finish remaining sections:
destinyXfate 2:0e2ef1edf01b 158 * - clean up indentation and comments
destinyXfate 2:0e2ef1edf01b 159 * - preload local variables
destinyXfate 2:0e2ef1edf01b 160 * - add output and input regs (order of former determines numerical
destinyXfate 2:0e2ef1edf01b 161 * mapping of latter)
destinyXfate 2:0e2ef1edf01b 162 * - avoid all usage of ebx (including bx, bh, bl) register [20000823]
destinyXfate 2:0e2ef1edf01b 163 * - remove "$" from addressing of Shift and Mask variables [20000823]
destinyXfate 2:0e2ef1edf01b 164 *
destinyXfate 2:0e2ef1edf01b 165 * 20000731:
destinyXfate 2:0e2ef1edf01b 166 * - global union vars causing segfaults in png_read_filter_row_mmx_sub()?
destinyXfate 2:0e2ef1edf01b 167 *
destinyXfate 2:0e2ef1edf01b 168 * 20000822:
destinyXfate 2:0e2ef1edf01b 169 * - ARGH, stupid png_read_filter_row_mmx_sub() segfault only happens with
destinyXfate 2:0e2ef1edf01b 170 * shared-library (-fPIC) version! Code works just fine as part of static
destinyXfate 2:0e2ef1edf01b 171 * library. Damn damn damn damn damn, should have tested that sooner.
destinyXfate 2:0e2ef1edf01b 172 * ebx is getting clobbered again (explicitly this time); need to save it
destinyXfate 2:0e2ef1edf01b 173 * on stack or rewrite asm code to avoid using it altogether. Blargh!
destinyXfate 2:0e2ef1edf01b 174 *
destinyXfate 2:0e2ef1edf01b 175 * 20000823:
destinyXfate 2:0e2ef1edf01b 176 * - first section was trickiest; all remaining sections have ebx -> edx now.
destinyXfate 2:0e2ef1edf01b 177 * (-fPIC works again.) Also added missing underscores to various Shift*
destinyXfate 2:0e2ef1edf01b 178 * and *Mask* globals and got rid of leading "$" signs.
destinyXfate 2:0e2ef1edf01b 179 *
destinyXfate 2:0e2ef1edf01b 180 * 20000826:
destinyXfate 2:0e2ef1edf01b 181 * - added visual separators to help navigate microscopic printed copies
destinyXfate 2:0e2ef1edf01b 182 * (http://pobox.com/~newt/code/gpr-latest.zip, mode 10); started working
destinyXfate 2:0e2ef1edf01b 183 * on png_read_filter_row_mmx_avg()
destinyXfate 2:0e2ef1edf01b 184 *
destinyXfate 2:0e2ef1edf01b 185 * 20000828:
destinyXfate 2:0e2ef1edf01b 186 * - finished png_read_filter_row_mmx_avg(): only Paeth left! (930 lines...)
destinyXfate 2:0e2ef1edf01b 187 * What the hell, did png_read_filter_row_mmx_paeth(), too. Comments not
destinyXfate 2:0e2ef1edf01b 188 * cleaned up/shortened in either routine, but functionality is complete
destinyXfate 2:0e2ef1edf01b 189 * and seems to be working fine.
destinyXfate 2:0e2ef1edf01b 190 *
destinyXfate 2:0e2ef1edf01b 191 * 20000829:
destinyXfate 2:0e2ef1edf01b 192 * - ahhh, figured out last(?) bit of gcc/gas asm-fu: if register is listed
destinyXfate 2:0e2ef1edf01b 193 * as an input reg (with dummy output variables, etc.), then it *cannot*
destinyXfate 2:0e2ef1edf01b 194 * also appear in the clobber list or gcc 2.95.2 will barf. The solution
destinyXfate 2:0e2ef1edf01b 195 * is simple enough...
destinyXfate 2:0e2ef1edf01b 196 *
destinyXfate 2:0e2ef1edf01b 197 * 20000914:
destinyXfate 2:0e2ef1edf01b 198 * - bug in png_read_filter_row_mmx_avg(): 16-bit grayscale not handled
destinyXfate 2:0e2ef1edf01b 199 * correctly (but 48-bit RGB just fine)
destinyXfate 2:0e2ef1edf01b 200 *
destinyXfate 2:0e2ef1edf01b 201 * 20000916:
destinyXfate 2:0e2ef1edf01b 202 * - fixed bug in png_read_filter_row_mmx_avg(), bpp == 2 case; three errors:
destinyXfate 2:0e2ef1edf01b 203 * - "_ShiftBpp.use = 24;" should have been "_ShiftBpp.use = 16;"
destinyXfate 2:0e2ef1edf01b 204 * - "_ShiftRem.use = 40;" should have been "_ShiftRem.use = 48;"
destinyXfate 2:0e2ef1edf01b 205 * - "psllq _ShiftRem, %%mm2" should have been "psrlq _ShiftRem, %%mm2"
destinyXfate 2:0e2ef1edf01b 206 *
destinyXfate 2:0e2ef1edf01b 207 * 20010101:
destinyXfate 2:0e2ef1edf01b 208 * - added new png_init_mmx_flags() function (here only because it needs to
destinyXfate 2:0e2ef1edf01b 209 * call mmxsupport(), which should probably become global png_mmxsupport());
destinyXfate 2:0e2ef1edf01b 210 * modified other MMX routines to run conditionally (png_ptr->asm_flags)
destinyXfate 2:0e2ef1edf01b 211 *
destinyXfate 2:0e2ef1edf01b 212 * 20010103:
destinyXfate 2:0e2ef1edf01b 213 * - renamed mmxsupport() to png_mmx_support(), with auto-set of mmx_supported,
destinyXfate 2:0e2ef1edf01b 214 * and made it public; moved png_init_mmx_flags() to png.c as internal func
destinyXfate 2:0e2ef1edf01b 215 *
destinyXfate 2:0e2ef1edf01b 216 * 20010104:
destinyXfate 2:0e2ef1edf01b 217 * - removed dependency on png_read_filter_row_c() (C code already duplicated
destinyXfate 2:0e2ef1edf01b 218 * within MMX version of png_read_filter_row()) so no longer necessary to
destinyXfate 2:0e2ef1edf01b 219 * compile it into pngrutil.o
destinyXfate 2:0e2ef1edf01b 220 *
destinyXfate 2:0e2ef1edf01b 221 * 20010310:
destinyXfate 2:0e2ef1edf01b 222 * - fixed buffer-overrun bug in png_combine_row() C code (non-MMX)
destinyXfate 2:0e2ef1edf01b 223 *
destinyXfate 2:0e2ef1edf01b 224 * 20020304:
destinyXfate 2:0e2ef1edf01b 225 * - eliminated incorrect use of width_mmx in pixel_bytes == 8 case
destinyXfate 2:0e2ef1edf01b 226 *
destinyXfate 2:0e2ef1edf01b 227 * 20040724:
destinyXfate 2:0e2ef1edf01b 228 * - more tinkering with clobber list at lines 4529 and 5033, to get
destinyXfate 2:0e2ef1edf01b 229 * it to compile on gcc-3.4.
destinyXfate 2:0e2ef1edf01b 230 *
destinyXfate 2:0e2ef1edf01b 231 * STILL TO DO:
destinyXfate 2:0e2ef1edf01b 232 * - test png_do_read_interlace() 64-bit case (pixel_bytes == 8)
destinyXfate 2:0e2ef1edf01b 233 * - write MMX code for 48-bit case (pixel_bytes == 6)
destinyXfate 2:0e2ef1edf01b 234 * - figure out what's up with 24-bit case (pixel_bytes == 3):
destinyXfate 2:0e2ef1edf01b 235 * why subtract 8 from width_mmx in the pass 4/5 case?
destinyXfate 2:0e2ef1edf01b 236 * (only width_mmx case) (near line 1606)
destinyXfate 2:0e2ef1edf01b 237 * - rewrite all MMX interlacing code so it's aligned with beginning
destinyXfate 2:0e2ef1edf01b 238 * of the row buffer, not the end (see 19991007 for details)
destinyXfate 2:0e2ef1edf01b 239 * x pick one version of mmxsupport() and get rid of the other
destinyXfate 2:0e2ef1edf01b 240 * - add error messages to any remaining bogus default cases
destinyXfate 2:0e2ef1edf01b 241 * - enable pixel_depth == 8 cases in png_read_filter_row()? (test speed)
destinyXfate 2:0e2ef1edf01b 242 * x add support for runtime enable/disable/query of various MMX routines
destinyXfate 2:0e2ef1edf01b 243 */
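
The 19991024 and 19991107 entries above trace the first-row bug to the CPUID instruction overwriting ebx. As a minimal sketch (not the code used in this file), the same check can be written with the `__get_cpuid()` helper from GCC's `<cpuid.h>`, which declares all four registers as outputs so the compiler knows ebx is modified. The name `sketch_mmx_support` is hypothetical; the sketch assumes a GNU-compatible compiler and reports 0 on anything that is not x86.

```c
#include <assert.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#  include <cpuid.h>
#  define SKETCH_HAVE_CPUID 1
#else
#  define SKETCH_HAVE_CPUID 0
#endif

/* Hypothetical helper: returns 1 if CPUID reports MMX, else 0. */
static int sketch_mmx_support(void)
{
#if SKETCH_HAVE_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid() returns 0 when the requested leaf is unavailable.
     * It writes eax, ebx, ecx and edx, so none of them is silently
     * clobbered behind the compiler's back. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 23) & 1u);    /* CPUID leaf 1, EDX bit 23 = MMX */
#else
    return 0;                          /* non-x86 or non-GNU compiler */
#endif
}
```

A caller would cache the result once, much as the file does with `_mmx_supported`, rather than executing CPUID on every row.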
destinyXfate 2:0e2ef1edf01b 244
destinyXfate 2:0e2ef1edf01b 245 #define PNG_INTERNAL
destinyXfate 2:0e2ef1edf01b 246 #include "png.h"
destinyXfate 2:0e2ef1edf01b 247
destinyXfate 2:0e2ef1edf01b 248 #if defined(PNG_ASSEMBLER_CODE_SUPPORTED) && defined(PNG_USE_PNGGCCRD)
destinyXfate 2:0e2ef1edf01b 249
destinyXfate 2:0e2ef1edf01b 250 int PNGAPI png_mmx_support(void);
destinyXfate 2:0e2ef1edf01b 251
destinyXfate 2:0e2ef1edf01b 252 #ifdef PNG_USE_LOCAL_ARRAYS
destinyXfate 2:0e2ef1edf01b 253 static const int FARDATA png_pass_start[7] = {0, 4, 0, 2, 0, 1, 0};
destinyXfate 2:0e2ef1edf01b 254 static const int FARDATA png_pass_inc[7] = {8, 8, 4, 4, 2, 2, 1};
destinyXfate 2:0e2ef1edf01b 255 static const int FARDATA png_pass_width[7] = {8, 4, 4, 2, 2, 1, 1};
destinyXfate 2:0e2ef1edf01b 256 #endif
destinyXfate 2:0e2ef1edf01b 257
destinyXfate 2:0e2ef1edf01b 258 #if defined(PNG_MMX_CODE_SUPPORTED)
destinyXfate 2:0e2ef1edf01b 259 /* djgpp, Win32, Cygwin, and OS2 add their own underscores to global variables,
destinyXfate 2:0e2ef1edf01b 260 * so define them without: */
destinyXfate 2:0e2ef1edf01b 261 #if defined(__DJGPP__) || defined(WIN32) || defined(__CYGWIN__) || \
destinyXfate 2:0e2ef1edf01b 262 defined(__OS2__)
destinyXfate 2:0e2ef1edf01b 263 # define _mmx_supported mmx_supported
destinyXfate 2:0e2ef1edf01b 264 # define _const4 const4
destinyXfate 2:0e2ef1edf01b 265 # define _const6 const6
destinyXfate 2:0e2ef1edf01b 266 # define _mask8_0 mask8_0
destinyXfate 2:0e2ef1edf01b 267 # define _mask16_1 mask16_1
destinyXfate 2:0e2ef1edf01b 268 # define _mask16_0 mask16_0
destinyXfate 2:0e2ef1edf01b 269 # define _mask24_2 mask24_2
destinyXfate 2:0e2ef1edf01b 270 # define _mask24_1 mask24_1
destinyXfate 2:0e2ef1edf01b 271 # define _mask24_0 mask24_0
destinyXfate 2:0e2ef1edf01b 272 # define _mask32_3 mask32_3
destinyXfate 2:0e2ef1edf01b 273 # define _mask32_2 mask32_2
destinyXfate 2:0e2ef1edf01b 274 # define _mask32_1 mask32_1
destinyXfate 2:0e2ef1edf01b 275 # define _mask32_0 mask32_0
destinyXfate 2:0e2ef1edf01b 276 # define _mask48_5 mask48_5
destinyXfate 2:0e2ef1edf01b 277 # define _mask48_4 mask48_4
destinyXfate 2:0e2ef1edf01b 278 # define _mask48_3 mask48_3
destinyXfate 2:0e2ef1edf01b 279 # define _mask48_2 mask48_2
destinyXfate 2:0e2ef1edf01b 280 # define _mask48_1 mask48_1
destinyXfate 2:0e2ef1edf01b 281 # define _mask48_0 mask48_0
destinyXfate 2:0e2ef1edf01b 282 # define _LBCarryMask LBCarryMask
destinyXfate 2:0e2ef1edf01b 283 # define _HBClearMask HBClearMask
destinyXfate 2:0e2ef1edf01b 284 # define _ActiveMask ActiveMask
destinyXfate 2:0e2ef1edf01b 285 # define _ActiveMask2 ActiveMask2
destinyXfate 2:0e2ef1edf01b 286 # define _ActiveMaskEnd ActiveMaskEnd
destinyXfate 2:0e2ef1edf01b 287 # define _ShiftBpp ShiftBpp
destinyXfate 2:0e2ef1edf01b 288 # define _ShiftRem ShiftRem
destinyXfate 2:0e2ef1edf01b 289 #ifdef PNG_THREAD_UNSAFE_OK
destinyXfate 2:0e2ef1edf01b 290 # define _unmask unmask
destinyXfate 2:0e2ef1edf01b 291 # define _FullLength FullLength
destinyXfate 2:0e2ef1edf01b 292 # define _MMXLength MMXLength
destinyXfate 2:0e2ef1edf01b 293 # define _dif dif
destinyXfate 2:0e2ef1edf01b 294 # define _patemp patemp
destinyXfate 2:0e2ef1edf01b 295 # define _pbtemp pbtemp
destinyXfate 2:0e2ef1edf01b 296 # define _pctemp pctemp
destinyXfate 2:0e2ef1edf01b 297 #endif
destinyXfate 2:0e2ef1edf01b 298 #endif
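
The renaming above exists because some targets prepend an underscore to C symbol names and others do not. As a hedged alternative sketch (not what this file does), GCC and Clang expose the prefix through the predefined macro `__USER_LABEL_PREFIX__`, which can be stringified and pasted in front of a name inside an assembler string. The `SKETCH_*` macro names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification so that the macro is expanded first. */
#define SKETCH_STR2(x) #x
#define SKETCH_STR(x)  SKETCH_STR2(x)

#ifdef __USER_LABEL_PREFIX__
#  define SKETCH_PREFIX SKETCH_STR(__USER_LABEL_PREFIX__)
#else
#  define SKETCH_PREFIX ""             /* assumption: no prefix if unknown */
#endif

/* Builds the assembler-level name of a C global, e.g. "const4" on ELF
 * targets and "_const4" on targets that add a leading underscore. */
#define SKETCH_ASM_NAME(name) SKETCH_PREFIX name

static const char *sketch_const4_symbol(void)
{
    return SKETCH_ASM_NAME("const4");
}
```

The file takes the opposite route: the assembler strings always say `_const4`, and the C declarations are renamed to match.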
destinyXfate 2:0e2ef1edf01b 299
destinyXfate 2:0e2ef1edf01b 300
destinyXfate 2:0e2ef1edf01b 301 /* These constants are used in the inlined MMX assembly code.
destinyXfate 2:0e2ef1edf01b 302 Ignore gcc's "At top level: defined but not used" warnings. */
destinyXfate 2:0e2ef1edf01b 303
destinyXfate 2:0e2ef1edf01b 304 /* GRR 20000706: originally _unmask was needed only when compiling with -fPIC,
destinyXfate 2:0e2ef1edf01b 305 * since that case uses the %ebx register for indexing the Global Offset Table
destinyXfate 2:0e2ef1edf01b 306 * and there were no other registers available. But gcc 2.95 and later emit
destinyXfate 2:0e2ef1edf01b 307 * "more than 10 operands in `asm'" errors when %ebx is used to preload unmask
destinyXfate 2:0e2ef1edf01b 308 * in the non-PIC case, so we'll just use the global unconditionally now.
destinyXfate 2:0e2ef1edf01b 309 */
destinyXfate 2:0e2ef1edf01b 310 #ifdef PNG_THREAD_UNSAFE_OK
destinyXfate 2:0e2ef1edf01b 311 static int _unmask;
destinyXfate 2:0e2ef1edf01b 312 #endif
destinyXfate 2:0e2ef1edf01b 313
destinyXfate 2:0e2ef1edf01b 314 static unsigned long long _mask8_0 = 0x0102040810204080LL;
destinyXfate 2:0e2ef1edf01b 315
destinyXfate 2:0e2ef1edf01b 316 static unsigned long long _mask16_1 = 0x0101020204040808LL;
destinyXfate 2:0e2ef1edf01b 317 static unsigned long long _mask16_0 = 0x1010202040408080LL;
destinyXfate 2:0e2ef1edf01b 318
destinyXfate 2:0e2ef1edf01b 319 static unsigned long long _mask24_2 = 0x0101010202020404LL;
destinyXfate 2:0e2ef1edf01b 320 static unsigned long long _mask24_1 = 0x0408080810101020LL;
destinyXfate 2:0e2ef1edf01b 321 static unsigned long long _mask24_0 = 0x2020404040808080LL;
destinyXfate 2:0e2ef1edf01b 322
destinyXfate 2:0e2ef1edf01b 323 static unsigned long long _mask32_3 = 0x0101010102020202LL;
destinyXfate 2:0e2ef1edf01b 324 static unsigned long long _mask32_2 = 0x0404040408080808LL;
destinyXfate 2:0e2ef1edf01b 325 static unsigned long long _mask32_1 = 0x1010101020202020LL;
destinyXfate 2:0e2ef1edf01b 326 static unsigned long long _mask32_0 = 0x4040404080808080LL;
destinyXfate 2:0e2ef1edf01b 327
destinyXfate 2:0e2ef1edf01b 328 static unsigned long long _mask48_5 = 0x0101010101010202LL;
destinyXfate 2:0e2ef1edf01b 329 static unsigned long long _mask48_4 = 0x0202020204040404LL;
destinyXfate 2:0e2ef1edf01b 330 static unsigned long long _mask48_3 = 0x0404080808080808LL;
destinyXfate 2:0e2ef1edf01b 331 static unsigned long long _mask48_2 = 0x1010101010102020LL;
destinyXfate 2:0e2ef1edf01b 332 static unsigned long long _mask48_1 = 0x2020202040404040LL;
destinyXfate 2:0e2ef1edf01b 333 static unsigned long long _mask48_0 = 0x4040808080808080LL;
destinyXfate 2:0e2ef1edf01b 334
destinyXfate 2:0e2ef1edf01b 335 static unsigned long long _const4 = 0x0000000000FFFFFFLL;
destinyXfate 2:0e2ef1edf01b 336 //static unsigned long long _const5 = 0x000000FFFFFF0000LL; // NOT USED
destinyXfate 2:0e2ef1edf01b 337 static unsigned long long _const6 = 0x00000000000000FFLL;
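
The assembly that consumes these constants lies outside this excerpt, so the following is only a portable illustration of one way a constant such as `_mask8_0` can be used: each byte of the constant carries a single bit, so ANDing it with the 8-bit pixel mask replicated into every byte leaves a nonzero byte exactly where that pixel is selected. The helper names are hypothetical, and the lane order assumes a little-endian load (lowest byte = leftmost pixel).

```c
#include <assert.h>
#include <stdint.h>

/* Same value as _mask8_0 in the listing: byte k (counting from the
 * least-significant byte) holds the single bit 0x80 >> k. */
#define SKETCH_MASK8_0 UINT64_C(0x0102040810204080)

/* Hypothetical helper: expand an 8-bit pixel mask into eight byte lanes,
 * 0xFF where the pixel's bit is set and 0x00 where it is clear. */
static uint64_t sketch_expand_mask8(unsigned int mask)
{
    uint64_t replicated = (uint64_t)(mask & 0xFFu) *
                          UINT64_C(0x0101010101010101);
    uint64_t hits = replicated & SKETCH_MASK8_0;  /* one bit per lane */
    uint64_t lanes = 0;
    int k;

    for (k = 0; k < 8; k++) {
        if ((hits >> (8 * k)) & 0xFFu)
            lanes |= UINT64_C(0xFF) << (8 * k);
    }
    return lanes;
}

/* Blend eight one-byte pixels packed into 64-bit words:
 * take src where the lane is 0xFF, keep dst elsewhere. */
static uint64_t sketch_blend8(uint64_t dst, uint64_t src, unsigned int mask)
{
    uint64_t lanes = sketch_expand_mask8(mask);

    return (src & lanes) | (dst & ~lanes);
}
```

The wider constants (`_mask16_*`, `_mask24_*`, and so on) follow the same idea with each bit repeated across the bytes of one pixel.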
destinyXfate 2:0e2ef1edf01b 338
destinyXfate 2:0e2ef1edf01b 339 // These are used in the row-filter routines and should/would be local
destinyXfate 2:0e2ef1edf01b 340 // variables if not for gcc addressing limitations.
destinyXfate 2:0e2ef1edf01b 341 // WARNING: Their presence probably defeats the thread safety of libpng.
destinyXfate 2:0e2ef1edf01b 342
destinyXfate 2:0e2ef1edf01b 343 #ifdef PNG_THREAD_UNSAFE_OK
destinyXfate 2:0e2ef1edf01b 344 static png_uint_32 _FullLength;
destinyXfate 2:0e2ef1edf01b 345 static png_uint_32 _MMXLength;
destinyXfate 2:0e2ef1edf01b 346 static int _dif;
destinyXfate 2:0e2ef1edf01b 347 static int _patemp; // temp variables for Paeth routine
destinyXfate 2:0e2ef1edf01b 348 static int _pbtemp;
destinyXfate 2:0e2ef1edf01b 349 static int _pctemp;
destinyXfate 2:0e2ef1edf01b 350 #endif
destinyXfate 2:0e2ef1edf01b 351
destinyXfate 2:0e2ef1edf01b 352 void /* PRIVATE */
destinyXfate 2:0e2ef1edf01b 353 png_squelch_warnings(void)
destinyXfate 2:0e2ef1edf01b 354 {
destinyXfate 2:0e2ef1edf01b 355 #ifdef PNG_THREAD_UNSAFE_OK
destinyXfate 2:0e2ef1edf01b 356 _dif = _dif;
destinyXfate 2:0e2ef1edf01b 357 _patemp = _patemp;
destinyXfate 2:0e2ef1edf01b 358 _pbtemp = _pbtemp;
destinyXfate 2:0e2ef1edf01b 359 _pctemp = _pctemp;
destinyXfate 2:0e2ef1edf01b 360 _MMXLength = _MMXLength;
destinyXfate 2:0e2ef1edf01b 361 #endif
destinyXfate 2:0e2ef1edf01b 362 _const4 = _const4;
destinyXfate 2:0e2ef1edf01b 363 _const6 = _const6;
destinyXfate 2:0e2ef1edf01b 364 _mask8_0 = _mask8_0;
destinyXfate 2:0e2ef1edf01b 365 _mask16_1 = _mask16_1;
destinyXfate 2:0e2ef1edf01b 366 _mask16_0 = _mask16_0;
destinyXfate 2:0e2ef1edf01b 367 _mask24_2 = _mask24_2;
destinyXfate 2:0e2ef1edf01b 368 _mask24_1 = _mask24_1;
destinyXfate 2:0e2ef1edf01b 369 _mask24_0 = _mask24_0;
destinyXfate 2:0e2ef1edf01b 370 _mask32_3 = _mask32_3;
destinyXfate 2:0e2ef1edf01b 371 _mask32_2 = _mask32_2;
destinyXfate 2:0e2ef1edf01b 372 _mask32_1 = _mask32_1;
destinyXfate 2:0e2ef1edf01b 373 _mask32_0 = _mask32_0;
destinyXfate 2:0e2ef1edf01b 374 _mask48_5 = _mask48_5;
destinyXfate 2:0e2ef1edf01b 375 _mask48_4 = _mask48_4;
destinyXfate 2:0e2ef1edf01b 376 _mask48_3 = _mask48_3;
destinyXfate 2:0e2ef1edf01b 377 _mask48_2 = _mask48_2;
destinyXfate 2:0e2ef1edf01b 378 _mask48_1 = _mask48_1;
destinyXfate 2:0e2ef1edf01b 379 _mask48_0 = _mask48_0;
destinyXfate 2:0e2ef1edf01b 380 }
destinyXfate 2:0e2ef1edf01b 381 #endif /* PNG_MMX_CODE_SUPPORTED */
destinyXfate 2:0e2ef1edf01b 382
destinyXfate 2:0e2ef1edf01b 383
destinyXfate 2:0e2ef1edf01b 384 static int _mmx_supported = 2;
destinyXfate 2:0e2ef1edf01b 385
destinyXfate 2:0e2ef1edf01b 386 /*===========================================================================*/
destinyXfate 2:0e2ef1edf01b 387 /* */
destinyXfate 2:0e2ef1edf01b 388 /* P N G _ C O M B I N E _ R O W */
destinyXfate 2:0e2ef1edf01b 389 /* */
destinyXfate 2:0e2ef1edf01b 390 /*===========================================================================*/
destinyXfate 2:0e2ef1edf01b 391
destinyXfate 2:0e2ef1edf01b 392 #if defined(PNG_HAVE_MMX_COMBINE_ROW)
destinyXfate 2:0e2ef1edf01b 393
destinyXfate 2:0e2ef1edf01b 394 #define BPP2 2
destinyXfate 2:0e2ef1edf01b 395 #define BPP3 3 /* bytes per pixel (a.k.a. pixel_bytes) */
destinyXfate 2:0e2ef1edf01b 396 #define BPP4 4
destinyXfate 2:0e2ef1edf01b 397 #define BPP6 6 /* (defined only to help avoid cut-and-paste errors) */
destinyXfate 2:0e2ef1edf01b 398 #define BPP8 8
destinyXfate 2:0e2ef1edf01b 399
destinyXfate 2:0e2ef1edf01b 400 /* Combines the row recently read in with the previous row.
destinyXfate 2:0e2ef1edf01b 401 This routine takes care of alpha and transparency if requested.
destinyXfate 2:0e2ef1edf01b 402 This routine also handles the two methods of progressive display
destinyXfate 2:0e2ef1edf01b 403 of interlaced images, depending on the mask value.
destinyXfate 2:0e2ef1edf01b 404 The mask value describes which pixels are to be combined with
destinyXfate 2:0e2ef1edf01b 405 the row. The pattern always repeats every 8 pixels, so just 8
destinyXfate 2:0e2ef1edf01b 406 bits are needed. A one indicates the pixel is to be combined; a
destinyXfate 2:0e2ef1edf01b 407 zero indicates the pixel is to be skipped. This is in addition
destinyXfate 2:0e2ef1edf01b 408 to any alpha or transparency value associated with the pixel.
destinyXfate 2:0e2ef1edf01b 409 If you want all pixels to be combined, pass 0xff (255) in mask. */
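
The mask rule described above can be sketched in plain C for pixels that occupy whole bytes. This is a simplified illustration under stated assumptions, not the libpng routine: `sketch_combine_row` and its parameters are hypothetical, and sub-byte pixel depths (1, 2 and 4 bits) are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a libpng function: copy pixel i from src to
 * dst when bit (0x80 >> (i % 8)) of mask is set; leave dst untouched
 * otherwise. */
static void sketch_combine_row(unsigned char *dst, const unsigned char *src,
                               size_t width, size_t pixel_bytes, int mask)
{
    size_t i;
    int m = 0x80;                      /* first pixel uses the top bit */

    assert(dst != NULL && src != NULL);

    if (mask == 0xff) {                /* every pixel selected: one copy */
        memcpy(dst, src, width * pixel_bytes);
        return;
    }
    for (i = 0; i < width; i++) {
        if (m & mask)
            memcpy(dst + i * pixel_bytes, src + i * pixel_bytes,
                   pixel_bytes);
        m = (m == 1) ? 0x80 : (m >> 1);  /* pattern repeats every 8 pixels */
    }
}
```

With `mask == 0x80`, for example, only pixels 0, 8, 16 and so on are taken from the new row.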
destinyXfate 2:0e2ef1edf01b 410
destinyXfate 2:0e2ef1edf01b 411 /* Use this routine for the x86 platform - it uses a faster MMX routine
destinyXfate 2:0e2ef1edf01b 412 if the machine supports MMX. */
destinyXfate 2:0e2ef1edf01b 413
destinyXfate 2:0e2ef1edf01b 414 void /* PRIVATE */
destinyXfate 2:0e2ef1edf01b 415 png_combine_row(png_structp png_ptr, png_bytep row, int mask)
destinyXfate 2:0e2ef1edf01b 416 {
destinyXfate 2:0e2ef1edf01b 417 png_debug(1, "in png_combine_row (pnggccrd.c)\n");
destinyXfate 2:0e2ef1edf01b 418
destinyXfate 2:0e2ef1edf01b 419 #if defined(PNG_MMX_CODE_SUPPORTED)
destinyXfate 2:0e2ef1edf01b 420 if (_mmx_supported == 2) {
destinyXfate 2:0e2ef1edf01b 421 #if !defined(PNG_1_0_X)
destinyXfate 2:0e2ef1edf01b 422 /* this should have happened in png_init_mmx_flags() already */
destinyXfate 2:0e2ef1edf01b 423 png_warning(png_ptr, "asm_flags may not have been initialized");
destinyXfate 2:0e2ef1edf01b 424 #endif
destinyXfate 2:0e2ef1edf01b 425 png_mmx_support();
destinyXfate 2:0e2ef1edf01b 426 }
destinyXfate 2:0e2ef1edf01b 427 #endif
destinyXfate 2:0e2ef1edf01b 428
destinyXfate 2:0e2ef1edf01b 429 if (mask == 0xff)
destinyXfate 2:0e2ef1edf01b 430 {
destinyXfate 2:0e2ef1edf01b 431 png_debug(2,"mask == 0xff: doing single png_memcpy()\n");
destinyXfate 2:0e2ef1edf01b 432 png_memcpy(row, png_ptr->row_buf + 1,
destinyXfate 2:0e2ef1edf01b 433 (png_size_t)PNG_ROWBYTES(png_ptr->row_info.pixel_depth,png_ptr->width));
destinyXfate 2:0e2ef1edf01b 434 }
destinyXfate 2:0e2ef1edf01b 435 else /* (png_combine_row() is never called with mask == 0) */
destinyXfate 2:0e2ef1edf01b 436 {
destinyXfate 2:0e2ef1edf01b 437 switch (png_ptr->row_info.pixel_depth)
destinyXfate 2:0e2ef1edf01b 438 {
destinyXfate 2:0e2ef1edf01b 439 case 1: /* png_ptr->row_info.pixel_depth */
destinyXfate 2:0e2ef1edf01b 440 {
destinyXfate 2:0e2ef1edf01b 441 png_bytep sp;
destinyXfate 2:0e2ef1edf01b 442 png_bytep dp;
destinyXfate 2:0e2ef1edf01b 443 int s_inc, s_start, s_end;
destinyXfate 2:0e2ef1edf01b 444 int m;
destinyXfate 2:0e2ef1edf01b 445 int shift;
destinyXfate 2:0e2ef1edf01b 446 png_uint_32 i;
destinyXfate 2:0e2ef1edf01b 447
destinyXfate 2:0e2ef1edf01b 448 sp = png_ptr->row_buf + 1;
destinyXfate 2:0e2ef1edf01b 449 dp = row;
destinyXfate 2:0e2ef1edf01b 450 m = 0x80;
destinyXfate 2:0e2ef1edf01b 451 #if defined(PNG_READ_PACKSWAP_SUPPORTED)
destinyXfate 2:0e2ef1edf01b 452 if (png_ptr->transformations & PNG_PACKSWAP)
destinyXfate 2:0e2ef1edf01b 453 {
destinyXfate 2:0e2ef1edf01b 454 s_start = 0;
destinyXfate 2:0e2ef1edf01b 455 s_end = 7;
destinyXfate 2:0e2ef1edf01b 456 s_inc = 1;
destinyXfate 2:0e2ef1edf01b 457 }
destinyXfate 2:0e2ef1edf01b 458 else
destinyXfate 2:0e2ef1edf01b 459 #endif
destinyXfate 2:0e2ef1edf01b 460 {
destinyXfate 2:0e2ef1edf01b 461 s_start = 7;
destinyXfate 2:0e2ef1edf01b 462 s_end = 0;
destinyXfate 2:0e2ef1edf01b 463 s_inc = -1;
destinyXfate 2:0e2ef1edf01b 464 }
destinyXfate 2:0e2ef1edf01b 465
destinyXfate 2:0e2ef1edf01b 466 shift = s_start;
destinyXfate 2:0e2ef1edf01b 467
destinyXfate 2:0e2ef1edf01b 468 for (i = 0; i < png_ptr->width; i++)
destinyXfate 2:0e2ef1edf01b 469 {
destinyXfate 2:0e2ef1edf01b 470 if (m & mask)
destinyXfate 2:0e2ef1edf01b 471 {
destinyXfate 2:0e2ef1edf01b 472 int value;
destinyXfate 2:0e2ef1edf01b 473
destinyXfate 2:0e2ef1edf01b 474 value = (*sp >> shift) & 0x1;
destinyXfate 2:0e2ef1edf01b 475 *dp &= (png_byte)((0x7f7f >> (7 - shift)) & 0xff);
destinyXfate 2:0e2ef1edf01b 476 *dp |= (png_byte)(value << shift);
destinyXfate 2:0e2ef1edf01b 477 }
destinyXfate 2:0e2ef1edf01b 478
destinyXfate 2:0e2ef1edf01b 479 if (shift == s_end)
destinyXfate 2:0e2ef1edf01b 480 {
destinyXfate 2:0e2ef1edf01b 481 shift = s_start;
destinyXfate 2:0e2ef1edf01b 482 sp++;
destinyXfate 2:0e2ef1edf01b 483 dp++;
destinyXfate 2:0e2ef1edf01b 484 }
destinyXfate 2:0e2ef1edf01b 485 else
destinyXfate 2:0e2ef1edf01b 486 shift += s_inc;
destinyXfate 2:0e2ef1edf01b 487
destinyXfate 2:0e2ef1edf01b 488 if (m == 1)
destinyXfate 2:0e2ef1edf01b 489 m = 0x80;
destinyXfate 2:0e2ef1edf01b 490 else
destinyXfate 2:0e2ef1edf01b 491 m >>= 1;
destinyXfate 2:0e2ef1edf01b 492 }
destinyXfate 2:0e2ef1edf01b 493 break;
destinyXfate 2:0e2ef1edf01b 494 }
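The packed-pixel cases clear the destination field with a constant that is shifted right and then truncated to one byte (0x7f7f, 0x3f3f, 0xf0f). The sketch below is illustrative only and is not part of libpng. The helper names `clear_mask_rotated` and `clear_mask_direct` are hypothetical. It compares that expression with the more direct form `~(field << shift) & 0xff`, assuming `shift` only takes the values the loops can produce (a multiple of the depth, below 8).

```c
#include <assert.h>

/* Hypothetical helper: the shifted-constant expression used by the
 * 1-, 2- and 4-bit cases to clear one pixel field in a byte. */
static unsigned clear_mask_rotated(int depth, int shift)
{
    assert(depth == 1 || depth == 2 || depth == 4);
    switch (depth) {
    case 1:  return (0x7f7fu >> (7 - shift)) & 0xffu;
    case 2:  return (0x3f3fu >> (6 - shift)) & 0xffu;
    default: return (0x0f0fu >> (4 - shift)) & 0xffu;
    }
}

/* The same mask written directly: every bit set except the pixel
 * field that starts at bit position `shift`. */
static unsigned clear_mask_direct(int depth, int shift)
{
    unsigned field = (1u << depth) - 1u;   /* 0x1, 0x3 or 0xf */
    return ~(field << shift) & 0xffu;
}
```

For example, `clear_mask_rotated(1, 7)` yields 0x7f (clears the top bit) and `clear_mask_rotated(4, 0)` yields 0xf0 (clears the low nibble).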
destinyXfate 2:0e2ef1edf01b 495
destinyXfate 2:0e2ef1edf01b 496 case 2: /* png_ptr->row_info.pixel_depth */
destinyXfate 2:0e2ef1edf01b 497 {
destinyXfate 2:0e2ef1edf01b 498 png_bytep sp;
destinyXfate 2:0e2ef1edf01b 499 png_bytep dp;
destinyXfate 2:0e2ef1edf01b 500 int s_start, s_end, s_inc;
destinyXfate 2:0e2ef1edf01b 501 int m;
destinyXfate 2:0e2ef1edf01b 502 int shift;
destinyXfate 2:0e2ef1edf01b 503 png_uint_32 i;
destinyXfate 2:0e2ef1edf01b 504 int value;
destinyXfate 2:0e2ef1edf01b 505
destinyXfate 2:0e2ef1edf01b 506 sp = png_ptr->row_buf + 1;
destinyXfate 2:0e2ef1edf01b 507 dp = row;
destinyXfate 2:0e2ef1edf01b 508 m = 0x80;
destinyXfate 2:0e2ef1edf01b 509 #if defined(PNG_READ_PACKSWAP_SUPPORTED)
destinyXfate 2:0e2ef1edf01b 510 if (png_ptr->transformations & PNG_PACKSWAP)
destinyXfate 2:0e2ef1edf01b 511 {
destinyXfate 2:0e2ef1edf01b 512 s_start = 0;
destinyXfate 2:0e2ef1edf01b 513 s_end = 6;
destinyXfate 2:0e2ef1edf01b 514 s_inc = 2;
destinyXfate 2:0e2ef1edf01b 515 }
destinyXfate 2:0e2ef1edf01b 516 else
destinyXfate 2:0e2ef1edf01b 517 #endif
destinyXfate 2:0e2ef1edf01b 518 {
destinyXfate 2:0e2ef1edf01b 519 s_start = 6;
destinyXfate 2:0e2ef1edf01b 520 s_end = 0;
destinyXfate 2:0e2ef1edf01b 521 s_inc = -2;
destinyXfate 2:0e2ef1edf01b 522 }
destinyXfate 2:0e2ef1edf01b 523
destinyXfate 2:0e2ef1edf01b 524 shift = s_start;
destinyXfate 2:0e2ef1edf01b 525
destinyXfate 2:0e2ef1edf01b 526 for (i = 0; i < png_ptr->width; i++)
destinyXfate 2:0e2ef1edf01b 527 {
destinyXfate 2:0e2ef1edf01b 528 if (m & mask)
destinyXfate 2:0e2ef1edf01b 529 {
destinyXfate 2:0e2ef1edf01b 530 value = (*sp >> shift) & 0x3;
destinyXfate 2:0e2ef1edf01b 531 *dp &= (png_byte)((0x3f3f >> (6 - shift)) & 0xff);
destinyXfate 2:0e2ef1edf01b 532 *dp |= (png_byte)(value << shift);
destinyXfate 2:0e2ef1edf01b 533 }
destinyXfate 2:0e2ef1edf01b 534
destinyXfate 2:0e2ef1edf01b 535 if (shift == s_end)
destinyXfate 2:0e2ef1edf01b 536 {
destinyXfate 2:0e2ef1edf01b 537 shift = s_start;
destinyXfate 2:0e2ef1edf01b 538 sp++;
destinyXfate 2:0e2ef1edf01b 539 dp++;
destinyXfate 2:0e2ef1edf01b 540 }
destinyXfate 2:0e2ef1edf01b 541 else
destinyXfate 2:0e2ef1edf01b 542 shift += s_inc;
destinyXfate 2:0e2ef1edf01b 543 if (m == 1)
destinyXfate 2:0e2ef1edf01b 544 m = 0x80;
destinyXfate 2:0e2ef1edf01b 545 else
destinyXfate 2:0e2ef1edf01b 546 m >>= 1;
destinyXfate 2:0e2ef1edf01b 547 }
destinyXfate 2:0e2ef1edf01b 548 break;
destinyXfate 2:0e2ef1edf01b 549 }
destinyXfate 2:0e2ef1edf01b 550
destinyXfate 2:0e2ef1edf01b 551 case 4: /* png_ptr->row_info.pixel_depth */
destinyXfate 2:0e2ef1edf01b 552 {
destinyXfate 2:0e2ef1edf01b 553 png_bytep sp;
destinyXfate 2:0e2ef1edf01b 554 png_bytep dp;
destinyXfate 2:0e2ef1edf01b 555 int s_start, s_end, s_inc;
destinyXfate 2:0e2ef1edf01b 556 int m;
destinyXfate 2:0e2ef1edf01b 557 int shift;
destinyXfate 2:0e2ef1edf01b 558 png_uint_32 i;
destinyXfate 2:0e2ef1edf01b 559 int value;
destinyXfate 2:0e2ef1edf01b 560
destinyXfate 2:0e2ef1edf01b 561 sp = png_ptr->row_buf + 1;
destinyXfate 2:0e2ef1edf01b 562 dp = row;
destinyXfate 2:0e2ef1edf01b 563 m = 0x80;
destinyXfate 2:0e2ef1edf01b 564 #if defined(PNG_READ_PACKSWAP_SUPPORTED)
destinyXfate 2:0e2ef1edf01b 565 if (png_ptr->transformations & PNG_PACKSWAP)
destinyXfate 2:0e2ef1edf01b 566 {
destinyXfate 2:0e2ef1edf01b 567 s_start = 0;
destinyXfate 2:0e2ef1edf01b 568 s_end = 4;
destinyXfate 2:0e2ef1edf01b 569 s_inc = 4;
destinyXfate 2:0e2ef1edf01b 570 }
destinyXfate 2:0e2ef1edf01b 571 else
destinyXfate 2:0e2ef1edf01b 572 #endif
destinyXfate 2:0e2ef1edf01b 573 {
destinyXfate 2:0e2ef1edf01b 574 s_start = 4;
destinyXfate 2:0e2ef1edf01b 575 s_end = 0;
destinyXfate 2:0e2ef1edf01b 576 s_inc = -4;
destinyXfate 2:0e2ef1edf01b 577 }
destinyXfate 2:0e2ef1edf01b 578 shift = s_start;
destinyXfate 2:0e2ef1edf01b 579
destinyXfate 2:0e2ef1edf01b 580 for (i = 0; i < png_ptr->width; i++)
destinyXfate 2:0e2ef1edf01b 581 {
destinyXfate 2:0e2ef1edf01b 582 if (m & mask)
destinyXfate 2:0e2ef1edf01b 583 {
destinyXfate 2:0e2ef1edf01b 584 value = (*sp >> shift) & 0xf;
destinyXfate 2:0e2ef1edf01b 585 *dp &= (png_byte)((0xf0f >> (4 - shift)) & 0xff);
destinyXfate 2:0e2ef1edf01b 586 *dp |= (png_byte)(value << shift);
destinyXfate 2:0e2ef1edf01b 587 }
destinyXfate 2:0e2ef1edf01b 588
destinyXfate 2:0e2ef1edf01b 589 if (shift == s_end)
destinyXfate 2:0e2ef1edf01b 590 {
destinyXfate 2:0e2ef1edf01b 591 shift = s_start;
destinyXfate 2:0e2ef1edf01b 592 sp++;
destinyXfate 2:0e2ef1edf01b 593 dp++;
destinyXfate 2:0e2ef1edf01b 594 }
destinyXfate 2:0e2ef1edf01b 595 else
destinyXfate 2:0e2ef1edf01b 596 shift += s_inc;
destinyXfate 2:0e2ef1edf01b 597 if (m == 1)
destinyXfate 2:0e2ef1edf01b 598 m = 0x80;
destinyXfate 2:0e2ef1edf01b 599 else
destinyXfate 2:0e2ef1edf01b 600 m >>= 1;
destinyXfate 2:0e2ef1edf01b 601 }
destinyXfate 2:0e2ef1edf01b 602 break;
destinyXfate 2:0e2ef1edf01b 603 }
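The 1-, 2- and 4-bit cases differ only in field width and shift step, so they can be folded into one loop. The following is a minimal standalone sketch under that assumption, not the library's code. `combine_packed` is a hypothetical name, and the `packswap` parameter stands in for the `PNG_PACKSWAP` transformation flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic form of the packed-pixel loops. Pixel i is copied
 * from src to dst when bit (0x80 >> (i % 8)) of `mask` is set. */
static void combine_packed(unsigned char *dst, const unsigned char *src,
                           size_t width, int depth, unsigned mask,
                           int packswap)
{
    assert(depth == 1 || depth == 2 || depth == 4);
    const unsigned field = (1u << depth) - 1u;       /* 0x1, 0x3 or 0xf */
    const int s_start = packswap ? 0 : 8 - depth;    /* first pixel's shift */
    const int s_end   = packswap ? 8 - depth : 0;    /* last pixel's shift */
    const int s_inc   = packswap ? depth : -depth;
    int shift = s_start;
    unsigned m = 0x80;

    for (size_t i = 0; i < width; i++) {
        if (m & mask) {
            unsigned value = (*src >> shift) & field;
            /* clear the destination field, then insert the source value */
            *dst = (unsigned char)((*dst & ~(field << shift)) |
                                   (value << shift));
        }
        if (shift == s_end) {          /* byte exhausted: advance pointers */
            shift = s_start;
            src++;
            dst++;
        } else {
            shift += s_inc;
        }
        m = (m == 1) ? 0x80 : m >> 1;  /* mask bit cycles every 8 pixels */
    }
}
```

With a source byte of 0xff, a zeroed destination, depth 1, width 8 and mask 0xaa, the destination becomes 0xaa without `packswap` and 0x55 with it, because the pixel order within the byte is reversed.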
destinyXfate 2:0e2ef1edf01b 604
destinyXfate 2:0e2ef1edf01b 605 case 8: /* png_ptr->row_info.pixel_depth */
destinyXfate 2:0e2ef1edf01b 606 {
destinyXfate 2:0e2ef1edf01b 607 png_bytep srcptr;
destinyXfate 2:0e2ef1edf01b 608 png_bytep dstptr;
destinyXfate 2:0e2ef1edf01b 609
destinyXfate 2:0e2ef1edf01b 610 #if defined(PNG_MMX_CODE_SUPPORTED) && defined(PNG_THREAD_UNSAFE_OK)
destinyXfate 2:0e2ef1edf01b 611 #if !defined(PNG_1_0_X)
destinyXfate 2:0e2ef1edf01b 612 if ((png_ptr->asm_flags & PNG_ASM_FLAG_MMX_READ_COMBINE_ROW)
destinyXfate 2:0e2ef1edf01b 613 /* && _mmx_supported */ )
destinyXfate 2:0e2ef1edf01b 614 #else
destinyXfate 2:0e2ef1edf01b 615 if (_mmx_supported)
destinyXfate 2:0e2ef1edf01b 616 #endif
destinyXfate 2:0e2ef1edf01b 617 {
destinyXfate 2:0e2ef1edf01b 618 png_uint_32 len;
destinyXfate 2:0e2ef1edf01b 619 int diff;
destinyXfate 2:0e2ef1edf01b 620 int dummy_value_a; // fix 'forbidden register spilled' error
destinyXfate 2:0e2ef1edf01b 621 int dummy_value_d;
destinyXfate 2:0e2ef1edf01b 622 int dummy_value_c;
destinyXfate 2:0e2ef1edf01b 623 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 624 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 625 _unmask = ~mask; // global variable for -fPIC version
destinyXfate 2:0e2ef1edf01b 626 srcptr = png_ptr->row_buf + 1;
destinyXfate 2:0e2ef1edf01b 627 dstptr = row;
destinyXfate 2:0e2ef1edf01b 628 len = png_ptr->width &~7; // reduce to multiple of 8
destinyXfate 2:0e2ef1edf01b 629 diff = (int) (png_ptr->width & 7); // amount lost
destinyXfate 2:0e2ef1edf01b 630
destinyXfate 2:0e2ef1edf01b 631 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 632 "movd _unmask, %%mm7 \n\t" // load bit pattern
destinyXfate 2:0e2ef1edf01b 633 "psubb %%mm6, %%mm6 \n\t" // zero mm6
destinyXfate 2:0e2ef1edf01b 634 "punpcklbw %%mm7, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 635 "punpcklwd %%mm7, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 636 "punpckldq %%mm7, %%mm7 \n\t" // fill reg with 8 masks
destinyXfate 2:0e2ef1edf01b 637
destinyXfate 2:0e2ef1edf01b 638 "movq _mask8_0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 639 "pand %%mm7, %%mm0 \n\t" // nonzero if keep byte
destinyXfate 2:0e2ef1edf01b 640 "pcmpeqb %%mm6, %%mm0 \n\t" // zero bytes -> 0xff, nonzero -> 0x00
destinyXfate 2:0e2ef1edf01b 641
destinyXfate 2:0e2ef1edf01b 642 // preload "movl len, %%ecx \n\t" // load length of line
destinyXfate 2:0e2ef1edf01b 643 // preload "movl srcptr, %%esi \n\t" // load source
destinyXfate 2:0e2ef1edf01b 644 // preload "movl dstptr, %%edi \n\t" // load dest
destinyXfate 2:0e2ef1edf01b 645
destinyXfate 2:0e2ef1edf01b 646 "cmpl $0, %%ecx \n\t" // len == 0 ?
destinyXfate 2:0e2ef1edf01b 647 "je mainloop8end \n\t"
destinyXfate 2:0e2ef1edf01b 648
destinyXfate 2:0e2ef1edf01b 649 "mainloop8: \n\t"
destinyXfate 2:0e2ef1edf01b 650 "movq (%%esi), %%mm4 \n\t" // *srcptr
destinyXfate 2:0e2ef1edf01b 651 "pand %%mm0, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 652 "movq %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 653 "pandn (%%edi), %%mm6 \n\t" // *dstptr
destinyXfate 2:0e2ef1edf01b 654 "por %%mm6, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 655 "movq %%mm4, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 656 "addl $8, %%esi \n\t" // inc by 8 bytes processed
destinyXfate 2:0e2ef1edf01b 657 "addl $8, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 658 "subl $8, %%ecx \n\t" // dec by 8 pixels processed
destinyXfate 2:0e2ef1edf01b 659 "ja mainloop8 \n\t"
destinyXfate 2:0e2ef1edf01b 660
destinyXfate 2:0e2ef1edf01b 661 "mainloop8end: \n\t"
destinyXfate 2:0e2ef1edf01b 662 // preload "movl diff, %%ecx \n\t" // (diff is in eax)
destinyXfate 2:0e2ef1edf01b 663 "movl %%eax, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 664 "cmpl $0, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 665 "jz end8 \n\t"
destinyXfate 2:0e2ef1edf01b 666 // preload "movl mask, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 667 "sall $24, %%edx \n\t" // move low byte of mask into high byte
destinyXfate 2:0e2ef1edf01b 668
destinyXfate 2:0e2ef1edf01b 669 "secondloop8: \n\t"
destinyXfate 2:0e2ef1edf01b 670 "sall %%edx \n\t" // move high bit to CF
destinyXfate 2:0e2ef1edf01b 671 "jnc skip8 \n\t" // if CF = 0
destinyXfate 2:0e2ef1edf01b 672 "movb (%%esi), %%al \n\t"
destinyXfate 2:0e2ef1edf01b 673 "movb %%al, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 674
destinyXfate 2:0e2ef1edf01b 675 "skip8: \n\t"
destinyXfate 2:0e2ef1edf01b 676 "incl %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 677 "incl %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 678 "decl %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 679 "jnz secondloop8 \n\t"
destinyXfate 2:0e2ef1edf01b 680
destinyXfate 2:0e2ef1edf01b 681 "end8: \n\t"
destinyXfate 2:0e2ef1edf01b 682 "EMMS \n\t" // DONE
destinyXfate 2:0e2ef1edf01b 683
destinyXfate 2:0e2ef1edf01b 684 : "=a" (dummy_value_a), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 685 "=d" (dummy_value_d),
destinyXfate 2:0e2ef1edf01b 686 "=c" (dummy_value_c),
destinyXfate 2:0e2ef1edf01b 687 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 688 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 689
destinyXfate 2:0e2ef1edf01b 690 : "3" (srcptr), // esi // input regs
destinyXfate 2:0e2ef1edf01b 691 "4" (dstptr), // edi
destinyXfate 2:0e2ef1edf01b 692 "0" (diff), // eax
destinyXfate 2:0e2ef1edf01b 693 // was (unmask) "b" RESERVED // ebx // Global Offset Table idx
destinyXfate 2:0e2ef1edf01b 694 "2" (len), // ecx
destinyXfate 2:0e2ef1edf01b 695 "1" (mask) // edx
destinyXfate 2:0e2ef1edf01b 696
destinyXfate 2:0e2ef1edf01b 697 #if 0 /* MMX regs (%mm0, etc.) not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 698 : "%mm0", "%mm4", "%mm6", "%mm7" // clobber list
destinyXfate 2:0e2ef1edf01b 699 #endif
destinyXfate 2:0e2ef1edf01b 700 );
destinyXfate 2:0e2ef1edf01b 701 }
destinyXfate 2:0e2ef1edf01b 702 else /* MMX not supported: use modified C routine */
destinyXfate 2:0e2ef1edf01b 703 #endif /* PNG_MMX_CODE_SUPPORTED && PNG_THREAD_UNSAFE_OK */
destinyXfate 2:0e2ef1edf01b 704 {
destinyXfate 2:0e2ef1edf01b 705 register png_uint_32 i;
destinyXfate 2:0e2ef1edf01b 706 png_uint_32 initial_val = png_pass_start[png_ptr->pass];
destinyXfate 2:0e2ef1edf01b 707 /* png.c: png_pass_start[] = {0, 4, 0, 2, 0, 1, 0}; */
destinyXfate 2:0e2ef1edf01b 708 register int stride = png_pass_inc[png_ptr->pass];
destinyXfate 2:0e2ef1edf01b 709 /* png.c: png_pass_inc[] = {8, 8, 4, 4, 2, 2, 1}; */
destinyXfate 2:0e2ef1edf01b 710 register int rep_bytes = png_pass_width[png_ptr->pass];
destinyXfate 2:0e2ef1edf01b 711 /* png.c: png_pass_width[] = {8, 4, 4, 2, 2, 1, 1}; */
destinyXfate 2:0e2ef1edf01b 712 png_uint_32 len = png_ptr->width &~7; /* reduce to mult. of 8 */
destinyXfate 2:0e2ef1edf01b 713 int diff = (int) (png_ptr->width & 7); /* amount lost */
destinyXfate 2:0e2ef1edf01b 714 register png_uint_32 final_val = len; /* GRR bugfix */
destinyXfate 2:0e2ef1edf01b 715
destinyXfate 2:0e2ef1edf01b 716 srcptr = png_ptr->row_buf + 1 + initial_val;
destinyXfate 2:0e2ef1edf01b 717 dstptr = row + initial_val;
destinyXfate 2:0e2ef1edf01b 718
destinyXfate 2:0e2ef1edf01b 719 for (i = initial_val; i < final_val; i += stride)
destinyXfate 2:0e2ef1edf01b 720 {
destinyXfate 2:0e2ef1edf01b 721 png_memcpy(dstptr, srcptr, rep_bytes);
destinyXfate 2:0e2ef1edf01b 722 srcptr += stride;
destinyXfate 2:0e2ef1edf01b 723 dstptr += stride;
destinyXfate 2:0e2ef1edf01b 724 }
destinyXfate 2:0e2ef1edf01b 725 if (diff) /* number of leftover pixels: 3 for pngtest */
destinyXfate 2:0e2ef1edf01b 726 {
destinyXfate 2:0e2ef1edf01b 727 final_val+=diff /* *BPP1 */ ;
destinyXfate 2:0e2ef1edf01b 728 for (; i < final_val; i += stride)
destinyXfate 2:0e2ef1edf01b 729 {
destinyXfate 2:0e2ef1edf01b 730 if (rep_bytes > (int)(final_val-i))
destinyXfate 2:0e2ef1edf01b 731 rep_bytes = (int)(final_val-i);
destinyXfate 2:0e2ef1edf01b 732 png_memcpy(dstptr, srcptr, rep_bytes);
destinyXfate 2:0e2ef1edf01b 733 srcptr += stride;
destinyXfate 2:0e2ef1edf01b 734 dstptr += stride;
destinyXfate 2:0e2ef1edf01b 735 }
destinyXfate 2:0e2ef1edf01b 736 }
destinyXfate 2:0e2ef1edf01b 737
destinyXfate 2:0e2ef1edf01b 738 } /* end of else (_mmx_supported) */
destinyXfate 2:0e2ef1edf01b 739
destinyXfate 2:0e2ef1edf01b 740 break;
destinyXfate 2:0e2ef1edf01b 741 } /* end 8 bpp */
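The MMX path builds a per-byte keep mask and merges source and destination with `pand`, `pandn` and `por`. The tail loop then shifts the mask left and copies one pixel whenever the carry flag is set. A portable sketch of the same selection is given below. It assumes, based on the tail loop, that pixel i is governed by bit `0x80 >> (i % 8)` of `mask`. The names `select_byte` and `combine_row_masked` are hypothetical and do not appear in libpng.

```c
#include <assert.h>
#include <stddef.h>

/* Branch-free merge of one byte: take s where keep is 0xff, d where it
 * is 0x00. This mirrors the pand / pandn / por sequence. */
static unsigned char select_byte(unsigned char s, unsigned char d,
                                 unsigned char keep)
{
    return (unsigned char)((s & keep) | (d & ~keep));
}

/* Hypothetical portable equivalent of the masked row combine.
 * bpp is the number of bytes per pixel (1 for the 8-bit case). */
static void combine_row_masked(unsigned char *dst, const unsigned char *src,
                               size_t width, size_t bpp, unsigned mask)
{
    assert(bpp >= 1);
    for (size_t i = 0; i < width; i++) {
        /* 0xff if this pixel's mask bit is set, otherwise 0x00 */
        unsigned char keep = (mask & (0x80u >> (i & 7))) ? 0xff : 0x00;
        for (size_t b = 0; b < bpp; b++) {
            size_t k = i * bpp + b;
            dst[k] = select_byte(src[k], dst[k], keep);
        }
    }
}
```

With `mask == 0x0f` and an 11-pixel row, only pixels 4 through 7 are copied. Pixels 8 through 10 map to bits 0x80, 0x40 and 0x20, which are clear.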
destinyXfate 2:0e2ef1edf01b 742
destinyXfate 2:0e2ef1edf01b 743 case 16: /* png_ptr->row_info.pixel_depth */
destinyXfate 2:0e2ef1edf01b 744 {
destinyXfate 2:0e2ef1edf01b 745 png_bytep srcptr;
destinyXfate 2:0e2ef1edf01b 746 png_bytep dstptr;
destinyXfate 2:0e2ef1edf01b 747
destinyXfate 2:0e2ef1edf01b 748 #if defined(PNG_MMX_CODE_SUPPORTED) && defined(PNG_THREAD_UNSAFE_OK)
destinyXfate 2:0e2ef1edf01b 749 #if !defined(PNG_1_0_X)
destinyXfate 2:0e2ef1edf01b 750 if ((png_ptr->asm_flags & PNG_ASM_FLAG_MMX_READ_COMBINE_ROW)
destinyXfate 2:0e2ef1edf01b 751 /* && _mmx_supported */ )
destinyXfate 2:0e2ef1edf01b 752 #else
destinyXfate 2:0e2ef1edf01b 753 if (_mmx_supported)
destinyXfate 2:0e2ef1edf01b 754 #endif
destinyXfate 2:0e2ef1edf01b 755 {
destinyXfate 2:0e2ef1edf01b 756 png_uint_32 len;
destinyXfate 2:0e2ef1edf01b 757 int diff;
destinyXfate 2:0e2ef1edf01b 758 int dummy_value_a; // fix 'forbidden register spilled' error
destinyXfate 2:0e2ef1edf01b 759 int dummy_value_d;
destinyXfate 2:0e2ef1edf01b 760 int dummy_value_c;
destinyXfate 2:0e2ef1edf01b 761 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 762 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 763 _unmask = ~mask; // global variable for -fPIC version
destinyXfate 2:0e2ef1edf01b 764 srcptr = png_ptr->row_buf + 1;
destinyXfate 2:0e2ef1edf01b 765 dstptr = row;
destinyXfate 2:0e2ef1edf01b 766 len = png_ptr->width &~7; // reduce to multiple of 8
destinyXfate 2:0e2ef1edf01b 767 diff = (int) (png_ptr->width & 7); // amount lost
destinyXfate 2:0e2ef1edf01b 768
destinyXfate 2:0e2ef1edf01b 769 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 770 "movd _unmask, %%mm7 \n\t" // load bit pattern
destinyXfate 2:0e2ef1edf01b 771 "psubb %%mm6, %%mm6 \n\t" // zero mm6
destinyXfate 2:0e2ef1edf01b 772 "punpcklbw %%mm7, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 773 "punpcklwd %%mm7, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 774 "punpckldq %%mm7, %%mm7 \n\t" // fill reg with 8 masks
destinyXfate 2:0e2ef1edf01b 775
destinyXfate 2:0e2ef1edf01b 776 "movq _mask16_0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 777 "movq _mask16_1, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 778
destinyXfate 2:0e2ef1edf01b 779 "pand %%mm7, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 780 "pand %%mm7, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 781
destinyXfate 2:0e2ef1edf01b 782 "pcmpeqb %%mm6, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 783 "pcmpeqb %%mm6, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 784
destinyXfate 2:0e2ef1edf01b 785 // preload "movl len, %%ecx \n\t" // load length of line
destinyXfate 2:0e2ef1edf01b 786 // preload "movl srcptr, %%esi \n\t" // load source
destinyXfate 2:0e2ef1edf01b 787 // preload "movl dstptr, %%edi \n\t" // load dest
destinyXfate 2:0e2ef1edf01b 788
destinyXfate 2:0e2ef1edf01b 789 "cmpl $0, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 790 "jz mainloop16end \n\t"
destinyXfate 2:0e2ef1edf01b 791
destinyXfate 2:0e2ef1edf01b 792 "mainloop16: \n\t"
destinyXfate 2:0e2ef1edf01b 793 "movq (%%esi), %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 794 "pand %%mm0, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 795 "movq %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 796 "movq (%%edi), %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 797 "pandn %%mm7, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 798 "por %%mm6, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 799 "movq %%mm4, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 800
destinyXfate 2:0e2ef1edf01b 801 "movq 8(%%esi), %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 802 "pand %%mm1, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 803 "movq %%mm1, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 804 "movq 8(%%edi), %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 805 "pandn %%mm6, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 806 "por %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 807 "movq %%mm5, 8(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 808
destinyXfate 2:0e2ef1edf01b 809 "addl $16, %%esi \n\t" // inc by 16 bytes processed
destinyXfate 2:0e2ef1edf01b 810 "addl $16, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 811 "subl $8, %%ecx \n\t" // dec by 8 pixels processed
destinyXfate 2:0e2ef1edf01b 812 "ja mainloop16 \n\t"
destinyXfate 2:0e2ef1edf01b 813
destinyXfate 2:0e2ef1edf01b 814 "mainloop16end: \n\t"
destinyXfate 2:0e2ef1edf01b 815 // preload "movl diff, %%ecx \n\t" // (diff is in eax)
destinyXfate 2:0e2ef1edf01b 816 "movl %%eax, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 817 "cmpl $0, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 818 "jz end16 \n\t"
destinyXfate 2:0e2ef1edf01b 819 // preload "movl mask, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 820 "sall $24, %%edx \n\t" // move low byte of mask into high byte
destinyXfate 2:0e2ef1edf01b 821
destinyXfate 2:0e2ef1edf01b 822 "secondloop16: \n\t"
destinyXfate 2:0e2ef1edf01b 823 "sall %%edx \n\t" // move high bit to CF
destinyXfate 2:0e2ef1edf01b 824 "jnc skip16 \n\t" // if CF = 0
destinyXfate 2:0e2ef1edf01b 825 "movw (%%esi), %%ax \n\t"
destinyXfate 2:0e2ef1edf01b 826 "movw %%ax, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 827
destinyXfate 2:0e2ef1edf01b 828 "skip16: \n\t"
destinyXfate 2:0e2ef1edf01b 829 "addl $2, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 830 "addl $2, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 831 "decl %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 832 "jnz secondloop16 \n\t"
destinyXfate 2:0e2ef1edf01b 833
destinyXfate 2:0e2ef1edf01b 834 "end16: \n\t"
destinyXfate 2:0e2ef1edf01b 835 "EMMS \n\t" // DONE
destinyXfate 2:0e2ef1edf01b 836
destinyXfate 2:0e2ef1edf01b 837 : "=a" (dummy_value_a), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 838 "=c" (dummy_value_c),
destinyXfate 2:0e2ef1edf01b 839 "=d" (dummy_value_d),
destinyXfate 2:0e2ef1edf01b 840 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 841 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 842
destinyXfate 2:0e2ef1edf01b 843 : "0" (diff), // eax // input regs
destinyXfate 2:0e2ef1edf01b 844 // was (unmask) " " RESERVED // ebx // Global Offset Table idx
destinyXfate 2:0e2ef1edf01b 845 "1" (len), // ecx
destinyXfate 2:0e2ef1edf01b 846 "2" (mask), // edx
destinyXfate 2:0e2ef1edf01b 847 "3" (srcptr), // esi
destinyXfate 2:0e2ef1edf01b 848 "4" (dstptr) // edi
destinyXfate 2:0e2ef1edf01b 849
destinyXfate 2:0e2ef1edf01b 850 #if 0 /* MMX regs (%mm0, etc.) not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 851 : "%mm0", "%mm1", "%mm4" // clobber list
destinyXfate 2:0e2ef1edf01b 852 , "%mm5", "%mm6", "%mm7"
destinyXfate 2:0e2ef1edf01b 853 #endif
destinyXfate 2:0e2ef1edf01b 854 );
destinyXfate 2:0e2ef1edf01b 855 }
destinyXfate 2:0e2ef1edf01b 856 else /* MMX not supported: use modified C routine */
destinyXfate 2:0e2ef1edf01b 857 #endif /* PNG_MMX_CODE_SUPPORTED && PNG_THREAD_UNSAFE_OK */
destinyXfate 2:0e2ef1edf01b 858 {
destinyXfate 2:0e2ef1edf01b 859 register png_uint_32 i;
destinyXfate 2:0e2ef1edf01b 860 png_uint_32 initial_val = BPP2 * png_pass_start[png_ptr->pass];
destinyXfate 2:0e2ef1edf01b 861 /* png.c: png_pass_start[] = {0, 4, 0, 2, 0, 1, 0}; */
destinyXfate 2:0e2ef1edf01b 862 register int stride = BPP2 * png_pass_inc[png_ptr->pass];
destinyXfate 2:0e2ef1edf01b 863 /* png.c: png_pass_inc[] = {8, 8, 4, 4, 2, 2, 1}; */
destinyXfate 2:0e2ef1edf01b 864 register int rep_bytes = BPP2 * png_pass_width[png_ptr->pass];
destinyXfate 2:0e2ef1edf01b 865 /* png.c: png_pass_width[] = {8, 4, 4, 2, 2, 1, 1}; */
destinyXfate 2:0e2ef1edf01b 866 png_uint_32 len = png_ptr->width &~7; /* reduce to mult. of 8 */
destinyXfate 2:0e2ef1edf01b 867 int diff = (int) (png_ptr->width & 7); /* amount lost */
destinyXfate 2:0e2ef1edf01b 868 register png_uint_32 final_val = BPP2 * len; /* GRR bugfix */
destinyXfate 2:0e2ef1edf01b 869
destinyXfate 2:0e2ef1edf01b 870 srcptr = png_ptr->row_buf + 1 + initial_val;
destinyXfate 2:0e2ef1edf01b 871 dstptr = row + initial_val;
destinyXfate 2:0e2ef1edf01b 872
destinyXfate 2:0e2ef1edf01b 873 for (i = initial_val; i < final_val; i += stride)
destinyXfate 2:0e2ef1edf01b 874 {
destinyXfate 2:0e2ef1edf01b 875 png_memcpy(dstptr, srcptr, rep_bytes);
destinyXfate 2:0e2ef1edf01b 876 srcptr += stride;
destinyXfate 2:0e2ef1edf01b 877 dstptr += stride;
destinyXfate 2:0e2ef1edf01b 878 }
destinyXfate 2:0e2ef1edf01b 879 if (diff) /* number of leftover pixels: 3 for pngtest */
destinyXfate 2:0e2ef1edf01b 880 {
destinyXfate 2:0e2ef1edf01b 881 final_val+=diff*BPP2;
destinyXfate 2:0e2ef1edf01b 882 for (; i < final_val; i += stride)
destinyXfate 2:0e2ef1edf01b 883 {
destinyXfate 2:0e2ef1edf01b 884 if (rep_bytes > (int)(final_val-i))
destinyXfate 2:0e2ef1edf01b 885 rep_bytes = (int)(final_val-i);
destinyXfate 2:0e2ef1edf01b 886 png_memcpy(dstptr, srcptr, rep_bytes);
destinyXfate 2:0e2ef1edf01b 887 srcptr += stride;
destinyXfate 2:0e2ef1edf01b 888 dstptr += stride;
destinyXfate 2:0e2ef1edf01b 889 }
destinyXfate 2:0e2ef1edf01b 890 }
destinyXfate 2:0e2ef1edf01b 891 } /* end of else (_mmx_supported) */
destinyXfate 2:0e2ef1edf01b 892
destinyXfate 2:0e2ef1edf01b 893 break;
destinyXfate 2:0e2ef1edf01b 894 } /* end 16 bpp */
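The non-MMX fallback does not test `mask` at all. It derives a start offset, stride and run length from the current interlace pass and copies fixed-size runs. The sketch below restates that idea as a standalone function, using the table values quoted in the comments of the fallback path. `combine_row_strided` is a hypothetical name. As a simplification, it folds the main loop and the leftover loop into a single loop that clamps the final run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Per-pass tables, with the values quoted in the fallback's comments. */
static const int pass_start[7] = {0, 4, 0, 2, 0, 1, 0};
static const int pass_inc[7]   = {8, 8, 4, 4, 2, 2, 1};
static const int pass_width[7] = {8, 4, 4, 2, 2, 1, 1};

/* Hypothetical strided combine: copy pass_width pixels out of every
 * pass_inc pixels, starting at pass_start. bpp is bytes per pixel. */
static void combine_row_strided(unsigned char *dst, const unsigned char *src,
                                size_t width, size_t bpp, int pass)
{
    assert(pass >= 0 && pass < 7);
    size_t i      = bpp * (size_t)pass_start[pass];
    size_t stride = bpp * (size_t)pass_inc[pass];
    size_t rep    = bpp * (size_t)pass_width[pass];
    size_t end    = bpp * width;            /* total bytes in the row */

    for (; i < end; i += stride) {
        /* clamp the final run so it never reads past the row */
        size_t n = (rep < end - i) ? rep : end - i;
        memcpy(dst + i, src + i, n);
    }
}
```

For pass index 3 and an 11-pixel, 1-byte-per-pixel row, this copies pixels 2, 3, 6, 7 and 10. That is the same set a mask of 0x33 selects.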
destinyXfate 2:0e2ef1edf01b 895
destinyXfate 2:0e2ef1edf01b 896 case 24: /* png_ptr->row_info.pixel_depth */
destinyXfate 2:0e2ef1edf01b 897 {
destinyXfate 2:0e2ef1edf01b 898 png_bytep srcptr;
destinyXfate 2:0e2ef1edf01b 899 png_bytep dstptr;
destinyXfate 2:0e2ef1edf01b 900
destinyXfate 2:0e2ef1edf01b 901 #if defined(PNG_MMX_CODE_SUPPORTED) && defined(PNG_THREAD_UNSAFE_OK)
destinyXfate 2:0e2ef1edf01b 902 #if !defined(PNG_1_0_X)
destinyXfate 2:0e2ef1edf01b 903 if ((png_ptr->asm_flags & PNG_ASM_FLAG_MMX_READ_COMBINE_ROW)
destinyXfate 2:0e2ef1edf01b 904 /* && _mmx_supported */ )
destinyXfate 2:0e2ef1edf01b 905 #else
destinyXfate 2:0e2ef1edf01b 906 if (_mmx_supported)
destinyXfate 2:0e2ef1edf01b 907 #endif
destinyXfate 2:0e2ef1edf01b 908 {
destinyXfate 2:0e2ef1edf01b 909 png_uint_32 len;
destinyXfate 2:0e2ef1edf01b 910 int diff;
destinyXfate 2:0e2ef1edf01b 911 int dummy_value_a; // fix 'forbidden register spilled' error
destinyXfate 2:0e2ef1edf01b 912 int dummy_value_d;
destinyXfate 2:0e2ef1edf01b 913 int dummy_value_c;
destinyXfate 2:0e2ef1edf01b 914 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 915 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 916 _unmask = ~mask; // global variable for -fPIC version
destinyXfate 2:0e2ef1edf01b 917 srcptr = png_ptr->row_buf + 1;
destinyXfate 2:0e2ef1edf01b 918 dstptr = row;
destinyXfate 2:0e2ef1edf01b 919 len = png_ptr->width &~7; // reduce to multiple of 8
destinyXfate 2:0e2ef1edf01b 920 diff = (int) (png_ptr->width & 7); // amount lost
destinyXfate 2:0e2ef1edf01b 921
destinyXfate 2:0e2ef1edf01b 922 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 923 "movd _unmask, %%mm7 \n\t" // load bit pattern
destinyXfate 2:0e2ef1edf01b 924 "psubb %%mm6, %%mm6 \n\t" // zero mm6
destinyXfate 2:0e2ef1edf01b 925 "punpcklbw %%mm7, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 926 "punpcklwd %%mm7, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 927 "punpckldq %%mm7, %%mm7 \n\t" // fill reg with 8 masks
destinyXfate 2:0e2ef1edf01b 928
destinyXfate 2:0e2ef1edf01b 929 "movq _mask24_0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 930 "movq _mask24_1, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 931 "movq _mask24_2, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 932
destinyXfate 2:0e2ef1edf01b 933 "pand %%mm7, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 934 "pand %%mm7, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 935 "pand %%mm7, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 936
destinyXfate 2:0e2ef1edf01b 937 "pcmpeqb %%mm6, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 938 "pcmpeqb %%mm6, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 939 "pcmpeqb %%mm6, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 940
destinyXfate 2:0e2ef1edf01b 941 // preload "movl len, %%ecx \n\t" // load length of line
destinyXfate 2:0e2ef1edf01b 942 // preload "movl srcptr, %%esi \n\t" // load source
destinyXfate 2:0e2ef1edf01b 943 // preload "movl dstptr, %%edi \n\t" // load dest
destinyXfate 2:0e2ef1edf01b 944
destinyXfate 2:0e2ef1edf01b 945 "cmpl $0, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 946 "jz mainloop24end \n\t"
destinyXfate 2:0e2ef1edf01b 947
destinyXfate 2:0e2ef1edf01b 948 "mainloop24: \n\t"
destinyXfate 2:0e2ef1edf01b 949 "movq (%%esi), %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 950 "pand %%mm0, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 951 "movq %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 952 "movq (%%edi), %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 953 "pandn %%mm7, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 954 "por %%mm6, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 955 "movq %%mm4, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 956
destinyXfate 2:0e2ef1edf01b 957 "movq 8(%%esi), %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 958 "pand %%mm1, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 959 "movq %%mm1, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 960 "movq 8(%%edi), %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 961 "pandn %%mm6, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 962 "por %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 963 "movq %%mm5, 8(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 964
destinyXfate 2:0e2ef1edf01b 965 "movq 16(%%esi), %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 966 "pand %%mm2, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 967 "movq %%mm2, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 968 "movq 16(%%edi), %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 969 "pandn %%mm7, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 970 "por %%mm4, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 971 "movq %%mm6, 16(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 972
destinyXfate 2:0e2ef1edf01b 973 "addl $24, %%esi \n\t" // inc by 24 bytes processed
destinyXfate 2:0e2ef1edf01b 974 "addl $24, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 975 "subl $8, %%ecx \n\t" // dec by 8 pixels processed
destinyXfate 2:0e2ef1edf01b 976
destinyXfate 2:0e2ef1edf01b 977 "ja mainloop24 \n\t"
destinyXfate 2:0e2ef1edf01b 978
destinyXfate 2:0e2ef1edf01b 979 "mainloop24end: \n\t"
destinyXfate 2:0e2ef1edf01b 980 // preload "movl diff, %%ecx \n\t" // (diff is in eax)
destinyXfate 2:0e2ef1edf01b 981 "movl %%eax, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 982 "cmpl $0, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 983 "jz end24 \n\t"
destinyXfate 2:0e2ef1edf01b 984 // preload "movl mask, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 985 "sall $24, %%edx \n\t" // move low byte of mask into high byte
destinyXfate 2:0e2ef1edf01b 986
destinyXfate 2:0e2ef1edf01b 987 "secondloop24: \n\t"
destinyXfate 2:0e2ef1edf01b 988 "sall %%edx \n\t" // move high bit to CF
destinyXfate 2:0e2ef1edf01b 989 "jnc skip24 \n\t" // if CF = 0
destinyXfate 2:0e2ef1edf01b 990 "movw (%%esi), %%ax \n\t"
destinyXfate 2:0e2ef1edf01b 991 "movw %%ax, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 992 "xorl %%eax, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 993 "movb 2(%%esi), %%al \n\t"
destinyXfate 2:0e2ef1edf01b 994 "movb %%al, 2(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 995
destinyXfate 2:0e2ef1edf01b 996 "skip24: \n\t"
destinyXfate 2:0e2ef1edf01b 997 "addl $3, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 998 "addl $3, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 999 "decl %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 1000 "jnz secondloop24 \n\t"
destinyXfate 2:0e2ef1edf01b 1001
destinyXfate 2:0e2ef1edf01b 1002 "end24: \n\t"
destinyXfate 2:0e2ef1edf01b 1003 "EMMS \n\t" // DONE
destinyXfate 2:0e2ef1edf01b 1004
destinyXfate 2:0e2ef1edf01b 1005 : "=a" (dummy_value_a), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 1006 "=d" (dummy_value_d),
destinyXfate 2:0e2ef1edf01b 1007 "=c" (dummy_value_c),
destinyXfate 2:0e2ef1edf01b 1008 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 1009 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 1010
destinyXfate 2:0e2ef1edf01b 1011 : "3" (srcptr), // esi // input regs
destinyXfate 2:0e2ef1edf01b 1012 "4" (dstptr), // edi
destinyXfate 2:0e2ef1edf01b 1013 "0" (diff), // eax
destinyXfate 2:0e2ef1edf01b 1014 // was (unmask) "b" RESERVED // ebx // Global Offset Table idx
destinyXfate 2:0e2ef1edf01b 1015 "2" (len), // ecx
destinyXfate 2:0e2ef1edf01b 1016 "1" (mask) // edx
destinyXfate 2:0e2ef1edf01b 1017
destinyXfate 2:0e2ef1edf01b 1018 #if 0 /* MMX regs (%mm0, etc.) not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 1019 : "%mm0", "%mm1", "%mm2" // clobber list
destinyXfate 2:0e2ef1edf01b 1020 , "%mm4", "%mm5", "%mm6", "%mm7"
destinyXfate 2:0e2ef1edf01b 1021 #endif
destinyXfate 2:0e2ef1edf01b 1022 );
destinyXfate 2:0e2ef1edf01b 1023 }
destinyXfate 2:0e2ef1edf01b 1024 else /* MMX not supported: use modified C routine */
destinyXfate 2:0e2ef1edf01b 1025 #endif /* PNG_MMX_CODE_SUPPORTED && PNG_THREAD_UNSAFE_OK */
destinyXfate 2:0e2ef1edf01b 1026 {
destinyXfate 2:0e2ef1edf01b 1027 register png_uint_32 i;
destinyXfate 2:0e2ef1edf01b 1028 png_uint_32 initial_val = BPP3 * png_pass_start[png_ptr->pass];
destinyXfate 2:0e2ef1edf01b 1029 /* png.c: png_pass_start[] = {0, 4, 0, 2, 0, 1, 0}; */
destinyXfate 2:0e2ef1edf01b 1030 register int stride = BPP3 * png_pass_inc[png_ptr->pass];
destinyXfate 2:0e2ef1edf01b 1031 /* png.c: png_pass_inc[] = {8, 8, 4, 4, 2, 2, 1}; */
destinyXfate 2:0e2ef1edf01b 1032 register int rep_bytes = BPP3 * png_pass_width[png_ptr->pass];
destinyXfate 2:0e2ef1edf01b 1033 /* png.c: png_pass_width[] = {8, 4, 4, 2, 2, 1, 1}; */
destinyXfate 2:0e2ef1edf01b 1034 png_uint_32 len = png_ptr->width &~7; /* reduce to mult. of 8 */
destinyXfate 2:0e2ef1edf01b 1035 int diff = (int) (png_ptr->width & 7); /* amount lost */
destinyXfate 2:0e2ef1edf01b 1036 register png_uint_32 final_val = BPP3 * len; /* GRR bugfix */
destinyXfate 2:0e2ef1edf01b 1037
destinyXfate 2:0e2ef1edf01b 1038 srcptr = png_ptr->row_buf + 1 + initial_val;
destinyXfate 2:0e2ef1edf01b 1039 dstptr = row + initial_val;
destinyXfate 2:0e2ef1edf01b 1040
destinyXfate 2:0e2ef1edf01b 1041 for (i = initial_val; i < final_val; i += stride)
destinyXfate 2:0e2ef1edf01b 1042 {
destinyXfate 2:0e2ef1edf01b 1043 png_memcpy(dstptr, srcptr, rep_bytes);
destinyXfate 2:0e2ef1edf01b 1044 srcptr += stride;
destinyXfate 2:0e2ef1edf01b 1045 dstptr += stride;
destinyXfate 2:0e2ef1edf01b 1046 }
destinyXfate 2:0e2ef1edf01b 1047 if (diff) /* number of leftover pixels: 3 for pngtest */
destinyXfate 2:0e2ef1edf01b 1048 {
destinyXfate 2:0e2ef1edf01b 1049 final_val+=diff*BPP3;
destinyXfate 2:0e2ef1edf01b 1050 for (; i < final_val; i += stride)
destinyXfate 2:0e2ef1edf01b 1051 {
destinyXfate 2:0e2ef1edf01b 1052 if (rep_bytes > (int)(final_val-i))
destinyXfate 2:0e2ef1edf01b 1053 rep_bytes = (int)(final_val-i);
destinyXfate 2:0e2ef1edf01b 1054 png_memcpy(dstptr, srcptr, rep_bytes);
destinyXfate 2:0e2ef1edf01b 1055 srcptr += stride;
destinyXfate 2:0e2ef1edf01b 1056 dstptr += stride;
destinyXfate 2:0e2ef1edf01b 1057 }
destinyXfate 2:0e2ef1edf01b 1058 }
destinyXfate 2:0e2ef1edf01b 1059 } /* end of else (_mmx_supported) */
destinyXfate 2:0e2ef1edf01b 1060
destinyXfate 2:0e2ef1edf01b 1061 break;
destinyXfate 2:0e2ef1edf01b 1062 } /* end 24 bpp */
destinyXfate 2:0e2ef1edf01b 1063
destinyXfate 2:0e2ef1edf01b 1064 case 32: /* png_ptr->row_info.pixel_depth */
destinyXfate 2:0e2ef1edf01b 1065 {
destinyXfate 2:0e2ef1edf01b 1066 png_bytep srcptr;
destinyXfate 2:0e2ef1edf01b 1067 png_bytep dstptr;
destinyXfate 2:0e2ef1edf01b 1068
destinyXfate 2:0e2ef1edf01b 1069 #if defined(PNG_MMX_CODE_SUPPORTED) && defined(PNG_THREAD_UNSAFE_OK)
destinyXfate 2:0e2ef1edf01b 1070 #if !defined(PNG_1_0_X)
destinyXfate 2:0e2ef1edf01b 1071 if ((png_ptr->asm_flags & PNG_ASM_FLAG_MMX_READ_COMBINE_ROW)
destinyXfate 2:0e2ef1edf01b 1072 /* && _mmx_supported */ )
destinyXfate 2:0e2ef1edf01b 1073 #else
destinyXfate 2:0e2ef1edf01b 1074 if (_mmx_supported)
destinyXfate 2:0e2ef1edf01b 1075 #endif
destinyXfate 2:0e2ef1edf01b 1076 {
destinyXfate 2:0e2ef1edf01b 1077 png_uint_32 len;
destinyXfate 2:0e2ef1edf01b 1078 int diff;
destinyXfate 2:0e2ef1edf01b 1079 int dummy_value_a; // fix 'forbidden register spilled' error
destinyXfate 2:0e2ef1edf01b 1080 int dummy_value_d;
destinyXfate 2:0e2ef1edf01b 1081 int dummy_value_c;
destinyXfate 2:0e2ef1edf01b 1082 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 1083 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 1084 _unmask = ~mask; // global variable for -fPIC version
destinyXfate 2:0e2ef1edf01b 1085 srcptr = png_ptr->row_buf + 1;
destinyXfate 2:0e2ef1edf01b 1086 dstptr = row;
destinyXfate 2:0e2ef1edf01b 1087 len = png_ptr->width &~7; // reduce to multiple of 8
destinyXfate 2:0e2ef1edf01b 1088 diff = (int) (png_ptr->width & 7); // leftover pixels (width mod 8)
destinyXfate 2:0e2ef1edf01b 1089
destinyXfate 2:0e2ef1edf01b 1090 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 1091 "movd _unmask, %%mm7 \n\t" // load bit pattern
destinyXfate 2:0e2ef1edf01b 1092 "psubb %%mm6, %%mm6 \n\t" // zero mm6
destinyXfate 2:0e2ef1edf01b 1093 "punpcklbw %%mm7, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1094 "punpcklwd %%mm7, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1095 "punpckldq %%mm7, %%mm7 \n\t" // fill reg with 8 masks
destinyXfate 2:0e2ef1edf01b 1096
destinyXfate 2:0e2ef1edf01b 1097 "movq _mask32_0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 1098 "movq _mask32_1, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 1099 "movq _mask32_2, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 1100 "movq _mask32_3, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 1101
destinyXfate 2:0e2ef1edf01b 1102 "pand %%mm7, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 1103 "pand %%mm7, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 1104 "pand %%mm7, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 1105 "pand %%mm7, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 1106
destinyXfate 2:0e2ef1edf01b 1107 "pcmpeqb %%mm6, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 1108 "pcmpeqb %%mm6, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 1109 "pcmpeqb %%mm6, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 1110 "pcmpeqb %%mm6, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 1111
destinyXfate 2:0e2ef1edf01b 1112 // preload "movl len, %%ecx \n\t" // load length of line
destinyXfate 2:0e2ef1edf01b 1113 // preload "movl srcptr, %%esi \n\t" // load source
destinyXfate 2:0e2ef1edf01b 1114 // preload "movl dstptr, %%edi \n\t" // load dest
destinyXfate 2:0e2ef1edf01b 1115
destinyXfate 2:0e2ef1edf01b 1116 "cmpl $0, %%ecx \n\t" // lcr: skip main loop if len == 0
destinyXfate 2:0e2ef1edf01b 1117 "jz mainloop32end \n\t"
destinyXfate 2:0e2ef1edf01b 1118
destinyXfate 2:0e2ef1edf01b 1119 "mainloop32: \n\t"
destinyXfate 2:0e2ef1edf01b 1120 "movq (%%esi), %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 1121 "pand %%mm0, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 1122 "movq %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1123 "movq (%%edi), %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1124 "pandn %%mm7, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1125 "por %%mm6, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 1126 "movq %%mm4, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1127
destinyXfate 2:0e2ef1edf01b 1128 "movq 8(%%esi), %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 1129 "pand %%mm1, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 1130 "movq %%mm1, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1131 "movq 8(%%edi), %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1132 "pandn %%mm6, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1133 "por %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 1134 "movq %%mm5, 8(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1135
destinyXfate 2:0e2ef1edf01b 1136 "movq 16(%%esi), %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1137 "pand %%mm2, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1138 "movq %%mm2, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 1139 "movq 16(%%edi), %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1140 "pandn %%mm7, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 1141 "por %%mm4, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1142 "movq %%mm6, 16(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1143
destinyXfate 2:0e2ef1edf01b 1144 "movq 24(%%esi), %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1145 "pand %%mm3, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1146 "movq %%mm3, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 1147 "movq 24(%%edi), %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 1148 "pandn %%mm4, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 1149 "por %%mm5, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1150 "movq %%mm7, 24(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1151
destinyXfate 2:0e2ef1edf01b 1152 "addl $32, %%esi \n\t" // inc by 32 bytes processed
destinyXfate 2:0e2ef1edf01b 1153 "addl $32, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 1154 "subl $8, %%ecx \n\t" // dec by 8 pixels processed
destinyXfate 2:0e2ef1edf01b 1155 "ja mainloop32 \n\t"
destinyXfate 2:0e2ef1edf01b 1156
destinyXfate 2:0e2ef1edf01b 1157 "mainloop32end: \n\t"
destinyXfate 2:0e2ef1edf01b 1158 // preload "movl diff, %%ecx \n\t" // (diff is in eax)
destinyXfate 2:0e2ef1edf01b 1159 "movl %%eax, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 1160 "cmpl $0, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 1161 "jz end32 \n\t"
destinyXfate 2:0e2ef1edf01b 1162 // preload "movl mask, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 1163 "sall $24, %%edx \n\t" // low byte => high byte
destinyXfate 2:0e2ef1edf01b 1164
destinyXfate 2:0e2ef1edf01b 1165 "secondloop32: \n\t"
destinyXfate 2:0e2ef1edf01b 1166 "sall %%edx \n\t" // move high bit to CF
destinyXfate 2:0e2ef1edf01b 1167 "jnc skip32 \n\t" // if CF = 0
destinyXfate 2:0e2ef1edf01b 1168 "movl (%%esi), %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 1169 "movl %%eax, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1170
destinyXfate 2:0e2ef1edf01b 1171 "skip32: \n\t"
destinyXfate 2:0e2ef1edf01b 1172 "addl $4, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 1173 "addl $4, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 1174 "decl %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 1175 "jnz secondloop32 \n\t"
destinyXfate 2:0e2ef1edf01b 1176
destinyXfate 2:0e2ef1edf01b 1177 "end32: \n\t"
destinyXfate 2:0e2ef1edf01b 1178 "EMMS \n\t" // DONE
destinyXfate 2:0e2ef1edf01b 1179
destinyXfate 2:0e2ef1edf01b 1180 : "=a" (dummy_value_a), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 1181 "=d" (dummy_value_d),
destinyXfate 2:0e2ef1edf01b 1182 "=c" (dummy_value_c),
destinyXfate 2:0e2ef1edf01b 1183 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 1184 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 1185
destinyXfate 2:0e2ef1edf01b 1186 : "3" (srcptr), // esi // input regs
destinyXfate 2:0e2ef1edf01b 1187 "4" (dstptr), // edi
destinyXfate 2:0e2ef1edf01b 1188 "0" (diff), // eax
destinyXfate 2:0e2ef1edf01b 1189 // was (unmask) "b" RESERVED // ebx // Global Offset Table idx
destinyXfate 2:0e2ef1edf01b 1190 "2" (len), // ecx
destinyXfate 2:0e2ef1edf01b 1191 "1" (mask) // edx
destinyXfate 2:0e2ef1edf01b 1192
destinyXfate 2:0e2ef1edf01b 1193 #if 0 /* MMX regs (%mm0, etc.) not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 1194 : "%mm0", "%mm1", "%mm2", "%mm3" // clobber list
destinyXfate 2:0e2ef1edf01b 1195 , "%mm4", "%mm5", "%mm6", "%mm7"
destinyXfate 2:0e2ef1edf01b 1196 #endif
destinyXfate 2:0e2ef1edf01b 1197 );
destinyXfate 2:0e2ef1edf01b 1198 }
destinyXfate 2:0e2ef1edf01b 1199 else /* MMX not supported: use modified C routine */
destinyXfate 2:0e2ef1edf01b 1200 #endif /* PNG_MMX_CODE_SUPPORTED */
destinyXfate 2:0e2ef1edf01b 1201 {
destinyXfate 2:0e2ef1edf01b 1202 register png_uint_32 i;
destinyXfate 2:0e2ef1edf01b 1203 png_uint_32 initial_val = BPP4 * png_pass_start[png_ptr->pass];
destinyXfate 2:0e2ef1edf01b 1204 /* png.c: png_pass_start[] = {0, 4, 0, 2, 0, 1, 0}; */
destinyXfate 2:0e2ef1edf01b 1205 register int stride = BPP4 * png_pass_inc[png_ptr->pass];
destinyXfate 2:0e2ef1edf01b 1206 /* png.c: png_pass_inc[] = {8, 8, 4, 4, 2, 2, 1}; */
destinyXfate 2:0e2ef1edf01b 1207 register int rep_bytes = BPP4 * png_pass_width[png_ptr->pass];
destinyXfate 2:0e2ef1edf01b 1208 /* png.c: png_pass_width[] = {8, 4, 4, 2, 2, 1, 1}; */
destinyXfate 2:0e2ef1edf01b 1209 png_uint_32 len = png_ptr->width &~7; /* reduce to mult. of 8 */
destinyXfate 2:0e2ef1edf01b 1210 int diff = (int) (png_ptr->width & 7); /* amount lost */
destinyXfate 2:0e2ef1edf01b 1211 register png_uint_32 final_val = BPP4 * len; /* GRR bugfix */
destinyXfate 2:0e2ef1edf01b 1212
destinyXfate 2:0e2ef1edf01b 1213 srcptr = png_ptr->row_buf + 1 + initial_val;
destinyXfate 2:0e2ef1edf01b 1214 dstptr = row + initial_val;
destinyXfate 2:0e2ef1edf01b 1215
destinyXfate 2:0e2ef1edf01b 1216 for (i = initial_val; i < final_val; i += stride)
destinyXfate 2:0e2ef1edf01b 1217 {
destinyXfate 2:0e2ef1edf01b 1218 png_memcpy(dstptr, srcptr, rep_bytes);
destinyXfate 2:0e2ef1edf01b 1219 srcptr += stride;
destinyXfate 2:0e2ef1edf01b 1220 dstptr += stride;
destinyXfate 2:0e2ef1edf01b 1221 }
destinyXfate 2:0e2ef1edf01b 1222 if (diff) /* number of leftover pixels: 3 for pngtest */
destinyXfate 2:0e2ef1edf01b 1223 {
destinyXfate 2:0e2ef1edf01b 1224 final_val+=diff*BPP4;
destinyXfate 2:0e2ef1edf01b 1225 for (; i < final_val; i += stride)
destinyXfate 2:0e2ef1edf01b 1226 {
destinyXfate 2:0e2ef1edf01b 1227 if (rep_bytes > (int)(final_val-i))
destinyXfate 2:0e2ef1edf01b 1228 rep_bytes = (int)(final_val-i);
destinyXfate 2:0e2ef1edf01b 1229 png_memcpy(dstptr, srcptr, rep_bytes);
destinyXfate 2:0e2ef1edf01b 1230 srcptr += stride;
destinyXfate 2:0e2ef1edf01b 1231 dstptr += stride;
destinyXfate 2:0e2ef1edf01b 1232 }
destinyXfate 2:0e2ef1edf01b 1233 }
destinyXfate 2:0e2ef1edf01b 1234 } /* end of else (_mmx_supported) */
destinyXfate 2:0e2ef1edf01b 1235
destinyXfate 2:0e2ef1edf01b 1236 break;
destinyXfate 2:0e2ef1edf01b 1237 } /* end 32 bpp */
destinyXfate 2:0e2ef1edf01b 1238
destinyXfate 2:0e2ef1edf01b 1239 case 48: /* png_ptr->row_info.pixel_depth */
destinyXfate 2:0e2ef1edf01b 1240 {
destinyXfate 2:0e2ef1edf01b 1241 png_bytep srcptr;
destinyXfate 2:0e2ef1edf01b 1242 png_bytep dstptr;
destinyXfate 2:0e2ef1edf01b 1243
destinyXfate 2:0e2ef1edf01b 1244 #if defined(PNG_MMX_CODE_SUPPORTED) && defined(PNG_THREAD_UNSAFE_OK)
destinyXfate 2:0e2ef1edf01b 1245 #if !defined(PNG_1_0_X)
destinyXfate 2:0e2ef1edf01b 1246 if ((png_ptr->asm_flags & PNG_ASM_FLAG_MMX_READ_COMBINE_ROW)
destinyXfate 2:0e2ef1edf01b 1247 /* && _mmx_supported */ )
destinyXfate 2:0e2ef1edf01b 1248 #else
destinyXfate 2:0e2ef1edf01b 1249 if (_mmx_supported)
destinyXfate 2:0e2ef1edf01b 1250 #endif
destinyXfate 2:0e2ef1edf01b 1251 {
destinyXfate 2:0e2ef1edf01b 1252 png_uint_32 len;
destinyXfate 2:0e2ef1edf01b 1253 int diff;
destinyXfate 2:0e2ef1edf01b 1254 int dummy_value_a; // fix 'forbidden register spilled' error
destinyXfate 2:0e2ef1edf01b 1255 int dummy_value_d;
destinyXfate 2:0e2ef1edf01b 1256 int dummy_value_c;
destinyXfate 2:0e2ef1edf01b 1257 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 1258 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 1259 _unmask = ~mask; // global variable for -fPIC version
destinyXfate 2:0e2ef1edf01b 1260 srcptr = png_ptr->row_buf + 1;
destinyXfate 2:0e2ef1edf01b 1261 dstptr = row;
destinyXfate 2:0e2ef1edf01b 1262 len = png_ptr->width &~7; // reduce to multiple of 8
destinyXfate 2:0e2ef1edf01b 1263 diff = (int) (png_ptr->width & 7); // leftover pixels (width mod 8)
destinyXfate 2:0e2ef1edf01b 1264
destinyXfate 2:0e2ef1edf01b 1265 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 1266 "movd _unmask, %%mm7 \n\t" // load bit pattern
destinyXfate 2:0e2ef1edf01b 1267 "psubb %%mm6, %%mm6 \n\t" // zero mm6
destinyXfate 2:0e2ef1edf01b 1268 "punpcklbw %%mm7, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1269 "punpcklwd %%mm7, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1270 "punpckldq %%mm7, %%mm7 \n\t" // fill reg with 8 masks
destinyXfate 2:0e2ef1edf01b 1271
destinyXfate 2:0e2ef1edf01b 1272 "movq _mask48_0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 1273 "movq _mask48_1, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 1274 "movq _mask48_2, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 1275 "movq _mask48_3, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 1276 "movq _mask48_4, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 1277 "movq _mask48_5, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 1278
destinyXfate 2:0e2ef1edf01b 1279 "pand %%mm7, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 1280 "pand %%mm7, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 1281 "pand %%mm7, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 1282 "pand %%mm7, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 1283 "pand %%mm7, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 1284 "pand %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 1285
destinyXfate 2:0e2ef1edf01b 1286 "pcmpeqb %%mm6, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 1287 "pcmpeqb %%mm6, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 1288 "pcmpeqb %%mm6, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 1289 "pcmpeqb %%mm6, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 1290 "pcmpeqb %%mm6, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 1291 "pcmpeqb %%mm6, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 1292
destinyXfate 2:0e2ef1edf01b 1293 // preload "movl len, %%ecx \n\t" // load length of line
destinyXfate 2:0e2ef1edf01b 1294 // preload "movl srcptr, %%esi \n\t" // load source
destinyXfate 2:0e2ef1edf01b 1295 // preload "movl dstptr, %%edi \n\t" // load dest
destinyXfate 2:0e2ef1edf01b 1296
destinyXfate 2:0e2ef1edf01b 1297 "cmpl $0, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 1298 "jz mainloop48end \n\t"
destinyXfate 2:0e2ef1edf01b 1299
destinyXfate 2:0e2ef1edf01b 1300 "mainloop48: \n\t"
destinyXfate 2:0e2ef1edf01b 1301 "movq (%%esi), %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1302 "pand %%mm0, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1303 "movq %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1304 "pandn (%%edi), %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1305 "por %%mm6, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1306 "movq %%mm7, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1307
destinyXfate 2:0e2ef1edf01b 1308 "movq 8(%%esi), %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1309 "pand %%mm1, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1310 "movq %%mm1, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1311 "pandn 8(%%edi), %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1312 "por %%mm7, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1313 "movq %%mm6, 8(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1314
destinyXfate 2:0e2ef1edf01b 1315 "movq 16(%%esi), %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1316 "pand %%mm2, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1317 "movq %%mm2, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1318 "pandn 16(%%edi), %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1319 "por %%mm7, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1320 "movq %%mm6, 16(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1321
destinyXfate 2:0e2ef1edf01b 1322 "movq 24(%%esi), %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1323 "pand %%mm3, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1324 "movq %%mm3, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1325 "pandn 24(%%edi), %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1326 "por %%mm6, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1327 "movq %%mm7, 24(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1328
destinyXfate 2:0e2ef1edf01b 1329 "movq 32(%%esi), %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1330 "pand %%mm4, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1331 "movq %%mm4, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1332 "pandn 32(%%edi), %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1333 "por %%mm7, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1334 "movq %%mm6, 32(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1335
destinyXfate 2:0e2ef1edf01b 1336 "movq 40(%%esi), %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1337 "pand %%mm5, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1338 "movq %%mm5, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1339 "pandn 40(%%edi), %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 1340 "por %%mm6, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 1341 "movq %%mm7, 40(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1342
destinyXfate 2:0e2ef1edf01b 1343 "addl $48, %%esi \n\t" // inc by 48 bytes processed
destinyXfate 2:0e2ef1edf01b 1344 "addl $48, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 1345 "subl $8, %%ecx \n\t" // dec by 8 pixels processed
destinyXfate 2:0e2ef1edf01b 1346
destinyXfate 2:0e2ef1edf01b 1347 "ja mainloop48 \n\t"
destinyXfate 2:0e2ef1edf01b 1348
destinyXfate 2:0e2ef1edf01b 1349 "mainloop48end: \n\t"
destinyXfate 2:0e2ef1edf01b 1350 // preload "movl diff, %%ecx \n\t" // (diff is in eax)
destinyXfate 2:0e2ef1edf01b 1351 "movl %%eax, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 1352 "cmpl $0, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 1353 "jz end48 \n\t"
destinyXfate 2:0e2ef1edf01b 1354 // preload "movl mask, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 1355 "sall $24, %%edx \n\t" // low byte => high byte
destinyXfate 2:0e2ef1edf01b 1356
destinyXfate 2:0e2ef1edf01b 1357 "secondloop48: \n\t"
destinyXfate 2:0e2ef1edf01b 1358 "sall %%edx \n\t" // move high bit to CF
destinyXfate 2:0e2ef1edf01b 1359 "jnc skip48 \n\t" // if CF = 0
destinyXfate 2:0e2ef1edf01b 1360 "movl (%%esi), %%eax \n\t" // copy bytes 0-3 of the 6-byte pixel
destinyXfate 2:0e2ef1edf01b 1361 "movl %%eax, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1362 "movw 4(%%esi), %%ax \n\t" "movw %%ax, 4(%%edi) \n\t" // copy bytes 4-5
destinyXfate 2:0e2ef1edf01b 1363 "skip48: \n\t"
destinyXfate 2:0e2ef1edf01b 1364 "addl $6, %%esi \n\t" // advance by one 48-bit pixel (was 4: bug)
destinyXfate 2:0e2ef1edf01b 1365 "addl $6, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 1366 "decl %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 1367 "jnz secondloop48 \n\t"
destinyXfate 2:0e2ef1edf01b 1368
destinyXfate 2:0e2ef1edf01b 1369 "end48: \n\t"
destinyXfate 2:0e2ef1edf01b 1370 "EMMS \n\t" // DONE
destinyXfate 2:0e2ef1edf01b 1371
destinyXfate 2:0e2ef1edf01b 1372 : "=a" (dummy_value_a), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 1373 "=d" (dummy_value_d),
destinyXfate 2:0e2ef1edf01b 1374 "=c" (dummy_value_c),
destinyXfate 2:0e2ef1edf01b 1375 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 1376 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 1377
destinyXfate 2:0e2ef1edf01b 1378 : "3" (srcptr), // esi // input regs
destinyXfate 2:0e2ef1edf01b 1379 "4" (dstptr), // edi
destinyXfate 2:0e2ef1edf01b 1380 "0" (diff), // eax
destinyXfate 2:0e2ef1edf01b 1381 // was (unmask) "b" RESERVED // ebx // Global Offset Table idx
destinyXfate 2:0e2ef1edf01b 1382 "2" (len), // ecx
destinyXfate 2:0e2ef1edf01b 1383 "1" (mask) // edx
destinyXfate 2:0e2ef1edf01b 1384
destinyXfate 2:0e2ef1edf01b 1385 #if 0 /* MMX regs (%mm0, etc.) not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 1386 : "%mm0", "%mm1", "%mm2", "%mm3" // clobber list
destinyXfate 2:0e2ef1edf01b 1387 , "%mm4", "%mm5", "%mm6", "%mm7"
destinyXfate 2:0e2ef1edf01b 1388 #endif
destinyXfate 2:0e2ef1edf01b 1389 );
destinyXfate 2:0e2ef1edf01b 1390 }
destinyXfate 2:0e2ef1edf01b 1391 else /* MMX not supported: use modified C routine */
destinyXfate 2:0e2ef1edf01b 1392 #endif /* PNG_MMX_CODE_SUPPORTED */
destinyXfate 2:0e2ef1edf01b 1393 {
destinyXfate 2:0e2ef1edf01b 1394 register png_uint_32 i;
destinyXfate 2:0e2ef1edf01b 1395 png_uint_32 initial_val = BPP6 * png_pass_start[png_ptr->pass];
destinyXfate 2:0e2ef1edf01b 1396 /* png.c: png_pass_start[] = {0, 4, 0, 2, 0, 1, 0}; */
destinyXfate 2:0e2ef1edf01b 1397 register int stride = BPP6 * png_pass_inc[png_ptr->pass];
destinyXfate 2:0e2ef1edf01b 1398 /* png.c: png_pass_inc[] = {8, 8, 4, 4, 2, 2, 1}; */
destinyXfate 2:0e2ef1edf01b 1399 register int rep_bytes = BPP6 * png_pass_width[png_ptr->pass];
destinyXfate 2:0e2ef1edf01b 1400 /* png.c: png_pass_width[] = {8, 4, 4, 2, 2, 1, 1}; */
destinyXfate 2:0e2ef1edf01b 1401 png_uint_32 len = png_ptr->width &~7; /* reduce to mult. of 8 */
destinyXfate 2:0e2ef1edf01b 1402 int diff = (int) (png_ptr->width & 7); /* amount lost */
destinyXfate 2:0e2ef1edf01b 1403 register png_uint_32 final_val = BPP6 * len; /* GRR bugfix */
destinyXfate 2:0e2ef1edf01b 1404
destinyXfate 2:0e2ef1edf01b 1405 srcptr = png_ptr->row_buf + 1 + initial_val;
destinyXfate 2:0e2ef1edf01b 1406 dstptr = row + initial_val;
destinyXfate 2:0e2ef1edf01b 1407
destinyXfate 2:0e2ef1edf01b 1408 for (i = initial_val; i < final_val; i += stride)
destinyXfate 2:0e2ef1edf01b 1409 {
destinyXfate 2:0e2ef1edf01b 1410 png_memcpy(dstptr, srcptr, rep_bytes);
destinyXfate 2:0e2ef1edf01b 1411 srcptr += stride;
destinyXfate 2:0e2ef1edf01b 1412 dstptr += stride;
destinyXfate 2:0e2ef1edf01b 1413 }
destinyXfate 2:0e2ef1edf01b 1414 if (diff) /* number of leftover pixels: 3 for pngtest */
destinyXfate 2:0e2ef1edf01b 1415 {
destinyXfate 2:0e2ef1edf01b 1416 final_val+=diff*BPP6;
destinyXfate 2:0e2ef1edf01b 1417 for (; i < final_val; i += stride)
destinyXfate 2:0e2ef1edf01b 1418 {
destinyXfate 2:0e2ef1edf01b 1419 if (rep_bytes > (int)(final_val-i))
destinyXfate 2:0e2ef1edf01b 1420 rep_bytes = (int)(final_val-i);
destinyXfate 2:0e2ef1edf01b 1421 png_memcpy(dstptr, srcptr, rep_bytes);
destinyXfate 2:0e2ef1edf01b 1422 srcptr += stride;
destinyXfate 2:0e2ef1edf01b 1423 dstptr += stride;
destinyXfate 2:0e2ef1edf01b 1424 }
destinyXfate 2:0e2ef1edf01b 1425 }
destinyXfate 2:0e2ef1edf01b 1426 } /* end of else (_mmx_supported) */
destinyXfate 2:0e2ef1edf01b 1427
destinyXfate 2:0e2ef1edf01b 1428 break;
destinyXfate 2:0e2ef1edf01b 1429 } /* end 48 bpp */
destinyXfate 2:0e2ef1edf01b 1430
destinyXfate 2:0e2ef1edf01b 1431 case 64: /* png_ptr->row_info.pixel_depth */
destinyXfate 2:0e2ef1edf01b 1432 {
destinyXfate 2:0e2ef1edf01b 1433 png_bytep srcptr;
destinyXfate 2:0e2ef1edf01b 1434 png_bytep dstptr;
destinyXfate 2:0e2ef1edf01b 1435 register png_uint_32 i;
destinyXfate 2:0e2ef1edf01b 1436 png_uint_32 initial_val = BPP8 * png_pass_start[png_ptr->pass];
destinyXfate 2:0e2ef1edf01b 1437 /* png.c: png_pass_start[] = {0, 4, 0, 2, 0, 1, 0}; */
destinyXfate 2:0e2ef1edf01b 1438 register int stride = BPP8 * png_pass_inc[png_ptr->pass];
destinyXfate 2:0e2ef1edf01b 1439 /* png.c: png_pass_inc[] = {8, 8, 4, 4, 2, 2, 1}; */
destinyXfate 2:0e2ef1edf01b 1440 register int rep_bytes = BPP8 * png_pass_width[png_ptr->pass];
destinyXfate 2:0e2ef1edf01b 1441 /* png.c: png_pass_width[] = {8, 4, 4, 2, 2, 1, 1}; */
destinyXfate 2:0e2ef1edf01b 1442 png_uint_32 len = png_ptr->width &~7; /* reduce to mult. of 8 */
destinyXfate 2:0e2ef1edf01b 1443 int diff = (int) (png_ptr->width & 7); /* amount lost */
destinyXfate 2:0e2ef1edf01b 1444 register png_uint_32 final_val = BPP8 * len; /* GRR bugfix */
destinyXfate 2:0e2ef1edf01b 1445
destinyXfate 2:0e2ef1edf01b 1446 srcptr = png_ptr->row_buf + 1 + initial_val;
destinyXfate 2:0e2ef1edf01b 1447 dstptr = row + initial_val;
destinyXfate 2:0e2ef1edf01b 1448
destinyXfate 2:0e2ef1edf01b 1449 for (i = initial_val; i < final_val; i += stride)
destinyXfate 2:0e2ef1edf01b 1450 {
destinyXfate 2:0e2ef1edf01b 1451 png_memcpy(dstptr, srcptr, rep_bytes);
destinyXfate 2:0e2ef1edf01b 1452 srcptr += stride;
destinyXfate 2:0e2ef1edf01b 1453 dstptr += stride;
destinyXfate 2:0e2ef1edf01b 1454 }
destinyXfate 2:0e2ef1edf01b 1455 if (diff) /* number of leftover pixels: 3 for pngtest */
destinyXfate 2:0e2ef1edf01b 1456 {
destinyXfate 2:0e2ef1edf01b 1457 final_val+=diff*BPP8;
destinyXfate 2:0e2ef1edf01b 1458 for (; i < final_val; i += stride)
destinyXfate 2:0e2ef1edf01b 1459 {
destinyXfate 2:0e2ef1edf01b 1460 if (rep_bytes > (int)(final_val-i))
destinyXfate 2:0e2ef1edf01b 1461 rep_bytes = (int)(final_val-i);
destinyXfate 2:0e2ef1edf01b 1462 png_memcpy(dstptr, srcptr, rep_bytes);
destinyXfate 2:0e2ef1edf01b 1463 srcptr += stride;
destinyXfate 2:0e2ef1edf01b 1464 dstptr += stride;
destinyXfate 2:0e2ef1edf01b 1465 }
destinyXfate 2:0e2ef1edf01b 1466 }
destinyXfate 2:0e2ef1edf01b 1467
destinyXfate 2:0e2ef1edf01b 1468 break;
destinyXfate 2:0e2ef1edf01b 1469 } /* end 64 bpp */
destinyXfate 2:0e2ef1edf01b 1470
destinyXfate 2:0e2ef1edf01b 1471 default: /* png_ptr->row_info.pixel_depth != 1,2,4,8,16,24,32,48,64 */
destinyXfate 2:0e2ef1edf01b 1472 {
destinyXfate 2:0e2ef1edf01b 1473 /* this should never happen */
destinyXfate 2:0e2ef1edf01b 1474 png_warning(png_ptr, "Invalid row_info.pixel_depth in pnggccrd");
destinyXfate 2:0e2ef1edf01b 1475 break;
destinyXfate 2:0e2ef1edf01b 1476 }
destinyXfate 2:0e2ef1edf01b 1477 } /* end switch (png_ptr->row_info.pixel_depth) */
destinyXfate 2:0e2ef1edf01b 1478
destinyXfate 2:0e2ef1edf01b 1479 } /* end if (non-trivial mask) */
destinyXfate 2:0e2ef1edf01b 1480
destinyXfate 2:0e2ef1edf01b 1481 } /* end png_combine_row() */
destinyXfate 2:0e2ef1edf01b 1482
destinyXfate 2:0e2ef1edf01b 1483 #endif /* PNG_HAVE_MMX_COMBINE_ROW */
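The MMX paths in png_combine_row() above build per-byte select masks from the 8-bit interlace mask and then merge source and destination with pand/pandn/por. The same effect can be sketched in portable C. This is an illustrative sketch only, not libpng code: the function name combine_row_masked and its parameters are hypothetical, and it assumes byte-aligned pixels and that bit 0x80 of the mask selects the first pixel of each group of eight (as the "sall"/"jnc" leftover loops above imply).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libpng): copy only the pixels selected
 * by an 8-bit mask from src to dst.  Bit 0x80 selects pixel 0 of every
 * group of eight, 0x40 selects pixel 1, and so on down to 0x01. */
void combine_row_masked(unsigned char *dst, const unsigned char *src,
                        size_t width, size_t pixel_bytes, unsigned int mask)
{
    unsigned int m = 0x80;   /* selector bit for the current pixel */
    size_t i;

    for (i = 0; i < width; i++)
    {
        if (m & mask)                      /* pixel selected: overwrite dst */
            memcpy(dst, src, pixel_bytes);
        src += pixel_bytes;                /* always advance both pointers */
        dst += pixel_bytes;
        m = (m == 1) ? 0x80 : (m >> 1);    /* wrap after eight pixels */
    }
}
```

Calling it with pixel_bytes set to 3, 4, 6 or 8 corresponds to the 24, 32, 48 and 64 bpp cases; unlike the assembler, it needs no separate loop for the width & 7 leftover pixels.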
destinyXfate 2:0e2ef1edf01b 1484
destinyXfate 2:0e2ef1edf01b 1485
destinyXfate 2:0e2ef1edf01b 1486
destinyXfate 2:0e2ef1edf01b 1487
destinyXfate 2:0e2ef1edf01b 1488 /*===========================================================================*/
destinyXfate 2:0e2ef1edf01b 1489 /* */
destinyXfate 2:0e2ef1edf01b 1490 /* P N G _ D O _ R E A D _ I N T E R L A C E */
destinyXfate 2:0e2ef1edf01b 1491 /* */
destinyXfate 2:0e2ef1edf01b 1492 /*===========================================================================*/
destinyXfate 2:0e2ef1edf01b 1493
destinyXfate 2:0e2ef1edf01b 1494 #if defined(PNG_READ_INTERLACING_SUPPORTED)
destinyXfate 2:0e2ef1edf01b 1495 #if defined(PNG_HAVE_MMX_READ_INTERLACE)
destinyXfate 2:0e2ef1edf01b 1496
destinyXfate 2:0e2ef1edf01b 1497 /* png_do_read_interlace() is called after any 16-bit to 8-bit conversion
destinyXfate 2:0e2ef1edf01b 1498 * has taken place. [GRR: what other steps come before and/or after?]
destinyXfate 2:0e2ef1edf01b 1499 */
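As the function below shows, png_do_read_interlace() widens a row in place to final_width = width * png_pass_inc[pass] by repeating each source pixel, working from the end of the row toward the start so unread source pixels are not overwritten. A minimal sketch of that idea for byte-aligned pixels follows. The function name replicate_pixels_in_place and its parameters are hypothetical, and the sketch assumes the buffer already has room for width * inc pixels of at most 8 bytes each.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libpng): expand a row of `width` pixels
 * in place so that every pixel appears `inc` times in a row.  The buffer
 * must hold at least width * inc * pixel_bytes bytes. */
void replicate_pixels_in_place(unsigned char *row, size_t width,
                               size_t pixel_bytes, size_t inc)
{
    size_t i, j;

    if (inc == 0 || pixel_bytes == 0 || pixel_bytes > 8)
        return;                            /* outside the sketch's assumptions */

    /* Walk backwards: the destination of pixel i starts at index i * inc,
     * which is never below i, so pixels 0..i-1 are still intact. */
    for (i = width; i-- > 0; )
    {
        unsigned char v[8];                /* temporary copy of one pixel */

        memcpy(v, row + i * pixel_bytes, pixel_bytes);
        for (j = 0; j < inc; j++)
            memcpy(row + (i * inc + j) * pixel_bytes, v, pixel_bytes);
    }
}
```

The sub-byte cases (1, 2 and 4 bits per pixel) follow the same right-to-left order but also have to track a bit shift within each byte, which is what sshift and dshift do in the code below.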
destinyXfate 2:0e2ef1edf01b 1500
destinyXfate 2:0e2ef1edf01b 1501 void /* PRIVATE */
destinyXfate 2:0e2ef1edf01b 1502 png_do_read_interlace(png_structp png_ptr)
destinyXfate 2:0e2ef1edf01b 1503 {
destinyXfate 2:0e2ef1edf01b 1504 png_row_infop row_info = &(png_ptr->row_info);
destinyXfate 2:0e2ef1edf01b 1505 png_bytep row = png_ptr->row_buf + 1;
destinyXfate 2:0e2ef1edf01b 1506 int pass = png_ptr->pass;
destinyXfate 2:0e2ef1edf01b 1507 #if defined(PNG_READ_PACKSWAP_SUPPORTED)
destinyXfate 2:0e2ef1edf01b 1508 png_uint_32 transformations = png_ptr->transformations;
destinyXfate 2:0e2ef1edf01b 1509 #endif
destinyXfate 2:0e2ef1edf01b 1510
destinyXfate 2:0e2ef1edf01b 1511 png_debug(1, "in png_do_read_interlace (pnggccrd.c)\n");
destinyXfate 2:0e2ef1edf01b 1512
destinyXfate 2:0e2ef1edf01b 1513 #if defined(PNG_MMX_CODE_SUPPORTED)
destinyXfate 2:0e2ef1edf01b 1514 if (_mmx_supported == 2) {
destinyXfate 2:0e2ef1edf01b 1515 #if !defined(PNG_1_0_X)
destinyXfate 2:0e2ef1edf01b 1516 /* this should have happened in png_init_mmx_flags() already */
destinyXfate 2:0e2ef1edf01b 1517 png_warning(png_ptr, "asm_flags may not have been initialized");
destinyXfate 2:0e2ef1edf01b 1518 #endif
destinyXfate 2:0e2ef1edf01b 1519 png_mmx_support();
destinyXfate 2:0e2ef1edf01b 1520 }
destinyXfate 2:0e2ef1edf01b 1521 #endif
destinyXfate 2:0e2ef1edf01b 1522
destinyXfate 2:0e2ef1edf01b 1523 if (row != NULL && row_info != NULL)
destinyXfate 2:0e2ef1edf01b 1524 {
destinyXfate 2:0e2ef1edf01b 1525 png_uint_32 final_width;
destinyXfate 2:0e2ef1edf01b 1526
destinyXfate 2:0e2ef1edf01b 1527 final_width = row_info->width * png_pass_inc[pass];
destinyXfate 2:0e2ef1edf01b 1528
destinyXfate 2:0e2ef1edf01b 1529 switch (row_info->pixel_depth)
destinyXfate 2:0e2ef1edf01b 1530 {
destinyXfate 2:0e2ef1edf01b 1531 case 1:
destinyXfate 2:0e2ef1edf01b 1532 {
destinyXfate 2:0e2ef1edf01b 1533 png_bytep sp, dp;
destinyXfate 2:0e2ef1edf01b 1534 int sshift, dshift;
destinyXfate 2:0e2ef1edf01b 1535 int s_start, s_end, s_inc;
destinyXfate 2:0e2ef1edf01b 1536 png_byte v;
destinyXfate 2:0e2ef1edf01b 1537 png_uint_32 i;
destinyXfate 2:0e2ef1edf01b 1538 int j;
destinyXfate 2:0e2ef1edf01b 1539
destinyXfate 2:0e2ef1edf01b 1540 sp = row + (png_size_t)((row_info->width - 1) >> 3);
destinyXfate 2:0e2ef1edf01b 1541 dp = row + (png_size_t)((final_width - 1) >> 3);
destinyXfate 2:0e2ef1edf01b 1542 #if defined(PNG_READ_PACKSWAP_SUPPORTED)
destinyXfate 2:0e2ef1edf01b 1543 if (transformations & PNG_PACKSWAP)
destinyXfate 2:0e2ef1edf01b 1544 {
destinyXfate 2:0e2ef1edf01b 1545 sshift = (int)((row_info->width + 7) & 7);
destinyXfate 2:0e2ef1edf01b 1546 dshift = (int)((final_width + 7) & 7);
destinyXfate 2:0e2ef1edf01b 1547 s_start = 7;
destinyXfate 2:0e2ef1edf01b 1548 s_end = 0;
destinyXfate 2:0e2ef1edf01b 1549 s_inc = -1;
destinyXfate 2:0e2ef1edf01b 1550 }
destinyXfate 2:0e2ef1edf01b 1551 else
destinyXfate 2:0e2ef1edf01b 1552 #endif
destinyXfate 2:0e2ef1edf01b 1553 {
destinyXfate 2:0e2ef1edf01b 1554 sshift = 7 - (int)((row_info->width + 7) & 7);
destinyXfate 2:0e2ef1edf01b 1555 dshift = 7 - (int)((final_width + 7) & 7);
destinyXfate 2:0e2ef1edf01b 1556 s_start = 0;
destinyXfate 2:0e2ef1edf01b 1557 s_end = 7;
destinyXfate 2:0e2ef1edf01b 1558 s_inc = 1;
destinyXfate 2:0e2ef1edf01b 1559 }
destinyXfate 2:0e2ef1edf01b 1560
destinyXfate 2:0e2ef1edf01b 1561 for (i = row_info->width; i; i--)
destinyXfate 2:0e2ef1edf01b 1562 {
destinyXfate 2:0e2ef1edf01b 1563 v = (png_byte)((*sp >> sshift) & 0x1);
destinyXfate 2:0e2ef1edf01b 1564 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 1565 {
destinyXfate 2:0e2ef1edf01b 1566 *dp &= (png_byte)((0x7f7f >> (7 - dshift)) & 0xff);
destinyXfate 2:0e2ef1edf01b 1567 *dp |= (png_byte)(v << dshift);
destinyXfate 2:0e2ef1edf01b 1568 if (dshift == s_end)
destinyXfate 2:0e2ef1edf01b 1569 {
destinyXfate 2:0e2ef1edf01b 1570 dshift = s_start;
destinyXfate 2:0e2ef1edf01b 1571 dp--;
destinyXfate 2:0e2ef1edf01b 1572 }
destinyXfate 2:0e2ef1edf01b 1573 else
destinyXfate 2:0e2ef1edf01b 1574 dshift += s_inc;
destinyXfate 2:0e2ef1edf01b 1575 }
destinyXfate 2:0e2ef1edf01b 1576 if (sshift == s_end)
destinyXfate 2:0e2ef1edf01b 1577 {
destinyXfate 2:0e2ef1edf01b 1578 sshift = s_start;
destinyXfate 2:0e2ef1edf01b 1579 sp--;
destinyXfate 2:0e2ef1edf01b 1580 }
destinyXfate 2:0e2ef1edf01b 1581 else
destinyXfate 2:0e2ef1edf01b 1582 sshift += s_inc;
destinyXfate 2:0e2ef1edf01b 1583 }
destinyXfate 2:0e2ef1edf01b 1584 break;
destinyXfate 2:0e2ef1edf01b 1585 }
destinyXfate 2:0e2ef1edf01b 1586
destinyXfate 2:0e2ef1edf01b 1587 case 2:
destinyXfate 2:0e2ef1edf01b 1588 {
destinyXfate 2:0e2ef1edf01b 1589 png_bytep sp, dp;
destinyXfate 2:0e2ef1edf01b 1590 int sshift, dshift;
destinyXfate 2:0e2ef1edf01b 1591 int s_start, s_end, s_inc;
destinyXfate 2:0e2ef1edf01b 1592 png_uint_32 i;
destinyXfate 2:0e2ef1edf01b 1593
destinyXfate 2:0e2ef1edf01b 1594 sp = row + (png_size_t)((row_info->width - 1) >> 2);
destinyXfate 2:0e2ef1edf01b 1595 dp = row + (png_size_t)((final_width - 1) >> 2);
destinyXfate 2:0e2ef1edf01b 1596 #if defined(PNG_READ_PACKSWAP_SUPPORTED)
destinyXfate 2:0e2ef1edf01b 1597 if (transformations & PNG_PACKSWAP)
destinyXfate 2:0e2ef1edf01b 1598 {
destinyXfate 2:0e2ef1edf01b 1599 sshift = (png_size_t)(((row_info->width + 3) & 3) << 1);
destinyXfate 2:0e2ef1edf01b 1600 dshift = (png_size_t)(((final_width + 3) & 3) << 1);
destinyXfate 2:0e2ef1edf01b 1601 s_start = 6;
destinyXfate 2:0e2ef1edf01b 1602 s_end = 0;
destinyXfate 2:0e2ef1edf01b 1603 s_inc = -2;
destinyXfate 2:0e2ef1edf01b 1604 }
destinyXfate 2:0e2ef1edf01b 1605 else
destinyXfate 2:0e2ef1edf01b 1606 #endif
destinyXfate 2:0e2ef1edf01b 1607 {
destinyXfate 2:0e2ef1edf01b 1608 sshift = (png_size_t)((3 - ((row_info->width + 3) & 3)) << 1);
destinyXfate 2:0e2ef1edf01b 1609 dshift = (png_size_t)((3 - ((final_width + 3) & 3)) << 1);
destinyXfate 2:0e2ef1edf01b 1610 s_start = 0;
destinyXfate 2:0e2ef1edf01b 1611 s_end = 6;
destinyXfate 2:0e2ef1edf01b 1612 s_inc = 2;
destinyXfate 2:0e2ef1edf01b 1613 }
destinyXfate 2:0e2ef1edf01b 1614
destinyXfate 2:0e2ef1edf01b 1615 for (i = row_info->width; i; i--)
destinyXfate 2:0e2ef1edf01b 1616 {
destinyXfate 2:0e2ef1edf01b 1617 png_byte v;
destinyXfate 2:0e2ef1edf01b 1618 int j;
destinyXfate 2:0e2ef1edf01b 1619
destinyXfate 2:0e2ef1edf01b 1620 v = (png_byte)((*sp >> sshift) & 0x3);
destinyXfate 2:0e2ef1edf01b 1621 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 1622 {
destinyXfate 2:0e2ef1edf01b 1623 *dp &= (png_byte)((0x3f3f >> (6 - dshift)) & 0xff);
destinyXfate 2:0e2ef1edf01b 1624 *dp |= (png_byte)(v << dshift);
destinyXfate 2:0e2ef1edf01b 1625 if (dshift == s_end)
destinyXfate 2:0e2ef1edf01b 1626 {
destinyXfate 2:0e2ef1edf01b 1627 dshift = s_start;
destinyXfate 2:0e2ef1edf01b 1628 dp--;
destinyXfate 2:0e2ef1edf01b 1629 }
destinyXfate 2:0e2ef1edf01b 1630 else
destinyXfate 2:0e2ef1edf01b 1631 dshift += s_inc;
destinyXfate 2:0e2ef1edf01b 1632 }
destinyXfate 2:0e2ef1edf01b 1633 if (sshift == s_end)
destinyXfate 2:0e2ef1edf01b 1634 {
destinyXfate 2:0e2ef1edf01b 1635 sshift = s_start;
destinyXfate 2:0e2ef1edf01b 1636 sp--;
destinyXfate 2:0e2ef1edf01b 1637 }
destinyXfate 2:0e2ef1edf01b 1638 else
destinyXfate 2:0e2ef1edf01b 1639 sshift += s_inc;
destinyXfate 2:0e2ef1edf01b 1640 }
destinyXfate 2:0e2ef1edf01b 1641 break;
destinyXfate 2:0e2ef1edf01b 1642 }
destinyXfate 2:0e2ef1edf01b 1643
destinyXfate 2:0e2ef1edf01b 1644 case 4:
destinyXfate 2:0e2ef1edf01b 1645 {
destinyXfate 2:0e2ef1edf01b 1646 png_bytep sp, dp;
destinyXfate 2:0e2ef1edf01b 1647 int sshift, dshift;
destinyXfate 2:0e2ef1edf01b 1648 int s_start, s_end, s_inc;
destinyXfate 2:0e2ef1edf01b 1649 png_uint_32 i;
destinyXfate 2:0e2ef1edf01b 1650
destinyXfate 2:0e2ef1edf01b 1651 sp = row + (png_size_t)((row_info->width - 1) >> 1);
destinyXfate 2:0e2ef1edf01b 1652 dp = row + (png_size_t)((final_width - 1) >> 1);
destinyXfate 2:0e2ef1edf01b 1653 #if defined(PNG_READ_PACKSWAP_SUPPORTED)
destinyXfate 2:0e2ef1edf01b 1654 if (transformations & PNG_PACKSWAP)
destinyXfate 2:0e2ef1edf01b 1655 {
destinyXfate 2:0e2ef1edf01b 1656 sshift = (png_size_t)(((row_info->width + 1) & 1) << 2);
destinyXfate 2:0e2ef1edf01b 1657 dshift = (png_size_t)(((final_width + 1) & 1) << 2);
destinyXfate 2:0e2ef1edf01b 1658 s_start = 4;
destinyXfate 2:0e2ef1edf01b 1659 s_end = 0;
destinyXfate 2:0e2ef1edf01b 1660 s_inc = -4;
destinyXfate 2:0e2ef1edf01b 1661 }
destinyXfate 2:0e2ef1edf01b 1662 else
destinyXfate 2:0e2ef1edf01b 1663 #endif
destinyXfate 2:0e2ef1edf01b 1664 {
destinyXfate 2:0e2ef1edf01b 1665 sshift = (png_size_t)((1 - ((row_info->width + 1) & 1)) << 2);
destinyXfate 2:0e2ef1edf01b 1666 dshift = (png_size_t)((1 - ((final_width + 1) & 1)) << 2);
destinyXfate 2:0e2ef1edf01b 1667 s_start = 0;
destinyXfate 2:0e2ef1edf01b 1668 s_end = 4;
destinyXfate 2:0e2ef1edf01b 1669 s_inc = 4;
destinyXfate 2:0e2ef1edf01b 1670 }
destinyXfate 2:0e2ef1edf01b 1671
destinyXfate 2:0e2ef1edf01b 1672 for (i = row_info->width; i; i--)
destinyXfate 2:0e2ef1edf01b 1673 {
destinyXfate 2:0e2ef1edf01b 1674 png_byte v;
destinyXfate 2:0e2ef1edf01b 1675 int j;
destinyXfate 2:0e2ef1edf01b 1676
destinyXfate 2:0e2ef1edf01b 1677 v = (png_byte)((*sp >> sshift) & 0xf);
destinyXfate 2:0e2ef1edf01b 1678 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 1679 {
destinyXfate 2:0e2ef1edf01b 1680 *dp &= (png_byte)((0xf0f >> (4 - dshift)) & 0xff);
destinyXfate 2:0e2ef1edf01b 1681 *dp |= (png_byte)(v << dshift);
destinyXfate 2:0e2ef1edf01b 1682 if (dshift == s_end)
destinyXfate 2:0e2ef1edf01b 1683 {
destinyXfate 2:0e2ef1edf01b 1684 dshift = s_start;
destinyXfate 2:0e2ef1edf01b 1685 dp--;
destinyXfate 2:0e2ef1edf01b 1686 }
destinyXfate 2:0e2ef1edf01b 1687 else
destinyXfate 2:0e2ef1edf01b 1688 dshift += s_inc;
destinyXfate 2:0e2ef1edf01b 1689 }
destinyXfate 2:0e2ef1edf01b 1690 if (sshift == s_end)
destinyXfate 2:0e2ef1edf01b 1691 {
destinyXfate 2:0e2ef1edf01b 1692 sshift = s_start;
destinyXfate 2:0e2ef1edf01b 1693 sp--;
destinyXfate 2:0e2ef1edf01b 1694 }
destinyXfate 2:0e2ef1edf01b 1695 else
destinyXfate 2:0e2ef1edf01b 1696 sshift += s_inc;
destinyXfate 2:0e2ef1edf01b 1697 }
destinyXfate 2:0e2ef1edf01b 1698 break;
destinyXfate 2:0e2ef1edf01b 1699 }
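Annotation: the three sub-byte cases clear the destination field with the expressions `(0x7f7f >> (7 - dshift))`, `(0x3f3f >> (6 - dshift))` and `(0xf0f >> (4 - dshift))`, each masked with `0xff`. The sketch below checks that these are equivalent to the more direct "invert a shifted field mask" form. Both function names are hypothetical and exist only for this comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the clear mask exactly as written in the listing. */
static uint8_t clear_mask_listing(int bit_depth, int dshift)
{
    switch (bit_depth) {
    case 1:  return (uint8_t)((0x7f7f >> (7 - dshift)) & 0xff);
    case 2:  return (uint8_t)((0x3f3f >> (6 - dshift)) & 0xff);
    case 4:  return (uint8_t)((0xf0f  >> (4 - dshift)) & 0xff);
    default: return 0;  /* other depths do not use this path */
    }
}

/* Hypothetical helper: the same mask built directly, as the complement of
 * a bit_depth-wide field positioned at dshift. */
static uint8_t clear_mask_direct(int bit_depth, int dshift)
{
    assert(bit_depth == 1 || bit_depth == 2 || bit_depth == 4);
    assert(dshift >= 0 && dshift + bit_depth <= 8);
    return (uint8_t)~(((1u << bit_depth) - 1u) << dshift);
}
```

For example, a 4-bit pixel with `dshift == 4` gives `0x0f` from both forms, which keeps the low nibble and clears the high one before the new value is ORed in.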
destinyXfate 2:0e2ef1edf01b 1700
destinyXfate 2:0e2ef1edf01b 1701 /*====================================================================*/
destinyXfate 2:0e2ef1edf01b 1702
destinyXfate 2:0e2ef1edf01b 1703 default: /* 8-bit or larger (this is where the routine is modified) */
destinyXfate 2:0e2ef1edf01b 1704 {
destinyXfate 2:0e2ef1edf01b 1705 #if 0
destinyXfate 2:0e2ef1edf01b 1706 // static unsigned long long _const4 = 0x0000000000FFFFFFLL; no good
destinyXfate 2:0e2ef1edf01b 1707 // static unsigned long long const4 = 0x0000000000FFFFFFLL; no good
destinyXfate 2:0e2ef1edf01b 1708 // unsigned long long _const4 = 0x0000000000FFFFFFLL; no good
destinyXfate 2:0e2ef1edf01b 1709 // unsigned long long const4 = 0x0000000000FFFFFFLL; no good
destinyXfate 2:0e2ef1edf01b 1710 #endif
destinyXfate 2:0e2ef1edf01b 1711 png_bytep sptr, dp;
destinyXfate 2:0e2ef1edf01b 1712 png_uint_32 i;
destinyXfate 2:0e2ef1edf01b 1713 png_size_t pixel_bytes;
destinyXfate 2:0e2ef1edf01b 1714 int width = (int)row_info->width;
destinyXfate 2:0e2ef1edf01b 1715
destinyXfate 2:0e2ef1edf01b 1716 pixel_bytes = (row_info->pixel_depth >> 3);
destinyXfate 2:0e2ef1edf01b 1717
destinyXfate 2:0e2ef1edf01b 1718 /* point sptr at the last pixel in the pre-expanded row: */
destinyXfate 2:0e2ef1edf01b 1719 sptr = row + (width - 1) * pixel_bytes;
destinyXfate 2:0e2ef1edf01b 1720
destinyXfate 2:0e2ef1edf01b 1721 /* point dp at the last pixel position in the expanded row: */
destinyXfate 2:0e2ef1edf01b 1722 dp = row + (final_width - 1) * pixel_bytes;
destinyXfate 2:0e2ef1edf01b 1723
destinyXfate 2:0e2ef1edf01b 1724 /* New code by Nirav Chhatrapati - Intel Corporation */
destinyXfate 2:0e2ef1edf01b 1725
destinyXfate 2:0e2ef1edf01b 1726 #if defined(PNG_MMX_CODE_SUPPORTED)
destinyXfate 2:0e2ef1edf01b 1727 #if !defined(PNG_1_0_X)
destinyXfate 2:0e2ef1edf01b 1728 if ((png_ptr->asm_flags & PNG_ASM_FLAG_MMX_READ_INTERLACE)
destinyXfate 2:0e2ef1edf01b 1729 /* && _mmx_supported */ )
destinyXfate 2:0e2ef1edf01b 1730 #else
destinyXfate 2:0e2ef1edf01b 1731 if (_mmx_supported)
destinyXfate 2:0e2ef1edf01b 1732 #endif
destinyXfate 2:0e2ef1edf01b 1733 {
destinyXfate 2:0e2ef1edf01b 1734 //--------------------------------------------------------------
destinyXfate 2:0e2ef1edf01b 1735 if (pixel_bytes == 3)
destinyXfate 2:0e2ef1edf01b 1736 {
destinyXfate 2:0e2ef1edf01b 1737 if (((pass == 0) || (pass == 1)) && width)
destinyXfate 2:0e2ef1edf01b 1738 {
destinyXfate 2:0e2ef1edf01b 1739 int dummy_value_c; // fix 'forbidden register spilled'
destinyXfate 2:0e2ef1edf01b 1740 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 1741 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 1742 int dummy_value_a;
destinyXfate 2:0e2ef1edf01b 1743
destinyXfate 2:0e2ef1edf01b 1744 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 1745 "subl $21, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 1746 // (png_pass_inc[pass] - 1)*pixel_bytes
destinyXfate 2:0e2ef1edf01b 1747
destinyXfate 2:0e2ef1edf01b 1748 ".loop3_pass0: \n\t"
destinyXfate 2:0e2ef1edf01b 1749 "movd (%%esi), %%mm0 \n\t" // x x x x x 2 1 0
destinyXfate 2:0e2ef1edf01b 1750 "pand (%3), %%mm0 \n\t" // z z z z z 2 1 0
destinyXfate 2:0e2ef1edf01b 1751 "movq %%mm0, %%mm1 \n\t" // z z z z z 2 1 0
destinyXfate 2:0e2ef1edf01b 1752 "psllq $16, %%mm0 \n\t" // z z z 2 1 0 z z
destinyXfate 2:0e2ef1edf01b 1753 "movq %%mm0, %%mm2 \n\t" // z z z 2 1 0 z z
destinyXfate 2:0e2ef1edf01b 1754 "psllq $24, %%mm0 \n\t" // 2 1 0 z z z z z
destinyXfate 2:0e2ef1edf01b 1755 "psrlq $8, %%mm1 \n\t" // z z z z z z 2 1
destinyXfate 2:0e2ef1edf01b 1756 "por %%mm2, %%mm0 \n\t" // 2 1 0 2 1 0 z z
destinyXfate 2:0e2ef1edf01b 1757 "por %%mm1, %%mm0 \n\t" // 2 1 0 2 1 0 2 1
destinyXfate 2:0e2ef1edf01b 1758 "movq %%mm0, %%mm3 \n\t" // 2 1 0 2 1 0 2 1
destinyXfate 2:0e2ef1edf01b 1759 "psllq $16, %%mm0 \n\t" // 0 2 1 0 2 1 z z
destinyXfate 2:0e2ef1edf01b 1760 "movq %%mm3, %%mm4 \n\t" // 2 1 0 2 1 0 2 1
destinyXfate 2:0e2ef1edf01b 1761 "punpckhdq %%mm0, %%mm3 \n\t" // 0 2 1 0 2 1 0 2
destinyXfate 2:0e2ef1edf01b 1762 "movq %%mm4, 16(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1763 "psrlq $32, %%mm0 \n\t" // z z z z 0 2 1 0
destinyXfate 2:0e2ef1edf01b 1764 "movq %%mm3, 8(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1765 "punpckldq %%mm4, %%mm0 \n\t" // 1 0 2 1 0 2 1 0
destinyXfate 2:0e2ef1edf01b 1766 "subl $3, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 1767 "movq %%mm0, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1768 "subl $24, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 1769 "decl %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 1770 "jnz .loop3_pass0 \n\t"
destinyXfate 2:0e2ef1edf01b 1771 "EMMS \n\t" // DONE
destinyXfate 2:0e2ef1edf01b 1772
destinyXfate 2:0e2ef1edf01b 1773 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 1774 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 1775 "=D" (dummy_value_D),
destinyXfate 2:0e2ef1edf01b 1776 "=a" (dummy_value_a)
destinyXfate 2:0e2ef1edf01b 1777
destinyXfate 2:0e2ef1edf01b 1778
destinyXfate 2:0e2ef1edf01b 1779 : "1" (sptr), // esi // input regs
destinyXfate 2:0e2ef1edf01b 1780 "2" (dp), // edi
destinyXfate 2:0e2ef1edf01b 1781 "0" (width), // ecx
destinyXfate 2:0e2ef1edf01b 1782 "3" (&_const4) // eax, %3 (0x0000000000FFFFFFLL)
destinyXfate 2:0e2ef1edf01b 1783
destinyXfate 2:0e2ef1edf01b 1784 #if 0 /* %mm0, ..., %mm4 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 1785 : "%mm0", "%mm1", "%mm2" // clobber list
destinyXfate 2:0e2ef1edf01b 1786 , "%mm3", "%mm4"
destinyXfate 2:0e2ef1edf01b 1787 #endif
destinyXfate 2:0e2ef1edf01b 1788 );
destinyXfate 2:0e2ef1edf01b 1789 }
destinyXfate 2:0e2ef1edf01b 1790 else if (((pass == 2) || (pass == 3)) && width)
destinyXfate 2:0e2ef1edf01b 1791 {
destinyXfate 2:0e2ef1edf01b 1792 int dummy_value_c; // fix 'forbidden register spilled'
destinyXfate 2:0e2ef1edf01b 1793 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 1794 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 1795 int dummy_value_a;
destinyXfate 2:0e2ef1edf01b 1796
destinyXfate 2:0e2ef1edf01b 1797 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 1798 "subl $9, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 1799 // (png_pass_inc[pass] - 1)*pixel_bytes
destinyXfate 2:0e2ef1edf01b 1800
destinyXfate 2:0e2ef1edf01b 1801 ".loop3_pass2: \n\t"
destinyXfate 2:0e2ef1edf01b 1802 "movd (%%esi), %%mm0 \n\t" // x x x x x 2 1 0
destinyXfate 2:0e2ef1edf01b 1803 "pand (%3), %%mm0 \n\t" // z z z z z 2 1 0
destinyXfate 2:0e2ef1edf01b 1804 "movq %%mm0, %%mm1 \n\t" // z z z z z 2 1 0
destinyXfate 2:0e2ef1edf01b 1805 "psllq $16, %%mm0 \n\t" // z z z 2 1 0 z z
destinyXfate 2:0e2ef1edf01b 1806 "movq %%mm0, %%mm2 \n\t" // z z z 2 1 0 z z
destinyXfate 2:0e2ef1edf01b 1807 "psllq $24, %%mm0 \n\t" // 2 1 0 z z z z z
destinyXfate 2:0e2ef1edf01b 1808 "psrlq $8, %%mm1 \n\t" // z z z z z z 2 1
destinyXfate 2:0e2ef1edf01b 1809 "por %%mm2, %%mm0 \n\t" // 2 1 0 2 1 0 z z
destinyXfate 2:0e2ef1edf01b 1810 "por %%mm1, %%mm0 \n\t" // 2 1 0 2 1 0 2 1
destinyXfate 2:0e2ef1edf01b 1811 "movq %%mm0, 4(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1812 "psrlq $16, %%mm0 \n\t" // z z 2 1 0 2 1 0
destinyXfate 2:0e2ef1edf01b 1813 "subl $3, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 1814 "movd %%mm0, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1815 "subl $12, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 1816 "decl %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 1817 "jnz .loop3_pass2 \n\t"
destinyXfate 2:0e2ef1edf01b 1818 "EMMS \n\t" // DONE
destinyXfate 2:0e2ef1edf01b 1819
destinyXfate 2:0e2ef1edf01b 1820 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 1821 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 1822 "=D" (dummy_value_D),
destinyXfate 2:0e2ef1edf01b 1823 "=a" (dummy_value_a)
destinyXfate 2:0e2ef1edf01b 1824
destinyXfate 2:0e2ef1edf01b 1825 : "1" (sptr), // esi // input regs
destinyXfate 2:0e2ef1edf01b 1826 "2" (dp), // edi
destinyXfate 2:0e2ef1edf01b 1827 "0" (width), // ecx
destinyXfate 2:0e2ef1edf01b 1828 "3" (&_const4) // (0x0000000000FFFFFFLL)
destinyXfate 2:0e2ef1edf01b 1829
destinyXfate 2:0e2ef1edf01b 1830 #if 0 /* %mm0, ..., %mm2 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 1831 : "%mm0", "%mm1", "%mm2" // clobber list
destinyXfate 2:0e2ef1edf01b 1832 #endif
destinyXfate 2:0e2ef1edf01b 1833 );
destinyXfate 2:0e2ef1edf01b 1834 }
destinyXfate 2:0e2ef1edf01b 1835 else if (width) /* && ((pass == 4) || (pass == 5)) */
destinyXfate 2:0e2ef1edf01b 1836 {
destinyXfate 2:0e2ef1edf01b 1837 int width_mmx = ((width >> 1) << 1) - 8; // GRR: huh?
destinyXfate 2:0e2ef1edf01b 1838 if (width_mmx < 0)
destinyXfate 2:0e2ef1edf01b 1839 width_mmx = 0;
destinyXfate 2:0e2ef1edf01b 1840 width -= width_mmx; // 8 or 9 pix, 24 or 27 bytes
destinyXfate 2:0e2ef1edf01b 1841 if (width_mmx)
destinyXfate 2:0e2ef1edf01b 1842 {
destinyXfate 2:0e2ef1edf01b 1843 // png_pass_inc[] = {8, 8, 4, 4, 2, 2, 1};
destinyXfate 2:0e2ef1edf01b 1844 // sptr points at last pixel in pre-expanded row
destinyXfate 2:0e2ef1edf01b 1845 // dp points at last pixel position in expanded row
destinyXfate 2:0e2ef1edf01b 1846 int dummy_value_c; // fix 'forbidden register spilled'
destinyXfate 2:0e2ef1edf01b 1847 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 1848 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 1849 int dummy_value_a;
destinyXfate 2:0e2ef1edf01b 1850 int dummy_value_d;
destinyXfate 2:0e2ef1edf01b 1851
destinyXfate 2:0e2ef1edf01b 1852 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 1853 "subl $3, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 1854 "subl $9, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 1855 // (png_pass_inc[pass] + 1)*pixel_bytes
destinyXfate 2:0e2ef1edf01b 1856
destinyXfate 2:0e2ef1edf01b 1857 ".loop3_pass4: \n\t"
destinyXfate 2:0e2ef1edf01b 1858 "movq (%%esi), %%mm0 \n\t" // x x 5 4 3 2 1 0
destinyXfate 2:0e2ef1edf01b 1859 "movq %%mm0, %%mm1 \n\t" // x x 5 4 3 2 1 0
destinyXfate 2:0e2ef1edf01b 1860 "movq %%mm0, %%mm2 \n\t" // x x 5 4 3 2 1 0
destinyXfate 2:0e2ef1edf01b 1861 "psllq $24, %%mm0 \n\t" // 4 3 2 1 0 z z z
destinyXfate 2:0e2ef1edf01b 1862 "pand (%3), %%mm1 \n\t" // z z z z z 2 1 0
destinyXfate 2:0e2ef1edf01b 1863 "psrlq $24, %%mm2 \n\t" // z z z x x 5 4 3
destinyXfate 2:0e2ef1edf01b 1864 "por %%mm1, %%mm0 \n\t" // 4 3 2 1 0 2 1 0
destinyXfate 2:0e2ef1edf01b 1865 "movq %%mm2, %%mm3 \n\t" // z z z x x 5 4 3
destinyXfate 2:0e2ef1edf01b 1866 "psllq $8, %%mm2 \n\t" // z z x x 5 4 3 z
destinyXfate 2:0e2ef1edf01b 1867 "movq %%mm0, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1868 "psrlq $16, %%mm3 \n\t" // z z z z z x x 5
destinyXfate 2:0e2ef1edf01b 1869 "pand (%4), %%mm3 \n\t" // z z z z z z z 5
destinyXfate 2:0e2ef1edf01b 1870 "por %%mm3, %%mm2 \n\t" // z z x x 5 4 3 5
destinyXfate 2:0e2ef1edf01b 1871 "subl $6, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 1872 "movd %%mm2, 8(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1873 "subl $12, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 1874 "subl $2, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 1875 "jnz .loop3_pass4 \n\t"
destinyXfate 2:0e2ef1edf01b 1876 "EMMS \n\t" // DONE
destinyXfate 2:0e2ef1edf01b 1877
destinyXfate 2:0e2ef1edf01b 1878 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 1879 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 1880 "=D" (dummy_value_D),
destinyXfate 2:0e2ef1edf01b 1881 "=a" (dummy_value_a),
destinyXfate 2:0e2ef1edf01b 1882 "=d" (dummy_value_d)
destinyXfate 2:0e2ef1edf01b 1883
destinyXfate 2:0e2ef1edf01b 1884 : "1" (sptr), // esi // input regs
destinyXfate 2:0e2ef1edf01b 1885 "2" (dp), // edi
destinyXfate 2:0e2ef1edf01b 1886 "0" (width_mmx), // ecx
destinyXfate 2:0e2ef1edf01b 1887 "3" (&_const4), // 0x0000000000FFFFFFLL
destinyXfate 2:0e2ef1edf01b 1888 "4" (&_const6) // 0x00000000000000FFLL
destinyXfate 2:0e2ef1edf01b 1889
destinyXfate 2:0e2ef1edf01b 1890 #if 0 /* %mm0, ..., %mm3 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 1891 : "%mm0", "%mm1" // clobber list
destinyXfate 2:0e2ef1edf01b 1892 , "%mm2", "%mm3"
destinyXfate 2:0e2ef1edf01b 1893 #endif
destinyXfate 2:0e2ef1edf01b 1894 );
destinyXfate 2:0e2ef1edf01b 1895 }
destinyXfate 2:0e2ef1edf01b 1896
destinyXfate 2:0e2ef1edf01b 1897 sptr -= width_mmx*3;
destinyXfate 2:0e2ef1edf01b 1898 dp -= width_mmx*6;
destinyXfate 2:0e2ef1edf01b 1899 for (i = width; i; i--)
destinyXfate 2:0e2ef1edf01b 1900 {
destinyXfate 2:0e2ef1edf01b 1901 png_byte v[8];
destinyXfate 2:0e2ef1edf01b 1902 int j;
destinyXfate 2:0e2ef1edf01b 1903
destinyXfate 2:0e2ef1edf01b 1904 png_memcpy(v, sptr, 3);
destinyXfate 2:0e2ef1edf01b 1905 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 1906 {
destinyXfate 2:0e2ef1edf01b 1907 png_memcpy(dp, v, 3);
destinyXfate 2:0e2ef1edf01b 1908 dp -= 3;
destinyXfate 2:0e2ef1edf01b 1909 }
destinyXfate 2:0e2ef1edf01b 1910 sptr -= 3;
destinyXfate 2:0e2ef1edf01b 1911 }
destinyXfate 2:0e2ef1edf01b 1912 }
destinyXfate 2:0e2ef1edf01b 1913 } /* end of pixel_bytes == 3 */
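Annotation: the C loop that finishes the 3-byte case copies each pixel into a small temporary and then writes it `png_pass_inc[pass]` times, moving from the end of the row toward the start. A portable sketch of that idea for any pixel size up to 8 bytes follows. The name `replicate_pixels` and the index-based addressing are assumptions for illustration and do not appear in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: expand `width` pixels of `pixel_bytes` bytes each so
 * that every pixel appears `inc` times, in place.  The buffer must hold
 * width * inc * pixel_bytes bytes. */
static void replicate_pixels(uint8_t *row, size_t width,
                             size_t pixel_bytes, size_t inc)
{
    uint8_t v[8];   /* same temporary size as in the listing */
    size_t i, j;

    assert(pixel_bytes >= 1 && pixel_bytes <= sizeof v);
    assert(inc >= 1);

    /* Last pixel first: destinations never precede unread sources. */
    for (i = width; i > 0; i--) {
        /* Save the source pixel first; it may overlap its own output. */
        memcpy(v, row + (i - 1) * pixel_bytes, pixel_bytes);
        for (j = 0; j < inc; j++)
            memcpy(row + ((i - 1) * inc + j) * pixel_bytes, v, pixel_bytes);
    }
}
```

Called with two RGB pixels `{1,2,3}` and `{4,5,6}` and `inc == 2`, the row becomes `{1,2,3, 1,2,3, 4,5,6, 4,5,6}`.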
destinyXfate 2:0e2ef1edf01b 1914
destinyXfate 2:0e2ef1edf01b 1915 //--------------------------------------------------------------
destinyXfate 2:0e2ef1edf01b 1916 else if (pixel_bytes == 1)
destinyXfate 2:0e2ef1edf01b 1917 {
destinyXfate 2:0e2ef1edf01b 1918 if (((pass == 0) || (pass == 1)) && width)
destinyXfate 2:0e2ef1edf01b 1919 {
destinyXfate 2:0e2ef1edf01b 1920 int width_mmx = ((width >> 2) << 2);
destinyXfate 2:0e2ef1edf01b 1921 width -= width_mmx; // 0-3 pixels => 0-3 bytes
destinyXfate 2:0e2ef1edf01b 1922 if (width_mmx)
destinyXfate 2:0e2ef1edf01b 1923 {
destinyXfate 2:0e2ef1edf01b 1924 int dummy_value_c; // fix 'forbidden register spilled'
destinyXfate 2:0e2ef1edf01b 1925 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 1926 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 1927
destinyXfate 2:0e2ef1edf01b 1928 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 1929 "subl $3, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 1930 "subl $31, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 1931
destinyXfate 2:0e2ef1edf01b 1932 ".loop1_pass0: \n\t"
destinyXfate 2:0e2ef1edf01b 1933 "movd (%%esi), %%mm0 \n\t" // x x x x 3 2 1 0
destinyXfate 2:0e2ef1edf01b 1934 "movq %%mm0, %%mm1 \n\t" // x x x x 3 2 1 0
destinyXfate 2:0e2ef1edf01b 1935 "punpcklbw %%mm0, %%mm0 \n\t" // 3 3 2 2 1 1 0 0
destinyXfate 2:0e2ef1edf01b 1936 "movq %%mm0, %%mm2 \n\t" // 3 3 2 2 1 1 0 0
destinyXfate 2:0e2ef1edf01b 1937 "punpcklwd %%mm0, %%mm0 \n\t" // 1 1 1 1 0 0 0 0
destinyXfate 2:0e2ef1edf01b 1938 "movq %%mm0, %%mm3 \n\t" // 1 1 1 1 0 0 0 0
destinyXfate 2:0e2ef1edf01b 1939 "punpckldq %%mm0, %%mm0 \n\t" // 0 0 0 0 0 0 0 0
destinyXfate 2:0e2ef1edf01b 1940 "punpckhdq %%mm3, %%mm3 \n\t" // 1 1 1 1 1 1 1 1
destinyXfate 2:0e2ef1edf01b 1941 "movq %%mm0, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1942 "punpckhwd %%mm2, %%mm2 \n\t" // 3 3 3 3 2 2 2 2
destinyXfate 2:0e2ef1edf01b 1943 "movq %%mm3, 8(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1944 "movq %%mm2, %%mm4 \n\t" // 3 3 3 3 2 2 2 2
destinyXfate 2:0e2ef1edf01b 1945 "punpckldq %%mm2, %%mm2 \n\t" // 2 2 2 2 2 2 2 2
destinyXfate 2:0e2ef1edf01b 1946 "punpckhdq %%mm4, %%mm4 \n\t" // 3 3 3 3 3 3 3 3
destinyXfate 2:0e2ef1edf01b 1947 "movq %%mm2, 16(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1948 "subl $4, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 1949 "movq %%mm4, 24(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 1950 "subl $32, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 1951 "subl $4, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 1952 "jnz .loop1_pass0 \n\t"
destinyXfate 2:0e2ef1edf01b 1953 "EMMS \n\t" // DONE
destinyXfate 2:0e2ef1edf01b 1954
destinyXfate 2:0e2ef1edf01b 1955 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 1956 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 1957 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 1958
destinyXfate 2:0e2ef1edf01b 1959 : "1" (sptr), // esi // input regs
destinyXfate 2:0e2ef1edf01b 1960 "2" (dp), // edi
destinyXfate 2:0e2ef1edf01b 1961 "0" (width_mmx) // ecx
destinyXfate 2:0e2ef1edf01b 1962
destinyXfate 2:0e2ef1edf01b 1963 #if 0 /* %mm0, ..., %mm4 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 1964 : "%mm0", "%mm1", "%mm2" // clobber list
destinyXfate 2:0e2ef1edf01b 1965 , "%mm3", "%mm4"
destinyXfate 2:0e2ef1edf01b 1966 #endif
destinyXfate 2:0e2ef1edf01b 1967 );
destinyXfate 2:0e2ef1edf01b 1968 }
destinyXfate 2:0e2ef1edf01b 1969
destinyXfate 2:0e2ef1edf01b 1970 sptr -= width_mmx;
destinyXfate 2:0e2ef1edf01b 1971 dp -= width_mmx*8;
destinyXfate 2:0e2ef1edf01b 1972 for (i = width; i; i--)
destinyXfate 2:0e2ef1edf01b 1973 {
destinyXfate 2:0e2ef1edf01b 1974 int j;
destinyXfate 2:0e2ef1edf01b 1975
destinyXfate 2:0e2ef1edf01b 1976 /* I simplified this part in version 1.0.4e
destinyXfate 2:0e2ef1edf01b 1977 * here and in several other instances where
destinyXfate 2:0e2ef1edf01b 1978 * pixel_bytes == 1 -- GR-P
destinyXfate 2:0e2ef1edf01b 1979 *
destinyXfate 2:0e2ef1edf01b 1980 * Original code:
destinyXfate 2:0e2ef1edf01b 1981 *
destinyXfate 2:0e2ef1edf01b 1982 * png_byte v[8];
destinyXfate 2:0e2ef1edf01b 1983 * png_memcpy(v, sptr, pixel_bytes);
destinyXfate 2:0e2ef1edf01b 1984 * for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 1985 * {
destinyXfate 2:0e2ef1edf01b 1986 * png_memcpy(dp, v, pixel_bytes);
destinyXfate 2:0e2ef1edf01b 1987 * dp -= pixel_bytes;
destinyXfate 2:0e2ef1edf01b 1988 * }
destinyXfate 2:0e2ef1edf01b 1989 * sptr -= pixel_bytes;
destinyXfate 2:0e2ef1edf01b 1990 *
destinyXfate 2:0e2ef1edf01b 1991 * Replacement code is in the next three lines:
destinyXfate 2:0e2ef1edf01b 1992 */
destinyXfate 2:0e2ef1edf01b 1993
destinyXfate 2:0e2ef1edf01b 1994 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 1995 {
destinyXfate 2:0e2ef1edf01b 1996 *dp-- = *sptr;
destinyXfate 2:0e2ef1edf01b 1997 }
destinyXfate 2:0e2ef1edf01b 1998 --sptr;
destinyXfate 2:0e2ef1edf01b 1999 }
destinyXfate 2:0e2ef1edf01b 2000 }
destinyXfate 2:0e2ef1edf01b 2001 else if (((pass == 2) || (pass == 3)) && width)
destinyXfate 2:0e2ef1edf01b 2002 {
destinyXfate 2:0e2ef1edf01b 2003 int width_mmx = ((width >> 2) << 2);
destinyXfate 2:0e2ef1edf01b 2004 width -= width_mmx; // 0-3 pixels => 0-3 bytes
destinyXfate 2:0e2ef1edf01b 2005 if (width_mmx)
destinyXfate 2:0e2ef1edf01b 2006 {
destinyXfate 2:0e2ef1edf01b 2007 int dummy_value_c; // fix 'forbidden register spilled'
destinyXfate 2:0e2ef1edf01b 2008 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 2009 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 2010
destinyXfate 2:0e2ef1edf01b 2011 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 2012 "subl $3, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 2013 "subl $15, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 2014
destinyXfate 2:0e2ef1edf01b 2015 ".loop1_pass2: \n\t"
destinyXfate 2:0e2ef1edf01b 2016 "movd (%%esi), %%mm0 \n\t" // x x x x 3 2 1 0
destinyXfate 2:0e2ef1edf01b 2017 "punpcklbw %%mm0, %%mm0 \n\t" // 3 3 2 2 1 1 0 0
destinyXfate 2:0e2ef1edf01b 2018 "movq %%mm0, %%mm1 \n\t" // 3 3 2 2 1 1 0 0
destinyXfate 2:0e2ef1edf01b 2019 "punpcklwd %%mm0, %%mm0 \n\t" // 1 1 1 1 0 0 0 0
destinyXfate 2:0e2ef1edf01b 2020 "punpckhwd %%mm1, %%mm1 \n\t" // 3 3 3 3 2 2 2 2
destinyXfate 2:0e2ef1edf01b 2021 "movq %%mm0, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2022 "subl $4, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 2023 "movq %%mm1, 8(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2024 "subl $16, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 2025 "subl $4, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 2026 "jnz .loop1_pass2 \n\t"
destinyXfate 2:0e2ef1edf01b 2027 "EMMS \n\t" // DONE
destinyXfate 2:0e2ef1edf01b 2028
destinyXfate 2:0e2ef1edf01b 2029 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 2030 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 2031 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 2032
destinyXfate 2:0e2ef1edf01b 2033 : "1" (sptr), // esi // input regs
destinyXfate 2:0e2ef1edf01b 2034 "2" (dp), // edi
destinyXfate 2:0e2ef1edf01b 2035 "0" (width_mmx) // ecx
destinyXfate 2:0e2ef1edf01b 2036
destinyXfate 2:0e2ef1edf01b 2037 #if 0 /* %mm0, %mm1 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 2038 : "%mm0", "%mm1" // clobber list
destinyXfate 2:0e2ef1edf01b 2039 #endif
destinyXfate 2:0e2ef1edf01b 2040 );
destinyXfate 2:0e2ef1edf01b 2041 }
destinyXfate 2:0e2ef1edf01b 2042
destinyXfate 2:0e2ef1edf01b 2043 sptr -= width_mmx;
destinyXfate 2:0e2ef1edf01b 2044 dp -= width_mmx*4;
destinyXfate 2:0e2ef1edf01b 2045 for (i = width; i; i--)
destinyXfate 2:0e2ef1edf01b 2046 {
destinyXfate 2:0e2ef1edf01b 2047 int j;
destinyXfate 2:0e2ef1edf01b 2048
destinyXfate 2:0e2ef1edf01b 2049 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 2050 {
destinyXfate 2:0e2ef1edf01b 2051 *dp-- = *sptr;
destinyXfate 2:0e2ef1edf01b 2052 }
destinyXfate 2:0e2ef1edf01b 2053 --sptr;
destinyXfate 2:0e2ef1edf01b 2054 }
destinyXfate 2:0e2ef1edf01b 2055 }
destinyXfate 2:0e2ef1edf01b 2056 else if (width) /* && ((pass == 4) || (pass == 5)) */
destinyXfate 2:0e2ef1edf01b 2057 {
destinyXfate 2:0e2ef1edf01b 2058 int width_mmx = ((width >> 3) << 3);
destinyXfate 2:0e2ef1edf01b 2059 width -= width_mmx; // 0-7 pixels => 0-7 bytes
destinyXfate 2:0e2ef1edf01b 2060 if (width_mmx)
destinyXfate 2:0e2ef1edf01b 2061 {
destinyXfate 2:0e2ef1edf01b 2062 int dummy_value_c; // fix 'forbidden register spilled'
destinyXfate 2:0e2ef1edf01b 2063 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 2064 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 2065
destinyXfate 2:0e2ef1edf01b 2066 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 2067 "subl $7, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 2068 "subl $15, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 2069
destinyXfate 2:0e2ef1edf01b 2070 ".loop1_pass4: \n\t"
destinyXfate 2:0e2ef1edf01b 2071 "movq (%%esi), %%mm0 \n\t" // 7 6 5 4 3 2 1 0
destinyXfate 2:0e2ef1edf01b 2072 "movq %%mm0, %%mm1 \n\t" // 7 6 5 4 3 2 1 0
destinyXfate 2:0e2ef1edf01b 2073 "punpcklbw %%mm0, %%mm0 \n\t" // 3 3 2 2 1 1 0 0
destinyXfate 2:0e2ef1edf01b 2074 "punpckhbw %%mm1, %%mm1 \n\t" // 7 7 6 6 5 5 4 4
destinyXfate 2:0e2ef1edf01b 2075 "movq %%mm1, 8(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2076 "subl $8, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 2077 "movq %%mm0, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2078 "subl $16, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 2079 "subl $8, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 2080 "jnz .loop1_pass4 \n\t"
destinyXfate 2:0e2ef1edf01b 2081 "EMMS \n\t" // DONE
destinyXfate 2:0e2ef1edf01b 2082
destinyXfate 2:0e2ef1edf01b 2083 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 2084 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 2085 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 2086
destinyXfate 2:0e2ef1edf01b 2087 : "1" (sptr), // esi // input regs
destinyXfate 2:0e2ef1edf01b 2088 "2" (dp), // edi
destinyXfate 2:0e2ef1edf01b 2089 "0" (width_mmx) // ecx
destinyXfate 2:0e2ef1edf01b 2090
destinyXfate 2:0e2ef1edf01b 2091 #if 0 /* %mm0, %mm1 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 2092 : "%mm0", "%mm1" // clobber list
destinyXfate 2:0e2ef1edf01b 2093 #endif
destinyXfate 2:0e2ef1edf01b 2094 );
destinyXfate 2:0e2ef1edf01b 2095 }
destinyXfate 2:0e2ef1edf01b 2096
destinyXfate 2:0e2ef1edf01b 2097 sptr -= width_mmx;
destinyXfate 2:0e2ef1edf01b 2098 dp -= width_mmx*2;
destinyXfate 2:0e2ef1edf01b 2099 for (i = width; i; i--)
destinyXfate 2:0e2ef1edf01b 2100 {
destinyXfate 2:0e2ef1edf01b 2101 int j;
destinyXfate 2:0e2ef1edf01b 2102
destinyXfate 2:0e2ef1edf01b 2103 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 2104 {
destinyXfate 2:0e2ef1edf01b 2105 *dp-- = *sptr;
destinyXfate 2:0e2ef1edf01b 2106 }
destinyXfate 2:0e2ef1edf01b 2107 --sptr;
destinyXfate 2:0e2ef1edf01b 2108 }
destinyXfate 2:0e2ef1edf01b 2109 }
destinyXfate 2:0e2ef1edf01b 2110 } /* end of pixel_bytes == 1 */
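Annotation: for 1-byte pixels in passes 4 and 5, the listing handles the last `width_mmx` pixels (a multiple of 8) in the MMX loop and the remaining leading pixels in a scalar loop. The sketch below is a scalar model of that split, assuming a replication factor of 2. `double8` stands in for the effect of `punpcklbw`/`punpckhbw` applied to a register with itself. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one .loop1_pass4 iteration: each of 8 source bytes
 * is duplicated, giving 16 output bytes. */
static void double8(const uint8_t src[8], uint8_t dst[16])
{
    size_t k;
    for (k = 0; k < 8; k++) {
        dst[2 * k] = src[k];
        dst[2 * k + 1] = src[k];
    }
}

/* Hypothetical helper: double every 1-byte pixel of `row` in place.
 * The buffer must hold 2 * width bytes. */
static void expand_bytes_x2(uint8_t *row, size_t width)
{
    size_t blocks = width & ~(size_t)7;  /* plays the role of width_mmx */
    size_t tail = width - blocks;        /* 0-7 leading pixels */
    size_t done, i;
    uint8_t tmp[8], out[16];

    /* Block part: the last `blocks` pixels, 8 at a time, from the end. */
    for (done = 0; done < blocks; done += 8) {
        size_t s = width - done - 8;     /* first source byte of this block */
        memcpy(tmp, row + s, 8);
        double8(tmp, out);
        memcpy(row + 2 * s, out, 16);
    }

    /* Scalar part: the leading pixels, again last first. */
    for (i = tail; i > 0; i--) {
        uint8_t v = row[i - 1];
        row[2 * (i - 1)] = v;
        row[2 * (i - 1) + 1] = v;
    }
}
```

The block loop runs before the tail loop because the blocks sit at the end of the row, which matches the order in the listing: assembly first, then the C loop over the leftover `width` pixels.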
destinyXfate 2:0e2ef1edf01b 2111
destinyXfate 2:0e2ef1edf01b 2112 //--------------------------------------------------------------
destinyXfate 2:0e2ef1edf01b 2113 else if (pixel_bytes == 2)
destinyXfate 2:0e2ef1edf01b 2114 {
destinyXfate 2:0e2ef1edf01b 2115 if (((pass == 0) || (pass == 1)) && width)
destinyXfate 2:0e2ef1edf01b 2116 {
destinyXfate 2:0e2ef1edf01b 2117 int width_mmx = ((width >> 1) << 1);
destinyXfate 2:0e2ef1edf01b 2118 width -= width_mmx; // 0,1 pixels => 0,2 bytes
destinyXfate 2:0e2ef1edf01b 2119 if (width_mmx)
destinyXfate 2:0e2ef1edf01b 2120 {
destinyXfate 2:0e2ef1edf01b 2121 int dummy_value_c; // fix 'forbidden register spilled'
destinyXfate 2:0e2ef1edf01b 2122 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 2123 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 2124
destinyXfate 2:0e2ef1edf01b 2125 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 2126 "subl $2, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 2127 "subl $30, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 2128
destinyXfate 2:0e2ef1edf01b 2129 ".loop2_pass0: \n\t"
destinyXfate 2:0e2ef1edf01b 2130 "movd (%%esi), %%mm0 \n\t" // x x x x 3 2 1 0
destinyXfate 2:0e2ef1edf01b 2131 "punpcklwd %%mm0, %%mm0 \n\t" // 3 2 3 2 1 0 1 0
destinyXfate 2:0e2ef1edf01b 2132 "movq %%mm0, %%mm1 \n\t" // 3 2 3 2 1 0 1 0
destinyXfate 2:0e2ef1edf01b 2133 "punpckldq %%mm0, %%mm0 \n\t" // 1 0 1 0 1 0 1 0
destinyXfate 2:0e2ef1edf01b 2134 "punpckhdq %%mm1, %%mm1 \n\t" // 3 2 3 2 3 2 3 2
destinyXfate 2:0e2ef1edf01b 2135 "movq %%mm0, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2136 "movq %%mm0, 8(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2137 "movq %%mm1, 16(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2138 "subl $4, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 2139 "movq %%mm1, 24(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2140 "subl $32, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 2141 "subl $2, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 2142 "jnz .loop2_pass0 \n\t"
destinyXfate 2:0e2ef1edf01b 2143 "EMMS \n\t" // DONE
destinyXfate 2:0e2ef1edf01b 2144
destinyXfate 2:0e2ef1edf01b 2145 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 2146 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 2147 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 2148
destinyXfate 2:0e2ef1edf01b 2149 : "1" (sptr), // esi // input regs
destinyXfate 2:0e2ef1edf01b 2150 "2" (dp), // edi
destinyXfate 2:0e2ef1edf01b 2151 "0" (width_mmx) // ecx
destinyXfate 2:0e2ef1edf01b 2152
destinyXfate 2:0e2ef1edf01b 2153 #if 0 /* %mm0, %mm1 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 2154 : "%mm0", "%mm1" // clobber list
destinyXfate 2:0e2ef1edf01b 2155 #endif
destinyXfate 2:0e2ef1edf01b 2156 );
destinyXfate 2:0e2ef1edf01b 2157 }
destinyXfate 2:0e2ef1edf01b 2158
destinyXfate 2:0e2ef1edf01b 2159 sptr -= (width_mmx*2 - 2); // sign fixed
destinyXfate 2:0e2ef1edf01b 2160 dp -= (width_mmx*16 - 2); // sign fixed
destinyXfate 2:0e2ef1edf01b 2161 for (i = width; i; i--)
destinyXfate 2:0e2ef1edf01b 2162 {
destinyXfate 2:0e2ef1edf01b 2163 png_byte v[8];
destinyXfate 2:0e2ef1edf01b 2164 int j;
destinyXfate 2:0e2ef1edf01b 2165 sptr -= 2;
destinyXfate 2:0e2ef1edf01b 2166 png_memcpy(v, sptr, 2);
destinyXfate 2:0e2ef1edf01b 2167 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 2168 {
destinyXfate 2:0e2ef1edf01b 2169 dp -= 2;
destinyXfate 2:0e2ef1edf01b 2170 png_memcpy(dp, v, 2);
destinyXfate 2:0e2ef1edf01b 2171 }
destinyXfate 2:0e2ef1edf01b 2172 }
destinyXfate 2:0e2ef1edf01b 2173 }
destinyXfate 2:0e2ef1edf01b 2174 else if (((pass == 2) || (pass == 3)) && width)
destinyXfate 2:0e2ef1edf01b 2175 {
destinyXfate 2:0e2ef1edf01b 2176 int width_mmx = ((width >> 1) << 1) ;
destinyXfate 2:0e2ef1edf01b 2177 width -= width_mmx; // 0,1 pixels => 0,2 bytes
destinyXfate 2:0e2ef1edf01b 2178 if (width_mmx)
destinyXfate 2:0e2ef1edf01b 2179 {
destinyXfate 2:0e2ef1edf01b 2180 int dummy_value_c; // fix 'forbidden register spilled'
destinyXfate 2:0e2ef1edf01b 2181 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 2182 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 2183
destinyXfate 2:0e2ef1edf01b 2184 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 2185 "subl $2, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 2186 "subl $14, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 2187
destinyXfate 2:0e2ef1edf01b 2188 ".loop2_pass2: \n\t"
destinyXfate 2:0e2ef1edf01b 2189 "movd (%%esi), %%mm0 \n\t" // x x x x 3 2 1 0
destinyXfate 2:0e2ef1edf01b 2190 "punpcklwd %%mm0, %%mm0 \n\t" // 3 2 3 2 1 0 1 0
destinyXfate 2:0e2ef1edf01b 2191 "movq %%mm0, %%mm1 \n\t" // 3 2 3 2 1 0 1 0
destinyXfate 2:0e2ef1edf01b 2192 "punpckldq %%mm0, %%mm0 \n\t" // 1 0 1 0 1 0 1 0
destinyXfate 2:0e2ef1edf01b 2193 "punpckhdq %%mm1, %%mm1 \n\t" // 3 2 3 2 3 2 3 2
destinyXfate 2:0e2ef1edf01b 2194 "movq %%mm0, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2195 "subl $4, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 2196 "movq %%mm1, 8(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2197 "subl $16, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 2198 "subl $2, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 2199 "jnz .loop2_pass2 \n\t"
destinyXfate 2:0e2ef1edf01b 2200 "EMMS \n\t" // DONE
destinyXfate 2:0e2ef1edf01b 2201
destinyXfate 2:0e2ef1edf01b 2202 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 2203 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 2204 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 2205
destinyXfate 2:0e2ef1edf01b 2206 : "1" (sptr), // esi // input regs
destinyXfate 2:0e2ef1edf01b 2207 "2" (dp), // edi
destinyXfate 2:0e2ef1edf01b 2208 "0" (width_mmx) // ecx
destinyXfate 2:0e2ef1edf01b 2209
destinyXfate 2:0e2ef1edf01b 2210 #if 0 /* %mm0, %mm1 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 2211 : "%mm0", "%mm1" // clobber list
destinyXfate 2:0e2ef1edf01b 2212 #endif
destinyXfate 2:0e2ef1edf01b 2213 );
destinyXfate 2:0e2ef1edf01b 2214 }
destinyXfate 2:0e2ef1edf01b 2215
destinyXfate 2:0e2ef1edf01b 2216 sptr -= (width_mmx*2 - 2); // sign fixed
destinyXfate 2:0e2ef1edf01b 2217 dp -= (width_mmx*8 - 2); // sign fixed
destinyXfate 2:0e2ef1edf01b 2218 for (i = width; i; i--)
destinyXfate 2:0e2ef1edf01b 2219 {
destinyXfate 2:0e2ef1edf01b 2220 png_byte v[8];
destinyXfate 2:0e2ef1edf01b 2221 int j;
destinyXfate 2:0e2ef1edf01b 2222 sptr -= 2;
destinyXfate 2:0e2ef1edf01b 2223 png_memcpy(v, sptr, 2);
destinyXfate 2:0e2ef1edf01b 2224 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 2225 {
destinyXfate 2:0e2ef1edf01b 2226 dp -= 2;
destinyXfate 2:0e2ef1edf01b 2227 png_memcpy(dp, v, 2);
destinyXfate 2:0e2ef1edf01b 2228 }
destinyXfate 2:0e2ef1edf01b 2229 }
destinyXfate 2:0e2ef1edf01b 2230 }
destinyXfate 2:0e2ef1edf01b 2231 else if (width) // pass == 4 or 5
destinyXfate 2:0e2ef1edf01b 2232 {
destinyXfate 2:0e2ef1edf01b 2233 int width_mmx = ((width >> 1) << 1) ;
destinyXfate 2:0e2ef1edf01b 2234 width -= width_mmx; // 0,1 pixels => 0,2 bytes
destinyXfate 2:0e2ef1edf01b 2235 if (width_mmx)
destinyXfate 2:0e2ef1edf01b 2236 {
destinyXfate 2:0e2ef1edf01b 2237 int dummy_value_c; // fix 'forbidden register spilled'
destinyXfate 2:0e2ef1edf01b 2238 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 2239 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 2240
destinyXfate 2:0e2ef1edf01b 2241 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 2242 "subl $2, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 2243 "subl $6, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 2244
destinyXfate 2:0e2ef1edf01b 2245 ".loop2_pass4: \n\t"
destinyXfate 2:0e2ef1edf01b 2246 "movd (%%esi), %%mm0 \n\t" // x x x x 3 2 1 0
destinyXfate 2:0e2ef1edf01b 2247 "punpcklwd %%mm0, %%mm0 \n\t" // 3 2 3 2 1 0 1 0
destinyXfate 2:0e2ef1edf01b 2248 "subl $4, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 2249 "movq %%mm0, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2250 "subl $8, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 2251 "subl $2, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 2252 "jnz .loop2_pass4 \n\t"
destinyXfate 2:0e2ef1edf01b 2253 "EMMS \n\t" // DONE
destinyXfate 2:0e2ef1edf01b 2254
destinyXfate 2:0e2ef1edf01b 2255 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 2256 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 2257 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 2258
destinyXfate 2:0e2ef1edf01b 2259 : "1" (sptr), // esi // input regs
destinyXfate 2:0e2ef1edf01b 2260 "2" (dp), // edi
destinyXfate 2:0e2ef1edf01b 2261 "0" (width_mmx) // ecx
destinyXfate 2:0e2ef1edf01b 2262
destinyXfate 2:0e2ef1edf01b 2263 #if 0 /* %mm0 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 2264 : "%mm0" // clobber list
destinyXfate 2:0e2ef1edf01b 2265 #endif
destinyXfate 2:0e2ef1edf01b 2266 );
destinyXfate 2:0e2ef1edf01b 2267 }
destinyXfate 2:0e2ef1edf01b 2268
destinyXfate 2:0e2ef1edf01b 2269 sptr -= (width_mmx*2 - 2); // sign fixed
destinyXfate 2:0e2ef1edf01b 2270 dp -= (width_mmx*4 - 2); // sign fixed
destinyXfate 2:0e2ef1edf01b 2271 for (i = width; i; i--)
destinyXfate 2:0e2ef1edf01b 2272 {
destinyXfate 2:0e2ef1edf01b 2273 png_byte v[8];
destinyXfate 2:0e2ef1edf01b 2274 int j;
destinyXfate 2:0e2ef1edf01b 2275 sptr -= 2;
destinyXfate 2:0e2ef1edf01b 2276 png_memcpy(v, sptr, 2);
destinyXfate 2:0e2ef1edf01b 2277 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 2278 {
destinyXfate 2:0e2ef1edf01b 2279 dp -= 2;
destinyXfate 2:0e2ef1edf01b 2280 png_memcpy(dp, v, 2);
destinyXfate 2:0e2ef1edf01b 2281 }
destinyXfate 2:0e2ef1edf01b 2282 }
destinyXfate 2:0e2ef1edf01b 2283 }
destinyXfate 2:0e2ef1edf01b 2284 } /* end of pixel_bytes == 2 */
destinyXfate 2:0e2ef1edf01b 2285
destinyXfate 2:0e2ef1edf01b 2286 //--------------------------------------------------------------
destinyXfate 2:0e2ef1edf01b 2287 else if (pixel_bytes == 4)
destinyXfate 2:0e2ef1edf01b 2288 {
destinyXfate 2:0e2ef1edf01b 2289 if (((pass == 0) || (pass == 1)) && width)
destinyXfate 2:0e2ef1edf01b 2290 {
destinyXfate 2:0e2ef1edf01b 2291 int width_mmx = ((width >> 1) << 1);
destinyXfate 2:0e2ef1edf01b 2292 width -= width_mmx; // 0,1 pixels => 0,4 bytes
destinyXfate 2:0e2ef1edf01b 2293 if (width_mmx)
destinyXfate 2:0e2ef1edf01b 2294 {
destinyXfate 2:0e2ef1edf01b 2295 int dummy_value_c; // fix 'forbidden register spilled'
destinyXfate 2:0e2ef1edf01b 2296 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 2297 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 2298
destinyXfate 2:0e2ef1edf01b 2299 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 2300 "subl $4, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 2301 "subl $60, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 2302
destinyXfate 2:0e2ef1edf01b 2303 ".loop4_pass0: \n\t"
destinyXfate 2:0e2ef1edf01b 2304 "movq (%%esi), %%mm0 \n\t" // 7 6 5 4 3 2 1 0
destinyXfate 2:0e2ef1edf01b 2305 "movq %%mm0, %%mm1 \n\t" // 7 6 5 4 3 2 1 0
destinyXfate 2:0e2ef1edf01b 2306 "punpckldq %%mm0, %%mm0 \n\t" // 3 2 1 0 3 2 1 0
destinyXfate 2:0e2ef1edf01b 2307 "punpckhdq %%mm1, %%mm1 \n\t" // 7 6 5 4 7 6 5 4
destinyXfate 2:0e2ef1edf01b 2308 "movq %%mm0, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2309 "movq %%mm0, 8(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2310 "movq %%mm0, 16(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2311 "movq %%mm0, 24(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2312 "movq %%mm1, 32(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2313 "movq %%mm1, 40(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2314 "movq %%mm1, 48(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2315 "subl $8, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 2316 "movq %%mm1, 56(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2317 "subl $64, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 2318 "subl $2, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 2319 "jnz .loop4_pass0 \n\t"
destinyXfate 2:0e2ef1edf01b 2320 "EMMS \n\t" // DONE
destinyXfate 2:0e2ef1edf01b 2321
destinyXfate 2:0e2ef1edf01b 2322 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 2323 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 2324 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 2325
destinyXfate 2:0e2ef1edf01b 2326 : "1" (sptr), // esi // input regs
destinyXfate 2:0e2ef1edf01b 2327 "2" (dp), // edi
destinyXfate 2:0e2ef1edf01b 2328 "0" (width_mmx) // ecx
destinyXfate 2:0e2ef1edf01b 2329
destinyXfate 2:0e2ef1edf01b 2330 #if 0 /* %mm0, %mm1 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 2331 : "%mm0", "%mm1" // clobber list
destinyXfate 2:0e2ef1edf01b 2332 #endif
destinyXfate 2:0e2ef1edf01b 2333 );
destinyXfate 2:0e2ef1edf01b 2334 }
destinyXfate 2:0e2ef1edf01b 2335
destinyXfate 2:0e2ef1edf01b 2336 sptr -= (width_mmx*4 - 4); // sign fixed
destinyXfate 2:0e2ef1edf01b 2337 dp -= (width_mmx*32 - 4); // sign fixed
destinyXfate 2:0e2ef1edf01b 2338 for (i = width; i; i--)
destinyXfate 2:0e2ef1edf01b 2339 {
destinyXfate 2:0e2ef1edf01b 2340 png_byte v[8];
destinyXfate 2:0e2ef1edf01b 2341 int j;
destinyXfate 2:0e2ef1edf01b 2342 sptr -= 4;
destinyXfate 2:0e2ef1edf01b 2343 png_memcpy(v, sptr, 4);
destinyXfate 2:0e2ef1edf01b 2344 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 2345 {
destinyXfate 2:0e2ef1edf01b 2346 dp -= 4;
destinyXfate 2:0e2ef1edf01b 2347 png_memcpy(dp, v, 4);
destinyXfate 2:0e2ef1edf01b 2348 }
destinyXfate 2:0e2ef1edf01b 2349 }
destinyXfate 2:0e2ef1edf01b 2350 }
destinyXfate 2:0e2ef1edf01b 2351 else if (((pass == 2) || (pass == 3)) && width)
destinyXfate 2:0e2ef1edf01b 2352 {
destinyXfate 2:0e2ef1edf01b 2353 int width_mmx = ((width >> 1) << 1);
destinyXfate 2:0e2ef1edf01b 2354 width -= width_mmx; // 0,1 pixels => 0,4 bytes
destinyXfate 2:0e2ef1edf01b 2355 if (width_mmx)
destinyXfate 2:0e2ef1edf01b 2356 {
destinyXfate 2:0e2ef1edf01b 2357 int dummy_value_c; // fix 'forbidden register spilled'
destinyXfate 2:0e2ef1edf01b 2358 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 2359 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 2360
destinyXfate 2:0e2ef1edf01b 2361 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 2362 "subl $4, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 2363 "subl $28, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 2364
destinyXfate 2:0e2ef1edf01b 2365 ".loop4_pass2: \n\t"
destinyXfate 2:0e2ef1edf01b 2366 "movq (%%esi), %%mm0 \n\t" // 7 6 5 4 3 2 1 0
destinyXfate 2:0e2ef1edf01b 2367 "movq %%mm0, %%mm1 \n\t" // 7 6 5 4 3 2 1 0
destinyXfate 2:0e2ef1edf01b 2368 "punpckldq %%mm0, %%mm0 \n\t" // 3 2 1 0 3 2 1 0
destinyXfate 2:0e2ef1edf01b 2369 "punpckhdq %%mm1, %%mm1 \n\t" // 7 6 5 4 7 6 5 4
destinyXfate 2:0e2ef1edf01b 2370 "movq %%mm0, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2371 "movq %%mm0, 8(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2372 "movq %%mm1, 16(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2373 "movq %%mm1, 24(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2374 "subl $8, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 2375 "subl $32, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 2376 "subl $2, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 2377 "jnz .loop4_pass2 \n\t"
destinyXfate 2:0e2ef1edf01b 2378 "EMMS \n\t" // DONE
destinyXfate 2:0e2ef1edf01b 2379
destinyXfate 2:0e2ef1edf01b 2380 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 2381 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 2382 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 2383
destinyXfate 2:0e2ef1edf01b 2384 : "1" (sptr), // esi // input regs
destinyXfate 2:0e2ef1edf01b 2385 "2" (dp), // edi
destinyXfate 2:0e2ef1edf01b 2386 "0" (width_mmx) // ecx
destinyXfate 2:0e2ef1edf01b 2387
destinyXfate 2:0e2ef1edf01b 2388 #if 0 /* %mm0, %mm1 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 2389 : "%mm0", "%mm1" // clobber list
destinyXfate 2:0e2ef1edf01b 2390 #endif
destinyXfate 2:0e2ef1edf01b 2391 );
destinyXfate 2:0e2ef1edf01b 2392 }
destinyXfate 2:0e2ef1edf01b 2393
destinyXfate 2:0e2ef1edf01b 2394 sptr -= (width_mmx*4 - 4); // sign fixed
destinyXfate 2:0e2ef1edf01b 2395 dp -= (width_mmx*16 - 4); // sign fixed
destinyXfate 2:0e2ef1edf01b 2396 for (i = width; i; i--)
destinyXfate 2:0e2ef1edf01b 2397 {
destinyXfate 2:0e2ef1edf01b 2398 png_byte v[8];
destinyXfate 2:0e2ef1edf01b 2399 int j;
destinyXfate 2:0e2ef1edf01b 2400 sptr -= 4;
destinyXfate 2:0e2ef1edf01b 2401 png_memcpy(v, sptr, 4);
destinyXfate 2:0e2ef1edf01b 2402 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 2403 {
destinyXfate 2:0e2ef1edf01b 2404 dp -= 4;
destinyXfate 2:0e2ef1edf01b 2405 png_memcpy(dp, v, 4);
destinyXfate 2:0e2ef1edf01b 2406 }
destinyXfate 2:0e2ef1edf01b 2407 }
destinyXfate 2:0e2ef1edf01b 2408 }
destinyXfate 2:0e2ef1edf01b 2409 else if (width) // pass == 4 or 5
destinyXfate 2:0e2ef1edf01b 2410 {
destinyXfate 2:0e2ef1edf01b 2411 int width_mmx = ((width >> 1) << 1) ;
destinyXfate 2:0e2ef1edf01b 2412 width -= width_mmx; // 0,1 pixels => 0,4 bytes
destinyXfate 2:0e2ef1edf01b 2413 if (width_mmx)
destinyXfate 2:0e2ef1edf01b 2414 {
destinyXfate 2:0e2ef1edf01b 2415 int dummy_value_c; // fix 'forbidden register spilled'
destinyXfate 2:0e2ef1edf01b 2416 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 2417 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 2418
destinyXfate 2:0e2ef1edf01b 2419 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 2420 "subl $4, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 2421 "subl $12, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 2422
destinyXfate 2:0e2ef1edf01b 2423 ".loop4_pass4: \n\t"
destinyXfate 2:0e2ef1edf01b 2424 "movq (%%esi), %%mm0 \n\t" // 7 6 5 4 3 2 1 0
destinyXfate 2:0e2ef1edf01b 2425 "movq %%mm0, %%mm1 \n\t" // 7 6 5 4 3 2 1 0
destinyXfate 2:0e2ef1edf01b 2426 "punpckldq %%mm0, %%mm0 \n\t" // 3 2 1 0 3 2 1 0
destinyXfate 2:0e2ef1edf01b 2427 "punpckhdq %%mm1, %%mm1 \n\t" // 7 6 5 4 7 6 5 4
destinyXfate 2:0e2ef1edf01b 2428 "movq %%mm0, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2429 "subl $8, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 2430 "movq %%mm1, 8(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2431 "subl $16, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 2432 "subl $2, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 2433 "jnz .loop4_pass4 \n\t"
destinyXfate 2:0e2ef1edf01b 2434 "EMMS \n\t" // DONE
destinyXfate 2:0e2ef1edf01b 2435
destinyXfate 2:0e2ef1edf01b 2436 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 2437 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 2438 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 2439
destinyXfate 2:0e2ef1edf01b 2440 : "1" (sptr), // esi // input regs
destinyXfate 2:0e2ef1edf01b 2441 "2" (dp), // edi
destinyXfate 2:0e2ef1edf01b 2442 "0" (width_mmx) // ecx
destinyXfate 2:0e2ef1edf01b 2443
destinyXfate 2:0e2ef1edf01b 2444 #if 0 /* %mm0, %mm1 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 2445 : "%mm0", "%mm1" // clobber list
destinyXfate 2:0e2ef1edf01b 2446 #endif
destinyXfate 2:0e2ef1edf01b 2447 );
destinyXfate 2:0e2ef1edf01b 2448 }
destinyXfate 2:0e2ef1edf01b 2449
destinyXfate 2:0e2ef1edf01b 2450 sptr -= (width_mmx*4 - 4); // sign fixed
destinyXfate 2:0e2ef1edf01b 2451 dp -= (width_mmx*8 - 4); // sign fixed
destinyXfate 2:0e2ef1edf01b 2452 for (i = width; i; i--)
destinyXfate 2:0e2ef1edf01b 2453 {
destinyXfate 2:0e2ef1edf01b 2454 png_byte v[8];
destinyXfate 2:0e2ef1edf01b 2455 int j;
destinyXfate 2:0e2ef1edf01b 2456 sptr -= 4;
destinyXfate 2:0e2ef1edf01b 2457 png_memcpy(v, sptr, 4);
destinyXfate 2:0e2ef1edf01b 2458 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 2459 {
destinyXfate 2:0e2ef1edf01b 2460 dp -= 4;
destinyXfate 2:0e2ef1edf01b 2461 png_memcpy(dp, v, 4);
destinyXfate 2:0e2ef1edf01b 2462 }
destinyXfate 2:0e2ef1edf01b 2463 }
destinyXfate 2:0e2ef1edf01b 2464 }
destinyXfate 2:0e2ef1edf01b 2465 } /* end of pixel_bytes == 4 */
destinyXfate 2:0e2ef1edf01b 2466
destinyXfate 2:0e2ef1edf01b 2467 //--------------------------------------------------------------
destinyXfate 2:0e2ef1edf01b 2468 else if (pixel_bytes == 8)
destinyXfate 2:0e2ef1edf01b 2469 {
destinyXfate 2:0e2ef1edf01b 2470 // GRR TEST: should work, but needs testing (special 64-bit version of rpng2?)
destinyXfate 2:0e2ef1edf01b 2471 // GRR NOTE: no need to combine passes here!
destinyXfate 2:0e2ef1edf01b 2472 if (((pass == 0) || (pass == 1)) && width)
destinyXfate 2:0e2ef1edf01b 2473 {
destinyXfate 2:0e2ef1edf01b 2474 int dummy_value_c; // fix 'forbidden register spilled'
destinyXfate 2:0e2ef1edf01b 2475 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 2476 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 2477
destinyXfate 2:0e2ef1edf01b 2478 // source is 8-byte RRGGBBAA
destinyXfate 2:0e2ef1edf01b 2479 // dest is 64-byte RRGGBBAA RRGGBBAA RRGGBBAA RRGGBBAA ...
destinyXfate 2:0e2ef1edf01b 2480 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 2481 "subl $56, %%edi \n\t" // start of last block
destinyXfate 2:0e2ef1edf01b 2482
destinyXfate 2:0e2ef1edf01b 2483 ".loop8_pass0: \n\t"
destinyXfate 2:0e2ef1edf01b 2484 "movq (%%esi), %%mm0 \n\t" // 7 6 5 4 3 2 1 0
destinyXfate 2:0e2ef1edf01b 2485 "movq %%mm0, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2486 "movq %%mm0, 8(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2487 "movq %%mm0, 16(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2488 "movq %%mm0, 24(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2489 "movq %%mm0, 32(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2490 "movq %%mm0, 40(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2491 "movq %%mm0, 48(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2492 "subl $8, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 2493 "movq %%mm0, 56(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2494 "subl $64, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 2495 "decl %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 2496 "jnz .loop8_pass0 \n\t"
destinyXfate 2:0e2ef1edf01b 2497 "EMMS \n\t" // DONE
destinyXfate 2:0e2ef1edf01b 2498
destinyXfate 2:0e2ef1edf01b 2499 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 2500 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 2501 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 2502
destinyXfate 2:0e2ef1edf01b 2503 : "1" (sptr), // esi // input regs
destinyXfate 2:0e2ef1edf01b 2504 "2" (dp), // edi
destinyXfate 2:0e2ef1edf01b 2505 "0" (width) // ecx
destinyXfate 2:0e2ef1edf01b 2506
destinyXfate 2:0e2ef1edf01b 2507 #if 0 /* %mm0 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 2508 : "%mm0" // clobber list
destinyXfate 2:0e2ef1edf01b 2509 #endif
destinyXfate 2:0e2ef1edf01b 2510 );
destinyXfate 2:0e2ef1edf01b 2511 }
destinyXfate 2:0e2ef1edf01b 2512 else if (((pass == 2) || (pass == 3)) && width)
destinyXfate 2:0e2ef1edf01b 2513 {
destinyXfate 2:0e2ef1edf01b 2514 // source is 8-byte RRGGBBAA
destinyXfate 2:0e2ef1edf01b 2515 // dest is 32-byte RRGGBBAA RRGGBBAA RRGGBBAA RRGGBBAA
destinyXfate 2:0e2ef1edf01b 2516 // (recall that expansion is _in place_: sptr and dp
destinyXfate 2:0e2ef1edf01b 2517 // both point at locations within same row buffer)
destinyXfate 2:0e2ef1edf01b 2518 {
destinyXfate 2:0e2ef1edf01b 2519 int dummy_value_c; // fix 'forbidden register spilled'
destinyXfate 2:0e2ef1edf01b 2520 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 2521 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 2522
destinyXfate 2:0e2ef1edf01b 2523 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 2524 "subl $24, %%edi \n\t" // start of last block
destinyXfate 2:0e2ef1edf01b 2525
destinyXfate 2:0e2ef1edf01b 2526 ".loop8_pass2: \n\t"
destinyXfate 2:0e2ef1edf01b 2527 "movq (%%esi), %%mm0 \n\t" // 7 6 5 4 3 2 1 0
destinyXfate 2:0e2ef1edf01b 2528 "movq %%mm0, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2529 "movq %%mm0, 8(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2530 "movq %%mm0, 16(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2531 "subl $8, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 2532 "movq %%mm0, 24(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2533 "subl $32, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 2534 "decl %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 2535 "jnz .loop8_pass2 \n\t"
destinyXfate 2:0e2ef1edf01b 2536 "EMMS \n\t" // DONE
destinyXfate 2:0e2ef1edf01b 2537
destinyXfate 2:0e2ef1edf01b 2538 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 2539 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 2540 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 2541
destinyXfate 2:0e2ef1edf01b 2542 : "1" (sptr), // esi // input regs
destinyXfate 2:0e2ef1edf01b 2543 "2" (dp), // edi
destinyXfate 2:0e2ef1edf01b 2544 "0" (width) // ecx
destinyXfate 2:0e2ef1edf01b 2545
destinyXfate 2:0e2ef1edf01b 2546 #if 0 /* %mm0 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 2547 : "%mm0" // clobber list
destinyXfate 2:0e2ef1edf01b 2548 #endif
destinyXfate 2:0e2ef1edf01b 2549 );
destinyXfate 2:0e2ef1edf01b 2550 }
destinyXfate 2:0e2ef1edf01b 2551 }
destinyXfate 2:0e2ef1edf01b 2552 else if (width) // pass == 4 or 5
destinyXfate 2:0e2ef1edf01b 2553 {
destinyXfate 2:0e2ef1edf01b 2554 // source is 8-byte RRGGBBAA
destinyXfate 2:0e2ef1edf01b 2555 // dest is 16-byte RRGGBBAA RRGGBBAA
destinyXfate 2:0e2ef1edf01b 2556 {
destinyXfate 2:0e2ef1edf01b 2557 int dummy_value_c; // fix 'forbidden register spilled'
destinyXfate 2:0e2ef1edf01b 2558 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 2559 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 2560
destinyXfate 2:0e2ef1edf01b 2561 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 2562 "subl $8, %%edi \n\t" // start of last block
destinyXfate 2:0e2ef1edf01b 2563
destinyXfate 2:0e2ef1edf01b 2564 ".loop8_pass4: \n\t"
destinyXfate 2:0e2ef1edf01b 2565 "movq (%%esi), %%mm0 \n\t" // 7 6 5 4 3 2 1 0
destinyXfate 2:0e2ef1edf01b 2566 "movq %%mm0, (%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2567 "subl $8, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 2568 "movq %%mm0, 8(%%edi) \n\t"
destinyXfate 2:0e2ef1edf01b 2569 "subl $16, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 2570 "decl %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 2571 "jnz .loop8_pass4 \n\t"
destinyXfate 2:0e2ef1edf01b 2572 "EMMS \n\t" // DONE
destinyXfate 2:0e2ef1edf01b 2573
destinyXfate 2:0e2ef1edf01b 2574 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 2575 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 2576 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 2577
destinyXfate 2:0e2ef1edf01b 2578 : "1" (sptr), // esi // input regs
destinyXfate 2:0e2ef1edf01b 2579 "2" (dp), // edi
destinyXfate 2:0e2ef1edf01b 2580 "0" (width) // ecx
destinyXfate 2:0e2ef1edf01b 2581
destinyXfate 2:0e2ef1edf01b 2582 #if 0 /* %mm0 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 2583 : "%mm0" // clobber list
destinyXfate 2:0e2ef1edf01b 2584 #endif
destinyXfate 2:0e2ef1edf01b 2585 );
destinyXfate 2:0e2ef1edf01b 2586 }
destinyXfate 2:0e2ef1edf01b 2587 }
destinyXfate 2:0e2ef1edf01b 2588
destinyXfate 2:0e2ef1edf01b 2589 } /* end of pixel_bytes == 8 */
destinyXfate 2:0e2ef1edf01b 2590
destinyXfate 2:0e2ef1edf01b 2591 //--------------------------------------------------------------
destinyXfate 2:0e2ef1edf01b 2592 else if (pixel_bytes == 6)
destinyXfate 2:0e2ef1edf01b 2593 {
destinyXfate 2:0e2ef1edf01b 2594 for (i = width; i; i--)
destinyXfate 2:0e2ef1edf01b 2595 {
destinyXfate 2:0e2ef1edf01b 2596 png_byte v[8];
destinyXfate 2:0e2ef1edf01b 2597 int j;
destinyXfate 2:0e2ef1edf01b 2598 png_memcpy(v, sptr, 6);
destinyXfate 2:0e2ef1edf01b 2599 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 2600 {
destinyXfate 2:0e2ef1edf01b 2601 png_memcpy(dp, v, 6);
destinyXfate 2:0e2ef1edf01b 2602 dp -= 6;
destinyXfate 2:0e2ef1edf01b 2603 }
destinyXfate 2:0e2ef1edf01b 2604 sptr -= 6;
destinyXfate 2:0e2ef1edf01b 2605 }
destinyXfate 2:0e2ef1edf01b 2606 } /* end of pixel_bytes == 6 */
destinyXfate 2:0e2ef1edf01b 2607
destinyXfate 2:0e2ef1edf01b 2608 //--------------------------------------------------------------
destinyXfate 2:0e2ef1edf01b 2609 else
destinyXfate 2:0e2ef1edf01b 2610 {
destinyXfate 2:0e2ef1edf01b 2611 for (i = width; i; i--)
destinyXfate 2:0e2ef1edf01b 2612 {
destinyXfate 2:0e2ef1edf01b 2613 png_byte v[8];
destinyXfate 2:0e2ef1edf01b 2614 int j;
destinyXfate 2:0e2ef1edf01b 2615 png_memcpy(v, sptr, pixel_bytes);
destinyXfate 2:0e2ef1edf01b 2616 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 2617 {
destinyXfate 2:0e2ef1edf01b 2618 png_memcpy(dp, v, pixel_bytes);
destinyXfate 2:0e2ef1edf01b 2619 dp -= pixel_bytes;
destinyXfate 2:0e2ef1edf01b 2620 }
destinyXfate 2:0e2ef1edf01b 2621 sptr -= pixel_bytes;
destinyXfate 2:0e2ef1edf01b 2622 }
destinyXfate 2:0e2ef1edf01b 2623 }
destinyXfate 2:0e2ef1edf01b 2624 } // end of _mmx_supported ========================================
destinyXfate 2:0e2ef1edf01b 2625
destinyXfate 2:0e2ef1edf01b 2626 else /* MMX not supported: use modified C code - takes advantage
destinyXfate 2:0e2ef1edf01b 2627 * of inlining of png_memcpy for a constant */
destinyXfate 2:0e2ef1edf01b 2628 /* GRR 19991007: does it? or should pixel_bytes in each
destinyXfate 2:0e2ef1edf01b 2629 * block be replaced with immediate value (e.g., 1)? */
destinyXfate 2:0e2ef1edf01b 2630 /* GRR 19991017: replaced with constants in each case */
destinyXfate 2:0e2ef1edf01b 2631 #endif /* PNG_MMX_CODE_SUPPORTED */
destinyXfate 2:0e2ef1edf01b 2632 {
destinyXfate 2:0e2ef1edf01b 2633 if (pixel_bytes == 1)
destinyXfate 2:0e2ef1edf01b 2634 {
destinyXfate 2:0e2ef1edf01b 2635 for (i = width; i; i--)
destinyXfate 2:0e2ef1edf01b 2636 {
destinyXfate 2:0e2ef1edf01b 2637 int j;
destinyXfate 2:0e2ef1edf01b 2638 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 2639 {
destinyXfate 2:0e2ef1edf01b 2640 *dp-- = *sptr;
destinyXfate 2:0e2ef1edf01b 2641 }
destinyXfate 2:0e2ef1edf01b 2642 --sptr;
destinyXfate 2:0e2ef1edf01b 2643 }
destinyXfate 2:0e2ef1edf01b 2644 }
destinyXfate 2:0e2ef1edf01b 2645 else if (pixel_bytes == 3)
destinyXfate 2:0e2ef1edf01b 2646 {
destinyXfate 2:0e2ef1edf01b 2647 for (i = width; i; i--)
destinyXfate 2:0e2ef1edf01b 2648 {
destinyXfate 2:0e2ef1edf01b 2649 png_byte v[8];
destinyXfate 2:0e2ef1edf01b 2650 int j;
destinyXfate 2:0e2ef1edf01b 2651 png_memcpy(v, sptr, 3);
destinyXfate 2:0e2ef1edf01b 2652 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 2653 {
destinyXfate 2:0e2ef1edf01b 2654 png_memcpy(dp, v, 3);
destinyXfate 2:0e2ef1edf01b 2655 dp -= 3;
destinyXfate 2:0e2ef1edf01b 2656 }
destinyXfate 2:0e2ef1edf01b 2657 sptr -= 3;
destinyXfate 2:0e2ef1edf01b 2658 }
destinyXfate 2:0e2ef1edf01b 2659 }
destinyXfate 2:0e2ef1edf01b 2660 else if (pixel_bytes == 2)
destinyXfate 2:0e2ef1edf01b 2661 {
destinyXfate 2:0e2ef1edf01b 2662 for (i = width; i; i--)
destinyXfate 2:0e2ef1edf01b 2663 {
destinyXfate 2:0e2ef1edf01b 2664 png_byte v[8];
destinyXfate 2:0e2ef1edf01b 2665 int j;
destinyXfate 2:0e2ef1edf01b 2666 png_memcpy(v, sptr, 2);
destinyXfate 2:0e2ef1edf01b 2667 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 2668 {
destinyXfate 2:0e2ef1edf01b 2669 png_memcpy(dp, v, 2);
destinyXfate 2:0e2ef1edf01b 2670 dp -= 2;
destinyXfate 2:0e2ef1edf01b 2671 }
destinyXfate 2:0e2ef1edf01b 2672 sptr -= 2;
destinyXfate 2:0e2ef1edf01b 2673 }
destinyXfate 2:0e2ef1edf01b 2674 }
destinyXfate 2:0e2ef1edf01b 2675 else if (pixel_bytes == 4)
destinyXfate 2:0e2ef1edf01b 2676 {
destinyXfate 2:0e2ef1edf01b 2677 for (i = width; i; i--)
destinyXfate 2:0e2ef1edf01b 2678 {
destinyXfate 2:0e2ef1edf01b 2679 png_byte v[8];
destinyXfate 2:0e2ef1edf01b 2680 int j;
destinyXfate 2:0e2ef1edf01b 2681 png_memcpy(v, sptr, 4);
destinyXfate 2:0e2ef1edf01b 2682 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 2683 {
destinyXfate 2:0e2ef1edf01b 2684 #ifdef PNG_DEBUG
destinyXfate 2:0e2ef1edf01b 2685 if (dp < row || dp+3 > row+png_ptr->row_buf_size)
destinyXfate 2:0e2ef1edf01b 2686 {
destinyXfate 2:0e2ef1edf01b 2687 printf("dp out of bounds: row=%p, dp=%p, rp=%p\n",
destinyXfate 2:0e2ef1edf01b 2688 (void *)row, (void *)dp, (void *)(row+png_ptr->row_buf_size));
destinyXfate 2:0e2ef1edf01b 2689 printf("row_buf=%lu\n", (unsigned long)png_ptr->row_buf_size);
destinyXfate 2:0e2ef1edf01b 2690 }
destinyXfate 2:0e2ef1edf01b 2691 #endif
destinyXfate 2:0e2ef1edf01b 2692 png_memcpy(dp, v, 4);
destinyXfate 2:0e2ef1edf01b 2693 dp -= 4;
destinyXfate 2:0e2ef1edf01b 2694 }
destinyXfate 2:0e2ef1edf01b 2695 sptr -= 4;
destinyXfate 2:0e2ef1edf01b 2696 }
destinyXfate 2:0e2ef1edf01b 2697 }
destinyXfate 2:0e2ef1edf01b 2698 else if (pixel_bytes == 6)
destinyXfate 2:0e2ef1edf01b 2699 {
destinyXfate 2:0e2ef1edf01b 2700 for (i = width; i; i--)
destinyXfate 2:0e2ef1edf01b 2701 {
destinyXfate 2:0e2ef1edf01b 2702 png_byte v[8];
destinyXfate 2:0e2ef1edf01b 2703 int j;
destinyXfate 2:0e2ef1edf01b 2704 png_memcpy(v, sptr, 6);
destinyXfate 2:0e2ef1edf01b 2705 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 2706 {
destinyXfate 2:0e2ef1edf01b 2707 png_memcpy(dp, v, 6);
destinyXfate 2:0e2ef1edf01b 2708 dp -= 6;
destinyXfate 2:0e2ef1edf01b 2709 }
destinyXfate 2:0e2ef1edf01b 2710 sptr -= 6;
destinyXfate 2:0e2ef1edf01b 2711 }
destinyXfate 2:0e2ef1edf01b 2712 }
destinyXfate 2:0e2ef1edf01b 2713 else if (pixel_bytes == 8)
destinyXfate 2:0e2ef1edf01b 2714 {
destinyXfate 2:0e2ef1edf01b 2715 for (i = width; i; i--)
destinyXfate 2:0e2ef1edf01b 2716 {
destinyXfate 2:0e2ef1edf01b 2717 png_byte v[8];
destinyXfate 2:0e2ef1edf01b 2718 int j;
destinyXfate 2:0e2ef1edf01b 2719 png_memcpy(v, sptr, 8);
destinyXfate 2:0e2ef1edf01b 2720 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 2721 {
destinyXfate 2:0e2ef1edf01b 2722 png_memcpy(dp, v, 8);
destinyXfate 2:0e2ef1edf01b 2723 dp -= 8;
destinyXfate 2:0e2ef1edf01b 2724 }
destinyXfate 2:0e2ef1edf01b 2725 sptr -= 8;
destinyXfate 2:0e2ef1edf01b 2726 }
destinyXfate 2:0e2ef1edf01b 2727 }
destinyXfate 2:0e2ef1edf01b 2728 else /* GRR: should never be reached */
destinyXfate 2:0e2ef1edf01b 2729 {
destinyXfate 2:0e2ef1edf01b 2730 for (i = width; i; i--)
destinyXfate 2:0e2ef1edf01b 2731 {
destinyXfate 2:0e2ef1edf01b 2732 png_byte v[8];
destinyXfate 2:0e2ef1edf01b 2733 int j;
destinyXfate 2:0e2ef1edf01b 2734 png_memcpy(v, sptr, pixel_bytes);
destinyXfate 2:0e2ef1edf01b 2735 for (j = 0; j < png_pass_inc[pass]; j++)
destinyXfate 2:0e2ef1edf01b 2736 {
destinyXfate 2:0e2ef1edf01b 2737 png_memcpy(dp, v, pixel_bytes);
destinyXfate 2:0e2ef1edf01b 2738 dp -= pixel_bytes;
destinyXfate 2:0e2ef1edf01b 2739 }
destinyXfate 2:0e2ef1edf01b 2740 sptr -= pixel_bytes;
destinyXfate 2:0e2ef1edf01b 2741 }
destinyXfate 2:0e2ef1edf01b 2742 }
destinyXfate 2:0e2ef1edf01b 2743
destinyXfate 2:0e2ef1edf01b 2744 } /* end if (MMX not supported) */
destinyXfate 2:0e2ef1edf01b 2745 break;
destinyXfate 2:0e2ef1edf01b 2746 }
destinyXfate 2:0e2ef1edf01b 2747 } /* end switch (row_info->pixel_depth) */
destinyXfate 2:0e2ef1edf01b 2748
destinyXfate 2:0e2ef1edf01b 2749 row_info->width = final_width;
destinyXfate 2:0e2ef1edf01b 2750
destinyXfate 2:0e2ef1edf01b 2751 row_info->rowbytes = PNG_ROWBYTES(row_info->pixel_depth,final_width);
destinyXfate 2:0e2ef1edf01b 2752 }
destinyXfate 2:0e2ef1edf01b 2753
destinyXfate 2:0e2ef1edf01b 2754 } /* end png_do_read_interlace() */
destinyXfate 2:0e2ef1edf01b 2755
destinyXfate 2:0e2ef1edf01b 2756 #endif /* PNG_HAVE_MMX_READ_INTERLACE */
destinyXfate 2:0e2ef1edf01b 2757 #endif /* PNG_READ_INTERLACING_SUPPORTED */
destinyXfate 2:0e2ef1edf01b 2758
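/* Editorial sketch, not part of libpng: the C fallback loops in
 * png_do_read_interlace() above all follow one pattern. They walk the row
 * from its last pixel toward its first and write each source pixel
 * png_pass_inc[pass] times, so the row can be widened in place without
 * overwriting pixels that have not been read yet. The helper name
 * expand_row_in_place and its parameters are hypothetical; the code only
 * assumes pixel_bytes <= 8 (the size of v[] in the listing) and a buffer
 * large enough for the expanded row.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: expand `width` packed pixels at the start of `row`
 * into `width * inc` pixels, in place.  `row` must have room for
 * width * inc * pixel_bytes bytes, and pixel_bytes must be <= 8. */
static void
expand_row_in_place(unsigned char *row, size_t width, size_t pixel_bytes,
                    size_t inc)
{
    /* Both cursors start one past the last pixel and move toward row[0]. */
    unsigned char *sptr = row + width * pixel_bytes;
    unsigned char *dp = row + width * inc * pixel_bytes;
    size_t i, j;

    for (i = width; i; i--)
    {
        unsigned char v[8];

        sptr -= pixel_bytes;
        memcpy(v, sptr, pixel_bytes);   /* save the pixel before overwriting */
        for (j = 0; j < inc; j++)
        {
            dp -= pixel_bytes;
            memcpy(dp, v, pixel_bytes); /* emit one replicated copy */
        }
    }
}
```

/* Because dp never falls below sptr when inc >= 1, each write lands at or
 * beyond the pixels still waiting to be read, which is what makes the
 * right-to-left order safe for in-place expansion.
 */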
destinyXfate 2:0e2ef1edf01b 2759
destinyXfate 2:0e2ef1edf01b 2760
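/* Editorial sketch, not part of libpng: the byte-lane comments in the
 * .loop2_pass2 block above ("x x x x 3 2 1 0" and so on) describe what
 * punpcklwd, punpckldq and punpckhdq do when both operands are the same
 * register. The portable C below emulates those three steps on a plain
 * 64-bit integer so the lane diagrams can be checked without MMX hardware.
 * The function names are hypothetical, and the emulation only covers the
 * same-register case used in the listing.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Emulates "punpcklwd %mm0, %mm0": each of the two low 16-bit words is
 * duplicated, giving lanes w1 w1 w0 w0 (high to low). */
static uint64_t
dup_low_words(uint64_t x)
{
    uint64_t w0 = x & 0xFFFFu;
    uint64_t w1 = (x >> 16) & 0xFFFFu;
    return w0 | (w0 << 16) | (w1 << 32) | (w1 << 48);
}

/* Emulates "punpckldq %mm0, %mm0": the low 32-bit half fills both halves. */
static uint64_t
dup_low_dword(uint64_t x)
{
    uint64_t d0 = x & 0xFFFFFFFFu;
    return d0 | (d0 << 32);
}

/* Emulates "punpckhdq %mm1, %mm1": the high 32-bit half fills both halves. */
static uint64_t
dup_high_dword(uint64_t x)
{
    uint64_t d1 = x >> 32;
    return d1 | (d1 << 32);
}
```

/* With two 2-byte pixels loaded as bytes 3 2 1 0, dup_low_words() yields
 * 3 2 3 2 1 0 1 0; dup_low_dword() and dup_high_dword() of that value then
 * give four copies of pixel "1 0" and four copies of pixel "3 2", matching
 * the two movq stores in the pass 2/3 loop.
 */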
destinyXfate 2:0e2ef1edf01b 2761 #if defined(PNG_HAVE_MMX_READ_FILTER_ROW)
destinyXfate 2:0e2ef1edf01b 2762 #if defined(PNG_MMX_CODE_SUPPORTED)
destinyXfate 2:0e2ef1edf01b 2763
destinyXfate 2:0e2ef1edf01b 2764 // These variables are used by the filter functions below. They are declared
destinyXfate 2:0e2ef1edf01b 2765 // at file scope, in a union with a double, to request 8-byte alignment.
destinyXfate 2:0e2ef1edf01b 2766
destinyXfate 2:0e2ef1edf01b 2767 union uAll {
destinyXfate 2:0e2ef1edf01b 2768 long long use;
destinyXfate 2:0e2ef1edf01b 2769 double align;
destinyXfate 2:0e2ef1edf01b 2770 } _LBCarryMask = {0x0101010101010101LL},
destinyXfate 2:0e2ef1edf01b 2771 _HBClearMask = {0x7f7f7f7f7f7f7f7fLL},
destinyXfate 2:0e2ef1edf01b 2772 _ActiveMask, _ActiveMask2, _ActiveMaskEnd, _ShiftBpp, _ShiftRem;
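// Illustrative sketch, not part of libpng: how the two mask constants above
// are used by the MMX loops below. MMX has no per-byte shift, so the code
// shifts the whole 64-bit register right by one and masks off the bit that
// leaks into bit 7 of each byte; a carry is added back where both low bits
// were set. The sketch_* names are hypothetical, and plain uint64_t
// arithmetic is assumed as a stand-in for the psrlq/pand/paddb sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit patterns as _LBCarryMask and _HBClearMask; names are hypothetical. */
#define SKETCH_LB_CARRY 0x0101010101010101ULL
#define SKETCH_HB_CLEAR 0x7f7f7f7f7f7f7f7fULL

/* Halve each of the eight bytes in v: the 64-bit shift drags bit 0 of every
 * byte into bit 7 of its lower neighbour, so that bit is masked off. */
static uint64_t sketch_half_bytes(uint64_t v)
{
    return (v >> 1) & SKETCH_HB_CLEAR;
}

/* Per-byte floor((a + b) / 2) without widening: a/2 + b/2, plus 1 where both
 * low bits were set. Each lane stays <= 255, so nothing carries across. */
static uint64_t sketch_avg_bytes(uint64_t a, uint64_t b)
{
    uint64_t carry = a & b & SKETCH_LB_CARRY;
    return sketch_half_bytes(a) + sketch_half_bytes(b) + carry;
}

/* Exhaustive check: every byte pair, replicated into all eight lanes. */
static void sketch_avg_selfcheck(void)
{
    unsigned a, b, lane;
    for (a = 0; a < 256; a++)
        for (b = 0; b < 256; b++) {
            uint64_t r = sketch_avg_bytes(a * SKETCH_LB_CARRY,
                                          b * SKETCH_LB_CARRY);
            for (lane = 0; lane < 8; lane++)
                assert(((r >> (8 * lane)) & 0xff) == ((a + b) >> 1));
        }
}
```

// In the real loops the result is additionally ANDed with _ActiveMask so that
// only the bpp bytes of the current pixel group are added to the Avg bytes.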
destinyXfate 2:0e2ef1edf01b 2773
destinyXfate 2:0e2ef1edf01b 2774 #ifdef PNG_THREAD_UNSAFE_OK
destinyXfate 2:0e2ef1edf01b 2775 //===========================================================================//
destinyXfate 2:0e2ef1edf01b 2776 // //
destinyXfate 2:0e2ef1edf01b 2777 // P N G _ R E A D _ F I L T E R _ R O W _ M M X _ A V G //
destinyXfate 2:0e2ef1edf01b 2778 // //
destinyXfate 2:0e2ef1edf01b 2779 //===========================================================================//
destinyXfate 2:0e2ef1edf01b 2780
destinyXfate 2:0e2ef1edf01b 2781 // Optimized code for PNG Average filter decoder
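// Illustrative scalar reference, not part of libpng: the result the optimized
// routine below is meant to produce. It follows the formulas quoted in the
// assembler comments, Raw(x) = Avg(x) + (Prior(x)/2) for the first bpp bytes
// and Raw(x) = Avg(x) + ((Raw(x-bpp) + Prior(x))/2) afterwards, with byte
// arithmetic wrapping modulo 256. The function name and signature are
// hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Undo the PNG Average filter in place.
 * row      : filtered bytes on entry, reconstructed bytes on return
 * prev_row : reconstructed bytes of the previous scanline
 * rowbytes : number of bytes in the row
 * bpp      : bytes per complete pixel (at least 1) */
static void sketch_unfilter_avg(unsigned char *row,
                                const unsigned char *prev_row,
                                size_t rowbytes, size_t bpp)
{
    size_t i;

    assert(bpp >= 1);

    /* First pixel: there is no Raw(x-bpp), so it counts as zero. */
    for (i = 0; i < rowbytes && i < bpp; i++)
        row[i] = (unsigned char)(row[i] + (prev_row[i] >> 1));

    /* Remaining bytes: the sum is formed in int, so it cannot overflow
     * before the division by two. */
    for (; i < rowbytes; i++)
        row[i] = (unsigned char)(row[i] + ((row[i - bpp] + prev_row[i]) >> 1));
}
```

// The MMX code splits the second loop into a bytewise run up to an 8-byte
// boundary (_dif), an 8-bytes-per-iteration run (_MMXLength), and a tail.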
destinyXfate 2:0e2ef1edf01b 2782
destinyXfate 2:0e2ef1edf01b 2783 static void /* PRIVATE */
destinyXfate 2:0e2ef1edf01b 2784 png_read_filter_row_mmx_avg(png_row_infop row_info, png_bytep row,
destinyXfate 2:0e2ef1edf01b 2785 png_bytep prev_row)
destinyXfate 2:0e2ef1edf01b 2786 {
destinyXfate 2:0e2ef1edf01b 2787 int bpp;
destinyXfate 2:0e2ef1edf01b 2788 int dummy_value_c; // fix 'forbidden register 2 (cx) was spilled' error
destinyXfate 2:0e2ef1edf01b 2789 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 2790 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 2791
destinyXfate 2:0e2ef1edf01b 2792 bpp = (row_info->pixel_depth + 7) >> 3; // get # bytes per pixel
destinyXfate 2:0e2ef1edf01b 2793 _FullLength = row_info->rowbytes; // # of bytes to filter
destinyXfate 2:0e2ef1edf01b 2794
destinyXfate 2:0e2ef1edf01b 2795 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 2796 // initialize address pointers and offset
destinyXfate 2:0e2ef1edf01b 2797 #ifdef __PIC__
destinyXfate 2:0e2ef1edf01b 2798 "pushl %%ebx \n\t" // save index to Global Offset Table
destinyXfate 2:0e2ef1edf01b 2799 #endif
destinyXfate 2:0e2ef1edf01b 2800 //pre "movl row, %%edi \n\t" // edi: Avg(x)
destinyXfate 2:0e2ef1edf01b 2801 "xorl %%ebx, %%ebx \n\t" // ebx: x
destinyXfate 2:0e2ef1edf01b 2802 "movl %%edi, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 2803 //pre "movl prev_row, %%esi \n\t" // esi: Prior(x)
destinyXfate 2:0e2ef1edf01b 2804 //pre "subl bpp, %%edx \n\t" // (bpp is preloaded into ecx)
destinyXfate 2:0e2ef1edf01b 2805 "subl %%ecx, %%edx \n\t" // edx: Raw(x-bpp)
destinyXfate 2:0e2ef1edf01b 2806
destinyXfate 2:0e2ef1edf01b 2807 "xorl %%eax,%%eax \n\t"
destinyXfate 2:0e2ef1edf01b 2808
destinyXfate 2:0e2ef1edf01b 2809 // Compute the Raw value for the first bpp bytes
destinyXfate 2:0e2ef1edf01b 2810 // Raw(x) = Avg(x) + (Prior(x)/2)
destinyXfate 2:0e2ef1edf01b 2811 "avg_rlp: \n\t"
destinyXfate 2:0e2ef1edf01b 2812 "movb (%%esi,%%ebx,),%%al \n\t" // load al with Prior(x)
destinyXfate 2:0e2ef1edf01b 2813 "incl %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 2814 "shrb %%al \n\t" // divide by 2
destinyXfate 2:0e2ef1edf01b 2815 "addb -1(%%edi,%%ebx,),%%al \n\t" // add Avg(x); -1 to offset inc ebx
destinyXfate 2:0e2ef1edf01b 2816 //pre "cmpl bpp, %%ebx \n\t" // (bpp is preloaded into ecx)
destinyXfate 2:0e2ef1edf01b 2817 "cmpl %%ecx, %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 2818 "movb %%al,-1(%%edi,%%ebx,) \n\t" // write Raw(x); -1 to offset inc ebx
destinyXfate 2:0e2ef1edf01b 2819 "jb avg_rlp \n\t" // mov does not affect flags
destinyXfate 2:0e2ef1edf01b 2820
destinyXfate 2:0e2ef1edf01b 2821 // get # of bytes to alignment
destinyXfate 2:0e2ef1edf01b 2822 "movl %%edi, _dif \n\t" // take start of row
destinyXfate 2:0e2ef1edf01b 2823 "addl %%ebx, _dif \n\t" // add bpp
destinyXfate 2:0e2ef1edf01b 2824 "addl $0xf, _dif \n\t" // add 7+8 to incr past alignment bdry
destinyXfate 2:0e2ef1edf01b 2825 "andl $0xfffffff8, _dif \n\t" // mask to alignment boundary
destinyXfate 2:0e2ef1edf01b 2826 "subl %%edi, _dif \n\t" // subtract start of row => value of ebx
destinyXfate 2:0e2ef1edf01b 2827 "jz avg_go \n\t" // at the alignment boundary
destinyXfate 2:0e2ef1edf01b 2828
destinyXfate 2:0e2ef1edf01b 2829 // fix alignment
destinyXfate 2:0e2ef1edf01b 2830 // Compute the Raw value for the bytes up to the alignment boundary
destinyXfate 2:0e2ef1edf01b 2831 // Raw(x) = Avg(x) + ((Raw(x-bpp) + Prior(x))/2)
destinyXfate 2:0e2ef1edf01b 2832 "xorl %%ecx, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 2833
destinyXfate 2:0e2ef1edf01b 2834 "avg_lp1: \n\t"
destinyXfate 2:0e2ef1edf01b 2835 "xorl %%eax, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 2836 "movb (%%esi,%%ebx,), %%cl \n\t" // load cl with Prior(x)
destinyXfate 2:0e2ef1edf01b 2837 "movb (%%edx,%%ebx,), %%al \n\t" // load al with Raw(x-bpp)
destinyXfate 2:0e2ef1edf01b 2838 "addw %%cx, %%ax \n\t"
destinyXfate 2:0e2ef1edf01b 2839 "incl %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 2840 "shrw %%ax \n\t" // divide by 2
destinyXfate 2:0e2ef1edf01b 2841 "addb -1(%%edi,%%ebx,), %%al \n\t" // add Avg(x); -1 to offset inc ebx
destinyXfate 2:0e2ef1edf01b 2842 "cmpl _dif, %%ebx \n\t" // check if at alignment boundary
destinyXfate 2:0e2ef1edf01b 2843 "movb %%al, -1(%%edi,%%ebx,) \n\t" // write Raw(x); -1 to offset inc ebx
destinyXfate 2:0e2ef1edf01b 2844 "jb avg_lp1 \n\t" // repeat until at alignment boundary
destinyXfate 2:0e2ef1edf01b 2845
destinyXfate 2:0e2ef1edf01b 2846 "avg_go: \n\t"
destinyXfate 2:0e2ef1edf01b 2847 "movl _FullLength, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 2848 "movl %%eax, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 2849 "subl %%ebx, %%eax \n\t" // subtract alignment fix
destinyXfate 2:0e2ef1edf01b 2850 "andl $0x00000007, %%eax \n\t" // calc bytes over mult of 8
destinyXfate 2:0e2ef1edf01b 2851 "subl %%eax, %%ecx \n\t" // drop over bytes from original length
destinyXfate 2:0e2ef1edf01b 2852 "movl %%ecx, _MMXLength \n\t"
destinyXfate 2:0e2ef1edf01b 2853 #ifdef __PIC__
destinyXfate 2:0e2ef1edf01b 2854 "popl %%ebx \n\t" // restore index to Global Offset Table
destinyXfate 2:0e2ef1edf01b 2855 #endif
destinyXfate 2:0e2ef1edf01b 2856
destinyXfate 2:0e2ef1edf01b 2857 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 2858 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 2859 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 2860
destinyXfate 2:0e2ef1edf01b 2861 : "0" (bpp), // ecx // input regs
destinyXfate 2:0e2ef1edf01b 2862 "1" (prev_row), // esi
destinyXfate 2:0e2ef1edf01b 2863 "2" (row) // edi
destinyXfate 2:0e2ef1edf01b 2864
destinyXfate 2:0e2ef1edf01b 2865 : "%eax", "%edx", "memory" // clobber list
destinyXfate 2:0e2ef1edf01b 2866 #ifndef __PIC__
destinyXfate 2:0e2ef1edf01b 2867 , "%ebx"
destinyXfate 2:0e2ef1edf01b 2868 #endif
destinyXfate 2:0e2ef1edf01b 2869 // "memory" is listed because the asm writes _dif, _MMXLength and the
destinyXfate 2:0e2ef1edf01b 2870 // row buffer, none of which are declared as operands
destinyXfate 2:0e2ef1edf01b 2871 );
destinyXfate 2:0e2ef1edf01b 2872
destinyXfate 2:0e2ef1edf01b 2873 // now do the math for the rest of the row
destinyXfate 2:0e2ef1edf01b 2874 switch (bpp)
destinyXfate 2:0e2ef1edf01b 2875 {
destinyXfate 2:0e2ef1edf01b 2876 case 3:
destinyXfate 2:0e2ef1edf01b 2877 {
destinyXfate 2:0e2ef1edf01b 2878 _ActiveMask.use = 0x0000000000ffffffLL;
destinyXfate 2:0e2ef1edf01b 2879 _ShiftBpp.use = 24; // == 3 * 8
destinyXfate 2:0e2ef1edf01b 2880 _ShiftRem.use = 40; // == 64 - 24
destinyXfate 2:0e2ef1edf01b 2881
destinyXfate 2:0e2ef1edf01b 2882 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 2883 // re-init address pointers and offset
destinyXfate 2:0e2ef1edf01b 2884 "movq _ActiveMask, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 2885 "movl _dif, %%ecx \n\t" // ecx: x = offset to
destinyXfate 2:0e2ef1edf01b 2886 "movq _LBCarryMask, %%mm5 \n\t" // alignment boundary
destinyXfate 2:0e2ef1edf01b 2887 // preload "movl row, %%edi \n\t" // edi: Avg(x)
destinyXfate 2:0e2ef1edf01b 2888 "movq _HBClearMask, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 2889 // preload "movl prev_row, %%esi \n\t" // esi: Prior(x)
destinyXfate 2:0e2ef1edf01b 2890
destinyXfate 2:0e2ef1edf01b 2891 // prime the pump: load the first Raw(x-bpp) data set
destinyXfate 2:0e2ef1edf01b 2892 "movq -8(%%edi,%%ecx,), %%mm2 \n\t" // load previous aligned 8 bytes
destinyXfate 2:0e2ef1edf01b 2893 // (correct pos. in loop below)
destinyXfate 2:0e2ef1edf01b 2894 "avg_3lp: \n\t"
destinyXfate 2:0e2ef1edf01b 2895 "movq (%%edi,%%ecx,), %%mm0 \n\t" // load mm0 with Avg(x)
destinyXfate 2:0e2ef1edf01b 2896 "movq %%mm5, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 2897 "psrlq _ShiftRem, %%mm2 \n\t" // correct position Raw(x-bpp)
destinyXfate 2:0e2ef1edf01b 2898 // data
destinyXfate 2:0e2ef1edf01b 2899 "movq (%%esi,%%ecx,), %%mm1 \n\t" // load mm1 with Prior(x)
destinyXfate 2:0e2ef1edf01b 2900 "movq %%mm7, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 2901 "pand %%mm1, %%mm3 \n\t" // get lsb for each prev_row byte
destinyXfate 2:0e2ef1edf01b 2902 "psrlq $1, %%mm1 \n\t" // divide prev_row bytes by 2
destinyXfate 2:0e2ef1edf01b 2903 "pand %%mm4, %%mm1 \n\t" // clear invalid bit 7 of each
destinyXfate 2:0e2ef1edf01b 2904 // byte
destinyXfate 2:0e2ef1edf01b 2905 "paddb %%mm1, %%mm0 \n\t" // add (Prev_row/2) to Avg for
destinyXfate 2:0e2ef1edf01b 2906 // each byte
destinyXfate 2:0e2ef1edf01b 2907 // add 1st active group (Raw(x-bpp)/2) to average with LBCarry
destinyXfate 2:0e2ef1edf01b 2908 "movq %%mm3, %%mm1 \n\t" // now use mm1 for getting
destinyXfate 2:0e2ef1edf01b 2909 // LBCarrys
destinyXfate 2:0e2ef1edf01b 2910 "pand %%mm2, %%mm1 \n\t" // get LBCarrys for each byte
destinyXfate 2:0e2ef1edf01b 2911 // where both
destinyXfate 2:0e2ef1edf01b 2912 // lsb's were == 1 (only valid for active group)
destinyXfate 2:0e2ef1edf01b 2913 "psrlq $1, %%mm2 \n\t" // divide raw bytes by 2
destinyXfate 2:0e2ef1edf01b 2914 "pand %%mm4, %%mm2 \n\t" // clear invalid bit 7 of each
destinyXfate 2:0e2ef1edf01b 2915 // byte
destinyXfate 2:0e2ef1edf01b 2916 "paddb %%mm1, %%mm2 \n\t" // add LBCarrys to (Raw(x-bpp)/2)
destinyXfate 2:0e2ef1edf01b 2917 // for each byte
destinyXfate 2:0e2ef1edf01b 2918 "pand %%mm6, %%mm2 \n\t" // leave only Active Group 1
destinyXfate 2:0e2ef1edf01b 2919 // bytes to add to Avg
destinyXfate 2:0e2ef1edf01b 2920 "paddb %%mm2, %%mm0 \n\t" // add (Raw/2) + LBCarrys to
destinyXfate 2:0e2ef1edf01b 2921 // Avg for each Active
destinyXfate 2:0e2ef1edf01b 2922 // byte
destinyXfate 2:0e2ef1edf01b 2923 // add 2nd active group (Raw(x-bpp)/2) to average with _LBCarry
destinyXfate 2:0e2ef1edf01b 2924 "psllq _ShiftBpp, %%mm6 \n\t" // shift the mm6 mask to cover
destinyXfate 2:0e2ef1edf01b 2925 // bytes 3-5
destinyXfate 2:0e2ef1edf01b 2926 "movq %%mm0, %%mm2 \n\t" // mov updated Raws to mm2
destinyXfate 2:0e2ef1edf01b 2927 "psllq _ShiftBpp, %%mm2 \n\t" // shift data to pos. correctly
destinyXfate 2:0e2ef1edf01b 2928 "movq %%mm3, %%mm1 \n\t" // now use mm1 for getting
destinyXfate 2:0e2ef1edf01b 2929 // LBCarrys
destinyXfate 2:0e2ef1edf01b 2930 "pand %%mm2, %%mm1 \n\t" // get LBCarrys for each byte
destinyXfate 2:0e2ef1edf01b 2931 // where both
destinyXfate 2:0e2ef1edf01b 2932 // lsb's were == 1 (only valid for active group)
destinyXfate 2:0e2ef1edf01b 2933 "psrlq $1, %%mm2 \n\t" // divide raw bytes by 2
destinyXfate 2:0e2ef1edf01b 2934 "pand %%mm4, %%mm2 \n\t" // clear invalid bit 7 of each
destinyXfate 2:0e2ef1edf01b 2935 // byte
destinyXfate 2:0e2ef1edf01b 2936 "paddb %%mm1, %%mm2 \n\t" // add LBCarrys to (Raw(x-bpp)/2)
destinyXfate 2:0e2ef1edf01b 2937 // for each byte
destinyXfate 2:0e2ef1edf01b 2938 "pand %%mm6, %%mm2 \n\t" // leave only Active Group 2
destinyXfate 2:0e2ef1edf01b 2939 // bytes to add to Avg
destinyXfate 2:0e2ef1edf01b 2940 "paddb %%mm2, %%mm0 \n\t" // add (Raw/2) + LBCarrys to
destinyXfate 2:0e2ef1edf01b 2941 // Avg for each Active
destinyXfate 2:0e2ef1edf01b 2942 // byte
destinyXfate 2:0e2ef1edf01b 2943
destinyXfate 2:0e2ef1edf01b 2944 // add 3rd active group (Raw(x-bpp)/2) to average with _LBCarry
destinyXfate 2:0e2ef1edf01b 2945 "psllq _ShiftBpp, %%mm6 \n\t" // shift mm6 mask to cover last
destinyXfate 2:0e2ef1edf01b 2946 // two
destinyXfate 2:0e2ef1edf01b 2947 // bytes
destinyXfate 2:0e2ef1edf01b 2948 "movq %%mm0, %%mm2 \n\t" // mov updated Raws to mm2
destinyXfate 2:0e2ef1edf01b 2949 "psllq _ShiftBpp, %%mm2 \n\t" // shift data to pos. correctly
destinyXfate 2:0e2ef1edf01b 2950 // Data only needs to be shifted once here to
destinyXfate 2:0e2ef1edf01b 2951 // get the correct x-bpp offset.
destinyXfate 2:0e2ef1edf01b 2952 "movq %%mm3, %%mm1 \n\t" // now use mm1 for getting
destinyXfate 2:0e2ef1edf01b 2953 // LBCarrys
destinyXfate 2:0e2ef1edf01b 2954 "pand %%mm2, %%mm1 \n\t" // get LBCarrys for each byte
destinyXfate 2:0e2ef1edf01b 2955 // where both
destinyXfate 2:0e2ef1edf01b 2956 // lsb's were == 1 (only valid for active group)
destinyXfate 2:0e2ef1edf01b 2957 "psrlq $1, %%mm2 \n\t" // divide raw bytes by 2
destinyXfate 2:0e2ef1edf01b 2958 "pand %%mm4, %%mm2 \n\t" // clear invalid bit 7 of each
destinyXfate 2:0e2ef1edf01b 2959 // byte
destinyXfate 2:0e2ef1edf01b 2960 "paddb %%mm1, %%mm2 \n\t" // add LBCarrys to (Raw(x-bpp)/2)
destinyXfate 2:0e2ef1edf01b 2961 // for each byte
destinyXfate 2:0e2ef1edf01b 2962 "pand %%mm6, %%mm2 \n\t" // leave only Active Group 3
destinyXfate 2:0e2ef1edf01b 2963 // bytes to add to Avg
destinyXfate 2:0e2ef1edf01b 2964 "addl $8, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 2965 "paddb %%mm2, %%mm0 \n\t" // add (Raw/2) + LBCarrys to
destinyXfate 2:0e2ef1edf01b 2966 // Avg for each Active
destinyXfate 2:0e2ef1edf01b 2967 // byte
destinyXfate 2:0e2ef1edf01b 2968 // now ready to write back to memory
destinyXfate 2:0e2ef1edf01b 2969 "movq %%mm0, -8(%%edi,%%ecx,) \n\t"
destinyXfate 2:0e2ef1edf01b 2970 // move updated Raw(x) to use as Raw(x-bpp) for next loop
destinyXfate 2:0e2ef1edf01b 2971 "cmpl _MMXLength, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 2972 "movq %%mm0, %%mm2 \n\t" // mov updated Raw(x) to mm2
destinyXfate 2:0e2ef1edf01b 2973 "jb avg_3lp \n\t"
destinyXfate 2:0e2ef1edf01b 2974
destinyXfate 2:0e2ef1edf01b 2975 : "=S" (dummy_value_S), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 2976 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 2977
destinyXfate 2:0e2ef1edf01b 2978 : "0" (prev_row), // esi // input regs
destinyXfate 2:0e2ef1edf01b 2979 "1" (row) // edi
destinyXfate 2:0e2ef1edf01b 2980
destinyXfate 2:0e2ef1edf01b 2981 : "%ecx", "memory" // clobber list ("memory": asm rewrites the row buffer)
destinyXfate 2:0e2ef1edf01b 2982 #if 0 /* %mm0, ..., %mm7 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 2983 , "%mm0", "%mm1", "%mm2", "%mm3"
destinyXfate 2:0e2ef1edf01b 2984 , "%mm4", "%mm5", "%mm6", "%mm7"
destinyXfate 2:0e2ef1edf01b 2985 #endif
destinyXfate 2:0e2ef1edf01b 2986 );
destinyXfate 2:0e2ef1edf01b 2987 }
destinyXfate 2:0e2ef1edf01b 2988 break; // end 3 bpp
destinyXfate 2:0e2ef1edf01b 2989
destinyXfate 2:0e2ef1edf01b 2990 case 6:
destinyXfate 2:0e2ef1edf01b 2991 case 4:
destinyXfate 2:0e2ef1edf01b 2992 //case 7: // who wrote this? PNG doesn't support 5 or 7 bytes/pixel
destinyXfate 2:0e2ef1edf01b 2993 //case 5: // GRR BOGUS
destinyXfate 2:0e2ef1edf01b 2994 {
destinyXfate 2:0e2ef1edf01b 2995 _ActiveMask.use = 0xffffffffffffffffLL; // use shift below to clear
destinyXfate 2:0e2ef1edf01b 2996 // appropriate inactive bytes
destinyXfate 2:0e2ef1edf01b 2997 _ShiftBpp.use = bpp << 3;
destinyXfate 2:0e2ef1edf01b 2998 _ShiftRem.use = 64 - _ShiftBpp.use;
destinyXfate 2:0e2ef1edf01b 2999
destinyXfate 2:0e2ef1edf01b 3000 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 3001 "movq _HBClearMask, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3002
destinyXfate 2:0e2ef1edf01b 3003 // re-init address pointers and offset
destinyXfate 2:0e2ef1edf01b 3004 "movl _dif, %%ecx \n\t" // ecx: x = offset to
destinyXfate 2:0e2ef1edf01b 3005 // alignment boundary
destinyXfate 2:0e2ef1edf01b 3006
destinyXfate 2:0e2ef1edf01b 3007 // load _ActiveMask and clear all bytes except for 1st active group
destinyXfate 2:0e2ef1edf01b 3008 "movq _ActiveMask, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3009 // preload "movl row, %%edi \n\t" // edi: Avg(x)
destinyXfate 2:0e2ef1edf01b 3010 "psrlq _ShiftRem, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3011 // preload "movl prev_row, %%esi \n\t" // esi: Prior(x)
destinyXfate 2:0e2ef1edf01b 3012 "movq %%mm7, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3013 "movq _LBCarryMask, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3014 "psllq _ShiftBpp, %%mm6 \n\t" // create mask for 2nd active
destinyXfate 2:0e2ef1edf01b 3015 // group
destinyXfate 2:0e2ef1edf01b 3016
destinyXfate 2:0e2ef1edf01b 3017 // prime the pump: load the first Raw(x-bpp) data set
destinyXfate 2:0e2ef1edf01b 3018 "movq -8(%%edi,%%ecx,), %%mm2 \n\t" // load previous aligned 8 bytes
destinyXfate 2:0e2ef1edf01b 3019 // (we correct pos. in loop below)
destinyXfate 2:0e2ef1edf01b 3020 "avg_4lp: \n\t"
destinyXfate 2:0e2ef1edf01b 3021 "movq (%%edi,%%ecx,), %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3022 "psrlq _ShiftRem, %%mm2 \n\t" // shift data to pos. correctly
destinyXfate 2:0e2ef1edf01b 3023 "movq (%%esi,%%ecx,), %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 3024 // add (Prev_row/2) to average
destinyXfate 2:0e2ef1edf01b 3025 "movq %%mm5, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 3026 "pand %%mm1, %%mm3 \n\t" // get lsb for each prev_row byte
destinyXfate 2:0e2ef1edf01b 3027 "psrlq $1, %%mm1 \n\t" // divide prev_row bytes by 2
destinyXfate 2:0e2ef1edf01b 3028 "pand %%mm4, %%mm1 \n\t" // clear invalid bit 7 of each
destinyXfate 2:0e2ef1edf01b 3029 // byte
destinyXfate 2:0e2ef1edf01b 3030 "paddb %%mm1, %%mm0 \n\t" // add (Prev_row/2) to Avg for
destinyXfate 2:0e2ef1edf01b 3031 // each byte
destinyXfate 2:0e2ef1edf01b 3032 // add 1st active group (Raw(x-bpp)/2) to average with _LBCarry
destinyXfate 2:0e2ef1edf01b 3033 "movq %%mm3, %%mm1 \n\t" // now use mm1 for getting
destinyXfate 2:0e2ef1edf01b 3034 // LBCarrys
destinyXfate 2:0e2ef1edf01b 3035 "pand %%mm2, %%mm1 \n\t" // get LBCarrys for each byte
destinyXfate 2:0e2ef1edf01b 3036 // where both
destinyXfate 2:0e2ef1edf01b 3037 // lsb's were == 1 (only valid for active group)
destinyXfate 2:0e2ef1edf01b 3038 "psrlq $1, %%mm2 \n\t" // divide raw bytes by 2
destinyXfate 2:0e2ef1edf01b 3039 "pand %%mm4, %%mm2 \n\t" // clear invalid bit 7 of each
destinyXfate 2:0e2ef1edf01b 3040 // byte
destinyXfate 2:0e2ef1edf01b 3041 "paddb %%mm1, %%mm2 \n\t" // add LBCarrys to (Raw(x-bpp)/2)
destinyXfate 2:0e2ef1edf01b 3042 // for each byte
destinyXfate 2:0e2ef1edf01b 3043 "pand %%mm7, %%mm2 \n\t" // leave only Active Group 1
destinyXfate 2:0e2ef1edf01b 3044 // bytes to add to Avg
destinyXfate 2:0e2ef1edf01b 3045 "paddb %%mm2, %%mm0 \n\t" // add (Raw/2) + LBCarrys to Avg
destinyXfate 2:0e2ef1edf01b 3046 // for each Active
destinyXfate 2:0e2ef1edf01b 3047 // byte
destinyXfate 2:0e2ef1edf01b 3048 // add 2nd active group (Raw(x-bpp)/2) to average with _LBCarry
destinyXfate 2:0e2ef1edf01b 3049 "movq %%mm0, %%mm2 \n\t" // mov updated Raws to mm2
destinyXfate 2:0e2ef1edf01b 3050 "psllq _ShiftBpp, %%mm2 \n\t" // shift data to pos. correctly
destinyXfate 2:0e2ef1edf01b 3051 "addl $8, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 3052 "movq %%mm3, %%mm1 \n\t" // now use mm1 for getting
destinyXfate 2:0e2ef1edf01b 3053 // LBCarrys
destinyXfate 2:0e2ef1edf01b 3054 "pand %%mm2, %%mm1 \n\t" // get LBCarrys for each byte
destinyXfate 2:0e2ef1edf01b 3055 // where both
destinyXfate 2:0e2ef1edf01b 3056 // lsb's were == 1 (only valid for active group)
destinyXfate 2:0e2ef1edf01b 3057 "psrlq $1, %%mm2 \n\t" // divide raw bytes by 2
destinyXfate 2:0e2ef1edf01b 3058 "pand %%mm4, %%mm2 \n\t" // clear invalid bit 7 of each
destinyXfate 2:0e2ef1edf01b 3059 // byte
destinyXfate 2:0e2ef1edf01b 3060 "paddb %%mm1, %%mm2 \n\t" // add LBCarrys to (Raw(x-bpp)/2)
destinyXfate 2:0e2ef1edf01b 3061 // for each byte
destinyXfate 2:0e2ef1edf01b 3062 "pand %%mm6, %%mm2 \n\t" // leave only Active Group 2
destinyXfate 2:0e2ef1edf01b 3063 // bytes to add to Avg
destinyXfate 2:0e2ef1edf01b 3064 "paddb %%mm2, %%mm0 \n\t" // add (Raw/2) + LBCarrys to
destinyXfate 2:0e2ef1edf01b 3065 // Avg for each Active
destinyXfate 2:0e2ef1edf01b 3066 // byte
destinyXfate 2:0e2ef1edf01b 3067 "cmpl _MMXLength, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 3068 // now ready to write back to memory
destinyXfate 2:0e2ef1edf01b 3069 "movq %%mm0, -8(%%edi,%%ecx,) \n\t"
destinyXfate 2:0e2ef1edf01b 3070 // prep Raw(x-bpp) for next loop
destinyXfate 2:0e2ef1edf01b 3071 "movq %%mm0, %%mm2 \n\t" // mov updated Raws to mm2
destinyXfate 2:0e2ef1edf01b 3072 "jb avg_4lp \n\t"
destinyXfate 2:0e2ef1edf01b 3073
destinyXfate 2:0e2ef1edf01b 3074 : "=S" (dummy_value_S), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 3075 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 3076
destinyXfate 2:0e2ef1edf01b 3077 : "0" (prev_row), // esi // input regs
destinyXfate 2:0e2ef1edf01b 3078 "1" (row) // edi
destinyXfate 2:0e2ef1edf01b 3079
destinyXfate 2:0e2ef1edf01b 3080 : "%ecx", "memory" // clobber list ("memory": asm rewrites the row buffer)
destinyXfate 2:0e2ef1edf01b 3081 #if 0 /* %mm0, ..., %mm7 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 3082 , "%mm0", "%mm1", "%mm2", "%mm3"
destinyXfate 2:0e2ef1edf01b 3083 , "%mm4", "%mm5", "%mm6", "%mm7"
destinyXfate 2:0e2ef1edf01b 3084 #endif
destinyXfate 2:0e2ef1edf01b 3085 );
destinyXfate 2:0e2ef1edf01b 3086 }
destinyXfate 2:0e2ef1edf01b 3087 break; // end 4,6 bpp
destinyXfate 2:0e2ef1edf01b 3088
destinyXfate 2:0e2ef1edf01b 3089 case 2:
destinyXfate 2:0e2ef1edf01b 3090 {
destinyXfate 2:0e2ef1edf01b 3091 _ActiveMask.use = 0x000000000000ffffLL;
destinyXfate 2:0e2ef1edf01b 3092 _ShiftBpp.use = 16; // == 2 * 8
destinyXfate 2:0e2ef1edf01b 3093 _ShiftRem.use = 48; // == 64 - 16
destinyXfate 2:0e2ef1edf01b 3094
destinyXfate 2:0e2ef1edf01b 3095 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 3096 // load _ActiveMask
destinyXfate 2:0e2ef1edf01b 3097 "movq _ActiveMask, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3098 // re-init address pointers and offset
destinyXfate 2:0e2ef1edf01b 3099 "movl _dif, %%ecx \n\t" // ecx: x = offset to alignment
destinyXfate 2:0e2ef1edf01b 3100 // boundary
destinyXfate 2:0e2ef1edf01b 3101 "movq _LBCarryMask, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3102 // preload "movl row, %%edi \n\t" // edi: Avg(x)
destinyXfate 2:0e2ef1edf01b 3103 "movq _HBClearMask, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3104 // preload "movl prev_row, %%esi \n\t" // esi: Prior(x)
destinyXfate 2:0e2ef1edf01b 3105
destinyXfate 2:0e2ef1edf01b 3106 // prime the pump: load the first Raw(x-bpp) data set
destinyXfate 2:0e2ef1edf01b 3107 "movq -8(%%edi,%%ecx,), %%mm2 \n\t" // load previous aligned 8 bytes
destinyXfate 2:0e2ef1edf01b 3108 // (we correct pos. in loop below)
destinyXfate 2:0e2ef1edf01b 3109 "avg_2lp: \n\t"
destinyXfate 2:0e2ef1edf01b 3110 "movq (%%edi,%%ecx,), %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3111 "psrlq _ShiftRem, %%mm2 \n\t" // shift data to pos. correctly
destinyXfate 2:0e2ef1edf01b 3112 "movq (%%esi,%%ecx,), %%mm1 \n\t" // (GRR BUGFIX: was psllq)
destinyXfate 2:0e2ef1edf01b 3113 // add (Prev_row/2) to average
destinyXfate 2:0e2ef1edf01b 3114 "movq %%mm5, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 3115 "pand %%mm1, %%mm3 \n\t" // get lsb for each prev_row byte
destinyXfate 2:0e2ef1edf01b 3116 "psrlq $1, %%mm1 \n\t" // divide prev_row bytes by 2
destinyXfate 2:0e2ef1edf01b 3117 "pand %%mm4, %%mm1 \n\t" // clear invalid bit 7 of each
destinyXfate 2:0e2ef1edf01b 3118 // byte
destinyXfate 2:0e2ef1edf01b 3119 "movq %%mm7, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3120 "paddb %%mm1, %%mm0 \n\t" // add (Prev_row/2) to Avg for
destinyXfate 2:0e2ef1edf01b 3121 // each byte
destinyXfate 2:0e2ef1edf01b 3122
destinyXfate 2:0e2ef1edf01b 3123 // add 1st active group (Raw(x-bpp)/2) to average with _LBCarry
destinyXfate 2:0e2ef1edf01b 3124 "movq %%mm3, %%mm1 \n\t" // now use mm1 for getting
destinyXfate 2:0e2ef1edf01b 3125 // LBCarrys
destinyXfate 2:0e2ef1edf01b 3126 "pand %%mm2, %%mm1 \n\t" // get LBCarrys for each byte
destinyXfate 2:0e2ef1edf01b 3127 // where both
destinyXfate 2:0e2ef1edf01b 3128 // lsb's were == 1 (only valid
destinyXfate 2:0e2ef1edf01b 3129 // for active group)
destinyXfate 2:0e2ef1edf01b 3130 "psrlq $1, %%mm2 \n\t" // divide raw bytes by 2
destinyXfate 2:0e2ef1edf01b 3131 "pand %%mm4, %%mm2 \n\t" // clear invalid bit 7 of each
destinyXfate 2:0e2ef1edf01b 3132 // byte
destinyXfate 2:0e2ef1edf01b 3133 "paddb %%mm1, %%mm2 \n\t" // add LBCarrys to (Raw(x-bpp)/2)
destinyXfate 2:0e2ef1edf01b 3134 // for each byte
destinyXfate 2:0e2ef1edf01b 3135 "pand %%mm6, %%mm2 \n\t" // leave only Active Group 1
destinyXfate 2:0e2ef1edf01b 3136 // bytes to add to Avg
destinyXfate 2:0e2ef1edf01b 3137 "paddb %%mm2, %%mm0 \n\t" // add (Raw/2) + LBCarrys to Avg
destinyXfate 2:0e2ef1edf01b 3138 // for each Active byte
destinyXfate 2:0e2ef1edf01b 3139
destinyXfate 2:0e2ef1edf01b 3140 // add 2nd active group (Raw(x-bpp)/2) to average with _LBCarry
destinyXfate 2:0e2ef1edf01b 3141 "psllq _ShiftBpp, %%mm6 \n\t" // shift the mm6 mask to cover
destinyXfate 2:0e2ef1edf01b 3142 // bytes 2 & 3
destinyXfate 2:0e2ef1edf01b 3143 "movq %%mm0, %%mm2 \n\t" // mov updated Raws to mm2
destinyXfate 2:0e2ef1edf01b 3144 "psllq _ShiftBpp, %%mm2 \n\t" // shift data to pos. correctly
destinyXfate 2:0e2ef1edf01b 3145 "movq %%mm3, %%mm1 \n\t" // now use mm1 for getting
destinyXfate 2:0e2ef1edf01b 3146 // LBCarrys
destinyXfate 2:0e2ef1edf01b 3147 "pand %%mm2, %%mm1 \n\t" // get LBCarrys for each byte
destinyXfate 2:0e2ef1edf01b 3148 // where both
destinyXfate 2:0e2ef1edf01b 3149 // lsb's were == 1 (only valid
destinyXfate 2:0e2ef1edf01b 3150 // for active group)
destinyXfate 2:0e2ef1edf01b 3151 "psrlq $1, %%mm2 \n\t" // divide raw bytes by 2
destinyXfate 2:0e2ef1edf01b 3152 "pand %%mm4, %%mm2 \n\t" // clear invalid bit 7 of each
destinyXfate 2:0e2ef1edf01b 3153 // byte
destinyXfate 2:0e2ef1edf01b 3154 "paddb %%mm1, %%mm2 \n\t" // add LBCarrys to (Raw(x-bpp)/2)
destinyXfate 2:0e2ef1edf01b 3155 // for each byte
destinyXfate 2:0e2ef1edf01b 3156 "pand %%mm6, %%mm2 \n\t" // leave only Active Group 2
destinyXfate 2:0e2ef1edf01b 3157 // bytes to add to Avg
destinyXfate 2:0e2ef1edf01b 3158 "paddb %%mm2, %%mm0 \n\t" // add (Raw/2) + LBCarrys to
destinyXfate 2:0e2ef1edf01b 3159 // Avg for each Active byte
destinyXfate 2:0e2ef1edf01b 3160
destinyXfate 2:0e2ef1edf01b 3161 // add 3rd active group (Raw(x-bpp)/2) to average with _LBCarry
destinyXfate 2:0e2ef1edf01b 3162 "psllq _ShiftBpp, %%mm6 \n\t" // shift the mm6 mask to cover
destinyXfate 2:0e2ef1edf01b 3163 // bytes 4 & 5
destinyXfate 2:0e2ef1edf01b 3164 "movq %%mm0, %%mm2 \n\t" // mov updated Raws to mm2
destinyXfate 2:0e2ef1edf01b 3165 "psllq _ShiftBpp, %%mm2 \n\t" // shift data to pos. correctly
destinyXfate 2:0e2ef1edf01b 3166 "movq %%mm3, %%mm1 \n\t" // now use mm1 for getting
destinyXfate 2:0e2ef1edf01b 3167 // LBCarrys
destinyXfate 2:0e2ef1edf01b 3168 "pand %%mm2, %%mm1 \n\t" // get LBCarrys for each byte
destinyXfate 2:0e2ef1edf01b 3169 // where both lsb's were == 1
destinyXfate 2:0e2ef1edf01b 3170 // (only valid for active group)
destinyXfate 2:0e2ef1edf01b 3171 "psrlq $1, %%mm2 \n\t" // divide raw bytes by 2
destinyXfate 2:0e2ef1edf01b 3172 "pand %%mm4, %%mm2 \n\t" // clear invalid bit 7 of each
destinyXfate 2:0e2ef1edf01b 3173 // byte
destinyXfate 2:0e2ef1edf01b 3174 "paddb %%mm1, %%mm2 \n\t" // add LBCarrys to (Raw(x-bpp)/2)
destinyXfate 2:0e2ef1edf01b 3175 // for each byte
destinyXfate 2:0e2ef1edf01b 3176 "pand %%mm6, %%mm2 \n\t" // leave only Active Group 3
destinyXfate 2:0e2ef1edf01b 3177 // bytes to add to Avg
destinyXfate 2:0e2ef1edf01b 3178 "paddb %%mm2, %%mm0 \n\t" // add (Raw/2) + LBCarrys to
destinyXfate 2:0e2ef1edf01b 3179 // Avg for each Active byte
destinyXfate 2:0e2ef1edf01b 3180
destinyXfate 2:0e2ef1edf01b 3181 // add 4th active group (Raw(x-bpp)/2) to average with _LBCarry
destinyXfate 2:0e2ef1edf01b 3182 "psllq _ShiftBpp, %%mm6 \n\t" // shift the mm6 mask to cover
destinyXfate 2:0e2ef1edf01b 3183 // bytes 6 & 7
destinyXfate 2:0e2ef1edf01b 3184 "movq %%mm0, %%mm2 \n\t" // mov updated Raws to mm2
destinyXfate 2:0e2ef1edf01b 3185 "psllq _ShiftBpp, %%mm2 \n\t" // shift data to pos. correctly
destinyXfate 2:0e2ef1edf01b 3186 "addl $8, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 3187 "movq %%mm3, %%mm1 \n\t" // now use mm1 for getting
destinyXfate 2:0e2ef1edf01b 3188 // LBCarrys
destinyXfate 2:0e2ef1edf01b 3189 "pand %%mm2, %%mm1 \n\t" // get LBCarrys for each byte
destinyXfate 2:0e2ef1edf01b 3190 // where both
destinyXfate 2:0e2ef1edf01b 3191 // lsb's were == 1 (only valid
destinyXfate 2:0e2ef1edf01b 3192 // for active group)
destinyXfate 2:0e2ef1edf01b 3193 "psrlq $1, %%mm2 \n\t" // divide raw bytes by 2
destinyXfate 2:0e2ef1edf01b 3194 "pand %%mm4, %%mm2 \n\t" // clear invalid bit 7 of each
destinyXfate 2:0e2ef1edf01b 3195 // byte
destinyXfate 2:0e2ef1edf01b 3196 "paddb %%mm1, %%mm2 \n\t" // add LBCarrys to (Raw(x-bpp)/2)
destinyXfate 2:0e2ef1edf01b 3197 // for each byte
destinyXfate 2:0e2ef1edf01b 3198 "pand %%mm6, %%mm2 \n\t" // leave only Active Group 4
destinyXfate 2:0e2ef1edf01b 3199 // bytes to add to Avg
destinyXfate 2:0e2ef1edf01b 3200 "paddb %%mm2, %%mm0 \n\t" // add (Raw/2) + LBCarrys to
destinyXfate 2:0e2ef1edf01b 3201 // Avg for each Active byte
destinyXfate 2:0e2ef1edf01b 3202
destinyXfate 2:0e2ef1edf01b 3203 "cmpl _MMXLength, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 3204 // now ready to write back to memory
destinyXfate 2:0e2ef1edf01b 3205 "movq %%mm0, -8(%%edi,%%ecx,) \n\t"
destinyXfate 2:0e2ef1edf01b 3206 // prep Raw(x-bpp) for next loop
destinyXfate 2:0e2ef1edf01b 3207 "movq %%mm0, %%mm2 \n\t" // mov updated Raws to mm2
destinyXfate 2:0e2ef1edf01b 3208 "jb avg_2lp \n\t"
destinyXfate 2:0e2ef1edf01b 3209
destinyXfate 2:0e2ef1edf01b 3210 : "=S" (dummy_value_S), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 3211 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 3212
destinyXfate 2:0e2ef1edf01b 3213 : "0" (prev_row), // esi // input regs
destinyXfate 2:0e2ef1edf01b 3214 "1" (row) // edi
destinyXfate 2:0e2ef1edf01b 3215
destinyXfate 2:0e2ef1edf01b 3216 : "%ecx", "memory" // clobber list ("memory": asm rewrites the row buffer)
destinyXfate 2:0e2ef1edf01b 3217 #if 0 /* %mm0, ..., %mm7 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 3218 , "%mm0", "%mm1", "%mm2", "%mm3"
destinyXfate 2:0e2ef1edf01b 3219 , "%mm4", "%mm5", "%mm6", "%mm7"
destinyXfate 2:0e2ef1edf01b 3220 #endif
destinyXfate 2:0e2ef1edf01b 3221 );
destinyXfate 2:0e2ef1edf01b 3222 }
destinyXfate 2:0e2ef1edf01b 3223 break; // end 2 bpp
destinyXfate 2:0e2ef1edf01b 3224
destinyXfate 2:0e2ef1edf01b 3225 case 1:
destinyXfate 2:0e2ef1edf01b 3226 {
destinyXfate 2:0e2ef1edf01b 3227 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 3228 // re-init address pointers and offset
destinyXfate 2:0e2ef1edf01b 3229 #ifdef __PIC__
destinyXfate 2:0e2ef1edf01b 3230 "pushl %%ebx \n\t" // save Global Offset Table index
destinyXfate 2:0e2ef1edf01b 3231 #endif
destinyXfate 2:0e2ef1edf01b 3232 "movl _dif, %%ebx \n\t" // ebx: x = offset to alignment
destinyXfate 2:0e2ef1edf01b 3233 // boundary
destinyXfate 2:0e2ef1edf01b 3234 // preload "movl row, %%edi \n\t" // edi: Avg(x)
destinyXfate 2:0e2ef1edf01b 3235 "cmpl _FullLength, %%ebx \n\t" // test if offset at end of array
destinyXfate 2:0e2ef1edf01b 3236 "jnb avg_1end \n\t"
destinyXfate 2:0e2ef1edf01b 3237 // do Avg decode for remaining bytes
destinyXfate 2:0e2ef1edf01b 3238 // preload "movl prev_row, %%esi \n\t" // esi: Prior(x)
destinyXfate 2:0e2ef1edf01b 3239 "movl %%edi, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 3240 // preload "subl bpp, %%edx \n\t" // (bpp is preloaded into ecx)
destinyXfate 2:0e2ef1edf01b 3241 "subl %%ecx, %%edx \n\t" // edx: Raw(x-bpp)
destinyXfate 2:0e2ef1edf01b 3242 "xorl %%ecx, %%ecx \n\t" // zero ecx before using cl & cx
destinyXfate 2:0e2ef1edf01b 3243 // in loop below
destinyXfate 2:0e2ef1edf01b 3244 "avg_1lp: \n\t"
destinyXfate 2:0e2ef1edf01b 3245 // Raw(x) = Avg(x) + ((Raw(x-bpp) + Prior(x))/2)
destinyXfate 2:0e2ef1edf01b 3246 "xorl %%eax, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 3247 "movb (%%esi,%%ebx,), %%cl \n\t" // load cl with Prior(x)
destinyXfate 2:0e2ef1edf01b 3248 "movb (%%edx,%%ebx,), %%al \n\t" // load al with Raw(x-bpp)
destinyXfate 2:0e2ef1edf01b 3249 "addw %%cx, %%ax \n\t"
destinyXfate 2:0e2ef1edf01b 3250 "incl %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 3251 "shrw %%ax \n\t" // divide by 2
destinyXfate 2:0e2ef1edf01b 3252 "addb -1(%%edi,%%ebx,), %%al \n\t" // add Avg(x); -1 to offset
destinyXfate 2:0e2ef1edf01b 3253 // inc ebx
destinyXfate 2:0e2ef1edf01b 3254 "cmpl _FullLength, %%ebx \n\t" // check if at end of array
destinyXfate 2:0e2ef1edf01b 3255 "movb %%al, -1(%%edi,%%ebx,) \n\t" // write back Raw(x);
destinyXfate 2:0e2ef1edf01b 3256 // mov does not affect flags; -1 to offset inc ebx
destinyXfate 2:0e2ef1edf01b 3257 "jb avg_1lp \n\t"
destinyXfate 2:0e2ef1edf01b 3258
destinyXfate 2:0e2ef1edf01b 3259 "avg_1end: \n\t"
destinyXfate 2:0e2ef1edf01b 3260 #ifdef __PIC__
destinyXfate 2:0e2ef1edf01b 3261 "popl %%ebx \n\t" // Global Offset Table index
destinyXfate 2:0e2ef1edf01b 3262 #endif
destinyXfate 2:0e2ef1edf01b 3263
destinyXfate 2:0e2ef1edf01b 3264 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 3265 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 3266 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 3267
destinyXfate 2:0e2ef1edf01b 3268 : "0" (bpp), // ecx // input regs
destinyXfate 2:0e2ef1edf01b 3269 "1" (prev_row), // esi
destinyXfate 2:0e2ef1edf01b 3270 "2" (row) // edi
destinyXfate 2:0e2ef1edf01b 3271
destinyXfate 2:0e2ef1edf01b 3272 : "%eax", "%edx" // clobber list
destinyXfate 2:0e2ef1edf01b 3273 #ifndef __PIC__
destinyXfate 2:0e2ef1edf01b 3274 , "%ebx"
destinyXfate 2:0e2ef1edf01b 3275 #endif
destinyXfate 2:0e2ef1edf01b 3276 );
destinyXfate 2:0e2ef1edf01b 3277 }
destinyXfate 2:0e2ef1edf01b 3278 return; // end 1 bpp
destinyXfate 2:0e2ef1edf01b 3279
destinyXfate 2:0e2ef1edf01b 3280 case 8:
destinyXfate 2:0e2ef1edf01b 3281 {
destinyXfate 2:0e2ef1edf01b 3282 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 3283 // re-init address pointers and offset
destinyXfate 2:0e2ef1edf01b 3284 "movl _dif, %%ecx \n\t" // ecx: x == offset to alignment boundary
destinyXfate 2:0e2ef1edf01b 3285 "movq _LBCarryMask, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3286 // preload "movl row, %%edi \n\t" // edi: Avg(x)
destinyXfate 2:0e2ef1edf01b 3287 "movq _HBClearMask, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3288 // preload "movl prev_row, %%esi \n\t" // esi: Prior(x)
destinyXfate 2:0e2ef1edf01b 3289
destinyXfate 2:0e2ef1edf01b 3290 // prime the pump: load the first Raw(x-bpp) data set
destinyXfate 2:0e2ef1edf01b 3291 "movq -8(%%edi,%%ecx,), %%mm2 \n\t" // load previous aligned 8 bytes
destinyXfate 2:0e2ef1edf01b 3292 // (NO NEED to correct pos. in loop below)
destinyXfate 2:0e2ef1edf01b 3293
destinyXfate 2:0e2ef1edf01b 3294 "avg_8lp: \n\t"
destinyXfate 2:0e2ef1edf01b 3295 "movq (%%edi,%%ecx,), %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3296 "movq %%mm5, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 3297 "movq (%%esi,%%ecx,), %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 3298 "addl $8, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 3299 "pand %%mm1, %%mm3 \n\t" // get lsb for each prev_row byte
destinyXfate 2:0e2ef1edf01b 3300 "psrlq $1, %%mm1 \n\t" // divide prev_row bytes by 2
destinyXfate 2:0e2ef1edf01b 3301 "pand %%mm2, %%mm3 \n\t" // get LBCarrys for each byte
destinyXfate 2:0e2ef1edf01b 3302 // where both lsb's were == 1
destinyXfate 2:0e2ef1edf01b 3303 "psrlq $1, %%mm2 \n\t" // divide raw bytes by 2
destinyXfate 2:0e2ef1edf01b 3304 "pand %%mm4, %%mm1 \n\t" // clear invalid bit 7, each byte
destinyXfate 2:0e2ef1edf01b 3305 "paddb %%mm3, %%mm0 \n\t" // add LBCarrys to Avg, each byte
destinyXfate 2:0e2ef1edf01b 3306 "pand %%mm4, %%mm2 \n\t" // clear invalid bit 7, each byte
destinyXfate 2:0e2ef1edf01b 3307 "paddb %%mm1, %%mm0 \n\t" // add (Prev_row/2) to Avg, each
destinyXfate 2:0e2ef1edf01b 3308 "paddb %%mm2, %%mm0 \n\t" // add (Raw/2) to Avg for each
destinyXfate 2:0e2ef1edf01b 3309 "cmpl _MMXLength, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 3310 "movq %%mm0, -8(%%edi,%%ecx,) \n\t"
destinyXfate 2:0e2ef1edf01b 3311 "movq %%mm0, %%mm2 \n\t" // reuse as Raw(x-bpp)
destinyXfate 2:0e2ef1edf01b 3312 "jb avg_8lp \n\t"
destinyXfate 2:0e2ef1edf01b 3313
destinyXfate 2:0e2ef1edf01b 3314 : "=S" (dummy_value_S), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 3315 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 3316
destinyXfate 2:0e2ef1edf01b 3317 : "0" (prev_row), // esi // input regs
destinyXfate 2:0e2ef1edf01b 3318 "1" (row) // edi
destinyXfate 2:0e2ef1edf01b 3319
destinyXfate 2:0e2ef1edf01b 3320 : "%ecx" // clobber list
destinyXfate 2:0e2ef1edf01b 3321 #if 0 /* %mm0, ..., %mm5 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 3322 , "%mm0", "%mm1", "%mm2"
destinyXfate 2:0e2ef1edf01b 3323 , "%mm3", "%mm4", "%mm5"
destinyXfate 2:0e2ef1edf01b 3324 #endif
destinyXfate 2:0e2ef1edf01b 3325 );
destinyXfate 2:0e2ef1edf01b 3326 }
destinyXfate 2:0e2ef1edf01b 3327 break; // end 8 bpp
destinyXfate 2:0e2ef1edf01b 3328
destinyXfate 2:0e2ef1edf01b 3329 default: // bpp greater than 8 (!= 1,2,3,4,[5],6,[7],8)
destinyXfate 2:0e2ef1edf01b 3330 {
destinyXfate 2:0e2ef1edf01b 3331
destinyXfate 2:0e2ef1edf01b 3332 #ifdef PNG_DEBUG
destinyXfate 2:0e2ef1edf01b 3333 // GRR: PRINT ERROR HERE: SHOULD NEVER BE REACHED
destinyXfate 2:0e2ef1edf01b 3334 png_debug(1,
destinyXfate 2:0e2ef1edf01b 3335 "Internal logic error in pnggccrd (png_read_filter_row_mmx_avg())\n");
destinyXfate 2:0e2ef1edf01b 3336 #endif
destinyXfate 2:0e2ef1edf01b 3337
destinyXfate 2:0e2ef1edf01b 3338 #if 0
destinyXfate 2:0e2ef1edf01b 3339 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 3340 "movq _LBCarryMask, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3341 // re-init address pointers and offset
destinyXfate 2:0e2ef1edf01b 3342 "movl _dif, %%ebx \n\t" // ebx: x = offset to
destinyXfate 2:0e2ef1edf01b 3343 // alignment boundary
destinyXfate 2:0e2ef1edf01b 3344 "movl row, %%edi \n\t" // edi: Avg(x)
destinyXfate 2:0e2ef1edf01b 3345 "movq _HBClearMask, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3346 "movl %%edi, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 3347 "movl prev_row, %%esi \n\t" // esi: Prior(x)
destinyXfate 2:0e2ef1edf01b 3348 "subl bpp, %%edx \n\t" // edx: Raw(x-bpp)
destinyXfate 2:0e2ef1edf01b 3349 "avg_Alp: \n\t"
destinyXfate 2:0e2ef1edf01b 3350 "movq (%%edi,%%ebx,), %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3351 "movq %%mm5, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 3352 "movq (%%esi,%%ebx,), %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 3353 "pand %%mm1, %%mm3 \n\t" // get lsb for each prev_row byte
destinyXfate 2:0e2ef1edf01b 3354 "movq (%%edx,%%ebx,), %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 3355 "psrlq $1, %%mm1 \n\t" // divide prev_row bytes by 2
destinyXfate 2:0e2ef1edf01b 3356 "pand %%mm2, %%mm3 \n\t" // get LBCarrys for each byte
destinyXfate 2:0e2ef1edf01b 3357 // where both lsb's were == 1
destinyXfate 2:0e2ef1edf01b 3358 "psrlq $1, %%mm2 \n\t" // divide raw bytes by 2
destinyXfate 2:0e2ef1edf01b 3359 "pand %%mm4, %%mm1 \n\t" // clear invalid bit 7 of each
destinyXfate 2:0e2ef1edf01b 3360 // byte
destinyXfate 2:0e2ef1edf01b 3361 "paddb %%mm3, %%mm0 \n\t" // add LBCarrys to Avg for each
destinyXfate 2:0e2ef1edf01b 3362 // byte
destinyXfate 2:0e2ef1edf01b 3363 "pand %%mm4, %%mm2 \n\t" // clear invalid bit 7 of each
destinyXfate 2:0e2ef1edf01b 3364 // byte
destinyXfate 2:0e2ef1edf01b 3365 "paddb %%mm1, %%mm0 \n\t" // add (Prev_row/2) to Avg for
destinyXfate 2:0e2ef1edf01b 3366 // each byte
destinyXfate 2:0e2ef1edf01b 3367 "addl $8, %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 3368 "paddb %%mm2, %%mm0 \n\t" // add (Raw/2) to Avg for each
destinyXfate 2:0e2ef1edf01b 3369 // byte
destinyXfate 2:0e2ef1edf01b 3370 "cmpl _MMXLength, %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 3371 "movq %%mm0, -8(%%edi,%%ebx,) \n\t"
destinyXfate 2:0e2ef1edf01b 3372 "jb avg_Alp \n\t"
destinyXfate 2:0e2ef1edf01b 3373
destinyXfate 2:0e2ef1edf01b 3374 : // FIXASM: output regs/vars go here, e.g.: "=m" (memory_var)
destinyXfate 2:0e2ef1edf01b 3375
destinyXfate 2:0e2ef1edf01b 3376 : // FIXASM: input regs, e.g.: "c" (count), "S" (src), "D" (dest)
destinyXfate 2:0e2ef1edf01b 3377
destinyXfate 2:0e2ef1edf01b 3378 : "%ebx", "%edx", "%edi", "%esi" // CHECKASM: clobber list
destinyXfate 2:0e2ef1edf01b 3379 );
destinyXfate 2:0e2ef1edf01b 3380 #endif /* 0 - NEVER REACHED */
destinyXfate 2:0e2ef1edf01b 3381 }
destinyXfate 2:0e2ef1edf01b 3382 break;
destinyXfate 2:0e2ef1edf01b 3383
destinyXfate 2:0e2ef1edf01b 3384 } // end switch (bpp)
destinyXfate 2:0e2ef1edf01b 3385
destinyXfate 2:0e2ef1edf01b 3386 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 3387 // MMX acceleration complete; now do clean-up
destinyXfate 2:0e2ef1edf01b 3388 // check if any remaining bytes left to decode
destinyXfate 2:0e2ef1edf01b 3389 #ifdef __PIC__
destinyXfate 2:0e2ef1edf01b 3390 "pushl %%ebx \n\t" // save index to Global Offset Table
destinyXfate 2:0e2ef1edf01b 3391 #endif
destinyXfate 2:0e2ef1edf01b 3392 "movl _MMXLength, %%ebx \n\t" // ebx: x == offset bytes after MMX
destinyXfate 2:0e2ef1edf01b 3393 //pre "movl row, %%edi \n\t" // edi: Avg(x)
destinyXfate 2:0e2ef1edf01b 3394 "cmpl _FullLength, %%ebx \n\t" // test if offset at end of array
destinyXfate 2:0e2ef1edf01b 3395 "jnb avg_end \n\t"
destinyXfate 2:0e2ef1edf01b 3396
destinyXfate 2:0e2ef1edf01b 3397 // do Avg decode for remaining bytes
destinyXfate 2:0e2ef1edf01b 3398 //pre "movl prev_row, %%esi \n\t" // esi: Prior(x)
destinyXfate 2:0e2ef1edf01b 3399 "movl %%edi, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 3400 //pre "subl bpp, %%edx \n\t" // (bpp is preloaded into ecx)
destinyXfate 2:0e2ef1edf01b 3401 "subl %%ecx, %%edx \n\t" // edx: Raw(x-bpp)
destinyXfate 2:0e2ef1edf01b 3402 "xorl %%ecx, %%ecx \n\t" // zero ecx before using cl & cx below
destinyXfate 2:0e2ef1edf01b 3403
destinyXfate 2:0e2ef1edf01b 3404 "avg_lp2: \n\t"
destinyXfate 2:0e2ef1edf01b 3405 // Raw(x) = Avg(x) + ((Raw(x-bpp) + Prior(x))/2)
destinyXfate 2:0e2ef1edf01b 3406 "xorl %%eax, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 3407 "movb (%%esi,%%ebx,), %%cl \n\t" // load cl with Prior(x)
destinyXfate 2:0e2ef1edf01b 3408 "movb (%%edx,%%ebx,), %%al \n\t" // load al with Raw(x-bpp)
destinyXfate 2:0e2ef1edf01b 3409 "addw %%cx, %%ax \n\t"
destinyXfate 2:0e2ef1edf01b 3410 "incl %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 3411 "shrw %%ax \n\t" // divide by 2
destinyXfate 2:0e2ef1edf01b 3412 "addb -1(%%edi,%%ebx,), %%al \n\t" // add Avg(x); -1 to offset inc ebx
destinyXfate 2:0e2ef1edf01b 3413 "cmpl _FullLength, %%ebx \n\t" // check if at end of array
destinyXfate 2:0e2ef1edf01b 3414 "movb %%al, -1(%%edi,%%ebx,) \n\t" // write back Raw(x) [mov does not
destinyXfate 2:0e2ef1edf01b 3415 "jb avg_lp2 \n\t" // affect flags; -1 to offset inc ebx]
destinyXfate 2:0e2ef1edf01b 3416
destinyXfate 2:0e2ef1edf01b 3417 "avg_end: \n\t"
destinyXfate 2:0e2ef1edf01b 3418 "EMMS \n\t" // end MMX; prep for poss. FP instrs.
destinyXfate 2:0e2ef1edf01b 3419 #ifdef __PIC__
destinyXfate 2:0e2ef1edf01b 3420 "popl %%ebx \n\t" // restore index to Global Offset Table
destinyXfate 2:0e2ef1edf01b 3421 #endif
destinyXfate 2:0e2ef1edf01b 3422
destinyXfate 2:0e2ef1edf01b 3423 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 3424 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 3425 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 3426
destinyXfate 2:0e2ef1edf01b 3427 : "0" (bpp), // ecx // input regs
destinyXfate 2:0e2ef1edf01b 3428 "1" (prev_row), // esi
destinyXfate 2:0e2ef1edf01b 3429 "2" (row) // edi
destinyXfate 2:0e2ef1edf01b 3430
destinyXfate 2:0e2ef1edf01b 3431 : "%eax", "%edx" // clobber list
destinyXfate 2:0e2ef1edf01b 3432 #ifndef __PIC__
destinyXfate 2:0e2ef1edf01b 3433 , "%ebx"
destinyXfate 2:0e2ef1edf01b 3434 #endif
destinyXfate 2:0e2ef1edf01b 3435 );
destinyXfate 2:0e2ef1edf01b 3436
destinyXfate 2:0e2ef1edf01b 3437 } /* end png_read_filter_row_mmx_avg() */
destinyXfate 2:0e2ef1edf01b 3438 #endif
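// Reference sketch (not part of libpng): a plain-C rendering of the Avg decode
// that the assembler above performs, assuming C99 and <stdint.h>. The names
// avg_decode_ref and avg_halves are hypothetical. avg_halves illustrates the
// per-byte identity behind the _LBCarryMask/_HBClearMask sequence: halve both
// operands, then add back the carry that is lost when both low bits are set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Scalar reference: Raw(x) = Avg(x) + floor((Raw(x-bpp) + Prior(x)) / 2),
// with Raw(x-bpp) taken as 0 for x < bpp. Arithmetic is modulo 256.
static void avg_decode_ref(uint8_t *row, const uint8_t *prev_row,
                           size_t rowbytes, size_t bpp)
{
    assert(bpp >= 1);   // assumption of this sketch: at least 1 byte per pixel
    for (size_t x = 0; x < rowbytes; x++) {
        unsigned left = (x >= bpp) ? row[x - bpp] : 0;
        row[x] = (uint8_t)(row[x] + ((left + prev_row[x]) >> 1));
    }
}

// Average of two bytes without needing a 9-bit intermediate sum:
// floor((a + b) / 2) == (a >> 1) + (b >> 1) + (a & b & 1)
static uint8_t avg_halves(uint8_t a, uint8_t b)
{
    return (uint8_t)((a >> 1) + (b >> 1) + (a & b & 1));
}
```

// A scalar helper like avg_decode_ref can serve as a cross-check when
// comparing the output of the MMX paths against a known-good decode.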
destinyXfate 2:0e2ef1edf01b 3439
destinyXfate 2:0e2ef1edf01b 3440
destinyXfate 2:0e2ef1edf01b 3441
destinyXfate 2:0e2ef1edf01b 3442 #ifdef PNG_THREAD_UNSAFE_OK
destinyXfate 2:0e2ef1edf01b 3443 //===========================================================================//
destinyXfate 2:0e2ef1edf01b 3444 // //
destinyXfate 2:0e2ef1edf01b 3445 // P N G _ R E A D _ F I L T E R _ R O W _ M M X _ P A E T H //
destinyXfate 2:0e2ef1edf01b 3446 // //
destinyXfate 2:0e2ef1edf01b 3447 //===========================================================================//
destinyXfate 2:0e2ef1edf01b 3448
destinyXfate 2:0e2ef1edf01b 3449 // Optimized code for PNG Paeth filter decoder
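// Reference sketch (not part of libpng): the Paeth predictor as defined by the
// PNG specification, written in plain C99 for comparison with the assembler
// below. The names paeth_predictor, abs_by_mask and paeth_decode_ref are
// hypothetical. abs_by_mask mirrors the pcmpgtw/pand/psubw/psubw idiom that
// the MMX loops use to take absolute values without branching.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// a = Raw(x-bpp), b = Prior(x), c = Prior(x-bpp); p = a + b - c
static uint8_t paeth_predictor(uint8_t a, uint8_t b, uint8_t c)
{
    int pav = (int)b - (int)c;          // p - a
    int pbv = (int)a - (int)c;          // p - b
    int pcv = pav + pbv;                // p - c
    int pa = pav < 0 ? -pav : pav;
    int pb = pbv < 0 ? -pbv : pbv;
    int pc = pcv < 0 ? -pcv : pcv;
    if (pa <= pb && pa <= pc) return a;
    if (pb <= pc) return b;
    return c;
}

// Branchless abs: build an all-ones mask where v < 0, keep only the negative
// value, then subtract it twice (v - 2v == -v for negative v).
static int16_t abs_by_mask(int16_t v)
{
    int16_t mask = (int16_t)-(v < 0);
    int16_t neg = (int16_t)(v & mask);
    return (int16_t)(v - neg - neg);
}

// Raw(x) = (Paeth(x) + PaethPredictor(a, b, c)) mod 256; a and c are 0
// for x < bpp, which reduces the predictor to Prior(x).
static void paeth_decode_ref(uint8_t *row, const uint8_t *prev_row,
                             size_t rowbytes, size_t bpp)
{
    assert(bpp >= 1);   // assumption of this sketch
    for (size_t x = 0; x < rowbytes; x++) {
        uint8_t a = (x >= bpp) ? row[x - bpp] : 0;
        uint8_t c = (x >= bpp) ? prev_row[x - bpp] : 0;
        row[x] = (uint8_t)(row[x] + paeth_predictor(a, prev_row[x], c));
    }
}
```

// The tie-breaking order (a, then b, then c) matters: an encoder and decoder
// must agree on it, so the comparisons above use <= exactly as the spec does.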
destinyXfate 2:0e2ef1edf01b 3450
destinyXfate 2:0e2ef1edf01b 3451 static void /* PRIVATE */
destinyXfate 2:0e2ef1edf01b 3452 png_read_filter_row_mmx_paeth(png_row_infop row_info, png_bytep row,
destinyXfate 2:0e2ef1edf01b 3453 png_bytep prev_row)
destinyXfate 2:0e2ef1edf01b 3454 {
destinyXfate 2:0e2ef1edf01b 3455 int bpp;
destinyXfate 2:0e2ef1edf01b 3456 int dummy_value_c; // fix 'forbidden register 2 (cx) was spilled' error
destinyXfate 2:0e2ef1edf01b 3457 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 3458 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 3459
destinyXfate 2:0e2ef1edf01b 3460 bpp = (row_info->pixel_depth + 7) >> 3; // Get # bytes per pixel
destinyXfate 2:0e2ef1edf01b 3461 _FullLength = row_info->rowbytes; // # of bytes to filter
destinyXfate 2:0e2ef1edf01b 3462
destinyXfate 2:0e2ef1edf01b 3463 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 3464 #ifdef __PIC__
destinyXfate 2:0e2ef1edf01b 3465 "pushl %%ebx \n\t" // save index to Global Offset Table
destinyXfate 2:0e2ef1edf01b 3466 #endif
destinyXfate 2:0e2ef1edf01b 3467 "xorl %%ebx, %%ebx \n\t" // ebx: x offset
destinyXfate 2:0e2ef1edf01b 3468 //pre "movl row, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 3469 "xorl %%edx, %%edx \n\t" // edx: x-bpp offset
destinyXfate 2:0e2ef1edf01b 3470 //pre "movl prev_row, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 3471 "xorl %%eax, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 3472
destinyXfate 2:0e2ef1edf01b 3473 // Compute the Raw value for the first bpp bytes
destinyXfate 2:0e2ef1edf01b 3474 // Note: the formula always works out to be
destinyXfate 2:0e2ef1edf01b 3475 // Raw(x) = Paeth(x) + Prior(x) where x < bpp
destinyXfate 2:0e2ef1edf01b 3476 "paeth_rlp: \n\t"
destinyXfate 2:0e2ef1edf01b 3477 "movb (%%edi,%%ebx,), %%al \n\t"
destinyXfate 2:0e2ef1edf01b 3478 "addb (%%esi,%%ebx,), %%al \n\t"
destinyXfate 2:0e2ef1edf01b 3479 "incl %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 3480 //pre "cmpl bpp, %%ebx \n\t" (bpp is preloaded into ecx)
destinyXfate 2:0e2ef1edf01b 3481 "cmpl %%ecx, %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 3482 "movb %%al, -1(%%edi,%%ebx,) \n\t"
destinyXfate 2:0e2ef1edf01b 3483 "jb paeth_rlp \n\t"
destinyXfate 2:0e2ef1edf01b 3484 // get # of bytes to alignment
destinyXfate 2:0e2ef1edf01b 3485 "movl %%edi, _dif \n\t" // take start of row
destinyXfate 2:0e2ef1edf01b 3486 "addl %%ebx, _dif \n\t" // add bpp
destinyXfate 2:0e2ef1edf01b 3487 "xorl %%ecx, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 3488 "addl $0xf, _dif \n\t" // add 7 + 8 to incr past alignment
destinyXfate 2:0e2ef1edf01b 3489 // boundary
destinyXfate 2:0e2ef1edf01b 3490 "andl $0xfffffff8, _dif \n\t" // mask to alignment boundary
destinyXfate 2:0e2ef1edf01b 3491 "subl %%edi, _dif \n\t" // subtract from start ==> value ebx
destinyXfate 2:0e2ef1edf01b 3492 // at alignment
destinyXfate 2:0e2ef1edf01b 3493 "jz paeth_go \n\t"
destinyXfate 2:0e2ef1edf01b 3494 // fix alignment
destinyXfate 2:0e2ef1edf01b 3495
destinyXfate 2:0e2ef1edf01b 3496 "paeth_lp1: \n\t"
destinyXfate 2:0e2ef1edf01b 3497 "xorl %%eax, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 3498 // pav = p - a = (a + b - c) - a = b - c
destinyXfate 2:0e2ef1edf01b 3499 "movb (%%esi,%%ebx,), %%al \n\t" // load Prior(x) into al
destinyXfate 2:0e2ef1edf01b 3500 "movb (%%esi,%%edx,), %%cl \n\t" // load Prior(x-bpp) into cl
destinyXfate 2:0e2ef1edf01b 3501 "subl %%ecx, %%eax \n\t" // subtract Prior(x-bpp)
destinyXfate 2:0e2ef1edf01b 3502 "movl %%eax, _patemp \n\t" // Save pav for later use
destinyXfate 2:0e2ef1edf01b 3503 "xorl %%eax, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 3504 // pbv = p - b = (a + b - c) - b = a - c
destinyXfate 2:0e2ef1edf01b 3505 "movb (%%edi,%%edx,), %%al \n\t" // load Raw(x-bpp) into al
destinyXfate 2:0e2ef1edf01b 3506 "subl %%ecx, %%eax \n\t" // subtract Prior(x-bpp)
destinyXfate 2:0e2ef1edf01b 3507 "movl %%eax, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 3508 // pcv = p - c = (a + b - c) -c = (a - c) + (b - c) = pav + pbv
destinyXfate 2:0e2ef1edf01b 3509 "addl _patemp, %%eax \n\t" // pcv = pav + pbv
destinyXfate 2:0e2ef1edf01b 3510 // pc = abs(pcv)
destinyXfate 2:0e2ef1edf01b 3511 "testl $0x80000000, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 3512 "jz paeth_pca \n\t"
destinyXfate 2:0e2ef1edf01b 3513 "negl %%eax \n\t" // reverse sign of neg values
destinyXfate 2:0e2ef1edf01b 3514
destinyXfate 2:0e2ef1edf01b 3515 "paeth_pca: \n\t"
destinyXfate 2:0e2ef1edf01b 3516 "movl %%eax, _pctemp \n\t" // save pc for later use
destinyXfate 2:0e2ef1edf01b 3517 // pb = abs(pbv)
destinyXfate 2:0e2ef1edf01b 3518 "testl $0x80000000, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 3519 "jz paeth_pba \n\t"
destinyXfate 2:0e2ef1edf01b 3520 "negl %%ecx \n\t" // reverse sign of neg values
destinyXfate 2:0e2ef1edf01b 3521
destinyXfate 2:0e2ef1edf01b 3522 "paeth_pba: \n\t"
destinyXfate 2:0e2ef1edf01b 3523 "movl %%ecx, _pbtemp \n\t" // save pb for later use
destinyXfate 2:0e2ef1edf01b 3524 // pa = abs(pav)
destinyXfate 2:0e2ef1edf01b 3525 "movl _patemp, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 3526 "testl $0x80000000, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 3527 "jz paeth_paa \n\t"
destinyXfate 2:0e2ef1edf01b 3528 "negl %%eax \n\t" // reverse sign of neg values
destinyXfate 2:0e2ef1edf01b 3529
destinyXfate 2:0e2ef1edf01b 3530 "paeth_paa: \n\t"
destinyXfate 2:0e2ef1edf01b 3531 "movl %%eax, _patemp \n\t" // save pa for later use
destinyXfate 2:0e2ef1edf01b 3532 // test if pa <= pb
destinyXfate 2:0e2ef1edf01b 3533 "cmpl %%ecx, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 3534 "jna paeth_abb \n\t"
destinyXfate 2:0e2ef1edf01b 3535 // pa > pb; now test if pb <= pc
destinyXfate 2:0e2ef1edf01b 3536 "cmpl _pctemp, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 3537 "jna paeth_bbc \n\t"
destinyXfate 2:0e2ef1edf01b 3538 // pb > pc; Raw(x) = Paeth(x) + Prior(x-bpp)
destinyXfate 2:0e2ef1edf01b 3539 "movb (%%esi,%%edx,), %%cl \n\t" // load Prior(x-bpp) into cl
destinyXfate 2:0e2ef1edf01b 3540 "jmp paeth_paeth \n\t"
destinyXfate 2:0e2ef1edf01b 3541
destinyXfate 2:0e2ef1edf01b 3542 "paeth_bbc: \n\t"
destinyXfate 2:0e2ef1edf01b 3543 // pb <= pc; Raw(x) = Paeth(x) + Prior(x)
destinyXfate 2:0e2ef1edf01b 3544 "movb (%%esi,%%ebx,), %%cl \n\t" // load Prior(x) into cl
destinyXfate 2:0e2ef1edf01b 3545 "jmp paeth_paeth \n\t"
destinyXfate 2:0e2ef1edf01b 3546
destinyXfate 2:0e2ef1edf01b 3547 "paeth_abb: \n\t"
destinyXfate 2:0e2ef1edf01b 3548 // pa <= pb; now test if pa <= pc
destinyXfate 2:0e2ef1edf01b 3549 "cmpl _pctemp, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 3550 "jna paeth_abc \n\t"
destinyXfate 2:0e2ef1edf01b 3551 // pa > pc; Raw(x) = Paeth(x) + Prior(x-bpp)
destinyXfate 2:0e2ef1edf01b 3552 "movb (%%esi,%%edx,), %%cl \n\t" // load Prior(x-bpp) into cl
destinyXfate 2:0e2ef1edf01b 3553 "jmp paeth_paeth \n\t"
destinyXfate 2:0e2ef1edf01b 3554
destinyXfate 2:0e2ef1edf01b 3555 "paeth_abc: \n\t"
destinyXfate 2:0e2ef1edf01b 3556 // pa <= pc; Raw(x) = Paeth(x) + Raw(x-bpp)
destinyXfate 2:0e2ef1edf01b 3557 "movb (%%edi,%%edx,), %%cl \n\t" // load Raw(x-bpp) into cl
destinyXfate 2:0e2ef1edf01b 3558
destinyXfate 2:0e2ef1edf01b 3559 "paeth_paeth: \n\t"
destinyXfate 2:0e2ef1edf01b 3560 "incl %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 3561 "incl %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 3562 // Raw(x) = (Paeth(x) + Paeth_Predictor( a, b, c )) mod 256
destinyXfate 2:0e2ef1edf01b 3563 "addb %%cl, -1(%%edi,%%ebx,) \n\t"
destinyXfate 2:0e2ef1edf01b 3564 "cmpl _dif, %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 3565 "jb paeth_lp1 \n\t"
destinyXfate 2:0e2ef1edf01b 3566
destinyXfate 2:0e2ef1edf01b 3567 "paeth_go: \n\t"
destinyXfate 2:0e2ef1edf01b 3568 "movl _FullLength, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 3569 "movl %%ecx, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 3570 "subl %%ebx, %%eax \n\t" // subtract alignment fix
destinyXfate 2:0e2ef1edf01b 3571 "andl $0x00000007, %%eax \n\t" // calc bytes over mult of 8
destinyXfate 2:0e2ef1edf01b 3572 "subl %%eax, %%ecx \n\t" // drop over bytes from original length
destinyXfate 2:0e2ef1edf01b 3573 "movl %%ecx, _MMXLength \n\t"
destinyXfate 2:0e2ef1edf01b 3574 #ifdef __PIC__
destinyXfate 2:0e2ef1edf01b 3575 "popl %%ebx \n\t" // restore index to Global Offset Table
destinyXfate 2:0e2ef1edf01b 3576 #endif
destinyXfate 2:0e2ef1edf01b 3577
destinyXfate 2:0e2ef1edf01b 3578 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 3579 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 3580 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 3581
destinyXfate 2:0e2ef1edf01b 3582 : "0" (bpp), // ecx // input regs
destinyXfate 2:0e2ef1edf01b 3583 "1" (prev_row), // esi
destinyXfate 2:0e2ef1edf01b 3584 "2" (row) // edi
destinyXfate 2:0e2ef1edf01b 3585
destinyXfate 2:0e2ef1edf01b 3586 : "%eax", "%edx" // clobber list
destinyXfate 2:0e2ef1edf01b 3587 #ifndef __PIC__
destinyXfate 2:0e2ef1edf01b 3588 , "%ebx"
destinyXfate 2:0e2ef1edf01b 3589 #endif
destinyXfate 2:0e2ef1edf01b 3590 );
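// Reference sketch (not part of libpng): the arithmetic the block above
// applies to _dif and _MMXLength, in plain C99. It assumes 32-bit addresses,
// as on the x86 target of this file; the names mmx_start_offset and
// mmx_end_offset are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Offset from the start of the row to the first byte the 8-byte loops handle:
// skip the first bpp bytes, step past the next 8-byte boundary, round down.
static uint32_t mmx_start_offset(uint32_t row_addr, uint32_t bpp)
{
    return ((row_addr + bpp + 0xfu) & 0xfffffff8u) - row_addr;
}

// End offset for the 8-byte loops: drop the bytes beyond a multiple of 8,
// counted from dif, from the full row length.
static uint32_t mmx_end_offset(uint32_t full_length, uint32_t dif)
{
    assert(full_length >= dif);   // assumption of this sketch only
    uint32_t over = (full_length - dif) & 0x7u;
    return full_length - over;
}
```

// Bytes in [bpp, dif) and in [MMXLength, FullLength) are left to the
// byte-at-a-time loops; everything in between is processed 8 bytes at a time.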
destinyXfate 2:0e2ef1edf01b 3591
destinyXfate 2:0e2ef1edf01b 3592 // now do the math for the rest of the row
destinyXfate 2:0e2ef1edf01b 3593 switch (bpp)
destinyXfate 2:0e2ef1edf01b 3594 {
destinyXfate 2:0e2ef1edf01b 3595 case 3:
destinyXfate 2:0e2ef1edf01b 3596 {
destinyXfate 2:0e2ef1edf01b 3597 _ActiveMask.use = 0x0000000000ffffffLL;
destinyXfate 2:0e2ef1edf01b 3598 _ActiveMaskEnd.use = 0xffff000000000000LL;
destinyXfate 2:0e2ef1edf01b 3599 _ShiftBpp.use = 24; // == bpp(3) * 8
destinyXfate 2:0e2ef1edf01b 3600 _ShiftRem.use = 40; // == 64 - 24
destinyXfate 2:0e2ef1edf01b 3601
destinyXfate 2:0e2ef1edf01b 3602 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 3603 "movl _dif, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 3604 // preload "movl row, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 3605 // preload "movl prev_row, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 3606 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3607 // prime the pump: load the first Raw(x-bpp) data set
destinyXfate 2:0e2ef1edf01b 3608 "movq -8(%%edi,%%ecx,), %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 3609 "paeth_3lp: \n\t"
destinyXfate 2:0e2ef1edf01b 3610 "psrlq _ShiftRem, %%mm1 \n\t" // shift last 3 bytes to 1st
destinyXfate 2:0e2ef1edf01b 3611 // 3 bytes
destinyXfate 2:0e2ef1edf01b 3612 "movq (%%esi,%%ecx,), %%mm2 \n\t" // load b=Prior(x)
destinyXfate 2:0e2ef1edf01b 3613 "punpcklbw %%mm0, %%mm1 \n\t" // unpack Low bytes of a
destinyXfate 2:0e2ef1edf01b 3614 "movq -8(%%esi,%%ecx,), %%mm3 \n\t" // prep c=Prior(x-bpp) bytes
destinyXfate 2:0e2ef1edf01b 3615 "punpcklbw %%mm0, %%mm2 \n\t" // unpack Low bytes of b
destinyXfate 2:0e2ef1edf01b 3616 "psrlq _ShiftRem, %%mm3 \n\t" // shift last 3 bytes to 1st
destinyXfate 2:0e2ef1edf01b 3617 // 3 bytes
destinyXfate 2:0e2ef1edf01b 3618 // pav = p - a = (a + b - c) - a = b - c
destinyXfate 2:0e2ef1edf01b 3619 "movq %%mm2, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3620 "punpcklbw %%mm0, %%mm3 \n\t" // unpack Low bytes of c
destinyXfate 2:0e2ef1edf01b 3621 // pbv = p - b = (a + b - c) - b = a - c
destinyXfate 2:0e2ef1edf01b 3622 "movq %%mm1, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3623 "psubw %%mm3, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3624 "pxor %%mm7, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3625 // pcv = p - c = (a + b - c) -c = (a - c) + (b - c) = pav + pbv
destinyXfate 2:0e2ef1edf01b 3626 "movq %%mm4, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3627 "psubw %%mm3, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3628
destinyXfate 2:0e2ef1edf01b 3629 // pa = abs(p-a) = abs(pav)
destinyXfate 2:0e2ef1edf01b 3630 // pb = abs(p-b) = abs(pbv)
destinyXfate 2:0e2ef1edf01b 3631 // pc = abs(p-c) = abs(pcv)
destinyXfate 2:0e2ef1edf01b 3632 "pcmpgtw %%mm4, %%mm0 \n\t" // create mask pav bytes < 0
destinyXfate 2:0e2ef1edf01b 3633 "paddw %%mm5, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3634 "pand %%mm4, %%mm0 \n\t" // only pav bytes < 0 in mm0
destinyXfate 2:0e2ef1edf01b 3635 "pcmpgtw %%mm5, %%mm7 \n\t" // create mask pbv bytes < 0
destinyXfate 2:0e2ef1edf01b 3636 "psubw %%mm0, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3637 "pand %%mm5, %%mm7 \n\t" // only pbv bytes < 0 in mm7
destinyXfate 2:0e2ef1edf01b 3638 "psubw %%mm0, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3639 "psubw %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3640 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3641 "pcmpgtw %%mm6, %%mm0 \n\t" // create mask pcv bytes < 0
destinyXfate 2:0e2ef1edf01b 3642 "pand %%mm6, %%mm0 \n\t" // only pcv bytes < 0 in mm0
destinyXfate 2:0e2ef1edf01b 3643 "psubw %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3644 "psubw %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3645 // test pa <= pb
destinyXfate 2:0e2ef1edf01b 3646 "movq %%mm4, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3647 "psubw %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3648 "pcmpgtw %%mm5, %%mm7 \n\t" // pa > pb?
destinyXfate 2:0e2ef1edf01b 3649 "movq %%mm7, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3650 // use mm7 mask to merge pa & pb
destinyXfate 2:0e2ef1edf01b 3651 "pand %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3652 // use mm0 mask copy to merge a & b
destinyXfate 2:0e2ef1edf01b 3653 "pand %%mm0, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 3654 "pandn %%mm4, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3655 "pandn %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3656 "paddw %%mm5, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3657 "paddw %%mm2, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3658 // test ((pa <= pb)? pa:pb) <= pc
destinyXfate 2:0e2ef1edf01b 3659 "pcmpgtw %%mm6, %%mm7 \n\t" // pab > pc?
destinyXfate 2:0e2ef1edf01b 3660 "pxor %%mm1, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 3661 "pand %%mm7, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 3662 "pandn %%mm0, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3663 "paddw %%mm3, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3664 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3665 "packuswb %%mm1, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3666 "movq (%%esi,%%ecx,), %%mm3 \n\t" // load c=Prior(x-bpp)
destinyXfate 2:0e2ef1edf01b 3667 "pand _ActiveMask, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3668 "movq %%mm3, %%mm2 \n\t" // load b=Prior(x) step 1
destinyXfate 2:0e2ef1edf01b 3669 "paddb (%%edi,%%ecx,), %%mm7 \n\t" // add Paeth predictor with Raw(x)
destinyXfate 2:0e2ef1edf01b 3670 "punpcklbw %%mm0, %%mm3 \n\t" // unpack Low bytes of c
destinyXfate 2:0e2ef1edf01b 3671 "movq %%mm7, (%%edi,%%ecx,) \n\t" // write back updated value
destinyXfate 2:0e2ef1edf01b 3672 "movq %%mm7, %%mm1 \n\t" // now mm1 will be used as
destinyXfate 2:0e2ef1edf01b 3673 // Raw(x-bpp)
destinyXfate 2:0e2ef1edf01b 3674 // now do Paeth for 2nd set of bytes (3-5)
destinyXfate 2:0e2ef1edf01b 3675 "psrlq _ShiftBpp, %%mm2 \n\t" // load b=Prior(x) step 2
destinyXfate 2:0e2ef1edf01b 3676 "punpcklbw %%mm0, %%mm1 \n\t" // unpack Low bytes of a
destinyXfate 2:0e2ef1edf01b 3677 "pxor %%mm7, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3678 "punpcklbw %%mm0, %%mm2 \n\t" // unpack Low bytes of b
destinyXfate 2:0e2ef1edf01b 3679 // pbv = p - b = (a + b - c) - b = a - c
destinyXfate 2:0e2ef1edf01b 3680 "movq %%mm1, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3681 // pav = p - a = (a + b - c) - a = b - c
destinyXfate 2:0e2ef1edf01b 3682 "movq %%mm2, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3683 "psubw %%mm3, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3684 "psubw %%mm3, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3685 // pcv = p - c = (a + b - c) -c = (a - c) + (b - c) =
destinyXfate 2:0e2ef1edf01b 3686 // pav + pbv = pbv + pav
destinyXfate 2:0e2ef1edf01b 3687 "movq %%mm5, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3688 "paddw %%mm4, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3689
destinyXfate 2:0e2ef1edf01b 3690 // pa = abs(p-a) = abs(pav)
destinyXfate 2:0e2ef1edf01b 3691 // pb = abs(p-b) = abs(pbv)
destinyXfate 2:0e2ef1edf01b 3692 // pc = abs(p-c) = abs(pcv)
destinyXfate 2:0e2ef1edf01b 3693 "pcmpgtw %%mm5, %%mm0 \n\t" // create mask pbv bytes < 0
destinyXfate 2:0e2ef1edf01b 3694 "pcmpgtw %%mm4, %%mm7 \n\t" // create mask pav bytes < 0
destinyXfate 2:0e2ef1edf01b 3695 "pand %%mm5, %%mm0 \n\t" // only pbv bytes < 0 in mm0
destinyXfate 2:0e2ef1edf01b 3696 "pand %%mm4, %%mm7 \n\t" // only pav bytes < 0 in mm7
destinyXfate 2:0e2ef1edf01b 3697 "psubw %%mm0, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3698 "psubw %%mm7, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3699 "psubw %%mm0, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3700 "psubw %%mm7, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3701 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3702 "pcmpgtw %%mm6, %%mm0 \n\t" // create mask pcv bytes < 0
destinyXfate 2:0e2ef1edf01b 3703 "pand %%mm6, %%mm0 \n\t" // only pcv bytes < 0 in mm0
destinyXfate 2:0e2ef1edf01b 3704 "psubw %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3705 // test pa <= pb
destinyXfate 2:0e2ef1edf01b 3706 "movq %%mm4, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3707 "psubw %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3708 "pcmpgtw %%mm5, %%mm7 \n\t" // pa > pb?
destinyXfate 2:0e2ef1edf01b 3709 "movq %%mm7, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3710 // use mm7 mask to merge pa & pb
destinyXfate 2:0e2ef1edf01b 3711 "pand %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3712 // use mm0 mask copy to merge a & b
destinyXfate 2:0e2ef1edf01b 3713 "pand %%mm0, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 3714 "pandn %%mm4, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3715 "pandn %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3716 "paddw %%mm5, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3717 "paddw %%mm2, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3718 // test ((pa <= pb)? pa:pb) <= pc
destinyXfate 2:0e2ef1edf01b 3719 "pcmpgtw %%mm6, %%mm7 \n\t" // pab > pc?
destinyXfate 2:0e2ef1edf01b 3720 "movq (%%esi,%%ecx,), %%mm2 \n\t" // load b=Prior(x)
destinyXfate 2:0e2ef1edf01b 3721 "pand %%mm7, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 3722 "pandn %%mm0, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3723 "pxor %%mm1, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 3724 "paddw %%mm3, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3725 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3726 "packuswb %%mm1, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3727 "movq %%mm2, %%mm3 \n\t" // load c=Prior(x-bpp) step 1
destinyXfate 2:0e2ef1edf01b 3728 "pand _ActiveMask, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3729 "punpckhbw %%mm0, %%mm2 \n\t" // unpack High bytes of b
destinyXfate 2:0e2ef1edf01b 3730 "psllq _ShiftBpp, %%mm7 \n\t" // shift bytes to 2nd group of
destinyXfate 2:0e2ef1edf01b 3731 // 3 bytes
destinyXfate 2:0e2ef1edf01b 3732 // pav = p - a = (a + b - c) - a = b - c
destinyXfate 2:0e2ef1edf01b 3733 "movq %%mm2, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3734 "paddb (%%edi,%%ecx,), %%mm7 \n\t" // add Paeth predictor with Raw(x)
destinyXfate 2:0e2ef1edf01b 3735 "psllq _ShiftBpp, %%mm3 \n\t" // load c=Prior(x-bpp) step 2
destinyXfate 2:0e2ef1edf01b 3736 "movq %%mm7, (%%edi,%%ecx,) \n\t" // write back updated value
destinyXfate 2:0e2ef1edf01b 3737 "movq %%mm7, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 3738 "punpckhbw %%mm0, %%mm3 \n\t" // unpack High bytes of c
destinyXfate 2:0e2ef1edf01b 3739 "psllq _ShiftBpp, %%mm1 \n\t" // shift bytes
destinyXfate 2:0e2ef1edf01b 3740 // now mm1 will be used as Raw(x-bpp)
destinyXfate 2:0e2ef1edf01b 3741 // now do Paeth for 3rd, and final, set of bytes (6-7)
destinyXfate 2:0e2ef1edf01b 3742 "pxor %%mm7, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3743 "punpckhbw %%mm0, %%mm1 \n\t" // unpack High bytes of a
destinyXfate 2:0e2ef1edf01b 3744 "psubw %%mm3, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3745 // pbv = p - b = (a + b - c) - b = a - c
destinyXfate 2:0e2ef1edf01b 3746 "movq %%mm1, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3747 // pcv = p - c = (a + b - c) -c = (a - c) + (b - c) = pav + pbv
destinyXfate 2:0e2ef1edf01b 3748 "movq %%mm4, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3749 "psubw %%mm3, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3750 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3751 "paddw %%mm5, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3752
destinyXfate 2:0e2ef1edf01b 3753 // pa = abs(p-a) = abs(pav)
destinyXfate 2:0e2ef1edf01b 3754 // pb = abs(p-b) = abs(pbv)
destinyXfate 2:0e2ef1edf01b 3755 // pc = abs(p-c) = abs(pcv)
destinyXfate 2:0e2ef1edf01b 3756 "pcmpgtw %%mm4, %%mm0 \n\t" // create mask pav bytes < 0
destinyXfate 2:0e2ef1edf01b 3757 "pcmpgtw %%mm5, %%mm7 \n\t" // create mask pbv bytes < 0
destinyXfate 2:0e2ef1edf01b 3758 "pand %%mm4, %%mm0 \n\t" // only pav bytes < 0 in mm0
destinyXfate 2:0e2ef1edf01b 3759 "pand %%mm5, %%mm7 \n\t" // only pbv bytes < 0 in mm7
destinyXfate 2:0e2ef1edf01b 3760 "psubw %%mm0, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3761 "psubw %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3762 "psubw %%mm0, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3763 "psubw %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3764 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3765 "pcmpgtw %%mm6, %%mm0 \n\t" // create mask pcv bytes < 0
destinyXfate 2:0e2ef1edf01b 3766 "pand %%mm6, %%mm0 \n\t" // only pcv bytes < 0 in mm0
destinyXfate 2:0e2ef1edf01b 3767 "psubw %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3768 // test pa <= pb
destinyXfate 2:0e2ef1edf01b 3769 "movq %%mm4, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3770 "psubw %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3771 "pcmpgtw %%mm5, %%mm7 \n\t" // pa > pb?
destinyXfate 2:0e2ef1edf01b 3772 "movq %%mm7, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3773 // use mm0 mask copy to merge a & b
destinyXfate 2:0e2ef1edf01b 3774 "pand %%mm0, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 3775 // use mm7 mask to merge pa & pb
destinyXfate 2:0e2ef1edf01b 3776 "pand %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3777 "pandn %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3778 "pandn %%mm4, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3779 "paddw %%mm2, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3780 "paddw %%mm5, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3781 // test ((pa <= pb)? pa:pb) <= pc
destinyXfate 2:0e2ef1edf01b 3782 "pcmpgtw %%mm6, %%mm7 \n\t" // pab > pc?
destinyXfate 2:0e2ef1edf01b 3783 "pand %%mm7, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 3784 "pandn %%mm0, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3785 "paddw %%mm3, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3786 "pxor %%mm1, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 3787 "packuswb %%mm7, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 3788 // step ecx to next set of 8 bytes and repeat loop til done
destinyXfate 2:0e2ef1edf01b 3789 "addl $8, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 3790 "pand _ActiveMaskEnd, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 3791 "paddb -8(%%edi,%%ecx,), %%mm1 \n\t" // add Paeth predictor with
destinyXfate 2:0e2ef1edf01b 3792 // Raw(x)
destinyXfate 2:0e2ef1edf01b 3793
destinyXfate 2:0e2ef1edf01b 3794 "cmpl _MMXLength, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 3795 "pxor %%mm0, %%mm0 \n\t" // pxor does not affect flags
destinyXfate 2:0e2ef1edf01b 3796 "movq %%mm1, -8(%%edi,%%ecx,) \n\t" // write back updated value
destinyXfate 2:0e2ef1edf01b 3797 // mm1 will be used as Raw(x-bpp) next loop
destinyXfate 2:0e2ef1edf01b 3798 // mm3 ready to be used as Prior(x-bpp) next loop
destinyXfate 2:0e2ef1edf01b 3799 "jb paeth_3lp \n\t"
destinyXfate 2:0e2ef1edf01b 3800
destinyXfate 2:0e2ef1edf01b 3801 : "=S" (dummy_value_S), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 3802 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 3803
destinyXfate 2:0e2ef1edf01b 3804 : "0" (prev_row), // esi // input regs
destinyXfate 2:0e2ef1edf01b 3805 "1" (row) // edi
destinyXfate 2:0e2ef1edf01b 3806
destinyXfate 2:0e2ef1edf01b 3807 : "%ecx" // clobber list
destinyXfate 2:0e2ef1edf01b 3808 #if 0 /* %mm0, ..., %mm7 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 3809 , "%mm0", "%mm1", "%mm2", "%mm3"
destinyXfate 2:0e2ef1edf01b 3810 , "%mm4", "%mm5", "%mm6", "%mm7"
destinyXfate 2:0e2ef1edf01b 3811 #endif
destinyXfate 2:0e2ef1edf01b 3812 );
destinyXfate 2:0e2ef1edf01b 3813 }
destinyXfate 2:0e2ef1edf01b 3814 break; // end 3 bpp
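For reference, the per-byte selection that the MMX loop comments describe (pav, pbv, pcv, then the two "test" steps) can be written as a scalar routine. This is a minimal sketch for reading along, not code from this file; the name `paeth_predict_ref` is hypothetical, and `a`, `b`, `c` follow the comments' naming: a = Raw(x-bpp), b = Prior(x), c = Prior(x-bpp).

```c
#include <stdlib.h>

/* Hypothetical scalar reference for one byte of the Paeth predictor.
 * a = Raw(x-bpp), b = Prior(x), c = Prior(x-bpp), as in the asm comments. */
static unsigned char paeth_predict_ref(unsigned char a, unsigned char b,
                                       unsigned char c)
{
    int pav = (int)b - (int)c;   /* p - a = (a + b - c) - a */
    int pbv = (int)a - (int)c;   /* p - b = (a + b - c) - b */
    int pcv = pav + pbv;         /* p - c = (a - c) + (b - c) */
    int pa = abs(pav);
    int pb = abs(pbv);
    int pc = abs(pcv);

    /* Prefer a, then b, then c when distances tie. */
    if (pa <= pb && pa <= pc)
        return a;
    if (pb <= pc)
        return b;
    return c;
}
```

The decoded byte is then `Raw(x) = (filtered byte + predictor) mod 256`, which is what the `paddb` before each write-back performs on eight bytes at once.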
destinyXfate 2:0e2ef1edf01b 3815
destinyXfate 2:0e2ef1edf01b 3816 case 6:
destinyXfate 2:0e2ef1edf01b 3817 //case 7: // GRR BOGUS
destinyXfate 2:0e2ef1edf01b 3818 //case 5: // GRR BOGUS
destinyXfate 2:0e2ef1edf01b 3819 {
destinyXfate 2:0e2ef1edf01b 3820 _ActiveMask.use = 0x00000000ffffffffLL;
destinyXfate 2:0e2ef1edf01b 3821 _ActiveMask2.use = 0xffffffff00000000LL;
destinyXfate 2:0e2ef1edf01b 3822 _ShiftBpp.use = bpp << 3; // == bpp * 8
destinyXfate 2:0e2ef1edf01b 3823 _ShiftRem.use = 64 - _ShiftBpp.use;
destinyXfate 2:0e2ef1edf01b 3824
destinyXfate 2:0e2ef1edf01b 3825 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 3826 "movl _dif, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 3827 // preload "movl row, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 3828 // preload "movl prev_row, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 3829 // prime the pump: load the first Raw(x-bpp) data set
destinyXfate 2:0e2ef1edf01b 3830 "movq -8(%%edi,%%ecx,), %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 3831 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3832
destinyXfate 2:0e2ef1edf01b 3833 "paeth_6lp: \n\t"
destinyXfate 2:0e2ef1edf01b 3834 // must shift to position Raw(x-bpp) data
destinyXfate 2:0e2ef1edf01b 3835 "psrlq _ShiftRem, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 3836 // do first set of 4 bytes
destinyXfate 2:0e2ef1edf01b 3837 "movq -8(%%esi,%%ecx,), %%mm3 \n\t" // read c=Prior(x-bpp) bytes
destinyXfate 2:0e2ef1edf01b 3838 "punpcklbw %%mm0, %%mm1 \n\t" // unpack Low bytes of a
destinyXfate 2:0e2ef1edf01b 3839 "movq (%%esi,%%ecx,), %%mm2 \n\t" // load b=Prior(x)
destinyXfate 2:0e2ef1edf01b 3840 "punpcklbw %%mm0, %%mm2 \n\t" // unpack Low bytes of b
destinyXfate 2:0e2ef1edf01b 3841 // must shift to position Prior(x-bpp) data
destinyXfate 2:0e2ef1edf01b 3842 "psrlq _ShiftRem, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 3843 // pav = p - a = (a + b - c) - a = b - c
destinyXfate 2:0e2ef1edf01b 3844 "movq %%mm2, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3845 "punpcklbw %%mm0, %%mm3 \n\t" // unpack Low bytes of c
destinyXfate 2:0e2ef1edf01b 3846 // pbv = p - b = (a + b - c) - b = a - c
destinyXfate 2:0e2ef1edf01b 3847 "movq %%mm1, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3848 "psubw %%mm3, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3849 "pxor %%mm7, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3850 // pcv = p - c = (a + b - c) -c = (a - c) + (b - c) = pav + pbv
destinyXfate 2:0e2ef1edf01b 3851 "movq %%mm4, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3852 "psubw %%mm3, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3853 // pa = abs(p-a) = abs(pav)
destinyXfate 2:0e2ef1edf01b 3854 // pb = abs(p-b) = abs(pbv)
destinyXfate 2:0e2ef1edf01b 3855 // pc = abs(p-c) = abs(pcv)
destinyXfate 2:0e2ef1edf01b 3856 "pcmpgtw %%mm4, %%mm0 \n\t" // create mask pav bytes < 0
destinyXfate 2:0e2ef1edf01b 3857 "paddw %%mm5, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3858 "pand %%mm4, %%mm0 \n\t" // only pav bytes < 0 in mm0
destinyXfate 2:0e2ef1edf01b 3859 "pcmpgtw %%mm5, %%mm7 \n\t" // create mask pbv bytes < 0
destinyXfate 2:0e2ef1edf01b 3860 "psubw %%mm0, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3861 "pand %%mm5, %%mm7 \n\t" // only pbv bytes < 0 in mm7
destinyXfate 2:0e2ef1edf01b 3862 "psubw %%mm0, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3863 "psubw %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3864 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3865 "pcmpgtw %%mm6, %%mm0 \n\t" // create mask pcv bytes < 0
destinyXfate 2:0e2ef1edf01b 3866 "pand %%mm6, %%mm0 \n\t" // only pcv bytes < 0 in mm0
destinyXfate 2:0e2ef1edf01b 3867 "psubw %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3868 "psubw %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3869 // test pa <= pb
destinyXfate 2:0e2ef1edf01b 3870 "movq %%mm4, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3871 "psubw %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3872 "pcmpgtw %%mm5, %%mm7 \n\t" // pa > pb?
destinyXfate 2:0e2ef1edf01b 3873 "movq %%mm7, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3874 // use mm7 mask to merge pa & pb
destinyXfate 2:0e2ef1edf01b 3875 "pand %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3876 // use mm0 mask copy to merge a & b
destinyXfate 2:0e2ef1edf01b 3877 "pand %%mm0, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 3878 "pandn %%mm4, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3879 "pandn %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3880 "paddw %%mm5, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3881 "paddw %%mm2, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3882 // test ((pa <= pb)? pa:pb) <= pc
destinyXfate 2:0e2ef1edf01b 3883 "pcmpgtw %%mm6, %%mm7 \n\t" // pab > pc?
destinyXfate 2:0e2ef1edf01b 3884 "pxor %%mm1, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 3885 "pand %%mm7, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 3886 "pandn %%mm0, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3887 "paddw %%mm3, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3888 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3889 "packuswb %%mm1, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3890 "movq -8(%%esi,%%ecx,), %%mm3 \n\t" // load c=Prior(x-bpp)
destinyXfate 2:0e2ef1edf01b 3891 "pand _ActiveMask, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3892 "psrlq _ShiftRem, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 3893 "movq (%%esi,%%ecx,), %%mm2 \n\t" // load b=Prior(x) step 1
destinyXfate 2:0e2ef1edf01b 3894 "paddb (%%edi,%%ecx,), %%mm7 \n\t" // add Paeth predictor and Raw(x)
destinyXfate 2:0e2ef1edf01b 3895 "movq %%mm2, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3896 "movq %%mm7, (%%edi,%%ecx,) \n\t" // write back updated value
destinyXfate 2:0e2ef1edf01b 3897 "movq -8(%%edi,%%ecx,), %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 3898 "psllq _ShiftBpp, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3899 "movq %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3900 "psrlq _ShiftRem, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 3901 "por %%mm6, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 3902 "psllq _ShiftBpp, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3903 "punpckhbw %%mm0, %%mm3 \n\t" // unpack High bytes of c
destinyXfate 2:0e2ef1edf01b 3904 "por %%mm5, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 3905 // do second set of 4 bytes
destinyXfate 2:0e2ef1edf01b 3906 "punpckhbw %%mm0, %%mm2 \n\t" // unpack High bytes of b
destinyXfate 2:0e2ef1edf01b 3907 "punpckhbw %%mm0, %%mm1 \n\t" // unpack High bytes of a
destinyXfate 2:0e2ef1edf01b 3908 // pav = p - a = (a + b - c) - a = b - c
destinyXfate 2:0e2ef1edf01b 3909 "movq %%mm2, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3910 // pbv = p - b = (a + b - c) - b = a - c
destinyXfate 2:0e2ef1edf01b 3911 "movq %%mm1, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3912 "psubw %%mm3, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3913 "pxor %%mm7, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3914 // pcv = p - c = (a + b - c) -c = (a - c) + (b - c) = pav + pbv
destinyXfate 2:0e2ef1edf01b 3915 "movq %%mm4, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3916 "psubw %%mm3, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3917 // pa = abs(p-a) = abs(pav)
destinyXfate 2:0e2ef1edf01b 3918 // pb = abs(p-b) = abs(pbv)
destinyXfate 2:0e2ef1edf01b 3919 // pc = abs(p-c) = abs(pcv)
destinyXfate 2:0e2ef1edf01b 3920 "pcmpgtw %%mm4, %%mm0 \n\t" // create mask pav bytes < 0
destinyXfate 2:0e2ef1edf01b 3921 "paddw %%mm5, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3922 "pand %%mm4, %%mm0 \n\t" // only pav bytes < 0 in mm0
destinyXfate 2:0e2ef1edf01b 3923 "pcmpgtw %%mm5, %%mm7 \n\t" // create mask pbv bytes < 0
destinyXfate 2:0e2ef1edf01b 3924 "psubw %%mm0, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3925 "pand %%mm5, %%mm7 \n\t" // only pbv bytes < 0 in mm7
destinyXfate 2:0e2ef1edf01b 3926 "psubw %%mm0, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3927 "psubw %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3928 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3929 "pcmpgtw %%mm6, %%mm0 \n\t" // create mask pcv bytes < 0
destinyXfate 2:0e2ef1edf01b 3930 "pand %%mm6, %%mm0 \n\t" // only pcv bytes < 0 in mm0
destinyXfate 2:0e2ef1edf01b 3931 "psubw %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3932 "psubw %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3933 // test pa <= pb
destinyXfate 2:0e2ef1edf01b 3934 "movq %%mm4, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3935 "psubw %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 3936 "pcmpgtw %%mm5, %%mm7 \n\t" // pa > pb?
destinyXfate 2:0e2ef1edf01b 3937 "movq %%mm7, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3938 // use mm7 mask to merge pa & pb
destinyXfate 2:0e2ef1edf01b 3939 "pand %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 3940 // use mm0 mask copy to merge a & b
destinyXfate 2:0e2ef1edf01b 3941 "pand %%mm0, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 3942 "pandn %%mm4, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3943 "pandn %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3944 "paddw %%mm5, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3945 "paddw %%mm2, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3946 // test ((pa <= pb)? pa:pb) <= pc
destinyXfate 2:0e2ef1edf01b 3947 "pcmpgtw %%mm6, %%mm7 \n\t" // pab > pc?
destinyXfate 2:0e2ef1edf01b 3948 "pxor %%mm1, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 3949 "pand %%mm7, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 3950 "pandn %%mm0, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3951 "pxor %%mm1, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 3952 "paddw %%mm3, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 3953 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3954 // step ecx to next set of 8 bytes and repeat loop til done
destinyXfate 2:0e2ef1edf01b 3955 "addl $8, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 3956 "packuswb %%mm7, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 3957 "paddb -8(%%edi,%%ecx,), %%mm1 \n\t" // add Paeth predictor with Raw(x)
destinyXfate 2:0e2ef1edf01b 3958 "cmpl _MMXLength, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 3959 "movq %%mm1, -8(%%edi,%%ecx,) \n\t" // write back updated value
destinyXfate 2:0e2ef1edf01b 3960 // mm1 will be used as Raw(x-bpp) next loop
destinyXfate 2:0e2ef1edf01b 3961 "jb paeth_6lp \n\t"
destinyXfate 2:0e2ef1edf01b 3962
destinyXfate 2:0e2ef1edf01b 3963 : "=S" (dummy_value_S), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 3964 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 3965
destinyXfate 2:0e2ef1edf01b 3966 : "0" (prev_row), // esi // input regs
destinyXfate 2:0e2ef1edf01b 3967 "1" (row) // edi
destinyXfate 2:0e2ef1edf01b 3968
destinyXfate 2:0e2ef1edf01b 3969 : "%ecx" // clobber list
destinyXfate 2:0e2ef1edf01b 3970 #if 0 /* %mm0, ..., %mm7 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 3971 , "%mm0", "%mm1", "%mm2", "%mm3"
destinyXfate 2:0e2ef1edf01b 3972 , "%mm4", "%mm5", "%mm6", "%mm7"
destinyXfate 2:0e2ef1edf01b 3973 #endif
destinyXfate 2:0e2ef1edf01b 3974 );
destinyXfate 2:0e2ef1edf01b 3975 }
destinyXfate 2:0e2ef1edf01b 3976 break; // end 6 bpp
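The `_ShiftBpp` / `_ShiftRem` pair set up for this case positions Raw(x-bpp) by combining the previous quadword (shifted right with `psrlq`) and the current one (shifted left with `psllq`). A minimal scalar sketch of that idea follows; the helper names `load_le64` and `raw_minus_bpp` are hypothetical, and the sketch assumes `1 <= bpp <= 7` (a shift by 64 is not defined in C) and `x >= 8`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read 8 bytes as a little-endian quadword,
 * matching how movq sees memory on x86. */
static uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: return row[x-bpp .. x-bpp+7] as a quadword,
 * built from the quadwords at x-8 and x.
 * Assumes 1 <= bpp <= 7 and x >= 8. */
static uint64_t raw_minus_bpp(const unsigned char *row, size_t x,
                              unsigned bpp)
{
    unsigned shift_bpp = bpp << 3;        /* == bpp * 8, like _ShiftBpp */
    unsigned shift_rem = 64 - shift_bpp;  /* like _ShiftRem */
    uint64_t prev = load_le64(row + x - 8);
    uint64_t cur  = load_le64(row + x);

    /* psrlq drops the bytes older than x-bpp; psllq moves the first
     * bytes of the current quadword into the vacated high positions. */
    return (prev >> shift_rem) | (cur << shift_bpp);
}
```

With `bpp == 6` the shifts are 48 and 16 bits, so the result holds the last six bytes of the previous quadword followed by the first two bytes of the current one.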
destinyXfate 2:0e2ef1edf01b 3977
destinyXfate 2:0e2ef1edf01b 3978 case 4:
destinyXfate 2:0e2ef1edf01b 3979 {
destinyXfate 2:0e2ef1edf01b 3980 _ActiveMask.use = 0x00000000ffffffffLL;
destinyXfate 2:0e2ef1edf01b 3981
destinyXfate 2:0e2ef1edf01b 3982 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 3983 "movl _dif, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 3984 // preload "movl row, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 3985 // preload "movl prev_row, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 3986 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 3987 // prime the pump: load the first Raw(x-bpp) data set
destinyXfate 2:0e2ef1edf01b 3988 "movq -8(%%edi,%%ecx,), %%mm1 \n\t" // only time should need to read
destinyXfate 2:0e2ef1edf01b 3989 // a=Raw(x-bpp) bytes
destinyXfate 2:0e2ef1edf01b 3990 "paeth_4lp: \n\t"
destinyXfate 2:0e2ef1edf01b 3991 // do first set of 4 bytes
destinyXfate 2:0e2ef1edf01b 3992 "movq -8(%%esi,%%ecx,), %%mm3 \n\t" // read c=Prior(x-bpp) bytes
destinyXfate 2:0e2ef1edf01b 3993 "punpckhbw %%mm0, %%mm1 \n\t" // unpack High bytes of a
destinyXfate 2:0e2ef1edf01b 3994 "movq (%%esi,%%ecx,), %%mm2 \n\t" // load b=Prior(x)
destinyXfate 2:0e2ef1edf01b 3995 "punpcklbw %%mm0, %%mm2 \n\t" // unpack Low bytes of b
destinyXfate 2:0e2ef1edf01b 3996 // pav = p - a = (a + b - c) - a = b - c
destinyXfate 2:0e2ef1edf01b 3997 "movq %%mm2, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 3998 "punpckhbw %%mm0, %%mm3 \n\t" // unpack High bytes of c
destinyXfate 2:0e2ef1edf01b 3999 // pbv = p - b = (a + b - c) - b = a - c
destinyXfate 2:0e2ef1edf01b 4000 "movq %%mm1, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4001 "psubw %%mm3, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 4002 "pxor %%mm7, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4003 // pcv = p - c = (a + b - c) -c = (a - c) + (b - c) = pav + pbv
destinyXfate 2:0e2ef1edf01b 4004 "movq %%mm4, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4005 "psubw %%mm3, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4006 // pa = abs(p-a) = abs(pav)
destinyXfate 2:0e2ef1edf01b 4007 // pb = abs(p-b) = abs(pbv)
destinyXfate 2:0e2ef1edf01b 4008 // pc = abs(p-c) = abs(pcv)
destinyXfate 2:0e2ef1edf01b 4009 "pcmpgtw %%mm4, %%mm0 \n\t" // create mask pav bytes < 0
destinyXfate 2:0e2ef1edf01b 4010 "paddw %%mm5, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4011 "pand %%mm4, %%mm0 \n\t" // only pav bytes < 0 in mm0
destinyXfate 2:0e2ef1edf01b 4012 "pcmpgtw %%mm5, %%mm7 \n\t" // create mask pbv bytes < 0
destinyXfate 2:0e2ef1edf01b 4013 "psubw %%mm0, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 4014 "pand %%mm5, %%mm7 \n\t" // only pbv bytes < 0 in mm7
destinyXfate 2:0e2ef1edf01b 4015 "psubw %%mm0, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 4016 "psubw %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4017 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4018 "pcmpgtw %%mm6, %%mm0 \n\t" // create mask pcv bytes < 0
destinyXfate 2:0e2ef1edf01b 4019 "pand %%mm6, %%mm0 \n\t" // only pcv bytes < 0 in mm0
destinyXfate 2:0e2ef1edf01b 4020 "psubw %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4021 "psubw %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4022 // test pa <= pb
destinyXfate 2:0e2ef1edf01b 4023 "movq %%mm4, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4024 "psubw %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4025 "pcmpgtw %%mm5, %%mm7 \n\t" // pa > pb?
destinyXfate 2:0e2ef1edf01b 4026 "movq %%mm7, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4027 // use mm7 mask to merge pa & pb
destinyXfate 2:0e2ef1edf01b 4028 "pand %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4029 // use mm0 mask copy to merge a & b
destinyXfate 2:0e2ef1edf01b 4030 "pand %%mm0, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 4031 "pandn %%mm4, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4032 "pandn %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4033 "paddw %%mm5, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4034 "paddw %%mm2, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4035 // test ((pa <= pb)? pa:pb) <= pc
destinyXfate 2:0e2ef1edf01b 4036 "pcmpgtw %%mm6, %%mm7 \n\t" // pab > pc?
destinyXfate 2:0e2ef1edf01b 4037 "pxor %%mm1, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 4038 "pand %%mm7, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 4039 "pandn %%mm0, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4040 "paddw %%mm3, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4041 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4042 "packuswb %%mm1, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4043 "movq (%%esi,%%ecx,), %%mm3 \n\t" // load c=Prior(x-bpp)
destinyXfate 2:0e2ef1edf01b 4044 "pand _ActiveMask, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4045 "movq %%mm3, %%mm2 \n\t" // load b=Prior(x) step 1
destinyXfate 2:0e2ef1edf01b 4046 "paddb (%%edi,%%ecx,), %%mm7 \n\t" // add Paeth predictor with Raw(x)
destinyXfate 2:0e2ef1edf01b 4047 "punpcklbw %%mm0, %%mm3 \n\t" // unpack Low bytes of c
destinyXfate 2:0e2ef1edf01b 4048 "movq %%mm7, (%%edi,%%ecx,) \n\t" // write back updated value
destinyXfate 2:0e2ef1edf01b 4049 "movq %%mm7, %%mm1 \n\t" // now mm1 will be used as Raw(x-bpp)
destinyXfate 2:0e2ef1edf01b 4050 // do second set of 4 bytes
destinyXfate 2:0e2ef1edf01b 4051 "punpckhbw %%mm0, %%mm2 \n\t" // unpack High bytes of b
destinyXfate 2:0e2ef1edf01b 4052 "punpcklbw %%mm0, %%mm1 \n\t" // unpack Low bytes of a
destinyXfate 2:0e2ef1edf01b 4053 // pav = p - a = (a + b - c) - a = b - c
destinyXfate 2:0e2ef1edf01b 4054 "movq %%mm2, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 4055 // pbv = p - b = (a + b - c) - b = a - c
destinyXfate 2:0e2ef1edf01b 4056 "movq %%mm1, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4057 "psubw %%mm3, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 4058 "pxor %%mm7, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4059 // pcv = p - c = (a + b - c) -c = (a - c) + (b - c) = pav + pbv
destinyXfate 2:0e2ef1edf01b 4060 "movq %%mm4, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4061 "psubw %%mm3, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4062 // pa = abs(p-a) = abs(pav)
destinyXfate 2:0e2ef1edf01b 4063 // pb = abs(p-b) = abs(pbv)
destinyXfate 2:0e2ef1edf01b 4064 // pc = abs(p-c) = abs(pcv)
destinyXfate 2:0e2ef1edf01b 4065 "pcmpgtw %%mm4, %%mm0 \n\t" // create mask pav bytes < 0
destinyXfate 2:0e2ef1edf01b 4066 "paddw %%mm5, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4067 "pand %%mm4, %%mm0 \n\t" // only pav bytes < 0 in mm0
destinyXfate 2:0e2ef1edf01b 4068 "pcmpgtw %%mm5, %%mm7 \n\t" // create mask pbv bytes < 0
destinyXfate 2:0e2ef1edf01b 4069 "psubw %%mm0, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 4070 "pand %%mm5, %%mm7 \n\t" // only pbv bytes < 0 in mm7
destinyXfate 2:0e2ef1edf01b 4071 "psubw %%mm0, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 4072 "psubw %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4073 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4074 "pcmpgtw %%mm6, %%mm0 \n\t" // create mask pcv bytes < 0
destinyXfate 2:0e2ef1edf01b 4075 "pand %%mm6, %%mm0 \n\t" // only pcv bytes < 0 in mm0
destinyXfate 2:0e2ef1edf01b 4076 "psubw %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4077 "psubw %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4078 // test pa <= pb
destinyXfate 2:0e2ef1edf01b 4079 "movq %%mm4, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4080 "psubw %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4081 "pcmpgtw %%mm5, %%mm7 \n\t" // pa > pb?
destinyXfate 2:0e2ef1edf01b 4082 "movq %%mm7, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4083 // use mm7 mask to merge pa & pb
destinyXfate 2:0e2ef1edf01b 4084 "pand %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4085 // use mm0 mask copy to merge a & b
destinyXfate 2:0e2ef1edf01b 4086 "pand %%mm0, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 4087 "pandn %%mm4, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4088 "pandn %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4089 "paddw %%mm5, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4090 "paddw %%mm2, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4091 // test ((pa <= pb)? pa:pb) <= pc
destinyXfate 2:0e2ef1edf01b 4092 "pcmpgtw %%mm6, %%mm7 \n\t" // pab > pc?
destinyXfate 2:0e2ef1edf01b 4093 "pxor %%mm1, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 4094 "pand %%mm7, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 4095 "pandn %%mm0, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4096 "pxor %%mm1, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 4097 "paddw %%mm3, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4098 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4099 // step ecx to next set of 8 bytes and repeat loop til done
destinyXfate 2:0e2ef1edf01b 4100 "addl $8, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 4101 "packuswb %%mm7, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 4102 "paddb -8(%%edi,%%ecx,), %%mm1 \n\t" // add predictor with Raw(x)
destinyXfate 2:0e2ef1edf01b 4103 "cmpl _MMXLength, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 4104 "movq %%mm1, -8(%%edi,%%ecx,) \n\t" // write back updated value
destinyXfate 2:0e2ef1edf01b 4105 // mm1 will be used as Raw(x-bpp) next loop
destinyXfate 2:0e2ef1edf01b 4106 "jb paeth_4lp \n\t"
destinyXfate 2:0e2ef1edf01b 4107
destinyXfate 2:0e2ef1edf01b 4108 : "=S" (dummy_value_S), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 4109 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 4110
destinyXfate 2:0e2ef1edf01b 4111 : "0" (prev_row), // esi // input regs
destinyXfate 2:0e2ef1edf01b 4112 "1" (row) // edi
destinyXfate 2:0e2ef1edf01b 4113
destinyXfate 2:0e2ef1edf01b 4114 : "%ecx" // clobber list
destinyXfate 2:0e2ef1edf01b 4115 #if 0 /* %mm0, ..., %mm7 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 4116 , "%mm0", "%mm1", "%mm2", "%mm3"
destinyXfate 2:0e2ef1edf01b 4117 , "%mm4", "%mm5", "%mm6", "%mm7"
destinyXfate 2:0e2ef1edf01b 4118 #endif
destinyXfate 2:0e2ef1edf01b 4119 );
destinyXfate 2:0e2ef1edf01b 4120 }
destinyXfate 2:0e2ef1edf01b 4121 break; // end 4 bpp
destinyXfate 2:0e2ef1edf01b 4122
destinyXfate 2:0e2ef1edf01b 4123 case 8: // bpp == 8
destinyXfate 2:0e2ef1edf01b 4124 {
destinyXfate 2:0e2ef1edf01b 4125 _ActiveMask.use = 0x00000000ffffffffLL;
destinyXfate 2:0e2ef1edf01b 4126
destinyXfate 2:0e2ef1edf01b 4127 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 4128 "movl _dif, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 4129 // preload "movl row, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 4130 // preload "movl prev_row, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 4131 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4132 // prime the pump: load the first Raw(x-bpp) data set
destinyXfate 2:0e2ef1edf01b 4133 "movq -8(%%edi,%%ecx,), %%mm1 \n\t" // only time should need to read
destinyXfate 2:0e2ef1edf01b 4134 // a=Raw(x-bpp) bytes
destinyXfate 2:0e2ef1edf01b 4135 "paeth_8lp: \n\t"
destinyXfate 2:0e2ef1edf01b 4136 // do first set of 4 bytes
destinyXfate 2:0e2ef1edf01b 4137 "movq -8(%%esi,%%ecx,), %%mm3 \n\t" // read c=Prior(x-bpp) bytes
destinyXfate 2:0e2ef1edf01b 4138 "punpcklbw %%mm0, %%mm1 \n\t" // unpack Low bytes of a
destinyXfate 2:0e2ef1edf01b 4139 "movq (%%esi,%%ecx,), %%mm2 \n\t" // load b=Prior(x)
destinyXfate 2:0e2ef1edf01b 4140 "punpcklbw %%mm0, %%mm2 \n\t" // unpack Low bytes of b
destinyXfate 2:0e2ef1edf01b 4141 // pav = p - a = (a + b - c) - a = b - c
destinyXfate 2:0e2ef1edf01b 4142 "movq %%mm2, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 4143 "punpcklbw %%mm0, %%mm3 \n\t" // unpack Low bytes of c
destinyXfate 2:0e2ef1edf01b 4144 // pbv = p - b = (a + b - c) - b = a - c
destinyXfate 2:0e2ef1edf01b 4145 "movq %%mm1, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4146 "psubw %%mm3, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 4147 "pxor %%mm7, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4148 // pcv = p - c = (a + b - c) -c = (a - c) + (b - c) = pav + pbv
destinyXfate 2:0e2ef1edf01b 4149 "movq %%mm4, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4150 "psubw %%mm3, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4151 // pa = abs(p-a) = abs(pav)
destinyXfate 2:0e2ef1edf01b 4152 // pb = abs(p-b) = abs(pbv)
destinyXfate 2:0e2ef1edf01b 4153 // pc = abs(p-c) = abs(pcv)
destinyXfate 2:0e2ef1edf01b 4154 "pcmpgtw %%mm4, %%mm0 \n\t" // create mask pav bytes < 0
destinyXfate 2:0e2ef1edf01b 4155 "paddw %%mm5, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4156 "pand %%mm4, %%mm0 \n\t" // only pav bytes < 0 in mm0
destinyXfate 2:0e2ef1edf01b 4157 "pcmpgtw %%mm5, %%mm7 \n\t" // create mask pbv bytes < 0
destinyXfate 2:0e2ef1edf01b 4158 "psubw %%mm0, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 4159 "pand %%mm5, %%mm7 \n\t" // only pbv bytes < 0 in mm7
destinyXfate 2:0e2ef1edf01b 4160 "psubw %%mm0, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 4161 "psubw %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4162 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4163 "pcmpgtw %%mm6, %%mm0 \n\t" // create mask pcv bytes < 0
destinyXfate 2:0e2ef1edf01b 4164 "pand %%mm6, %%mm0 \n\t" // only pcv bytes < 0 in mm0
destinyXfate 2:0e2ef1edf01b 4165 "psubw %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4166 "psubw %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4167 // test pa <= pb
destinyXfate 2:0e2ef1edf01b 4168 "movq %%mm4, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4169 "psubw %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4170 "pcmpgtw %%mm5, %%mm7 \n\t" // pa > pb?
destinyXfate 2:0e2ef1edf01b 4171 "movq %%mm7, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4172 // use mm7 mask to merge pa & pb
destinyXfate 2:0e2ef1edf01b 4173 "pand %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4174 // use mm0 mask copy to merge a & b
destinyXfate 2:0e2ef1edf01b 4175 "pand %%mm0, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 4176 "pandn %%mm4, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4177 "pandn %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4178 "paddw %%mm5, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4179 "paddw %%mm2, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4180 // test ((pa <= pb)? pa:pb) <= pc
destinyXfate 2:0e2ef1edf01b 4181 "pcmpgtw %%mm6, %%mm7 \n\t" // pab > pc?
destinyXfate 2:0e2ef1edf01b 4182 "pxor %%mm1, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 4183 "pand %%mm7, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 4184 "pandn %%mm0, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4185 "paddw %%mm3, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4186 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4187 "packuswb %%mm1, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4188 "movq -8(%%esi,%%ecx,), %%mm3 \n\t" // read c=Prior(x-bpp) bytes
destinyXfate 2:0e2ef1edf01b 4189 "pand _ActiveMask, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4190 "movq (%%esi,%%ecx,), %%mm2 \n\t" // load b=Prior(x)
destinyXfate 2:0e2ef1edf01b 4191 "paddb (%%edi,%%ecx,), %%mm7 \n\t" // add Paeth predictor with Raw(x)
destinyXfate 2:0e2ef1edf01b 4192 "punpckhbw %%mm0, %%mm3 \n\t" // unpack High bytes of c
destinyXfate 2:0e2ef1edf01b 4193 "movq %%mm7, (%%edi,%%ecx,) \n\t" // write back updated value
destinyXfate 2:0e2ef1edf01b 4194 "movq -8(%%edi,%%ecx,), %%mm1 \n\t" // read a=Raw(x-bpp) bytes
destinyXfate 2:0e2ef1edf01b 4195
destinyXfate 2:0e2ef1edf01b 4196 // do second set of 4 bytes
destinyXfate 2:0e2ef1edf01b 4197 "punpckhbw %%mm0, %%mm2 \n\t" // unpack High bytes of b
destinyXfate 2:0e2ef1edf01b 4198 "punpckhbw %%mm0, %%mm1 \n\t" // unpack High bytes of a
destinyXfate 2:0e2ef1edf01b 4199 // pav = p - a = (a + b - c) - a = b - c
destinyXfate 2:0e2ef1edf01b 4200 "movq %%mm2, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 4201 // pbv = p - b = (a + b - c) - b = a - c
destinyXfate 2:0e2ef1edf01b 4202 "movq %%mm1, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4203 "psubw %%mm3, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 4204 "pxor %%mm7, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4205 // pcv = p - c = (a + b - c) -c = (a - c) + (b - c) = pav + pbv
destinyXfate 2:0e2ef1edf01b 4206 "movq %%mm4, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4207 "psubw %%mm3, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4208 // pa = abs(p-a) = abs(pav)
destinyXfate 2:0e2ef1edf01b 4209 // pb = abs(p-b) = abs(pbv)
destinyXfate 2:0e2ef1edf01b 4210 // pc = abs(p-c) = abs(pcv)
destinyXfate 2:0e2ef1edf01b 4211 "pcmpgtw %%mm4, %%mm0 \n\t" // create mask pav bytes < 0
destinyXfate 2:0e2ef1edf01b 4212 "paddw %%mm5, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4213 "pand %%mm4, %%mm0 \n\t" // only pav bytes < 0 in mm0
destinyXfate 2:0e2ef1edf01b 4214 "pcmpgtw %%mm5, %%mm7 \n\t" // create mask pbv bytes < 0
destinyXfate 2:0e2ef1edf01b 4215 "psubw %%mm0, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 4216 "pand %%mm5, %%mm7 \n\t" // only pbv bytes < 0 in mm7
destinyXfate 2:0e2ef1edf01b 4217 "psubw %%mm0, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 4218 "psubw %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4219 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4220 "pcmpgtw %%mm6, %%mm0 \n\t" // create mask pcv bytes < 0
destinyXfate 2:0e2ef1edf01b 4221 "pand %%mm6, %%mm0 \n\t" // only pcv bytes < 0 in mm0
destinyXfate 2:0e2ef1edf01b 4222 "psubw %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4223 "psubw %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4224 // test pa <= pb
destinyXfate 2:0e2ef1edf01b 4225 "movq %%mm4, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4226 "psubw %%mm0, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4227 "pcmpgtw %%mm5, %%mm7 \n\t" // pa > pb?
destinyXfate 2:0e2ef1edf01b 4228 "movq %%mm7, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4229 // use mm7 mask to merge pa & pb
destinyXfate 2:0e2ef1edf01b 4230 "pand %%mm7, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4231 // use mm0 mask copy to merge a & b
destinyXfate 2:0e2ef1edf01b 4232 "pand %%mm0, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 4233 "pandn %%mm4, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4234 "pandn %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4235 "paddw %%mm5, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4236 "paddw %%mm2, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4237 // test ((pa <= pb)? pa:pb) <= pc
destinyXfate 2:0e2ef1edf01b 4238 "pcmpgtw %%mm6, %%mm7 \n\t" // pab > pc?
destinyXfate 2:0e2ef1edf01b 4239 "pxor %%mm1, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 4240 "pand %%mm7, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 4241 "pandn %%mm0, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4242 "pxor %%mm1, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 4243 "paddw %%mm3, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4244 "pxor %%mm0, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4245 // step ecx to next set of 8 bytes and repeat loop until done
destinyXfate 2:0e2ef1edf01b 4246 "addl $8, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 4247 "packuswb %%mm7, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 4248 "paddb -8(%%edi,%%ecx,), %%mm1 \n\t" // add Paeth predictor with Raw(x)
destinyXfate 2:0e2ef1edf01b 4249 "cmpl _MMXLength, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 4250 "movq %%mm1, -8(%%edi,%%ecx,) \n\t" // write back updated value
destinyXfate 2:0e2ef1edf01b 4251 // mm1 will be used as Raw(x-bpp) next loop
destinyXfate 2:0e2ef1edf01b 4252 "jb paeth_8lp \n\t"
destinyXfate 2:0e2ef1edf01b 4253
destinyXfate 2:0e2ef1edf01b 4254 : "=S" (dummy_value_S), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 4255 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 4256
destinyXfate 2:0e2ef1edf01b 4257 : "0" (prev_row), // esi // input regs
destinyXfate 2:0e2ef1edf01b 4258 "1" (row) // edi
destinyXfate 2:0e2ef1edf01b 4259
destinyXfate 2:0e2ef1edf01b 4260 : "%ecx" // clobber list
destinyXfate 2:0e2ef1edf01b 4261 #if 0 /* %mm0, ..., %mm7 not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 4262 , "%mm0", "%mm1", "%mm2", "%mm3"
destinyXfate 2:0e2ef1edf01b 4263 , "%mm4", "%mm5", "%mm6", "%mm7"
destinyXfate 2:0e2ef1edf01b 4264 #endif
destinyXfate 2:0e2ef1edf01b 4265 );
destinyXfate 2:0e2ef1edf01b 4266 }
destinyXfate 2:0e2ef1edf01b 4267 break; // end 8 bpp
destinyXfate 2:0e2ef1edf01b 4268
destinyXfate 2:0e2ef1edf01b 4269 case 1: // bpp = 1
destinyXfate 2:0e2ef1edf01b 4270 case 2: // bpp = 2
destinyXfate 2:0e2ef1edf01b 4271 default: // bpp > 8
destinyXfate 2:0e2ef1edf01b 4272 {
destinyXfate 2:0e2ef1edf01b 4273 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 4274 #ifdef __PIC__
destinyXfate 2:0e2ef1edf01b 4275 "pushl %%ebx \n\t" // save Global Offset Table index
destinyXfate 2:0e2ef1edf01b 4276 #endif
destinyXfate 2:0e2ef1edf01b 4277 "movl _dif, %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 4278 "cmpl _FullLength, %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 4279 "jnb paeth_dend \n\t"
destinyXfate 2:0e2ef1edf01b 4280
destinyXfate 2:0e2ef1edf01b 4281 // preload "movl row, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 4282 // preload "movl prev_row, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 4283 // do Paeth decode for remaining bytes
destinyXfate 2:0e2ef1edf01b 4284 "movl %%ebx, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4285 // preload "subl bpp, %%edx \n\t" // (bpp is preloaded into ecx)
destinyXfate 2:0e2ef1edf01b 4286 "subl %%ecx, %%edx \n\t" // edx = ebx - bpp
destinyXfate 2:0e2ef1edf01b 4287 "xorl %%ecx, %%ecx \n\t" // zero ecx before using cl & cx
destinyXfate 2:0e2ef1edf01b 4288
destinyXfate 2:0e2ef1edf01b 4289 "paeth_dlp: \n\t"
destinyXfate 2:0e2ef1edf01b 4290 "xorl %%eax, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4291 // pav = p - a = (a + b - c) - a = b - c
destinyXfate 2:0e2ef1edf01b 4292 "movb (%%esi,%%ebx,), %%al \n\t" // load Prior(x) into al
destinyXfate 2:0e2ef1edf01b 4293 "movb (%%esi,%%edx,), %%cl \n\t" // load Prior(x-bpp) into cl
destinyXfate 2:0e2ef1edf01b 4294 "subl %%ecx, %%eax \n\t" // subtract Prior(x-bpp)
destinyXfate 2:0e2ef1edf01b 4295 "movl %%eax, _patemp \n\t" // Save pav for later use
destinyXfate 2:0e2ef1edf01b 4296 "xorl %%eax, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4297 // pbv = p - b = (a + b - c) - b = a - c
destinyXfate 2:0e2ef1edf01b 4298 "movb (%%edi,%%edx,), %%al \n\t" // load Raw(x-bpp) into al
destinyXfate 2:0e2ef1edf01b 4299 "subl %%ecx, %%eax \n\t" // subtract Prior(x-bpp)
destinyXfate 2:0e2ef1edf01b 4300 "movl %%eax, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 4301 // pcv = p - c = (a + b - c) -c = (a - c) + (b - c) = pav + pbv
destinyXfate 2:0e2ef1edf01b 4302 "addl _patemp, %%eax \n\t" // pcv = pav + pbv
destinyXfate 2:0e2ef1edf01b 4303 // pc = abs(pcv)
destinyXfate 2:0e2ef1edf01b 4304 "testl $0x80000000, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4305 "jz paeth_dpca \n\t"
destinyXfate 2:0e2ef1edf01b 4306 "negl %%eax \n\t" // reverse sign of neg values
destinyXfate 2:0e2ef1edf01b 4307
destinyXfate 2:0e2ef1edf01b 4308 "paeth_dpca: \n\t"
destinyXfate 2:0e2ef1edf01b 4309 "movl %%eax, _pctemp \n\t" // save pc for later use
destinyXfate 2:0e2ef1edf01b 4310 // pb = abs(pbv)
destinyXfate 2:0e2ef1edf01b 4311 "testl $0x80000000, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 4312 "jz paeth_dpba \n\t"
destinyXfate 2:0e2ef1edf01b 4313 "negl %%ecx \n\t" // reverse sign of neg values
destinyXfate 2:0e2ef1edf01b 4314
destinyXfate 2:0e2ef1edf01b 4315 "paeth_dpba: \n\t"
destinyXfate 2:0e2ef1edf01b 4316 "movl %%ecx, _pbtemp \n\t" // save pb for later use
destinyXfate 2:0e2ef1edf01b 4317 // pa = abs(pav)
destinyXfate 2:0e2ef1edf01b 4318 "movl _patemp, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4319 "testl $0x80000000, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4320 "jz paeth_dpaa \n\t"
destinyXfate 2:0e2ef1edf01b 4321 "negl %%eax \n\t" // reverse sign of neg values
destinyXfate 2:0e2ef1edf01b 4322
destinyXfate 2:0e2ef1edf01b 4323 "paeth_dpaa: \n\t"
destinyXfate 2:0e2ef1edf01b 4324 "movl %%eax, _patemp \n\t" // save pa for later use
destinyXfate 2:0e2ef1edf01b 4325 // test if pa <= pb
destinyXfate 2:0e2ef1edf01b 4326 "cmpl %%ecx, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4327 "jna paeth_dabb \n\t"
destinyXfate 2:0e2ef1edf01b 4328 // pa > pb; now test if pb <= pc
destinyXfate 2:0e2ef1edf01b 4329 "cmpl _pctemp, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 4330 "jna paeth_dbbc \n\t"
destinyXfate 2:0e2ef1edf01b 4331 // pb > pc; Raw(x) = Paeth(x) + Prior(x-bpp)
destinyXfate 2:0e2ef1edf01b 4332 "movb (%%esi,%%edx,), %%cl \n\t" // load Prior(x-bpp) into cl
destinyXfate 2:0e2ef1edf01b 4333 "jmp paeth_dpaeth \n\t"
destinyXfate 2:0e2ef1edf01b 4334
destinyXfate 2:0e2ef1edf01b 4335 "paeth_dbbc: \n\t"
destinyXfate 2:0e2ef1edf01b 4336 // pb <= pc; Raw(x) = Paeth(x) + Prior(x)
destinyXfate 2:0e2ef1edf01b 4337 "movb (%%esi,%%ebx,), %%cl \n\t" // load Prior(x) into cl
destinyXfate 2:0e2ef1edf01b 4338 "jmp paeth_dpaeth \n\t"
destinyXfate 2:0e2ef1edf01b 4339
destinyXfate 2:0e2ef1edf01b 4340 "paeth_dabb: \n\t"
destinyXfate 2:0e2ef1edf01b 4341 // pa <= pb; now test if pa <= pc
destinyXfate 2:0e2ef1edf01b 4342 "cmpl _pctemp, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4343 "jna paeth_dabc \n\t"
destinyXfate 2:0e2ef1edf01b 4344 // pa > pc; Raw(x) = Paeth(x) + Prior(x-bpp)
destinyXfate 2:0e2ef1edf01b 4345 "movb (%%esi,%%edx,), %%cl \n\t" // load Prior(x-bpp) into cl
destinyXfate 2:0e2ef1edf01b 4346 "jmp paeth_dpaeth \n\t"
destinyXfate 2:0e2ef1edf01b 4347
destinyXfate 2:0e2ef1edf01b 4348 "paeth_dabc: \n\t"
destinyXfate 2:0e2ef1edf01b 4349 // pa <= pc; Raw(x) = Paeth(x) + Raw(x-bpp)
destinyXfate 2:0e2ef1edf01b 4350 "movb (%%edi,%%edx,), %%cl \n\t" // load Raw(x-bpp) into cl
destinyXfate 2:0e2ef1edf01b 4351
destinyXfate 2:0e2ef1edf01b 4352 "paeth_dpaeth: \n\t"
destinyXfate 2:0e2ef1edf01b 4353 "incl %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 4354 "incl %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4355 // Raw(x) = (Paeth(x) + Paeth_Predictor( a, b, c )) mod 256
destinyXfate 2:0e2ef1edf01b 4356 "addb %%cl, -1(%%edi,%%ebx,) \n\t"
destinyXfate 2:0e2ef1edf01b 4357 "cmpl _FullLength, %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 4358 "jb paeth_dlp \n\t"
destinyXfate 2:0e2ef1edf01b 4359
destinyXfate 2:0e2ef1edf01b 4360 "paeth_dend: \n\t"
destinyXfate 2:0e2ef1edf01b 4361 #ifdef __PIC__
destinyXfate 2:0e2ef1edf01b 4362 "popl %%ebx \n\t" // index to Global Offset Table
destinyXfate 2:0e2ef1edf01b 4363 #endif
destinyXfate 2:0e2ef1edf01b 4364
destinyXfate 2:0e2ef1edf01b 4365 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 4366 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 4367 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 4368
destinyXfate 2:0e2ef1edf01b 4369 : "0" (bpp), // ecx // input regs
destinyXfate 2:0e2ef1edf01b 4370 "1" (prev_row), // esi
destinyXfate 2:0e2ef1edf01b 4371 "2" (row) // edi
destinyXfate 2:0e2ef1edf01b 4372
destinyXfate 2:0e2ef1edf01b 4373 : "%eax", "%edx" // clobber list
destinyXfate 2:0e2ef1edf01b 4374 #ifndef __PIC__
destinyXfate 2:0e2ef1edf01b 4375 , "%ebx"
destinyXfate 2:0e2ef1edf01b 4376 #endif
destinyXfate 2:0e2ef1edf01b 4377 );
destinyXfate 2:0e2ef1edf01b 4378 }
destinyXfate 2:0e2ef1edf01b 4379 return; // No need to go further with this one
destinyXfate 2:0e2ef1edf01b 4380
destinyXfate 2:0e2ef1edf01b 4381 } // end switch (bpp)
destinyXfate 2:0e2ef1edf01b 4382
destinyXfate 2:0e2ef1edf01b 4383 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 4384 // MMX acceleration complete; now do clean-up
destinyXfate 2:0e2ef1edf01b 4385 // check if any remaining bytes left to decode
destinyXfate 2:0e2ef1edf01b 4386 #ifdef __PIC__
destinyXfate 2:0e2ef1edf01b 4387 "pushl %%ebx \n\t" // save index to Global Offset Table
destinyXfate 2:0e2ef1edf01b 4388 #endif
destinyXfate 2:0e2ef1edf01b 4389 "movl _MMXLength, %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 4390 "cmpl _FullLength, %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 4391 "jnb paeth_end \n\t"
destinyXfate 2:0e2ef1edf01b 4392 //pre "movl row, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 4393 //pre "movl prev_row, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 4394 // do Paeth decode for remaining bytes
destinyXfate 2:0e2ef1edf01b 4395 "movl %%ebx, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4396 //pre "subl bpp, %%edx \n\t" // (bpp is preloaded into ecx)
destinyXfate 2:0e2ef1edf01b 4397 "subl %%ecx, %%edx \n\t" // edx = ebx - bpp
destinyXfate 2:0e2ef1edf01b 4398 "xorl %%ecx, %%ecx \n\t" // zero ecx before using cl & cx below
destinyXfate 2:0e2ef1edf01b 4399
destinyXfate 2:0e2ef1edf01b 4400 "paeth_lp2: \n\t"
destinyXfate 2:0e2ef1edf01b 4401 "xorl %%eax, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4402 // pav = p - a = (a + b - c) - a = b - c
destinyXfate 2:0e2ef1edf01b 4403 "movb (%%esi,%%ebx,), %%al \n\t" // load Prior(x) into al
destinyXfate 2:0e2ef1edf01b 4404 "movb (%%esi,%%edx,), %%cl \n\t" // load Prior(x-bpp) into cl
destinyXfate 2:0e2ef1edf01b 4405 "subl %%ecx, %%eax \n\t" // subtract Prior(x-bpp)
destinyXfate 2:0e2ef1edf01b 4406 "movl %%eax, _patemp \n\t" // Save pav for later use
destinyXfate 2:0e2ef1edf01b 4407 "xorl %%eax, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4408 // pbv = p - b = (a + b - c) - b = a - c
destinyXfate 2:0e2ef1edf01b 4409 "movb (%%edi,%%edx,), %%al \n\t" // load Raw(x-bpp) into al
destinyXfate 2:0e2ef1edf01b 4410 "subl %%ecx, %%eax \n\t" // subtract Prior(x-bpp)
destinyXfate 2:0e2ef1edf01b 4411 "movl %%eax, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 4412 // pcv = p - c = (a + b - c) -c = (a - c) + (b - c) = pav + pbv
destinyXfate 2:0e2ef1edf01b 4413 "addl _patemp, %%eax \n\t" // pcv = pav + pbv
destinyXfate 2:0e2ef1edf01b 4414 // pc = abs(pcv)
destinyXfate 2:0e2ef1edf01b 4415 "testl $0x80000000, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4416 "jz paeth_pca2 \n\t"
destinyXfate 2:0e2ef1edf01b 4417 "negl %%eax \n\t" // reverse sign of neg values
destinyXfate 2:0e2ef1edf01b 4418
destinyXfate 2:0e2ef1edf01b 4419 "paeth_pca2: \n\t"
destinyXfate 2:0e2ef1edf01b 4420 "movl %%eax, _pctemp \n\t" // save pc for later use
destinyXfate 2:0e2ef1edf01b 4421 // pb = abs(pbv)
destinyXfate 2:0e2ef1edf01b 4422 "testl $0x80000000, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 4423 "jz paeth_pba2 \n\t"
destinyXfate 2:0e2ef1edf01b 4424 "negl %%ecx \n\t" // reverse sign of neg values
destinyXfate 2:0e2ef1edf01b 4425
destinyXfate 2:0e2ef1edf01b 4426 "paeth_pba2: \n\t"
destinyXfate 2:0e2ef1edf01b 4427 "movl %%ecx, _pbtemp \n\t" // save pb for later use
destinyXfate 2:0e2ef1edf01b 4428 // pa = abs(pav)
destinyXfate 2:0e2ef1edf01b 4429 "movl _patemp, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4430 "testl $0x80000000, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4431 "jz paeth_paa2 \n\t"
destinyXfate 2:0e2ef1edf01b 4432 "negl %%eax \n\t" // reverse sign of neg values
destinyXfate 2:0e2ef1edf01b 4433
destinyXfate 2:0e2ef1edf01b 4434 "paeth_paa2: \n\t"
destinyXfate 2:0e2ef1edf01b 4435 "movl %%eax, _patemp \n\t" // save pa for later use
destinyXfate 2:0e2ef1edf01b 4436 // test if pa <= pb
destinyXfate 2:0e2ef1edf01b 4437 "cmpl %%ecx, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4438 "jna paeth_abb2 \n\t"
destinyXfate 2:0e2ef1edf01b 4439 // pa > pb; now test if pb <= pc
destinyXfate 2:0e2ef1edf01b 4440 "cmpl _pctemp, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 4441 "jna paeth_bbc2 \n\t"
destinyXfate 2:0e2ef1edf01b 4442 // pb > pc; Raw(x) = Paeth(x) + Prior(x-bpp)
destinyXfate 2:0e2ef1edf01b 4443 "movb (%%esi,%%edx,), %%cl \n\t" // load Prior(x-bpp) into cl
destinyXfate 2:0e2ef1edf01b 4444 "jmp paeth_paeth2 \n\t"
destinyXfate 2:0e2ef1edf01b 4445
destinyXfate 2:0e2ef1edf01b 4446 "paeth_bbc2: \n\t"
destinyXfate 2:0e2ef1edf01b 4447 // pb <= pc; Raw(x) = Paeth(x) + Prior(x)
destinyXfate 2:0e2ef1edf01b 4448 "movb (%%esi,%%ebx,), %%cl \n\t" // load Prior(x) into cl
destinyXfate 2:0e2ef1edf01b 4449 "jmp paeth_paeth2 \n\t"
destinyXfate 2:0e2ef1edf01b 4450
destinyXfate 2:0e2ef1edf01b 4451 "paeth_abb2: \n\t"
destinyXfate 2:0e2ef1edf01b 4452 // pa <= pb; now test if pa <= pc
destinyXfate 2:0e2ef1edf01b 4453 "cmpl _pctemp, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4454 "jna paeth_abc2 \n\t"
destinyXfate 2:0e2ef1edf01b 4455 // pa > pc; Raw(x) = Paeth(x) + Prior(x-bpp)
destinyXfate 2:0e2ef1edf01b 4456 "movb (%%esi,%%edx,), %%cl \n\t" // load Prior(x-bpp) into cl
destinyXfate 2:0e2ef1edf01b 4457 "jmp paeth_paeth2 \n\t"
destinyXfate 2:0e2ef1edf01b 4458
destinyXfate 2:0e2ef1edf01b 4459 "paeth_abc2: \n\t"
destinyXfate 2:0e2ef1edf01b 4460 // pa <= pc; Raw(x) = Paeth(x) + Raw(x-bpp)
destinyXfate 2:0e2ef1edf01b 4461 "movb (%%edi,%%edx,), %%cl \n\t" // load Raw(x-bpp) into cl
destinyXfate 2:0e2ef1edf01b 4462
destinyXfate 2:0e2ef1edf01b 4463 "paeth_paeth2: \n\t"
destinyXfate 2:0e2ef1edf01b 4464 "incl %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 4465 "incl %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4466 // Raw(x) = (Paeth(x) + Paeth_Predictor( a, b, c )) mod 256
destinyXfate 2:0e2ef1edf01b 4467 "addb %%cl, -1(%%edi,%%ebx,) \n\t"
destinyXfate 2:0e2ef1edf01b 4468 "cmpl _FullLength, %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 4469 "jb paeth_lp2 \n\t"
destinyXfate 2:0e2ef1edf01b 4470
destinyXfate 2:0e2ef1edf01b 4471 "paeth_end: \n\t"
destinyXfate 2:0e2ef1edf01b 4472 "EMMS \n\t" // end MMX; prep for poss. FP instrs.
destinyXfate 2:0e2ef1edf01b 4473 #ifdef __PIC__
destinyXfate 2:0e2ef1edf01b 4474 "popl %%ebx \n\t" // restore index to Global Offset Table
destinyXfate 2:0e2ef1edf01b 4475 #endif
destinyXfate 2:0e2ef1edf01b 4476
destinyXfate 2:0e2ef1edf01b 4477 : "=c" (dummy_value_c), // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 4478 "=S" (dummy_value_S),
destinyXfate 2:0e2ef1edf01b 4479 "=D" (dummy_value_D)
destinyXfate 2:0e2ef1edf01b 4480
destinyXfate 2:0e2ef1edf01b 4481 : "0" (bpp), // ecx // input regs
destinyXfate 2:0e2ef1edf01b 4482 "1" (prev_row), // esi
destinyXfate 2:0e2ef1edf01b 4483 "2" (row) // edi
destinyXfate 2:0e2ef1edf01b 4484
destinyXfate 2:0e2ef1edf01b 4485 : "%eax", "%edx" // clobber list (no input regs!)
destinyXfate 2:0e2ef1edf01b 4486 #ifndef __PIC__
destinyXfate 2:0e2ef1edf01b 4487 , "%ebx"
destinyXfate 2:0e2ef1edf01b 4488 #endif
destinyXfate 2:0e2ef1edf01b 4489 );
destinyXfate 2:0e2ef1edf01b 4490
destinyXfate 2:0e2ef1edf01b 4491 } /* end png_read_filter_row_mmx_paeth() */
destinyXfate 2:0e2ef1edf01b 4492 #endif
destinyXfate 2:0e2ef1edf01b 4493
destinyXfate 2:0e2ef1edf01b 4494
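// Illustrative sketch only, not part of libpng: a plain C rendering of the
// per-byte Paeth decision made by the paeth_dlp / paeth_lp2 loops above
// (pav = b - c, pbv = a - c, pcv = pav + pbv, then pick a, b, or c).
// The names paeth_predictor and paeth_unfilter_tail, and their parameters,
// are assumptions introduced for this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: a = Raw(x-bpp), b = Prior(x), c = Prior(x-bpp). */
static unsigned char paeth_predictor(unsigned char a, unsigned char b,
                                     unsigned char c)
{
    int pav = (int)b - (int)c;   /* p - a, where p = a + b - c */
    int pbv = (int)a - (int)c;   /* p - b */
    int pa = abs(pav);
    int pb = abs(pbv);
    int pc = abs(pav + pbv);     /* |p - c| */

    if (pa <= pb && pa <= pc)
        return a;                /* Raw(x-bpp) */
    if (pb <= pc)
        return b;                /* Prior(x) */
    return c;                    /* Prior(x-bpp) */
}

/* Hypothetical helper: undo the Paeth filter for bytes [start, rowbytes).
 * Assumes start >= bpp so that row[i - bpp] is always valid. */
static void paeth_unfilter_tail(unsigned char *row,
                                const unsigned char *prev_row,
                                size_t start, size_t rowbytes, size_t bpp)
{
    for (size_t i = start; i < rowbytes; i++) {
        unsigned char pred = paeth_predictor(row[i - bpp], prev_row[i],
                                             prev_row[i - bpp]);
        row[i] = (unsigned char)(row[i] + pred);   /* mod 256 */
    }
}
```

// The tie-breaking order (a, then b, then c) matches the branch order of the
// assembler loops: pa <= pb is tested first, then the winner against pc.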
destinyXfate 2:0e2ef1edf01b 4495
destinyXfate 2:0e2ef1edf01b 4496
destinyXfate 2:0e2ef1edf01b 4497 #ifdef PNG_THREAD_UNSAFE_OK
destinyXfate 2:0e2ef1edf01b 4498 //===========================================================================//
destinyXfate 2:0e2ef1edf01b 4499 // //
destinyXfate 2:0e2ef1edf01b 4500 // P N G _ R E A D _ F I L T E R _ R O W _ M M X _ S U B //
destinyXfate 2:0e2ef1edf01b 4501 // //
destinyXfate 2:0e2ef1edf01b 4502 //===========================================================================//
destinyXfate 2:0e2ef1edf01b 4503
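// Illustrative sketch only, not part of libpng: the arithmetic that the first
// asm block of png_read_filter_row_mmx_sub performs to obtain _dif and
// _MMXLength, written as plain C. The type sub_split and the function
// sub_compute_split are hypothetical names; rp stands for the address
// row + bpp, and full_length for rowbytes - bpp. The sketch assumes
// full_length >= dif, i.e. a row long enough to be worth the MMX path.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t dif;          /* bytes decoded one at a time before the MMX loop */
    size_t mmx_length;   /* offset at which the 8-byte MMX loop stops */
} sub_split;

static sub_split sub_compute_split(uintptr_t rp, size_t full_length)
{
    sub_split s;

    /* Add 7 + 8, then mask down to an 8-byte boundary: the result is the
     * distance to an aligned address that is 8..15 bytes past rp. */
    s.dif = (size_t)(((rp + 0xf) & ~(uintptr_t)7) - rp);

    /* Drop the bytes beyond a multiple of 8 from the length so that
     * (mmx_length - dif) is a whole number of quadwords. */
    s.mmx_length = full_length - ((full_length - s.dif) & 7);
    return s;
}
```

// Bytes in [0, dif) go through the sub_lp1 byte loop, bytes in
// [dif, mmx_length) through the per-bpp MMX loops, and any remainder up to
// full_length is left for the clean-up code.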
destinyXfate 2:0e2ef1edf01b 4504 // Optimized code for PNG Sub filter decoder
destinyXfate 2:0e2ef1edf01b 4505
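// Illustrative sketch only, not part of libpng: the shift-and-mask scheme of
// the sub_3lp loop (bpp = 3) emulated with a 64-bit integer in portable C.
// All function names here are hypothetical. paddb is modelled with a SWAR
// byte-wise add, and loads/stores are done byte by byte in little-endian
// order to mirror movq on x86. Unlike the asm loop, which always runs at
// least once, this sketch uses a pre-tested loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise wrapping add of eight packed bytes (stand-in for paddb). */
static uint64_t paddb64(uint64_t x, uint64_t y)
{
    const uint64_t H = 0x8080808080808080ULL;
    return ((x & ~H) + (y & ~H)) ^ ((x ^ y) & H);
}

static uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

static void store_le64(unsigned char *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

/* rp = row + 3. Decodes offsets [start, end) in 8-byte steps.
 * Assumes start >= 8, (end - start) % 8 == 0, and that the bytes
 * before rp + start have already been decoded. */
static void sub_unfilter_bpp3(unsigned char *rp, size_t start, size_t end)
{
    const uint64_t mask2 = 0x0000ffffff000000ULL;  /* bytes 3..5 */
    const uint64_t mask3 = mask2 << 24;            /* bytes 6..7 */
    uint64_t prev = load_le64(rp + start - 8);     /* prime the pump */

    for (size_t x = start; x < end; x += 8) {
        uint64_t cur = load_le64(rp + x);
        cur = paddb64(cur, prev >> 40);            /* 1st group: bytes 0..2 */
        cur = paddb64(cur, (cur << 24) & mask2);   /* 2nd group: bytes 3..5 */
        cur = paddb64(cur, (cur << 24) & mask3);   /* 3rd group: bytes 6..7 */
        store_le64(rp + x, cur);
        prev = cur;                                /* Raw(x-bpp) for next pass */
    }
}
```

// Each step feeds freshly decoded bytes into the next group, which is why the
// three adds cannot be merged into one: bytes 3..5 depend on the updated
// bytes 0..2, and bytes 6..7 on the updated bytes 3..4.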
destinyXfate 2:0e2ef1edf01b 4506 static void /* PRIVATE */
destinyXfate 2:0e2ef1edf01b 4507 png_read_filter_row_mmx_sub(png_row_infop row_info, png_bytep row)
destinyXfate 2:0e2ef1edf01b 4508 {
destinyXfate 2:0e2ef1edf01b 4509 int bpp;
destinyXfate 2:0e2ef1edf01b 4510 int dummy_value_a;
destinyXfate 2:0e2ef1edf01b 4511 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 4512
destinyXfate 2:0e2ef1edf01b 4513 bpp = (row_info->pixel_depth + 7) >> 3; // calc number of bytes per pixel
destinyXfate 2:0e2ef1edf01b 4514 _FullLength = row_info->rowbytes - bpp; // number of bytes to filter
destinyXfate 2:0e2ef1edf01b 4515
destinyXfate 2:0e2ef1edf01b 4516 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 4517 //pre "movl row, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 4518 "movl %%edi, %%esi \n\t" // lp = row
destinyXfate 2:0e2ef1edf01b 4519 //pre "movl bpp, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4520 "addl %%eax, %%edi \n\t" // rp = row + bpp
destinyXfate 2:0e2ef1edf01b 4521 //irr "xorl %%eax, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4522 // get # of bytes to alignment
destinyXfate 2:0e2ef1edf01b 4523 "movl %%edi, _dif \n\t" // take start of row
destinyXfate 2:0e2ef1edf01b 4524 "addl $0xf, _dif \n\t" // add 7 + 8 to incr past
destinyXfate 2:0e2ef1edf01b 4525 // alignment boundary
destinyXfate 2:0e2ef1edf01b 4526 "xorl %%ecx, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 4527 "andl $0xfffffff8, _dif \n\t" // mask to alignment boundary
destinyXfate 2:0e2ef1edf01b 4528 "subl %%edi, _dif \n\t" // subtract from start ==> value
destinyXfate 2:0e2ef1edf01b 4529 "jz sub_go \n\t" // ecx at alignment
destinyXfate 2:0e2ef1edf01b 4530
destinyXfate 2:0e2ef1edf01b 4531 "sub_lp1: \n\t" // fix alignment
destinyXfate 2:0e2ef1edf01b 4532 "movb (%%esi,%%ecx,), %%al \n\t"
destinyXfate 2:0e2ef1edf01b 4533 "addb %%al, (%%edi,%%ecx,) \n\t"
destinyXfate 2:0e2ef1edf01b 4534 "incl %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 4535 "cmpl _dif, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 4536 "jb sub_lp1 \n\t"
destinyXfate 2:0e2ef1edf01b 4537
destinyXfate 2:0e2ef1edf01b 4538 "sub_go: \n\t"
destinyXfate 2:0e2ef1edf01b 4539 "movl _FullLength, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4540 "movl %%eax, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4541 "subl %%ecx, %%edx \n\t" // subtract alignment fix
destinyXfate 2:0e2ef1edf01b 4542 "andl $0x00000007, %%edx \n\t" // calc bytes over mult of 8
destinyXfate 2:0e2ef1edf01b 4543 "subl %%edx, %%eax \n\t" // drop over bytes from length
destinyXfate 2:0e2ef1edf01b 4544 "movl %%eax, _MMXLength \n\t"
destinyXfate 2:0e2ef1edf01b 4545
destinyXfate 2:0e2ef1edf01b 4546 : "=a" (dummy_value_a), // 0 // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 4547 "=D" (dummy_value_D) // 1
destinyXfate 2:0e2ef1edf01b 4548
destinyXfate 2:0e2ef1edf01b 4549 : "0" (bpp), // eax // input regs
destinyXfate 2:0e2ef1edf01b 4550 "1" (row) // edi
destinyXfate 2:0e2ef1edf01b 4551
destinyXfate 2:0e2ef1edf01b 4552 : "%esi", "%ecx", "%edx" // clobber list
destinyXfate 2:0e2ef1edf01b 4553
destinyXfate 2:0e2ef1edf01b 4554 #if 0 /* MMX regs (%mm0, etc.) not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 4555 , "%mm0", "%mm1", "%mm2", "%mm3"
destinyXfate 2:0e2ef1edf01b 4556 , "%mm4", "%mm5", "%mm6", "%mm7"
destinyXfate 2:0e2ef1edf01b 4557 #endif
destinyXfate 2:0e2ef1edf01b 4558 );
destinyXfate 2:0e2ef1edf01b 4559
destinyXfate 2:0e2ef1edf01b 4560 // now do the math for the rest of the row
destinyXfate 2:0e2ef1edf01b 4561 switch (bpp)
destinyXfate 2:0e2ef1edf01b 4562 {
destinyXfate 2:0e2ef1edf01b 4563 case 3:
destinyXfate 2:0e2ef1edf01b 4564 {
destinyXfate 2:0e2ef1edf01b 4565 _ActiveMask.use = 0x0000ffffff000000LL;
destinyXfate 2:0e2ef1edf01b 4566 _ShiftBpp.use = 24; // == 3 * 8
destinyXfate 2:0e2ef1edf01b 4567 _ShiftRem.use = 40; // == 64 - 24
destinyXfate 2:0e2ef1edf01b 4568
destinyXfate 2:0e2ef1edf01b 4569 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 4570 // preload "movl row, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 4571 "movq _ActiveMask, %%mm7 \n\t" // load _ActiveMask for 2nd
destinyXfate 2:0e2ef1edf01b 4572 // active byte group
destinyXfate 2:0e2ef1edf01b 4573 "movl %%edi, %%esi \n\t" // lp = row
destinyXfate 2:0e2ef1edf01b 4574 // preload "movl bpp, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4575 "addl %%eax, %%edi \n\t" // rp = row + bpp
destinyXfate 2:0e2ef1edf01b 4576 "movq %%mm7, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4577 "movl _dif, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4578 "psllq _ShiftBpp, %%mm6 \n\t" // move mask in mm6 to cover
destinyXfate 2:0e2ef1edf01b 4579 // 3rd active byte group
destinyXfate 2:0e2ef1edf01b 4580 // prime the pump: load the first Raw(x-bpp) data set
destinyXfate 2:0e2ef1edf01b 4581 "movq -8(%%edi,%%edx,), %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 4582
destinyXfate 2:0e2ef1edf01b 4583 "sub_3lp: \n\t" // shift data for adding first
destinyXfate 2:0e2ef1edf01b 4584 "psrlq _ShiftRem, %%mm1 \n\t" // bpp bytes (no need for mask;
destinyXfate 2:0e2ef1edf01b 4585 // shift clears inactive bytes)
destinyXfate 2:0e2ef1edf01b 4586 // add 1st active group
destinyXfate 2:0e2ef1edf01b 4587 "movq (%%edi,%%edx,), %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4588 "paddb %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4589
destinyXfate 2:0e2ef1edf01b 4590 // add 2nd active group
destinyXfate 2:0e2ef1edf01b 4591 "movq %%mm0, %%mm1 \n\t" // mov updated Raws to mm1
destinyXfate 2:0e2ef1edf01b 4592 "psllq _ShiftBpp, %%mm1 \n\t" // shift data to pos. correctly
destinyXfate 2:0e2ef1edf01b 4593 "pand %%mm7, %%mm1 \n\t" // mask to use 2nd active group
destinyXfate 2:0e2ef1edf01b 4594 "paddb %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4595
destinyXfate 2:0e2ef1edf01b 4596 // add 3rd active group
destinyXfate 2:0e2ef1edf01b 4597 "movq %%mm0, %%mm1 \n\t" // mov updated Raws to mm1
destinyXfate 2:0e2ef1edf01b 4598 "psllq _ShiftBpp, %%mm1 \n\t" // shift data to pos. correctly
destinyXfate 2:0e2ef1edf01b 4599 "pand %%mm6, %%mm1 \n\t" // mask to use 3rd active group
destinyXfate 2:0e2ef1edf01b 4600 "addl $8, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4601 "paddb %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4602
destinyXfate 2:0e2ef1edf01b 4603 "cmpl _MMXLength, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4604 "movq %%mm0, -8(%%edi,%%edx,) \n\t" // write updated Raws to array
destinyXfate 2:0e2ef1edf01b 4605 "movq %%mm0, %%mm1 \n\t" // prep 1st add at top of loop
destinyXfate 2:0e2ef1edf01b 4606 "jb sub_3lp \n\t"
destinyXfate 2:0e2ef1edf01b 4607
destinyXfate 2:0e2ef1edf01b 4608 : "=a" (dummy_value_a), // 0 // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 4609 "=D" (dummy_value_D) // 1
destinyXfate 2:0e2ef1edf01b 4610
destinyXfate 2:0e2ef1edf01b 4611 : "0" (bpp), // eax // input regs
destinyXfate 2:0e2ef1edf01b 4612 "1" (row) // edi
destinyXfate 2:0e2ef1edf01b 4613
destinyXfate 2:0e2ef1edf01b 4614 : "%edx", "%esi" // clobber list
destinyXfate 2:0e2ef1edf01b 4615 #if 0 /* MMX regs (%mm0, etc.) not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 4616 , "%mm0", "%mm1", "%mm6", "%mm7"
destinyXfate 2:0e2ef1edf01b 4617 #endif
destinyXfate 2:0e2ef1edf01b 4618 );
destinyXfate 2:0e2ef1edf01b 4619 }
destinyXfate 2:0e2ef1edf01b 4620 break;
destinyXfate 2:0e2ef1edf01b 4621
destinyXfate 2:0e2ef1edf01b 4622 case 1:
destinyXfate 2:0e2ef1edf01b 4623 {
destinyXfate 2:0e2ef1edf01b 4624 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 4625 "movl _dif, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4626 // preload "movl row, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 4627 "cmpl _FullLength, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4628 "jnb sub_1end \n\t"
destinyXfate 2:0e2ef1edf01b 4629 "movl %%edi, %%esi \n\t" // lp = row
destinyXfate 2:0e2ef1edf01b 4630 //irr "xorl %%eax, %%eax \n\t" // disabled: would zero the preloaded bpp, making rp == row
destinyXfate 2:0e2ef1edf01b 4631 // preload "movl bpp, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4632 "addl %%eax, %%edi \n\t" // rp = row + bpp
destinyXfate 2:0e2ef1edf01b 4633
destinyXfate 2:0e2ef1edf01b 4634 "sub_1lp: \n\t"
destinyXfate 2:0e2ef1edf01b 4635 "movb (%%esi,%%edx,), %%al \n\t"
destinyXfate 2:0e2ef1edf01b 4636 "addb %%al, (%%edi,%%edx,) \n\t"
destinyXfate 2:0e2ef1edf01b 4637 "incl %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4638 "cmpl _FullLength, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4639 "jb sub_1lp \n\t"
destinyXfate 2:0e2ef1edf01b 4640
destinyXfate 2:0e2ef1edf01b 4641 "sub_1end: \n\t"
destinyXfate 2:0e2ef1edf01b 4642
destinyXfate 2:0e2ef1edf01b 4643 : "=a" (dummy_value_a), // 0 // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 4644 "=D" (dummy_value_D) // 1
destinyXfate 2:0e2ef1edf01b 4645
destinyXfate 2:0e2ef1edf01b 4646 : "0" (bpp), // eax // input regs
destinyXfate 2:0e2ef1edf01b 4647 "1" (row) // edi
destinyXfate 2:0e2ef1edf01b 4648
destinyXfate 2:0e2ef1edf01b 4649 : "%edx", "%esi" // clobber list
destinyXfate 2:0e2ef1edf01b 4650 );
destinyXfate 2:0e2ef1edf01b 4651 }
destinyXfate 2:0e2ef1edf01b 4652 return;
destinyXfate 2:0e2ef1edf01b 4653
destinyXfate 2:0e2ef1edf01b 4654 case 6:
destinyXfate 2:0e2ef1edf01b 4655 case 4:
destinyXfate 2:0e2ef1edf01b 4656 //case 7: // GRR BOGUS
destinyXfate 2:0e2ef1edf01b 4657 //case 5: // GRR BOGUS
destinyXfate 2:0e2ef1edf01b 4658 {
destinyXfate 2:0e2ef1edf01b 4659 _ShiftBpp.use = bpp << 3;
destinyXfate 2:0e2ef1edf01b 4660 _ShiftRem.use = 64 - _ShiftBpp.use;
destinyXfate 2:0e2ef1edf01b 4661
destinyXfate 2:0e2ef1edf01b 4662 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 4663 // preload "movl row, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 4664 "movl _dif, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4665 "movl %%edi, %%esi \n\t" // lp = row
destinyXfate 2:0e2ef1edf01b 4666 // preload "movl bpp, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4667 "addl %%eax, %%edi \n\t" // rp = row + bpp
destinyXfate 2:0e2ef1edf01b 4668
destinyXfate 2:0e2ef1edf01b 4669 // prime the pump: load the first Raw(x-bpp) data set
destinyXfate 2:0e2ef1edf01b 4670 "movq -8(%%edi,%%edx,), %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 4671
destinyXfate 2:0e2ef1edf01b 4672 "sub_4lp: \n\t" // shift data for adding first
destinyXfate 2:0e2ef1edf01b 4673 "psrlq _ShiftRem, %%mm1 \n\t" // bpp bytes (no need for mask;
destinyXfate 2:0e2ef1edf01b 4674 // shift clears inactive bytes)
destinyXfate 2:0e2ef1edf01b 4675 "movq (%%edi,%%edx,), %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4676 "paddb %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4677
destinyXfate 2:0e2ef1edf01b 4678 // add 2nd active group
destinyXfate 2:0e2ef1edf01b 4679 "movq %%mm0, %%mm1 \n\t" // mov updated Raws to mm1
destinyXfate 2:0e2ef1edf01b 4680 "psllq _ShiftBpp, %%mm1 \n\t" // shift data to pos. correctly
destinyXfate 2:0e2ef1edf01b 4681 "addl $8, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4682 "paddb %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4683
destinyXfate 2:0e2ef1edf01b 4684 "cmpl _MMXLength, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4685 "movq %%mm0, -8(%%edi,%%edx,) \n\t"
destinyXfate 2:0e2ef1edf01b 4686 "movq %%mm0, %%mm1 \n\t" // prep 1st add at top of loop
destinyXfate 2:0e2ef1edf01b 4687 "jb sub_4lp \n\t"
destinyXfate 2:0e2ef1edf01b 4688
destinyXfate 2:0e2ef1edf01b 4689 : "=a" (dummy_value_a), // 0 // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 4690 "=D" (dummy_value_D) // 1
destinyXfate 2:0e2ef1edf01b 4691
destinyXfate 2:0e2ef1edf01b 4692 : "0" (bpp), // eax // input regs
destinyXfate 2:0e2ef1edf01b 4693 "1" (row) // edi
destinyXfate 2:0e2ef1edf01b 4694
destinyXfate 2:0e2ef1edf01b 4695 : "%edx", "%esi" // clobber list
destinyXfate 2:0e2ef1edf01b 4696 #if 0 /* MMX regs (%mm0, etc.) not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 4697 , "%mm0", "%mm1"
destinyXfate 2:0e2ef1edf01b 4698 #endif
destinyXfate 2:0e2ef1edf01b 4699 );
destinyXfate 2:0e2ef1edf01b 4700 }
destinyXfate 2:0e2ef1edf01b 4701 break;
destinyXfate 2:0e2ef1edf01b 4702
destinyXfate 2:0e2ef1edf01b 4703 case 2:
destinyXfate 2:0e2ef1edf01b 4704 {
destinyXfate 2:0e2ef1edf01b 4705 _ActiveMask.use = 0x00000000ffff0000LL;
destinyXfate 2:0e2ef1edf01b 4706 _ShiftBpp.use = 16; // == 2 * 8
destinyXfate 2:0e2ef1edf01b 4707 _ShiftRem.use = 48; // == 64 - 16
destinyXfate 2:0e2ef1edf01b 4708
destinyXfate 2:0e2ef1edf01b 4709 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 4710 "movq _ActiveMask, %%mm7 \n\t" // load _ActiveMask for 2nd
destinyXfate 2:0e2ef1edf01b 4711 // active byte group
destinyXfate 2:0e2ef1edf01b 4712 "movl _dif, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4713 "movq %%mm7, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4714 // preload "movl row, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 4715 "psllq _ShiftBpp, %%mm6 \n\t" // move mask in mm6 to cover
destinyXfate 2:0e2ef1edf01b 4716 // 3rd active byte group
destinyXfate 2:0e2ef1edf01b 4717 "movl %%edi, %%esi \n\t" // lp = row
destinyXfate 2:0e2ef1edf01b 4718 "movq %%mm6, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4719 // preload "movl bpp, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4720 "addl %%eax, %%edi \n\t" // rp = row + bpp
destinyXfate 2:0e2ef1edf01b 4721 "psllq _ShiftBpp, %%mm5 \n\t" // move mask in mm5 to cover
destinyXfate 2:0e2ef1edf01b 4722 // 4th active byte group
destinyXfate 2:0e2ef1edf01b 4723 // prime the pump: load the first Raw(x-bpp) data set
destinyXfate 2:0e2ef1edf01b 4724 "movq -8(%%edi,%%edx,), %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 4725
destinyXfate 2:0e2ef1edf01b 4726 "sub_2lp: \n\t" // shift data for adding first
destinyXfate 2:0e2ef1edf01b 4727 "psrlq _ShiftRem, %%mm1 \n\t" // bpp bytes (no need for mask;
destinyXfate 2:0e2ef1edf01b 4728 // shift clears inactive bytes)
destinyXfate 2:0e2ef1edf01b 4729 // add 1st active group
destinyXfate 2:0e2ef1edf01b 4730 "movq (%%edi,%%edx,), %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4731 "paddb %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4732
destinyXfate 2:0e2ef1edf01b 4733 // add 2nd active group
destinyXfate 2:0e2ef1edf01b 4734 "movq %%mm0, %%mm1 \n\t" // mov updated Raws to mm1
destinyXfate 2:0e2ef1edf01b 4735 "psllq _ShiftBpp, %%mm1 \n\t" // shift data to pos. correctly
destinyXfate 2:0e2ef1edf01b 4736 "pand %%mm7, %%mm1 \n\t" // mask to use 2nd active group
destinyXfate 2:0e2ef1edf01b 4737 "paddb %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4738
destinyXfate 2:0e2ef1edf01b 4739 // add 3rd active group
destinyXfate 2:0e2ef1edf01b 4740 "movq %%mm0, %%mm1 \n\t" // mov updated Raws to mm1
destinyXfate 2:0e2ef1edf01b 4741 "psllq _ShiftBpp, %%mm1 \n\t" // shift data to pos. correctly
destinyXfate 2:0e2ef1edf01b 4742 "pand %%mm6, %%mm1 \n\t" // mask to use 3rd active group
destinyXfate 2:0e2ef1edf01b 4743 "paddb %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4744
destinyXfate 2:0e2ef1edf01b 4745 // add 4th active group
destinyXfate 2:0e2ef1edf01b 4746 "movq %%mm0, %%mm1 \n\t" // mov updated Raws to mm1
destinyXfate 2:0e2ef1edf01b 4747 "psllq _ShiftBpp, %%mm1 \n\t" // shift data to pos. correctly
destinyXfate 2:0e2ef1edf01b 4748 "pand %%mm5, %%mm1 \n\t" // mask to use 4th active group
destinyXfate 2:0e2ef1edf01b 4749 "addl $8, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4750 "paddb %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4751 "cmpl _MMXLength, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4752 "movq %%mm0, -8(%%edi,%%edx,) \n\t" // write updated Raws to array
destinyXfate 2:0e2ef1edf01b 4753 "movq %%mm0, %%mm1 \n\t" // prep 1st add at top of loop
destinyXfate 2:0e2ef1edf01b 4754 "jb sub_2lp \n\t"
destinyXfate 2:0e2ef1edf01b 4755
destinyXfate 2:0e2ef1edf01b 4756 : "=a" (dummy_value_a), // 0 // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 4757 "=D" (dummy_value_D) // 1
destinyXfate 2:0e2ef1edf01b 4758
destinyXfate 2:0e2ef1edf01b 4759 : "0" (bpp), // eax // input regs
destinyXfate 2:0e2ef1edf01b 4760 "1" (row) // edi
destinyXfate 2:0e2ef1edf01b 4761
destinyXfate 2:0e2ef1edf01b 4762 : "%edx", "%esi" // clobber list
destinyXfate 2:0e2ef1edf01b 4763 #if 0 /* MMX regs (%mm0, etc.) not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 4764 , "%mm0", "%mm1", "%mm5", "%mm6", "%mm7"
destinyXfate 2:0e2ef1edf01b 4765 #endif
destinyXfate 2:0e2ef1edf01b 4766 );
destinyXfate 2:0e2ef1edf01b 4767 }
destinyXfate 2:0e2ef1edf01b 4768 break;
destinyXfate 2:0e2ef1edf01b 4769
destinyXfate 2:0e2ef1edf01b 4770 case 8:
destinyXfate 2:0e2ef1edf01b 4771 {
destinyXfate 2:0e2ef1edf01b 4772 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 4773 // preload "movl row, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 4774 "movl _dif, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4775 "movl %%edi, %%esi \n\t" // lp = row
destinyXfate 2:0e2ef1edf01b 4776 // preload "movl bpp, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4777 "addl %%eax, %%edi \n\t" // rp = row + bpp
destinyXfate 2:0e2ef1edf01b 4778 "movl _MMXLength, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 4779
destinyXfate 2:0e2ef1edf01b 4780 // prime the pump: load the first Raw(x-bpp) data set
destinyXfate 2:0e2ef1edf01b 4781 "movq -8(%%edi,%%edx,), %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4782 "andl $0x0000003f, %%ecx \n\t" // calc bytes over mult of 64
destinyXfate 2:0e2ef1edf01b 4783
destinyXfate 2:0e2ef1edf01b 4784 "sub_8lp: \n\t"
destinyXfate 2:0e2ef1edf01b 4785 "movq (%%edi,%%edx,), %%mm0 \n\t" // load Sub(x) for 1st 8 bytes
destinyXfate 2:0e2ef1edf01b 4786 "paddb %%mm7, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4787 "movq 8(%%edi,%%edx,), %%mm1 \n\t" // load Sub(x) for 2nd 8 bytes
destinyXfate 2:0e2ef1edf01b 4788 "movq %%mm0, (%%edi,%%edx,) \n\t" // write Raw(x) for 1st 8 bytes
destinyXfate 2:0e2ef1edf01b 4789
destinyXfate 2:0e2ef1edf01b 4790 // Now mm0 will be used as Raw(x-bpp) for the 2nd group of 8 bytes.
destinyXfate 2:0e2ef1edf01b 4791 // This will be repeated for each group of 8 bytes with the 8th
destinyXfate 2:0e2ef1edf01b 4792 // group being used as the Raw(x-bpp) for the 1st group of the
destinyXfate 2:0e2ef1edf01b 4793 // next loop.
destinyXfate 2:0e2ef1edf01b 4794
destinyXfate 2:0e2ef1edf01b 4795 "paddb %%mm0, %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 4796 "movq 16(%%edi,%%edx,), %%mm2 \n\t" // load Sub(x) for 3rd 8 bytes
destinyXfate 2:0e2ef1edf01b 4797 "movq %%mm1, 8(%%edi,%%edx,) \n\t" // write Raw(x) for 2nd 8 bytes
destinyXfate 2:0e2ef1edf01b 4798 "paddb %%mm1, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 4799 "movq 24(%%edi,%%edx,), %%mm3 \n\t" // load Sub(x) for 4th 8 bytes
destinyXfate 2:0e2ef1edf01b 4800 "movq %%mm2, 16(%%edi,%%edx,) \n\t" // write Raw(x) for 3rd 8 bytes
destinyXfate 2:0e2ef1edf01b 4801 "paddb %%mm2, %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 4802 "movq 32(%%edi,%%edx,), %%mm4 \n\t" // load Sub(x) for 5th 8 bytes
destinyXfate 2:0e2ef1edf01b 4803 "movq %%mm3, 24(%%edi,%%edx,) \n\t" // write Raw(x) for 4th 8 bytes
destinyXfate 2:0e2ef1edf01b 4804 "paddb %%mm3, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 4805 "movq 40(%%edi,%%edx,), %%mm5 \n\t" // load Sub(x) for 6th 8 bytes
destinyXfate 2:0e2ef1edf01b 4806 "movq %%mm4, 32(%%edi,%%edx,) \n\t" // write Raw(x) for 5th 8 bytes
destinyXfate 2:0e2ef1edf01b 4807 "paddb %%mm4, %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4808 "movq 48(%%edi,%%edx,), %%mm6 \n\t" // load Sub(x) for 7th 8 bytes
destinyXfate 2:0e2ef1edf01b 4809 "movq %%mm5, 40(%%edi,%%edx,) \n\t" // write Raw(x) for 6th 8 bytes
destinyXfate 2:0e2ef1edf01b 4810 "paddb %%mm5, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4811 "movq 56(%%edi,%%edx,), %%mm7 \n\t" // load Sub(x) for 8th 8 bytes
destinyXfate 2:0e2ef1edf01b 4812 "movq %%mm6, 48(%%edi,%%edx,) \n\t" // write Raw(x) for 7th 8 bytes
destinyXfate 2:0e2ef1edf01b 4813 "addl $64, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4814 "paddb %%mm6, %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4815 "cmpl %%ecx, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4816 "movq %%mm7, -8(%%edi,%%edx,) \n\t" // write Raw(x) for 8th 8 bytes
destinyXfate 2:0e2ef1edf01b 4817 "jb sub_8lp \n\t"
destinyXfate 2:0e2ef1edf01b 4818
destinyXfate 2:0e2ef1edf01b 4819 "cmpl _MMXLength, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4820 "jnb sub_8lt8 \n\t"
destinyXfate 2:0e2ef1edf01b 4821
destinyXfate 2:0e2ef1edf01b 4822 "sub_8lpA: \n\t"
destinyXfate 2:0e2ef1edf01b 4823 "movq (%%edi,%%edx,), %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4824 "addl $8, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4825 "paddb %%mm7, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4826 "cmpl _MMXLength, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4827 "movq %%mm0, -8(%%edi,%%edx,) \n\t" // -8 to offset early addl edx
destinyXfate 2:0e2ef1edf01b 4828 "movq %%mm0, %%mm7 \n\t" // move calculated Raw(x) data
destinyXfate 2:0e2ef1edf01b 4829 // to mm7 to be new Raw(x-bpp)
destinyXfate 2:0e2ef1edf01b 4830 // for next loop
destinyXfate 2:0e2ef1edf01b 4831 "jb sub_8lpA \n\t"
destinyXfate 2:0e2ef1edf01b 4832
destinyXfate 2:0e2ef1edf01b 4833 "sub_8lt8: \n\t"
destinyXfate 2:0e2ef1edf01b 4834
destinyXfate 2:0e2ef1edf01b 4835 : "=a" (dummy_value_a), // 0 // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 4836 "=D" (dummy_value_D) // 1
destinyXfate 2:0e2ef1edf01b 4837
destinyXfate 2:0e2ef1edf01b 4838 : "0" (bpp), // eax // input regs
destinyXfate 2:0e2ef1edf01b 4839 "1" (row) // edi
destinyXfate 2:0e2ef1edf01b 4840
destinyXfate 2:0e2ef1edf01b 4841 : "%ecx", "%edx", "%esi" // clobber list
destinyXfate 2:0e2ef1edf01b 4842 #if 0 /* MMX regs (%mm0, etc.) not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 4843 , "%mm0", "%mm1", "%mm2", "%mm3", "%mm4", "%mm5", "%mm6", "%mm7"
destinyXfate 2:0e2ef1edf01b 4844 #endif
destinyXfate 2:0e2ef1edf01b 4845 );
destinyXfate 2:0e2ef1edf01b 4846 }
destinyXfate 2:0e2ef1edf01b 4847 break;
destinyXfate 2:0e2ef1edf01b 4848
destinyXfate 2:0e2ef1edf01b 4849 default: // bpp greater than 8 bytes GRR BOGUS
destinyXfate 2:0e2ef1edf01b 4850 {
destinyXfate 2:0e2ef1edf01b 4851 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 4852 "movl _dif, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4853 // preload "movl row, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 4854 "movl %%edi, %%esi \n\t" // lp = row
destinyXfate 2:0e2ef1edf01b 4855 // preload "movl bpp, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4856 "addl %%eax, %%edi \n\t" // rp = row + bpp
destinyXfate 2:0e2ef1edf01b 4857
destinyXfate 2:0e2ef1edf01b 4858 "sub_Alp: \n\t"
destinyXfate 2:0e2ef1edf01b 4859 "movq (%%edi,%%edx,), %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4860 "movq (%%esi,%%edx,), %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 4861 "addl $8, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4862 "paddb %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4863 "cmpl _MMXLength, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4864 "movq %%mm0, -8(%%edi,%%edx,) \n\t" // mov does not affect flags;
destinyXfate 2:0e2ef1edf01b 4865 // -8 to offset addl edx
destinyXfate 2:0e2ef1edf01b 4866 "jb sub_Alp \n\t"
destinyXfate 2:0e2ef1edf01b 4867
destinyXfate 2:0e2ef1edf01b 4868 : "=a" (dummy_value_a), // 0 // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 4869 "=D" (dummy_value_D) // 1
destinyXfate 2:0e2ef1edf01b 4870
destinyXfate 2:0e2ef1edf01b 4871 : "0" (bpp), // eax // input regs
destinyXfate 2:0e2ef1edf01b 4872 "1" (row) // edi
destinyXfate 2:0e2ef1edf01b 4873
destinyXfate 2:0e2ef1edf01b 4874 : "%edx", "%esi" // clobber list
destinyXfate 2:0e2ef1edf01b 4875 #if 0 /* MMX regs (%mm0, etc.) not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 4876 , "%mm0", "%mm1"
destinyXfate 2:0e2ef1edf01b 4877 #endif
destinyXfate 2:0e2ef1edf01b 4878 );
destinyXfate 2:0e2ef1edf01b 4879 }
destinyXfate 2:0e2ef1edf01b 4880 break;
destinyXfate 2:0e2ef1edf01b 4881
destinyXfate 2:0e2ef1edf01b 4882 } // end switch (bpp)
destinyXfate 2:0e2ef1edf01b 4883
destinyXfate 2:0e2ef1edf01b 4884 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 4885 "movl _MMXLength, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4886 //pre "movl row, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 4887 "cmpl _FullLength, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4888 "jnb sub_end \n\t"
destinyXfate 2:0e2ef1edf01b 4889
destinyXfate 2:0e2ef1edf01b 4890 "movl %%edi, %%esi \n\t" // lp = row
destinyXfate 2:0e2ef1edf01b 4891 //pre "movl bpp, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4892 "addl %%eax, %%edi \n\t" // rp = row + bpp
destinyXfate 2:0e2ef1edf01b 4893 "xorl %%eax, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4894
destinyXfate 2:0e2ef1edf01b 4895 "sub_lp2: \n\t"
destinyXfate 2:0e2ef1edf01b 4896 "movb (%%esi,%%edx,), %%al \n\t"
destinyXfate 2:0e2ef1edf01b 4897 "addb %%al, (%%edi,%%edx,) \n\t"
destinyXfate 2:0e2ef1edf01b 4898 "incl %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4899 "cmpl _FullLength, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4900 "jb sub_lp2 \n\t"
destinyXfate 2:0e2ef1edf01b 4901
destinyXfate 2:0e2ef1edf01b 4902 "sub_end: \n\t"
destinyXfate 2:0e2ef1edf01b 4903 "EMMS \n\t" // end MMX instructions
destinyXfate 2:0e2ef1edf01b 4904
destinyXfate 2:0e2ef1edf01b 4905 : "=a" (dummy_value_a), // 0 // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 4906 "=D" (dummy_value_D) // 1
destinyXfate 2:0e2ef1edf01b 4907
destinyXfate 2:0e2ef1edf01b 4908 : "0" (bpp), // eax // input regs
destinyXfate 2:0e2ef1edf01b 4909 "1" (row) // edi
destinyXfate 2:0e2ef1edf01b 4910
destinyXfate 2:0e2ef1edf01b 4911 : "%edx", "%esi" // clobber list
destinyXfate 2:0e2ef1edf01b 4912 );
destinyXfate 2:0e2ef1edf01b 4913
destinyXfate 2:0e2ef1edf01b 4914 } // end of png_read_filter_row_mmx_sub()
destinyXfate 2:0e2ef1edf01b 4915 #endif
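/* Illustrative sketch, not part of libpng: the "add Nth active group" steps of
 * the 2-byte-per-pixel Sub case above can be modelled in portable C. All
 * helper names below (paddb64, load_le64, store_le64, sub_block_bpp2,
 * sub_bpp2_selfcheck) are hypothetical. The shift counts (16 and 48 bits) and
 * the group mask are assumptions chosen to fit a 2-byte pixel, because the
 * setup code for that case is not shown in this part of the file. paddb64()
 * is a SWAR stand-in for the MMX paddb instruction.
 */

```c
#include <stdint.h>
#include <string.h>

/* Add eight packed bytes lane by lane; carries never cross a byte boundary. */
uint64_t paddb64(uint64_t a, uint64_t b)
{
    const uint64_t low7 = 0x7f7f7f7f7f7f7f7fULL;
    const uint64_t top1 = 0x8080808080808080ULL;
    return ((a & low7) + (b & low7)) ^ ((a ^ b) & top1);
}

/* Load 8 bytes so that p[0] lands in the lowest byte, as movq does on x86. */
uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    int i;
    for (i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Store 8 bytes with the lowest byte of v written to p[0]. */
void store_le64(unsigned char *p, uint64_t v)
{
    int i;
    for (i = 0; i < 8; i++) {
        p[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

/* Undo the Sub filter for one 8-byte block when bpp == 2 (assumed values).
 * prev_raw holds the eight already-decoded bytes that precede the block. */
uint64_t sub_block_bpp2(uint64_t sub, uint64_t prev_raw)
{
    const unsigned shift_bpp = 16;           /* bpp * 8 bits           */
    const unsigned shift_rem = 48;           /* 64 - shift_bpp         */
    uint64_t mask = 0x00000000ffff0000ULL;   /* selects the 2nd group  */
    uint64_t raw;
    int group;

    /* 1st group: the last pixel of the previous block, moved to the bottom. */
    raw = paddb64(sub, prev_raw >> shift_rem);

    /* 2nd to 4th groups: each one adds the group just completed to its left. */
    for (group = 2; group <= 4; group++) {
        raw = paddb64(raw, (raw << shift_bpp) & mask);
        mask <<= shift_bpp;
    }
    return raw;
}

/* Returns 1 if the block-wise decode matches the plain byte loop. */
int sub_bpp2_selfcheck(void)
{
    unsigned char swar[16], ref[16];
    uint64_t prev = 0;   /* bytes left of the row start count as zero */
    size_t i;

    for (i = 0; i < 16; i++)
        swar[i] = ref[i] = (unsigned char)(i * 37 + 200);
    for (i = 2; i < 16; i++)
        ref[i] = (unsigned char)(ref[i] + ref[i - 2]);
    for (i = 0; i < 16; i += 8) {
        prev = sub_block_bpp2(load_le64(swar + i), prev);
        store_le64(swar + i, prev);
    }
    return memcmp(swar, ref, 16) == 0;
}
```

/* The shift-and-mask chain is needed because Sub decoding is serial: each
 * pixel depends on the decoded pixel to its left, so one 8-byte add is not
 * enough when a block holds several pixels.
 */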
destinyXfate 2:0e2ef1edf01b 4916
destinyXfate 2:0e2ef1edf01b 4917
destinyXfate 2:0e2ef1edf01b 4918
destinyXfate 2:0e2ef1edf01b 4919
destinyXfate 2:0e2ef1edf01b 4920 //===========================================================================//
destinyXfate 2:0e2ef1edf01b 4921 // //
destinyXfate 2:0e2ef1edf01b 4922 // P N G _ R E A D _ F I L T E R _ R O W _ M M X _ U P //
destinyXfate 2:0e2ef1edf01b 4923 // //
destinyXfate 2:0e2ef1edf01b 4924 //===========================================================================//
destinyXfate 2:0e2ef1edf01b 4925
destinyXfate 2:0e2ef1edf01b 4926 // Optimized code for PNG Up filter decoder
destinyXfate 2:0e2ef1edf01b 4927
destinyXfate 2:0e2ef1edf01b 4928 static void /* PRIVATE */
destinyXfate 2:0e2ef1edf01b 4929 png_read_filter_row_mmx_up(png_row_infop row_info, png_bytep row,
destinyXfate 2:0e2ef1edf01b 4930 png_bytep prev_row)
destinyXfate 2:0e2ef1edf01b 4931 {
destinyXfate 2:0e2ef1edf01b 4932 png_uint_32 len;
destinyXfate 2:0e2ef1edf01b 4933 int dummy_value_d; // fix 'forbidden register 3 (dx) was spilled' error
destinyXfate 2:0e2ef1edf01b 4934 int dummy_value_S;
destinyXfate 2:0e2ef1edf01b 4935 int dummy_value_D;
destinyXfate 2:0e2ef1edf01b 4936
destinyXfate 2:0e2ef1edf01b 4937 len = row_info->rowbytes; // number of bytes to filter
destinyXfate 2:0e2ef1edf01b 4938
destinyXfate 2:0e2ef1edf01b 4939 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 4940 //pre "movl row, %%edi \n\t"
destinyXfate 2:0e2ef1edf01b 4941 // get # of bytes to alignment
destinyXfate 2:0e2ef1edf01b 4942 #ifdef __PIC__
destinyXfate 2:0e2ef1edf01b 4943 "pushl %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 4944 #endif
destinyXfate 2:0e2ef1edf01b 4945 "movl %%edi, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 4946 "xorl %%ebx, %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 4947 "addl $0x7, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 4948 "xorl %%eax, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 4949 "andl $0xfffffff8, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 4950 //pre "movl prev_row, %%esi \n\t"
destinyXfate 2:0e2ef1edf01b 4951 "subl %%edi, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 4952 "jz up_go \n\t"
destinyXfate 2:0e2ef1edf01b 4953
destinyXfate 2:0e2ef1edf01b 4954 "up_lp1: \n\t" // fix alignment
destinyXfate 2:0e2ef1edf01b 4955 "movb (%%edi,%%ebx,), %%al \n\t"
destinyXfate 2:0e2ef1edf01b 4956 "addb (%%esi,%%ebx,), %%al \n\t"
destinyXfate 2:0e2ef1edf01b 4957 "incl %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 4958 "cmpl %%ecx, %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 4959 "movb %%al, -1(%%edi,%%ebx,) \n\t" // mov does not affect flags; -1 to
destinyXfate 2:0e2ef1edf01b 4960 "jb up_lp1 \n\t" // offset incl ebx
destinyXfate 2:0e2ef1edf01b 4961
destinyXfate 2:0e2ef1edf01b 4962 "up_go: \n\t"
destinyXfate 2:0e2ef1edf01b 4963 //pre "movl len, %%edx \n\t"
destinyXfate 2:0e2ef1edf01b 4964 "movl %%edx, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 4965 "subl %%ebx, %%edx \n\t" // subtract alignment fix
destinyXfate 2:0e2ef1edf01b 4966 "andl $0x0000003f, %%edx \n\t" // calc bytes over mult of 64
destinyXfate 2:0e2ef1edf01b 4967 "subl %%edx, %%ecx \n\t" // drop over bytes from length
destinyXfate 2:0e2ef1edf01b 4968
destinyXfate 2:0e2ef1edf01b 4969 // unrolled loop - use all MMX registers and interleave to reduce
destinyXfate 2:0e2ef1edf01b 4970 // number of branch instructions (loops) and reduce partial stalls
destinyXfate 2:0e2ef1edf01b 4971 "up_loop: \n\t"
destinyXfate 2:0e2ef1edf01b 4972 "movq (%%esi,%%ebx,), %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 4973 "movq (%%edi,%%ebx,), %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4974 "movq 8(%%esi,%%ebx,), %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 4975 "paddb %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4976 "movq 8(%%edi,%%ebx,), %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 4977 "movq %%mm0, (%%edi,%%ebx,) \n\t"
destinyXfate 2:0e2ef1edf01b 4978 "paddb %%mm3, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 4979 "movq 16(%%esi,%%ebx,), %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4980 "movq %%mm2, 8(%%edi,%%ebx,) \n\t"
destinyXfate 2:0e2ef1edf01b 4981 "movq 16(%%edi,%%ebx,), %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 4982 "movq 24(%%esi,%%ebx,), %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4983 "paddb %%mm5, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 4984 "movq 24(%%edi,%%ebx,), %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4985 "movq %%mm4, 16(%%edi,%%ebx,) \n\t"
destinyXfate 2:0e2ef1edf01b 4986 "paddb %%mm7, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 4987 "movq 32(%%esi,%%ebx,), %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 4988 "movq %%mm6, 24(%%edi,%%ebx,) \n\t"
destinyXfate 2:0e2ef1edf01b 4989 "movq 32(%%edi,%%ebx,), %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4990 "movq 40(%%esi,%%ebx,), %%mm3 \n\t"
destinyXfate 2:0e2ef1edf01b 4991 "paddb %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 4992 "movq 40(%%edi,%%ebx,), %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 4993 "movq %%mm0, 32(%%edi,%%ebx,) \n\t"
destinyXfate 2:0e2ef1edf01b 4994 "paddb %%mm3, %%mm2 \n\t"
destinyXfate 2:0e2ef1edf01b 4995 "movq 48(%%esi,%%ebx,), %%mm5 \n\t"
destinyXfate 2:0e2ef1edf01b 4996 "movq %%mm2, 40(%%edi,%%ebx,) \n\t"
destinyXfate 2:0e2ef1edf01b 4997 "movq 48(%%edi,%%ebx,), %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 4998 "movq 56(%%esi,%%ebx,), %%mm7 \n\t"
destinyXfate 2:0e2ef1edf01b 4999 "paddb %%mm5, %%mm4 \n\t"
destinyXfate 2:0e2ef1edf01b 5000 "movq 56(%%edi,%%ebx,), %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 5001 "movq %%mm4, 48(%%edi,%%ebx,) \n\t"
destinyXfate 2:0e2ef1edf01b 5002 "addl $64, %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 5003 "paddb %%mm7, %%mm6 \n\t"
destinyXfate 2:0e2ef1edf01b 5004 "cmpl %%ecx, %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 5005 "movq %%mm6, -8(%%edi,%%ebx,) \n\t" // (+56)movq does not affect flags;
destinyXfate 2:0e2ef1edf01b 5006 "jb up_loop \n\t" // -8 to offset addl ebx
destinyXfate 2:0e2ef1edf01b 5007
destinyXfate 2:0e2ef1edf01b 5008 "cmpl $0, %%edx \n\t" // test for bytes over mult of 64
destinyXfate 2:0e2ef1edf01b 5009 "jz up_end \n\t"
destinyXfate 2:0e2ef1edf01b 5010
destinyXfate 2:0e2ef1edf01b 5011 "cmpl $8, %%edx \n\t" // test for less than 8 bytes
destinyXfate 2:0e2ef1edf01b 5012 "jb up_lt8 \n\t" // [added by lcreeve at netins.net]
destinyXfate 2:0e2ef1edf01b 5013
destinyXfate 2:0e2ef1edf01b 5014 "addl %%edx, %%ecx \n\t"
destinyXfate 2:0e2ef1edf01b 5015 "andl $0x00000007, %%edx \n\t" // calc bytes over mult of 8
destinyXfate 2:0e2ef1edf01b 5016 "subl %%edx, %%ecx \n\t" // drop over bytes from length
destinyXfate 2:0e2ef1edf01b 5017 "jz up_lt8 \n\t"
destinyXfate 2:0e2ef1edf01b 5018
destinyXfate 2:0e2ef1edf01b 5019 "up_lpA: \n\t" // use MMX regs to update 8 bytes simultaneously
destinyXfate 2:0e2ef1edf01b 5020 "movq (%%esi,%%ebx,), %%mm1 \n\t"
destinyXfate 2:0e2ef1edf01b 5021 "movq (%%edi,%%ebx,), %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 5022 "addl $8, %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 5023 "paddb %%mm1, %%mm0 \n\t"
destinyXfate 2:0e2ef1edf01b 5024 "cmpl %%ecx, %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 5025 "movq %%mm0, -8(%%edi,%%ebx,) \n\t" // movq does not affect flags; -8 to
destinyXfate 2:0e2ef1edf01b 5026 "jb up_lpA \n\t" // offset add ebx
destinyXfate 2:0e2ef1edf01b 5027 "cmpl $0, %%edx \n\t" // test for bytes over mult of 8
destinyXfate 2:0e2ef1edf01b 5028 "jz up_end \n\t"
destinyXfate 2:0e2ef1edf01b 5029
destinyXfate 2:0e2ef1edf01b 5030 "up_lt8: \n\t"
destinyXfate 2:0e2ef1edf01b 5031 "xorl %%eax, %%eax \n\t"
destinyXfate 2:0e2ef1edf01b 5032 "addl %%edx, %%ecx \n\t" // move over byte count into counter
destinyXfate 2:0e2ef1edf01b 5033
destinyXfate 2:0e2ef1edf01b 5034 "up_lp2: \n\t" // use x86 regs for remaining bytes
destinyXfate 2:0e2ef1edf01b 5035 "movb (%%edi,%%ebx,), %%al \n\t"
destinyXfate 2:0e2ef1edf01b 5036 "addb (%%esi,%%ebx,), %%al \n\t"
destinyXfate 2:0e2ef1edf01b 5037 "incl %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 5038 "cmpl %%ecx, %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 5039 "movb %%al, -1(%%edi,%%ebx,) \n\t" // mov does not affect flags; -1 to
destinyXfate 2:0e2ef1edf01b 5040 "jb up_lp2 \n\t" // offset inc ebx
destinyXfate 2:0e2ef1edf01b 5041
destinyXfate 2:0e2ef1edf01b 5042 "up_end: \n\t"
destinyXfate 2:0e2ef1edf01b 5043 "EMMS \n\t" // conversion of filtered row complete
destinyXfate 2:0e2ef1edf01b 5044 #ifdef __PIC__
destinyXfate 2:0e2ef1edf01b 5045 "popl %%ebx \n\t"
destinyXfate 2:0e2ef1edf01b 5046 #endif
destinyXfate 2:0e2ef1edf01b 5047
destinyXfate 2:0e2ef1edf01b 5048 : "=d" (dummy_value_d), // 0 // output regs (dummy)
destinyXfate 2:0e2ef1edf01b 5049 "=S" (dummy_value_S), // 1
destinyXfate 2:0e2ef1edf01b 5050 "=D" (dummy_value_D) // 2
destinyXfate 2:0e2ef1edf01b 5051
destinyXfate 2:0e2ef1edf01b 5052 : "0" (len), // edx // input regs
destinyXfate 2:0e2ef1edf01b 5053 "1" (prev_row), // esi
destinyXfate 2:0e2ef1edf01b 5054 "2" (row) // edi
destinyXfate 2:0e2ef1edf01b 5055
destinyXfate 2:0e2ef1edf01b 5056 : "%eax", "%ecx" // clobber list (no input regs!)
destinyXfate 2:0e2ef1edf01b 5057 #ifndef __PIC__
destinyXfate 2:0e2ef1edf01b 5058 , "%ebx"
destinyXfate 2:0e2ef1edf01b 5059 #endif
destinyXfate 2:0e2ef1edf01b 5060
destinyXfate 2:0e2ef1edf01b 5061 #if 0 /* MMX regs (%mm0, etc.) not supported by gcc 2.7.2.3 or egcs 1.1 */
destinyXfate 2:0e2ef1edf01b 5062 , "%mm0", "%mm1", "%mm2", "%mm3"
destinyXfate 2:0e2ef1edf01b 5063 , "%mm4", "%mm5", "%mm6", "%mm7"
destinyXfate 2:0e2ef1edf01b 5064 #endif
destinyXfate 2:0e2ef1edf01b 5065 );
destinyXfate 2:0e2ef1edf01b 5066
destinyXfate 2:0e2ef1edf01b 5067 } // end of png_read_filter_row_mmx_up()
destinyXfate 2:0e2ef1edf01b 5068
destinyXfate 2:0e2ef1edf01b 5069 #endif /* PNG_MMX_CODE_SUPPORTED */
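/* Illustrative sketch, not part of libpng: png_read_filter_row_mmx_up() above
 * walks the row in four stages (alignment bytes, 64-byte passes, 8-byte
 * passes, trailing bytes). The hypothetical helpers below compute that split
 * and apply the Up filter stage by stage in plain C. The names up_split,
 * up_split_len, up_decode_staged and up_staged_selfcheck are assumptions made
 * for this example. One difference worth noting: up_loop tests its condition
 * at the bottom, so the assembler appears to run at least one 64-byte pass,
 * whereas by64 may be zero here.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* How many bytes each stage of the Up decoder handles. */
struct up_split {
    size_t align;  /* leading bytes, one at a time, until 8-byte aligned */
    size_t by64;   /* bytes processed 64 per pass (the unrolled loop)    */
    size_t by8;    /* bytes processed 8 per pass                         */
    size_t tail;   /* trailing bytes, one at a time                      */
};

struct up_split up_split_len(uintptr_t row_addr, size_t len)
{
    struct up_split s;
    size_t rest;

    s.align = (size_t)(((row_addr + 7) & ~(uintptr_t)7) - row_addr);
    if (s.align > len)          /* guard added for this sketch only */
        s.align = len;
    rest = len - s.align;
    s.by64 = rest & ~(size_t)63;
    rest &= 63;                 /* bytes over a multiple of 64 */
    s.by8 = rest & ~(size_t)7;
    s.tail = rest & 7;          /* bytes over a multiple of 8  */

    assert(s.align + s.by64 + s.by8 + s.tail == len);
    return s;
}

/* Portable Up decode that visits the row in the same four stages.
 * Every stage is a byte loop here; the assembler uses movq/paddb for the
 * two middle stages. */
void up_decode_staged(unsigned char *row, const unsigned char *prev, size_t len)
{
    struct up_split s = up_split_len((uintptr_t)row, len);
    size_t stage[4];
    size_t i, k, pos = 0;

    stage[0] = s.align;
    stage[1] = s.by64;
    stage[2] = s.by8;
    stage[3] = s.tail;
    for (k = 0; k < 4; k++)
        for (i = 0; i < stage[k]; i++, pos++)
            row[pos] = (unsigned char)(row[pos] + prev[pos]);
}

/* Returns 1 if the staged decode matches a direct byte-wise computation. */
int up_staged_selfcheck(void)
{
    unsigned char row[150], prev[150], ref[150];
    size_t i;

    for (i = 0; i < 150; i++) {
        row[i] = (unsigned char)(i * 7);
        prev[i] = (unsigned char)(255 - i);
        ref[i] = (unsigned char)(row[i] + prev[i]);
    }
    up_decode_staged(row + 1, prev + 1, 149);  /* +1 varies the start alignment */
    for (i = 1; i < 150; i++)
        if (row[i] != ref[i])
            return 0;
    return 1;
}
```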
destinyXfate 2:0e2ef1edf01b 5070
destinyXfate 2:0e2ef1edf01b 5071
destinyXfate 2:0e2ef1edf01b 5072
destinyXfate 2:0e2ef1edf01b 5073
destinyXfate 2:0e2ef1edf01b 5074 /*===========================================================================*/
destinyXfate 2:0e2ef1edf01b 5075 /* */
destinyXfate 2:0e2ef1edf01b 5076 /* P N G _ R E A D _ F I L T E R _ R O W */
destinyXfate 2:0e2ef1edf01b 5077 /* */
destinyXfate 2:0e2ef1edf01b 5078 /*===========================================================================*/
destinyXfate 2:0e2ef1edf01b 5079
destinyXfate 2:0e2ef1edf01b 5080
destinyXfate 2:0e2ef1edf01b 5081 /* Optimized png_read_filter_row routines */
destinyXfate 2:0e2ef1edf01b 5082
destinyXfate 2:0e2ef1edf01b 5083 void /* PRIVATE */
destinyXfate 2:0e2ef1edf01b 5084 png_read_filter_row(png_structp png_ptr, png_row_infop row_info,
destinyXfate 2:0e2ef1edf01b 5085 png_bytep row, png_bytep prev_row, int filter)
destinyXfate 2:0e2ef1edf01b 5086 {
destinyXfate 2:0e2ef1edf01b 5087 #ifdef PNG_DEBUG
destinyXfate 2:0e2ef1edf01b 5088 char filnm[10];
destinyXfate 2:0e2ef1edf01b 5089 #endif
destinyXfate 2:0e2ef1edf01b 5090
destinyXfate 2:0e2ef1edf01b 5091 #if defined(PNG_MMX_CODE_SUPPORTED)
destinyXfate 2:0e2ef1edf01b 5092 /* GRR: these are superseded by png_ptr->asm_flags: */
destinyXfate 2:0e2ef1edf01b 5093 #define UseMMX_sub 1 // GRR: converted 20000730
destinyXfate 2:0e2ef1edf01b 5094 #define UseMMX_up 1 // GRR: converted 20000729
destinyXfate 2:0e2ef1edf01b 5095 #define UseMMX_avg 1 // GRR: converted 20000828 (+ 16-bit bugfix 20000916)
destinyXfate 2:0e2ef1edf01b 5096 #define UseMMX_paeth 1 // GRR: converted 20000828
destinyXfate 2:0e2ef1edf01b 5097
destinyXfate 2:0e2ef1edf01b 5098 if (_mmx_supported == 2) {
destinyXfate 2:0e2ef1edf01b 5099 /* this should have happened in png_init_mmx_flags() already */
destinyXfate 2:0e2ef1edf01b 5100 #if !defined(PNG_1_0_X)
destinyXfate 2:0e2ef1edf01b 5101 png_warning(png_ptr, "asm_flags may not have been initialized");
destinyXfate 2:0e2ef1edf01b 5102 #endif
destinyXfate 2:0e2ef1edf01b 5103 png_mmx_support();
destinyXfate 2:0e2ef1edf01b 5104 }
destinyXfate 2:0e2ef1edf01b 5105 #endif /* PNG_MMX_CODE_SUPPORTED */
destinyXfate 2:0e2ef1edf01b 5106
destinyXfate 2:0e2ef1edf01b 5107 #ifdef PNG_DEBUG
destinyXfate 2:0e2ef1edf01b 5108 png_debug(1, "in png_read_filter_row (pnggccrd.c)\n");
destinyXfate 2:0e2ef1edf01b 5109 switch (filter)
destinyXfate 2:0e2ef1edf01b 5110 {
destinyXfate 2:0e2ef1edf01b 5111 case 0: sprintf(filnm, "none");
destinyXfate 2:0e2ef1edf01b 5112 break;
destinyXfate 2:0e2ef1edf01b 5113 case 1: sprintf(filnm, "sub-%s",
destinyXfate 2:0e2ef1edf01b 5114 #if defined(PNG_MMX_CODE_SUPPORTED) && defined(PNG_THREAD_UNSAFE_OK)
destinyXfate 2:0e2ef1edf01b 5115 #if !defined(PNG_1_0_X)
destinyXfate 2:0e2ef1edf01b 5116 (png_ptr->asm_flags & PNG_ASM_FLAG_MMX_READ_FILTER_SUB)? "MMX" :
destinyXfate 2:0e2ef1edf01b 5117 #endif
destinyXfate 2:0e2ef1edf01b 5118 #endif
destinyXfate 2:0e2ef1edf01b 5119 "x86");
destinyXfate 2:0e2ef1edf01b 5120 break;
destinyXfate 2:0e2ef1edf01b 5121 case 2: sprintf(filnm, "up-%s",
destinyXfate 2:0e2ef1edf01b 5122 #ifdef PNG_MMX_CODE_SUPPORTED
destinyXfate 2:0e2ef1edf01b 5123 #if !defined(PNG_1_0_X)
destinyXfate 2:0e2ef1edf01b 5124 (png_ptr->asm_flags & PNG_ASM_FLAG_MMX_READ_FILTER_UP)? "MMX" :
destinyXfate 2:0e2ef1edf01b 5125 #endif
destinyXfate 2:0e2ef1edf01b 5126 #endif
destinyXfate 2:0e2ef1edf01b 5127 "x86");
destinyXfate 2:0e2ef1edf01b 5128 break;
destinyXfate 2:0e2ef1edf01b 5129 case 3: sprintf(filnm, "avg-%s",
destinyXfate 2:0e2ef1edf01b 5130 #if defined(PNG_MMX_CODE_SUPPORTED) && defined(PNG_THREAD_UNSAFE_OK)
destinyXfate 2:0e2ef1edf01b 5131 #if !defined(PNG_1_0_X)
destinyXfate 2:0e2ef1edf01b 5132 (png_ptr->asm_flags & PNG_ASM_FLAG_MMX_READ_FILTER_AVG)? "MMX" :
destinyXfate 2:0e2ef1edf01b 5133 #endif
destinyXfate 2:0e2ef1edf01b 5134 #endif
destinyXfate 2:0e2ef1edf01b 5135 "x86");
destinyXfate 2:0e2ef1edf01b 5136 break;
destinyXfate 2:0e2ef1edf01b 5137 case 4: sprintf(filnm, "Paeth-%s",
destinyXfate 2:0e2ef1edf01b 5138 #if defined(PNG_MMX_CODE_SUPPORTED) && defined(PNG_THREAD_UNSAFE_OK)
destinyXfate 2:0e2ef1edf01b 5139 #if !defined(PNG_1_0_X)
destinyXfate 2:0e2ef1edf01b 5140 (png_ptr->asm_flags & PNG_ASM_FLAG_MMX_READ_FILTER_PAETH)? "MMX":
destinyXfate 2:0e2ef1edf01b 5141 #endif
destinyXfate 2:0e2ef1edf01b 5142 #endif
destinyXfate 2:0e2ef1edf01b 5143 "x86");
destinyXfate 2:0e2ef1edf01b 5144 break;
destinyXfate 2:0e2ef1edf01b 5145 default: sprintf(filnm, "unknw");
destinyXfate 2:0e2ef1edf01b 5146 break;
destinyXfate 2:0e2ef1edf01b 5147 }
destinyXfate 2:0e2ef1edf01b 5148 png_debug2(0, "row_number=%5ld, %5s, ", png_ptr->row_number, filnm);
destinyXfate 2:0e2ef1edf01b 5149 png_debug1(0, "row=0x%08lx, ", (unsigned long)row);
destinyXfate 2:0e2ef1edf01b 5150 png_debug2(0, "pixdepth=%2d, bytes=%d, ", (int)row_info->pixel_depth,
destinyXfate 2:0e2ef1edf01b 5151 (int)((row_info->pixel_depth + 7) >> 3));
destinyXfate 2:0e2ef1edf01b 5152 png_debug1(0, "rowbytes=%8ld\n", row_info->rowbytes);
destinyXfate 2:0e2ef1edf01b 5153 #endif /* PNG_DEBUG */
destinyXfate 2:0e2ef1edf01b 5154
destinyXfate 2:0e2ef1edf01b 5155 switch (filter)
destinyXfate 2:0e2ef1edf01b 5156 {
destinyXfate 2:0e2ef1edf01b 5157 case PNG_FILTER_VALUE_NONE:
destinyXfate 2:0e2ef1edf01b 5158 break;
destinyXfate 2:0e2ef1edf01b 5159
destinyXfate 2:0e2ef1edf01b 5160 case PNG_FILTER_VALUE_SUB:
destinyXfate 2:0e2ef1edf01b 5161 #if defined(PNG_MMX_CODE_SUPPORTED) && defined(PNG_THREAD_UNSAFE_OK)
destinyXfate 2:0e2ef1edf01b 5162 #if !defined(PNG_1_0_X)
destinyXfate 2:0e2ef1edf01b 5163 if ((png_ptr->asm_flags & PNG_ASM_FLAG_MMX_READ_FILTER_SUB) &&
destinyXfate 2:0e2ef1edf01b 5164 (row_info->pixel_depth >= png_ptr->mmx_bitdepth_threshold) &&
destinyXfate 2:0e2ef1edf01b 5165 (row_info->rowbytes >= png_ptr->mmx_rowbytes_threshold))
destinyXfate 2:0e2ef1edf01b 5166 #else
destinyXfate 2:0e2ef1edf01b 5167 if (_mmx_supported)
destinyXfate 2:0e2ef1edf01b 5168 #endif
destinyXfate 2:0e2ef1edf01b 5169 {
destinyXfate 2:0e2ef1edf01b 5170 png_read_filter_row_mmx_sub(row_info, row);
destinyXfate 2:0e2ef1edf01b 5171 }
destinyXfate 2:0e2ef1edf01b 5172 else
destinyXfate 2:0e2ef1edf01b 5173 #endif /* PNG_MMX_CODE_SUPPORTED && PNG_THREAD_UNSAFE_OK */
destinyXfate 2:0e2ef1edf01b 5174 {
destinyXfate 2:0e2ef1edf01b 5175 png_uint_32 i;
destinyXfate 2:0e2ef1edf01b 5176 png_uint_32 istop = row_info->rowbytes;
destinyXfate 2:0e2ef1edf01b 5177 png_uint_32 bpp = (row_info->pixel_depth + 7) >> 3;
destinyXfate 2:0e2ef1edf01b 5178 png_bytep rp = row + bpp;
destinyXfate 2:0e2ef1edf01b 5179 png_bytep lp = row;
destinyXfate 2:0e2ef1edf01b 5180
destinyXfate 2:0e2ef1edf01b 5181 for (i = bpp; i < istop; i++)
destinyXfate 2:0e2ef1edf01b 5182 {
destinyXfate 2:0e2ef1edf01b 5183 *rp = (png_byte)(((int)(*rp) + (int)(*lp++)) & 0xff);
destinyXfate 2:0e2ef1edf01b 5184 rp++;
destinyXfate 2:0e2ef1edf01b 5185 }
destinyXfate 2:0e2ef1edf01b 5186 } /* end !UseMMX_sub */
destinyXfate 2:0e2ef1edf01b 5187 break;
destinyXfate 2:0e2ef1edf01b 5188
destinyXfate 2:0e2ef1edf01b 5189 case PNG_FILTER_VALUE_UP:
destinyXfate 2:0e2ef1edf01b 5190 #if defined(PNG_MMX_CODE_SUPPORTED)
destinyXfate 2:0e2ef1edf01b 5191 #if !defined(PNG_1_0_X)
destinyXfate 2:0e2ef1edf01b 5192 if ((png_ptr->asm_flags & PNG_ASM_FLAG_MMX_READ_FILTER_UP) &&
destinyXfate 2:0e2ef1edf01b 5193 (row_info->pixel_depth >= png_ptr->mmx_bitdepth_threshold) &&
destinyXfate 2:0e2ef1edf01b 5194 (row_info->rowbytes >= png_ptr->mmx_rowbytes_threshold))
destinyXfate 2:0e2ef1edf01b 5195 #else
destinyXfate 2:0e2ef1edf01b 5196 if (_mmx_supported)
destinyXfate 2:0e2ef1edf01b 5197 #endif
destinyXfate 2:0e2ef1edf01b 5198 {
destinyXfate 2:0e2ef1edf01b 5199 png_read_filter_row_mmx_up(row_info, row, prev_row);
destinyXfate 2:0e2ef1edf01b 5200 }
destinyXfate 2:0e2ef1edf01b 5201 else
destinyXfate 2:0e2ef1edf01b 5202 #endif /* PNG_MMX_CODE_SUPPORTED */
destinyXfate 2:0e2ef1edf01b 5203 {
destinyXfate 2:0e2ef1edf01b 5204 png_uint_32 i;
destinyXfate 2:0e2ef1edf01b 5205 png_uint_32 istop = row_info->rowbytes;
destinyXfate 2:0e2ef1edf01b 5206 png_bytep rp = row;
destinyXfate 2:0e2ef1edf01b 5207 png_bytep pp = prev_row;
destinyXfate 2:0e2ef1edf01b 5208
destinyXfate 2:0e2ef1edf01b 5209 for (i = 0; i < istop; ++i)
destinyXfate 2:0e2ef1edf01b 5210 {
destinyXfate 2:0e2ef1edf01b 5211 *rp = (png_byte)(((int)(*rp) + (int)(*pp++)) & 0xff);
destinyXfate 2:0e2ef1edf01b 5212 rp++;
destinyXfate 2:0e2ef1edf01b 5213 }
destinyXfate 2:0e2ef1edf01b 5214 } /* end !UseMMX_up */
destinyXfate 2:0e2ef1edf01b 5215 break;
destinyXfate 2:0e2ef1edf01b 5216
destinyXfate 2:0e2ef1edf01b 5217 case PNG_FILTER_VALUE_AVG:
destinyXfate 2:0e2ef1edf01b 5218 #if defined(PNG_MMX_CODE_SUPPORTED) && defined(PNG_THREAD_UNSAFE_OK)
destinyXfate 2:0e2ef1edf01b 5219 #if !defined(PNG_1_0_X)
destinyXfate 2:0e2ef1edf01b 5220 if ((png_ptr->asm_flags & PNG_ASM_FLAG_MMX_READ_FILTER_AVG) &&
destinyXfate 2:0e2ef1edf01b 5221 (row_info->pixel_depth >= png_ptr->mmx_bitdepth_threshold) &&
destinyXfate 2:0e2ef1edf01b 5222 (row_info->rowbytes >= png_ptr->mmx_rowbytes_threshold))
destinyXfate 2:0e2ef1edf01b 5223 #else
destinyXfate 2:0e2ef1edf01b 5224 if (_mmx_supported)
destinyXfate 2:0e2ef1edf01b 5225 #endif
destinyXfate 2:0e2ef1edf01b 5226 {
destinyXfate 2:0e2ef1edf01b 5227 png_read_filter_row_mmx_avg(row_info, row, prev_row);
destinyXfate 2:0e2ef1edf01b 5228 }
destinyXfate 2:0e2ef1edf01b 5229 else
destinyXfate 2:0e2ef1edf01b 5230 #endif /* PNG_MMX_CODE_SUPPORTED && PNG_THREAD_UNSAFE_OK */
destinyXfate 2:0e2ef1edf01b 5231 {
destinyXfate 2:0e2ef1edf01b 5232 png_uint_32 i;
destinyXfate 2:0e2ef1edf01b 5233 png_bytep rp = row;
destinyXfate 2:0e2ef1edf01b 5234 png_bytep pp = prev_row;
destinyXfate 2:0e2ef1edf01b 5235 png_bytep lp = row;
destinyXfate 2:0e2ef1edf01b 5236 png_uint_32 bpp = (row_info->pixel_depth + 7) >> 3;
destinyXfate 2:0e2ef1edf01b 5237 png_uint_32 istop = row_info->rowbytes - bpp;
destinyXfate 2:0e2ef1edf01b 5238
destinyXfate 2:0e2ef1edf01b 5239 for (i = 0; i < bpp; i++)
destinyXfate 2:0e2ef1edf01b 5240 {
destinyXfate 2:0e2ef1edf01b 5241 *rp = (png_byte)(((int)(*rp) +
destinyXfate 2:0e2ef1edf01b 5242 ((int)(*pp++) >> 1)) & 0xff);
destinyXfate 2:0e2ef1edf01b 5243 rp++;
destinyXfate 2:0e2ef1edf01b 5244 }
destinyXfate 2:0e2ef1edf01b 5245
destinyXfate 2:0e2ef1edf01b 5246 for (i = 0; i < istop; i++)
destinyXfate 2:0e2ef1edf01b 5247 {
destinyXfate 2:0e2ef1edf01b 5248 *rp = (png_byte)(((int)(*rp) +
destinyXfate 2:0e2ef1edf01b 5249 ((int)(*pp++ + *lp++) >> 1)) & 0xff);
destinyXfate 2:0e2ef1edf01b 5250 rp++;
destinyXfate 2:0e2ef1edf01b 5251 }
destinyXfate 2:0e2ef1edf01b 5252 } /* end !UseMMX_avg */
destinyXfate 2:0e2ef1edf01b 5253 break;
destinyXfate 2:0e2ef1edf01b 5254
destinyXfate 2:0e2ef1edf01b 5255 case PNG_FILTER_VALUE_PAETH:
destinyXfate 2:0e2ef1edf01b 5256 #if defined(PNG_MMX_CODE_SUPPORTED) && defined(PNG_THREAD_UNSAFE_OK)
destinyXfate 2:0e2ef1edf01b 5257 #if !defined(PNG_1_0_X)
destinyXfate 2:0e2ef1edf01b 5258 if ((png_ptr->asm_flags & PNG_ASM_FLAG_MMX_READ_FILTER_PAETH) &&
destinyXfate 2:0e2ef1edf01b 5259 (row_info->pixel_depth >= png_ptr->mmx_bitdepth_threshold) &&
destinyXfate 2:0e2ef1edf01b 5260 (row_info->rowbytes >= png_ptr->mmx_rowbytes_threshold))
destinyXfate 2:0e2ef1edf01b 5261 #else
destinyXfate 2:0e2ef1edf01b 5262 if (_mmx_supported)
destinyXfate 2:0e2ef1edf01b 5263 #endif
destinyXfate 2:0e2ef1edf01b 5264 {
destinyXfate 2:0e2ef1edf01b 5265 png_read_filter_row_mmx_paeth(row_info, row, prev_row);
destinyXfate 2:0e2ef1edf01b 5266 }
destinyXfate 2:0e2ef1edf01b 5267 else
destinyXfate 2:0e2ef1edf01b 5268 #endif /* PNG_MMX_CODE_SUPPORTED && PNG_THREAD_UNSAFE_OK */
destinyXfate 2:0e2ef1edf01b 5269 {
destinyXfate 2:0e2ef1edf01b 5270 png_uint_32 i;
destinyXfate 2:0e2ef1edf01b 5271 png_bytep rp = row;
destinyXfate 2:0e2ef1edf01b 5272 png_bytep pp = prev_row;
destinyXfate 2:0e2ef1edf01b 5273 png_bytep lp = row;
destinyXfate 2:0e2ef1edf01b 5274 png_bytep cp = prev_row;
destinyXfate 2:0e2ef1edf01b 5275 png_uint_32 bpp = (row_info->pixel_depth + 7) >> 3;
destinyXfate 2:0e2ef1edf01b 5276 png_uint_32 istop = row_info->rowbytes - bpp;
destinyXfate 2:0e2ef1edf01b 5277
destinyXfate 2:0e2ef1edf01b 5278 for (i = 0; i < bpp; i++)
destinyXfate 2:0e2ef1edf01b 5279 {
destinyXfate 2:0e2ef1edf01b 5280 *rp = (png_byte)(((int)(*rp) + (int)(*pp++)) & 0xff);
destinyXfate 2:0e2ef1edf01b 5281 rp++;
destinyXfate 2:0e2ef1edf01b 5282 }
destinyXfate 2:0e2ef1edf01b 5283
destinyXfate 2:0e2ef1edf01b 5284 for (i = 0; i < istop; i++) /* use leftover rp,pp */
destinyXfate 2:0e2ef1edf01b 5285 {
destinyXfate 2:0e2ef1edf01b 5286 int a, b, c, pa, pb, pc, p;
destinyXfate 2:0e2ef1edf01b 5287
destinyXfate 2:0e2ef1edf01b 5288 a = *lp++;
destinyXfate 2:0e2ef1edf01b 5289 b = *pp++;
destinyXfate 2:0e2ef1edf01b 5290 c = *cp++;
destinyXfate 2:0e2ef1edf01b 5291
destinyXfate 2:0e2ef1edf01b 5292 p = b - c;
destinyXfate 2:0e2ef1edf01b 5293 pc = a - c;
destinyXfate 2:0e2ef1edf01b 5294
destinyXfate 2:0e2ef1edf01b 5295 #ifdef PNG_USE_ABS
destinyXfate 2:0e2ef1edf01b 5296 pa = abs(p);
destinyXfate 2:0e2ef1edf01b 5297 pb = abs(pc);
destinyXfate 2:0e2ef1edf01b 5298 pc = abs(p + pc);
destinyXfate 2:0e2ef1edf01b 5299 #else
destinyXfate 2:0e2ef1edf01b 5300 pa = p < 0 ? -p : p;
destinyXfate 2:0e2ef1edf01b 5301 pb = pc < 0 ? -pc : pc;
destinyXfate 2:0e2ef1edf01b 5302 pc = (p + pc) < 0 ? -(p + pc) : p + pc;
destinyXfate 2:0e2ef1edf01b 5303 #endif
destinyXfate 2:0e2ef1edf01b 5304
destinyXfate 2:0e2ef1edf01b 5305 /*
destinyXfate 2:0e2ef1edf01b 5306 if (pa <= pb && pa <= pc)
destinyXfate 2:0e2ef1edf01b 5307 p = a;
destinyXfate 2:0e2ef1edf01b 5308 else if (pb <= pc)
destinyXfate 2:0e2ef1edf01b 5309 p = b;
destinyXfate 2:0e2ef1edf01b 5310 else
destinyXfate 2:0e2ef1edf01b 5311 p = c;
destinyXfate 2:0e2ef1edf01b 5312 */
destinyXfate 2:0e2ef1edf01b 5313
destinyXfate 2:0e2ef1edf01b 5314 p = (pa <= pb && pa <= pc) ? a : (pb <= pc) ? b : c;
destinyXfate 2:0e2ef1edf01b 5315
destinyXfate 2:0e2ef1edf01b 5316 *rp = (png_byte)(((int)(*rp) + p) & 0xff);
destinyXfate 2:0e2ef1edf01b 5317 rp++;
destinyXfate 2:0e2ef1edf01b 5318 }
destinyXfate 2:0e2ef1edf01b 5319 } /* end !UseMMX_paeth */
destinyXfate 2:0e2ef1edf01b 5320 break;
destinyXfate 2:0e2ef1edf01b 5321
destinyXfate 2:0e2ef1edf01b 5322 default:
destinyXfate 2:0e2ef1edf01b 5323 png_warning(png_ptr, "Ignoring bad row-filter type");
destinyXfate 2:0e2ef1edf01b 5324 *row=0;
destinyXfate 2:0e2ef1edf01b 5325 break;
destinyXfate 2:0e2ef1edf01b 5326 }
destinyXfate 2:0e2ef1edf01b 5327 }
destinyXfate 2:0e2ef1edf01b 5328
destinyXfate 2:0e2ef1edf01b 5329 #endif /* PNG_HAVE_MMX_READ_FILTER_ROW */
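/* Illustrative sketch only, not part of libpng: the non-MMX Paeth branch
 * above can be read as "pick whichever of a (left), b (above), c (upper
 * left) is closest to a + b - c, then add it to the filtered byte".  The
 * names paeth_predictor() and paeth_unfilter_row() are hypothetical and
 * exist only for this example; the indexing form below is assumed to be
 * equivalent to the pointer-walking loops in the original.
 */

```c
#include <stdlib.h>   /* abs, size_t */

/* Hypothetical helper: Paeth predictor for one byte position.
 * a = left, b = above, c = upper left (all 0..255). */
static int paeth_predictor(int a, int b, int c)
{
    int p  = b - c;          /* (a + b - c) - a */
    int pc = a - c;          /* (a + b - c) - b */
    int pa = abs(p);         /* distance of estimate to a */
    int pb = abs(pc);        /* distance of estimate to b */

    pc = abs(p + pc);        /* distance of estimate to c */

    /* Ties are broken in the order a, b, c. */
    return (pa <= pb && pa <= pc) ? a : (pb <= pc) ? b : c;
}

/* Hypothetical helper: undo the Paeth filter on one row in place.
 * bpp is the number of bytes per complete pixel (at least 1). */
static void paeth_unfilter_row(unsigned char *row,
                               const unsigned char *prev_row,
                               size_t rowbytes, size_t bpp)
{
    size_t i;

    /* First pixel: left and upper-left are treated as 0, so the
     * predictor reduces to the byte above. */
    for (i = 0; i < bpp && i < rowbytes; i++)
        row[i] = (unsigned char)((row[i] + prev_row[i]) & 0xff);

    /* Remaining bytes: full three-neighbour prediction. */
    for (; i < rowbytes; i++)
    {
        int p = paeth_predictor(row[i - bpp], prev_row[i],
                                prev_row[i - bpp]);
        row[i] = (unsigned char)((row[i] + p) & 0xff);
    }
}
```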
destinyXfate 2:0e2ef1edf01b 5330
destinyXfate 2:0e2ef1edf01b 5331
destinyXfate 2:0e2ef1edf01b 5332 /*===========================================================================*/
destinyXfate 2:0e2ef1edf01b 5333 /* */
destinyXfate 2:0e2ef1edf01b 5334 /* P N G _ M M X _ S U P P O R T */
destinyXfate 2:0e2ef1edf01b 5335 /* */
destinyXfate 2:0e2ef1edf01b 5336 /*===========================================================================*/
destinyXfate 2:0e2ef1edf01b 5337
destinyXfate 2:0e2ef1edf01b 5338 /* GRR NOTES: (1) the following code assumes 386 or better (pushfl/popfl)
destinyXfate 2:0e2ef1edf01b 5339 * (2) all instructions compile with gcc 2.7.2.3 and later
destinyXfate 2:0e2ef1edf01b 5340 * (3) the function is moved down here to prevent gcc from
destinyXfate 2:0e2ef1edf01b 5341 * inlining it in multiple places and then barfing
destinyXfate 2:0e2ef1edf01b 5342 * because the ".NOT_SUPPORTED" label is multiply defined
destinyXfate 2:0e2ef1edf01b 5343 * [is there a way to signal that a *single* function should
destinyXfate 2:0e2ef1edf01b 5344 * not be inlined? is there a way to modify the label for
destinyXfate 2:0e2ef1edf01b 5345 * each inlined instance, e.g., by appending _1, _2, etc.?
destinyXfate 2:0e2ef1edf01b 5346 * maybe if we don't use a leading "." in the label name? (nope...sigh)]
destinyXfate 2:0e2ef1edf01b 5347 */
destinyXfate 2:0e2ef1edf01b 5348
destinyXfate 2:0e2ef1edf01b 5349 int PNGAPI
destinyXfate 2:0e2ef1edf01b 5350 png_mmx_support(void)
destinyXfate 2:0e2ef1edf01b 5351 {
destinyXfate 2:0e2ef1edf01b 5352 #if defined(PNG_MMX_CODE_SUPPORTED)
destinyXfate 2:0e2ef1edf01b 5353 int result;
destinyXfate 2:0e2ef1edf01b 5354 __asm__ __volatile__ (
destinyXfate 2:0e2ef1edf01b 5355 "pushl %%ebx \n\t" // ebx gets clobbered by CPUID instruction
destinyXfate 2:0e2ef1edf01b 5356 "pushl %%ecx \n\t" // so does ecx...
destinyXfate 2:0e2ef1edf01b 5357 "pushl %%edx \n\t" // ...and edx (but ecx & edx safe on Linux)
destinyXfate 2:0e2ef1edf01b 5358 // ".byte 0x66 \n\t" // convert 16-bit pushf to 32-bit pushfd
destinyXfate 2:0e2ef1edf01b 5359 // "pushf \n\t" // 16-bit pushf
destinyXfate 2:0e2ef1edf01b 5360 "pushfl \n\t" // save Eflag to stack
destinyXfate 2:0e2ef1edf01b 5361 "popl %%eax \n\t" // get Eflag from stack into eax
destinyXfate 2:0e2ef1edf01b 5362 "movl %%eax, %%ecx \n\t" // make another copy of Eflag in ecx
destinyXfate 2:0e2ef1edf01b 5363 "xorl $0x200000, %%eax \n\t" // toggle ID bit in Eflag (i.e., bit 21)
destinyXfate 2:0e2ef1edf01b 5364 "pushl %%eax \n\t" // save modified Eflag back to stack
destinyXfate 2:0e2ef1edf01b 5365 // ".byte 0x66 \n\t" // convert 16-bit popf to 32-bit popfd
destinyXfate 2:0e2ef1edf01b 5366 // "popf \n\t" // 16-bit popf
destinyXfate 2:0e2ef1edf01b 5367 "popfl \n\t" // restore modified value to Eflag reg
destinyXfate 2:0e2ef1edf01b 5368 "pushfl \n\t" // save Eflag to stack
destinyXfate 2:0e2ef1edf01b 5369 "popl %%eax \n\t" // get Eflag from stack
destinyXfate 2:0e2ef1edf01b 5370 "pushl %%ecx \n\t" // save original Eflag to stack
destinyXfate 2:0e2ef1edf01b 5371 "popfl \n\t" // restore original Eflag
destinyXfate 2:0e2ef1edf01b 5372 "xorl %%ecx, %%eax \n\t" // compare new Eflag with original Eflag
destinyXfate 2:0e2ef1edf01b 5373 "jz 0f \n\t" // if same, CPUID instr. is not supported
destinyXfate 2:0e2ef1edf01b 5374
destinyXfate 2:0e2ef1edf01b 5375 "xorl %%eax, %%eax \n\t" // set eax to zero
destinyXfate 2:0e2ef1edf01b 5376 // ".byte 0x0f, 0xa2 \n\t" // CPUID instruction (two-byte opcode)
destinyXfate 2:0e2ef1edf01b 5377 "cpuid \n\t" // get the CPU identification info
destinyXfate 2:0e2ef1edf01b 5378 "cmpl $1, %%eax \n\t" // eax = highest supported CPUID leaf; need at least 1
destinyXfate 2:0e2ef1edf01b 5379 "jl 0f \n\t" // if eax is zero, leaf 1 is unavailable: no MMX
destinyXfate 2:0e2ef1edf01b 5380
destinyXfate 2:0e2ef1edf01b 5381 "xorl %%eax, %%eax \n\t" // set eax to zero and...
destinyXfate 2:0e2ef1edf01b 5382 "incl %%eax \n\t" // ...increment eax to 1. This pair is
destinyXfate 2:0e2ef1edf01b 5383 // faster than the instruction "mov eax, 1"
destinyXfate 2:0e2ef1edf01b 5384 "cpuid \n\t" // get the CPU identification info again
destinyXfate 2:0e2ef1edf01b 5385 "andl $0x800000, %%edx \n\t" // mask out all bits but MMX bit (23)
destinyXfate 2:0e2ef1edf01b 5386 "cmpl $0, %%edx \n\t" // 0 = MMX not supported
destinyXfate 2:0e2ef1edf01b 5387 "jz 0f \n\t" // non-zero = yes, MMX IS supported
destinyXfate 2:0e2ef1edf01b 5388
destinyXfate 2:0e2ef1edf01b 5389 "movl $1, %%eax \n\t" // set return value to 1
destinyXfate 2:0e2ef1edf01b 5390 "jmp 1f \n\t" // DONE: have MMX support
destinyXfate 2:0e2ef1edf01b 5391
destinyXfate 2:0e2ef1edf01b 5392 "0: \n\t" // .NOT_SUPPORTED: target label for jump instructions
destinyXfate 2:0e2ef1edf01b 5393 "movl $0, %%eax \n\t" // set return value to 0
destinyXfate 2:0e2ef1edf01b 5394 "1: \n\t" // .RETURN: target label for jump instructions
destinyXfate 2:0e2ef1edf01b 5395 "popl %%edx \n\t" // restore edx
destinyXfate 2:0e2ef1edf01b 5396 "popl %%ecx \n\t" // restore ecx
destinyXfate 2:0e2ef1edf01b 5397 "popl %%ebx \n\t" // restore ebx
destinyXfate 2:0e2ef1edf01b 5398
destinyXfate 2:0e2ef1edf01b 5399 // "ret \n\t" // DONE: no MMX support
destinyXfate 2:0e2ef1edf01b 5400 // (fall through to standard C "ret")
destinyXfate 2:0e2ef1edf01b 5401
destinyXfate 2:0e2ef1edf01b 5402 : "=a" (result) // output list
destinyXfate 2:0e2ef1edf01b 5403
destinyXfate 2:0e2ef1edf01b 5404 : // any variables used on input (none)
destinyXfate 2:0e2ef1edf01b 5405
destinyXfate 2:0e2ef1edf01b 5406 // no clobber list
destinyXfate 2:0e2ef1edf01b 5407 // , "%ebx", "%ecx", "%edx" // GRR: we handle these manually
destinyXfate 2:0e2ef1edf01b 5408 // , "memory" // if write to a variable gcc thought was in a reg
destinyXfate 2:0e2ef1edf01b 5409 // , "cc" // "condition codes" (flag bits)
destinyXfate 2:0e2ef1edf01b 5410 );
destinyXfate 2:0e2ef1edf01b 5411 _mmx_supported = result;
destinyXfate 2:0e2ef1edf01b 5412 #else
destinyXfate 2:0e2ef1edf01b 5413 _mmx_supported = 0;
destinyXfate 2:0e2ef1edf01b 5414 #endif /* PNG_MMX_CODE_SUPPORTED */
destinyXfate 2:0e2ef1edf01b 5415
destinyXfate 2:0e2ef1edf01b 5416 return _mmx_supported;
destinyXfate 2:0e2ef1edf01b 5417 }
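/* Illustrative sketch only, not the method used above: with a reasonably
 * recent GCC or Clang on x86, the same MMX check can probably be written
 * without hand-coded assembler by using __get_cpuid() from <cpuid.h>,
 * which is assumed here to return 0 when the requested leaf is not
 * available.  The names has_mmx_bit() and mmx_supported_sketch() are
 * hypothetical.  On other compilers or architectures the sketch simply
 * reports "no MMX".
 */

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#else
#define SKETCH_HAVE_CPUID 0
#endif

/* Hypothetical helper: test bit 23 (MMX) of the edx value returned by
 * CPUID leaf 1, the same bit the 0x800000 mask selects above. */
static int has_mmx_bit(unsigned int edx)
{
    return (int)((edx >> 23) & 1u);
}

/* Hypothetical replacement for the asm block: returns 1 if CPUID leaf 1
 * reports MMX, 0 otherwise (including when CPUID is unavailable). */
static int mmx_supported_sketch(void)
{
#if SKETCH_HAVE_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;            /* leaf 1 not available */

    return has_mmx_bit(edx);
#else
    return 0;                /* not x86 or not a GNU-compatible compiler */
#endif
}
```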
destinyXfate 2:0e2ef1edf01b 5418
destinyXfate 2:0e2ef1edf01b 5419
destinyXfate 2:0e2ef1edf01b 5420 #endif /* PNG_USE_PNGGCCRD */
destinyXfate 2:0e2ef1edf01b 5421