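Bicubic Interpolation and Its Optimization

1. Mathematical model

For a destination pixel, let (i+u, j+v) be the floating-point coordinate obtained in the source image by the inverse transform, where i and j are non-negative integers and u and v are floating-point numbers in the interval [0, 1). Bicubic interpolation considers the 16 neighbors surrounding the coordinate (i+u, j+v). The destination pixel value f(i+u, j+v) is given by the following interpolation formula:

$$ f(i+u,\, j+v) = A \cdot B \cdot C $$

$$ A = \begin{bmatrix} S(u+1) & S(u) & S(u-1) & S(u-2) \end{bmatrix} $$

$$ B = \begin{bmatrix}
f(i-1, j-1) & f(i-1, j) & f(i-1, j+1) & f(i-1, j+2) \\
f(i, j-1) & f(i, j) & f(i, j+1) & f(i, j+2) \\
f(i+1, j-1) & f(i+1, j) & f(i+1, j+1) & f(i+1, j+2) \\
f(i+2, j-1) & f(i+2, j) & f(i+2, j+1) & f(i+2, j+2)
\end{bmatrix} $$

$$ C = \begin{bmatrix} S(v+1) \\ S(v) \\ S(v-1) \\ S(v-2) \end{bmatrix} $$

$$ S(x) = \begin{cases}
1 - 2|x|^2 + |x|^3, & 0 \le |x| < 1 \\
4 - 8|x| + 5|x|^2 - |x|^3, & 1 \le |x| \le 2 \\
0, & |x| > 2
\end{cases} $$

S(x) is an approximation of sin(πx)/(πx) (π is the circle constant) and serves as the interpolation kernel.

2. Computation procedure

1. Obtain the 16 neighboring points P1, P2, ..., P16 (numbered row by row as in matrix B).
2. Using the kernel formula S(x), compute the kernel vectors Su and Sv for the x and y directions.
3. Carry out the matrix operation to obtain the interpolation result:

iTemp1 = Su0 * P1 + Su1 * P5 + Su2 * P9 + Su3 * P13
iTemp2 = Su0 * P2 + Su1 * P6 + Su2 * P10 + Su3 * P14
iTemp3 = Su0 * P3 + Su1 * P7 + Su2 * P11 + Su3 * P15
iTemp4 = Su0 * P4 + Su1 * P8 + Su2 * P12 + Su3 * P16
iResult = Sv0 * iTemp1 + Sv1 * iTemp2 + Sv2 * iTemp3 + Sv3 * iTemp4

4. The interpolated image showed "burr" artifacts, so the result is post-processed: let pSrc be the value of the corresponding pixel in the source image. If abs(iResult - pSrc) exceeds a threshold, the interpolated point is considered likely to contaminate the image, and the source value pSrc is used instead.

3. Algorithm optimization

Computing one pixel with bicubic interpolation needs the 16 surrounding points and as many as 20 multiplications and 15 additions, so the computational cost is very high and optimization is essential.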
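The kernel S(x) and steps 1 to 4 of the computation procedure can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the code of Optimize_Bicubic: the Gray image type and the names kernelVector, bicubic and despike are hypothetical, a single channel of double values stands in for RGB bytes, and the caller is assumed to keep the whole 4x4 neighborhood inside the image.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Interpolation kernel S(x) as defined in the mathematical model.
double S(double x) {
    const double a = std::fabs(x);
    if (a < 1.0) return 1.0 - 2.0 * a * a + a * a * a;
    if (a <= 2.0) return 4.0 - 8.0 * a + 5.0 * a * a - a * a * a;
    return 0.0;  // outside the kernel support
}

// Kernel vector [S(t+1), S(t), S(t-1), S(t-2)] for a fractional offset t in [0, 1).
std::array<double, 4> kernelVector(double t) {
    return {S(t + 1.0), S(t), S(t - 1.0), S(t - 2.0)};
}

// Hypothetical single-channel image: row index i, column index j.
struct Gray {
    int rows;
    int cols;
    std::vector<double> px;
    double at(int i, int j) const { return px[i * cols + j]; }
};

// Evaluates f(i+u, j+v) = A * B * C column by column, like iTemp1..iTemp4.
double bicubic(const Gray& img, int i, int j, double u, double v) {
    // Assumption: no border handling, the 4x4 neighborhood must fit.
    assert(i >= 1 && i + 2 < img.rows && j >= 1 && j + 2 < img.cols);
    const std::array<double, 4> Su = kernelVector(u);
    const std::array<double, 4> Sv = kernelVector(v);
    double result = 0.0;
    for (int c = 0; c < 4; ++c) {          // column j-1+c of matrix B
        double temp = 0.0;                 // plays the role of iTemp(c+1)
        for (int r = 0; r < 4; ++r) {      // 16 multiplications in total
            temp += Su[r] * img.at(i - 1 + r, j - 1 + c);
        }
        result += Sv[c] * temp;            // 4 more multiplications
    }
    return result;
}

// Step 4: fall back to the source pixel when the result deviates too much.
double despike(double result, double src, double threshold) {
    return std::fabs(result - src) > threshold ? src : result;
}
```

The inner loop performs 16 multiplications and the outer accumulation 4 more, which matches the count of 20 multiplications given above. For any u the four kernel weights sum to 1, so a constant image is reproduced unchanged.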
We chose Intel's SSE2 optimization technology, which is supported only on Pentium 4 and later machines. Whether the current CPU supports SSE2 can be determined with the CPUID instruction:

    BOOL g_bSSE2 = FALSE;
    _asm
    {
        mov eax, 1
        cpuid
        test edx, 0x04000000    ; bit 26 of EDX reports SSE2
        jz NotSupport
        mov g_bSSE2, 1
    NotSupport:
    }

CPUs that support SSE2 provide eight 128-bit registers, so a single register can hold 4 points (RGB), which favors parallel computation. For the detailed code, see the function Optimize_Bicubic in Transform.cpp.

Problems encountered during optimization:

1. Each pixel consists of R, G and B channels. Because one SSE2 register holds 16 bytes, 4 bytes are wasted after 4 pixels are loaded, and time must be spent realigning the data from BRGB | RGBR | GBRG | BRGB to 0RGB | 0RGB | 0RGB | 0RGB.
2. When 16 bytes are loaded into a register, the image address is not guaranteed to be 16-byte aligned, so the MOVDQU instruction, which takes more clock cycles (6 or more), has to be used. If the address can be made 16-byte aligned, the MOVDQA instruction (1 clock cycle) can be used instead.
3. To eliminate division and floating-point arithmetic, the weights are scaled up by 256. Each kernel coefficient then needs 2 bytes while the image data use 1 byte per value, so lining the operands up for multiplication wastes half of the SSE2 register space and lengthens the computation. Lowering the kernel precision so that the coefficients fit in 1 byte, on the other hand, greatly reduces the accuracy of the result.
4. We lacked experience with the cycle counts of individual instructions and with whether a given sequence of instructions can be pipelined in parallel.

Appendix: SSE2 instruction summary

Arithmetic instructions:

ADDPD - Packed Double-Precision Floating-Point Add
SSE2. Adds 2 doubles element-wise.
ADDPD xmm0, xmm1/m128

ADDPS - Packed Single-Precision Floating-Point Add
SSE. Adds 4 floats element-wise.
ADDPS xmm0, xmm1/m128

ADDSD - Scalar Double-Precision Floating-Point Add
SSE2. Adds 1 double (low element).
ADDSD xmm0, xmm1/m64

ADDSS - Scalar Single-Precision Floating-Point Add
SSE. Adds 1 float (low element).
ADDSS xmm0, xmm1/m32

PADDB/PADDW/PADDD - Packed Add
Opcode | Instruction | Description
0F FC /r | PADDB mm, mm/m64 | Add packed byte integers from mm/m64 and mm.
66 0F FC /r | PADDB xmm1, xmm2/m128 | Add packed byte integers from xmm2/m128 and xmm1.
0F FD /r | PADDW mm, mm/m64 | Add packed word integers from mm/m64 and mm.
66 0F FD /r | PADDW xmm1, xmm2/m128 | Add packed word integers from xmm2/m128 and xmm1.
0F FE /r | PADDD mm, mm/m64 | Add packed doubleword integers from mm/m64 and mm.
66 0F FE /r | PADDD xmm1, xmm2/m128 | Add packed doubleword integers from xmm2/m128 and xmm1.

PADDQ - Packed Quadword Add
Opcode | Instruction | Description
0F D4 /r | PADDQ mm1, mm2/m64 | Add quadword integer mm2/m64 to mm1.
66 0F D4 /r | PADDQ xmm1, xmm2/m128 | Add packed quadword integers xmm2/m128 to xmm1.

PADDSB/PADDSW - Packed Add with Saturation
Opcode | Instruction | Description
0F EC /r | PADDSB mm, mm/m64 | Add packed signed byte integers from mm/m64 and mm and saturate the results.
66 0F EC /r | PADDSB xmm1, xmm2/m128 | Add packed signed byte integers from xmm2/m128 and xmm1 and saturate the results.
0F ED /r | PADDSW mm, mm/m64 | Add packed signed word integers from mm/m64 and mm and saturate the results.
66 0F ED /r | PADDSW xmm1, xmm2/m128 | Add packed signed word integers from xmm2/m128 and xmm1 and saturate the results.

PADDUSB/PADDUSW - Packed Add Unsigned with Saturation
Opcode | Instruction | Description
0F DC /r | PADDUSB mm, mm/m64 | Add packed unsigned byte integers from mm/m64 and mm and saturate the results.
66 0F DC /r | PADDUSB xmm1, xmm2/m128 | Add packed unsigned byte integers from xmm2/m128 and xmm1 and saturate the results.
0F DD /r | PADDUSW mm, mm/m64 | Add packed unsigned word integers from mm/m64 and mm and saturate the results.
66 0F DD /r | PADDUSW xmm1, xmm2/m128 | Add packed unsigned word integers from xmm2/m128 to xmm1 and saturate the results.

PMADDWD - Packed Multiply and Add
Opcode | Instruction | Description
0F F5 /r | PMADDWD mm, mm/m64 | Multiply the packed words in mm by the packed words in mm/m64. Add the 32-bit pairs of results and store in mm as doubleword.
66 0F F5 /r | PMADDWD xmm1, xmm2/m128 | Multiply the packed word integers in xmm1 by the packed word integers in xmm2/m128, and add the adjacent doubleword results.

PSADBW - Packed Sum of Absolute Differences
Opcode | Instruction | Description
0F F6 /r | PSADBW mm1, mm2/m64 | Absolute difference of packed unsigned byte integers from mm2/m64 and mm1; differences are then summed to produce an unsigned word integer result.
66 0F F6 /r | PSADBW xmm1, xmm2/m128 | Absolute difference of packed unsigned byte integers from xmm2/m128 and xmm1; the 8 low differences and 8 high differences are then summed separately to produce two word integer results.
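To make the PMADDWD and PSADBW descriptions concrete, here is a scalar C++ model of each. It is a sketch for illustration only: the function names pmaddwd and psadbw8 are hypothetical, no SSE2 intrinsics are used, and psadbw8 models a single 8-byte half of the register. PMADDWD is a natural fit for weights scaled by 256: with u = 0.5 the kernel S(x) gives the weights -0.125, 0.625, 0.625, -0.125, that is -32, 160, 160, -32 after scaling.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of PMADDWD xmm1, xmm2/m128: multiply eight signed words
// pairwise, then add each adjacent pair of 32-bit products.
std::array<std::int32_t, 4> pmaddwd(const std::array<std::int16_t, 8>& a,
                                    const std::array<std::int16_t, 8>& b) {
    std::array<std::int32_t, 4> d{};
    for (int k = 0; k < 4; ++k) {
        const std::int32_t p0 = std::int32_t(a[2 * k]) * std::int32_t(b[2 * k]);
        const std::int32_t p1 = std::int32_t(a[2 * k + 1]) * std::int32_t(b[2 * k + 1]);
        // Add in unsigned arithmetic so the sum wraps instead of overflowing.
        d[k] = static_cast<std::int32_t>(static_cast<std::uint32_t>(p0) +
                                         static_cast<std::uint32_t>(p1));
    }
    return d;
}

// Scalar model of PSADBW on one 8-byte half: sum of absolute differences
// of unsigned bytes, returned as an unsigned word.
std::uint16_t psadbw8(const std::array<std::uint8_t, 8>& a,
                      const std::array<std::uint8_t, 8>& b) {
    unsigned sum = 0;
    for (int k = 0; k < 8; ++k) {
        const int diff = int(a[k]) - int(b[k]);
        sum += static_cast<unsigned>(diff < 0 ? -diff : diff);
    }
    return static_cast<std::uint16_t>(sum);
}
```

With four pixels of value 200 and the weights -32, 160, 160, -32, pmaddwd returns two partial sums of 25600 each. Adding them and shifting right by 8 gives back 200, so the scaling by 256 is undone without any division.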

PSUBB/PSUBW/PSUBD - Packed Subtract
Opcode | Instruction | Description
0F F8 /r | PSUBB mm, mm/m64 | Subtract packed byte integers in mm/m64 from packed byte integers in mm.
66 0F F8 /r | PSUBB xmm1, xmm2/m128 | Subtract packed byte integers in xmm2/m128 from packed byte integers in xmm1.
0F F9 /r | PSUBW mm, mm/m64 | Subtract packed word integers in mm/m64 from packed word integers in mm.
66 0F F9 /r | PSUBW xmm1, xmm2/m128 | Subtract packed word integers in xmm2/m128 from packed word integers in xmm1.
0F FA /r | PSUBD mm, mm/m64 | Subtract packed doubleword integers in mm/m64 from packed doubleword integers in mm.
66 0F FA /r | PSUBD xmm1, xmm2/m128 | Subtract packed doubleword integers in xmm2/m128 from packed doubleword integers in xmm1.

PSUBQ - Packed Subtract Quadword
Opcode | Instruction | Description
0F FB /r | PSUBQ mm1, mm2/m64 | Subtract quadword integer in mm2/m64 from mm1.
66 0F FB /r | PSUBQ xmm1, xmm2/m128 | Subtract packed quadword integers in xmm2/m128 from xmm1.

PSUBSB/PSUBSW - Packed Subtract with Saturation
Opcode | Instruction | Description
0F E8 /r | PSUBSB mm, mm/m64 | Subtract signed packed bytes in mm/m64 from signed packed bytes in mm and saturate results.
66 0F E8 /r | PSUBSB xmm1, xmm2/m128 | Subtract packed signed byte integers in xmm2/m128 from packed signed byte integers in xmm1 and saturate results.
0F E9 /r | PSUBSW mm, mm/m64 | Subtract signed packed words in mm/m64 from signed packed words in mm and saturate results.
66 0F E9 /r | PSUBSW xmm1, xmm2/m128 | Subtract packed signed word integers in xmm2/m128 from packed signed word integers in xmm1 and saturate results.
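The difference between the wrapping forms (PADDB, PSUBB) and the saturating forms (PADDUSB, PADDSW, PSUBSW) matters for pixel data, where a wrapped overflow shows up as a visible artifact. The following one-lane scalar models are a sketch under that reading of the tables above. The function names simply reuse the mnemonics and are not real intrinsics.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// One lane of PADDB: the byte sum wraps modulo 256.
std::uint8_t paddb(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(int(a) + int(b));
}

// One lane of PADDUSB: the unsigned byte sum is clamped to 255.
std::uint8_t paddusb(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(std::min(int(a) + int(b), 255));
}

// One lane of PADDSW: the signed word sum is clamped to [-32768, 32767].
std::int16_t paddsw(std::int16_t a, std::int16_t b) {
    const int s = int(a) + int(b);
    return static_cast<std::int16_t>(std::max(-32768, std::min(s, 32767)));
}

// One lane of PSUBSW: the signed word difference is clamped the same way.
std::int16_t psubsw(std::int16_t a, std::int16_t b) {
    const int s = int(a) - int(b);
    return static_cast<std::int16_t>(std::max(-32768, std::min(s, 32767)));
}
```

For example, adding the bytes 200 and 100 yields 44 with the wrapping model but 255 with the unsigned saturating one.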

PSUBUSB/PSUBUSW - Packed Subtract Unsigned with Saturation
Opcode | Instruction | Description
0F D8 /r | PSUBUSB mm, mm/m64 | Subtract unsigned packed bytes in mm/m64 from unsigned packed bytes in mm and saturate result.
66 0F D8 /r | PSUBUSB xmm1, xmm2/m128 | Subtract packed unsigned byte integers in xmm2/m128 from packed unsigned byte integers in xmm1 and saturate result.
0F D9 /r | PSUBUSW mm, mm/m64 | Subtract unsigned packed words in mm/m64 from unsigned packed words in mm and saturate result.
66 0F D9 /r | PSUBUSW xmm1, xmm2/m128 | Subtract packed unsigned word integers in xmm2/m128 from packed unsigned word integers in xmm1 and saturate result.

SUBPD - Packed Double-Precision Floating-Point Subtract
Opcode | Instruction | Description
66 0F 5C /r | SUBPD xmm1, xmm2/m128 | Subtract packed double-precision floating-point values in xmm2/m128 from xmm1.

SUBPS - Packed Single-Precision Floating-Point Subtract
Opcode | Instruction | Description
0F 5C /r | SUBPS xmm1, xmm2/m128 | Subtract packed single-precision floating-point values in xmm2/mem from xmm1.

SUBSD - Scalar Double-Precision Floating-Point Subtract
Opcode | Instruction | Description
F2 0F 5C /r | SUBSD xmm1, xmm2/m64 | Subtracts the low double-precision floating-point numbers in xmm2/mem64 from xmm1.

SUBSS - Scalar Single-FP Subtract
Opcode | Instruction | Description
F3 0F 5C /r | SUBSS xmm1, xmm2/m32 | Subtract the lower single-precision floating-point numbers in xmm2/m32 from xmm1.

PMULHUW - Packed Multiply High Unsigned
Opcode | Instruction | Description
0F E4 /r | PMULHUW mm1, mm2/m64 | Multiply the packed unsigned word integers in mm1 register and mm2/m64, and store the high 16 bits of the results in mm1.
66 0F E4 /r | PMULHUW xmm1, xmm2/m128 | Multiply the packed unsigned word integers in xmm1 and xmm2/m128, and store the high 16 bits of the results in xmm1.

PMULHW - Packed Multiply High Signed
Opcode | Instruction | Description
0F E5 /r | PMULHW mm, mm/m64 | Multiply the packed signed word integers in mm1 register and mm2/m64, and store the high 16 bits of the results in mm1.
66 0F E5 /r | PMULHW xmm1, xmm2/m128 | Multiply the packed signed word integers in xmm1 and xmm2/m128, and store the high 16 bits of the results in xmm1.

PMULLW - Packed Multiply Low Signed
Opcode | Instruction | Description
0F D5 /r | PMULLW mm, mm/m64 | Multiply the packed signed word integers in mm1 register and mm2/m64, and store the low 16 bits of the results in mm1.
66 0F D5 /r | PMULLW xmm1, xmm2/m128 | Multiply the packed signed word integers in xmm1 and xmm2/m128, and store the low 16 bits of the results in xmm1.

PMULUDQ - Multiply Doubleword Unsigned
Opcode | Instruction | Description
0F F4 /r | PMULUDQ mm1, mm2/m64 | Multiply unsigned doubleword integer in mm1 by unsigned doubleword integer in mm2/m64, and store the quadword result in mm1.
66 0F F4 /r | PMULUDQ xmm1, xmm2/m128 | Multiply packed unsigned doubleword integers in xmm1 by packed unsigned doubleword integers in xmm2/m128, and store the quadword results in xmm1.

PMULUDQ instruction with 64-bit operands:
DEST[63-0] ← DEST[31-0] * SRC[31-0];
PMULUDQ instruction with 128-bit operands:
DEST[63-0] ← DEST[31-0] * SRC[31-0];
DEST[127-64] ← DEST[95-64] * SRC[95-64];
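The integer multiplies above split a product into halves, which is easy to misread from the tables alone. The one-lane scalar models below are a sketch of that behavior. The function names reuse the mnemonics for readability, and fullProduct is a hypothetical helper showing how the PMULHW and PMULLW halves recombine into the 32-bit product of a fixed-point weight and a pixel value.

```cpp
#include <cassert>
#include <cstdint>

// One lane of PMULLW: low 16 bits of the signed 16 x 16 product.
std::uint16_t pmullw(std::int16_t a, std::int16_t b) {
    const std::uint32_t p = static_cast<std::uint32_t>(std::int32_t(a) * std::int32_t(b));
    return static_cast<std::uint16_t>(p & 0xFFFFu);
}

// One lane of PMULHW: high 16 bits of the signed 16 x 16 product.
std::int16_t pmulhw(std::int16_t a, std::int16_t b) {
    const std::uint32_t p = static_cast<std::uint32_t>(std::int32_t(a) * std::int32_t(b));
    return static_cast<std::int16_t>(static_cast<std::uint16_t>(p >> 16));
}

// One lane of PMULHUW: high 16 bits of the unsigned 16 x 16 product.
std::uint16_t pmulhuw(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>((std::uint32_t(a) * std::uint32_t(b)) >> 16);
}

// One lane of PMULUDQ: full 64-bit product of two unsigned doublewords.
std::uint64_t pmuludq(std::uint32_t a, std::uint32_t b) {
    return std::uint64_t(a) * std::uint64_t(b);
}

// Hypothetical helper: rebuild the 32-bit signed product from both halves.
std::int32_t fullProduct(std::int16_t a, std::int16_t b) {
    const std::uint32_t hi = static_cast<std::uint16_t>(pmulhw(a, b));
    return static_cast<std::int32_t>((hi << 16) | pmullw(a, b));
}
```

With the weight -32 (that is, -0.125 scaled by 256) and the pixel value 200, the low half is 0xE700 and the high half is -1, which together encode the product -6400.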

MULPD - Packed Double-Precision Floating-Point Multiply
Opcode | Instruction | Description
66 0F 59 /r | MULPD xmm1, xmm2/m128 | Multiply packed double-precision floating-point values in xmm2/m128 by xmm1.
DEST[63-0] ← DEST[63-0] * SRC[63-0];
DEST[127-64] ← DEST[127-64] * SRC[127-64];

MULPS - Packed Single-Precision Floating-Point Multiply
Opcode | Instruction | Description
0F 59 /r | MULPS xmm1, xmm2/m128 | Multiply packed single-precision floating-point values in xmm2/mem by xmm1.
DEST[31-0] ← DEST[31-0] * SRC[31-0];
DEST[63-32] ← DEST[63-32] * SRC[63-32];
DEST[95-64] ← DEST[95-64] * SRC[95-64];
DEST[127-96] ← DEST[127-96] * SRC[127-96];

MULSD - Scalar Double-Precision Floating-Point Multiply
Opcode | Instruction | Description
F2 0F 59 /r | MULSD xmm1, xmm2/m64 | Multiply the low double-precision floating-point value in xmm2/mem64 by low double-precision floating-point value in xmm1.
DEST[63-0] ← DEST[63-0] * xmm2/m64[63-0];
(* DEST[127-64] remains unchanged *)

MULSS - Scalar Single-FP Multiply
Opcode | Instruction | Description
F3 0F 59 /r | MULSS xmm1, xmm2/m32 | Multiply the low single-precision floating-point value in xmm2/mem by the low single-precision floating-point value in xmm1.
DEST[31-0] ← DEST[31-0] * SRC[31-0];
(* DEST[127-32] remains unchanged *)

DIVPD - Packed Double-Precision Floating-Point Divide
DIVPD xmm0, xmm1/m128
DEST[63-0] ← DEST[63-0] / (SRC[63-0]);
DEST[127-64] ← DEST[127-64] / (SRC[127-64]);

DIVPS - Packed Single-Precision Floating-Point Divide
DIVPS xmm0, xmm1/m128
DEST[31-0] ← DEST[31-0] / (SRC[31-0]);
DEST[63-32] ← DEST[63-32] / (SRC[63-32]);
DEST[95-64] ← DEST[95-64] / (SRC[95-64]);
DEST[127-96] ← DEST[127-96] / (SRC[127-96]);

DIVSD - Scalar Double-Precision Floating-Point Divide
DIVSD xmm0, xmm1/m64
DEST[63-0] ← DEST[63-0] / SRC[63-0];
(* DEST[127-64] remains unchanged *)

DIVSS - Scalar Single-Precision Floating-Point Divide
DIVSS xmm0, xmm1/m32
DEST[31-0] ← DEST[31-0] / SRC[31-0];
(* DEST[127-32] remains unchanged *)

PAVGB/PAVGW - Packed Average
Opcode | Instruction | Description
0F E0 /r | PAVGB mm1, mm2/m64 | Average packed unsigned byte integers from mm2/m64 and mm1, with rounding.
66 0F E0 /r | PAVGB xmm1, xmm2/m128 | Average packed unsigned byte integers from xmm2/m128 and xmm1, with rounding.
0F E3 /r | PAVGW mm1, mm2/m64 | Average packed unsigned word integers from mm2/m64 and mm1, with rounding.
66 0F E3 /r | PAVGW xmm1, xmm2/m128 | Average packed unsigned word integers from xmm2/m128 and xmm1, with rounding.

PMAXSW - Packed Signed Integer Word Maximum
Opcode | Instruction | Description
0F EE /r | PMAXSW mm1, mm2/m64 | Compare signed word integers in mm2/m64 and mm1 for maximum values.
66 0F EE /r | PMAXSW xmm1, xmm2/m128 | Compare signed word integers in xmm2/m128 and xmm1 for maximum values.

PMAXUB - Packed Unsigned Integer Byte Maximum
Opcode | Instruction | Description
0F DE /r | PMAXUB mm1, mm2/m64 | Compare unsigned byte integers in mm2/m64 and mm1 for maximum values.
66 0F DE /r | PMAXUB xmm1, xmm2/m128 | Compare unsigned byte integers in xmm2/m128 and xmm1 for maximum values.

PMINSW - Packed Signed Integer Word Minimum
Opcode | Instruction | Description
0F EA /r | PMINSW mm1, mm2/m64 | Compare signed word integers in mm2/m64 and mm1 for minimum values.
66 0F EA /r | PMINSW xmm1, xmm2/m128 | Compare signed word integers in xmm2/m128 and xmm1 for minimum values.

PMINUB - Packed Unsigned Integer Byte Minimum
Opcode | Instruction | Description
0F DA /r | PMINUB mm1, mm2/m64 | Compare unsigned byte integers in mm2/m64 and mm1 for minimum values.
66 0F DA /r | PMINUB xmm1, xmm2/m128 | Compare unsigned byte integers in xmm2/m128 and xmm1 for minimum values.

RCPPS - Packed Single-Precision Floating-Point Reciprocal
Opcode | Instruction | Description
0F 53 /r | RCPPS xmm1, xmm2/m128 | Returns to xmm1 the packed approximations of the reciprocals of the packed single-precision floating-point values in xmm2/m128.
DEST[31-0] ← APPROXIMATE(1.0/(SRC[31-0]));
DEST[63-32] ← APPROXIMATE(1.0/(SRC[63-32]));
DEST[95-64] ← APPROX