
Applying SSE Optimization Algorithms in Delphi

Published 2022-10-07 | Document ID: 159005272
Applying SSE Optimization Algorithms in Delphi, Part 3: Optimizing the CRC-32C (Castagnoli) Checksum

Author: CodeGame

CRC-32C (Castagnoli) is the checksum algorithm used by iSCSI and SCTP. It differs from the common CRC-32 (IEEE 802.3) only in its polynomial constant: 0x1EDC6F41 for CRC-32C versus 0x04C11DB7 for CRC-32. Apart from the lookup table generated from that constant, the two algorithms are identical.

The conventional, table-driven CRC-32C implementation is as follows:

    function _CRC32CX86(Data: PByte; aLength: Integer): DWORD;
    const
      // CRC32C lookup table (256 entries)
      _CRC32CTable: array[0..255] of DWORD = (
        $00000000, $F26B8303, $E13B70F7, $1350F3F4,
        $C79A971F, $35F1141C, $26A1E7E8, $D4CA64EB,
        $8AD958CF, $78B2DBCC, $6BE22838, $9989AB3B,
        $4D43CFD0, $BF284CD3, $AC78BF27, $5E133C24,
        $105EC76F, $E235446C, $F165B798, $030E349B,
        $D7C45070, $25AFD373, $36FF2087, $C494A384,
        $9A879FA0, $68EC1CA3, $7BBCEF57, $89D76C54,
        $5D1D08BF, $AF768BBC, $BC267848, $4E4DFB4B,
        $20BD8EDE, $D2D60DDD, $C186FE29, $33ED7D2A,
        $E72719C1, $154C9AC2, $061C6936, $F477EA35,
        $AA64D611, $580F5512, $4B5FA6E6, $B93425E5,
        $6DFE410E, $9F95C20D, $8CC531F9, $7EAEB2FA,
        $30E349B1, $C288CAB2, $D1D83946, $23B3BA45,
        $F779DEAE, $05125DAD, $1642AE59, $E4292D5A,
        $BA3A117E, $4851927D, $5B016189, $A96AE28A,
        $7DA08661, $8FCB0562, $9C9BF696, $6EF07595,
        $417B1DBC, $B3109EBF, $A0406D4B, $522BEE48,
        $86E18AA3, $748A09A0, $67DAFA54, $95B17957,
        $CBA24573, $39C9C670, $2A993584, $D8F2B687,
        $0C38D26C, $FE53516F, $ED03A29B, $1F682198,
        $5125DAD3, $A34E59D0, $B01EAA24, $42752927,
        $96BF4DCC, $64D4CECF, $77843D3B, $85EFBE38,
        $DBFC821C, $2997011F, $3AC7F2EB, $C8AC71E8,
        $1C661503, $EE0D9600, $FD5D65F4, $0F36E6F7,
        $61C69362, $93AD1061, $80FDE395, $72966096,
        $A65C047D, $5437877E, $4767748A, $B50CF789,
        $EB1FCBAD, $197448AE, $0A24BB5A, $F84F3859,
        $2C855CB2, $DEEEDFB1, $CDBE2C45, $3FD5AF46,
        $7198540D, $83F3D70E, $90A324FA, $62C8A7F9,
        $B602C312, $44694011, $5739B3E5, $A55230E6,
        $FB410CC2, $092A8FC1, $1A7A7C35, $E811FF36,
        $3CDB9BDD, $CEB018DE, $DDE0EB2A, $2F8B6829,
        $82F63B78, $709DB87B, $63CD4B8F, $91A6C88C,
        $456CAC67, $B7072F64, $A457DC90, $563C5F93,
        $082F63B7, $FA44E0B4, $E9141340, $1B7F9043,
        $CFB5F4A8, $3DDE77AB, $2E8E845F, $DCE5075C,
        $92A8FC17, $60C37F14, $73938CE0, $81F80FE3,
        $55326B08, $A759E80B, $B4091BFF, $466298FC,
        $1871A4D8, $EA1A27DB, $F94AD42F, $0B21572C,
        $DFEB33C7, $2D80B0C4, $3ED04330, $CCBBC033,
        $A24BB5A6, $502036A5, $4370C551, $B11B4652,
        $65D122B9, $97BAA1BA, $84EA524E, $7681D14D,
        $2892ED69, $DAF96E6A, $C9A99D9E, $3BC21E9D,
        $EF087A76, $1D63F975, $0E330A81, $FC588982,
        $B21572C9, $407EF1CA, $532E023E, $A145813D,
        $758FE5D6, $87E466D5, $94B49521, $66DF1622,
        $38CC2A06, $CAA7A905, $D9F75AF1, $2B9CD9F2,
        $FF56BD19, $0D3D3E1A, $1E6DCDEE, $EC064EED,
        $C38D26C4, $31E6A5C7, $22B65633, $D0DDD530,
        $0417B1DB, $F67C32D8, $E52CC12C, $1747422F,
        $49547E0B, $BB3FFD08, $A86F0EFC, $5A048DFF,
        $8ECEE914, $7CA56A17, $6FF599E3, $9D9E1AE0,
        $D3D3E1AB, $21B862A8, $32E8915C, $C083125F,
        $144976B4, $E622F5B7, $F5720643, $07198540,
        $590AB964, $AB613A67, $B831C993, $4A5A4A90,
        $9E902E7B, $6CFBAD78, $7FAB5E8C, $8DC0DD8F,
        $E330A81A, $115B2B19, $020BD8ED, $F0605BEE,
        $24AA3F05, $D6C1BC06, $C5914FF2, $37FACCF1,
        $69E9F0D5, $9B8273D6, $88D28022, $7AB90321,
        $AE7367CA, $5C18E4C9, $4F48173D, $BD23943E,
        $F36E6F75, $0105EC76, $12551F82, $E03E9C81,
        $34F4F86A, $C69F7B69, $D5CF889D, $27A40B9E,
        $79B737BA, $8BDCB4B9, $988C474D, $6AE7C44E,
        $BE2DA0A5, $4C4623A6, $5F16D052, $AD7D5351);
    var
      I: Integer;
    begin
      Result := $FFFFFFFF;                 // initial CRC value
      for I := 0 to aLength - 1 do
      begin
        // fold one input byte into the CRC through the lookup table
        Result := (Result shr 8) xor _CRC32CTable[(Result and $FF) xor Data^];
        Inc(Data);
      end;
      Result := not Result;                // final bit inversion
    end;

Part of the CRC-32C implementation optimized with the SSE4.2 hardware instruction is shown below:

    function _CRC32CSSE(Data: PByte; aLength: Integer): DWORD;
    asm
      // register calling convention: EAX = Data, EDX = aLength
      push esi
      push edx
      push ecx
      mov  esi, eax            // ESI = data pointer
      mov  eax, $FFFFFFFF      // initial CRC value
      test edx, edx
      jz   @Exit               // zero length
      test esi, esi
      jz   @Exit               // nil pointer
      mov  ecx, edx
      shr  ecx, 2              // ECX = number of whole DWORDs
      test ecx, ecx
      jz   @Exit
      xor  edx, edx            // EDX = DWORD index
    @Alignment:
      // consume 4 bytes per iteration; this excerpt does not process
      // the 1 to 3 trailing bytes when aLength is not a multiple of 4
      crc32 eax, dword ptr [edx*4+esi]
      inc  edx
      cmp  edx, ecx
      jb   @Alignment
    @Exit:
      not  eax                 // final bit inversion
      pop  ecx
      pop  edx
      pop  esi
    end;

The two implementations were benchmarked on an Intel Core i7 720QM 1.60GHz CPU with the results below. The data was generated randomly; "1M *100" means the calculation was repeated 100 times over 1M of data, equivalent to 100M in total. The last column is the conventional time divided by the optimized time, expressed as a percentage.

| Data size | Conventional time | Optimized time | Speed ratio |
|---|---|---|---|
| 1M *100 | X86 Time: 390ms | SSE Time: 32ms | 1218% |
| 4M *100 | X86 Time: 1575ms | SSE Time: 156ms | 1009% |
| 8M *100 | X86 Time: 3136ms | SSE Time: 280ms | 1120% |
| 32M *100 | X86 Time: 12542ms | SSE Time: 1092ms | 1148% |

The comparison shows that the crc32 instruction introduced in SSE4.2 runs at least 10 times faster than the conventional CRC-32C routine in these tests. Intel's newer instructions do provide efficient solutions for certain common algorithms, and using them well can bring a substantial improvement to future development work.
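The article states that CRC-32C and CRC-32 differ only in the table generated from the polynomial constant. As a minimal sketch of how such a table can be derived, the routine below builds the 256 entries from the bit-reversed form of 0x1EDC6F41. The names `TCRC32CTable` and `BuildCRC32CTable` are hypothetical and do not come from the original article; the author's code uses a hard-coded constant table instead.

```pascal
type
  // Hypothetical type name, introduced only for this illustration.
  TCRC32CTable = array[0..255] of LongWord;

// Fills Table with the lookup values for the reflected (LSB-first)
// table-driven CRC, which is the form the shr-based loop expects.
procedure BuildCRC32CTable(out Table: TCRC32CTable);
const
  // $82F63B78 is the CRC-32C polynomial $1EDC6F41 with its 32 bits reversed.
  CRC32C_POLY_REFLECTED = LongWord($82F63B78);
var
  I, Bit: Integer;
  Value: LongWord;
begin
  for I := 0 to 255 do
  begin
    Value := LongWord(I);
    // Process the 8 bits of the index, least significant bit first.
    for Bit := 0 to 7 do
      if (Value and 1) <> 0 then
        Value := (Value shr 1) xor CRC32C_POLY_REFLECTED
      else
        Value := Value shr 1;
    Table[I] := Value;
  end;
end;
```

With this sketch, `Table[1]` comes out as `$F26B8303` and `Table[128]` as `$82F63B78`, which agrees with the hard-coded table in the article. Swapping in the reflected CRC-32 polynomial would yield the IEEE 802.3 table while leaving the per-byte loop unchanged.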

    Test demo for this example:
    * References: Intel SSE4 Programming Reference (D91561-001).
