From: Skybuck Flying on 13 Jun 2010 07:40

I suspect there is a bug in this assembler routine; you'd be a hero and big news if you can find it! ;) :)

I suspect the problem is with copying large blocks of memory which have a weird size like +1, +2, +3, +4, +5, +6, +7, so anything that ends partway through a memory cell of, say, 4 or 8 bytes... either that or something else is going on...

It seems to be using floating point registers... maybe those somehow screw up ?!?

Below is the Pascal version followed by the assembler version, which is probably the one that is being used...

Good luck finding the bug since that's some shitty complex assembler code ?! ;) :)

(* ***** BEGIN LICENSE BLOCK *****
 *
 * The assembly function Move is licensed under the CodeGear license terms.
 *
 * The initial developer of the original code is Fastcode
 *
 * Portions created by the initial developer are Copyright (C) 2002-2004
 * the initial developer. All Rights Reserved.
 *
 * Contributor(s): John O'Harrow
 *
 * ***** END LICENSE BLOCK ***** *)

procedure Move(const Source; var Dest; count : Integer);
{$IFDEF PUREPASCAL}
var
  S, D: PChar;
  I: Integer;
begin
  S := PChar(@Source);
  D := PChar(@Dest);
  if S = D then Exit;
  if Cardinal(D) > Cardinal(S) then
    for I := count-1 downto 0 do
      D[I] := S[I]
  else
    for I := 0 to count-1 do
      D[I] := S[I];
end;
{$ELSE}
asm
  cmp     eax, edx
  je      @@Exit {Source = Dest}
  cmp     ecx, 32
  ja      @@LargeMove {Count > 32 or Count < 0}
  sub     ecx, 8
  jg      @@SmallMove
@@TinyMove: {0..8 Byte Move}
  jmp     dword ptr [@@JumpTable+32+ecx*4]
@@SmallMove: {9..32 Byte Move}
  fild    qword ptr [eax+ecx] {Load Last 8}
  fild    qword ptr [eax] {Load First 8}
  cmp     ecx, 8
  jle     @@Small16
  fild    qword ptr [eax+8] {Load Second 8}
  cmp     ecx, 16
  jle     @@Small24
  fild    qword ptr [eax+16] {Load Third 8}
  fistp   qword ptr [edx+16] {Save Third 8}
@@Small24:
  fistp   qword ptr [edx+8] {Save Second 8}
@@Small16:
  fistp   qword ptr [edx] {Save First 8}
  fistp   qword ptr [edx+ecx] {Save Last 8}
@@Exit:
  ret
  nop {4-Byte Align JumpTable}
  nop
@@JumpTable: {4-Byte Aligned}
  dd      @@Exit, @@M01, @@M02, @@M03, @@M04, @@M05, @@M06, @@M07, @@M08
@@LargeForwardMove: {4-Byte Aligned}
  push    edx
  fild    qword ptr [eax] {First 8}
  lea     eax, [eax+ecx-8]
  lea     ecx, [ecx+edx-8]
  fild    qword ptr [eax] {Last 8}
  push    ecx
  neg     ecx
  and     edx, -8 {8-Byte Align Writes}
  lea     ecx, [ecx+edx+8]
  pop     edx
@FwdLoop:
  fild    qword ptr [eax+ecx]
  fistp   qword ptr [edx+ecx]
  add     ecx, 8
  jl      @FwdLoop
  fistp   qword ptr [edx] {Last 8}
  pop     edx
  fistp   qword ptr [edx] {First 8}
  ret
@@LargeMove:
  jng     @@LargeDone {Count < 0}
  cmp     eax, edx
  ja      @@LargeForwardMove
  sub     edx, ecx
  cmp     eax, edx
  lea     edx, [edx+ecx]
  jna     @@LargeForwardMove
  sub     ecx, 8 {Backward Move}
  push    ecx
  fild    qword ptr [eax+ecx] {Last 8}
  fild    qword ptr [eax] {First 8}
  add     ecx, edx
  and     ecx, -8 {8-Byte Align Writes}
  sub     ecx, edx
@BwdLoop:
  fild    qword ptr [eax+ecx]
  fistp   qword ptr [edx+ecx]
  sub     ecx, 8
  jg      @BwdLoop
  pop     ecx
  fistp   qword ptr [edx] {First 8}
  fistp   qword ptr [edx+ecx] {Last 8}
@@LargeDone:
  ret
@@M01:
  movzx   ecx, [eax]
  mov     [edx], cl
  ret
@@M02:
  movzx   ecx, word ptr [eax]
  mov     [edx], cx
  ret
@@M03:
  mov     cx, [eax]
  mov     al, [eax+2]
  mov     [edx], cx
  mov     [edx+2], al
  ret
@@M04:
  mov     ecx, [eax]
  mov     [edx], ecx
  ret
@@M05:
  mov     ecx, [eax]
  mov     al, [eax+4]
  mov     [edx], ecx
  mov     [edx+4], al
  ret
@@M06:
  mov     ecx, [eax]
  mov     ax, [eax+4]
  mov     [edx], ecx
  mov     [edx+4], ax
  ret
@@M07:
  mov     ecx, [eax]
  mov     eax, [eax+3]
  mov     [edx], ecx
  mov     [edx+3], eax
  ret
@@M08:
  fild    qword ptr [eax]
  fistp   qword ptr [edx]
end;
{$ENDIF}

"Skybuck Flying" <IntoTheFuture(a)hotmail.com> wrote in message
news:5ee6b$4c14c230$54190f09$19681(a)cache4.tilbu1.nb.home.nl...
> Hello my video codec has detected a strange problem with the "move"
> routine of Delphi 2007.
>
> The bug seems to go away when I do a manual copy of a frame instead of
> using the move routine like so:
> (Not only that... but it becomes faster too ?!)
>
> It only happens for some frames and not all frames, so it seems to be
> input dependent ?!?
>
> I am guessing that the move routine fails if the bytes end up halfway into a
> 32 bit cell...
>
> I am guessing it does not copy the last 2 or 3 bytes of the last cell...
>
> For example:
>
> (800x600x3+1) bytes might fail because the size is not a multiple of 4
> bytes ?!?
>
> So either the move routine is bugged or something else is going on which
> seems unlikely
> since the bug goes away ?!?!??
>
> // bugged:
> (*
> // remember the original input
> procedure TFastVideoCompressor.Remember;
> begin
>   // remember current/input frame
>   move( mInput^, mPreviousFrame^, mSize );
> end;
> *)
>
> // correct:
> procedure TFastVideoCompressor.Remember;
> var
>   vIndex : integer;
>   vInput : Prgb;
>   vPreviousFramePixel : Prgb;
> begin
>   // remember current/output frame
>   // move( mInput^, mPreviousFrame^, mSize );
>
>   vInput := mInput;
>   vPreviousFramePixel := mPreviousFrame;
>
>   if mArea > 0 then
>   for vIndex := 0 to mArea-1 do
>   begin
>     vPreviousFramePixel.mBlue := vInput.mBlue;
>     vPreviousFramePixel.mGreen := vInput.mGreen;
>     vPreviousFramePixel.mRed := vInput.mRed;
>
>     longword(vInput) := longword(vInput) + SizeOf(Trgb);
>     longword(vPreviousFramePixel) := longword(vPreviousFramePixel) +
>       SizeOf(Trgb);
>   end;
>
> end;

Bye,
  Skybuck.
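
The size theory in this post can be tested directly rather than guessed at. The following is only a minimal sketch, assuming a console program: the names MoveIsCorrect, GUARD, GUARD_BYTE and BASE_SIZE and the fill pattern are made up for illustration, and 800x600x3 is taken from the example in the quoted message. It copies with Move at sizes +0..+7 and at destination offsets 0..7, then checks the copied bytes and the guard bytes around them one by one.

```pascal
program MoveSizeTest;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

const
  GUARD      = 16;             // hypothetical number of guard bytes after the copy
  GUARD_BYTE = $CD;            // hypothetical marker value for untouched bytes
  BASE_SIZE  = 800 * 600 * 3;  // frame size from the example in the post

// Copies Count bytes with Move to Dst[Offset] and verifies the result.
// Returns True when every copied byte matches the source and every byte
// before and after the copied range still holds GUARD_BYTE.
function MoveIsCorrect(Count, Offset: Integer): Boolean;
var
  Src, Dst: array of Byte;
  I: Integer;
begin
  SetLength(Src, Count + 1);              // +1 so Src[0] exists when Count = 0
  SetLength(Dst, Offset + Count + GUARD);
  for I := 0 to Count - 1 do
    Src[I] := (I * 7 + 1) and $FF;        // simple non-constant pattern
  FillChar(Dst[0], Length(Dst), GUARD_BYTE);

  Move(Src[0], Dst[Offset], Count);

  Result := True;
  for I := 0 to Offset - 1 do             // bytes before the copy
    if Dst[I] <> GUARD_BYTE then Result := False;
  for I := 0 to Count - 1 do              // the copied bytes
    if Dst[Offset + I] <> Src[I] then Result := False;
  for I := Offset + Count to High(Dst) do // bytes after the copy
    if Dst[I] <> GUARD_BYTE then Result := False;
end;

var
  Extra, Offset, Failures: Integer;
begin
  Failures := 0;
  for Extra := 0 to 7 do
    for Offset := 0 to 7 do
      if not MoveIsCorrect(BASE_SIZE + Extra, Offset) then
      begin
        Inc(Failures);
        WriteLn('mismatch: size ', BASE_SIZE + Extra, ', offset ', Offset);
      end;
  WriteLn('failures: ', Failures);
  if Failures > 0 then Halt(1);
end.
```

If this reports no failures, odd sizes alone are unlikely to be the cause. Note also that the two versions of Remember quoted above copy the same bytes only when mSize equals mArea * SizeOf(Trgb).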
From: Skybuck Flying on 13 Jun 2010 07:46

Actually now that I think about it... the frame is perfectly aligned on memory cells it seems...

So there must be some big bug in this code... or there is some weird floating point bug in my processor ?!?

(See the code in the previous posting in this thread.)

Bye,
  Skybuck.
From: Skybuck Flying on 13 Jun 2010 07:51

New theory:

Maybe the routine only fails if other floating point routines/calculations are done around it...

Like before/after the move... and then trying to do the move again and then some more calculations and so forth...

Bye,
  Skybuck.
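
This theory can be probed with a small experiment as well. What follows is a rough sketch, not a proof either way: the function name CopySurvivesFloatWork and the particular calculations are invented for illustration. It does floating-point work before and after a Move and checks the copy with CompareMem from SysUtils. One caveat: an fild/fistp copy can store garbage if the x87 register stack is already full on entry (for example after MMX code without emms, or unbalanced assembler), and plain Pascal code like this does not put the FPU in that state, so a clean run only rules out ordinary calculations as the trigger.

```pascal
program MoveWithFloatWork;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

// Does floating-point work, copies Count bytes with Move, does more
// floating-point work, and returns True when the copy is byte-identical.
function CopySurvivesFloatWork(Count: Integer): Boolean;
var
  Src, Dst: array of Byte;
  I: Integer;
  Acc: Double;
begin
  SetLength(Src, Count + 1);            // +1 so index 0 exists when Count = 0
  SetLength(Dst, Count + 1);
  for I := 0 to Count - 1 do
    Src[I] := (I * 13 + 5) and $FF;     // simple non-constant pattern
  FillChar(Dst[0], Length(Dst), 0);

  Acc := 0.0;
  for I := 1 to 1000 do
    Acc := Acc + Sqrt(I) / 3.0;         // float work before the copy

  Move(Src[0], Dst[0], Count);

  for I := 1 to 1000 do
    Acc := Acc * 0.999 + Sin(I);        // float work after the copy

  // Acc is part of the result only so the calculations are not dead code.
  Result := CompareMem(@Src[0], @Dst[0], Count) and (Acc > 0.0);
end;

var
  Extra: Integer;
begin
  for Extra := 0 to 7 do
    if not CopySurvivesFloatWork(800 * 600 * 3 + Extra) then
      WriteLn('copy differs at size ', 800 * 600 * 3 + Extra);
  WriteLn('done');
end.
```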
From: Skybuck Flying on 13 Jun 2010 07:58

Maybe it's not flawed after all... the bug seemed to disappear but now it's back again...

Weird...

Something else must be the problem :(

Bye,
  Skybuck.
From: Skybuck Flying on 13 Jun 2010 08:42

Well shitty.. I did a little memory test with the computer and no errors were found.. so that can't be it...

So it must be a bug somewhere... it's kinda nasty with all the pointers moving by one and then the sizes needing to be reduced by one... it's a bit messy :)

Gotta find a way to fix that and make it better... still no clue though... could be an algo bug too, don't know...

However one thing is for sure... the CPU is probably not fast enough to do lossless video decoding with multiple compression/transformation methods...

So I might have to switch to GPU decoding at least ;) :)

Bye,
  Skybuck :)
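
Since the pointers and sizes are being adjusted by one in places, it may help to check the byte count passed to Move against the pixel count used by the manual loop. Here is a small sketch under stated assumptions: the field names of Trgb come from the quoted code, but the field order and the packed 3-byte layout are a guess, and FrameBytes and SizesAgree are hypothetical helpers.

```pascal
program FrameSizeCheck;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  // Assumed layout: field names are from the post, order and packing are a guess.
  Trgb = packed record
    mBlue, mGreen, mRed: Byte;
  end;

// Number of bytes the per-pixel loop touches for Area pixels.
function FrameBytes(Area: Integer): Integer;
begin
  Result := Area * SizeOf(Trgb);
end;

// True when Move with Size bytes covers exactly the bytes that the
// per-pixel loop over Area pixels would copy.
function SizesAgree(Size, Area: Integer): Boolean;
begin
  Result := (Area >= 0) and (Size = FrameBytes(Area));
end;

begin
  // A frame of 800x600 pixels, once with a matching size and once off by one.
  WriteLn('exact size agrees:  ', SizesAgree(800 * 600 * 3, 800 * 600));
  WriteLn('size + 1 agrees:    ', SizesAgree(800 * 600 * 3 + 1, 800 * 600));
end.
```

A check like this could sit in front of the Move call in Remember, for example as Assert(SizesAgree(mSize, mArea)), so a size that is one byte too large or too small is caught where it happens.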