Prev: How To Know
Next: Array Problem
From: Bee on 27 Dec 2009 13:34 Based on what I have learned from this discussion, for my current needs, it seems that working with a byte array is faster to do the scan and replace operations. So now I am looking for a fast byte array replace that will replace longer (expand byte array) or shorter (shorten byte array) routines. Converting from string to byte array and back is done only at the beginning and end of the scan and replace op that I need to do. And those conversion ops seem plenty fast like 2 mSec on my PC. Thank you for the great learning experience. This has been highly educational. "Schmidt" wrote: > > "mayayana" <mayaXXyana(a)rcXXn.com> schrieb im Newsbeitrag > news:OHhzDgdhKHA.1236(a)TK2MSFTNGP04.phx.gbl... > > > The VBSpeed samples are interesting, and Olaf's > > code seems to be the clear winner there, but they > > seem to be only testing short strings. And the results > > vary so much for different calls that I wonder what > > value the tests have. In real-world usage a few > > microseconds difference is an absurd measure, especially > > given that the function is seldom called more than a few > > times. I don't see how those samples could be useful > > unless they were repeated on a typical string of perhaps > > 100 KB. > > The practical reason I wrote this highly optimized > Replace-routine was serverside code for dynamically > "glued together" WebPage-snippets (templates), some > of them below 1 or 2kByte, but some of them larger > (10-30kByte). > Building up such an "ready to post back" Webpage involved > quite an amount of such replacements - and this was a few > years back in time, where a "fast CPU" was one of the > Pentium III-class (with only about 500MHz-1GHz clock-freq). > > And the performance gain, compared with the builtin > VB6-Replace-function was really worth it. > > Nowadays the faster CPUs with their larger caches > seem to make its usage obsolete, and you're right, > the Mid-statement is not all that bad in comparison, > just look at Jost Schwiders entries into the list - though > there's a factor 2-4 anyways (on average), compared with the > higher optimized (Integer-Array-mapped) string routines > which follow his contributions. > > From a short test, I can measure about the same difference > of factor 3-4, if I time the performance of your posted > routine with my approach (corrected and tuned somewhat > by Guido Beckmann). > > Factor 4, when tested on my older Pentium III 500MHz, > and reduced to about factor 3, when compared on a > modern CPU. > > And that is comparing a generically working Replace- > routine with a currently "somewhat specialized" > code, which acts on a "limited input-range" currently. > > Since you already work with the Integer-array-mapping, > it would be consequent, to work "entirely within array-space", > also for the "copy-over" of the appropriate parts. > > You could speedup your routine by a reasonable amount, > doing so. > > Here comes a Class (based on Guidos latest corrections), > which I've just adapted a bit, to work without the Typelib > he was using - so that the code is ready for copy and paste: > > '***Into a Class > Option Explicit > > Private Type SafeArray1D > cDims As Integer > fFeatures As Integer > cbElements As Long > cLocks As Long > pvData As Long > cElements1d As Long > lLBound As Long > End Type > > Private Declare Sub BindArray Lib "kernel32" Alias "RtlMoveMemory" _ > (PArr() As Any, PSrc&, Optional ByVal cb& = 4) > Private Declare Sub ReleaseArray Lib "kernel32" Alias "RtlMoveMemory" _ > (PArr() As Any, Optional PSrc& = 0, Optional ByVal cb& = 4) > Private Declare Sub RtlMoveMemory Lib "kernel32" _ > (dst As Any, src As Any, ByVal nBytes&) > > Private Declare Function CharLowerBuffA& Lib "user32" _ > (lpsz As Any, ByVal cchLength&) > Private Declare Function CharLowerBuffW& Lib "user32" _ > (lpsz As Any, ByVal cchLength&) > > Private aSrc%(), saSrc As SafeArray1D > Private aNew%(), saNew As SafeArray1D > Private aOld%(), saOld As SafeArray1D > Private aDst%(), saDst As SafeArray1D > Private aPosFnd&(), ubPosFnd& > Private aLowChars%(&H8000 To &H7FFF) > > Friend Function Replace(Text As String, sOld As String, sNew As String, _ > Optional ByVal Start As Long = 1, _ > Optional ByVal Count As Long = 2147483647, _ > Optional ByVal Compare As VbCompareMethod = vbBinaryCompare _ > ) As String > > Dim c&, i&, j&, cntCpy& > Dim cntFnd&, ptrSrc&, ptrDst& > Dim lenFnd&, lenSrc&, lenNew&, lenNewB& > Dim posFnd&, posOut&, posIn& > Dim ubFnd&, Fnd0%, fSameLen As Boolean > > lenSrc = Len(Text) > lenNew = Len(sNew) > lenFnd = Len(sOld) > ubFnd = lenFnd - 1 > ptrSrc = StrPtr(Text) > > If lenSrc = 0 Then Exit Function > If lenFnd = 0 Then Replace = Text: Exit Function > If Start > 0 Then i = Start - 1 > > saSrc.pvData = ptrSrc > saOld.pvData = StrPtr(sOld) > saNew.pvData = StrPtr(sNew) > > If lenFnd = lenNew Then > fSameLen = True > Replace = Text > > saDst.pvData = StrPtr(Replace) > ' ptrDst = StrPtr(Replace11) > ' saDst.pvData = ptrDst > End If > > c = lenSrc - lenFnd > If Compare = vbBinaryCompare Then > > Fnd0 = aOld(0) > > For i = i To c > 'Inline-Cascading for first Char > If aSrc(i) <> Fnd0 Then i = i + 1: _ > If aSrc(i) <> Fnd0 Then i = i + 1: _ > If aSrc(i) <> Fnd0 Then i = i + 1: _ > If aSrc(i) <> Fnd0 Then i = i + 1: _ > If aSrc(i) <> Fnd0 Then i = i + 1: _ > If aSrc(i) <> Fnd0 Then i = i + 1: _ > If aSrc(i) <> Fnd0 Then i = i + 1: _ > If aSrc(i) <> Fnd0 Then GoTo loopNxt > > If i > c Then Exit For > > 'Search all others > j = ubFnd > Do While j > If aSrc(i + j) <> aOld(j) Then GoTo loopNxt > j = j - 1 > Loop > > cntFnd = cntFnd + 1 > 'Found at Position i (0 based) > If fSameLen Then > j = lenNew > Do: j = j - 1: aDst(i + j) = aNew(j): Loop While j > Else > If cntFnd > ubPosFnd Then > ubPosFnd = ubPosFnd + 512: ReDim Preserve aPosFnd(ubPosFnd) > End If > aPosFnd(cntFnd) = i * 2 > End If > > If cntFnd = Count Then Exit For > i = i + ubFnd > loopNxt: Next i > > > Else 'vbStringCompare > > Fnd0 = aLowChars(aOld(0)) > > For i = i To c > If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _ > If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _ > If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _ > If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _ > If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _ > If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _ > If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _ > If aLowChars(aSrc(i)) <> Fnd0 Then GoTo loopNxt2 > > If i > c Then Exit For > > 'Search all others > j = ubFnd > Do While j > If aLowChars(aSrc(i + j)) <> aLowChars(aOld(j)) Then > GoTo loopNxt2 > End If > j = j - 1 > Loop > 'Found at Position i (0 based) > cntFnd = cntFnd + 1 > If fSameLen Then > j = lenNew > Do: j = j - 1: aDst(i + j) = aNew(j): Loop While j > Else > If cntFnd > ubPosFnd Then > ubPosFnd = ubPosFnd + 512: ReDim Preserve aPosFnd(ubPosFnd) > End If > aPosFnd(cntFnd) = i * 2 > End If > If cntFnd = Count Then Exit For > i = i + ubFnd > loopNxt2: Next i > End If > > 'Generate Output > If fSameLen Then Exit Function > > If cntFnd = 0 Then > Replace = Text > Else > c = lenSrc + (lenNew - lenFnd) * cntFnd > Replace = Space(c) > ptrDst = StrPtr(Replace) > saDst.pvData = ptrDst > > lenFnd = lenFnd * 2 > If lenNew Then > lenNewB = lenNew * 2 > For i = 1 To cntFnd > posFnd = aPosFnd(i) > cntCpy = posFnd - posIn > > If cntCpy > 50 Then > RtlMoveMemory ByVal saDst.pvData, ByVal saSrc.pvData, cntCpy > saDst.pvData = saDst.pvData + cntCpy > ElseIf cntCpy > 0 Then > j = cntCpy \ 2 > Do: j = j - 1: aDst(j) = aSrc(j): Loop While j > saDst.pvData = saDst.pvData + cntCpy > End If > > posIn = posFnd + lenFnd > saSrc.pvData = ptrSrc + posIn > > If lenNew > 50 Then > RtlMoveMemory ByVal saDst.pvData, ByVal saNew.pvData, lenNewB > Else > j = lenNew > Do: j = j - 1: aDst(j) = aNew(j): Loop While j > End If > saDst.pvData = saDst.pvData + lenNewB > Next i > Else > For i = 1 To cntFnd > posFnd = aPosFnd(i) > cntCpy = posFnd - posIn > > If cntCpy > 50 Then > RtlMoveMemory ByVal saDst.pvData, ByVal saSrc.pvData, cntCpy > saDst.pvData = saDst.pvData + cntCpy > ElseIf cntCpy > 0 Then > j = cntCpy \ 2 > Do: j = j - 1: aDst(j) = aSrc(j): Loop While j > saDst.pvData = saDst.pvData + cntCpy > End If > > posIn = posFnd + lenFnd > saSrc.pvData = ptrSrc + posIn > Next i > End If > > c = lenSrc * 2 - posIn > If c > 50 Then > RtlMoveMemory aDst(0), aSrc(0), c > ElseIf c > 0 Then > c = c \ 2 > Do: c = c - 1: aDst(c) = aSrc(c): Loop While c > End If > End If > End Function > > > Private Sub Class_Initialize() > Dim c& > > ubPosFnd = 512: ReDim aPosFnd(ubPosFnd) > > saSrc.cDims = 1 > saSrc.cbElements = 2 > saSrc.cElements1d = &H7FFFFFFF > > saNew = saSrc > saOld = saSrc > saDst = saSrc > > BindArray aSrc, VarPtr(saSrc) > BindArray aNew, VarPtr(saNew) > BindArray aOld, VarPtr(saOld) > BindArray aDst, VarPtr(saDst) > > For c = -32768 To 32767: aLowChars(c) = c: Next c > If CharLowerBuffW(aLowChars(-32768), &H10000) = 0 Then > CharLowerBuffA aLowChars(65), (223 - 65) * 2 > End If > > ' added by donald, 20011210
From: Schmidt on 27 Dec 2009 15:00 "Bee" <Bee(a)discussions.microsoft.com> schrieb im Newsbeitrag news:68BAA7A1-BF2E-4042-8FA3-F9A4845FC29B(a)microsoft.com... > Based on what I have learned from this discussion, > for my current needs, ... Would be good, to have a better description of what your needs really are - is it the same "vbCr-replacement"-problem you described earlier - or some new stuff? With a bit more background better suggestions are possible. > ...it seems that working with a byte array is faster to do > the scan and replace operations. Mostly yes, but one has to decide, if the effort worth it, to write each and every "specialized string-routine" based on Byte- or 16Bit-Integer-Arrays. IMO one does not need to "go there" that often - VBs builtin stuff is sufficient in nearly all cases. > So now I am looking for a fast byte array replace that will replace > longer (expand byte array) or shorter (shorten byte array) routines. This again raises the question, what you're doing (trying) exactly - and why you're looking for more performance ... maybe the bottleneck is at some other end... > Converting from string to byte array and back is done only at > the beginning and end of the scan and replace op that I need > to do. And those conversion ops seem plenty fast like 2 mSec > on my PC. The Demos, which use/show the SafeArray-mapping, avoid explicit conversions to and from ByteArrays - they span something like a "virtual array" over an already existing String-Content. That said, in case one does it explicitely (as copy) before/after processing a "heavy routine" (especially if the passed string- parameter is a larger one) - then the overhead, compared with safearray-mapping is not all that large. The decision, if you should work with ByteArrays throughout the whole process depends on your real problem, e.g. if you start on FileContent, which was safed as ANSI ... but also on how comfortable you want to code (also reflecting a bit on "code-readability" later on) - and of course which performance requirements need to be met. The Mapping-approach is nice insofar as you could switch into (virtual) array-mode only *within* a few (bottleneck) routines maybe - and (aside from these few exceptions) always work with normally passed VB-String-Parameters through the larger process in question. Olaf
From: David Kaye on 27 Dec 2009 16:15 "Schmidt" <sss(a)online.de> wrote: >Careful with the range below Asc("A") - since you would also >cleanup all "Number-Digits" and many "wanted punctuations" >.... not sure, if that was your intent. You don't understand. I specifically said in my post that my test was not the final search/replace I was going to use. I think I also said that I was using it to have enough characters to search/replace for a test. >Repeated calls to Replace (scanning the string >over and over again, to replace different single-chars) >is "horribly inefficient" ... ;-) My experience wasn't like that at all, which was my point. An 80k file with about 5k of replacements or deletions took only 0.009 or 9/100ths of a second. To me that's fast enough, not as fast as the byte array increment thing, but plenty fast.
From: Mike Williams on 27 Dec 2009 16:19 "Bee" <Bee(a)discussions.microsoft.com> wrote in message news:68BAA7A1-BF2E-4042-8FA3-F9A4845FC29B(a)microsoft.com... > Based on what I have learned from this discussion, for my > current needs, it seems that working with a byte array is faster > to do the scan and replace operations. So now I am looking > for a fast byte array replace that will replace longer (expand > byte array) or shorter (shorten byte array) routines. Reading between the lines of your various posts in this and other related threads it appears that you want to perform a large number of different search and replace operations on a very large block of text data and that you are considering performing these operations one at a time, each time returning a suitably modified (and possibly changed in size) copy of the data before then performing the next operation on it, and in order to speed up the entire operation you are attempting to squeeze as much speed as possible out of your "search and replace" operation in the hope that your repeated calls to it for the different tasks will speed up the overall process. At least that's what I gather from the things you have posted. If that is the case then perhaps it is time to think about different ways of doing it because no matter how fast you make the search and replace operation the overall job will be slowed down a lot by your repeated calls to it and the repeated return of large mofified blocks of data. It would be far better to use even a relatively slow but flexible search and replace operation to start with, but to use it in such a way that it performs all of your otherwise separate operations (or at least as many of them as possible) all at the same time, returning the modifed copy of the data just once at the end. You can't always do this entirely of course, but in general you should aim for code that overall reduces the number of times it returns a modified copy of the data to a minimum. Mike
From: Bee on 27 Dec 2009 16:41
Thank you for your response. I am loading a notepad "compatible" file from disk (shows extra control characters as boxes, etc). It may or may not be totally pure printable text. I need to clean out all non-printing characters other than the Tab, CR and LF. I need to make proper paragraphs. So I look for non-end-of-sentence with a CRLF near after and remove the CRLF and other whitespace and replace with a space if necessary. So I scan forward then back up through the text and do a replace as necessary. I also look for other characters that I need to change or delete. Converting to and from a byte array is very fast. I think this is legal. Dim aByte() as byte aByte=sString ' to byte array work on the byte array sString = StrConv(aByte, vbUnicode) ' back to string Looping looking through a byte array is much faster so far for that part. Instr (documented?) works with a byte array as does Mid or the right side. A$=Mid(aByte,l,n) ' interesting All I think I need now is a replace that is fast that works with the byte array. I seem to remember some CopyMemory rotines that can work for a replace. I need to find some that will work here. Currently, with a very fast InString and Replace string routine the 1M text file takes over a minute to process. The loop looks at a string. This is OK but I really would like to stretch my knowledge. So for now I have a fast Instring and Replace for strings; even a fast Like. I timed parts of the process and find that the For Next loop looking at sChr was very slow. Doing the same on a byte arrays was quite fast. So I am hoping that combining the byte array For Next scan and the other byte array ops needed will be as fast as I can get. My nerons need greasing. |