From: Mike Williams on 27 Dec 2009 18:22

"Bee" <Bee(a)discussions.microsoft.com> wrote in message
news:5F3E878D-D34E-456B-9CB1-18AAA879136C(a)microsoft.com...

> Converting to and from a byte array is very fast.
> I think this is legal.
> Dim aByte() as byte
> aByte=sString ' to byte array
> work on the byte array
> sString = StrConv(aByte, vbUnicode) ' back to string

Well, it's legal, but it doesn't do what you appear to think it does. Rather than explain what is going on I think it might be more instructive to just show you the result and then allow you to work out for yourself what is going on (remembering that a VB String has two bytes per character), so that you can post again if you can't work it out. Try this, which is your above code with an actual test string included:

Dim sString As String
sString = "abcd"
Dim aByte() As Byte
Print sString, Len(sString)
aByte = sString ' to byte array
'work on the byte array
sString = StrConv(aByte, vbUnicode) ' back to string
Print sString, Len(sString)

Mike

From: Schmidt on 27 Dec 2009 20:30

"David Kaye" <sfdavidkaye2(a)yahoo.com> schrieb im Newsbeitrag
news:hh8ipi$42m$1(a)news.eternal-september.org...

> >Repeated calls to Replace (scanning the string
> >over and over again, to replace different single-chars)
> >is "horribly inefficient" ... ;-)
>
> My experience wasn't like that at all, which was my point.
> An 80k file with about 5k of replacements or deletions took
> only 0.009 or 9/100ths of a second.

You probably mean "0.090 or 9/100ths of a second". And yes, that matches the first results in the following test routines (on a modern CPU). Nonetheless, the "repeated replaces" approach is already about 50 times slower than Larry's routine on these smaller input lengths (such as your ~80 kByte). And 90 msec is already near the human "I can feel it" barrier (of 1/10th of a second)...

On larger input in the megabyte range, the OLE BSTR caching has much less effect: on a ~2.5 MB input string your routine is then already about 160 times slower, needing about 13 seconds... so be a bit patient until the demo below has finished.

Just test this yourself with the copy-and-paste code below (compile to native code with all advanced options enabled, please, to see the real difference).

'***Into a Form
Option Explicit

Private Type SafeArray1D
  cDims As Integer
  fFeatures As Integer
  cbElements As Long
  cLocks As Long
  pvData As Long
  cElements1d As Long
  lLBound1d As Long
End Type

Private Declare Sub BindArray Lib "kernel32" Alias "RtlMoveMemory" _
  (PArr() As Any, PSrc&, Optional ByVal cb& = 4)
Private Declare Sub ReleaseArray Lib "kernel32" Alias "RtlMoveMemory" _
  (PArr() As Any, Optional PSrc& = 0, Optional ByVal cb& = 4)
Private Declare Sub RtlMoveMemory Lib "kernel32" _
  (dst As Any, src As Any, ByVal nBytes&)

Private Declare Function QueryPerformanceFrequency& Lib "kernel32" (x@)
Private Declare Function QueryPerformanceCounter& Lib "kernel32" (x@)

Private Sub Form_Load()
  AutoRedraw = True
  Caption = "Click the Form"
End Sub

Private Sub Form_Click()
Dim i&, T#, S$, S1$, S2$, S3$

  S = " abc" & vbTab & "123" & vbCrLf & "ABC" & Chr(0) & "123 "
  For i = 1 To 12
    S = S & S 'results in a test string of about 80 kByte
  Next i
  S1 = S
  S2 = S
  S3 = S

  Print "InputLen:", Len(S), vbCrLf

  T = HPTimer
  ScrubUsingReplace S1
  Print "ScrubUsingReplace", Round((HPTimer - T) * 1000, 2), Len(S1)

  T = HPTimer
  ScrubUsingLookupTable S2
  Print "ScrubUsingLookupTable", Round((HPTimer - T) * 1000, 2), Len(S2)

  T = HPTimer
  ScrubUsingSafeArray S3
  Print "ScrubUsingSafeArray", Round((HPTimer - T) * 1000, 2), Len(S3)

  '******* and the same thing again with larger input ********
  For i = 13 To 17
    S = S & S 'results in a test string of about 2.5 MByte
  Next i
  S1 = S
  S2 = S
  S3 = S

  Print "Now we leave the efficiency-range of the OLE-BSTR-cache..."
  Print "InputLen:", Len(S), vbCrLf

  T = HPTimer
  ScrubUsingReplace S1
  Print "ScrubUsingReplace", Round((HPTimer - T) * 1000, 2), Len(S1)

  T = HPTimer
  ScrubUsingLookupTable S2
  Print "ScrubUsingLookupTable", Round((HPTimer - T) * 1000, 2), Len(S2)

  T = HPTimer
  ScrubUsingSafeArray S3
  Print "ScrubUsingSafeArray", Round((HPTimer - T) * 1000, 2), Len(S3)
End Sub

Private Sub ScrubUsingReplace(Text As String)
Dim j%
  For j% = 0 To 64
    Text$ = Replace(Text$, Chr$(j%), "")
  Next
  For j% = 128 To 255
    Text$ = Replace(Text$, Chr$(j%), "")
  Next
End Sub

Private Sub ScrubUsingLookupTable(Text As String)
Dim inc() As Byte, txt() As Byte
Dim i As Long, src As Long, dst As Long

  txt = Text 'raw copy: two bytes per char
  ReDim inc(255)
  For i = 0 To UBound(inc)
    If i > 64 And i < 128 Then inc(i) = 2 'kept chars advance dst
  Next i

  Do While src < UBound(txt)
    txt(dst) = txt(src)
    src = src + 2
    dst = dst + inc(txt(dst))
  Loop

  Do While dst < UBound(txt) 'pad the unused tail with spaces
    txt(dst) = 32
    dst = dst + 2
  Loop

  Text = Trim$(txt)
End Sub

Private Sub ScrubUsingSafeArray(Text As String)
Dim i&, j&, aSrc%(), saSrc As SafeArray1D
  saSrc.cDims = 1
  saSrc.cbElements = 2 'the width of a 16-bit Integer
  saSrc.cElements1d = Len(Text) + 2 'two more, to reflect the LBound
  saSrc.lLBound1d = -2 'include the 4 Len-Info-Bytes of the BSTR
  saSrc.pvData = StrPtr(Text) - 4 'adapt to the real start of the BSTR
  If saSrc.cElements1d = 2 Then Exit Sub 'nothing to replace

  BindArray aSrc, VarPtr(saSrc)

  For i = 0 To UBound(aSrc)
    Select Case aSrc(i)
      Case Is < 65, Is > 127 '<-- define the scrubbed Char-ranges here...
      Case Else: aSrc(j) = aSrc(i): j = j + 1
    End Select
  Next i

  If j <= UBound(aSrc) Then aSrc(j) = 0 'keep the shortened BSTR null-terminated
  RtlMoveMemory aSrc(-2), CLng(j + j), 4 'adjust to the new Len-Info

  ReleaseArray aSrc
End Sub

Private Function HPTimer#()
Dim x@: Static Frq@
  If Frq = 0 Then QueryPerformanceFrequency Frq
  If QueryPerformanceCounter(x) Then HPTimer = x / Frq
End Function

Olaf

From: Schmidt on 27 Dec 2009 20:54

"Bee" <Bee(a)discussions.microsoft.com> schrieb im Newsbeitrag
news:5F3E878D-D34E-456B-9CB1-18AAA879136C(a)microsoft.com...

> I am loading a notepad "compatible" file from disk (shows extra
> control characters as boxes, etc).
> It may or may not be totally pure printable text.
> I need to clean out all non-printing characters other than
> the Tab, CR and LF.
> I need to make proper paragraphs.
> So I look for non-end-of-sentence with a CRLF near after
> and remove the CRLF and other whitespace and replace
> with a space if necessary.
> So I scan forward then back up through the text and do
> a replace as necessary.
> I also look for other characters that I need to change or delete.

For such "pretty formatting" tasks it is difficult to give concrete advice. Can't you just post your current code?

> I think this is legal.
> Dim aByte() as byte
> aByte=sString ' to byte array
> work on the byte array
> sString = StrConv(aByte, vbUnicode) ' back to string

Nope. As Mike already pointed out, the correct pairing would be either:

aByte = sString 'two Bytes per char (no ANSI-conversion)
....
sString = aByte 'two Bytes per Char back-conversion

or ANSI-based (ByteArray-StepWidth = 1):

aByte = StrConv(sString, vbFromUnicode)
....
sString = StrConv(aByte, vbUnicode)

> Currently, with a very fast InString and Replace string
> routine the 1M text file takes over a minute to process.

That's pretty long for a 1 MB input, yes. As said, please post some code showing what you currently do; that would be easier than "guessing".

Aside from that, I'd probably split this up into two scans. The first would do all single-char cleanups (replacements with "nothing", using Larry's lookup approach). In the second run, over the already roughly cleaned-up string, I'd try to take care of your "pretty formatting" stuff.

Olaf

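Editor's note: the difference between the two correct pairings and the mixed one from the question can be checked with a small sketch like the following. It is not part of the original posts; the routine name ShowPairings is made up for illustration, and the lengths in the comments assume a single-byte ANSI code page.

```vb
'***Into a Form (hypothetical demo, not from the original posts)
Option Explicit

Private Sub Form_Click()
  ShowPairings
End Sub

Private Sub ShowPairings()
  Dim sString As String, aByte() As Byte

  AutoRedraw = True

  'Pairing 1: raw copy, two bytes per char in both directions
  sString = "abcd"
  aByte = sString
  Print "raw bytes:", UBound(aByte) + 1      '8
  sString = aByte
  Print "raw round trip:", Len(sString)      '4

  'Pairing 2: ANSI conversion, one byte per char in both directions
  aByte = StrConv(sString, vbFromUnicode)
  Print "ANSI bytes:", UBound(aByte) + 1     '4
  sString = StrConv(aByte, vbUnicode)
  Print "ANSI round trip:", Len(sString)     '4

  'Mixed pairing from the question: raw copy out, ANSI conversion back.
  'Each of the 8 bytes is treated as its own char, so the length doubles
  'and every second char is Chr$(0).
  aByte = sString
  sString = StrConv(aByte, vbUnicode)
  Print "mixed pairing:", Len(sString)       '8
End Sub
```

Which pairing to pick mainly decides the step width of the loop over the byte array: 2 for the raw copy, 1 for the ANSI conversion.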
From: Bee on 27 Dec 2009 23:03

I have something working now using strings. It is now taking about 40 secs for the 1M file. It used to be many many minutes.

I am working on the ReplaceInByteArray() routine. I will post that tomorrow as a new start post. I have everything except for this ReplaceInByteArray(). So I will let you tear it apart. But be gentle. And thanks to both of you for sticking with me on this.

I plan on this:
(1) Use very fast string replace to do the easy stuff. I have a very fast Like search and replace now.
(2) Convert to a Byte Array and do the hard looping stuff.
(3) Then convert back.

The code is too large and convoluted for easy study, and I think I have it down to just this one ReplaceInByteArray routine 'cause all else works. And as I said, if all else fails, the sub is all working correctly with strings.
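
Editor's note: steps (2) and (3) of the plan above, combined with the lookup-table idea suggested earlier in the thread, might look roughly like the sketch below. This is not the ReplaceInByteArray routine Bee refers to; the function name StripNonPrinting is hypothetical, and the definition of "printable" (codes 32 to 126 and 128 to 255, plus Tab, LF and CR) is an assumption. It also assumes a single-byte ANSI code page; characters that have no ANSI equivalent would be changed by the conversion.

```vb
'Hypothetical helper, not the ReplaceInByteArray routine mentioned above.
Private Function StripNonPrinting(ByVal Text As String) As String
  Dim keep(0 To 255) As Boolean
  Dim b() As Byte
  Dim i As Long, dst As Long

  If Len(Text) = 0 Then Exit Function

  'Assumption: keep codes 32..126 and 128..255, plus Tab, LF and CR
  For i = 32 To 255
    keep(i) = (i <> 127)
  Next i
  keep(9) = True: keep(10) = True: keep(13) = True

  b = StrConv(Text, vbFromUnicode)   'step (2): one byte per char (ANSI)

  'single pass: copy every kept byte down to the next free position
  For i = 0 To UBound(b)
    If keep(b(i)) Then
      b(dst) = b(i)
      dst = dst + 1
    End If
  Next i

  If dst = 0 Then Exit Function      'everything was removed
  ReDim Preserve b(0 To dst - 1)
  StripNonPrinting = StrConv(b, vbUnicode)   'step (3): back to a String
End Function
```

For example, StripNonPrinting("a" & Chr$(0) & "b" & vbTab & "c") drops the Chr$(0) and keeps the Tab, giving a 4-character result. Because the table is filled once and the text is scanned once, the cost grows with the input length only, independent of how many different characters are scrubbed.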