Prev: 443413 M3i Zero , Ezflash Dsi , R4i Dsi 43531
Next: Can I get the Mime Content Type from a byte array?
From: Michelle on 11 Sep 2009 08:21 This is what i've done so far: -<code>- void ReadFile() { string pattern = "FF56131A1B087B15610800151E"; string inFile = @"D:\myfile.bin"; string outFile = @"D:\myfile.txt"; long blockSize = 512000; int prevBytes = 4; FileStream stream = File.OpenRead(inFile); if (stream.Length < blockSize) { blockSize = stream.Length; } long lastBlock = stream.Length % blockSize; byte[] buffer = new byte[blockSize]; int extraBytes = pattern.Length / 2 + prevBytes - 1; // ERROR // long blocks = (long)Math.Ceiling(stream.Length / blockSize); // The call is ambiguous between the following methods or properties: 'System.Math.Ceiling(decimal)' and 'System.Math.Ceiling(double)' int block = 1; while (block <= blocks) { if (block==blocks && block==lastBlock) { blockSize = lastBlock; // ERROR // byte[] buffer = new byte[blockSize]; // A local variable named 'buffer' cannot be declared in this scope because it would give a different meaning to 'buffer', // which is already used in a 'parent or current' scope to denote something else } } stream.Read(buffer, 0, blockSize); string hexStr = String.Join("", (rimBytes + buffer)); String.Format("{0:X2}",hexStr); // more to go } -<code>- It's based on a Powershell script i found with Google. -<script.ps1>- param ( [string]$pattern = $(throw '-pattern can''t be $null'), [string]$file = $(throw '-file can''t be $null'), [long]$blockSize = 50kb, [int]$prevBytes = 8 ) if ($blockSize -lt 1kb) {throw '-blockSize must be at least 1kb'} $file = $(if (test-path $file) {(rvpa $file).path} else { throw "Cannot find $path"}) $stream = [io.file]::OpenRead($file) if ($stream.length -lt $blockSize) {$blockSize = $stream.length} $lastBlock = $stream.length % $blockSize $buffer = new-object byte[] $blockSize $extraBytes = $pattern.length / 2 + $prevBytes - 1 $blocks = [math]::ceiling($stream.length / $blockSize) $block = 1 while ($block -le $blocks) { if ($block -eq $blocks -and $lastBlock) {$blockSize = $lastBlock $buffer = new-object byte[] $blockSize} [void]$stream.read($buffer, 0, $blockSize) $hexStr = [string]::join('', ($rimBytes + $buffer | % {'{0:X2}' -f $_})) $rimBytes = $buffer[-$extraBytes..-1] [regex]::matches($hexStr, $pattern, 'ignoreCase') | % {$i = $_.index - $prevBytes * 2 [string]::join('', $hexStr[$i..($i + $prevBytes * 2 - 1)]) | % {$target = $_ $hexBytes = 0..($_.length - 1) | ? {!($_ -band 1)} | % {"0x$($target.subString($_,2))"} } } $block++ } [void]$stream.close [void]$stream.dispose -<script.ps1>-
From: Tom Spink on 11 Sep 2009 10:59 Michelle wrote: > Hi Tom, > >> Do you know if the hex value you are searching for is aligned at all? > > I don't exactly understand what you mean with "aligned" (maybe it's because > English is not my native language) > I need to search in all the rubbish for the pattern > (FF56131A1B087B15610800151E) > and then read the previous 4 bytes. (For example: > 0923080709C0224FFF56131A1B087B15610800151E -> 09C0224F) > The pattern is a kind of 'record footer'. But the 'records' doesn't have a > fixed length.. > > Is this the information you're needed? > > Michelle Hi Michelle, I posted a possible solution in reply to another message - I was waiting for community approval, however! -- Tom
From: Peter Duniho on 11 Sep 2009 13:05 On Fri, 11 Sep 2009 02:10:36 -0700, Tom Spink <tspink(a)gmail.com> wrote: > [...] > public static long FindArrayOffset(Stream s, byte[] arr) > { > int testByte, testIndex = 0; > long startingOffset = 0; > > for (testByte = s.ReadByte(); testByte >= 0; testByte = > s.ReadByte()) { > if (arr[testIndex++] != (byte)testByte) { > testIndex = 0; > startingOffset = s.Position; > } > > if (testIndex == arr.Length) > return startingOffset; > } > > return -1; > } > /// > > I haven't thought through all the code paths, but I've got a brain cell > nagging me about off-by-one errors, so pay particular attention to the > post-increment operator, as I may have cocked that up. I would say that the biggest issue is that if the real match is found starting within a partial match, that won't be found. For example, searching for "ABC" within the string "AABC". By the time the code "knows" that the string "AA" doesn't match with "AB", it's passed where the actual location of the searched-for string starts (the next character it's going to examing for a match with the beginning of the search string is 'B'). The problem is fixable, but would result in a more elaborate FSM implementation (it's not as simple as just backing up one byte, because the problem is more generalized than that...e.g. search for "AABC" within "AAABC", etc.). It's not clear that the OP really needs this level of complexity; managing cross-block searches isn't really _that_ hard, while implementing a correct FSM can actually be a little tricky. And assuming false starts during the search are few and very short, repeatedly examining a given byte after a false start shouldn't hurt performance too much. All that said, it's true that one significant advantage of an FSM approach is the ability to scan strictly sequentially, one byte at a time, through the file, no cross-block issues to worry about. If that's an appealing feature to the OP, in spite of the added complexity the FSM will bring, there have been several good solutions suggested along these lines in past threads in this newsgroup that essentially address the "search for a string in a stream of characters" problem. Applying the same solution to bytes instead of characters is trivial. Here's a link to one such message thread: http://groups.google.com/group/microsoft.public.dotnet.languages.csharp/browse_thread/thread/462037d91c08e693/ It includes a bit of a discussion on the topic, includes suggestions for at least two different viable approaches, and one of my posts in the thread links to a couple of posts in yet another previous thread where (in the first post) I posted a non-tricky, basic, generic state graph implementation, and then (in the follow-up) a little performance-fix to that original code. Pete
From: Peter Duniho on 11 Sep 2009 13:08 On Fri, 11 Sep 2009 02:59:42 -0700, Michelle <michelle(a)notvalid.nomail> wrote: > Hi Tom, > >> Do you know if the hex value you are searching for is aligned at all? > > I don't exactly understand what you mean with "aligned" (maybe it's > because > English is not my native language) I believe that Tom's asking if the pattern you're looking for will always begin at a byte the offset of which is exactly some multiple of some number larger than 1. For example, a 32-bit-aligned pattern could be found at byte offset 0, 4, 8, etc. but never at 1, 2, 3, 5, 6, 7, etc. If it's aligned, then that obviously can reduce somewhat the overhead of non-matching characters, because you have to examine on average 1/cb the number of bytes, where "cb" is the byte length of the alignment (e.g. for 32-bit alignment, "cb" is "4"). Pete
From: Michelle on 11 Sep 2009 13:18
It's not aligned. "Peter Duniho" <no.peted.spam(a)no.nwlink.spam.com> wrote in message news:op.uz3q77shvmc1hu(a)macbook-pro.local... > On Fri, 11 Sep 2009 02:59:42 -0700, Michelle <michelle(a)notvalid.nomail> > wrote: > >> Hi Tom, >> >>> Do you know if the hex value you are searching for is aligned at all? >> >> I don't exactly understand what you mean with "aligned" (maybe it's >> because >> English is not my native language) > > I believe that Tom's asking if the pattern you're looking for will always > begin at a byte the offset of which is exactly some multiple of some > number larger than 1. > > For example, a 32-bit-aligned pattern could be found at byte offset 0, 4, > 8, etc. but never at 1, 2, 3, 5, 6, 7, etc. > > If it's aligned, then that obviously can reduce somewhat the overhead of > non-matching characters, because you have to examine on average 1/cb the > number of bytes, where "cb" is the byte length of the alignment (e.g. for > 32-bit alignment, "cb" is "4"). > > Pete |