From: Michelle on 16 Sep 2009 01:26

Peter,

> [. . . ]
> Frankly, the more you explain about the basic problem, the less I feel
> Files have structures; I can guarantee you whatever this kind of file is,
> the intended user code doesn't need to search for things. It simply
> parses the data and knows the precise location of particular kinds of
> data within the file.

The original file has a structure with variable-length records. There is little documentation available (and none of it official), so reverse engineering is the only option.

We obtained the data through data recovery, so other data is stored between the records we need, and we can no longer rely on the original file structure. Because we don't need the complete record information, it's enough to find the patterns mentioned and read some bytes before or after them. Of course there is a risk of false positives, but the combination of the pattern and the bytes found around it makes that risk acceptable.

Because we need to do this more than once, and using WinHex is a lot of work, we decided to (try to) write an application for this (for internal use only). Unfortunately I can't share all the details of the "why"s and "what"s related to our problem.

In summary, it comes down to this:

Search for the pattern: 0xFF 0x56 0x13 0x1A 0x1B 0x08 0x7B 0x15 0x61 0x08 0x00 0x15 0x1E
Read the previous 4 bytes (convert them to a Decimal value)

Search for the pattern: 0x07 0x00 0xXX 0x00 0x00 0x00 0x07 0x00 0xYY 0x00 0x00 0x00 0x08 0x00
(where 0xXX can have 4 and 0xYY 6 different values)
Read the next 8 bytes (convert them to a Decimal value)

> [. . .] You could search for the sub-components individually. Look for
> one, then look for the other in the specific place it should be if you
> find the first. Though, if Regex makes the code simpler, it might well
> be worth it anyway, even if it doesn't perform as well.

Can I do this with the example you created?
Yesterday I spent the day reading documentation on various algorithms and C# examples. The problem is that all the examples I've found on the Internet are intended for searching strings, not bytes. So I've got a challenge :-))

I really appreciate your help!

Michelle
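The fact that most published examples search strings is less of an obstacle than it looks: a string algorithm carries over to bytes once char becomes byte. Below is a minimal sketch using Boyer-Moore-Horspool (a well-known simplification of Boyer-Moore, not an algorithm named in this thread) to find the first pattern and read the 4 bytes in front of each hit. It assumes the data is already in a byte[] and that those 4 bytes are a little-endian Int32; BytePatternSearch, IndexOf and FindWithPreviousInt32 are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class BytePatternSearch
{
    // Boyer-Moore-Horspool over byte[]: the same as the textbook string
    // version except that the skip table is indexed by byte value (0..255).
    // Returns the first index >= start where pattern occurs, or -1.
    public static int IndexOf(byte[] data, byte[] pattern, int start)
    {
        int m = pattern.Length;
        if (m == 0)
            return start <= data.Length ? start : -1;

        int[] shift = new int[256];
        for (int b = 0; b < 256; b++)
            shift[b] = m;                    // byte not in pattern: skip a whole pattern length
        for (int i = 0; i < m - 1; i++)
            shift[pattern[i]] = m - 1 - i;   // distance from last occurrence to the pattern's end

        int pos = start;
        while (pos <= data.Length - m)
        {
            int j = m - 1;
            while (j >= 0 && data[pos + j] == pattern[j])
                j--;                         // compare right to left
            if (j < 0)
                return pos;
            pos += shift[data[pos + m - 1]]; // slide by the byte under the window's last slot
        }
        return -1;
    }

    // Finds every occurrence of pattern and reads the 4 bytes in front of it
    // as a little-endian Int32 (an assumption about the file format).
    // Returns (match offset, value) pairs; hits with fewer than 4 bytes before them are skipped.
    public static List<KeyValuePair<int, int>> FindWithPreviousInt32(byte[] data, byte[] pattern)
    {
        var hits = new List<KeyValuePair<int, int>>();
        int pos = IndexOf(data, pattern, 0);
        while (pos >= 0)
        {
            if (pos >= 4)
            {
                int value = data[pos - 4] | (data[pos - 3] << 8)
                          | (data[pos - 2] << 16) | (data[pos - 1] << 24);
                hits.Add(new KeyValuePair<int, int>(pos, value));
            }
            pos = IndexOf(data, pattern, pos + 1);
        }
        return hits;
    }
}
```

A real scan over the file would also have to handle matches that straddle the boundary between two blocks read from the stream, and would build the shift table once per pattern instead of on every call.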
From: Michelle on 16 Sep 2009 01:33

Tom,

> This sounds like a highly structured file - *surely* there is some sort
> of descriptor at the start of it that contains a pointer to these
> records. [ . . . ]

Please read my reply to Peter's post; it is meant for you too, Tom. I really appreciate your help!

Michelle
From: Peter Duniho on 16 Sep 2009 03:54

On Tue, 15 Sep 2009 22:26:06 -0700, Michelle <michelle(a)notvalid.nomail> wrote:

> [...]
> Summarized it comes down:
> Search for the pattern: 0xFF 0x56 0x13 0x1A 0x1B 0x08 0x7B 0x15 0x61 0x08
> 0x00 0x15 0x1E
> Read the previous 4 bytes (convert them to a Decimal value)
>
> Search for the pattern: 0x07 0x00 0xXX 0x00 0x00 0x00 0x07 0x00 0xYY 0x00
> 0x00 0x00 0x08 0x00 ( where 0xXX can have 4 and 0xYY 6 different values)
> Read the next 8 bytes (convert them to a Decimal value)

I'm concerned that you keep writing "Decimal", when nothing about any of the description of the problem suggests you have or need Decimal values. Decimal is a specific type in .NET, a base-10 floating point structure. As I mentioned before, it takes 16 bytes.

You can, of course, store an Int32 or Int64 value in a Decimal variable _after_ you've converted the raw bytes to Int32 or Int64 as appropriate. But that's a step completely independent of the file i/o and searching, and so any discussion of the Decimal type seems out of place here. The fact that it keeps coming up makes me concerned that you may not understand the distinction between Decimal and other numeric types, and/or the implications regarding how the number is stored.

>> [. . .] You could search for the sub-components individually. Look for
>> one, then look for the other in the specific place it should be if you
>> find the first. Though, if Regex makes the code simpler, it might well
>> be worth it anyway, even if it doesn't perform as well.
>
> Can I do this with the example you created ?

Can you do which? You quoted two options: searching for sub-components individually, and using Regex.

You can do the former by modifying the code I posted. You'll simply have to come up with a way of representing your search string in a way that can be translated into calls to the FRangesEqual() method.
For example, have a List<T> of structs where the struct data type stores a reference to a byte[] containing the byte string you want to find, along with an offset within the search range for that byte string. Then pass that list to the "find" method, which starts the search with the first element in the list and then, each time it finds an element, checks for the next element in the list at that element's offset, relative to the position where the first element was found. Repeat that until you run out of elements in the list or find a mismatch; if you run out of elements, you've found a match.

Using your two sample search strings, the first time you search, the List<T> would have just one element, referencing a single byte string to look for, "0xFF 0x56 0x13 0x1A 0x1B 0x08 0x7B 0x15 0x61 0x08 0x00 0x15 0x1E", and an offset of 0. The next time you search, the List<T> would have three elements. The first would reference the byte string "0x07 0x00" and an offset of 0, the next the byte string "0x00 0x00 0x00 0x07 0x00" and an offset of 3, and the last the byte string "0x00 0x00 0x00 0x08 0x00" and an offset of 9.

You can't use the code I posted to do a Regex search, not directly anyway. IMHO, if you're going to use Regex, you might as well go back to porting the PowerShell script you found.

> Yesterday I spent my day reading documentation on various algorithms and C#
> examples.
> The problem is that all the examples I've found on the Internet, they
> intended to search for strings and not bytes.
> So, I've got a challenge :-))

Any algorithm you find that is specifically for strings, you should be able to modify easily to handle bytes instead. The main issue you may run into is examples that take advantage of string functions in existing libraries rather than implementing the algorithm themselves.
You can either provide versions of those functions that work with bytes, or just stick to those algorithm examples that don't depend on library functions, but instead do all their own processing (in which case, simply changing any place a string is used to byte[] and any place a char is used to a byte).

Pete
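Pete's List<T>-of-structs idea can be sketched as follows. This is a minimal, brute-force version under stated assumptions: the data is already in a byte[], each segment's offset is measured from the start of the candidate match (which matches his offsets 0, 3 and 9), and RangesEqual is a hypothetical stand-in for his FRangesEqual(), whose actual signature isn't shown in the thread. PatternSegment, SegmentSearch and Find are likewise hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// One piece of a split search pattern: the bytes to find and where they
// must appear, measured from the start of the candidate match.
public struct PatternSegment
{
    public readonly byte[] Bytes;
    public readonly int Offset;

    public PatternSegment(byte[] bytes, int offset)
    {
        Bytes = bytes;
        Offset = offset;
    }
}

public static class SegmentSearch
{
    // Hypothetical stand-in for FRangesEqual(): compares `length` bytes of
    // `a` starting at `ia` with `b` starting at `ib`.
    static bool RangesEqual(byte[] a, int ia, byte[] b, int ib, int length)
    {
        for (int i = 0; i < length; i++)
            if (a[ia + i] != b[ib + i])
                return false;
        return true;
    }

    // Returns the first position >= start where every segment matches at
    // its offset, or -1. Checking stops at the first mismatching segment.
    public static int Find(byte[] data, List<PatternSegment> segments, int start)
    {
        int span = 0; // total length covered by the segments
        foreach (var s in segments)
            span = Math.Max(span, s.Offset + s.Bytes.Length);

        for (int pos = start; pos <= data.Length - span; pos++)
        {
            bool all = true;
            foreach (var s in segments)
            {
                if (!RangesEqual(data, pos + s.Offset, s.Bytes, 0, s.Bytes.Length))
                {
                    all = false; // mismatch: move on to the next position
                    break;
                }
            }
            if (all)
                return pos;
        }
        return -1;
    }
}
```

The gaps between segments act as wildcards, so the caller still has to check 0xXX and 0xYY against their allowed values (which the thread doesn't list). With the three segments Pete describes, a hit at position p puts 0xXX at p + 2, 0xYY at p + 8, and the 8 bytes Michelle wants at p + 14.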
From: Michelle on 16 Sep 2009 04:40

Peter,

> I'm concerned that you keep writing "Decimal", when nothing about any of
> the description of the problem suggests you have or need Decimal values.
> Decimal is a specific type in .NET, a base-10 floating point structure.
> As I mentioned before, it takes 16 bytes.

Is it better to say 'decimal notation'?

> [. . . ]
> and so any discussion of the Decimal type seems out of place here.

Correct, it's not the main issue.

[. . . ]

> Can you do which? You quoted two options: searching for sub-components
> individually, and using Regex.

I meant searching for the sub-components individually.

> You can do the former by modifying the code I posted. You'll simply have
> to come up with a way of representing your search string in a way that can
> be translated into calls to the FRangesEqual() method. [ . . .]

I'll examine this and see if I can get it done.

> You can't use the code I posted to do a Regex search, not directly
> anyway. IMHO, if you're going to use Regex, you might as well go back to
> porting the PowerShell script you found.

Okay, that's not an option.

[. . . ]

> You can either provide versions of those functions that work with bytes,
> or just stick to those algorithm examples that don't depend on library
> functions, but instead do all their own processing (in which case, simply
> changing any place a string is used to byte[] and any place a char is
> used to a byte).

I'll see if I can get it done.

Michelle
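On the 'decimal notation' question: decimal notation is only how a number is written out in base 10, a formatting step that comes after the raw bytes have been converted to an integer type. A minimal sketch, assuming the 8 bytes after the second pattern hold a little-endian signed Int64 (the thread doesn't state the byte order or signedness); RawValues, ToInt64LE and ToDecimalNotation are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class RawValues
{
    // Assembles 8 bytes starting at `offset` into an Int64, least significant
    // byte first, independent of the machine's own byte order.
    public static long ToInt64LE(byte[] b, int offset)
    {
        long result = 0;
        for (int i = 7; i >= 0; i--)
            result = (result << 8) | b[offset + i];
        return result;
    }

    // "Decimal notation" is just the base-10 text form of the number.
    public static string ToDecimalNotation(long value)
    {
        return value.ToString(CultureInfo.InvariantCulture);
    }
}
```

Storing the result in a Decimal afterwards (decimal d = value;) works through the implicit long-to-decimal conversion, but the Decimal itself occupies 16 bytes (sizeof(decimal) == 16) while the field in the file is 8 bytes, which is Pete's point: the type you read is Int64, and the notation you print it in is decimal.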
From: Michelle on 16 Sep 2009 11:10
Peter,

[. . .]

> You can use the Position property to adjust from where you're reading in
> the file; save the current position, set the current position to 4 bytes
> earlier than the offset of the found string, read the 4 bytes of
> interest, then restore the current position to the previously saved
> value.

    Int64 Offset1 = (ibBaseOffset + ibOffset);   // offset = 41152
    Int64 Offset2 = stream.Position;             // offset = 49152

Why is Offset1 not equal to Offset2? Offset1 is the correct offset. Changing the block size (byte[] rgbBlockCur = new byte[4096];) has an effect on this.

I tried several ways of using stream.Seek(Offset, SeekOrigin) to set the current position and to restore the previously saved position. But when I read the previous 4 bytes and restore the previously saved position, the returned offset is no longer right. The search continues, but the offsets it reports are wrong from then on.

The first hit has the right offset, and the 4 bytes read before it are right. After restoring the previously saved position, things go wrong.

Michelle
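A likely explanation, offered only as an assumption since the full loop isn't shown here: when the file is read in 4096-byte blocks, stream.Position reports where the next block read will start, not where the match is, so the match's file offset has to be computed from the block's base offset plus the index inside the block, as Offset1 does (49152 is 12 * 4096, consistent with Position sitting on a block boundary). The sketch below reads bytes at an absolute offset and always puts Position back where it was, so the block-reading loop is undisturbed. It assumes a seekable stream; StreamPeek and ReadAt are hypothetical names.

```csharp
using System;
using System.IO;

public static class StreamPeek
{
    // Reads `count` bytes starting at absolute offset `offset`, then restores
    // stream.Position so the caller's sequential block reads continue
    // exactly where they left off. Requires a seekable stream.
    public static byte[] ReadAt(Stream stream, long offset, int count)
    {
        long saved = stream.Position;
        try
        {
            stream.Position = offset;
            byte[] buffer = new byte[count];
            int total = 0;
            // Stream.Read may return fewer bytes than requested, so loop.
            while (total < count)
            {
                int n = stream.Read(buffer, total, count - total);
                if (n == 0)
                    throw new EndOfStreamException();
                total += n;
            }
            return buffer;
        }
        finally
        {
            stream.Position = saved; // restore even if the read fails
        }
    }
}
```

In the block loop, the offset passed to ReadAt (and reported for each hit) would be ibBaseOffset + ibOffset - 4 rather than anything derived from stream.Position, which only says where the next block will come from.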