From: matt on 31 Mar 2010 23:16

Hi all,

I was working on parsing a file created by Fortran which is fixed format. It is a wide stream binary file. I know that there are 6 integers which are made up of 4 characters each, followed by 2 doubles of 8 characters each.

I am wondering what is the best way to convert the raw characters to integer / double values. I believe it is possible via a stream facet and/or codecvt, but not sure.

Below is my code, with a hand-coded function to convert from 4 wchars to int. Certainly there must be a more elegant way. Any help would be appreciated.

I'm using gcc 4.4 on a 64-bit x86 Linux machine.

Thanks,
Matt.

[code]
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <iomanip>
#include <sstream>
#include <cmath>

using namespace std;

//Define the size for an int and a double
const int SIZE_INT(4);
const int SIZE_DBL(8);

//Function to read an int from an input stream and convert it to an int type
int readInt(wistream& in)
{
    int returnArg(0);
    for(int i=0; i<(SIZE_INT); ++i) {
        wchar_t c;
        in.get(c);

        //Convert this char to an int and accumulate
        stringstream ss;
        ss << c;
        int j;
        ss >> j;
        returnArg += j * pow(128, i);
    }

    return returnArg;
}

/////

int main()
{
    //Open the input file stream in binary mode. NOTE: wide character stream.
    wifstream in("File.in", ios::binary);
    if(!in) {
        cout << "Opening of input file failed. Exiting.";
        return EXIT_FAILURE;
    }

    //Read 6 ints and print them to standard output
    for(int i=0; i<6; ++i) {
        int theInt = readInt(in);
        cout << theInt << " ";
    }

    //Do the same for doubles .....

    cout << endl;
    return EXIT_SUCCESS;
}
[/code]

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
From: Mickey on 1 Apr 2010 20:44

On Apr 1, 7:16 pm, matt <li...(a)givemefish.com> wrote:
> Below is my code, with a hand-coded function to convert from 4 wchars
> to int. Certainly there must be a more elegant way. Any help would
> be appreciated.

{ edits: quoted banner removed. please keep readers in mind when you quote. -mod }

This looks like a neat enough solution to me, if it is working for you.

Another way could be to define an appropriate structure and read the data into it directly. But don't expect portability that easily. If the file is read on the same platform, processor, and compiler conventions that wrote it, this is all right; otherwise things like endianness and integer/double representation come into the picture. Personally I would have tried the structure approach first.

Regards,
Jyoti
From: Ulrich Eckhardt on 1 Apr 2010 20:46
matt wrote:
> I was working on parsing a file created by Fortran which is fixed
> format. It is a wide stream binary file.

"Wide stream" is not a file format. wchar_t is only an internal (in-memory) representation; the external (on-disk) representation on any machine is still bytes.

Now, what about C++ streams? What these do is convert the external representation in bytes (char) to the internal representation (char or wchar_t). Note that even the byte-to-char mapping can be a real conversion, and that a char-to-wchar_t mapping can simply map one external byte to one internal wchar_t. If you want to read a file format, you must configure the stream accordingly and also know the external encoding (e.g. UTF-8 or one of the codepages).

> I know that there are 6 integers which are made up of 4 characters
> each, followed by 2 doubles of 8 characters each.
>
> I am wondering what is the best way to convert the raw characters to
> integer / double values.

Firstly, this is not a textual file format, but C++ streams are primarily tools for textual files, not so much for packed binary formats.

> I believe it is possible via a stream facet and/or codecvt, but not sure.

The codecvt facets are exactly what governs the conversion between external bytes and the internal character type. However, you don't even have a text file here, so this isn't really useful. What you want is to retrieve single bytes and assemble your values from them. For that, there are three things to do:

1. Use a char stream. This allows you to retrieve bytes (chars) directly.
2. Turn off end-of-line conversion with ios_base::binary. I see below that you do that already.
3. Turn off any conversion between external bytes and internal chars; you want the raw values. For that, you can use the classic or C locale.
The code for that is then

    std::ifstream in(filename, std::ios_base::binary);
    in.imbue(std::locale::classic());

> int readInt(wistream& in)
> {
>     int returnArg(0);
>     for(int i=0; i<(SIZE_INT); ++i) {
>         wchar_t c;
>         in.get(c);
>
>         //Convert this char to an int and accumulate
>         stringstream ss;
>         ss << c;
>         int j;
>         ss >> j;
>         returnArg += j * pow(128, i);
>     }
>
>     return returnArg;
> }

Several notes here:

1. 'in.get(c)' tells you whether it succeeded; you should test that.
2. Writing a wchar_t to a char stream will treat the wchar_t as an integer, so it will be written as a textual integer representation, which you then read back into an integer. This is not wrong (though even there the checking for errors is missing) but way too complicated. All character types are in fact integers, so you can directly use the value as it is.
3. pow(128,i) is a floating-point function; to assemble integers, you can use the shift operations instead. I'm wondering about the 128 though, too; I would have thought that you would need 256 here.
4. I don't see you ever getting a negative value out of this. You will have to test this and adapt it accordingly. Make sure that you have a few test files and that you also learn about the "two's complement" representation for integers in memory.
5. All you do here could be achieved using char streams, too. The only caveat is that plain char might be signed or unsigned; this is implementation-defined. You should therefore cast the char to an unsigned char, which then gives you more precise control over what you are doing.
6. SIZE_INT: no need to put that in brackets. Further, I wouldn't use ALL_UPPERCASE; leave that exclusively for macros.

Note that if you follow these points, you will be able to use your code on more than 99% of all machines. For the rest, you would have to change the code to adapt to a sign-magnitude representation instead of the two's complement now used by most CPUs.
Further, assuming that a byte always has 8 bits is also not portable, even though it is true for most machines.

Now, for floating-point numbers, I'm afraid you will not get away as easily. For that, you will have to know both the representation in the file and the one in memory. Try reading up on IEEE floats, which are probably what is on disk.

Good luck!

Uli