From: Jongware on 26 Apr 2010 10:05

On 26-Apr-10 15:43 PM, Jean wrote:
>> Your error is you take the environment requirement "all text strings
>> should be converted to TCHAR" to mean "all /unsigned char/ strings ..."
>> The environment only needs this for its own data.
>
> OK, i understand (well, i think i do :-) )
>
> what about a file with embedded unicode text ?
> in that case the content is not a simple unsigned byte list, correct ?

Yes, you got it. You might want to think about how you would read a
Unicode data file in a non-Unicode environment -- sort of the opposite
of what you have now.

(Even 'reading Unicode text' *in* a Unicode environment needs some
attention, as there is not really such a thing as 'plain Unicode text';
it might be UTF-8 encoded, which has no byte-order issue, or it might be
UTF-16 and start with the magic BOM value U+FEFF, whose two bytes show
up as either FE FF or FF FE and so indicate in which order the
high-byte/low-byte pairs are stored.)

Theoretically, you can write your entire program as a *non* Unicode
version and test that. When upgrading to a Unicode version, all you need
to change are the I/O strings -- the filename in your original code
snippet, and any text strings that are to be communicated to the user.
The code itself should not change.

By way of a challenge: Using MSVC's text macro _T("your text") you can
make your code Unicode *unaware* -- that is, it should compile and run
the same, whether you have defined UNICODE or not. You should only
bracket the type of strings I mentioned above, and not those in 'binary'
comparisons, such as

   if (strcmp (magicstring, "BM")) ..

where you would *not* use the automatically translated "_tcscmp",
because in this case you are looking for an exact byte-for-byte match.

Happy coding,
[Jw]
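
The BOM check described above can be sketched in plain C on a byte buffer. This is only a minimal illustration: the enum and the function name detect_bom are made up for this example and are not part of any Windows or CRT API. It also assumes the file has a BOM at all, which many files do not, and it does not try to tell UTF-32 apart from UTF-16.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch; not part of any real API. */
typedef enum {
    ENC_UNKNOWN,   /* no BOM: could be ANSI, BOM-less UTF-8, binary data... */
    ENC_UTF8,      /* bytes EF BB BF */
    ENC_UTF16_LE,  /* bytes FF FE */
    ENC_UTF16_BE   /* bytes FE FF */
} bom_encoding;

/* Inspect the first bytes read from a file. The buffer is plain bytes,
   whether or not the program itself is built with UNICODE defined. */
bom_encoding detect_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16_LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ENC_UTF16_BE;
    return ENC_UNKNOWN;
}
```

A BMP file, which starts with the bytes 'B' 'M', comes back as ENC_UNKNOWN here, which is the cue to treat it as binary data rather than text.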
From: Dee Earley on 26 Apr 2010 11:10

On 26/04/2010 14:43, Jean wrote:
>> Your error is you take the environment requirement "all text strings
>> should be converted to TCHAR" to mean "all /unsigned char/ strings ..."
>> The environment only needs this for its own data.
>
> OK, i understand (well, i think i do :-) )
>
> what about a file with embedded unicode text ?
> in that case the content is not a simple unsigned byte list, correct ?

Incorrect.
It will still be a stream of bytes. It just won't be "plain text".

--
Dee Earley (dee.earley(a)icode.co.uk)
i-Catcher Development Team

iCode Systems

(Replies direct to my email address will be ignored.
Please reply to the group.)
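
To illustrate the "still a stream of bytes" point: embedded Unicode text can be read into an unsigned char buffer like everything else and only afterwards assembled into 16-bit units. The sketch below assumes the embedded text is UTF-16 little-endian, which the thread does not state; the helper name utf16le_from_bytes is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build UTF-16 code units from raw little-endian
   bytes. Writes at most out_cap units and returns how many were written.
   A trailing odd byte is ignored. */
size_t utf16le_from_bytes(const unsigned char *bytes, size_t nbytes,
                          uint16_t *out, size_t out_cap)
{
    size_t n = 0;

    assert(bytes != NULL && out != NULL);

    while (n < out_cap && 2 * n + 1 < nbytes) {
        /* low byte comes first in the file, high byte second */
        out[n] = (uint16_t)(bytes[2 * n] | (bytes[2 * n + 1] << 8));
        n++;
    }
    return n;
}
```

Because the units are put together by hand, the result does not depend on the byte order of the machine running the code.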
From: r_z_aret on 26 Apr 2010 17:58

On Sun, 25 Apr 2010 16:32:52 +0200, "Jean" <nosp-jean(a)free.fr> wrote:

My comments may be a paraphrase of Jongware's comments. See below (in
line).

>Hello
>
>this code works:
>
>unsigned char buffer[1025];
>fopen("toto.bmp", "rb");
>fread(buffer, sizeof(unsigned char),1024, pf);
>fclose(pf);
>
>if(buffer[0] == 'B' && buffer[1] == 'M')
> ...
>
>this code does not work: (compiled with UNICODE and _UNICODE)
>
>WCHAR buffer[1025];
>_wfopen(L"toto.bmp", L"rb");
>fread(buffer,sizeof(WCHAR),1024,pf);

This line will read two bytes from the file into each of the two-byte
elements of buffer.

>fclose(pf);
>
>if(buffer[0] == 'B' && buffer[1] == 'M')
> ...
>
>it's the comparison that does not work.
>(i tried if(buffer[0] == L'B' && buffer[1] == L'M') too)
>any idea ?

This will compare the two bytes in buffer[0] with the two bytes of
L'B', and the two bytes in buffer[1] with the two bytes of L'M'. This
will probably not work as you expect unless the input file is Unicode
text (so each character takes up two bytes in the file).

I _think_ you are trying to support Unicode and ASCII files in one
program. I don't think you can do that unless you have a separate
section for each, and your program determines which type of file
you're reading and chooses the right code. I believe it is very tricky
to determine by looking at a file's contents whether it is Unicode or
ASCII. That is why Unicode files are usually marked by a preceding BOM
(Byte Order Mark). For more info about BOM, use Google to look it up
in this newsgroup.

Something of a nit pick:
When I first read your note, I assumed "does not work" meant "does not
compile". You might try to be more explicit in the future.

>
>jean
>

-----------------------------------------
To reply to me, remove the underscores (_) from my email address (and
please indicate which newsgroup and message).

Robert E. Zaret, MVP
PenFact, Inc.
20 Park Plaza, Suite 400
Boston, MA 02116
www.penfact.com
Useful reading (be sure to read its disclaimer first):
http://catb.org/~esr/faqs/smart-questions.html
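
What the fread into a two-byte buffer does to the "BM" signature can be reproduced without a file. In this rough sketch, uint16_t stands in for WCHAR (an assumption that holds on Windows, where WCHAR is 16 bits), and both function names are invented for the example. Copying the file bytes 'B' 'M' into one 16-bit element gives 0x4D42 on a little-endian machine such as x86, which equals neither 'B' nor L'B' (both 0x42).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: what element 0 of a 16-bit buffer holds after the
   first two file bytes land in it, as with fread(buffer, sizeof(WCHAR), ...). */
uint16_t first_wide_element(const unsigned char *file_bytes)
{
    uint16_t element;

    assert(file_bytes != NULL);
    memcpy(&element, file_bytes, sizeof element);
    return element;
}

/* The same check done on raw bytes; the result is identical whether or
   not UNICODE is defined, because no character type is involved. */
int starts_with_bm(const unsigned char *file_bytes, size_t len)
{
    assert(file_bytes != NULL);
    return len >= 2 && memcmp(file_bytes, "BM", 2) == 0;
}
```

Comparing raw bytes with memcmp sidesteps the question of character width entirely, which is why the unsigned char version of the original code works.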
From: r_z_aret on 26 Apr 2010 17:58

On Mon, 26 Apr 2010 14:08:56 +0200, Jongware <jongware(a)no-spam.plz>
wrote:

>On 26-Apr-10 6:47 AM, Jean wrote:
>>
>> "ScottMcP [MVP]" <scottmcp(a)mvps.org> wrote in message news:
>> 45b16a39-252d-403a-80e4-1d1e38b57f52(a)u32g2000vbc.googlegroups.com...
>>> This is comparing unicode data with ANSI characters:
>>>
>>> if(buffer[0] == 'B'&& buffer[1] == 'M')
>>>
>>> Try it this way:
>>>
>>> if(buffer[0] == L'B'&& buffer[1] == L'M')
>>
>> same effect
>
>For the exact same reason.
>
>Your buffer is a TCHAR, and in a Unicode environment it will use 2 bytes
>per UC character.

No, buffer is explicitly WCHAR. The definition of TCHAR depends on
whether UNICODE is defined. The definition of WCHAR does not. None of
the original code uses TCHAR, so TCHAR has no relevance here, and is a
distraction.

>You read a single-byte array into a double-byte destination. What will
>the contents of buffer look like? Use your debugger! This is from memory:

This statement made me think about the line using fread. See my reply
to the original post.
From: Jean on 27 Apr 2010 01:05

Hi Robert

> I _think_ you are trying to support UNICODE and ASCII files in one
> program

yes, correct

> I assumed "does not work" meant "does not compile".

no, it compiles correctly; the comparison (==) simply never evaluates to
true

I use VC6 and the C SDK, for XP, Vista and 7. Following all the previous
advice, I now compile with UNICODE and _UNICODE. All my file accesses are
made with _wfopen, and all the reads with fread into an unsigned char
buffer. It works fine with western, greek, chinese and russian file
names :-)

For listing the files I use a _wfinddata_t structure with _wfindfirst
and _wfindnext; that works too.

For displaying those file names in list views, status bars, window
titles and so on, I use WCHAR everywhere.

Jean
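
The arrangement Jean ends up with (wide characters for the file name, plain bytes for the file contents) might look roughly like the sketch below. The function names are hypothetical, error handling is minimal, and the _wfopen wrapper is only compiled on Windows; declarations are kept at the top of each block so that an older C compiler such as VC6 should accept it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the stream starts with the bytes
   'B' 'M', 0 otherwise. The stream must be open in binary mode and
   positioned at its start. */
int stream_has_bmp_magic(FILE *fp)
{
    unsigned char magic[2];

    assert(fp != NULL);
    if (fread(magic, 1, sizeof magic, fp) != sizeof magic)
        return 0;                       /* file shorter than 2 bytes */
    return memcmp(magic, "BM", sizeof magic) == 0;
}

#ifdef _WIN32
#include <wchar.h>

/* Windows only: the file NAME is wide, the file CONTENTS are bytes. */
int file_has_bmp_magic(const wchar_t *path)
{
    int result;
    FILE *fp = _wfopen(path, L"rb");

    if (fp == NULL)
        return 0;
    result = stream_has_bmp_magic(fp);
    fclose(fp);
    return result;
}
#endif
```

On Windows a call would be something like file_has_bmp_magic(L"toto.bmp"). Only the path argument changes type between an ANSI and a Unicode build; the byte comparison stays the same.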