Unicode again [Win32 API]

From: Jean on 26 Apr 2010 00:47

>> if(buffer[0] == L'B' && buffer[1] == L'M')
same effect

Jean

"ScottMcP [MVP]" <scottmcp(a)mvps.org> wrote in message news:
45b16a39-252d-403a-80e4-1d1e38b57f52(a)u32g2000vbc.googlegroups.com...
> This is comparing unicode data with ANSI characters:
>
> if(buffer[0] == 'B' && buffer[1] == 'M')
>
> Try it this way:
>
> if(buffer[0] == L'B' && buffer[1] == L'M')

From: Jongware on 26 Apr 2010 08:08

On 26-Apr-10 6:47 AM, Jean wrote:
>
> "ScottMcP [MVP]"<scottmcp(a)mvps.org> wrote in message news:
> 45b16a39-252d-403a-80e4-1d1e38b57f52(a)u32g2000vbc.googlegroups.com...
>> This is comparing unicode data with ANSI characters:
>>
>> if(buffer[0] == 'B'&& buffer[1] == 'M')
>>
>> Try it this way:
>>
>> if(buffer[0] == L'B'&& buffer[1] == L'M')
>
>
>>> if(buffer[0] == L'B'&& buffer[1] == L'M')
> same effect

For the exact same reason.

Your buffer is an array of TCHAR, and in a Unicode build each TCHAR is
2 bytes wide.
You read a sequence of single bytes into a double-byte destination, so
every element receives two file bytes at once. What will the contents
of buffer look like? Use your debugger! This is from memory:

buffer[0] = [ 'B' 'M' ]
buffer[1] = [ SizeLo1 SizeLo2 ]
buffer[2] = [ SizeHi1 SizeHi2 ]

-- where each [x y] represents one single (double-byte) Unicode
character. Sure, you can adjust your code to

if(buffer[0] == 0x4D42) /* 'M' lands in the high byte: x86 is little-endian */
...

and that will work for this header, but you *will* run into problems
with the rest of the BMP header, as each and every odd-numbered data
byte suddenly gets paired with the next even one, whether it wants to
or not.
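
To make the pairing visible without a debugger, here is a minimal sketch in portable C. It assumes uint16_t as a stand-in for a 2-byte TCHAR, and the helper names (fill_wide_buffer and so on) are invented for illustration. The copy mimics what a raw read into a TCHAR buffer does: bytes go in unchanged, two per element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: uint16_t stands in for a 2-byte TCHAR (WCHAR) in a Unicode build. */
typedef uint16_t wide_unit;

/* Copy raw file bytes into a buffer of 16-bit units, byte for byte and
   with no conversion, the way a raw read fills a TCHAR buffer. */
void fill_wide_buffer(wide_unit *dst, size_t dst_count,
                      const unsigned char *src, size_t src_len)
{
    assert(src_len <= dst_count * sizeof *dst);
    memset(dst, 0, dst_count * sizeof *dst);
    memcpy(dst, src, src_len);
}

/* The test from the thread: compares whole 16-bit units with L'B' and L'M'. */
int looks_like_bmp_wide(const wide_unit *buf)
{
    return buf[0] == 0x0042 /* L'B' */ && buf[1] == 0x004D /* L'M' */;
}

/* Returns 1 if the first 16-bit unit holds the two bytes 'B' and 'M'.
   It inspects the underlying bytes, so it does not depend on endianness. */
int first_unit_holds_BM(const wide_unit *buf)
{
    const unsigned char *p = (const unsigned char *)buf;
    return p[0] == 'B' && p[1] == 'M';
}
```

Fed the bytes 'B', 'M', 0x45, 0x23, 0x01, 0x00, looks_like_bmp_wide() returns 0 although the data is a valid BMP start, because both signature bytes sit inside buffer[0]. On a little-endian machine that element reads as 0x4D42.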

... As Xavier said: a BMP is *not* a Unicode structure, but a pure
binary format. Use an unsigned char buffer.
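
A rough sketch of the byte-buffer approach, in portable C. The function names are made up for this example, and fopen() is used only to keep it self-contained; in a Win32 program the file name would still be a TCHAR string passed to CreateFile, while the data buffer stays unsigned char. The layout assumed here is the standard BMP file header: the 2-byte signature followed by a 4-byte little-endian file size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assemble a 32-bit little-endian value from four raw bytes. */
uint32_t read_u32_le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 when the buffer starts with the BMP signature "BM".
   Narrow character constants are right here: the file holds bytes. */
int is_bmp_header(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    return len >= 2 && buf[0] == 'B' && buf[1] == 'M';
}

/* Read up to cap bytes from the start of a file in binary mode.
   Returns the number of bytes read, or 0 if the file cannot be opened. */
size_t read_file_start(const char *path, unsigned char *buf, size_t cap)
{
    FILE *f = fopen(path, "rb");
    size_t n;

    if (f == NULL)
        return 0;
    n = fread(buf, 1, cap, f);
    fclose(f);
    return n;
}
```

With the header in an unsigned char array, buffer[0] == 'B' && buffer[1] == 'M' behaves the same in ANSI and Unicode builds, and read_u32_le(buffer + 2) yields the file size field without any byte getting glued to its neighbour.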

[Jw]

From: Jean on 26 Apr 2010 08:27

> ... As Xavier said: a BMP is *not* a Unicode structure, but a pure binary
> format. Use an unsigned char buffer.

.bmp was an example; I want to be able to check any file type, like I do in a
non-Unicode environment

Jean

From: Jongware on 26 Apr 2010 08:52

On 26-Apr-10 14:27 PM, Jean wrote:
>> ... As Xavier said: a BMP is *not* a Unicode structure, but a pure binary
>> format. Use an unsigned char buffer.
>
> .bmp was an example, i want to be able to check any filetype, like i do in a
> non-unicode environment

The environment is not relevant to your file handling code.
---
Whether or not your interface -- the OS, the GUI, your message boxes and
the filenames -- is Unicode, your BMP files (and your 'any'
filetypes as well) will *never* magically gain Unicode contents. Their
contents will always be a simple unsigned byte list.

Your error is that you take the environment requirement "all text
strings should be converted to TCHAR" to mean "all /unsigned char/
strings ..."
The environment only needs this for its own data.
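
For the "any filetype" case, one possible shape is a small signature table compared against the first bytes of the file. This is only a sketch: the struct, the table and identify_file_type() are hypothetical names, and the four signatures are just a sample. Nothing in it depends on whether the build is ANSI or Unicode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One known file signature: a label plus the leading bytes to match. */
struct signature {
    const char *name;
    const unsigned char *magic;
    size_t len;
};

static const unsigned char MAGIC_BMP[] = { 'B', 'M' };
static const unsigned char MAGIC_PNG[] = { 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A };
static const unsigned char MAGIC_GIF[] = { 'G', 'I', 'F', '8' };
static const unsigned char MAGIC_ZIP[] = { 'P', 'K', 0x03, 0x04 };

static const struct signature SIGNATURES[] = {
    { "bmp", MAGIC_BMP, sizeof MAGIC_BMP },
    { "png", MAGIC_PNG, sizeof MAGIC_PNG },
    { "gif", MAGIC_GIF, sizeof MAGIC_GIF },
    { "zip", MAGIC_ZIP, sizeof MAGIC_ZIP },
};

/* Returns the label of the first signature that matches the start of
   buf, or NULL when none matches. buf holds raw file bytes. */
const char *identify_file_type(const unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < sizeof SIGNATURES / sizeof SIGNATURES[0]; i++) {
        const struct signature *s = &SIGNATURES[i];
        if (len >= s->len && memcmp(buf, s->magic, s->len) == 0)
            return s->name;
    }
    return NULL;
}
```

Only the text you hand to the OS (file names, window captions, message box strings) needs to be TCHAR; the buffer passed to identify_file_type() stays unsigned char either way.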

[Jw]

From: Jean on 26 Apr 2010 09:43

> Your error is you take the environment requirement "all text strings
> should be converted to TCHAR" to mean "all /unsigned char/ strings ..."
> The environment only needs this for its own data.

OK, I understand (well, I think I do :-) )

What about a file with embedded Unicode text?
In that case the content is not a simple unsigned byte list, correct?

Jean