From: PRMARJORAM on 9 Sep 2009 08:49

My application is compiled as UNICODE. I am downloading web pages that use Cyrillic characters for their content, although the files themselves are ASCII.

---
The encoding setting within the web page or the user's browser determines how the content is interpreted and then displayed in a browser.
---

My problem is that my CString containing this content is WCHAR, so I need to convert 2 consecutive WCHARs to a single WCHAR to get the correct Cyrillic code to display.

I'm not clear how to walk through the string doing this, assuming it is not simply adding the two WCHAR values together?

Can anyone clarify this issue?

Thanks.

From: David Wilkinson on 9 Sep 2009 09:11

PRMARJORAM wrote:
> My application is compiled in UNICODE. I am downloading webpages using
> cyrillic characters for their content. Although these files themselves are
> ASCII.
>
> ---
> Based on the encoding setting within the webpage or the users browser
> determines how the content is to be interpreted and then displayed when used
> within a browser.
> ---
>
> My problem is my CString containing this content is WCHAR and so I need to
> convert 2 consecutive WCHAR to a single WCHAR to then get the correct
> cyrillic code to display.
>
> Im not clear how to walk through the string doing this, assuming it not
> simply adding the two WCHAR values together?
>
> Can anyone clarify this issue?

If you know the code page of the web site, then you can convert to UTF-16 (Windows Unicode) using the MultiByteToWideChar() function.

--
David Wilkinson
Visual C++ MVP

From: Giovanni Dicanio on 9 Sep 2009 09:44

PRMARJORAM wrote:
> My application is compiled in UNICODE. I am downloading webpages using
> cyrillic characters for their content. Although these files themselves are
> ASCII.
[...]
> My problem is my CString containing this content is WCHAR and so I need to
> convert 2 consecutive WCHAR to a single WCHAR to then get the correct
> cyrillic code to display.

My understanding of your problem is as follows:

You have some text coming from an ANSI web page that uses a Cyrillic code page (i.e. something like Windows-1251). This text is stored in an instance of CString in a Unicode app (meaning that CString is actually a CStringW or, if you are using Visual C++ 6, that CString uses WCHAR as its TCHAR expansion).

You would like to have a CString with the Unicode UTF-16 representation of your Cyrillic characters.

Is this correct?

If so, I would use a two-pass conversion:

1. Convert your CString content from Unicode to ANSI, using your code page (e.g. 1251). You could use WideCharToMultiByte as David already suggested, or you may use the easier CW2AEX helper class, specifying the proper code page identifier (e.g. 1251 for Windows-1251 Cyrillic) in the constructor. For a list of code page identifiers you can look at this:

"Code Page Identifiers"
http://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx

(Note that the C<X>2<Y> helper classes are available since VC++7.1; they are not available in VC6. If you are using that old version you must use the WideCharToMultiByte Win32 API.)

2. Given the ANSI string (in the Cyrillic code page) returned in point #1, you can convert it to Unicode using MultiByteToWideChar or the CA2WEX helper class.

As a result, you will have a simple Unicode UTF-16 string storing your Cyrillic characters.

HTH,
Giovanni

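The two passes described above can be sketched in portable C++. This is only an illustration under stated assumptions, not the poster's code: it assumes each wide character in the input holds exactly one raw byte of the page (a value from 0x00 to 0xFF), and it decodes only ASCII plus the contiguous Windows-1251 block 0xC0..0xFF, which maps to U+0410..U+044F. The function names narrowToBytes and decodeCp1251Subset are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Pass 1 (assumption: every wide character holds one raw byte, 0x00..0xFF).
// Returns false if any value does not fit in a byte, which would mean the
// string was not produced by simple byte widening.
bool narrowToBytes(const std::u16string& widened, std::vector<std::uint8_t>& bytes) {
    bytes.clear();
    bytes.reserve(widened.size());
    for (char16_t ch : widened) {
        if (ch > 0xFF) {
            return false;
        }
        bytes.push_back(static_cast<std::uint8_t>(ch));
    }
    return true;
}

// Pass 2: decode Windows-1251 bytes to UTF-16.
// Only ASCII and the block 0xC0..0xFF (U+0410..U+044F) are handled in this
// sketch; every other byte is replaced with U+FFFD.
std::u16string decodeCp1251Subset(const std::vector<std::uint8_t>& bytes) {
    std::u16string out;
    out.reserve(bytes.size());
    for (std::uint8_t b : bytes) {
        if (b < 0x80) {
            out.push_back(static_cast<char16_t>(b));
        } else if (b >= 0xC0) {
            out.push_back(static_cast<char16_t>(0x0410 + (b - 0xC0)));
        } else {
            out.push_back(u'\uFFFD');
        }
    }
    return out;
}
```

On Windows, the second pass would normally be a call to MultiByteToWideChar with code page 1251, which covers the whole code page rather than the subset handled here.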
From: Giovanni Dicanio on 9 Sep 2009 09:59

PRMARJORAM wrote:
> My application is compiled in UNICODE. I am downloading webpages using
> cyrillic characters for their content. Although these files themselves are
> ASCII.
[...]
> My problem is my CString containing this content is WCHAR and so I need to
> convert 2 consecutive WCHAR to a single WCHAR to then get the correct
> cyrillic code to display.

I think that what I previously wrote may not be the right answer to your question.

Could you clarify the format of the input string a little better?

For example, in the Cyrillic code page 1251 I read here:

http://www.fingertipsoft.com/ref/cyrillic/cp1251.html

there is a character like an upper-case "K" (code: 202 dec, 0xCA hex).

How is this character stored in your input string? What are the values of the two WCHARs that you want to convert to one single WCHAR, in this particular case?

Thanks,
Giovanni

From: PRMARJORAM on 9 Sep 2009 10:42

Giovanni,

I must have explained the problem pretty well, as you have pretty much understood it.

Yes, the web page I'm downloading in this particular instance is as you specified:

<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">

Using a binary viewer, the first Cyrillic code in the <title> tag is

CC B3

which 'should' be a Cyrillic capital M?

I hope this helps. Thanks again.

"Giovanni Dicanio" wrote:
> PRMARJORAM wrote:
> > My application is compiled in UNICODE. I am downloading webpages using
> > cyrillic characters for their content. Although these files themselves are
> > ASCII.
> [...]
> > My problem is my CString containing this content is WCHAR and so I need to
> > convert 2 consecutive WCHAR to a single WCHAR to then get the correct
> > cyrillic code to display.
>
> I think that what I previously wrote may not be the right answer to your
> question.
>
> Could it be possible for you to clarify a little better the format of
> the input string?
>
> For example, in the Cyrillic code page 1251 I read here:
>
> http://www.fingertipsoft.com/ref/cyrillic/cp1251.html
>
> there is a character like an upper-case "K" (code: 202 dec, 0xCA hex).
>
> How is this character stored in your input string?
> What are the values of the two WCHAR's that you want to convert to one
> single WCHAR, in this particular case?
>
> Thanks,
> Giovanni
>
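
Since the conversion depends on knowing the page's code page, a minimal sketch of reading it from a declaration like the meta tag quoted above might look as follows. The helper names findCharset and codePageFor are hypothetical, the parsing is deliberately naive (it looks for the first "charset=" and does not parse HTML properly), and only a few charset names are mapped to Windows code page identifiers.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Extracts the value that follows "charset=" in an HTML fragment, lower-cased.
// Returns an empty string when no charset declaration is found.
std::string findCharset(const std::string& html) {
    std::string lower(html.size(), '\0');
    std::transform(html.begin(), html.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "charset=";
    std::string::size_type pos = lower.find(key);
    if (pos == std::string::npos) {
        return "";
    }
    pos += key.size();

    // Skip an optional opening quote, as in <meta charset="...">.
    if (pos < lower.size() && (lower[pos] == '"' || lower[pos] == '\'')) {
        ++pos;
    }

    // The charset name runs while characters are alphanumeric, '-' or '_'.
    std::string::size_type end = pos;
    while (end < lower.size() &&
           (std::isalnum(static_cast<unsigned char>(lower[end])) ||
            lower[end] == '-' || lower[end] == '_')) {
        ++end;
    }
    return lower.substr(pos, end - pos);
}

// Maps a few charset names to Windows code page identifiers; 0 means unknown.
unsigned int codePageFor(const std::string& charset) {
    if (charset == "windows-1251") return 1251;
    if (charset == "koi8-r") return 20866;
    if (charset == "utf-8") return 65001;
    return 0;
}
```

The resulting identifier is what would be passed as the code page argument to MultiByteToWideChar. For the bytes quoted above, code page 1251 maps CC to U+041C (capital М) and B3 to U+0456 (і), so they read as two single-byte characters rather than one two-byte code.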