From: Simon on 24 Mar 2010 11:09

Hi,

I am trying to read a file with some Japanese words.
(Well, it has a mix of Japanese and English words.)

// --------------------------------
// _UNICODE is defined
//
FILE* fp = 0;
errno_t err = _tfopen_s( &fp, _T("name.txt"), _T("rb") );
....
//-- get the file length
....

TCHAR* buf = new TCHAR[ length+1 ];
memset( buf, 0, length+1 );
if( fread( buf, sizeof(TCHAR), length, file ) != length )
{
  ...
  return
}
....
// --------------------------------

But doing that does not load the file into 'buf' properly.
Even the non-Japanese characters are not loaded properly.

What am I doing wrong? (Using Notepad++ I can see that the data is as expected.)

Thanks

Simon

From: Giovanni Dicanio on 24 Mar 2010 12:09

"Simon" <bad(a)example.com> wrote in message
news:#FEiyP2yKHA.4492(a)TK2MSFTNGP05.phx.gbl...

> I am trying to read a file with some Japanese words.
> (Well, it has a mix of Japanese and English words).

I think you should figure out which encoding the file uses.
The file could be Unicode UTF-16 (LE or BE), or UTF-8...

There is a useful, freely available class that lets you load text in different encodings and convert it to Unicode UTF-16 (which is the default Unicode format on Windows):

http://www.codeproject.com/KB/files/stdiofileex.aspx

HTH,
Giovanni
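
One cheap first step toward figuring out the encoding is to look for a byte-order mark (BOM) at the start of the file. The following is only a minimal sketch: the names `BomKind` and `DetectBom` are made up for illustration and are not part of the class linked above. It also cannot tell UTF-8 without a BOM apart from Shift-JIS, since neither carries a marker.

```cpp
#include <cstddef>

// Encodings that can be recognised from a byte-order mark (hypothetical enum).
enum class BomKind { None, Utf8, Utf16LE, Utf16BE };

// Inspect the first bytes of a buffer and report which BOM, if any, is present.
// None does not prove the data is non-Unicode: UTF-8 files are often saved
// without a BOM, and Shift-JIS never has one. UTF-32 is ignored in this sketch.
BomKind DetectBom(const unsigned char* data, std::size_t size)
{
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return BomKind::Utf8;
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return BomKind::Utf16LE;
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return BomKind::Utf16BE;
    return BomKind::None;
}
```

If the result is `BomKind::None`, the encoding still has to be known or guessed by other means, for example from where the file came from.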

From: Oliver Regenfelder on 24 Mar 2010 19:45

Hello,

Simon wrote:
> Hi,
>
> I am trying to read a file with some Japanese words.
> (Well, it has a mix of Japanese and English words).

As Giovanni already pointed out, you need to be aware of the encoding of the file. Besides the various Unicode encodings he mentioned, a Japanese text file might just as easily be encoded in Shift-JIS or some other non-Unicode encoding.

> FILE* fp = 0;
> errno_t err = _tfopen_s( &fp, _T("name.txt"), _T("rb") );
> ...
> //-- get the file length
> ...
>
> TCHAR* buf = new TCHAR[ length+1 ];
> memset( buf, 0, length+1 );

This should be

memset(buf, 0, sizeof(TCHAR)*(length+1));

as TCHAR is two bytes in size (a wchar_t) if _UNICODE is defined.

> if( fread( buf, sizeof(TCHAR), length, file ) != length )

Be careful here as well: fread returns the number of items read, not the number of bytes, so comparing against length is only right if length counts TCHARs. If length is the file size in bytes, this call asks for sizeof(TCHAR) * length bytes and will come up short.

> But doing that does not load the file in 'buf' properly.
> Even the non Japanese characters are not loaded properly.
>
> What am I doing wrong? (Using Notepad++ I can see that the data is as
> expected).

Well, you are reading the file as a bunch of bytes. But you first have to convert the data you read from the encoding used in the file into Unicode to make real sense of the content.

Best regards,

Oliver
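
To make the buffer-size and fread points concrete, here is one way the raw read could look. It is only a sketch under some assumptions: it uses the standard `std::fopen` rather than `_tfopen_s`, the helper name `ReadFileBytes` is hypothetical, and it reads plain bytes so that the size arithmetic stays trivial. It does not interpret the encoding at all.

```cpp
#include <cstdio>
#include <vector>

// Read a whole file as raw bytes. Returns false if the file cannot be opened
// or read completely. The encoding is deliberately not interpreted here.
bool ReadFileBytes(const char* path, std::vector<unsigned char>& out)
{
    std::FILE* fp = std::fopen(path, "rb");
    if (!fp)
        return false;

    // Determine the file size in bytes.
    bool ok = std::fseek(fp, 0, SEEK_END) == 0;
    const long size = ok ? std::ftell(fp) : -1L;
    ok = ok && size >= 0 && std::fseek(fp, 0, SEEK_SET) == 0;

    if (ok)
    {
        // assign() sizes the buffer and zero-fills it, so no memset is needed.
        out.assign(static_cast<std::size_t>(size), 0);

        // The item size is 1, so the item count returned by fread is a byte count.
        const std::size_t got =
            out.empty() ? 0 : std::fread(out.data(), 1, out.size(), fp);
        ok = (got == out.size());
    }

    std::fclose(fp);
    return ok;
}
```

Using a `std::vector<unsigned char>` sidesteps both pitfalls discussed above: there is no manual byte count to pass to memset, and with an item size of 1 the value returned by fread can be compared directly with the file size.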

From: Mihai N. on 25 Mar 2010 03:57

> But doing that does not load the file in 'buf' properly.
> Even the non Japanese characters are not loaded properly.
>
> What am I doing wrong? (Using Notepad++ I can see that the data is as
> expected).

As pointed out already, you have to know the encoding of the file.
But based on the above, my best guess is UTF-16.

Then you have to define both _UNICODE and UNICODE.

memset( buf, 0, length+1 );
should be
memset( buf, 0, (length+1)*sizeof(TCHAR) );

--
Mihai Nita [Microsoft MVP, Visual C++]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

From: Mihai N. on 26 Mar 2010 00:13

>> But doing that does not load the file in 'buf' properly.
>> Even the non Japanese characters are not loaded properly.
>>
>> What am I doing wrong? (Using Notepad++ I can see that the data is as
>> expected).
>
> As pointed out already, you have to know the encoding of the file.
> But based on the above, my best guess is UTF-16.
>
> Then you have to define both _UNICODE and UNICODE.
>
> memset( buf, 0, length+1 );
> should be memset( buf, 0, (length+1)*sizeof(TCHAR) );

My bad!
Since you have _UNICODE defined and you don't even see the English right, the file is anything but UTF-16.

If you are on a Japanese system, it is probably UTF-8 or Shift-JIS (cp932);
otherwise, it is probably UTF-8.

So load the file as bytes, then use MultiByteToWideChar.

--
Mihai Nita [Microsoft MVP, Visual C++]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
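
The "load as bytes, then convert" advice could be sketched roughly as follows. This is an illustration, not code from the thread: the helper name `BytesToWide` is hypothetical, the caller is assumed to already know the code page (for example `CP_UTF8`, or 932 for Shift-JIS), and the block is Windows-only, so it is wrapped in `#ifdef _WIN32`. Files larger than `INT_MAX` bytes are not handled, and a UTF-8 BOM, if present, is not stripped and will show up as U+FEFF in the result.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <string>
#include <vector>

// Convert raw file bytes to UTF-16 using the given code page.
// Returns an empty string on failure or for empty input.
std::wstring BytesToWide(const std::vector<unsigned char>& bytes, UINT codePage)
{
    if (bytes.empty())
        return std::wstring();

    const char* src = reinterpret_cast<const char*>(bytes.data());
    const int srcLen = static_cast<int>(bytes.size());

    // First call: ask how many wchar_t units the converted text needs.
    const int wideLen = MultiByteToWideChar(codePage, 0, src, srcLen, nullptr, 0);
    if (wideLen <= 0)
        return std::wstring();

    // Second call: convert into a buffer of exactly that size.
    std::wstring wide(static_cast<std::size_t>(wideLen), L'\0');
    const int written =
        MultiByteToWideChar(codePage, 0, src, srcLen, &wide[0], wideLen);
    if (written != wideLen)
        return std::wstring();

    return wide;
}
#endif // _WIN32
```

Passing the explicit byte count instead of -1 means the input does not need to be null-terminated, which suits a buffer read straight from a file.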