From: Andreas Leitgeb on 3 Aug 2010 12:59

Georgios Petasis <petasis(a)iit.demokritos.gr> wrote:
> On 3/8/2010 13:19, eugene wrote:
>> On Aug 3, 12:03 pm, Georgios Petasis <peta...(a)iit.demokritos.gr> wrote:
>>> Hm, I have found a page that states that the DROPFILES structure
>>> will never contain data in utf-8 format:
>>> http://www.eggheadcafe.com/software/aspnet/33812038/-copy-paste-with-...
>>> It states "Note that file names in DROPFILES structure are never in
>>> UTF-8. They are either in UTF-16, or in system default ANSI code page."

Would it be possible to get a byte dump of the relevant part of that DROPFILES structure?

> I am not sure I will manage to fix it. I compiled tkdnd with UNICODE &
> _UNICODE defined instead of _MBCS, and treated the data as both unicode
> (using Tcl_UniCharToUtfDString) and UTF-16 (using WideCharToMultiByte).
> The result was the same in both cases, a wrong one. Dropping the
> filename "english, русский, العربية, Ελληνικά.txt" results in "english,
> @CAA:89, 'D91(J), •»»·½ΉΊ¬.txt".

I pasted these two strings into "recode u8..utf16 | hd" and got:

english, русский, العربية, Ελληνικά.txt
00000000: FE FF 00 65 00 6E 00 67 - 00 6C 00 69 00 73 00 68 | e n g l i s h|
00000010: 00 2C 00 20 04 40 04 43 - 04 41 04 41 04 3A 04 38 | , @ C A A : 8|
00000020: 04 39 00 2C 00 20 06 27 - 06 44 06 39 06 31 06 28 | 9 , ' D 9 1 (|
00000030: 06 4A 06 29 00 2C 00 20 - 03 95 03 BB 03 BB 03 B7 | J ) , |
00000040: 03 BD 03 B9 03 BA 03 AC - 00 2E 00 74 00 78 00 74 | . t x t|

english, @CAA:89, 'D91(J), •»»·½ΉΊ¬.txt
00000000: FE FF 00 65 00 6E 00 67 - 00 6C 00 69 00 73 00 68 | e n g l i s h|
00000010: 00 2C 00 20 00 40 00 43 - 00 41 00 41 00 3A 00 38 | , @ C A A : 8|
00000020: 00 39 00 2C 00 20 00 27 - 00 44 00 39 00 31 00 28 | 9 , ' D 9 1 (|
00000030: 00 4A 00 29 00 2C 00 20 - 20 22 00 BB 00 BB 00 B7 | J ) , " |
00000040: 00 BD 03 89 03 8A 00 AC - 00 2E 00 74 00 78 00 74 | . t x t|

There does seem to be some pattern... (and some exceptions to it, too)
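The pattern in these dumps can be reproduced exactly. The Python sketch below is for illustration only. It assumes the mangling works like this: take each UTF-16 code unit of the correct name, keep only its low byte, then decode those bytes with the Windows Greek code page cp1253. That code page is an assumption about the system ANSI code page on the machine where the drop happened. The hex strings are copied from the two dumps above, with the BOM removed. Under this assumption the Cyrillic and Arabic letters collapse to plain ASCII. The "exceptions" are exactly the bytes 0x95, 0xB9 and 0xBA, which are the positions where cp1253 differs from Latin-1. That is consistent with an extra system-encoding conversion being applied to text that was already decoded correctly. `mangle` and `exceptions` are hypothetical helper names.

```python
# Sketch: reproduce the garbled drop result from the correct file name.
# Assumption: system ANSI code page = cp1253 (Greek); "keep the low byte"
# models a Unicode string being reinterpreted as a byte array.

# UTF-16BE bytes of the correct name, from the first dump (BOM FE FF dropped).
ORIGINAL = bytes.fromhex(
    "00 65 00 6E 00 67 00 6C 00 69 00 73 00 68"
    " 00 2C 00 20 04 40 04 43 04 41 04 41 04 3A 04 38"
    " 04 39 00 2C 00 20 06 27 06 44 06 39 06 31 06 28"
    " 06 4A 06 29 00 2C 00 20 03 95 03 BB 03 BB 03 B7"
    " 03 BD 03 B9 03 BA 03 AC 00 2E 00 74 00 78 00 74"
).decode("utf-16-be")

# UTF-16BE bytes of the name actually delivered, from the second dump.
GARBLED = bytes.fromhex(
    "00 65 00 6E 00 67 00 6C 00 69 00 73 00 68"
    " 00 2C 00 20 00 40 00 43 00 41 00 41 00 3A 00 38"
    " 00 39 00 2C 00 20 00 27 00 44 00 39 00 31 00 28"
    " 00 4A 00 29 00 2C 00 20 20 22 00 BB 00 BB 00 B7"
    " 00 BD 03 89 03 8A 00 AC 00 2E 00 74 00 78 00 74"
).decode("utf-16-be")


def mangle(name, codepage="cp1253"):
    """Keep the low byte of each code unit, then decode those bytes as `codepage`."""
    # For BMP characters (all of them here), ord() equals the UTF-16 code unit.
    low_bytes = bytes(ord(ch) & 0xFF for ch in name)
    return low_bytes.decode(codepage)


def exceptions(name, codepage="cp1253"):
    """List characters whose low byte decodes differently in `codepage` than in Latin-1."""
    result = []
    for ch in name:
        b = ord(ch) & 0xFF
        decoded = bytes([b]).decode(codepage)
        if decoded != chr(b):
            result.append((ch, "%02X" % b, decoded))
    return result


assert mangle(ORIGINAL) == GARBLED
print(exceptions(ORIGINAL))  # -> [('Ε', '95', '•'), ('ι', 'B9', 'Ή'), ('κ', 'BA', 'Ί')]
```

If `codepage` is swapped for "latin-1", the three exceptions turn into U+0095, ¹ and º instead.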
From: Jeff Hobbs on 3 Aug 2010 15:16

On Aug 3, 8:10 am, Georgios Petasis <peta...(a)iit.demokritos.gr> wrote:
> On 3/8/2010 13:19, eugene wrote:
> > On Aug 3, 12:03 pm, Georgios Petasis <peta...(a)iit.demokritos.gr>
> > wrote:
> >> Hm, I have found a page that states that the DROPFILES structure
> >> will never contain data in utf-8 format:
> >> http://www.eggheadcafe.com/software/aspnet/33812038/-copy-paste-with-...
> >> It states "Note that file names in DROPFILES structure are never in
> >> UTF-8. They are either in UTF-16, or in system default ANSI code page."
> >> So, my assumption that defining _MBCS will have them in utf-8 is not
> >> valid. Windows uses the default ANSI code page. I will define _UNICODE and
> >> handle it as a unicode string.
> >> George
> > So then can we expect a patch any time soon? :)
> I am not sure I will manage to fix it. I compiled tkdnd with UNICODE &
> _UNICODE defined instead of _MBCS, and treated the data as both unicode
> (using Tcl_UniCharToUtfDString) and UTF-16 (using WideCharToMultiByte).
> The result was the same in both cases, a wrong one. Dropping the
> filename "english, русский, العربية, Ελληνικά.txt" results in "english,
> @CAA:89, 'D91(J), •»»·½ΉΊ¬.txt".

How did you get it to compile with UNICODE and _UNICODE defined? I added these to OleDND.h and got a lot of compile errors because the code is not really wchar-aware.

Jeff
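For reference, CF_HDROP data begins with a DROPFILES header: `DWORD pFiles; POINT pt; BOOL fNC; BOOL fWide;`, which is 20 bytes. It is followed, at byte offset `pFiles`, by a list of NUL-terminated file names that ends with an extra NUL. `fWide` says whether the names are UTF-16 or in the ANSI code page, which matches the note quoted above. The Python sketch below parses such a buffer from raw bytes and decodes each name exactly once. `parse_dropfiles` and `build_dropfiles` are hypothetical helpers, not tkdnd code. The cp1253 default for the ANSI case is an assumption chosen to match a Greek system.

```python
import struct

# DROPFILES layout (little-endian): DWORD pFiles; POINT pt (two LONGs);
# BOOL fNC; BOOL fWide.
DROPFILES_FMT = "<IiiII"
DROPFILES_SIZE = struct.calcsize(DROPFILES_FMT)  # 20 bytes


def parse_dropfiles(buf, ansi_codepage="cp1253"):
    """Return the file names stored in a DROPFILES (CF_HDROP) byte buffer."""
    p_files, _x, _y, _f_nc, f_wide = struct.unpack_from(DROPFILES_FMT, buf, 0)
    unit = 2 if f_wide else 1                  # bytes per character unit
    codec = "utf-16-le" if f_wide else ansi_codepage
    terminator = b"\0" * unit
    names, start, pos = [], p_files, p_files
    while pos + unit <= len(buf):
        if buf[pos:pos + unit] == terminator:
            if pos == start:                   # empty name: end of the list
                break
            names.append(buf[start:pos].decode(codec))  # the only decode step
            start = pos + unit
        pos += unit
    return names


def build_dropfiles(names, wide=True, ansi_codepage="cp1253"):
    """Build a DROPFILES buffer for testing (hypothetical helper)."""
    header = struct.pack(DROPFILES_FMT, DROPFILES_SIZE, 0, 0, 0, 1 if wide else 0)
    codec = "utf-16-le" if wide else ansi_codepage
    body = "".join(name + "\0" for name in names) + "\0"
    return header + body.encode(codec)
```

With `fWide` set, a name such as "english, русский, Ελληνικά.txt" survives a round trip. In the ANSI case, cp1253 can only carry the Greek and ASCII parts: encoding the Cyrillic letters raises `UnicodeEncodeError`, which illustrates why the wide form is preferable.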
From: Georgios Petasis on 3 Aug 2010 15:29

On 3/8/2010 22:16, Jeff Hobbs wrote:
> How did you get it to compile with UNICODE and _UNICODE defined? I
> added these to OleDND.h and get a lot of code errors about it not
> being really wchar-aware.
>
> Jeff

Yes. I have corrected all these :-)
I have just committed the changes. I haven't updated TEA though, only cmake.

George
From: Jeff Hobbs on 3 Aug 2010 16:22

On Aug 3, 12:29 pm, Georgios Petasis <peta...(a)iit.demokritos.gr> wrote:
> On 3/8/2010 22:16, Jeff Hobbs wrote:
> > How did you get it to compile with UNICODE and _UNICODE defined? I
> > added these to OleDND.h and get a lot of code errors about it not
> > being really wchar-aware.
>
> Yes. I have corrected all these :-)
> I have just committed the changes. I haven't updated TEA though, only cmake.

OK, using those changes I see that there is improvement after adding
#define UNICODE to the sources, but not correctness yet. With the
Greek text, what was ???? is now »»·½¹º¬, which happens to equate to:

(demos) 62 % encoding convertfrom utf-8 Ελληνικά
»»·½¹º¬

I suspect a conversion is occurring that shouldn't happen.

Jeff
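Jeff's observation can be checked outside Tcl. The Python sketch below models `encoding convertfrom utf-8` applied to a string that has already been decoded. It makes two assumptions about Tcl 8.x behavior. First, each character is reduced to its low byte when the string is used as a byte array. Second, bytes that do not form valid UTF-8 are mapped to the character with the same value instead of raising an error. The error-handler name `tcl8-lenient` and the function `convertfrom_utf8` are made up for this sketch.

```python
import codecs


def _tcl8_lenient(exc):
    """Map each undecodable byte to the code point with the same value (U+0000..U+00FF)."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return bad.decode("latin-1"), exc.end


codecs.register_error("tcl8-lenient", _tcl8_lenient)


def convertfrom_utf8(text):
    """Model of Tcl 8.x `encoding convertfrom utf-8 $text` (assumed semantics)."""
    raw = bytes(ord(ch) & 0xFF for ch in text)   # string used as a byte array
    return raw.decode("utf-8", errors="tcl8-lenient")


GREEK = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac"  # "Ελληνικά"

# Already-decoded text: every low byte (95 BB BB B7 BD B9 BA AC) is a stray
# UTF-8 continuation byte, so each one passes through as U+00xx.
print(repr(convertfrom_utf8(GREEK)))  # '\x95»»·½¹º¬'

# Genuine UTF-8 bytes (CE 95 encodes "Ε") decode as intended.
assert convertfrom_utf8("\xce\x95") == "\u0395"
```

The leading U+0095 is a C1 control character, which is presumably why it does not show up in the console output above.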
From: Georgios Petasis on 3 Aug 2010 16:32

On 3/8/2010 23:22, Jeff Hobbs wrote:
> OK, using those changes I see that there is improvement after adding
> #define UNICODE to the sources, but not correctness yet. With the
> Greek text, what was ???? is now »»·½¹º¬, which happens to equate to:
>
> (demos) 62 % encoding convertfrom utf-8 Ελληνικά
> »»·½¹º¬
>
> I suspect a conversion is occurring that shouldn't happen.
>
> Jeff

Which is absolutely correct. The library file tkdnd_windows.tcl had, in olednd::_normalise_data, a call to "encoding convertfrom $data" for the CF_HDROP type that I had completely forgotten about. It is now fixed in the latest SVN HEAD.

Many thanks,

George