Prev: Perl opening Buenos Aires Argentina
Next: FAQ 2.14 Where are the archives for comp.lang.perl.misc?
From: A. Farber on 17 Feb 2010 13:28

Hello,

I have a Russian card game at
http://apps.facebook.com/video-preferans/
which I've recently moved from using urlencoded data
to XML data in UTF-8. Since then it often hangs
for the users and I suspect that my subroutine:

    sub enqueue {
        my $child = shift;
        my $data = shift;
        my $fh = $child->{FH};
        my $response = $child->{RESPONSE};

        # flash.net.Socket.readUTF() expects 16-bit prefix in network order
        my $prefix = pack 'n', length $data;

        # append to the end of the outgoing queue
        push @{$response}, $prefix . $data;
    }

packs the wrong number of bytes for Cyrillic messages.

I'm using perl v5.10.0 on OpenBSD 4.5 and
"perldoc -tf length" suggests using

    length(Encoding::encode_utf8(EXPR))

But when I put in the lines:

    use Encode::Encoding;
    ....
    my $prefix = pack 'n', length(Encoding::encode_utf8($data));

then it borks with

    Undefined subroutine &Encoding::encode_utf8 called at Child.pm line 229.

Any help please?

Also I have to mention that when users chat in Russian, my server
just passes their Cyrillic messages around (with sysread - poll - syswrite).

But for the Cyrillic words in my own program (I "use utf8;") I have to call
utf8::encode($cyrillic_word) before I can write them out with syswrite,
or it dies ("wide char").

I've tried moving utf8::encode($data) into the enqueue subroutine
above but it doesn't let me (maybe because parts of $data are not utf8??)

Regards
Alex
From: sln on 17 Feb 2010 18:55

On Wed, 17 Feb 2010 10:28:59 -0800 (PST), "A. Farber" <alexander.farber(a)gmail.com> wrote:

>Hello,
>
>I have a russian card game at
>http://apps.facebook.com/video-preferans/
>which I've recently moved from using urlencoded data
>to XML data in UTF-8. Since then it often hangs
>for the users and I suspect, that my subroutine:
>
>sub enqueue {
>    my $child = shift;
>    my $data = shift;
>    my $fh = $child->{FH};
>    my $response = $child->{RESPONSE};
>
>    # flash.net.Socket.readUTF() expects 16-bit prefix in network order
>    my $prefix = pack 'n', length $data;
>
>    # append to the end of the outgoing queue
>    push @{$response}, $prefix . $data;
>}
>
>packs wrong number of bytes for cyrillic messages.
>

If '$data' is still a Perl string, I would encode() it to UTF-8 octets, then

    push @outarray, pack ('n a*', length($octets), $octets);

But you could do it a couple of different ways.
Basically you want the length of the encoded data, not the
length of the Perl string (if it's in Perl character semantics).

You really don't want to push '$prefix . $data' if $data is not yet
encoded as UTF-8. If it is already encoded as UTF-8, then the length would be
correct because it's already bytes (octets), not character semantics.

You should read the Unicode docs: perluniintro, perlunicode, unicode, etc.
Each has links that take you to the others.

Below are some examples of a couple of ways to do it.
See what works for you.
-sln

----------------------
use strict;
use warnings;
use Encode;

binmode (STDOUT, ':encoding(UTF-8)');

##
# A Perl character string containing one non-ASCII character (U+2100)
my $perlstring = "This is a string <\x{2100}>...";
# Encode it to UTF-8 octets; length() now counts bytes, not characters
my $utf8octets = encode('UTF-8', $perlstring);
my $packd_string = pack('n', length($utf8octets));
my $unpackd_string = unpack('n', $packd_string);

# (raw octets are not printable text; the output below shows them as \x escapes)
print "** Perl string : '$perlstring', length = ", length($perlstring),"\n\n";
print "UTF-8 octets: '$utf8octets', length = ", length($utf8octets),"\n\n";
print "Packed length of encoded string is $unpackd_string\n\n";

##
# Way 1: concatenate the packed length and the octets
my $len_plus_octets = $packd_string . $utf8octets;
print "Length.UTF-8 octets: '$len_plus_octets'\n\n";

##
# Way 2: pack the length and the octets in one call
my $packd_all = pack ('n a*', length($utf8octets), $utf8octets);
print "Packed all : '$packd_all', length = ",length($packd_all),"\n\n";

##
# Round trip: unpack, then decode the octets back to a Perl string
my ($len,$octets) = unpack ('n a*', $packd_all);
print "Unpacked all : '$octets', length = ",length($octets),"\n";
print " : read packed length = $len\n\n";

my $decoded_string = decode('UTF-8', $octets);
print "** Perl string : '$decoded_string', length = ", length($decoded_string), "\n\n";

if ($decoded_string eq $perlstring) {
    print "** Perl strings are equal.\n";
} else {
    print "** Perl strings are not equal.\n";
}

__END__

** Perl string : 'This is a string <℀>...', length = 23

UTF-8 octets: 'This is a string <\xE2\x84\x80>...', length = 25

Packed length of encoded string is 25

Length.UTF-8 octets: '\x00\x19This is a string <\xE2\x84\x80>...'

Packed all : '\x00\x19This is a string <\xE2\x84\x80>...', length = 27

Unpacked all : 'This is a string <\xE2\x84\x80>...', length = 25
 : read packed length = 25

** Perl string : 'This is a string <℀>...', length = 23

** Perl strings are equal.
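
Applied to the enqueue subroutine from the question, the same idea might look like the sketch below. It assumes that $data arrives as a Perl character string (decoded text) and not as already-encoded octets; the size check and the stand-in $child hash are illustrative additions, not part of the original code.

```perl
use strict;
use warnings;
use utf8;                      # this source file contains a Cyrillic literal
use Encode qw(encode);

# Sketch of enqueue(); ASSUMES $data is a Perl character string,
# not UTF-8 octets that were already encoded elsewhere.
sub enqueue {
    my ($child, $data) = @_;
    my $response = $child->{RESPONSE};

    # Encode once, so that length() below counts bytes, not characters.
    my $octets = encode('UTF-8', $data);

    # A 16-bit prefix cannot describe more than 65535 bytes
    # (illustrative guard, not in the original subroutine).
    die "message too long for a 16-bit length prefix\n"
        if length($octets) > 0xFFFF;

    # 'n' is the big-endian 16-bit length, 'a*' the raw octets.
    push @{$response}, pack('n a*', length($octets), $octets);
    return;
}

# Hypothetical stand-in for the per-client hash used by the server.
my $child = { RESPONSE => [] };
enqueue($child, '<chat>Привет</chat>');

# Read the frame back to compare the prefix with the payload size.
my ($len, $body) = unpack('n a*', $child->{RESPONSE}[0]);
printf "prefix = %d, payload bytes = %d\n", $len, length($body);
```

If the chat messages relayed from other clients are never decoded on the way in, they are already octets and should not go through encode() again; only one of the two paths should do the encoding.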
From: A. Farber on 18 Feb 2010 02:14

Thank you!

I've ended up with encode($data), and after that
length() gives me the number of bytes for the syswrite (I hope).
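
On the "Undefined subroutine &Encoding::encode_utf8" error from the first post: encode_utf8 is provided by the Encode module, so the call is Encode::encode_utf8(...) after "use Encode;". A minimal sketch of the character count versus the byte count follows; the sample word is an assumption chosen for illustration.

```perl
use strict;
use warnings;
use utf8;                       # the literal below is Cyrillic source text
use Encode ();                  # load Encode without importing anything

my $word = 'преферанс';         # sample word: 9 characters, 2 bytes each in UTF-8

# length() on a character string counts characters ...
my $chars = length($word);

# ... while length() on the encoded octets counts bytes.
# Note the package name: Encode, not Encoding.
my $bytes = length(Encode::encode_utf8($word));

print "characters: $chars, bytes: $bytes\n";

# utf8::encode() does the same conversion in place and returns nothing,
# so calling it on data that is already octets would encode it twice.
my $copy = $word;
utf8::encode($copy);
print "after utf8::encode: ", length($copy), " bytes\n";
```

A prefix built from the character count tells the Flash client to expect fewer bytes than are actually sent, which would be consistent with the hangs described in the first post.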