From: Praveen on 16 Feb 2010 07:53 Forgot to mention. I am on ruby version ruby 1.9.1p0 (2009-01-30 revision 21907) [x86_64-linux]. Let me know if you require any info Thanks Praveen
From: Lui Kore on 16 Feb 2010 09:11 I'm not familiar with the enc_ APIs. But I think the easiest way is to use rb_funcall(some_str, rb_intern("encode") ... Praveen wrote: > Forgot to mention. I am on ruby version > > ruby 1.9.1p0 (2009-01-30 revision 21907) [x86_64-linux]. > > Let me know if you require any info > > Thanks > > Praveen -- Posted via http://www.ruby-forum.com/.
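For reference, a minimal Ruby-level sketch of what the rb_funcall suggestion above amounts to: looking up "encode" by name and calling it on the string. The argument list is elided in the post, so the target encoding passed here ("UTF-16LE") and the helper name encode_by_name are assumptions made for illustration only.

```ruby
# encoding: utf-8

# Hypothetical helper: dispatch "encode" by name, roughly what
# rb_funcall(some_str, rb_intern("encode"), ...) does from C.
def encode_by_name(some_str, target)
  some_str.send(:encode, target)
end

utf8  = "caf\u00E9"                        # assumed sample text, UTF-8
utf16 = encode_by_name(utf8, "UTF-16LE")   # target encoding is an assumption

puts utf16.encoding   # encoding of the new, transcoded string
puts utf8.encoding    # the receiver itself is left untouched
```

String#encode returns a new string, so the original stays in its own encoding.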
From: Heesob Park on 16 Feb 2010 09:49
Hi,

2010/2/16 Praveen <praveendevarao(a)gmail.com>:
> Hi Kubo,
>
> I tried proceeding with the above mentioned APIs. However I am seeing
> some interesting stuffs. Not sure if I am using the right constructs.
>
> Below is the Ruby script I am using:
>
> ======================================
> #encoding: utf-8
>
> puts "Results in C extension"
> puts "----------------------"
> require 'ibm_db'
> str = "insert into woods (name) values ('GÃHRINGæ')"
>
> conn = IBM_DB.connect 'DRIVER={IBM DB2 ODBC
> DRIVER};DATABASE=devdb;HOSTNAME=9.124.159.74;PORT=50000;PROTOCOL=TCPIP;UID=db2admin;PWD=db2admin;','',''
> stmt = IBM_DB.exec conn, str
> IBM_DB.close conn
>
> print "----------------------\n\n"
>
> puts "Results in Ruby script"
> puts "----------------------"
>
> puts "str.length is :#{str.length}"
> puts "str.bytesize: #{str.bytesize}"
> puts "**Forcing encoding**"
> str1 = str.force_encoding("UTF-16LE")
> puts "str.length is :#{str1.length}"
> puts "str.bytesize: #{str1.bytesize}"
> ======================================
>
> In the script above, IBM_DB is the C extension module. However the
> database call has got nothing to do with the unicode API usage. I have
> just reused the module for trying the unicode support.
>
> The snippet in C extension that uses the unicode functions is as
> below:
>
> ======================================
> VALUE ibm_db_exec(int argc, VALUE *argv, VALUE self){
>  rb_scan_args(argc, argv, "21", &connection, &stmt, &options);
>  if (!NIL_P(stmt)) {
>   rb_encoding *enc_received;
>   rb_encoding *ucs2_enc = rb_enc_find("UTF-16LE");
>   rb_encoding *ucs4_enc = rb_enc_find("UTF-32LE");
>
>   enc_received = rb_enc_from_index(ENCODING_GET(stmt));
>
>   printf("\nString in received format: %s\n",RSTRING_PTR(stmt));
>   printf("\nrb_str_length is: %d\n",rb_str_length(stmt));
>   printf("\nRSTRING_LEN is: %d\n",RSTRING_LEN(stmt));
>   printf("\nEncoding format received: %s\n",enc_received->name);
>
>   stmt_ucs2  =  rb_str_export_to_enc(stmt,ucs2_enc);
>
>   printf("\nString in utf16 format: %s\n",RSTRING_PTR(stmt_ucs2));
>   printf("\nrb_str_length is: %d\n",rb_str_length(stmt_ucs2));
>   printf("\nRSTRING_LEN is: %d\n",RSTRING_LEN(stmt_ucs2));
>   printf("\nEncoding after conversion: %s\n",ucs2_enc->name);
>  }
> }
>
> ======================================
>
> The above ruby script run produces the following output:
>
> ======================================
>
> Results in C extension
> ----------------------
>
> String in received format: insert into woods (name) values
> ('GÃHRINGæ')
>
> rb_str_length is: 89
>
> RSTRING_LEN is: 47
>
> Encoding format received: UTF-8
>
> String in utf16 format: i #Expected because used printf
>
> rb_str_length is: 89
>
> RSTRING_LEN is: 88
>
> Encoding after conversion: UTF-16LE
> ----------------------
>
> Results in Ruby script
> ----------------------
> str.length is :44
> str.bytesize: 47
> **Forcing encoding**
> str.length is :24
> str.bytesize: 47
>
> ======================================
>
> I am not sure why there is a difference in the string length in the
> original string [44] (UTF-8 format) and string after changing the
> encoding [24] (to UTF-16LE). The same is the case in case of output in
> the C extension, the bytesize and the length are same (+1 or -1) and
> the length is different in different encoding formats.
>
The 89 printed for rb_str_length is not a C integer but a VALUE.
rb_str_length returns a Fixnum, and a Fixnum n is stored as the tagged
value 2n+1, so a VALUE of 89 means the integer 44.

> Could you tell me what is that I am doing wrong?
>
You should use String#encode instead of String#force_encoding.
force_encoding only relabels the existing bytes, whereas encode
transcodes them. Like this:

puts "**Converting encoding**"
str1 = str.encode("UTF-16LE")
puts "str.length is :#{str1.length}"
puts "str.bytesize: #{str1.bytesize}"

> Along with this, in C extension is there any API that I can call to
> check if the given string is in a particular encoding or should I use
> rb_enc_from_index and from there read the struct member name and
> determine in the extension that I write?
>
Using rb_enc_get is simpler than rb_enc_from_index, like this:

enc_received = rb_enc_get(stmt);

Also, rb_str_length returns not an integer but a VALUE, so you should
use NUM2INT like this:

printf("\nrb_str_length is: %d\n",NUM2INT(rb_str_length(stmt)));

Regards,
Park Heesob
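The two points in the reply above can be checked from plain Ruby. The following is a minimal sketch, not the poster's code: the sample text and the helper names (describe, relabelled, transcoded, fixnum_from_value) are made up for illustration, and the last helper only mirrors the 2n+1 Fixnum tagging arithmetic rather than touching the C API.

```ruby
# encoding: utf-8

# Report character count, byte count and encoding name for a string.
def describe(str)
  { length: str.length, bytesize: str.bytesize, encoding: str.encoding.name }
end

# Assumed sample text: 4 characters, 5 bytes in UTF-8.
def sample
  "caf\u00E9"
end

# force_encoding keeps the bytes and only changes the label
# (dup so the original string stays tagged as UTF-8).
def relabelled(str)
  str.dup.force_encoding("UTF-16LE")
end

# encode produces new bytes that represent the same characters.
def transcoded(str)
  str.encode("UTF-16LE")
end

# Plain arithmetic mirroring the Fixnum tagging: VALUE = 2n + 1.
def fixnum_from_value(raw)
  (raw - 1) / 2
end

p describe(sample)              # UTF-8: 4 characters, 5 bytes
p describe(relabelled(sample))  # same 5 bytes, character count changes
p describe(transcoded(sample))  # 4 characters, 8 bytes (2 per character)
p fixnum_from_value(89)         # the 89 from the printf output is 44
```

This matches the numbers in the thread: relabelling leaves bytesize at 47 while the length changes, whereas the transcoded string in the C extension has RSTRING_LEN 88, two bytes for each of the 44 characters.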
From: Praveen on 22 Feb 2010 11:27 Thanks All for your help!! Will Keep posted on how it goes. Thanks Praveen
From: Praveen on 18 Mar 2010 06:36 Hi, I wanted to know whether there is any function in the C extension API (Ruby 1.9) that converts a string to the encoding specified by the user, either in their environment or by setting #encoding: at the beginning of the .rb file. I did find two functions, namely rb_str_export and rb_str_export_locale, but I am not sure which one will correctly convert strings to the encoding the user has set. Could somebody guide me? Thanks Praveen
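As a hedged illustration of the distinction behind this question, the Ruby sketch below inspects the three encodings involved. It assumes that rb_str_export targets the default external encoding and rb_str_export_locale targets the locale encoding, while the #encoding: magic comment only sets the source encoding of that one file; the helper names are hypothetical and the replacement options are there only to keep the sketch from raising on unconvertible characters.

```ruby
# encoding: utf-8

# The three encodings a user can influence.
def encoding_settings
  {
    source:   __ENCODING__,               # from the magic comment of this file
    external: Encoding.default_external,  # assumed target of rb_str_export
    locale:   Encoding.find("locale"),    # assumed target of rb_str_export_locale
  }
end

# Hypothetical Ruby-level counterpart of converting to the external encoding.
def to_external(str)
  str.encode(Encoding.default_external, invalid: :replace, undef: :replace)
end

# Hypothetical Ruby-level counterpart of converting to the locale encoding.
def to_locale(str)
  str.encode(Encoding.find("locale"), invalid: :replace, undef: :replace)
end

encoding_settings.each { |name, enc| puts "#{name}: #{enc}" }
puts to_external("insert into woods").encoding
puts to_locale("insert into woods").encoding
```

Printing the three settings on the target machine shows whether they differ there, which is what decides which conversion is wanted.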