From: Richard Schulman on 8 Sep 2006 18:01

Sorry to be back at the goodly well so soon, but...

...when I execute the following -- variable mean_eng_txt being
utf-16LE and its datatype nvarchar2(79) in Oracle:

cursor.execute("""INSERT INTO mean (mean_id,mean_eng_txt)
VALUES (:id,:mean)""",id=id,mean=mean)

I not surprisingly get this error message:

"cx_Oracle.NotSupportedError: Variable_TypeByValue(): unhandled data
type unicode"

But when I try putting a codecs.BOM_UTF16_LE in various plausible
places, I just end up generating different errors.

Recommendations, please?

TIA,
Richard Schulman
(Remove xx for email reply)
From: Diez B. Roggisch on 9 Sep 2006 05:38

Richard Schulman schrieb:
> Sorry to be back at the goodly well so soon, but...
>
> ...when I execute the following -- variable mean_eng_txt being
> utf-16LE and its datatype nvarchar2(79) in Oracle:
>
> cursor.execute("""INSERT INTO mean (mean_id,mean_eng_txt)
> VALUES (:id,:mean)""",id=id,mean=mean)
>
> I not surprisingly get this error message:
>
> "cx_Oracle.NotSupportedError: Variable_TypeByValue(): unhandled data
> type unicode"
>
> But when I try putting a codecs.BOM_UTF16_LE in various plausible
> places, I just end up generating different errors.

Show us the alleged plausible places, and the different errors.
Otherwise it's crystal ball time again.

Diez
From: Richard Schulman on 9 Sep 2006 23:17

>> cursor.execute("""INSERT INTO mean (mean_id,mean_eng_txt)
>> VALUES (:id,:mean)""",id=id,mean=mean)
>> ...
>> "cx_Oracle.NotSupportedError: Variable_TypeByValue(): unhandled data
>> type unicode"
>>
>> But when I try putting a codecs.BOM_UTF16_LE in various plausible
>> places, I just end up generating different errors.

Diez:
> Show us the alleged plausible places, and the different errors.
> Otherwise it's crystal ball time again.

More usefully, let's just try to fix the code above. Here's the error
message I get:

NotSupportedError: Variable_TypeByValue(): unhandled data type unicode

Traceback (innermost last):
  File "c:\pythonapps\LoadMeanToOra.py", line 1, in ?
    # LoadMeanToOra reads a UTF-16LE input file one record at a time
  File "c:\pythonapps\LoadMeanToOra.py", line 23, in ?
    cursor.execute("""INSERT INTO mean (mean_id,mean_eng_txt)

What I can't figure out is whether cx_Oracle is saying it can't handle
Unicode for an Oracle nvarchar2 data type or whether it can handle the
input but that it needs to be in a specific format that I'm not
supplying.

- Richard Schulman
From: Diez B. Roggisch on 10 Sep 2006 05:42

Richard Schulman schrieb:
>>> cursor.execute("""INSERT INTO mean (mean_id,mean_eng_txt)
>>> VALUES (:id,:mean)""",id=id,mean=mean)
>>> ...
>>> "cx_Oracle.NotSupportedError: Variable_TypeByValue(): unhandled data
>>> type unicode"
>>>
>>> But when I try putting a codecs.BOM_UTF16_LE in various plausible
>>> places, I just end up generating different errors.
>
> Diez:
>> Show us the alleged plausible places, and the different errors.
>> Otherwise it's crystal ball time again.
>
> More usefully, let's just try to fix the code above. Here's the error
> message I get:
>
> NotSupportedError: Variable_TypeByValue(): unhandled data type unicode
>
> Traceback (innermost last):
>   File "c:\pythonapps\LoadMeanToOra.py", line 1, in ?
>     # LoadMeanToOra reads a UTF-16LE input file one record at a time
>   File "c:\pythonapps\LoadMeanToOra.py", line 23, in ?
>     cursor.execute("""INSERT INTO mean (mean_id,mean_eng_txt)
>
> What I can't figure out is whether cx_Oracle is saying it can't handle
> Unicode for an Oracle nvarchar2 data type or whether it can handle the
> input but that it needs to be in a specific format that I'm not
> supplying.

What does print repr(mean) give you?

It _looks_ to me (I don't have an Oracle available right now) as if it
is a unicode object. You have to consider that an abstract string
representation, which means it has to be encoded in some way before
being sent over the wire. There might exist DB-API bindings that can
deal with unicode objects, by applying a default encoding or by somehow
figuring out what encoding the DB expects. But I don't see any
references to unicode in PEP 249, so I presume you can't rely on that
- which seems to be the case here.

The Oracle NLS is a sometimes tricky beast: when it sets the encoding,
it tries to be clever and assigns an existing connection some encoding
based on the user's/machine's locale. Which can yield unexpected
results, such as "Dusseldorf" instead of "Düsseldorf" when querying a
German city list with an English locale.

So - you have to figure out what encoding your db-connection expects.
You can do so by issuing some queries against the session tables, I
believe - I don't have my Oracle resources at home, but googling will
bring you there; the important Oracle term is NLS. Then you need to
encode the unicode string before passing it - something like this:

mean = mean.encode("latin1")

That should help.

Diez
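A minimal sketch of the two steps Diez describes - asking Oracle which
character sets it uses, then encoding the unicode object before binding
it - assuming Python 2.x, cx_Oracle, and the mean table from this
thread; the connect arguments, the sample value, and the "latin1"
choice are illustrative assumptions, not a confirmed recipe:

import cx_Oracle

connection = cx_Oracle.connect("username", "password")
cursor = connection.cursor()

# Ask Oracle which character sets the database uses (the NLS settings
# Diez mentions); NVARCHAR2 columns use the NCHAR character set.
cursor.execute("""SELECT parameter, value
                  FROM nls_database_parameters
                  WHERE parameter IN ('NLS_CHARACTERSET',
                                      'NLS_NCHAR_CHARACTERSET')""")
for parameter, value in cursor.fetchall():
    print parameter, value

# Encode the unicode object into a byte string the driver will accept,
# using whatever encoding the query above reports (assumed here).
mean = u'sadness, lament; sympathize with, pity'
cursor.execute("""INSERT INTO mean (mean_id, mean_eng_txt)
                  VALUES (:id, :mean)""",
               id=3, mean=mean.encode("latin1"))
connection.commit()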
From: John Machin on 10 Sep 2006 18:27
Richard Schulman wrote:
> On Sun, 10 Sep 2006 11:42:26 +0200, "Diez B. Roggisch"
> <deets(a)nospam.web.de> wrote:
>
>> What does print repr(mean) give you?
>
> That is a useful suggestion.
>
> For context, I reproduce the source code:
>
> in_file = codecs.open("c:\\pythonapps\\mean.my",encoding="utf_16_LE")
> connection = cx_Oracle.connect("username", "password")
> cursor = connection.cursor()
> for row in in_file:
>     id = row[0]
>     mean = row[1]
>     print "Value of row is ", repr(row) #debug line
>     print "Value of the variable 'id' is ", repr(id) #debug line
>     print "Value of the variable 'mean' is ", repr(mean) #debug line
>     cursor.execute("""INSERT INTO mean (mean_id,mean_eng_txt)
>     VALUES (:id,:mean)""",id=id,mean=mean)
>
> Here is the result from the print repr() statements:
>
> Value of row is u"\ufeff(3,'sadness, lament; sympathize with,
> pity')\r\n"
> Value of the variable 'id' is u'\ufeff'
> Value of the variable 'mean' is u'('
>
> Clearly, the values loaded into the 'id' and 'mean' variables are not
> satisfactory but are picking up the BOM.

Well of course they're "unsatisfactory", and this is absolutely nothing
to do with Oracle and cx_Oracle. row is a string of characters. row[0]
is the BOM. Read my lips (from a previous thread):

"""
Use utf_16 -- it will strip off the BOM for you.
"""

and again:

"""
| >>> codecs.open('guff.utf16le', 'r', encoding='utf_16').read()
| u'abc\n\rdef\n\rghi' ######### Look, Mom, no BOM!
"""

row[1] is the first ***character*** of what looks suspiciously like the
Python representation of a tuple:

"""(3,'sadness, lament; sympathize with, pity')"""

Who wrote that like that??? If it is at all under your control, do it
like this:
Encode each Unicode text field in UTF-8.
Write the file as a CSV file using Python's csv module.
Read the CSV file using the same module.
Decode the text fields from UTF-8.

You need to parse the incoming line into column values (the csv module
does this for you) and then convert each column value from
string/Unicode to a Python type that is compatible with the Oracle type
for that column. My guess (not having used cx_Oracle) is that the error
is happening because the column "id" has a numeric type and you are
trying to jam a Unicode string into it. IOW, nothing to do with the
"mean" column (yet!).

BTW, I've managed to decode that "eng" means English not engineering,
and "mean" means meaning, i.e. not average and not stingy. Holy
obfuscation, Batman!

HTH,
John
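A minimal sketch of the pipeline John suggests - UTF-8-encoded fields
written to and read back from a CSV file with Python's csv module, with
the id converted to int before binding - assuming Python 2.x and
cx_Oracle; the file name, connect arguments, and the final encode step
are illustrative, and whether the driver accepts the decoded unicode
object directly or needs it re-encoded depends on the NLS setup
discussed earlier in the thread:

import csv
import cx_Oracle

# Writing side: encode each Unicode text field to a UTF-8 byte string,
# because the 2.x csv module works on byte strings, not unicode.
rows = [(3, u'sadness, lament; sympathize with, pity')]
out = open("c:\\pythonapps\\mean.csv", "wb")
writer = csv.writer(out)
for mean_id, mean in rows:
    writer.writerow([mean_id, mean.encode("utf-8")])
out.close()

# Reading side: let csv split the columns, decode the text field from
# UTF-8, and convert the id so it matches the numeric column type.
in_file = open("c:\\pythonapps\\mean.csv", "rb")
connection = cx_Oracle.connect("username", "password")
cursor = connection.cursor()
for mean_id, mean in csv.reader(in_file):
    mean = mean.decode("utf-8")
    cursor.execute("""INSERT INTO mean (mean_id, mean_eng_txt)
                      VALUES (:id, :mean)""",
                   id=int(mean_id),
                   mean=mean.encode("utf-8"))  # or the DB's charset
connection.commit()
in_file.close()

If the input has to stay UTF-16, John's earlier point still applies:
opening it with codecs.open(..., encoding="utf_16") strips the BOM that
showed up as u'\ufeff' in the repr() output above.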