From: Lew Pitcher on 16 Jun 2010 19:16

On June 16, 2010 18:39, in comp.unix.shell, harryooopotter(a)hotmail.com wrote:

> On Jun 16, 1:13 pm, Lew Pitcher <lpitc...(a)teksavvy.com> wrote:
> [...]
>> Also, Janis' suggestion of using dd(1) is good. However, dd(1) seems to
>> only recognize two EBCDIC variants, and doesn't specify /which/ variants
>> they are. My guess is EBCDIC-INT and EBCDIC-US, so if you find that your
>> input.txt contains some offbeat EBCDIC (like EBCDIC-JP-KANA, for
>> instance), you are likely going to be out of luck with dd(1).
>
>> Lew Pitcher
>
> This is what the first line looks like in hex.
>
> $ head -1 input.txt | od -x
> 0000000 2e2e 5fcc 2e25 c1ce cbca 3fd1 2e3e 2e2e
> 0000020 2e2e 2e2e 2e2e cb2e 2f3f c1f8 ce3e e12e
> 0000040 ce3e 25c1 f83f 2ec1 5fcc 3e25 2ecb 3fcb
> 0000060 f82f 3ec1 2ece c72e c8c8 2ef8 2e2e c4cb
> 0000100 c1c7 2f5f 2ecb 5fcc 0a0d
> 0000112

OK, I can't find any EBCDIC that makes sense from that hex dump. Are you
certain that the file is EBCDIC? How do you know? For that matter, how did
you get the file in the first place?

From the dump, it looks like, perhaps, the data is binary. I see repeating
binary structures there (look for the 5fcc), and the file /might/ be a pure
binary data dump.

FWIW, od -x (and hexdump -x) show the data as 16-bit words, so on a
little-endian machine the two bytes of each word appear in reverse order.
This means that your file contains

  2e 2e cc 5f 25 2e ce c1 ca cb d1 3f 3e 2e 2e 2e
  2e 2e 2e 2e 2e 2e 2e cb 3f 2f f8 c1 3e ce 2e e1
  3e ce c1 25 3f f8 c1 2e cc 5f 25 3e cb 2e cb 3f
  2f f8 c1 3e ce 2e 2e c7 c8 c8 f8 2e 2e 2e cb c4
  c7 c1 5f 2f cb 2e cc 5f 0d 0a

First off, obviously not ASCII: ASCII only carries characters in the 00 - 7f
range, and all those c* and f* characters are outside the range of ASCII.

Next, if it is an EBCDIC variant, it contains a number of unusual control
characters. 2e is the ACK control character in all EBCDICs, and 0a is an SS2
("Single Shift 2") control character (0d is "Carriage Return", as it is in
ASCII, and 25 is "Line Feed").

If EBCDIC, then it isn't EBCDIC-INT (CP038) or EBCDIC-US; the file contains
values that aren't legal characters in either EBCDIC variant (ca, cb, ce).

If it is EBCDIC-CP-US (CP037), then those ca/cb/cc/ce values correspond to
"Soft Hyphen" (ca), "Latin small letter O with circumflex" (cb), "Latin small
letter O with Diaeresis" (cc), and "Latin small letter O with Acute" (ce).
From the patterns of these characters in the data, it doesn't look like you
have an EBCDIC-CP-US text here, either.

Hmmmmmm.....

Rather than go through all the variants of EBCDIC, we probably should examine
how the file was produced, how you got it, and how you know that it is
EBCDIC. Perhaps there are clues there.

> And iconv can understand neither ebcdic nor cp038 ...
>
> $ iconv -l | grep -i ebcdic
> $ iconv -l | grep -i cp038
> $
>
> So I could not use dd and iconv.
> Anyone has any other suggestions ?
>

--
Lew Pitcher
Master Codewright & JOAT-in-training | Registered Linux User #112576
Me: http://pitcher.digitalfreehold.ca/ | Just Linux: http://justlinux.ca/
---------- Slackware - Because I know what I'm doing. ------
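
[Editorial sketch, not part of the original post.] The byte-order point above can be checked mechanically. The Python below is a minimal sketch, assuming the dump came from a little-endian machine, that od did not collapse repeated lines into "*", and that the byte count is even. The function name od_words_to_bytes is made up for illustration. Decoding with Python's built-in cp037 codec is only a way to eyeball the result when iconv has no EBCDIC tables; it does not establish that the data really is CP037 text.

```python
# The od -x output quoted in the post, pasted verbatim.
OD_DUMP = """\
0000000 2e2e 5fcc 2e25 c1ce cbca 3fd1 2e3e 2e2e
0000020 2e2e 2e2e 2e2e cb2e 2f3f c1f8 ce3e e12e
0000040 ce3e 25c1 f83f 2ec1 5fcc 3e25 2ecb 3fcb
0000060 f82f 3ec1 2ece c72e c8c8 2ef8 2e2e c4cb
0000100 c1c7 2f5f 2ecb 5fcc 0a0d
0000112
"""


def od_words_to_bytes(dump: str) -> bytes:
    """Rebuild file-order bytes from little-endian `od -x` output.

    Hypothetical helper: it does not handle "*" (repeated-line) markers
    or the zero padding od adds for an odd byte count.
    """
    out = bytearray()
    for line in dump.splitlines():
        fields = line.split()
        # fields[0] is the octal offset; the rest are 16-bit hex words.
        for word in fields[1:]:
            # Low-order byte comes first in the file on little-endian hosts.
            out += int(word, 16).to_bytes(2, "little")
    return bytes(out)


raw = od_words_to_bytes(OD_DUMP)
print(raw.hex(" "))              # bytes in file order (Python 3.8+)

# Inspection only: interpret every byte as CP037 and show escapes.
text = raw.decode("cp037")
print(ascii(text))
```

Where od supports it, `od -An -tx1` prints single bytes in file order and avoids the swap altogether.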
From: Harry on 17 Jun 2010 01:44

On Jun 16, 4:16 pm, Lew Pitcher <lpitc...(a)teksavvy.com> wrote:
[...]
> OK, I can't find any EBCDIC that makes sense from that hex dump. Are you
> certain that the file is EBCDIC? How do you know? For that matter, how did
> you get the file in the first place?
[...]

The file content was from a MQ message received on a MQ Manager
sitting on zOS; the MQ Manager has a setting CCSID(37) (just found
out by checking the Q Manager setting) which is COM EUROPE
EBCDIC according to IBM Web site.

> From the dump, it looks like, perhaps, the data is binary. I see repeating
> binary structures there (look for the 5fcc), and the file /might/ be a pure
> binary data dump.

Perhaps the message is not EBCDIC.
I am just trying to decode the message.
From: Lew Pitcher on 17 Jun 2010 09:56

On June 17, 2010 01:44, in comp.unix.shell, harryooopotter(a)hotmail.com wrote:

> On Jun 16, 4:16 pm, Lew Pitcher <lpitc...(a)teksavvy.com> wrote:
> [...]
>> OK, I can't find any EBCDIC that makes sense from that hex dump. Are you
>> certain that the file is EBCDIC? How do you know? For that matter, how
>> did you get the file in the first place?
> [...]
>
> The file content was from a MQ message received on a MQ Manager
> sitting on zOS; the MQ Manager has a setting CCSID(37) (just found
> out by checking the Q Manager setting) which is COM EUROPE
> EBCDIC according to IBM Web site.
>
>> From the dump, it looks like, perhaps, the data is binary. I see
>> repeating binary structures there (look for the 5fcc), and the file
>> /might/ be a pure binary data dump.
>
> Perhaps the message is not EBCDIC.
> I am just trying to decode the message.
>

It's been a while since I last worked with MQ (about 7 or 8 years). My guess
is that you've got, at least in part, a dump of the MQ messages, including
the headers. I don't have my MQ manuals handy, to interpret the data fields,
so I can't be certain.

/If/ you have such messages, a straight characterset conversion (like iconv
or dd) won't be as much help as you'd like. There will be binary data (at
least in the header, if not in the message itself) that such characterset
conversion tools will not properly handle (they will convert the data as if
it were characters from the source characterset, rather than leave the
binary values alone, as binary values).

My suggestion would be to get a bit more info about how the mainframe people
created the file, and how it landed on your unix system (are you using USS or
the zOS "unix" facilities?). Remember, file transfer utilities sometimes
perform a "mass characterset conversion" for you, so if you (for instance)
used ftp with the "ascii" option, your data is no longer in EBCDIC, and no
longer reflective of an MQ data structure.

Luck be with you
--
Lew Pitcher
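
[Editorial sketch, not part of the original post.] The warning about binary fields can be illustrated in a few lines of Python. This is only a sketch under loud assumptions: the 4-byte header, the sample record, and the function name split_and_decode are all invented for illustration. Real MQ header layouts have to come from the IBM documentation and are not modelled here. The idea is simply to convert the text portion alone and leave the binary portion as raw bytes.

```python
def split_and_decode(record: bytes, header_len: int, codec: str = "cp037"):
    """Keep the first header_len bytes as hex and decode only the rest.

    Hypothetical helper: header_len must be known from the record
    layout; nothing here detects it automatically.
    """
    if not 0 <= header_len <= len(record):
        raise ValueError("header_len out of range")
    header, payload = record[:header_len], record[header_len:]
    return header.hex(" "), payload.decode(codec)


# Made-up record: 4 opaque binary bytes followed by "HELLO" in CP037.
record = bytes([0x00, 0x00, 0x01, 0x2C]) + "HELLO".encode("cp037")

header_hex, text = split_and_decode(record, header_len=4)
print(header_hex)   # binary part, untouched by any character conversion
print(text)         # text part, converted from CP037

# For contrast: converting the whole record treats the binary bytes as
# characters, which is the failure mode described in the post.
print(ascii(record.decode("cp037")))
```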