From: lihao on 2 Apr 2010 19:14

under vim, these non-ascii characters are displayed as <97>, <92> etc.
and on command line with 'cat -A filename' shows: 'M-BM-^W' , 'M-BM-^R'
respectively... I know <97> is some long dash '--' and <92> is a
single quotation mark. under bash, How can I transfer these characters
into proper ASCII, i.e.:

under vim:
<97> => -
<92> => '

many thanks,
lihao
From: Ben Bacarisse on 2 Apr 2010 22:10

lihao <lihao0129(a)gmail.com> writes:

> under vim, these non-ascii characters are displayed as <97>, <92> etc.
> and on command line with 'cat -A filename' shows: 'M-BM-^W' , 'M-BM-
> ^R' respectively... I know <97> is some long dash '--' and <92> is a
> single quotation mark. under bash, How can I transfer these characters
> into proper ASCII, i.e.:

The command

    tr "\227\222" "-'"

will translate octal 227 (hex 97) bytes into - and octal 222 (hex 92)
bytes into '. This may not work because the cat -A output suggests
that you may not have these exact bytes in the file. I don't use
cat -A so I can't interpret the sequences you see. I find the least
ambiguous output is a hex dump ('hd' or 'od -t x1').

If you know the actual encoding used, the iconv utility can be an
excellent way to map between different file encodings. For example:

    iconv --from-code=windows-1252 --to-code=ascii//translit

is a close match to what you want (but it doubles the - to --).

-- 
Ben.
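A minimal sketch of the hex-dump check and the byte-wise tr translation described in the post above. The sample file is hypothetical and is built here with printf so that it contains the raw bytes 0x97 and 0x92; `LC_ALL=C` and the `--` separator are precautions added for this sketch, not part of the original command. In cat -A notation, `M-B` stands for byte 0xC2 and `M-^W` for byte 0x97, so if the dump of the real file shows `c2 97` rather than a lone `97`, the file would hold UTF-8 encoded control characters and this two-byte translation would not match them.

```shell
# Hypothetical sample: the bytes 0x97 and 0x92 (octal 227 and 222),
# as they would appear in a Windows-1252 encoded file.
sample=$(mktemp)
printf 'a\227b\222c\n' > "$sample"

# Hex dump, one byte per column, with offsets and whitespace stripped,
# to see which bytes are really in the file.
dump=$(od -An -t x1 < "$sample" | tr -d ' \n')
echo "$dump"

# Byte-wise translation. LC_ALL=C asks tr to treat the input as plain
# bytes; "--" ends option parsing so the replacement set, which starts
# with "-", cannot be mistaken for an option.
fixed=$(LC_ALL=C tr -- '\227\222' "-'" < "$sample")
echo "$fixed"

rm -f "$sample"
```

If the dump shows different bytes from those expected, the translation set has to be adjusted to the bytes actually present.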
From: Thomas 'PointedEars' Lahn on 2 Apr 2010 22:21

lihao wrote:

> under vim, these non-ascii characters are displayed as <97>, <92> etc.
> and on command line with 'cat -A filename' shows: 'M-BM-^W' , 'M-BM-
> ^R' respectively... I know <97> is some long dash '--' and <92> is a
> single quotation mark.

It's "em dash" and "right single quotation mark".

> under bash, How can I transfer these characters into proper ASCII, i.e.:
>
> under vim:
> <97> => -
> <92> => '

(JFYI: This has nothing to do with bash, and little to do with
shell-scripting. You want to choose your forum more carefully next time.
<http://www.catb.org/~esr/faqs/smart-questions.html#forum>)

The original encoding must be Windows-125x which are the only
encodings/character sets I know to use the codepoint range 0x7F..0x9F for
printable characters (ISO/IEC 8859-x have no characters at all there and
ISO-8859-x/Unicode have only control characters there).

<http://en.wikipedia.org/wiki/Western_Latin_character_sets_%28computing%29>

There is a tool to TRanslate between user-defined sets of characters:

    tr '\227\222' "-'" < filename

(227 and 222 are the octal representations of hexadecimal 97 and 92,
respectively.)

For getting all the non-equivalent single-character ASCII representations of
Windows-1252 characters, you can use

    recode Windows-1252..ASCII filename

(I have tested this to work.) You might want to make a backup of the
original file before.

Both provide, of course, only a crude approximation of the original
characters, if at all (there is no single-character approximation for the
Euro, Ellipsis, or Permille character in US-ASCII, for example; recode -f
manages to represent some of them adequately, though.)

However, you can convert the file to use an encoding for a character set
that contains equivalent characters:

    iconv -f Windows-1252 -t UTF-8 < filename > filename-utf8

or

    iconv -f Windows-1252 -t UTF-8 -o filename-utf8 filename

(or any other Unicode Transformation Format. I have tested this to work on
a console with UTF-8 locale by first converting EM DASH [U+2014] followed by
RIGHT SINGLE QUOTATION MARK [U+2019] with iconv to Windows-1252, opening the
file with vim, observing the described `<97><92>', and converting it back to
UTF-8 with iconv, observing the same characters that are in the original
UTF-8-encoded file.)

HTH

PointedEars
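The round trip described in the post above can be sketched as follows. This is only an illustration under stated assumptions: the input is generated with printf instead of being read from a file, the variable names are made up for this sketch, and it assumes an iconv that knows the name WINDOWS-1252 (GNU libc's iconv does). The hex dumps replace the visual check in vim.

```shell
# EM DASH (U+2014) and RIGHT SINGLE QUOTATION MARK (U+2019) as UTF-8 bytes.
utf8_bytes='\342\200\224\342\200\231'
orig_hex=$(printf "$utf8_bytes" | od -An -t x1 | tr -d ' \n')

# UTF-8 -> Windows-1252: each of the two characters becomes a single byte.
cp1252_hex=$(printf "$utf8_bytes" \
    | iconv -f UTF-8 -t WINDOWS-1252 \
    | od -An -t x1 | tr -d ' \n')

# Windows-1252 -> UTF-8: start from the raw bytes 0x97 0x92 and convert back.
back_hex=$(printf '\227\222' \
    | iconv -f WINDOWS-1252 -t UTF-8 \
    | od -An -t x1 | tr -d ' \n')

echo "original UTF-8: $orig_hex"
echo "Windows-1252:   $cp1252_hex"
echo "back to UTF-8:  $back_hex"
```

If the first and last lines of output match, the conversion was lossless for these two characters, which is the property the post relies on.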
From: pk on 3 Apr 2010 05:35

Thomas 'PointedEars' Lahn wrote:

> (JFYI: This has nothing to do with bash, and little to do with shell-
> scripting. You want to choose your forum more carefully next time.

And this is not a forum, for that matter.
From: Chris F.A. Johnson on 3 Apr 2010 08:34

On 2010-04-03, pk wrote:
> Thomas 'PointedEars' Lahn wrote:
>
>> (JFYI: This has nothing to do with bash, and little to do with shell-
>> scripting. You want to choose your forum more carefully next time.
>
> And this is not a forum, for that matter.

Of course it's a forum. It's not a _web_ forum but it is a forum.

-- 
Chris F.A. Johnson, author <http://shell.cfajohnson.com/>
===================================================================
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
Pro Bash Programming: Scripting the GNU/Linux Shell (2009, Apress)
===== My code in this post, if any, assumes the POSIX locale =====
===== and is released under the GNU General Public Licence =====