Prev: [ANN] Dfect 2.0.0 (2010-03-21)
Next: super
From: Ahmad Azizan on 22 Mar 2010 03:32

Hello,

I'm trying to find a Ruby module or code that can decompress LZW-compressed data from a PDF file. However, as far as I know, no such code or module exists publicly.

PDF files usually encode or compress their stream data with filters such as FlateDecode, ASCIIHexDecode, ASCII85Decode, and LZWDecode. In Ruby, FlateDecode and ASCII85Decode streams can be decoded with existing libraries, zlib and Ascii85 respectively. For ASCIIHexDecode, I just need to convert each pair of hex digits to a byte. My problem arises with LZWDecode, since there is no module or code to decompress it.

Since I couldn't find an example of LZW decompression in Ruby, I found an implementation in Python. However, translating the Python into Ruby has been a real pain.

Example of working LZW decompression in Python: http://pastebin.ca/1849009
My translation into Ruby: http://pastebin.ca/1849012

With a small input, I can decompress it and get the same output as the Python code, e.g.:

Python:

    data = "\x80\x0b\x60\x50\x22\x0c\x0c\x85\x01"
    tmp = LZWDecode(data)
    print tmp

Ruby:

    data = "\x80\x0b\x60\x50\x22\x0c\x0c\x85\x01"
    lzw = LZWDecoder.new(data)
    puts lzw.run()

However, with a real stream from a PDF file, I cannot get the decompressed output. I suspect there is an error in my code, or that special characters are not being handled properly in Ruby.

I've spent many hours (days, in fact) working out how to decompress an LZW stream and trying to translate the Python into Ruby, but so far my efforts haven't paid off. I really hope someone can point me toward a hint or a solution.

Thank you

--
Posted via http://www.ruby-forum.com/.
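For the two filters described above as already covered, here is a minimal sketch of what they look like in Ruby, assuming the raw stream bytes (between `stream` and `endstream`) have already been pulled out of the PDF into a binary String. The method names `flate_decode` and `ascii_hex_decode` are hypothetical, not part of any library, and ASCII85Decode is left to the Ascii85 gem.

```ruby
require 'zlib'

# Hypothetical helper: PDF FlateDecode streams are zlib-wrapped deflate
# data, so the stdlib Zlib::Inflate.inflate decodes them directly.
def flate_decode(data)
  Zlib::Inflate.inflate(data)
end

# Hypothetical helper for ASCIIHexDecode: each pair of hex digits becomes
# one byte. Whitespace is ignored, '>' marks end of data, and an odd final
# digit is treated as if it were followed by '0'.
def ascii_hex_decode(data)
  hex = data.split('>', 2).first.to_s.gsub(/\s+/, '')
  hex << '0' if hex.length.odd?
  [hex].pack('H*')   # returns an ASCII-8BIT (binary) String
end
```

Predictor parameters (/DecodeParms) that some PDFs attach to Flate streams are not handled in this sketch.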
From: Ryan Davis on 22 Mar 2010 04:20

On Mar 22, 2010, at 00:32, Ahmad Azizan wrote:
> With a small input, I can decompress it to get the equivalent output
> like the python code.
> e.g:
> Python
> data = "\x80\x0b\x60\x50\x22\x0c\x0c\x85\x01"
> tmp = LZWDecode(data)
> print tmp
>
> data = "\x80\x0b\x60\x50\x22\x0c\x0c\x85\x01"
> lzw = LZWDecoder.new(data)
> puts lzw.run()
>
> However, with a real stream from PDF file, I cannot get the decompressed
> output. I guess it might be some error in the code or improper handling
> of special character in ruby.

Can you get the Python code to decode the real stream? That would be one way to determine whether the original data is corrupt.
From: Brian Candler on 22 Mar 2010 05:31

> Example of working LZW decompression in python is here:
> http://pastebin.ca/1849009
> My translated code in ruby is here: http://pastebin.ca/1849012

Which version of Ruby are you using? If it's 1.9, then your @fp[@inc] may fall foul of the character encoding rules: in 1.9, String#[] indexes characters according to the string's encoding, not raw bytes. Try this in your initialize:

    puts @fp.encoding
    @fp.force_encoding("ASCII-8BIT")

However, if you pass in a StringIO rather than a String, then you can just copy what the Python code is doing:

    x = @fp.read(1)
    @buff = x.unpack("C").first

and read(1) always reads single bytes. This has the advantage of being able to decompress directly from files, without reading them into RAM first.

Minor suggestion: it might be more rubyish to return nil rather than raise EOFError, which would simplify your run loop to:

    result = ""
    while code = readbits(@nbits)
      result << feed(code)
    end
    return result

Regards,

Brian.
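Putting Brian's suggestions together (byte-wise reads through StringIO, nil at end of input instead of EOFError, and the simplified run loop), a PDF LZW decoder in Ruby might look roughly like the sketch below. It assumes the PDF defaults of 9- to 12-bit codes packed most-significant bit first, clear code 256, end-of-data code 257 and EarlyChange = 1, and it ignores any /Predictor in /DecodeParms. The class name LZWDecoder matches the one in the thread, but the body is an independent sketch, not Ahmad's pastebin translation.

```ruby
require 'stringio'

# Sketch of a PDF LZWDecode filter (assumed defaults: MSB-first codes,
# 9 to 12 bits wide, clear = 256, end-of-data = 257, EarlyChange = 1).
class LZWDecoder
  CLEAR = 256
  EOD   = 257

  def initialize(data)
    # StringIO#read(1) always yields one byte, whatever the String encoding.
    @io = StringIO.new(data.dup.force_encoding('ASCII-8BIT'))
    @buff = 0
    @bpos = 8   # bits of @buff already consumed; 8 means "fetch a new byte"
    reset_table
  end

  # Read +bits+ bits, most significant first; nil at end of input.
  def readbits(bits)
    v = 0
    loop do
      r = 8 - @bpos   # unread bits left in the current byte
      if bits <= r
        v = (v << bits) | ((@buff >> (r - bits)) & ((1 << bits) - 1))
        @bpos += bits
        return v
      end
      v = (v << r) | (@buff & ((1 << r) - 1))
      bits -= r
      byte = @io.read(1)
      return nil if byte.nil?
      @buff = byte.unpack('C').first
      @bpos = 0
    end
  end

  def run
    result = String.new   # empty ASCII-8BIT string
    while (code = readbits(@nbits))
      break if code == EOD
      result << feed(code)
    end
    result
  end

  private

  def reset_table
    @table = (0..255).map { |c| c.chr } + [nil, nil]   # slots 256/257 reserved
    @prev = nil
    @nbits = 9
  end

  # Translate one code into bytes and add the next table entry.
  def feed(code)
    if code == CLEAR
      reset_table
      return ''
    end
    if @prev.nil?   # first code after a clear is emitted as-is
      @prev = @table[code]
      return @prev
    end
    entry =
      if code < @table.size
        @table[code]
      elsif code == @table.size
        @prev + @prev[0]   # code not in the table yet (the KwKwK case)
      else
        raise ArgumentError, "corrupt LZW stream: code #{code}"
      end
    @table << @prev + entry[0]
    # EarlyChange = 1: widen the codes one entry before the table fills up.
    case @table.size
    when 511  then @nbits = 10
    when 1023 then @nbits = 11
    when 2047 then @nbits = 12
    end
    @prev = entry
    entry
  end
end

data = "\x80\x0b\x60\x50\x22\x0c\x0c\x85\x01"
puts LZWDecoder.new(data).run   # prints -----A---B
```

The nine bytes from Ahmad's example are the sample encoding of the string "-----A---B" given in Adobe's PDF Reference, so the last two lines make a handy sanity check before trying a real stream.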