From: Ron Johnson on
On 07/01/2010 06:11 AM, brownh wrote:
> Thank you, Matheiu, and others. I ultimately succeeded and here report
> my experiences with the options.
>
> 1. I found several on-line free conversion services. For various
> reasons such as security and privacy I did not pursue them.
>
> 2. Install OpenOffice and OpenOffice.OpenXML
> Translator. Because this contradicted my desire for command line
> conversion rather than install big GUI apps, I did not pursue.
>
> 3. Abiword can be used to convert the document format from .docx to,
> say, .pdf. It was my intent to use a command line utility instead, but
> here report that Abiword did in fact work and automatically detected
> the input format.
>
> 4. Antiword-for-Office is a perl script, but when I tried to compile,
> found I was missing the perl Archive::Zip module. Not knowing what to
> do about that and too little time to find out, I did not pursue.

$ apt-cache search perl archive zip

This indicates that you must install libarchive-zip-perl.
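
The package name follows a pattern that can be guessed from the module name. Below is a minimal sketch, assuming the usual Debian convention of naming Perl module packages lib<module>-perl; the helper name perl_module_to_deb is made up for illustration, and not every package follows the convention, so check the guess with apt-cache before installing.

```shell
# Hypothetical helper: guess the Debian package name for a Perl module,
# assuming the usual lib<module>-perl naming (not every package follows it).
perl_module_to_deb() {
    # Archive::Zip -> archive-zip: "::" becomes "-", letters are lowercased
    name=$(printf '%s' "$1" | sed 's/::/-/g' | tr '[:upper:]' '[:lower:]')
    printf 'lib%s-perl\n' "$name"
}

# Usage (install as root; verify the guess first):
#   apt-cache show "$(perl_module_to_deb Archive::Zip)"
#   apt-get install "$(perl_module_to_deb Archive::Zip)"
```

If the guess does not match any package, searching by file name with apt-file is the fallback.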

> 5. Unoconv script is a debian package and seems what I really
> want. However, when I ran it, I found that it depends on JRE, although
> "$ aptitude show unoconv" indicates that it depends on python. In any
> case, I don't happen to have JRE installed in current box, and so did not
> pursue.

And you couldn't install it?

> 6. Odf-converter. This is a perl script. It requires libtiff.so.3, but
> by symlinking found that it can use libtiff.so.4 instead. With it I
> was able to generate an .odt file, which of course required Abiword to
> convert to PDF since I can't use unoconv.
>
> Haines Brown
>
>


--
Seek truth from facts.


--
To UNSUBSCRIBE, email to debian-user-REQUEST(a)lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster(a)lists.debian.org
Archive: http://lists.debian.org/4C2C7AB1.20009(a)cox.net
From: Camaleón on
On Wed, 30 Jun 2010 14:48:49 -0400, brownh wrote:

> I received a .docx file appended in an e-mail, and need to extract and
> convert it to a convenient format such as .html, .pdf, or plain .txt.

(...)

If it's a simple file (just plain text) you can extract (unzip) the .docx
into *.xml data for a direct view or convert into another suitable format.

Greetings,

--
Camaleón


--
Archive: http://lists.debian.org/pan.2010.07.01.11.19.29(a)gmail.com
From: Ron Johnson on
On 07/01/2010 08:42 AM, brownh wrote:
> Ron Johnson<ron.l.johnson(a)cox.net> writes:
>
>> On 07/01/2010 06:11 AM, brownh wrote:
>>> 4. Antiword-for-Office is a perl script, but when I tried to compile,
>>> found I was missing the perl Archive::Zip module. Not knowing what to
>>> do about that and too little time to find out, I did not pursue.
>
>> This indicates that you must install libarchive-zip-perl.
>
> Thanks. This seemed to get through that hang in the compile, but now
> it hangs because it can't find XML/LibXML.pm. I did a search for
> LibXML, and the obvious package, libxml-libxml-common-perl, did not
> help (I already had libxml2 installed).
$ apt-file search libXML.pm

Interesting.

$ apt-file search libXML.pm
$

>>> 5. Unoconv script is a debian package and seems what I really
>>> want. However, when I ran it, I found that it depends on JRE, although
>>> "$ aptitude show unoconv" indicates that it depends on python. In any
>>> case, I don't happen to have JRE installed in current box, and so did not
>>> pursue.
>>
>> And you couldn't install it?
>
> No, the reason is that I'm working with temporary hardware and wanted
> to avoid doing that, but now I did install the Sun JRE. When I try to
> use unoconv on a .docx file I get:
>
> unoconv: UnoException during conversion: File could not be loaded by
> OpenOffice The provided document cannot be converted to the desired
> format.
>
> This sounds like it relies on OpenOffice, which I don't have
> installed.
>

Right. It appears that the unoconv package metadata is in error.

I'd file a bug asking for clarification.

--
Seek truth from facts.


--
Archive: http://lists.debian.org/4C2C9D47.3080802(a)cox.net
From: Camaleón on
On Thu, 01 Jul 2010 10:03:28 -0400, brownh wrote:

> Camaleón writes:
>
>> (...)
>>
>> If it's a simple file (just plain text) you can extract (unzip) the
>> .docx into *.xml data for a direct view or convert into another
>> suitable format.
>
> Camaleón, I'm afraid you lost me. The file was .docx, which looks
> binary. As a result, it's MIME'd in the mail message, which makes it
> plain ASCII.

I was referring to the "content" of the .docx file, not the "nature" of
it :-).

If there are images or tables, it will be difficult to render them from the
xml file (images would be linked and tables would need a parser). But if
the .docx file just contains a bunch of text, it can easily be read
from the resulting xml file.

> But apparently you mean that I can run unzip on the .docx file to
> extract *.xml data. This was news to me, for I had no idea that .docx
> was an archive. But I tried it, and a number of things happened. It
> created an empty _rels directory; it created a docProps directory in
> which are app.xml and core.xml, and it created a word/ directory in
> which there are a number of *.xml files. None of these xml files are
> understood by abiword.

Yep. MS ".docx" format is far from ".odt" flexibility but it shares some
features. One is that the files are compressed and can be easily
extracted for raw reading.

"document.xml" is the main file, the one that contains the text of the
document. And being an XML file, it can be read with any editor (console
or GUI based) or any browser because it is just plain text. Of course, do
not expect to get the same layout you get with the ".docx" file when
opened with a word processor, but at least you can view the content of
the file :-)
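
For a text-only document, the markup can also be stripped on the command line. This is a rough sketch, not a real converter: the function name docx_xml_to_text and the file name report.docx are made up for illustration. It assumes that no tag is split across lines, and it does not decode XML entities such as &amp;.

```shell
# Hypothetical filter: reduce WordprocessingML (word/document.xml) on stdin
# to plain text. Assumes simple text-only content; XML entities such as
# &amp; are left undecoded.
docx_xml_to_text() {
    awk '{
        gsub(/<\/w:p>/, "\n")   # end of a paragraph becomes a line break
        gsub(/<[^>]*>/, "")     # drop every remaining tag
        printf "%s", $0
    }
    END { printf "\n" }'
}

# Usage: stream the main part out of the archive without unpacking it.
#   unzip -p report.docx word/document.xml | docx_xml_to_text
```

Tables and images are simply dropped by this approach, so it only suits documents that are plain paragraphs of text.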

Greetings,

--
Camaleón


--
Archive: http://lists.debian.org/pan.2010.07.01.14.39.22(a)gmail.com
From: Ron Johnson on
On 07/03/2010 01:09 PM, brownh wrote:
> Ron Johnson<ron.l.johnson(a)cox.net> writes:
>
>> On 07/01/2010 08:42 AM, brownh wrote:
>>> Ron Johnson<ron.l.johnson(a)cox.net> writes:
>>>
>>>> On 07/01/2010 06:11 AM, brownh wrote:
>>>>> 4. Antiword-for-Office is a perl script, but when I tried to compile,
>>>>> found I was missing the perl Archive::Zip module. Not knowing what to
>>>>> do about that and too little time to find out, I did not pursue.
>>>
>>>> This indicates that you must install libarchive-zip-perl.
>>>
>>> Thanks. This seemed to get through that hang in the compile, but now
>>> it hangs because it can't find XML/LibXML.pm. I did a search for
>>> LibXML, and the obvious package, libxml-libxml-common-perl, did not
>>> help (I already had libxml2 installed).
>> $ apt-file search libXML.pm
>>
>> Interesting.
>>
>> $ apt-file search libXML.pm
>> $
>
> Apt-file search libXML.pm returns:
>
> libxml-libxml-perl: /usr/lib/perl5/XML/LibXML.pm
>

Ah, here's the problem: you wrote "lib" but it should be "Lib".

> If I understand correctly, if I have perl5 installed (which I do), I
> should find /usr/lib/perl5/XML/LibXML.pm on my machine. In fact it is

Well, no. You'd only find it if libxml-libxml-perl is installed.

> not in my /usr/lib/perl5/XML/ directory. I have Simple.pm, SAX.pm,
> NamespaceSupport.pm, and corresponding subdirectories there, but not
> that file.
>
> A search on line shows that the missing module is a common problem in
> a Windows environment. For linux, I gather the error can occur if
> there is no path to libxml2. But in my case, I got it finally to work
> by installing libxml-libxml-perl.
>

--
Seek truth from facts.


--
Archive: http://lists.debian.org/4C2F80E3.2090504(a)cox.net