From: Erik Heil on
Hi there.
I believe I have some solutions to your problems. First of all,
you need to see whether or not your documents are in some kind of
structured format. If they are, say, DocBook XML or something similar,
you may be able to find a quick solution to the searching problem. If
the documents are structured, you can probably parse them by entity
type. Of course, this depends on how well they are marked up. As
I stated earlier, the key item here is to generate rapidly
searchable indexes that can be queried against. I'm assuming that
since you deal with highly technical data, it is more or less in a
structured form. You could even generate SQL statements and possibly
use SQLite if you don't want the overhead of a full DB. Anyway, I'm more
than willing to help in any way with this project of yours. Let me
know what you think.
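
To make the indexing idea above concrete, here is a minimal sketch in Python, not a finished tool. It assumes the text is extracted with pdftotext from poppler-utils, which separates pages with a form feed character. The table name `words` and the function names are made up for illustration. It builds a word-to-page table in SQLite so that a lookup no longer has to scan the whole document.

```python
import re
import sqlite3
import subprocess


def extract_text(pdf_path):
    """Return the text of a PDF via pdftotext (poppler-utils must be installed)."""
    # "-" sends the extracted text to stdout instead of a .txt file.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        check=True, capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def split_pages(text):
    """Split pdftotext output into pages (pages are separated by form feeds)."""
    pages = text.split("\f")
    # Drop the empty chunk left after the final form feed, if any.
    if pages and not pages[-1].strip():
        pages.pop()
    return pages


def build_index(conn, pages):
    """Store one (word, page) row per distinct word on each page."""
    conn.execute("CREATE TABLE IF NOT EXISTS words (word TEXT, page INTEGER)")
    for number, page in enumerate(pages, start=1):
        words = {w.lower() for w in re.findall(r"\w+", page)}
        conn.executemany(
            "INSERT INTO words (word, page) VALUES (?, ?)",
            [(w, number) for w in words],
        )
    # The index on the word column is what makes lookups fast.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_word ON words (word)")
    conn.commit()


def find_pages(conn, word):
    """Return the sorted page numbers on which the word occurs."""
    rows = conn.execute(
        "SELECT DISTINCT page FROM words WHERE word = ? ORDER BY page",
        (word.lower(),),
    )
    return [row[0] for row in rows]
```

The index only needs to be built once per document, for example with `build_index(conn, split_pages(extract_text("spec.pdf")))`, where `spec.pdf` is a placeholder file name. After that, `find_pages(conn, "keyword")` gives the pages to jump to in whichever reader you prefer.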
--Erik

On 5/29/10, Ron Johnson <ron.l.johnson(a)cox.net> wrote:
> On 05/29/2010 02:34 PM, Merciadri Luca wrote:
>> Ron Johnson wrote:
> [snip]
>>>
>>> Have you tried other PDF readers? Searched for Linux-based PDF indexers?
>> As I said in another topic, I am totally okay for free stuff (if it was
>> not the case, I would not be using Debian: thinking unfree but using
>> free is cowardice), but the fact is that I have not found a reader whose
>> range of compatibility with the PDF standard is as high as in acroread.
>> Acroread is slow, boring, sometimes buggy, but I need to use it as long
>> as I do not find a PDF reader which has such a big compatibility range.
>
> Nothing says that you must only use one reader at a time. ;)
>
> If poppler, for example, doesn't render *exactly* but searches
> /rapidly/, then you could search using poppler and "read" using
> Acroread.
>
> Alternatively, install poppler-utils for its pdftohtml. Certainly
> it won't be perfect, but a browser might be faster than Acroread.
>
> --
> Dissent is patriotic, remember?
>
>
> --
> To UNSUBSCRIBE, email to debian-user-REQUEST(a)lists.debian.org
> with a subject of "unsubscribe". Trouble? Contact
> listmaster(a)lists.debian.org
> Archive: http://lists.debian.org/4C01AFDD.7050004(a)cox.net
>
>


--
Archive: http://lists.debian.org/AANLkTil7rpW7qQYqmPMBBkzvWEA2a3Ns8fqCZfa43iii(a)mail.gmail.com
From: Merciadri Luca on
Yes, why not. But if they are in PDF format, how can I (re)structure
them better? Thanks.



--
Merciadri Luca
See http://www.student.montefiore.ulg.ac.be/~merciadri/
I use PGP. If there is an incompatibility problem with your mail
client, please contact me.




From: Merciadri Luca on
Ron Johnson <ron.l.johnson(a)cox.net> writes:

> On 05/29/2010 02:34 PM, Merciadri Luca wrote:
>> Ron Johnson wrote:
> [snip]
>>>
>>> Have you tried other PDF readers? Searched for Linux-based PDF indexers?
>> As I said in another topic, I am totally okay for free stuff (if it was
>> not the case, I would not be using Debian: thinking unfree but using
>> free is cowardice), but the fact is that I have not found a reader whose
>> range of compatibility with the PDF standard is as high as in acroread.
>> Acroread is slow, boring, sometimes buggy, but I need to use it as long
>> as I do not find a PDF reader which has such a big compatibility range.
>
> Nothing says that you must only use one reader at a time. ;)
>
> If poppler, for example, doesn't render *exactly* but searches
> /rapidly/, then you could search using poppler and "read" using
> Acroread.
>
> Alternatively, install poppler-utils for its pdftohtml. Certainly it
> won't be perfect, but a browser might be faster than Acroread.
You're right. Why not? I'll try it out. Thanks.
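
For reference, a minimal sketch of the "search with poppler, read with Acroread" idea. It assumes poppler-utils is installed and pdftotext is on the PATH. The function names are placeholders chosen for illustration. It does a plain linear scan with no index and only reports page numbers.

```python
import subprocess


def number_pages(raw_text):
    """Split pdftotext output on form feeds and number the pages from 1."""
    # A trailing form feed yields one extra empty page, which never matches.
    return list(enumerate(raw_text.split("\f"), start=1))


def pdf_pages(pdf_path):
    """Return (page_number, text) pairs for a PDF, extracted with pdftotext."""
    raw = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],  # "-" writes to stdout
        check=True, capture_output=True,
    ).stdout.decode("utf-8", errors="replace")
    return number_pages(raw)


def pages_containing(pages, keyword):
    """Case-insensitive scan: page numbers whose text mentions the keyword."""
    needle = keyword.lower()
    return [number for number, text in pages if needle in text.lower()]
```

Calling `pages_containing(pdf_pages("spec.pdf"), "keyword")`, with `spec.pdf` standing in for the real file, returns the page numbers, and those pages can then be opened in Acroread for exact rendering.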
--
Merciadri Luca
See http://www.student.montefiore.ulg.ac.be/~merciadri/
--

Better is the enemy of good.


--
Archive: http://lists.debian.org/87ocfx26ll.fsf(a)merciadriluca-station.MERCIADRILUCA
From: Camaleón on
On Sat, 29 May 2010 20:47:52 +0200, Merciadri Luca wrote:

> I sometimes have really long documents (>4000 p) for specs., or for
> other purely technical stuff. I sometimes look for a given model, or for
> a given word. The fact is that acroread reads ~8 pg/s, and, thus, if I
> do not know that my keyword is simply at the last page of the document,
> it takes 500 s, roughly 8 minutes and 20 seconds. How can I speed it up? Why is it so
> sluggish? Do not tell me that it is limited by R/W access on the HDD...

4000 pages? Wow, I think I never opened such a document :-)

If you provide a sample link, we could run some text search performance
tests over the file.

Also, I don't have Acrobat installed on my Linux boxes, but on Windows
there are two search facilities, "Find" and "Advanced Search". The latter
is quicker.

Greetings,

--
Camaleón


--
Archive: http://lists.debian.org/pan.2010.05.30.09.54.49(a)gmail.com
From: Eduardo M KALINOWSKI on
On 05/29/2010 04:34 PM, Merciadri Luca wrote:
> As I said in another topic, I am totally okay for free stuff (if it was
> not the case, I would not be using Debian: thinking unfree but using
> free is cowardice), but the fact is that I have not found a reader whose
> range of compatibility with the PDF standard is as high as in acroread.
> Acroread is slow, boring, sometimes buggy, but I need to use it as long
> as I do not find a PDF reader which has such a big compatibility range.
>

Well, if you need Adobe Acrobat Reader, complain to Adobe that it's slow
and hope they fix it.


--
Eduardo M KALINOWSKI
eduardo(a)kalinowski.com.br


--
Archive: http://lists.debian.org/4C0255EC.40407(a)kalinowski.com.br