From: Dr.Ruud on 11 Jun 2010 19:58

Peter J. Holzer wrote:
> On 2010-06-10 07:02, Chris Nehren <apeiron(a)invalid.isuckatdomains.net> wrote:
>> On 2010-06-09, Peter J. Holzer scribbled these curious markings:
>>> On 2010-06-08 07:57, Dr.Ruud <rvtol+usenet(a)xs4all.nl> wrote:
>>>> On the related subject of creating nice PDFs:
>>>> we are using webkit for that for the last few years,
>>>> we create many-many thousands a day,
>>>> and we are very happy with the results.
>>>
>>> Sounds interesting. Which perl module do you use (there are several on
>>> CPAN, but the descriptions don't look promising)?
>>
>> Not a module, per se, but I've had success with wkhtmltopdf. See
>> http://code.google.com/p/wkhtmltopdf/ for more info.
>
> Thanks, but after playing with it for a bit I found two problems:
>
> 1) It pretends to be a screen device, not a printing device (so for a
>    stylesheet which contain both @media print and @media screen sections
>    it chooses the wrong ones).
> 2) It sometimes makes a pagebreak in the middle of a line (so the upper
>    half of the line is on page 1 and the lower half of the line is on
>    page 2).
>
> It looks like the tool renders the page the same way as a browser on
> screen and then cuts the result into pages.

This should help:

  --print-media-type
  "page-break-inside: avoid;"

  http://www.smashingmagazine.com/2007/02/21/printing-the-web-solutions-and-techniques/
  http://code.google.com/p/wkhtmltopdf/issues/detail?id=9
  http://code.google.com/p/wkhtmltopdf/issues/detail?id=57
  http://search.cpan.org/~tbr/WKHTMLTOPDF-0.02/lib/WKHTMLTOPDF.pm

-- 
Ruud
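
For readers following along, here is a minimal sketch of how the --print-media-type switch mentioned above might be passed when driving wkhtmltopdf from a script. The function name, the file names and the assumption that the binary is on the PATH as "wkhtmltopdf" are all illustrative, not taken from the thread; only the switch itself is.

```python
import shlex


def wkhtmltopdf_command(source, target, print_media=True, binary="wkhtmltopdf"):
    """Build the argument list for one wkhtmltopdf run.

    `binary`, `source` and `target` are placeholders chosen for this sketch.
    """
    args = [binary]
    if print_media:
        # Ask the renderer to apply the @media print rules of the stylesheet
        # instead of the @media screen ones.
        args.append("--print-media-type")
    args += [source, target]
    return args


if __name__ == "__main__":
    cmd = wkhtmltopdf_command("report.html", "report.pdf")
    # Show the command as it would be typed in a shell.
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # To actually run it (requires wkhtmltopdf to be installed):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.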
From: Peter J. Holzer on 13 Jun 2010 05:28

On 2010-06-11 23:58, Dr.Ruud <rvtol+usenet(a)xs4all.nl> wrote:
> Peter J. Holzer wrote:
>> On 2010-06-10 07:02, Chris Nehren <apeiron(a)invalid.isuckatdomains.net> wrote:
>>> Not a module, per se, but I've had success with wkhtmltopdf. See
>>> http://code.google.com/p/wkhtmltopdf/ for more info.
>>
>> Thanks, but after playing with it for a bit I found two problems:
>>
>> 1) It pretends to be a screen device, not a printing device (so for a
>>    stylesheet which contain both @media print and @media screen sections
>>    it chooses the wrong ones).
>> 2) It sometimes makes a pagebreak in the middle of a line (so the upper
>>    half of the line is on page 1 and the lower half of the line is on
>>    page 2).
>>
>> It looks like the tool renders the page the same way as a browser on
>> screen and then cuts the result into pages.
>
> This should help:
> --print-media-type

That was the option I was looking for. I guess I didn't expect to find
an option which I consider extremely important (in fact, I think it
should be the default) to be hidden under "less common command
switches".

> "page-break-inside: avoid;"

I see that I wasn't clear enough what I meant with "a pagebreak in the
middle of a line", so some screenshots may help:

http://www.hjp.at/junk/ss-wkhtmltopdf1.png
http://www.hjp.at/junk/ss-wkhtmltopdf2.png

As you can see, the last line of the page is split *horizontally*
slightly above the baseline in both cases - the descenders appear at the
top of the next page. That's clearly a bug and not something
"page-break-inside: avoid;" is supposed to fix. "page-break-inside:
avoid;" avoids pagebreaks within an element, e.g. a paragraph, but that
isn't the problem here.

> http://www.smashingmagazine.com/2007/02/21/printing-the-web-solutions-and-techniques/

Nice collection of links, although I'm not sure why you mention them.

> http://code.google.com/p/wkhtmltopdf/issues/detail?id=9

Yup, my problem number 2 is mentioned in comment 4 here. I already found
that before posting.

> http://code.google.com/p/wkhtmltopdf/issues/detail?id=57

Different problem.

> http://search.cpan.org/~tbr/WKHTMLTOPDF-0.02/lib/WKHTMLTOPDF.pm

Ouch! My eyes! Couldn't he have named the thing WkHTMLtoPDF or
WkHtmlToPdf, or something? ;-).

        hp
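
To make the distinction concrete, the following sketch generates a small test page whose stylesheet has both @media screen and @media print sections and asks the renderer not to break inside a paragraph. It only illustrates what "page-break-inside: avoid;" is meant to do (keep an element on one page); it is not claimed to cure the horizontally split line described above. The function name, the CSS values and the output file name are assumptions made for this example.

```python
from html import escape

# Separate rules for screen and print, so it is easy to see which set the
# renderer picked. The concrete fonts and colours are arbitrary.
PRINT_CSS = """\
@media screen {
  body { font-family: sans-serif; background: #eee; }
}
@media print {
  body { font-family: serif; background: none; }
  p { page-break-inside: avoid; }
}
"""


def build_test_page(paragraphs):
    """Return an HTML document with one <p> element per input string."""
    body = "\n".join("<p>" + escape(text) + "</p>" for text in paragraphs)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n"
        "<style>\n" + PRINT_CSS + "</style>\n</head>\n<body>\n"
        + body + "\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    # Enough filler text to span several pages when converted.
    texts = ["Paragraph %d. " % i + "Filler text. " * 40 for i in range(1, 61)]
    with open("pagebreak-test.html", "w", encoding="utf-8") as fh:
        fh.write(build_test_page(texts))
```

Converting the resulting file with and without --print-media-type should show which of the two @media sections is applied.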
From: Dr.Ruud on 13 Jun 2010 07:56

Peter J. Holzer wrote:
> On 2010-06-11 23:58, Dr.Ruud <rvtol+usenet(a)xs4all.nl> wrote:
>> This should help:
>> --print-media-type
>
> That was the option I was looking for. I guess I didn't expect to find
> an option which I consider extremely important (in fact, I think it
> should be the default) to be hidden under "less common command
> switches".

Yes, I also don't understand why "they" did it like that, it makes it
all unnecessarily less easy to understand. But it still all works
reasonably well, we create many thousands of unique PDFs daily with it.

>> "page-break-inside: avoid;"
>
> I see that I wasn't clear enough what I meant with "a pagebreak in the
> middle of a line" [...]
> the last line of the page is split *horizontally*
> slightly above the baseline

That's what I understood, and I assumed that you could prevent that by
giving the element that attribute.

BTW, the default page size is A4.

The manual says:

<quote>
Page Breaking

The current page breaking algorithm of WebKit leaves much to be desired.
Basically webkit will render everything into one long page, and then cut
it up into pages. This means that if you have two columns of text where
one is vertically shifted by half a line, then webkit will cut a line
into to pieces display the top half on one page, and the bottom half on
another page. It will also break image in two and so on. If you are
using the patched version of QT you can use the CSS page-break-inside
property to remedy this somewhat. There is no easy solution to this
problem, until this is solved try organising your HTML documents such
that it contains many lines on which pages can be cut cleanly.

See also: <http://code.google.com/p/wkhtmltopdf/issues/detail?id=9>,
<http://code.google.com/p/wkhtmltopdf/issues/detail?id=33> and
<http://code.google.com/p/wkhtmltopdf/issues/detail?id=57>.
</quote>

Fonts (and Qt's QPrinter::ScreenResolution) also can cause issues:
http://code.google.com/p/wkhtmltopdf/issues/detail?id=72

-- 
Ruud