PDF Manipulation

From: Waldo
Subject: PDF Manipulation
Date: Thu, 13 Nov 2008 04:42:27 +0000
Message-ID: <2e5da89f-042a-4b82-8d44-0a1e22f2612e@w39g2000prb.googlegroups.com>

Hi all,

I'm wondering if anyone can point me in the right direction. We're
trying to find a way to manipulate PDF files in Lisp. Mainly, we need
to be able to "read" a PDF document, determine how many pages it
contains, be able to extract each page into a separate document, and
take a "snapshot" of a page into a .GIF/.TIFF/etc thumbnail file. Any
ideas?

Thanks,
Waldo

From: D Herring
Subject: Re: PDF Manipulation
Date: Thu, 13 Nov 2008 08:08:39 +0000
Message-ID: <gfgnae$1pv$1@aioe.org>

Waldo wrote:
> I'm wondering if anyone can point me in the right direction. We're
> trying to find a way to manipulate PDF files in Lisp. Mainly, we need
> to be able to "read" a PDF document, determine how many pages it
> contains, be able to extract each page into a separate document, and
> take a "snapshot" of a page into a .GIF/.TIFF/etc thumbnail file. Any
> ideas?

It's getting a little old, but cl-pdf[1] should do half of what you 
want.  It's got a PDF parser and such, but I'm not sure about the 
thumbnail snapshots.

[1]: http://www.fractalconcept.com/asp/html/cl-pdf.html

The easiest way to achieve your goal might be to use ImageMagick.  For 
example, `convert x.pdf x.png` should dump one file per PDF page, each 
named x.png.N.  Probably best done with a mix of Lisp and shell 
scripting, but CL-Magick[2] may help.

[2]: http://common-lisp.net/project/cl-magick/
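
A minimal sketch of the "mix of Lisp and shell" idea, assuming ASDF/UIOP 
is available and an ImageMagick `convert` binary is on the PATH.  The 
function names pdf-thumbnail-command and make-pdf-thumbnails are 
hypothetical, and both arguments are assumed to be plain strings.  The 
"%03d" in the output pattern asks ImageMagick for numbered file names, 
which sidesteps the question of how a given version names per-page 
output by default.

```lisp
;; Sketch only: assumes UIOP (shipped with ASDF 3) and ImageMagick's
;; `convert` on the PATH.  All function names here are hypothetical.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :asdf))

(defun pdf-thumbnail-command (pdf-path output-pattern
                              &key (width 85) (height 110))
  "Return the argument list for an ImageMagick call that renders every
page of PDF-PATH (a string) to OUTPUT-PATTERN (a string such as
\"page-%03d.png\"), scaled to fit within WIDTH x HEIGHT pixels."
  (list "convert"
        pdf-path
        "-thumbnail" (format nil "~Dx~D" width height)
        output-pattern))

(defun make-pdf-thumbnails (pdf-path output-pattern &rest keys)
  "Run the command built by PDF-THUMBNAIL-COMMAND.
UIOP:RUN-PROGRAM signals an error if convert exits with non-zero status."
  (uiop:run-program (apply #'pdf-thumbnail-command
                           pdf-path output-pattern keys)
                    :output :string
                    :error-output :string))
```

Something like (make-pdf-thumbnails "x.pdf" "/tmp/output/page-%03d.png") 
would then write one thumbnail per page.  Passing the command as a list 
of strings rather than one shell string avoids quoting problems with 
file names that contain spaces.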

- Daniel

From: Waldo
Subject: Re: PDF Manipulation
Date: Thu, 13 Nov 2008 11:37:08 +0000
Message-ID: <b0ffb95e-131e-456c-bfdd-b78632cbd685@r15g2000prh.googlegroups.com>

On Nov 13, 3:08 am, D Herring <········@at.tentpost.dot.com> wrote:
> Waldo wrote:
> > I'm wondering if anyone can point me in the right direction. We're
> > trying to find a way to manipulate PDF files in Lisp. Mainly, we need
> > to be able to "read" a PDF document, determine how many pages it
> > contains, be able to extract each page into a separate document, and
> > take a "snapshot" of a page into a .GIF/.TIFF/etc thumbnail file. Any
> > ideas?
>
> It's getting a little old, but cl-pdf[1] should do half of what you
> want.  It's got a PDF parser and such, but I'm not sure about the
> thumbnail snapshots.
>
> [1]:http://www.fractalconcept.com/asp/html/cl-pdf.html
>
> The easiest way to achieve your goal might be to use ImageMagick.  For
> example, `convert x.pdf x.png` should dump one file per pdf page, each
> named x.png.N.  Probably best done with a mix of lisp and shell
> scripting, but CL-Magick[2] may help.
>
> [2]:http://common-lisp.net/project/cl-magick/
>
> - Daniel

Thanks. I had looked at cl-pdf, but from looking at it, it seems to
only be able to "write" pdfs programmatically. Maybe I'm mistaken. I
like the ImageMagick approach. Will evaluate it.

Thanks again

From: Zach Beane
Subject: Re: PDF Manipulation
Date: Thu, 13 Nov 2008 13:26:33 +0000
Message-ID: <m3tzabwyd2.fsf@unnamed.xach.com>

Waldo <·····@infoway.net> writes:

> Thanks. I had looked at cl-pdf, but from looking at it, it seems to
> only be able to "write" pdfs programmatically. Maybe I'm mistaken.

CL-PDF includes a parser that converts a PDF file to in-memory
structures. It isn't very high-level, and the functions to use it aren't
exported, but it works. See the end of pdf-parser.lisp for an example.

Zach

From: Ken McKee
Subject: Re: PDF Manipulation
Date: Fri, 14 Nov 2008 19:03:13 +0000
Message-ID: <b7b5084e-0bb9-4f1a-b80f-e28cc4db2b5f@1g2000prd.googlegroups.com>

On Nov 12, 11:42 pm, Waldo <·····@infoway.net> wrote:
> Hi all,
>
> I'm wondering if anyone can point me in the right direction. We're
> trying to find a way to manipulate PDF files in Lisp. Mainly, we need
> to be able to "read" a PDF document, determine how many pages it
> contains, be able to extract each page into a separate document, and
> take a "snapshot" of a page into a .GIF/.TIFF/etc thumbnail file. Any
> ideas?
>
> Thanks,
> Waldo

Here it is in Clozure CL. I realize this is just thinly disguised
Objective-C and dependent on Apple's PDFKit and Cocoa frameworks, but
what the heck...

(in-package :ccl)
;; Quartz provides PDFKit (PDFDocument, PDFPage).
(objc:load-framework "Quartz" :quartz)
(let* ((pdf-doc (#/initWithURL: (#/alloc (@class "PDFDocument"))
                                (#/fileURLWithPath: (@class "NSURL")
                                                    (%make-nsstring "somefile.pdf"))))
       ;; 85x110 thumbnail; initWithSize: takes an NSSize, not an NSPoint.
       (image (#/initWithSize: (#/alloc ns:ns-image)
                               (ns:make-ns-size 85 110))))
  (loop for i from 0 below (#/pageCount pdf-doc) do
    (let ((page-rep (#/dataRepresentation (#/pageAtIndex: pdf-doc i))))
      ;; Write page i out as its own single-page PDF.
      (#/writeToFile:atomically:
       page-rep
       (%make-nsstring (format nil "/tmp/output/page~d.pdf" i))
       #$YES)
      ;; Draw the page into the image and save the result as a TIFF.
      (#/lockFocus image)
      (#/drawInRect: (#/imageRepWithData: ns:ns-pdf-image-rep page-rep)
                     (ns:make-ns-rect 0 0 85 110))
      (#/unlockFocus image)
      (#/writeToFile:atomically:
       (#/TIFFRepresentation image)
       (%make-nsstring (format nil "/tmp/output/page~d.tiff" i))
       #$YES))))