Converting PDF to text in Linux

The contents of a PDF file can be converted to a plain text file using the tool “pdftotext”. The general syntax for converting a range of pages is

$ pdftotext -f <first page> -l <last page> <pdf file> <text file>

Let us say we have a PDF file named temp.pdf, and we want to convert pages 1 to 4 to text and save the result in a text file named output.txt.

$ pdftotext -f 1 -l 4 temp.pdf output.txt
$ ls
output.txt temp.pdf

The file output.txt will contain all the text from pages 1 to 4 of temp.pdf, but not the images. By default, the formatting is not maintained in the text file.
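If the same conversion is needed often, it can be wrapped in a small shell function. The following is only a minimal sketch: the function name pdf_range_to_text and its argument order are made up for illustration, and the hint about poppler-utils is an assumption that holds on many distributions but not necessarily all.

```shell
# Hypothetical helper: convert pages FIRST..LAST of a PDF to a text file.
# Usage: pdf_range_to_text FIRST LAST INPUT.pdf [OUTPUT.txt]
pdf_range_to_text() {
    first=$1
    last=$2
    pdf=$3
    # If no output name is given, reuse the input name with .txt
    # in place of .pdf (temp.pdf becomes temp.txt).
    out=${4:-"${pdf%.pdf}.txt"}

    # Fail early with a readable message if the tool is missing.
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext not found (often packaged as poppler-utils)" >&2
        return 1
    fi

    # -f is the first page to convert, -l is the last page.
    pdftotext -f "$first" -l "$last" "$pdf" "$out"
}
```

Once the function is defined in the current shell, running pdf_range_to_text 1 4 temp.pdf output.txt executes the same pdftotext command as the example above.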

To maintain the formatting as well, we need to pass the option -layout.

$ pdftotext -f 1 -l 4 -layout temp.pdf output.txt

Now the text file will have almost the same layout as the PDF file.
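To see what -layout actually changes for a given document, the two outputs can be compared with diff. This is a rough sketch under a few assumptions: the function name compare_layout is hypothetical, and it relies on mktemp being available, which is the case on most Linux systems.

```shell
# Hypothetical helper: show the difference between the default output
# and the -layout output for pages FIRST..LAST of a PDF.
# Usage: compare_layout INPUT.pdf [FIRST] [LAST]
compare_layout() {
    pdf=$1
    first=${2:-1}
    last=${3:-1}

    # Temporary files for the two versions of the text.
    plain=$(mktemp) || return 1
    layout=$(mktemp) || { rm -f "$plain"; return 1; }

    # Convert once without and once with -layout, then compare.
    # status stays 0 only if both conversions succeed and the
    # two text files are identical.
    status=0
    pdftotext -f "$first" -l "$last" "$pdf" "$plain" &&
    pdftotext -f "$first" -l "$last" -layout "$pdf" "$layout" &&
    diff "$plain" "$layout" || status=$?

    rm -f "$plain" "$layout"
    return "$status"
}
```

Calling compare_layout temp.pdf 1 4 prints the lines that differ between the two conversions. Lines marked with < come from the default output and lines marked with > come from the -layout output.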

Posted January 22, 2013 by Tux Think in category "Linux