Converting PDF to text in Linux
The contents of a PDF file can be converted to a plain text file using the tool “pdftotext”. The general syntax of pdftotext is
$ pdftotext -f <first page> -l <last page> <pdf file> <text file>
Let us say we have a PDF file named temp.pdf, and we want to convert pages 1 to 4 of it into a text file named output.txt.
$ pdftotext -f 1 -l 4 temp.pdf output.txt
$ ls
output.txt temp.pdf
The file output.txt will contain the text of pages 1 to 4 of temp.pdf, but not the images. By default, the formatting is not maintained in the text file.
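If the page range comes from user input, it can help to check it before calling pdftotext. The following is only a minimal sketch: the function name pdf_pages_to_text and its argument order are made up for illustration and are not part of pdftotext itself. It assumes a POSIX shell and that pdftotext is on the PATH.

```shell
# Hypothetical helper (not part of pdftotext): convert pages FIRST..LAST
# of a PDF to a text file, after checking that the range makes sense.
# Arguments: FIRST LAST input.pdf output.txt
pdf_pages_to_text() {
    first=$1; last=$2; pdf=$3; txt=$4

    # Both page numbers must be non-empty and contain digits only.
    for n in "$first" "$last"; do
        case $n in
            ''|*[!0-9]*)
                echo "page numbers must be whole numbers: '$n'" >&2
                return 2
                ;;
        esac
    done

    # Pages start at 1, and the first page cannot come after the last.
    if [ "$first" -lt 1 ] || [ "$first" -gt "$last" ]; then
        echo "invalid page range: $first-$last" >&2
        return 2
    fi

    # Same command as above, with the checked values filled in.
    pdftotext -f "$first" -l "$last" "$pdf" "$txt"
}
```

After defining the function in the current shell, the example above becomes

$ pdf_pages_to_text 1 4 temp.pdf output.txt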
To maintain the formatting as well, we need to pass the option -layout.
$ pdftotext -f 1 -l 4 -layout temp.pdf output.txt
Now the text file will have almost the same layout as the PDF file.
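The same option can be applied to several files in one go. The sketch below is one possible way to do it, not the only one: convert_all_pdfs is a hypothetical function name, and it assumes a POSIX shell with pdftotext on the PATH. Each temp.pdf style file in the given directory gets a text file with the same base name next to it.

```shell
# Hypothetical helper: convert every *.pdf file in a directory to text,
# keeping the layout. Argument: the directory (defaults to the current one).
convert_all_pdfs() {
    dir=${1:-.}
    for pdf in "$dir"/*.pdf; do
        # If nothing matches, the pattern stays as literal text, so skip
        # anything that is not an existing regular file.
        [ -f "$pdf" ] || continue
        # Replace the .pdf suffix with .txt for the output file name.
        pdftotext -layout "$pdf" "${pdf%.pdf}.txt"
    done
}
```

After defining the function, all PDF files in the current directory can be converted with

$ convert_all_pdfs .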