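How do I convert a PDF to an image?

Problem description

I need to convert PDF pages to images. The pages have a background image with some text written over it, and when I save a page as an image, only the background image is saved.

Is there any software that can convert the complete page to an image?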

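Best solution

Install imagemagick.
In a terminal, in the directory containing the PDF, run: convert -density 150 input.pdf -quality 90 output.png where:
- The output can be PNG, JPG, or virtually any other image format.
- -density xxx sets the DPI to xxx (150 and 300 are common values).
- -quality xxx sets the compression level for the PNG, JPG, and MIFF formats. For JPG, 100 means the least compression and the highest quality; PNG is always lossless, so there the value only affects how the file is compressed.
- All other options (trim, grayscale, and so on) are documented on the ImageMagick website.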
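
For more than one file, the command above can be wrapped in a small shell function. This is a minimal sketch and not part of the original answer: the names run and convert_pdf and the DRY_RUN variable are made up for illustration. With DRY_RUN set, the function only prints the command it would run, which makes it easy to check before converting anything.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: convert one PDF to PNG at the given density (default 150).
# The output name is the input name with .pdf replaced by .png.
convert_pdf() {
    pdf=$1
    density=${2:-150}
    run convert -density "$density" "$pdf" -quality 90 "${pdf%.pdf}.png"
}
```

A loop such as for f in *.pdf; do convert_pdf "$f" 300; done would then process every PDF in the current directory. For a multi-page PDF, ImageMagick typically writes one numbered file per page rather than a single PNG.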

Second-best solution

You can use pdftoppm to convert a PDF to PNG:

pdftoppm input.pdf outputname -png

This writes each page of the PDF to a file named like outputname-01.png, where 01 is the page number.
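
When scripting around these files it helps to know the name of a given page in advance. The sketch below assumes that pdftoppm zero-pads the page number to the number of digits in the document's total page count (so a 12-page PDF gives -01 to -12). That padding rule is an assumption based on observed behaviour, and pdftoppm_name is a hypothetical helper, so check it against real output before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: predict the PNG name pdftoppm writes for a page.
# Assumption: the page number is zero-padded to the number of digits
# in the document's total page count.
# Arguments: output prefix, page number (from 1), total number of pages.
pdftoppm_name() {
    prefix=$1
    page=$2
    total=$3
    width=${#total}
    printf "%s-%0${width}d.png\n" "$prefix" "$page"
}
```

For example, pdftoppm_name outputname 1 12 prints outputname-01.png, matching the naming described above.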

Converting a single page of the PDF

pdftoppm input.pdf outputname -png -f {page} -singlefile

Replace {page} with the page number. Pages are numbered from 1, so -f 1 selects the first page.

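Specifying the resolution of the converted image

The default resolution of this command is 150 DPI. Increasing it produces more detail and larger files.

To raise the resolution of the converted PDF, add the options -rx {resolution} and -ry {resolution}. For example:

pdftoppm input.pdf outputname -png -rx 300 -ry 300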
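
To get a feel for what a given DPI means in pixels: PDF page sizes are measured in points (1 point = 1/72 inch), so the rendered size is roughly points × DPI / 72. The helper below is a hypothetical sketch of that arithmetic. It rounds to the nearest pixel, and an actual renderer may differ by a pixel.

```shell
#!/bin/sh
# Hypothetical helper: approximate pixel size of a rendered page.
# Arguments: page width and height in PDF points (1 point = 1/72 inch), and DPI.
page_pixels() {
    w_pt=$1
    h_pt=$2
    dpi=$3
    # Integer arithmetic, rounded to the nearest pixel.
    w_px=$(( (w_pt * dpi + 36) / 72 ))
    h_px=$(( (h_pt * dpi + 36) / 72 ))
    printf '%sx%s\n' "$w_px" "$h_px"
}
```

A US Letter page is 612 x 792 points, so page_pixels 612 792 150 prints 1275x1650 and page_pixels 612 792 300 prints 2550x3300. Doubling the DPI quadruples the pixel count, which is why file sizes grow quickly.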

Third solution

IIRC, GIMP can open PDFs and convert them to images. So if you want to edit the image right away, GIMP is your friend.

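Fourth method

The currently accepted answer does the job, but it produces large output files and loses quality.

The method in the answer here produces output comparable in size to the input, with no loss of quality.

TL;DR: use pdfimages: pdfimages -j input.pdf output

Quoting the linked answer: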

It’s not clear what you mean by “quality loss”. That could mean a lot of different things. Could you post some samples to illustrate? Perhaps cut the same section out of the poor quality and good quality versions (as a PNG to avoid further quality loss).

Perhaps you need to use -density to do the conversion at a higher dpi:
convert -density 300 file.pdf page_%04d.jpg 
(You can prepend -units PixelsPerInch or -units PixelsPerCentimeter if necessary. My copy defaults to ppi.)

Update: As you pointed out, gscan2pdf (the way you’re using it) is just a wrapper for pdfimages (from poppler). pdfimages does not do the same thing that convert does when given a PDF as input.

convert takes the PDF, renders it at some resolution, and uses the resulting bitmap as the source image.

pdfimages looks through the PDF for embedded bitmap images and exports each one to a file. It simply ignores any text or vector drawing commands in the PDF.

As a result, if what you have is a PDF that’s just a wrapper around a series of bitmaps, pdfimages will do a much better job of extracting them, because it gets you the raw data at its original size. You probably also want to use the -j option to pdfimages, because a PDF can contain raw JPEG data. By default, pdfimages converts everything to PNM format, and converting JPEG > PPM > JPEG is a lossy process.

So, try
pdfimages -j file.pdf page 
You may or may not need to follow that with a convert to .jpg step (depending on what bitmap format the PDF was using).

I tried this command on a PDF that I had made myself from a sequence of JPEG images. The extracted JPEGs were byte-for-byte identical to the source images. You can’t get higher quality than that.

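Fifth method

To get a single page with gm convert, append [N] (where N is the 0-based page number) to the PDF name, e.g. gm convert foo.pdf[11] out.png extracts page 12 of the PDF.

For pdftoppm, use -f N -singlefile, where N is the 1-based page number, e.g. pdftoppm -f 12 -singlefile -png foo.pdf out gives the same result. It seems to always append ".png" to the output file name, and there is no way to prevent that.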
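
The off-by-one difference between the two tools is easy to get wrong in a script. The following sketch builds the page selector for each tool from the same human page number (counted from 1). The helper names gm_page_arg and pdftoppm_page_args are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: build the page selector each tool expects
# for the same human page number (counted from 1).

# gm convert takes a 0-based index in brackets after the file name.
# Arguments: PDF file name, page number (from 1).
gm_page_arg() {
    printf '%s[%s]\n' "$1" "$(( $2 - 1 ))"
}

# pdftoppm takes the 1-based page number with -f, plus -singlefile.
# Argument: page number (from 1).
pdftoppm_page_args() {
    printf '%s %s %s\n' -f "$1" -singlefile
}
```

For page 12, gm_page_arg foo.pdf 12 prints foo.pdf[11] and pdftoppm_page_args 12 prints -f 12 -singlefile, matching the two commands above.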

Sixth method

If your PDF was scanned, the images are already stored inside the PDF. You only need to extract them with pdfimages:

pdfimages my-file.pdf prefix
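
Before extracting, it can be worth checking whether the PDF actually contains embedded bitmaps, since pdfimages skips text and vector content. pdfimages has a -list option that prints a table of the embedded images instead of extracting them. The sketch below counts the rows of that table. It assumes the listing starts with two header lines followed by one line per image, and count_listed_images is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count the image rows in `pdfimages -list` output
# read from standard input.
# Assumption: the listing has two header lines, then one line per image.
count_listed_images() {
    awk 'NR > 2 && NF > 0 { n++ } END { print n + 0 }'
}
```

It would be used as pdfimages -list my-file.pdf | count_listed_images. A count of 0 suggests the pages are not stored as bitmaps, in which case a rendering tool such as pdftoppm or convert is the better fit.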

Seventh method

You can use convert and specify a higher density with the -density option.

For example: convert -density 300 foo.pdf bar.png

References

How to convert PDF to Image?