[Python] Converting HTML Files to PDF Files with Python


Lately I have often needed to convert reports made up of HTML files into PDF.

So let's look at how to convert an HTML file to a PDF file using a Python module called pyhtml2pdf.

1. Installing the pyhtml2pdf Module

Before converting anything, the pyhtml2pdf module has to be installed.

Let's install it with the pip command.

pip install pyhtml2pdf

# Installing the pyhtml2pdf module with pip

$ pip install pyhtml2pdf         
Collecting pyhtml2pdf
  Downloading pyhtml2pdf-0.0.6-py3-none-any.whl (5.1 kB)
Collecting selenium
  Using cached selenium-4.7.2-py3-none-any.whl (6.3 MB)
Collecting webdriver-manager
  Downloading webdriver_manager-3.8.5-py2.py3-none-any.whl (27 kB)
Collecting trio-websocket~=0.9
  Using cached trio_websocket-0.9.2-py3-none-any.whl (16 kB)
Requirement already satisfied: urllib3[socks]~=1.26 in ./venv/lib/python3.8/site-packages (from selenium->pyhtml2pdf) (1.26.12)
Collecting trio~=0.17
  Using cached trio-0.22.0-py3-none-any.whl (384 kB)
Requirement already satisfied: certifi>=2021.10.8 in ./venv/lib/python3.8/site-packages (from selenium->pyhtml2pdf) (2022.6.15)
Collecting packaging
  Using cached packaging-22.0-py3-none-any.whl (42 kB)
Collecting python-dotenv
  Downloading python_dotenv-0.21.0-py3-none-any.whl (18 kB)
Requirement already satisfied: requests in ./venv/lib/python3.8/site-packages (from webdriver-manager->pyhtml2pdf) (2.28.1)
Collecting tqdm
  Downloading tqdm-4.64.1-py2.py3-none-any.whl (78 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.5/78.5 kB 5.9 MB/s eta 0:00:00
Collecting exceptiongroup>=1.0.0rc9
  Downloading exceptiongroup-1.1.0-py3-none-any.whl (14 kB)
Collecting async-generator>=1.9
  Using cached async_generator-1.10-py3-none-any.whl (18 kB)
Collecting outcome
  Using cached outcome-1.2.0-py2.py3-none-any.whl (9.7 kB)
Collecting attrs>=19.2.0
  Downloading attrs-22.2.0-py3-none-any.whl (60 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.0/60.0 kB 8.7 MB/s eta 0:00:00
Collecting sniffio
  Using cached sniffio-1.3.0-py3-none-any.whl (10 kB)
Collecting sortedcontainers
  Using cached sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB)
Requirement already satisfied: idna in ./venv/lib/python3.8/site-packages (from trio~=0.17->selenium->pyhtml2pdf) (3.3)
Collecting wsproto>=0.14
  Using cached wsproto-1.2.0-py3-none-any.whl (24 kB)
Collecting PySocks!=1.5.7,<2.0,>=1.5.6
  Using cached PySocks-1.7.1-py3-none-any.whl (16 kB)
Requirement already satisfied: charset-normalizer<3,>=2 in ./venv/lib/python3.8/site-packages (from requests->webdriver-manager->pyhtml2pdf) (2.1.1)
Collecting h11<1,>=0.9.0
  Using cached h11-0.14.0-py3-none-any.whl (58 kB)
Installing collected packages: sortedcontainers, tqdm, sniffio, python-dotenv, PySocks, packaging, h11, exceptiongroup, attrs, async-generator, wsproto, webdriver-manager, outcome, trio, trio-websocket, selenium, pyhtml2pdf
Successfully installed PySocks-1.7.1 async-generator-1.10 attrs-22.2.0 exceptiongroup-1.1.0 h11-0.14.0 outcome-1.2.0 packaging-22.0 pyhtml2pdf-0.0.6 python-dotenv-0.21.0 selenium-4.7.2 sniffio-1.3.0 sortedcontainers-2.4.0 tqdm-4.64.1 trio-0.22.0 trio-websocket-0.9.2 webdriver-manager-3.8.5 wsproto-1.2.0

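As the log shows, pip installs not only pyhtml2pdf but also the modules it depends on, such as selenium and webdriver-manager.

Now let's check that the module was installed correctly.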

pip freeze | grep pyhtml2pdf

# Checking the installed module with pip

$ pip freeze | grep pyhtml2pdf
pyhtml2pdf==0.0.6
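
The same check can also be done from inside Python. The following is only a minimal sketch, assuming Python 3.8 or later (where importlib.metadata is part of the standard library). The helper name installed_version is made up for this illustration and is not part of pyhtml2pdf.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The package names below are the ones that appear in the pip log above
    for name in ("pyhtml2pdf", "selenium", "webdriver-manager"):
        print(name, installed_version(name))
```

A result of None means the package is not visible to the interpreter that is running the script, which usually points to a different virtual environment.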

More details about the pyhtml2pdf module can be found on the page below.

https://pypi.org/project/pyhtml2pdf/

pyhtml2pdf

Simple python wrapper to convert HTML to PDF with headless Chrome via selenium.

pypi.org

2. Converting HTML Files to PDF Files with the pyhtml2pdf Module

Now that pyhtml2pdf is installed, let's use a short piece of Python code to convert HTML files to PDF files.

# Python Example Code

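# -*- coding: utf-8 -*-
import glob
import os

from pyhtml2pdf import converter


def html2pdf(file_path):
    """
    Convert every HTML file in a directory to PDF
    :param file_path: Directory that contains the HTML files
    :return: True / False
    """
    try:
        # A file:// URL needs an absolute path, so resolve the directory first
        pattern = os.path.join(os.path.abspath(file_path), '*.html')
        file_list = sorted(glob.glob(pattern))
        if len(file_list) == 0:
            return False
        for f in file_list:
            # Write the PDF next to the HTML file, with the same base name
            converter.convert(f'file://{f}', f'{os.path.splitext(f)[0]}.pdf')
        return True
    except Exception as err:
        # Report the reason instead of failing silently
        print(f'HTML to PDF conversion failed: {err}')
        return False


if __name__ == "__main__":
    path = '/a/b/c/test'               # HTML File Full Path
    html2pdf(path)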

Run the script with the path of the directory that holds the HTML files, and a PDF file with the same name as each HTML file is created.

If the directory contains several HTML files, each one is converted in turn.
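
Because html2pdf stops at the first failure, it can be useful to see which file in a batch went wrong. The sketch below is one possible way to do that, not the only one. The function name convert_all and its convert parameter are assumptions made for this example. By default it calls converter.convert from pyhtml2pdf with the same two arguments used above, and the parameter exists only so that the function can be tried without Chrome installed.

```python
import glob
import os


def convert_all(directory, convert=None):
    """Convert each *.html file in directory.

    Returns a dict that maps every HTML path to None on success
    or to the error message on failure.
    """
    if convert is None:
        # Imported lazily so the function can be exercised without Chrome
        from pyhtml2pdf import converter
        convert = converter.convert

    results = {}
    pattern = os.path.join(os.path.abspath(directory), "*.html")
    for html in sorted(glob.glob(pattern)):
        pdf = os.path.splitext(html)[0] + ".pdf"
        try:
            convert(f"file://{html}", pdf)
            results[html] = None
        except Exception as err:
            # Keep going so that one bad file does not stop the batch
            results[html] = str(err)
    return results
```

Calling convert_all('/a/b/c/test') and printing the entries whose value is not None lists only the files that need another look.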

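When passing in the location of the HTML files, use the full (absolute) path whenever possible, because the file:// URL handed to the converter has to point at an absolute location.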
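
As a small illustration of the point about full paths, the sketch below turns a possibly relative path into a file:// URL and derives the matching PDF name with the standard library only. The helper names to_file_url and pdf_path_for are hypothetical and exist only for this example.

```python
import os
from pathlib import Path


def to_file_url(html_path):
    """Make the path absolute and return it as a file:// URL."""
    return Path(os.path.abspath(html_path)).as_uri()


def pdf_path_for(html_path):
    """Return the path of the PDF that sits next to the HTML file."""
    # with_suffix only swaps the last extension, so "a.b.html" becomes "a.b.pdf"
    return str(Path(os.path.abspath(html_path)).with_suffix(".pdf"))


if __name__ == "__main__":
    print(to_file_url("report.html"))
    print(pdf_path_for("report.html"))
```

The two return values can be passed as the source and target arguments of converter.convert.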

An example file is attached as well.


That is all it takes to convert HTML files to PDF files.

3. Merging Multiple PDF Files into One with the PyPDF2 Module

When the conversion produces several PDF files, merging them into a single PDF file makes them easier to handle.

Of course, on macOS several PDF files can be merged into one as described in an earlier post.

[Mac] Combining Multiple Files into a PDF on macOS

happylie.tistory.com

This time, however, let's again use Python and merge several PDF files into one with the PyPDF2 module.

3.1 Installing the PyPDF2 Module

As before, let's install it with the pip command.

$ pip install PyPDF2

# Installing the PyPDF2 module with pip

$ pip install PyPDF2             
Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 232.6/232.6 kB 9.9 MB/s eta 0:00:00
Collecting typing_extensions>=3.10.0.0
  Using cached typing_extensions-4.4.0-py3-none-any.whl (26 kB)
Installing collected packages: typing_extensions, PyPDF2
Successfully installed PyPDF2-3.0.1 typing_extensions-4.4.0

As the log shows, pip installs not only PyPDF2 but also the module it depends on (typing_extensions).

Now let's check that the module was installed correctly.

pip freeze | grep PyPDF2

# Checking the installed module with pip

$ pip freeze | grep PyPDF2
PyPDF2==3.0.1

More details about the PyPDF2 module can be found on its PyPI page.

PyPDF2

A pure-python PDF library capable of splitting, merging, cropping, and transforming PDF files

pypi.org

3.2 Python Code

Now that PyPDF2 is installed, let's use a short piece of Python code to merge several PDF files into a single PDF file.

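# -*- coding: utf-8 -*-
import glob
import os

from PyPDF2 import PdfMerger


def merge_pdf(file_path, file_name):
    """
    Merge PDF File
    :param file_path: Directory that contains the PDF files
    :param file_name: Merged PDF file name (without extension)
    :return: True / False
    """
    try:
        output = os.path.abspath(os.path.join(file_path, f'{file_name}.pdf'))
        # Skip a merged file left over from an earlier run
        file_list = sorted(f for f in glob.glob(os.path.join(file_path, '*.pdf'))
                           if os.path.abspath(f) != output)
        if len(file_list) == 0:
            return False
        merge = PdfMerger()
        for f in file_list:
            merge.append(f)             # PdfMerger accepts a file path directly
        merge.write(output)
        merge.close()                   # Release the input file handles
        return True
    except Exception as err:
        # Report the reason instead of failing silently
        print(f'PDF merge failed: {err}')
        return False


if __name__ == "__main__":
    path = '/a/b/c/test'               # PDF File Path
    merge_pdf(path, 'TEST_PDF')     # PDF File Path & Merge PDF File Name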

Pass in the directory that holds the PDF files together with the name of the merged PDF, run the script, and the individual PDF files are combined into a single PDF file.
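
One thing to keep in mind is that the page order of the merged file follows sorted(), which compares names as plain strings, so a file named 10.pdf comes before 2.pdf. If the files are numbered, a natural sort key is one way to get the expected order. The sketch below is only an illustration under that assumption, and natural_key is a made-up helper name.

```python
import re


def natural_key(name):
    """Split a name into text and number parts so that 2 sorts before 10."""
    parts = re.split(r"(\d+)", name)
    # Parts alternate between text and digits, so comparisons stay type-safe
    return [int(p) if p.isdigit() else p.lower() for p in parts]


if __name__ == "__main__":
    names = ["10.pdf", "2.pdf", "1.pdf"]
    print(sorted(names))                   # plain string order
    print(sorted(names, key=natural_key))  # numeric-aware order
```

In merge_pdf this would mean passing key=natural_key to sorted() when building file_list.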

An example file is attached as well.