🐍 Python Beginner's Guide: Conquering PDFs (Example-Driven)! 📄✨



Hello, coding friends! 👋 PDF processing looks complicated, but did you know that Python makes it easier and more fun than you might expect? 🧐 Today we'll walk through example-driven PDF techniques that even beginners can follow. With this guide, tedious PDF chores such as merging files and extracting text can be automated in just a few lines of code! 🚀

1. Why Process PDFs with Python? 🤔

PDF (Portable Document Format) is a standard format for sharing documents while preserving their layout. It is used for reports, contracts, research papers, and much more. Handling large numbers of these documents by hand, however, is tedious work.

  • ⚡️ Automation: Extract specific information from hundreds of PDF pages, or merge several files into one, in a fraction of the time it takes by hand.

  • 🛠️ Data extraction: A PDF is not structured data like JSON or CSV, so getting at its contents takes extra work. Python libraries let you pull out the text or tables you need programmatically.

  • 🔄 Batch processing: Modify or convert many PDF files in a single run.

Python has many powerful libraries for PDF processing. Today we'll focus on **PyPDF2**, one of the most popular and easiest to use, and **pdfminer.six**, which is well suited to text extraction!

2. Getting Ready: Installing the Libraries 📥

First, install the required libraries from a terminal or command prompt.

Bash
pip install PyPDF2
pip install pdfminer.six
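
If you'd like to confirm that the installation worked, a small check like the one below may help. It is only a sketch: the helper name `installed_version` is made up for this example, and it relies on the standard library's `importlib.metadata` (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names as used in the pip commands above
    for name in ("PyPDF2", "pdfminer.six"):
        found = installed_version(name)
        status = found if found else "not installed"
        print(f"{name}: {status}")
```

If either line prints "not installed", re-run the matching pip command in the same environment you use to run your scripts.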

3. Merging PDFs: Joining Them into One! 🔗 (PyPDF2)

You will often need to combine several reports or chapters into one final PDF file. PyPDF2's PdfMerger class makes this simple!

Python
import PyPDF2

# Create a PdfMerger object
merger = PyPDF2.PdfMerger()

# List of files to merge (example file names)
file_list = ['chapter1.pdf', 'chapter2.pdf', 'appendix.pdf']

for filename in file_list:
    # Add each file to the merger
    try:
        merger.append(filename)
    except FileNotFoundError:
        print(f"Warning: file {filename} not found. Skipping.")
        continue

# Save the result as a new file
# 'wb' means Write Binary mode.
with open("combined_document.pdf", "wb") as output_file:
    merger.write(output_file)

merger.close()

print("🎉 PDF merge complete! Check 'combined_document.pdf'.")

✅ Quick tip! Passing a pages argument, as in merger.append(filename, pages=(0, 2)), merges only part of a file. The tuple is (start, stop) with the stop index excluded, so (0, 2) takes the first two pages (indices 0 and 1). To take the first three pages, use pages=(0, 3).
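
To make the pages tuple less mysterious, here is a minimal sketch. It assumes the tuple behaves like the arguments to Python's built-in `range`, which matches the tip above for in-bounds values. The helper names `selected_page_indices` and `merge_page_ranges` are invented for this example and are not part of PyPDF2.

```python
def selected_page_indices(pages):
    """List the 0-based page indices that a (start, stop[, step]) tuple selects."""
    return list(range(*pages))


def merge_page_ranges(sources, output_path):
    """Merge the chosen page range of each (path, pages) pair into one PDF.

    Use pages=None in a pair to take every page of that file.
    """
    import PyPDF2  # imported here so the helper above works without PyPDF2

    merger = PyPDF2.PdfMerger()
    try:
        for path, pages in sources:
            merger.append(path, pages=pages)
        with open(output_path, "wb") as output_file:
            merger.write(output_file)
    finally:
        merger.close()


print(selected_page_indices((0, 2)))  # [0, 1]
print(selected_page_indices((0, 3)))  # [0, 1, 2]
```

A call such as merge_page_ranges([("chapter1.pdf", (0, 2)), ("appendix.pdf", None)], "excerpt.pdf") would then combine the first two pages of one file with all of another (the output name "excerpt.pdf" is just a placeholder).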


4. Extracting Text from PDFs: Because Information Is Precious! 🕵️‍♀️ (PyPDF2 & pdfminer.six)

Extracting text from a PDF is often the first step in data analysis. PyPDF2 is enough for simple extraction, but pdfminer.six can be more capable when you need accurate text from a complex layout.

A. Simple Text Extraction (PyPDF2)

Useful when you only need simple information, such as the number of pages or a title.

Python
import PyPDF2

# Open the file in binary read mode ('rb').
try:
    with open('report.pdf', 'rb') as file:
        reader = PyPDF2.PdfReader(file)

        # Check the total number of pages
        num_pages = len(reader.pages)
        print(f"Total pages in the document: {num_pages}")

        # Extract text from the first page (index 0)
        first_page = reader.pages[0]
        text = first_page.extract_text()

        # Print the extracted text (first part only)
        print("\n--- First page text (first 500 characters) ---")
        print(text[:500] + "...")
        print("--------------------------------------")

except FileNotFoundError:
    print("❌ 'report.pdf' not found. Please check the file.")
except Exception as e:
    print(f"⚠️ Error during text extraction: {e}")
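
The example above reads only the first page. If you want the whole document, something along these lines may work. Treat it as a sketch: `preview` and `extract_all_text` are helper names made up here, and pages with no extractable text (scanned images, for instance) are assumed to yield an empty string.

```python
def preview(text, limit=500):
    """Return at most `limit` characters, adding '...' only when text was cut."""
    if len(text) <= limit:
        return text
    return text[:limit] + "..."


def extract_all_text(pdf_path):
    """Return the text of every page, with a blank line between pages."""
    import PyPDF2  # imported lazily so preview() works without PyPDF2

    with open(pdf_path, "rb") as file:
        reader = PyPDF2.PdfReader(file)
        # Fall back to "" for pages where nothing could be extracted
        page_texts = [page.extract_text() or "" for page in reader.pages]
    return "\n\n".join(page_texts)
```

You could then call print(preview(extract_all_text('report.pdf'))) to see the opening of the document without flooding the terminal.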

B. Advanced Text Extraction (pdfminer.six)

pdfminer.six analyzes the position of each piece of text on the page as it extracts it, which helps with complex tables and multi-column documents. Its API is more involved, but the code below makes it approachable.

Python
from io import StringIO
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfparser import PDFParser

def extract_text_from_pdf(pdf_path):
    output_string = StringIO()
    
    with open(pdf_path, 'rb') as in_file:
        parser = PDFParser(in_file)
        doc = PDFDocument(parser)
        rsrcmgr = PDFResourceManager()
        
        # LAParams: parameters for text layout analysis.
        # Set detect_vertical=True if the document contains vertical text.
        laparams = LAParams(
            line_overlap=0.5,
            char_margin=2.0,
            word_margin=0.1,
            line_margin=0.5,
            boxes_flow=0.5,
            detect_vertical=False,
        )

        # TextConverter: writes the extracted content to output_string
        device = TextConverter(rsrcmgr, output_string, laparams=laparams)
        interpreter = PDFPageInterpreter(rsrcmgr, device)

        # Walk through every page and extract its text
        for page in PDFPage.create_pages(doc):
            interpreter.process_page(page)

    return output_string.getvalue()

# Usage example
try:
    full_text = extract_text_from_pdf('report.pdf')
    print("\n--- pdfminer.six extracted text (first 500 characters) ---")
    print(full_text[:500] + "...")
    print("----------------------------------------------")
except FileNotFoundError:
    print("❌ 'report.pdf' not found.")
except Exception as e:
    print(f"⚠️ Error during pdfminer.six extraction: {e}")
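
Since the appeal of pdfminer.six is its layout information, here is a rough sketch of how the text positions might be read using the library's high-level `extract_pages` function. The helpers `reading_order` and `text_boxes` are names invented for this example, and the simple top-to-bottom sort is an assumption that will not suit every layout.

```python
def reading_order(boxes):
    """Sort (x0, y0, x1, y1, text) tuples top-to-bottom, then left-to-right.

    PDF coordinates start at the bottom-left corner, so a larger y1 means
    the box sits higher on the page.
    """
    return sorted(boxes, key=lambda box: (-box[3], box[0]))


def text_boxes(pdf_path, page_index=0):
    """Collect the text boxes on one page together with their bounding boxes."""
    # Imported lazily so reading_order() works without pdfminer.six
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    boxes = []
    for page_layout in extract_pages(pdf_path, page_numbers=[page_index]):
        for element in page_layout:
            if isinstance(element, LTTextContainer):
                x0, y0, x1, y1 = element.bbox
                boxes.append((x0, y0, x1, y1, element.get_text().strip()))
    return reading_order(boxes)
```

Each returned tuple carries the box coordinates in PDF points, so you could, for example, keep only the boxes in the left half of the page when working with a two-column document.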

5. Rotating Pages and Encrypting/Decrypting PDFs 🔒 (PyPDF2)

Useful when you want to secure a document or fix the orientation of scanned pages.

A. Rotating Specific Pages ↩️

Use this when some pages of a scanned document are sideways or upside down. Rotate by 90 degrees to fix a sideways page, or by 180 degrees to fix an upside-down one.

Python
import PyPDF2

# Assume we open 'rotate_original.pdf' and save the result as 'rotated_output.pdf'
try:
    with open('rotate_original.pdf', 'rb') as input_file:
        reader = PyPDF2.PdfReader(input_file)
        writer = PyPDF2.PdfWriter()

        for i, page in enumerate(reader.pages):
            # Rotate the 2nd page (index 1) 90 degrees clockwise
            if i == 1:
                page.rotate(90)
                print(f"✔️ Rotated page {i+1} by 90 degrees.")

            writer.add_page(page)

        with open('rotated_output.pdf', 'wb') as output_file:
            writer.write(output_file)

        print("🎉 Page rotation complete!")

except FileNotFoundError:
    print("❌ 'rotate_original.pdf' not found.")
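
If several pages need different fixes, the same idea can be wrapped in a function. The following is a sketch under a few assumptions: `normalize_rotation` and `rotate_pages` are hypothetical helper names, and the `rotations` argument is a plain dict mapping a 0-based page index to a clockwise angle.

```python
def normalize_rotation(angle):
    """Validate a clockwise rotation and reduce it to 0, 90, 180, or 270."""
    if angle % 90 != 0:
        raise ValueError(f"rotation must be a multiple of 90, got {angle}")
    return angle % 360


def rotate_pages(input_path, output_path, rotations):
    """Rotate pages according to a {page_index: angle} mapping and save a copy."""
    import PyPDF2  # imported lazily so normalize_rotation() works on its own

    with open(input_path, "rb") as input_file:
        reader = PyPDF2.PdfReader(input_file)
        writer = PyPDF2.PdfWriter()
        for index, page in enumerate(reader.pages):
            angle = normalize_rotation(rotations.get(index, 0))
            if angle:
                page.rotate(angle)
            writer.add_page(page)
        # Write while the input file is still open
        with open(output_path, "wb") as output_file:
            writer.write(output_file)
```

For instance, rotate_pages('rotate_original.pdf', 'rotated_output.pdf', {1: 90, 4: 180}) would turn the second page a quarter turn and flip the fifth page, leaving the rest untouched.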

B. Setting a PDF Password 🔐

You can set a password to protect important documents.

Python
import PyPDF2

try:
    with open('protected_original.pdf', 'rb') as input_file:
        reader = PyPDF2.PdfReader(input_file)
        writer = PyPDF2.PdfWriter()

        # Copy every page from the original into a new writer
        for page in reader.pages:
            writer.add_page(page)

        # Set the password (password: '1234')
        # Warning: this is not strong encryption, so use a stronger tool for sensitive documents.
        writer.encrypt('1234')

        with open('protected_output.pdf', 'wb') as output_file:
            writer.write(output_file)

        print("🎉 Password '1234' set on the PDF!")

except FileNotFoundError:
    print("❌ 'protected_original.pdf' not found.")
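
Going the other way, opening a protected file and saving an unprotected copy, might look like the sketch below. It uses PdfReader's `is_encrypted` attribute and `decrypt` method. The helper names and the `PDF_PASSWORD` environment variable are assumptions made up for this example; reading the password from the environment is simply one way to avoid hard-coding it.

```python
import os


def password_from_env(variable="PDF_PASSWORD"):
    """Read the password from an environment variable instead of the source code."""
    password = os.environ.get(variable)
    if not password:
        raise RuntimeError(f"set the {variable} environment variable first")
    return password


def decrypt_pdf(input_path, output_path, password):
    """Write an unencrypted copy of a password-protected PDF."""
    import PyPDF2  # imported lazily so password_from_env() works on its own

    with open(input_path, "rb") as input_file:
        reader = PyPDF2.PdfReader(input_file)
        if reader.is_encrypted:
            # decrypt() returns a falsy value when the password does not match
            if not reader.decrypt(password):
                raise ValueError("wrong password")
        writer = PyPDF2.PdfWriter()
        for page in reader.pages:
            writer.add_page(page)
        with open(output_path, "wb") as output_file:
            writer.write(output_file)
```

With the variable set, decrypt_pdf('protected_output.pdf', 'unlocked.pdf', password_from_env()) would produce a copy that opens without a prompt ('unlocked.pdf' is a placeholder name).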

6. Wrapping Up: PDF Processing Is Nothing to Fear! 💪

Python can do far more with PDFs than we've covered here. Beyond PyPDF2 and pdfminer.six, there are many other useful libraries, such as **tabula-py**, which specializes in extracting tabular data, and pdf2image, which renders PDF pages as Pillow images.

Using the basics and examples from today, try solving the PDF problems that come up in your daily work or studies the smart way, with Python! You'll sharpen your coding skills and save time, killing two birds with one stone.
