@SurfyPenguin
PR Title

feat: add PDF Keyword Highlighter script (closes #478)

Summary

Added a new command-line Python script that highlights specified keywords in PDF files using PyMuPDF, complete with a dedicated folder, README, and entry in the main repository README.

Description

This pull request implements the PDF keyword highlighter requested in issue #478. The script writes its highlights to a new output file and leaves the original PDF unchanged.

The changes are as follows:

  • Created new folder PDF Highlighter Script/ with pdf_highlight.py and a README.md
  • Implemented keyword highlighting using page.get_text("words") for fast word-level text extraction
  • Supported multiple keywords, optional case-sensitive search (-s flag), and punctuation stripping for accurate matching (e.g., "keyword;" matches "keyword")
  • Printed per-page and total highlight statistics in a formatted table
  • Updated root README.md to add the new script entry in alphabetical order
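
The matching and highlighting steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from pdf_highlight.py: the function names `normalize`, `highlight_keywords`, and `format_stats` are hypothetical, and the `case_sensitive` parameter stands in for whatever the `-s` flag sets in the real script. It relies only on the PyMuPDF calls `fitz.open`, `page.get_text("words")`, `page.add_highlight_annot`, and `doc.save`.

```python
import string
from collections import Counter


def normalize(word, case_sensitive=False):
    """Strip surrounding punctuation so "keyword;" matches "keyword".

    Case is folded unless case_sensitive is True (hypothetical stand-in
    for the script's -s flag).
    """
    stripped = word.strip(string.punctuation)
    return stripped if case_sensitive else stripped.lower()


def highlight_keywords(src, dst, keywords, case_sensitive=False):
    """Highlight every matching word in src and save the result to dst.

    Returns a Counter mapping 1-based page numbers to highlight counts.
    The source file is only read; all changes go to dst.
    """
    import fitz  # PyMuPDF, imported lazily so the helpers work without it

    targets = {normalize(k, case_sensitive) for k in keywords}
    per_page = Counter()
    doc = fitz.open(src)
    for page in doc:
        # Each entry is (x0, y0, x1, y1, word, block_no, line_no, word_no).
        for x0, y0, x1, y1, text, *_ in page.get_text("words"):
            if normalize(text, case_sensitive) in targets:
                page.add_highlight_annot(fitz.Rect(x0, y0, x1, y1))
                per_page[page.number + 1] += 1
    doc.save(dst)
    doc.close()
    return per_page


def format_stats(per_page):
    """Render per-page and total highlight counts as a small text table."""
    lines = [f"{'Page':<6}{'Highlights':>10}"]
    for page_no in sorted(per_page):
        lines.append(f"{page_no:<6}{per_page[page_no]:>10}")
    lines.append(f"{'Total':<6}{sum(per_page.values()):>10}")
    return "\n".join(lines)
```

A call such as `print(format_stats(highlight_keywords("in.pdf", "out.pdf", ["invoice", "total"])))` would then write `out.pdf` and print the table; the file names and keywords here are placeholders. Matching whole extracted words against a set keeps each page to a single pass, at the cost of not matching multi-word phrases.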

Checks

In the repository

  • Made no changes that degrade the functioning of the repository
  • Gave each commit a better title (unlike updated README.md)

In the PR

  • Followed the format of the pull_request_template
  • Kept the pull request small (for the creator's welfare)
  • Tested the changes you made

Thank You,

Amartya Anand

Development

Successfully merging this pull request may close these issues.

PDF Keyword Scanner & Highlighter Script
