GitHub - zantetran/PDF_Parser2ES: Extract information from a PDF file and insert it into Elasticsearch

Draft pipeline

The idea is simple: using the directory structure (outline) of a PDF file, I build the tree of sections, traverse each section together with its corresponding chunk of text, and insert the result into Elasticsearch (ES).
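The traversal described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: it assumes the outline has already been read into flat `(level, title, start_page)` entries and the page text into a list of strings, and the names `Section`, `build_tree`, `walk`, and `section_chunks` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    title: str
    level: int
    start_page: int
    children: List["Section"] = field(default_factory=list)


def build_tree(entries):
    """Turn flat (level, title, start_page) outline entries into a tree."""
    root = Section(title="root", level=0, start_page=0)
    stack = [root]
    for level, title, start_page in entries:
        node = Section(title, level, start_page)
        # Pop until the top of the stack is shallower: that is the parent.
        while stack[-1].level >= level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def walk(node, path=()):
    """Yield (path, section) for every section, depth-first in document order."""
    for child in node.children:
        child_path = path + (child.title,)
        yield child_path, child
        yield from walk(child, child_path)


def section_chunks(entries, pages):
    """Pair each section with the page text up to the next section's start."""
    flat = list(walk(build_tree(entries)))
    for i, (path, sec) in enumerate(flat):
        # A section ends where the next section in document order begins.
        end = flat[i + 1][1].start_page if i + 1 < len(flat) else len(pages)
        # Always keep the start page, since two sections can share a page.
        end = max(end, sec.start_page + 1)
        yield {
            "path": " > ".join(path),
            "title": sec.title,
            "text": "\n".join(pages[sec.start_page:end]),
        }
```

Splitting on page boundaries is the simplest possible chunking rule; a section that starts mid-page will pick up some text from its neighbour, so a real pipeline would likely split on heading positions instead.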

1. Deploy Elasticsearch:

docker-compose up -d

2. Extract and insert information into ES:

python main.py
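As a rough sketch of what the insert step might look like, section documents can be wrapped as bulk actions in the dictionary shape accepted by `elasticsearch.helpers.bulk`. The function name `to_bulk_actions`, the index name `pdf_sections`, and the document fields are assumptions for illustration, not taken from `main.py`.

```python
import hashlib


def to_bulk_actions(docs, index="pdf_sections"):
    """Wrap section documents as bulk actions (index name is hypothetical)."""
    for doc in docs:
        # Hash the section path so re-running the ingest overwrites
        # existing documents rather than duplicating them.
        doc_id = hashlib.sha1(doc["path"].encode("utf-8")).hexdigest()
        yield {"_index": index, "_id": doc_id, "_source": doc}


# With the official client installed and ES reachable (assumed here to be on
# the default port 9200), the actions could be sent like this:
#   from elasticsearch import Elasticsearch, helpers
#   helpers.bulk(Elasticsearch("http://localhost:9200"), to_bulk_actions(docs))
```

Deriving `_id` from the section path keeps the ingest idempotent, at the cost of collisions if two sections share the same full path.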

3. Run the search API:

python search.py
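For illustration, a search endpoint over the indexed sections might build a `multi_match` query and flatten the response along these lines. The helper names `build_query` and `format_hits` and the `title`, `text`, and `path` fields are assumptions, and `search.py` may work differently.

```python
def build_query(text, size=5):
    """Build a request body that matches the text against title and body."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                # Boost title matches over body matches (the weight is arbitrary).
                "fields": ["title^2", "text"],
            }
        },
    }


def format_hits(response):
    """Reduce a raw search response to score, title, and section path."""
    return [
        {
            "score": hit["_score"],
            "title": hit["_source"]["title"],
            "path": hit["_source"]["path"],
        }
        for hit in response["hits"]["hits"]
    ]
```

The body returned by `build_query` could be sent to the index's `_search` endpoint, and `format_hits` applied to the JSON that comes back.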

