Required packages:
  pycurl
  BeautifulSoup
  pdfminer

How-To:
After installing pdfminer, use the following command to convert a PDF file to an HTML file:

  $ {PDFMINER_DIR}/tools/pdf2txt.py -o test.html -c utf8 test.pdf
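
The same conversion could also be driven from a Python script. The following is only a minimal sketch, not part of this project: the helper names build_pdf2txt_command and convert_pdf_to_html are hypothetical, and it assumes that pdf2txt.py can be run with the current Python interpreter. It uses only the -o and -c options from the command above.

```python
import os
import subprocess
import sys


def build_pdf2txt_command(pdfminer_dir, pdf_path, html_path, codec="utf8"):
    """Build the argument list for the pdf2txt.py call shown above.

    Hypothetical helper: pdfminer_dir plays the role of {PDFMINER_DIR}.
    """
    script = os.path.join(pdfminer_dir, "tools", "pdf2txt.py")
    # Assumption: the script is launched with the interpreter running this code.
    # Same options as the shell command: -o output file, -c codec.
    return [sys.executable, script, "-o", html_path, "-c", codec, pdf_path]


def convert_pdf_to_html(pdfminer_dir, pdf_path, html_path, codec="utf8"):
    """Run pdf2txt.py; raises CalledProcessError if it exits with an error."""
    command = build_pdf2txt_command(pdfminer_dir, pdf_path, html_path, codec)
    subprocess.check_call(command)
```

With PDFMINER_DIR set in the environment, calling convert_pdf_to_html(os.environ["PDFMINER_DIR"], "test.pdf", "test.html") should run the equivalent of the command above.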