extract_classification_data: preclassifies PPTX, DOCX, and PDF files from the training data. findTags2.py is the current script.