About the Project

This NLP research project seeks the best method for extracting names from the poem titles in Quan Song Shi. It is one of the sub-projects of the China Biographical Database Project (CBDB) at Harvard University. I carried out most of the work, with the guidance of Hongsu Wang and under the supervision of Professor Bol.

Methodology and Procedure

First, 62 ground-truth entries were labelled by hand to serve as the entity reference for the first stage of the research. Next, 7 popular Named Entity Recognition (NER) models from Hugging Face were evaluated in a Jupyter Notebook; the best model extracted names with approximately 54% accuracy. The same task was then attempted with prompt engineering on Large Language Models (LLMs), using ChatGPT-4o and Claude 3.5 as examples. The initial experiment reached an accuracy of approximately 87%. The second experiment is still in progress (I will update this section when it is finished).
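The accuracy figures above depend on how a "correct" extraction is scored, which this README does not spell out. As a minimal sketch, assuming exact-match scoring per title (a title counts as correct only when the extracted names equal the hand-labelled names), the comparison could look like the following. The function name `exact_match_accuracy` and the sample titles are hypothetical and are not taken from the project's notebooks or its 62 ground truths.

```python
from typing import Dict, List


def exact_match_accuracy(gold: Dict[str, List[str]],
                         predicted: Dict[str, List[str]]) -> float:
    """Share of titles whose predicted name set equals the hand-labelled set.

    Assumption: exact-match scoring per title. The project may score
    extractions differently (for example, per name rather than per title).
    """
    if not gold:
        raise ValueError("gold must contain at least one labelled title")
    correct = 0
    for title, gold_names in gold.items():
        # A title the model skipped is treated as an empty prediction.
        pred_names = predicted.get(title, [])
        if set(pred_names) == set(gold_names):
            correct += 1
    return correct / len(gold)


# Illustrative entries only; not taken from the project's ground truths.
gold = {
    "寄子由": ["子由"],
    "送李公擇": ["李公擇"],
    "春日": [],
}
predicted = {
    "寄子由": ["子由"],
    "送李公擇": ["李公"],  # a truncated name counts as a miss
    "春日": [],            # correctly extracting nothing counts as a hit
}

accuracy = exact_match_accuracy(gold, predicted)
print(f"Exact-match accuracy: {accuracy:.0%}")
```

Scoring per title is strict: a partially extracted name is a miss. The same function can score both the NER models and the LLM outputs, provided each is first converted to a mapping from title to a list of names.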

About Files

QuanSongShi_training_data.csv contains all the Quan Song Shi data entries intended for training. (Note: this data was prepared partly by me and mostly by other RAs.)
The HuggingFace_Model_Evaluation folder contains all the files and materials for the NER model evaluations.
The LLMs_Experiments folder contains all the materials, files, and reports for the LLM experiments.

