I want to create a dataset like [CRRD](https://github.com/THUNLP-AIPoet/Datasets/tree/master/CRRD) for the English language, with the same set of files: **pingsheng.txt**, **pingshui.txt**, **pingshui_amb.pkl**, **zesheng.txt**. How can I do that? In particular, how can I prepare a file like pingshui_amb.pkl? Is any source code available? Thanks.