Agreed with @peterbjorgensen that it would be a great idea to create an overview of what taggers might be relevant for cleaning.
Outline:
- Create a .md table with relevant taggers + a short description
- Check which filters were used in existing cleaning strategies and at least try to match them (see here)
- Potentially add an estimate of speed (e.g. time to process the Danish Gigaword Wikipedia section, ~55M tokens); see the sketch after this list
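
A minimal sketch of how the speed estimate could be produced, assuming the Wikipedia section is available locally as one document per line and that a tagger is just a callable taking a document string (the path, `WIKI_PATH`, and `dummy_tagger` are placeholders, not part of any existing setup):

```python
import time
from pathlib import Path

# Assumption: path to a local dump of the Danish Gigaword Wikipedia
# section, with one document per line.
WIKI_PATH = Path("danish_gigaword/wiki.txt")


def dummy_tagger(text: str) -> bool:
    """Placeholder tagger; swap in the tagger under test."""
    return len(text.split()) > 10


def benchmark(tagger, path: Path) -> None:
    n_docs = 0
    n_tokens = 0
    start = time.perf_counter()
    with path.open(encoding="utf-8") as f:
        for line in f:
            tagger(line)
            n_docs += 1
            n_tokens += len(line.split())  # rough whitespace token count
    elapsed = time.perf_counter() - start
    print(
        f"{n_docs} docs, ~{n_tokens / 1e6:.1f}M tokens in {elapsed:.1f}s "
        f"({n_tokens / elapsed:,.0f} tokens/s)"
    )


if __name__ == "__main__":
    benchmark(dummy_tagger, WIKI_PATH)
```

The tokens/s number from a run like this could go straight into the speed column of the .md table.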