PetRobustness
Some notes on how to make parsing with Pet more robust.
For both methods, it is unclear which REL the unknown word is assigned, and whether it is treated as a word or as a lexeme. Once unknown-word handling has been applied, it is no longer possible to generate back from the MRS to the surface string. Such information should be included in the MRS somehow; this can be achieved in user-fns.lsp.
- Stephen points out that partial lexical gaps (e.g. a word that is a verb but only
encoded as a noun) are not captured by these mechanisms, but this will be solved when chart mapping is merged into the main branch.
cheap -default-les
Needs a mapping in the grammar. See also PetInput.
By Yi.
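A minimal sketch of how this might be invoked; the grammar image name, input file, and exact flag spelling are assumptions (they vary per PET version and grammar), so check PetInput for the mapping the grammar must provide:

```shell
# Hypothetical invocation: fall back on generic (default) lexical entries
# for words not in the lexicon. "english.grm" and "sentences.txt" are
# placeholder names.
cheap -default-les -mrs english.grm < sentences.txt
```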
cheap -predict-les
A Maximum Entropy model has to be created with a script by Yi, which needs a treebanked profile as input. In e.g. pred-lex.tdl, the lexical types that should be predictable are listed. At least 2000 sentences are needed.
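A sketch of the run-time side, assuming the MaxEnt model has already been trained from the treebanked profile as described above; file names are placeholders:

```shell
# Hypothetical invocation: let the trained MaxEnt model predict lexical
# types (those listed in e.g. pred-lex.tdl) for unknown words.
cheap -predict-les -mrs english.grm < sentences.txt
```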
- Supertagger, made by Tim and Phil. Still has to be finished and integrated.
- Is not merged to the main branch, yet, according to Yi.
- Roots (in english.set)
- choose the robust root for greater coverage (in english.set)
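For illustration, root selection lives in the grammar's PET settings file; the fragment below is a hypothetical sketch (the actual root names in english.set depend on the grammar version):

```
;; hypothetical fragment of english.set -- list the start symbols the
;; parser accepts; adding a robust root increases coverage at the cost
;; of admitting less strictly well-formed analyses.
start-symbols := $root_strict $root_robust.
```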
- Robustness rules/Mal rules
- The number of items of a corpus that can be parsed depends heavily on
the parameters passed to the parser. Some of those settings restrict the search space (such as reducing the maximum number of edges). The constraint that Dan uses now is the mem option in cheap (although you should halve it, for some strange reason!). This seems to be the most reasonable setting.
See also PetParameters.
- always use packing
- recommend -memlimit (amount/2) rather than -limit (edges)
- -timeout=1 (second) can also be useful
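Putting the recommendations above together, an invocation might look like the following sketch; the memory figure and file names are assumptions (per the halving note, pass half of what you actually want to allow):

```shell
# Hypothetical invocation combining the recommended settings:
# always pack, limit memory rather than edges, and bound parse time.
cheap -packing -memlimit=512 -timeout=1 -mrs english.grm < corpus.txt
```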