siewyeng/SinglishERG

SinglishERG

Information

A branch of the English Resource Grammar (ERG) that is used for Singlish. It is published under the same license as the ERG, the MIT license.

The singlish subdirectory contains the files that have been added or, in the case of parse-nodes.tdl, cloned from the trunk and changed. Refer to ace/config-singlish.tdl and singlish.tdl to see which files are in use.

To compile this grammar using LKB most conveniently:

  • Under LKB Top's Advanced menu, select 'Evaluate Lisp expression' and type in (push :singlish *features*)
  • Load the ERG as usual, via Load > Complete Grammar on the LKB Top menu: trunk/lkb/script

To compile this grammar using ace:

  • In the ERG directory, run: $ ace -G singlish.dat -g ace/config-singlish.tdl
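The compile step can be wrapped in a small function that fails early when it is run from the wrong directory. This is a minimal sketch, not part of the repository: the name compile_singlish and its two checks are assumptions, and only the ace command itself comes from the instructions above.

```shell
# Hypothetical helper: compile the Singlish grammar with ace.
# Usage: compile_singlish [erg_dir]   (defaults to the current directory)
compile_singlish() {
    local erg_dir="${1:-.}"
    local config="ace/config-singlish.tdl"

    # The config path is relative to the ERG top-level directory.
    if [ ! -f "$erg_dir/$config" ]; then
        echo "error: $erg_dir/$config not found; is this the ERG directory?" >&2
        return 1
    fi
    # ace has to be reachable from the shell.
    if ! command -v ace >/dev/null 2>&1; then
        echo "error: ace not found on PATH" >&2
        return 1
    fi
    # Run the documented compile command from inside the ERG directory.
    (cd "$erg_dir" && ace -G singlish.dat -g "$config")
}
```

Calling compile_singlish with no argument from the ERG directory should behave like the plain command; the checks only add clearer error messages.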

To check the semantics:

$ echo "sentence" | ace -g [grammar].dat -Tfq

To generate with another grammar:

$ echo "sentence" | ace -g [grammar1].dat -Tfq | ace -g [grammar2].dat -e

To generate from MRS:

$ cat [mrs] | ace -g [grammar2].dat -e
  • To change the generation root, add -r [root] to the command
  • Add --disable-subsumption-test for easier generation
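The parse-then-generate pipeline can be kept in one function so that the optional root is easy to switch on. This is a sketch under assumptions: the function name regenerate and its argument order are invented for illustration, and the ace options are exactly those listed above.

```shell
# Hypothetical helper: parse with one grammar, generate with another.
# Usage: regenerate "sentence" parse_grammar.dat gen_grammar.dat [root]
regenerate() {
    local sentence="$1" parse_grammar="$2" gen_grammar="$3" root="${4:-}"

    if [ -n "$root" ]; then
        # A generation root was given: pass it to the generator with -r.
        echo "$sentence" | ace -g "$parse_grammar" -Tfq |
            ace -g "$gen_grammar" -e -r "$root"
    else
        echo "$sentence" | ace -g "$parse_grammar" -Tfq |
            ace -g "$gen_grammar" -e
    fi
}
```

For example, regenerate "He very good" singlish.dat [grammar2].dat would parse with the Singlish grammar and generate with the second one.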

When merged with the ERG, there are a few places outside the singlish directory that refer to the files here:

  • ../singlish.tdl is in the trunk top level directory
  • ../lkb/script has the feature :singlish
    • To compile with LKB, go to Options > Expand menu, then Advanced > Evaluate quick Lisp expression, and enter (push :singlish *features*)
  • ../ace/config-singlish.tdl contains the config for ace
  • there are testsuites at ../tsdb/skeletons/singlish
  • there are gold trees at ../tsdb/gold/singlish

In the GitHub repository these are all local (ignore the ../).

Features

The following features have been added to SinglishERG:

  • Optional copula
    • For adjectives, e.g., "He very good", by allowing sentences to be headed by adjectives and extending VFORM to adjectives
    • In progressive verb phrases, e.g., "They swimming", by allowing prp in the Subject Head Main Clause Rule
  • Aspect marking with "already" and "ever", e.g., "She eat already"
  • Flexible verb agreement with verbs in base form, e.g., "He always go there"
  • Flexible number for nouns in base form, e.g., "I want to use computer.", by introducing an unspecified bare NP rule
  • Particles with 4 levels of hierarchy, e.g., "It’s like that làh háh", by adding hierarchy to sentences
  • "One" as a relative pronoun, e.g., "The cake I always buy one no more already."
  • "One" as a nominaliser, e.g., "The computer I buy one."
  • "One" with a possessive meaning, e.g., "That is Kim one"

Data

Data was extracted from examples on Wiktionary pages for words that were marked as Singlish. It also includes the other, non-Singlish definitions and usages of those words. Some of the example sentences are offensive and racist, but they were not removed, as they reflect how this variety is used.

To parse the data using ace (parts in brackets are optional):

  • To parse with only the top tree: cat wikiexamples_next300.txt | ace (--max-words=20) -g singlish.dat -Tf1 (> output.txt)
  • To remove lines starting with "#" first: grep -vP "^#" wikiexamples_next300.txt | ace...
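The two steps above can be combined so that comment lines never reach the parser. The following is a sketch, assuming a hypothetical function name parse_examples and a default of 20 for --max-words taken from the example above. It uses plain grep -v '^#', since this pattern does not need the -P option.

```shell
# Hypothetical helper: parse a file of example sentences, skipping comment lines.
# Usage: parse_examples input.txt grammar.dat [max_words] > output.txt
parse_examples() {
    local input="$1" grammar="$2" max_words="${3:-20}"

    # grep -v '^#' drops every line that starts with "#";
    # the remaining lines are parsed one sentence per line.
    grep -v '^#' "$input" |
        ace --max-words="$max_words" -g "$grammar" -Tf1
}
```

For example, parse_examples wikiexamples_next300.txt singlish.dat > output.txt writes the parser output to a file.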

lexicon_goldtrees.tdl contains the words added to the standard English lexicon when parsing the 30 Singlish sentences from skeletons/treebankset.

Testsuite

A testsuite first has to be made. Go to the folder containing make_item:

$ ./make_item --map translat i-comment [rawtestsuitename] item

Transfer the item that was made (and renamed) into the skeletons directory, then make a testsuite in the trees folder:

$ delphin mkprof -s tsdb/skeletons/[testsuitename]/ trees/[name of newfolder]
$ delphin process -g [grammar].dat trees/[name of newfolder]

Note that the delphin command has to be accessible from the shell (on your PATH).
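Creating and processing a profile always happen together, so the two delphin commands can be chained. This is a minimal sketch: the function name make_profile is hypothetical, and the check that refuses to reuse an existing profile folder is an added precaution, not something the tools are known to require.

```shell
# Hypothetical helper: create a profile from a skeleton, then parse it.
# Usage: make_profile testsuitename grammar.dat [name_of_newfolder]
make_profile() {
    local skeleton="tsdb/skeletons/$1"
    local grammar="$2"
    local profile="trees/${3:-$1}"

    # Assumption: avoid writing into a profile folder that already exists.
    if [ -e "$profile" ]; then
        echo "error: $profile already exists" >&2
        return 1
    fi
    # Process only if the profile was created successfully.
    delphin mkprof -s "$skeleton/" "$profile" &&
        delphin process -g "$grammar" "$profile"
}
```

Run it from the directory that contains both tsdb/ and trees/, as the relative paths assume.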

The testsuite that was used for development is contained in data/constructed_singlish_testsuite and the one that was used for testing (mentioned in the paper) can be found at data/skeletons/treebankset. The parses (with decision trees) of the treebankset are contained in data/trees.

Viewing results

To view selected combinations of results, use this line with different combinations of 'i-wf' and 'readings' values. This line, for example, selects false negatives (sentences that should parse but give no readings):

$ delphin select 'i-id readings i-input where i-wf = 1 and readings = 0' trees/[name of newfolder]
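Since only the two conditions change between queries, they can be passed as parameters. This is a sketch with a hypothetical function name, select_items. The second usage line assumes that i-wf = 0 marks sentences that should not parse and that the query language accepts the > comparison, neither of which is stated above.

```shell
# Hypothetical helper: list items by expected well-formedness and parse outcome.
# Usage: select_items profile wf condition
#   wf         value to match against i-wf (1 = should parse)
#   condition  a condition on the readings field, e.g. "readings = 0"
select_items() {
    local profile="$1" wf="$2" condition="$3"
    delphin select "i-id readings i-input where i-wf = $wf and $condition" "$profile"
}

# False negatives, as in the line above:
#   select_items trees/[name of newfolder] 1 "readings = 0"
# Possible overgeneration (assumed operator and i-wf value):
#   select_items trees/[name of newfolder] 0 "readings > 0"
```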

Treebanking

$ art -f -a 'ace --disable-generalization -g [grammar].dat -O' trees/[name of newfolder]

The next line of code launches the browser so the treebanking can be done.

$ fftb -g [grammar].dat --browser --webdir ~/bin/acetools-x86-0.9.30/assets/ trees/[name of newfolder]

To transfer gold trees, for example from testsuite.16 to testsuite.17:

$ fftb -g [grammar].dat --browser --webdir ~/bin/acetools-x86-0.9.30/assets/ --gold trees/testsuite.16 trees/testsuite.17 --auto
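When gold trees are carried forward repeatedly, the transfer command can be parameterised on the old and new profiles. This is a sketch only: transfer_gold is a hypothetical name, the flags are written with double hyphens, and the default asset directory is the one used in the commands above and will differ on other installations.

```shell
# Hypothetical helper: carry gold decisions from an old profile to a new one.
# Usage: transfer_gold grammar.dat old_profile new_profile [webdir]
transfer_gold() {
    local grammar="$1" old="$2" new="$3"
    # Default asset directory as in the commands above; adjust to your install.
    local webdir="${4:-$HOME/bin/acetools-x86-0.9.30/assets/}"

    fftb -g "$grammar" --browser --webdir "$webdir" \
        --gold "$old" "$new" --auto
}
```

For example, transfer_gold [grammar].dat trees/testsuite.16 trees/testsuite.17 mirrors the transfer shown above.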
