LkbEmacs

BenjaminWaldron edited this page Jun 12, 2005 · 23 revisions

Overview

The LKB is a stand-alone grammar development environment and can be used in connection with any text editor. The basic engineering cycle is similar to software development: a set of source files (aka a grammar) is maintained using a text editor and then compiled for run-time use in the LKB system. At the same time, integrating the LKB with the Emacs editor brings certain advantages, among them ease of start-up, editing support for LKB grammar files, and some additional debugging facilities.

Extending the terse discussion from the LkbInstallation page (which hopefully just worked for you), the following paragraphs provide some more background on using the LKB with Emacs, including help on running a Lisp environment as a sub-process of Emacs (so as to compile the LKB source code).

Working with other scripts in Emacs

Working with Greek (UTF-8)

This section documents how to use the LKB and Emacs with Greek (grammar files in UTF-8). You need Emacs 21.3 or later (I do not know about the requirements for XEmacs).

Start the LKB in Emacs.

In your Emacs startup file, add the following settings:

;; Sets character set to greek-iso8859-7, and coding system to greek-iso-8bit
;; Also sets a default input method (Greek)
(set-language-environment 'Greek)
(set-default-coding-systems 'mule-utf-8)
(unless (boundp 'fi:common-lisp-image-arguments)
  (setq fi:common-lisp-image-arguments nil))
(setq fi:common-lisp-image-arguments
      (nconc (list "-locale" "el_GR.utf8") fi:common-lisp-image-arguments))

The first line sets most of the required values. The second changes the default coding systems (the default in the Greek language environment is ISO-8859-7, not UTF-8) so that the grammar and test suite files are read with the correct encoding. The remaining lines tell Allegro Common Lisp to use a Greek locale.
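To confirm that the settings took effect, one option is to inspect the relevant variables after restarting Emacs. The following is only a sketch: it relies on the standard Emacs variables `current-language-environment` and `buffer-file-coding-system`, and it guards the eli variable in case eli has not been loaded yet.

```elisp
;; Evaluate with M-: or in the *scratch* buffer after Emacs has started.
;; Returns a list: the active language environment, the default coding
;; system for new files, and the arguments that will be passed to Lisp.
(list
 current-language-environment
 (default-value 'buffer-file-coding-system)
 (and (boundp 'fi:common-lisp-image-arguments)
      fi:common-lisp-image-arguments))
```

If the second element does not name a UTF-8 coding system, the `set-default-coding-systems` call was probably not evaluated or was overridden later in the startup file.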

At the end of lkb/globals.lsp, add the following:

;; Write CDB temporary files as binary
(defparameter cdb::*cdb-ascii-p* nil)

Exit the LKB and Emacs, then delete the temporary lexicon files.

Your LKB setup is now capable of working with Greek. You can run batch parses from Greek files, pass on Greek sentences to do-parse-tty in the *common-lisp* buffer, and the results can be viewed (with Greek characters) in the chart window, parse tree windows, etc.

For other languages, the settings are probably analogous: choose an appropriate language environment in Emacs, and choose the appropriate file coding system (if it is different from the default in the language environment - see M-x describe-language-environment).
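As a rough illustration of what "analogous" could look like, the Greek settings can be wrapped in a small helper that takes the language environment, the coding system, and the Lisp locale as arguments. The function name `my-lkb-language-setup` is hypothetical and not part of the LKB or eli; the example call only reproduces the Greek values given above, and the right values for another language have to be looked up.

```elisp
;; Hypothetical helper, not part of the LKB or eli distribution.
;; LANG-ENV is an Emacs language environment name, CODING an Emacs
;; coding system, and LOCALE the string passed to Lisp after "-locale".
(defun my-lkb-language-setup (lang-env coding locale)
  (set-language-environment lang-env)
  (set-default-coding-systems coding)
  ;; Make sure the eli variable exists before extending it.
  (unless (boundp 'fi:common-lisp-image-arguments)
    (setq fi:common-lisp-image-arguments nil))
  (setq fi:common-lisp-image-arguments
        (append (list "-locale" locale)
                fi:common-lisp-image-arguments)))

;; Example call reproducing the Greek settings from this page:
(my-lkb-language-setup "Greek" 'mule-utf-8 "el_GR.utf8")
```

The helper is meant to be called once from the startup file; calling it repeatedly would prepend several `-locale` arguments.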

Remaining to do:

  • Greek input in the "Parse..." window
  • Greek characters in the title bars (a UTF-8-capable window manager is probably enough to solve this problem)
  • Greek characters in the menus of the parse tree window (e.g. specifying what rules or lexicon entries were used)

Note: this setup worked as described with Emacs 21.3 and Allegro Common Lisp 6.2. A recent change in Emacs seems to be causing some problems: for Greek keyboard input, the input method 'greek-jis seems to work but 'greek no longer does, and the final sigma is problematic when the temporary lexicon files are generated. We are investigating.
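Until the cause is found, one possible workaround for keyboard input is to make greek-jis the default input method. This is a sketch based on the standard Emacs variable `default-input-method`; it has not been checked against the Emacs version that shows the problem.

```elisp
;; Place this after (set-language-environment 'Greek) in the startup
;; file, because that call resets the default input method to the
;; language environment's own default.
(setq default-input-method "greek-jis")
```

The input method can then be switched on and off in a buffer with C-\ (toggle-input-method).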

Working with Japanese (EUC)

The settings are similar for Japanese, although we typically do them slightly differently.

Settings for Emacs

Define a function, japanify, to set a buffer's encoding.

;;; this sets up an encoding
(defun japanify (buffer encoding)
  (save-excursion
    (switch-to-buffer buffer)
    (set-language-environment 'japanese)
    (set-buffer-file-coding-system encoding)
    (set-buffer-process-coding-system encoding encoding))
  (setq default-buffer-file-coding-system encoding))

Use this when calling lisp (for Japanese):

(defun lisp (&optional prefix)
  ;; `interactive' must come before any other body form.
  (interactive "P")
  (setq lkb-tmp-dir "/tmp")
  (load "/usr/local/delphin/acl/eli/fi-site-init")
  (setq fi:common-lisp-image-name "/usr/local/delphin/acl/alisp")
  (setq fi:common-lisp-image-file "/usr/local/delphin/acl/bclim.dxl")
  (setq fi:common-lisp-image-arguments 
    (list 
     "-locale" "japan.EUC"
     "-qq" "-L" "/usr/local/delphin/cl-init.cl"))
  (fi:common-lisp)
  (japanify "*common-lisp*" 'euc-jp))

We have found that, with the latest eli and Emacs 21.4, (stream-external-format *terminal-io*) is always set to :emacs-mule. We prefer it to be EUC-JP, so we evaluate the following in the Lisp buffer:

(setf excl:*default-external-format*
 (setf (stream-external-format *terminal-io*) :euc))

We also need to change lkb/globals.lsp by adding the following:

;; Write CDB temporary files as binary
(defparameter cdb::*cdb-ascii-p* nil)

While not strictly necessary, we also make a point of explicitly marking the encoding in the grammar and lexicon files. This is because some users prefer their default to be euc-jp, others junet, and others utf-8. The grammar, however, must be uniformly euc-jp.

;;; -*- Mode: TDL; Coding: euc-jp -*-

Other Components

If you are working with PetTop, don't forget to specify the encoding in grammar.set as well:

;;;; settings for CHEAP                 -*- Mode: TDL; Coding: euc-jp -*-
encoding := EUC-JP.

If you use the LkbLexDb, don't forget to specify the encoding when you install the lexical database:

bash install-lexdb.sh jap ~/jap/lexdb.fld ~/jap/lexdb.dfn "-E EUC_JP"