cl-ppcre-unicode

This system adds Unicode support to cl-ppcre.

What does that mean? After loading cl-ppcre-unicode, you'll be able to match against Unicode character properties.

A property matcher has a special syntax in cl-ppcre’s regexps: \p{PropertyName}.

Here is an example:

;; This is how we can find the position
;; of the first Cyrillic letter:

POFTHEDAY> (ppcre:scan "\\p{Cyrillic}"
                       "123Ю56")
3

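;; Here we extract a run of emoji from the text.
;; Inside [...] the properties are simply listed one after another;
;; a "|" there would be a literal pipe character, not alternation.
POFTHEDAY> (ppcre:regex-replace
            ".*?([\\p{Emoticons}\\p{Supplemental Symbols and Pictographs}]+).*"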
            "Hello, Lisper! 🤗😃 How are you?"
            "\\1")
"🤗😃"

We need two different properties here because these two characters belong to different Unicode blocks.

You can use cl-unicode to find out which Unicode block a character belongs to:

POFTHEDAY> (cl-unicode:code-block #\😃)
"Emoticons"

POFTHEDAY> (cl-unicode:code-block #\🤗)
"Supplemental Symbols and Pictographs"
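
Blocks are not the only thing you can ask about. As far as I know, cl-unicode also exports script, general-category and has-property, so here is a small sketch of those lookups. It assumes Quicklisp is available and that the source file is read as UTF-8; the values in the comments are what I expect for this character, not output copied from the original post.

```lisp
;; Assumption: Quicklisp is installed and can fetch cl-unicode.
(ql:quickload :cl-unicode)

;; The script the character belongs to (first return value is a string).
(cl-unicode:script #\Ю)            ; expected: "Cyrillic"

;; The two-letter general category, "Lu" meaning "Letter, uppercase".
(cl-unicode:general-category #\Ю)  ; expected: "Lu"

;; A generic test that takes the property name as a string.
(cl-unicode:has-property #\Ю "Alphabetic")
```

Any name that works with has-property should, as I understand it, also be usable inside \p{...} once cl-ppcre-unicode is loaded.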

The way cl-ppcre-unicode works is quite interesting. It turns out that cl-ppcre has a special hook, cl-ppcre:*property-resolver*, which lets you define your own property resolver: a function that takes a property name and returns a predicate on characters.

For example, if you want a special property for vowels, you might do something like this:

POFTHEDAY> (defun my-property-resolver (property-name)
             (if (string-equal property-name
                               "vowel")
                 (rutils:fn vowel-p (character)
                   (member character '(#\A #\E #\I #\O #\U)
                           :test #'char-equal))
                 (cl-ppcre-unicode:unicode-property-resolver
                  property-name)))

POFTHEDAY> (setf cl-ppcre:*property-resolver*
                 #'my-property-resolver)

;; And now we can use the "Vowel" property in any
;; regular expressions!
POFTHEDAY> (ppcre:regex-replace-all
            "\\p{Vowel}"
            "Hello, Lisper! How are you?"
            "")
"Hll, Lspr! Hw r y?"
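
If you would rather not change the resolver globally with setf, one option is to bind it only while a scanner is being created. The sketch below is my own variation, not something from the library's documentation: vowel-aware-resolver and make-scanner-with-vowels are hypothetical names, it uses a plain defun instead of rutils, and it assumes that cl-ppcre consults the resolver when the regex is parsed, so the binding only has to be in effect during create-scanner.

```lisp
;; Assumption: Quicklisp is installed and can fetch cl-ppcre-unicode.
(ql:quickload :cl-ppcre-unicode)

(defun vowel-p (character)
  "Return true if CHARACTER is an ASCII vowel, ignoring case."
  (member character '(#\A #\E #\I #\O #\U) :test #'char-equal))

(defun vowel-aware-resolver (property-name)
  "Hypothetical resolver: handle \"Vowel\" here, delegate everything else."
  (if (string-equal property-name "vowel")
      #'vowel-p
      (cl-ppcre-unicode:unicode-property-resolver property-name)))

(defun make-scanner-with-vowels (regex)
  "Compile REGEX with the custom resolver bound only for this call."
  (let ((ppcre:*property-resolver* #'vowel-aware-resolver))
    (ppcre:create-scanner regex)))

;; The scanner keeps the resolved predicate, so it can be used
;; later, outside of the dynamic binding.
(defparameter *vowel-scanner* (make-scanner-with-vowels "\\p{Vowel}"))

(ppcre:regex-replace-all *vowel-scanner* "Hello, Lisper!" "")
;; => "Hll, Lspr!"
```

The regex goes through a helper function on purpose: the string reaches create-scanner at run time, while the binding is active, rather than being compiled ahead of time as a constant.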

Isn’t this cool!? 🤪
