SICStus Prolog supports wide characters (up to 31 bits wide), interpreted as a superset of Unicode.
Each character in the code set has to be classified as belonging to one of the character categories, such as small-letter, digit, etc. This classification is called the character-type mapping, and it is used for defining the syntax of tokens.
Only character codes 0..255, i.e. the ISO 8859/1 (Latin 1) subset of Unicode, can be part of unquoted tokens[1]. This restriction may be lifted in the future.
For quoted tokens, i.e. quoted atoms and strings, almost any sequence of code points assigned to non-private abstract characters in Unicode 5.0 is allowed. The disallowed characters are those in the layout-char category, with the exception of space (character code 32), which is allowed even though it is a layout-char.
An additional restriction is that the sequence of characters that makes up a quoted token must be in Normalization Form C (NFC; see http://www.unicode.org/reports/tr15/). As of SICStus Prolog 4.0.1, this is not enforced. A future version of SICStus Prolog may enforce this restriction or perform this normalization automatically.
NFC is the normalization form used on the web (http://www.w3.org/TR/charmod/) and what most software can be expected to produce by default. Any sequence consisting of only characters from Latin 1 is already in NFC.
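To make the NFC restriction concrete, the following minimal sketch builds the same abstract character (e with acute accent) in two ways: precomposed as U+00E9, which is in NFC, and decomposed as U+0065 followed by the combining accent U+0301, which is not. The predicate name demo_nfc/0 is hypothetical and used only for this illustration. The sketch assumes that the reader does not normalize quoted atoms, as stated above for SICStus Prolog 4.0.1; a version that enforces or performs normalization may behave differently.

```prolog
% demo_nfc/0 is a hypothetical name used only for this illustration.
% Both atoms are written with hex escape sequences so that the source
% file itself contains only Latin 1 characters.
demo_nfc :-
    Precomposed = '\xE9\',        % U+00E9, already in NFC
    Decomposed  = 'e\x301\',      % U+0065 U+0301, not in NFC
    atom_length(Precomposed, N1), % one character
    atom_length(Decomposed, N2),  % two characters, if no normalization is done
    write(lengths(N1, N2)), nl,
    (   Precomposed == Decomposed
    ->  write(same_atom)          % expected only if the reader normalizes
    ;   write(different_atoms)    % expected when NFC is not enforced
    ),
    nl.
```

Calling demo_nfc shows why non-normalized input is best avoided: two atoms that look identical when displayed may be distinct terms.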
Note: Any output produced by write_term/2 with the option quoted(true) will be in NFC. This includes output from writeq/[1,2] and write_canonical/[1,2].
symbol-char: + - * / \ ^ < > = ~ : . ? @ # $ &
In addition, the non-ASCII character codes 161..169, 171..185, 187..191, 215, and 247 belong to this character type[2].
punctuation-char: % ( ) , [ ] { | }
Other characters are unclassified and may only appear in comments and, to some extent, as discussed above, in quoted atoms and strings.
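The character categories decide whether an atom can be written as an unquoted token. As a rough illustration, the sketch below prints a few atoms with writeq/1, which adds quotes only where the atom could not be read back otherwise. The predicate name demo_quoting/0 is hypothetical; the comments state the output one would expect from the character classification.

```prolog
% demo_quoting/0 is a hypothetical name used only for this illustration.
demo_quoting :-
    writeq(abc), nl,            % small-letter followed by alphas:   abc
    writeq('Abc'), nl,          % starts with a capital-letter:      'Abc'
    writeq('***'), nl,          % only symbol-chars:                 ***
    writeq('a*'), nl,           % mixes alpha and symbol-char:       'a*'
    writeq(!), nl,              % a single solo-char:                !
    writeq('hello world'), nl.  % contains a space (layout-char):    'hello world'
```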
token ::= name
        | natural-number
        | unsigned-float
        | variable
        | string
        | punctuation-char
        | layout-text
        | full-stop
name ::= quoted-name
       | word
       | symbol
       | solo-char
       | [ ?layout-text ]
       | { ?layout-text }
word ::= small-letter ?alpha...
symbol ::= symbol-char...  { except in the case of a full-stop or where the first 2 chars are `/*' }
natural-number ::= digit...
                 | base-prefix alpha...  { where each alpha must be digits of the base indicated by base-prefix, treating a,b,... and A,B,... as 10,11,... }
                 | 0 ' char-item  { yielding the character code for char }
unsigned-float ::= simple-float
                 | simple-float exp exponent
simple-float ::= digit... . digit...
exp ::= e | E
exponent ::= digit... | sign digit...
sign ::= - | +
variable ::= underline ?alpha...
           | capital-letter ?alpha...
string ::= " ?string-item... "
string-item ::= quoted-char  { other than `"' or `\' }
              | ""
              | \ escape-sequence
quoted-atom ::= ' ?quoted-item... '
quoted-item ::= quoted-char  { other than `'' or `\' }
              | ''
              | \ escape-sequence
backquoted-atom ::= ` ?backquoted-item... `
backquoted-item ::= quoted-char  { other than ``' or `\' }
                  | ``
                  | \ escape-sequence
layout-text ::= layout-text-item...
layout-text-item ::= layout-char | comment
comment ::= /* ?char... */  { where ?char... must not contain `*/' }
          | % ?char... <LFD>  { where ?char... must not contain <LFD> }
full-stop ::= .  { the following token, if any, must be layout-text }
char ::= layout-char
       | printing-char
printing-char ::= alpha
                | symbol-char
                | solo-char
                | punctuation-char
                | quote-char
alpha ::= capital-letter | small-letter | digit | underline
escape-sequence ::= b  { backspace, character code 8 }
                  | t  { horizontal tab, character code 9 }
                  | n  { newline, character code 10 }
                  | v  { vertical tab, character code 11 }
                  | f  { form feed, character code 12 }
                  | r  { carriage return, character code 13 }
                  | e  { escape, character code 27 }
                  | d  { delete, character code 127 }
                  | a  { alarm, character code 7 }
                  | other-escape-sequence
quoted-name ::= quoted-atom
              | backquoted-atom
base-prefix ::= 0b  { indicates base 2 }
              | 0o  { indicates base 8 }
              | 0x  { indicates base 16 }
char-item ::= quoted-item
other-escape-sequence ::= x alpha... \  { treating a,b,... and A,B,... as 10,11,... in the range [0..15], hex character code }
                        | o digit... \  { in the range [0..7], octal character code }
                        | <LFD>  { ignored }
                        | \  { stands for itself }
                        | '  { stands for itself }
                        | "  { stands for itself }
                        | `  { stands for itself }
quoted-char ::= <SPC>
              | printing-char
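A few of the grammar rules above can be illustrated with concrete tokens. The following sketch exercises natural-number (including base-prefix and the 0'char form), unsigned-float, quoted-item and string-item. The predicate name demo_tokens/0 is hypothetical and used only for this illustration, and the comment on the string assumes the default setting in which a double-quoted string is read as a list of character codes.

```prolog
% demo_tokens/0 is a hypothetical name used only for this illustration.
demo_tokens :-
    % natural-number: plain digits, the three base prefixes, and 0'char
    Dec = 42,
    Bin = 0b101010,             % base 2
    Oct = 0o52,                 % base 8
    Hex = 0x2A,                 % base 16; a..f and A..F stand for 10..15
    Chr = 0'*,                  % character code of `*', which is 42
    write([Dec, Bin, Oct, Hex, Chr]), nl,
    % unsigned-float: simple-float requires digits on both sides of the dot
    F1 = 1.5,
    F2 = 1.5e3,                 % the float 1500.0
    F3 = 2.0E-3,                % the float 0.002
    write([F1, F2, F3]), nl,
    % quoted-item: a doubled quote and escape sequences inside a quoted atom
    atom_codes('don''t', C1),   % [100,111,110,39,116]
    atom_codes('a\tb', C2),     % [97,9,98]
    atom_codes('\x41\', C3),    % [65], i.e. the atom 'A'
    write([C1, C2, C3]), nl,
    % string-item: a doubled double-quote inside a string
    S = "a""b",                 % [97,34,98] if strings are read as code lists
    write(S), nl.
```

Note that, by the simple-float rule, a token such as 1e3 is not an unsigned-float, since the dot and the fractional digits are required.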
[1] Characters outside this range can still be included in quoted atoms and strings by using escape sequences (see ref-syn-syn-esc).
[2] In SICStus Prolog 4.0.0 and in SICStus 3 the lower case characters 170 and 186 were incorrectly classified as symbol-char. This was corrected in SICStus Prolog 4.0.1.