SICStus Prolog supports wide characters (up to 31 bits wide). It is assumed that the character code set is an extension of (7 bit) ASCII, i.e. that it includes the codes 0..127 and these codes are interpreted as ASCII characters.
Each character in the code set has to be classified as belonging to one of the character categories, such as small-letter, digit, etc. This classification is called the character-type mapping, and it is used for defining the syntax of tokens.
The user can select one of the three predefined wide character modes through the environment variable SP_CTYPE. These modes are iso_8859_1, utf8, and euc. The user can also define other wide character modes by plugging in appropriate hook functions; see Handling Wide Characters. In this case the user has to supply a character-type mapping for the codes greater than 127.
We first describe the character-type mapping for the fixed part of the code set, the 7 bit ASCII.
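small-letter
    The letters a through z.
capital-letter
    The letters A through Z.
digit
    The digits 0 through 9.
symbol-char
    The characters + - * / \ ^ < > = ~ : . ? @ # $ &
    In sicstus execution mode, character code 96 (`) is also a symbol-char.
solo-char
    The characters ! and ;.
punctuation-char
    The characters % ( ) , [ ] { | }
quote-char
    The characters " and '. In iso execution mode, character code 96 (`) is also a quote-char.
underline
    The character _.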
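The ASCII mapping above can be sketched as a small lookup function. This Python sketch is illustrative only: the name `ascii_char_type` and its `mode` parameter are hypothetical and not part of SICStus. Codes that this section does not list (layout characters, for instance) are returned as `None` rather than guessed.

```python
# Character sets taken directly from the category descriptions above.
SYMBOL_CHARS = set("+-*/\\^<>=~:.?@#$&")
SOLO_CHARS = set("!;")
PUNCTUATION_CHARS = set("%(),[]{|}")
QUOTE_CHARS = set("\"'")


def ascii_char_type(code, mode="sicstus"):
    """Return the category name of a 7-bit code, or None if it is not listed.

    Hypothetical helper: `mode` is "iso" or "sicstus" and only affects
    character code 96 (the backquote).
    """
    if not 0 <= code <= 127:
        raise ValueError("only the 7-bit ASCII range is handled here")
    if mode not in ("iso", "sicstus"):
        raise ValueError("mode must be 'iso' or 'sicstus'")
    ch = chr(code)
    if "a" <= ch <= "z":
        return "small-letter"
    if "A" <= ch <= "Z":
        return "capital-letter"
    if "0" <= ch <= "9":
        return "digit"
    if ch == "_":
        return "underline"
    if ch == "`":
        # Code 96 is a quote-char in iso mode and a symbol-char in sicstus mode.
        return "quote-char" if mode == "iso" else "symbol-char"
    if ch in SYMBOL_CHARS:
        return "symbol-char"
    if ch in SOLO_CHARS:
        return "solo-char"
    if ch in PUNCTUATION_CHARS:
        return "punctuation-char"
    if ch in QUOTE_CHARS:
        return "quote-char"
    return None  # not listed in this section, e.g. layout characters
```

For example, `ascii_char_type(96, "iso")` gives `"quote-char"`, while `ascii_char_type(96, "sicstus")` gives `"symbol-char"`.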
We now provide the character-type mapping for the characters above the 7 bit ASCII range, for each of the built-in wide character modes.
The iso_8859_1 mode has the character set 0..255 and the following character-type mapping for the codes 128..255:
The utf8 mode has the character set 0..(2^31-1). The character-type mapping for the codes 128..255 is the same as for the iso_8859_1 mode. All character codes above 255 are classified as small-letters.
The euc mode character set is described in Representation of EUC Wide Characters. All character codes above 127 are classified as small-letters.
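The rules for codes above 127 can be summarized in a short sketch. The function name `wide_char_type` is hypothetical. Because the table for codes 128..255 is not reproduced in this section, the sketch returns `None` for that range in the iso_8859_1 and utf8 modes instead of inventing a classification. It also does not check whether a code is a valid EUC code.

```python
UTF8_MAX = 2**31 - 1  # upper bound of the utf8 mode character set


def wide_char_type(code, mode):
    """Classify a code above 127 under a built-in wide character mode.

    Hypothetical helper. Returns "small-letter" where the text says so,
    and None for 128..255 in iso_8859_1/utf8 (table not reproduced here).
    """
    if code <= 127:
        raise ValueError("codes 0..127 use the fixed ASCII mapping")
    if mode == "euc":
        # All codes above 127 are small-letters; validity is not checked.
        return "small-letter"
    if mode == "iso_8859_1":
        if code > 255:
            raise ValueError("iso_8859_1 covers only the codes 0..255")
        return None
    if mode == "utf8":
        if code > UTF8_MAX:
            raise ValueError("utf8 covers only the codes 0..(2^31-1)")
        if code <= 255:
            return None  # same mapping as iso_8859_1 for 128..255
        return "small-letter"
    raise ValueError("unknown wide character mode: %r" % (mode,))
```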
token            ::= name
                   | natural-number
                   | unsigned-float
                   | variable
                   | string
                   | punctuation-char
                   | layout-text
                   | full-stop
name             ::= quoted-name
                   | word
                   | symbol
                   | solo-char
                   | [ ?layout-text ]
                   | { ?layout-text }
word             ::= small-letter ?alpha...
symbol           ::= symbol-char...  { except in the case of a full-stop or where the first 2 chars are /* }
natural-number   ::= digit...
                   | base-prefix alpha...  { where each alpha must be digits of the base indicated by base-prefix, treating a,b,... and A,B,... as 10,11,... }
                   | 0 ' char-item  { yielding the character code for char }
unsigned-float   ::= simple-float
                   | simple-float exp exponent
simple-float     ::= digit... . digit...
exp              ::= e | E
exponent         ::= digit... | sign digit...
sign             ::= - | +
variable         ::= underline ?alpha...
                   | capital-letter ?alpha...
string           ::= " ?string-item... "
string-item      ::= quoted-char  { other than " or \ }
                   | ""
                   | \ escape-sequence  { unless character escapes have been switched off }
quoted-atom      ::= ' ?quoted-item... '
quoted-item      ::= quoted-char  { other than ' or \ }
                   | ''
                   | \ escape-sequence  { unless character escapes have been switched off }
backquoted-atom  ::= ` ?backquoted-item... `
backquoted-item  ::= quoted-char  { other than ` or \ }
                   | ``
                   | \ escape-sequence  { unless character escapes have been switched off }
layout-text      ::= layout-text-item...
layout-text-item ::= layout-char | comment
comment          ::= /* ?char... */  { where ?char... must not contain */ }
                   | % ?char... <LFD>  { where ?char... must not contain <LFD> }
full-stop        ::= .  { the following token, if any, must be layout-text }
char             ::= layout-char
                   | printing-char
printing-char    ::= alpha
                   | symbol-char
                   | solo-char
                   | punctuation-char
                   | quote-char
alpha            ::= capital-letter | small-letter | digit | underline
escape-sequence  ::= b  { backspace, character code 8 }
                   | t  { horizontal tab, character code 9 }
                   | n  { newline, character code 10 }
                   | v  { vertical tab, character code 11 }
                   | f  { form feed, character code 12 }
                   | r  { carriage return, character code 13 }
                   | e  { escape, character code 27 }
                   | d  { delete, character code 127 }
                   | a  { alarm, character code 7 }
                   | other-escape-sequence
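Two of the rules above lend themselves to a short illustration: the single-letter escape sequences and the reading of alpha characters as digits of a base. The sketch below is a Python approximation, not SICStus code. The names `ESCAPE_CODES`, `digit_value`, and `digits_to_int` are hypothetical, and the upper limit of base 36 is an assumption that follows from mapping a..z to 10..35. The syntax of base-prefix and other-escape-sequence is not defined in this section and is not handled.

```python
# Character codes of the single-letter escape sequences listed above.
ESCAPE_CODES = {
    "b": 8,    # backspace
    "t": 9,    # horizontal tab
    "n": 10,   # newline
    "v": 11,   # vertical tab
    "f": 12,   # form feed
    "r": 13,   # carriage return
    "e": 27,   # escape
    "d": 127,  # delete
    "a": 7,    # alarm
}


def digit_value(ch):
    """Map 0-9 to 0..9 and a,b,... or A,B,... to 10,11,..."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "a" <= ch <= "z":
        return ord(ch) - ord("a") + 10
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    raise ValueError("not a digit character: %r" % (ch,))


def digits_to_int(text, base):
    """Interpret text as a natural number written in the given base."""
    if not 2 <= base <= 36:  # assumption of this sketch
        raise ValueError("base must be in 2..36")
    if not text:
        raise ValueError("at least one digit is required")
    value = 0
    for ch in text:
        d = digit_value(ch)
        if d >= base:
            raise ValueError("%r is not a digit of base %d" % (ch, base))
        value = value * base + d
    return value
```

For example, `digits_to_int("ff", 16)` and `digits_to_int("FF", 16)` both return 255, while `digits_to_int("9", 8)` raises `ValueError` because 9 is not a digit of base 8.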
There are differences between the syntax used in iso mode and in sicstus mode. The differences are described by providing different syntax rules for certain syntactic categories.