SICStus Prolog supports wide characters (up to 31 bits wide), interpreted as a superset of Unicode.
Each character in the code set has to be classified as belonging to one of the character categories, such as small-letter, digit, etc. This classification is called the character-type mapping, and it is used for defining the syntax of tokens.
Only character codes 0..255, i.e. the ISO-8859-1 (Latin 1)
subset of Unicode, can be part of unquoted tokens [1], unless the Prolog
flag legacy_char_classification is set; see ref-lps-flg. This
restriction may be lifted in the future.
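As a small illustration of this restriction, the sketch below builds an atom for the Greek letter alpha (character code 945, outside 0..255) in two ways: with a hexadecimal escape sequence inside a quoted atom, and from its character code. The predicate names greek_alpha/1 and same_atom_from_code/0 are hypothetical and exist only for this example; the flag legacy_char_classification is assumed to be at its default setting.

```prolog
% greek_alpha(-Atom): Atom is the one-character atom for code 945
% (GREEK SMALL LETTER ALPHA). The hex escape \x3B1\ keeps the source
% text itself within Latin 1.
greek_alpha('\x3B1\').

% same_atom_from_code: the escaped atom is identical to the atom
% built directly from character code 945 (hex 3B1).
same_atom_from_code :-
    greek_alpha(A),
    atom_codes(B, [945]),
    A == B.
```

Calling same_atom_from_code should succeed, since both spellings denote the same one-character atom.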
   
For quoted tokens, i.e. quoted atoms and strings, almost any sequence of code points assigned to non-private abstract characters in Unicode 5.0 is allowed. The disallowed characters are those in the whitespace-char category, except that space (character code 32) is allowed even though it is a whitespace-char.
An additional restriction is that the sequence of characters that makes up a quoted token must be in Normalization Form C (NFC); see http://www.unicode.org/reports/tr15/. This restriction is currently not enforced. A future release may enforce it or perform the normalization automatically.
NFC is the normalization form used on the web (http://www.w3.org/TR/charmod/) and what most software can be expected to produce by default. Any sequence consisting of only characters from Latin 1 is already in NFC.
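To see why the normalization form matters, consider two spellings of the same visible character. The sketch below is illustrative only: the predicate names composed/1, decomposed/1 and spellings_differ/0 are hypothetical, and it assumes that the system performs no normalization when reading quoted atoms, which the text above says is currently the case.

```prolog
% Two spellings of e with an acute accent:
%   composed/1   uses the single code point U+00E9 (this is NFC);
%   decomposed/1 uses e followed by U+0301 COMBINING ACUTE ACCENT
%                (this is not NFC).
composed('\xE9\').
decomposed('e\x301\').

% spellings_differ: the two atoms are distinct terms of length 1 and 2.
% This holds only while no normalization is applied by the reader.
spellings_differ :-
    composed(C),
    decomposed(D),
    C \== D,
    atom_length(C, 1),
    atom_length(D, 2).
```

Under that assumption, the two atoms print alike on most terminals yet do not unify, which is the kind of surprise that keeping quoted tokens in NFC avoids.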
When the Prolog flag legacy_char_classification is set,
characters in the whitespace-char category are still treated as whitespace
but other character codes outside the range 0..255, assigned to
non-private abstract characters in Unicode 5.0, are treated as lower
case. Such characters can therefore appear as themselves, without using
escape sequences, both in quoted and unquoted tokens.
   
Note: Any output produced by write_term/2 with the option
quoted(true) will be in NFC. This includes output from
writeq/[1,2] and write_canonical/[1,2].
     
If the Prolog flag legacy_char_classification
(see ref-lps-flg) is set then the small-letter set will also
include almost every code point above 255 assigned to non-private
abstract characters in Unicode 5.0.
     
symbol-char: the characters
+ - * / \ ^ < > = ~ : . ? @ # $ &
In addition, the non-ASCII character codes 161..169, 171..185, 187..191,
215, and 247 belong to this character type [2].
     
punctuation-char: the characters
% ( ) , [ ] { | }
Other characters are unclassified and may only appear in comments and, to some extent, as discussed above, in quoted atoms and strings.
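The character-type mapping can be pictured as a test on character codes. The following is a minimal sketch for one category only, the symbol characters listed above (the line beginning with +, together with the non-ASCII codes 161..169, 171..185, 187..191, 215, and 247, which footnote [2] refers to as symbol-char). The predicate names symbol_char_code/1 and ascii_symbol_code/1 are hypothetical and are not part of any library; the sketch does not claim to reproduce how the system itself classifies characters.

```prolog
:- use_module(library(lists)).   % for memberchk/2

% symbol_char_code(+Code): Code belongs to the symbol character set
% listed in the text. Succeeds at most once.
symbol_char_code(Code) :-
    integer(Code),
    (   ascii_symbol_code(Code)      -> true
    ;   Code >= 161, Code =< 169     -> true
    ;   Code >= 171, Code =< 185     -> true
    ;   Code >= 187, Code =< 191     -> true
    ;   Code =:= 215                 -> true
    ;   Code =:= 247
    ).

% ascii_symbol_code(+Code): Code is one of + - * / \ ^ < > = ~ : . ? @ # $ &
% (the backslash is written \\ inside the quoted atom).
ascii_symbol_code(Code) :-
    atom_codes('+-*/\\^<>=~:.?@#$&', Codes),
    memberchk(Code, Codes).
```

For example, symbol_char_code(0'+) and symbol_char_code(215) succeed, while symbol_char_code(170) fails, in line with footnote [2].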
In the rules below, a row with an empty first column gives a further alternative for the nonterminal named in the nearest row above it.
| token | ::= name | |
| | natural-number | |
| | unsigned-float | |
| | variable | |
| | string | |
| | punctuation-char | |
| | whitespace-text | |
| | full-stop | |
| name | ::= quoted-name | |
| | word | |
| | symbol | |
| | solo-char | |
| | [?whitespace-text] | |
| | {?whitespace-text} | |
| word | ::= small-letter ?alpha... | |
| symbol | ::= symbol-char... | { except in the case of a full-stop or where the first 2 chars are ‘/*’ } |
| natural-number | ::= digit... | |
| | base-prefix alpha... | { where each alpha must be a digit of the base indicated by base-prefix, treating a,b,... and A,B,... as 10,11,... } |
| | 0'char-item | { yielding the character code for char } |
| unsigned-float | ::= simple-float | |
| | simple-float exp exponent | |
| simple-float | ::= digit... .digit... | |
| exp | ::= e | |
| | E | |
| exponent | ::= digit... | |
| | sign digit... | |
| sign | ::= - | |
| | + | |
| variable | ::= underline ?alpha... | |
| | capital-letter ?alpha... | |
| string | ::= "?string-item..." | |
| string-item | ::= quoted-char | { other than ‘"’ or ‘\’ } |
| | "" | |
| | \escape-sequence | |
| quoted-atom | ::= '?quoted-item...' | |
| quoted-item | ::= quoted-char | { other than ‘'’ or ‘\’ } |
| | '' | |
| | \escape-sequence | |
| whitespace-text | ::= whitespace-text-item... | |
| whitespace-text-item | ::= whitespace-char | |
| | comment | |
| comment | ::= /*?char...*/ | { where ?char... must not contain ‘*/’ } |
| | %?char... <LFD> | { where ?char... must not contain <LFD> } |
| full-stop | ::= . | { the following token, if any, must be whitespace-text } |
| char | ::= whitespace-char | |
| | printing-char | |
| printing-char | ::= alpha | |
| | symbol-char | |
| | solo-char | |
| | punctuation-char | |
| | quote-char | |
| alpha | ::= capital-letter | |
| | small-letter | |
| | digit | |
| | underline | |
| escape-sequence | ::= b | { backspace, character code 8 } |
| | t | { horizontal tab, character code 9 } |
| | n | { newline, character code 10 } |
| | v | { vertical tab, character code 11 } |
| | f | { form feed, character code 12 } |
| | r | { carriage return, character code 13 } |
| | e | { escape, character code 27 } |
| | d | { delete, character code 127 } |
| | a | { alarm, character code 7 } |
| | other-escape-sequence | |
| quoted-name | ::= quoted-atom | |
| base-prefix | ::= 0b | { indicates base 2 } |
| | 0o | { indicates base 8 } |
| | 0x | { indicates base 16 } |
| char-item | ::= quoted-item | |
| other-escape-sequence | ::= xalpha...\ | { treating a,b,... and A,B,... as 10,11,... in the range [0..15], hex character code } |
| | digit...\ | { in the range [0..7], octal character code } |
| | <LFD> | { ignored } |
| | \ | { stands for itself } |
| | ' | { stands for itself } |
| | " | { stands for itself } |
| | ` | { stands for itself } |
| quoted-char | ::= <SPC> | |
| | printing-char | |
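The rules for natural-number, unsigned-float and escape-sequence can be tried out with a few goals. The sketch below is illustrative only; the predicate names number_tokens/0 and escape_tokens/0 are hypothetical, and each goal simply compares a token written in one notation with its expected value.

```prolog
% number_tokens: numeric tokens in the notations given by
% natural-number and unsigned-float.
number_tokens :-
    5      =:= 0b101,     % base-prefix 0b: base 2
    15     =:= 0o17,      % base-prefix 0o: base 8
    255    =:= 0xff,      % base-prefix 0x: base 16, a..f denote 10..15
    255    =:= 0xFF,      % A..F denote 10..15 as well
    97     =:= 0'a,       % 0'char-item: character code of a
    1000.0 =:= 1.0e3.     % simple-float exp exponent

% escape_tokens: escape sequences and doubled quotes in quoted atoms.
escape_tokens :-
    atom_codes('a\tb\n', [97, 9, 98, 10]),      % \t is code 9, \n is code 10
    atom_codes('\x41\', [65]),                  % hex code, closed by a backslash
    atom_codes('\101\', [65]),                  % octal code 101 is decimal 65
    atom_codes('it''s', [105, 116, 39, 115]),   % '' denotes one quote character
    atom_codes('\\', [92]).                     % \\ denotes one backslash
```

Both number_tokens and escape_tokens should succeed. Note that, by the simple-float rule, digits are required on both sides of the decimal point, so the example uses 1.0e3 rather than a form such as 1.e3.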
[1] Characters outside this range can still be included in quoted atoms and strings by using escape sequences (see ref-syn-syn-esc).
[2] In release 3 and 4.0.0 the lower case characters 170 and 186 were incorrectly classified as symbol-char. This was corrected in release 4.0.1.