Unlike Unicode, the definition of EUC specifies only the external
representation.  The actual wide character codes assigned to the
multibyte characters are not specified.  UNIX systems supporting EUC
have their own C data type, wchar_t, which stores a wide
character, but the mapping between this type and the external
representation is not standardized.
   
We have decided to use a custom-made mapping from the EUC encoding to
the character code set, rather than the UNIX type wchar_t.  This
decision keeps the code set machine-independent and yields a compact
representation of atoms.
   
EUC consists of four sub-code-sets, three of which can have multibyte external representations. Sub-code-set 0 consists of the ASCII characters and is mapped one-to-one to codes 0..127. Sub-code-set 1 has an external representation of one to three bytes, each in the range 128-255; the length is determined by the locale. Sub-code-sets 2 and 3 are similar, but their external representation begins with a so-called single shift character code, known as SS2 and SS3, respectively. The following table shows the mapping from the EUC external encoding to SICStus Prolog character codes.
     Sub-
     code-set  External encoding                 Character code (binary)

      0        0xxxxxxx                          00000000 00000000 0xxxxxxx

      1        1xxxxxxx                          00000000 00000000 1xxxxxxx
               1xxxxxxx 1yyyyyyy                 00000000 xxxxxxx0 1yyyyyyy
               1xxxxxxx 1yyyyyyy 1zzzzzzz        0xxxxxxx yyyyyyy0 1zzzzzzz

      2        SS2 1xxxxxxx                      00000000 00000001 0xxxxxxx
               SS2 1xxxxxxx 1yyyyyyy             00000000 xxxxxxx1 0yyyyyyy
               SS2 1xxxxxxx 1yyyyyyy 1zzzzzzz    0xxxxxxx yyyyyyy1 0zzzzzzz

      3        SS3 1xxxxxxx                      00000000 00000001 1xxxxxxx
               SS3 1xxxxxxx 1yyyyyyy             00000000 xxxxxxx1 1yyyyyyy
               SS3 1xxxxxxx 1yyyyyyy 1zzzzzzz    0xxxxxxx yyyyyyy1 1zzzzzzz
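
Read row by row, the table follows one rule: bits 7 and 8 of the character code identify the sub-code-set (01, 10, and 11 for sub-code-sets 1, 2, and 3), the last external byte supplies bits 0..6, and any preceding bytes supply bits 9..15 and 16..22. The Python sketch below illustrates that reading and is not the SICStus implementation. The function name euc_decode_char and the lengths parameter are hypothetical. The values 0x8E and 0x8F for SS2 and SS3 are the usual EUC single shift bytes, which the text above does not spell out.

```python
SS2, SS3 = 0x8E, 0x8F  # usual EUC single shift bytes (not stated in the text)

def euc_decode_char(data, lengths=(2, 1, 2)):
    """Decode one EUC character at the start of `data`.

    `lengths` holds the byte lengths of sub-code-sets 1, 2, 3, excluding
    the single shift byte. Returns (character_code, bytes_consumed).
    """
    first = data[0]
    if first < 0x80:
        return first, 1                      # sub-code-set 0: plain ASCII
    if first == SS2:
        tag, n, start = 0b10, lengths[1], 1  # sub-code-set 2
    elif first == SS3:
        tag, n, start = 0b11, lengths[2], 1  # sub-code-set 3
    else:
        tag, n, start = 0b01, lengths[0], 0  # sub-code-set 1
    body = data[start:start + n]
    if len(body) != n or any(b < 0x80 for b in body):
        raise ValueError("malformed EUC sequence")
    # Last byte goes to bits 0..6, the sub-code-set tag to bits 7..8.
    code = (tag << 7) | (body[-1] & 0x7F)
    # Earlier bytes go to bits 9..15 and 16..22, nearest byte first.
    shift = 9
    for b in reversed(body[:-1]):
        code |= (b & 0x7F) << shift
        shift += 7
    return code, start + n

print(hex(euc_decode_char(b"\xa4\xa2")[0]))      # 0x48a2 (sub-code-set 1, 2 bytes)
print(hex(euc_decode_char(b"\x8e\xb1")[0]))      # 0x131  (sub-code-set 2, 1 byte)
print(hex(euc_decode_char(b"\x8f\xb0\xa1")[0]))  # 0x61a1 (sub-code-set 3, 2 bytes)
```
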
For sub-code-sets other than 0, the sub-code-set length indicated by the locale determines which of the three mappings is used (the SP_CSETLEN environment variable, described below, can override this). When converting SICStus Prolog character codes to EUC on output, bits that have no significance in the mapping selected by the locale are ignored.
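
The output direction can be sketched in the same way. The Python fragment below is an illustration under assumptions, not the SICStus code: the name euc_encode_char and the lengths parameter are hypothetical, SS2 and SS3 are taken to be the usual EUC bytes 0x8E and 0x8F, and a code whose bits 7 and 8 are both zero is assumed to be emitted as a single ASCII byte. Bits above those used by the selected length are simply dropped.

```python
SS2, SS3 = 0x8E, 0x8F  # usual EUC single shift bytes (assumed)

def euc_encode_char(code, lengths=(2, 1, 2)):
    """Convert one character code to its EUC external bytes.

    `lengths` holds the byte lengths of sub-code-sets 1, 2, 3, excluding
    the single shift byte.
    """
    tag = (code >> 7) & 0b11          # bits 7..8 select the sub-code-set
    if tag == 0:
        return bytes([code & 0x7F])   # sub-code-set 0 (assumed handling)
    n = lengths[tag - 1]
    prefix = {1: b"", 2: bytes([SS2]), 3: bytes([SS3])}[tag]
    # 7-bit groups, last external byte first: bits 0..6, 9..15, 16..22.
    groups = [code & 0x7F, (code >> 9) & 0x7F, (code >> 16) & 0x7F]
    # Keep only the n groups the selected mapping uses; the rest are ignored.
    body = bytes(0x80 | g for g in reversed(groups[:n]))
    return prefix + body

print(euc_encode_char(0x48A2))    # b'\xa4\xa2'
print(euc_encode_char(0x2444A1))  # b'\xa2\xa1' (bits 16..22 ignored at length 2)
```

With lengths=(3, 1, 2) the second call would instead produce all three bytes, b'\xa4\xa2\xa1'.
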
The byte lengths associated with the EUC sub-code-sets are determined by
using the csetlen() function. If this function is not available
in the system configuration used, then Japanese Solaris lengths are
assumed, namely 2, 1, 2 for sub-code-sets 1, 2, and 3, respectively (the
lengths exclude the single shift character).
   
To allow experimentation with sub-code-set lengths that differ from those of the locale, the length values can be overridden by setting the SP_CSETLEN environment variable to xyz, where x, y, and z are digits in the range 1..3. Such a setting causes sub-code-sets 1, 2, and 3 to have byte lengths x, y, and z, respectively.
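
As an illustration of the xyz format, the Python sketch below reads such a setting and falls back to the Japanese Solaris lengths 2, 1, 2 when the variable is unset. The function name csetlen_from_env is hypothetical. The fallback stands in for whatever csetlen() would report, and rejecting malformed values with an error is an assumption, since the text does not say how SICStus treats them.

```python
import os

DEFAULT_LENGTHS = (2, 1, 2)  # Japanese Solaris lengths for sub-code-sets 1, 2, 3

def csetlen_from_env(environ=os.environ, default=DEFAULT_LENGTHS):
    """Return the byte lengths of sub-code-sets 1, 2, 3.

    Uses SP_CSETLEN when set (three digits, each in 1..3), else `default`.
    """
    value = environ.get("SP_CSETLEN")
    if value is None:
        return default
    if len(value) != 3 or any(ch not in "123" for ch in value):
        # Assumed behaviour: reject anything that is not three digits in 1..3.
        raise ValueError("SP_CSETLEN must be three digits in the range 1..3")
    return tuple(int(ch) for ch in value)

print(csetlen_from_env({}))                     # (2, 1, 2)
print(csetlen_from_env({"SP_CSETLEN": "312"}))  # (3, 1, 2)
```
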