The default functions for reading in and writing out character codes using one of the three supported encodings are available through
SP_WcxGetcHook *SP_wcx_getc(int usage);
SP_WcxPutcHook *SP_wcx_putc(int usage);
These functions return the decoding/encoding functions appropriate for usage, where the latter is one of the constants WCX_USE_LATIN1, WCX_USE_UTF8, or WCX_USE_EUC.
The following utility functions may be useful when dealing with wide characters in internal encoding (WCI). These functions are modeled on the multibyte character handling functions of Solaris.
int SP_wci_code(int *pcode, char *wci);
SP_wci_code() determines the number of bytes that comprise the internally encoded character pointed to by wci. Also, if pcode is not a null pointer, SP_wci_code() converts the internally encoded character to a wide character code and places the result in the object pointed to by pcode. (The value of the wide character corresponding to the null character is zero.) At most WCI_MAX_BYTES bytes will be examined, starting at the byte pointed to by wci.
If wci is a null pointer, SP_wci_code() simply returns 0. If wci is not a null pointer, then: if wci points to the null character, SP_wci_code() returns 0; if the next bytes form a valid internally encoded character, SP_wci_code() returns the number of bytes that comprise the internal encoding; otherwise wci does not point to a valid internally encoded character and SP_wci_code() returns the negated length of the invalid byte sequence. This last case cannot happen if wci points to the beginning of a Prolog atom string, or to a position within such a string reached by repeated stepping over correctly encoded wide characters.
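The return-value contract above can be sketched with a small stand-alone decoder. This is not the SICStus implementation: demo_wci_code() and DEMO_MAX_BYTES are hypothetical names, the original 1..6 byte UTF-8 scheme is assumed, overlong forms are not rejected, and "length of the invalid byte sequence" is assumed to mean the number of bytes examined before the defect was found.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_BYTES 6   /* hypothetical stand-in for WCI_MAX_BYTES */

/* Hypothetical stand-in for SP_wci_code(); it only mirrors the
   documented return conventions, assuming 1..6 byte UTF-8. */
int demo_wci_code(int *pcode, const char *wci)
{
    const unsigned char *p = (const unsigned char *)wci;
    int len, i, code;

    if (p == NULL)
        return 0;                        /* null pointer: return 0 */
    if (p[0] < 0x80) {                   /* one-byte character, including NUL */
        if (pcode != NULL)
            *pcode = p[0];
        return p[0] == 0 ? 0 : 1;        /* NUL yields 0, as documented */
    }
    if      ((p[0] & 0xE0) == 0xC0) { len = 2; code = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { len = 3; code = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { len = 4; code = p[0] & 0x07; }
    else if ((p[0] & 0xFC) == 0xF8) { len = 5; code = p[0] & 0x03; }
    else if ((p[0] & 0xFE) == 0xFC) { len = 6; code = p[0] & 0x01; }
    else
        return -1;                       /* stray continuation byte, 0xFE or 0xFF */
    assert(len <= DEMO_MAX_BYTES);

    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return -i;                   /* assumption: i bytes formed the bad prefix */
        code = (code << 6) | (p[i] & 0x3F);
    }
    if (pcode != NULL)
        *pcode = code;
    return len;
}
```

Because a NUL byte is never a continuation byte, the loop stops at the end of a truncated string and never reads past the terminator.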
WCI_MAX_BYTES
WCI_MAX_BYTES is a constant defined by SICStus Prolog giving the maximal length (in bytes) of the internal encoding of a single character code. (As the internal encoding is UTF-8, this constant has the value 6.)
int SP_wci_len(char *wci);
SP_wci_len() determines the number of bytes comprising the multi-byte character pointed to by wci. It is equivalent to:
SP_wci_code((int *)0, wci);
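A typical use of a length function of this kind is stepping through a string one character at a time. The sketch below counts the characters of an internally encoded string. demo_wci_len() and demo_count_chars() are hypothetical stand-ins written for this example (assuming 1..6 byte UTF-8), not SICStus functions; in real foreign code the loop would presumably call SP_wci_len() instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SP_wci_len(): byte length of the character
   at wci, 0 for a null pointer or NUL, negative for an invalid sequence. */
int demo_wci_len(const char *wci)
{
    const unsigned char *p = (const unsigned char *)wci;
    int len, i;

    if (p == NULL || p[0] == 0)
        return 0;
    if (p[0] < 0x80)
        return 1;
    if      ((p[0] & 0xE0) == 0xC0) len = 2;
    else if ((p[0] & 0xF0) == 0xE0) len = 3;
    else if ((p[0] & 0xF8) == 0xF0) len = 4;
    else if ((p[0] & 0xFC) == 0xF8) len = 5;
    else if ((p[0] & 0xFE) == 0xFC) len = 6;
    else
        return -1;
    for (i = 1; i < len; i++)
        if ((p[i] & 0xC0) != 0x80)
            return -i;
    return len;
}

/* Count the characters in s by repeated stepping; returns -1 as soon
   as an invalid byte sequence is met. */
long demo_count_chars(const char *s)
{
    long count = 0;
    int n;

    assert(s != NULL);
    while ((n = demo_wci_len(s)) > 0) {
        s += n;          /* step over one correctly encoded character */
        count++;
    }
    return n == 0 ? count : -1;
}
```

The loop ends on a return value of 0 (the terminating null character) and reports an error on a negative value, which matches the two non-positive cases described for SP_wci_code().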
int SP_code_wci(char *wci, int code);
SP_code_wci() determines the number of bytes needed to represent the internal encoding of the character code, and, if wci is not a null pointer, stores the internal encoding in the array pointed to by wci. At most WCI_MAX_BYTES bytes are stored.
SP_code_wci() returns -1 if the value of code is outside the wide character code range; otherwise it returns the number of bytes that comprise the internal encoding of code.
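The encoding direction can be sketched in the same way. demo_code_wci() and DEMO_MAX_BYTES are hypothetical names, not part of SICStus. The sketch assumes 1..6 byte UTF-8, treats every non-negative int as being inside the wide character code range, and does not append a terminating NUL, since the description above does not mention one.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_BYTES 6   /* hypothetical stand-in for WCI_MAX_BYTES */

/* Hypothetical stand-in for SP_code_wci(): returns the encoded length of
   code, or -1 if code is out of range; stores the bytes if wci is non-null. */
int demo_code_wci(char *wci, int code)
{
    /* Lead-byte marker for each encoded length (index 0 is unused). */
    static const unsigned char lead[] = { 0x00, 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
    unsigned char *out = (unsigned char *)wci;
    int len, i;

    if (code < 0)
        return -1;                       /* assumed range: 0 .. 0x7FFFFFFF */
    if      (code < 0x80)      len = 1;
    else if (code < 0x800)     len = 2;
    else if (code < 0x10000)   len = 3;
    else if (code < 0x200000)  len = 4;
    else if (code < 0x4000000) len = 5;
    else                       len = 6;
    assert(len <= DEMO_MAX_BYTES);

    if (out != NULL) {
        /* Fill continuation bytes from the end, six payload bits each. */
        for (i = len - 1; i > 0; i--) {
            out[i] = (unsigned char)(0x80 | (code & 0x3F));
            code >>= 6;
        }
        out[0] = (unsigned char)(lead[len] | code);
    }
    return len;
}
```

Calling the function with a null wci first gives the required length, so a caller can size a buffer before storing anything; a buffer of DEMO_MAX_BYTES bytes is always enough for one character.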
The following functions give access to the default character type mapping and the currently selected operating system encoding/decoding functions.
int SP_latin1_chartype(int char_code);
SP_latin1_chartype() returns the character type category of the character code char_code, according to the ISO 8859/1 code-set. The char_code value is assumed to be in the 1..255 range.
char *SP_to_os(char *string, int context);
char *SP_from_os(char *string, int context);
These functions simply invoke the wcx_to_os() and wcx_from_os() hook functions, respectively. They are useful in foreign functions that handle strings passed to/from the operating system, such as file names, options, etc.