WCX Utility Functions
The default functions for reading and writing character codes in
one of the three supported encodings are available through
SP_WcxGetcHook *SP_wcx_getc(int usage);
SP_WcxPutcHook *SP_wcx_putc(int usage);
These functions return the decoding and encoding functions, respectively,
that are appropriate for usage, which must be one of the constants
WCX_USE_LATIN1, WCX_USE_UTF8, WCX_USE_EUC.
The following utility functions may be useful when dealing with wide characters in internal encoding (WCI). These functions are modeled after the multibyte character handling functions of Solaris.
int SP_wci_code(int *pcode, char *wci);
SP_wci_code() determines the number of bytes that comprise the
internally encoded character pointed to by wci. Also, if
pcode is not a null pointer, SP_wci_code() converts the
internally encoded character to a wide character code and places the
result in the object pointed to by pcode. (The value of the
wide character corresponding to the null character is zero.) At most
WCI_MAX_BYTES bytes will be examined, starting at the byte
pointed to by wci.
If wci is a null pointer, SP_wci_code() simply returns
0. If wci is not a null pointer, then, if wci points to
the null character, SP_wci_code() returns 0; if the next bytes
form a valid internally encoded character, SP_wci_code() returns
the number of bytes that comprise the internal encoding; otherwise
wci does not point to a valid internally encoded character and
SP_wci_code() returns the negated length of the invalid byte
sequence. This latter case cannot happen if wci points to the
beginning of a Prolog atom string, or to a position within such a string
reached by repeated stepping over correctly encoded wide characters.
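The stepping behaviour described above can be sketched in plain C. Since the SICStus headers are not assumed to be available here, the sketch uses a hypothetical stand-in, demo_wci_code(), which follows the return convention documented for SP_wci_code() and assumes the classic UTF-8 layout of up to six bytes. The names demo_wci_code(), demo_count_chars() and DEMO_WCI_MAX_BYTES are illustrative only and are not part of the SICStus API. The stand-in does not reject overlong encodings, and the real function may differ in such details.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_WCI_MAX_BYTES 6   /* illustrative stand-in for WCI_MAX_BYTES */

/* Hypothetical stand-in for SP_wci_code(): decodes one character of
   classic (up to 6-byte) UTF-8 and uses the documented return values:
   0 for a null pointer or the null character, the byte count for a
   valid character, the negated length of an invalid byte sequence. */
int demo_wci_code(int *pcode, const char *wci)
{
    const unsigned char *p = (const unsigned char *)wci;
    unsigned long code;
    int len, i;

    if (p == NULL)
        return 0;
    if (p[0] == 0) {
        if (pcode != NULL)
            *pcode = 0;                /* the null character has code zero */
        return 0;
    }
    if (p[0] < 0x80)      { len = 1; code = p[0]; }
    else if (p[0] < 0xC0) return -1;   /* stray continuation byte */
    else if (p[0] < 0xE0) { len = 2; code = p[0] & 0x1F; }
    else if (p[0] < 0xF0) { len = 3; code = p[0] & 0x0F; }
    else if (p[0] < 0xF8) { len = 4; code = p[0] & 0x07; }
    else if (p[0] < 0xFC) { len = 5; code = p[0] & 0x03; }
    else if (p[0] < 0xFE) { len = 6; code = p[0] & 0x01; }
    else return -1;                    /* 0xFE and 0xFF are never lead bytes */

    assert(len <= DEMO_WCI_MAX_BYTES);
    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return -i;                 /* sequence broke off after i bytes */
        code = (code << 6) | (p[i] & 0x3F);
    }
    if (pcode != NULL)
        *pcode = (int)code;
    return len;
}

/* Counts the characters in a null-terminated string by repeated
   stepping; returns -1 if an invalid byte sequence is met. */
int demo_count_chars(const char *s)
{
    int count = 0, n;

    assert(s != NULL);
    while ((n = demo_wci_code(NULL, s)) > 0) {
        s += n;                        /* step over one whole character */
        count++;
    }
    return n < 0 ? -1 : count;
}
```

In a foreign function, the same loop would presumably call SP_wci_code() directly on the string received from Prolog, in which case the invalid-sequence branch should never be taken.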
WCI_MAX_BYTES
WCI_MAX_BYTES is a constant defined by SICStus Prolog giving
the maximum length (in bytes) of the internal encoding of a single
character code. (As the internal encoding is UTF-8, this constant has
the value 6.)
int SP_wci_len(char *wci);
SP_wci_len() determines the number of bytes comprising the
multi-byte character pointed to by wci. It is equivalent to:
SP_wci_code((int *)0, wci);
int SP_code_wci(char *wci, int code);
SP_code_wci() determines the number of bytes needed to
represent the internal encoding of the character code, and, if
wci is not a null pointer, stores the internal encoding in the
array pointed to by wci. At most WCI_MAX_BYTES bytes
are stored.
SP_code_wci() returns -1 if the value of code is outside
the wide character code range; otherwise it returns the number of
bytes that comprise the internal encoding of code.
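The two-step use described above (ask for the size with a null pointer, then store the bytes) can be sketched as follows. The function demo_code_wci() is a hypothetical stand-in for SP_code_wci(), written so that the example compiles without the SICStus headers. It assumes classic UTF-8 of up to six bytes and treats every non-negative int as being inside the wide character code range, which is an assumption and not something the text above states.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_WCI_MAX_BYTES 6   /* illustrative stand-in for WCI_MAX_BYTES */

/* Hypothetical stand-in for SP_code_wci(): returns the number of bytes
   needed to encode `code`, or -1 if `code` is negative. If `wci` is not
   a null pointer, the encoding is stored there; the caller must provide
   room for DEMO_WCI_MAX_BYTES bytes. No terminating null is written. */
int demo_code_wci(char *wci, int code)
{
    /* Lead-byte prefix for each encoded length (index = length). */
    static const unsigned char lead[] =
        { 0x00, 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
    unsigned long c;
    int len, i;

    if (code < 0)
        return -1;                     /* assumed to be out of range */
    c = (unsigned long)code;

    if (c < 0x80UL)            len = 1;
    else if (c < 0x800UL)      len = 2;
    else if (c < 0x10000UL)    len = 3;
    else if (c < 0x200000UL)   len = 4;
    else if (c < 0x4000000UL)  len = 5;
    else                       len = 6;
    assert(len >= 1 && len <= DEMO_WCI_MAX_BYTES);

    if (wci != NULL) {
        /* Fill continuation bytes from the end, six bits at a time. */
        for (i = len - 1; i > 0; i--) {
            wci[i] = (char)(0x80 | (c & 0x3F));
            c >>= 6;
        }
        wci[0] = (char)(lead[len] | c); /* remaining high bits */
    }
    return len;
}
```

A caller would typically declare a buffer of WCI_MAX_BYTES bytes, or WCI_MAX_BYTES + 1 if a terminating null is to be appended by hand, and check for a -1 result before using the buffer.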
The following functions give access to the default character type mapping and the currently selected operating system encoding/decoding functions.
int SP_latin1_chartype(int char_code);
SP_latin1_chartype() returns the character type category of the
character code char_code, according to the ISO 8859/1
code-set. The char_code value is assumed to be in the 1..255
range.
char *SP_to_os(char *string, int context);
char *SP_from_os(char *string, int context);
These functions simply invoke the wcx_to_os() and
wcx_from_os() hook functions, respectively. They are useful in
foreign functions that handle strings passed to or from the operating
system, such as file names and options.