WCX Utility Functions - SICStus Prolog

12.8 WCX Related Utility Functions

The default functions for reading and writing character codes in each of the three supported encodings are available through:

     SP_WcxGetcHook *SP_wcx_getc(int usage);
     SP_WcxPutcHook *SP_wcx_putc(int usage);

SP_wcx_getc() returns the decoding function and SP_wcx_putc() the encoding function appropriate for usage, which must be one of the constants WCX_USE_LATIN1, WCX_USE_UTF8, or WCX_USE_EUC.

The following utility functions may be useful when dealing with wide characters in internal encoding (WCI). They are modeled after the multibyte character handling functions of Solaris.

int SP_wci_code(int *pcode, char *wci);

SP_wci_code() determines the number of bytes that comprise the internally encoded character pointed to by wci. Also, if pcode is not a null pointer, SP_wci_code() converts the internally encoded character to a wide character code and places the result in the object pointed to by pcode. (The value of the wide character corresponding to the null character is zero.) At most WCI_MAX_BYTES bytes will be examined, starting at the byte pointed to by wci.

If wci is a null pointer, SP_wci_code() simply returns 0. If wci is not a null pointer, then: if wci points to the null character, SP_wci_code() returns 0; if the next bytes form a valid internally encoded character, SP_wci_code() returns the number of bytes that comprise the internal encoding; otherwise wci does not point to a valid internally encoded character, and SP_wci_code() returns the negated length of the invalid byte sequence. This last case cannot happen if wci points to the beginning of a Prolog atom string, or to a position within such a string reached by repeatedly stepping over correctly encoded wide characters.

WCI_MAX_BYTES is a constant defined by SICStus Prolog giving the maximal length (in bytes) of the internal encoding of a single character code. (As the internal encoding is UTF-8, this constant has the value 6.)
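
The return-value contract of SP_wci_code() can be illustrated with a small stand-alone decoder. The sketch below is not the SICStus implementation: the names demo_wci_code and DEMO_WCI_MAX_BYTES are hypothetical, the six-byte UTF-8 layout is inferred from the stated value of WCI_MAX_BYTES, and the value returned for an invalid sequence (the negated number of bytes examined before the error was detected) is an assumption about what "length of the invalid byte sequence" means. Overlong encodings are not rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for WCI_MAX_BYTES; the text gives its value as 6. */
#define DEMO_WCI_MAX_BYTES 6

/* Hypothetical decoder that follows the SP_wci_code() contract described
   above. This is an illustration, not the SICStus implementation. */
int demo_wci_code(int *pcode, const char *wci)
{
    const unsigned char *p = (const unsigned char *)wci;
    int len, code, i;

    if (p == NULL)
        return 0;                          /* null pointer: just return 0 */
    if (p[0] == 0) {
        if (pcode != NULL) *pcode = 0;     /* null character has code zero */
        return 0;
    }

    /* The lead byte fixes the sequence length and its payload bits. */
    if      (p[0] < 0x80) { len = 1; code = p[0]; }
    else if (p[0] < 0xC0) return -1;       /* stray continuation byte */
    else if (p[0] < 0xE0) { len = 2; code = p[0] & 0x1F; }
    else if (p[0] < 0xF0) { len = 3; code = p[0] & 0x0F; }
    else if (p[0] < 0xF8) { len = 4; code = p[0] & 0x07; }
    else if (p[0] < 0xFC) { len = 5; code = p[0] & 0x03; }
    else if (p[0] < 0xFE) { len = 6; code = p[0] & 0x01; }
    else return -1;                        /* 0xFE, 0xFF: not valid lead bytes */
    assert(len <= DEMO_WCI_MAX_BYTES);

    /* Each continuation byte has the form 10xxxxxx and adds six bits. */
    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return -i;                     /* assumed: bytes seen before the error */
        code = (code << 6) | (p[i] & 0x3F);
    }
    if (pcode != NULL) *pcode = code;
    return len;
}
```

With this sketch, the three bytes E2 82 AC decode to the code 0x20AC with a return value of 3, while the same sequence cut off after two bytes yields -2.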

int SP_wci_len(char *wci);

SP_wci_len() determines the number of bytes comprising the multibyte character pointed to by wci. It is equivalent to:

          SP_wci_code((int *)0, wci);
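
A typical use of a length function of this kind is to step through an internally encoded string one character at a time. The following sketch counts characters rather than bytes. To stay self-contained it uses a hypothetical stand-in, demo_wci_len, written under the assumption that the internal encoding is UTF-8 with sequences of up to six bytes; in a foreign function linked against SICStus one would presumably call SP_wci_len() in its place. demo_count_chars is likewise a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SP_wci_len(): byte length of the character at s,
   0 at the terminating null character, negative on a malformed sequence. */
static int demo_wci_len(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    int len, i;

    if (p == NULL || p[0] == 0) return 0;
    if      (p[0] < 0x80) len = 1;
    else if (p[0] < 0xC0) return -1;       /* stray continuation byte */
    else if (p[0] < 0xE0) len = 2;
    else if (p[0] < 0xF0) len = 3;
    else if (p[0] < 0xF8) len = 4;
    else if (p[0] < 0xFC) len = 5;
    else if (p[0] < 0xFE) len = 6;
    else return -1;                        /* 0xFE, 0xFF: not valid lead bytes */
    for (i = 1; i < len; i++)
        if ((p[i] & 0xC0) != 0x80) return -i;
    return len;
}

/* Count characters (not bytes) in s by repeatedly stepping over one
   encoded character. Returns -1 if a malformed sequence is met. */
long demo_count_chars(const char *s)
{
    long n = 0;
    int len;

    assert(s != NULL);
    while ((len = demo_wci_len(s)) > 0) {
        s += len;                          /* advance past one whole character */
        n++;
    }
    return len == 0 ? n : -1;              /* 0 means the terminator was reached */
}
```

The loop stops either at the terminating null character (length 0) or at the first malformed sequence (negative length), so it never advances by a non-positive amount.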

int SP_code_wci(char *wci, int code);

SP_code_wci() determines the number of bytes needed to represent the internal encoding of the character code, and, if wci is not a null pointer, stores the internal encoding in the array pointed to by wci. At most WCI_MAX_BYTES bytes are stored.

SP_code_wci() returns -1 if the value of code is outside the wide character code range; otherwise it returns the number of bytes that comprise the internal encoding of code.
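
The encoding direction can be sketched in the same way. The function below is a hypothetical stand-in, demo_code_wci, not the SICStus implementation. It assumes that the internal encoding is UTF-8 with up to six bytes per character, that int is at least 32 bits wide, and that the valid code range is 0 to 0x7FFFFFFF; the text above does not state the actual range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for WCI_MAX_BYTES; the text gives its value as 6. */
#define DEMO_WCI_MAX_BYTES 6

/* Hypothetical encoder following the SP_code_wci() contract described above:
   returns the encoded length, or -1 if code is out of the assumed range.
   If wci is not a null pointer, the encoded bytes are stored there. */
int demo_code_wci(char *wci, int code)
{
    unsigned char *out = (unsigned char *)wci;
    unsigned char lead;
    int len, i;

    if (code < 0)
        return -1;                         /* outside the assumed code range */

    /* Pick the shortest form that can hold the code. */
    if      (code < 0x80)      { len = 1; lead = 0x00; }
    else if (code < 0x800)     { len = 2; lead = 0xC0; }
    else if (code < 0x10000)   { len = 3; lead = 0xE0; }
    else if (code < 0x200000)  { len = 4; lead = 0xF0; }
    else if (code < 0x4000000) { len = 5; lead = 0xF8; }
    else                       { len = 6; lead = 0xFC; }
    assert(len <= DEMO_WCI_MAX_BYTES);

    if (out != NULL) {
        /* Fill continuation bytes from the end, six bits at a time. */
        for (i = len - 1; i > 0; i--) {
            out[i] = (unsigned char)(0x80 | (code & 0x3F));
            code >>= 6;
        }
        out[0] = (unsigned char)(lead | code);  /* remaining high bits */
    }
    return len;
}
```

Passing a null pointer for wci gives only the required length, which is convenient for sizing a buffer before a second call that stores the bytes.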

The following functions give access to the default character type mapping and the currently selected operating system encoding/decoding functions.

int SP_latin1_chartype(int char_code);

SP_latin1_chartype() returns the character type category of the character code char_code, according to the ISO 8859/1 code-set. The char_code value is assumed to be in the 1..255 range.

char *SP_to_os(char *string, int context);
char *SP_from_os(char *string, int context);

These functions simply invoke the wcx_to_os() and wcx_from_os() hook functions, respectively. These are useful in foreign functions that handle strings passed to/from the operating system, such as file names, options, etc.