Users can have complete control over the way wide characters are handled by SICStus Prolog if they supply their own definitions of the appropriate hook functions. A set of such functions, implementing a specific environment for handling wide characters, is called a WCX box. A sample WCX box is described below (see A Sample WCX Box).
The WCX hook functions are plugged in by calling:
void SP_set_wcx_hooks ( int usage,
SP_WcxOpenHook *wcx_open,
SP_WcxCloseHook *wcx_close,
SP_WcxCharTypeHook *wcx_chartype,
SP_WcxConvHook *wcx_from_os,
SP_WcxConvHook *wcx_to_os);
The effect of SP_set_wcx_hooks() is controlled by the value of
usage. The remaining arguments are pointers to appropriate hook
functions or NULL values, the latter implying that the hook
should take some default value.
There are three independent aspects to be controlled, and usage
should be supplied as a bitwise OR of chosen constant names for each
aspect. The defaults have value 0, so need not be included.
The aspects are the following:
The first aspect decides the default behavior of the wcx_open and
wcx_chartype hook functions (if both are supplied by the user,
the choice of the default is irrelevant). The possible values are:
WCX_USE_LATIN1 (default)
WCX_USE_UTF8
WCX_USE_EUC
These select the behavior described under iso_8859_1, utf8, and euc,
respectively; see WCX Environment Variables.
The second aspect is the operating system encoding. The flags below
determine which function is used for conversion from/to the operating
system encoding if such functions are not supplied by the user through
the wcx_from_os and wcx_to_os arguments (if both are supplied by the
user, the choice of default is irrelevant).
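WCX_OS_8BIT (default)
Each code is truncated to 8 bits when passed to the operating system.
WCX_OS_UTF8
UTF-8 is used for communication with the operating system, so no
conversion is done.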
The third aspect is the preservation of ASCII. This is important if
some of the conversion functions (wcx_from_os, wcx_to_os, wcx_getc,
and wcx_putc; see below) are user-defined. In such cases it may
be beneficial for the user to inform SICStus Prolog whether the
supplied encoding functions preserve ASCII characters. (The default
encodings do preserve ASCII.)
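WCX_PRESERVES_ASCII (default)
WCX_CHANGES_ASCII
Declare that the supplied encoding functions do or do not preserve
the ASCII characters, respectively.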
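As an illustration of how the usage argument is composed, the following sketch selects UTF-8 for both the default code-set and the operating system encoding, and passes NULL for every hook so that the defaults are used. The declarations at the top are stand-ins so that the fragment compiles on its own: real code would take them from the SICStus header, the non-zero flag values shown are placeholders rather than the real ones, and install_utf8_environment and recorded_usage are hypothetical names.

```c
#include <stddef.h>

/* Stand-in declarations (assumptions).  Only the fact that the default
   flags are 0 comes from the text; the other values are placeholders. */
enum {
    WCX_USE_LATIN1 = 0, WCX_USE_UTF8 = 1, WCX_USE_EUC = 2,
    WCX_OS_8BIT = 0, WCX_OS_UTF8 = 4,
    WCX_PRESERVES_ASCII = 0, WCX_CHANGES_ASCII = 8
};
typedef struct SP_stream SP_stream;
typedef unsigned long SP_atom;
typedef void (SP_WcxOpenHook)(SP_stream *s, SP_atom option, int context);
typedef void (SP_WcxCloseHook)(SP_stream *s);
typedef int (SP_WcxCharTypeHook)(int char_code);
typedef char *(SP_WcxConvHook)(char *string, int context);

/* Stand-in for the real SP_set_wcx_hooks(): it only records usage. */
static int recorded_usage = -1;
static void SP_set_wcx_hooks(int usage,
                             SP_WcxOpenHook *wcx_open,
                             SP_WcxCloseHook *wcx_close,
                             SP_WcxCharTypeHook *wcx_chartype,
                             SP_WcxConvHook *wcx_from_os,
                             SP_WcxConvHook *wcx_to_os)
{
    (void)wcx_open; (void)wcx_close; (void)wcx_chartype;
    (void)wcx_from_os; (void)wcx_to_os;
    recorded_usage = usage;
}

/* Hypothetical helper: one constant per aspect, combined with bitwise OR.
   WCX_PRESERVES_ASCII is the default (0) and could be left out. */
void install_utf8_environment(void)
{
    SP_set_wcx_hooks(WCX_USE_UTF8 | WCX_OS_UTF8 | WCX_PRESERVES_ASCII,
                     NULL, NULL, NULL, NULL, NULL);
}
```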
We now describe the role of the arguments following usage in the
argument list of SP_set_wcx_hooks().
SP_WcxOpenHook *wcx_open
typedef void (SP_WcxOpenHook)
(SP_stream *s, SP_atom option, int context);
This function is called by SICStus Prolog for each stream s
opened, except when the encoding to be used for the stream is
pre-specified (binary files, files opened using the wci
option, and the C streams created with contexts
SP_STREAMHOOK_WCI and SP_STREAMHOOK_BIN).
The main task of the wcx_open hook is to associate the two
WCX-processing functions with the stream, by storing them in the
appropriate fields of the SP_stream data structure:
SP_WcxGetcHook *wcx_getc;
SP_WcxPutcHook *wcx_putc;
These fields are pointers to the functions performing the external decoding and encoding as described below. They are initialized to functions that truncate to 8 bits on output and zero-extend to 31 bits on input.
SP_WcxGetcHook *wcx_getc
typedef int (SP_WcxGetcHook)
(int first_byte, SP_stream *s, long *pbyte_count);
This function is generally invoked whenever a character has to
be read from a stream. Before invoking this function, however, a
byte is read from the stream by SICStus Prolog itself. If the
byte read is an ASCII character (its value is < 128), and
WCX_PRESERVES_ASCII is in force, then the byte read is
deemed to be the next character code, and wcx_getc is not
invoked. Otherwise, wcx_getc is invoked with the byte and
stream in question and is expected to return the next character
code.
The wcx_getc function may need to read additional bytes
from the stream if first_byte signifies the start of a
multi-byte character. A byte may be read from the stream
s in the following way:
byte = s->sgetc((long)s->user_handle);
The wcx_getc function is expected to increment its
*pbyte_count argument by 1 for each such byte read.
The default wcx_open hook will install a wcx_getc
function according to the usage argument. The three
default external decoding functions are also available to users
through the SP_wcx_getc() function (see WCX Utility Functions).
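The protocol described above can be sketched with a UTF-8 decoder written as a wcx_getc hook. This is only an illustration under stated assumptions: the SP_stream structure below is a stand-in containing just the two fields used here, sgetc is assumed to return a negative value at end of input, returning -1 for malformed input is the sketch's own convention (the text does not specify an error value), and my_utf8_getc, mem_source and mem_sgetc are hypothetical names. The in-memory byte source exists only so that the hook can be exercised without a real stream.

```c
#include <stddef.h>

/* Stand-in for the SP_stream fields used here (assumption). */
typedef struct SP_stream SP_stream;
struct SP_stream {
    int (*sgetc)(long handle);   /* assumed: next byte, negative at end */
    void *user_handle;
};

/* Hypothetical in-memory byte source, used only to exercise the hook. */
struct mem_source { const unsigned char *data; size_t pos, len; };

static int mem_sgetc(long handle)
{
    struct mem_source *src = (struct mem_source *)handle;
    return src->pos < src->len ? src->data[src->pos++] : -1;
}

/* Hypothetical wcx_getc hook: decodes one UTF-8 sequence whose first
   byte has already been read by the caller. */
int my_utf8_getc(int first_byte, SP_stream *s, long *pbyte_count)
{
    int code, extra;

    if (first_byte < 0x80)
        return first_byte;                       /* plain ASCII */
    else if ((first_byte & 0xE0) == 0xC0) { code = first_byte & 0x1F; extra = 1; }
    else if ((first_byte & 0xF0) == 0xE0) { code = first_byte & 0x0F; extra = 2; }
    else if ((first_byte & 0xF8) == 0xF0) { code = first_byte & 0x07; extra = 3; }
    else
        return -1;                               /* not a valid lead byte */

    while (extra-- > 0) {
        int byte = s->sgetc((long)s->user_handle);
        if (byte < 0)
            return -1;                           /* input ended too early */
        (*pbyte_count)++;                        /* one more byte consumed */
        if ((byte & 0xC0) != 0x80)
            return -1;                           /* not a continuation byte */
        code = (code << 6) | (byte & 0x3F);
    }
    return code;
}
```

The sketch does not reject overlong encodings or surrogate codes; a production decoder would probably want to.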
SP_WcxPutcHook *wcx_putc
typedef int (SP_WcxPutcHook)
(int char_code, SP_stream *s, long *pbyte_count);
This function is generally invoked whenever a character has to
be written to a stream. However, if the character code to be
written is an ASCII character (its value is < 128), and
WCX_PRESERVES_ASCII is in force, then the code is written
directly on the stream, and wcx_putc is not
invoked. Otherwise, wcx_putc is invoked with the
character code and stream in question and is expected to do
whatever is needed to output the character code to the stream.
This will require outputting one or more bytes to the stream. A
byte, byte, can be written to the stream s in the
following way:
return_code = s->sputc(byte,(long)s->user_handle);
The wcx_putc function is expected to return the return
value of the last invocation of s->sputc, or -1 as an
error code, if incapable of outputting the character code. The
latter may be the case, for example, if the code to be output
does not belong to the character code set in force. It is also
expected to increment its *pbyte_count argument by 1 for
each byte written.
The default wcx_open hook function will install a
wcx_putc function according to the usage
argument. The three default external encoding functions are also
available to users through the SP_wcx_putc() function
(see WCX Utility Functions).
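As a counterpart on the output side, the following sketch shows a wcx_putc hook that writes a character code as UTF-8. The same caveats apply: SP_stream is a stand-in with only the two fields used, my_utf8_putc, put_byte, mem_sink and mem_sputc are hypothetical names, and the in-memory sink is there only to make the fragment testable. The sketch writes all bytes of a sequence without checking intermediate results and counts every byte it attempts to write.

```c
#include <stddef.h>

/* Stand-in for the SP_stream fields used here (assumption). */
typedef struct SP_stream SP_stream;
struct SP_stream {
    int (*sputc)(int byte, long handle);
    void *user_handle;
};

/* Hypothetical in-memory sink, used only to exercise the hook. */
struct mem_sink { unsigned char buf[16]; size_t len; };

static int mem_sputc(int byte, long handle)
{
    struct mem_sink *sink = (struct mem_sink *)handle;
    if (sink->len >= sizeof sink->buf)
        return -1;
    sink->buf[sink->len++] = (unsigned char)byte;
    return byte;
}

/* Write one byte and count it in *pbyte_count. */
static int put_byte(SP_stream *s, int byte, long *pbyte_count)
{
    int rc = s->sputc(byte, (long)s->user_handle);
    (*pbyte_count)++;
    return rc;
}

/* Hypothetical wcx_putc hook: encodes char_code as UTF-8 and returns
   the result of the last s->sputc call, or -1 if the code cannot be
   represented. */
int my_utf8_putc(int char_code, SP_stream *s, long *pbyte_count)
{
    if (char_code < 0 || char_code > 0x10FFFF)
        return -1;                               /* outside the code set */
    if (char_code < 0x80)
        return put_byte(s, char_code, pbyte_count);

    if (char_code < 0x800) {
        put_byte(s, 0xC0 | (char_code >> 6), pbyte_count);
    } else if (char_code < 0x10000) {
        put_byte(s, 0xE0 | (char_code >> 12), pbyte_count);
        put_byte(s, 0x80 | ((char_code >> 6) & 0x3F), pbyte_count);
    } else {
        put_byte(s, 0xF0 | (char_code >> 18), pbyte_count);
        put_byte(s, 0x80 | ((char_code >> 12) & 0x3F), pbyte_count);
        put_byte(s, 0x80 | ((char_code >> 6) & 0x3F), pbyte_count);
    }
    /* The final continuation byte is common to all multi-byte forms. */
    return put_byte(s, 0x80 | (char_code & 0x3F), pbyte_count);
}
```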
In making a decision regarding the selection of these WCX-processing
functions, the context and option arguments of the
wcx_open hook can be used.
The option argument is an atom.
The context argument
encodes the context of invocation. It is one of the following
values:
SP_STREAMHOOK_STDIN
SP_STREAMHOOK_STDOUT
SP_STREAMHOOK_STDERR
for the three standard streams,
SP_STREAMHOOK_OPEN
for streams created by open,
SP_STREAMHOOK_NULL
for streams created by open_null_stream,
SP_STREAMHOOK_LIB
for streams created from the libraries,
SP_STREAMHOOK_C, SP_STREAMHOOK_C+1, ...
for streams created from C code via SP_make_stream().
The option argument comes from the user and it can carry some
WCX-related information to be associated with the stream opened. For
example, this can be used to implement a scheme supporting multiple
encodings, supplied on a stream-by-stream basis, as shown in the
sample WCX-box (see A Sample WCX Box).
If the stream is opened from Prolog code, the option argument
for this hook function is derived from the wcx(Option)
option of open/4 and load_files/2. If this option is
not present, or the stream is opened using some other built-in, then
the value of the wcx prolog flag will be passed on to the
open hook.
If the stream is opened from C, via SP_make_stream(), then the
option argument will be the value of the prolog flag
wcx.
There is also a variant of SP_make_stream(), called
SP_make_stream_context() which takes two additional arguments,
the option and the context, to be passed on to the wcx_open
hook (see WCX Foreign Interface).
The wcx_open hook can associate the information derived from
option with the stream in question using a new field in the
SP_stream data structure: void *wcx_info, initialized
to NULL. If there is more information than can be stored in
this field, or if the encoding to be implemented requires keeping
track of a state, then the wcx_open hook should allocate
sufficient amount of memory for storing the information and/or the
state, using SP_malloc(), and deposit a pointer to that piece
of memory in wcx_info.
The default wcx_open hook function ignores its option
and context arguments and sets the wcx_getc and
wcx_putc stream fields to functions performing the external
decoding and encoding according to the usage argument of
SP_set_wcx_hooks().
SP_WcxCloseHook *wcx_close
typedef void (SP_WcxCloseHook) (SP_stream *s);
This hook function is called whenever a stream is closed for which
the wcx_open hook was invoked when it was created. The argument
s points to the stream being closed. The hook can be used to
implement the closing activities related to external encoding,
e.g. freeing any memory allocated in the wcx_open hook.
The default wcx_close hook function does nothing.
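The interplay of the wcx_open and wcx_close hooks can be sketched as follows: the open hook allocates a small per-stream record, stores it in wcx_info and installs the two WCX-processing functions, and the close hook frees the record. Everything outside the hook bodies is a stand-in so that the fragment compiles on its own: the SP_stream structure holds only the three fields named in the text, SP_atom is assumed to be an integer type, SP_malloc() and SP_free() are replaced by thin wrappers around malloc() and free(), and struct my_wcx_state, my_getc, my_putc, my_wcx_open and my_wcx_close are hypothetical names. The getc and putc functions are trivial placeholders.

```c
#include <stdlib.h>

/* Stand-in declarations (assumptions); real code uses the SICStus header. */
typedef unsigned long SP_atom;
typedef struct SP_stream SP_stream;
typedef int (SP_WcxGetcHook)(int first_byte, SP_stream *s, long *pbyte_count);
typedef int (SP_WcxPutcHook)(int char_code, SP_stream *s, long *pbyte_count);
struct SP_stream {
    SP_WcxGetcHook *wcx_getc;
    SP_WcxPutcHook *wcx_putc;
    void *wcx_info;
};
static void *SP_malloc(size_t size) { return malloc(size); }
static void SP_free(void *ptr) { free(ptr); }

/* Hypothetical per-stream record kept in wcx_info. */
struct my_wcx_state {
    SP_atom option;     /* encoding requested for this stream */
    int context;        /* where the stream was opened from */
    int shift_state;    /* example of state for a stateful encoding */
};

/* Placeholder WCX-processing functions. */
static int my_getc(int first_byte, SP_stream *s, long *pbyte_count)
{
    (void)s; (void)pbyte_count;
    return first_byte;
}

static int my_putc(int char_code, SP_stream *s, long *pbyte_count)
{
    (void)s; (void)pbyte_count;
    return char_code & 0xFF;
}

/* Hypothetical wcx_open hook. */
void my_wcx_open(SP_stream *s, SP_atom option, int context)
{
    struct my_wcx_state *state = SP_malloc(sizeof *state);

    if (state == NULL)
        return;                 /* keep the stream's initial functions */
    state->option = option;
    state->context = context;
    state->shift_state = 0;

    s->wcx_info = state;
    s->wcx_getc = my_getc;
    s->wcx_putc = my_putc;
}

/* Hypothetical wcx_close hook: releases what my_wcx_open allocated. */
void my_wcx_close(SP_stream *s)
{
    SP_free(s->wcx_info);
    s->wcx_info = NULL;
}
```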
SP_WcxCharTypeHook *wcx_chartype
typedef int (SP_WcxCharTypeHook) (int char_code);
This function should be prepared to take any char_code >= 128
and return one of the following constants:
CHT_LAYOUT_CHAR
CHT_SMALL_LETTER
CHT_CAPITAL_LETTER
CHT_SYMBOL_CHAR
CHT_SOLO_CHAR
Regarding the meaning of these syntactic categories, see Token String.
The value returned by this function is not expected to change over
time. For efficiency reasons, its behavior is therefore cached.
The cache is cleared by SP_set_wcx_hooks().
To help in implementing this function, SICStus Prolog provides the
function SP_latin1_chartype(), which returns the character type
category for the codes 1..255 according to the ISO 8859/1
standard.
Note that if a character code >= 512 is categorized as a
layout-char, and a character with this code occurs within an
atom being written out in quoted form (e.g. using writeq) in
native sicstus mode (as opposed to iso mode), then
this code will be output as itself, rather than an octal escape
sequence. This is because in sicstus mode escape sequences
consist of at most 3 octal digits.
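A wcx_chartype hook along these lines might delegate the codes below 256 to SP_latin1_chartype() and classify the remaining codes itself. The sketch below is an illustration only. The CHT_ values are placeholders, the SP_latin1_chartype() shown is a crude stub standing in for the real function (whose signature is assumed to be int to int), and the treatment of wide codes (Greek capitals as capital letters, everything else as small letters) is an arbitrary choice made for the example. my_chartype is a hypothetical name.

```c
/* Stand-in constants (assumption): the values are placeholders. */
enum {
    CHT_LAYOUT_CHAR = 1,
    CHT_SMALL_LETTER,
    CHT_CAPITAL_LETTER,
    CHT_SYMBOL_CHAR,
    CHT_SOLO_CHAR
};

/* Crude stub standing in for the real SP_latin1_chartype(); it only
   approximates the upper half of ISO 8859/1. */
static int SP_latin1_chartype(int char_code)
{
    if (char_code == 0xA0)
        return CHT_LAYOUT_CHAR;                  /* no-break space */
    if (char_code >= 0xC0 && char_code <= 0xDE && char_code != 0xD7)
        return CHT_CAPITAL_LETTER;
    if (char_code >= 0xDF && char_code <= 0xFF && char_code != 0xF7)
        return CHT_SMALL_LETTER;
    return CHT_SYMBOL_CHAR;
}

/* Hypothetical wcx_chartype hook, called for codes >= 128. */
int my_chartype(int char_code)
{
    if (char_code < 256)
        return SP_latin1_chartype(char_code);    /* Latin-1 range */
    if (char_code >= 0x391 && char_code <= 0x3A9)
        return CHT_CAPITAL_LETTER;               /* Greek capitals */
    return CHT_SMALL_LETTER;                     /* default for wide codes */
}
```

Because the result is cached, the function should be a pure function of char_code, as it is here.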
SP_WcxConvHook *wcx_to_os
typedef char* (SP_WcxConvHook) (char *string, int context);
This function is normally called each time SICStus Prolog wishes to
communicate a string of possibly wide characters to the operating
system. However, if the string in question consists of ASCII
characters only, and WCX_PRESERVES_ASCII is in force, then
wcx_to_os may not be called, and the original string may be
passed to the operating system.
The first argument of wcx_to_os is a zero terminated string,
using the internal encoding of SICStus Prolog, namely UTF-8. The
function is expected to convert the string to a form required by the
operating system, in the context described by the second,
context argument, and to return the converted string. If no
conversion is needed, it should simply return its first argument.
Otherwise, the conversion should be done in a memory area controlled
by this function (preferably a static buffer, reused each time the
function is called).
The second argument specifies the context of conversion. It can be one of the following integer values:
WCX_FILE
WCX_OPTION
WCX_WINDOW_TITLE
WCX_C_CODE
SICStus Prolog provides a utility function SP_wci_code(), see
below, for obtaining a wide character code from a UTF-8 encoded
string, which can be used to implement the wcx_to_os hook
function.
The default wcx_to_os function depends on the
usage argument of SP_set_wcx_hooks(). If the value of
usage includes WCX_OS_UTF8, then the function does no
conversion, as the operating system uses the same encoding as
SICStus Prolog. If the value of usage includes
WCX_OS_8BIT, then the function decodes the UTF-8 encoded
string and converts this sequence of codes into a sequence of bytes
by truncating each code to 8 bits.
Note that the default wcx_to_os functions ignore their
context argument.
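The truncating behavior described for WCX_OS_8BIT can be sketched as a user-written wcx_to_os hook. This is not the actual default implementation: it decodes UTF-8 by hand instead of calling SP_wci_code(), it uses a fixed-size static buffer and silently cuts off longer strings, and my_to_os and OS_BUF_SIZE are hypothetical names. As the text suggests, the argument itself is returned when no conversion is needed.

```c
#include <stddef.h>

#define OS_BUF_SIZE 1024        /* arbitrary limit chosen for the sketch */

/* Hypothetical wcx_to_os hook: UTF-8 in, one byte per character out,
   each code truncated to its low 8 bits. */
char *my_to_os(char *string, int context)
{
    static char buf[OS_BUF_SIZE];               /* reused on every call */
    const unsigned char *p = (const unsigned char *)string;
    const unsigned char *q;
    size_t out = 0;
    int needs_conversion = 0;

    (void)context;                              /* ignored in this sketch */

    for (q = p; *q != 0; q++) {
        if (*q >= 0x80) { needs_conversion = 1; break; }
    }
    if (!needs_conversion)
        return string;                          /* pure ASCII: pass through */

    while (*p != 0 && out < OS_BUF_SIZE - 1) {
        unsigned code = *p++;
        int extra = 0;

        if (code >= 0xF0)      { code &= 0x07; extra = 3; }
        else if (code >= 0xE0) { code &= 0x0F; extra = 2; }
        else if (code >= 0xC0) { code &= 0x1F; extra = 1; }

        /* Fold in the continuation bytes that are actually present. */
        while (extra-- > 0 && (*p & 0xC0) == 0x80)
            code = (code << 6) | (*p++ & 0x3F);

        buf[out++] = (char)(code & 0xFF);       /* truncate to 8 bits */
    }
    buf[out] = '\0';
    return buf;
}
```

Note that truncation maps some codes (for example 256) to a zero byte, which ends the resulting C string early; a real hook may prefer to substitute a replacement character.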
SP_WcxConvHook *wcx_from_os
typedef char* (SP_WcxConvHook) (char *string, int context);
This function is called each time SICStus Prolog receives from the
operating system a zero terminated sequence of bytes possibly
encoding a wide character string. The function is expected to
convert the byte sequence, if needed, to a string in the internal
encoding of SICStus Prolog (UTF-8), and return the converted
string. The conversion should be done in a memory area controlled by
this function (preferably a static buffer, reused each time the
function is called, but different from the buffer used in
wcx_to_os).
The second argument specifies the context of conversion, as in the
case of wcx_to_os.
SICStus Prolog provides a utility function SP_code_wci(), see
below, for converting a character code (up to 31 bits) into UTF-8
encoding, which can be used to implement the wcx_from_os hook
function.
The default wcx_from_os function depends on the
usage argument of SP_set_wcx_hooks(). If the value of
usage includes WCX_OS_UTF8, then the function does no
conversion. If the value of usage includes
WCX_OS_8BIT, then the function transforms the string of 8-bit
codes into a UTF-8 encoded string.
Note that the default wcx_from_os functions ignore their
context argument.
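The opposite direction can be sketched in the same way: a wcx_from_os hook that treats each incoming byte as an 8-bit code and emits its UTF-8 encoding. Again this is an illustration rather than the actual default; it encodes by hand instead of calling SP_code_wci(), it uses its own static buffer (separate from any buffer used by a wcx_to_os hook) and silently cuts off input that does not fit, and my_from_os and FROM_OS_BUF_SIZE are hypothetical names.

```c
#include <stddef.h>

#define FROM_OS_BUF_SIZE 2048   /* arbitrary limit chosen for the sketch */

/* Hypothetical wcx_from_os hook: 8-bit codes in, UTF-8 out. */
char *my_from_os(char *string, int context)
{
    static char buf[FROM_OS_BUF_SIZE];          /* reused on every call */
    const unsigned char *p = (const unsigned char *)string;
    size_t out = 0;

    (void)context;                              /* ignored in this sketch */

    /* Keep room for a two-byte sequence plus the terminating zero. */
    while (*p != 0 && out + 2 < FROM_OS_BUF_SIZE) {
        unsigned code = *p++;

        if (code < 0x80) {
            buf[out++] = (char)code;            /* ASCII is unchanged */
        } else {
            buf[out++] = (char)(0xC0 | (code >> 6));
            buf[out++] = (char)(0x80 | (code & 0x3F));
        }
    }
    buf[out] = '\0';
    return buf;
}
```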