Users can have complete control over the way wide characters are handled by SICStus Prolog if they supply their own definitions of appropriate hook functions. A set of such functions, implementing a specific environment for handling wide characters, is called a WCX box. A sample WCX box is described below (see A Sample WCX Box).
The WCX hook functions are plugged in by calling
void SP_set_wcx_hooks ( int usage, SP_WcxOpenHook *wcx_open, SP_WcxCloseHook *wcx_close, SP_WcxCharTypeHook *wcx_chartype, SP_WcxConvHook *wcx_from_os, SP_WcxConvHook *wcx_to_os);
The effect of SP_set_wcx_hooks() is controlled by the value of usage. The remaining arguments are pointers to appropriate hook functions or NULL values. A NULL value means that the default for that hook is used.
There are three independent aspects to be controlled. usage should be supplied as the bitwise OR of one chosen constant for each aspect. The defaults have the value 0, so they need not be included. The aspects are the following:
1. Default code set. This decides the default behavior of the wcx_open and wcx_chartype hook functions. If both are supplied by the user, the choice of default is irrelevant. The possible values are:
   WCX_USE_LATIN1 (default)
   WCX_USE_UTF8
   WCX_USE_EUC
These select the default behavior corresponding to the iso_8859_1, utf8, and euc settings, respectively; see WCX Environment Variables.
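2. Default OS encoding. The flags below determine which functions are used for conversion from and to the operating system encoding if such functions are not supplied by the user through the wcx_from_os and wcx_to_os arguments. If both are supplied by the user, the choice of default is irrelevant.
   WCX_OS_8BIT (default): the default conversion functions truncate each code to 8 bits when passing strings to the operating system. They convert 8-bit codes to UTF-8 when receiving strings from it.
   WCX_OS_UTF8: the operating system is assumed to use UTF-8, the internal encoding of SICStus Prolog, so no conversion is done.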
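3. ASCII preservation. This is important if some of the conversion functions are user-defined. These are wcx_from_os, wcx_to_os, wcx_getc, and wcx_putc, described later. In such cases it may be beneficial to inform SICStus Prolog whether the supplied encoding functions preserve ASCII characters. (The default encodings do preserve ASCII.)
   WCX_PRESERVES_ASCII (default): codes below 128 are encoded as themselves, so SICStus Prolog may handle ASCII characters and strings without calling the conversion hooks.
   WCX_CHANGES_ASCII: the conversion hooks are used for ASCII characters as well.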
We now describe the role of the arguments following usage in the argument list of SP_set_wcx_hooks().
SP_WcxOpenHook *wcx_open
typedef void (SP_WcxOpenHook) (SP_stream *s, SP_atom option, int context);
This function is called by SICStus Prolog for each stream s that is opened. It is not called when the encoding to be used for the stream is pre-specified. That is the case for binary files, files opened using the wci option, and C streams created with the contexts SP_STREAMHOOK_WCI and SP_STREAMHOOK_BIN.
The main task of the wcx_open hook is to associate the two WCX-processing functions with the stream. It does so by storing them in the appropriate fields of the SP_stream data structure:
SP_WcxGetcHook *wcx_getc; SP_WcxPutcHook *wcx_putc;
These fields are pointers to the functions performing the external decoding and encoding as described below. They are initialized to functions that truncate to 8 bits on output and zero-extend to 31 bits on input.
SP_WcxGetcHook *wcx_getc
typedef int (SP_WcxGetcHook) (int first_byte, SP_stream *s, long *pbyte_count);
This function is generally invoked whenever a character has to be read from a stream. Before this function is invoked, however, SICStus Prolog itself reads a byte from the stream. If that byte is an ASCII character (its value is < 128) and WCX_PRESERVES_ASCII is in force, the byte is taken to be the next character code, and wcx_getc is not invoked. Otherwise, wcx_getc is invoked with the byte and the stream in question, and is expected to return the next character code.
The wcx_getc function may need to read additional bytes from the stream if first_byte signifies the start of a multi-byte character. A byte may be read from the stream s in the following way:
byte = s->sgetc((long)s->user_handle);
The wcx_getc function is expected to increment its *pbyte_count argument by 1 for each such byte read.
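The reading protocol just described can be illustrated with a minimal sketch of a wcx_getc hook for UTF-8. The sketch includes the 5- and 6-byte forms needed for 31-bit codes, and it is not one of the SICStus Prolog default functions. Several parts are assumptions:
- The SP_stream structure is a stand-in declaring only the two fields the hook uses. Real code gets it from sicstus.h.
- demo_sgetc() and demo_set_input() are hypothetical helpers that feed bytes from memory.
- Returning -1 for malformed input is a choice made by this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the SP_stream fields used here (assumption: the real
   structure in sicstus.h has more members and may differ in layout). */
typedef struct SP_stream SP_stream;
struct SP_stream {
    void *user_handle;
    int (*sgetc)(long handle);    /* next byte, or -1 at end of input */
};

/* Hypothetical in-memory byte source used in place of a real stream. */
static const unsigned char *demo_buf;
static size_t demo_pos, demo_len;

static void demo_set_input(const unsigned char *buf, size_t len)
{
    demo_buf = buf;
    demo_pos = 0;
    demo_len = len;
}

static int demo_sgetc(long handle)
{
    (void)handle;
    return demo_pos < demo_len ? demo_buf[demo_pos++] : -1;
}

/* Sketch of a UTF-8 wcx_getc hook. first_byte was already read by the
   caller; continuation bytes are read through s->sgetc and counted. */
static int utf8_getc(int first_byte, SP_stream *s, long *pbyte_count)
{
    int code, extra, i;

    assert(s != NULL && pbyte_count != NULL);
    if (first_byte < 0x80) return first_byte;          /* single byte */
    if (first_byte < 0xC0) return -1;                  /* stray continuation byte */
    if (first_byte < 0xE0)      { code = first_byte & 0x1F; extra = 1; }
    else if (first_byte < 0xF0) { code = first_byte & 0x0F; extra = 2; }
    else if (first_byte < 0xF8) { code = first_byte & 0x07; extra = 3; }
    else if (first_byte < 0xFC) { code = first_byte & 0x03; extra = 4; }
    else if (first_byte < 0xFE) { code = first_byte & 0x01; extra = 5; }
    else return -1;                                    /* 0xFE, 0xFF never lead */

    for (i = 0; i < extra; i++) {
        int byte = s->sgetc((long)s->user_handle);
        if (byte < 0) return -1;                       /* premature end of input */
        (*pbyte_count)++;                              /* one more byte consumed */
        if ((byte & 0xC0) != 0x80) return -1;          /* not a continuation byte */
        code = (code << 6) | (byte & 0x3F);
    }
    return code;
}
```

Take the euro sign, E2 82 AC, as an example. Once the caller has consumed the lead byte 0xE2, utf8_getc(0xE2, &s, &count) reads the two continuation bytes and returns 0x20AC. It also increases count by 2.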
The default wcx_open hook installs a wcx_getc function according to the usage argument. The three default external decoding functions are also available to users through the SP_wcx_getc() function (see WCX Utility Functions).
SP_WcxPutcHook *wcx_putc
typedef int (SP_WcxPutcHook) (int char_code, SP_stream *s, long *pbyte_count);
This function is generally invoked whenever a character has to be written to a stream. However, if the character code to be written is an ASCII character (its value is < 128) and WCX_PRESERVES_ASCII is in force, the code is written directly to the stream, and wcx_putc is not invoked. Otherwise, wcx_putc is invoked with the character code and the stream in question. It is expected to do whatever is needed to output the character code to the stream, which requires outputting one or more bytes. A byte value byte can be written to the stream s in the following way:
return_code = s->sputc(byte,(long)s->user_handle);
The wcx_putc function is expected to return the return value of the last invocation of s->sputc. If it is incapable of outputting the character code, it should return -1 as an error code instead. That may happen, for example, if the code to be output does not belong to the character code set in force. The function is also expected to increment its *pbyte_count argument by 1 for each byte written.
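The output side can be sketched the same way. This is a hypothetical UTF-8 wcx_putc hook and not a SICStus default. It writes bytes through s->sputc, counts each successful write, and returns the value of the last sputc call, as required above. If sputc fails, the hook returns that failing value. The SP_stream structure is again a minimal stand-in, and demo_sputc() is a hypothetical in-memory sink.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the SP_stream fields used here (assumption: the real
   structure in sicstus.h has more members and may differ in layout). */
typedef struct SP_stream SP_stream;
struct SP_stream {
    void *user_handle;
    int (*sputc)(int byte, long handle);   /* byte written, or -1 on error */
};

/* Hypothetical in-memory byte sink used in place of a real stream. */
static unsigned char demo_out[16];
static size_t demo_len;

static int demo_sputc(int byte, long handle)
{
    (void)handle;
    if (demo_len >= sizeof demo_out) return -1;
    demo_out[demo_len++] = (unsigned char)byte;
    return byte;
}

/* Sketch of a UTF-8 wcx_putc hook covering 31-bit codes (1 to 6 bytes). */
static int utf8_putc(int char_code, SP_stream *s, long *pbyte_count)
{
    static const unsigned char lead[7] = { 0, 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
    unsigned char bytes[6];
    unsigned int code;
    int n, i, rc = -1;

    assert(s != NULL && pbyte_count != NULL);
    if (char_code < 0) return -1;               /* not a character code */
    code = (unsigned int)char_code;
    if (code < 0x80) n = 1;
    else if (code < 0x800) n = 2;
    else if (code < 0x10000) n = 3;
    else if (code < 0x200000) n = 4;
    else if (code < 0x4000000) n = 5;
    else n = 6;

    /* Fill continuation bytes from the end, 6 payload bits each. */
    for (i = n - 1; i > 0; i--) {
        bytes[i] = (unsigned char)(0x80 | (code & 0x3F));
        code >>= 6;
    }
    bytes[0] = (unsigned char)(lead[n] | code);

    for (i = 0; i < n; i++) {
        rc = s->sputc(bytes[i], (long)s->user_handle);
        if (rc < 0) break;                      /* report the failing call */
        (*pbyte_count)++;
    }
    return rc;
}
```

For example, utf8_putc(0x20AC, &s, &count) writes E2 82 AC and returns 0xAC, the value returned by the last sputc call. It also adds 3 to count.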
The default wcx_open hook function installs a wcx_putc function according to the usage argument. The three default external encoding functions are also available to users through the SP_wcx_putc() function (see WCX Utility Functions).
When selecting these WCX-processing functions, the wcx_open hook can use its context and option arguments.
The option argument is an atom. The context argument encodes the context of invocation. It is one of the following values:
   SP_STREAMHOOK_STDIN, SP_STREAMHOOK_STDOUT, SP_STREAMHOOK_STDERR: the standard input, output, and error streams
   SP_STREAMHOOK_OPEN: a stream opened by open
   SP_STREAMHOOK_NULL: a stream opened by open_null_stream
   SP_STREAMHOOK_LIB
   SP_STREAMHOOK_C, SP_STREAMHOOK_C+1, ...: a stream created from C by SP_make_stream()
The option argument comes from the user and can carry WCX-related information to be associated with the stream being opened. For example, it can be used to implement a scheme supporting multiple encodings, supplied on a stream-by-stream basis, as shown in the sample WCX box (see A Sample WCX Box).
If the stream is opened from Prolog code, the option argument for this hook function is derived from the wcx(Option) option of open/4 and load_files/2. If this option is not present, or if the stream is opened by some other built-in predicate, the value of the wcx Prolog flag is passed on to the open hook.
If the stream is opened from C via SP_make_stream(), the option argument is the value of the Prolog flag wcx.
There is also a variant of SP_make_stream(), called SP_make_stream_context(). It takes two additional arguments, the option and the context, which are passed on to the wcx_open hook (see WCX Foreign Interface).
The wcx_open hook can associate the information derived from option with the stream in question by means of a new field in the SP_stream data structure, void *wcx_info, which is initialized to NULL. The hook may need more room than this field provides, or the encoding to be implemented may require keeping track of a state. In either case the wcx_open hook should allocate a sufficient amount of memory for the information and/or the state using SP_malloc(), and store a pointer to that memory in wcx_info.
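The sketch below shows one way a wcx_open hook might use wcx_info. It is hypothetical throughout:
- The SP_stream structure is a stand-in with only the fields used here.
- The option is taken as a C string rather than an SP_atom. Real code would first obtain the atom's name.
- malloc() stands in for SP_malloc().
- The option names "ascii" and "latin1" and struct my_wcx_info are invented for illustration.

The installed hooks read a per-stream limit from wcx_info and reject codes above it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in declarations (assumption: the real ones are in sicstus.h). */
typedef struct SP_stream SP_stream;
typedef int (SP_WcxGetcHook)(int first_byte, SP_stream *s, long *pbyte_count);
typedef int (SP_WcxPutcHook)(int char_code, SP_stream *s, long *pbyte_count);

struct SP_stream {
    void *user_handle;
    int (*sputc)(int byte, long handle);
    SP_WcxGetcHook *wcx_getc;
    SP_WcxPutcHook *wcx_putc;
    void *wcx_info;                       /* NULL until wcx_open sets it */
};

/* Hypothetical per-stream information kept in wcx_info. */
struct my_wcx_info {
    int max_code;                         /* largest code this stream accepts */
};

static int bounded_getc(int first_byte, SP_stream *s, long *pbyte_count)
{
    const struct my_wcx_info *info = s->wcx_info;
    (void)pbyte_count;                    /* no additional bytes are read */
    return first_byte <= info->max_code ? first_byte : -1;
}

static int bounded_putc(int char_code, SP_stream *s, long *pbyte_count)
{
    const struct my_wcx_info *info = s->wcx_info;
    int rc;

    if (char_code < 0 || char_code > info->max_code) return -1;
    rc = s->sputc(char_code, (long)s->user_handle);
    if (rc >= 0) (*pbyte_count)++;
    return rc;
}

/* Sketch of a wcx_open hook; option_name stands in for the SP_atom. */
static void my_wcx_open(SP_stream *s, const char *option_name, int context)
{
    struct my_wcx_info *info;

    assert(s != NULL && option_name != NULL);
    (void)context;                        /* this sketch ignores the context */
    info = malloc(sizeof *info);          /* real code: SP_malloc() */
    if (info == NULL) return;             /* keep the default hooks */
    info->max_code = strcmp(option_name, "ascii") == 0 ? 0x7F : 0xFF;
    s->wcx_info = info;
    s->wcx_getc = bounded_getc;
    s->wcx_putc = bounded_putc;
}
```

Each stream gets its own my_wcx_info block, so two streams opened with different options behave independently. The memory is released by the wcx_close hook.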
The default wcx_open hook function ignores its option and context arguments. It sets the wcx_getc and wcx_putc stream fields to functions that perform the external decoding and encoding according to the usage argument of SP_set_wcx_hooks().
SP_WcxCloseHook *wcx_close
typedef void (SP_WcxCloseHook) (SP_stream *s);
This hook function is called whenever a stream is closed, provided that the wcx_open hook was invoked when the stream was created. The argument s points to the stream being closed. The hook can be used to implement the closing activities related to external encoding, e.g. freeing any memory allocated by the wcx_open hook.
The default wcx_close hook function does nothing.
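A matching wcx_close hook only needs to release what wcx_open allocated. In this minimal sketch, free() stands in for the SICStus deallocation function that corresponds to SP_malloc(). The SP_stream structure is again a stand-in with only the wcx_info field.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the one SP_stream field used here (assumption). */
typedef struct SP_stream SP_stream;
struct SP_stream {
    void *wcx_info;        /* set by wcx_open, NULL otherwise */
};

/* Sketch of a wcx_close hook that releases per-stream WCX data. */
static void my_wcx_close(SP_stream *s)
{
    assert(s != NULL);
    free(s->wcx_info);     /* real code: the SICStus counterpart of SP_malloc() */
    s->wcx_info = NULL;    /* guard against a stale pointer */
}
```

Because wcx_info starts out as NULL and free(NULL) is a no-op, the hook is safe even for streams whose wcx_open hook stored nothing.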
SP_WcxCharTypeHook *wcx_chartype
typedef int (SP_WcxCharTypeHook) (int char_code);
This function should be prepared to take any char_code >= 128 and return one of the following constants:
CHT_LAYOUT_CHAR
CHT_SMALL_LETTER
CHT_CAPITAL_LETTER
CHT_SYMBOL_CHAR
CHT_SOLO_CHAR
Regarding the meaning of these syntactic categories, see Token String.
The value returned by this function is not expected to change over time, so its behavior is cached for efficiency. The cache is cleared by SP_set_wcx_hooks().
To help in implementing this function, SICStus Prolog provides the function SP_latin1_chartype(). It returns the character type category for the codes 1..255 according to the ISO 8859/1 standard.
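The following is a minimal sketch of a wcx_chartype hook. It relies on several assumptions:
- The CHT_ constants have stand-in values. The real values come from sicstus.h.
- latin1_chartype_standin() is a hypothetical substitute for SP_latin1_chartype().
- Its ranges are assumed: layout up to 160; symbol chars for 161..191, 215, and 247; letters otherwise.
- Beyond 255 it recognizes only the basic Greek letter ranges and the ideographic space U+3000, as an example.
- Every other code is treated as a symbol char. That is a policy choice of this sketch, not a SICStus default.

```c
#include <assert.h>

/* Stand-in values for the CHT_ constants (assumption: the real values
   are defined in sicstus.h). */
enum {
    CHT_LAYOUT_CHAR, CHT_SMALL_LETTER, CHT_CAPITAL_LETTER,
    CHT_SYMBOL_CHAR, CHT_SOLO_CHAR
};

/* Hypothetical substitute for SP_latin1_chartype(), codes 128..255 only. */
static int latin1_chartype_standin(int c)
{
    if (c <= 0xA0) return CHT_LAYOUT_CHAR;              /* C1 controls, no-break space */
    if (c == 0xD7 || c == 0xF7) return CHT_SYMBOL_CHAR; /* multiplication, division signs */
    if (c >= 0xDF) return CHT_SMALL_LETTER;             /* 0xDF..0xFF */
    if (c >= 0xC0) return CHT_CAPITAL_LETTER;           /* 0xC0..0xDE */
    return CHT_SYMBOL_CHAR;                             /* 0xA1..0xBF */
}

/* Sketch of a wcx_chartype hook; the result depends only on char_code,
   which matters because SICStus Prolog caches it. */
static int my_wcx_chartype(int char_code)
{
    assert(char_code >= 128);
    if (char_code <= 0xFF) return latin1_chartype_standin(char_code);
    if (char_code == 0x3000) return CHT_LAYOUT_CHAR;    /* ideographic space */
    if (char_code >= 0x0391 && char_code <= 0x03A9) return CHT_CAPITAL_LETTER; /* Greek capitals */
    if (char_code >= 0x03B1 && char_code <= 0x03C9) return CHT_SMALL_LETTER;   /* Greek small */
    return CHT_SYMBOL_CHAR;                             /* policy choice */
}
```

In a real hook, the first branch would simply call SP_latin1_chartype(char_code).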
Note a special case for character codes >= 512 that are categorized as layout-char. If such a character occurs within an atom written out in quoted form (e.g. using writeq) in native sicstus mode, as opposed to iso mode, the code is output as itself rather than as an octal escape sequence. This is because in sicstus mode escape sequences consist of at most 3 octal digits, which cannot represent codes above 511.
SP_WcxConvHook *wcx_to_os
typedef char* (SP_WcxConvHook) (char *string, int context);
This function is normally called each time SICStus Prolog wishes to communicate a string of possibly wide characters to the operating system. However, if the string consists of ASCII characters only and WCX_PRESERVES_ASCII is in force, wcx_to_os may not be called, and the original string may be passed to the operating system.
The first argument of wcx_to_os is a zero-terminated string in the internal encoding of SICStus Prolog, namely UTF-8. The function is expected to convert the string to the form required by the operating system and to return the converted string. The conversion is done in the context described by the second argument, context. If no conversion is needed, the function should simply return its first argument. Otherwise, the conversion should be done in a memory area controlled by this function, preferably a static buffer reused each time the function is called.
The second argument specifies the context of conversion. It can be one of the following integer values:
WCX_FILE
WCX_OPTION
WCX_WINDOW_TITLE
WCX_C_CODE
SICStus Prolog provides a utility function SP_wci_code(), see below, for obtaining a wide character code from a UTF-8 encoded string. It can be used to implement the wcx_to_os hook function.
The default wcx_to_os function depends on the usage argument of SP_set_wcx_hooks(). If the value of usage includes WCX_OS_UTF8, the function does no conversion, because the operating system uses the same encoding as SICStus Prolog. If the value of usage includes WCX_OS_8BIT, the function decodes the UTF-8 encoded string and converts the resulting sequence of codes into a sequence of bytes by truncating each code to 8 bits.
Note that the default wcx_to_os functions ignore their context argument.
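The WCX_OS_8BIT behavior of the default wcx_to_os function can be sketched as follows. This is an illustration and not the SICStus Prolog source. The buffer size TO_OS_BUFSIZE is an assumption, overlong strings are silently truncated, and the input is assumed to be well-formed UTF-8. As the text requires, a pure-ASCII string is returned unchanged, and converted results are placed in a static buffer that is reused on each call.

```c
#include <assert.h>
#include <stddef.h>

#define TO_OS_BUFSIZE 1024   /* assumption: longer results are truncated */

/* Sketch of the WCX_OS_8BIT wcx_to_os behavior: decode the UTF-8 input
   and truncate each code to 8 bits. Input is assumed to be valid UTF-8. */
static char *my_wcx_to_os(char *string, int context)
{
    static char buf[TO_OS_BUFSIZE];       /* reused on every call */
    const unsigned char *p;
    size_t n = 0;

    assert(string != NULL);
    (void)context;                        /* ignored, like the defaults */

    for (p = (const unsigned char *)string; *p != '\0'; p++)
        if (*p >= 0x80) break;
    if (*p == '\0') return string;        /* pure ASCII: no conversion needed */

    p = (const unsigned char *)string;
    while (*p != '\0' && n < TO_OS_BUFSIZE - 1) {
        unsigned int code;
        int extra;

        if (*p < 0x80)      { code = *p;        extra = 0; }
        else if (*p < 0xE0) { code = *p & 0x1F; extra = 1; }
        else if (*p < 0xF0) { code = *p & 0x0F; extra = 2; }
        else if (*p < 0xF8) { code = *p & 0x07; extra = 3; }
        else if (*p < 0xFC) { code = *p & 0x03; extra = 4; }
        else                { code = *p & 0x01; extra = 5; }
        p++;
        /* Consume continuation bytes; stops early at NUL or a non-continuation byte. */
        while (extra > 0 && (*p & 0xC0) == 0x80) {
            code = (code << 6) | (*p & 0x3F);
            p++;
            extra--;
        }
        buf[n++] = (char)(code & 0xFF);   /* truncate to 8 bits */
    }
    buf[n] = '\0';
    return buf;
}
```

With this sketch, "caf\xC3\xA9" (café in UTF-8) becomes the Latin-1 bytes "caf\xE9". The euro sign U+20AC is truncated to the single byte 0xAC, which illustrates why the 8-bit default is lossy for codes above 255.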
SP_WcxConvHook *wcx_from_os
typedef char* (SP_WcxConvHook) (char *string, int context);
This function is called each time SICStus Prolog receives from the operating system a zero-terminated sequence of bytes, possibly encoding a wide character string. The function is expected to convert the byte sequence, if needed, to a string in the internal encoding of SICStus Prolog (UTF-8), and to return the converted string. The conversion should be done in a memory area controlled by this function. A static buffer reused on each call is preferable, but it should be different from the buffer used in wcx_to_os.
The second argument specifies the context of conversion, as in the case of wcx_to_os.
SICStus Prolog provides a utility function SP_code_wci(), see below, for converting a character code (up to 31 bits) into UTF-8 encoding. It can be used to implement the wcx_from_os hook function.
The default wcx_from_os function depends on the usage argument of SP_set_wcx_hooks(). If the value of usage includes WCX_OS_UTF8, the function does no conversion. If the value of usage includes WCX_OS_8BIT, the function transforms the string of 8-bit codes into a UTF-8 encoded string.
Note that the default wcx_from_os functions ignore their context argument.
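Here is a corresponding sketch of the WCX_OS_8BIT direction of wcx_from_os. Each byte below 128 is copied as is, and each byte at or above 128 becomes a two-byte UTF-8 sequence. This is again illustrative, not the SICStus implementation. FROM_OS_BUFSIZE is an assumption, and the static buffer is deliberately separate from the one a wcx_to_os hook would use.

```c
#include <assert.h>
#include <stddef.h>

#define FROM_OS_BUFSIZE 1024   /* assumption: longer results are truncated */

/* Sketch of the WCX_OS_8BIT wcx_from_os behavior: each input byte is an
   8-bit code, re-encoded as UTF-8 (1 byte below 128, otherwise 2 bytes). */
static char *my_wcx_from_os(char *string, int context)
{
    static char buf[FROM_OS_BUFSIZE];     /* separate from the wcx_to_os buffer */
    const unsigned char *p;
    size_t n = 0;

    assert(string != NULL);
    (void)context;                        /* ignored, like the defaults */

    for (p = (const unsigned char *)string; *p != '\0'; p++) {
        if (*p < 0x80) {
            if (n + 1 >= FROM_OS_BUFSIZE) break;   /* keep room for the NUL */
            buf[n++] = (char)*p;
        } else {
            if (n + 2 >= FROM_OS_BUFSIZE) break;
            buf[n++] = (char)(0xC0 | (*p >> 6));
            buf[n++] = (char)(0x80 | (*p & 0x3F));
        }
    }
    buf[n] = '\0';
    return buf;
}
```

For example, the Latin-1 string "caf\xE9" becomes "caf\xC3\xA9", the UTF-8 form of café.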