

Full Prolog Syntax

A Prolog program consists of a sequence of sentences or lists of sentences. Each sentence is a Prolog term. How terms are interpreted as sentences is defined below (see section Syntax of Sentences as Terms). Note that a term representing a sentence may be written in any of its equivalent syntactic forms. For example, the 2-ary functor `:-' could be written in standard prefix notation instead of as the usual infix operator.
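
As a small illustration of equivalent syntactic forms, the sketch below writes one clause with the usual infix `:-' and another in standard prefix notation. The predicate names parent/2, grandparent/2 and ancestor/2 are invented for this example and do not come from the manual.

```prolog
% Facts used by both clauses below (hypothetical example data).
parent(tom, bob).
parent(bob, ann).

% A non-unit clause written with the usual infix operator.
grandparent(X, Z) :- parent(X, Y), parent(Y, Z).

% A non-unit clause written in standard prefix notation.
% The reader builds a term with principal functor :-/2 either way,
% so this is loaded as the clause  ancestor(X, Y) :- parent(X, Y).
':-'(ancestor(X, Y), parent(X, Y)).
```

After loading, the goals grandparent(tom, ann) and ancestor(tom, bob) should both succeed.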

Terms are written as sequences of tokens. Tokens are sequences of characters which are treated as separate symbols. Tokens include the symbols for variables, constants and functors, as well as punctuation characters such as brackets and commas.

We define below how lists of tokens are interpreted as terms (see section Syntax of Terms as Tokens). Each list of tokens which is read in (for interpretation as a term or sentence) has to be terminated by a full-stop token. Two tokens must be separated by a layout-text token if they could otherwise be interpreted as a single token. Layout-text tokens are ignored when interpreting the token list as a term, and may appear at any point in the token list.

We also define below how tokens are represented as strings of characters (see section Syntax of Tokens as Character Strings). But we start by describing the notation used in the formal definition of Prolog syntax (see section Notation).

Notation

  1. Syntactic categories (or non-terminals) are written thus: item. Depending on the section, a category may represent a class of either terms, token lists, or character strings.
  2. A syntactic rule takes the general form
    C --> F1 | F2 | F3
    
    which states that an entity of category C may take any of the alternative forms F1, F2, F3, etc.
  3. Certain definitions and restrictions are given in ordinary English, enclosed in { } brackets.
  4. A category written as C... denotes a sequence of one or more Cs.
  5. A category written as ?C denotes an optional C. Therefore ?C... denotes a sequence of zero or more Cs.
  6. A few syntactic categories have names with arguments, and rules in which they appear may contain meta-variables looking thus: X. The meaning of such rules should be clear by analogy with definite clause grammars (see section Term and Goal Expansion).
  7. In the section describing the syntax of terms and tokens (see section Syntax of Terms as Tokens) particular tokens of the category name are written thus: name, while tokens which are individual punctuation characters are written literally.

Syntax of Sentences as Terms

sentence          --> module : sentence
                   |  list
                         { where list is a list of sentence }
                   |  clause
                   |  directive
                   |  grammar-rule

clause            --> non-unit-clause | unit-clause

directive         --> command | query

non-unit-clause   --> head :- body

unit-clause       --> head
                         { where head is not otherwise a sentence }

command           --> :- body

query             --> ?- body

head              --> module : head
                   |  goal
                         { where goal is not a variable }

body              --> module : body
                   |  body -> body ; body
                   |  body -> body
                   |  \+ body
                   |  body ; body
                   |  body , body
                   |  goal

goal              --> term
                         { where term is not otherwise a body }

grammar-rule      --> gr-head --> gr-body

gr-head           --> module : gr-head
                   |  gr-head , terminals
                   |  non-terminal
                         { where non-terminal is not a variable }

gr-body           --> module : gr-body
                   |  gr-body -> gr-body ; gr-body
                   |  gr-body -> gr-body
                   |  \+ gr-body
                   |  gr-body ; gr-body
                   |  gr-body , gr-body
                   |  non-terminal
                   |  terminals
                   |  gr-condition

non-terminal      --> term
                         { where term is not otherwise a gr-body }

terminals         --> list | string

gr-condition      --> ! | { body }

module            --> atom
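
The following sketch shows one sentence of each main kind defined above: a command, a unit-clause, non-unit-clauses whose bodies use `->', `;' and `\+', and grammar-rules with terminals and a gr-condition. All predicate and non-terminal names (seen/1, classify/2, unseen/1, digits//1, digit//1) are invented for illustration.

```prolog
:- dynamic seen/1.              % command:  :- body

seen(nothing).                  % unit-clause

% non-unit-clause; the body combines  body -> body ; body  forms
classify(X, Class) :-
    (   X < 0   -> Class = negative
    ;   X =:= 0 -> Class = zero
    ;   Class = positive
    ).

% non-unit-clause; the body has the form  \+ body
unseen(X) :- \+ seen(X).

% grammar-rules: [D] is a terminals list, { ... } is a gr-condition
digits([D|T]) --> digit(D), digits(T).
digits([D])   --> digit(D).
digit(D)      --> [D], { D >= 0'0, D =< 0'9 }.
```

For instance, phrase(digits(Ds), [49, 50]) is expected to bind Ds to [49, 50], since 49 and 50 are the character codes of the digits 1 and 2.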

Syntax of Terms as Tokens

term-read-in      --> subterm(1200) full-stop

subterm(N)        --> term(M)
                         { where M is less than or equal to N }

term(N)           --> op(N,fx) subterm(N-1)
                         { except in the case of a number }
                         { if subterm starts with a (,
                           op must be followed by layout-text }
                   |  op(N,fy) subterm(N)
                         { if subterm starts with a (,
                           op must be followed by layout-text }
                   |  subterm(N-1) op(N,xfx) subterm(N-1)
                   |  subterm(N-1) op(N,xfy) subterm(N)
                   |  subterm(N) op(N,yfx) subterm(N-1)
                   |  subterm(N-1) op(N,xf)
                   |  subterm(N) op(N,yf)

term(1000)        --> subterm(999) , subterm(1000)

term(0)           --> functor ( arguments )
                         { provided there is no layout-text between
                           the functor and the ( }
                   |  ( subterm(1200) )
                   |  { subterm(1200) }
                   |  list
                   |  string
                   |  constant
                   |  variable

op(N,T)           --> name
                         { where name has been declared as an
                           operator of type T and precedence N }

arguments         --> subterm(999)
                   |  subterm(999) , arguments

list              --> []
                   |  [ listexpr ]

listexpr          --> subterm(999)
                   |  subterm(999) , listexpr
                   |  subterm(999) | subterm(999)

constant          --> atom | number

number            --> unsigned-number
                   |  sign unsigned-number
                   |  sign inf
                   |  sign nan

unsigned-number   --> natural-number | unsigned-float

atom              --> name

functor           --> name
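
To make the precedence and associativity rules concrete, the sketch below compares a few terms written with operators against the same terms in standard functional notation. It assumes the usual default operator declarations (`:-' at 1200 xfx, `,' at 1000 xfy, `+' and `-' at 500 yfx, `*' at 400 yfx, `^' at 200 xfy). The predicate name precedence_examples/0 is invented for this example.

```prolog
% Every goal in the body is expected to succeed: the left-hand side
% uses operator syntax, the right-hand side spells out the same term.
precedence_examples :-
    1 + 2 * 3   == +(1, *(2, 3)),       % * (400) binds tighter than + (500)
    1 - 2 - 3   == -(-(1, 2), 3),       % yfx: groups to the left
    2 ^ 3 ^ 4   == ^(2, ^(3, 4)),       % xfy: groups to the right
    (a :- b, c) == ':-'(a, ','(b, c)),  % , (1000) binds tighter than :- (1200)
    [a, b | T]  == [a | [b | T]],       % listexpr with an explicit tail
    f((a, b))   == f(','(a, b)),        % one argument: the comma term is bracketed
    functor(f((a, b)), f, 1).
```

The last two goals reflect the rule that arguments are subterm(999): a term of precedence 1000 such as a,b can only appear as a single argument when enclosed in parentheses.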

Syntax of Tokens as Character Strings

By default, SICStus Prolog uses the ISO 8859/1 character set standard, but will alternatively support the EUC (Extended UNIX Code) standard. This is governed by the value of the environment variable SP_CTYPE (see section Getting Started).

The character categories used below are defined as follows in the two standards:

layout-char
In ISO 8859/1, these are character codes 0..32 and 127..159. In EUC, these are character codes 0..32 and 127. The common subset includes characters such as TAB, LFD, and SPC.
small-letter
In ISO 8859/1, these are character codes 97..122, 223..246, and 248..255. In EUC, these are character codes 97..122 and 128..255. The common subset includes the letters a through z.
capital-letter
In ISO 8859/1, these are character codes 65..90, 192..214, and 216..222. In EUC, these are character codes 65..90. The common subset is the letters A through Z.
digit
In both standards, these are character codes 48..57, i.e. the digits 0 through 9.
symbol-char
In ISO 8859/1, these are character codes 35, 36, 38, 42, 43, 45..47, 58, 60..64, 92, 94, 96, 126, 160..191, 215, and 247. In EUC, these are character codes 35, 36, 38, 42, 43, 45..47, 58, 60..64, 92, 94, 96, and 126. The common subset is
+-*/\^<>=`~:.?@#$&.
solo-char
In both standards, these are character codes 33 and 59, i.e. the characters ! and ;.
punctuation-char
In both standards, these are character codes 37, 40, 41, 44, 91, 93, and 123..125, i.e. the characters %(),[]{|}.
quote-char
In both standards, these are character codes 34 and 39, i.e. the characters " and '.
underline
In both standards, this is character code 95, i.e. the character _.

token             --> name
                   |  natural-number
                   |  unsigned-float
                   |  variable
                   |  string
                   |  punctuation-char
                   |  layout-text
                   |  full-stop

name              --> quoted-name
                   |  word
                   |  symbol
                   |  solo-char
                   |  [ ?layout-text ]
                   |  { ?layout-text }

quoted-name       --> ' ?quoted-item... '

quoted-item       --> char  { other than ' or \ }
                   |  ''
                   |  \ escape-sequence

word              --> small-letter ?alpha...

symbol            --> symbol-char...
                         { except in the case of a full-stop
                           or where the first 2 chars are /* }

natural-number    --> digit...
                   |  base ' alpha...
                         { where each alpha must be less than the base,
                         treating a,b,... and A,B,... as 10,11,... }
                   |  0 ' char-item
                         { yielding the character code for char }

char-item         --> char  { other than \ }
                   |  \ escape-sequence
  
base              --> digit...  { in the range [2..36] }

unsigned-float    --> simple-float
                   |  simple-float exp exponent

simple-float      --> digit... . digit...

exp               --> e  |  E

exponent          --> digit... | sign digit...

sign              --> - | +

variable          --> underline ?alpha...
                   |  capital-letter ?alpha...

string            --> " ?string-item... "

string-item       --> char  { other than " or \ }
                   |  ""
                   |  \ escape-sequence

layout-text       --> layout-text-item...

layout-text-item  --> layout-char | comment

comment           --> /* ?char... */
                         { where ?char... must not contain */ }
                   |  % ?char... LFD
                         { where ?char... must not contain LFD }

full-stop         --> .
                         { the following token, if any, must be layout-text }

char              --> { any character, i.e. }
                      layout-char
                   |  alpha
                   |  symbol-char
                   |  solo-char
                   |  punctuation-char
                   |  quote-char

alpha             --> capital-letter | small-letter | digit | underline

escape-sequence   --> b        { backspace, character code 8 }
                   |  t        { horizontal tab, character code 9 }
                   |  n        { newline, character code 10 }
                   |  v        { vertical tab, character code 11 }
                   |  f        { form feed, character code 12 }
                   |  r        { carriage return, character code 13 }
                   |  e        { escape, character code 27 }
                   |  d        { delete, character code 127 }
                   |  a        { alarm, character code 7 }
                   |  x alpha alpha
                         { treating a,b,... and A,B,... as 10,11,... }
                               { in the range [0..15], hex character code }
                   |  digit ?digit ?digit 
                               { in the range [0..7], octal character code }
                   |  ^ ?      { delete, character code 127 }
                   |  ^ capital-letter
                   |  ^ small-letter
                               { the control character alpha mod 32 }
                   |  c ?layout-char... { ignored }
                   |  layout-char  { ignored }
                   |  char    { other than the above, represents itself }
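
A few tokens covered by the rules above, written as goals that are each expected to succeed. This is only a sketch; the predicate name token_examples/0 and the sample atoms are invented for illustration.

```prolog
token_examples :-
    255 =:= 16'ff,          % natural-number:  base ' alpha...
    10  =:= 2'1010,         % base 2
    97  =:= 0'a,            % 0 ' char-item  yields a character code
    1.0e3  =:= 1000.0,      % unsigned-float:  simple-float exp exponent
    5.0e-1 =:= 0.5,         % the exponent may carry a sign
    atom(fooBar_1),         % word: small-letter followed by alphas
    atom('hello world'),    % quoted-name
    atom(++),               % symbol: a run of symbol-chars
    atom(!),                % solo-char
    atom({}),               % the name  { ?layout-text }
    var(_Tmp),              % variable starting with an underline
    var(_).                 % the underline alone is also a variable
```

Note that simple-float requires digits on both sides of the decimal point, so 1.0 is a float token whereas a digit sequence followed by a bare dot is a natural-number followed by a full-stop.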

Escape Sequences

A backslash occurring inside integers in `0'' notation or inside quoted atoms or strings has special meaning, and indicates the start of an escape sequence. Character escaping can be turned off for compatibility with old code. The following escape sequences exist:

\b
backspace (character code 8)
\t
horizontal tab (character code 9)
\n
newline (character code 10)
\v
vertical tab (character code 11)
\f
form feed (character code 12)
\r
carriage return (character code 13)
\e
escape (character code 27)
\d
\^?
delete (character code 127)
\a
alarm (character code 7)
\xCD
the character code CD (two hexadecimal digits)
\octal
the character code octal, read in base 8, where octal is up to 3 octal digits
\^char
the control character whose code is the code of char mod 32, where char is a letter.
\layout-char
A single layout character, for example a newline, is ignored.
\c
All characters up to, but not including, the next non-layout character are ignored.
\other
A character not mentioned in this table stands for itself. For example, `\\' inserts a single backslash and `\'' inserts a single quote.
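
The sketch below exercises some of these escape sequences, assuming character escaping is turned on. It also assumes an ISO-style built-in atom_codes/2 for inspecting the characters of an atom, which may go by another name in older releases. The predicate name escape_examples/0 is invented for this example.

```prolog
% Each goal is expected to succeed when character escapes are enabled.
escape_examples :-
    9  =:= 0'\t,                        % escape inside 0' notation
    10 =:= 0'\n,
    92 =:= 0'\\,                        % \\ stands for one backslash
    atom_codes('a\tb', [97, 9, 98]),    % escape inside a quoted atom
    atom_codes('a\\b', [97, 92, 98]),
    'don\'t' == 'don''t'.               % \' and '' both give a single quote
```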

Notes

  1. The expression of precedence 1000 (i.e. belonging to syntactic category term(1000)) which is written
    X,Y
    
    denotes the term ','(X,Y) in standard syntax.
  2. The parenthesized expression (belonging to syntactic category term(0))
    (X)
    
    denotes simply the term X.
  3. The curly-bracketed expression (belonging to syntactic category term(0))
    {X}
    
    denotes the term {}(X) in standard syntax.
  4. Note that, for example, -3 denotes a number whereas -(3) denotes a compound term which has the 1-ary functor - as its principal functor.
  5. The character " within a string must be written duplicated. Similarly for the character ' within a quoted atom.
  6. Backslashes in strings, quoted atoms, and integers written in `0'' notation denote escape sequences.
  7. A name token declared to be a prefix operator will be treated as an atom only if no term-read-in can be read by treating it as a prefix operator.
  8. A name token declared to be both an infix and a postfix operator will be treated as a postfix operator only if no term-read-in can be read by treating it as an infix operator.
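
Notes 1 to 4 can be checked directly with term comparison and type tests. The following is a sketch in which each goal is expected to succeed; the predicate name note_examples/0 is invented for illustration.

```prolog
note_examples :-
    (a, b) == ','(a, b),        % note 1: X,Y denotes ','(X,Y)
    (a)    == a,                % note 2: (X) denotes simply X
    {a}    == '{}'(a),          % note 3: {X} denotes {}(X)
    integer(-3),                % note 4: -3 is a number ...
    compound(-(3)),             % ... whereas -(3) is a compound term
    functor(-(3), -, 1).        % with the 1-ary functor - as principal functor
```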

