10.1 An Aggregation Operator for Data-Base-Style Queries—library(aggregate)

Data base query languages usually provide so-called "aggregation" operations. Given a relation, aggregation specifies

One might, for example, ask

         PRINT DEPT,SUM(AREA) WHERE OFFICE(_ID,DEPT,AREA,_OCCUPANT)

and get a table of <Department,TotalArea> pairs. The Prolog equivalent of this might be

         dept_office_area(Dept, TotalArea) :-
                 aggregate(sum(Area),
                     I^O^office(I,Dept,Area,O), TotalArea).

where Area is the column and sum(_) is the aggregation operator. We can also ask who has the smallest office in each department:

         smallest_office(Dept, Occupant) :-
                 aggregate(min(Area),
                         I^O^office(I,Dept,Area,O), MinArea),
                 office(_, Dept, MinArea, Occupant).

This module provides an aggregation operator in Prolog:

             aggregate(Template, Generator, Results)

where:

Results is unified with a form of the same structure as Template.

Things like mean and standard deviation can be calculated from sums, e.g. to find the average population of countries (defined as "if you sampled people at random, what would be the mean size of their answers to the question 'what is the population of your country?'?") we could do

     ?-  aggregate(x(sum(Pop),sum(Pop*Pop)),
                   Country^population(Country,Pop),
                   x(People,PeopleTimesPops)),
         AveragePop is PeopleTimesPops/People.

Note that according to this definition, aggregate/3 FAILS if there are no solutions. For max(_), min(_), and many other operations (such as mean(_)) this is the only sensible definition (which is why bagof/3 works that way). Even if bagof/3 yielded an empty list, aggregate/3 would still fail.

Concerning the minimum and maximum, it is convenient at times to know Which term had the minimum or maximum value. So we write

         min(Expression, Term)
         max(Expression, Term)

and in the constructed term we will have

         min(MinimumValue, TermForThatValue)
         max(MaximumValue, TermForThatValue)

So another way of asking who has the smallest office is

         smallest_office(Dept, Occupant) :-
                 aggregate(min(Area,O),
                         I^office(I,Dept,Area,O), min(_,Occupant)).

Consider queries like

         aggregate(sum(Pay), Person^pay(Person,Pay), TotalPay)

where for some reason pay/2 might have multiple solutions. (For example, someone might be listed in two departments.) We need a way of saying "treat identical instances of the Template as a single instance, UNLESS they correspond to different instances of a Discriminator." That is what

         aggregate(Template, Discriminator, Generator, Results)

does.

Operations available:

count
sum(1)
sum(E)
sum of values of E
min(E)
minimum of values of E
min(E,X)
min(E) with corresponding instance of X
max(E)
maximum of values of E
max(E,X)
max(E) with corresponding instance of X
set(X)
ordered set of instances of X
bag(X)
list of instances of X in generated order.
     bagof(X, G, B) :- aggregate(bag(X),    G, L).
     setof(X, G, B) :- aggregate(set(X), X, G, L).

Exported predicates:

forall(:Generator, :Goal)
succeeds when Goal is provable for each true instance of Generator. Note that there is a sort of double negation going on in here (it is in effect a nested pair of failure-driven loops), so it will never bind any of the variables which occur in it.
foreach(:Generator, :Goal)
for each proof of Generator in turn, we make a copy of Goal with the appropriate substitution, then we execute these copies in sequence. For example, foreach(between(1,3,I), p(I)) is equivalent to p(1), p(2), p(3).

Note that this is not the same as forall/2. For example, forall(between(1,3,I), p(I)) is equivalent to \+ \+ p(1), \+ \+ p(2), \+ \+ p(3).

The trick in foreach/2 is to ensure that the variables of Goal which do not occur in Generator are restored properly. (If there are no such variables, you might as well use forall/2.)

Like forall/2, this predicate does a failure-driven loop over the Generator. Unlike forall/2, the Goals are executed as an ordinary conjunction, and may succeed in more than one way.

aggregate(+Template, +Discriminator, :Generator, -Result)
is a generalisation of setof/3 which lets you compute sums, minima, maxima, and so on.
aggregate(+Template, :Generator, -Result)
is a generalisation of findall/3 which lets you compute sums, minima, maxima, and so on.
aggregate_all(+Template, +Discriminator, :Generator, -Result)
is like aggregate/4 except that it will find at most one solution, and does not bind free variables in the Generator.
aggregate_all(+Template, :Generator, -Result)
is like aggregate/3 except that it will find at most one solution, and does not bind free variables in the Generator.
free_variables(+Goal, +Bound, +Vars0, -Vars)
binds Vars to the union of Vars0 with the set of free variables in Goal, that is the set of variables which are captured neither by Bound nor by any internal quantifiers or templates in Goal. We have to watch out for setof/3 and bagof/3 themselves, for the explicit existential quantifier Vars^Goal, and for things like \+(_) which might look as though they bind variables but can't.
term_variables(+Term, +Vars0, -Vars)
binds Vars to a union of Vars0 and the variables which occur in Term. This doesn't take quantifiers into account at all. Could be defined as:
          term_variables(Term, Vars0, Vars) :-
          	nonvar(Term), !,
          	(   foreacharg(Arg,Term),
          	    fromto(Vars0,S0,S,Vars)
          	do  term_variables(Arg, S0, S)
          	).
          term_variables(Term, Vars0, [Term|Vars0]) :-
          	(   foreach(X,Vars0),
          	    param(Term)
          	do  X\==Term
          	), !.
          term_variables(_, Vars, Vars).
     

Send feedback on this subject.