### 10.1 An Aggregation Operator for Data-Base-Style Queries—`library(aggregate)`

Data base query languages usually provide so-called "aggregation" operations. Given a relation, aggregation specifies

• a column of the relation
• an operation, one of {sum,max,min,ave,var} or more

One might, for example, ask

```         PRINT DEPT,SUM(AREA) WHERE OFFICE(_ID,DEPT,AREA,_OCCUPANT)
```

and get a table of <Department,TotalArea> pairs. The Prolog equivalent of this might be

```         dept_office_area(Dept, TotalArea) :-
aggregate(sum(Area),
I^O^office(I,Dept,Area,O), TotalArea).
```

where Area is the column and `sum(_)` is the aggregation operator. We can also ask who has the smallest office in each department:

```         smallest_office(Dept, Occupant) :-
aggregate(min(Area),
I^O^office(I,Dept,Area,O), MinArea),
office(_, Dept, MinArea, Occupant).
```

This module provides an aggregation operator in Prolog:

```             aggregate(Template, Generator, Results)
```

where:

• Template is operator(expression) or constructor(arg,...,arg)
• each arg is operator(expression)
• operator is `sum | min | max` {for now}
• expression is an arithmetic expression

Results is unified with a form of the same structure as Template.

Things like mean and standard deviation can be calculated from sums, e.g. to find the average population of countries (defined as "if you sampled people at random, what would be the mean size of their answers to the question 'what is the population of your country?'?") we could do

```     ?-  aggregate(x(sum(Pop),sum(Pop*Pop)),
Country^population(Country,Pop),
x(People,PeopleTimesPops)),
AveragePop is PeopleTimesPops/People.
```

Note that according to this definition, `aggregate/3` FAILS if there are no solutions. For `max(_)`, `min(_)`, and many other operations (such as `mean(_)`) this is the only sensible definition (which is why `bagof/3` works that way). Even if bagof/3 yielded an empty list, aggregate/3 would still fail.

Concerning the minimum and maximum, it is convenient at times to know Which term had the minimum or maximum value. So we write

```         min(Expression, Term)
max(Expression, Term)
```

and in the constructed term we will have

```         min(MinimumValue, TermForThatValue)
max(MaximumValue, TermForThatValue)
```

So another way of asking who has the smallest office is

```         smallest_office(Dept, Occupant) :-
aggregate(min(Area,O),
I^office(I,Dept,Area,O), min(_,Occupant)).
```

Consider queries like

```         aggregate(sum(Pay), Person^pay(Person,Pay), TotalPay)
```

where for some reason `pay/2` might have multiple solutions. (For example, someone might be listed in two departments.) We need a way of saying "treat identical instances of the Template as a single instance, UNLESS they correspond to different instances of a Discriminator." That is what

```         aggregate(Template, Discriminator, Generator, Results)
```

does.

Operations available:

`count`
`sum(1)`
`sum(`E`)`
sum of values of E
`min(`E`)`
minimum of values of E
`min(`E`,`X`)`
`min(`E`)` with corresponding instance of X
`max(`E`)`
maximum of values of E
`max(`E`,`X`)`
`max(`E`)` with corresponding instance of X
`set(`X`)`
ordered set of instances of X
`bag(`X`)`
list of instances of X in generated order.
```     bagof(X, G, B) :- aggregate(bag(X),    G, L).
setof(X, G, B) :- aggregate(set(X), X, G, L).
```

Exported predicates:

`forall(`:Generator`, `:Goal`)`
succeeds when Goal is provable for each true instance of Generator. Note that there is a sort of double negation going on in here (it is in effect a nested pair of failure-driven loops), so it will never bind any of the variables which occur in it.
`foreach(`:Generator`, `:Goal`)`
for each proof of Generator in turn, we make a copy of Goal with the appropriate substitution, then we execute these copies in sequence. For example, `foreach(between(1,3,I), p(I))` is equivalent to `p(1), p(2), p(3)`.

Note that this is not the same as `forall/2`. For example, `forall(between(1,3,I), p(I))` is equivalent to `\+ \+ p(1), \+ \+ p(2), \+ \+ p(3)`.

The trick in `foreach/2` is to ensure that the variables of Goal which do not occur in Generator are restored properly. (If there are no such variables, you might as well use `forall/2`.)

Like `forall/2`, this predicate does a failure-driven loop over the Generator. Unlike `forall/2`, the Goals are executed as an ordinary conjunction, and may succeed in more than one way.

`aggregate(`+Template`, `+Discriminator`, `:Generator`, `-Result`)`
is a generalisation of `setof/3` which lets you compute sums, minima, maxima, and so on.
`aggregate(`+Template`, `:Generator`, `-Result`)`
is a generalisation of `findall/3` which lets you compute sums, minima, maxima, and so on.
`aggregate_all(+Template, +Discriminator, :Generator, -Result)`
is like `aggregate/4` except that it will find at most one solution, and does not bind free variables in the Generator.
`aggregate_all(+Template, :Generator, -Result)`
is like `aggregate/3` except that it will find at most one solution, and does not bind free variables in the Generator.
`free_variables(+Goal, +Bound, +Vars0, -Vars)`
binds Vars to the union of Vars0 with the set of free variables in Goal, that is the set of variables which are captured neither by Bound nor by any internal quantifiers or templates in Goal. We have to watch out for `setof/3` and `bagof/3` themselves, for the explicit existential quantifier Vars`^`Goal, and for things like `\+(_)` which might look as though they bind variables but can't.
`term_variables(`+Term`, `+Vars0`, `-Vars`)`
binds Vars to a union of Vars0 and the variables which occur in Term. This doesn't take quantifiers into account at all.

Send feedback on this subject.