Chapter 13. Sorting

LogiQL provides two special types of rules to support the sorting of values: seq, and list. The result of sorting by seq is an array-like representation, where the nth member in the sorted set can be looked up by n. The result of sorting by list is a linked-list-like representation, where members in the sorted set must be navigated to through first- and next-like operations.

The choice between seq and list depends on which representation is more suitable for your problem.

13.1. seq

A seq<<>> rule supports the sorting of a predicate into an array-like functional predicate. Sequence rules have the following form:

Sequence =
  Atom "<-" "seq" "<<" [ Identifier "=" SortKeys ] ">>" Formula "." .

SortKeys = Identifier
         | "(" Identifier { "," Identifier } ")".

Note

The optional text between the pair of angle brackets (<<...>>) is ignored. It is supported for backward compatibility with previous versions of LogiQL.

The formula in the body must consist of one atom. The head atom must contain all the variables that occur in the body atom, and an additional integer variable that does not occur in the body: that variable must be a key in the predicate named by the atom.

If this integer key is the only key of the predicate, then the tuples of the body atom are sorted, and the key indicates the position of each value tuple in the sorted sequence (starting at 0). This is best explained by a couple of simple examples.

Example 13.1. Sorting integers

Let a be the following unary predicate of integers:

a(x) -> int(x).

We can sort the contents of a by using the following sequence clause:

a_seq[i] = x -> int(i), int(x).
a_seq[i] = x <- seq<<>> a(x).

If a contains the tuples {(20), (60), (40)}, then the evaluation of the above rule results in a_seq[0]=20, a_seq[1]=40, and a_seq[2]=60.

The predicate a_seq must be explicitly declared, as in the example above. (The name of the predicate is not important, and may be arbitrary.)

Example 13.2. Sorting a binary predicate

b(x, y) -> string(x), string(y).
b_sort(i; x, y) -> int(i), string(x), string(y).

b_sort(i; x, y) <- seq<< >> b(x, y).

If predicate b contains the tuples {("a", "ab"), ("a", "aa"), ("b", "c")}, then b_sort will contain {(0, "a", "aa"), (1, "a", "ab"), (2, "b", "c")}.

Note the semicolon in b_sort(i; x, y): b_sort is a multi-valued functional predicate (see the section called “Multi-valued functional predicates”). This example illustrates a situation in which such predicates cannot be easily avoided.

The key of the sorted predicate can consist of several variables besides the indexing variable. If it does consist of more than one variable, then the fields that correspond to variables on the left of the indexing variable are sorted, forming groups: the remaining fields within each group are sorted separately.

All this is illustrated by the following examples.

Example 13.3. Sorting and grouping

create --unique

addblock <doc>
   produce(item, kind) -> string(item), string(kind).

   produce("carrot" , "vegetable").
   produce("apple"  , "fruit"    ).
   produce("parsley", "vegetable").
   produce("melon"  , "fruit"    ).
   produce("celery" , "vegetable").
   produce("mango"  , "fruit"    ).

   items(i; x, y) -> int(i), string(x), string(y).
   items(i; x, y) <- seq <<>> produce(x, y).

   by_kind(y, i; x) -> int(i), string(x), string(y).
   by_kind(y, i; x) <- seq <<>> produce(x, y).
</doc>
echo --- produce: ---
print produce
echo --- items: ---
print items
echo --- by_kind: ---
print by_kind

close --destroy

The output is:

created workspace 'unique_workspace_2017-10-07-17-22-41'
added block 'block_4LDQPDSB'
--- produce: ---
"apple"   "fruit"
"carrot"  "vegetable"
"celery"  "vegetable"
"mango"   "fruit"
"melon"   "fruit"
"parsley" "vegetable"
--- items: ---
0 "apple"   "fruit"
1 "carrot"  "vegetable"
2 "celery"  "vegetable"
3 "mango"   "fruit"
4 "melon"   "fruit"
5 "parsley" "vegetable"
--- by_kind: ---
"fruit"     0 "apple"
"fruit"     1 "mango"
"fruit"     2 "melon"
"vegetable" 0 "carrot"
"vegetable" 1 "celery"
"vegetable" 2 "parsley"
deleted workspace 'unique_workspace_2017-10-07-17-22-41'

Example 13.4.  An illustration of the flexibility of seq

create --unique

/////////////
addblock <doc>

b(x, y, z) -> string(x), string(y), string(z).

b("a", "ab", "abc").
b("a", "aa", "bac").
b("b", "cb", "cab").
b("b", "bc", "abc").
b("b", "bc", "aaa").
</doc>
echo --- b:
print b

/////////////
addblock <doc>

c0_sort(i; x, y, z)  -> int(i), string(x), string(y), string(z).
c1_sort(i; x, y, z)  -> int(i), string(x), string(y), string(z).
c2_sort(i; x, y, z)  -> int(i), string(x), string(y), string(z).
c3_sort(i; x, y, z)  -> int(i), string(x), string(y), string(z).
d0_sort(x, i; y, z)  -> int(i), string(x), string(y), string(z).
d1_sort(i, x; y, z)  -> int(i), string(x), string(y), string(z).
e0_sort[x, y, i] = z -> int(i), string(x), string(y), string(z).
e1_sort[x, i, y] = z -> int(i), string(x), string(y), string(z).

c0_sort(i; x, y, z)  <- seq<<>> b(x, y, z).
c1_sort(i; y, z, x)  <- seq<<>> b(x, y, z).
c2_sort(i; y, x, z)  <- seq<<>> b(x, y, z).
c3_sort(i; z, x, y)  <- seq<<>> b(x, y, z).
d0_sort(x, i; y, z)  <- seq<<>> b(x, y, z).
d1_sort(i, x; y, z)  <- seq<<>> b(x, y, z).
e0_sort[x, y, i] = z <- seq<<>> b(x, y, z).
e1_sort[x, i, y] = z <- seq<<>> b(x, y, z).
</doc>
echo --- c0_sort:
print c0_sort
echo --- c1_sort:
print c1_sort
echo --- c2_sort:
print c2_sort
echo --- c3_sort:
print c3_sort
echo --- d0_sort:
print d0_sort
echo --- d1_sort:
print d1_sort
echo --- e0_sort:
print e0_sort
echo --- e1_sort:
print e1_sort

close --destroy

The output is as follows. It might be instructive to notice the difference between c0_sort and c1_sort, as well as the similarity between d0_sort and e1_sort.

created workspace 'unique_workspace_2016-10-28-00-42-46'
added block 'block_1Z1B39FX'
--- b:
"a" "aa" "bac"
"a" "ab" "abc"
"b" "bc" "aaa"
"b" "bc" "abc"
"b" "cb" "cab"
added block 'block_1Z1DUZCQ'
--- c0_sort:
0 "a" "aa" "bac"
1 "a" "ab" "abc"
2 "b" "bc" "aaa"
3 "b" "bc" "abc"
4 "b" "cb" "cab"
--- c1_sort:
0 "aa" "bac" "a"
1 "ab" "abc" "a"
2 "bc" "aaa" "b"
3 "bc" "abc" "b"
4 "cb" "cab" "b"
--- c2_sort:
0 "aa" "a" "bac"
1 "ab" "a" "abc"
2 "bc" "b" "aaa"
3 "bc" "b" "abc"
4 "cb" "b" "cab"
--- c3_sort:
0 "aaa" "b" "bc"
1 "abc" "a" "ab"
2 "abc" "b" "bc"
3 "bac" "a" "aa"
4 "cab" "b" "cb"
--- d0_sort:
"a" 0 "aa" "bac"
"a" 1 "ab" "abc"
"b" 0 "bc" "aaa"
"b" 1 "bc" "abc"
"b" 2 "cb" "cab"
--- d1_sort:
0 "a" "aa" "bac"
1 "a" "ab" "abc"
2 "b" "bc" "aaa"
3 "b" "bc" "abc"
4 "b" "cb" "cab"
--- e0_sort:
"a" "aa" 0 "bac"
"a" "ab" 0 "abc"
"b" "bc" 0 "aaa"
"b" "bc" 1 "abc"
"b" "cb" 0 "cab"
--- e1_sort:
"a" 0 "aa" "bac"
"a" 1 "ab" "abc"
"b" 0 "bc" "aaa"
"b" 1 "bc" "abc"
"b" 2 "cb" "cab"
deleted workspace 'unique_workspace_2016-10-28-00-42-46'
 

13.2. list

A list rule can be used to sort a set of values into two result predicates, a first, and a next, thus providing a representation that is somewhat similar to a linked list.

A list rule has the following syntactic form:

List =
    Atom "," Atom "<-" "list" "<<" [ GroupBy ] ">>" Formula "." .

GroupBy = "group-by" "(" Identifiers ")" .

A list rule derives into two predicates: one representing the first tuple in the sort, and the other the next relation. If specified, the group-by results in a nested sort, wherein the variables in group-by are sorted first, and the remaining variables sorted within each value of group-by.

Example 13.5. Sorting using list without group-by

a(x) -> int(x).

first_a(x) -> int(x).
next_a(x, y) -> int(x), int(y).

first_a(x), next_a(x, y) <- list<<>> a(x).

This example illustrates the sorting of a into two predicates:

  • first_a, which contains the value of the first element in the sorted set;
  • next_a, which contains binary tuples of the form (x,y), where y is the successor of x in the sorted set.

If a contains {(20), (30), (25)}, then first_a contains {(20)}, and next_a contains {(20, 25), (25, 30)}

The optional group-by parameter can be used to perform sorting by group, in analogy with grouping in aggregate functions.

Example 13.6. Sorting with group-by

b(x, y)         -> int(x), int(y).
first_b(x, y)   -> int(x), int(y).
next_b(x, y, z) -> int(x), int(y), int(z).

first_b(x, y), next_b(x, y, z) <-
    list<< group-by(x) >> b(x, y).

Using group-by(x), the above list rule sorts only the values in the second argument position of b, for each unique x in the first argument position of b.

The predicate first_b contains sorted tuples of the form (x,y), such that

  • (x,y) belongs to b;
  • if b were sorted, (x,y) would be the first tuple in b that has x in the first position.

In other words, y is the first value in the second position in the sorted sequence of tuples that have x in the first position.

The predicate next_b contains tuples of the form (x,y,z) such that (x,z) would immediately follow (x,y) in b, if b were sorted.

If b contains {(1,2), (1,3), (1,4), (2,10), (2,11), (3,20)}, then first_b contains {(1,2), (2,10), (3,20)}, and next_b contains {(1,2,3), (1,3,4), (2,10,11)}.

It is possible to group by more than one variable. For example:

b(x, y, z)         -> int(x), int(y), int(z).
first_b(x, y, z)   -> int(x), int(y), int(z).
next_b(x, y, z, v) -> int(x), int(y), int(z), int(v).

first_b(x, y, z), next_b(x, y, z, v) <-
    list<< group-by(x, y) >> b(x, y, z).

If b contains {(1,2,0), (1,2,1), (1,3,0), (1,4,0), (1,4,1), (2,10,100)}, then first_b contains {(1,2,0), (1,3,0), (1,4,0), (2,10,100)}, and next_b contains {(1,2,0,1), (1,4,0,1)}.

Example 13.7. Sorting with multiple values

An element of the list may contain more than just one value. In the following example we group by the value of x, and additionally store pairs of numbers. next_b(x, y, z, ny, nz) should be read, roughly, as "for this value of x, the next pair after y and z is ny and nz.

b(x, y, z)              -> int(x), int(y), string(z).
first_b(x, y, z)        -> int(x), int(y), string(z).
next_b(x, y, z, ny, nz) -> int(x), int(y), string(z), int(ny), string(nz).

first_b(x, y, z), next_b(x, y, z, ny, nz) <- list<< group-by(x) >> b(x, y, z).

If b contains {(1,2,"3"), (1,3,"4"), (1,4,"5"), (2,10,"12"), (2,11,"13"), (3,20,"23")}, then first_b contains {(1,2,"3"), (2,10,"12"), (3,20,"23")}, and next_b contains {(1,2,"3",3,"4"), (1,3,"4",4,"5"), (2,10,"12",11,"13")}.

The group-by parameters, if present, must be a strict prefix of the key variables of the relation to be sorted. The following does not represent a valid use of group-by and will cause a run-time exception:

first_b(y, x), next_b(y, x, z) <-
   list<< group-by(y) >> b(x, y).

Here is another example that does not satisfy this condition:

c[x, y] = z -> int(x), int(y), int(z).
first_c(x, y, z) -> int(x), int(y), int(z).
next_c(x, y, z, u) -> int(x), int(y), int(z), int(u).

first_c(x, y, z), next_c(x, y, z, u) <-
   list<< group-by(x, y) >> c[x, y] = z.

In this case (x,y) is a prefix of the key variables (x,y) of c, but not a strict prefix. (The rule would therefore be asking the system to sort groups containing exactly one value each, which is a bit silly.)

More generally, the variable order of the first and next predicates must agree with the variable order of the predicate to be sorted. Thus, even with an empty group-by, a rule like the following would be rejected:

first(y, x), next(y, x, u, v) <- list<< >> b(x, y).

The required effect can be achieved by using an auxiliary predicate:

d(y, x) <- b(x, y).
first(y, x), next(y, x, u, v) <- list<< >> d(y, x).

Sorting Entities

Entities cannot be used as sorting variables. (They can, however, be used as group-by arguments.) Thus a rule like the following is illegal:

e(_) -> .
first(x), next(x, y) <- list<< >> e(x).

Note

The diagnostics for such an example might be somewhat difficult to interpret. They would look something like this:
Error: List P2P mapping cannot contain entity types, for P2P mapping defined at block_1Z1B37OB:22(1)--22(38):
Forall x::e,y::e .
first(x),next(x,y) <-
   list<< >>
      e(x).

Entities can be sorted indirectly via their constructor arguments.

Example 13.8. Sorting entities using their constructor predicates

e(_) -> .
cons[x] = y -> string(x), e(y).
lang:constructor(`cons).

arg(x) <- cons[x] = _.
arg:first(x) -> string(x).
arg:next(x, y) -> string(x), string(y).

arg:first(x), arg:next(x, y) <- list<< >> arg(x).

e:first(x) <- arg:first(y), cons[y] = x.
e:next(x, y) <- arg:next(u, v), cons[u] = x, cons[v] = y.