Chapter 20. Hierarchical Syntax

Hierarchical syntax is a notational convenience ("syntactic sugar") that makes it simpler to write formulas and expressions that involve hierarchically structured data. To a first approximation, you can think of hierarchical syntax as a mechanism to avoid writing the same arguments to predicates over and over again. Its use is best illustrated through examples.

Imagine you are inserting information about a new person into the workspace. Without hierarchical syntax you might write some code that looks like the following:

+person(p), +firstname[p]="John", +lastname[p]="Doe",
+street[p]="1384 West Peachtree Street", +city[p]="Atlanta"

If you instead use hierarchical syntax, you can write the same thing, but avoid having to repeatedly use p everywhere:

+person(p) {
  +firstname("John"),
  +lastname("Doe"),
  +street("1384 West Peachtree Street"),
  +city("Atlanta")
}

Based upon the arguments provided and the type of p, the compiler figures out that it must insert p as the first argument of all the atoms between the braces. Internally, the compiler then desugars the hierarchical version of this example to exactly the same logic as the non-hierarchical version. Alternatively, if you prefer to emphasize the functional nature of the predicates, you can write the example as:

+person(p) {
  +firstname[]="John",
  +lastname[]="Doe",
  +street[]="1384 West Peachtree Street",
  +city[]="Atlanta"
}

Furthermore, p is used only once so you can replace it with an underscore:

+person(_) {
  +firstname[]="John",
  +lastname[]="Doe",
  +street[]="1384 West Peachtree Street",
  +city[]="Atlanta"
}

Using an underscore in such situations avoids the need for unique names when creating multiple members of some entity type with their associated data. For example, without hierarchical syntax you would write:

+person(p1),
+firstname[p1]="John",
+lastname[p1]="Doe",
+street[p1]="1384 West Peachtree Street",
+city[p1]="Atlanta",
+person(p2),
+firstname[p2]="Jane",
+lastname[p2]="Doe",
+street[p2]="1384 West Peachtree Street",
+city[p2]="Atlanta"

Here, because it is necessary to distinguish the links between the persons and their associated relationships, we must choose to use distinct variable names, p1 and p2. With hierarchical syntax we can avoid this by just using underscores:

+person(_) {
  +firstname[]="John",
  +lastname[]="Doe",
  +street[]="1384 West Peachtree Street",
  +city[]="Atlanta"
},
+person(_) {
  +firstname[]="Jane",
  +lastname[]="Doe",
  +street[]="1384 West Peachtree Street",
  +city[]="Atlanta"
}

We call the formula just before the curly-braces the "head" of the hierarchical formula and the conjunction of atoms between the curly-braces the "body" of the hierarchical formula. Currently, we only allow conjunctions of atoms as the heads and bodies of hierarchical syntax. We may relax this restriction in the future based upon some additional study and user-provided use cases.

Here is a small example of how we can simplify some code using a conjunction of atoms in the head of a hierarchical formula. First we define a small schema:

person(p) ->.
car(c) ->.
name[p] = s -> person(p), string(s).
brand[c] = s -> car(c), string(s).
driven_by(c, p) -> car(c), person(p).

Without hierarchical syntax you might have written logic like the following:

person(p),
car(c),
name[p] = "Prefect",
brand[c] = "Ford",
driven_by(c, p)

With hierarchical syntax this can be written as the following formula:

(person(p), car(c)) {
  name[] = "Prefect",
  brand[] = "Ford",
  driven_by()
}

Again, based upon the arguments you have supplied in the hierarchical body and the types of p and c in the hierarchical head, the compiler determines that a use of p must be inserted into name, that a use of c must be inserted into brand, and that both c and p must be inserted into driven_by.

Inside hierarchical formulas we allow the use of hierarchical expressions. A hierarchical expression looks just like a hierarchical formula, but may be written anywhere we can write an expression like x, or foo[y]. The only restriction is that the head of a hierarchical expression must be a single atom denoting an entity. For example, you can use a hierarchical expression to simplify the following code:

+person(p),
+firstname[p] = "John",
+lastname[p] = "Doe",
+address(a),
+home[p] = a,
+street[a] = "1384 West Peachtree Street",
+city[a] = "Atlanta"

by writing

+person(_) {
  +firstname[] = "John",
  +lastname[] = "Doe",
  +home[] = +address(_) {
              +street[]="1384 West Peachtree Street"
              +city[]="Atlanta"
            }
}

When discussing aspects of hierarchical syntax that are not specific to just hierarchical formulas or just hierarchical expressions, we will use the generic term "a hierarchical".

There are limits to the compiler's ability to determine how to interpret hierarchical syntax. Given a hierarchical, the first thing the compiler does is collect a set of what we call "candidate" expressions from the head. Currently, candidate expressions can be only variables, literals, or integer expressions composed of literal values. For example, given the hierarchical

(person(p), age[p] = 42) { ... }

the candidate expressions would be p and 42. To ensure that there is always a unique interpretation of a hierarchical, we require that the types of the candidate expressions be disjoint. Roughly, you can understand disjoint to mean that the type of a candidate expression cannot be a subtype of another candidate expression or vice versa. For example, the compiler will disallow the code fragment

(person(p1), person(p2)) { ... }

because p1 and p2 both have the type person and therefore the compiler cannot decide when it should choose to insert a use of p1 rather than a use of p2. Writing the above logic would cause a HIER_AMBIGUOUS_HEAD_BINDING error to be reported.

Once the compiler has determined an unambiguous set of candidate expressions, it will start examining the atoms in the body of the hierarchical. If an atom in the hierarchical's body already has the correct number of arguments for the defined predicate, the atom will not be modified.

If the atom has fewer user-supplied arguments than expected, the compiler will begin the process of resolving the arguments. To make it easier to understand how this process works we will use a contrived example. Consider the following schema:

a(x) ->.
b(x) ->.
c(x) ->.
d(x) ->.
e(x) ->.
f(x) ->.

foo[x, y, z, w] = u -> a(x), b(y), c(z), d(w), e(u).

We will now step through the process used to resolve the arguments to the atom foo in the following logic:

q(x, y, z) <-
  (a(x), c(y), f(z)) {
    foo[w, u] = v
  }, b(w), d(u), e(v).

The first thing the compiler does is note that the value argument of the atom has already been specified. Therefore, the compiler will ignore the value argument for the rest of the resolution process.

Next, the compiler will start by simply ignoring the key arguments the user has supplied. We can visualize the compiler's current information about the types of the foo atom's arguments as follows:

foo[•, •, •, •] = v

Here • represents a "hole", i.e., an argument position still to be filled.

The compiler will now fill in all of the argument positions whose expected types are not disjoint from the candidate expressions. In this example, the candidate expressions are x, y, and z with types a, c, and f respectively. Because the first argument of foo should be an expression of type a and the third argument of foo should be an expression of type c, the compiler will insert x and y into these positions:

foo[x, •, y, •] = v

It is worth emphasizing that the compiler will only ever insert a candidate expression into a single hole, and that candidate expressions may go unused (z in our example).

Next, the compiler will take the two arguments that the user supplied to foo and fill them into the remaining holes left to right. This means w will be inserted in the second argument and u into the fourth argument:

foo[x, w, y, u] = v

Because all argument positions of the predicate foo are now filled, the resolution process is considered successful.

However, if we had started with a slightly different example, resolution could have failed at various points. For example, if the user had written

q(x, y, z) <-
  (a(x), c(y), f(z)) {
    foo[w] = v
  }, b(w), d(u), e(v).

then the compiler would still have filled in the candidate expressions as follows:

foo[x, •, y, •] = v

However, the compiler would then notice that the user has supplied only one key argument, w, while there are still two holes to be filled. It would report this as a HIER_ATOM_TOO_FEW_SUPPLIED error.

Another error would be detected if the user had written

q(x, y, z) <-
  (a(x), c(y), f(z)) {
    foo[w, u, u] = v
  }, b(w), d(u), e(v).

Again, the compiler would begin by inserting the two candidate expressions:

foo[x, •, y, •] = v

This time, the compiler would notice that the user supplied three key arguments (w, u, u), while there are only two holes to be filled. This would be reported as a HIER_ATOM_TOO_MANY_SUPPLIED error.

The resolution process could have also failed if foo had been declared differently. For example, if foo had been declared as

foo[x, y, z, w] = u -> a(x), a(y), c(z), d(w), e(u).

and then the user wrote the logic

q(x, y, z) <-
  (a(x), c(y), f(z)) {
    foo[u] = v
  }, b(w), d(u), e(v).

the resolution process would fail. The reason is that the compiler cannot determine whether to insert the x as the first argument or as the second argument of foo:

foo[•, •, •, •] = v

Again, this is because a candidate expression will only be inserted into a single hole per atom. This kind of failure will be reported as a HIER_AMBIGUOUS_BODY_ATOM error.

Finally, it should be noted that the value argument has a special status in the resolution of a functional predicate. If foo had been declared as:

foo[x, y, z, w] = u -> a(x), b(y), c(z), d(w), a(u).

the compiler would be able to resolve

q(x, y, z) <-
  (a(x), c(y), f(z)) {
    foo[w, u] = x
  }, b(w), d(u), e(v).

because, when the candidate expressions would be about to be filled in, the value argument would already have been determined:

      foo[•, •, •, •] = x

At this point there is only one hole where x could be inserted and satisfy the typing requirements. So there is no ambiguity. It is very important to understand that the compiler would be able to resolve this example only because it had the syntactic hint that x was to be used as the value argument. If the logic had been written as

q(x, y, z) <-
  (a(x), c(y), f(z)) {
    foo(w, u, x)
  }, b(w), d(y), e(v).

then just before the insertion of candidate expressions the atom would look like:

  foo[•, •, •, •] = •

This is because the compiler would no longer know that x is intended to be the value argument, so there would now be a hole in the position of the value argument. Furthermore, the candidate expression x could be inserted into two possible holes, the first argument and the value argument. Consequently, the compiler would report an ambiguity error (HIER_AMBIGUOUS_BODY_ATOM).

Syntax.  Hierarchical syntax extends the language with the following grammatical constructions:

Hierarchical = HierHead "{" HierBody "}" .

HierHead = Atom
         | "(" HierHeadConjunction ")" .

HierHeadConjunction = Atom { "," Atom } .

HierBody = HierAtom { "," HierAtom } .

HierAtom = [ Delta ] PredicateName [ "@" Stage ] "(" HierExpressions ")"
         | [ Delta ] PredicateName [ "@" Stage ] "[" HierExpressions "]" "="
              HierExpression .

HierExpressions = HierExpression { "," HierExpression } .

HierExpression = Hierarchical | Expression .