Chapter 22. Hierarchical Syntax
Hierarchical syntax is a notational convenience ("syntactic sugar") that makes it simpler to write formulas and expressions that involve hierarchically structured data. To a first approximation, you can think of hierarchical syntax as a mechanism to avoid writing the same arguments to predicates over and over again. Its use is best illustrated through examples.
Imagine you are inserting information about a new person into the workspace. Without hierarchical syntax you might write some code that looks like the following:
+person(p), +firstname[p]="John", +lastname[p]="Doe", +street[p]="1384 West Peachtree Street", +city[p]="Atlanta"
If you instead use hierarchical syntax, you can write the same
thing, but avoid having to repeatedly use p
everywhere:
+person(p) { +firstname("John"), +lastname("Doe"), +street("1384 West Peachtree Street"), +city("Atlanta") }
Based upon the arguments provided and the type of p
, the
compiler figures out that it must insert p
as the first
argument of all the atoms between the braces. Internally, the
compiler then desugars the hierarchical version of this example
to exactly the same logic as the non-hierarchical version.
Alternatively, if you prefer to emphasize the functional nature of
the predicates, you can write the example as:
+person(p) { +firstname[]="John", +lastname[]="Doe", +street[]="1384 West Peachtree Street", +city[]="Atlanta" }
Furthermore, p
is used only once so you
can replace it with an underscore:
+person(_) { +firstname[]="John", +lastname[]="Doe", +street[]="1384 West Peachtree Street", +city[]="Atlanta" }
Using an underscore in such situations avoids the need for unique names when creating multiple members of some entity type with their associated data. For example, without hierarchical syntax you would write:
+person(p1), +firstname[p1]="John", +lastname[p1]="Doe", +street[p1]="1384 West Peachtree Street", +city[p1]="Atlanta", +person(p2), +firstname[p2]="Jane", +lastname[p2]="Doe", +street[p2]="1384 West Peachtree Street", +city[p2]="Atlanta"
Here, because it is necessary to distinguish the links between
the persons and their associated relationships, we must choose
to use distinct variable names, p1
and p2
.
With
hierarchical syntax we can avoid this by just using underscores:
+person(_) { +firstname[]="John", +lastname[]="Doe", +street[]="1384 West Peachtree Street", +city[]="Atlanta" }, +person(_) { +firstname[]="Jane", +lastname[]="Doe", +street[]="1384 West Peachtree Street", +city[]="Atlanta" }
We call the formula just before the curly-braces the "head" of the hierarchical formula and the conjunction of atoms between the curly-braces the "body" of the hierarchical formula. Currently, we only allow conjunctions of atoms as the heads and bodies of hierarchical syntax. We may relax this restriction in the future based upon some additional study and user-provided use cases.
Here is a small example of how we can simplify some code using a conjunction of atoms in the head of a hierarchical formula. First we define a small schema:
person(p) ->. car(c) ->. name[p] = s -> person(p), string(s). brand[c] = s -> car(c), string(s). driven_by(c, p) -> car(c), person(p).
Without hierarchical syntax you might have written logic like the following:
person(p), car(c), name[p] = "Prefect", brand[c] = "Ford", driven_by(c, p)
With hierarchical syntax this can be written as the following formula:
(person(p), car(c)) { name[] = "Prefect", brand[] = "Ford", driven_by() }
Again, based upon the arguments you have supplied in the hierarchical body
and the types of p
and c
in the hierarchical
head, the compiler determines that a use of p
must be
inserted into name
, that a use of c
must be
inserted into brand
, and that both c
and
p
must be inserted into driven_by
.
Inside hierarchical formulas we allow the use of
hierarchical expressions.
A hierarchical expression
looks just like a hierarchical formula, but may be written
anywhere we can write an expression like x
, or
foo[y]
. The
only restriction is that the head of a hierarchical expression
must be a single atom denoting an entity. For example, you can
use a hierarchical expression to simplify the following code:
+person(p), +firstname[p] = "John", +lastname[p] = "Doe", +address(a), +home[p] = a, +street[a] = "1384 West Peachtree Street", +city[a] = "Atlanta"
by writing
+person(_) { +firstname[] = "John", +lastname[] = "Doe", +home[] = +address(_) { +street[]="1384 West Peachtree Street" +city[]="Atlanta" } }
When discussing aspects of hierarchical syntax that are not specific to just hierarchical formulas or just hierarchical expressions, we will use the generic term "a hierarchical".
There are limits to the compiler's ability to determine how to interpret hierarchical syntax. Given a hierarchical, the first thing the compiler does is collect a set of what we call "candidate" expressions from the head. Currently, candidate expressions can be only variables, literals, or integer expressions composed of literal values. For example, given the hierarchical
(person(p), age[p] = 42) { ... }
the candidate expressions would be p
and 42
. To
ensure that
there is always a unique interpretation of a hierarchical, we
require that the types of the candidate expressions be
disjoint. Roughly, you can understand disjoint to mean that the
type of a candidate expression cannot be a subtype of another
candidate expression or vice versa. For example, the compiler
will disallow the code fragment
(person(p1), person(p2)) { ... }
because p1
and p2
both have the type
person
and therefore
the compiler cannot decide when it should choose to insert a use
of p1
rather than a use of p2
.
Writing the above logic
would cause a HIER_AMBIGUOUS_HEAD_BINDING error to be reported.
Once the compiler has determined an unambiguous set of candidate expressions, it will start examining the atoms in the body of the hierarchical. If an atom in the hierarchical's body already has the correct number of arguments for the defined predicate, the atom will not be modified.
If the atom has fewer user-supplied arguments than expected, the compiler will begin the process of resolving the arguments. To make it easier to understand how this process works we will use a contrived example. Consider the following schema:
a(x) ->. b(x) ->. c(x) ->. d(x) ->. e(x) ->. f(x) ->. foo[x, y, z, w] = u -> a(x), b(y), c(z), d(w), e(u).
We will now step through the process used to resolve the
arguments to the atom foo
in the following logic:
q(x, y, z) <- (a(x), c(y), f(z)) { foo[w, u] = v }, b(w), d(u), e(v).
The first thing the compiler does is note that the value argument of the atom has already been specified. Therefore, the compiler will ignore the value argument for the rest of the resolution process.
Next, the compiler will start by simply
ignoring the key arguments the user has supplied. We can
visualize the compiler's current information about the types
of the foo
atom's arguments as follows:
foo[•, •, •, •] = v
Here • represents a "hole", i.e., an argument position still to be filled.
The compiler will now fill in all of the
argument positions whose expected types are not
disjoint from the candidate expressions. In this example, the
candidate expressions are x
, y
, and
z
with types a
, c
,
and f
respectively. Because the first argument of
foo
should be
an expression of type a
and the third argument of
foo
should be an expression of type c
, the
compiler will
insert x
and y
into these positions:
foo[x, •, y, •] = v
It is worth emphasizing that the compiler will only ever insert
a candidate expression into a single hole, and that candidate
expressions may go unused (z
in our example).
Next, the compiler will
take the two arguments that the user supplied to foo
and fill
them
into the remaining holes left to right. This means w
will be
inserted in the second argument and u
into the fourth
argument:
foo[x, w, y, u] = v
Because all argument positions of the predicate foo
are now filled, the resolution process is considered successful.
However, if we had started with a slightly different example, resolution could have failed at various points. For example, if the user had written
q(x, y, z) <- (a(x), c(y), f(z)) { foo[w] = v }, b(w), d(u), e(v).
then the compiler would still have filled in the candidate expressions as follows:
foo[x, •, y, •] = v
However, the compiler would then notice that the user has
supplied only one key argument, w
, while there are still two
holes to be filled. It would report this as a
HIER_ATOM_TOO_FEW_SUPPLIED error.
Another error would be detected if the user had written
q(x, y, z) <- (a(x), c(y), f(z)) { foo[w, u, u] = v }, b(w), d(u), e(v).
Again, the compiler would begin by inserting the two candidate expressions:
foo[x, •, y, •] = v
This time, the compiler would notice that the user supplied three
key arguments (w
, u
, u
),
while there are only two holes to
be filled. This would be reported as a
HIER_ATOM_TOO_MANY_SUPPLIED error.
The resolution process could have also failed if
foo
had been declared differently. For example, if
foo
had been declared as
foo[x, y, z, w] = u -> a(x), a(y), c(z), d(w), e(u).
and then the user wrote the logic
q(x, y, z) <- (a(x), c(y), f(z)) { foo[u] = v }, b(w), d(u), e(v).
the resolution process would fail. The reason is that the
compiler cannot determine whether to insert the x
as the
first
argument or as the second argument of foo
:
foo[•, •, •, •] = v
Again, this is because a candidate expression will only be inserted into a single hole per atom. This kind of failure will be reported as a HIER_AMBIGUOUS_BODY_ATOM error.
Finally, it should be noted that the value argument has a
special status in the resolution of a functional predicate.
If foo
had been declared as:
foo[x, y, z, w] = u -> a(x), b(y), c(z), d(w), a(u).
the compiler would be able to resolve
q(x, y, z) <- (a(x), c(y), f(z)) { foo[w, u] = x }, b(w), d(u), e(v).
because, when the candidate expressions would be about to be filled in, the value argument would already have been determined:
foo[•, •, •, •] = x
At this point there is only one hole where x
could
be inserted and satisfy the typing requirements. So there is no
ambiguity. It is very important to understand that the
compiler would be able to resolve this example only because it had
the syntactic hint that x
was to be used as the value
argument. If the logic had been written as
q(x, y, z) <- (a(x), c(y), f(z)) { foo(w, u, x) }, b(w), d(y), e(v).
then just before the insertion of candidate expressions the atom would look like:
foo[•, •, •, •] = •
This is because the compiler would no longer know that x
is
intended to be the value argument, so there would now be a hole in the
position of the
value argument. Furthermore, the candidate
expression x
could be inserted into two possible holes,
the first argument and the value argument. Consequently, the compiler
would report an ambiguity error
(HIER_AMBIGUOUS_BODY_ATOM
).
Syntax. Hierarchical syntax extends the language with the following grammatical constructions:
Hierarchical = HierHead "{" HierBody "}" . HierHead = Atom | "(" HierHeadConjunction ")" . HierHeadConjunction = Atom { "," Atom } . HierBody = HierAtom { "," HierAtom } . HierAtom = [ Delta ] PredicateName [ "@" Stage ] "(" HierExpressions ")" | [ Delta ] PredicateName [ "@" Stage ] "[" HierExpressions "]" "=" HierExpression . HierExpressions = HierExpression { "," HierExpression } . HierExpression = Hierarchical | Expression .