Chapter 5. Grammar

A LogiQL program is a structured text document conveying a description of a desired computation to the runtime engine. This chapter describes the syntax of well-formed programs. In doing so, it introduces terms used throughout the rest of this manual in sections that provide details of the program's intended semantics. (A program that is syntactically well-formed may, of course, be incorrect for other reasons: for example, one may attempt to use an undefined predicate.)

Note

Please note that the syntactic rules in the various sections may be simplified or partial: their purpose is to provide some precision and assist readability. The complete syntax of LogiQL is provided in Section 5.2, “The Complete Grammar”.

The LogiQL grammar is expressed using the Extended Backus-Naur Form, in its original form, due to Niklaus Wirth (see the original article ). This section summarizes the notation.

A grammar is a sequence of production rules, each of which defines a non-terminal symbol in terms of constituent terminal and non-terminal symbols. Non-terminal symbols appear as identifiers, while terminal symbols are surrounded either by double quotes (") or by single quotes, i.e., apostrophes (').

A production rule consists of the non-terminal symbol being defined, followed by an equality symbol (=) and the definition. The entire rule is terminated by a fullstop (.).

The definition is an expression formed of terminal and non-terminal symbols and special operators. The operators are listed below:

  • A vertical bar (|) separates two alternative subexpressions.
  • Curly braces ({ }) enclose an iterated subexpression, i.e., one that may occur zero or more times.
  • Square brackets ([ ]) enclose an optional subexpression, i.e., one that may occur once or not at all.
  • Parentheses are used to disambiguate expressions, just like in arithmetic expressions.

Additionally, we use %% to begin a comment that extends to the end of the line.

Example 5.1. A simple grammar

The following four rules define very simple arithmetic expressions. An expression is a factor, or a sequence of factors separated by multiplication or division operators. A factor is a binary number preceded by an optional sign.

Expression = Factor { ("*" | "/") Factor } .
Factor     = [ "+" | "-" ] Number .
Number     = Digit { Digit } .
Digit      = "0" | "1" .

So, for example, 1 is an expression, and so is -101*+01/1.

It is a common convention to allow whitespace characters (such as spaces or newlines) between terminal symbols, and to treat some nonterminal symbols (identifiers, numbers etc.) as "tokens" that should not contain whitespace. According to this convention, the string -101 * + 01 / 1 would also be an expression described by the rules in Example 5.1, “A simple grammar” (but 1 0 would not).

5.1. Major Grammatical Categories

A LogiQL program is made up of separately compiled compilation units, each of which is a sequence of clauses. A clause corresponds to a statement in traditional programming languages. In its full form, it comprises a head and a body separated by an arrow.

There are three types of clauses in LogiQL: constraints, rules and facts.

  • A constraint expresses an invariant property of the data contained in the program's workspace. It is written with a rightward-facing arrow. For instance, p(x) -> x > 0. is a constraint.

    (See Chapter 16, Constraints.)

  • A rule provides instructions for computing data and corresponds roughly to a function or procedure in traditional programming languages. It is written with a leftward-facing arrow. For instance, p(x) <- q(x), r(x). is a rule.

    (See Chapter 11, Rules and Section 19.2.2, “Delta rules”.)

  • A fact directly expresses the assertion or retraction of data in the workspace. It is a syntactically degenerate form of a rule: it contains no arrow, as there is nothing on its right-hand side. For instance, the fact fruit("plum"). is exactly equivalent to fruit("plum") <- . (a rule with an empty body).

    (See Section 10.1.1, “Atoms as facts” and Section 19.2.1, “Direct manipulation of EDB predicates”.)

The main syntactic element of a clause is the formula. For example, both the head and the body are formulas.

Elementary formulas are described further in this section. The non-elementary formulas are disjunctions, conjunctions and negated formulas. (See the section called “Conjunction and Disjunction” and the section called “Negation”.)

  • A disjunctive formula has several alternative subformulas: the formula is true if at least one of these subformulas is true. The disjunction operator is the semicolon. For instance, fruit(produce) ; vegetable(produce) is a disjunctive formula.

  • A conjunctive formula has several subformulas that must all be true for the entire formula to be true. The conjunction operator is the comma. For instance, male(person), adult(person) is a conjunctive formula.

  • A negated formula has one subformula: the negated formula is true if and only if the subformula is false. The negation operator is the exclamation mark. For instance, !human(x) is a negated formula.

The elementary formulas are atoms and comparisons:

  • the most common form of an atom is the name of a predicate followed by a parenthesised list of arguments, e.g., p(x, y) (see Section 10.1, “Atoms”);
  • a comparison of two expressions is quite similar to that found in conventional programming languages, e.g., x * y >= x + z (see Section 10.2, “Comparisons”).

An important special case of comparisons is equality between a function application and another expression: such a comparison is equivalent to an atom. For example, if predicate salary is a function that maps names to salaries, then there are two equivalent ways to write an atom that expresses, roughly, the salary of John Smith is x:

  • salary("John", "Smith", x)
  • salary["John", "Smith"] = x

(See Section 9.4, “Function applications” and Section 8.2, “Functional Predicates”.)

5.2. The Complete Grammar

The grammar given below has been abstracted from the actual parser. The names of nonterminals should not always be taken at face value: for instance, ArithmeticFormula is a formula that need not have anything to do with arithmetic (e.g., it might be a comparison of two strings).

In most cases it might be more instructive to look at the partial syntactic descriptions given in the sections describing the various constructs, even though not all of them are mutually consistent.

CompilationUnit =  { Clause "." } .

Clause = Constraint | Rule | Fact .

Constraint = Formula "->"
           | Formula "->" Formula
           | "->" Formula .

Rule = Formula "<-" [ [ Identifier ArgString ] Formula ]
     | InfixAggregate .

InfixAggregate = Expression InfixAggOp Expression .

InfixAggOp = "+=" | "min=" | "max=" | "&=" | "|=" .

Fact = Formula .

Formula = Conjunction { ";" Conjunction } .

Conjunction = Unary { "," Unary } .

Unary = [ "!" ]  Unit
      | [ "!" ]  "(" Formula ")" .

Unit = Atom | HierFormula | ArithmeticFormula .

Atom = [ Delta ] AtomName "(" [ ArgumentList ] ")" .

Delta = "+" | "-" | "^" .

AtomName = Name | Identifier .

Name = Identifier ":" IdentifierOrKeyword { ":"  IdentifierOrKeyword } .

ArgumentList = Expressions [ ";" [ Expressions ] ]
             | ";" Expressions
             | Identifier ":" (Identifier | Constant) .

Expressions = Expression { "," Expression } .


HierFormula = Atom "{" HierBody "}"
            | ArithmeticFormula "{" HierBody "}"
            | "(" Formula ")" "{" HierBody "}" .

HierBody = HierAtom { "," HierAtom } .


HierAtom = [ Delta ] AtomName "(" [ HierArgumentList ] ")"
         | AppExpression "=" HierExpression .

HierArgumentList = HierExpressions [ ";" ]
                 | [ HierExpressions ] ";" HierExpressions .

HierExpressions = HierExpression { "," HierExpression } .

HierExpression = Expression | HierFormula .


ArithmeticFormula = Expression ComparisonOperator Expression
                       { ChainedComparisonOperator Expression } .

ComparisonOperator = "=" | "!=" | ChainedComparisonOperator .

ChainedComparisonOperator = "<" | ">" | "<=" | ">=" .

Expression = AdditiveExpression { "orelse" AdditiveExpression } .

AdditiveExpression =
   MultiplicativeExpression { ("+" | "-") MultiplicativeExpression } .

MultiplicativeExpression = UnaryExpression { ("*" | "/") UnaryExpression } .

UnaryExpression = Constant
                | Identifier
                | AppExpression
                | "(" Expression ")" .

AppExpression = [ Delta ] PredicateExpression "[" [ HierExpressions ] "]" .

PredicateExpression =
   (Name | Identifier) [ Stage ] { RestOfPredicateExpression } .

RestOfPredicateExpression = "[" "]"
                          | "[" HierExpressions "]" [ Stage ]
                          | "[" Name [ Stage ] "]" [ Stage ] .

Stage = "@" StageName .

StageName = "INITAL" | "PREVIOUS" | "PREV" | "FINAL" | "START" | BranchName .

BranchName = BasicIdentifier .

Constant = StringConstant
         | BooleanConstant
         | Number .

BooleanConstant = "true" | "false" .

Number = [ "-" ] RealConstant
       | [ "-" ] DecimalConstant
       | [ "-" ] IntegerConstant .

IdentifierOrKeyword = Identifier
                    | "<-"
                    | "->"
                    | "!"
                    | "true"
                    | "false" .


%% ----- Lexemes:

ArgString = "<<" NotGT2 ">>" .

Identifier = [ "`" ] BasicIdentifier { ":" BasicIdentifier } .

BasicIdentifier = ("_" | Letter) { "_" | Letter | Digit } .

IntegerConstant = Digit { Digit } .

DecimalConstant = IntegerConstant "d"
                | [ IntegerConstant ] "." IntegerConstant [ "d" ] .

RealConstant = IntegerConstant Exponent
             | [ IntegerConstant ]  "." IntegerConstant  Exponent
             | IntegerConstant [ Exponent ] "f"
             | [ IntegerConstant ]  "." IntegerConstant  [ Exponent ] "f" .

Exponent = ("e" | "E") [ "+" | "-" ] IntegerConstant .

StringConstant = BasicStringConstant { BasicStringConstant } .

BasicStringConstant = '"' { NotDQuoteOrNewline } '"'
                    | '"""' NotTripleQuote '"""' .


%% ----- Classification of characters:

Digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" .

HexDigit = Digit
         | "A" | "B" | "C" | "D" | "E" | "F"
         | "a" | "b" | "c" | "d" | "e" | "f" .

%% Letter  is any Unicode character considered as a letter by Java.

%% NotDQuoteOrNewline  is any character other than a double quote or a newline.

%% NotTripleQuote  is any sequence of characters that does not contain a
%%                    sequence of three successive double quotes.

%% NotGT2  is any sequence of characters that doesn't include ">>".