LogicBlox 4 Reference Manual


About This Manual
I. Installing and Running LogicBlox
1. Obtaining and Installing LogicBlox
1.1. Install LogicBlox Natively (64-bit Linux or Mac OSX 10.10+)
1.2. Install LogicBlox Using Vagrant (Mac, Linux, Windows)
2. Starting and Stopping the LogicBlox Services
II. LogiQL
3. Introduction
4. Lexical Properties
4.1. Identifiers
4.2. Literals
4.3. Operators
4.4. Keywords
4.5. White space and comments
5. Grammar
5.1. Major Grammatical Categories
5.2. The Complete Grammar
6. Primitive Types
7. Built-in Operations
7.1. Comparisons
7.2. Arithmetic Operators
7.3. Arithmetic Functions
7.4. Rounding Functions
7.5. Integer Bit Manipulation Functions
7.6. String Operations
7.7. Boolean Operations
7.8. Date/Time Operations
7.9. Conversions
7.10. Currency
7.11. Unique Identifiers
7.12. Transaction Identifier
8. Predicates
8.1. Predicate Declaration
8.2. Functional Predicates
8.3. Entity Predicates
8.4. Reference-Mode Predicates
8.5. Constructor Predicates
8.6. Foreign Predicates
8.7. File Predicates
8.8. Derivation Types
8.9. Ordered Predicates
8.10. Local predicates
8.11. External Diff Predicates
8.12. Predicate Properties
9. Expressions
9.1. Literals
9.2. Variables
9.3. Arithmetic Operations
9.4. Function applications
9.5. Parenthesized expressions
10. Formulas
10.1. Atoms
10.2. Comparisons
10.3. Complex Formulas
11. Rules
11.1. Basics of IDB Rules
11.2. Value-constructing Rules
11.3. Derived-only Rules
11.4. Putting it all together: general recursion
12. Aggregations
13. Sorting
13.1. seq
13.2. list
14. Series
14.1. Semantics
14.2. runtotal
14.3. rndnum
15. Linear recursion
16. Constraints
16.1. Syntax and Interpretation
16.2. Common Forms of Constraints
16.3. Constraints as Predicate Declarations
17. Typing
17.1. Predicate Type Inference
17.2. Type checking
18. Default Values
18.1. Net Sales Example
18.2. Disjunctive Solution
18.3. Default Value Solution
18.4. Storage and Performance Implications
18.5. Consistent Default Values
18.6. Data Updates
18.7. Caveats
19. Transaction Logic
19.1. Preliminaries
19.2. Delta logic
19.3. Events
19.4. Stages
20. Hierarchical Syntax
21. Modules
21.1. ConcreteBlox
22. Hierarchical Import/Export
22.1. Using Hierarchical Import/Export
22.2. Set semantics for repeated fields
III. Projects
23. LogiQL Project
23.1. Project Structure
23.2. Compiling a Project
23.3. Installing a Project
IV. Web Services
24. Introduction
25. Configuration
25.1. Service Metadata
26. Protobuf and Global Protobuf Services
26.1. Implementing ProtoBuf/JSON Services
26.2. Implementing Global ProtoBuf/JSON Services
27. Data Exchange Services
27.1. Configuring Tabular Data Exchange Services
27.2. Dynamic Tabular Data Exchange Services
28. Implementing Custom Services
28.1. Custom ProtoBuf Services
29. Asynchronous Service calls
29.1. Asynchronous call return codes
30. Measure Service
30.1. Concepts
30.2. Measure Expression Grammar
31. Proxy Services
32. Extensions
32.1. Email Service
32.2. Password Management
32.3. ConnectBlox Services
V. Tools
33. LogicBlox Command Reference
33.1. Database Services Management
33.2. Workspace Commands
33.3. Replication
33.4. Unit Testing
33.5. Other Commands
34. Logging
35. Testing
36. The LB Configuration Tool
36.1. Creating config.py
36.2. Extending the LB configuration tool
37. Workbooks
37.1. Overview
37.2. Creating and deleting workbooks
37.3. Templates
37.4. Using services in workbooks
37.5. Commit and refresh, usage and configuration
37.6. Authorizing users
37.7. Building workbooks in workflows
37.8. Generating template instantiation data with workbook util
38. Workflows
38.1. Overview
38.2. The lb-workflow Workspace
38.3. The lb-workflow Language
39. Batch Execution
39.1. Executing a Batch Specification
39.2. Configuring Batch Execution
39.3. Simple Statements
39.4. Composite Statements
39.5. Writing Protobuf Messages by Hand
40. Extract Example
40.1. Command Overview
40.2. Command Options
40.3. Running the Extracted Example
40.4. Tutorials
VI. Branching
41. Database Branching
41.1. Special Branches
41.2. Illustration
41.3. Branches in LogiQL Rules
41.4. Command-line Utilities
41.5. Protobuf Interfaces
VII. Analytics
42. BloxOptimize
42.1. Introduction
42.2. Source Logic
42.3. Intermediate Language
42.4. Use of BloxOptimize
43. Mathematical Functions
43.1. Introduction
43.2. Mathematical Functions
VIII. Appendix
A. ConnectBlox
A.1. ConnectBlox TCP Transport
B. Compiler Errors and Warnings
B.1. MULTIPLE_VALUES
B.2. EDB_RULE
B.3. DIV_ZERO
B.4. NO_DECLARATION
B.5. CONSTRUCTOR_ILLEGAL_VALUETYPE
B.6. SIMILAR_VAR
B.7. SUBTYPE_MULTI
B.8. DYNAMIC_TYPE_CONSTRAINT
B.9. POLYMORPHIC_LITERAL
B.10. COMP_UNORDERED
B.11. MULTI_ENTITY_CREATION
B.12. SEVERITY_INCONSISTENT
B.13. Hierarchical syntax
B.14. Delta logic
B.15. Aggregations
B.16. Entities
B.17. Incremental Evaluation
B.18. File predicates
B.19. Derived-only predicates
C. Environment Variables
D. File Formats
D.1. Bytecode file format
D.2. Summary file format
Glossary

About This Manual

This is the Reference Manual for LogicBlox version 4. Differences between minor versions of LogicBlox are noted in the text, with reference to the relevant release numbers (e.g., 4.1.x).

This manual is not intended for use with older versions of LogicBlox: LogicBlox 4 introduced many changes, both in functionality and the language available to the user. If you are using an earlier release of LogicBlox, please refer to the appropriate manual.

Part I. Installing and Running LogicBlox

Chapter 1. Obtaining and Installing LogicBlox

There are various ways to install LogicBlox: if you have a machine running a 64-bit flavor of Linux or Mac OSX 10.10+, you can install it natively. Alternatively, you can run LogicBlox using the popular Vagrant tool. This chapter describes how to install LogicBlox using either method.

1.1. Install LogicBlox Natively (64-bit Linux or Mac OSX 10.10+)

If you run a 64-bit flavor of Linux or Mac OSX 10.10+, you can install LogicBlox natively on your machine.

System requirements

  • A 64-bit flavor of Linux or Mac OSX 10.10+. Verify that you're running 64-bit Linux by running uname -m, which should return x86_64.

  • At least 8GB of available memory.

  • Python 2.7 or newer (but not Python 3.x). Available from the Python Website.

  • Java Development Kit (JDK) version 8 or newer. Available from Oracle's Java website.

Installation steps

  • Download the latest LogicBlox tarball from the LogicBlox release page.

  • Extract the tarball (tar xzf LogicBlox-4.*.tar.gz).

  • The archive contains a script to set up all the environment variables required to run LogicBlox. Before using LogicBlox you must source this script, as follows:

    $ source logicblox-4.*/etc/profile.d/logicblox.sh

    Tip

    For convenience, you may want to add this line to your shell's startup script, e.g., to ~/.bash_profile or ~/.bashrc.

1.2. Install LogicBlox Using Vagrant (Mac, Linux, Windows)

Vagrant is a popular tool that can be used to easily set up development environments for projects with complex dependencies. Vagrant will automatically create, provision and manage a Linux virtual machine (VM) that contains all the dependencies required for each project. It will mount your project directory into the VM, which allows you to use your favorite editors to edit LogiQL code outside of the VM, and compile and test your project inside the VM.

System requirements

  • Mac OS X, Linux or Windows capable of running VirtualBox.

  • At least 8GB of available memory.

Installation steps

  • Download the latest LogicBlox tarball from the LogicBlox release page.

  • Download and install VirtualBox for your platform.

  • Download and install Vagrant for your platform.

  • For each project you're likely to work on:

    • Create a directory for your LogicBlox project.

    • Copy (or hardlink) the LogicBlox tarball you downloaded into this directory.

    • Download the latest version of the LogicBlox Vagrant file into this directory, e.g., using CURL (Mac/Linux): curl -O https://bitbucket.org/logicblox/lb-vagrant/raw/tip/Vagrantfile

    • Open a terminal (command line window) and cd into your project directory, then run:

      $ vagrant up

      This will create an Ubuntu Linux VM, install LogicBlox into it, and start the LogicBlox services automatically.

    • To run LogicBlox commands, login to the VM using SSH:

      $ vagrant ssh

      Vagrant on Windows

      vagrant ssh will not work out-of-the-box on Windows. To set it up, follow these instructions. Alternatively, you can ignore those instructions and run only single commands via SSH, which works well on Windows. See "Vagrant tips" below on how to do this.

The VM keeps running even when you're not logged into it via SSH. To clean it up you can either stop it or destroy it (note that destroying is perfectly OK, because project files are not stored in the VM itself).

To delete your development VM:

$ vagrant destroy

To stop it (can be started later again with vagrant up):

$ vagrant halt

Accessing Logs and Configuration

After you run vagrant up, a directory called lb_deployment is created within your project. Among its subdirectories are logs and config. These contain the current logs and configuration of the running LB instance. You can change the configuration and restart LB to let the changes take effect.

Accessing LB Web

Vagrant automatically maps the VM's ports 8080 and 8086 to the same ports on your host. As a result, you can access LB web via http://localhost:8080 as if you were running it natively. However, if port 8080 was already in use when you ran vagrant up, Vagrant will have picked another port automatically. Watch the vagrant up output for lines like:

Fixed port collision for 8080 => 8080. Now on port 8081.

to see what the port was mapped to.

Vagrant tips

  • If you only want to run a single command within the VM, you can use:

    $ vagrant ssh -c 'your command'

    For instance:

    $ vagrant ssh -c 'lb services status'

    or:

    $ vagrant ssh -c 'lb config && make check'

    This works on Windows too, without having to install Putty.

  • To make it easier for others to get started working on your LogicBlox-based project, consider checking in your Vagrantfile into your source control repository.

  • If your project has additional requirements, such as extra packages to install or extra ports to map, consider adding them to your project's Vagrantfile. The LogicBlox Vagrantfile contains some pointers on how to do this.

Chapter 2. Starting and Stopping the LogicBlox Services

Once the installation is completed, you are ready to use the LogicBlox services. These are:

  • lb-compiler-server: the (LogiQL) compiler server (see Part II);
  • lb-server: the LogicBlox database server;
  • lb-web-server: a server that allows you to access the LogicBlox system from a web browser (see Part IV).

The LogicBlox services can be started from the command line by typing:

$ lb services start 

The status of the LogicBlox services can be requested by typing:

$ lb services status 

And, finally, the services can be stopped again by the command:

$ lb services stop 

If you need to restart the services, you can simply run:

$ lb services restart 

For more information about lb see Chapter 33. An easy preliminary introduction to the most basic concepts can be found in Section 19.1.1, Section 19.1.2 and Section 19.1.4.

All errors are stored by default under $LB_DEPLOYMENT_HOME/logs/current/ (see Appendix C). If you have problems starting the services, the log messages in that folder can help you to resolve the issue.

Part II. LogiQL

Chapter 3. Introduction

LogiQL is the primary language of the LogicBlox system. It can be regarded as a dialect of Datalog.

A beginner cannot expect to master the language in an hour or two: there are lots of concepts and new terminology. It is difficult to develop an order of presentation that will satisfy all needs, so we tried to make reading easier by introducing cross-references in the text. You might also find it useful to consult the glossary.

The notation used for syntactic definitions is described in Chapter 5.

Most of the examples can be run with the lb tool. Basic information about how to use lb can be found in Section 19.1.

Chapter 4. Lexical Properties

Note

In the text of the manual we usually use this font for quoted text. This allows us to clearly distinguish, e.g., between the letter A, the letter A in single quotes ('A') and the letter A in double quotes ("A").

LogiQL programs are represented as Unicode text encoded as UTF-8. Programs consist of a number of tokens. Each token is matched using the longest match principle: whenever adding more input characters to a valid token will result in a new valid token, the longer token is the one used. For example, >= is parsed as a single token >= rather than as the two tokens > and =.

In this manual, the notation U+nnnn is used to indicate the Unicode code point whose hexadecimal number is nnnn. For example, U+0020 is the space character. More frequently, a character is described by quoting it. For example, A is the same character as U+0041. (Note that U+0041 is just a way to refer to the character in the text: you cannot actually replace an occurrence of A with U+0041 in a LogiQL program.)

4.1. Identifiers

Identifier = BasicIdentifier { ":" BasicIdentifier } .

BasicIdentifier = ("_" | Letter) { "_" | Letter | Digit } .

An identifier is a sequence of characters where each character is a Unicode letter, a Unicode numeric digit, an underscore (_), or a colon (:). However, the first character cannot be a digit or a colon.

Example 4.1. Identifiers

x
y
cost
sales_2010
PriceStoreSku
sku:cost

The identifier whose name consists of a single underscore character (_) is called the anonymous variable. It is used to denote the existence of a value that is not referenced by the clause containing it. (See Section 9.2 for more information.)
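For instance, in the following sketch (parent and is_parent are hypothetical predicates), the anonymous variable stands for a child whose identity is irrelevant to the rule:

is_parent(p) <- parent(p, _).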

4.2. Literals

In this section we describe the formats in which literals of various types can be expressed directly in LogiQL. The general idea is that the type of a literal be immediately visible to both the user and the compiler. Please note that when numbers are printed by the system (e.g., when listing the contents of a predicate), they will not appear in precisely this format. For example, 1f is a floating point literal, 1d and 1.1d are decimal fixed-point literals, and 1 is an integer literal. They would be printed out as 1.0, 1, 1.1 and 1, respectively. See also the section called “X:Y:convert” and the section called “string:T:convert”.

Integer Literals

IntegerLiteral = DecimalIntegerLiteral
               | HexadecimalIntegerLiteral | BinaryIntegerLiteral .

DecimalIntegerLiteral     = [ "-" ] DecimalDigit { DecimalDigit } .
HexadecimalIntegerLiteral = "0x" HexadecimalDigit { HexadecimalDigit } .
BinaryIntegerLiteral      = "0b" BinaryDigit { BinaryDigit } .

DecimalDigit     = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" .
HexadecimalDigit = "A" | "B" | "C" | "D" | "E" | "F" |
                 | "a" | "b" | "c" | "d" | "e" | "f" | DecimalDigit .
BinaryDigit      = "0" | "1" .

A decimal integer literal is a sequence of one or more decimal digits, optionally preceded by a minus sign. The value must be in the range of the int type. (See the section called “int” for details.)

Integers may also be written as hexadecimal or binary literals. Hexadecimal literals are prefixed with 0x and binary literals with 0b. Unlike decimal integer literals, they are interpreted as unsigned integers. Treating them as unsigned makes it more convenient to write literals involving the high bit of the underlying integer. For example, if they were interpreted as signed integers, it would not be possible to write 0xFFFFFFFFFFFFFFFF to express the integer where all bits are set: the only option available would be to write -1.

Example 4.2. Integer literal

0
-123
42
0b01010
0xABCDEF0
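
As a small illustration of the unsigned interpretation described above, the following sketch (p is a hypothetical nullary predicate) relies on 0xFFFFFFFFFFFFFFFF denoting the same 64-bit pattern as the decimal literal -1:

p() <- 0xFFFFFFFFFFFFFFFF = -1.  // all 64 bits set: two's complement -1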

int128 Literals

Int128Literal = [ "-" ] DecimalDigit { DecimalDigit } "q" .

DecimalDigit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" .

An int128 literal is a sequence of one or more decimal digits followed by q. The sequence of digits may optionally be preceded by a minus sign. The value must be in the range of the int128 type. (See the section called “int128” for details.)

Decimal Fixed-Point Literals

FixedPointLiteral = IntegerPart "d"
                  | [ IntegerPart ] "." DecimalPart [ "d" ] .

IntegerPart = DecimalIntegerLiteral .
DecimalPart = DecimalIntegerLiteral .

A decimal fixed-point literal specifies a decimal number. It is given as a decimal integer literal, followed by a period character and a decimal part, followed by an optional letter d. If d is given, the decimal part can be omitted. If the decimal part is given, the integer part can be omitted.

The value must be in the range of the decimal type. (See the section called “decimal” for details.)

Example 4.3. Fixed-point literals

31.555
31.555d
31d
.555
.555d

Binary Floating-Point Literals

FloatingPointLiteral = IntegerPart Exponent
                     | [ IntegerPart ] "." DecimalPart Exponent
                     | IntegerPart [ Exponent ] "f"
                     | [ IntegerPart ] "." DecimalPart [ Exponent ] "f" .

Exponent = ("e" | "E") [ "+" | "-" ] DecimalIntegerLiteral .

IntegerPart = DecimalIntegerLiteral .
DecimalPart = DecimalIntegerLiteral .

A binary floating-point literal specifies an IEEE 754 floating-point number. It is given as an integer part, followed by an optional decimal part, followed by an optional exponent part, followed by the letter f. (As the grammar shows, the f may be omitted when an exponent is present, and the integer part may be omitted when a decimal part is present.)

The decimal part, if present, is indicated with a period (.) followed by another integer literal. The exponent part, if specified, is indicated with the letter e or E, followed by an optional plus (+) or minus (-) sign, followed by an integer literal.

See the section called “float” for details about the range and precision of floating-point numbers.

Example 4.4. Floating point literals

31.555f
31e12f
31e12
.0e12
31.555e-12f
31f

Although f is not strictly necessary when the exponent is given, we recommend that it never be omitted. Otherwise editing a floating-point literal might easily turn it into a decimal literal, thus causing a type error.

Boolean Literals

There are two boolean literals, true and false.

true
false

String Literals

A single-line string literal is a double quote character (", U+0022), followed by zero or more character specifiers, followed by another double quote character.

A multi-line string literal is three double quote characters ("""), followed by zero or more character specifiers including unescaped newline characters, followed by another three double quote characters. (A multi-line string literal can, but need not span multiple lines. See also the note below.)

Each character specifier determines one character that will be included in the string. The possible character specifiers are as follows:

  • Any character except a double quote, a backslash (\, U+005C), or a newline (U+000A). The character specifies itself for inclusion in the string.
  • \", indicating a double quote character (U+0022).
  • \a, indicating an alert (U+0007).
  • \b, indicating a backspace (U+0008).
  • \f, indicating a form feed character (U+000C).
  • \n, indicating a newline (line feed) character (U+000A).
  • \r, indicating a carriage return character (U+000D).
  • \t, indicating a tab character (U+0009).
  • \v, indicating a vertical tab character (U+000B).
  • \\, indicating a single backslash (U+005C) (but see the note below!).
  • \', indicating a single quote character (U+0027).
  • \u followed by exactly four hexadecimal digits, indicating the Unicode character with the code point given by those hex digits. Hexadecimal digits that are letters may be given in upper or lower case.

Note

Some special characters may occur directly in a string. For example, if a is followed by a tabulator in "a b", then the string is equivalent to "a\tb".

In a multi-line string literal backslash characters are not escaped, so """\\\\""" contains four backslashes, unlike "\\\\", which contains two. """\\\""" is perfectly legal (three backslashes), whereas "\\\" is not (and would trigger an error).

One consequence of this is that p("""\""""). is equivalent to p("\\\""), rather than p("\""). So a multi-line string literal can contain up to two double quotes in a sequence. For example, p(""""""""). is equivalent to p("\"\""). (However, adding one more double quote would cause a compilation error, which is perhaps not surprising.)

Another consequence is that if you want to insert, say, a tabulator into a multi-line string literal, then you cannot do it with a \t. Instead, you must make sure that a real tabulator character is present between the triple double quotes.

Note also that a multi-line string literal is converted to a normal string, so, for example,

"""ab
c"""

is exactly equivalent to "ab\nc".

Example 4.5. String literals

"hello, world"
""
"He said, \"It's only logical.\"\n"
"\uDEADbeef"
"""This is just
one string"""

Predicate Literals

A predicate literal is a back quote (`, U+0060) followed by an identifier. For example:

Example 4.6. Predicate literals

`p
`q
`parent

4.3. Operators

The following character sequences are used as operators in LogiQL:

. :: : , ; <- -> = < >
!= <= >= ( ) / - + * ^
@ [ ] !

4.4. Keywords

The following character sequences are keywords in LogiQL:

true false

Note that other identifiers, such as the names of data types (e.g., datetime), clause names in the module system (e.g., aliases) and aggregation rule types (e.g., seq) are treated specially by LogiQL but are not reserved words. To make your program more readable, you should avoid using these identifiers for other purposes.

4.5. White space and comments

White space and comments are used to lay out code and to separate tokens that would otherwise combine due to the longest match principle. White space and comments are immediately discarded after being parsed.

White space is any sequence of the following characters: space (U+0020), tab (U+0009), form feed (U+000C), carriage return (U+000D), or line feed (U+000A, i.e., newline).

A comment can be written in either of two ways. One way is to start with a slash and an asterisk (/*). Such a comment continues until the first instance of an asterisk followed by a slash (*/). The second way is to start with two slashes (//): the comment then extends to the end of the line. Here are two examples of comments.

Example 4.7. Syntax of comments

// This is a comment

/* This is
   a multi-
   line comment
*/

Chapter 5. Grammar

A LogiQL program is a structured text document conveying a description of a desired computation to the runtime engine. This chapter describes the syntax of well-formed programs. In doing so, it introduces terms used throughout the rest of this manual in sections that provide details of the program's intended semantics. (A program that is syntactically well-formed may, of course, be incorrect for other reasons: for example, one may attempt to use an undefined predicate.)

Note

Please note that the syntactic rules in the various sections may be simplified or partial: their purpose is to provide some precision and assist readability. The complete syntax of LogiQL is provided in Section 5.2.

The LogiQL grammar is expressed using the Extended Backus-Naur Form, in its original form, due to Niklaus Wirth (see the original article). This section summarizes the notation.

A grammar is a sequence of production rules, each of which defines a non-terminal symbol in terms of constituent terminal and non-terminal symbols. Non-terminal symbols appear as identifiers, while terminal symbols are surrounded either by double quotes (") or by single quotes, i.e., apostrophes (').

A production rule consists of the non-terminal symbol being defined, followed by an equality symbol (=) and the definition. The entire rule is terminated by a full stop (.).

The definition is an expression formed of terminal and non-terminal symbols and special operators. The operators are listed below:

  • A vertical bar (|) separates two alternative subexpressions.
  • Curly braces ({ }) enclose an iterated subexpression, i.e., one that may occur zero or more times.
  • Square brackets ([ ]) enclose an optional subexpression, i.e., one that may occur once or not at all.
  • Parentheses are used to disambiguate expressions, just like in arithmetic expressions.

Additionally, we use %% to begin a comment that extends to the end of the line.

Example 5.1. A simple grammar

The following four rules define very simple arithmetic expressions. An expression is a factor, or a sequence of factors separated by multiplication or division operators. A factor is a binary number preceded by an optional sign.

Expression = Factor { ("*" | "/") Factor } .
Factor     = [ "+" | "-" ] Number .
Number     = Digit { Digit } .
Digit      = "0" | "1" .

So, for example, 1 is an expression, and so is -101*+01/1.

It is a common convention to allow whitespace characters (such as spaces or newlines) between terminal symbols, and to treat some nonterminal symbols (identifiers, numbers etc.) as "tokens" that should not contain whitespace. According to this convention, the string -101 * + 01 / 1 would also be an expression described by the rules in Example 5.1 (but 1 0 would not).

5.1. Major Grammatical Categories

A LogiQL program is made up of separately compiled compilation units, each of which is a sequence of clauses. A clause corresponds to a statement in traditional programming languages. In its full form, it comprises a head and a body separated by an arrow.

There are three types of clauses in LogiQL: constraints, rules and facts.

  • A constraint expresses an invariant property of the data contained in the program's workspace. It is written with a rightward-facing arrow. For instance, p(x) -> x > 0. is a constraint.

    (See Chapter 16.)

  • A rule provides instructions for computing data and corresponds roughly to a function or procedure in traditional programming languages. It is written with a leftward-facing arrow. For instance, p(x) <- q(x), r(x). is a rule.

    (See Chapter 11 and Section 19.2.2.)

  • A fact directly expresses the assertion or retraction of data in the workspace. It is a syntactically degenerate form of a rule: it contains no arrow, as there is nothing on its right-hand side. For instance, the fact fruit("plum"). is exactly equivalent to fruit("plum") <- . (a rule with an empty body).

    (See Section 10.1.1 and Section 19.2.1.)

The main syntactic element of a clause is the formula. For example, both the head and the body are formulas.

Elementary formulas are described further in this section. The non-elementary formulas are disjunctions, conjunctions and negated formulas. (See the section called “Conjunction and Disjunction” and the section called “Negation”.)

  • A disjunctive formula has several alternative subformulas: the formula is true if at least one of these subformulas is true. The disjunction operator is the semicolon. For instance, fruit(produce) ; vegetable(produce) is a disjunctive formula.

  • A conjunctive formula has several subformulas that must all be true for the entire formula to be true. The conjunction operator is the comma. For instance, male(person), adult(person) is a conjunctive formula.

  • A negated formula has one subformula: the negated formula is true if and only if the subformula is false. The negation operator is the exclamation mark. For instance, !human(x) is a negated formula.

The elementary formulas are atoms and comparisons:

  • the most common form of an atom is the name of a predicate followed by a parenthesised list of arguments, e.g., p(x, y) (see Section 10.1);
  • a comparison of two expressions is quite similar to that found in conventional programming languages, e.g., x * y >= x + z (see Section 10.2).

An important special case of comparisons is equality between a function application and another expression: such a comparison is equivalent to an atom. For example, if predicate salary is a function that maps names to salaries, then there are two equivalent ways to write an atom that expresses, roughly, the salary of John Smith is x:

  • salary("John", "Smith", x)
  • salary["John", "Smith"] = x

(See Section 9.4 and Section 8.2.)
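
To illustrate the equivalence just described, the two rules below (a sketch; well_paid and the threshold are hypothetical) derive exactly the same tuples:

well_paid(first, last) <- salary(first, last, x), x > 100000.
well_paid(first, last) <- salary[first, last] = x, x > 100000.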

5.2. The Complete Grammar

The grammar given below has been abstracted from the actual parser. The names of nonterminals should not always be taken at face value: for instance, ArithmeticFormula is a formula that need not have anything to do with arithmetic (e.g., it might be a comparison of two strings).

In most cases it might be more instructive to look at the partial syntactic descriptions given in the sections describing the various constructs, even though not all of them are mutually consistent.

CompilationUnit =  { Clause "." } .

Clause = Constraint | Rule | Fact .

Constraint = Formula "->"
           | Formula "->" Formula
           | "->" Formula .

Rule = Formula "<-" [ [ Identifier ArgString ] Formula ]
     | InfixAggregate .

InfixAggregate = Expression InfixAggOp Expression .

InfixAggOp = "+=" | "min=" | "max=" | "&=" | "|=" .

Fact = Formula .

Formula = Conjunction { ";" Conjunction } .

Conjunction = Unary { "," Unary } .

Unary = [ "!" ]  Unit
      | [ "!" ]  "(" Formula ")" .

Unit = Atom | HierFormula | ArithmeticFormula .

Atom = [ Delta ] AtomName "(" [ ArgumentList ] ")" .

Delta = "+" | "-" | "^" .

AtomName = Name | Identifier .

Name = Identifier ":" IdentifierOrKeyword { ":"  IdentifierOrKeyword } .

ArgumentList = Expressions [ ";" [ Expressions ] ]
             | ";" Expressions
             | Identifier ":" (Identifier | Constant) .

Expressions = Expression { "," Expression } .


HierFormula = Atom "{" HierBody "}"
            | ArithmeticFormula "{" HierBody "}"
            | "(" Formula ")" "{" HierBody "}" .

HierBody = HierAtom { "," HierAtom } .


HierAtom = [ Delta ] AtomName "(" [ HierArgumentList ] ")"
         | AppExpression "=" HierExpression .

HierArgumentList = HierExpressions [ ";" ]
                 | [ HierExpressions ] ";" HierExpressions .

HierExpressions = HierExpression { "," HierExpression } .

HierExpression = Expression | HierFormula .


ArithmeticFormula = Expression ComparisonOperator Expression
                       { ChainedComparisonOperator Expression } .

ComparisonOperator = "=" | "!=" | ChainedComparisonOperator .

ChainedComparisonOperator = "<" | ">" | "<=" | ">=" .

Expression = AdditiveExpression { "orelse" AdditiveExpression } .

AdditiveExpression =
   MultiplicativeExpression { ("+" | "-") MultiplicativeExpression } .

MultiplicativeExpression = UnaryExpression { ("*" | "/") UnaryExpression } .

UnaryExpression = Constant
                | Identifier
                | AppExpression
                | "(" Expression ")" .

AppExpression = [ Delta ] PredicateExpression "[" [ HierExpressions ] "]" .

PredicateExpression =
   (Name | Identifier) [ Stage ] { RestOfPredicateExpression } .

RestOfPredicateExpression = "[" "]"
                          | "[" HierExpressions "]" [ Stage ]
                          | "[" Name [ Stage ] "]" [ Stage ] .

Stage = "@" StageName .

StageName = "INITAL" | "PREVIOUS" | "PREV" | "FINAL" | "START" | BranchName .

BranchName = BasicIdentifier .

Constant = StringConstant
         | BooleanConstant
         | Number .

BooleanConstant = "true" | "false" .

Number = [ "-" ] RealConstant
       | [ "-" ] DecimalConstant
       | [ "-" ] IntegerConstant .

IdentifierOrKeyword = Identifier
                    | "<-"
                    | "->"
                    | "!"
                    | "true"
                    | "false" .


%% ----- Lexemes:

ArgString = "<<" NotGT2 ">>" .

Identifier = [ "`" ] BasicIdentifier { ":" BasicIdentifier } .

BasicIdentifier = ("_" | Letter) { "_" | Letter | Digit } .

IntegerConstant = Digit { Digit } .

DecimalConstant = IntegerConstant "d"
                | [ IntegerConstant ] "." IntegerConstant [ "d" ] .

RealConstant = IntegerConstant Exponent
             | [ IntegerConstant ]  "." IntegerConstant  Exponent
             | IntegerConstant [ Exponent ] "f"
             | [ IntegerConstant ]  "." IntegerConstant  [ Exponent ] "f" .

Exponent = ("e" | "E") [ "+" | "-" ] IntegerConstant .

StringConstant = BasicStringConstant { BasicStringConstant } .

BasicStringConstant = '"' { NotDQuoteOrNewline } '"'
                    | '"""' NotTripleQuote '"""' .


%% ----- Classification of characters:

Digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" .

HexDigit = Digit
         | "A" | "B" | "C" | "D" | "E" | "F"
         | "a" | "b" | "c" | "d" | "e" | "f" .

%% Letter  is any Unicode character considered as a letter by Java.

%% NotDQuoteOrNewline  is any character other than a double quote or a newline.

%% NotTripleQuote  is any sequence of characters that does not contain a
%%                    sequence of three successive double quotes.

%% NotGT2  is any sequence of characters that doesn't include ">>".

Chapter 6. Primitive Types

LogiQL supports the following primitive types: string, int, float, decimal, datetime, boolean and int128.

string

A string value is a finite sequence of Unicode characters. Valid Unicode characters are listed in the section called “String Literals”. For example, "hello" and "Ce ça" are strings.

There are two ways of writing strings. In the first, a string is a sequence of Unicode characters (other than line breaks) between two quote characters. Two adjacent strings are concatenated together (strings are considered adjacent even when they are separated by a sequence of whitespace characters, including line breaks). For example, writing "foo" "bar" is equivalent to writing a single string literal, "foobar".

The second form of string is any sequence of Unicode characters, including line breaks, between two groups of three quote characters. (More details are provided in this Note.)

Example 6.1. Multi-line string literal

"""foo
bar"""

int

Values of type int are the usual mathematical integers. These values are internally represented as 64 bit two's complement binary numbers. They must therefore be in the range -(2^63) through 2^(63)-1, or -9223372036854775808 through 9223372036854775807.

The operations use the integer arithmetic of the underlying hardware. So, for example, a result greater than the maximum value may be silently converted to a negative number. (Arithmetic on values of type decimal is different: an overflow on decimal computations causes logical failure.)
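
For example, the following sketch (p is a hypothetical predicate) silently wraps around to the most negative int:

p[] = 9223372036854775807 + 1.
// p[] = -9223372036854775808, with no indication of an error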

Note

By saying that an operation "causes logical failure" (or "fails") we mean that the atom or comparison that contains the expression will not be true. (See also this Note.)

float

Values of type float are 64-bit binary floating-point numbers, represented according to the IEEE Standard for Floating-Point Arithmetic (IEEE 754).

As described in the section called “Binary Floating-Point Literals”, a floating-point literal is written with an integer part, a decimal fractional part prefixed by a dot and/or a base-10 exponent prefixed by e or E, and a suffix f. For example, a floating-point number can be written as 2.71f with a decimal part, 2E3f with an exponent part (equivalent to 2000.0f), or 2.71E3f with both decimal and exponent parts (equivalent to 2710.0f).

The internal representation of floating-point numbers uses the base 2.

If an arithmetic operation produces NaN (for instance, through division by 0), the value is not stored and the operation fails.

If an arithmetic operation produces a -0, it is converted and stored as a +0.

If an arithmetic operation results in a number that cannot be stored using a 64-bit representation, it is stored as either positive infinity +inf or negative infinity -inf. Two +inf values, even if resulting from different computations, are considered equal. Similarly with two -inf values. Note that LogiQL does not provide any literal representation for infinite float values. Nor is there any explicit way to check whether a value is infinite.

Example 6.2. Positive and negative infinity

The lb script shown below (see Section 19.1)

create --unique

addblock <doc>
  p(x) -> float(x).
  q(x) -> float(x).
  r(x) -> float(x).
  s(x) -> boolean(x).
  t(x) -> boolean(x).

  p(x) <- float:pow[10.0f, 2000f] = x.
  q(x) <- float:pow[ 5.0f, 2001f] = x.
  r(x) <- float:pow[-5.0f, 2001f] = x.

  s(true)  <- p(x),  q(y), x = y.
  s(false) <- p(x),  q(y), x != y.

  t(true)  <- q(x),  r(y), x = y.
  t(false) <- q(x),  r(y), x != y.
</doc>
print p
print q
print r
print s
print t

close --destroy

will produce the following output

inf
inf
-inf
true
false

decimal

decimal is a numeric type that uses a fixed-point decimal representation. Values of type decimal have up to 18 digits before the decimal point and up to 18 digits after the decimal point. So decimal can represent numbers in increments of 10^-18 in the range from -10^18 + 1 through 10^18 - 1.

This means that the smallest number that can be represented is -999,999,999,999,999,999.999,999,999,999,999,999 and the largest is 999,999,999,999,999,999.999,999,999,999,999,999. Between 1 * 10^-18 and 3 * 10^-18, there is exactly one decimal value 2 * 10^-18. Note that this is different from floating point numbers, which are very dense near 0, but very imprecise for large numbers.

Decimal arithmetic is exact within the 10^-18 resolution. This means that addition and subtraction follow the usual laws of associativity and commutativity (as long as the results of all operations are within range). Multiplication with integers is also exact: associativity, commutativity, and distributive laws apply (again, as long as all intermediate results stay within range). Note that this is not the case for floating point numbers.

Example 6.3. Floating point operations are not exact

The following LogiQL code shows that addition and subtraction of float values are not exact. While 0.1 + 0.2 - 0.2 - 0.1 should be zero, it is not.

I[] = 0.1f.
II[] = 0.2f.
I_plus_II[] = I[] + II[].
I_plus_II_minus_II[] = I_plus_II[] - II[].
zero[] = I_plus_II_minus_II[] - I[].
// zero[] =  2.7755575615628914e-17 but should be 0.0f 
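
For contrast, the decimal analogue of this computation (a sketch with hypothetical predicate names) is exact and yields zero:

dI[] = 0.1d.
dII[] = 0.2d.
d_zero[] = dI[] + dII[] - dII[] - dI[].
// d_zero[] = 0.0, because decimal arithmetic is exact within range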

Due to these characteristics, decimal is usually preferable for financial applications, whereas float is generally better for scientific applications.

When an arithmetic operation on decimal values overflows (i.e., the resulting value is out of range), then this is handled as logical failure. This means that the computation does not have a result, but the overflow is not an error that aborts the transaction. For example, computing 999999999999999999.0 + 1.0 does not result in a value. Logical failure is also used for conversion operations, such as string:decimal:convert, float:decimal:convert, and int:decimal:convert (see Section 7.9).

When the result of an arithmetic operation on decimal numbers has more than 18 digits after a decimal point, then it is rounded "half towards zero", as illustrated in the following example:

Example 6.4. 

 12345678.95d / 100000000000000000d  =   0.000000000123456789
 12345678.96d / 100000000000000000d  =   0.00000000012345679
-12345678.95d / 100000000000000000d  =  -0.000000000123456789
-12345678.96d / 100000000000000000d  =  -0.00000000012345679

Note

The lb tool currently exhibits the following behaviour:

  • It will print small decimals by using exponent notation. For example

    addblock <doc>
      q(0.0000001).
    </doc>
    print q

    will print out 1e-7, even though q(1e-7). would not be accepted by the compiler.

  • It will print only 9 decimal digits of a decimal number, rounding the result. For example,

    addblock <doc>
      q(999999999999999999.999999999999999999).
    </doc>
    print q

    will print out 1000000000000000000.000000000, even though that number is outside the range of decimal values.

datetime

datetime is a built-in type whose values are points in time (with a resolution of one microsecond). datetime values are always stored as UTC values. Many of the built-in predicates have parameters for specifying the time-zone when creating or displaying datetime values.

Note

Before LogicBlox version 4.4.5 the resolution of datetime values was 1 second.

boolean

Type boolean has only two values: true and false.

int128

int128 is the type of 128 bit integers. The range is -170141183460469231731687303715884105728 through 170141183460469231731687303715884105727.

Note

The primary purpose of this type is to serve as an efficient replacement for strings that are Universally Unique Identifiers (UUID). This sort of use is supported by two special operations:

  • int128:from_uuid_string[s] converts the UUID string s to a 128-bit integer;

  • int128:to_uuid_string[i] converts a 128-bit integer created by int128:from_uuid_string back to the original UUID string.

(See also Section 27.1.1 and Section 27.1.2.)
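
A brief sketch of the round trip (the UUID string is merely an example value):

id[] = int128:from_uuid_string["123e4567-e89b-12d3-a456-426614174000"].
uuid[] = int128:to_uuid_string[id[]].
// uuid[] is again "123e4567-e89b-12d3-a456-426614174000"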

Chapter 7. Built-in Operations

LogiQL supports built-in operations over values of primitive types. These operations allow you to compare values, perform arithmetic, and manipulate strings and datetime values. Some operations are supported through symbolic operators, such as + and -. These operators also have corresponding "textual" forms, such as int:add and decimal:subtract. Other operations, e.g., datetime:parse, are available only in textual form. This chapter explains these operations and their semantics.

The signatures of some functions are specified using the following form:

function_name[arg1, arg2, ..., argn] = val -> type1(arg1), type2(arg2), ..., typen(argn), typev(val).

This can be interpreted as follows: the function function_name takes n arguments and returns a value. The types of these arguments (either primitive types described in Chapter 6 or entity types described in Section 8.3) are type1, type2, etc., respectively. The type of the function's value is typev.

This particular style of signature specification follows from the way LogiQL supports predicate declaration, detailed in Section 8.1.

7.1. Comparisons

Comparison Operations

LogiQL provides comparison operations for primitive types datetime, boolean, string, int, float, and decimal. Equality and disequality (= and !=) are also available for entity types (Section 8.3).

Only values of the same type can be compared. For example, a value of type float cannot be directly compared with a value of type decimal, nor can a value of type decimal be directly compared with a value of type boolean or type string.

If one wants to compare values of different types, an explicit conversion must first be performed (see the section called “X:Y:convert”). Please note that there is no facility for conversion between different entity types.
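
For example, an int value can be compared with a decimal value only after an explicit conversion (a sketch; q is a hypothetical predicate holding int values, and int:decimal:convert is described in Section 7.9):

p(x) <- q(x), int:decimal:convert[x] >= 10.5d.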

Comparison operations are binary, and each of them has two forms: an infix operator and a prefix binary relation. The following table lists the supported comparison operations in both forms.

Operator   Prefix form     Example
=          T:eq_2(x, y)    3 = x, int:eq_2(3, x)
!=         T:ne_2(x, y)    3.0d != x, decimal:ne_2(3.0d, x)
<          T:lt_2(x, y)    3 < 4, int:lt_2(3, 4)
>          T:gt_2(x, y)    3.1f > x, float:gt_2(3.1f, x)
<=         T:le_2(x, y)    3 <= 4, int:le_2(3, 4)
>=         T:ge_2(x, y)    "ab" >= "aab", string:ge_2("ab", "aab")

Note

T can be datetime, string, boolean, int, float, or decimal. Comparing two values of different types will cause a compile-time error:

Example 7.1. Invalid use of comparison

3.0d > 4.57f
"3.0" = 3.0

Comparison Functions

LogiQL also supports prefix binary functions for comparison, where the two arguments of the function are the values being compared, and the result of the function is a boolean value. If a comparison holds, the result is true; otherwise, it is false.

Prefix function     Example
T:eq_3[x, y] = z    int:eq_3[3, 4] = false, string:eq_3["foo", "bar"] = false
T:ne_3[x, y] = z    int:ne_3[3, 4] = true, string:ne_3["foo", "bar"] = true
T:lt_3[x, y] = z    int:lt_3[4, 4] = false, float:lt_3[3.1f, 5.2f] = true
T:gt_3[x, y] = z    int:gt_3[4, 4] = false, float:gt_3[3.1f, 5.2f] = false
T:le_3[x, y] = z    int:le_3[4, 4] = true, float:le_3[3.1f, 5.2f] = true
T:ge_3[x, y] = z    int:ge_3[4, 4] = true, float:ge_3[3.1f, 5.2f] = false

Note that T can be datetime, string, boolean, int, float, or decimal. Just as for the comparison relations above, an attempt to use these functions to compare two values of different types will cause a compile-time error.

Comparison of strings according to locale

The comparisons for strings described above simply compare the underlying byte representations in lexicographic order. We also provide operations that compare Unicode strings according to locale.

These take the form of three-argument operations, where the first argument is the locale, and the other two are the strings to be compared. There are no corresponding infix operators.

Operation                  Meaning
ustring:eq_3(loc, x, y)    x = y
ustring:ne_3(loc, x, y)    x != y
ustring:lt_3(loc, x, y)    x < y
ustring:gt_3(loc, x, y)    x > y
ustring:le_3(loc, x, y)    x <= y
ustring:ge_3(loc, x, y)    x >= y

Just as for other comparisons, LogiQL also supports the corresponding boolean functions: ustring:eq_4[localeString, leftString, rightString] = booleanResult, etc.

Currently the locale specifies only the language, given in the form of a string. The string is a lower-case two-letter or three-letter ISO-639 code: "en" for English, "sv" for Swedish, etc.

Example 7.2. Comparison of Unicode strings according to locale

create --unique

addblock <doc>
  data1("lęk").  data2("łąk").

  data1("mot").  data2("måt").

  p(a, b) <- a < b,                    data1(a), data2(b).  // bytes
  q(a, b) <- ustring:lt_3("en", a, b), data1(a), data2(b).  // English
  r(a, b) <- ustring:lt_3("pl", a, b), data1(a), data2(b).  // Polish
  s(a, b) <- ustring:lt_3("sv", a, b), data1(a), data2(b).  // Swedish
</doc>
echo --- Byte ordering:
print p
echo --- English:
print q
echo --- Polish:
print r
echo --- Swedish:
print s

close --destroy

In Polish the letter ł comes after l. In Swedish the letter å comes after o.

created workspace 'unique_workspace_2017-02-22-19-26-48'
added block 'block_1Z1BK9GV'
--- Byte ordering:
"lęk" "måt"
"lęk" "łąk"
"mot"  "måt"
"mot"  "łąk"
--- English:
"lęk" "måt"
--- Polish:
"lęk" "måt"
"lęk" "łąk"
--- Swedish:
"lęk" "måt"
"mot"  "måt"
deleted workspace 'unique_workspace_2017-02-22-19-26-48'

The number of tuples is different for each locale, because the orderings are different. For example, in the Polish locale "lęk" precedes "łąk", but in the Swedish locale "łąk" precedes "lęk" (though most Swedes are unaware of this).

Condition Function

LogiQL supports a condition function, T:cond[c,x,y] = z, where T can be any type: primitive or entity (Section 8.3). cond has semantics similar to if-then-else in other programming languages: if condition c holds, then variable z is assigned the value of x; otherwise, z gets the value of y.

Example 7.3. 

int:cond[true, 3, 5] = z

instantiates z to 3, but

string:cond[int:gt_3[3, 5], "yes", "no"] = z

instantiates z to "no", because the condition evaluates to false.

7.2. Arithmetic Operators

LogiQL provides a set of built-in functional predicates that allow us to do arithmetic on the primitive types. Additionally, the language features some basic operators that allow us to write traditional arithmetic expressions.

For example, x = y + z * 2 is equivalent to x = y + (z * 2). If x, y and z are integers, then this is equivalent to x = int:add[y, int:multiply[z, 2]].

Since the underlying built-in functional predicates (such as int:add) are used much less frequently than the operators, their behaviour is described together with the corresponding operators in this section; Section 7.3 mostly just lists their signatures.

+

The infix addition operator + can be applied to two values of the same numeric type. If you wish to add values of two different numeric types, e.g., int and decimal, an explicit conversion must be applied to one of the arguments. See Section 7.9 for details about conversion functions.

Additionally, LogiQL supports the prefix form T:add[x,y] = z, where T can be int, decimal, or float.

For int and float, the result of an addition is correct only if it is within range (see the section called “int” and the section called “float”). If the correct result is out of range, the returned result will be some incorrect value, and there will be no indication of an error. For example, 9223372036854775807 + 1 will yield -9223372036854775808.

For addition of decimal values, overflow is treated as logical failure (i.e., it does not produce a value).
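
For instance (a sketch; p, q and r are hypothetical predicates):

p[] = 1 + 2.                      // int addition: 3
q[] = 1.5d + 2.25d.               // decimal addition: 3.75
r[] = decimal:add[1.5d, 2.25d].   // prefix form, also 3.75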

*

The infix multiplication operator * can be applied to two values of the same numeric type. If you wish to apply the operator to two values of different numeric types, e.g., int and decimal, an explicit conversion must be applied to one of the arguments. See Section 7.9 for details about conversion functions.

Additionally, LogiQL supports three prefix forms of multiplication:

  • T:multiply[x, y] = z, where T can be int, decimal, or float, and both x and y must be of type T.
  • decimal:int:multiply[x, y] = z, where a decimal value x can be multiplied by an int value y to produce a decimal value z.
  • int:decimal:multiply[x, y] = z, where an int value x can be multiplied by a decimal value y to produce a decimal value z.

For multiplications that result in a decimal value, overflow is treated as logical failure (i.e., no value is produced). For other forms of multiplication the result is correct only if it is within range.

Example 7.4. 

3 * 4 = 12
int:multiply[3, float:int:convert[3.7f]] = 9
decimal:int:multiply[0.12345, 100] = 12.345
int:decimal:multiply[100, 0.12345] = 12.345

-

The infix subtraction operator - can be applied to two values of the same numeric type. If you wish to subtract values of two different numeric types, e.g., int and decimal, an explicit conversion must be applied to one of the arguments. See Section 7.9 for details about conversion functions.

Additionally, LogiQL supports the prefix form T:subtract[x,y] = z, where T can be int, decimal, or float.

For decimal subtraction overflow/underflow is treated as logical failure (i.e., no value is produced). For the other numeric types the result is correct only if it is within range.

Note that LogiQL does not support (unary) arithmetic negation. To compute -x for variable x, you can subtract x from 0 (i.e., 0 - x) or use the T:negate function.

/

The infix division operator / can be applied to two values of the same numeric type. If you wish to divide two values of different numeric types, e.g., int and decimal, an explicit conversion must be applied to one of the arguments. See Section 7.9 for details about conversion functions.

Additionally, LogiQL supports the prefix form T:divide[x,y] = z, where T can be int, decimal, or float.

For decimal fixed-point division, overflow is handled as logical failure (i.e., no value is produced). For other types the value is correct only if it is within range (except when we divide by zero: see below).

For integer and decimal types, division by 0 results in logical failure (i.e., no value is produced). For the floating-point type, division by 0 succeeds, resulting in positive infinity for a positive dividend and negative infinity for a negative dividend.

Example 7.5. 

   -> !decimal:divide[12.0, 0.0] = _.

7.3. Arithmetic Functions

Note

Not all the functions that are defined for int are also defined for int128. You can assume a function is defined for int128 only if that type is mentioned in the description.

decimal:int:multiply

decimal:int:multiply[x, y] = z -> decimal(x), int(y), decimal(z).

See the section called “*”.

float:arccos

float:arccos[x]=y -> float(x), float(y).

Calculates the arccosine of x (for x in the interval [-1,1]).

Example 7.6. 

float:arccos[1f] = 0f.

float:arcsin

float:arcsin[x]=y -> float(x), float(y).

Calculates the arcsine of x (for x in the interval [-1,1]).

Example 7.7. 

float:arcsin[0f] = 0f.

float:arctan

float:arctan[x]=y -> float(x), float(y).

Calculates the arctangent of x.

Example 7.8. 

float:arctan[0f] = 0f.

float:exp

float:exp[x] = y -> float(x), float(y).

The exponential function, the inverse of the natural logarithm: y is the x-th power of Euler's number e.

Example 7.9. 

float:exp[2.0f] = 7.3890560989f.
float:ln[float:exp[2.0f]] = 2.0f.

float:ln, float:log, float:log10

float:ln[x]    = y -> float(x), float(y).
float:log[x]   = y -> float(x), float(y).
float:log10[x] = y -> float(x), float(y).

Logarithm functions: ln and log compute the natural logarithm of x, and log10 computes the base-10 logarithm of x.
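
For example:

float:ln[1.0f] = 0.0f.
float:log[1.0f] = 0.0f.
float:log10[100.0f] = 2.0f.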

float:pow

float:pow[x,y] = z -> float(x), float(y), float(z).

Computes x raised to the power y, and assigns the result to z.

Example 7.10. 

float:pow[10.0f,2.0f] = 100.0f.

float:sqrt

float:sqrt[x] = y -> float(x), float(y).

Calculates the non-negative square root of x, for non-negative x. If x is negative, there is a logical failure (i.e., no value is produced).

Example 7.11. 

float:sqrt[4.0f] = 2.0f.

float:tan

float:tan[x] = y -> float(x), float(y).

Calculates the tangent of x (given in radians).

Example 7.12. 

float:tan[pi[]/4.0f] = 1.0f.

int:decimal:multiply

int:decimal:multiply[x, y] = z -> int(x), decimal(y), decimal(z).

See the section called “*”.

int:mod

int:mod[x, y] = z -> int(x), int(y), int(z).

Computes the remainder of dividing x by y. If y = 0, there is a logical failure (i.e., no value is produced).

Example 7.13. 

int:mod[10, 2] = 0.
int:mod[11, 2] = 1.
int:mod[7, 10] = 7.
int:mod[-3, 10] = -3.
int:mod[3, -10] = 3.
int:mod[123, -10] = 3.
int:mod[-123, -10] = -3.
int:mod[-123, 10] = -3.
-> !int:mod[5, 0] = _.

T:abs

int:abs[x]     = y -> int(x),     int(y).
decimal:abs[x] = y -> decimal(x), decimal(y).
float:abs[x]   = y -> float(x),   float(y).
int128:abs[x]  = y -> int128(x),  int128(y).

Calculates the absolute value y of x.

Example 7.14. 

decimal:abs[-0.532] = 0.532.
float:abs[-0.532f] = 0.532f.
int:abs[-532] = 532.

T:add

int:add[x, y]     = z -> int(x),     int(y),     int(z).
decimal:add[x, y] = z -> decimal(x), decimal(y), decimal(z).
float:add[x, y]   = z -> float(x),   float(y),   float(z).
int128:add[x, y]  = z -> int128(x),  int128(y),  int128(z).

See the section called “+”.

T:divide

int:divide[x, y]     = z -> int(x),     int(y),     int(z).
decimal:divide[x, y] = z -> decimal(x), decimal(y), decimal(z).
float:divide[x, y]   = z -> float(x),   float(y),   float(z).
int128:divide[x, y]  = z -> int128(x),  int128(y),  int128(z).

See the section called “/”.

T:max

int:max[x, y]     = z -> int(x),     int(y),     int(z).
decimal:max[x, y] = z -> decimal(x), decimal(y), decimal(z).
float:max[x, y]   = z -> float(x),   float(y),   float(z).
int128:max[x, y]  = z -> int128(x),  int128(y),  int128(z).

Returns the larger of x or y.
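
For example:

int:max[3, 5] = 5.
float:max[3.1f, 5.2f] = 5.2f.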

T:min

int:min[x, y]     = z -> int(x),     int(y),     int(z).
decimal:min[x, y] = z -> decimal(x), decimal(y), decimal(z).
float:min[x, y]   = z -> float(x),   float(y),   float(z).
int128:min[x, y]  = z -> int128(x),  int128(y),  int128(z).

Returns the smaller of x or y.
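
For example:

int:min[3, 5] = 3.
decimal:min[3.5d, 2.5d] = 2.5d.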

T:multiply

int:multiply[x, y]     = z -> int(x),     int(y),     int(z).
decimal:multiply[x, y] = z -> decimal(x), decimal(y), decimal(z).
float:multiply[x, y]   = z -> float(x),   float(y),   float(z).
int128:multiply[x, y]  = z -> int128(x),  int128(y),  int128(z).

See the section called “*”.

T:negate

int:negate[x]     = r -> int(x),     int(r).
decimal:negate[x] = r -> decimal(x), decimal(r).
float:negate[x]   = r -> float(x),   float(r).
int128:negate[x]  = r -> int128(x),  int128(r).

The result r equals -x.

Example 7.15. Arithmetic negation

create --unique

addblock <doc>
  p[]  = int:negate[3].
  pp[] = int:negate[int:negate[3]].

  q[]  = decimal:negate[3.2d].
  qq[] = decimal:negate[decimal:negate[3.2d]].

  r[] = float:negate[3.4e-3].
  rr[] = float:negate[float:negate[3.4e-3]].
</doc>
print p
print pp
print q
print qq
print r
print rr

close --destroy

will print

-3
3
-3.2
3.2
-0.0034
0.0034

T:range (arithmetic sequences)

int:range(start, end, stride, x) ->
      int(start), int(end), int(stride), int(x).
decimal:range(start, end, stride, x) ->
      decimal(start), decimal(end), decimal(stride), decimal(x).
float:range(start, end, stride, x) ->
      float(start), float(end), float(stride), float(x).

The range() predicate is usually used to specify a sequence, for example,

int:range(100, 200, 1, x)

specifies the values x = 100, 101, 102, ..., 200. To get the sequence 100, 110, 120, ... one would use stride = 10.

T:range(start, end, stride, x) is true when there is an integer i such that x = start + i * stride, and

  • either stride > 0 and (start ≤ x ≤ end);
  • or stride < 0 and (end ≤ x ≤ start).

The integer i is restricted to the range 0 ≤ i < 2^64.
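
For instance, the following sketch (multiple_of_3 is a hypothetical predicate) derives the multiples of 3 between 0 and 30:

multiple_of_3(x) <- int:range(0, 30, 3, x).
// derives x = 0, 3, 6, ..., 30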

Note

Using stride < 0 is deprecated, since the order in which values are produced is unspecified; int:range(10, 0, -1, x) is logically equivalent to int:range(0, 10, 1, x).

Note

A straightforward implementation would set x = start and then repeat x += stride. This does not work properly for floating-point types:

  • Repeated addition results in cumulative precision loss. For example, in float:range(-1000000f, 0f, 0.1f, x), which counts from -1000000 to 0 in increments of 0.1, we'd expect a sequence of 10000001 numbers, with the last two values being -0.1, 0; but if repeated addition is used, one gets a sequence of 9296504 numbers, with the last two values being -0.158386, -0.0583855. (This is the result in version 3.9.x of the system.)

  • In floating-point arithmetic, if the stride is sufficiently small then one has x + stride = x, i.e., adding the stride makes no change to x. For example, in float:range(1.0f,1.00001f,1E-10f,x), adding 1.0 + 1E-10 yields 1.0. (Version 3.9.x goes into an infinite loop for this example.)

The current LogicBlox system implements range() for floating-point types by calculating x = start + i * stride for each value to avoid the precision loss of x += stride. It uses an efficient search algorithm to enumerate values of i that produce distinct values of x; for example, float:range(1.0f, 1.00001f, 1E-10f, x) finds i = 0, 597, 1789, 2981, 4173, ... and produces the values x = 1.0000000000, 1.0000001192, 1.0000002384, ..., etc.

T:subtract

int:subtract[x, y]     = z -> int(x),     int(y),     int(z).
decimal:subtract[x, y] = z -> decimal(x), decimal(y), decimal(z).
float:subtract[x, y]   = z -> float(x),   float(y),   float(z).
int128:subtract[x, y]  = z -> int128(x),  int128(y),  int128(z).

See the section called “-”.

7.4. Rounding Functions

7.4.1. Rounding Functions for Decimal Fixed-Point

decimal:ceil

decimal:ceil[x] = y -> decimal(x), decimal(y).

Rounds the decimal value x to the smallest integral value y not smaller than x. This means that positive as well as negative values are rounded towards positive infinity.

Overflow is handled as logical failure. For the ceil function this is only possible on inputs between 10^18 - 1 and 10^18.

Example 7.16. 

decimal:ceil[10.9] = 11.0.
decimal:ceil[11.0] = 11.0.
decimal:ceil[-10.9] = -10.0.

decimal:floor

decimal:floor[x] = y -> decimal(x), decimal(y).

Rounds the decimal value x to the largest integral value y not greater than x. This means that positive as well as negative values are rounded towards negative infinity.

Overflow is handled as logical failure. For the floor function this is only possible on inputs between -10^18 and -10^18 + 1.

Example 7.17. 

decimal:floor[ 10.9] =  10.0.
decimal:floor[ 11.0] =  11.0.
decimal:floor[-10.9] = -11.0.

decimal:round

decimal:round[x] = y -> decimal(x), decimal(y).

Rounds to the nearest integer; a tie is broken by rounding towards an even value. Overflow does not produce a value.

This function is equivalent to decimal:roundHalfToEven.

Example 7.18. 

decimal:round[10.9] = 11.0.
decimal:round[10.1] = 10.0.
decimal:round[10.5] = 10.0.
decimal:round[11.5] = 12.0.
decimal:round[-10.9] = -11.0.
decimal:round[-10.1] = -10.0.
decimal:round[-10.5] = -10.0.
decimal:round[-11.5] = -12.0.

decimal:round2

decimal:round2[x, n] = y -> decimal(x), int(n), decimal(y).

Rounds the decimal value at the n'th digit of the fraction. For n = 0, this function is equivalent to decimal:round.

A tie is broken towards an even value. This function is equivalent to decimal:roundHalfToEven2.

If n is negative, then the integral part is rounded instead. For example, for n = -1 the decimal value is rounded to a multiple of ten, and for n = -2 to a multiple of a hundred.

Overflow does not produce a value.

Example 7.19. 

decimal:round2[123.4567,  3] = 123.457 .
decimal:round2[123.4567,  2] = 123.46 .
decimal:round2[123.4567,  1] = 123.5 .
decimal:round2[123.4567,  0] = 123.0 .
decimal:round2[123.4567, -1] = 120 .
decimal:round2[123.4567, -2] = 100 .

decimal:roundHalf*

decimal:roundHalfToEven  [x   ] = y -> decimal(x),         decimal(y).
decimal:roundHalfToEven2 [x, n] = y -> decimal(x), int(n), decimal(y).

decimal:roundHalfToOdd   [x   ] = y -> decimal(x),         decimal(y).
decimal:roundHalfToOdd2  [x, n] = y -> decimal(x), int(n), decimal(y).

decimal:roundHalfDown  [x   ] = y -> decimal(x),         decimal(y).
decimal:roundHalfDown2 [x, n] = y -> decimal(x), int(n), decimal(y).

decimal:roundHalfUp    [x   ] = y -> decimal(x),         decimal(y).
decimal:roundHalfUp2   [x, n] = y -> decimal(x), int(n), decimal(y).

decimal:roundHalfAwayFromZero  [x   ] = y -> decimal(x),         decimal(y).
decimal:roundHalfAwayFromZero2 [x, n] = y -> decimal(x), int(n), decimal(y).

decimal:roundHalfTowardZero    [x   ] = y -> decimal(x),         decimal(y).
decimal:roundHalfTowardZero2   [x, n] = y -> decimal(x), int(n), decimal(y).

Rounds similarly to decimal:round and decimal:round2, but with specific tie-breaking policies.

Example 7.20. 

decimal:roundHalfUp[0.49999] = 0.0.
decimal:roundHalfUp[-0.49999] = 0.0.
decimal:roundHalfUp[22.7] = 23.0.
decimal:roundHalfUp[-22.7] = -23.0.
decimal:roundHalfUp[22.5] = 23.0.
decimal:roundHalfUp[-22.5] = -22.0.
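
The other tie-breaking policies can be illustrated in the same way. Under the usual reading of the policy names one would expect, for instance:

decimal:roundHalfDown[22.5] = 22.0.
decimal:roundHalfDown[-22.5] = -23.0.
decimal:roundHalfToOdd[22.5] = 23.0.
decimal:roundHalfToOdd[23.5] = 23.0.
decimal:roundHalfAwayFromZero[22.5] = 23.0.
decimal:roundHalfAwayFromZero[-22.5] = -23.0.
decimal:roundHalfTowardZero[22.5] = 22.0.
decimal:roundHalfTowardZero[-22.5] = -22.0.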

7.4.2. Rounding Functions for Binary Floating-Point

float:ceil

float:ceil[x] = y -> float(x), float(y).

Rounds the float value x to the smallest integral value y not smaller than x. This means that positive as well as negative values are rounded towards positive infinity.

Example 7.21. 

float:ceil[3.141592f] = 4.0f.
float:ceil[-0.532f] = 0.0f.

float:floor

float:floor[x] = y -> float(x), float(y).

Rounds the float value x to the largest integral value y not greater than x. This means that positive as well as negative values are rounded towards negative infinity.

Example 7.22. 

float:floor[3.141592f] = 3.0f.
float:floor[-0.532f] = -1.0f.

float:round

float:round[x] = y -> float(x), float(y).

Rounds to the nearest integer; a tie is broken by rounding towards an even value.

Example 7.23. 

float:round[0.121f] = 0.0f.
float:round[0.873f] = 1.0f.
float:round[0.5f] = 0.0f.    // Round to 0 (since 0 is even)
float:round[1.5f] = 2.0f.    // Round to 2 (since 2 is even)

float:roundHalfUp

float:roundHalfUp[x] = y -> float(x), float(y).

Rounds to the nearest integer; a tie is broken by rounding towards positive infinity.

Example 7.24. 

float:roundHalfUp[23.5f] = 24.0f.
float:roundHalfUp[-23.5f] = -23.0f.

7.5. Integer Bit Manipulation Functions

int:bit_not

int:bit_not[x] = y -> int(x), int(y).

Returns the bitwise complement of x. That is, all bits that were 0 in x become 1 in y, and those that were 1 in x become 0 in y.

Example 7.25. 

int:bit_not[4] = -5.
int:bit_not[0] = -1.
int:bit_not[-1] = 0.

int:bit_and

int:bit_and[x, y] = z -> int(x), int(y), int(z).

Returns the bitwise logical 'and' of the integers x and y. If bits at the same offset of x and y are both 1, then the bit at the same offset of z will also be 1. Otherwise the bit at that offset will be 0.

Example 7.26. 

int:bit_and[2, 4] = 0.
int:bit_and[1, 3] = 1.
int:bit_and[0b1111, 0b1010] = 0b1010.

int:bit_or

int:bit_or[x, y] = z -> int(x), int(y), int(z).

Returns the bitwise logical 'or' of the integers x and y. If either bit at the same offset of x and y is 1, then the bit at the same offset of z will also be 1. Otherwise the bit at that offset will be 0.

Example 7.27. 

int:bit_or[2, 4] = 6.
int:bit_or[-1, 3] = -1.
int:bit_or[0b1010, 0b0101] = 0b1111.

int:bit_xor

int:bit_xor[x, y] = z -> int(x), int(y), int(z).

Returns the bitwise logical 'xor' of the integers x and y. If one of the bits at the same offset of x and y is 1, but not both, then the bit at the same offset of z will also be 1. Otherwise the bit at that offset will be 0.

Example 7.28. 

int:bit_xor[-1, -1] = 0.
int:bit_xor[2, 4] = 6.
int:bit_xor[0b1111, 0b1010] = 0b0101.
int:bit_xor[0b1010, 0b0101] = 0b1111.

int:bit_lshift

int:bit_lshift[x, y] = z -> int(x), int(y), int(z).

Returns the bitwise left shift of the integer x by the integer y. This shifts the bits of x "higher" by y positions. As a result, y bits are truncated off the high end of x and then the y low bits are padded with 0.

It is required that y be greater than or equal to 0 and less than the maximum number of bits in the integer (64 in the current implementation of the integer type). If y is outside this range, no value is bound for z.

Example 7.29. 

int:bit_lshift[1, 1] = 2.
int:bit_lshift[2, 1] = 4.
int:bit_lshift[-1, 1] = -2.
int:bit_lshift[0b1111, 1] = 0b11110.
int:bit_lshift[0b1010, 2] = 0b101000.

int:bit_rshift

int:bit_rshift[x, y] = z -> int(x), int(y), int(z).

Returns the bitwise right shift of the integer x by the integer y. This shifts the bits of x "lower" by y positions. As a result, y bits are truncated off the low end of x and then the y high bits are padded with 0.

It is required that y be greater than or equal to 0 and less than the maximum number of bits in the integer (64 in the current implementation of the integer type). If y is outside this range, no value is bound for z.

Example 7.30. 

int:bit_rshift[1, 1] = 0.
int:bit_rshift[2, 1] = 1.
int:bit_rshift[-1, 1] = 9223372036854775807.
int:bit_rshift[-2, 1] = 4611686018427387903.
int:bit_rshift[0b1111, 1] = 0b111.
int:bit_rshift[0b1010, 2] = 0b10.

int:bit_rshiftse

int:bit_rshiftse[x, y] = z -> int(x), int(y), int(z).

Returns the bitwise right shift of the integer x by the integer y, but with sign extension. This shifts the bits of x "lower" by y positions. As a result, y bits are truncated off the low end of x and then the y high bits are padded with 0 if the high (sign) bit of x is 0, otherwise they are padded with 1.

It is required that y be greater than or equal to 0 and less than the maximum number of bits in the integer (64 in the current implementation of the integer type). If y is outside this range, no value is bound for z.

Example 7.31. 

int:bit_rshiftse[1, 1] = 0.
int:bit_rshiftse[2, 1] = 1.
int:bit_rshiftse[-1, 1] = -1.
int:bit_rshiftse[-2, 1] = -1.
int:bit_rshiftse[-4, 1] = -2.
int:bit_rshiftse[0b1111, 1] = 0b111.
int:bit_rshiftse[0b1010, 2] = 0b10.

7.6. String Operations

Note

The basic string operations work directly on the byte representations of strings. In the UTF-8 encoding some Unicode characters are represented by sequences of two or three bytes, so these string operations may give surprising results.

For cases where this matters, we have recently added "Unicode-aware" string operations. Each of these operations has a name that begins with ustring: instead of string:. So, for example, string:length[s] will be the number of bytes in string s, whereas ustring:length[s] will be the number of Unicode characters in s.

At least for the time being, LogiQL sports both variants of the operations. This preserves upwards compatibility with older versions, but also allows one to use the basic operations in applications for which full support of Unicode is not needed: the basic operations are somewhat more efficient.

Backward compatibility is broken in the case of regular expressions. Earlier versions of string:match, string:matches and string:rewrite used the Posix regular expression library. To avoid spurious differences with ustring:matches etc. all these operations now use Perl-like syntax for regular expressions (in the version supported by the ICU library). However, replacement patterns can be written both in the Perl-like form (e.g., "$1") and in Posix-like form (e.g., "\\1" or """\1"""). For all but the more exotic cases the difference between the two forms of regular expressions should not be noticeable.

Warning

The "Unicode-aware" operations have been introduced in LogicBlox version 4.4.5, and are not available in earlier versions.

string:add

string:add[s1, s2] = r -> string(s1), string(s2), string(r). 

The result r is the concatenation of the strings s1 and s2.

Example 7.32. 

string:add["the", string:add[" quick", " brown"]] = "the quick brown".

Note

The symbol + may be used as shorthand.

Example 7.33. 

"the" + " quick" + " brown" = "the quick brown".

string:alpha_num / ustring:alpha_num

string:alpha_num(s) -> string(s).
ustring:alpha_num(s) -> string(s). 

True if the string s consists only of digits and/or letters. string:alpha_num does not handle multi-byte characters properly, whereas ustring:alpha_num does.

Example 7.34. 

string:alpha_num("aZ123").   // true
string:alpha_num("ah!").     // false
string:alpha_num(" aZ123").  // false

Example 7.35. 

create --unique

addblock <doc>
  data("10 części!").

  p(n, ch) <-  string:alpha_num(ch), ch = ustring:at[s, n], data(s).
  q(n, ch) <- ustring:alpha_num(ch), ch = ustring:at[s, n], data(s).
</doc>
print data
echo --- p ---
print p
echo --- q ---
print q

close --destroy

yields

created workspace 'unique_workspace_2017-02-18-13-26-37'
added block 'block_1Z1BKQCU'
"10 części!"
--- p ---
0 "1"
1 "0"
3 "c"
4 "z"
7 "c"
8 "i"
--- q ---
0 "1"
1 "0"
3 "c"
4 "z"
5 "ę"
6 "ś"
7 "c"
8 "i"
deleted workspace 'unique_workspace_2017-02-18-13-26-37'

string:at / ustring:at

string:at[s, n] = c -> string(s), int(n), int(c).
ustring:at[s, n] = c -> string(s), int(n), string(c).

string:at returns the integer value of the n-th byte in the sequence of bytes that represent the characters of the string s. Counting starts from 0.

ustring:at returns a string containing the single character that is n-th character of the string s. Counting starts from 0.

If n is not smaller than the number of bytes/characters in the string, the operation silently fails.

Note

Both string:at[s, n] = c and ustring:at[s, n] = c can be used with the variable n unbound, to iterate over the bytes/characters in a string.

Example 7.36.  string:at vs. ustring:at

create --unique

addblock <doc>
  euro("12\u20AC!").

  p(n, c)  <- c  =  string:at[s, n], euro(s).
  q(n, ch) <- ch = ustring:at[s, n], euro(s).
</doc>
print euro
echo -- p --
print p
echo -- q --
print q

close --destroy

results in:

created workspace 'unique_workspace_2017-02-06-00-44-16'
added block 'block_1Z1BJU0Q'
"12€!"
-- p --
0 49
1 50
2 -30
3 -126
4 -84
5 33
-- q --
0 "1"
1 "2"
2 "€"
3 "!"
deleted workspace 'unique_workspace_2017-02-06-00-44-16'

string:hash

string:hash[s] = n -> string(s), int(n). 

Returns a hash value for the given string.
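
The hash value itself is implementation-defined, so no concrete values are shown here. A typical use is to spread strings over a fixed number of buckets; a minimal sketch, with illustrative predicates data and bucket:

data(s) -> string(s).
bucket[s] = b -> string(s), int(b).
bucket[s] = b <- data(s), b = int:mod[int:abs[string:hash[s]], 16].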

string:length / ustring:length

string:length[s] = n -> string(s), int(n).
ustring:length[s] = n -> string(s), int(n). 

string:length returns the number of bytes used for representing the characters of string s.

ustring:length returns the number of Unicode characters in string s.

Example 7.37. 

create --unique

addblock <doc>
  euro("\u20AC").

  p(n) <- n =  string:length[s], euro(s).
  q(n) <- n = ustring:length[s], euro(s).
</doc>
print euro
print p
print q

close --destroy

The result is:

created workspace 'unique_workspace_2017-02-05-20-48-24'
added block 'block_1Z1BKPXR'
"€"
3
1
deleted workspace 'unique_workspace_2017-02-05-20-48-24'

string:like / string:notlike / ustring:like / ustring:notlike

string:like(s, pattern)    -> string(s), string(pattern).
string:notlike(s, pattern) -> string(s), string(pattern).

ustring:like(s, pattern)    -> string(s), string(pattern).
ustring:notlike(s, pattern) -> string(s), string(pattern). 

string:like checks if the string s matches a wildcard pattern. A pattern may contain the character _, which represents a single character, or %, which matches any string of zero or more characters.

Use \% and \_ to match the percent-sign character and the underscore character, respectively. (Note that within a single-line string the \ itself must be escaped, so to match every string that contains a percent sign one would use the pattern "%\\%%" or """%\%%""".)

string:notlike can be used to find strings that do not match a pattern.

string:like and string:notlike treat both the string and the pattern as a sequence of bytes, which may lead to unexpected results in the presence of multi-byte Unicode characters. ustring:like and ustring:notlike treat both strings as sequences of Unicode characters. See Example 7.39.

Example 7.38. 

string:like("226-223-4921", "226%").     // true: starts with "226"
string:like("226-223-4921", "%223%").    // true: contains substring "223"
string:like("226-223-4921", "%21").      // true: ends with "21"
string:like("226-223-4921", "%2%3%4%").  // true: contains '2', '3', '4'

string:notlike("226-223-4921", "226%").  // false: starts with "226"
string:notlike("226-223-4921", "99-%").  // true: does not start with "99-"

string:like("_%", "\\_%\\%").            // true: starts with '_', ends with '%'
string:like("_%.", "\\_%\\%" ).          // false: does not end with '%'

Example 7.39.  string:like vs. ustring:like

create --unique

addblock <doc>
  data("området").
  data("öken").

  p(s) <-  string:like(s, "%r_d%"),  data(s).
  q(s) <- ustring:like(s, "%r_d%"),  data(s).
</doc>
echo --- p ---
print p
echo --- q ---
print q

close --destroy

string:like does not treat å as a single character:

created workspace 'unique_workspace_2017-02-18-21-13-34'
added block 'block_1Z1BMEKY'
--- p ---
--- q ---
"området"
deleted workspace 'unique_workspace_2017-02-18-21-13-34'

string:lower / ustring:lower

string:lower[s] = r -> string(s), string(r).
ustring:lower[s] = r -> string(s), string(r). 

string:lower converts the characters of the string to lower-case.

ustring:lower converts the characters of the string to lower case, treating Unicode characters properly.

Example 7.40. 

string:lower["PSMITH"] = "psmith".

See also Example 7.62.

string:match / ustring:match

string:match(s, regex) -> string(s), string(regex).
ustring:match(s, regex) -> string(s), string(regex). 

True if the string s matches regex. The latter is a regular expression.

Just like string:matches, string:match can give unexpected results if the regular expression contains multi-byte Unicode characters. ustring:match will work as expected.

Example 7.41. string:match

string:match("226-223-4921", """\d+-\d+-\d+""").  // true
string:match("226-223-4921", """\l+-\l+-\l+""").  // false

Example 7.42.  string:match vs. ustring:match

create --unique

addblock <doc>
  p(1) <-   string:match("źźźżż", "(ź*)(ż*)").
  q(1) <-  ustring:match("źźźżż", "(ź*)(ż*)").
</doc>
echo -- p --
print p
echo -- q --
print q

close --destroy

string:match cannot find a match, because the asterisk is applied not to the entire character, but only to its last byte:

created workspace 'unique_workspace_2017-02-17-20-08-46'
added block 'block_1Z1BIXZU'
-- p --
-- q --
1
deleted workspace 'unique_workspace_2017-02-17-20-08-46'

string:matches / ustring:matches

string:matches[s, regex, n] = subexp
   -> string(s), string(regex), int(n), string(subexp).

ustring:matches[s, regex, n] = subexp
   -> string(s), string(regex), int(n), string(subexp). 

regex is a regular expression.

Each of these two operations matches the string s against the regular expression regex and returns the n-th matching subexpression. The operation fails if regex does not match the entire string s.

"Matching subexpression" number 0 is the entire string. The other matching subexpressions are the substrings that match those parts of regex that are enclosed in parentheses. These parts are numbered from left to right, and nested parenthesized parts are numbered before the next parenthesized part.

If an alternative subexpression matches, then there is no attempt to match the next alternative, if any. Those alternatives that did not take part in the final match are shown as matching empty strings.

All this is illustrated by the example below.

If the regular expression contains multi-byte Unicode characters, then string:matches might not give the expected effect, unlike ustring:matches. See Example 7.44.

Note

Both string:matches[s, regex, n] = subexp and ustring:matches[s, regex, n] = subexp can be used with the variable n unbound, to iterate over the matching subexpressions.

Warning

Before LogicBlox version 4.4.2 string:matches did not work quite correctly even on strings with single-byte characters. (ustring:matches was not yet available.)

Example 7.43. string:matches

The following rules attempt to match the same string with various regular expressions.

p(n, s) <- s = string:matches["aaabc", "((a*)b.)", n].
q(n, s) <- s = string:matches["aaabc", "(a*)(b)(.)", n].
r(n, s) <- s = string:matches["aaabc", "a", n].
s(n, s) <- s = string:matches["aaabc", "(a)", n].
t(n, s) <- s = string:matches["aaabc", "a*..", n].
u(n, s) <- s = string:matches["aaabc", "(a*)(..)", n].
v(n, s) <- s = string:matches["aaabc", "(a)*(..)", n].
w(n, s) <- s = string:matches["aaabc", "(a*)((a)(.))(.)", n].
x(n, z) <- z = string:matches["aaabc", "(a*(b.*))|aa(.*)", n].
y(n, z) <- z = string:matches["aaabc", "(a(b.*))|aa(.*)", n].
z(n, z) <- z = string:matches["aaabc", "(a*)(b).", n], n = 1.

The results are:

+- p
|  0,"aaabc"
|  1,"aaabc"
|  2,"aaa"
+- p [end]
+- q
|  0,"aaabc"
|  1,"aaa"
|  2,"b"
|  3,"c"
+- q [end]
+- r
+- r [end]
+- s
+- s [end]
+- t
|  0,"aaabc"
+- t [end]
+- u
|  0,"aaabc"
|  1,"aaa"
|  2,"bc"
+- u [end]
+- v
|  0,"aaabc"
|  1,"a"
|  2,"bc"
+- v [end]
+- w
|  0,"aaabc"
|  1,"aa"
|  2,"ab"
|  3,"a"
|  4,"b"
|  5,"c"
+- w [end]
+- x
|  0,"aaabc"
|  1,"aaabc"
|  2,"bc"
|  3,""
+- x [end]
+- y
|  0,"aaabc"
|  1,""
|  2,""
|  3,"abc"
+- y [end]
+- z
|  1,"aaa"
+- z [end]

The first tuple of p contains the entire string. The second tuple contains the part that matches the entire regular expression, because the latter is enclosed in parentheses. The third tuple contains the part that matches the nested parenthesized expression (a*).

The difference between u and v is caused by the placement of parentheses below or around the Kleene star (i.e., *).

In the rule for x it is the first alternative that matches; in the rule for y it is the second.

z contains just one tuple, because n is given a specific value.

Example 7.44.  string:matches vs. ustring:matches

create --unique

addblock <doc>
  p(n, z) <-  z =  string:matches["źźźżż", "(ź*)(ż*)", n].
  q(n, z) <-  z = ustring:matches["źźźżż", "(ź*)(ż*)", n].
</doc>
echo -- p --
print p
echo -- q --
print q

close --destroy

string:matches cannot find a match, because the asterisk is applied not to the entire character, but only to its last byte:

created workspace 'unique_workspace_2017-02-13-20-52-07'
added block 'block_1Z1BJN1M'
-- p --
-- q --
0 "źźźżż"
1 "źźź"
2 "żż"
deleted workspace 'unique_workspace_2017-02-13-20-52-07'

string:quote

string:quote[s, q] = r -> string(s), string(q), string(r). 

Format string s by escaping non-printable characters and adding quotes. q is the quote symbol, and must be either an empty string, or a string consisting of one single-byte character.

If q is an empty string, then the result is formatted by escaping non-printable characters, but quotes are not added.

The quote can be any character. It is always escaped when found inside the original string, to distinguish it from its use as a quote. For example, string:quote["x", "x"] = "x\\xx", that is, a string consisting of four characters: x, \, x and x.

See also string:unquote.

Example 7.45. 

string:quote["ab", "\""] = "\"ab\"".

Example 7.46. 

create --unique

addblock <doc>
  p(x) <- x = string:quote["ab\t\rc", "\""].
  q("\"ab\\t\\rc\"").
  r(x) <- p(x), q(x).
</doc>
print p
print q
print r

close --destroy

yields

created workspace 'unique_workspace_2017-03-08-20-51-26'
added block 'block_1Z1BKMRX'
"\"ab\t\rc\""
"\"ab\t\rc\""
"\"ab\t\rc\""
deleted workspace 'unique_workspace_2017-03-08-20-51-26'

In other words, the characters in p are: ", a, b, \, t, \, r, c and ". Before quoting, the string contained a tabulator and a carriage return; after quoting it does not.

string:quote_excel

string:quote_excel[s, q] = r -> string(s), string(q), string(r). 

Very similar to string:quote, but formats the string according to the common practices of Excel. This means that when the quote character specified by q is encountered inside the string, it is not preceded by a backslash, but duplicated.

See also string:unquote_excel.

Example 7.47. 

string:quote_excel["a\"b", "\""] = "\"a\"\"b\"".
string:quote["x", "x"] = "xxxx".

string:replace

string:replace[s, substr, replace] = r
   -> string(s), string(substr), string(replace), string(r). 

Replace occurrences of the substring substr in s with replace. If substr does not occur in s, the result is s. If substr is the empty string, it is deemed to occur before each character and after the last character of s; if s is empty, then the empty substring occurs once.

Example 7.48. 

string:replace["you've got to be kidding", "you've", "you have"]
   = "you have got to be kidding".
string:replace["Warning: person %s not found", "%s", "Waldo"]
   = "Warning: person Waldo not found".
string:replace["babanana", "banana", "nana"] = "banana".
string:replace["xy", "", "What?"] = "What?xWhat?yWhat?".
string:replace["", "", "Really!"] = "Really!".

string:rewrite / ustring:rewrite

string:rewrite[s, regex, replace] = r
   -> string(s), string(regex), string(replace), string(r).

ustring:rewrite[s, regex, replace] = r
   -> string(s), string(regex), string(replace), string(r). 

Rewrite the string s using the regular expression regex and the replacement text replace. If the replacement text is to contain the dollar character ($), it must be escaped with a backslash (\).

More precisely: whenever regex matches a substring of s, that substring is replaced by the expansion of replace. (We use the term "expansion", because replace may include references to parenthesised groups in regex, e.g., \1 is the first such group. See the description of string:matches for more information about how the groups are numbered.)

The matching is "greedy", e.g., "(a*)" will match "aaa" once.

The operation fails only when the regular expression is malformed.

In the presence of multi-byte Unicode characters string:rewrite is brittle. For example, the value of string:rewrite["ééÉÉÉ", "(éé*)(É*)", """\1"""] is "ééÉÉ" (which is wrong!). Worse, string:rewrite["éééÉÉÉ", "(éé*)(É*)", """\1"""] may trigger an error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 4: 'utf8' codec can't decode byte 0xa9 in position 4: invalid start byte in field: blox.connect.StringColumn.values

ustring:rewrite handles all Unicode characters properly.

Example 7.49. 

string:rewrite["afoobbarc", ".*(foo).*(bar).*", """_\1_\2_"""] = "_foo_bar_".
string:rewrite["aa b  c   d\r\ne\n\n", """[ \t\r\n]+""", " "] = "aa b c d e ".
string:rewrite["fabcd", "(a)(b)", """\2"""] = "fbcd".
string:rewrite["faabcbd", "(a*)(b)", """\1"""] = "faacd".            // 4
string:rewrite["eee", "e", "\\0"] = "eee".                           // 5
string:rewrite["fff", "f", "\\1"] = "".                              // 6
ustring:rewrite["ééÉÉÉ", "(éé*)(É*)", """\\2\1"""] = """\2éé""".     // 7
string:rewrite["""abcd""", """(a.c)d""", """A\$C\1\$"""] = "A$Cabc$".
string:rewrite["abcd", "(a.c)d", "A\\$C\\1\\$"] = "A$Cabc$".

In the fourth example the pattern matches two substrings: aab and the second b. After the second match the value of \1 is the empty string, which replaces the b.

In the fifth example \0 represents the entire string.

In the sixth example \1 is the empty string, because there are no parenthesised groups in the pattern.

In the seventh example \\2\1 represents \, followed by 2, followed by matching group number 1.

string:split / ustring:split

string:split[s, delim, n] = token
   -> string(s), string(delim), int(n), string(token).

ustring:split[s, delim, n] = token
   -> string(s), string(delim), int(n), string(token). 

delim must be a string consisting of one character. If it isn't, the operation will silently fail.

Each of these two operations splits the string s at every occurrence of delim and returns the n-th part (counting from 0). If delim does not occur in the string, the entire string is deemed to be part 0. If delim is the last character in the string, then the last part will be an empty string.

Unlike ustring:split, string:split does not work well with multi-byte Unicode characters. For example, in string:split[s, "ä", n] the delimiter will not be recognised as a single character, and the operation will fail.

Note

Both string:split[s, delim, n] = tok and ustring:split[s, delim, n] = tok can be used with the variable n unbound, to iterate over the parts of the string.

Example 7.50. string:split

create --unique

addblock <doc>
  p(n, s) <-  s = string:split["First,Part,23,, Test", ",", n].
</doc>
print p

close --destroy

results in:

created workspace 'unique_workspace_2017-02-17-22-49-35'
added block 'block_1Z1BIXOX'
0 "First"
1 "Part"
2 "23"
3 ""
4 " Test"
deleted workspace 'unique_workspace_2017-02-17-22-49-35'

Example 7.51.  string:split vs. ustring:split

create --unique

addblock <doc>
  data("AaÄäÅåÖöOo").
  data2("chcieć mieć coś to żadna cześć").

  p(n, part) <- part =  string:split[s, "ä", n],  data(s).
  q(n, part) <- part = ustring:split[s, "ä", n],  data(s).
  r(n, part) <- part = ustring:split[s, "ć", n],  data2(s).
</doc>
echo -- p --
print p
echo -- q --
print q
echo -- r --
print r

close --destroy

The results:

created workspace 'unique_workspace_2017-02-13-00-22-50'
added block 'block_1Z1BUQ4N'
-- p --
-- q --
0 "AaÄ"
1 "ÅåÖöOo"
-- r --
0 "chcie"
1 " mie"
2 " coś to żadna cześ"
3 ""
deleted workspace 'unique_workspace_2017-02-13-00-22-50'

string:substring / ustring:substring

string:substring[s, start, length] = r
   -> string(s), int(start), int(length), string(r).

ustring:substring[s, start, length] = r
   -> string(s), int(start), int(length), string(r). 

ustring:substring returns the substring of s starting at position start and containing up to length characters. If length < 0, the result is an empty string; if start < 0, the operation fails.

Note that the first character of a string has position 0, so that, say, ustring:substring[s, 0, ustring:length[s]] returns the full string.

string:substring is very similar, but counts bytes rather than characters. This makes it rather brittle. For example, if an lb script contains

data("AaÄäÅåÖöOo").

p1_4(sub) <- sub = string:substring[s, 1, 4],  data(s).

then the result will be an error message:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 3: 'utf8' codec can't decode byte 0xc3 in position 3: unexpected end of data in field: blox.connect.StringColumn.values

Example 7.52. string:substring

string:substring["SKD50194", 0, 3] = "SKD".
string:substring["SKD50194", 3, 5] = "50194".

// If start+length is past the end of the string, result will be
// less than length characters.
string:substring["SKD50194", 3, 10] = "50194".

// If start is past the end of the string, an empty string results.
string:substring["SKD50194", 10, 4] = "".

Example 7.53. ustring:substring

create --unique

addblock <doc>
  data("AaÄäÅåÖöOo").

  q(sub)      <- sub = ustring:substring[s,  0, ustring:length[s]], data(s).
  q1_4(sub)   <- sub = ustring:substring[s,  1,  4],                data(s).
  q0_14(sub)  <- sub = ustring:substring[s,  0, 14],                data(s).
  q10_14(sub) <- sub = ustring:substring[s, 10, 14],                data(s).
</doc>
print q
print q1_4
print q0_14
print q10_14

close --destroy

The result:

created workspace 'unique_workspace_2017-02-12-20-29-06'
added block 'block_1Z1BJ3LS'
"AaÄäÅåÖöOo"
"aÄäÅ"
"AaÄäÅåÖöOo"
""
deleted workspace 'unique_workspace_2017-02-12-20-29-06'

string:T:convert

string:boolean:convert[s]  = v -> string(s), boolean(v).
string:datetime:convert[s] = v -> string(s), datetime(v).
string:decimal:convert[s]  = v -> string(s), decimal(v).
string:int:convert[s]      = v -> string(s), int(v).
string:float:convert[s]    = v -> string(s), float(v). 

Converts a string to a boolean, datetime, decimal, integer, or floating-point value. If the string is unparseable, or if the resulting value would be out of range, then conversion fails (i.e., produces no value).

Example 7.54. 

string:float:convert["3.1415926535897931"] = 3.1415926535897931
string:int:convert["100"] = 100

Please note that the strings are not the LogiQL literals described in Section 4.2. For example, the value of string:float:convert["1"] is 1.0, but string:float:convert["1f"] will fail.

The system guarantees that converting a value to a string and back will preserve the value.

Example 7.55. 

string:float:convert[
     float:string:convert[3.1415926535897931f]] = 3.1415926535897931.

Note

It is not guaranteed that a conversion from string to value to string will preserve the original string: precision might be lost.

Example 7.56. 

float:string:convert[
   string:float:convert["3.141592653589793238"]] = 3.1415926535897931.
                                            ^^^                     ^

For conversions to boolean all inputs are case-insensitive. The strings "true", "TrUe", etc. are recognized as true. The strings "false", "fALSE", etc. are recognized as false. If s is a string whose lowercase form is neither "false" nor "true", then s is not converted to a boolean (i.e., the attempt to convert fails).

In the case of conversion to decimal numbers (i.e., string:decimal:convert) the string can contain an exponent part of the form en or En, where -20 <= n <= 20. If n has a value outside this range, the conversion will fail. Otherwise the digits will be shifted by n and rounded to 18 decimal digits. If the result is outside the range of LogiQL decimal values, the conversion will fail. If the result is smaller than 1e-18 and greater than -1e-18, then the produced value will be 0.

Example 7.57. Conversion from string to decimal

string:decimal:convert["0.001e20"] = 100000000000000000
string:decimal:convert["1e-18"]) = 0.000000000000000001
string:decimal:convert["0.1e-18"]) = 0
string:decimal:convert["12345678901234567890.12345678901234565000e-2"] =
     123456789012345678.901234567890123456
string:decimal:convert["12345678901234567890.12345678901234565001E-2"] =
     123456789012345678.901234567890123457
string:decimal:convert["12345678901234567890.1234567890123456e-1"] will fail

string:trim

string:trim[s] = r -> string(s), string(r). 

Removes leading and trailing whitespace from a string.

Example 7.58. 

string:trim["  some words "] = "some words".

string:unquote

string:unquote[s, q] = r -> string(s), string(q), string(r). 

Parses string s by un-escaping non-printable characters and removing outer quotes (if any). q is the quote symbol, and must be either an empty string or a string consisting of one single-byte character.

If q is the empty string, no outer quotes are removed. Otherwise, if s does not begin with the character in q, no outer quotes are removed. In both cases, the escaped non-printable characters are still converted back to their non-escaped form: for example, the two-character sequence \ followed by t is converted to the single character \t (i.e., the tabulator).

If q is correctly formed, then the following holds:

string:unquote[string:quote[s, q], q] = s

Example 7.59. 

string:unquote["\"a\tb\"", "\""] = "a\tb".

string:unquote_excel

string:unquote_excel[s, q] = r -> string(s), string(q), string(r). 

Similar to string:unquote, except that the input string is assumed to conform to Excel-style conventions (i.e., the quote is represented by two quotes rather than an escaped quote).

Example 7.60. 

string:unquote_excel["\"a\"\"b\"", "\""]="a\"b".

string:upper / ustring:upper

string:upper[s] = r -> string(s), string(r).
ustring:upper[s] = r -> string(s), string(r). 

string:upper converts the characters of the string to upper case.

ustring:upper converts the characters of the string to upper case, treating multi-byte Unicode characters properly.

Example 7.61. 

string:upper["Psmith"] = "PSMITH".

Example 7.62. Converting to lower and upper case

create --unique

addblock <doc>
  data("AaÄäÅåÖöOo").

  pu(us)  <- us = string:upper[ s],  data(s).
  pl(ls)  <- ls = string:lower[us],  pu(us).

  qu(us)  <- us = ustring:upper[s],   data(s).
  ql(ls)  <- ls = ustring:lower[us],  qu(us).
</doc>
print data
print pu
print pl
print qu
print ql

close --destroy
ustring:upper["Psmith"] = "PSMITH".

The result is

created workspace 'unique_workspace_2017-02-12-01-13-06'
added block 'block_1Z1BJ2QD'
"AaÄäÅåÖöOo"
"AAÄäÅåÖöOO"
"aaÄäÅåÖöoo"
"AAÄÄÅÅÖÖOO"
"aaääååööoo"
deleted workspace 'unique_workspace_2017-02-12-01-13-06'

7.7. Boolean Operations

boolean:and

boolean:and[x, y] = z -> boolean(x), boolean(y), boolean(z).

Returns true if both x and y are true, false otherwise.

boolean:bitxor

boolean:bitxor[x, y] = z -> boolean(x), boolean(y), boolean(z).

Returns true if exactly one of x and y is true, false otherwise.

boolean:or

boolean:or[x, y] = z -> boolean(x), boolean(y), boolean(z).

Returns false if both x and y are false, true otherwise.

boolean:not

boolean:not[x] = y -> boolean(x), boolean(y).

Returns true if x is false, false otherwise.
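
For example:

boolean:and[true, false] = false.
boolean:or[true, false] = true.
boolean:bitxor[true, true] = false.
boolean:not[false] = true.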

7.8. Date/Time Operations

Some of the operations described in this section refer to time zones.

There are three ways to specify a time zone in LogiQL:

  • As a string with an offset from UTC time. For example, "+5:30:00" (which can also be written as "+5:30" or "+530"). The offset must refer to an existing time zone, possibly adjusted for Daylight Saving Time. (A list of generally recognised time zones is loaded by the system from a configuration file.) The granularity of time zone offsets is 15 minutes, so an offset such as "-1:25" is probably the result of a typo.

  • As a string with an abbreviation of the official name of the time zone. For example, "EST" (which stands for "Eastern Standard Time"). Unfortunately, the names of some of the officially recognised time zones have identical abbreviations (e.g., "IST" is used both for Indian Standard Time and for Israeli Standard Time): the system will treat the abbreviation as representing only one of such zones (the choice is made in a configuration file).

  • As a string with an abbreviation of the official name of the time zone when it is in Daylight Saving Time. For example, Detroit is normally in the time zone denoted by EST, but during certain periods of the year the time zone can be denoted by EDT. Not every time zone has such a secondary designation, and if it does exist, then its uniqueness cannot be guaranteed.

datetime:{PART} / datetime:{PART}TZ

datetime:year[d]   = i -> datetime(d), int(i).
datetime:month[d]  = i -> datetime(d), int(i).
datetime:day[d]    = i -> datetime(d), int(i).
datetime:hour[d]   = i -> datetime(d), int(i).
datetime:minute[d] = i -> datetime(d), int(i).
datetime:second[d] = i -> datetime(d), int(i).

datetime:yearTZ[d, tz]   = i -> datetime(d), string(tz), int(i).
datetime:monthTZ[d, tz]  = i -> datetime(d), string(tz), int(i).
datetime:dayTZ[d, tz]    = i -> datetime(d), string(tz), int(i).
datetime:hourTZ[d, tz]   = i -> datetime(d), string(tz), int(i).
datetime:minuteTZ[d, tz] = i -> datetime(d), string(tz), int(i).
datetime:secondTZ[d, tz] = i -> datetime(d), string(tz), int(i).

datetime:PART extracts a component from a datetime and returns it as an integer. The versions with a timezone parameter return the part for the given timezone.

Example 7.63. 

datetime:year[datetime:create[2013, 10, 31, 15, 30, 0]] = 2013.
datetime:hour[datetime:create[2013, 10, 31, 15, 30, 0]] = 15.

datetime:hourTZ[datetime:create[2013, 10, 31, 15, 30, 0], "PST"] = 8.

datetime:add / datetime:subtract

datetime:add[old, offset, resolution] = new
   -> datetime(old), int(offset), string(resolution), datetime(new).
datetime:subtract[old, offset, resolution] = result
   -> datetime(old), int(offset), string(resolution), datetime(result).

datetime:add adds time to a datetime, while datetime:subtract subtracts time. For both, the resolution argument is a string representing the resolution of the offset. Valid resolutions are "years", "months", "days", "hours", "minutes", and "seconds".

Example 7.64. 

d[] = datetime:create[2013, 10, 31, 15, 30, 0].
datetime:add[d[], 1, "days"] = x  -> datetime:day[x] = 1.
datetime:add[d[], 1, "days"] = x  -> datetime:month[x] = 11.

datetime:subtract[d[], 1, "days"] = x -> datetime:day[x] = 30.
datetime:subtract[d[], 1, "days"] = x -> datetime:month[x] = 10.

datetime:create / datetime:createTZ

datetime:create[y, mo, d, h, mi, s] = dt
   -> int(y), int(mo), int(d), int(h), int(mi), int(s), datetime(dt).
datetime:createTZ[y, mo, d, h, mi, s, tz] = dt
   -> int(y), int(mo), int(d), int(h), int(mi), int(s), string(tz), datetime(dt).

datetime:create creates a datetime value with the given year, month, day, hour, minute, and second. Without a timezone parameter, the time is assumed to be UTC; datetime:createTZ interprets the components in the given timezone.

Example 7.65. 

d[] = datetime:create[2013, 10, 31, 15, 30, 0].
-> datetime:string:convert[d[]] = "2013-10-31 15:30:00 UTC".

dtz[] = datetime:createTZ[2013, 10, 31, 15, 30, 0, "PST"].
-> datetime:string:convert[dtz[]] = "2013-10-31 22:30:00 UTC".

datetime:export / datetime:import

datetime:export[d] = i -> datetime(d), int(i).
datetime:import[i] = d -> int(i), datetime(d).

datetime:export converts a datetime into an opaque integer value. The integer value i can be used via datetime:import to recreate the datetime value.

Example 7.66. 

i[] = datetime:export[datetime:create[2013, 10, 31, 15, 30, 0]].
datetime:import[i[]] = d -> datetime:hour[d] = 15.

datetime:format

datetime:format[dt, format] = result
   -> datetime(dt), string(format), string(result).
datetime:formatTZ[dt, format, tz] = result
   -> datetime(dt), string(format), string(tz), string(result).

datetime:format formats a datetime as a string according to the specified datetime format; datetime:formatTZ does the same, but renders the datetime in the given timezone.

The table below lists all the supported date facet format flags.

Format Specifier    Description    Example
%a Abbreviated weekday name. "Mon" => Monday
%A Long weekday name. "Monday"
%b Abbreviated month name. "Feb" => February
%B Full month name. "February"
%d Day of the month as decimal 01 to 31.
%D Equivalent to %m/%d/%y.
%G This has the same format and value as %Y, except that if the ISO week number belongs to the previous or next year, that year is used instead.
%g Like %G, but without century (two digits, not four).
%j Day of year as decimal from 001 to 366 for leap years, 001 to 365 for non-leap years. "060" => Feb-29
%m Month as a decimal number 01 to 12. "01" => January
%u The day of the week as a decimal, range 1 to 7, Monday being 1.
%U The week number of the current year as a decimal number, range 00 to 53, starting with the first Sunday as the first day of week 01. In 2005, January 1st was a Saturday, so it would be treated as belonging to week 00 of 2005. (Week 00 spans 2004-Dec-26 to 2005-Jan-01: this also happens to be week 53 of 2004). The following Sunday would be in week 01.
%V The ISO 8601:1988 week number of the current year as a decimal number, range 01 to 53, where week 1 is the first week that has at least 4 days in the current year, and with Monday as the first day of the week.
%w Weekday as decimal number 0 to 6. "0" => Sunday
%W Week number 00 to 53 where Monday is first day of week 1. Sunday, January 2, 2005 would be treated as belonging to week 00; the following Monday would be week 01.
%y Two digit year. "05" => 2005
%Y Four digit year. "2005"
%Y-%b-%d Default date format. "2005-Apr-01"
%Y%m%d ISO format. "20050401"
%Y-%m-%d ISO extended format. "2005-04-01"

The table below lists all the supported time facet format flags.

Format Specifier    Description    Example
%- Placeholder for the sign of a duration. Only displays when the duration is negative. "-13:15:16"
%+ Placeholder for the sign of a duration. Always displays for both positive and negative. "+13:15:16"
%f Fractional seconds are always used, even when their value is zero. "13:15:16.000000"
%F Fractional seconds are used only when their value is not zero. (Note: does not print '.' if fractional seconds is zero) "13:15:16", but "05:04:03.001234"
%H The hour as a decimal number using a 24-hour clock (range 00 to 23).
%I The hour as a decimal number using a 12-hour clock (range 01 to 12).
%k The hour (24-hour clock) as a decimal number (range 0 to 23); single digits are preceded by a blank.
%l The hour (12-hour clock) as a decimal number (range 1 to 12); single digits are preceded by a blank.
%M The minute as a decimal number (range 00 to 59).
%O The number of hours in a time duration as a decimal number (range 0 to max. representable duration); single digits are preceded by a zero.
%p Either AM or PM according to the given time value, or the corresponding strings for the current locale.
%P Like %p but in lowercase: am or pm or a corresponding string for the current locale.
%r The time in a.m. or p.m. notation. In the POSIX locale this is equivalent to %I:%M:%S %p.
%R The time in 24-hour notation (%H:%M).
%s Seconds with fractional seconds. "59.000000"
%S Seconds only. "59"
%T The time in 24-hour notation (%H:%M:%S).
%q ISO time zone (output only). "-0700" // Mountain Standard Time
%Q ISO extended time zone (output only). "-05:00" // Eastern Standard Time
%z Abbreviated time zone (output only). "MST" // Mountain Standard Time
%Z Full time zone name (output only). "EDT" // Eastern Daylight Time
%ZP Posix time zone string. "EST-05EDT+01,M4.1.0/02:00,M10.5.0/02:00"

Formatting is based on the Boost library. More documentation can be found on the Boost website.
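
For example (the expected result follows from the flags in the tables above):

d[] = datetime:create[2013, 10, 31, 15, 30, 0].
-> datetime:format[d[], "%Y-%m-%d %H:%M"] = "2013-10-31 15:30".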

datetime:part / datetime:partTZ

datetime:part[d, part] = i -> datetime(d), string(part), int(i).
datetime:partTZ[d, part, tz] = i
   -> datetime(d), string(part), string(tz), int(i).

datetime:part extracts a component specified by the part parameter from a datetime and returns it as an integer. The versions with a timezone parameter return the part for the given timezone. Valid values of part are: "year", "month", "day", "hour", "minute", and "second".

Example 7.67. 

d[] = datetime:create[2013, 10, 31, 15, 30, 0].
d[] = x -> datetime:part[x, "year"] = 2013.
d[] = x -> datetime:part[x, "hour"] = 15.

-> datetime:partTZ[d[], "hour", "PST"] = 8.

datetime:now

datetime:now[] = d -> datetime(d).

datetime:now contains the start date and time of the transaction. Within a single transaction, it will always have the same value. Note that there is no guarantee that datetime:now reflects the ordering of multiple transactions. For example, there could be transactions with commit order T1, T2, T3, but with T1:datetime:now[] > T2:datetime:now[].

Example 7.68. 

datetime:string:convert[datetime:now[]] = "2013-07-31 19:38:47 UTC"

datetime:offset

datetime:offset[dFrom, dTo, resolution] = offset
   -> datetime(dFrom), datetime(dTo), string(resolution), int(offset).

Calculates the difference between dTo and dFrom in a certain resolution. Available resolutions: "years", "months", "days", "hours", "minutes" and "seconds".

For years, months, and days we truncate both dates to the given resolution and then compute their difference. That is, the difference in days between "2013-01-01 23:59" and "2013-01-02 00:01" is 1.

For hours, minutes, and seconds we compute the offset in seconds and then round the result according to the resolution.

Example 7.69. 

d1[] = datetime:create[2013, 10, 31, 15, 30, 0].
d2[] = datetime:create[2013, 11,  1, 20, 30, 0].
-> datetime:offset[d1[], d2[], "hours"] = 29.
-> datetime:offset[d1[], d2[], "minutes"] = 29 * 60.
-> datetime:offset[d1[], d2[], "days"] = 1.
-> datetime:offset[d1[], d2[], "months"] = 1.

d3[] = datetime:create[2012, 12, 31, 23, 59, 59].
d4[] = datetime:create[2013,  1,  1,  0,  0,  0].
-> datetime:offset[d3[], d4[], "hours"] = 0.
-> datetime:offset[d3[], d4[], "minutes"] = 0.
-> datetime:offset[d3[], d4[], "seconds"] = 1.
-> datetime:offset[d3[], d4[], "days"] = 1.
-> datetime:offset[d3[], d4[], "months"] = 1.
-> datetime:offset[d3[], d4[], "years"] = 1.

datetime:parse

datetime:parse[s, format] = dt -> string(s), string(format), datetime(dt).

datetime:parse parses a string representing a datetime according to a specified datetime format.

The number of supported formatting patterns is limited compared to datetime:format.

Format Specifier    Description    Example
%d Day of month as decimal 1-31
%m Month of year as decimal 1-12
%b Abbreviated month name Feb
%q Quarter of year as decimal 1-4
%Y Year as 4 digits
%y Year as 2 digits
%H Hours as 2 digits in 24-hour clock (00-23)
%M Minutes as 2 digits (00-59)
%s Seconds as 2 digits (00-59) and fractional seconds as 1 to 6 digits where seconds and fractional seconds are separated by '.' (dot) 12.345678
%S Seconds as 2 digits (00-59)
%f Fractional seconds as 1 to 6 digits 012345
%F Fractional seconds as 1 to 6 digits prefixed by '.' (dot), or empty if no leading '.' (dot). Note: this includes strings with and without fractional seconds .012345 and empty string
%Q Time zone specifier. Either ISO time zone or ISO extended time zone or abbreviated time zone -0700 and -07:00 and UTC
%Z Same as %Q
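
For example, a small sketch (assuming that, as with string:datetime:convert, a missing time zone is interpreted as UTC):

d[] = datetime:parse["2013-10-31 15:30:00", "%Y-%m-%d %H:%M:%S"].
-> d[] = datetime:create[2013, 10, 31, 15, 30, 0].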

datetime:string:convert

datetime:string:convert[d] = s -> datetime(d), string(s).

Converts d into a human-readable representation (UTC-based).

Example 7.70. 

d[] = datetime:create[2013, 10, 31, 15, 30, 0].
-> datetime:string:convert[d[]] = "2013-10-31 15:30:00 UTC".

string:datetime:convert

string:datetime:convert[s] = d -> string(s), datetime(d).

Converts the given string into a datetime value. The string should conform to the format: "%Y-%m-%d %H:%M:%S%F %Q".

In case the time-zone is missing, the datetime value is assumed to be in the UTC timezone.

Example 7.71. 

d[] = string:datetime:convert["2013-10-31 15:30:00 UTC"].
-> datetime:string:convert[d[]] = "2013-10-31 15:30:00 UTC".

d[] = string:datetime:convert["2013-10-31 15:30:00"].
-> datetime:string:convert[d[]] = "2013-10-31 15:30:00 UTC".

7.9. Conversions

X:Y:convert

boolean:string:convert[x]  = y -> boolean(x),  string(y).

datetime:string:convert[x] = y -> datetime(x), string(y).

decimal:float:convert[x]   = y -> decimal(x), float(y).
decimal:int:convert[x]     = y -> decimal(x), int(y).
decimal:string:convert[x]  = y -> decimal(x), string(y).
decimal:decimal:convert[x] = y -> decimal(x), decimal(y).

float:decimal:convert[x]   = y -> float(x), decimal(y).
float:int:convert[x]       = y -> float(x), int(y).
float:string:convert[x]    = y -> float(x), string(y).
float:float:convert[x]     = y -> float(x), float(y).

int:decimal:convert[x]     = y -> int(x), decimal(y).
int:float:convert[x]       = y -> int(x), float(y).
int:string:convert[x]      = y -> int(x), string(y).
int:int:convert[x]         = y -> int(x), int(y).

string:boolean:convert[s]  = v -> string(s), boolean(v).
string:datetime:convert[s] = v -> string(s), datetime(v).
string:decimal:convert[s]  = v -> string(s), decimal(v).
string:int:convert[s]      = v -> string(s), int(v).
string:float:convert[s]    = v -> string(s), float(v).
string:string:convert[x]   = y -> string(x), string(y).

LogiQL supports an unchecked conversion function, X:Y:convert[x] = y, that converts a value of type X to a value of type Y. X and Y can be types int, float, decimal, or string.

LogiQL also supports conversions between datetime values and strings, as well as between strings and booleans. The boolean true always converts to the string "true" and false always converts to "false". When converting from string to boolean, all inputs are case-insensitive.

Strings produced by these conversions are not LogiQL literals: the qualifiers f and d are not added to floating point or decimal values. (See Section 4.2.)

The conversions from string (i.e., when X is string) are further described in the section called “string:T:convert”.

Please note that the conversion is unchecked: if the value x cannot be represented as a value of type Y, some information is lost.

Example 7.72. 

float:int:convert[3.141592f] = 3
int:float:convert[3] = 3.0f
int:decimal:convert[1] = 1.0d
decimal:int:convert[2.4d] = 2
boolean:string:convert[false] = "false"
string:boolean:convert["TRUE"] = true
datetime:string:convert[datetime:create[2013, 10, 31, 15, 30, 0]] =
    "2013-10-31 15:30:00 UTC"

blox:lang:toX

In addition to the explicit unchecked conversion function X:Y:convert, LogiQL also supports a polymorphic conversion function, blox:lang:toX, where X can be Int, Float, Decimal, or String.

The polymorphic unchecked conversion provides programming convenience in that the source type need not be specified: it is inferred by the compiler.

Example 7.73. Polymorphic unchecked conversions

blox:lang:toString[3] = x

is equivalent to

int:string:convert[3] = x

The value of x that satisfies the conversion is "3".

X:Y:eq

LogiQL supports a checked conversion function, X:Y:eq[x] = y, that converts a value of type X to a value of type Y without losing information. If the value of x cannot be precisely represented in the type Y, the function produces no result for y (i.e., the attempt to convert fails).

X and Y must be chosen from the numeric types int, decimal, or float. Furthermore, checked conversion between decimal and float is not supported.

Example 7.74. 

Checked conversions with results:

float:int:eq[3.0f] = 3.
int:decimal:eq[3] = 3.0d.

Checked conversion with no result:

float:int:eq[3.1f]

7.10. Currency

float:currency:string

float:currency:string[x, y] = z -> string(x), float(y), string(z).

Formats a float according to the locale identifier specified by the first parameter. Identifiers are interpreted by the underlying ICU implementation (see ICU User Guide for "Locale").

Example 7.75. 

numbers(-123.4f).
numbers(1234.5f).

locales(x) -> string(x).
locales("@currency=USD").
locales("@currency=EUR").
locales("de_DE.UTF-8").
locales("en_US.UTF-8").

result(x, y, z) <-
   numbers(x),
   locales(y),
   float:currency:string[y, x] = z.

yields:

-123.4 "@currency=EUR" "-123.40"
-123.4 "@currency=USD" "-US$123.40"
-123.4 "de_DE.UTF-8"   "-123,40 €"
-123.4 "en_US.UTF-8"   "-$123.40"
1234.5 "@currency=EUR" "1,234.50"
1234.5 "@currency=USD" "US$1,234.50"
1234.5 "de_DE.UTF-8"   "1.234,50 €"
1234.5 "en_US.UTF-8"   "$1,234.50"

7.11. Unique Identifiers

uid<<>>

UniqueIdRule  =
        PositiveDeltaAtoms "<-" "uid" "<<" Identifier ">>" InputFormula.

PositiveDeltaAtoms  =  PositiveDeltaAtom { ","  PositiveDeltaAtom }.

PositiveDeltaAtom  =  "+" Atom.

InputFormula = Conjunction.

The uid function can be used to generate a set of unique identifiers for all the tuples in a predicate. The key space of the resulting predicate(s) must match the key space of the input predicate(s), and the resulting predicate(s) must have integer values.

The generated identifiers are unique in the context of a single database, in particular across the branches of a single database. They are not guaranteed to be universally unique across multiple databases, e.g., if you export the identifiers from one database and import into another.

Note

  • uid<<>> cannot be used in rules for database-lifetime IDB predicates.

  • The syntax given above describes the most frequent usage. In transaction-lifetime delta rules the delta atoms in the head need not be positive, but then:

    • if the atom in the head is negative (i.e., signifies deletion), then the uid<<x>> part of the rule is essentially ignored;

    • if an atom in the head is preceded with ^ (i.e., it represents an upsert), then the effect is to renumber the affected tuples.

Example 7.76. Legal uses of uid

create --unique

addblock <doc>
  F[x] = id -> string(x), int(id).
  G[x] = id -> string(x), int(id).

  R(x) -> string(x).
  Q(x) -> string(x).

  R("Joe").     R("Jill").
  Q("Helen").   Q("Henry").  Q("Jill").

  +F[x] = id <- uid<<id>> +R(x).   // database lifetime
</doc>

exec <doc>
  +F[x] = id, +G[x] = id <- uid<<id>> Q(x), !R(x).   // transaction lifetime
</doc>
echo --- F:
print F
echo --- G:
print G

exec <doc>
  -F[x] = id <- uid<<id>> Q(x).   // id ignored
  ^G[x] = id <- uid<<id>> Q(x).   // renumbering
</doc>
echo --- F2:
print F
echo --- G2:
print G

close --destroy

In the penultimate rule (which deletes tuples from F) the use of uid is effectively ignored, while the final rule renumbers the items in G. So the results will be something like

created workspace 'unique_workspace_2017-06-07-19-27-24'
added block 'block_1Z1FGZLX'
--- F:
"Helen" 10000000040
"Henry" 10000000043
"Jill"  10000000032
"Joe"   10000000034
--- G:
"Helen" 10000000040
"Henry" 10000000043
--- F2:
"Joe" 10000000034
--- G2:
"Helen" 10000000060
"Henry" 10000000065
"Jill"  10000000066
deleted workspace 'unique_workspace_2017-06-07-19-27-24'

Example 7.77. An illegal use of uid

create --unique

addblock <doc>
  F[x] = id -> string(x), int(id).
  R(x) -> string(x).

  R("Joe").     R("Jill").

  F[x] = id <- uid<<id>> R(x).   // Not allowed in database-lifetime IDB!
</doc>
echo ---
print F

close --destroy

The compiler will report an error:

Error: Database-lifetime uid<<>> rules may contain only positive delta atoms: F[x]=id is not a positive delta atom
In P2P rule Forall id::int,x::string .
F[x]=id <-
   uid<<id>>
      R(x).

7.12. Transaction Identifier

transaction:id[]

transaction:id[] = id -> int(id).

transaction:id[] contains a unique identifier for the current transaction. Transaction identifiers are unique only in the context of a single database; in particular, they are also unique across the branches of a single database. They are not guaranteed to be universally unique across multiple databases, e.g., if you export the identifiers from one database and import them into another.

Example 7.78. 

txn_ids(x) -> int(x).
+txn_ids(x) <- x = transaction:id[].

Chapter 8. Predicates

One of the basic constructs in LogiQL is the predicate. A predicate is a named set of tuples. For example, a predicate person may contain a set of one-tuples: {("Alice"), ("Bob"), ("Clara")}; a predicate friends may contain a set of two-tuples: {("Alice", "Clara")}. Tuples in a predicate must all have the same arity, or number of elements. This arity is also referred to as the arity of the predicate. For example, predicate person has arity one (and is also referred to as a unary predicate); predicate friends has arity two (and is also referred to as a binary predicate).

Note

  • It is possible to have a nullary predicate with arity 0: that is, the predicate is either empty, or contains only the nullary tuple ().
  • We often talk about populating a predicate. This is simply a convenient term for inserting tuples into the predicate.

For those familiar with relational databases, a predicate can be thought of as a table. There are two major differences between a predicate and a relational table:

  • A predicate contains a set of tuples, whereas a table contains a bag (a.k.a. multiset) of tuples. That is, a table may contain many duplicate copies of the same tuple, but each tuple in a predicate is different from all the others.

    Please note that inserting an existing tuple is not an error: it just won't change the state of the database.

  • A predicate does not contain NULLs. We encourage the normalization of schemata in order to represent optional data.
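As noted above, re-asserting an existing tuple is a no-op, and optional data is kept in a separate predicate rather than in a nullable column. The following minimal sketch (with hypothetical predicates person and email) illustrates both points:

person(x) -> string(x).
person("Alice").
person("Alice").   // not an error: person still contains a single "Alice" tuple

// Optional data lives in its own predicate; persons without a known
// email address simply have no tuple here.
email[p] = e -> string(p), string(e).
email["Alice"] = "alice@example.com".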

In the remainder of this chapter, we first cover the basics of declaring a predicate, and then the different types of predicates supported by the LogicBlox database, their declarations and the associated utilities.

8.1. Predicate Declaration

The existence of a predicate may be either explicitly declared or inferred (see Section 11.1.1). A predicate declaration specifies the predicate's name, its arity, and the type of values that may occupy each position in the tuple. We refer to these types as a predicate's argument types.

A basic predicate declaration has the following form:

PredicateDeclaration = Atom "->" Atom { "," Atom } "." .
Atom                 = Identifier "(" [ Identifiers ] ")" .
Identifiers          = Identifier { "," Identifier } .

The example below demonstrates the declarations of predicates person and friends.

Example 8.1. Basic predicate declarations

person(x) -> string(x).
friends(person1, person2) -> string(person1), string(person2).

The first declaration states that person is a unary predicate (its arity is one): it can contain only one-tuples. Furthermore, the first (and only) value in each tuple must be a value of type string.

The second declaration states that friends is a binary predicate (its arity is two): it can contain only two-tuples. Furthermore, the first value in each tuple must be a value of type string, as must the second value in each tuple.

A predicate's name, arity, and the types of its arguments are collectively called the predicate's signature. Predicate declarations are used by the LogicBlox database to check a program for correctness, as well as to efficiently manage the storage of the predicate's contents and the evaluation of queries involving the predicate.

As readers will see in Chapter 16, predicate declarations use the syntactic form of constraints. A constraint must satisfy very specific requirements to be properly interpreted as a declaration. Section 16.3 explains these requirements in detail.

8.2. Functional Predicates

A predicate can be declared to be functional, i.e., one that represents a function from its key to its value. (In mathematics a function---also known as a mapping or a map---is just a special kind of relation, and a predicate represents a relation.)

A functional predicate contains n-tuples, where n must be at least 1. The first k arguments of a functional predicate are its keys, or key arguments. The last n - k arguments are its values, or value arguments.

If n - k > 1, the predicate is called a multi-valued functional predicate; if n - k = 1, the functional predicate is single-valued. Functional predicates are most often single-valued, and the unqualified term "functional predicate" usually refers to the single-valued case.

If k = 0 (i.e., there is no key), then the functional predicate is effectively a global constant (or, if multi-valued, a tuple of constants). Such a predicate is sometimes called a scalar predicate (or just a scalar).
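For example, a keyless functional predicate can be declared and populated as follows (a minimal sketch; max_items is a hypothetical name):

max_items[] = n -> int(n).
max_items[] = 100.   // the single tuple of this scalar predicate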

A functional predicate must satisfy the constraint that no two tuples in the predicate can share the same key (which, as mentioned above, can consist of several arguments). When this constraint is violated, the LogicBlox database reports a functional dependency violation (a.k.a. FDV) and aborts the transaction.

Single-valued functional predicates

The declaration of a single-valued functional predicate takes the following form:

FunctionalPredicateDeclaration =
    FunctionalExpression "=" ValueArgument "->" Atom { "," Atom } "." .

FunctionalExpression = Identifier "[" [ KeyArguments ] "]" .

ValueArgument = Identifier .
KeyArguments  = Identifiers .

Identifiers = Identifier { "," Identifier } .

The key arguments of the predicate must be declared within the square brackets [ and ]. The value argument of the predicate must be declared after the equality (=).

If the key arguments are missing, the declared predicate will be able to contain only one value, i.e., it will effectively be a constant.

Example 8.2. Declaring a functional predicate

In the following example, we declare a functional predicate that maps a person's given and family names to their age. The first two fields in each tuple of this predicate are the key, and the third field is the value associated with that key.

age[given_name, family_name] = age -> string(given_name),
                                      string(family_name),
                                      int(age).

Predicate age can contain tuples such as {("Alice", "Smith", 20), ("Bob", "Jones", 25), ("Alice", "Jones", 20)}. However, given these tuples, an attempt to add ("Alice", "Smith", 40) will cause the LogicBlox database to report an error (functional dependency violation).

Multi-valued functional predicates

The declaration of a multi-valued functional predicate takes the following form:

MultiValuedFunctionalPredicateDeclaration =
    MultiValuedFunctionalAtom "->" Atom { "," Atom } "." .

MultiValuedFunctionalAtom = Identifier "(" KeyArguments ";" ValueArguments ")"
                          | Identifier "(" ";" ValueArguments ")"
                          | Identifier "(" KeyArguments ";"  ")" .

KeyArguments   = Identifiers .
ValueArguments = Identifiers .

Identifiers = Identifier { "," Identifier } .

The semicolon (;) separates the list of key arguments (which may be empty) from the list of value arguments (which may also be empty). The list of key arguments and the list of value arguments cannot both be empty.

Of course, a predicate cannot properly be called "multi-valued" if it has fewer than two value arguments, and we recommend using this syntax only for such properly multi-valued predicates.

We mention just for completeness that:

  • It is pointless to use a multi-valued functional atom such as p(x, y ; ) (i.e., with no value arguments), since it is equivalent to the basic atom p(x, y).

  • A declaration such as p( ; x) -> int(x). (i.e., with no key arguments and one value argument), is equivalent to p[] = x -> int(x), which is the recommended form. Such a declaration can be regarded as the declaration of a constant.

  • A multi-valued functional atom such as p( ; x, y) (i.e., with no key arguments and more than one value argument) is a way of declaring a predicate that can hold at most one tuple. This form is very seldom used.
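For completeness, here is a minimal sketch of that last, seldom-used form (range is a hypothetical predicate that can hold at most one pair of bounds):

range( ; lo, hi) -> int(lo), int(hi).
range( ; 0, 100).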

Note

Frequent unnecessary use of multi-valued functional predicates is considered bad style. However, sometimes such predicates cannot be avoided: see Example 13.2.

Example 8.3. 

In the following, a social security number is mapped to a person's name and age.

ssn_to_name_age(ssn ; name, age) -> string(ssn), string(name), int(age).

It is usually preferable to have two separate functions:

ssn_to_name[ssn] = name -> string(ssn), string(name).
ssn_to_age[ssn]  = age  -> string(ssn), int(age).

One-to-one functional predicates

A functional predicate can be declared as being one-to-one (i.e., injective). For a predicate named F, the declaration can be given in one of the following two forms:

lang:oneToOne(`F).

lang:isOneToOne[`F] = true.

The declaration causes the compiler to generate constraints that ensure only one combination of key arguments is associated with a given value.

Warning

Before LogicBlox version 4.4.5, such declarations could be used only for functional predicates with a single key argument and a single value argument.

Example 8.4. Declaring an injective function

create --unique

addblock <doc>

  F[x, y] = z -> string(x), string(y), string(z).

  lang:oneToOne(`F).
</doc>

exec <doc>
  +F["a", "b"] = "alpha-beta".
  +F["b", "g"] = "beta-gamma".
  +F["a", "B"] = "alpha-beta".    // Not injective!
</doc>
print F

close --destroy

Execution results in failure with an error message:

Error: Constraint failure(s):
block_1Z1FGYMG:0(0)--0(0):
    false <-
      Exists vx2::string,vy2::string,x0::string,x1::string,y0::string,y1::string .
         string:eq_2(vx2,vy2),
         F[x0,x1]=vx2,
         F[y0,y1]=vy2,
         !(
            string:eq_2(x1,y1)
         ).
(1) vx2="alpha-beta",vy2="alpha-beta",x0="a",x1="B",y0="a",y1="b"
(2) vx2="alpha-beta",vy2="alpha-beta",x0="a",x1="b",y0="a",y1="B"

8.3. Entity Predicates

Entity predicates are unary predicates used to represent elements of a program's problem domain. An entity predicate can be thought of as a user-defined type that complements the set of primitive types built into LogiQL: for this reason it is sometimes called an entity type.

For example, instead of declaring a unary predicate state that contains string values, one can declare an entity predicate state. The values of the state entity predicate can be referenced by their string identifiers, yet are distinctly different from the string values of their names, as well as from values of other entity predicates (unless subtyping is used: see Section 8.3.1).

The members (values) of an entity predicate are called entities, or entity values. Using entities instead of primitive values such as strings and integers makes a program easier to understand, because the program's data structures correspond better to your own mental model of how the data is arranged. Additionally, using entities helps catch errors such as confusing the state named "Georgia" with a person named "Georgia".
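For instance, with separate entity types such confusion becomes a compile-time type error. The following sketch uses constructor predicates (see Section 8.5); all names are hypothetical:

state(s) -> .
state_from_name[n] = s -> string(n), state(s).
lang:constructor(`state_from_name).

person(p) -> .
person_from_name[n] = p -> string(n), person(p).
lang:constructor(`person_from_name).

// The key of born_in must be a person; supplying a state entity
// (even one named "Georgia") would be rejected by the type checker.
born_in[p] = s -> person(p), state(s).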

Creating Entities.  Entity values can be created in one of two mutually exclusive ways:

  • First, an entity predicate can be associated with a reference-mode predicate (often shortened to refmode predicate). A refmode predicate allows the user to create each entity together with an associated identifier: the identifier can then be used to refer to the entity.
  • Alternatively, the entity predicate can be associated with one or more functional constructor predicates. The entities are the values of the function.

Note

Reference mode predicates are now deprecated. It is recommended that new projects use only entities with constructor predicates.

We defer the discussion of refmode predicates to Section 8.4, and the discussion of constructor predicates to Section 8.5.

Declaring An Entity Predicate.  An entity predicate can be declared with or without a reference-mode predicate. The syntax is as follows:

EntityPredicateDeclaration = Identifier "(" Identifier ")" "->" "."
                           | Identifier "(" Identifier ")" ","
                               Identifier "(" Identifier ":" Identifier ")" "->"
                                 Identifier "(" Identifier ")" "." .

Example 8.5.  Entity predicate declarations with and without reference-mode predicates

person(x), person_has_name(x : n) -> string(n).
thing(x) -> .

Entity predicate person is associated with a refmode predicate, person_has_name.

Entity predicate thing is not associated with a refmode predicate.

Optionally, an explicit entity predicate declaration may be included:

ExplicitEntityDeclaration =
         "lang:isEntity" "[" "`" Identifier "]" "=" ("true" | "false") "."
       | "lang:entity" "(" "`" Identifier ")" "." .

With the first form, the predicate is declared to be an entity predicate if the right-hand side of the equality is the value true; otherwise the predicate is explicitly not an entity predicate. With the second form, using lang:entity, the predicate is explicitly declared to be an entity predicate.

Explicit entity predicate declarations are optional for top-level entity predicates (i.e., those that are not subtypes of other entity types). In Section 8.3.1 we discuss the utility of these explicit declarations.

Example 8.6. 

Below you can find an example where person is explicitly declared as an entity type. Note that there is also a predicate declaration for person in addition to the entity predicate declaration.

person(p) -> .
lang:entity(`person).

Example 8.7. 

Here is an alternative method of declaring person to be an entity predicate:

person(p) -> .
lang:isEntity[`person] = true.

8.3.1. Subtyping

An entity predicate can be declared to be the subtype of some parent entity predicate. The subtyping relationship between entity predicates forms a tree: subtyping relationships cannot form a cycle, nor can an entity predicate have multiple parent types.

The values of an entity subtype and the values of its supertype form a subset relationship: a value of a subtype entity predicate is also a value of the supertype entity predicate. Two entity subtypes of the same supertype may, but need not, contain the same values.

Explicit Subtype Declaration.  An explicit subtype declaration contains two parts: a subsetting constraint, and an explicit entity predicate declaration. The subsetting constraint has the following form:

SubtypeDeclaration = Identifier "(" Identifier ")" "->"
                        Identifier "(" Identifier ")" "." .

Example 8.8. Declaring entity subtypes

The following example explicitly declares female to be a subtype of person. We assume that person has previously been declared as an entity predicate.

female(x) -> person(x).
lang:entity(`female).

Alternatively, replacing the second line with lang:isEntity[`female]=true. would achieve the same effect.

Implicit Subtype Declaration.  A unary predicate T may be implicitly declared to be an entity subtype if it satisfies both of the following conditions:

  • There exists a constraint T(x) -> E(x)., where E is an entity predicate (possibly an entity subtype).
  • T is used as an entity type in a predicate declaration. That is, there exists a predicate declaration constraint where T appears on the right-hand-side as a type. For example,
    P(... x ...) -> ... T(x) ... .

Example 8.9. An implicit subtype declaration

The following example implicitly declares female to be an entity predicate, and that it is a subtype of person. Predicate girl is not considered an entity predicate. (See Example 8.13 for more information about this example.)

female(x) -> person(x).
girl(x) -> female(x).

Example 8.10. Another implicit subtype declaration

In this example, female is used, in the second line, as a type in the declaration of predicate person_hasmother, and thus is implicitly declared as an entity predicate.

female(x) -> person(x).
person_hasmother[p] = m -> person(p), female(m).

An implicit subtype declaration can be explicitly overridden with lang:isEntity[`T] = false.
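For example, in the following sketch (assuming person has previously been declared as an entity predicate) female satisfies the conditions for an implicit subtype declaration, but the override keeps it an ordinary unary predicate:

female(x) -> person(x).
lang:isEntity[`female] = false.   // female is not an entity subtype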

It is a compile-time error for a program to include two entity predicate declarations for the same predicate with different supertypes, unless the two supertypes are themselves in a subtype relationship with each other.

Example 8.11. An entity type with two supertypes

The following declarations will be rejected as invalid, since female cannot be a subset of both person and car.

person(x) -> .
car(x) -> .
female(x) -> person(x).
female(x) -> car(x).

However, the following declarations are valid, since female and person are declared to be in a subtype relationship.

person(x) -> .
female(x) -> person(x).

girl(x) -> female(x).
girl(x) -> person(x).
lang:isEntity[`girl] = true.

It is also a compile-time error for an entity predicate to be declared both as a top-level entity type and the subtype of another top-level entity type.

Example 8.12.  Entity predicate declared to be both a top entity type and a subtype

The following declarations will be rejected as invalid by the compiler:

person(x) -> .
person(x) -> thing(x).

Example 8.13.  A true subtype vs. a predicate that only looks like one

The following is an expanded version of Example 8.9.

addblock <doc>
  person(x) -> .
  girl(x)   -> person(x).

  name[nm] = p -> string(nm), person(p).
  lang:constructor(`name).

  name["Betty"] = p, girl(p).
</doc>
echo --- person:
print person
echo --- girl:
print girl

This would print out

--- person:
[10000000004]
--- girl:
[10000000004]

As we see, girl can contain elements of type person. However, it is not a subtype of person. This can be demonstrated by extending the program with

addblock <doc>
  foo[x] = v -> person(x), int(v).
  foo[x] = v -> girl(x), int(v).

  name["Betty"] = p, foo[p] = 7.
  name["Bob"]   = p, foo[p] = 8.
</doc>

An attempt to execute would result in

Error: Constraint failure(s):
block_4LDQPDUZ:2(3)--2(31):
    false <-
      Exists v::int,x::person .
         foo[x]=v,
         !(
            girl(x)
         ).
(1) v=8,x=[10000000006]

What happens here is that foo[x] = v -> girl(x), int(v). is treated as a constraint requiring the keys of foo to be elements of girl, and the entity constructed for "Bob" violates that constraint.

If, however, we ensured that girl is an entity predicate (and thus a true subtype of person), for example by changing the declarations to

  person(x) -> .
  girl(x)   -> person(x).
  lang:entity(`girl).

then the program would not fail, because the second line in

  foo[x] = v -> person(x), int(v).
  foo[x] = v -> girl(x), int(v).

would be treated as just a type declaration (redundant, but compatible with the first one).

It may be somewhat surprising that if we remove the first line above (i.e., the key of foo is declared as being just of type girl), then the program will still be correct: the entity that represents the person whose name is "Bob" does not belong to type girl, but belongs to its supertype, so the type checker does not complain.

When in doubt about the status of a predicate, you can run the lb command predinfo, for example

predinfo girl

and look for the value of field is_entity.

8.4. Reference-Mode Predicates

Note

Reference mode predicates are now deprecated. It is recommended that new projects use only entities with constructor predicates (see Section 8.5).

A reference-mode predicate, or refmode for short, is a binary predicate that must be associated with a top-level entity predicate. Specifically, each tuple in a refmode predicate relates a value of a primitive type to exactly one entity. The primitive value thus provides the programmer with a way to identify and refer to an entity value.

The refmode predicate must be declared together with its entity predicate. Furthermore, for every value of that entity predicate, there must be a reference-mode value that uniquely identifies the entity value.

Example 8.14. Declaring an entity predicate with refmode

Entity predicate person in this example is declared to have the refmode predicate person_has_name, and a refmode value of type string. That is, person values can be identified using their string-valued names.

person(p), person_has_name(p:n) -> string(n).

An entity predicate can have only one associated refmode predicate. If an entity predicate has a refmode predicate, then its values must be constructed along with the associated refmode values.

The construction of values in rules, and the role of refmode predicates in it, are described in Section 11.2. It is also possible to construct entity values by explicitly adding facts, as shown in the example below.

Example 8.15. Constructing an entity by adding a fact

Given the declarations in Example 8.14 we can create a new person entity with the associated name "Bob" by executing the code shown below (the word "executing" was carefully chosen: see Chapter 19).

+person(p), +person_has_name(p : "Bob").

It is recommended that this be written in the functional style:

+person(p), +person_has_name[p] = "Bob".

We can even omit the explicit mention of person and write just

+person_has_name[_] = "Bob".

Auto-Numbered Refmode Predicates

Refmode values may be created automatically by the system. This is a useful feature when the exact values of the refmodes do not matter, as long as they are different from one another. The value type of an auto-numbered refmode must be int.

Example 8.16. 

The following example declares entity predicate q and its auto-numbered refmode predicate q_id:

q(x), q_id(x:i) -> int(i).
lang:autoNumbered(`q_id).

Auto-numbered entity types can be populated by associating them with constructors, or by populating their subtypes.

Example 8.17. Populating an auto-numbered predicate

The following lb script (see Section 19.1) uses constructor predicates (see Section 8.5) to populate person, which is an entity predicate with an auto-numbered refmode predicate.

(Note that we use name both as the name of a predicate and the name of a variable to show that this is allowed, but such usage is not necessarily recommended.)

create --unique

addblock <doc>
  person(p), person_id(p : i) -> int(i).
  lang:autoNumbered(`person_id).

  name[name] = c -> string(name), person(c).
  lang:constructor(`name).
</doc>

exec <doc>
  +name["Betty"] = _.
  +name["Bob"] = _.
  +name["Betty"] = _.
</doc>
echo --- person:
print person
echo --- name:
print name
echo --- person_id:
print person_id

addblock <doc>
  person_name_id(name, id) <- person_id(p : id), name[name] = p.
</doc>
echo --- person_name_id
print person_name_id

close --destroy

Here is the output (whose exact form may change between versions of the system):

created workspace 'unique_workspace_2017-03-28-23-28-23'
added block 'block_1Z1F0KWV'
--- person:
[10000000004] 10000000004
[10000000005] 10000000005
--- name:
"Betty" [10000000005] 10000000005
"Bob"   [10000000004] 10000000004
--- person_id:
[10000000004] 10000000004 10000000004
[10000000005] 10000000005 10000000005
added block 'block_1Z1O5ERS'
--- person_name_id
"Betty" 10000000005
"Bob"   10000000004
deleted workspace 'unique_workspace_2017-03-28-23-28-23'

Each tuple of person contains an internal identifier of the entity (shown in square brackets) and the associated refmode value. These numbers happen to be identical.

Each tuple in name contains the name and information about the associated entity (i.e., person).

Each tuple in person_id contains information about the entity and the (duplicated) auto-generated number.

The tuples in person_name_id pair the names of entities with the auto-generated numbers.

Notice that the repetition of +name["Betty"] = _. did not result in producing a new entity.

Example 8.18. Populating an auto-numbered predicate via its subtypes

The following lb script (see Section 19.1) uses constructor predicates (see Section 8.5) to populate subtypes of person, which is an entity predicate with an auto-numbered refmode predicate.

create --unique

addblock <doc>
  person(p), person_id(p : i) -> int(i).
  lang:autoNumbered(`person_id).

  child(c) -> person(c).
  adult(a) -> person(a).

  diminutive[name] = c -> string(name), child(c).
  lang:constructor(`diminutive).

  official[name] = a -> string(name), adult(a).
  lang:constructor(`official).
</doc>

exec <doc>
  +diminutive["Betty"] = _.
  +diminutive["Bob"] = _.

  +official["Elizabeth"] = _.
  +official["Robert"] = _.
  +official["Bob"] = _.
</doc>
echo --- child:
print child
echo --- adult:
print adult
echo --- person:
print person
echo --- diminutive:
print diminutive
echo --- official:
print official
echo --- person_id:
print person_id

close --destroy

The output is shown below. See the example above for an explanation.

created workspace 'unique_workspace_2017-03-28-21-33-33'
added block 'block_1Z1F0LVQ'
--- child:
[10000000005]
[10000000007]
--- adult:
[10000000001]
[10000000004]
[10000000006]
--- person:
[10000000001] 10000000001
[10000000004] 10000000004
[10000000005] 10000000005
[10000000006] 10000000006
[10000000007] 10000000007
--- diminutive:
"Betty" [10000000005]
"Bob"   [10000000007]
--- official:
"Bob"       [10000000004]
"Elizabeth" [10000000006]
"Robert"    [10000000001]
--- person_id:
[10000000001] 10000000001 10000000001
[10000000004] 10000000004 10000000004
[10000000005] 10000000005 10000000005
[10000000006] 10000000006 10000000006
[10000000007] 10000000007 10000000007
deleted workspace 'unique_workspace_2017-03-28-21-33-33'

It is worth noticing that the use of "Bob" in diminutive and the use of "Bob" in official produce two different entities in person.

8.5. Constructor Predicates

Constructor predicates are a generalization of the concept of refmode predicates. A constructor predicate acts as a one-to-one function (injection) that maps a multi-dimensional key to an entity. Key arguments can be of primitive types or of entity types. The value of a constructor predicate must be of an entity type, either without a refmode or with an auto-numbered refmode.

Just like refmode predicates, constructor predicates can be used to create values for their associated entity predicates. We discuss this in more detail in Section 11.2.

To declare a predicate as a constructor predicate we use lang:constructor or lang:isConstructor, as illustrated in the following example.

Example 8.19. Declaration of a constructor predicate

person(p) -> .         // A refmodeless entity type
person_from_names[first_name, last_name] = p
   -> string(first_name), string(last_name), person(p).
lang:constructor(`person_from_names).

Above, the predicate person_from_names is declared to be a constructor predicate. We can use person_from_names to construct new person values, associating each unique first/last name pair with a unique person.

Instead of lang:constructor(`person_from_names) we could have used lang:isConstructor[`person_from_names] = true.

See Example 8.17 for a simple example of how a constructor predicate is used to populate the associated entity predicate.

An entity predicate can be associated with more than one constructor. This is illustrated by the following example.

Example 8.20. An entity predicate with several constructors

Please compare the following with Example 8.17 in Section 8.4.

create --unique

addblock <doc>
  person(p) -> .

  name1[name] = p -> string(name), person(p).
  lang:constructor(`name1).

  name2[name] = p -> string(name), person(p).
  lang:constructor(`name2).

  name3[name] = p -> string(name), person(p).
  lang:constructor(`name3).
</doc>

exec <doc>
  +name1["Betty"] = _.
  +name1["Bob"] = _.
  +name2["Betty"] = _.
  +name3["Betty"] = _.
</doc>
echo --- person:
print person

addblock <doc>
  person1(name, p) <- name1[name] = p.
  person2(name, p) <- name2[name] = p.
  person3(name, p) <- name3[name] = p.
</doc>
echo --- person1:
print person1
echo --- person2:
print person2
echo --- person3:
print person3

close --destroy

The results are:

created workspace 'unique_workspace_2017-03-29-18-30-24'
added block 'block_1Z1F0LKG'
--- person:
[10000000004]
[10000000005]
[10000000006]
[10000000007]
added block 'block_1Z1RHD7C'
--- person1:
"Betty" [10000000005]
"Bob"   [10000000006]
--- person2:
"Betty" [10000000007]
--- person3:
"Betty" [10000000004]
deleted workspace 'unique_workspace_2017-03-29-18-30-24'

Notice that each of the constructors maps the same string to a different entity, so that each of them effectively gives access to a different subset of the entity predicate person.

The various aspects of entity predicates are often somewhat hard to integrate in one's mind, so we end this section with a more complicated example. (To understand it fully, one must first skim through the material in Section 19.1 and Section 19.2.1.)

Example 8.21. A more complicated example with entity predicates

The following block declares two entity predicates with refmodes: product and color. It also declares a predicate that associates a product with its color (for the purposes of this simple example we will assume each product has only one color).

addblock <doc>
  product(prod), product_id(prod : id) -> string(id).
  color(c),      color_id(c : id)      -> string(id).

  product_color[prod] = c -> product(prod), color(c).
</doc>

We can populate the predicates as follows (but please see the note at the beginning of Section 19.2.1):

exec <doc>
  +color_id[_] = "pink".
  +color_id[_] = "black".

  +product_id[p] = "Barbie", +product_color[p] = c <- color_id[c] = "pink".
  +product_id[p] = "Darth",  +product_color[p] = c <- color_id[c] = "black".
</doc>

We will now extend our database with information about features of the merchandise. One important kind of feature is color, another may be the department in which an item is sold. The various kinds of features can be thought of as disjoint subsets of the set of entities of type Feature.

(Since we are in a prototyping phase, we represent a department simply by its name: in a full-blown application it would be an entity.)

addblock <doc>
  Feature(_) -> .

  colorFeature[c] = f -> color(c), Feature(f).
  lang:constructor(`colorFeature).

  departmentFeature[dept] = f -> string(dept), Feature(f).
  lang:constructor(`departmentFeature).
</doc>

Finally, we declare a predicate that associates each feature with its description, which is a string. We then write the rules that populate this predicate.

The body of the first rule retrieves the colors of all the products in the database, extracts their names from the reference-mode predicate color_id and creates strings that are the desired descriptions of the associated features. The head of the rule generates the features associated with the retrieved colors and associates them with the appropriate descriptions.

The second rule is similar, but simpler, because we are still only prototyping everything that has to do with departments.

In both rules the first conjunct in the head (Feature(f)) makes the rule easier to understand. It could be omitted, and the effect would still be the same. The conjunct product(prod) in the body of the first rule can also be omitted.

addblock <doc>
  featureDescription[f] = description -> Feature(f), string(description).

  Feature(f),
  featureDescription[f] = desc,
  colorFeature[c] = f
      <-  desc = "product_color: " + id,
          color_id(c : id),
          product(prod),
          product_color[prod] = c.

  Feature(f),
  featureDescription[f] = desc,
  departmentFeature[dept] = f
      <-  desc = "department: " + dept,
          (dept = "toys"; dept = "lingerie").
</doc>

We can now print the feature descriptions:

print featureDescription

The result is shown below.

[10000000000] "department: toys"
[10000000001] "department: lingerie"
[10000000002] "product_color: black"
[10000000003] "product_color: pink"

8.6. Foreign Predicates

Note

This section requires some understanding of branches. See Chapter 41.

Transaction-lifetime rules (see Chapter 19) may refer to predicates that exist in both the current branch and a foreign branch. This is done by writing P@branchname for a predicate P and a branch named branchname. Writing P without the @branchname tag refers to the predicate P that resides in the branch associated with the current transaction (e.g., NAME in transaction --branch NAME), or in the open workspace's current default branch if no branch is specified. The signatures on the two branches must match; otherwise a runtime error will be raised.

When the foreign predicate has an entity type in its signature, all uses of that entity type must be "guarded" by a use in the current branch. Entities are local to a single path of a branch's history, so any split in history creates distinct entities for the same application of a constructor to some previously unseen arguments. The guardedness restriction ensures entities stay local to a branch.

Example 8.22. Guardedness of foreign entities

Assume the person_from_names example from Section 8.5. The use of person is a "guard" for the use of entity p in the following rule:

+same_here(p) <- person_from_names@foreign[_, _] = p, person(p).

Notice that same_here will be populated only by those entities of type person that (1) were present in this branch (or its ancestor) before the foreign branch diverged from it, and (2) are still present in the foreign branch when the rule is evaluated.

Example 8.23. Reconstructing foreign entities

The lb script below shows that it is possible to construct entities from the values of constructor predicates in a different branch. However, the entities on the two branches will be different.

create --unique

addblock <doc>
  person(p) -> .
  person_from_names[first_name, last_name] = p
     -> string(first_name), string(last_name), person(p).
  lang:constructor(`person_from_names).
</doc>

branch foreign

addblock --branch foreign <doc>

  person(p), person_from_names[fn, ln] = p <- first_name(fn), last_name(ln).

  first_name("John").
  last_name("Smith").

  first_name("Mary").
  last_name("Jones").
</doc>

exec <doc>
  +person(p),
  +person_from_names[fn, ln] = p <- person_from_names@foreign[fn, ln] = _.
</doc>

print person
print person_from_names

close --destroy

Results:

created workspace 'unique_workspace_2016-07-23-01-28-59'
added block 'block_1Z1C3B1V'
added block 'block_1Z1EX11X'
[10000000000]
[10000000001]
[10000000002]
[10000000003]
"John" "Jones" [10000000001]
"John" "Smith" [10000000000]
"Mary" "Jones" [10000000003]
"Mary" "Smith" [10000000002]
deleted workspace 'unique_workspace_2016-07-23-01-28-59'

Please note that this simple scheme works only if the arguments of the foreign constructor predicate are of primitive types (i.e., are not entities).

8.7. File Predicates

Note

File predicates should not be used except for implementing very low-level functionality. Most programmers should use Tabular Data Exchange services instead (see Section 27.1).

File predicates are one means by which a program can perform I/O. These predicates are used to treat an external file as a predicate: if a predicate is declared to be a file predicate, then the tuples of this predicate correspond to the contents of the file. File predicates can be used inside exec blocks (including queries) and inactive blocks (including pre-compiled queries).

As a simple example, consider the following logic, which copies records from an input file to an output file, incrementing each number that it encounters.

Example 8.24. Simple input/output using file predicates

_in(offset; s, x) -> int(offset), string(s), int(x).
lang:physical:filePath[`_in] = "input.csv".
lang:physical:fileMode[`_in] = "import".

_out(s, x) -> string(s), int(x).
lang:physical:filePath[`_out] = "output.csv".
lang:physical:fileMode[`_out] = "export".

_out(s, y) <- _in(_; s, x), y = x + 1.

In this example, the input file input.csv is represented by the predicate _in, which has three attributes: an integer key argument offset representing the physical position of a line in the file, followed by two value attributes (of types string and int, respectively) corresponding to the first and second column in the file. The output file output.csv is represented by the predicate _out.

lang:physical:filePath[`_in] and lang:physical:filePath[`_out] are used to specify that the predicates _in and _out are file predicates, and to tell the system where to find the corresponding file. lang:physical:fileMode[`_in] and lang:physical:fileMode[`_out] are used to specify whether a file predicate is used for import or for export.

In the above example, if input.csv contains the following data:

John,43
Mary,25
Bill,14

then, after executing this logic, the file output.csv will contain:

John,44
Mary,26
Bill,15

Import file predicates must have a signature that includes an integer offset attribute, separated from the other attributes by a semicolon; export file predicates do not have an offset argument. All attributes of file predicates must have primitive (non-entity) types. The conversion from values in the file to the specified primitive types happens automatically.

As a second example, the following logic defines a small schema to model persons, and imports the contents of file input.csv to represent persons and their names.

Example 8.25. 

addblock '
   person(p), person_has_name(p:n) -> string(n).
'

exec '
   _in(offset; s, i) -> int(offset), string(s), string(i).
   lang:physical:filePath[`_in] = "input.csv".
   lang:physical:fileMode[`_in] = "import".

   +person(p), +person_has_name[p] = s <- _in(_; s, _).
'

The files accessible by file predicates are known as delimited files. These are text files, where each line is a sequence of fields separated by a delimiter. A common example of a delimited file is a csv (comma-separated values) file used by spreadsheet programs as exported text versions of spreadsheet content. In the case of csv files, the comma character (",") is the delimiter. The comma character is also the default delimiter used by file predicates, but other characters can be configured using lang:physical:delimiter. Note that choosing the newline character ("\n") as the delimiter has the effect that each line of the file is treated as a record consisting of a single field.
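For instance, a pipe-delimited file could be imported as follows (a minimal sketch; data.psv and the attribute names are hypothetical):

_in(offset; a, b) -> int(offset), string(a), string(b).
lang:physical:filePath[`_in]  = "data.psv".
lang:physical:fileMode[`_in]  = "import".
lang:physical:delimiter[`_in] = "|".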

File predicate settings (values must be literal strings, not variables):

  • lang:physical:filePath: Path to the file (required).

  • lang:physical:fileMode: One of "import" or "export". If not specified, "import" is used by default.

  • lang:physical:delimiter: Character used as the delimiter. If not specified, "," is used by default.

  • lang:physical:columnNames: Comma-separated list of column names. If set, the first line of the file is treated as a header, and the columns of the file are identified by their names in the header, rather than by their order of occurrence.

  • lang:physical:hasColumnNames: If set to true (or if lang:physical:columnNames is specified), the first line of the file is treated as a header, rather than as a record.

When using lang:physical:columnNames, it is possible to specify optional columns by using square brackets (as in lang:physical:columnNames[`_in]="item,price,[discount]"). Optional columns must be of type string. When an optional column is absent in the file, the empty string "" is used as a default value. Specifying a column as being optional has no effect on file export.
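To illustrate, the following sketch imports a file with a header row and an optional discount column (items.csv and the column names are hypothetical):

_in(offset; item, price, discount)
   -> int(offset), string(item), string(price), string(discount).
lang:physical:filePath[`_in]    = "items.csv".
lang:physical:fileMode[`_in]    = "import".
lang:physical:columnNames[`_in] = "item,price,[discount]".

// Records in files that lack the discount column are imported with ""
// as the value of discount.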

On file import, when a record has the wrong number of fields, or a value cannot be successfully converted to the specified primitive type, an error message is emitted and the transaction is aborted.

8.8. Derivation Types

Every predicate has a derivation type associated with it. This can be extensional, intensional, or derived-only. The derivation type of a predicate indicates how it receives its values, and whether its contents are maintained or materialized (cached).

Extensional predicate.  An extensional predicate is most frequently referred to as an EDB predicate. It stores values of the extensional database, that is, values that are inputs to the database. These values exist because the user explicitly imported them into the database. EDB predicates are populated through data imports, or through direct manipulation of the predicate via "delta logic" (see Section 19.2). Data in extensional predicates, once removed, cannot be recovered (unless there is a back-up copy of the data).

EDB predicates are also used to store event data, and to trigger other events (Section 19.3).

Intensional predicates: Derived or DerivedAndStored.  A "DerivedAndStored" predicate is most frequently referred to as an IDB or intensional predicate. Its values are computed via IDB rules (Chapter 11): effectively, logical implications that specify what values the predicate should contain, based on the values of other EDB or IDB predicates. Since the values of IDB predicates are computed from rules, they can always be recovered from the EDB predicates. Their values are also automatically maintained by the database, so that the logical implications defining them always hold. A "Derived" (also known as derived-only) predicate is a refinement of the above: it specifies not only that the contents of the predicate are computed, but also that they are not materialized (cached). Derived-only predicates are explained in more detail in Section 11.3.

A predicate's derivation type can be explicitly declared using the following form:

DerivationTypeDeclaration =
  'lang:derivationType' '[' '`' Identifier ']' '=' '"' DerivationType '"' '.' .
DerivationType = 'Extensional' | 'DerivedAndStored'
               | 'Derived'     | 'NotDerived'
               | 'IntegrityConstraint'.

A predicate's derivation type can also be inferred. The types that are inferrable are extensional and intensional. A derived-only derivation type must be explicitly declared.

A predicate is inferred to be an IDB if there is an IDB rule that derives into it (Section 11.1). It is inferred to be an EDB if there is a delta rule (a.k.a. EDB rule) that derives into it (Section 19.2). If there is no rule that derives into it at all, then it is treated as an EDB predicate, until a rule is added to make its derivation type IDB.

Example 8.26. Predicate derivation type inference

If a predicate P is used in a delta rule

+P(x) <- H(x), x < 20.

then LogiQL infers that P must have the "Extensional" derivation type.

Example 8.27. Explicitly declaring predicates with derivation types

p(x, y) -> int(x), int(y).
lang:derivationType[`p] = "Extensional".

q(x) -> .
lang:derivationType[`q] = "DerivedAndStored".

r(x, y) -> string(x), q(y).
lang:derivationType[`r] = "Derived".

A predicate's derivation type is "NotDerived" by default until declared or inferred. The "IntegrityConstraint" derivation type is not used and will be removed in the future.

8.9. Ordered Predicates

An entity predicate captures a set of values. It is often useful to display those values in a particular order (e.g., when creating reports). For instance, entities representing months should be ordered as January, February, March, etc.

LogicBlox supports a simple programming idiom for such cases: the predicate e_next defines an ordering for entity e. This support takes two main forms:

  • special syntax lang:ordered(`e) declares e_next and introduces helper predicates and logic for orderings;
  • tools such as TDX (Section 27.1) respect e_next for reporting and fill in e_next on import.

Using either rules or tool support, the user is responsible for inserting tuples into the e_next predicate to specify the desired ordering.

Example 8.28. 

The following logic orders month-entities in the natural way:

month(_) -> .
mkMonth[s] = m -> string(s), month(m).

lang:entity(`month).
lang:ordered(`month).
lang:constructor(`mkMonth).

month(jan),
month(feb),
month(mar),

mkMonth["January" ] = jan,
mkMonth["February"] = feb,
mkMonth["March"   ] = mar,

month_next[jan] = feb,  // Establish an order on months
month_next[feb] = mar
  <- .

// Examples of generated predicates.
-> month_first[] = mkMonth["January"].
-> month_last[]  = mkMonth["March"].
mkMonth["March"] = mar -> month_prev[mar] = mkMonth["February"].

The declaration lang:ordered(`e) has the same meaning as the following block:

// Predicate declarations
e_first[]  = n -> e(n).
e_last[]   = n -> e(n).
e_prev[n1] = n2 -> e(n1), e(n2).
e_next[n1] = n2 -> e(n1), e(n2).

// First, last, prev defined automatically from next
e_first[]  = n <- e(n), !(e_next[_] = n).
e_last[]   = n <- e(n), !(e_next[n] = _).
e_prev[n2] = n1 <- e_next[n1] = n2.

// Constraints
e(_) -> e_first[] = _.
e(_) -> e_last[] = _.

When modular logic is used, the lang:ordered declaration may be placed either in a module's exports section (thus publicly declaring the predicates) or in the clauses section (for private declarations). See Chapter 21.

8.10. Local predicates

The term local predicate refers to a predicate whose name begins with an underscore (_). The predicate is local to the block in which it is declared. We mention this here for completeness, but see Section 19.1.3 for an explanation (and Example 19.1 for an example).
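As a quick illustration, a local predicate can serve as scratch space within a single exec block. In this sketch, result is a hypothetical EDB predicate assumed to have been declared elsewhere:

exec <doc>
  _tmp(x) -> int(x).
  _tmp(1).
  _tmp(2).
  +result(x) <- _tmp(x).
</doc>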

8.11. External Diff Predicates

During the course of a transaction the contents of a predicate may undergo various changes (Section 19.4). A predicate may also have different versions on different branches (Chapter 41). An external diff (or ediff) predicate is a system-generated predicate that is associated with an arbitrary "normal" predicate p and that contains the difference between two versions of p. (If the versions are not different, the predicate is empty.)

An ediff predicate is materialised and populated only if it is used; it has transaction lifetime (Section 19.1.5), and is computed at stage INITIAL (Section 19.4.2).

See Section 10.1.5 for information about how to use (and---as a side effect---create) external diff predicates. (See also Section 9.4).

For example, the contents of the external diff predicate (p@prev \ p@Branch)(x, y) is the set difference between the contents of p@prev and the contents of p@Branch. One can think of this predicate as if it were defined by the (syntactically illegal) rule:

(p@prev \ p@Branch)(x, y) <-  p@prev(x, y),  ! p@Branch(x, y).

The actual computation, however, is much more efficient than the evaluation of such a rule. Moreover, if the flipped version of the ediff predicate (p@Branch \ p@prev in our example) is also needed, then the two are computed simultaneously, at a cost comparable to that of computing only one of them.

Example 8.29. Computing and using external diff predicates

In the example below there are three invocations of three ediff predicates. For example, (foo@BE\foo@prev)(x) refers to the difference between the current contents of foo in branch BE and the contents of foo in the current branch before the transaction started.

create --unique

addblock <doc>
  foo[x] = y -> int(x), string(y).
  fun[x] = y -> int(x), string(y).
  fie[x] = y -> int(x), string(y).
  bar[x] = y -> int(x), string(y).
  baz[x] = y -> int(x), string(y).

  expected_fie[x] = y -> int(x), string(y).
  expected_bar[x] = y -> int(x), string(y).
  expected_baz[x] = y -> int(x), string(y).
</doc>

exec <doc>
  +foo[1] = "a".  +foo[2] = "b".
</doc>

branch BE

exec --branch BE <doc>
  +foo[11] = "A".  +foo[22] = "B".
</doc>


exec <doc>
  +foo[3] = "a".  +foo[11] = "A".

  +fun[x] = y <- foo@BE[x] = y.
  +fie[x] = y <- (foo@BE\foo@prev)[x] = y.
  +bar[x] = y <- (foo\foo@BE)[x] = y.
  +baz[x] = y <- (foo@BE\foo)[x] = y.
</doc>
echo foo:
print foo
echo fun:
print fun
echo fie:
print fie
echo bar:
print bar
echo baz:
print baz

close --destroy 

The result:

created workspace 'unique_workspace_2016-12-01-17-22-46'
added block 'block_1Z1B36YP'
foo:
1  "a"
2  "b"
3  "a"
11 "A"
fun:
1  "a"
2  "b"
11 "A"
22 "B"
fie:
11 "A"
22 "B"
bar:
3 "a"
baz:
22 "B"
deleted workspace 'unique_workspace_2016-12-01-17-22-46'

8.12. Predicate Properties

In the previous sections, we introduced property declarations that indicate whether a predicate is an entity, a constructor, or an auto-numbered refmode. Some additional predicate properties are explained in this section.

Predicate properties can be declared by using the following syntax:

pred_property_decl = true_property "(" "`" Identifier ")" "."
                   | bool_property "[" "`" Identifier "]" "="
                        ("true" | "false") "." .

true_property      = "lang:autoNumbered"
                   | "lang:constructor"
                   | "lang:derivationType"
                   | "lang:entity"
                   | "lang:ordered"
                   | "lang:pulse"
                   | "lang:oneToOne" .

bool_property      = "lang:isAutoNumbered"
                   | "lang:isConstructor"
                   | "lang:isEntity"
                   | "lang:isPulse"
                   | "lang:isOneToOne" .

Predicate properties must be declared at the same time as a predicate is declared. Once set, a property cannot be changed without rebuilding the workspace (and thus re-declaring the predicate).

Identifier indicates the predicate for which the property is being set.

These properties are explained in other sections of this manual:

  • lang:autoNumbered, lang:isAutoNumbered: Section 8.4.
  • lang:constructor, lang:isConstructor: Section 8.5.
  • lang:derivationType: Section 8.8.
  • lang:entity, lang:isEntity: Section 8.3.
  • lang:ordered: Section 8.9.
  • lang:oneToOne, lang:isOneToOne: Section 8.2.
  • lang:pulse, lang:isPulse: Section 19.3.

Chapter 9. Expressions

An expression is a syntactic construct that represents a value. For example, 3+4 is an expression that adds the value 4 to the value 3: the value of the expression is 7. Another example is x+1, which adds 1 to whatever value is in x, and produces a new value.

This chapter describes the different kinds of expressions you can write, as well as how those expressions compute a value.

In general, it is also possible for an expression not to have a value. For example, 5/0 doesn't compute any result.

9.1. Literals

Expressions may be literals of the different primitive types supported by LogiQL:

Expression    = StringLiteral | BooleanLiteral | NumberLiteral .
NumberLiteral = DecimalLiteral | FloatLiteral | IntegerLiteral .

Example 9.1. Literals

"hello"
12
3.141529f

9.2. Variables

Expression = BasicIdentifier .

Syntactically, a variable (or, more precisely, the name of a variable) is just an identifier (without a colon), such as x, cost or Jane. The semantics, however, require an explanation.

Note

The identifier _ (i.e., a single underscore) is special: it represents a so-called anonymous variable. An anonymous variable has no other occurrences, that is, every occurrence of _ is equivalent to a brand new, unique identifier.

When a variable has only one occurrence by design, it is a good idea to make it anonymous, and thus avoid a warning from the compiler. The compiler reports non-anonymous variables with single occurrences, as these sometimes arise because of trivial typos, and the resulting errors would otherwise be difficult to catch.

If you'd like to keep the name of a variable meaningful even if it has only a single occurrence, just begin its name with an underscore, e.g., _sales. (The only effect of the underscore is to suppress the warning, otherwise the name is like any other: unlike in the case of anonymous variables, two occurrences in the same scope will refer to the same variable.)
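For example (a minimal sketch; price and has_price are hypothetical predicates):

price[x] = y -> string(x), int(y).
has_price(x) -> string(x).

has_price(x) <- price[x] = y.        // warning: y occurs only once
has_price(x) <- price[x] = _.        // anonymous variable: no warning
has_price(x) <- price[x] = _amount.  // underscore-prefixed name: no warning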

The concept of "a variable"

In imperative programming languages (such as C, Java, Python...) a variable essentially represents a memory location that may have some contents, and the contents may change over time. For example, in many such languages x = x + 2 is not an equality, but an assignment statement. Execution of the statement usually increases (by 2) the contents of a memory location represented by x, and thus the "value of variable x". (We say "usually", because the effect may sometimes be different, due to the vagaries of finite integer arithmetic.)

This should be contrasted with high-school algebra or physics, where a variable represents some concrete value: its function is somewhat similar to a pronoun in natural language.

In that context, t = s / v represents the relation between some concrete time t, some concrete distance s and some concrete average speed v. Once we "plug in" concrete values (data from today's trip, say), the variables disappear, and we are left with an equality between concrete arithmetic expressions.

Variables in LogiQL are much more like the variables of algebra than like those of imperative programming languages. The closest counterparts are the variables of classical first-order logic, where we deal with statements such as the following:

  1. "For any integers x and y, x > y."
  2. "For any integer x there exists an integer y such that x > y."
  3. "x > y"

The first of the above statements happens to be false, the second happens to be true, and the third may be false or true, depending on how we choose to instantiate x and y, i.e., depending on what concrete values are represented by these variables.

That the last statement is neither true nor false is due to the fact that the variables are free. By contrast, in the other statements they are bound by the quantifiers "for any" and "there exists". Indeed, the choice of a particular integer cannot affect the truth of a statement that purports to hold for any integer: if that particular integer is a counterexample, then the statement is simply false.

"For any" (or, equivalently, "for all") is the universal quantifier; "there exists" is the existential quantifier. It is worth noticing that

  • there does not exist an x such that p(x) holds is equivalent to for every x, p(x) does not hold;
  • for every x, p(x) holds is equivalent to there does not exist an x such that p(x) does not hold.

The above is a generalisation of De Morgan's rules:

  • not (p and q) = not p or not q;
  • not (p or q) = not p and not q.

Variables in LogiQL are like the variables of logic.

It should be noted that:

  • A LogiQL variable is local to the rule in which it occurs (see Chapter 11). This means that occurrences of the same variable name in different rules refer to different variables. (Variables can also occur in facts, but a fact can be considered as a degenerate rule. Moreover, variables in facts are not useful: see Section 10.1.1.)
  • Additionally, each variable is quantified by a universal or existential quantifier: see the section called “Rules and the quantification of variables”. (The quantifier may further limit the scope of the variable: see Example 11.4.)
  • The closest counterparts to global variables are scalars, i.e., functional predicates without arguments (see Section 8.2). For example, we might model the maximum delay between placing an order and receiving the shipment by introducing the predicate max_delay[] = x -> int(x)., and making sure it has some content, e.g., max_delay[] = 3. Such a predicate can contain at most one tuple, which can be easily accessed at any place in the program.
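A small sketch of this idiom (order_delay and late are hypothetical predicates):

order_delay[o] = d -> string(o), int(d).
late(o)        -> string(o).

max_delay[] = x -> int(x).
max_delay[] = 3.

late(o) <- order_delay[o] = d, d > max_delay[].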

Bound variables and their instantiations

A variable such as x is just a placeholder. In order to use it in a computation, we must instantiate it, i.e., replace it with a concrete value. The value is called the variable's instantiation. (An instantiation is often referred to as a binding, but the latter term can be confusing, as it is often used for quite diverse purposes in different contexts.)

There are several ways in which a LogiQL variable can be instantiated:

  • If the variable occurs in an atom in the body of a rule (Section 10.1.3), and the atom refers to an EDB or IDB predicate (Chapter 8, Chapter 11): the variable can then become instantiated to a value derived from the values in the predicate.
  • If the variable occurs on one side of an equality such as x = y + z * 2, and all the variables on the other side are bound (see below).
  • Sometimes the compiler is able to determine that a variable is bound in more complicated cases. For example, if x and z are known to be bound in x = y + z * 2, then y is also bound. Sometimes, however, the current compiler is not so sophisticated: for example, even if x is known to be boolean, the comparison x != true is not recognised as binding x.

If a variable occurs in either of such contexts, then that occurrence is called a binding occurrence, and the variable is said to be bound. (This usage of the word "bound" is quite different from the conventional usage in logic: in the conventional sense, every variable in a LogiQL rule is bound by an implicit quantifier.)

In general, a bound variable should be positively bound, i.e., it should have a binding occurrence that is not in the scope of a negation. In the section called “Negation” we discuss the somewhat subtle exception to this rule.

Please notice that the second case above does not constitute a circular definition (of the notion of being bound): the definition is recursive, but there is a base case. Ultimately, all instantiations of variables are derived from the database or from equations with values that are derived from literals (e.g., x = 2 * 3 / 4).

A variable that is not bound is called unbound. A legal rule (or fact) in LogiQL must be syntactically correct, but it also must not contain occurrences of unbound variables.

It should be noted that the fact that a variable is bound does not ensure that it will actually be instantiated to anything. If the variable is bound by an occurrence in a predicate, the predicate may be empty. If the variable is bound by an equality, the other side of the equality might not evaluate to a value, as in x = y/0 (or, indeed, as in x = y, if y is not actually instantiated to anything).
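
A minimal illustration (with hypothetical predicates): q below is declared but never populated, so x is bound, yet it has no instantiations, and p stays empty.

q(x) -> int(x).   // no facts or rules populate q
p(x) <- q(x).     // x is bound by q(x), but the rule derives nothing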

In order to deal with such cases, and in order to understand the semantics of LogiQL declaratively (i.e., without reference to the details of the computation, which can be quite involved), it is sometimes best to think of a variable as representing the set of all its instantiations. For a bound variable this set will always be finite, though sometimes it might be empty. (Treating a variable as a set is actually only a first approximation: see Section 10.1.3.)

Note

One consequence of such a viewpoint is that an expression such as x + y * 2 represents not a value, but a set of values (that is, all the different values that would be produced by systematically replacing variables with each of their instantiations). In this manual we often talk about "the value" of an expression, simply because explaining everything in terms of sets would make the text too stilted. One should, however, keep this perspective in mind, especially when mention is made of the evaluation of an expression ending in failure (e.g., during division by zero): this simply means that the set of values is empty.

Example 9.2. Bound and unbound variables

p(x, 7).

The variable x is unbound, so this is not a legal fact (see Section 10.1.1).

p(x, y) <- x != y.

The variables x and y are unbound (they cannot be instantiated by the disequality), so this is not a legal rule.

p(x, y) <- x != y, q(x), r(y).

If q and r are unary (i.e., of arity one) predicates of the same type, then the variables x and y are bound. (The predicates must be of the same type, because the variables are used in a disequality.)

The last rule effectively declares p to be a predicate that contains all pairs (x, y), such that value x is in predicate q, value y is in predicate r, and the two values are different. See Section 10.1.2.

9.3. Arithmetic Operations

Expression = Expression ("+" | "-" | "*" | "/") Expression .

The multiplication (*) and division (/) operators have higher precedence than addition (+) and subtraction (-) operators, and otherwise operations associate to the left.

Example 9.3. 

x + y * z is equivalent to x + (y * z), because * has higher precedence than +. Otherwise, ambiguities are resolved by associating to the left. For example, x - y + z is equivalent to (x - y) + z.

An arithmetic expression with one operator evaluates to its usual numeric value if both the operands are numbers and the value can be represented.

Example 9.4. Addition

The expression 3+4 evaluates to the number 7.

Example 9.5. Division by zero

If the value of y is 0, then the expression x/y will not have a value: its evaluation will fail.

For example, given the following logic,

p(0).  p(1).  p(2).  p(3).

q(20). q(10).

r(z) <- p(y), q(x), z = x / y.

r will contain the values 3, 5, 6, 10 and 20.

As an exception, the result of dividing two integers is always an integer. If numeric division results in a fraction, then the value of the expression is that fraction rounded toward zero.

Example 9.6. Integer division

The value of -4/-3 is 1. The value of 4/-3 is -1.

Additionally, + can be used for strings. If both operands are strings, then the value of the expression is the string that is their concatenation.

Example 9.7. 

("abc" + "def") evaluates to the string "abcdef".

If a floating-point operation produces a NaN value, no value is produced. If a floating-point operation produces -0.0, it is silently converted to 0.0.

Example 9.8. 

+F(0.0f/0.0f). will not result in anything being inserted to F, since 0.0f/0.0f produces NaN.

Arithmetic operators must be used with expressions of the same type, and the operation results in a value of that type. It is a compile-time error for the right-hand side of an operator to have a different type than the left-hand side.

Example 9.9. 

5d / 3.0d results in the value 1.6667d, a decimal value.

Example 9.10. 

5d * 3.0f results in a compiler error, TYPE_ERROR.

9.4. Function applications

Expression          = FunctionApplication
                    | ExternalDiffFunctionApplication .
FunctionApplication = PredicateDescriptor "[" [ ArgumentList ] "]" .

ExternalDiffFunctionApplication =
    "(" PredicateDescriptor "\" PredicateDescriptor ")" "[" [ ArgumentList ] "]" .

PredicateDescriptor = PredicateName [ "@" Suffix ].
PredicateName       = Identifier.
ArgumentList        = Expression { "," Expression } .

The optional suffix in PredicateDescriptor can be a reference to a transaction stage (Section 19.4.4) or the name of a branch (Chapter 41).
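
For illustration, here are applications of a (hypothetical) functional predicate cost with a stage suffix and with a branch suffix (the branch name mybranch is made up for this example):

cost@prev[item, year]       // the value of cost at the previous stage
cost@mybranch[item, year]   // the value of cost on branch mybranch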

ExternalDiffFunctionApplication is an application of an external diff predicate, which is allowed when the predicate whose versions are compared happens to be functional. See Section 8.11 and Section 10.1.5 for more information.

Function applications can be written only for predicates that are functional. It is an error to write an application for a non-functional predicate.

The expressions between square brackets ([ and ]) are called arguments of the application. The number of arguments supplied must be exactly the number of key arguments of the relevant functional predicate.

An application expression applies a functional predicate to a series of zero or more arguments. The expression evaluates to the value in the value position of that tuple of the functional predicate that matches the arguments.

Example 9.11. 

Given a database where the predicate sold contains tuples: { ("squids", 1995, 100), ("salmon", 1995, 20) }, the application sold["squids", 1995] evaluates to 100.

Here are a few example applications:

Example 9.12. 

cost["squids", 1995]
cost[item, year]
cost[bestseller[year], year + 1]

It is worth noting that an application is purely a notational convenience, and can always be rewritten to something else. See Section 10.1 and Section 10.2.

Example 9.13. Functional notation is just a convenience

If area_of_room is a functional predicate, then the following two lines are equivalent in all respects (but the first one is to be preferred on stylistic grounds).

area_of_room[12, 10] = a
area_of_room(12, 10, a)

If the identifiers new_var_1 and new_var_2 are unused elsewhere in the current rule, then

... <- ... cost[bestseller[year], year + 1] < can_spend ...

is equivalent to

... <- ... bestseller(year, new_var_1),
          cost(new_var_1, year + 1, new_var_2),
          new_var_2 < can_spend,
          ...

Similarly, the following

p(cost[x]) <- purchased_item(x).

is equivalent to

p(c) <- cost(x, c), purchased_item(x).

9.5. Parenthesized expressions

Expression = "(" Expression ")" .

An expression may be enclosed in parentheses. The value of such an expression is the same as the value of the enclosed expression. Parenthesized expressions are useful for overcoming the usual order of precedence.

Example 9.14. 

2 * 3 + 4 evaluates to 10, whereas 2 * (3 + 4) evaluates to 14, because the parentheses cause the addition to be evaluated before the multiplication.

Chapter 10. Formulas

Roughly speaking, a formula is a LogiQL construct that may be true or false, depending on the information currently in the database. If a formula contains variables, then it is, in general, the choice of instantiations for these variables that makes the formula false or true: the same formula may be false for some instantiations and true for others. For example, the simple formula x < y is true if we instantiate x to 7 and y to 8, but false if we instantiate both variables to 5.

The evaluation of a formula usually results in computing variable instantiations that make it true, or checking that there are no such instantiations, i.e., that the formula is always false. All this is explained in more detail below.

A formula that is true is often said to hold; a formula that is false is said not to hold.

10.1. Atoms

Formula = Atom .

Atom = BasicAtom         | FunctionalAtom
     | ReferenceModeAtom | ExternalDiffAtom .

BasicAtom = PredicateDescriptor "(" [ ArgumentList ] ")" .

ArgumentList = Expression { "," Expression } .

ReferenceModeAtom = PredicateDescriptor "(" Expression ":" Expression ")".

PredicateDescriptor = PredicateName [ "@" Suffix ].
PredicateName       = Identifier.
Suffix              = Identifier.

The optional suffix in PredicateDescriptor can be a reference to a transaction stage (Section 19.4.4) or the name of a branch (Chapter 41). In the text below we will simply talk about a predicate name, tacitly assuming that in most contexts it can be augmented with a suffix. It is important to note that the suffix cannot be present if the atom occurs in the head of a clause.

(There is one exception to the rule: one is allowed to declare constraints for predicates with stage names, e.g., p@prev(x) -> x > 0.. However, such constraints currently have no effect.)

There are four kinds of atoms: basic atoms, functional atoms, reference-mode atoms and external diff atoms.

Functional atoms are described in Section 10.1.4.

A reference-mode atom consists of a predicate name followed by two expressions, separated by a colon and enclosed in parentheses. The first of these expressions must be an identifier (i.e., the name of a variable), and the second must be either an identifier or a literal. The predicate must be a reference mode predicate. See Section 8.4 for more information.
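
For example, given the declaration of a reference-mode predicate (the same pattern is used in Example 11.5):

person(p), person_name(p : n) -> string(n).

the following are both well-formed reference-mode atoms:

person_name(x : n)         // both expressions are identifiers
person_name(x : "Alice")   // the second expression is a literal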

External diff atoms are described in Section 10.1.5.

A basic atom consists of a predicate name followed by a list of arguments, which may be empty. A non-empty list of arguments is a list of expressions separated by commas.

The number of expressions must be equal to the arity of the predicate, and the type of each expression must be the type of the corresponding position in tuples that belong to the predicate. The predicate may be a functional predicate, but in the case of a single-valued functional predicate it is better style not to use a basic atom, but a functional atom, e.g., f[x, y] = z instead of f(x, y, z).

Note

In the rest of this section we use the unadorned term "atom" to denote a basic atom. Most of what we write will be applicable (with simple modifications) also to atoms that are not basic atoms.

In general, an atom is a formula meant to express a certain relation between its arguments (see Example 10.1). By contrast, a predicate is essentially a relation in the mathematical sense, i.e., a set of tuples. (The main difference is that, unlike a mathematical relation, a predicate can evolve over time, as its contents change.)

An atom is fully instantiated if all the variables occurring in its arguments are instantiated. A fully instantiated atom is true if the values of its arguments form a tuple that is (currently) in the named predicate. (We say "the values of its arguments", because the arguments are expressions.)

Example 10.1. Atoms

square_of(2, 3*3)          // 9 is a square of 2
Person(p)                  // p is a person
loves(p, _)                // p loves something
loves(p, "Juliet")         // p loves Juliet
loves("Romeo", "Juliet")   // Romeo loves Juliet

As we can see from the first example, whatever an atom expresses need not necessarily be true. (However, if this atom were a fact, we would probably not put it in our program: given some intended interpretation of the predicates, we try to make our databases reflect reality.)

There is a certain amount of leeway in how we interpret an atom. For example, we have chosen to read the last atom above as expressing "Romeo loves Juliet", but we might as well have chosen another reading: "Juliet loves Romeo". The choice is a matter of convention, and the syntax does not reflect it directly.

(The usual convention is the one we have adopted above. For the second possibility we would normally have switched the arguments, or chosen another name for the predicate, e.g., is_loved_by.)

The exact meaning of an atom depends on the context in which it occurs. There are three possibilities:

  • an atom may be a fact;
  • an atom may occur in the head of a rule;
  • an atom may occur in the body of a rule.

10.1.1. Atoms as facts

A stand-alone atom followed by a period is called a fact. It represents a tuple in the predicate named by the atom.

Example 10.2. A fact

p(2 * 2, 2 + 3).

The above indicates that tuple (4, 5) belongs to predicate p. From the form of the atom we can see that p is a binary predicate (i.e., its arity is two), and that both its arguments are of type int.

The arguments of a fact must be expressions that evaluate to values. In particular, none of them may contain variables (such variables would be unbound).

The type of each argument must be the type of the corresponding position of tuples in the predicate.

Example 10.3. Bad facts

p(3 + x, 8).

The above will be rejected by the system (with an appropriate diagnostic message), because x is not a value: it is an unbound variable (see the section called “Bound variables and their instantiations”).

p("alpha", "beta").

If we had already submitted the fact in Example 10.2, then the above will be rejected, because p has been implicitly declared as

p(x, y) -> int(x), int(y).

The possibility of implicitly declaring a predicate by supplying a fact is a special case of the mechanism described in Section 11.1.1. Please note that in this case the implicit declaration is that of an IDB predicate (see the section called “Kinds of rules”).

As noted in Chapter 8, predicates contain sets of tuples. It is not an error to have a number of facts that correspond to the same tuple, but the predicate will contain only one of those tuples.

Example 10.4. Predicates do not contain duplicates

p(1 * 2, 2 * 2).
p(2 * 3, 3 * 3).
p(2 * 1, 2 + 2).
p(3 * 2, 3 + 3).

If the above is all the information that we have about predicate p, then p will contain only three tuples:

p(2, 4).
p(6, 9).
p(6, 6).

10.1.2. Atoms in heads of rules

Rules are described in Chapter 11.

An atom in the head of a rule is not dissimilar to a fact. The main difference is that its arguments may contain variables, provided those variables are bound in the body:

The atom may be thought of as a shorthand for the set of facts to which it can be converted by systematically replacing its variables with those of their instantiations for which the body is true.

(The careful reader will notice that a fact can be regarded as a degenerate form of a rule: one where the non-existent body is trivially true.)

Example 10.5. A simple rule

q(0). q(1). q(2).

r(x + y, x * y) <- q(x), q(y).

If these are all the facts for q, and q will not change, then the rule is equivalent to the following set of facts:

r(0, 0).
r(1, 0).
r(2, 0).
r(2, 1).
r(3, 2).
r(4, 4).

(In the above, we have removed duplicate facts that would have no effect: see Example 10.4.)

Of course, the advantage of using the rule is that it will cause the system to automatically update the contents of predicate r whenever tuples are added to (or removed from) q.

Example 10.8 might make things a little clearer.

10.1.3. Atoms in bodies of rules

Rules are described in Chapter 11. To simplify this introductory description, we will assume that none of the atoms discussed here is in the scope of a negation. Negated atoms will be discussed in the section called “Negation”.

An atom in the body of a rule is either true or false, depending, in general, both on its form and on the contents of the database. This is clarified below.

An atom in the body is true if and only if it matches at least one of the tuples that are in the database and belong to the predicate named by the atom.

An atom matches a tuple if each of its arguments matches the value in the corresponding position of the tuple, as follows:

  • If the argument is an expression that does not contain uninstantiated variables, then it matches a value v if and only if the value of the expression is identical to v.
  • If the argument contains a variable x, the attempt to match will involve an attempt to instantiate x, so that the value of the entire argument will become identical to the corresponding value in the tuple: if such an instantiation can be found, the match will be successful. Once the variable is instantiated, all its other occurrences are effectively replaced by its value.

This is perhaps easier to understand if one treats the expressions in an atom as syntactic sugar, as shown in the following example:

Example 10.6. Expressions in the arguments of a body atom

The rule

q(x) <- p(x, x * x). 

is equivalent to

q(x) <- p(x, y), y = x * x. 

In the second version, (x, y) is matched against all the tuples of p, thus instantiating variables x and y. The resulting instantiations are accepted only if they satisfy y = x * x, and q is populated only with those instantiations of x that form a part of the accepted instantiations. These are exactly the instantiations of x for which p(x, x * x) is true.

See Chapter 11 and the section called “Conjunction and Disjunction”. See also Example 10.7.

In general, an atom may match a number of tuples. If the process of matching involves instantiating some variables, we may consider each variable as being instantiated to a set of values, as discussed in the section called “Bound variables and their instantiations”. This, however, is a simplification. If the atom contains occurrences of two or more variables, then we should not think of each variable as being instantiated independently: it is, rather, the entire collection of variables occurring in the atom that is being instantiated. This is illustrated by the example below.

Example 10.7. Instantiating a collection of variables

p(1, 3).  p(2, 4).

q(x * y) <- p(x, y).

The predicate q will contain two values: 3 and 8. This is because x is instantiated to 1 only when y is instantiated to 3, and to 2 only when y is instantiated to 4.

In other words, the atom in the body can be thought of as instantiating an ordered tuple (x, y): the instantiation consists of the set {(1, 3), (2, 4)}. This is essentially equivalent to the following table in a relational database:

 x   y
 1   3
 2   4

If we regarded the atom as instantiating the variable x to {1, 2} and the variable y to {3, 4}, then we would incorrectly think of q as containing 3, 4, 6 and 8.

As we can see from the example and the preceding discussion, an atom in the body of a rule may be regarded as playing two roles:

  • it is a filter that rejects those tuples of the named predicate that do not match the arguments;
  • it is a generalized projection operation that extracts a table of instantiations for the variables that occur in its arguments.

The following example might help to illustrate the three uses of atoms. See Chapter 11 for more information.

Example 10.8. An illustration of the use of atoms

p(1, 2).  p(1, 3).  p(2, 4).  p(4, 5).  p(5, 5).

q(x, x * 2) <- p(x, x + 1).

r(x) <- p(x - 1, x).

s(x) <- p(x - 1, x + 1). 

Assume the predicate p contains only tuples declared explicitly by the facts above, i.e.,

(1, 2)
(1, 3)
(2, 4)
(4, 5)
(5, 5)

Then the predicate q will contain the following tuples:

(1, 2)
(4, 8)

This is because the atom in the body of the rule for q matches only

(1, 2)
(4, 5)

so x can be thought of as instantiated to the set {1, 4}.

It is now clear that predicate r will contain the values (one-tuples) 2 and 5, while s will contain 2 and 3.

It should be noted that the expressions that may be used for matching-and-instantiating must be relatively simple. For example, the following rule will not be accepted by the LogicBlox compiler, even though we might have no trouble determining what tuples should belong to t (given the definitions in Example 10.8):

t(x, y) -> int(x), int(y).

t(x, y) <- p(x - y, x + y).

By contrast, the following is quite acceptable (though not very useful):

t(x, y) <- p(x, x + y).

The compiler will usually provide us with sufficient explanations if we get carried away and write things that it cannot handle.

An outline of an explanation for the curious.  Expressions in arguments are just syntactic sugar. The compiler converts an atom such as p(x, x + y) into

p(x, z), z = x + y

Matching (x, z) against the tuples in predicate p is trivial. Since x and z are bound, the compiler can make y bound by rewriting z = x + y to y = z - x. It is not difficult to see that in the case of p(x - y, x + y) such simple rewriting will not make y bound. We could instantiate y to the set of all integers, and then use p as a filter, but that is not a practical proposition.

10.1.4. Functional atoms

FunctionalAtom = FunctionalExpression "=" Expression
               | PredicateDescriptor "(" KeyArguments ";" ValueArguments ")"
               | PredicateDescriptor "(" KeyArguments ";" )"
               | PredicateDescriptor "(" ";" ValueArguments ")"
               .

FunctionalExpression = PredicateDescriptor "[" [ KeyArguments ] "]" .

KeyArguments   = Expressions .
ValueArguments = Expressions .

Expressions = Expression { "," Expression } .

PredicateDescriptor = PredicateName [ "@" Suffix ] .
PredicateName       = Identifier .
Suffix              = Identifier .

Functional atoms of the first kind have the syntactic form of comparisons (see Section 10.2). We list them here, because semantically they are atoms.

A functional atom is used to refer to a functional predicate (see Section 8.2). As always, if the atom occurs in the head of a rule (or in a fact), then the PredicateDescriptor must be a simple PredicateName.

If the functional predicate has only one value argument, we use the first form, as in the following example:

Example 10.9. Single-valued functional atoms

The following program declares and populates two functional predicates, f and g.

create --unique

addblock <doc>
  f[x] = y -> int(x), int(y).
  g[x] = y -> int(x), int(y).

  f[1] = 2.
  f[2] = 4.
  f[3] = 6.

  g[x + 1] = f[x] * 3.
</doc>
print g

close --destroy

This results in

created workspace 'unique_workspace_2017-01-14-23-54-28'
added block 'block_1Z1B35MR'
2 6
3 12
4 18
deleted workspace 'unique_workspace_2017-01-14-23-54-28'

This form of functional atom can be thought of as convenient syntactic sugar for a basic atom. One must not forget, however, that by using it in a declaration we also trigger the automatic generation of constraints that ensure the predicate will always be a function (i.e., will contain at most one tuple for each combination of the key arguments). Moreover, functional notation cannot be used for predicates that have not been declared to be single-valued functions.

Example 10.10.  Variants of Example 10.9

create --unique

addblock <doc>
  f(x, y) -> int(x), int(y).
  g(x, y) -> int(x), int(y).

  f(1, 2).
  f(2, 4).
  f(3, 6).

  g(x + 1, y * 3) <- f(x, y).
</doc>
print g

close --destroy

To ensure that f and g are functions, one could also do this:

create --unique

addblock <doc>
  f[x] = y -> int(x), int(y).
  g[x] = y -> int(x), int(y).

  f(1, 2).
  f(2, 4).
  f(3, 6).

  g(x + 1, y * 3) <- f(x, y).
</doc>
print g

close --destroy

It is strongly recommended that such a mixture of styles be avoided. We should use the style shown in Example 10.9.

The second form of functional atoms is used to reference multi-valued functional predicates (see the section called “Multi-valued functional predicates”). Such atoms are exactly like basic atoms, except that the list of key arguments is separated from the list of value arguments by a semicolon. Using such an atom in a declaration ensures that the predicate will indeed be a function. When used elsewhere, the atom can be replaced with a corresponding basic atom, but this is not recommended.

Example 10.11. Multi-valued functional atoms

The following is a declaration of a multi-valued functional predicate:

m(a ; b, c) -> int(a), int(b), int(c).

The following two facts add two tuples to m. (Notice that there is no semicolon in the second one: this is legal, but not recommended.)

m(1; 2, 3).
m(11, 2, 3).

However, if we now tried to add m(1; 3, 4)., we would get an error message that begins with

Error: Function cannot contain conflicting records: attempt to insert [1,3,4,1], but conflicting record [1,2,3,1] exists.

Example 10.12. Functional atoms without key arguments

The following creates (and implicitly declares) a multi-valued functional predicate with no key arguments. Since there is only one (implicit) key, such a predicate can contain only one tuple, so it is effectively a constant pair of values.

n(; 2, 3).

The following two forms may be used to retrieve the two values:

n(; x, y)
n(x, y)

10.1.5. External diff atoms

ExternalDiffAtom = ExternalDiffSpecifier "(" [ ArgumentList ] ")"
                 | ExternalDiffSpecifier "[" [ ArgumentList ] "]" "=" Expression
                 .

ExternalDiffSpecifier = "(" PredicateDescriptor "\" PredicateDescriptor ")" .

PredicateDescriptor = PredicateName [ "@" Suffix ] .
PredicateName       = Identifier .
Suffix              = Identifier .

ArgumentList = Expression { "," Expression } .

An external diff (or ediff) atom is a reference to an external diff predicate: see Section 8.11. An ediff atom can occur only in the body of a transaction-lifetime delta rule. An external diff predicate is computed only when it is referred to by some ediff atom.

The predicate name in both occurrences of PredicateDescriptor must be the same.

The suffix can be the name of a branch (Chapter 41) or of a transaction stage (either previous or initial, see Section 19.4.4). A PredicateDescriptor has only one suffix, so it can refer to a particular stage only on the current branch. A predicate name p without a suffix is equivalent to p@initial.
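
For illustration, here is the syntactic shape of ediff atoms, following the grammar above (the predicate sales and the branch name b1 are hypothetical, and the surrounding rule must be a transaction-lifetime delta rule):

... <- ... (sales@b1 \ sales)(sku, d) ...     // basic form
... <- ... (sales@b1 \ sales)[sku] = d ...    // functional form (sales must be functional)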

10.2. Comparisons

Comparison         = Expression ComparisonOperator Expression .
ComparisonOperator = "=" | "!=" | "<" | ">" | "<=" | ">=" .

A comparison formula compares two expressions with each other. The two expressions must have the same type.

If the comparison operator is equality (=), then the formula is true if the two values are equal. If the operator is a disequality (!=), the formula is true if the two values are different. Equality and disequality can be used to compare two values of any type, either a primitive type or an entity type.

The operators < (less than), > (greater than), <= (less or equal), and >= (greater or equal) can be applied only to ordered primitive types, such as numbers, strings, and dates. Strings have a lexicographic ordering, e.g., "Ann" < "Bob" and "Ann" < "Anne".

A formula with a single comparison operator is just a more convenient notation ("syntactic sugar") for an atom that refers to a built-in predicate: see the section called “Comparison Operations”.

Note

A single-valued functional atom (see Section 10.1.4) has a syntactic form that is indistinguishable from that of a comparison, but the two are quite different. For example, for integer x and y the comparison x * 2 = y - 1 is syntactic sugar for the atom int:eq_2(x * 2, y - 1), while the functional atom f[x * 2] = y - 1 is syntactic sugar for f(x * 2, y - 1).

10.2.1. Chained Comparison Formulas

Formula = Expression ComparisonOperator Expression
             { ChainComparisonOperator Expression } .

ChainComparisonOperator = "<" | ">" | "<=" | ">=" .

Ordered comparison operators may be chained. This is just a more convenient notation ("syntactic sugar") for a conjunction of simple comparison formulas (see the section called “Conjunction and Disjunction”).

Example 10.13. Chained comparison with the same operator

3 < 4 < 5

The above formula evaluates to true, because it has the same meaning as:

3 < 4, 4 < 5

Example 10.14. Chained comparison with different operators

3 < 4 > 2

Similarly to the previous example, the above formula has the same meaning as:

3 < 4, 4 > 2

The first comparison operator in a chained comparison formula can be = or !=.

Example 10.15. (In)equality in chained comparison

5 = 3 < 5
5 != 3 < 4

The first formula above evaluates to false, while the second evaluates to true.

10.3. Complex Formulas

Atoms can be composed into more complex formulas by means of logical operators. There are three logical operators: conjunction, disjunction, and negation.

Conjunction and Disjunction

Formula = Formula ("," | ";") Formula .

A conjunction is written as two formulas separated by a comma (,). A conjunction is true whenever both constituent formulas are true.

Example 10.16. Conjunction

3 < 4, 4 < 5   // true, because both are true
3 < 4, 4 > 5   // false, because only the first is true

A disjunction is written as two formulas separated by a semicolon (;). A disjunction is true if at least one of the two formulas is true.

Example 10.17. Disjunction

3 < 4; 4 > 5  // true, because the first of the two is true
3 < 4; 4 < 5  // true, because both are true

The precedence of the disjunction operator (;) is lower than that of the conjunction operator (,).

Example 10.18. Combining conjunction with disjunction

The following two rules are equivalent:

p(x, y) <- q(x), r(y) ; s(y), t(y, x).
p(x, y) <- (q(x), r(y)) ; (s(y), t(y, x)).

Each of these rules is equivalent to the following two rules:

p(x, y) <- q(x), r(y).
p(x, y) <- s(y), t(y, x).

To get rid of the disjunction from the following rule

p(x, y) <- q(x), (r(x, y); s(y, x)).

we would have to introduce an auxiliary predicate

p(x, y) <- q(x), t(x, y).
t(x, y) <- r(x, y).
t(x, y) <- s(y, x).

Please note that the usual convention is to write a space after, but not before, the comma. We sometimes make the semicolon more visible by adding extra whitespace, as in the first example above.

Both conjuncts (disjuncts) may contain occurrences of the same variables.

If there is only one common variable, and we consider a variable as a set of all its instantiations, then for that variable the conjunction (disjunction) corresponds roughly to the intersection (union) of the sets of instantiations for which the conjuncts (disjuncts) are true.

Example 10.19. Conjunction and disjunction: the simplest case

p(1). p(2). p(3).
q(2). q(3). q(4).
r(3). r(4). r(5).

s(x) <- p(x), q(x), r(x).

t(x) <- p(x); q(x), r(x).

The predicate s contains only 3.

The predicate t contains 1, 2, 3 and 4 (but not 5).

In general the situation is more complicated. As discussed in Example 10.7, an atom in the body of a rule instantiates a tuple of variables to a set of tuples, i.e., it essentially produces a relation ("table") similar to the ones used in relational databases (the main difference being that LogiQL has set semantics, while SQL has multiset semantics). Given this view, conjunction can be thought of as corresponding to a generalised natural join; disjunction would then be a generalised union. This is illustrated by the following example.

Example 10.20. Conjunction and disjunction: a more general case

p(1, 3).   p(2, 4).   p(2, 20).
q(1, 10).  q(2, 20).  q(3, 30).

r(x + y + z) <- p(x, y), q(x, z).

s(x + y + z) <- p(x, y), z = 0; q(x, z), y = 0.

The predicate r contains the values 14, 26 and 42. In this case the conjunction acts as a natural join. Given the following tables:

 x   y
 1   3
 2   4
 2  20

 x   z
 1  10
 2  20
 3  30

we produce

 x   y   z
 1   3  10
 2   4  20
 2  20  20

The predicate s contains 4, 6, 11, 22, and 33. In this case the disjunction is equivalent to the union of the following two tables:

 x   y   z
 1   3   0
 2   4   0
 2  20   0

 x   y   z
 1   0  10
 2   0  20
 3   0  30

The union is

 x   y   z
 1   3   0
 2   4   0
 1   0  10
 2   0  20
 2  20   0
 3   0  30

and the duplicate 22 was removed from s. In general, both the join and the union will always be sets (i.e., will contain no duplicate tuples).

Note that in the rule for s we had to add explicit instantiations for z and y: otherwise the compiler would have complained about unbound variables. The rule is equivalent to:

s(x + y + z) <- p(x, y), z = 0.
s(x + y + z) <- q(x, z), y = 0.

Negation

Formula = "!" Formula .

A negation holds when the negated formula is false. For example, !(0 > 1) is true.

Example 10.21. Negations

!p(x, y)
!(p(x), q(x))

Negation has higher precedence than conjunction, and conjunction has higher precedence than disjunction. However, negation has a lower precedence than a comparison operator.

Example 10.22. Precedence of negation, disjunction and conjunction

p(); q(), r()  // a disjunction: equivalent to p(); (q(), r())
!a(), b()      // a conjunction: equivalent to (!a()), b()
! f[x] = y     // a negation:    equivalent to !(f[x] = y)

If we allowed the negated formula to produce instantiations for a variable, then negation would be tantamount to taking the complement of the set of instantiations. This would, in general, be too expensive (and sometimes even impossible).

Example 10.23. No binding under negation

Given the declarations

p(a)     -> int(a).
q(a)     -> int(a).
f[a] = b -> int(a), int(b).

the rule

p(x) <- ! f[7] = x.

would not be accepted by the compiler (the diagnostic message being variable 'x' is not bound; it must be used outside a negation). Indeed, the instantiation of x would consist of the entire set of integers, except the one which is the value of f[7].

The following, however, would be accepted, because of the binding occurrence of x in q.

p(x) <- ! f[7] = x, q(x).

For this reason a LogiQL rule must satisfy the following condition: all variables that appear in the head must be positively bound. "Positively" means simply: in a formula that is not negated.

A more precise statement of this rule would be: every variable that occurs in the head must have a binding occurrence outside the scope of a negation.

(A brief explanation for the curious: a variable that has a binding occurrence outside the scope of a negation may also have binding occurrences inside the scope of a negation, but the compiler will ensure that these will not actually produce any instantiations. There is a distinction between so-called iterable positions of a variable, which are used to actually extract instantiations from the database, and searching positions, which are used to check for the presence of already-extracted instantiations, by means of an index. The latter are allowed inside negations.)

A bound variable is usually positively bound. There is an exception in the case of variables that have no occurrences outside the scope of a negation.

The instantiations of a variable whose occurrences appear only in the scope of a negation are not "exported" outside the negation, so there is no conceptual or major practical difficulty in allowing it to be instantiated there. However, for somewhat obscure technical reasons, in LogiQL this is allowed only if the binding occurrence is in the value position of a functional predicate.

Example 10.24. A legal use of negatively bound variable

p(a)     -> int(a).
q(a)     -> int(a).
f[a] = b -> int(a), int(b).

p(x) <- !(f[x] = y, y < x), q(x).

The rule above is accepted by the compiler. Variable y has only a negative binding occurrence, but that is in the value position of f. Note that the argument of f is bound by the occurrence in q(x).

The rule has a somewhat artificial form, because we wanted to keep the example simple. This particular rule would normally be written as

p(x) <- ! f[x] < x, q(x).

Example 10.25. An illegal use of a negatively bound variable

p(x)    -> int(x).
q(x)    -> int(x).
s(x, y) -> int(x), int(y).

p(x) <- !(s(x, y), y < x), q(x).

The rule above is not accepted by the compiler, which produces the following error message: existentially quantified variable 'y' occurs negatively in an iterable position.

For this simple example this is not a real limitation, since the rule can be rewritten to a legal one by introducing an auxiliary predicate:

t(x) <- s(x, y), y < x.
p(x) <- !t(x), q(x).

Unfortunately such rewriting is not always possible: see Example 11.12.

It is worth noting that in some cases negation in LogiQL is not quite the same as what one might naively expect. This is illustrated by the following table.

Table 10.1. Treatment of negation and equality

Scenario                           Observations
A[x] = α, B[x] = α                 A[x] = B[x], !(A[x] != B[x])
A[x] = α, B[x] = β, α ≠ β          !(A[x] = B[x]), A[x] != B[x]
A[x] = α, B[x] undefined           !(A[x] = B[x]), !(A[x] != B[x])
A[x] undefined, B[x] = α           !(A[x] = B[x]), !(A[x] != B[x])
A[x] undefined, B[x] undefined     !(A[x] = B[x]), !(A[x] != B[x])

The last three examples are particularly instructive. Because either A[x] or B[x] (or both) is missing, the checks for equality and disequality will fail. The outer negation operations will therefore succeed.

Please note that these examples do not violate classical logic. For example, !(A[x] = B[x]) is actually shorthand for there does not exist a y such that A[x] = y and B[x] = y, which is not the same as A[x] != B[x].

Tip

The interaction of negation and equality/disequality is slightly different in LogicBlox 4.x than in earlier releases. In LogicBlox 4.x (A[x] != B[x]) always has the same meaning as (B[x] != A[x]).

Parenthesized Formulas

Formula = "(" Formula ")" .

A formula surrounded by parentheses is also a formula. The two formulas have the same truth value.

Example 10.26. Parenthesized Formulas

(p(); q()), r() // this is a conjunction

Chapter 11. Rules

A rule is a logical implication that specifies how the content of a predicate can be derived from the contents of other predicates. Rules have the following syntactic form:

Rule = Formula "<-" Formula "." .

The arrow (<-) is an implication operator, read as "is implied by".

The formula on the left-hand-side of the implication operator is referred to as the head of the rule; the one on the right-hand-side is the body.

The meaning of a rule is as follows: if the formula in the body holds, then the formula in the head also holds. (We sometimes say that if the formula in the body holds, then the rule fires.)

The LogicBlox database system makes sure that the implication is satisfied: this is done by populating the predicates that occur in the head of the rule with the appropriate tuples. In other words, the predicates that occur in the head are automatically derived from the predicates that occur in the body (see also Section 11.1.2).

We sometimes informally say that a rule derives into its head predicate(s).

Rules and the quantification of variables

Note

The purpose of this section (which might be skipped on a first reading of this manual) is to bring to light, in an informal fashion, some of the subtleties that have to do with the quantification of variables. The information might make the wording of compiler diagnostics somewhat less mysterious. Not all the rules given in the examples are actually legal LogiQL rules: this is further elaborated at the end of the section.

A rule is a logical implication. A rule such as the following

P <-  Q.

corresponds to the logical statement P is implied by Q.

This statement is false only when Q holds, but P does not hold. So it is equivalent to the statement P or not Q.

For more complicated bodies we must remember to use De Morgan's rules. So, for example,

P <-  Q, R, S.

corresponds to the logical statement P is implied by Q and R and S. This is equivalent to P or not (Q and R and S), which, by De Morgan's rules, is equivalent to P or not Q or not R or not S.

A LogiQL rule does not contain free variables: all the variables are (implicitly) quantified.

In a rule without negations (see the section called “Negation”), all the variables are universally quantified ("for all"). However, a variable that does not occur in the head can also be thought of as existentially quantified ("there exists").

Example 11.1. Body variables are also existentially quantified

is_grandparent(x) <- parent(x, y), is_parent(y).

The variable y occurs only in the body. The following are two natural readings of the rule:

For all x and all y, if x is a parent of y and y is a parent, then x is a grandparent.

For all x, if there exists a y such that x is a parent of y and y is a parent, then x is a grandparent.

It is easy to show that these two are indeed equivalent. Recall that head is implied by body is equivalent to head or not body, and when a universal quantifier is moved into a negation, it becomes existential: for all x not P holds is the same as not (exists x such that P holds).

The second reading shows that y can be thought of as existentially quantified (and the rule's body is then the scope of the existential quantifier). The variable x is not existentially quantified, because the quantifier For all x cannot be moved into the body: that would leave the occurrence of x in is_grandparent(x) free (and the two occurrences would no longer refer to the same variable).

Example 11.2. Intuition can be misleading

In many cases it is more natural and less confusing to treat a variable as existentially quantified. For example, the following is an obvious statement: If, for every y that is a child of x, y has no child, then x has no grandchild.

We might be tempted to encode this as the following rule:

has_no_grandchild(x) <- parent(x, y), has_no_child(y).

That would be wrong. What this rule actually encodes is the slightly different statement: For all x and all y, if y is a child of x and y has no child, then x has no grandchild.

Our error becomes more obvious if we read the rule as the equivalent statement: For all x, if there exists a y such that y is a child of x and y has no child, then x has no grandchild.

We can now immediately see that if x has several children, and one of them has no child, it does not necessarily follow that x has no grandchild: x could have a grandchild by one of the other children.

If the body of a rule contains negated formulas, the situation gets a little more complicated. Every variable that has occurrences only inside a negated formula is quantified existentially. The scope of the existential quantifier is the smallest negated formula that contains all the occurrences (but one must be careful to distinguish between occurrences of different variables with the same name: see Example 11.4).

Example 11.3. Variables that occur only in the scope of a negation

Given the intuitively obvious meanings of the predicates, the following defines the notion of having a grandchild:

has_grandchild(x) <- parent(x, y), has_child(y).

This can be used to define the notion of not having a grandchild:

has_no_grandchild(x) <- ! has_grandchild(x).

We can now expand the definition of has_grandchild to obtain:

has_no_grandchild(x) <- ! (parent(x, y), has_child(y)).

In this rule the variable y has occurrences only in the scope of the negation. The rule can be read as follows: For any x, x has no grandchild if there does not exist a y such that x is a parent of y and y has a child.

"There does not exist" (or, more precisely, "it is not true that there exists") is a negated existential quantifier. The scope of y is the scope of the quantifier, i.e., in this case the entire negated formula.

Since there are no other variables named y (see Example 11.4) we can bring the quantifier outside the negation (thus changing it to a universal quantifier). The rule would now read: For any x, x has no grandchild if, for all y, it is not the case that x is a parent of y and y has a child.

This is equivalent to For any x, x has no grandchild if, for all y, either x is not a parent of y or y has no child.

Or, equivalently (by the definition of implication): For any x, x has no grandchild if, for every y such that x is a parent of y, y has no child. This is essentially the same as the statement with which we began Example 11.2.

Example 11.4. A quantifier limits the scope of a variable

We could define has_child as

has_child(x) <- daughter(x, y).
has_child(x) <- son(x, y).

We might want to use this for defining the predicate has_no_child:

has_no_child(x) <- ! daughter(x, y), ! son(x, y).

The rule can be read as: For any x, if there is no y such that y is a daughter of x and there is no y such that y is a son of x, then x has no child.

Please note that there are two existential quantifiers, and each of them quantifies a variable named y. These are actually two different variables, each of them with the name y. The rule might as well have been written as follows:

has_no_child(x) <- ! daughter(x, y), ! son(x, z).

In fact, since both y and z have only one occurrence each, the recommended way to write this is

has_no_child(x) <- ! daughter(x, _), ! son(x, _).

The point of this example is to show that:

  • the scope of a variable may be much smaller than the entire rule, and two occurrences of the same identifier need not refer to the same variable;
  • increasing the scope of a quantifier may require renaming of variables.

Note

The restrictions of LogiQL usually protect the user from most of the subtleties illustrated in this section. In particular, there is a restriction that variables must be positively bound (with the exception noted in the section called “Negation”). This restriction would make it necessary to modify the rules in our examples: all the existential quantifiers could then be regarded as having the same scope (viz., the body of the rule). For example, the rule

has_no_child(x) <- ! daughter(x, y), ! son(x, y).

would have to be rewritten as

has_no_child(x) <- person(x), person(y),
                   ! daughter(x, y), ! son(x, y).

Kinds of rules

There are many ways to divide LogiQL rules into categories. In particular, they can be classified as follows:

  • IDB (intensional database) rules;
  • EDB (extensional database) rules;
  • aggregation rules.

EDB rules are often referred to as active logic, and to understand them one must understand transaction semantics in LogicBlox. We therefore defer their description to Chapter 19, in particular to Section 19.2 and Section 19.3.

Aggregation rules are the subject of Chapter 12.

In this chapter we focus on IDB rules.

11.1. Basics of IDB Rules

An IDB rule may contain an arbitrarily complex formula in the body; the head must be a conjunction of one or more atoms.

The following example defines IDB rules for computing the ancestral relationship between people.

Example 11.5. Computing ancestors with IDB rules

person(x), person_name(x:n) -> string(n).
parent(x, y) -> person(x), person(y).
ancestor(x, y) -> person(x), person(y).

ancestor(x, y) <- parent(x, y).
ancestor(x, y) <- parent(x, z), ancestor(z, y).

The above example has two rules (in the last two lines).

The first rule reads: if x is a parent of y, then x is an ancestor of y.

The second rule reads: if x is a parent of z, and z is an ancestor of y, then x is an ancestor of y.

A database might contain the following tuples for the predicate parent (we use the name of a person to refer to its entity value):

"Bob" "Jack"
"Bob" "Jill"
"Jack" "Alice"

Given the above, we can expect that evaluating the two rules in Example 11.5 would result in the following tuples in ancestor:

"Bob" "Jack"
"Bob" "Jill"
"Jack" "Alice"
"Bob" "Alice"

Instead of specifying multiple rules whose head atoms refer to the same predicate, we may also specify one rule, with a disjunction in the body. The disjunction operator is the semicolon (;).

Example 11.6. 

The following rule is equivalent to the two rules in Example 11.5:

ancestor(x, y) <-
    parent(x, y)
  ; parent(x, z), ancestor(z, y).

The rule may be read: If x is a parent of y, or x is a parent of an ancestor of y, then x is an ancestor of y.

Finitely Bound Variables.  It is a requirement that a rule must compute a finite number of tuples. For example, the following is not a valid rule and will be rejected by the compiler:

smaller_than(x, y) -> int(x), int(y).

smaller_than(x, y) <- x < y.

It is clear that smaller_than would contain an infinite number of tuples (or, at least, infinite for all practical purposes). LogiQL does not allow such rules. Typically, rules that do not result in a finite number of tuples will result in compiler error messages about invalid variable bindings. The compiler determines whether a rule results in a finite number of tuples by analyzing whether each variable that appears in the rule head is finitely bound. In this case, neither x nor y is finitely bound.

The exceptions to this requirement are value-constructing rules (Section 11.2), and derived-only rules (Section 11.3).

Positively Bound Variables.  Another requirement is that every variable in the body must be:

  • either positively bound (i.e., with at least one binding occurrence that is not in the scope of a negation: that binding occurrence should be "finite" in the sense described above);
  • or negatively bound in the value position of a functional predicate (see the section called “Negation” for further explanations and an example).

This ensures that the search space for rule evaluation is finite.

11.1.1. Predicate Inference

As a programming convenience, LogiQL infers the existence of predicates from rules (and/or facts). This inference reduces the need for programmers to explicitly declare predicates (as described in Chapter 8).

Example 11.7. Inferring predicates

The following program is equivalent to the one in Example 11.5:

person(x), person_name(x:n) -> string(n).
parent(x, y) -> person(x), person(y).

ancestor(x, y) <- parent(x, y).
ancestor(x, y) <- parent(x, z), ancestor(z, y).

Note that the above does not include explicit declarations of ancestor. From the two rules that derive into ancestor, the compiler infers that ancestor exists, and that its arity is two, with both arguments being of type person.

A special case of such predicate inference is inference from facts, which can be considered as rules with empty bodies. See also Section 10.1.1.
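
For instance (a minimal sketch), submitting the single fact below is enough for the compiler to infer a binary IDB predicate whose arguments are both of type int, exactly as discussed in Example 10.3:

q(1, 2).   // implicitly declares q(x, y) -> int(x), int(y).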

11.1.2. Incremental Maintenance

The LogicBlox database maintains the consistency of rules automatically, and in an incremental fashion.

Automatic maintenance means that the contents of the predicates named in each rule's head are consistent with the rule and the contents of the predicates mentioned in the rule's body. If the latter change over time, the head predicates will be updated accordingly, with no actions required from the programmer.

Example 11.8. 

Please refer to Example 11.5 and the text that follows it.

If the tuple ("Bob", "Jack") is removed from the predicate parent, the tuples ("Bob", "Jack") and ("Bob", "Alice") will be automatically removed from predicate ancestor.

Similarly, if a tuple is added to parent, then the corresponding tuple(s) will be added to ancestor as well.

Incremental maintenance means that, when there is a change in the contents of predicates mentioned in the bodies of some rules, the predicates mentioned in the heads are not recomputed from scratch. The recomputation effort is proportional to the actual differences in the predicates.
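
As a small interactive illustration (a sketch in the style of Example 10.9; the delta logic used in the exec step is described in Section 19.2):

create --unique

addblock <doc>
  parent(x, y) -> string(x), string(y).

  ancestor(x, y) <- parent(x, y).
  ancestor(x, y) <- parent(x, z), ancestor(z, y).
</doc>
exec '+parent("Bob", "Jack"). +parent("Jack", "Alice").'
print ancestor

close --destroy

After the exec step, ancestor automatically contains ("Bob", "Jack"), ("Jack", "Alice") and ("Bob", "Alice"); removing one of the parent tuples in a later transaction would likewise remove the ancestor tuples that depend on it.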

11.2. Value-constructing Rules

Rules can be used not only to derive new tuples from existing values. They can also be used to derive, or construct, new values. Value-constructing rules use constructor predicates (Section 8.5) to accomplish this.

The following example illustrates the use of a constructor predicate, person_by_name, to construct new person entity values.

Example 11.9.  Constructing person entity values by means of a rule

person(x) -> .
person_by_name[first, last] = p -> string(first), string(last), person(p).
lang:constructor(`person_by_name).

names_input(first, last) -> string(first), string(last).

person(p), person_by_name[first, last] = p <- names_input(first, last).

For each tuple of (first, last) in predicate names_input, the above rule creates exactly one person value.

Note that the rule can also be written in a simplified form:

person_by_name[first, last] = _ <- names_input(first, last).

A variable that occurs in the head of a rule, but not its body, would not be bound (see the section called “Bound variables and their instantiations” and Section 10.1.2). Instead of just disallowing it, LogiQL gives it a special interpretation: it is treated as an existentially quantified variable that can be instantiated by a constructor to a freshly-created entity (see the example above). Such a variable cannot, of course, be instantiated by two different constructor atoms in the same rule.

Example 11.10. An illegal value-constructing rule

person_by_ssn[ssn] = p -> string(ssn), person(p).
lang:constructor(`person_by_ssn).

input_data(first, last, ssn) ->
  string(first), string(last), string(ssn).

person(p),
   person_by_name[fn, ln] = p,
   person_by_ssn[ssn] = p <- input_data(fn, ln, ssn).

In the above rule the variable p is shared by two constructor atoms: person_by_name and person_by_ssn. This code is rejected at compile-time.

11.3. Derived-only Rules

By default, when a rule defines what should be derived into a predicate, the result of evaluating the rule is materialized and stored as the content of that predicate. There are two reasons why this might sometimes not be desirable:

  • Performance: The result of the evaluation is used infrequently, or the size of the result is disproportionately large compared to the complexity of the computation. That is, it is cheaper to simply re-evaluate the rule (and derive the contents of a predicate) than it is to store the result.
  • Expressiveness: A rule specifies a computation with infinite results, and thus cannot be materialized. However, the logic is useful in many contexts. This is often the case when a rule specifies some arithmetic computation.

In either of these situations, the programmer may choose to define derived-only rules, which must derive into derived-only predicates.

The following example illustrates the definition and use of a derived-only predicate and rule.

Example 11.11.  Defining a derived-only predicate for computing the hypotenuse of a right triangle

The predicate hypotenuse takes the length of the two sides of a right triangle, and computes the length of the hypotenuse. We would like to reuse the definition in many different rules. However, the result of the computation cannot possibly be materialized, as it is infinite. We therefore declare hypotenuse to be a derived-only predicate, by using the pragma lang:derivationType.

hypotenuse[x, y] = z -> float(x), float(y), float(z).
lang:derivationType[`hypotenuse] = "Derived".

hypotenuse[x, y] = float:sqrt[x * x + y * y].

The predicate can be used in rules where its key arguments are positively and finitely bound. For instance, we can use hypotenuse in the following rule to help determine which tuples in candidates can be the sides of a right triangle:

candidates(x,y) -> float(x), float(y).

possible_triangles(x, y) <- candidates(x, y), hypotenuse[x, y] = _.

A derived-only rule by itself is never evaluated. It is simply unfolded into the rule in which its derived-only predicate is used.
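
For instance, the rule for possible_triangles above behaves as if the definition of hypotenuse had been unfolded into it, roughly as in the following sketch of the equivalent rule:

possible_triangles(x, y) <-
    candidates(x, y), float:sqrt[x * x + y * y] = _.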

11.4. Putting it all together: general recursion

We now know the basics of LogiQL. Before moving on to more specialised constructs, let us look at a larger example (which is somewhat silly, but might nevertheless be instructive).

Note

In what follows, we will use recursion, i.e., predicates defined by rules that use the contents of those predicates. The recursion is general: there are no special restrictions put on the form of the recursive rules. While this works perfectly on simple examples, it might be a very serious drain on resources in the context of a larger application. See Chapter 15 for more information, and for ways of making recursion more efficient in those cases when what we really want is iteration.

Example 11.12. IDB rules and general recursion

To celebrate the 80th birthday of great-grandfather Abe, who is a little heavy but still very fit, the menfolk of the Johnston family decided to go for a hunting trip in Alaska. The trip will involve renting a light plane, so it is necessary to find out the total weight of all the participants.

We start by describing the men of the family, and their relationships. We represent the men not as strings, but as entities (of type man).

  man(m), name(m : x) -> string(x).

  father(f, son) -> man(f), man(son).

  name(_ : "Abe").
  name(_ : "Bob").   name(_ : "Charlie"). name(_ : "Dave").
  name(_ : "Ed").    name(_ : "Fred").    name(_ : "George").
  name(_ : "Henry").                      name(_ : "Ike").    name(_ : "Jim").

  father(a, b),
  father(a, c), father(a, d) <- name(a : "Abe"),     name(b : "Bob"),
                                                     name(c : "Charlie"),
                                                     name(d : "Dave").
  father(b, e)               <- name(b : "Bob"),     name(e : "Ed").
  father(c, f)               <- name(c : "Charlie"), name(f : "Fred").
  father(d, g)               <- name(d : "Dave"),    name(g : "George").
  father(e, h)               <- name(e : "Ed"),      name(h : "Henry").
  father(g, i), father(g, j) <- name(g : "George"),  name(i : "Ike"),
                                                     name(j : "Jim").

The weights of everyone are as follows:

weight[m] = w -> man(m), int(w).
weight[a] = 200 <- name(a : "Abe").
weight[b] = 180 <- name(b : "Bob").
weight[c] = 170 <- name(c : "Charlie").
weight[d] = 160 <- name(d : "Dave").
weight[e] = 160 <- name(e : "Ed").
weight[f] = 150 <- name(f : "Fred").
weight[g] = 140 <- name(g : "George").
weight[h] = 100 <- name(h : "Henry").
weight[i] = 110 <- name(i : "Ike").
weight[j] = 100 <- name(j : "Jim").

We now want to compute, for each man, the combined weight of him and all his descendants. The result for great-grandfather Abe will then be the answer we are looking for.

We declare the functional predicate that maps each man to his combined weight:

// The total weight of `m` and all his descendants.
total_weight[m] = w -> man(m), int(w).

We must now find rules that will populate this predicate with the correct information.

The first step is trivial:

total_weight[m] = weight[m] + weight_of_descendants_of[m].

// The total weight of all the descendants of `m`.
weight_of_descendants_of[m] = w -> man(m), int(w).

Now the fun begins: how do we define weight_of_descendants_of? You are encouraged to try it for yourself before reading on.

For men who are not fathers, this is simple enough:

weight_of_descendants_of[m] = 0 <- man(m), ! father(m, _).

(Notice that we had to use man(m) in order to bind the variable outside the negation: see the section called “Negation”.)

Since none of the men has more than three sons, we might be tempted to write:

weight_of_descendants_of[m] = total_weight[s] +
                              total_weight[t] + total_weight[u]
   <- father(m, s), father(m, t), father(m, u).

A moment's reflection will show that this logic is flawed: there is nothing to prevent s, t and u from representing the same person, whose weight would thus be counted more than once. Moreover, if m has more than one son, then there are several possible ways of satisfying such a rule, and some of them will give different answers: here the LogicBlox system would help us by complaining about a functional dependency violation (FDV).

We can try to fix this:

weight_of_descendants_of[m] = total_weight[s] +
                              total_weight[t] + total_weight[u]
   <- father(m, s), father(m, t), father(m, u),
      t != s, u != s, u != t.

The result is that this rule never fires, and we get information only about the total weights of Fred, Ike, Henry and Jim (who have no sons). This is not surprising, since only Abe has three sons, and there is no rule that will compute the total weight for any of them.

A way out of this predicament is to provide a separate rule for each of the remaining cases: one son and two sons. For the case of two sons it would seem that the following should do the trick:

weight_of_descendants_of[m] = total_weight[s] + total_weight[t]
   <- father(m, s), father(m, t), s != t,
      ! (father(m, u), u != s, u != t).

The compiler does not allow such a rule (see Example 10.25).

An attempt to mechanically rewrite this to

weight_of_descendants_of[m] = total_weight[s] + total_weight[t]
   <- father(m, s), father(m, t), s != t,
      ! has_other_sons(m, s, t).

has_other_sons(m, s, t) -> man(m), man(s), man(t).
has_other_sons(m, s, t) <- father(m, u), u != s, u != t.

will bring no immediate joy. The compiler will rightly complain that variable t is unbound (indeed, variable s is unbound, too).

If you haven't solved the problem already, this might be a good time to experiment with various approaches. We provide just one possible solution:

// The total weight of all the descendants of `m`.
weight_of_descendants_of[m] = w -> man(m), int(w).

weight_of_descendants_of[m] = 0 <- man(m), ! father(m, _).

weight_of_descendants_of[m] = total_weight[s]
   <- father(m, s), ! has_at_least_two_sons(m).

weight_of_descendants_of[m] = total_weight[s] + total_weight[t]
   <- father(m, s), father(m, t), t != s,
      ! has_at_least_three_sons(m).

weight_of_descendants_of[m] = total_weight[s] +
                              total_weight[t] + total_weight[u]
   <- father(m, s), father(m, t), father(m, u),
      t != s, u != s, u != t.


// Does `m` have at least two different sons?
has_at_least_two_sons(m) <- father(m, s), father(m, t), t != s.

// Does `m` have at least three different sons?
has_at_least_three_sons(m) <- father(m, s), father(m, t), father(m, u),
                              t != s, u != s, u != t.

Chapter 12. Aggregations

An aggregation rule applies a function across all values selected by a given formula. Aggregations have the following syntactic form:

Aggregation =
   Formula "<-" "agg" "<<" AggregationSpecifiers ">>" Formula .

AggregationSpecifiers = AggregationSpecifier { "," AggregationSpecifier }.

AggregationSpecifier =
   OutputVariable "=" Identifier "(" [ InputVariable ] ")".

OutputVariable = Identifier.
InputVariable  = Identifier.

All aggregation specifiers have a common format: an output variable followed by = and a function whose argument is an input variable (except for count, which has no argument). The input variable should be instantiated by the formula that follows the aggregation specifier.

It is worth noting that:

  • The syntax of an aggregation is such that everything that follows the >> is treated as a single formula. The formula cannot contain a disjunction. (See Example 12.5.)

  • The collections associated with input variables are not their instantiations in the proper sense of the word, and in particular they need not be sets. They are computed on the fly, which is much cheaper than materialising them and removing duplicates. See Example 12.4.

  • The formula on the left-hand side must be a conjunction of single atoms. For example, the following rule:

    total_weight[m] = weight[m] + ws
       <- agg<< ws = total(w) >> father(m, s), w = weight[s].

    will cause a compilation error with the following diagnostic message:

    error: '(weight[m] + ws)' is a complex expression which is not allowed in the head of an aggregation (code: COMPLEX_EXP_IN_AGG_HEAD)
          total_weight[m] = weight[m] + ws
          ^^^^^^^^^^^^^^ 
  • A variable that is used as the output of an aggregation must obey certain restrictions. In particular, in the head of the rule it can appear only as a value argument of a single-valued functional predicate. If there are more atoms in the head, each of them must be a single-valued functional predicate, and each must have the output variable as the value argument.

    For example, each of the following two rules is illegal:

    total_weight[m] = w, other_weight[s] = ws
       <- agg<< ws = total(w) >> father(m, s), w = weight[s].
    total_weight[m] = ws, other_weight(s, ws)
       <- agg<< ws = total(w) >> father(m, s), w = weight[s].

    The following would be accepted, because ws is always in a value position in both atoms:

    total_weight[m] = ws, other_weight(s; ws)
       <- agg<< ws = total(w) >> father(m, s), w = weight[s].
  • An aggregation rule cannot be recursive (directly or indirectly). For example, in Example 11.12 one might want to write the rule

    total_weight[m] = ws
       <- agg<< ws = total(w) >> father(m, s), w = total_weight[s].

    This would result in an error message beginning with:

    Error: Recursion through aggregation (AGGREGATE_RECURSION) in rule:
    
    Forall m::man,s::man,w::int,ws::int .
    total_weight[m]=ws <-
       agg<<ws = total(w)>>
          father(m,s),
          total_weight[s]=w.

Aggregation specifiers

The accepted forms of AggregationSpecifier are as follows:

V = count()

Counts the number of tuples in the collection defined by the given formula. If the collection is empty, then the aggregation doesn't calculate anything (i.e., it fails), so the calculated count will never be 0.

The parentheses may be omitted (unless the specifier is followed by another one, see Example 12.3).

V1 = total(V2)

Calculates the sum, V1, of the collection of values identified by V2. However, the collection of tuples computed in the body is not projected onto V2: see Example 12.4.

The items in the collection must be of a numeric type.

V1 = min(V2)

Calculates the minimum value, V1, of the collection of values identified by V2.

The items in the collection must be of an ordered type (note that false < true, and strings are ordered lexicographically).

V1 = max(V2)

Calculates the maximum value, V1, of the collection of values identified by V2.

The items in the collection must be of an ordered type (note that false < true, and strings are ordered lexicographically).

Example 12.1. A total aggregation

The following is an aggregation that sums over the values in input, and stores the resulting value in result.

input(x) -> int(x).

result[] = x <- agg<<x = total(y)>> input(y).

Example 12.2. Counting the tuples

The following counts the number of tuples in predicate p:

count_p[] = z <- agg << z = count() >> p(_).

Recall that LogiQL has set semantics (unlike SQL, which has multiset/bag semantics), so all the tuples in a predicate are different from one another. If predicate p were given by the following facts,

p(1).
p(2).
p(3).
p(2).
p(1).

then the value of count_p[] would be 3.
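
Since count fails on an empty collection (see the description of aggregation specifiers below), a predicate that should report 0 when p is empty needs a separate rule. A minimal sketch:

count_p[] = z <- agg << z = count() >> p(_).
count_p[] = 0 <- !p(_).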

Example 12.3. Multiple specifiers

A single aggregation can contain more than one specifier, as in the following query (which uses only local predicates, see Section 8.10):

query <doc>
  _p(0). _p(2). _p(4). _p(6).

  _c_p[] = c, _s_p[] = s, _m_p[] = m
     <- agg << c = count(), s = total(x), m = max(x) >> _p(x).

  _(c, s, m) <- c = _c_p[], s = _s_p[], m = _m_p[].
</doc>

(Note that in such a context count without parentheses would cause a syntactic error.)

The result will be:

/--------------- _ ---------------\
4 12 6
\--------------- _ ---------------/

Example 12.4. There is no projection onto input variables

The syntax of an aggregation is such that everything that follows the >> is treated as a single formula. So, for example,

sum[] = z <- agg<< z = total(x) >> p(x), q(_).

is equivalent to

sum[] = z <- agg<< z = total(x) >> (p(x), q(_)).

If p and q are given by

p(0).  p(1).  p(2).
q("ab").  q("abc").

then the value of sum[] will be 6, because the formula will produce the collection

{ (0, "ab"),  (0, "abc"),
  (1, "ab"),  (1, "abc"),
  (2, "ab"),  (2, "abc") } 

The summation is carried out over the first elements of the tuples, without first calculating a projection, which would be the set {0, 1, 2}.

Example 12.5. No disjunctions in an aggregated formula

Since an aggregation is computed on the fly, the associated formula must be a straightforward conjunction. For example, the rule

count_p3[] = z <- agg << z = count >> p(), (p(_); p(_)).

would be rejected by the compiler, with the message

error: disjunction is not supported in aggregation.  Try defining a separate predicate with the disjunction and use that predicate in this aggregation. (code: AGG_DISJ)

Example 12.6. A restriction on the output of aggregation

The following attempt to count the tuples in p would not be accepted:

count_p(z) <- agg<< z = count >> p(_).

The compiler would diagnose this rule as follows:

the aggregation output 'z' appears as a key in this predicate (code: AGG_KEY_OUTPUT)

It is possible for the predicate in the head of an aggregation rule to have arguments other than the one computed by the aggregation. Such variables provide a functionality reminiscent of GROUP BY in SQL. For example, imagine that the input data source contains information about the employees in each department: worksIn(employee, department), and you wish to report the count of employees in each department, and not just the count for the company as a whole. You might write your aggregation as follows:

Example 12.7. A per-group count aggregation

worksIn(employee, department) -> string(employee), string(department).

result[d] = x <- agg<<x = count()>> worksIn(_, d).

Note how the variable d acted to indicate that the counting was to be on a per-department basis. Also, we did not have to explicitly provide a named variable for the employee role, but used an anonymous variable instead.

This grouping approach also works with additional arguments and with the other aggregation functions, as demonstrated in the following example:

Example 12.8. A multi-argument total aggregation

Item(i), hasItemCode(i:c) -> string(c).
Region(r), hasRegionName(r:rn) -> string(rn).
Quarter(q), hasQuarterNr(q:qn) -> int(qn).
nrSoldOf_In_In_[i, r, q] = n -> Item(i), Region(r), Quarter(q), int(n).

// Total number sold for each item-region combination:
totalNrSoldOf_In_[i, r] = n -> Item(i), Region(r), int(n).
totalNrSoldOf_In_[i, r] = n
   <- agg<<n = total(qty)>> nrSoldOf_In_In_[i, r, _] = qty.

Chapter 13. Sorting

LogiQL provides two special types of rules to support the sorting of values: seq, and list. The result of sorting by seq is an array-like representation, where the nth member in the sorted set can be looked up by n. The result of sorting by list is a linked-list-like representation, where members in the sorted set must be navigated to through first- and next-like operations.

The choice between seq and list depends on which representation is more suitable for your problem.

13.1. seq

A seq<<>> rule supports the sorting of a predicate into an array-like functional predicate. Sequence rules have the following form:

Sequence =
  Atom "<-" "seq" "<<" [ Identifier "=" SortKeys ] ">>" Formula "." .

SortKeys = Identifier
         | "(" Identifier { "," Identifier } ")".

Note

The optional text between the pair of angle brackets (<<...>>) is ignored. It is supported for backward compatibility with previous versions of LogiQL.

The formula in the body must consist of one atom. The head atom must contain all the variables that occur in the body atom, and an additional integer variable that does not occur in the body: that variable must be a key in the predicate named by the atom.

If this integer key is the only key of the predicate, then the tuples of the body atom are sorted, and the key indicates the position of each value tuple in the sorted sequence (starting at 0). This is best explained by a couple of simple examples.

Example 13.1. Sorting integers

Let a be the following unary predicate of integers:

a(x) -> int(x).

We can sort the contents of a by using the following sequence clause:

a_seq[i] = x -> int(i), int(x).
a_seq[i] = x <- seq<<>> a(x).

If a contains the tuples {(20), (60), (40)}, then the evaluation of the above rule results in a_seq[0]=20, a_seq[1]=40, and a_seq[2]=60.

The predicate a_seq must be explicitly declared, as in the example above. (The name of the predicate is not important, and may be arbitrary.)

Example 13.2. Sorting a binary predicate

b(x, y) -> string(x), string(y).
b_sort(i; x, y) -> int(i), string(x), string(y).

b_sort(i; x, y) <- seq<< >> b(x, y).

If predicate b contains the tuples {("a", "ab"), ("a", "aa"), ("b", "c")}, then b_sort will contain {(0, "a", "aa"), (1, "a", "ab"), (2, "b", "c")}.

Note the semicolon in b_sort(i; x, y): b_sort is a multi-valued functional predicate (see the section called “Multi-valued functional predicates”). This example illustrates a situation in which such predicates cannot be easily avoided.

The key of the sorted predicate can consist of several variables besides the indexing variable. If it does consist of more than one variable, then the fields that correspond to variables on the left of the indexing variable are sorted, forming groups: the remaining fields within each group are sorted separately.

All this is illustrated by the following examples.

Example 13.3. Sorting and grouping

create --unique

addblock <doc>
   produce(item, kind) -> string(item), string(kind).

   produce("carrot" , "vegetable").
   produce("apple"  , "fruit"    ).
   produce("parsley", "vegetable").
   produce("melon"  , "fruit"    ).
   produce("celery" , "vegetable").
   produce("mango"  , "fruit"    ).

   items(i; x, y) -> int(i), string(x), string(y).
   items(i; x, y) <- seq <<>> produce(x, y).

   by_kind(y, i; x) -> int(i), string(x), string(y).
   by_kind(y, i; x) <- seq <<>> produce(x, y).
</doc>
echo --- produce: ---
print produce
echo --- items: ---
print items
echo --- by_kind: ---
print by_kind

close --destroy

The output is:

created workspace 'unique_workspace_2017-10-07-17-22-41'
added block 'block_4LDQPDSB'
--- produce: ---
"apple"   "fruit"
"carrot"  "vegetable"
"celery"  "vegetable"
"mango"   "fruit"
"melon"   "fruit"
"parsley" "vegetable"
--- items: ---
0 "apple"   "fruit"
1 "carrot"  "vegetable"
2 "celery"  "vegetable"
3 "mango"   "fruit"
4 "melon"   "fruit"
5 "parsley" "vegetable"
--- by_kind: ---
"fruit"     0 "apple"
"fruit"     1 "mango"
"fruit"     2 "melon"
"vegetable" 0 "carrot"
"vegetable" 1 "celery"
"vegetable" 2 "parsley"
deleted workspace 'unique_workspace_2017-10-07-17-22-41'

Example 13.4.  An illustration of the flexibility of seq

create --unique

/////////////
addblock <doc>

b(x, y, z) -> string(x), string(y), string(z).

b("a", "ab", "abc").
b("a", "aa", "bac").
b("b", "cb", "cab").
b("b", "bc", "abc").
b("b", "bc", "aaa").
</doc>
echo --- b:
print b

/////////////
addblock <doc>

c0_sort(i; x, y, z)  -> int(i), string(x), string(y), string(z).
c1_sort(i; x, y, z)  -> int(i), string(x), string(y), string(z).
c2_sort(i; x, y, z)  -> int(i), string(x), string(y), string(z).
c3_sort(i; x, y, z)  -> int(i), string(x), string(y), string(z).
d0_sort(x, i; y, z)  -> int(i), string(x), string(y), string(z).
d1_sort(i, x; y, z)  -> int(i), string(x), string(y), string(z).
e0_sort[x, y, i] = z -> int(i), string(x), string(y), string(z).
e1_sort[x, i, y] = z -> int(i), string(x), string(y), string(z).

c0_sort(i; x, y, z)  <- seq<<>> b(x, y, z).
c1_sort(i; y, z, x)  <- seq<<>> b(x, y, z).
c2_sort(i; y, x, z)  <- seq<<>> b(x, y, z).
c3_sort(i; z, x, y)  <- seq<<>> b(x, y, z).
d0_sort(x, i; y, z)  <- seq<<>> b(x, y, z).
d1_sort(i, x; y, z)  <- seq<<>> b(x, y, z).
e0_sort[x, y, i] = z <- seq<<>> b(x, y, z).
e1_sort[x, i, y] = z <- seq<<>> b(x, y, z).
</doc>
echo --- c0_sort:
print c0_sort
echo --- c1_sort:
print c1_sort
echo --- c2_sort:
print c2_sort
echo --- c3_sort:
print c3_sort
echo --- d0_sort:
print d0_sort
echo --- d1_sort:
print d1_sort
echo --- e0_sort:
print e0_sort
echo --- e1_sort:
print e1_sort

close --destroy

The output is as follows. It might be instructive to notice the difference between c0_sort and c1_sort, as well as the similarity between d0_sort and e1_sort.

created workspace 'unique_workspace_2016-10-28-00-42-46'
added block 'block_1Z1B39FX'
--- b:
"a" "aa" "bac"
"a" "ab" "abc"
"b" "bc" "aaa"
"b" "bc" "abc"
"b" "cb" "cab"
added block 'block_1Z1DUZCQ'
--- c0_sort:
0 "a" "aa" "bac"
1 "a" "ab" "abc"
2 "b" "bc" "aaa"
3 "b" "bc" "abc"
4 "b" "cb" "cab"
--- c1_sort:
0 "aa" "bac" "a"
1 "ab" "abc" "a"
2 "bc" "aaa" "b"
3 "bc" "abc" "b"
4 "cb" "cab" "b"
--- c2_sort:
0 "aa" "a" "bac"
1 "ab" "a" "abc"
2 "bc" "b" "aaa"
3 "bc" "b" "abc"
4 "cb" "b" "cab"
--- c3_sort:
0 "aaa" "b" "bc"
1 "abc" "a" "ab"
2 "abc" "b" "bc"
3 "bac" "a" "aa"
4 "cab" "b" "cb"
--- d0_sort:
"a" 0 "aa" "bac"
"a" 1 "ab" "abc"
"b" 0 "bc" "aaa"
"b" 1 "bc" "abc"
"b" 2 "cb" "cab"
--- d1_sort:
0 "a" "aa" "bac"
1 "a" "ab" "abc"
2 "b" "bc" "aaa"
3 "b" "bc" "abc"
4 "b" "cb" "cab"
--- e0_sort:
"a" "aa" 0 "bac"
"a" "ab" 0 "abc"
"b" "bc" 0 "aaa"
"b" "bc" 1 "abc"
"b" "cb" 0 "cab"
--- e1_sort:
"a" 0 "aa" "bac"
"a" 1 "ab" "abc"
"b" 0 "bc" "aaa"
"b" 1 "bc" "abc"
"b" 2 "cb" "cab"
deleted workspace 'unique_workspace_2016-10-28-00-42-46'
 

13.2. list

A list rule can be used to sort a set of values into two result predicates, a first, and a next, thus providing a representation that is somewhat similar to a linked list.

A list rule has the following syntactic form:

List =
    Atom "," Atom "<-" "list" "<<" [ GroupBy ] ">>" Formula "." .

GroupBy = "group-by" "(" Identifiers ")" .

A list rule derives into two predicates: one representing the first tuple in the sort, and the other the next relation. If specified, the group-by results in a nested sort, wherein the variables in group-by are sorted first, and the remaining variables sorted within each value of group-by.

Example 13.5. Sorting using list without group-by

a(x) -> int(x).

first_a(x) -> int(x).
next_a(x, y) -> int(x), int(y).

first_a(x), next_a(x, y) <- list<<>> a(x).

This example illustrates the sorting of a into two predicates:

  • first_a, which contains the value of the first element in the sorted set;
  • next_a, which contains binary tuples of the form (x,y), where y is the successor of x in the sorted set.

If a contains {(20), (30), (25)}, then first_a contains {(20)}, and next_a contains {(20, 25), (25, 30)}.
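
The linked-list representation is navigated by following next_a from first_a. For instance, the position of each element in the sorted order can be recovered with a small recursive definition (the predicate rank is merely an illustration, not part of the example above):

rank[x] = n -> int(x), int(n).

rank[x] = 0 <- first_a(x).
rank[y] = rank[x] + 1 <- next_a(x, y).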

The optional group-by parameter can be used to perform sorting by group, in analogy with grouping in aggregate functions.

Example 13.6. Sorting with group-by

b(x, y)         -> int(x), int(y).
first_b(x, y)   -> int(x), int(y).
next_b(x, y, z) -> int(x), int(y), int(z).

first_b(x, y), next_b(x, y, z) <-
    list<< group-by(x) >> b(x, y).

Using group-by(x), the above list rule sorts only the values in the second argument position of b, for each unique x in the first argument position of b.

The predicate first_b contains sorted tuples of the form (x,y), such that

  • (x,y) belongs to b;
  • if b were sorted, (x,y) would be the first tuple in b that has x in the first position.

In other words, y is the first value in the second position in the sorted sequence of tuples that have x in the first position.

The predicate next_b contains tuples of the form (x,y,z) such that (x,z) would immediately follow (x,y) in b, if b were sorted.

If b contains {(1,2), (1,3), (1,4), (2,10), (2,11), (3,20)}, then first_b contains {(1,2), (2,10), (3,20)}, and next_b contains {(1,2,3), (1,3,4), (2,10,11)}.

It is possible to group by more than one variable. For example:

b(x, y, z)         -> int(x), int(y), int(z).
first_b(x, y, z)   -> int(x), int(y), int(z).
next_b(x, y, z, v) -> int(x), int(y), int(z), int(v).

first_b(x, y, z), next_b(x, y, z, v) <-
    list<< group-by(x, y) >> b(x, y, z).

If b contains {(1,2,0), (1,2,1), (1,3,0), (1,4,0), (1,4,1), (2,10,100)}, then first_b contains {(1,2,0), (1,3,0), (1,4,0), (2,10,100)}, and next_b contains {(1,2,0,1), (1,4,0,1)}.

Example 13.7. Sorting with multiple values

An element of the list may contain more than just one value. In the following example we group by the value of x, and additionally store pairs consisting of a number and a string. next_b(x, y, z, ny, nz) should be read, roughly, as "for this value of x, the next pair after y and z is ny and nz".

b(x, y, z)              -> int(x), int(y), string(z).
first_b(x, y, z)        -> int(x), int(y), string(z).
next_b(x, y, z, ny, nz) -> int(x), int(y), string(z), int(ny), string(nz).

first_b(x, y, z), next_b(x, y, z, ny, nz) <- list<< group-by(x) >> b(x, y, z).

If b contains {(1,2,"3"), (1,3,"4"), (1,4,"5"), (2,10,"12"), (2,11,"13"), (3,20,"23")}, then first_b contains {(1,2,"3"), (2,10,"12"), (3,20,"23")}, and next_b contains {(1,2,"3",3,"4"), (1,3,"4",4,"5"), (2,10,"12",11,"13")}.

The group-by parameters, if present, must be a strict prefix of the key variables of the relation to be sorted. The following does not represent a valid use of group-by and will cause a run-time exception:

first_b(y, x), next_b(y, x, z) <-
   list<< group-by(y) >> b(x, y).

Here is another example that does not satisfy this condition:

c[x, y] = z -> int(x), int(y), int(z).
first_c(x, y, z) -> int(x), int(y), int(z).
next_c(x, y, z, u) -> int(x), int(y), int(z), int(u).

first_c(x, y, z), next_c(x, y, z, u) <-
   list<< group-by(x, y) >> c[x, y] = z.

In this case (x,y) is a prefix of the key variables (x,y) of c, but not a strict prefix. (The rule would therefore be asking the system to sort groups containing exactly one value each, which is a bit silly.)

More generally, the variable order of the first and next predicates must agree with the variable order of the predicate to be sorted. Thus, even with an empty group-by, a rule like the following would be rejected:

first(y, x), next(y, x, u, v) <- list<< >> b(x, y).

The required effect can be achieved by using an auxiliary predicate:

d(y, x) <- b(x, y).
first(y, x), next(y, x, u, v) <- list<< >> d(y, x).

Sorting Entities

Entities cannot be used as sorting variables. (They can, however, be used as group-by arguments.) Thus a rule like the following is illegal:

e(_) -> .
first(x), next(x, y) <- list<< >> e(x).

Note

The diagnostics for such an example might be somewhat difficult to interpret. They would look something like this:
Error: List P2P mapping cannot contain entity types, for P2P mapping defined at block_1Z1B37OB:22(1)--22(38):
Forall x::e,y::e .
first(x),next(x,y) <-
   list<< >>
      e(x).

Entities can be sorted indirectly via their constructor arguments.

Example 13.8. Sorting entities using their constructor predicates

e(_) -> .
cons[x] = y -> string(x), e(y).
lang:constructor(`cons).

arg(x) <- cons[x] = _.
arg:first(x) -> string(x).
arg:next(x, y) -> string(x), string(y).

arg:first(x), arg:next(x, y) <- list<< >> arg(x).

e:first(x) <- arg:first(y), cons[y] = x.
e:next(x, y) <- arg:next(u, v), cons[u] = x, cons[v] = y.

Chapter 14. Series

LogiQL provides a special form of rule to support generation of series from a given iterator function.

All series rules have the following general structure:

R(x, v) <-
    series<< v = Func<initParam>[index](value) >>
       phi(x, initParam, index, value).

where

  • <initParam> and (value) are optional (in the sense that they do not appear in all the specific forms of series rules);
  • Func is a function that generates the series;
  • phi is a formula that includes occurrences of the following variables:
    • variables that appear in the head of the rule (here schematically represented as x);
    • the initialization parameters of Func (if any);
    • variables used to index the elements of the series (here schematically represented as index);
    • variables used as arguments to the generator function, if any (here schematically represented as value).

Func, the generator function, can be thought of as a wrapper for the following two functions:

state = Func_init(initParam)

Initializes the generator state from the initial parameters.

(state', v) = Func_next(state, value)

Computes the next generator state and output from the previous state and the current value.

14.1. Semantics

The semantics of series can be described as follows. First, the body is wrapped in an auxiliary predicate:

R%tmp(x, initParam, index, value) <- phi(x, initParam, index, value).

Then we populate R via the following procedure:

for each (x, initParam, _, _) in R%tmp do:
  state := Func_init(initParam)
  for each (index, value) s.t. R%tmp(x, initParam, index, value), in sorted order, do:
    (state, v) := Func_next(state, value)
    insert R(x, v)

The outer loop goes over the various groups (if we use group-by, see below), the inner loop generates the sequences (series) of results for each group.
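
For instance, runtotal (Section 14.2) fits this scheme with the accumulated sum as the generator state, so that, roughly:

state       = runtotal_init()               // state := 0
(state', v) = runtotal_next(state, value)   // state' := state + value;  v := state'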

14.2. runtotal

Introduction

The running total aggregation computes an accumulated total over a time series. For example, the following table illustrates how the runtotal aggregation computes the total sales at a given date from a predicate that contains day-by-day sales:

day          Aug 1   Aug 2   Aug 3   Aug 4   Aug 5   Aug 6   Aug 7   Aug 8
sales          1       4       3       6      -2       8       0       2
acc. sales     1       5       8      14      12      20      20      22

In LogiQL, the accumulated sales can be defined using the running total series aggregation as follows:

Example 14.1. Running total series

sales[day]     = t -> int(day), decimal(t).
acc_sales[day] = t -> int(day), decimal(t).

acc_sales[day] = t <-
   series<< t = runtotal[day](sls) >>  sales[day] = sls.

There is often a need to compute multiple running totals, for example separately for each location, product, or bank account. This is known as a group-by (cf. a similar mechanism in sorting, as illustrated in Example 13.6). The following LogiQL rule shows how the accumulated sales can be computed separately for each stock keeping unit:

sales[sku, day] = t -> sku(sku), int(day), decimal(t).
acc_sales_by_sku[sku, day] = t -> sku(sku), int(day), decimal(t).

acc_sales_by_sku[sku, day] = t <-
   series<< t = runtotal[day](sls) >>  sales[sku, day] = sls.

Semantically, this is equivalent to the following normal total aggregation, but the running total is computed more efficiently by not repeating the computation of intermediate totals.

day(x) -> int(x).

acc_sales_by_sku[sku, day1] = t <-
   agg<< t = total(sls) >>
      sales[sku, day2] = sls,
      day2 <= day1,
      day(day2),
      day(day1).

The runtotal aggregation provides a mechanism for resetting the accumulated total at specific points in time, for example at the beginning of each month. The following extends the sales example with resets:

Example 14.2. Running total series with resets

sales_runtotal_by_sku[sku, day] = t <-
   series<<t = runtotal[day](sls) resets at reset[sku, day] = sls>>
      sales[sku, day] = sls.

Predicate reset could have any other name, of course.
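
Note that reset is an ordinary predicate whose signature must match that of the body atom (see the detailed usage below); in this example it would be declared as, e.g.:

reset[sku, day] = t -> sku(sku), int(day), decimal(t).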

Please see the section called “Detailed Usage for Reset functionality” for more information.

The following table illustrates how the reset functionality works in a simple case:

day          Aug 1   Aug 2   Aug 3   Aug 4   Aug 5   Aug 6   Aug 7   Aug 8
sales          1       4       3       6      -2       8       0       2
reset                                 100
acc. sales     1       5       8     100      98     106     106     108

Detailed usage

The runtotal body (i.e., the part that follows >>) must have only one atom, and the atom must refer to a single-valued functional predicate whose value must be of a summable type (int, decimal, or float). All the key variables in the body atom must appear in the head of the rule.

The running total aggregation requires the time argument (day in the example) to be the rightmost argument of the predicate.

While we use the terms “time series” and “time argument” for convenience, the time argument is not required to be a datetime or to represent time.

If the runtotal predicate has key arguments other than the time argument, then the other key arguments function as a group-by (sku in the example).

The time series argument must be of a primitive type. All primitive types have a sort order (e.g., for integers 1 < 2, for strings “a” < “ab”, for datetime 2015-09-11 15:00:00 < 2015-09-11 16:00:00), and the order of these values determines the order in which values are accumulated.
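
As a somewhat contrived illustration of the fact that the time argument need not represent time, the following sketch (with hypothetical predicates v and acc_v) accumulates values along string keys, in their lexicographic order:

v[k] = n -> string(k), decimal(n).
acc_v[k] = t -> string(k), decimal(t).

acc_v[k] = t <- series<< t = runtotal[k](n) >> v[k] = n.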

Detailed Usage for Reset functionality

If the reset functionality is used, then the reset atom and the body atom must have the same signature and must use the same variable names with the same order. Adding a reset changes the computation of the accumulated total in the following way:

  1. If both the reset predicate and the body predicate have a value, then the generated value is the reset value.
  2. If reset has no value and body has a value, then the generated value is the previous generated value plus the body value.
  3. If reset has a value and body has no value, then no value is generated. Additionally, the reset value is treated as "the previous generated value" for the next generated value.
  4. If neither reset nor body have a value, then no value is generated.

The following table shows an example of the different scenarios:

day          Aug 1   Aug 2   Aug 3   Aug 4   Aug 5   Aug 6   Aug 7
sales          1       4       3               4               5
reset                  2                               3
acc. sales   (b) 1   (a) 2   (b) 5   (d)     (b) 9   (c)     (b) 8

(The labels (a)-(d) refer to cases 1-4 above.)

This way of handling reset will not be correct for all applications, but it is designed to be easy to adapt to whatever behaviour a particular application requires.

For example, to include reset values in the resulting running total:

acc_sales[sku, day] = v <- intermediate[sku, day] = v.
acc_sales[sku, day] = v <- reset[sku, day] = v.

intermediate[sku, day] = t <-
   series<<t = runtotal[day](sls) resets at reset[sku, day] = sls>>
      sales[sku, day] = sls.

This formulation will not result in a functional dependency violation when the sales predicate also has an incremental value for a specific day: by case 1 (labelled (a) above), intermediate[sku, day] is then itself the reset value, so both rules derive the same value.

If the incremental value should be added to the reset value, then the reset predicate can be computed separately. For example:

reset_incr[sku, day] = reset[sku, day] + sales[sku, day].
reset_incr[sku, day] = reset[sku, day] <- !sales[sku, day] = _.

intermediate[sku, day] = t <-
   series<<t = runtotal[day](sls) resets at reset_incr[sku, day] = sls>>
      sales[sku, day] = sls.

14.3. rndnum

LogiQL provides a number of functions that allow the user to generate a collection of random numbers drawn from a particular distribution. Here is a simple example:

s3[st, w] = v -> store(st), week(w), float(v).
s3[st, w] = v <-
   series<< v = rnd_binomial<1, m, seed>[w] >>
      week(w), store(st), m = store:med[st], seed = store:number[st].

The predicate s3 contains a set of random numbers drawn from a Bernoulli distribution (binomial with n = 1) with median store:med[st] for each store st. All numbers along the time series dimension [w] will be drawn from the same distribution. In general, the generator function has the form v = rnd_distrname_type<x1, ..., xn, s>[index], where x1 through xn are parameter values specific to the distribution in question, and s is an integer representing the random generator seed. The use of the seed is mandatory. The variables x1, ..., xn, s must be bound on the right-hand side of the series rule.

The currently available generators for distributions are as follows:

rnd_uniform_int<min, max, seed> = z -> int(min), int(max), int(z), int(seed)

Uniform discrete distribution for the interval [min,max].

rnd_uniform_real<min, max, seed> = z -> float(min), float(max), float(z), int(seed)

Uniform distribution U(min,max).

rnd_binomial<n, p, seed> = z -> int(n), float(p), float(z), int(seed)

Binomial distribution with parameters n and p.

rnd_cauchy<mu, x0, seed> = z -> float(mu), float(x0), float(z), int(seed)

Cauchy distribution with location parameter x0 and scale parameter mu.

rnd_poisson<lambda, seed> = z -> float(lambda), int(z), int(seed)

Poisson distribution with parameter lambda.

Example 14.3. Random number generators

emp(fname, lname, serial) -> string(fname), string(lname), int(serial).
emp("TJ",    "Green",       132).
emp("Dan",   "Olteanu",     135).
emp("Todd",  "Veldhuizen",  323).
emp("Geoff", "Washburn",     41).
emp("Benny", "Kimelfeld",  5936).

sample(m) -> int(m).
sample(1).
sample(2).
sample(3).
sample(4).
sample(5).

salary[fname, lname, year, sample] = amount ->
   string(fname), string(lname), int(amount), int(year), int(sample).
salary[fname, lname, _, sample] = _ -> emp(fname, lname, _), sample(sample).

height[fname, lname, year, sample] = h ->
   string(fname), string(lname), float(h), int(year), int(sample).
height[fname, lname, _, sample] = _ -> emp(fname, lname, _), sample(sample).

// uniform int
salary[f, l, 2000, i] = s <-
   series<< s = rnd_uniform_int<500, 1000, serial>[i] >>
      emp(f, l, serial), sample(i).

// uniform real
height[f, l, 2000, i] = h <-
   series<< h = rnd_uniform_real<3f, max, serial>[i] >>
      max = 6f, emp(f, l, serial), sample(i).

// binomial
salary[f, l, 2001, i] = s <-
   series<< s = rnd_binomial<t, p, serial>[i] >>
      t = 5, p = 0.2f, emp(f, l, serial), sample(i).

Please note that the line that immediately follows the declaration of salary or height is not a declaration, but a more general constraint: see Chapter 16.

Chapter 15. Linear recursion

Basic concepts

As is well-known, the Fibonacci numbers are given by the following recipe:

  • fib(1) = 1;
  • fib(2) = 1;
  • for any i > 2, fib(i) = fib(i - 1) + fib(i - 2).

This is easily translated directly to LogiQL. We must, of course, take care to limit the range of Fibonacci numbers to be computed, as they form an infinite set. (Additionally, since we are using ordinary integers, we should be aware that the representation of fib(93) would already exceed 63 bits and show up as a negative number.)

Example 15.1. Simple recursion (Fibonacci numbers)

create --unique

addblock <doc>
  fib[i] = n -> int(i), int(n).

  fib[1] = 1.
  fib[2] = 1.
  fib[i] = fib[i - 1] + fib[i - 2] <-  2 < i <= 20.
</doc>
print fib

close --destroy

As expected, this yields the following results:

created workspace 'unique_workspace_2017-10-07-18-21-35'
added block 'block_4LDQPDSB'
1  1
2  1
3  2
4  3
5  5
6  8
7  13
8  21
9  34
10 55
11 89
12 144
13 233
14 377
15 610
16 987
17 1597
18 2584
19 4181
20 6765
deleted workspace 'unique_workspace_2017-10-07-18-21-35'

Such a recursive predicate is computed quite quickly for this simple example, but in general recursive computations may be unacceptably slow. This is because the predicate is produced bottom-up during the process of maintenance (see Section 19.4.1). Given the facts fib(1) and fib(2), the LogicBlox system uses the recursive rule to produce fib(3). The contents of a predicate have changed, so this triggers another round of maintenance, producing fib(4), which triggers yet another round...

If the recursive predicate is large, or if there are large predicates that are dependent on the recursive one, each such round of maintenance may involve significant computation. (For example, all interim versions of predicates are materialised, new samples are taken to construct indices, the differences between each predicate and its previous version are computed...)

Unlike the general recursive rules in Example 11.12, the rule for computing fib has a very special form: each value depends only on a strictly limited number of immediately preceding values, and this limited number is known in advance. Another well-known function with this property is the factorial: n! = n × (n - 1)!.

Even though defined recursively, such functions are routinely computed without applying recursion, by using a technique that is sometimes called a "sliding window". The function creates a list of results, and the computation of each new member of the list can be based upon the k most recent results, where k is an integer constant that is determined at compile time. One can think of the computation in terms of a list chasing its own tail.

LogiQL provides an efficient mechanism for computing such functions, which are often referred to as "linear recursive".

Note

Apart from being grammatically suspect (shouldn't it be linearly recursive?), the term is used somewhat loosely. One widely-recognised "definition" is: A linear recursive function is a function that only makes a single call to itself each time the function runs. This very imprecisely formulated quasi-definition would presumably describe the factorial, but not the Fibonacci function.

Before describing the mechanism in more detail, let us take a quick peek at how it can be used in our example.

Example 15.2. Simple "linear recursion" (Fibonacci numbers)

create --unique

addblock <doc>

  fib[i] = n -> int(i), int(n).

  first(i)   -> int(i).
  next(i, n) -> int(i), int(n).

  first(1).
  next(i, n) <- int:range(1, 19, 1, i), n = i + 1.

  fib[_] = _ <-
     linear_recursion<<
        lang:pragma:baseCase(`first).
        lang:pragma:recursiveCase(`next).

        fib[i1] = 1 <- first(i1).
        fib[i2] = 1 <- first(i1), next(i1, i2).
        fib[ n] = i <- i = fib[s1] + fib[s2], next(s2, s1), next(s1, n).
     >> first(_), next(_, _).
</doc>
print fib

close --destroy

A more readable version of the same logic uses functional predicates for first and next:

create --unique

addblock <doc>

  fib[i] = n -> int(i), int(n).

  first[] = i -> int(i).
  next[i] = n -> int(i), int(n).

  first[] = 1.
  next[i] = i + 1 <- int:range(1, 19, 1, i).

  fib[_] = _ <-
     linear_recursion<<
        lang:pragma:baseCase(`first).
        lang:pragma:recursiveCase(`next).

        fib[first[]]       = 1.
        fib[next[first[]]] = 1.
        fib[next[next[s]]] = fib[next[s]] + fib[s].
     >> _ = first[], _ = next[_].
</doc>
print fib

close --destroy

In both cases the results are the same as in Example 15.1.

As we see from the example, the "linear recursive" predicate is defined by means of a construct somewhat similar to agg (Chapter 12) or list (Section 13.2).

The recursive predicate (which must be a single-valued functional predicate) is mentioned in the head of a rule whose premise (right-hand side) is of the form linear_recursion<<...>> body, where body is a formula that must mention two auxiliary predicates: one that defines the first key (the "base case"), and one that defines the progression of keys (the "recursive case"). In Example 15.2 these predicates happen to be named first and next, respectively.

Each atom in the body of linear_recursion should reference a materialised predicate (in particular, it should not be a built-in such as =). Its arguments should all be variables.

The recursive predicate cannot be local.

The contents of the linear_recursion<<...>> construct must begin with declarations of the two auxiliary predicates, as shown in the example. This is followed by the rules that actually define the value of the recursive predicate for various keys, either in terms of the "base case", or in terms of "earlier" keys, where the progression of keys is defined by the "recursive case".

The names of the variables in the body are often ignored. In our example it would be OK to have a body like

     _ = next[F], F = first[].

or the equivalent

     _ = next[first[]].

However, the variable that is the value argument of the "recursive case" should not have other occurrences in the body.

If the body contains atoms that refer to any other predicates, their arguments are treated as "group-by" variables: see below.

The variables in the head are essentially ignored, except when it is impossible to give them types.

We recommend the style used in our examples, i.e., that all these variables be made anonymous, except when there is a good reason to name them.

Entity keys, grouping

The keys may be entities: the "recursive case" predicate will effectively order them. The "base case" predicate and the "recursive case" predicate may have additional arguments that are used for grouping (similar to the one described in Section 13.1 and illustrated in Example 13.3). These additional arguments must satisfy the following requirements:

  • They must precede the arguments used for determining the first key and the key sequence.

  • They must all be of entity types.

  • The sequence of extra arguments in the "base case" predicate must be of the same length as the sequence of extra arguments in the "recursive case" predicate, and the corresponding extra arguments from both predicates must be of identical types.

  • For the mandatory occurrences of the "base case" predicate and the "recursive case" predicate in the body of the linear_recursion construct, the extra arguments must be variables that are pairwise identical in both predicates.

These requirements sometimes force us to write code that is a little more convoluted than we would like, as illustrated by FirstDayForFruit and NextDayForFruit in the following example.

Example 15.3. "Linear recursion" with entities and grouping

The example illustrates the structure of a toy application that sums up the sales of each fruit in our friendly neighbourhood fruit stall. The linear_recursion construct is used to compute accumulated sales separately for each kind of fruit, thanks to the additional grouping argument f in the body of the construct.

create --unique

addblock <doc>

  // Entity types.
  Fruit(f), FruitName(f : nm) -> string(nm).

  FruitName(_ : "apple").
  FruitName(_ : "avocado").
  FruitName(_ : "persimmon").

  WeekDay(d), WeekDayName(d : nm) -> string(nm).

  WeekDayName(_ : "Monday").     WeekDayName(_ : "Tuesday").
  WeekDayName(_ : "Wednesday").  WeekDayName(_ : "Thursday").
  WeekDayName(_ : "Friday").

  // The succession of days in a week.
  firstDay[] = Mon,
  nextDay[Mon] = Tue, nextDay[Tue] = Wed, nextDay[Wed] = Thu, nextDay[Thu] = Fri
  <-  WeekDayName(Mon : "Monday"),    WeekDayName(Tue : "Tuesday"),
      WeekDayName(Wed : "Wednesday"), WeekDayName(Thu : "Thursday"),
      WeekDayName(Fri : "Friday").

  // How many pounds of each fruit sold in each day of the week?
  Sales[fruit, day] = weight -> Fruit(fruit), WeekDay(day), decimal(weight).

  Sales[apple    , Mon] = 100d,
  Sales[apple    , Tue] = 150d,
  Sales[apple    , Wed] = 120d,
  Sales[apple    , Thu] =  90d,
  Sales[apple    , Fri] =  10d,
  Sales[persimmon, Mon] =   0d,
  Sales[persimmon, Tue] =  10d,
  Sales[persimmon, Wed] =  20d,
  Sales[persimmon, Thu] =  15d,
  Sales[persimmon, Fri] = 100d
  <- FruitName(apple : "apple"), FruitName(persimmon : "persimmon"),
     WeekDayName(Mon : "Monday"),    WeekDayName(Tue : "Tuesday"),
     WeekDayName(Wed : "Wednesday"), WeekDayName(Thu : "Thursday"),
     WeekDayName(Fri : "Friday").

  // Somewhat silly auxiliaries for grouping in `linear_recursion`.
  FirstDayForFruit[f]   = firstDay[]  <- Fruit(f).
  NextDayForFruit[f, d] = nextDay[d]  <- Fruit(f).

  // Accumulated sales of this fruit by the end of this day.
  AccSales[fruit, day] = weight -> Fruit(fruit), WeekDay(day), decimal(weight).

  AccSales[_, _] = _  <- linear_recursion <<
     lang:pragma:baseCase(`FirstDayForFruit).
     lang:pragma:recursiveCase(`NextDayForFruit).

     AccSales[f,  d] = Sales[f, d] <- d = FirstDayForFruit[f].
     AccSales[f, nd] = AccSales[f, d] + Sales[f, nd]
       <- nd = NextDayForFruit[f, d].

  >> FirstDayForFruit[f] = _, NextDayForFruit[f, _] = _.

  // How many pounds of each fruit sold in a week?
  WeeklySales[fruit] = AccSales[fruit, Fri] <- WeekDayName(Fri : "Friday").

</doc>
echo SALES
print Sales
echo WEEKLY SALES
print WeeklySales

close --destroy

The output is:

created workspace 'unique_workspace_2018-01-05-16-31-49'
added block 'block_4LDQPDSB'
SALES
[10000000007] "apple"     [10000000000] "Tuesday"   150
[10000000007] "apple"     [10000000001] "Thursday"  90
[10000000007] "apple"     [10000000003] "Wednesday" 120
[10000000007] "apple"     [10000000004] "Friday"    10
[10000000007] "apple"     [10000000006] "Monday"    100
[10000000013] "persimmon" [10000000000] "Tuesday"   10
[10000000013] "persimmon" [10000000001] "Thursday"  15
[10000000013] "persimmon" [10000000003] "Wednesday" 20
[10000000013] "persimmon" [10000000004] "Friday"    100
[10000000013] "persimmon" [10000000006] "Monday"    0
WEEKLY SALES
[10000000007] "apple"     470
[10000000013] "persimmon" 145
deleted workspace 'unique_workspace_2018-01-05-16-31-49'

Additional features

The rules that can appear within the angle brackets of a linear_recursion construct must be rather simple. For example, they can only perform straightforward lookup: all the necessary joins must be carried out in the body. A more severe restriction is that the variables used to extract data from other predicates must also appear in the body. This is one of the reasons we were forced to define FirstDayForFruit and NextDayForFruit in our example.

Example 15.4. An incorrect variant of Example 15.3.

If we try to replace the linear_recursion construct in our example by a variant that seems more natural, to wit,

  AccSales[_, _] = _  <- linear_recursion <<
     lang:pragma:baseCase(`firstDay).
     lang:pragma:recursiveCase(`nextDay).

     AccSales[f,  d] = Sales[f, d] <- d = firstDay[].
     AccSales[f, nd] = AccSales[f, d] + Sales[f, nd]
       <- nd = nextDay[d].

  >> firstDay[] = _, nextDay[_] = _.

we will get an error message

Error: Expected all reads from non-local predicates to be variables equivalent to P2P body variables, got AccSales[f,d]=__a in rule Forall __a::decimal,d::WeekDay,f::Fruit .
AccSales[f,d]=__a <-
   firstDay[]=d,
   Sales[f,d]=__a.

There is a more direct way to get around this particular difficulty. Rules within the angle brackets of a linear_recursion construct can refer to the "current" version of a body variable V by means of current:V[]. Moreover, the grouping variables can be specified in a separate atom in the body. So the rule for AccSales can also be written as in the following example.

Example 15.5. Using current: and a separate grouping predicate

  AccSales[_, _] = _   <- linear_recursion <<
     lang:pragma:baseCase(`firstDay).
     lang:pragma:recursiveCase(`nextDay).

     AccSales[f,  d] = Sales[f, d] <- f = current:f[], d = current:day[].
     AccSales[f, nd] = AccSales[f, d] + Sales[f, nd]
       <- f = current:f[], d = current:day[], nd = nextDay[d].

  >> Fruit(f), firstDay[] = day, nextDay[_] = _.

There is even a feature that allows us to replace current: with something else, e.g., magic:. All that is needed is another pragma at the beginning of the part in angle brackets:

    lang:pragma:prefix(`magic).

Finally, one should be aware that the same linear_recursion construct can be used to populate more than one recursive predicate. For example, our simple program for the neighbourhood fruit stall can be extended to compute the income for each fruit, based on a daily price.

Example 15.6. Linear recursion with two recursive predicates

Here are the relevant fragments of Example 15.3, as modified in Example 15.5, but with an additional recursive predicate. In this example the two recursive predicates share rules (inside the angle brackets), but in general this need not be the case.

  // How many pounds of each fruit sold in each day of the week?
  // What was the daily price?
  Sales[fruit, day] = weight -> Fruit(fruit), WeekDay(day), decimal(weight).
  Price[fruit, day] = price  -> Fruit(fruit), WeekDay(day), decimal(price).

  Sales[apple    , Mon] = 100d,   Price[apple    , Mon] = 0.50d,
  Sales[apple    , Tue] = 150d,   Price[apple    , Tue] = 0.60d,
  Sales[apple    , Wed] = 120d,   Price[apple    , Wed] = 0.70d,
  Sales[apple    , Thu] =  90d,   Price[apple    , Thu] = 0.90d,
  Sales[apple    , Fri] =  10d,   Price[apple    , Fri] = 1.10d,
  Sales[persimmon, Mon] =   0d,   Price[persimmon, Mon] = 1.50d,
  Sales[persimmon, Tue] =  10d,   Price[persimmon, Tue] = 2.00d,
  Sales[persimmon, Wed] =  20d,   Price[persimmon, Wed] = 2.50d,
  Sales[persimmon, Thu] =  15d,   Price[persimmon, Thu] = 2.50d,
  Sales[persimmon, Fri] = 100d,   Price[persimmon, Fri] = 1.00d
  <- FruitName(apple     : "apple"),
     FruitName(persimmon : "persimmon"),
     WeekDayName(Mon : "Monday"),    WeekDayName(Tue : "Tuesday"),
     WeekDayName(Wed : "Wednesday"), WeekDayName(Thu : "Thursday"),
     WeekDayName(Fri : "Friday").

  // Accumulated sales of/income for this fruit by the end of this day.
  AccSales [fruit, day] = weight -> Fruit(fruit), WeekDay(day), decimal(weight).
  AccIncome[fruit, day] = amount -> Fruit(fruit), WeekDay(day), decimal(amount).

  AccSales[_, _] = _, AccIncome[_, _] = _   <- linear_recursion <<
     lang:pragma:baseCase(`firstDay).
     lang:pragma:recursiveCase(`nextDay).

     AccSales [f, d] = Sales[f, d],
     AccIncome[f, d] = Sales[f, d] * Price[f, d]
         <- f = current:f[], d = current:day[].

     AccSales [f, nd] = AccSales [f, d] + Sales[f, nd],
     AccIncome[f, nd] = AccIncome[f, d] + Sales[f, nd] * Price[f, nd]
         <- f = current:f[], d = current:day[], nd = nextDay[d].

  >> Fruit(f), firstDay[] = day, nextDay[_] = _.

  // How many pounds of each fruit sold in a week, and what was the income
  // for that fruit?
  WeeklySales [fruit] = AccSales [fruit, Fri] <- WeekDayName(Fri : "Friday").
  WeeklyIncome[fruit] = AccIncome[fruit, Fri] <- WeekDayName(Fri : "Friday").

Caveats

It should be noted that the instantiations of the grouping variables in the recursive rule are determined "behind the scenes", which may sometimes seem a little surprising. For example, if the recursive rule in Example 15.3 is replaced with

     AccSales[f, nd] = AccSales[ff, d] + Sales[f, nd]
       <- nd = NextDayForFruit[f, d].

then f and ff will effectively be treated as the same variable.

In the interest of protecting the unwary, in some cases the system will not allow such usage. For example, if we wrote

  AccSales[f, nd] = AccSales[ff, d] + AccSales[f, d] + Sales[f, nd]
    <- nd = NextDayForFruit[f, d].

then the computation would terminate with

Error: Can't have two different key variables in the same position of
two occurrences of a recursive predicate in the same rule.
The variables are `f` and `ff`.
The current atom is: `AccSales[f,d]=__b`.
The current rule is:
Forall __e::decimal,d::WeekDay,f::Fruit,ff::Fruit,nd::WeekDay .
AccSales[f,nd]=__e <-
   Exists __a::decimal,__b::decimal,__c::decimal,__d::decimal .
      NextDayForFruit[f,d]=nd,
      decimal:add[__c,__d]=__e,
      decimal:add[__a,__b]=__c,
      AccSales[ff,d]=__a,
      AccSales[f,d]=__b,
      Sales[f,nd]=__d.

Developers of large applications should be aware that rules within the angle brackets access predicates only for lookup, so the usual optimisation known as "leapfrog triejoin" does not apply.

Another thing to be aware of is that the list-chase operation carried out within the linear_recursion construct is applied to every group produced by the body join, even if the rules inside the construct would restrict the number of groups that actually yield values to something far smaller. This can be circumvented by factoring out the grouping predicate, as in Example 15.5. Here is how we can avoid computing AccSales for those kinds of fruit that were not sold at all during this week:

  FruitSold(f) <- Fruit(f), Sales[f, _] != 0d.

  AccSales[_, _] = _   <- linear_recursion <<
     lang:pragma:baseCase(`firstDay).
     lang:pragma:recursiveCase(`nextDay).

     AccSales[f,  d] = Sales[f, d] <- f = current:f[], d = current:day[].
     AccSales[f, nd] = AccSales[f, d] + Sales[f, nd]
       <- f = current:f[], d = current:day[], nd = nextDay[d].

  >> FruitSold(f), firstDay[] = day, nextDay[_] = _.

This trick can, of course, be applied to more interesting cases, where there is more than one grouping variable and the variables come from a join of two or more predicates.

Note

The implementation of the linear_recursion construct is somewhat brittle, and is based on a number of assumptions that might sometimes seem unnatural. To give just one instance, in Example 15.2 we cannot replace fib[next[first[]]] = 1. with the equivalent fib[2] = 1.: this would cause an error.

If you want to apply a pattern that is different from the ones shown here, we recommend that you test it on a small example.

Chapter 16. Constraints

A LogiQL constraint is a language construct used to express an invariant property of a program. There are two types of constraints in LogiQL:

  • those that are statically checked by the compiler to always hold for any database instance;
  • those that are dynamically checked at runtime: a violation of such constraints will cause a transaction to abort.

The statically checked constraints are predicate declarations; in order to be recognized by the compiler as such, they must follow the specific format described in Section 16.3. Dynamic constraints are enforced whenever the state of the database changes: see Chapter 19.

When the system detects a constraint failure, the diagnostic message may be somewhat difficult to read. See the section called “Internal use of pulse predicates” for a short explanation and a concrete example.

16.1. Syntax and Interpretation

Constraints are sometimes called right-arrow rules because they most often take the following form:

Constraint = Formula "->" Formula "." .

Note that the arrow (->) points to the right.

The interpretation of a constraint is that whenever the formula on the left-hand side of the arrow holds, the formula on the right-hand side of the arrow must hold as well.

A constraint of the form f1 -> f2 can be thought of as equivalent to !(f1, !f2) (i.e., it is not the case that f1 holds, but f2 does not). You may see variants of the latter form in runtime error reports issued when a constraint violation is detected.
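
For instance, the mandatory role constraint from Section 16.2,

Person(p) -> birthdateOf[p] = _.

is equivalent to

!(Person(p), !(birthdateOf[p] = _)).

and a runtime report about its violation may well use a form resembling the latter.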

16.2. Common Forms of Constraints

The following list describes common forms of constraints. Each entry gives an informal example, followed by its expression in LogiQL.

Equality constraint

A restriction to ensure that the population of one or more predicate roles must be equal to the population of other roles (in this predicate or in another predicate). Example: patient p has a diastolic blood pressure reading if and only if p has a systolic blood pressure reading.

diastolicBPof[p] = _ -> systolicBPof[p] = _.
systolicBPof[p] = _ -> diastolicBPof[p] = _.

Exclusion constraint

A restriction on two or more roles to ensure that no tuple may instantiate more than one of those roles at the same time. Example: no person authors and reviews the same book.

reviews(p, b) -> !authors(p, b).

Inclusive-Or constraint

A restriction on two or more roles played by instances of a common type to ensure that each instance of that type plays at least one of those roles. Example: each valued employee is industrious or intelligent.

ValuedEmployee(p) -> isIndustrious(p) ; isIntelligent(p).

Exclusive-Or constraint

A restriction on two or more roles played by instances of a common type to ensure that each instance of that type plays exactly one of those roles. Example: each person is male or female but not both.

Person(p) -> isMale(p) ; isFemale(p).
isMale(p) -> !isFemale(p).

Frequency constraint

A restriction on a list of one or more roles to ensure that, at any given time, each instance in the population of that role list appears there a specified number of times. Example: each reviewer is assigned at most three papers to review.

positiveNrPapersAssignedTo[r] = n <-
    agg<<n = count()>> isAssigned(r, _).
positiveNrPapersAssignedTo[_] = n -> n <= 3.

Uniqueness constraint

A restriction on a list of one or more roles to ensure that, at any given time, each instance in the population of that role list appears there at most once. (This is an important special case of a frequency constraint.) Example: each passport number is held by at most one person.

passportNrOf[p1] = n, passportNrOf[p2] = n ->
    p1 = p2.

Mandatory role constraint

A restriction on a single role of a predicate to ensure that each instance in the population of the role's type must play that role. Example: each person was born on some date.

Person(p) -> birthdateOf[p] = _.

Ring constraint

A logical constraint on two type-compatible arguments of a predicate. Kinds of ring constraints include irreflexivity, asymmetry, intransitivity and acyclicity. Example: no person is her/his own parent (an irreflexivity constraint).

!isParentOf(p, p).

Subset constraint

A restriction to ensure that the population of one or more predicate roles must be a subset of the population of other roles. Example: if student s passed course c then s was enrolled in c.

passed(s, c) -> enrolledIn(s, c).

Value constraint

A restriction on a role that specifies what values can populate that role. (This may, but need not, take the form of an Inclusive-Or constraint.) Example: possible gender codes are “M” and “F”.

hasGenderCode(_:gc) -> gc = "M" ; gc = "F".

Example 16.1. Constraints for a partial order

In the first block we introduce constraints that ensure the predicate leq is a partial order. (We assume leq can be defined only on a subset of items, which makes for a more interesting reflexivity constraint.)

In the exec block we add two tuples explicitly, and use delta rules to add those that must be there to satisfy the constraints.

create --unique

addblock <doc>

  item(x), item:nr(x:n) -> int(n).         // an entity type

  item:nr(_:n) <- int:range(0, 9, 1, n).   // populated

  // A partial order among items:
  leq(x, y) -> item(x), item(y).

  leq(x, _) ; leq(_, x) -> leq(x, x).      // reflexive
  leq(x, y) -> !(leq(y, x), x != y).       // antisymmetric
  leq(x, y), leq(y, z) -> leq(x, z).       // transitive

</doc>

exec <doc>
  // Introduce ordering among some items:
  +leq(x, y) <- item:nr(x:1), item:nr(y:2).
  +leq(x, y) <- item:nr(x:0), item:nr(y:1).

  // Ensure reflexivity and transitivity:
  +leq(x, x) <- +leq(x, _) ; +leq(_, x).
  +leq(x, z) <- +leq(x, y), +leq(y, z).
</doc>
print leq

close --destroy

The results are:

created workspace 'unique_workspace_2017-06-05-21-37-48'
added block 'block_1Z1FGZTM'
[10000000004] 1 [10000000004] 1
[10000000004] 1 [10000000007] 2
[10000000005] 0 [10000000004] 1
[10000000005] 0 [10000000005] 0
[10000000005] 0 [10000000007] 2
[10000000007] 2 [10000000007] 2
deleted workspace 'unique_workspace_2017-06-05-21-37-48'

If we comment out, say, the last rule in the exec block, we will see:

Error: Constraint failure(s):
block_1Z1FGZTM:6(3)--6(35):
    false <-
      Exists x::item,y::item,z::item .
         leq(x,y),
         leq(y,z),
         !(
            leq(x,z)
         ).
(1) x=[10000000005],y=[10000000004],z=[10000000007]

16.3. Constraints as Predicate Declarations

As described in Chapter 8, predicate declarations have the syntactic form of constraints. A constraint is considered a predicate declaration if it satisfies the following specific requirements:

Left-hand side of the right-arrow symbol

The left-hand side of the right-arrow (->) determines what predicates are being declared. It must be one of the following:

  • a single atom, in which case the predicate of that atom is being declared;
  • two atoms, in which case an entity is being declared along with its reference mode predicate.

If the left-hand side is a single atom, then every argument of the atom must be a distinct variable.

If it is a conjunction of two atoms, one atom must declare the entity predicate, and it must be of the form p(x) for some predicate p and variable x. The other atom must declare the reference-mode predicate, and it must be of the form q(x : id), where x is the variable that is used in the first atom and id is the reference mode variable (see below).

Right-hand side of the right-arrow symbol

The right-hand-side of the right-arrow (->) may be either empty, or a conjunction of atoms.

If the left-hand side is a single unary atom, then the right-hand side is allowed to be empty: the constraint is then a top-level entity declaration.

If the left-hand side is a single atom and the right-hand side is not empty, then every variable that appears on the left-hand side must also appear in a unary atom on the right-hand side: each such unary atom forms the type bound of the variable it contains.

If the left-hand side contains a conjunction of two atoms, meaning that it declares an entity and its reference mode predicate, then the right-hand side must contain exactly one unary atom that provides a type bound for the reference mode variable.

All variables that appear on the right-hand side of the constraint must also appear on the left-hand side.

It is possible to have more than one declaration for the same predicate. Those multiple declarations can be exact duplicates, or some of them can be more specific than others, or they can each provide different information about the predicate.

Example 16.2. Examples of predicate declarations

earnings(r, a) -> region(r), int(a).
expenditures(r, _) -> region(r).
expenditures(_, a) -> int(a).
expenditures(_, a) -> int(a), a >= 0.
person(x), person:eid(x:id) -> int(id).
region(r) -> area(r).
area(r) -> .
success() -> .

Note that even though all of the above constraints can be used to determine the types of predicates, some of them cannot be completely statically guaranteed. For instance, the compiler cannot guarantee statically that each expenditures tuple has a number greater than or equal to 0 as its second value. This type of constraint is checked at runtime.
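For example, in the following hedged sketch (q is a hypothetical predicate), the block compiles without complaint, but the transaction that attempts the insertion aborts at runtime with a constraint failure:

addblock <doc>
  q(a) -> int(a), a >= 0.
</doc>

exec <doc>
  +q(-5).   // violates a >= 0, so the transaction is aborted
</doc>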

Chapter 17. Typing

LogiQL programs are type checked. The compiler determines a type for every argument of every predicate, and for every variable and expression. These types are used by the compiler to verify that predicates, variables and expressions are used consistently, thus helping programmers to avoid certain kinds of mistakes. Types are also used by the runtime system for performance optimizations.

17.1. Predicate Type Inference

Every predicate that is used in a program must have either a predicate type declaration or at least one rule from which the compiler can infer its type. If the program includes type declarations for a predicate, then those type declarations must specify a type for each argument to the predicate.

For any predicate that has no type declaration, a type will be inferred for it from rules that use that predicate in their right-hand side. Predicate type inference attempts to choose the most specific type for the predicate that will allow the program to be correctly typed. This attempt often succeeds, but not always, and even if it succeeds, it might choose types other than the ones you intended. If you want to be certain, then supply a type declaration.

Example 17.1. 

In the following program, the declaration of predicate parentof specifies each argument to be a person (an entity type). Predicate ancestorof does not have a type declaration, so its type is inferred from the rules. The inferred type in this case is that ancestorof has two arguments, each of which is also a person.

person(x) -> .
parentof(x, y) -> person(x), person(y).

ancestorof(x, y) <- parentof(x, y).
ancestorof(x, z) <- ancestorof(x, y), parentof(y, z).

17.2. Type checking

Once the compiler has determined the entity types and has determined the types of all predicates, it uses those types to apply a variety of checks for common errors in the program's rules and constraints.

One kind of check is type consistency. If a variable appears more than once in a formula, then the types associated with those occurrences must be in a subtyping relationship.

Example 17.2. 

In this example, variable b is used both as a person and as an integer. These types are not in a subtyping relationship, so the code has a type consistency error.

person(x) -> .
parentof(x, y) -> person(x), person(y).
likes_number(x, y) -> person(x), int(y).

p(a) <- parentof(a, b), likes_number(a, b). // Type consistency error

Another check the system performs is that the head of a rule (and therefore the tuples generated by the rule) will not violate the type of the head's predicate.

Example 17.3. 

In this example there is an attempt to assert that each instantiation of variable a is a mother. However, the programmer accidentally mixed up the variables in the body of the rule. Since the motherof predicate is declared to have only females in its first argument, the code has a TYPE_TOO_BIG error.

person(x) -> .
female(x) -> person(x).
parentof(x, y) -> person(x), person(y).
motherof(x, y) -> female(x), person(y).

motherof(a, b) <- parentof(a, b), female(b). // TYPE_TOO_BIG error

This last example illustrates that the compiler's type checking is approximate. The programmer may ensure that all possible values for variable a correspond to females, in which case the program is technically type correct. However, the compiler has no way of knowing this, so it makes a conservative guess that a violation might possibly occur. In this way it can guarantee that all compiled programs are type correct, even if some type correct programs do not compile.
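In this particular case the remedy is simply to repair the rule. If, however, the condition on b was intended, the rule can be made to compile by also constraining a in the body, which gives the compiler the guarantee it needs. A sketch:

// the intended rule: the first argument of motherof must be female
motherof(a, b) <- parentof(a, b), female(a).

// or, if female(b) really was wanted, keep it but also bind a:
motherof(a, b) <- parentof(a, b), female(a), female(b).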

Chapter 18. Default Values

Each predicate in a LogicBlox database consists of a set of tuples. In a functional predicate, all values but one in a tuple define a key that is unique among all the tuples in the predicate. For example, a sales predicate could define a functional mapping from combinations of skus, stores, and days to the count of a product sold in a store on a given day. Example tuples of the form (sku, store, day, sales) in the sales predicate could be

“Sweater”,      “Atlanta-Midtown”,  “20150704”,   3
“T-shirt”,      “Atlanta-Midtown”,  “20150708”,  12
“Windbreaker”,  “Athens-North”,     “20150708”,   1
“T-shirt”,      “Athens-North”,     “20150702”,   4

All the possible key combinations for a functional predicate define the set of possible tuples the predicate can hold: namely those that associate some value with a valid key combination. However, not all possible key combinations must have a corresponding value. Some or all of the possible tuples can be missing from the database (for example, there is no value in the above data for “Sweater” in the “Athens-North” store on any given day). The meaning of a missing tuple is typically specific to each predicate, and depends on the requirements of the application. The set of possible tuples for a predicate that are not missing from the database is often informally referred to as the set of populated facts in the predicate.

LogiQL rules will often combine predicates using conjunctions. These conjunctions will use the intersection of populated fact keys in the predicates to produce a set of resulting tuples (i.e., an inner join, for those with relational database experience). In practice, this can mean that a LogiQL calculation doesn’t produce data that might be expected in the business context. Application developers must consider how conjunctions in rules and missing tuples in predicates affect the results produced by the rules. As an example, if the returns predicate has the following two tuples of the form (sku, store, day, returns)

“T-shirt”,  “Atlanta-Midtown”,  “20150708”,  1
“T-shirt”,  “Athens-North”,     “20150702”,  2

then the conjunction of the above sales tuples with these returns tuples contains only those tuples that have the same keys in both predicates (the tuples are of the form (sku, store, day, sales, returns)):

“T-shirt”,  “Atlanta-Midtown”,  “20150708”,  12, 1
“T-shirt”,  “Athens-North”,     “20150702”,   4, 2

In past LogicBlox releases (i.e., 3.x), the concept of a “default value” of a predicate was used to help manage these issues. Recent LogicBlox 4 releases have reintroduced the default value concept. In the remainder of this chapter we discuss the use of default values in more detail.

18.1. Net Sales Example

Consider a simple retail example where net_sales is computed by subtracting returns from sales. An implementation of this in LogiQL might look like the following:

// Define sku, store, and day entities (used as key types for other predicates)
sku(sk), sku_id(sk:id) -> string(id).
store(st), store_id(st:id) -> string(id).
day(d), day_id(d:id) -> string(id).

// Define sales, returns, and net_sales data predicates
sales[sk, st, d] = v -> sku(sk), store(st), day(d), decimal(v).
returns[sk, st, d] = v -> sku(sk), store(st), day(d), decimal(v).
net_sales[sk, st, d] = v -> sku(sk), store(st), day(d), decimal(v).

// Compute net_sales from sales and returns
net_sales[sk, st, d] = sales[sk, st, d] - returns[sk, st, d].

The net_sales rule above will only produce values for [sku, store, day] key combinations that have both a sales and a returns value. To get a little insight into the reasons for this, it might help to look at a slightly more verbose, but equivalent, rule for computing net_sales:

net_sales[sk, st, d] = ns <-
   sales[sk, st, d] = s,
   returns[sk, st, d] = r,
   ns = s - r.

This rule can be read as “Assign net_sales for a particular [sku, store, day] to the value ns where ns is s - r AND s is the sales value for the [sku, store, day] AND r is the returns value for the [sku, store, day]”. The conjunctions (ANDs) in LogiQL cause net_sales values to be produced for the intersection of the [sku, store, day] keys in both sales and returns. In other words, for those with traditional relational database experience, the LogiQL database engine is doing an inner join between sales and returns. If a [sku, store, day] combination has a sales value but no returns value (or vice versa), then the intersection will not contain that [sku, store, day] and no net_sales value for it will be produced.

To see this, put the above rules into a file called sales.logic, and create another file called load_data.logic that contains

+sku(sk), +sku_id[sk] = "sku_1".
+sku(sk), +sku_id[sk] = "sku_2".

+store(st), +store_id[st] = "store_A".
+store(st), +store_id[st] = "store_B".

+day(d), +day_id[d] = "20150601".
+day(d), +day_id[d] = "20150602".
+day(d), +day_id[d] = "20150603".

^sales[sk, st, d] = 10.0d <-
   sku_id[sk] = "sku_1",
   day_id[d] = "20150601",
   ( store_id[st] = "store_A"
   ; store_id[st] = "store_B"
   ).

^returns[sk, st, d] = 2.0d <-
   sku_id[sk] = "sku_1",
   store_id[st] = "store_A",
   day_id[d] = "20150601".

Then execute the following commands:

lb create --overwrite /defval
lb addblock -f sales.logic /defval
lb exec -f load_data.logic /defval
lb print /defval sales
lb print /defval returns
lb print /defval net_sales

The output should be similar to the following (same number of rows and same values produced, the key indices in brackets might differ):

$ lb print /defval sales
[10000000005] "sku_1" [10000000004] "store_A" [10000000007] "20150601" 10.00000
[10000000005] "sku_1" [10000000006] "store_B" [10000000007] "20150601" 10.00000
$ lb print /defval returns
[10000000005] "sku_1" [10000000004] "store_A" [10000000007] "20150601" 2.00000
$ lb print /defval net_sales
[10000000005] "sku_1" [10000000004] "store_A" [10000000007] "20150601" 8.00000 

In this example, the sales predicate has values for the key combinations of [sku_1, store_A, 20150601] and [sku_1, store_B, 20150601]. The returns predicate has only one value for the key combination of [sku_1, store_A, 20150601]. The intersection (inner join) between the keys of sales and returns is [sku_1, store_A, 20150601], which means that net_sales will only have one value. This is counterintuitive from a business perspective, and most likely not what was intended.

The net_sales calculation would make more sense if a missing returns or missing sales was treated as if the value were zero, producing a net_sales value if either sales OR returns had a value for a particular [sku, store, day] key. This desired behavior would use the union (outer join for those with relational database experience) of sales and returns to determine what net_sales keys should contain values. This can be accomplished in LogiQL either by using disjunctive rules or by setting default values for the three predicates involved in the calculation. Both approaches are discussed in more detail below.

18.2. Disjunctive Solution

The rules in sales.logic can be altered as follows, using disjunction to tell the database engine how to compute a net_sales value if the value for either sales OR returns is missing for any [sku, store, day] key that is in the union of existing sales and returns keys.

// Define sku, store, and day entities (used as key types for other predicates)
sku(sk), sku_id(sk:id) -> string(id).
store(st), store_id(st:id) -> string(id).
day(d), day_id(d:id) -> string(id).

// Define sales, returns, and net_sales data predicates
sales[sk, st, d] = v -> sku(sk), store(st), day(d), decimal(v).
returns[sk, st, d] = v -> sku(sk), store(st), day(d), decimal(v).
net_sales[sk, st, d] = v -> sku(sk), store(st), day(d), decimal(v).

// Compute net_sales from sales and returns, filling in missing
// values with zeros
net_sales[sk, st, d] = sales[sk, st, d] - returns[sk, st, d].
net_sales[sk, st, d] = sls - ret <-
   returns[sk, st, d] = ret,
   !sales[sk, st, d] = _,
   sls = 0.0d.
net_sales[sk, st, d] = sls - ret <-
   sales[sk, st, d] = sls,
   !returns[sk, st, d] = _,
   ret = 0.0d.

After changing the sales.logic file, execute the following commands again

lb create --overwrite /defval
lb addblock -f sales.logic /defval
lb exec -f load_data.logic /defval
lb print /defval sales
lb print /defval returns
lb print /defval net_sales

to see the output expected originally

$ lb print /defval sales
[10000000006] "sku_1" [10000000000] "store_A" [10000000005] "20150601" 10.00000
[10000000006] "sku_1" [10000000003] "store_B" [10000000005] "20150601" 10.00000
$ lb print /defval returns
[10000000006] "sku_1" [10000000000] "store_A" [10000000005] "20150601" 2.00000
$ lb print /defval net_sales
[10000000006] "sku_1" [10000000000] "store_A" [10000000005] "20150601" 8.00000
[10000000006] "sku_1" [10000000003] "store_B" [10000000005] "20150601" 10.00000

Writing disjunctive rules like this is reasonable for a toy example, but is both tedious and error prone for real applications that contain many thousands of rules. It is especially problematic for rules whose bodies reference many predicates that might not all have values for the same keys, as one must then consider all the predicate combinations to determine the correct behavior. Moreover, this approach entails potentially large storage and performance penalties: see Section 18.4.

18.3. Default Value Solution

A better approach is to set default values for the sales, returns, and net_sales predicates. A predicate with a default value has a tuple for every combination of key values (i.e., it is a total function). In this case, specifying that each of these predicates has a default value of zero effectively means that all [sku, store, day] key combinations with missing values in the examples above will now have values of zero. Since values are defined for all [sku, store, day] key combinations, the intersection (inner join) of sales and returns in the net_sales rule will now have the same effect as the union (outer join), producing a net_sales tuple for every possible [sku, store, day] combination. Note that the LogicBlox database stores only non-default tuples: this can lead to significant space savings (see Section 18.4).

To use default values in the net_sales example, change the sales.logic file to contain lang:defaultValue directives as follows

// Define sku, store, and day entities (used as key types for other predicates)
sku(sk), sku_id(sk:id) -> string(id).
store(st), store_id(st:id) -> string(id).
day(d), day_id(d:id) -> string(id).

// Define sales, returns, and net_sales data predicates
sales[sk, st, d] = v -> sku(sk), store(st), day(d), decimal(v).
lang:defaultValue[`sales] = 0.0d.

returns[sk, st, d] = v -> sku(sk), store(st), day(d), decimal(v).
lang:defaultValue[`returns] = 0.0d.

net_sales[sk, st, d] = v -> sku(sk), store(st), day(d), decimal(v).
lang:defaultValue[`net_sales] = 0.0d.

// Compute net_sales from sales and returns
net_sales[sk, st, d] = sales[sk, st, d] - returns[sk, st, d].

Execute the same set of commands as before.

lb create --overwrite /defval
lb addblock -f sales.logic /defval
lb exec -f load_data.logic /defval
lb print /defval sales
lb print /defval returns
lb print /defval net_sales

The output should be the same as for the disjunctive example above.

$ lb print /defval sales
[10000000004] "sku_1" [10000000005] "store_A" [10000000001] "20150601" 10.00000
[10000000004] "sku_1" [10000000007] "store_B" [10000000001] "20150601" 10.00000
$ lb print /defval returns
[10000000004] "sku_1" [10000000005] "store_A" [10000000001] "20150601" 2.00000
$ lb print /defval net_sales
[10000000004] "sku_1" [10000000005] "store_A" [10000000001] "20150601" 8.00000
[10000000004] "sku_1" [10000000007] "store_B" [10000000001] "20150601" 10.00000 

Note that the tuples that have a default value are not printed, because they would usually be very numerous.

18.4. Storage and Performance Implications

For good performance, it is important that:

  • the database does not store tuples that have a default value, for example returns values of zero, and
  • the database does not do unnecessary computations over default values, for example subtracting returns tuples with value zero from sales tuples with value zero to compute net_sales of zero.

The LogicBlox database does not physically store every logical value that may be contained in a predicate with a default value. A retail application could have millions of skus, thousands of stores, and thousands of days. If there is only one returns value for one sku at one store on one day, it makes sense to physically store only one tuple and not consume space for the missing values for [sku, store, day] key combinations. For this reason, the database only stores the non-default tuples. This kind of storage is often described as sparse storage. Apart from saving space, sparse storage has significant performance benefits: when evaluating a rule, the system need not consider billions of possible keys with default values.

In the disjunctive example above, both sales and returns consume space only for the non-default tuples. But there are extra disjunctive rules which insert zero values for any missing sales or returns values. These rules will cause the net_sales predicate to be fully populated (i.e., it will have a physically stored value for every possible key combination). This could not only take up a lot of disk space but could also introduce performance and storage problems for other rules that refer to net_sales.

For predicates with a default value, the LogicBlox database uses sparse storage: it physically stores only values that are different from the default value. For the example in Section 18.3, the sales, returns, and net_sales predicates will store only non-zero values, so these predicates will tend to consume much less disk space and take less computation time in rules. When executing rules whose bodies refer to default valued predicates, the system constructs a logical view of the predicate where “missing” values (those not physically stored) are replaced by the default value as needed. The optimization challenge for the database is to minimize the usage of default value tuples as much as possible.

Note that mixing predicates with and without default values can result in storage explosion similar to the one we saw in the disjunctive example. If sales and returns have a default value of zero, but net_sales does not have a default value defined, then the net_sales = sales - returns rule will end up fully populating the net_sales predicate. The rule can be written to filter out the zeros in net_sales, but it is best to consistently use or not use default values in application predicates related to each other via LogiQL rules. A predicate without a default value can usually be efficiently computed from predicates with default values by filtering out the virtual default values from the predicates used in the calculation. For example, consider a predicate called yesterday_sales that doesn’t have a default value and is computed from the sales predicate that has a default value of zero. Zero values in the sales predicate can be filtered out and excluded from the yesterday_sales values by adding an s != 0 conjunct, as in the rule below:

yesterday_sales[sk, st, yesterday] = s <-
   s = sales[sk, st, d], yesterday = day:previous[d], s != 0.

The evaluation optimization doesn’t currently apply when a condition like s != 0 is used on values produced by a rule: the condition is only effective when used on values that are inputs to the rule. For example, the following would not necessarily be evaluated very efficiently:

net_sales[sk, st, d] = ns <-
   ns = sales[sk, st, d] - returns[sk, st, d], ns != 0.

18.5. Consistent Default Values

The default values for predicates that are used in LogiQL rules must be consistent. For example, if the default value of sales is 3 and the default value of returns is 1, then the default value of net_sales must be 2 (because net_sales = sales - returns). The LogiQL compiler will report errors for rules that use predicates with incompatible default values. This is not a major restriction, since the most common default values (zero for numeric predicates, false for boolean predicates, empty strings) will work as expected in most cases.
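For instance, here is a hedged sketch of consistent non-zero defaults for the running example (only the lang:defaultValue directives change; the predicate declarations stay as before):

lang:defaultValue[`sales]     = 3.0d.
lang:defaultValue[`returns]   = 1.0d.
// net_sales = sales - returns, so its default must be 3.0d - 1.0d:
lang:defaultValue[`net_sales] = 2.0d.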

18.6. Data Updates

A predicate with a default value logically has a tuple for every possible key combination, so insertion and retraction operations do not make sense. Instead, always use upsert operations on such predicates. Upserting the default value is equivalent to retraction if the previous value was not the default. For example:

// insert a new value or change an existing value
^sales[sk, st, d] = 10.0d <-
   sku_id[sk] = "sku_1",
   day_id[d] = "20150601",
   store_id[st] = "store_A”.

// clear (retract) a value by updating to the default value
^sales[sk, st, d] = 0.0d <-
   sku_id[sk] = "sku_2",
   day_id[d] = "20150601",
   store_id[st] = "store_B".

18.7. Caveats

A default value cannot be defined for a predicate if any of its key arguments has a primitive type (int, float, decimal, string, etc.). This is because the database system must have a finite set of potential key values for a default-valued predicate. Remember that a predicate with a default value has a logical value for every possible key combination. If such a predicate had a key type with an infinite set of possible values (like the primitive types), rules that refer to the predicate would have to logically consider an infinite number of tuples.

A few operations are not currently supported (or are not efficient) for predicates with default values:

  • min and max aggregations over predicates with default values are not implemented efficiently;
  • count aggregations are generally a problem when used on predicates with default values, and are better written as multiplications of count aggregations on mappings to higher aggregation levels. Count aggregations that exclude the default value (e.g., have F[x] != 0 in the body) are efficient; see the sketch below.
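For instance, the second point can be illustrated with the default-valued sales predicate from the running example (nonzero_sku_count is a hypothetical predicate; this is a sketch only):

// count, per store and day, the skus with a non-default sales value;
// the v != 0.0d conjunct excludes the default and keeps the count efficient
nonzero_sku_count[st, d] = n -> store(st), day(d), int(n).
nonzero_sku_count[st, d] = n <-
    agg<<n = count()>> sales[sk, st, d] = v, v != 0.0d.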

Also note that you cannot change a default value once it has been set. Default values are specified at predicate declaration time and are fixed from then on.

Chapter 19. Transaction Logic

A LogicBlox database contains data (the contents of predicates, plus some internal information) and "logic" (declarations, rules and constraints, in compiled form).

As is usual in other database systems, changes to the database are performed in units called transactions. A transaction is a series of actions such as deletion of old data, addition of new data and logic, evaluation of rules that update the data, and verification that the data violates no constraints. (The last two kinds of actions are automatically initiated and performed by the system.)

If any of these actions result in an error (e.g., detection of an inconsistency, or an error detected during compilation of new logic), then the transaction is aborted, and the state of the database is not changed: it is as if the transaction never occurred. If there is no error, the state of the database produced by the transaction is known to be consistent, and the transaction may be committed (i.e., the new state of the database may be made to become the permanent one).

Transactions in LogicBlox are somewhat complicated: in particular, each transaction consists of several separate stages. To effectively use the LogicBlox system one must have at least a basic understanding of these complications. This chapter is an introduction to the topic.

In order to make the chapter more self-contained, we begin with a preliminary introduction to the various concepts necessary for appreciating the workings of a transaction. The latter are described in Section 19.4.

Some of the preliminary information is intended to help you experiment with the system. We recommend that you try to construct and execute some simple examples, perhaps basing them loosely on the examples in this manual. Even a little hands-on experimentation can go a long way towards gaining more confidence with the system (as well as towards uncovering gaps and ambiguities in the knowledge that you gained from just reading the manual).

19.1. Preliminaries

19.1.1. The lb tool

There are several direct and indirect ways to load and execute (evaluate) "logic", i.e., a "program" written in LogiQL. For the purposes of this chapter we will assume that the user accesses the LogicBlox server via the interactive mode of a command-line tool called lb (see Chapter 33). The tool is invoked by typing lb in your terminal. It is often most convenient to put lb commands in a file, for example file.lb (the suffix .lb is mandatory). The commands in the file can then be executed by writing the following on the command line:
lb file.lb

The list of all commands can be obtained by invoking lb -h, but since it is long and somewhat confusing, we will briefly describe the most essential ones below.

19.1.2. Workspaces

A workspace is essentially an instance of the LogicBlox database. The user can fill it with data and rules, modify its contents, evaluate commands within the context so created, etc. All these things cannot be done outside a particular workspace, so you will have to create one in order to experiment with LogiQL.

The lb tool provides the following basic commands for manipulating workspaces:

create name

Creates and opens a workspace with the given name. If the workspace is to be treated as just a temporary scratchpad, you can replace name with --unique. (Please note that even then the workspace will persist if it is not explicitly destroyed.)

close

Close the current workspace. Use close --destroy to also delete it.

close --destroy is particularly useful if the current workspace is a temporary scratchpad one, as it will be assigned a long name. (However, be aware that such a workspace will persist if the close command has not been reached: this may happen, for instance, when the execution is aborted because of an error.)

open name

Open an existing workspace with this name.

delete name

Delete the workspace with this name.

The easiest way to learn more about how to use each of these commands is to request help information directly from the tool. For example, to find out about the options for delete we could write

lb delete -h

These are not all the commands that manipulate workspaces. To see a complete list of commands invoke lb -h from the command line.

19.1.3. Blocks

A declaration, rule, fact, etc., is not processed or compiled by itself, but as a part of a larger unit called a block. The division of logic into blocks is carried out by the user, and can be pretty much arbitrary. There is, however, a facility for declaring a predicate as local to the block, i.e., invisible in other blocks (see Section 8.10).

Blocks can contain only logic, not lb commands such as close or print.

A little more information about blocks can be found in Section 19.1.6.

A block can be thought of as a unit of compilation, but is not, in general, a unit of execution (see Section 19.4.3).

19.1.4. Loading and executing logic

Most of the commands listed below take a block as an argument (see Section 19.1.3). The text of the block (which often consists of a number of lines) can be enclosed in single quotes (i.e., apostrophes: '), or -- equivalently -- put between <doc> and </doc>. We will follow the latter convention in our examples.

The following commands should suffice for running the simplest examples. (As always, use lb command -h for details, and lb -h for the complete list of commands.)

print name

Print the contents of the named predicate.

echo text

Print the rest of the line. Useful, e.g., for making the output of print more self-explanatory.

addblock block

Add the block to the current workspace.

The logic in the block usually has database lifetime (see Section 19.1.5). See Section 19.1.6 for the exception.

exec block

Execute the block within the contents of the current workspace near the beginning of the transaction (see Section 19.4).

If the block declares any predicates, they must be local, i.e., their names must begin with underscores. They will not be accessible from outside the block, and will be discarded at the end of the transaction. (See Section 8.10.)

All the facts, constraints and rules in the block (if any) must refer to local predicates. Apart from that, the block may contain only delta logic (Section 19.2).

The logic in the block has transaction lifetime (see Section 19.1.5).

query block

A variant of exec that is executed at the very end of a transaction (see Section 19.4). Execution of the block may modify the database, but these modifications are strictly temporary, and disappear when execution terminates (even if the transaction is aborted).

The limitations noted for exec apply.

It is often convenient to precede or follow the block argument with one or more arguments of the form --print name to print the contents of the named predicate(s).

The logic in the block has query lifetime (see Section 19.1.5).
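For example, the following hedged sketch (person and age are hypothetical predicates assumed to already exist in the workspace) computes a local predicate at query lifetime and prints it:

query <doc>
  _adult(p) <- person(p), age[p] >= 18.
</doc> --print _adult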

By default, each of these lb commands is treated as a separate transaction. It is, however, easy to enclose several of them in the same transaction: see Example 19.11.

Example 19.1. A command or query cannot declare a non-local predicate

Suppose the file non-local-in-exec.lb contains the following text (see Section 19.2 for the meaning of + before a predicate name):

create --unique

addblock <doc>
  p(x) -> int(x).
</doc>

exec <doc>
  q(x) -> int(x).

  +q(x) <- int:range(0, 3, 1, x).

  +p(x) <- q(x).
</doc>

print p

close --destroy

An attempt to execute the file will result in the following:

> lb non-local-in-exec.lb
created workspace 'unique_workspace_2016-02-08-21-50-37'
added block 'block_1Z38MU1A'
block block_1Z3DEKTT: line 1: error: every predicate declared in a command or query must be local: 'q' (code: NON_LOCAL_PREDICATE_DECLARATION)
q(x) -> int(x).
^^^^

1 ERROR 

The error message tells us to rename q to _q. The example will then run as expected:

> lb non-local-in-exec.lb
created workspace 'unique_workspace_2016-02-08-21-50-56'
added block 'block_1Z38MU1A'
0
1
2
3
deleted workspace 'unique_workspace_2016-02-08-21-50-56' 

Please note that _q is local to the block passed to exec, so it cannot be mentioned directly as an argument to the print command. This would result in the following message from the compiler:

error: Could not find predicate _q 

As noted above (in Section 19.1.3), print _q cannot appear directly in the block passed to exec.

19.1.5. The notion of "lifetime"

Depending on how a block is brought into the system, it (or the logic contained in it) is said to have one of three possible lifetimes:

database lifetime

This term applies to logic that is permanently installed in the workspace and survives the transaction (unless the transaction is aborted).

For example, the lb command addblock installs database-lifetime logic (except when the command is given the additional argument --inactive: see Section 19.1.6).

Database-lifetime logic is often referred to by the shorter term installed logic.

Note

"Permanently" installed logic can be removed by an explicit command: for example, the removeblock command in lb.

transaction lifetime

This term applies to logic that is available throughout the transaction, but is not permanently installed in the workspace and does not survive the transaction. Its execution may, however, have lasting effects on database-lifetime predicates.

For example, the lb command exec temporarily installs transaction-lifetime logic.

query lifetime

This term applies to logic that is available only during the execution of queries, is not permanently installed in the workspace and does not survive the transaction.

Unlike transaction-lifetime logic, query-lifetime logic is executed at the very end of a transaction and has access to all its effects; moreover, any modifications to the database performed during the execution of query-lifetime logic are strictly temporary and do not survive the transaction.

For example, the lb command query temporarily installs query-lifetime logic.

See Section 19.4 for more details about how and when a transaction handles logic of different lifetimes.

19.1.6. Inactive Blocks

Apart from the "normal" blocks described above, the LogicBlox system supports inactive blocks. These are somewhat similar to "precompiled queries" of SQL.

An inactive block is a block that is processed ahead of time, and stored in compiled form as a persistent part of the database. It can be activated on demand, i.e., executed as transaction-lifetime or query-lifetime logic. Activating the block installs and executes it at the requested lifetime, but it is then discarded at the end of the transaction. The block persists in inactive form as a part of the database, ready to be activated again and again.

To install an inactive block through the lb tool, one can execute the command lb addblock --name block_name --inactive. Although this is the addblock command, the logic will not have database lifetime, so -- just like in the case of exec and query -- the block can declare only local predicates.

In order to execute an inactive block one can use the command execblock block_name.

We defer an example to the end of the section that introduces delta logic (which must be used in the example). See Example 19.8.

19.2. Delta logic

Facts (Section 10.1.1) and IDB rules (Chapter 11) can be used to populate only intensional (IDB) predicates. Extensional (EDB) predicates (see Section 8.8) are populated by "EDB logic" which is often referred to as deltas, or delta logic. This section is an introduction to the topic of deltas.

Note

It is important to remember that EDB predicates must be manipulated by EDB logic, and IDB predicates cannot be manipulated by EDB logic. The system will raise an error if this rule is violated. The diagnostic message may sometimes be a little confusing: if you write a single IDB rule for an EDB predicate with many delta rules, the IDB rule may take precedence, and the message will begin with something like

error: predicate 'union' is a derived predicate (intensional, IDB) and should not be used in the head of a delta rule (extensional, EDB).

19.2.1. Direct manipulation of EDB predicates

Note

The explicit modification operations presented below are suitable only for relatively minor changes to the database. The reason is that each such operation is internally translated into a so-called frame rule (see Section 19.4.6). The frame rule will then be used by the general mechanism for updating predicates during maintenance (see Section 19.4.1).

If the number of frame rules becomes too large, efficiency suffers. So if you want to insert more than several hundred tuples, you should use CSV/TDX import instead (see Chapter 27).

A fact such as age("Mary", 7). is a static declaration that predicate age will always contain the tuple ("Mary", 7).

EDB predicates must support deletion of data, so if age is an EDB predicate, we must be able to insert or delete a particular tuple. The notation for this is called delta atoms, and it is quite intuitive:

+age("Mary", 7).

Insert the tuple ("Mary", 7) into predicate age. Do nothing if the tuple is already there.

-age("Mary", 7).

Delete the tuple ("Mary", 7) from predicate age. Do nothing if the tuple is not there.

Please note that the arguments must be fully instantiated, i.e., none of them may contain an unbound variable. (See the section called “Bound variables and their instantiations”.)

If age is a functional predicate, we can also use the forms +age["Mary"]=7. and -age["Mary"]=_.

For a functional predicate it is often convenient to update information associated with a given key (e.g., on Mary's birthday), so we have a third possibility:

^age["Mary"] = 8.

If predicate age contains a tuple whose key is "Mary", delete that tuple. Then insert the tuple ("Mary", 8).

This operation is often called an upsert ("update or insert"). It will amount to a simple insertion if there was no tuple to be deleted.

It is sometimes convenient (e.g., in the compiler's error messages) to use the term "delta" to denote one of the three prefixes introduced above.

Deletion of tuples from a functional predicate

To delete a tuple from a functional predicate you must provide only the keys. For example, to delete information about Mary's age in our running example, we write just

-age["Mary"] = _ . 

An attempt to provide the value would be treated as an error, regardless of whether the value is correct or not. This may be a little surprising at first, but is in fact both logical and convenient.

Deletion of tuples from an entity predicate

Just like insertion, deletion of tuples from an entity predicate must be performed via its associated constructor predicate or refmode predicate.

Example 19.2. Deletion from an entity predicate with a constructor

create --unique

addblock <doc>
  person(p) -> .
  person_by_name[name] = p -> string(name), person(p).
  lang:constructor(`person_by_name).

  person(p), person_by_name[nm] = p <- name(nm).

  name(nm) -> string(nm).
</doc>

exec <doc>
  +name("Jay").
  +name("Jane").
</doc>

echo person:
print person
echo person_by_name:
print person_by_name

exec <doc>
  -name("Jay").
</doc>

echo person:
print person
echo person_by_name:
print person_by_name

close --destroy

The result is:

created workspace 'unique_workspace_2016-06-29-21-40-59'
added block 'block_1Z1C3B7J'
person:
[10000000004]
[10000000005]
person_by_name:
"Jane" [10000000005]
"Jay"  [10000000004]
person:
[10000000005]
person_by_name:
"Jane" [10000000005]
deleted workspace 'unique_workspace_2016-06-29-21-40-59' 

Example 19.3. Deletion from an entity predicate with a refmode

In the example below, -person_has_name(_ : "Jay"). can also be written in the form -person_has_name[_] = "Jay".

create --unique

addblock <doc>
  person(p), person_has_name(p : nm) -> string(nm).
</doc>

exec <doc>
  +person(p), +person_has_name(p : "Jay").
  +person(p), +person_has_name(p : "Jane").
</doc>

echo person:
print person
echo person_has_name:
print person_has_name

exec <doc>
  -person_has_name(_ : "Jay").
</doc>

echo person:
print person
echo person_has_name:
print person_has_name

close --destroy

The result is:

created workspace 'unique_workspace_2016-06-29-21-55-07'
added block 'block_1Z1C3AAW'
person:
[10000000004] "Jay"
[10000000005] "Jane"
person_has_name:
[10000000004] "Jay"  "Jay"
[10000000005] "Jane" "Jane"
person:
[10000000005] "Jane"
person_has_name:
[10000000005] "Jane" "Jane"
deleted workspace 'unique_workspace_2016-06-29-21-55-07'

19.2.2. Delta rules

EDB predicates can also be manipulated by delta rules. A delta rule is similar to an IDB rule, but each of the head atoms is prefixed with a delta, i.e., +, - or ^ (the latter only in the case of functional predicates).

The meaning of these prefixes is as described in Section 19.2.1: a head atom in a delta rule generates only insertions, only deletions, or upserts.

The prefixes can also be used for some or all of the atoms in the body. It is convenient to think of +p as a predicate that contains all the tuples that were requested to be inserted into predicate p by the current transaction; -p would contain all the tuples that were requested to be deleted from p by the transaction. A body atom such as ^f[x]=y is treated as equivalent to +f[x]=y. See Section 19.4 for more details. (Please note the difference between a request for an insertion or deletion and an actual insertion or deletion. A request for an insertion will not result in an insertion if the tuple is already present in the database; a request for deletion will not result in a deletion if the tuple is not present in the database.)

Note

  • The above is just an approximation of the truth. If the name of a delta atom in a body refers to an entity predicate, a refmode predicate or a constructor predicate, then not every request for insertion will be accessible through that atom: you will not see requests that would result in the creation/addition of an entity that already exists.

  • The runtime system does actually construct so-called delta predicates to keep information about the requests for insertions and deletions. The internal names of these predicates are somewhat different from +p or -p. These delta predicates are pulse predicates. See Section 19.3.1 and the section called “Stage suffixes in auxiliary internal predicates”.

  • If you are thinking about using delta atoms in the body of a rule, you might want to consider whether external diff predicates would not be more appropriate for your application. (See Section 8.11.)

Such delta atoms are also allowed in the body of an IDB rule, but only within transaction-lifetime or query-lifetime logic (e.g., within a block that is executed by exec or query). The IDB rule must derive into a local predicate. (See Section 19.1.5 and Section 8.10.)
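For example, here is a minimal sketch of such a rule inside an exec block (p is assumed to be an EDB predicate declared earlier; _seen is local to the block):

exec <doc>
  // an IDB rule with a delta atom in its body: allowed here because the
  // logic has transaction lifetime and _seen is a local predicate
  _seen(x) <- +p(x).
</doc>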

Example 19.4. Simple delta rules

p(x) -> int(x).
q(x) -> int(x).
r(x) -> int(x).

+q(x) <- +p(x).
+r(x) <- -p(x). 

If the rules above are the only rules for q and r, then q will contain the set of all the integers that have ever been inserted into p, while r will contain the set of all the integers that have ever been requested to be deleted from p. (The word ever should be interpreted as since q and r have been declared.)

(If p is an IDB predicate, then q will include all that has been added to p after q has been declared, including explicitly declared facts of p. r will be empty, of course.)

Installed delta rules

A delta rule can appear in a transaction-lifetime or query-lifetime block (e.g., if the block is an argument to exec or query). A delta rule can also be installed as database-lifetime logic (e.g., when it appears in a block that is an argument to addblock).

There is an important requirement that must be satisfied by every database-lifetime delta rule: its body must contain at least one delta atom (we say that the rule is guarded by that delta atom).

The rationale for this is as follows: a rule such as

+p(x) <- q(x). 

actually derives into the delta predicate +p, which is a pulse predicate (see Section 19.3.1). At the end of a transaction all pulse predicates must be cleared of all contents, which would be inconsistent with the rule if q were not empty. By introducing a "guard" (such as +r(x) in the rule below) we ensure that, as pulse predicates are made empty, the guard will not hold, so the rule will derive nothing and consistency will be preserved.

+p(x) <- q(x), +r(x). 

Example 19.5. Installed delta rules must be guarded

addblock <doc>

p(x) -> int(x).
q(x) -> int(x).
r(x) -> int(x).
s(x) -> int(x).

+p(x) <- q(x), (r(x) ; +s(x)).

</doc> 

The delta rule triggers the following error message:

error: Installed delta rules must be guarded by a delta or pulse predicate in the body of the rule. (code: DELTA_UNGUARDED)
    +p(x) <- q(x), (r(x) ; +s(x)).
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 

This is, of course, because a rule with a disjunction is equivalent to several rules (see Example 10.18).

To satisfy the requirement, we must rewrite the rule into one of the following forms:

+p(x) <- q(x), (+r(x) ; +s(x)).

+p(x) <- +q(x), (r(x) ; s(x)).

+p(x) <- +q(x), (+r(x) ; +s(x)).

+p(x) <- +q(x), (r(x) ; +s(x)).

+p(x) <- +q(x), (+r(x) ; s(x)). 

19.2.3. Insertions vs. deletions

If insertions and deletions to the same predicate are carried out at the same stage of a transaction (Section 19.4), then the deletions are always done first.

More precisely: information about insertions and deletions is gathered and compared in advance, and deletions that would be "undone" by insertions are not carried out (without even paying the cost of accessing the contents of the predicate to see whether the relevant tuples are there).

So for example, as far as the effect on predicate q is concerned, the pair of rules

+q(x)  <- +p(x).
-q(x)  <- +p(x). 

has exactly the same effect as any one of the following rules:

+q(x) <- +p(x).
+q(x), -q(x) <- +p(x).
-q(x), +q(x) <- +p(x). 

The overall effect is not exactly the same, however: the requests for deletion are still noted, and may affect the results of any rule that has -q(x) in its body.

Example 19.6. Insertions "take precedence"

What will be the printout produced by the following?

create --unique

//------------------
addblock <doc>

p(x) -> int(x).

</doc>

//------------------
exec <doc>

+p(x) <- int:range(0, 5, 1, x).
-p(x) <- int:range(0, 5, 2, x).

</doc>

print p

close --destroy 

One might expect p to contain only the odd integers between 0 and 5, but it will in fact contain all the integers between 0 and 5.
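If the intention was to end up with only the odd integers, the deletions must be requested in a separate transaction. Since each lb command is by default a separate transaction, it suffices to split the block in two, as in this sketch:

exec <doc>
  +p(x) <- int:range(0, 5, 1, x).
</doc>

exec <doc>
  -p(x) <- int:range(0, 5, 2, x).
</doc>

After the second transaction, p contains only 1, 3 and 5.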

19.2.4. Delta logic is more imperative than logical

It must be noted that "delta logic" is a bit of a misnomer. Unlike IDB rules, delta rules have no obvious logical interpretation, at least not a static one, as their effect may very much depend on the particulars of the various changes applied to the contents of the database, rather than on their overall effect.

For example, the presence of an (active) IDB rule such as

p(x) <- q(x).

ensures that the contents of predicate p will always include the contents of predicate q. However, no such conclusion can be drawn from the delta rule

+p(x) <- +q(x).

After all, the user is free to delete some elements from p without removing them from q.

Example 19.7. The difference between IDB rules and EDB (delta) rules

The following file has a mixture of IDB predicates and EDB predicates. In this example, once we declare set1 and set2, the IDB rules provide enough information about the types of the IDB predicates, so we do not have to declare them explicitly. For EDB predicates and delta rules such type inference is not currently supported, so union must be declared.

The EDB predicates set1 and set2 are populated by the first exec command. Recall that these delta rules could not have been installed by the first block, because they are not guarded by delta atoms in the body (see the section called “Installed delta rules”).

The second exec command modifies set1 and set2 via explicit deletions and insertions.

create --unique

//------------------
addblock <doc>

set1(x)  -> int(x).  // intended to be EDB
set2(x)  -> int(x).  // intended to be EDB
union(x) -> int(x).  // intended to be EDB

+union(x) <- +set1(x) ; +set2(x).       // EDB

intersection(x) <- set1(x), set2(x).    // IDB

symdiff(x) <- set1(x), ! set2(x) ; set2(x), ! set1(x).  // IDB

</doc>

//------------------
exec <doc>  // populate set1 and set2

+set1(x) <- int:range(0, 5, 1, x).      // yes, EDB
+set2(x) <- int:range(3, 7, 1, x).      // yes, EDB

</doc>

//------------------
exec <doc>  // modify set1 and set2

-set1(0).
-set1(5).
-set2(5).
+set2(10).

</doc>

//------------------
echo SET1:
print set1
echo SET2:
print set2
echo UNION:
print union
echo INTERSECTION:
print intersection
echo SYMDIFF:
print symdiff

close --destroy 

Execution results in the following printout:

created workspace 'unique_workspace_2016-02-11-20-20-25'
added block 'block_1Z38MVCB'
SET1:
1
2
3
4
SET2:
3
4
6
7
10
UNION:
0
1
2
3
4
5
6
7
10
INTERSECTION:
3
4
SYMDIFF:
1
2
6
7
10
deleted workspace 'unique_workspace_2016-02-11-20-20-25' 

Notice that the rules for intersection and symdiff cause all the changes in set1 and set2 to be correctly tracked. However, the delta rule for union is not fired by deletions, so the predicate contains what we would consider obsolete values. If we wanted union to be an EDB predicate, and yet to always contain the union of the two sets, we would have to add the following two delta rules:

-union(x) <- -set1(x), ! set2(x).    // track deletions
-union(x) <- -set2(x), ! set1(x).    // track deletions

This illustrates quite clearly the advantages of declarative IDB logic. However, there are situations where EDB logic cannot be dispensed with, as already shown in Example 19.4, where we introduced predicates that log all the changes made to another predicate.

Example 19.8. A simple inactive block

The following example illustrates the use of inactive blocks (see Section 19.1.6). It is a simple variant of Example 19.7.

Suppose that we have two predicates, set1 and set2, that will be modified heavily during a long computation. After that these predicates will remain unchanged, and the rest of the computation will access a predicate that contains their symmetric difference.

In order to avoid unnecessary recomputation of the symmetric difference whenever the sets are modified, we might choose to make symdiff an EDB predicate, and populate it only when it is finally needed. This could be accomplished by declaring the logic that computes symdiff as an inactive block, and activating the block at the right moment. The listing below shows the pattern.

create --unique

//---------------
addblock <doc>

set1(x)    -> int(x).
set2(x)    -> int(x).
symdiff(x) -> int(x).  // symmetric difference

</doc>

//---------------
addblock --name SymDiff --inactive <doc>

+symdiff(x) <- set1(x), ! set2(x) ; set2(x), ! set1(x).

</doc>

//---------------
exec <doc>
+set1(x) <- int:range(0, 5, 1, x).
+set2(x) <- int:range(3, 7, 1, x).
</doc>

echo SYMDIFF 1:
print symdiff

execblock SymDiff    // <----------

echo SYMDIFF 2:
print symdiff

close --destroy 

Execution will yield the result shown below. We see that symdiff will become populated only after the inactive block is activated (by execblock).

created workspace 'unique_workspace_2016-02-15-04-22-51'
added block 'block_1Z38MUEE'
added block 'SymDiff'
SYMDIFF 1:
SYMDIFF 2:
0
1
2
6
7
deleted workspace 'unique_workspace_2016-02-15-04-22-51' 

19.3. Events

19.3.1. Pulse predicates

A pulse predicate is a special kind of EDB predicate whose contents always have transaction lifetime, even though the predicate itself may have database lifetime. In other words, the predicate will always be made empty at the end of a transaction.

Pulse predicates are often used to propagate information about events, and are sometimes referred to as event logic.

For example, a GUI may interact with the LogicBlox database by inserting a tuple into a pulse predicate whenever the user clicks on a button. This may trigger evaluation of some rules, which in turn may use pulse predicates to trigger other computations.
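As a minimal sketch (the predicates button_click and last_clicked are hypothetical, and we assume one click event per transaction), such event logic might look as follows:

button_click(b) -> int(b).
lang:pulse(`button_click).

last_clicked[] = b -> int(b).

// Installed delta rule: record the most recent click.
// button_click is emptied automatically at transaction end,
// so no explicit deletions are needed.
^last_clicked[] = b <- +button_click(b).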

To create a pulse predicate pp one must declare the type of pp in the usual fashion, and then add

lang:pulse(`pp). 

or

lang:isPulse[`pp] = true. 

A pulse predicate pp can usually be accessed only via delta atoms of the form +pp(...) (where the three dots represent arguments, if any). An exception is made for event rules (see Section 19.3.2).

As an EDB predicate, a pulse predicate can have its values changed only through delta operations. Moreover, since it is empty prior to the current transaction, there is seldom a need to delete tuples. If pp is a pulse predicate, then the presence of a body atom of the form -pp(...) is probably a programming error.

Internal use of pulse predicates

In Section 19.2.2 we mentioned "delta predicates" that are created and maintained by the system to keep information about requests for insertions and deletions. The delta predicates are pulse predicates. (See also the section called “Stage suffixes in auxiliary internal predicates”.)

The runtime system also uses a pulse predicate to support checking of constraints. A constraint such as

A(x) -> x > 2. 

is translated to a rule that looks, roughly, like this:

system:constraint_fail(...) <- A(x), !(x > 2), ... . 

A constraint failure can now be detected during the general process of evaluating rules. (See Section 19.4.1.)

Example 19.9. Diagnostics for a constraint failure

Consider the following example:

create --unique

addblock <doc>

  age[name] = years -> string(name), int(years).
  age[_] = years -> 0 <= years < 150.

</doc>

exec <doc>

  +age["John"] = 151.

</doc>

close --destroy 

The result of running this is an error with a diagnostic message that begins like this:

Error: Constraint failure(s):
block_1Z38I0AT:2(1)--2(34):
    false <-
      Exists __a::string,__b::int,__c::int,years::int .
         age[__a]=years,
         !(
            Exists ___b3::int .
               int:le_2(__b,___b3),
               int:eq_2(years,___b3),
               int:lt_2(___b3,__c)
         ),
         int:eq_2(__b,#0#),
         int:eq_2(__c,#150#).
(1) __a="John",__b=0,__c=150,years=151 

This is a somewhat user-friendly presentation of the kind of rule that was described above. Since the user is not interested in the pulse predicate system:constraint_fail, it is presented as false.

The variables whose names begin with double underscores are generated internally by the system. Information about the types of variables is provided after the double colons, and the implicit quantifiers are shown.

int:le_2 etc. are standard LogiQL comparison operations which are normally written as the infix operators <= etc. (see Section 10.2).

Literal values appear between pound marks (#).

The entire formula can thus be read as

false <-
  Exists a, b, c, years such that
     age[a] = years,
     !(
        Exists b3 such that
           b <= b3,
           years = b3,
           b3 < c
      ),
     b = 0,
     c = 150. 

After additional obvious simplification we get

false <-
     age[a] = years,
     ! (0 <= years, years < 150). 

which is quite close to the appearance of the original constraint.

19.3.2. Event Rules

Event rules provide the programmer with a means of defining delta rules without having to use explicit delta operators. Event rules are closely tied to pulse predicates, as every event rule contains some reference to pulse predicates.

A rule is considered to be an event rule if all of the following conditions hold:

  • the enclosing block has transaction lifetime;
  • the rule contains no delta atoms;
  • all atoms in the head of the rule refer to pulse predicates;
  • at least one atom in the body of the rule refers to a pulse predicate.

Example 19.10. An event rule

If we execute the following, the output will be 5.

create --unique

//--------------
addblock <doc>

p(x) -> int(x).
q(x) -> int(x).

pp(x) -> int(x).
lang:pulse(`pp).

pq(x) -> int(x).
lang:pulse(`pq).

p(x) <- int:range(0, 5, 1, x).

+q(x) <- +pp(x).

</doc>

//--------------
exec <doc>

pp(x) <-  p(x), pq(x).   // <----- an event rule

+pq(5). +pq(6).

</doc>

//--------------
print q

close --destroy 

19.4. Stages

We are now ready to take a closer look at the anatomy of a transaction and how it affects the way we write LogiQL code.

After introducing the important notion of "maintenance" we look at the stages of a transaction, introduce "stage suffixes", and briefly mention a few topics that are more or less closely tied to stages, stage suffixes and delta logic.

19.4.1. Maintenance

Predicates may depend on each other. For example, if a rule refers to p in the head, and to q and r in the body, then p depends on q and r: when the contents of one (or both) of the latter predicates are changed, the contents of p may also have to be changed. One can also consider this rule as dependent on any rule that derives into q and r.

Information about such dependencies (plus some auxiliary information) is expressed in internal data structures maintained by the system. The structures are known as execution graphs, because they allow the runtime system to execute (evaluate) rules more efficiently and correctly:

  • there is no need to evaluate rules whose bodies do not refer to predicates that have been recently updated;
  • it is better to evaluate rules that are depended on (the rules for q and r in our example) before evaluating a rule that depends on them (our rule for p).

The dependency graph is not, in general, acyclic, so updates may have to be performed repeatedly until there are no more changes to be made. The resulting stable state of the database is colloquially referred to as the fixpoint.

The term maintenance refers to the process of:

  1. updating the execution graph to reflect additions and removals of logic;
  2. using the updated graph as a guide in the evaluation of rules, until a fixpoint is reached.

In other words, maintenance ensures that the contents of all predicates are consistent with the contents of the other predicates and with all the rules that are currently active.
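For example (a minimal sketch; edge and path are hypothetical predicates), the classic transitive-closure rules form a cycle in the execution graph, so maintenance evaluates them repeatedly, each round possibly deriving new tuples, until a fixpoint is reached:

edge(x, y) -> int(x), int(y).
path(x, y) -> int(x), int(y).

path(x, y) <- edge(x, y).
path(x, z) <- path(x, y), edge(y, z).  // path depends on itself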

19.4.2. The Six Stages

The execution of a transaction consists of several successive stages. The figure and table below provide a quick overview; additional details are given in the text that follows.

It should be noted that:

  • each stage (except for START and transaction setup) involves a round of maintenance (and has its own execution graph);
  • only database-lifetime predicates (and the execution graph of stage FINAL) persist across transactions.

Transaction setup

Initialization of datetime:now[] and transaction:id[]. Their values will not change throughout the transaction: all uses of datetime:now[] will result in the same value, even though the system time progresses during the transaction. (See the section called “datetime:now” and the section called “transaction:id[]”.) A small illustration follows the table.

START: the preamble

Addition/activation and removal of logic from the workspace, according to the requests (e.g., lb commands) made for this transaction:

  • Addition of:
    • transaction-lifetime, database-lifetime and query-lifetime blocks;
    • inactive blocks.
  • Activation of selected inactive blocks.
  • Removal of database-lifetime blocks.
INITIAL: evaluation of initial logic

Evaluation of transaction-lifetime logic; the transaction makes changes to EDB predicates, and to transaction-lifetime local predicates. (See Section 19.1.5.)

FINAL: maintenance of installed logic

Maintenance of database-lifetime logic, which involves evaluation of installed rules. This ensures that:

  • IDB predicates are kept up-to-date with the changes made by the transaction;
  • database-lifetime delta rules are applied to the affected EDB predicates.
QUERY: evaluation of query logic

Evaluation of query-lifetime logic. This is quite similar to stage INITIAL, except that

  • logic at stage QUERY has access to the effects of stage FINAL;
  • all the effects of stage QUERY (in particular: changes made to predicates) will disappear when the stage terminates.

It is worth noting that an error in stage QUERY is treated as an error in the transaction: the transaction will abort. So this stage can be used for additional checking of consistency (in a way that is not expressed by the installed constraints).

Cleanup and teardown

The purpose of this stage is to ensure that the system is ready for the next transaction:

  • Removal of transaction-lifetime blocks and predicates.
  • Removal of the contents of pulse predicates (followed by an additional round of maintenance on the execution graph of stage FINAL, in order to reset it to a state corresponding to empty pulse predicates and to ensure that no further changes to non-pulse predicates occur). (See Section 19.3.1.)
  • Removal of various kinds of auxiliary data, such as information about the contents of predicates at various stages. (See Section 19.4.4.)
  • Various internal operations that set up the database for the next transaction.
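To illustrate the fixed value of datetime:now[] (a minimal sketch; now1 and now2 are hypothetical predicates), both facts below receive the same timestamp, regardless of how much system time elapses between the two insertions:

addblock <doc>
  now1[] = t -> datetime(t).
  now2[] = t -> datetime(t).
</doc>

exec <doc>
  +now1[] = datetime:now[].
  +now2[] = datetime:now[].
</doc>

print now1   // prints the same value as...
print now2   // ...this one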

19.4.3. Order of execution

If a transaction executes multiple inactive blocks or transaction-lifetime blocks at stage INITIAL, then the blocks are not necessarily executed in the order in which they have been written. The rules from all the blocks are combined, and an attempt is made to sort them according to their dependencies, in order to make evaluation more efficient; however, cyclic dependencies between rules, even from different blocks, are allowed and cause no problems.

Example 19.11. Execution of multiple blocks

The following example demonstrates that execution of the blocks is not sequential, and that cyclic dependencies between blocks are supported. The example prints 5, which is only possible if the blocks are not executed sequentially.

create --unique

addblock <doc>
  p(x) -> int(x).
  t1(x) -> int(x).
  t2(x) -> int(x).
  lang:pulse(`t1).
  lang:pulse(`t2).
</doc>

//---------------------------------
transaction     // start a new transaction

exec <doc>
  +t2(x) <- +t1(x).
</doc>

exec <doc>
  +t1(5).
  +p(x) <- +t2(x).
</doc>

commit          // commit and end the transaction
//---------------------------------

print p

close --destroy 

Notice that we enclosed two blocks (added by exec commands) within one transaction. Had we relied on the default behaviour (one transaction per block), p would have been empty.

19.4.4. Stage suffixes

LogicBlox allows logic to refer to the state of a predicate during an earlier (or current) stage of a transaction by means of a stage suffix of the form @stage.

Note

The same notation (i.e., extending a predicate name with @name) is used also for referencing predicates from different branches: see Section 41.3.

These stage suffixes have very different meanings for "normal" atoms and for delta atoms. For a unary predicate p:

p@previous(x) or p@prev(x)

refers to the contents of p just before the transaction began.

(The perceptive reader will notice that this appears as @start in the diagram above. That is also the name used internally by the system, so it might appear in some error messages.)

p@initial(x) or p@init(x)

refers to the contents of p just after stage INITIAL, i.e., after evaluating the transaction-lifetime rules that make changes to EDB predicates, but before installed (i.e., database-lifetime) rules are evaluated.

p@final(x)

refers to the contents of p after stage FINAL, i.e., after installed logic rules have been evaluated and a fixpoint has been reached. Since a stage tag cannot refer to a stage that is executed later than the logic in which the tag occurs, @final is only valid in installed rules: it must not be used in a transaction-lifetime rule.

p(x)

an atom without a stage suffix is interpreted as referring to the contents of the predicate after stage FINAL.

+p@initial(x) or +p@init(x)

refers to the insert requests for p in stage INITIAL.

+p@final(x)

refers to the insert requests for p in stage FINAL.

+p(x)

refers to the insert requests for p in this transaction so far:

  • +p@initial(x) in stage INITIAL;
  • (+p@initial(x); +p@final(x)) in stage FINAL.

Example 19.12. Stage suffixes

The script below sets up a transaction (marked by "======= A"), in which f@previous = {"start"}, there is an initial delta +f("initial"), and a final delta +f("final").

create --unique

addblock <doc>
   f(s)           -> string(s).
   insert_to_f(s) -> string(s).

   // This installed rule will create a @final delta:
   +f(s) <- +insert_to_f(s).
</doc>

exec <doc>
   +f("start").
</doc>

echo "======= A"
exec <doc>
   // This will create an @initial delta:
   +f("initial").

   // This will create a @final delta, via the installed rule:
   +insert_to_f("final").
</doc>

close --destroy 

In an installed logic rule, in the transaction marked "======= A", the predicates f, f@previous, etc. would have the following contents:

Stage tag Predicate Deltas
@previous

f@previous = {"start"}

@initial

f@initial = {"start", "initial"}

+f@initial = {"initial"}

@final

f@final = {"start", "initial", "final"}

+f@final = {"final"}

(no stage tag)

f = {"start", "initial", "final"}

+f = {"initial", "final"}

Example 19.13.  Using stage suffixes to obtain information about tuples to be deleted

In the following script we create a predicate in the first transaction, and populate it in the second one.

In the third transaction we want to update the age of Mary and delete all information about people whose age is 6. Unfortunately, we no longer remember who those people are. We can query the database, and the right way to do this is by using a stage suffix, as shown below.

create --unique

addblock <doc>
   age[name] = years -> string(name), int(years).
</doc>

exec <doc>
   +age["Mary"] = 7.
   +age["John"] = 6.
   +age["Jim"]  = 10.
</doc>

print age
echo ------

exec <doc>
   ^age["Mary"] = 8.

   -age[person] = _ <- age@prev[person] = 6.  // <<<<<<<<<<<<<<<
</doc>

print age
echo ------

close --destroy 

The resulting printout looks like this:

created workspace 'unique_workspace_2016-06-24-21-00-21'
added block 'block_1Z1C3A9I'
"Jim"  10
"John" 6
"Mary" 7
------
"Jim"  10
"Mary" 8
------
deleted workspace 'unique_workspace_2016-06-24-21-00-21' 

If we want to delete just "Jim", we do not have to remember his age: it is enough to write

-age["Jim"] = _.

Indeed, an attempt to specify the age

-age["Jim"] = 10 

would result in an error message.

Stage suffixes in auxiliary internal predicates

(This subsection mentions details of the current implementation of the runtime system. We include it here, because the user will sometimes be exposed to these details in error messages.)

In Section 19.2.2 we mentioned "delta predicates", i.e., pulse predicates (Section 19.3.1) that are created and maintained by the system to keep information about requests for insertions and deletions.

In the internal representation of a delta predicate its name refers explicitly to its stage. For example, an atom such as +f[x]=y that appears in a transaction-lifetime rule (which will be evaluated at stage INITIAL) will be represented as f$delta_initial_insert[x]=y.

Such renaming is internally applied also to LogiQL rules which use a combination of delta syntax and stage suffixes. For example, the atom +f@final[x]=y in a query-lifetime rule will be represented as f$delta_final_insert[x]=y.

19.4.5. Ghost Entity Check

A ghost entity is an entity-typed value that occurs in a predicate, but not in the predicate that enumerates values of that entity type. In the following, entity [1] is a ghost entity.

Table 19.1. An example of a ghost entity

Entity band      Predicate nameOf
[0]              ([0], "The National")
[2]              ([1], "The Mountain Goats")
                 ([2], "Belle and Sebastian")

In LogicBlox 3.x, databases may contain ghost entities, and developers are responsible for programming accordingly. LogicBlox 4 includes a compile-time check that issues an error for rules that may lead to the introduction of ghost entities.

It is only necessary to check active delta rules; all other rules are safe. A ghost entity is created when an entity-typed value is inserted by the head of a delta rule, but nothing in the body ensures that this particular value will still exist at transaction end. For example, each of the following rules may lead to the creation of ghost entities and is flagged with an error.

// Error
+nameOf[b] = "foo" <- -band(b).

// Error
+nameOf[b] = "foo" <- -myFavoriteBands(b).

// Error
+nameOf[b] = "foo" <- band@initial(b), +someEvent(_).

In most cases, a rule can be made safe by extending its body with an atom ensuring that the added entities are not ghosts. For instance, the following rules are safe:

+nameOf[b] = "foo" <- -myFavoriteBands(b), band(b).

+nameOf[b] = "foo" <- band@initial(b), +someEvent(_), band(b).

+nameOf[b] = "foo" <- +band(b).

The rule

+nameOf[b] = "foo" <- -band(b), band(b).

is safe too, but will never fire: -band(b) requests the deletion of b, while the unsuffixed atom band(b) requires b to still exist at the end of the transaction.

19.4.6. Frame rules

The LogicBlox system uses frame rules to declaratively specify how changes to the state of the database are handled at different stages of a transaction.

We illustrate the principal ideas by means of a simple example, then list the details. While frame rules are internally generated rules that the user does not write, they can appear in error messages and must be taken into account when analyzing performance, so it is useful to have a basic understanding of how they work.

An example

Consider the following trivial lb script:

create W

addblock --name B <doc>
  age[x] = y -> string(x), int(y).
</doc>

exec <doc>
  +age["Ann"] = 7.
  +age["Bob"] = 6.
</doc>

close --destroy

The delta rule +age["Ann"] = 7 is transformed into a more detailed internal representation that looks like this:

Forall __t0::string,__t1::int .
   age$delta_initial_insert{block_1Z1CY2UJ:1(1)--1(15)#1Z1CZI9D}#0[__t0]=__t1 <-
      string:eq_2(__t0,"Ann"), int:eq_2(__t1,7).

Such rules can be read only with some difficulty, so henceforth we will make them more palatable by editing out quantifiers, type information, redundant parentheses and identifiers of internal blocks. For good measure we will also rename variables, and omit rules whose only function is to consolidate rules that are almost identical, but have different information about blocks encoded in the names of their head atoms. The above rule is the first one shown below: it might be instructive to carefully compare the two versions.

age$delta_initial_insert[x] = y <- string:eq_2(x, "Ann"), int:eq_2(y, 7).
age$delta_initial_insert[x] = y <- string:eq_2(x, "Bob"), int:eq_2(y, 6).

The p$delta_initial_insert predicate is an internal predicate that exists for every EDB predicate p. It is used to collect requests to insert facts into the predicate. The delta rule that derives facts into age$delta_initial_insert does not immediately cause these facts to be in the actual predicate age. Similarly, there is a predicate age$delta_initial_erase that contains the requests for deleting facts.

The requested changes in age$delta_initial_insert and age$delta_initial_erase are applied to the actual predicate by a frame rule. This happens separately for every stage where changes can be requested (INITIAL and FINAL). For stage INITIAL, the frame rule for age must take the predicate age@START, consider the delta requests, and create the predicate age@INITIAL by applying the appropriate changes to a (logical) copy of age@START. The frame rule is shown below, in the form of a LogiQL rule that derives into an auxiliary predicate whose contents are the changed tuples, each annotated with an insert/erase tag. In reality the rule has an internal implementation that cannot currently be expressed in logic.

age@START..INITIAL[x, delta] = y <-
    age$delta_initial_insert[x] = y,
    delta:eq_2(delta, DELTA_INSERT)
  ; age$delta_initial_erase(x),
    age@START[x] = y,
    delta:eq_2(delta, DELTA_ERASE).

Let us now extend our example by adding one more simple transaction at the end:

exec <doc>
   ^age["Ann"] = 8.
   -age["Bob"] = _.
</doc>

The upsert and the deletion are translated to

age$delta_initial_upsert[x] = y <- string:eq_2(x, "Ann"), int:eq_2(y, 8).

age$delta_initial_erase(x) <- string:eq_2(x, "Bob").

Since there is an upsert, we also get an upsert-erase rule and an upsert-insert rule. The first rule states that if there is an upsert for x at stage INITIAL, and there is a previous value for x (i.e., at stage START), then the latter should be erased at stage INITIAL. The second rule states that the upserted value should be inserted at stage INITIAL.

age$delta_initial_erase(x) <-
   age$delta_initial_upsert[x] = _,
   age@START[x] = _.

age$delta_initial_insert[x] = y <-
   age$delta_initial_upsert[x] = y.

Details

The name of a delta predicate is of the form name$delta_stage_kind, where

  • stage is initial, final or all;

  • kind is insert, erase or upsert.

Additionally, for pulse predicates there are delta predicates of the form name$delta_query_insert.

Stage all corresponds to the combination of stages INITIAL and FINAL. Predicates with this stage are defined by rules such as

f$delta_all_insert[x] = y <-
    f$delta_initial_insert[x] = y
  ; f$delta_final_insert[x] = y.

f$delta_all_erase(x) <-
    f$delta_initial_erase(x)
  ; f$delta_final_erase(x).

If f is a pulse predicate, then the following rule is used at stage QUERY:

f$delta_all_insert[x] = y <-
    f$delta_initial_insert[x] = y
  ; f$delta_final_insert[x]   = y
  ; f$delta_query_insert[x]   = y.

If there is an f$delta_final_upsert predicate, then the following rules are generated:

f$delta_final_insert[x] = y <- f$delta_final_upsert[x] = y.

f$delta_final_erase(x) <- f$delta_final_upsert[x] = _, f@initial[x] = _.

(And similarly for stage INITIAL, as shown in our example above.)

If there are rule(s) with head atoms of the form f$delta_stage_kind[x] = y, the following frame rules are generated to apply the deltas:

  • At stage INITIAL:

    f@START..INITIAL[x, delta] = y  <-
         f$delta_initial_insert[x] = y, delta = DELTA_INSERT
       ; f$delta_initial_erase(x), f@START[x] = y, delta = DELTA_ERASE.

  • At stage FINAL:

    f@INITIAL..FINAL[x, delta] = y <-
         f$delta_final_insert[x] = y, delta = DELTA_INSERT
       ; f$delta_final_erase(x), f@INITIAL[x] = y, delta = DELTA_ERASE.

The above patterns were shown in their simplest versions, where they are applied to a functional predicate with one key argument. If a functional predicate has more key arguments, then these are all used where appropriate, in particular in predicates that end with _erase. Non-functional predicates are treated similarly, except that there will be no upserts and related rules (the ..._erase predicates will contain all the arguments).

The separation between requests for changes and application of the changes has several implications:

  • If a fact is derived into p$delta_initial_insert when it already exists in p, then the delta_initial_insert predicate will contain the new fact, but the actual predicate p will not change (i.e., it is not an error to insert an already existing fact). This means that the author of delta logic that is triggered by +age must be cautious about such re-assertions. For example, +age[x] = y, !age@prev[x] = y may in some cases be required.

  • Similarly, it is not an error to retract a fact that does not exist. This means that the author of delta logic that is triggered by -age should likewise check that the fact requested for retraction actually existed (see the sketch after this list).

  • While derivation of tuples into delta_initial_insert may succeed, the application of the deltas can still fail. For example, if the current age of Ann is 7, and we attempt to change her age to 8 by using +age["Ann"] = 8 rather than ^age["Ann"] = 8, then when the frame rule is evaluated a functional dependency violation will be reported. This is why a functional dependency violation error points at a frame rule as the source of the error.
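As a sketch of guards addressing the first two points (log_change is a hypothetical pulse predicate, and we assume that a deletion atom for a functional predicate may appear in a rule body in the same form as in the head of Example 19.13):

log_change(x) -> string(x).
lang:pulse(`log_change).

// fire only for insertions that actually change age
+log_change(x) <- +age[x] = y, !age@prev[x] = y.

// fire only for retractions of facts that actually existed
+log_change(x) <- -age[x] = _, age@prev[x] = _.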

Since delta predicates are stage-specific, and since frame rules are used to apply the change requests at both stage INITIAL and stage FINAL, it is possible to make changes twice in a single transaction, and these changes are allowed to be conflicting. For example, the following logic ensures that the author of a document gets an appropriate bonus, no matter what bonus was requested.

create W --overwrite

addblock --name B <doc>
  bonus[x] = y -> string(x), int(y).
  author(x)    -> string(x).

  ^bonus[x] = y * 2 <- +bonus@initial[x] = y, author(x).
</doc>

exec <doc>
  +author("John").
  ^bonus["John"] = 50.
</doc>

print bonus

close --destroy

The existence of the delta_all predicate makes this feature a bit difficult to use. The predicate is lazily generated when needed, and if it is generated, then the example above causes a functional dependency violation on delta_all. This happens, for instance, if we add the following to the contents of addblock:

salary[x]  = y -> string(x), int(y).
payment[x] = y -> string(x), int(y).

^payment[x] = salary[x] + z <- +bonus[x] = z.

We plan to remove delta_all in the near future and introduce a more convenient and efficient way to use all the actual changes made to a predicate.

Chapter 20. Hierarchical Syntax

Hierarchical syntax is a notational convenience ("syntactic sugar") that makes it simpler to write formulas and expressions that involve hierarchically structured data. To a first approximation, you can think of hierarchical syntax as a mechanism to avoid writing the same arguments to predicates over and over again. Its use is best illustrated through examples.

Imagine you are inserting information about a new person into the workspace. Without hierarchical syntax you might write some code that looks like the following:

+person(p), +firstname[p]="John", +lastname[p]="Doe",
+street[p]="1384 West Peachtree Street", +city[p]="Atlanta"

If you instead use hierarchical syntax, you can write the same thing, but avoid having to repeatedly use p everywhere:

+person(p) {
  +firstname("John"),
  +lastname("Doe"),
  +street("1384 West Peachtree Street"),
  +city("Atlanta")
}

Based upon the arguments provided and the type of p, the compiler figures out that it must insert p as the first argument of all the atoms between the braces. Internally, the compiler then desugars the hierarchical version of this example to exactly the same logic as the non-hierarchical version. Alternatively, if you prefer to emphasize the functional nature of the predicates, you can write the example as:

+person(p) {
  +firstname[]="John",
  +lastname[]="Doe",
  +street[]="1384 West Peachtree Street",
  +city[]="Atlanta"
}

Furthermore, p is used only once so you can replace it with an underscore:

+person(_) {
  +firstname[]="John",
  +lastname[]="Doe",
  +street[]="1384 West Peachtree Street",
  +city[]="Atlanta"
}

Using an underscore in such situations avoids the need for unique names when creating multiple members of some entity type with their associated data. For example, without hierarchical syntax you would write:

+person(p1),
+firstname[p1]="John",
+lastname[p1]="Doe",
+street[p1]="1384 West Peachtree Street",
+city[p1]="Atlanta",
+person(p2),
+firstname[p2]="Jane",
+lastname[p2]="Doe",
+street[p2]="1384 West Peachtree Street",
+city[p2]="Atlanta"

Here, because it is necessary to distinguish the links between the persons and their associated facts, we must use distinct variable names, p1 and p2. With hierarchical syntax we can avoid this by just using underscores:

+person(_) {
  +firstname[]="John",
  +lastname[]="Doe",
  +street[]="1384 West Peachtree Street",
  +city[]="Atlanta"
},
+person(_) {
  +firstname[]="Jane",
  +lastname[]="Doe",
  +street[]="1384 West Peachtree Street",
  +city[]="Atlanta"
}

We call the formula just before the curly-braces the "head" of the hierarchical formula and the conjunction of atoms between the curly-braces the "body" of the hierarchical formula. Currently, we only allow conjunctions of atoms as the heads and bodies of hierarchical syntax. We may relax this restriction in the future based upon some additional study and user-provided use cases.

Here is a small example of how we can simplify some code using a conjunction of atoms in the head of a hierarchical formula. First we define a small schema:

person(p) ->.
car(c) ->.
name[p] = s -> person(p), string(s).
brand[c] = s -> car(c), string(s).
driven_by(c, p) -> car(c), person(p).

Without hierarchical syntax you might have written logic like the following:

person(p),
car(c),
name[p] = "Prefect",
brand[c] = "Ford",
driven_by(c, p)

With hierarchical syntax this can be written as the following formula:

(person(p), car(c)) {
  name[] = "Prefect",
  brand[] = "Ford",
  driven_by()
}

Again, based upon the arguments you have supplied in the hierarchical body and the types of p and c in the hierarchical head, the compiler determines that a use of p must be inserted into name, that a use of c must be inserted into brand, and that both c and p must be inserted into driven_by.

Inside hierarchical formulas we allow the use of hierarchical expressions. A hierarchical expression looks just like a hierarchical formula, but may be written anywhere we can write an expression like x, or foo[y]. The only restriction is that the head of a hierarchical expression must be a single atom denoting an entity. For example, you can use a hierarchical expression to simplify the following code:

+person(p),
+firstname[p] = "John",
+lastname[p] = "Doe",
+address(a),
+home[p] = a,
+street[a] = "1384 West Peachtree Street",
+city[a] = "Atlanta"

by writing

+person(_) {
  +firstname[] = "John",
  +lastname[] = "Doe",
  +home[] = +address(_) {
              +street[]="1384 West Peachtree Street"
              +city[]="Atlanta"
            }
}

When discussing aspects of hierarchical syntax that are not specific to just hierarchical formulas or just hierarchical expressions, we will use the generic term "a hierarchical".

There are limits to the compiler's ability to determine how to interpret hierarchical syntax. Given a hierarchical, the first thing the compiler does is collect a set of what we call "candidate" expressions from the head. Currently, candidate expressions can be only variables, literals, or integer expressions composed of literal values. For example, given the hierarchical

(person(p), age[p] = 42) { ... }

the candidate expressions would be p and 42. To ensure that there is always a unique interpretation of a hierarchical, we require that the types of the candidate expressions be disjoint. Roughly, you can understand disjoint to mean that the type of one candidate expression cannot be a subtype of the type of another, or vice versa. For example, the compiler will disallow the code fragment

(person(p1), person(p2)) { ... }

because p1 and p2 both have the type person and therefore the compiler cannot decide when it should choose to insert a use of p1 rather than a use of p2. Writing the above logic would cause a HIER_AMBIGUOUS_HEAD_BINDING error to be reported.

Once the compiler has determined an unambiguous set of candidate expressions, it will start examining the atoms in the body of the hierarchical. If an atom in the hierarchical's body already has the correct number of arguments for the defined predicate, the atom will not be modified.

If the atom has fewer user-supplied arguments than expected, the compiler will begin the process of resolving the arguments. To make it easier to understand how this process works we will use a contrived example. Consider the following schema:

a(x) ->.
b(x) ->.
c(x) ->.
d(x) ->.
e(x) ->.
f(x) ->.

foo[x, y, z, w] = u -> a(x), b(y), c(z), d(w), e(u).

We will now step through the process used to resolve the arguments to the atom foo in the following logic:

q(x, y, z) <-
  (a(x), c(y), f(z)) {
    foo[w, u] = v
  }, b(w), d(u), e(v).

The first thing the compiler does is note that the value argument of the atom has already been specified. Therefore, the compiler will ignore the value argument for the rest of the resolution process.

Next, the compiler will start by simply ignoring the key arguments the user has supplied. We can visualize the compiler's current information about the types of the foo atom's arguments as follows:

foo[•, •, •, •] = v

Here • represents a "hole", i.e., an argument position still to be filled.

The compiler will now fill in all of the argument positions whose expected types are not disjoint from the candidate expressions. In this example, the candidate expressions are x, y, and z with types a, c, and f respectively. Because the first argument of foo should be an expression of type a and the third argument of foo should be an expression of type c, the compiler will insert x and y into these positions:

foo[x, •, y, •] = v

It is worth emphasizing that the compiler will only ever insert a candidate expression into a single hole, and that candidate expressions may go unused (z in our example).

Next, the compiler will take the two arguments that the user supplied to foo and fill them into the remaining holes left to right. This means w will be inserted in the second argument and u into the fourth argument:

foo[x, w, y, u] = v

Because all argument positions of the predicate foo are now filled, the resolution process is considered successful.
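The hierarchical formula is then desugared into ordinary logic: the head conjunction and the resolved body atom are simply conjoined, yielding

q(x, y, z) <-
  a(x), c(y), f(z),
  foo[x, w, y, u] = v,
  b(w), d(u), e(v).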

However, if we had started with a slightly different example, resolution could have failed at various points. For example, if the user had written

q(x, y, z) <-
  (a(x), c(y), f(z)) {
    foo[w] = v
  }, b(w), d(u), e(v).

then the compiler would still have filled in the candidate expressions as follows:

foo[x, •, y, •] = v

However, the compiler would then notice that the user has supplied only one key argument, w, while there are still two holes to be filled. It would report this as a HIER_ATOM_TOO_FEW_SUPPLIED error.

Another error would be detected if the user had written

q(x, y, z) <-
  (a(x), c(y), f(z)) {
    foo[w, u, u] = v
  }, b(w), d(u), e(v).

Again, the compiler would begin by inserting the two candidate expressions:

foo[x, •, y, •] = v

This time, the compiler would notice that the user supplied three key arguments (w, u, u), while there are only two holes to be filled. This would be reported as a HIER_ATOM_TOO_MANY_SUPPLIED error.

The resolution process could have also failed if foo had been declared differently. For example, if foo had been declared as

foo[x, y, z, w] = u -> a(x), a(y), c(z), d(w), e(u).

and then the user wrote the logic

q(x, y, z) <-
  (a(x), c(y), f(z)) {
    foo[u] = v
  }, b(w), d(u), e(v).

the resolution process would fail. The reason is that the compiler cannot determine whether to insert the x as the first argument or as the second argument of foo:

foo[•, •, •, •] = v

Again, this is because a candidate expression will only be inserted into a single hole per atom. This kind of failure will be reported as a HIER_AMBIGUOUS_BODY_ATOM error.

Finally, it should be noted that the value argument has a special status in the resolution of a functional predicate. If foo had been declared as:

foo[x, y, z, w] = u -> a(x), b(y), c(z), d(w), a(u).

the compiler would be able to resolve

q(x, y, z) <-
  (a(x), c(y), f(z)) {
    foo[w, u] = x
  }, b(w), d(u), e(v).

because, by the time the candidate expressions are about to be filled in, the value argument has already been determined:

      foo[•, •, •, •] = x

At this point there is only one hole where x could be inserted and satisfy the typing requirements. So there is no ambiguity. It is very important to understand that the compiler would be able to resolve this example only because it had the syntactic hint that x was to be used as the value argument. If the logic had been written as

q(x, y, z) <-
  (a(x), c(y), f(z)) {
    foo(w, u, x)
  }, b(w), d(u), e(v).

then just before the insertion of candidate expressions the atom would look like:

  foo[•, •, •, •] = •

This is because the compiler would no longer know that x is intended to be the value argument, so there would now be a hole in the position of the value argument. Furthermore, the candidate expression x could be inserted into two possible holes, the first argument and the value argument. Consequently, the compiler would report an ambiguity error (HIER_AMBIGUOUS_BODY_ATOM).

Syntax.  Hierarchical syntax extends the language with the following grammatical constructions:

Hierarchical = HierHead "{" HierBody "}" .

HierHead = Atom
         | "(" HierHeadConjunction ")" .

HierHeadConjunction = Atom { "," Atom } .

HierBody = HierAtom { "," HierAtom } .

HierAtom = [ Delta ] PredicateName [ "@" Stage ] "(" HierExpressions ")"
         | [ Delta ] PredicateName [ "@" Stage ] "[" HierExpressions "]" "="
              HierExpression .

HierExpressions = HierExpression { "," HierExpression } .

HierExpression = Hierarchical | Expression .

Chapter 21. Modules

Modular design and data abstraction are important in the development of large scale software systems. LogiQL provides a module system to help users structure large programs into manageable pieces.

The LogiQL module system provides developers with several specific benefits. The most significant is that logic written using the module system is recompiled incrementally, according to dependencies automatically extracted from the module definitions. Therefore, if you edit one file, separate compilation will only recompile that file and those that depend on it.

Another benefit to developers is that predicates defined in other modules can be aliased to shorter names, which helps to make logic more concise and easier to read.

Finally, the module system provides for hiding and sealing predicates defined in a file. Only explicitly exported predicates are visible from other files. A predicate may also be defined as sealed, which prevents other modules from modifying it. This ensures that someone else will not intentionally or accidentally add new logic for your predicate, thus invalidating your assumed invariants.

21.1. ConcreteBlox

The basic unit of code in the module system is the concrete block. A concrete block is a set of clauses (declarations, facts and rules), along with a set of export and alias declarations.

A concrete block can be declared as inactive, so that it can be scheduled for execution only when needed.

Here is a short explanation of these terms:

alias

Alias declarations are used by a concrete block to give alternative, presumably shorter, names to predicates, blocks, or namespaces when writing the clauses that comprise the concrete block.

exported predicate

An exported predicate is a predicate declared by a concrete block that may be used by other concrete blocks. Some of these exported predicates may be declared to be sealed.

inactive block

An inactive block is not part of the active installed program. This means that the rules contained therein are not automatically evaluated during maintenance. An inactive block must be scheduled for execution explicitly.

sealed predicate

The contents of a sealed predicate may be observed by other concrete blocks, but they may not insert into that predicate or add new rules that derive into that predicate.
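As a sketch (assuming that sealing is requested with a sealed declaration that takes the same quoted-clause form as export; the block and predicate names are hypothetical):

block(`Counters) {
  export(`{ total[] = n -> int(n). }),
  sealed(`{ total[] = n -> int(n). }),
  clauses(`{
    // other blocks may read total, but only this block
    // may derive into it
    total[] = 0.
  })
} <-- .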

21.1.1. Writing your very first concrete block project

The first step in using the module system to organize a program is to create a new directory to hold your project. We will call this directory test. Once we have a directory to hold the project, we will create inside that directory a project file for the compiler to read (see Section 23.1). We will call our project file myproject.project. The file should contain the following text:

myproject, projectname
mylib, module

Like in other project files used in separate compilation, the format is a filename followed by a command and then an optional qualifier. In this case, the project file we have written says that the directory mylib contains a module-based project. Now that we have created the project file, inside the test directory we will create the directory mylib.

Inside directory mylib create a file called A.logic for our first concrete block. Concrete blocks must be written in files with a .logic extension for the compiler to recognize them. Inside A.logic we will write the following:

block(`A) {
  export(`{ p(x) -> . }),
  clauses(`{
    q(x) -> .
  })
} <-- .

This declares a concrete block named A that defines two entities, p and q. Additionally, the concrete block contains a declaration, export(`{ p(x) ->. }) stating that the entity p is exported for use by other concrete blocks. In other concrete blocks A:p will refer to the entity p defined in concrete block A.

Note that the syntax of the concrete block suggests that we are using hierarchical syntax (see Chapter 20). However, we currently do not support writing concrete blocks in a desugared form. This may change in future releases.

It is also important to note that the name of a concrete block must match the name of the file in which it is defined, minus the extension. That is, a concrete block called name must be inside a file called name.logic. This restriction allows the compiler to easily determine which files depend upon a changed file: these files must be recompiled.

Now we will create a second concrete block in a new file called B.logic within the mylib directory:

block(`B) {
  export(`{ r[x]=y ->  A:p(x), int(y). }),
  alias(`mylib:A:p, `otherp),
  clauses(`{
    p(x) -> otherp(x).
  })
} <-- .

This declares a concrete block named B that defines a subtype entity p that is not exported and a functional predicate r that is exported.

In this example we introduce the aliasing functionality provided by ConcreteBlox. The declaration alias(`mylib:A:p, `otherp) states that, inside this concrete block, whenever we use the predicate name otherp, we are in fact referring to the predicate mylib:A:p.

Next, inside the mylib directory, we'll create a directory named util. This directory creates a new namespace called util. If you are familiar with Java, this is similar to how it maps directories to package names. However, unlike in Java, concrete blocks need not specify which namespace they live in. To refer to a concrete block within a specific namespace, the name of the concrete block must be prefixed with the namespace name followed by a colon. The directory that we specified in the project file (here, mylib) is treated as the root namespace.

Inside the util directory we will create another logic file called C.logic:

block(`C) {
  alias_all(`mylib:A),
  clauses(`{
    f[x] = y -> p(x), int(y).
  })
} <-- .

Because the concrete block C is contained within the namespace util, its fully qualified name is util:C. It does not export any predicates.

In the definition of C, we have used the other form of aliasing offered by ConcreteBlox. The concrete block gives the alias declaration alias_all(`mylib:A), which allows all predicates within the concrete block mylib:A to be used without any prefix. That is, the predicate mylib:A:p may be referenced simply by writing p within the concrete block C.

Finally, inside the util directory, we will also create a file D.logic containing:

block(`D) {
  inactive(),
  clauses(`{
    +mylib:A:p(x).
  })
} <-- .

This defines a concrete block D that, like C, does not export any predicates. However, unlike C, this block has been declared inactive by writing the declaration inactive(). Scheduling the execution of this block will cause a new instance of the entity p, defined in the concrete block mylib:A, to be created. You may also specify that a block is active by writing the declaration active() instead, but blocks default to being active if you provide no declaration.

For the compilation and installing of projects into a workspace, please refer to Chapter 23.

21.1.2. Names

One of the most significant differences between writing logic in a concrete block and in a legacy block is that colons in predicate names now have semantic meaning. A predicate name like foo that does not contain colons is called a simple name. A predicate name like bar:baz that does contain a colon is called a qualified name.

We may sometimes refer to part of a qualified name up to some colon as a prefix of the name. For example, a prefix of the qualified name bar:baz is bar and the qualified name a:b:c has the prefixes a:b and a.

21.1.3. Name trees

The process of finding a specific predicate from a given predicate name is performed using a structure that we call a name tree. Qualified names in a module project can be seen to form a tree where the edges of the tree are simple names.

For example, suppose we created a module project in the directory project containing the file foo.logic and the directories foo and bar, where foo contains the files one.logic, two.logic and bar contains the file three.logic. The directory structure can be envisaged as follows:
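project/
    foo.logic
    foo/
        one.logic
        two.logic
    bar/
        three.logic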

Given the above project, the name tree corresponding to the root of the project would look something like this:

The circles represent namespaces and boxes represent exported predicates. There are a number of differences between the two trees to note.

First, the module directory project name is used as the single edge from the root of the name tree.

Second, notice that all concrete blocks have become namespaces. For name resolution purposes, concrete blocks can be seen as defining their own namespaces.

Third, there is only one edge labeled foo from the project node of the name tree. This is because the project directory contains both a directory and a concrete block named foo. Each edge from a given parent node to its child must have a distinct name, so we cannot have two edges labeled foo. Therefore, the name tree merges the two into a single node. This can only ever happen when there is a directory and a concrete block with the same name and the same parent. Because the project is structured around the filesystem, it is never possible to have two namespaces with the same name or two concrete blocks with the same name as children of the same node.

This name tree is what we call the project name tree, because it is the name computed from the root of the project. Each namespace has its own corresponding name tree.

The fully qualified name of a concrete block or predicate is defined by joining together all the simple names found on the path from the root of the project name tree to that concrete block or predicate. So in the example project name tree above, the fully qualified name of the concrete block two is project:foo:two and the fully qualified name of the predicate p contained in the concrete block three is project:bar:three:p.

Determining the node a name points to in any given name tree is very simple. Split the name into a list of simple names by removing the colons. Then starting at the root of the name tree, follow the edges given by the simple names. In the example name tree above, to find the node corresponding to the name project:bar:three:p we would start at the root of the tree, follow the project edge, then the bar edge, then the three edge, and finally the p edge.

Starting from the project name tree, we can recursively construct the name trees for each of its descendant nodes. Given a name tree, to construct the name tree that corresponds to one of its immediate children, we just add edges from the root to each of that child's children using the same labels.

To make it easier to visualize and describe how this transformation takes place, we will label the nodes in the following examples with numbers. These numbers are merely for illustrative purposes and do not correspond to anything written by the user or used internally by the compiler. We will start with an extremely simple example and progress to more complicated ones.

Here, the project name tree just contains a single namespace child foo. To obtain the name tree used by the node named foo, labeled with 1, we simply add an edge labeled with p from the root to foo's only child, the node labeled with 2:

Technically, at this point, we no longer have a tree, but a directed acyclic graph. In the new name tree we have two possible ways to name the predicate p: as foo:p and as p. Therefore, when writing logic in the concrete block foo we may refer to the same predicate either way depending on aesthetics or readability.

Now let us consider a slightly more complicated example:

Now suppose we want to find the name tree for the node named foo:bar, labeled with a 2. We start by constructing the name tree for the node named foo, labeled with 1. This involves adding edges between the root and all of the children of this node. So we add an edge labeled bar from the root to the node labeled 2 and an edge labeled p from the root to the node labeled 4.

Next, we just repeat the process, but for the children of the bar namespace, the node labeled with 2. This means simply adding an edge labeled with q from the root to the node labeled with 3.

Again, note how there are many ways to refer to the same namespace or predicate. For example, foo:bar:q, bar:q, and q can all be used to refer to the same predicate.

However, the process is not always quite so simple. In some cases it is possible for a node in the name tree to become inaccessible. When this happens we say that the namespace or predicate that is no longer accessible is shadowed. As an example, suppose we started with the following project name tree:

Now let us construct the name tree for the node named foo:foo, that is the node labeled with 2. We start by constructing the name tree for the node named foo, labeled with 1.

Because a node's child edges must all be distinct, when we add an edge labeled foo from the root node to the node labeled with 2, we shadow the original edge labeled with foo that connects the root to the node labeled with 1. We have indicated which parts of the name tree are no longer accessible by coloring them grey. In practice, they are no longer even part of the name tree, but we include them here for illustrative purposes. Next, we repeat the process to obtain the name tree relative to foo:foo.

Again, because we need to add an edge labeled with p from the root node to the node labeled with 3, the original edge labeled p from the root node to the node labeled with 4 becomes inaccessible, and that predicate becomes shadowed.

21.1.4. Aliasing

Alias declarations are interpreted as instructions to add new edges to the root of a name tree. Unlike what we have seen above, aliases are not allowed to cause shadowing. Doing so will result in an ALIAS_PREDICATE or ALIAS_NAMESPACE error.

Given the following project name tree:

The name tree for foo looks like the following:

Now suppose the concrete block foo had the alias declaration alias(`bar:q, `q). This would result in the following name tree:

Note that if foo had the alias declaration alias(`bar:q, `p) it would result in an ALIAS_PREDICATE error.

It is also possible to alias namespaces. For example, if foo also contained the alias declaration alias(`bar, `baz), the resulting name tree would look like:

It is even possible for a concrete block to alias its own name. However, this will only have an effect on naming predicates that the concrete block exports.

It is not possible to alias the result of another aliasing operation. This restriction ensures that the order in which aliases are written does not affect the outcome.

ConcreteBlox also provides the declaration alias_all. This declaration takes all of the children of the given namespace and adds edges to the root with the same names. So for example, given the following project name tree:

The name tree for foo would be:

If foo had the alias declaration alias_all(`bar), the resulting name tree would be:

21.1.5. Name resolution

One of the first steps in compiling a concrete block is resolving the names of all predicates to fully-qualified names. This is done by walking over all the predicate names in the logic defined by the concrete block and rewriting them according to the following algorithm (a small illustration follows the list):

  1. If the name can be found in the concrete block's name tree, we replace that predicate name with its fully qualified name.

  2. If not, we check whether it is a primitive or built-in predicate.

  3. If not, we check whether it is a predicate defined in a legacy block that was compiled prior to this concrete block.

  4. If not, if the predicate has a simple name, we assume that it is a predicate defined within this concrete block, but not exported. If the predicate's name is bar and the fully-qualified name of its concrete block is foo then its fully qualified name becomes foo:bar.

  5. If not, we report a BLOCK_UNKNOWN_PREDICATE error. If there is a predicate with a similar enough name in the name tree, we may report a BLOCK_UNKNOWN_PREDICATE_TYPO error.
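As a small illustration (the block and predicate names are hypothetical), consider a concrete block foo that aliases mylib:A:p and also uses an unexported local predicate:

block(`foo) {
  alias(`mylib:A:p, `otherp),
  clauses(`{
    // otherp is found in the name tree (step 1) and is rewritten
    // to the fully qualified name mylib:A:p; helper is a simple
    // name not in the tree (step 4), so it becomes foo:helper
    helper(x) -> otherp(x).
  })
} <-- .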

21.1.6. Block stage and lifetime

By default, logic defined in concrete blocks is active logic. However, you may override this behavior using the inactive() or execute() directives. For instance, the following defines an execute block:

block(`B) {
  execute(),
  clauses(`{
    +A:foo("a").
  })
} <-- .

Chapter 22. Hierarchical Import/Export

Many applications store hierarchical data in the workspace. For instance, the following schema describes hierarchical information about a person:

block (`addrbook) {
  export ( `{
    person(x), person_id(x:id) -> int(id).
    person_age[x]=y -> person(x), int(y).
    person_address[x]=y -> person(x), addrbook:address(y).

    address(x), address_id(x:id) -> int(id).
    address_city[x]=y -> address(x), string(y).
    address_state[x]=y -> address(x), string(y).
 } ), ...

Ad hoc techniques for importing or exporting such data from the workspace can be complex and slow. For import, inherently hierarchical data must be flattened and imported into the workspace; for export, flat data extracted from the workspace must have its hierarchical structure reconstructed. In addition, importing via delta rules or exporting via queries has negative performance implications.

Hierarchical import/export is designed to address this problem. Hierarchical import/export allows you to give a hierarchical description of your data as a Google Protocol Buffer Message. You can then write rules that pull data from the message into your working schema (for import), or derive data into the message from your working schema (for export).

22.1. Using Hierarchical Import/Export

There are four steps in using hierarchical import/export:

  1. Providing a specification of your data as a Google protocol buffer message.
  2. Adding a directive to your project file that will generate logic to represent your newly declared message types and ensure the runtime system is aware of these message types. (Project files are described in Chapter 23.)
  3. Writing rules that derive data from your message schema to your working schema, or vice versa.
  4. Using either lb commands or services (Chapter 26) to import/export your data.

In the remainder of this chapter we will use the above person schema as a running example.

22.1.1. Defining A Protocol Buffer Message Specification

This section demonstrates how to build a protobuf schema for representing information about a person, namely her age and (some information about) her address. The following protocol buffer message specification describes such information:

package addrbook;

message Person {
  required uint32 age=1;
  required Address address=2;
}

message Address {
  required string city=1;
  required string state=2;
}

In what follows, we will assume this specification is the contents of file person.proto.

22.1.2. Importing the Protocol Message Specification

Adding the following proto directive to your project file generates the definitions for Person and Address shown above and associates the descriptor with a name, myproto, that the runtime system uses to identify this family of message types:

myProject, project
person.proto, proto, descName=myproto

Several options can be given in the third field of the proto directive. These are described below.

descName=name

Required. Sets the descriptor name associated with imported protobuf message types.

lifetime=transaction or lifetime=database

Optional. Describes whether the logical representation of the protobuf messages should have transaction or database lifetime. Default is database.

derivationType=edb or derivationType=idb

Optional. Determines whether the generated predicates will be declared as EDBs or IDBs. Declaring them as IDBs only makes sense if you will only ever be exporting data from the predicates, and not importing data. Default is edb.

protoPath=path

Optional. Search path for message types included in .proto files via import statements.

namespace={old1->new1 , old2->new2 , ...}

Optional. A map rewriting top-level namespaces for generated logic.

legacyLogic=true or legacyLogic=false

Optional. The default is false. When true, specifies that logic should be generated as flat files instead of modules. For forward compatibility, predicate names are identical whether or not legacyLogic is set. This is most useful when recursive protobuf declarations would otherwise lead to illegal recursive modules.

dropPackages=p,q,r,...

Optional. The default is google.protobuf, blox.options, blox.internal. Specifies that logic should not be generated for the given protobuf packages. This can be useful when including third-party protobuf packages containing types that are not valid in LogiQL, or when a package is included twice via different proto project directives.

If two or more .proto files will create logic in the same namespace, it is necessary to import them together by listing them in the left column of a single proto directive. For example, suppose we refactored the message declarations above into two .proto files. The following directive will import messages from both and also rename the top-level LogiQL package used for generated logic.

myProject, project
person_only.proto addr_only.proto, proto, descName=myproto namespace={addrbook->foo}

The resulting predicate declarations are as follows:

block (`foo) {
  export ( `{
    Address(x), AddressId(x:id) -> int(id).
    Address_city[x]=y -> Address(x), string(y).
    Address_state[x]=y -> Address(x), string(y).

    Person(x), PersonId(x:id) -> int(id).
    Person_age[x]=y -> Person(x), int(y).
    Person_address[x]=y -> Person(x), foo:Address(y).
 } ), ...

22.1.3. Exchanging Data Between a Message and a Workspace

You are responsible for writing rules that populate the message schema with the data from your workspace. This is written using regular LogiQL logic. Below is an example of how to derive addrbook:Person and addrbook:Address entities for export, from corresponding person and address entities declared in a workspace.

begin_export() -> .
lang:pulse(`begin_export).

addrbook:Person(p_out),
addrbook:Person_age[p_out] = age,
addrbook:Address(a_out),
addrbook:Person_address[p_out] = a_out,
extract_address(a_in, a_out) <-
    addrbook:person(p_in), addrbook:person_age[p_in] = age,
    addrbook:person_address[p_in] = a_in, begin_export().

extract_address(a_in, a_out) ->
    addrbook:address(a_in), addrbook:Address(a_out).
lang:pulse(`extract_address).

addrbook:Address_city[a_out] = city,
addrbook:Address_state[a_out] = state <-
    extract_address(a_in, a_out),
    addrbook:address_city[a_in] = city,
    addrbook:address_state[a_in] = state.

The above rules are written with the assumption that they would be installed as a pre-compiled block, called when necessary to export a message. Thus, they include a pulse predicate, begin_export. This predicate can be used to control when the rules generating message data should be evaluated.

Similar rules can be written to take data from the message predicates to your workspace.
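For illustration, here is a minimal sketch of the import direction, mirroring the style of the export rules above. It assumes the message predicates have just been populated by an import and, purely for illustration, reuses the message-side index values (PersonId, AddressId) as working-schema refmode ids; a real application would typically generate its own identifiers. A pulse predicate, begin_import, again controls when the rules fire.

begin_import() -> .
lang:pulse(`begin_import).

// create working-schema persons from imported Person messages
addrbook:person(p_out), addrbook:person_id(p_out:id),
addrbook:person_age[p_out] = age <-
    addrbook:Person(p_in), addrbook:PersonId(p_in:id),
    addrbook:Person_age[p_in] = age, begin_import().

// create working-schema addresses from imported Address messages
addrbook:address(a_out), addrbook:address_id(a_out:id),
addrbook:address_city[a_out] = city,
addrbook:address_state[a_out] = state <-
    addrbook:Address(a_in), addrbook:AddressId(a_in:id),
    addrbook:Address_city[a_in] = city,
    addrbook:Address_state[a_in] = state, begin_import().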

22.1.4. Exporting/Importing a Message

To be able to read an exported message, or to construct a message to import, you will have to use code generated by protoc, a tool distributed with Google Protocol Buffers. protoc can generate messaging APIs for a number of different languages, such as C++, Python, and Java. To read more about the use of such APIs, please consult the protocol buffer manual.

22.1.4.1. Export

Exporting data from a workspace requires that you first evaluate the data exchange rules that convert data from your workspace to the message schema. Whether this is done by invoking a pre-compiled block or via one-off executions is the programmer's choice. Assuming that data is available in the message schema, one can use the lb export-proto command to export that data into a message.

22.1.4.2. Import

Importing data into a workspace works similarly to exporting. A protocol buffer message must be constructed first, using the code generated by protoc from your message specification. The lb import-proto command can then be used to import the message into a workspace.

22.2. Set semantics for repeated fields

Protobuf repeated fields may be annotated to indicate that they should be represented as unordered sets instead of indexed predicates. This eliminates the need to generate or track indices. For example, the protobuf declaration

repeated string foo = 1 [(blox.options.set) = true];

is represented in logic by

A_foo(x, y) -> A(x), string(y).
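For comparison, without the set annotation a repeated field is represented by an indexed predicate, roughly of the following form (a sketch; the exact shape of the generated declaration may differ):

A_foo[x, index] = y -> A(x), int(index), string(y).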

Part III. Projects

Chapter 23. LogiQL Project

When you are first exploring LogiQL, you most often add and execute small pieces of logic in the workspace through the LogicBlox command-line interface. As you build more complex logic that evolves into an application, it is useful to be able to organize your logic into files, and files into projects that represent self-contained units of functionality.

LogicBlox provides support for creating, compiling, and distributing projects as libraries. This chapter describes the basic structure of projects, their compilation and use.

23.1. Project Structure

A LogicBlox project has two components:

  • a project description file;
  • a directory, possibly with subdirectories, containing the logic files of the project.

The project description file specifies the LogiQL files and modules, libraries, and protocol buffer messages that must be compiled as parts of this project. A project description must be contained in a file whose name ends with the extension .project.

The project description file follows a simple comma-delimited format. Apart from proto directives (see Section 22.1.2), each line is of the following form:

<name>, <type indicator>
<name>

Name of the item. For files and modules it can be either an absolute path or the path relative to the directory containing the project description file. Even though an absolute path is allowed, all files must be contained within the same directory as the project description file, or in subdirectories of that directory.

<type indicator>

The type of the name. The following values are allowed:

projectname

Used to specify the name of the project. Project names must be valid LogiQL identifiers.

active

For installed blocks.

inactive

For stored queries.

inactiveAfterFixpoint

For stored queries intended to be activated after the end of stage FINAL.

execute

The file will be compiled, and will be executed when the project is installed into a workspace.

module

Indicates that the name in the first field is that of a directory containing LogiQL modules.

library

Indicates that the name in the first field is that of a library. The name of the library is the same as the project name specified in the library's project description file. The environment variable LB_LIBRARY_PATH or, alternatively, the command line option -libPath can be used to specify a path of directories to be recursively searched for libraries. By default, the $LOGICBLOX_HOME/BlockResources directory is always included in the search for libraries.

severities

Severity declarations for various error codes. There can be only one severities declaration file per project. The severity declarations in the severities file apply to all logic in a project. It is possible for individual logic files to increase the project-wide severity declaration; however, an individual logic file cannot decrease a project-wide severity.

To change the severity level of a code from its default to a different level of severity, you can use the following four types of declarations in the severities file (CODE should be replaced by a concrete error code reported by the compiler):

// do not report CODE at all
lang:compiler:disableWarning:CODE[] = true.

// report CODE as a warning only
lang:compiler:disableError:CODE[] = true.

// report CODE as an error
lang:compiler:error:CODE[] = true.

// report CODE as a warning
lang:compiler:warning:CODE[] = true.

The project file can contain whitespace, which is ignored by the compiler. It can also contain comments: a comment is a complete line that begins with two forward slashes (//).

The entries are specified in the required order of compilation, with the programmer managing the compilation dependencies for non-module code. For example, the following project file separately compiles a library, some legacy files and a module directory:

// This is a comment
// This specifies that this project is named example
example, projectname
system:baseBootstrap,library

// An active legacy-code block
b1.logic, active

// This is a directory containing modules
employees, module

b2.logic,active
b3.logic,active

The compiler will compile entries in the project description file in the order they are specified in the file. This is also the order in which logic will be installed and executed, except for

  • libraries, which are always installed before other logic;
  • directories that contain module code: the correct ordering is determined automatically from the dependencies within the modules.

23.2. Compiling a Project

A project described in file myproject.project can be compiled on Unix/Mac with the following command:

$ lb compile project myproject.project

There are a number of options associated with the command lb compile project. Detailed descriptions can be found in Section 33.5.4. You can also retrieve usage information with the following command:

$ lb compile project --help

Compiling a project in this manner can improve the development cycle in two ways. First, disassociating the compilation of logic from the creation/addition of logic in a live workspace leads to faster response times. Second, the separate compilation mechanism supports incremental compilation: a file is recompiled only when necessary, i.e., when it is changed or when it depends on a changed file.

If there are no compilation errors, the compilation step produces a bytecode file filename.lbb for each logic file filename.logic. Additionally, it produces a summary file called LB_SUMMARY. Each bytecode file is generated in the same directory as the corresponding source file, unless the --out-dir option is used.

Compilation is incremental, and is guided by a number of heuristics:

  • If any of the libraries referenced by the project has changed, that is, its timestamp is newer than the time the project was last compiled, the entire project must be recompiled.
  • An individual source file F is recompiled according to the following rules:
    1. If a clean build is specified, F is recompiled, as are all the other source files in this project.
    2. If there is no associated bytecode file, F is recompiled.
    3. If the associated bytecode file is older than F, F is recompiled.
    4. If a predicate compiled in another file has changed after the most recent compilation of F, and the existing bytecode file for F indicates that the file references that predicate, then F is recompiled.

When the compiler is invoked with the --explain option, it will provide information on why it makes certain decisions about incremental compilation.
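For example (a sketch, assuming the option is passed alongside the project file):

$ lb compile project --explain myproject.project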

23.3. Installing a Project

Once the project has been compiled, the following command can be used to install the project into a workspace (you should, of course, substitute appropriate names for workspace-name and directory-name):

$ lb addproject workspace-name directory-name

directory-name should be the output directory of the compiled project. Note that if a library is referenced by the project, and that library has not already been installed, the library will be searched for. If it is found, it will be installed; if it is not found, installation will fail. $LOGICBLOX_HOME/BlockResources will always be searched. Optionally, additional directories will be searched if the LB_LIBRARY_PATH environment variable is set, or the --libpath command line option is given.
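For example, a hypothetical invocation that installs the compiled output from the directory compiled-project and supplies an additional library search path via --libpath:

$ lb addproject my-workspace compiled-project --libpath /opt/lb-libraries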

Part IV. Web Services

Chapter 24. Introduction

ServiceBlox is a framework for developing and hosting services backed by the LogicBlox database. ServiceBlox services are the interfaces to the LogicBlox database from other application components: user interfaces, data integration tools, or third-party applications. For example, a typical service might provide data for a UI component that displays data in charts or tables, or receive input from a web form and make modifications to the database accordingly. This chapter introduces the implementation and configuration of ServiceBlox services.

ServiceBlox is an extensible framework that comes with a few different types of services that should meet the needs of most applications:

  • Protocol buffer (protobuf) services are HTTP services that are invoked using an HTTP POST request. The request contains a binary protobuf or a textual JSON message. The service returns the protobuf or JSON result of the invocation as an HTTP response message. This type of service is similar to other service frameworks that resemble remote procedure calls, such as JSON-based services used in AJAX applications, SOAP and XML-RPC. In ServiceBlox, the schemata of the request and response messages are precisely specified by a protobuf protocol. Optionally, messages can be encoded as JSON strings, to support access from web browsers. The services can be accessed by any HTTP client, including browsers, curl or any other application that understands the HTTP protocol and is able to encode and decode protocol buffers or JSON messages.

  • Tabular data exchange (TDX) services are HTTP services that can be accessed by GET, POST, and PUT requests. TDX is the core service for getting data in and out of the LogicBlox database, in large volumes. TDX uses delimited files as the input/output data format. Data is retrieved from the database by means of GET requests. POST requests can be used to update the database data, and PUT requests to replace it. TDX services are typically used for integration purposes, for example for importing large volumes of sales data or for exporting large volumes of forecast data.

  • Global protobuf services are protobuf services that are implemented by distributing incoming requests to services hosted on other LogicBlox workspaces. The responses from the individual services are merged into a single response of the global service. Global services are useful when data needed for a service is stored in multiple, partitioned workspaces.

  • Proxy services act as a simple proxy for a service hosted on a different machine. Proxy services can be used to require authentication on top of existing unauthenticated services, or can be used to provide access to a distributed service-oriented system on a single host.

  • Custom services are supported as plugins to the ServiceBlox service container. Custom services must provide implementation to a set of ServiceBlox Java interfaces. Custom services have a great deal of flexibility, and are used internally to implement Tabular Data Exchange, Global, and Proxy services. However, they should be used very sparingly as they do complicate the deployment of your LogicBlox-based application. If you find yourself needing a custom service, we recommend that you contact LogicBlox support personnel first to explore all appropriate options before proceeding.

ServiceBlox supports service request/response via HTTP, where the service message as well as the payload are part of an HTTP message. Alternatively, for longer-running services, an asynchronous queue can be used in place of HTTP; for services with large payloads (e.g. importing/exporting a large delimited file), AWS S3 objects can be used to transfer the payload. Support for HTTP, as well as for queues and S3, is built into ServiceBlox, and selecting the right mechanism for a given service is a matter of configuration.

ServiceBlox supports different authentication methods: some of these are appropriate for the relatively hostile environment of a browser, others for non-browser applications running in the controlled environment of a machine.

Chapter 25. Configuration

ServiceBlox is the component that functions as the interface between the LogicBlox platform and other applications and users. Requests to the ServiceBlox components are often initiated from remote locations such as a user's web browser. Not surprisingly, there are many configuration options to allow efficient and secure (remote) access. An introduction and detailed description of the service configurations are available in the LogicBlox Administration Guide. In particular, the guide explains general configurations and config files; various methods for authentication; the suggested authorization schemes; configurations for transport methods such as regular TCP connections or AWS queues; and other advanced topics. In the following text, we explain how to write and configure the core service logic. For service configuration, authentication, etc., the admin guide should be consulted.

25.1. Service Metadata

Service metadata can be retrieved using the metadata service, accessible at /meta. This service returns a list of the services hosted on a given lb-web instance.
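For example, assuming an lb-web instance listening on localhost port 8080 (as in the examples in this part), the metadata can be retrieved with any HTTP client:

$ curl http://localhost:8080/meta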

Chapter 26. Protobuf and Global Protobuf Services

26.1. Implementing ProtoBuf/JSON Services

This section explains how to implement protocol buffer and JSON services in ServiceBlox. We illustrate this with a simple service that, given the name of a timezone, returns the current time in that timezone.

The first step in the implementation of a service is the definition of the protocol used between the client and the service. This protocol serves as documentation of the service, but can also be used to generate source code artifacts used by the client and the service. The protocol is specified as a protobuf schema (see the protobuf language guide for a detailed reference on protocol specifications).

For the time service, this protocol is:

package time;

message Request
{
  required string timezone = 1;
}

message Response
{
  optional string answer = 1;
  optional string error = 2;
} 

In JSON syntax, a request for the time in UTC is {"timezone" : "UTC"}. At the time of writing, the answer in JSON syntax would have been {"answer": "2012/11/13 00:19 +00:00"}.

Next, it is time to write the LogiQL rules for the actual implementation of the service. To use protobuf messages in LogiQL, a LogiQL schema must be generated from the protocol. Usually this is taken care of by the build system of a project (see Chapter 22). If we ignore some compiler directives, the generated LogiQL schema for the time protocol looks like this:

time:Request(x), time:RequestId(x:id) -> int(id).
time:Request:timezone[x] = y -> time:Request(x), string(y).

time:Response(x), time:ResponseId(x:id) -> int(id).
time:Response:answer[x] = y -> time:Response(x), string(y).
time:Response:error[x] = y -> time:Response(x), string(y).

When the ServiceBlox service container receives an HTTP request for a service, the server imports the protobuf message contained in the body of the HTTP request into the workspace that hosts the service. This request message is typically a pulse entity, which means that it does not persist in the workspace after the transaction. The import of the example request for the current time in UTC is equivalent to executing the following logic (we are using the syntax described in Chapter 20):

+time:Request(_) {
  +time:Request:timezone[] = "UTC"
}.

The service implementation consists of delta rules that trigger when a request entity element is created. To respond to the request, the delta rules create a message in the response protocol, which is then exported from the workspace by the ServiceBlox service container. This all happens in a single transaction. The server returns the response to the client in the body of an HTTP message. For the UTC example, the delta logic to create the literal response message would be:

+time:Response(_) {
  +time:Response:answer[] = "2012/11/13 00:19 +00:00"
}.

Of course, the actual implementation should trigger from the actual request, and also consider the current time. One common complication in the implementation of the service is that the logic must make sure to always return a response. To guarantee this, it is useful to introduce separate, intermediate predicates for the result of the request. In the following example, we introduced an answer predicate for this purpose. The first rule computes the answer for the given timezone request. The second rule populates a successful response, while the third rule generates an error message if no answer could be computed.

block(`time) {

  clauses(`{

    answer[req] = s -> time:Request(req), string(s).
    lang:pulse(`answer).

    // determine the answer for the requested timezone
    +answer[req] = s
      <-
      +time:Request:timezone[req] = tz,
      datetime:now[] = dt,
      datetime:formatTZ[dt, "%Y/%m/%d %H:%M %Q", tz] = s.

    // use constructor for creating a response message
    lang:constructor(`cons).
    lang:pulse(`cons).
    cons[req] = resp -> time:Request(req), time:Response(resp).

    // create the response message from the answer
    +cons[req] = resp,
    +time:Response(resp),
    +time:Response:answer[resp] = s
      <-
      +answer[req] = s.

    // create the error response message if there is no answer
    +cons[req] = resp,
    +time:Response(resp),
    +time:Response:error[resp] = "not a valid timezone: " + tz
      <-
      +time:Request:timezone[req] = tz,
      !+answer[req] = _.

  })

} <-- . 

ServiceBlox finds services to host by scanning workspaces for service configurations. A workspace can host an arbitrary number of services, each of which is defined by a service entity. For the timezone service, the configuration uses the subtype of service for protobuf services, called default_protobuf_service.

Note

The example below does not specify an HTTP method. If no method is specified (by service_by_prefix_and_method) then protobuf services will default to only answering HTTP POST requests.

block(`service_config) {

  alias_all(`lb:web:config:service),
  alias_all(`lb:web:config:service_abbr),
  alias_all(`lb:web:config:protobuf),
  alias_all(`lb:web:config:protobuf_abbr),

  clauses(`{

    service_by_prefix["/time"] = x,
    default_protobuf_service(x) {
      protobuf_protocol[] = "time",
      protobuf_request_message[] = "Request",
      protobuf_response_message[] = "Response"
    }.

  })

} <-- . 

26.1.1. Implementing services in inactive blocks

The implementation of the time service in the previous example was in an active block of logic, which means those rules will be triggered whenever a Request message is imported into the workspace, regardless of which service imported the message. This may be a problem if you want to implement several services that share a Request message but interpret it differently.

This issue can be solved by implementing the logic that interprets the request in an inactive block, which will only be executed for the specified service.

As an example, let us imagine we want to implement a service that always returns the time in the EST timezone when no timezone is specified in the request. We do it in this block:

block(`est_time) {
  inactive(),
  clauses(`{

    answer[req] = s -> time:OptionalRequest(req), string(s).
    lang:pulse(`answer).

    // determine the answer for the requested timezone
    +answer[req] = s
      <-
      +time:OptionalRequest_timezone[req] = tz,
      datetime:now[] = dt,
      datetime:formatTZ[dt, "%Y/%m/%d %H:%M %Q", tz] = s.

    +answer[req] = s
      <-
      !+time:OptionalRequest_timezone[req] = _,
      +time:OptionalRequest(req),
      datetime:now[] = dt,
      datetime:formatTZ[dt, "%Y/%m/%d %H:%M %Q", "EST"] = s.

    // use constructor for creating a response message
    lang:constructor(`cons).
    lang:pulse(`cons).
    cons[req] = resp -> time:OptionalRequest(req), time:Response(resp).

    // create the response message from the answer
    +cons[req] = resp,
    +time:Response(resp),
    +time:Response_answer[resp] = s
      <-
      +answer[req] = s.


  })

} <-- . 

and we can configure a service to use this block with:

/**
 * Service that reports the current time in a certain
 * timezone or EST when no timezone is specified.
 */
service_by_prefix["/est-time"] = x,
default_protobuf_service(x) {
  inactive_block_name[] = "sample:est_time",
  protobuf_protocol[] = "time",
  protobuf_request_message[] = "OptionalRequest",
  protobuf_response_message[] = "Response"
}. 

This approach will allow us to implement several services using the same request format with implementations that would otherwise conflict. However, this also means that the code in these inactive blocks cannot be reused between services. Deciding whether to implement a service in active logic or inactive logic is a tradeoff between reusing code and minimizing coupling between services.

Similarly, services can use inactive blocks executed after maintenance has reached a fixpoint (see Section 19.4.1), as in this example.

    default_protobuf_service(x) {
      inactive_after_fixpoint_block_name[] = "sample:est_time_afp",
      protobuf_protocol[] = "time",
      protobuf_request_message[] = "OptionalRequest",
      protobuf_response_message[] = "Response"
    }.

26.1.2. Writing Automated Tests using Python

ServiceBlox ProtoBuf/JSON services are standard HTTP services, so in principle any HTTP service testing tool can be used. ServiceBlox comes with a small Python library of convenient abstractions to invoke services, and we recommend writing automated tests using this library.

  • The lb.web.admin.Client class allows access to the admin services of ServiceBlox. This can be convenient to isolate services from testsuites.

  • The lb.web.service.Client class allows a ProtoBuf request to be built and sent to the service. It dynamically builds the required Python classes from the descriptor that it fetches from the ServiceBlox admin services.

The Python client sends and receives binary protobufs by default. It can be used to test services with BINARY or AUTO encoding. JSON is supported at a lower level.

A simple Python testsuite needs the following imports:

#! /usr/bin/env python

import sys
import os
import unittest

sys.path.insert(0, '%s/lib/python' % os.environ.get('LOGICBLOX_HOME'))
sys.path.insert(0, '%s/lib/python' % os.environ.get('LB_WEBSERVER_HOME'))

import lb.web.testcase
import lb.web.service
import lb.web.admin

There are two main testcase classes: lb.web.testcase.PrototypeWorkspaceTestCase and lb.web.testcase.TestCase. We generally recommend using the prototype workspace testcase, because it prevents interference between different tests. For truly stateless services, the simple TestCase class can be used and will be significantly faster.

A simple testsuite for the time service:

class TestTimeService(lb.web.testcase.PrototypeWorkspaceTestCase):

    prototype = "/workspace-name"

    def setUp(self):
        super(TestTimeService, self).setUp()
        self.client = lb.web.service.Client("localhost", 8080, "/time")

    def test_utc(self):
        req = self.client.dynamic_request()
        req.timezone = "UTC"
        response = self.client.dynamic_call(req)
        self.assertHasField(response, "answer") 

The lb.web.service.Client class also provides support for testing authenticated services. Notice that the cookie jar must be manually assigned to the service that requires authentication:

import lb.web.credentials

class AuthenticatedTestTimeService(lb.web.testcase.PrototypeWorkspaceTestCase):

    prototype = "/workspace-name"

    def setUp(self):
        super(AuthenticatedTestTimeService, self).setUp()
        self.client = lb.web.service.Client("localhost", 8080, "/atime")
        self.login_client = lb.web.service.Client("localhost", 8080, "/login")
        self.client.jar = self.login_client.jar

    def test_login_works(self):
        credentials_client = lb.web.credentials.Client()
        credentials_client.set_password("user", "password")
        self.login_client.login("user", "password", "time_auth")

        req = self.client.dynamic_request()
        req.timezone = "EST"
        response = self.client.dynamic_call(req)
        self.assertHasField(response, "answer") 

26.2. Implementing Global ProtoBuf/JSON Services

Global protobuf services support broadcasting a request to a set of other services (usually partitions of a partitioned database) and combining the results of the individual services into a single response of the global service.

As an example, we will use a database of products. Product data in retail planning applications is typically partitioned by product category, which means that it might not be possible to easily find all products that satisfy properties not related to product categories. Yet an application might have to support a search facility that allows us to find, for example, all products whose price is not below a given minimum.

The database schema for products:

block(`schema) {

  export(`{

    product(x), product_id(x:s) -> string(s).
    product_price[x] = v -> product(x), int(v).

  })

} <-- . 

The following protocol of the global service has a minimum price field on the request, and returns a list of products (as signified by the keyword repeated). The list of results is important here: the generic global protobuf service by default concatenates all results, which works particularly well for search services (less so for global aggregation services).

message SearchRequest
{
  required uint32 min_price = 1;
}

message SearchResponse
{
  repeated Product product = 1;
}

message Product
{
  required string description = 1;
  required uint32 price = 2;
} 

The services on the individual partitions of the distributed system can use the same protocol. The implementation of the local service is fairly straightforward. The first rule finds all products that match the search criteria, and creates protobuf Product messages for these products, collecting them in results. The second rule creates responses.

block(`search) {

  alias_all(`schema),

  clauses(`{

    lang:pulse(`results).
    results(p, req) -> Product(p), SearchRequest(req).

    lang:constructor(`cons).
    lang:pulse(`cons).
    cons[req] = resp -> SearchRequest(req), SearchResponse(resp).

    +results(x, req),
    +Product(x),
    +Product:description[x] = s,
    +Product:price[x] = actual
      <-
      +SearchRequest:min_price[req] = v,
      product_price[p] = actual,
      actual >= v,
      product_id[p] = s.

    +cons[req] = resp,
    +SearchResponse(resp),
    +SearchResponse:product[resp, i] = p
      <-
      +SearchRequest:min_price[req] = v,
      +results(p, req),
      +ProductId[p] = i.
  })

} <-- . 

The configuration of the global service is a bit more involved, because it must specify which services to target; the target services are specified by URLs.

block(`service_config) {

  alias_all(`lb:web:config:service),
  alias_all(`lb:web:config:service_abbr),
  alias_all(`lb:web:config:protobuf),
  alias_all(`lb:web:config:protobuf_abbr),
  alias_all(`lb:web:config:global_protobuf),
  alias_all(`lb:web:config:global_protobuf_abbr),

  clauses(`{

    service_by_prefix["/protobuf-global-search/search"] = x,
    global_protobuf_service(x) {
      protobuf_protocol[] = "search",
      protobuf_request_message[] = "SearchRequest",
      protobuf_response_message[] = "SearchResponse",

      global_protobuf_target_uri("http://localhost:8080/protobuf-global-search/partition/1"),
      global_protobuf_target_uri("http://localhost:8080/protobuf-global-search/partition/2")
    }.

  })
} <--. 

As an example, the following log illustrates a search across two partitions on a database of highly rated products on Amazon.

$ echo '{"min_price" : 30}' | lb web-client call 'http://localhost:8080/search'
-----------------  request (/search) -----------------
min_price: 30

----------------- response (/partition/1) -----------------
product { description: "Food Thermometer"         price: 97 }
product { description: "Gluten-free Pancake Mix"  price: 41 }
product { description: "Forehead Flashlight"      price: 32 }

----------------- response (/partition/2) -----------------
product { description: "Three Wolf Moon T-Shirt"  price: 35 }
product { description: "Portable Gas Grill"       price: 134 }

------------------- response (/search) --------------------
product { description: "Food Thermometer"         price: 97 }
product { description: "Gluten-free Pancake Mix"  price: 41 }
product { description: "Forehead Flashlight"      price: 32 }
product { description: "Three Wolf Moon T-Shirt"  price: 35 }
product { description: "Portable Gas Grill"       price: 134 }

Complete executable examples of global protobuf services are available in the lb-web-samples package (see protobuf-global-*).

Chapter 27. Data Exchange Services

27.1. Configuring Tabular Data Exchange Services

Tabular data exchange services (TDX) are HTTP services that offer delimited files for download to export data from LogicBlox workspaces (GET) and support uploading delimited files for importing data (POST/PUT). ServiceBlox provides a built-in handler for defining such tabular data exchange services at a very high level. This section describes how to configure and use these services.

To introduce TDX file services we will use a simple example of multi-dimensional sales data. Consider a workspace with the following schema defined for the hierarchy and measures.

block(`hierarchy) {
  export(`{
    sku(x), sku_id(x:s) -> string(s).
    store(x), store_id(x:s) -> string(s).
    week(x), week_id(x:s) -> string(s).
  })
} <--. 
block(`measure) {
  alias_all(`hierarchy),
  export(`{
    sales[x, y, z] = v -> sku(x), store(y), week(z), int(v).
  })
} <--. 

For this application the customer uses a delimited file for sales data, as in the following example.

SKU     | STORE       | WEEK | SALES
apples  | atlanta     | W1   | 10
oranges | atlanta     | W2   | 15
apples  | portland    | W1   | 20
oranges | portland    | W2   | 5

We shall define a tabular data exchange service to import data in this delimited file format to the sku, store, week and sales predicates. We shall also define a service to export from these predicates to a delimited file in this format.

A tabular data exchange service is defined by three parts:

  • File definition which defines the format of a file, such as header names, column formats, optional columns, and the delimiter character that is used.

  • File binding which specifies how columns in a delimited file are bound to predicates in the workspace. This file binding is a high-level, bi-directional specification, which means that it can be used for both the import and the export.

  • Service configuration which defines the service to be hosted by the ServiceBlox service container.

The ServiceBlox programming interface for defining these parts is described below. The ServiceBlox handler for tabular data exchange services uses predicates in the lb:web:delim namespaces. File definitions are defined in lb:web:delim:schema and predicate bindings are defined in lb:web:delim:binding. To avoid cluttering logic, and to make it more readable, it is good practice to use aliases.

27.1.1. File Definition

A delimited file is defined by creating a lb:web:delim:schema:file_definition element and populating interface predicates, and then saving this by name in lb:web:delim:schema:file_definition_by_name. Example code:

block(`files) {
  alias_all(`lb:web:delim:schema),
  alias_all(`lb:web:delim:schema_abbr),

  clauses(`{
    file_definition_by_name["sales"] = fd,
    file_definition(fd) {
      file_delimiter[] = "|",
      column_headers[] = "SKU,STORE,WEEK,SALES",
      column_formats[] = "alphanum,alphanum,alphanum,integer"
    }.
  })
} <--. 
Required file definition interface settings
file_delimiter lb:web:delim:schema

Delimiter character.

column_headers lb:web:delim:schema_abbr

Comma separated list of file headers.

column_formats lb:web:delim:schema_abbr

Comma separated list of column formats (see the table below for supported column formats).

The order of the column names in column_headers must match the order of the column formats in column_formats but does not necessarily correspond to the order of the columns in the delimited file.
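For example, given the sales file definition above, a file with a different physical column order is still accepted on import, because columns are matched by their header names:

WEEK | SKU     | STORE    | SALES
W1   | apples  | atlanta  | 10
W2   | oranges | atlanta  | 15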

Optional file definition interface settings
file_mode lb:web:delim:schema

Specification of how to handle quoting and escaping (see File Modes).

file_columns_required lb:web:delim:schema_abbr

Comma-separated list of required columns. Will make all columns that are not listed optional.

file_column_required lb:web:delim:schema_abbr

Set the column with this header as required. Will make all columns that are not listed optional.

file_columns_optional lb:web:delim:schema_abbr

Comma-separated list of optional columns (i.e., missing values are allowed).

file_column_optional lb:web:delim:schema_abbr

Set the column with this header as optional (i.e., missing values are allowed).

file_columns_can_be_absent lb:web:delim:schema_abbr

Comma-separated list of columns that can be optionally absent (i.e., missing values are allowed, and, moreover, the entire column may be missing from the file on imports).

file_column_can_be_absent lb:web:delim:schema_abbr

Set the column with this header to be optionally absent (i.e., missing values are allowed, and, moreover, the entire column may be missing from the file on imports).

file_column_format lb:web:delim:schema_abbr

Set the format of a column by its header.

column_description lb:web:delim:schema

A textual description for a column. This information is displayed by the meta-service and can be used to document file definitions.

The file_column_format setting provides an alternative way to specify column formats. For example,

file_column_format["SKU"] = "alphanum"

specifies that the format of the SKU column is alphanumeric. The column_formats setting may be omitted, provided that file_column_format is used to specify the format of each column listed in the column_headers setting.
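For instance, the earlier sales file definition could equivalently be written without column_formats (a sketch):

file_definition_by_name["sales"] = fd,
file_definition(fd) {
  file_delimiter[] = "|",
  column_headers[] = "SKU,STORE,WEEK,SALES",
  file_column_format["SKU"] = "alphanum",
  file_column_format["STORE"] = "alphanum",
  file_column_format["WEEK"] = "alphanum",
  file_column_format["SALES"] = "integer"
}.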

The following table lists the currently supported formats. TDX will enforce these formats. During an import, it is an error if a data value does not conform to the specified format (e.g., if -1 is bound to a positive integer column). During exports, if data in the predicate does not conform to the format, it is simply filtered out from the exported file.

Supported column formats
Syntax Description Details
alphanum

Alphanumeric string that maps to a string.

The string is trimmed. Only letters and numbers are valid.

string

A string that must not be empty.

The string is trimmed. Must not be empty after trimming.

string*

A possibly empty string.

The string is trimmed. Can be empty after trimming. Cannot be used for an optional column.

raw_string

A string that is not processed by TDX.

The string is not trimmed. Can be empty. Cannot be used for an optional column.

char

A string that has a single character.

The string is trimmed. Must have a single character after trimming.

uuid

A Universally Unique Identifier (UUID) string.

The string is trimmed. Only UUIDs are valid.

integer

Integer number.

0+

Non-negative integer.

1+ or >0

Positive integer.

decimal

Decimal number.

0.0+

Non-negative decimal.

>0.0

Positive decimal.

float

Floating-point number.

0.0f+

Non-negative float.

>0.0f

Positive float.

boolean(t;f)

Boolean value.

The literals t and f are case-insensitive specifications of the expected formats for true and false. For example, this could be (1;0), (t;f) or (true;false). Values that are different from these two options are considered invalid.

datetime(format)

Datetime value.

The value is serialized to and from string with datetime:formatTZ and datetime:parse using the format string, e.g., datetime('%m/%d/%y'). See documentation on built-ins.

date(format)

Date value.

The value is serialized to and from string with datetime:formatTZ and datetime:parse using the format string, e.g., date('%m/%d/%y'). See documentation on built-ins. Imports create datetime objects in the UTC timezone.
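As an illustration, a file definition with a date column might look as follows (a sketch reusing the format string from the table above):

file_definition_by_name["sales-dated"] = fd,
file_definition(fd) {
  file_delimiter[] = "|",
  column_headers[] = "SKU,DATE,SALES",
  column_formats[] = "alphanum,date('%m/%d/%y'),integer"
}.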

27.1.1.1. Format validations

Additional validations can be added to most formats, as illustrated in the code below:

  file_definition_by_name["sales"] = fd,
  file_definition(fd) {
    file_delimiter[] = "|",
    column_headers[] = "SKU,SALES",
    column_formats[] = "string([a-zA-Z0-9]*),float(>=0; <=20; precision 2)"
  }.

In the example above, the string column defines a regular expression [a-zA-Z0-9]*, indicating that it only accepts alphanumeric characters. We also have a float column defining three validations, separated by semicolons:

  • >=0: indicating it must be non-negative.
  • <=20: indicating it must not be greater than 20.
  • precision 2: indicating this float number must not have more than 2 digits after the decimal point.

The following table lists currently available validations:

Format Description
>

Greater Than (example >0). Accepted in any numerical format.

>=

Greater Than or Equal To (example >=0). Accepted in any numerical format.

<

Less Than (example <0). Accepted in any numerical format.

<=

Less Than or Equal To (example <0). Accepted in any numerical format.

precision

Maximum number of digits after the decimal point. For example, precision 2 will accept 3, 3.1 and 3.14, but reject 3.141. Accepted in float and decimal formats. Note: integer numeric types accept this validation, but it has no effect.

Regular Expression

If the validation does not match any of the above, it defaults to a regular expression. Regular expressions are accepted in all formats, except boolean, date, datetime and uuid. Regular expressions are strings and must be escaped as any String Literal. The following example defines a string with a simple regular expression:

  file_definition_by_name["sales"] = fd,
  file_definition(fd) {
    file_delimiter[] = "|",
    column_headers[] = "SKU,SALES",
    column_formats[] = "string([a-zA-Z0-9]*),float"
  }.

Special attention is needed if the regular expression includes comma (,) or semicolon (;) characters, which are not allowed in the above notation. If needed, a regular expression with these characters must be defined using the auxiliary format_regex predicate and referenced in the format, as illustrated below:

  file_definition_by_name["sales"] = fd,
  file_definition(fd) {
    file_delimiter[] = "|",
    column_headers[] = "SKU,SALES",
    column_formats[] = "string(exp1),float",
    format_regex["exp1"] = "[a-zA-Z]{0,9}"
  }.

27.1.1.2. Optional Columns

TDX provides support for dealing with missing values, such as the following file:

SKU     | STORE       | WEEK | YEAR | SALES
apples  | atlanta     | W1   | 2012 |
oranges | atlanta     | W2   | 2012 | 15
apples  | portland    | W1   | 2012 |
oranges | portland    | W2   | 2012 | 5

In TDX, a column can be "required", "optional", or "possibly absent". If a column is required, this means that it must be present in the file, and every row must specify a value for that column. If a column is optional, this means that it must be present in the file, but some rows may have a missing value for that column. If a column is allowed to be absent, this means that missing values are allowed, and, furthermore, the entire column may be absent from the file on imports.

By default, all columns mentioned in the file definition are required columns. To change this, we can use the file_columns_required or the file_columns_optional predicate, as shown below. When the file_columns_required predicate is used, all columns not specified in that predicate are treated as optional columns. When the file_columns_optional predicate is used, all columns not specified in that predicate are treated as required columns.

file_definition_by_name["sales"] = fd,
file_definition(fd) {
  file_delimiter[] = "|",
  column_headers[] = "SKU,STORE,WEEK,SALES",
  column_formats[] = "alphanum,alphanum,alphanum,integer",
  file_columns_required[] = "SKU,STORE,WEEK"
}.

Possibly absent columns are specified using file_columns_can_be_absent, as in the following example:

file_definition_by_name["sales-returns"] = fd,
file_definition(fd) {
  file_delimiter[] = "|",
  column_headers[] = "SKU,STORE,WEEK,SALES,RETURNS",
  column_formats[] = "alphanum,alphanum,alphanum,integer",
  file_columns_can_be_absent[] = "SALES,RETURNS"
}.

In this example, the SALES and RETURNS columns may or may not be present in the file. If they are present, then, in any given row, a value may but need not be given for each of these columns.

27.1.1.3. File Modes

TDX allows the customization of how data values are quoted and escaped in import and export services. The file mode specification consists of a string with zero or more of the following entries.

Supported file mode entries
Syntax Import Export
raw

Import text as is (will break if string contains delimiter or line break).

Export text as is (will create bad csv when string contains delimiter or line break).

quote=c

Parse c as the quote character (c must be a single character, but not a whitespace character).

Use c as quote character.

unix

Removes quotes, unescapes all escaped characters, i.e., \n, \r, \t and the quote character.

Adds quotes, escapes all characters for which it is needed.

excel

Removes quotes and replaces double quotes with quotes.

Adds quotes, doubles the quote character, and escapes \n, \r, \t.

Escaping is applied upon export, and unescaping upon import. In the unix style, escaping is done by prefixing the quote character, line-feed and carriage-return with a backslash (e.g., "foo"bar" is replaced by "foo\"bar"). The backslashes are removed on imports. In excel mode, escaping is done by doubling the quote character (e.g., "foo""bar").

The default file mode is unix quote=" which means that by default Unix escaping rules are used, and values are enclosed in double quotes. That is, imports require values to be either not quoted or quoted with ", and exports always quote with ". For example, a file with a single string column named X behaves like the following by default:

// import
X
foo
"bar"

// export
X
"foo"
"bar"

This behavior can be changed statically with the file_mode attribute. For example, the following definition will export the file with ' as the quote character.

file_definition_by_name[name] = fd,
file_definition(fd) {
  file_delimiter[] = "|",
  file_mode[] = "quote='",
  column_headers[] = "X",
  column_formats[] = "string"
}.

This allows the following interaction (note that double quotes are now simply part of the record):

// import
X
foo
"bar"
'baz'

// export
X
'foo'
'"bar"'
'baz'

By setting the file_mode to raw, no quotes are added on export, and import values are treated completely as part of the record:

// import
X
foo
"bar"
'baz'

// export
X
foo
"bar"
'baz'

Finally, it is possible to override the static default by using a URL parameter to the TDX service:

// POST to /file?tdx_file_mode="raw"
X
foo
"bar"
'baz'

// GET to /file?tdx_file_mode="quote=*"
X
*foo*
*"bar"*
*'baz'*

27.1.2. File Binding

The file binding for a delimited file is defined by creating a lb:web:delim:binding:file_binding element, populating the interface predicates, and then saving this by name in lb:web:delim:binding:file_binding_by_name. Here is an example that shows a basic binding to one predicate:

block(`server_init) {
  alias_all(`lb:web:delim:binding),
  alias_all(`lb:web:delim:binding_abbr),

  clauses(`{
    file_binding_by_name["sales"] = fb,
    file_binding(fb) {
      file_binding_definition_name[] = "sales",
      predicate_binding_by_name["measure:sales"] =
        predicate_binding(_) {
          predicate_binding_columns[] = "SKU,STORE,WEEK,SALES"
        }
    }.
  })
} <--.

The predicate_binding_columns setting is used to map file column names onto the attributes of the predicate, in order. In this example, columns "SKU,STORE,WEEK,SALES" map to a predicate defined as measure:sales[sku, store, week] = sales. Some of the predicate attributes, such as the keys in this example, may have entity types, provided that the entity type has a reference-mode predicate. In this case, the values in the associated file column are treated as refmode values for the entity type in question (see Section 27.1.5 below).

The following table shows how TDX maps primitive LogiQL types into column formats. For each primitive type, the table lists the compatible column formats. For example, int primitives can be bound directly onto integer, 0+ and >0 columns, but int128 primitives can only be bound onto uuid columns. When the TDX generator encounters incompatible types it attempts to convert the primitive into some compatible primitive using standard from:to:convert predicates. The last column of the table shows the conversions supported by TDX. For example, an int primitive bound to a float column will be first converted using int:float:convert, and an int128 bound to uuid will be converted with int128:string:convert.

Supported column format bindings and primitive conversions

Primitive LogiQL type | Column formats                                    | Primitive conversions
int                   | integer, 0+, >0                                   | decimal, float, string
float                 | float, 0.0f+, >0.0f                               | decimal, int, string
decimal               | decimal, 0.0+, >0.0                               | float, string
string                | raw_string, string, string*, alphanum, char, uuid | boolean, datetime, decimal, float, int, int128
boolean               | boolean                                           | string
datetime              | date, datetime                                    | string
int128                | uuid                                              | string

In general, a file binding can have any number of associated predicate bindings. When multiple predicate bindings are specified, then each predicate gets populated independently on import; on export, the predicates are joined to give rise to rows of the output file. See the section on import and export below for more details.
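For instance, a file binding for the sales-returns file definition from the previous section could associate one predicate binding per measure (a sketch; measure:returns is a hypothetical predicate with the same keys as measure:sales):

file_binding_by_name["sales-returns"] = fb,
file_binding(fb) {
  file_binding_definition_name[] = "sales-returns",
  predicate_binding_by_name["measure:sales"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "SKU,STORE,WEEK,SALES"
    },
  predicate_binding_by_name["measure:returns"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "SKU,STORE,WEEK,RETURNS"
    }
}.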

The above example binding will support import to the sales predicate assuming that entity elements already exist in sku, store, and week. It is common to optionally add elements to all entity types; this is done by populating the lb:web:delim:binding_abbr:file_binding_entity_creation predicate. Example:

file_binding(fb) {
  ...
  file_binding_entity_creation[] = "accumulate",
  ...
}

File binding configurations may apply to all predicate bindings, as in the above entity creation example, or to individual predicate bindings.

Required file binding settings
file_binding_definition_name

Specifies the file definition that this file binding is associated with.

predicate_binding_by_name

Specifies predicate bindings associated with this file binding.

Optional file binding settings
file_binding_predicate_columns lb:web:delim:binding_abbr

Comma-separated list of column headers. Applies to all predicate bindings for this file binding.

file_binding_entity_creation lb:web:delim:binding_abbr

Set entity creation for all predicate bindings of a file binding. The supported values are:

  • none: do not create entity elements. This is the default value if no entity creation is specified.
  • accumulate: add new elements that did not exist previously.
  • ignore: skip lines with non-existing entities. This can be useful for partitioned deployments but can be dangerous as well, as there is no way to distinguish typos from entities that are not on that partition.

If entity creation is configured on the file binding, then it is recursively applied to all predicate bindings in this file binding. The setting on the predicate binding will recursively apply to all column bindings of the predicate binding.

file_binding_column_entity_creation lb:web:delim:binding_abbr

Default entity creation from the file binding to one specific argument. Applies to all predicate bindings for this file binding.

file_binding_ignore_idb_on_import lb:web:delim:binding

Specifies that predicate bindings that bind IDB predicates should be ignored on imports. Importing into IDBs is not possible because IDBs cannot be directly populated. This allows binding to IDBs that contain information to join on exports while reusing the file binding for imports into the remaining predicate bindings.

file_binding_import_default_value lb:web:delim:binding

By default, the value that represents an empty optional column on imports is the empty string. This option allows the definition of a different string to be deemed the empty value. On imports, an optional column with either the empty string or the default value is deemed missing and will not be imported. This allows the definition of "zero-stripping" policies to ignore import rows with certain values (see the sketch after this list).

file_binding_export_default_value lb:web:delim:binding

By default, the value that represents an empty optional column on exports is the empty string. This option allows the definition of a different string to be deemed the empty value. On exports, an optional column without a value will be exported with the default value instead of with an empty string. This allows the definition of "zero-filling" policies.

file_binding_default_value lb:web:delim:binding

A shorthand to apply the same value to both file_binding_import_default_value and file_binding_export_default_value.
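For instance, a hypothetical zero-stripping policy that treats "0" as the empty value on imports could be configured like this:

file_binding(fb) {
  ...
  file_binding_import_default_value[] = "0",
  ...
}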

Options that apply to a particular argument take a zero-based index into the arguments of a predicate. For example, to enable entity creation for sku and store entities use:

file_binding(fb) {
  ...
  file_binding_column_entity_creation[0] = "accumulate",
  file_binding_column_entity_creation[1] = "accumulate",
  ...
}
Options applying to individual predicate bindings
predicate_binding_columns lb:web:delim:binding_abbr

Comma-separated list of column headers. Multiple columns can be combined into a single component by separating them with a semi-colon. A column-binding transformation must be provided to combine multiple columns into one value (see below).

predicate_binding_entity_creation lb:web:delim:binding_abbr

Set entity creation for all column bindings. See file_binding_entity_creation for the supported values. The setting on a predicate binding overrides the setting on the file binding, and will recursively apply to all column bindings of the current predicate binding.

predicate_binding_export_only lb:web:delim:binding

Specifies that this predicate binding should be applied only for exports, and should be ignored for imports. This is useful to specify export filters which do not apply for imports. See Section 27.1.5 below for more details.

predicate_binding_import_only lb:web:delim:binding

Specifies that this predicate binding should be applied only for imports, and should be ignored for exports. This is useful to populate auxiliary predicates that cannot provide a bi-directional transformation. See Section 27.1.5 below for more details.

predicate_binding_no_retraction_on_post lb:web:delim:binding

The default behavior on POST requests for predicate bindings that bind optional columns is to retract existing values for the keys if the value column is left empty. This behavior may be undesirable in certain circumstances. This flag can then be used to specify that empty optional columns should not cause a retraction on POSTs.

predicate_binding_filter lb:web:delim:filter

A filter to be applied on imports (use export_only predicate bindings to filter exports). The filter must be a predicate with the same arity as the columns being filtered. Only values that exist in the filter are imported on POSTs or PUTs. Furthermore, on PUTs, only values in the filter are retracted if they are not present in the new file.

column_binding_by_arg lb:web:delim:binding

Column binding to describe more precisely how column(s) from a delimited file map to an argument of a predicate.

Column binding options
column_binding_import_function, column_binding_export_function lb:web:delim:binding

Transformation functions to apply to value(s) of this column binding, see Section 27.1.2.1 below.

column_binding_entity_creation lb:web:delim:binding

Set entity creation for this column binding. See file_binding_entity_creation for the supported values.

27.1.2.1. Transform Functions

When importing a delimited file, it is often necessary or convenient to combine column values, or to perform a modification of column values prior to inserting them into predicates in a workspace. Similar manipulations may also be called for when exporting to a delimited file. TDX allows developers to specify column binding transform functions to be applied during imports and/or exports.

In the following we will introduce transform functions by means of examples. Suppose that we want to import the simple sales file shown below, and that the application model is the one shown in the listing that follows it.

SKU     | STORE       | WEEK | YEAR | SALES
apples  | atlanta     | W1   | 2012 | 10
oranges | atlanta     | W2   | 2012 | 15
apples  | portland    | W1   | 2012 | 20
oranges | portland    | W2   | 2012 | 5
sku(x),   sku_id(x:id)   -> string(id).
store(x), store_id(x:id) -> string(id).
week(x),  week_id(x:id)  -> string(id).

sales[sku, store, week] = sales -> sku(sku), store(store), week(week), integer(sales).

Now suppose that there are small mismatches between the format and data of the file and the schema of our application. For example, we may want the names of all stores to be in upper-case characters. One possible way to transform the import file is to bind the STORE column to the refmode of sample:sales:store entities, and to apply the string:upper function upon import. This can be specified with the following file binding:

...
file_binding_by_name["sales"] = fb,
file_binding(fb) {
  file_binding_definition_name[] = "sales",
  file_binding_entity_creation[] = "accumulate",
  predicate_binding_by_name["sample:sales:store"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "STORE",
      column_binding_by_arg[0] =
        column_binding(_) {
          column_binding_import_function[] = "string:upper"
        }
    }
}.
...

Note that import functions bind the values from columns to their keys, and produce a value that is then stored in the predicate being bound. This binding is thus roughly equivalent to applying string:upper[STORE] = id and then using id as the refmode of a store entity.
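
Spelled out as (rough) delta logic, with raw_store standing in as a hypothetical predicate holding the raw STORE column values, the import step amounts to:

// Rough sketch only; TDX generates the actual logic.
+store_id[_] = id <- raw_store(s), string:upper[s] = id.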

This file binding would create stores, but we are also interested in creating sales values. The problem now is that in the workspace weeks are identified by a concatenation of year and week id (for example, '2012-W1'). The solution is to define a concat_dash function that concatenates two strings with a dash ('-'). Then, we apply this function to the values of the YEAR and WEEK columns, and use the resulting value as the refmode of the week-of-year entity:

concat_dash[s1, s2] = s1 + "-" + s2.
lang:derivationType[`concat_dash]="Derived".

...
file_binding_by_name["sales"] = fb,
file_binding(fb) {
  file_binding_definition_name[] = "sales",
  file_binding_entity_creation[] = "accumulate",
  predicate_binding_by_name["sample:sales:sales"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "SKU, STORE, YEAR;WEEK, SALES",
      column_binding_by_arg[2] =
        column_binding(_) {
          column_binding_import_function[] = "sample:sales:services:concat_dash"
        },
      column_binding_by_arg[1] =
        column_binding(_) {
          column_binding_import_function[] = "string:upper"
        }
    }
}.
...

The above file binding will concatenate YEAR and WEEK as well as apply string:upper to the STORE column. Note that column_binding_by_arg is indexed by the components of predicate_binding_columns, which are separated by commas (,). If a component contains multiple columns (because the function takes multiple arguments), the columns are separated by semicolons (;). Also note that any function can be used, including built-ins (such as string:upper), derived functions, and materialized functions.

This file binding supports importing files, but it would fail for exports. The problem is that we need to specify the inverse of concat_dash, that is, how to decompose the week-of-year entity refmode to export the WEEK and YEAR columns. But before attending to this issue, let us see how we could extend the binding to export the names of store entities in lower case. This is accomplished by using the inverse export function, string:lower:

    ...
      column_binding_by_arg[1] =
        column_binding(_) {
          column_binding_import_function[] = "string:upper",
          column_binding_export_function[] = "string:lower"
        }
...

Note that export functions are applied with respect to the predicate being bound. This binding is thus roughly equivalent to applying store_id(x:id), string:lower[id] = STORE. This also means that we need inverse export functions when exporting to multiple columns, as in the case of WEEK and YEAR, because we have to bind the entity refmode to the value of the function, and the multiple columns to its keys. So to implement our requirement, we define a split_dash function that splits a string into two strings at the '-' character, and then apply this function as an inverse in the binding. The resulting code for our example is the following:

concat_dash[s1, s2] = s1 + "-" + s2.
lang:derivationType[`concat_dash]="Derived".

split_dash[s1, s2] = s -> string(s1), string(s2), string(s).
split_dash[s1, s2] = s <-
  string:split[s, '-', 0] = s1,
  string:split[s, '-', 1] = s2.
lang:derivationType[`split_dash]="Derived".

...
file_binding_by_name["sales"] = fb,
file_binding(fb) {
  file_binding_definition_name[] = "sales",
  file_binding_entity_creation[] = "accumulate",
  predicate_binding_by_name["sample:sales:sales"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "SKU, STORE, YEAR;WEEK, SALES",
      column_binding_by_arg[2] =
        column_binding(_) {
          column_binding_import_function[] = "sample:sales:services:concat_dash",
          column_binding_export_function_inverse[] = "sample:sales:services:split_dash"
        },
      column_binding_by_arg[1] =
        column_binding(_) {
          column_binding_import_function[] = "string:upper",
          column_binding_export_function[] = "string:lower"
        }
    }
}.
...

27.1.2.2. Predicate binding helpers

Adding alias_all(`lb:web:delim:tdx_helpers) to the service block exposes the helper predicates below for use in TDX specifications:

Predicate setting helper Description Example
binds_pred

Associate a predicate with a list of comma-separated column headers.

binds_pred["product:sku:label"] = "SKU_NBR,SKU_DESC".
accumulate_entity

Set entity creation for a column.

accumulate_entity("CLASS").
decumulate_entity

Unset entity creation for a column.

decumulate_entity("SKU").
export_only

Only bind this predicate when exporting.

export_only("product:sku").
import_only

Only bind this predicate when importing.

import_only("product:sku:imported").
transform_col_import

Transformation function to apply to column(s) on import.

transform_col_import("SHRINK_AMT", "string:upper").
transform_col_export

Transformation function to apply to column(s) on export.

transform_col_export("SHRINK_AMT", "string:lower").
transform_col

A pair of transformation functions to apply to column(s): the first on import and the second on export.

transform_col("SHRINK_AMT", "string:upper", "string:lower").

The example below illustrates a file definition using TDX helpers.

block(`product) {

  alias_all(`lb:web:delim:tdx_helpers),
...

  clauses(`{
...

  fb(fb),
  file_binding_by_name[product_file_definition_name[]] = fb,
  file_binding(fb) {
    file_binding_definition_name[] = product_file_definition_name[],
    accumulate_entity("SKU_NBR"),
    binds_pred["product:sku:label"] = "SKU_NBR,SKU_DESC",
    binds_pred["product:sku"] = "SKU_NBR",
    binds_pred["product:sku:imported"] = "SKU_NBR",
    export_only("product:sku"),
    import_only("product:sku:imported"),

  }.

  })
} <-- . 

27.1.3. File Row Offsets

TDX exposes a file offset as a special (reserved) column header named TDX_OFFSET, which can then be used in predicate bindings. For example, in this code we bind the sales file, which has a single column, to the predicate measures:sales, which is a functional predicate from int to int.

sales[x] = v -> int(x), int(v).

file_definition_by_name["sales"] = fd,
file_definition(fd) {
  file_delimiter[] = "|",
  column_headers[] = "SALES",
  column_formats[] = "int"
},
file_binding_by_name["sales"] = fb,
file_binding(fb) {
  file_binding_definition_name[] = "sales",
  predicate_binding_by_name["measures:sales"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "TDX_OFFSET, SALES"
    }
}.

Upon import, the offset of each row in the file will be bound to the TDX_OFFSET column and then used to populate measures:sales. Upon export, the value in the column bound to TDX_OFFSET is ignored (equivalent to measures:sales[_] = SALES).

TDX_OFFSET works like any integer column: it can be bound to an entity by refmode, accumulated, be subject to transformation functions, etc. The value of the offset is the number of bytes from the beginning of the file up to the row being imported; it is therefore guaranteed to be unique and monotonically increasing within the file being imported, but the values are not guaranteed to be consecutive.
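
For instance, the following sketch (all predicate names hypothetical) uses entity creation to build a row entity keyed by the offset:

row(x), row_id(x:i) -> int(i).
sales_by_row[r] = v -> row(r), int(v).

...
predicate_binding_by_name["measures:sales_by_row"] =
  predicate_binding(_) {
    predicate_binding_columns[] = "TDX_OFFSET, SALES",
    predicate_binding_entity_creation[] = "accumulate"
  }
...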

27.1.4. Service Configuration

A tabular data exchange service is configured with ServiceBlox by creating a delim_service. Example:

block(`service_config) {

  alias_all(`lb:web:config:service),
  alias_all(`lb:web:config:service_abbr),
  alias_all(`lb:web:config:delim),

  clauses(`{

    service_by_prefix["/sales"] = x,
    delim_service(x) {
      delim_file_binding[] = "sales"
    }.

  })

} <--.

27.1.5. Import and Export

Import is performed as follows. If one of the columns mentioned in the file definition is missing from the input file, and the column was not specified as possibly absent, the input file is rejected. The file may contain additional columns that are not referred to in the configuration, and these columns are simply ignored. The order in which columns appear in the input .csv file is irrelevant: columns are identified by column name, as listed in the file header.

If a row of the file has a missing value in one of the required columns, or if the value in some column is not of the required type, the row is considered invalid (see Section 27.1.5.1 below for more detail on error handling). If the file binding includes predicate bindings that are filters, then all rows that do not pass the filter are ignored. For the remaining rows, row import is performed for each predicate binding that is not export_only and that involves only columns for which the current row has no missing optional values. Predicate bindings that bind IDBs are ignored (if file_binding_ignore_idb_on_import is set) or cause an error. Implicit conversions (such as from strings to numeric values) are performed where needed. If the predicate binding involves an import function that cannot be successfully applied to the given column values, the row is considered invalid. In particular, if one of the predicate attributes has an entity type and a value does not occur in the associated refmode predicate, either a new entity is created with the given refmode value (if entity creation is enabled via the accumulate policy), or an error message is generated (if entity creation is not enabled).

Export is performed as follows. Predicate bindings that are import_only are ignored, as well as all predicate bindings that are filters. The remaining predicate bindings are handled as follows: for each predicate that is associated with a predicate binding, a set of partial rows is produced (partial, because they only have the columns that occur in the predicate binding). This may involve applying export functions if present, and refmode lookups in case of entities. Next, a join is applied to these sets of partial rows, for all the predicates whose associated predicate binding refers only to required columns of the CSV file. Finally, if there are any predicates whose associated predicate binding refers to one or more optional columns of the .csv file, then the partial rows produced from those predicates are used (via an outer join) to provide potentially missing values for the optional columns.

27.1.5.1. Invalid Records

Records that failed to import are reported back to the user. The format of the report is the same as in the import file, including headers, plus two additional columns that indicate the error (see Section 27.1.5.1.1). If no records failed to import, the server returns a file with only the headers. This feature is enabled by default.

Returning error records does have a small performance penalty, as the server must write and then return the file containing the bad records. If for some reason you wish to disable the feature, you can specify the --ignore-bad-records flag on lb web-client, or simply not specify an output_file or output_url in the batch. See Section 27.1.5.1.2 for how to disable the feature when accessing via HTTP.

27.1.5.1.1. Causes

The resulting data that reports which records were not imported will contain all the columns of the original import, plus two additional columns describing why the records were not imported. The first column, CAUSE, contains a human-readable string such as "'SKU' is a required column.". The last column, CAUSE_CODE, contains a constant string value identifying the error type, for easy parsing. Here are the error codes, along with their descriptions.

Error Code Description
REQUIRED_COLUMN

A column defined as required is missing from the file.

WRONG_FORMAT

An import data value is invalid with respect to the column format.

DOES_NOT_EXIST

An entity or record referenced by the data does not exist.

MALFORMED_ROW

The row could not be parsed because it contains invalid characters or a different number of columns than the rest of the file.

FAILED_FUNCTION_APPLICATION

An import function application resulted in an undefined value.

FAILED_PRIMITIVE_CONVERSION

A primitive conversion resulted in an undefined value. Conversions are used when adjusting import values for import functions and refmode types.

27.1.5.1.2. Accessing via HTTP

If you are not using lb web-client or batch, you can still enable or disable this feature by using the tdx_return_errors query string parameter. Since the feature is enabled by default, simply accessing the URL as normal will return the bad records. If you do not wish to return the error records, set the tdx_return_errors query parameter to 0 or false. The value on_error can also be used to make the server return the error file only if import errors were detected.

27.1.5.1.3. Partial Imports and Aborting Transactions

TDX by default aborts an import transaction if any row fails to import. If invalid records are requested, these are still returned, even if the transaction aborts. You can optionally configure a service to allow partial imports, i.e., import good records and ignore bad records. You can configure this option by service or by individual request.

To configure a service to allow partial imports by default, you must make an assertion to the lb:web:config:delim:allow_partial_import predicate. This makes all transactions for the configured service default to importing valid rows even if other rows produce error messages. This setting can still be overridden by individual requests via the following methods, all of which take precedence over the lb:web:config:delim:allow_partial_import predicate.
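
A sketch of such a configuration, extending the service from Section 27.1.4 and assuming that allow_partial_import (aliased from lb:web:config:delim) can be applied to the service with a nullary atom inside the hierarchical block:

service_by_prefix["/sales"] = x,
delim_service(x) {
  delim_file_binding[] = "sales",
  // assumption: a nullary atom applies the predicate to this service
  allow_partial_import()
}.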

If you are using a batch configuration, you can set the abort_on_error field of the ImportDelim message; if you are using the lb web-client command line tool, you can specify either the --abort flag or the --allow-partial-import flag; finally, if you are using raw HTTP, you can set the tdx_allow_partial_import query parameter to 1 or true to allow partial imports, or to 0 or false to abort.

27.2. Dynamic Tabular Data Exchange Services

While TDX services allow us to statically configure a file binding and thus create one service for each file binding, we can also import and export delimited files by specifying the file binding as a request parameter named file_binding.

To host the dynamic tabular data exchange service we use the dynamic_delim_service predicate, as shown below. This configuration allows us to export or import delimited files by accessing /delim?file_binding=..., where ... is a JSON representation of the FileBinding protobuf message.

service_by_prefix["/delim"] = x,
dynamic_delim_service(x)
  <- .

27.2.1. Building FileBinding Messages

A FileBinding message describes the file structure that we want to export (FileDefinition) and how to build the file from different predicates (PredicateBinding). The optional entity_creation field allows you to set the default entity creation policy for all columns of all predicate bindings.

message FileBinding
{
  required FileDefinition file = 2;

  repeated PredicateBinding binding = 3;
  optional string entity_creation = 4;
}

The FileDefinition describes the format of the files and the options on the columns. If the file_columns_required field is empty, then all columns are considered required (unless otherwise specified). Conversely, if the file_columns_required field is not empty, all columns that are not mentioned in it are considered optional.

message FileDefinition
{
  required string delimiter = 2;
  required string column_headers = 3;
  required string column_formats = 4;

  optional string file_columns_required = 5;
  optional string file_columns_optional = 6;
  optional string file_columns_can_be_absent = 7;
  optional string mode = 8;

}

Finally, a PredicateBinding specifies how predicate columns bind to the columns in the file definition.

message PredicateBinding
{
  required string predicate_name = 1;
  required string predicate_binding_columns = 2;
  repeated ColumnBinding column = 3;

  optional bool export = 4 [default = true];
  optional bool import = 5 [default = true];
  optional bool no_retraction_on_post = 11 [default = false];

  optional string entity_creation = 9;

  repeated Filter filter = 10;
}

Example 27.1. 

The following JSON file definition message describes a file with three columns, "PERSON", "FATHER", and "MOTHER".

"file": {
  "delimiter": "|",
  "column_headers": "PERSON,FATHER,MOTHER",
  "column_formats": "alphanum,alphanum,alphanum"
} 

The following two PredicateBinding messages specify how to build the PERSON, FATHER, and MOTHER columns from predicates person_father and person_mother:

"binding":[ {
    "predicate_name": "person_father",
    "predicate_binding_columns": "PERSON,FATHER"
  },
  {
    "predicate_name": "person_mother",
    "predicate_binding_columns": "PERSON,MOTHER"
  }] 

Combining the messages together, we build the file binding message that should be sent as the file_binding parameter to the dynamic delimited file service:

{
    "file": {
      "delimiter": "|",
      "column_headers": "PERSON,FATHER,MOTHER",
      "column_formats": "alphanum,alphanum,alphanum"
    },
    "binding": [
      {
        "predicate_name": "person_father",
        "predicate_binding_columns": "PERSON,FATHER"
      },
      {
        "predicate_name": "person_mother",
        "predicate_binding_columns": "PERSON,MOTHER"
      }
    ]
}

Chapter 28. Implementing Custom Services

Custom services must be implemented in Java as implementations of the ServiceBlox Handler interface. The Handler interface is similar to the standard HttpServlet class.

28.1. Custom ProtoBuf Services

Custom ProtoBuf services have a ProtoBuf interface, but process the JSON or ProtoBuf messages in a different way than by importing them straight into a LogicBlox workspace, as the normal ProtoBuf services do. A few abstractions are available to help with the implementation of such services.

General Custom ProtoBuf Services

The ProtoBufHandler class helps with the input and output aspects of protobuf services. It was written with no assumptions about what is done with the messages. This class implements support for gzip compression and for handling JSON-formatted messages. It also logs requests and responses based on the configuration of the server, and handles error reporting of incorrect messages.

All ProtoBuf services should use this abstraction.

Subclasses of ProtoBufHandler must implement the method handle(Exchange, ProtoBufExchange). The ProtoBufExchange class is used to manage the parsing of the request message and communication of the response message. The goal of the ProtoBufExchange class is to make sure that messages do not get parsed repeatedly. The subclass of ProtoBufHandler can obtain the request message using protoExchange.getRequestMessage(). In return, the subclass is required to set the response message on the ProtoBufExchange. The response can be set in two ways:

protoExchange.setResponseMessage(msg)
protoExchange.setResponseBytes(bytes)

If implementations work with Message objects, then the preferred way of setting the response is via setResponseMessage, because this will avoid having to parse the bytes again in case a JSON response is needed, or if the message has to be logged. The setResponseBytes variant is preferred if subclasses do only a binary serialization of the message, since this will avoid parsing the message when it does not have to be logged and the response is not formatted as a JSON message.

By default, all Message objects are instances of DynamicMessage. The DynamicMessage objects are created using the descriptors that are in the workspace. This means that the message cannot be cast to classes generated by the ProtoBuf compiler (protoc). To address this, implementations can override two more methods on ProtoBufHandler:

protected Message.Builder getRequestBuilder()
protected Message.Builder getResponseBuilder()

This improves the performance of the service, and it also allows the subclass to cast messages to the generated message classes.

ProtoBuf Services using ConnectBlox

AbstractProtoBufHandler, which extends ProtoBufHandler, implements support for ProtoBuf services that are implemented by executing ConnectBlox requests. The subclasses determine what actual ConnectBlox requests to execute. This abstraction helps with the correct execution of ConnectBlox requests, handling of errors that might be triggered by the ConnectBlox request, and instrumenting ConnectBlox requests to handle correlation with database logs and monitoring predicate changes. Implementations based on AbstractProtoBufHandler must implement two methods: buildTransaction, to construct the ConnectBlox transaction to execute, and buildResponse, to extract a ProtoBuf response from a ConnectBlox response.

ProtoBuf services that use ConnectBlox should use this abstraction.

Chapter 29. Asynchronous Service calls

ServiceBlox allows for asynchronous service calls, so that long-running transactions can be processed in the background, with no need for an open TCP connection between client and server.

To asynchronously call a service, the client should send it a regular request, adding the lb_async=true parameter to the URL. When receiving an asynchronous request, ServiceBlox will start the background processing, and immediately send a response with HTTP Status 201 (Created), and the polling URL in the Location header.

Note

Depending on the service configuration, asynchronous handling can be the default mode. In this case, the above parameter is not required for asynchronous requests, and lb_async=false should be sent if a synchronous request is desired.

The client should then periodically send GET requests to the polling URL to query the execution status. Invocations of this URL return HTTP 200 (OK) if the background processing is still executing, or 303 (See Other) when the execution is complete. In the latter case, the Location header indicates the final response URL. The client should then send a GET request to this URL to obtain the execution result.

After retrieving the final response, the client should send an HTTP DELETE request to the final response URL. This frees up memory and disk resources.

Alternatively, if the client wants to abort an execution that is still running, an HTTP DELETE request can be sent to the polling URL.

Note

In order to accept asynchronous calls, the service should be configured accordingly. To see more details on how to enable a service to receive asynchronous calls, please check the LogicBlox Administration Guide.

29.1. Asynchronous call return codes

When handling an asynchronous request, the server can return different HTTP status codes. Below is the list of status codes for each scenario:

Polling
Status Code Description
404 The id in the polling URL was not found in the cache. This indicates that either the id is wrong, or the processing has been removed from the cache.
400 The background execution has been aborted as per client request.
303 The background execution has completed (either successfully or with an error).
200 The asynchronous request is still executing.

Final Response
Status Code Description
404 The background execution is still executing, or the provided id was not found in the cache. The latter indicates that either the id is wrong, or the processing has been removed from the cache.
424 The execution has been aborted as per client request.
Others The status code returned by the target service (note that this could potentially be one of status codes listed above).

Clean up (DELETE on Final response URL)
Status Code Description
410 Indicates that there is nothing to delete for the provided id.
200 Resources cleaned up.

Abort
Status Code Description
404 The provided id was not found in the cache. This indicates that either the id is wrong, or the processing has been removed from the cache.
409 Abort not possible because the execution has already completed.
200 The execution has already been aborted or has failed.
202 The execution has been signalled to abort.

Chapter 30. Measure Service

30.1. Concepts

The measure service is LogicBlox's implementation of online analytical processing (OLAP), which is used for analyzing multidimensional data. In contrast to LogiQL's normal relational model, OLAP is intended to support convenient and efficient roll-up operations that aggregate values belonging to semantically related keys. For instance, an analyst might start with fine-grained Sales data keyed by intersection (sku, store, week) and roll up by aggregating time to view Sales keyed by (sku, store, season). She might also roll up all products to yield Sales at (store, season).

In the LogicBlox approach to OLAP, the three primary concepts are levels, dimensions, and measures. Intuitively, a level is a set of values intended to serve as keys and to support roll-up, and a measure is a collection whose keys are drawn from zero or more levels. In the example above, Sales is a measure and each of sku, store, week, and season is a level. A dimension is a group of related levels whose members roll up to each other; for instance, week and season are part of a Calendar dimension.

For context, it is worth understanding that the measure service provides an OLAP view of data stored in normal LogiQL predicates, and is implemented by installing logic in a workspace. The advantage of using the measure service, as opposed to hand-implementing calculations in LogiQL, is that measure queries and definitions are substantially shorter and more direct than their LogiQL equivalents. As we will see, building an OLAP model requires both providing LogiQL definitions and constructing a measure model that explains how to interpret these definitions as OLAP concepts.

30.1.1. Levels, dimensions, and intersections

A level is a set of points, intended to serve as keys, and a dimension is a set of related levels. Common examples include a Location dimension with levels store, city, and state; a Product dimension with levels sku and class; and a Calendar dimension with levels day, week, season, month, etc. Roll-ups are implemented by specifying a total mapping from members of a lower level to a higher level.

Typically levels are implemented by LogiQL entity types. For example, each member of the Product.sku level might be backed by a myapp:sku entity type. Additionally, there are special dimensions corresponding to primitive types, such as the dimension Int, which contains a single level, also named Int. Finally, roll-ups between levels are implemented by functional LogiQL predicates. For example, the simple calendar dimension illustrated in Figure 30.1 may be built from the LogiQL definitions shown in the following (partial) lb script:

addblock <doc>
// Entity types represent levels.
Day(d), dayId(d:s) -> string(s).
Month(m), monthId(m:s) -> string(s).
Year(y), yearId(y:s) -> string(s).
</doc>

exec <doc>
// Entity values represent members.
+dayId[_] = "1-1-2014".
+dayId[_] = "1-2-2014".
+dayId[_] = "2-1-2014".
+dayId[_] = "2-2-2014".

+monthId[_] = "January2014".
+monthId[_] = "February2014".

+yearId[_] = "2014".
</doc>

addblock <doc>
// Level maps specify how levels roll up
dayToMonth[d] = m -> Day(d), Month(m).
monthToYear[m] = y -> Month(m), Year(y).

dayToMonth[d] = m <- dayId[d] = "1-1-2014", monthId[m] = "January2014".
dayToMonth[d] = m <- dayId[d] = "1-2-2014", monthId[m] = "January2014".
dayToMonth[d] = m <- dayId[d] = "2-1-2014", monthId[m] = "February2014".
dayToMonth[d] = m <- dayId[d] = "2-2-2014", monthId[m] = "February2014".

monthToYear[m] = y <- monthId[m] = "January2014",  yearId[y] = "2014".
monthToYear[m] = y <- monthId[m] = "February2014", yearId[y] = "2014".
</doc>

Figure 30.1. Calendar dimension level relationships


It would also be possible to relate members of the Day level directly to the Year level, but as long as such a relationship commutes with the ones from Day to Month and from Month to Year, it is unnecessary.
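
For instance, a direct Day-to-Year map that commutes by construction could be defined as follows (dayToYear is a hypothetical predicate name; the other predicates are from the script above):

dayToYear[d] = y -> Day(d), Year(y).
dayToYear[d] = y <- dayToMonth[d] = m, monthToYear[m] = y.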

Figure 30.2. Calendar dimension level relationships extended


However, dimensions need not be strictly linear. For example, we could add a season level to the dimension and provide additional relationships between month and season and season and year.

Figure 30.3. Calendar dimension extended with Season level


For some dimensions, it also makes sense to provide a mapping from a level to itself.

Example 30.1. 

Consider a dimension for measuring data with respect to the different parts in a manufacturing process. It is natural for members of a "Parts" level to be related to other parts in the same level. For example, a "Gear" may be part of an "Engine".
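
A minimal LogiQL sketch of such a self-relationship, using the same level-map pattern as the calendar example (all names hypothetical):

Part(p), partId(p:s) -> string(s).
partOf[p] = q -> Part(p), Part(q).
partOf[p] = q <- partId[p] = "Gear", partId[q] = "Engine".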

Figure 30.4. Example of dimension with a self-relationship


Because we allow relationships between any two levels in a dimension, they can be thought of as a graph. However, we do place some structural limitations on the graph to ensure that it has the properties needed for well-defined OLAP queries:

  1. The first requirement is that the graph must not contain any cycles that involve more than one node. This allows for relationships like the one for "Parts" described above, but rules out mappings that form larger cycles. For example, the Vatican is a country within the city of Rome, which is within the region of Lazio, which is within the country of Italy. We can illustrate the relationships between members as follows:

    Figure 30.5. Location dimension levels and members


    However, establishing those relationships between levels would yield a dimension that looks like this:

    Figure 30.6. Disallowed relationship among Location dimension levels


    This dimension's structure has a cycle involving three nodes, which is not allowed.

  2. The second requirement is that the transitive closure of the directed edge relationship must form a meet-semilattice. That is, when graph reachability is treated as a "less than or equals" relation (≤), that relation must be a partial order, and there must exist a least element (level) ⊥ such that, for every level l in the dimension, ⊥ ≤ l. Furthermore, for every pair of levels l1 and l2, there must exist a meet (i.e., a greatest lower bound).

Generally, dimensions are also described in terms of what are called hierarchies. A hierarchy can be thought of as a named path through the dimension graph. Hierarchies provide a useful modeling option for dimensions and they can also be used to direct some operations that involve dimensions to use a specific path.

Another important concept in modeling with dimensions is attributes. Attributes can be thought of as functions or properties of the members of a level. An attribute is generally used to allow metadata concerning a member to be queried. For example, the name and the label of a member might be two separate attributes of a level.
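
In LogiQL terms, attributes are typically backed by functional predicates over the level's entity type, as in this sketch (predicate names hypothetical):

skuName[s]  = n -> sku(s), string(n).
skuLabel[s] = l -> sku(s), string(l).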

An intersection is a set of labeled levels. For example, given the level day and the level state, the labeled pair (myDay:day, myState:state) is an intersection. The measure service allows you to omit a label, and will use each level's corresponding dimension as the default label. For instance, the following are four ways of writing down the same intersection:

(day, state)
(Calendar:day, state)
(day, Location:state)
(Calendar:day, Location:state)

The order in which labels appear in an intersection can influence the measure service's choice of key ordering (see ???), but it is not important when writing queries or updates. In most contexts (day, state) and (state, day) are equivalent. A single level may occur multiple times in an intersection, for instance the intersection (before:sku, after:sku) might be used when modeling how likely one product is to be purchased after another.

Finally, a position is a record (unordered tuple) of level members corresponding to an intersection. For instance (June2014, Waffle) and (September2012, Toast) might be positions of the (month, breakfastFood) intersection.

30.1.2. Measures

A measure is a map from the positions of some intersection to a value or, less frequently, to a set of values or no values at all. The canonical OLAP example is the Sales measure, which gives a decimal data value for each position of the intersection (Sku, Store, Week).

Every measure is defined by a measure expression. Measure expressions are discussed below, but for intuition it is worth considering two kinds of measures. Metrics are measures defined by the contents of a provided LogiQL predicate; you can think of metrics as the input data to the measure service. In contrast, an aggregation defines a measure by adding up values in other measures, which might be metrics. Putting this together, we might define Sales at (Sku, Store, Week) as a metric referencing the LogiQL predicate companydata:sales, and use an aggregation measure expression to roll up sales figures to the measure at intersection (Sku, Region, Year).
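
In the measure expression notation described in Section 30.2 (which writes intersections with braces), such a roll-up might be written as the following illustrative aggregation, assuming Sales is a metric and sku, region, and year are levels:

total Sales @ {sku, region, year}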

It is not strictly necessary that a measure be a function from positions to values. It is legal to have a set of values for each position. These measures can be used in queries, but our current reporting mechanism can only handle functional measures. Therefore, some filtering or aggregation is necessary to obtain a report from these relational measures.

Furthermore, it is not necessary that a measure contain data. A measure can consist entirely of a set of positions. This is isomorphic to a dense measure of boolean values, but more space-efficient. While such measures may be used in queries, given that there is no data within them, it is not possible to directly query them in a report.

Finally, it is also possible for measures to be parameterized so that their behavior can be adjusted on a per query basis. Because the parameterization mechanism is closely tied to implementation details, we do not cover it in depth here.

Note

The remainder of this chapter is under construction.

The measure service and CubiQL (a language that provides a high-level interface to the measure service) are still being actively developed. For the time being we provide only the current grammar.

30.2. Measure Expression Grammar

In many contexts the measure service allows one to concisely specify a measure query expression in a form described by the following grammar. (See the Grammar chapter for a description of the notation. Additionally, we use % to begin a comment that extends to the end of the line.)
%%%% Basic elements

NonDQuote is any character other than a double quote (i.e., '"').
NonBQuote is any character other than a back quote (i.e., '`').
Letter    is any Unicode alphabetic character.

NonzeroDigit = '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' .
Digit        = '0' | NonzeroDigit .

Ident = Letter { Letter | Digit }
      | '`' { NonBQuote } '`' .       % an escaped identifier


%%%% Queries

Label = Ident .

Type = 'string' | 'int' | 'float' | 'decimal' | 'boolean' | Ident
     | '(' [ Types ] ')' .                                         % tuple type

Types = Type { ',' Type } .

Level = Ident                         % Unqualified level name
      | Ident '.' Ident               % Level name qualified with dimension
      | Ident '.' Ident '.' Ident .   % Level name qualified with dimension and
                                      % hierarchy

LLevel = [Label ':' ] Level .         % Labeled level

Intersection = '(' Intersection ')'                    % parentheses
             |  '{' [ LLevels ] '}'                    % i.e., may be nullary
             |  'interof' '(' Expr ')'                 % indirect intersection
             |  Intersection { '&' Intersection }      % meet of intersections
             |  Intersection { '|' Intersection }      % join of intersections
             |  Intersection '!' Label                 % restriction by a label
             |  Intersection '!' LabelSet              %    and a set of labels
             |  Ident .                                % intersection variable

LLevels = LLevel { ',' LLevel } .
LabelSet = '<' Label { ',' Label } '>' .

BaseSignature = Intersection                              % position only
              | Intersection '=>' Type                    % single-valued
              | Intersection '=>' 'Set' '(' Type ')' .    % multi-valued

IntegerLiteral    = '0' | NonzeroDigit [ Digits ] .
FractionalLiteral = IntegerLiteral '.' Digits .

Digits = Digit { Digit } .

ScalarLiteral = IntegerLiteral                                 % integer literal
              | IntegerLiteral 'd' | FractionalLiteral ['d' ]  % decimal literal
              | IntegerLiteral 'f' | FractionalLiteral 'f'     % float literal
              | FractionalLiteral ('E' | 'e') [ Sign ] IntegerLiteral [ 'f' ]
              | 'true' | 'false'                               % boolean literal
              |  '"' { NonDQuote } '"' .                       % string literal

Sign = '+' | '-' .

LiteralColumn = '[' [ ScalarLiterals ] ']' .
LiteralTuple  = '(' [ ScalarLiterals ] ')' .

Literal = '{' [ LiteralTuples ] '}' ':' BaseSignature .

ScalarLiterals = ScalarLiteral { ',' ScalarLiteral } .

LiteralTuples = LiteralTuple { ',' LiteralTuple } .

Expr = [ '{{' [ Annotations ] '}}' ] ExprNoAnnotations .

Annotations = Annotation { ',' Annotation } .

Annotation = Ident '=' ScalarLiteral
           | Ident '=' LiteralColumn .

ExprNoAnnotations =
     '(' Expr ')'                        % parentheses
   | Ident                               % metric or expression variable
   |  LLevel '.' Ident                   % attribute
   |  '-' Expr                           % sugar for negate(<expr>)
   |  Expr '+' Expr                      % sugar for add(<expr>, <expr>)
   |  Expr '-' Expr                      % sugar for subtract(<expr>, <expr>)
   |  Expr '*' Expr                      % sugar for multiply(<expr>, <expr>)
   |  Expr '/' Expr                      % sugar for divide(<expr>, <expr>)
   |  Expr '@' Intersection              % widen
   |  '#' Expr                           % drop values making position only
   |  'demote' Label 'in' Expr           % convert a dimension into the value
                                         %   of the expression
   |  'promote' [Label 'in'] Expr        % convert the value of the expression
                                         %   into a dimension
   |  AggMethod Expr '@' Intersection    % aggregation to an intersection
   |  AggMethod Expr [ Groupings ]       % aggregation by grouping
   |  'headersort' Expr 'by' LLevel      % headersort
   |  'filter' Expr 'by' Comparisons     % filter
   |  'dice' Expr 'by' Dicers            % dice
   |  'fun' ArgBindings 'in' Expr        % function
   |  Ident '(' [ Exprs ] ')'            % operator
   |  Expr AppBindings                   % application
   |  'let' AppBindings 'in' Expr        % let binding
   |  'split' LabelMap 'in' Expr         % dimension splitting
   |  'relabel' LabelMap 'in' Expr       % dimension relabeling
   |  Expr '++' Expr { '++' Expr }       % override
   |  Expr '|' Expr { '|' Expr }         % union
   |  Expr '&' Expr { '&' Expr }         % intersection
   |  Expr '\' Expr                      % difference
   |  Expr 'as' Type                     % cast expression to type
   |  ScalarLiteral                      % sugar for a literal at the top
                                         %   intersection
   |  Literal                            % literal expression

Exprs = Expr { ',' Expr } .

ArgBindings = '[' [ InterArgBindings ] ']' '(' [ ExprArgBindings ] ')'
            | '[' [ InterArgBindings ] ']'
            |                              '(' [ ExprArgBindings ] ')' .

InterArgBindings = InterArgBinding { ',' InterArgBinding } .

InterArgBinding = Ident                         % simple argument
                | Ident '=' Intersection .      % argument with default

ExprArgBindings = ExprArgBinding { ',' ExprArgBinding } .

ExprArgBinding = Ident                          % simple argument
               | Ident '=' Expr .              % argument with default

AppBindings = '[' [ InterAppBindings ] ']' '(' [ ExprAppBindings ] ')'
            | '[' [ InterAppBindings ] ']'
            |                              '(' [ ExprAppBindings ] ')' .

InterAppBindings =
     Intersections                                    % positional only
   | Intersections { ',' InterNamedBinding }          % positional + named
   | InterNamedBinding { ',' InterNamedBinding } .    % named only

Intersections = Intersection { ',' Intersection } .

InterNamedBinding = Ident '=' Intersection .

ExprAppBindings =
     Exprs                                            % positional only
   | Exprs { ',' ExprNamedBinding }                   % positional + named
   | ExprNamedBinding { ',' ExprNamedBinding } .      % named only

ExprNamedBinding = Ident '=' Expr .

LabelMap = Ident 'to' Ident { ',' Ident 'to' Ident } .

Comparisons = Comparison { 'and' Comparison }
            | Comparison { 'or' Comparison } .

Dicers = Expr { 'and' Expr }
       | Expr { 'or' Expr } .

Comparison = Compop Expr .

Compop = '='       % equal
       | '!='      % not equal
       | '<'       % less than
       | '<='      % less than or equal
       | '>'       % greater than
       | '>='      % greater than or equal
       | '~' .     % Posix extended regex comparison

AggMethod = 'average' | 'count' | 'total' | 'collect' | 'ambig'
          | 'min' | 'max' | 'mode' | 'count_distinct' | 'histogram' | 'sort' .

Groupings = 'by' Grouping { Grouping } .

Grouping = 'all' Ident            % project dimension
         | 'to' LLevel            % rollup
         |  'slide' MapName .     % rollup across dimensions


%%%% Updates

Update = 'do' AtomicUpdates .

%% In the future this might be extended to
%%   Update = 'do' UpdateExpr .

AtomicUpdateExpr = '(' AtomicUpdateExpr ')'
                 | 'remove' Expr [ Transformations ] 'from' Target
                 | 'spread' Expr [ Transformations ] 'into' Target .

AtomicUpdates = AtomicUpdateExpr { 'and' AtomicUpdateExpr } .

%% Only conjunctive AtomicUpdateExprs are supported at this time.
%% The following is a possible future extension.
%%
%%   UpdateExpr = AtomicUpdates { 'or' AtomicUpdates }
%%              | 'if' AtomicUpdates 'then' UpdateExpr
%%                  { 'else' 'if' AtomicUpdates 'then' UpdateExpr }
%%                    [ 'else' AtomicUpdates ] .

Target = Ident                     % metric
       | LLevel                    % level
       | LLevel '.' Ident .        % attribute

%% In the future Target might be extended by yet another case:
%%     | LLevel '=>' LLevel .      % level map

Transformations = 'via' Transformation { 'then' Transformation } .

Transformation =
    ('even' | 'ratio' | 'replicate' | 'query' Expr) Distribution .

Distribution = { 'down' Level } .

Chapter 31. Proxy Services

ServiceBlox offers two kinds of proxy services: exact proxies and transparent proxies. An exact proxy forwards a request directed at one specific service to a different service, possibly hosted on a different machine. A transparent proxy forwards all requests directed at a certain URL prefix to a given host.

Exact Proxies.  The following example (see lb-web-samples/protobuf-proxy-auth) illustrates how to configure an authenticated exact proxy service.

block(`service_config) {

  alias_all(`lb:web:config:service),
  alias_all(`lb:web:config:service_abbr),
  alias_all(`lb:web:config:auth),
  alias_all(`lb:web:config:auth_abbr),
  alias_all(`lb:web:config:proxy),

  clauses(`{

    service_by_prefix["/time-proxy-auth"] = x,
    exact_proxy(x) {
      proxy_target[] = "http://localhost:8080/time-unauth",
      auth_realm[] = "time_auth_proxy"
    }.

    realm_by_name["time_auth_proxy"] = x,
    realm(x) {
      realm_config[] = "default-password"
    }.

  })
} <-- .

Transparent Proxies.  The following example illustrates how to use a transparent proxy. A request to /promo/foo on this host will be forwarded to http://example.com/foo. The host option is used for virtual host support. By setting the host, the HTTP header Host will be set in the forwarded request. The prefix option indicates what prefix of the original URL should be removed from the forwarded request.

block(`service_config) {

  alias_all(`lb:web:config:service),
  alias_all(`lb:web:config:service_abbr),
  alias_all(`lb:web:config:proxy),

  clauses(`{

    service_by_prefix["/promo/*"] = x,
    transparent_proxy(x) {
      proxy_target[] = "http://example.com",
      proxy_host[] = "example.com",
      proxy_prefix[] = "/promo"
    }.

  })
} <-- . 

Authentication.  ServiceBlox proxies do not currently support proxying to authenticated services. For stateful session authentication this is not possible; for RSA-SHA512 authentication there are some small problems that prevent it from working.

Chapter 32. Extensions

32.1. Email Service

The automatic configuration of the email service supports handler, protocol, request and response messages.

The email service is a ServiceBlox service which accepts a JSON or protobuf request defined by the following protocol:

message SendEmailRequest
{
  required string from = 1;
  repeated string reply_to = 2;

  repeated string to = 3;
  repeated string cc = 4;
  repeated string bcc = 5;

  required string subject = 6;
  required string body = 7;
}

32.1.1. Service configuration via SMTP and SES

The email service can be configured either using SMTP or Amazon Simple Email Service (SES).

Configuring an email service using SES is very easy, as the example below illustrates. The SES configuration uses the AWS SES API directly; it is therefore possible to use IAM (Amazon's Identity and Access Management) roles for authentication.

block(`service_config) {

  alias_all(`lb:web:config:service),
  alias_all(`lb:web:config:service_abbr),

  alias_all(`lb:web:email:service_config),

  clauses(`{

    /**
     * Email service only hosted on internal group. This can
     * only be used with manual testing, because AWS credentials are
     * needed.
     */
    service_by_group["/admin/email", "lb:web:internal"] = x,
    ses_email_service(x).
  })

} <- .

The example below illustrates how to configure an email service via SMTP, by setting the configuration parameters directly in the service.

block(`service_config) {

  alias_all(`lb:web:config:service),
  alias_all(`lb:web:config:service_abbr),

  alias_all(`lb:web:email:service_config),

  clauses(`{

    service_by_group["/admin/email", "lb:web:internal"] = x,
    smtp_email_service(x) {
        service_parameter["smtp_server"]="smtp.gmail.com",
        service_parameter["smtp_server_port"]="587",
        service_parameter["smtp_server_user"]="...",
        service_parameter["smtp_server_pwd"]="....",
        service_parameter["smtp_server_auth"]="true",
        service_parameter["smtp_server_tls"]="true"
    }.

  })

} <- .

Tip

You will have to include the lb_web_email library in your project.

The two tables below list the configuration parameters for using SMTP as well as SES. All the properties listed below can be configured either on the service, in the lb-web-server.config handler section, or in the global ServiceBlox section.

Email service using SMTP
Required parameters
smtp_server

Hostname of the SMTP server.

smtp_server_port

Port of the SMTP server.

smtp_server_user

User account for sending email.

smtp_server_pwd

User password for sending email.

Email service using SMTP
Optional parameters
smtp_server_auth

Use authentication or not (true or false).

smtp_server_ssl

Use SSL SMTPS protocol (true or false).

smtp_server_tls

Use TLS protocol (true or false).

debug

Prints detailed information to the log while communicating with the mail server. This can be useful if you experience problems sending emails.

Email service using SES
Optional authentication parameters
access_key

AWS access key.

secret_key

AWS secret key.

iam_role

Use IAM EC2 instance role (set to any value).

env_credentials

Use environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_KEY.

By default environment credentials are tried first, followed by IAM roles.

Example configuration using Gmail:

smtp_server = smtp.gmail.com
smtp_server_port = 587
smtp_server_user = ...
smtp_server_pwd = ...
smtp_server_auth = true
smtp_server_tls = true

32.2. Password Management

ServiceBlox comes with services for changing and resetting passwords. The automatic configuration of these services supports handler, protocol, and request and response messages. Other service configuration aspects (such as the service group or authentication realm) can be configured by the developer on the service.

32.2.1. Change Password

In order to change the password of a user, the client must send a request to the change password service of ServiceBlox. The change password service is a ServiceBlox service which accepts a JSON or protobuf request defined by the following protocol:

message ChangePasswordRequest
{
  required string user_name = 1;
  required string current_password = 2;
  required string new_password = 3;
}

32.2.1.1. Service Configuration

A change password service is configured with ServiceBlox by creating a change_password_service. Example:

block(`service_config) {
  alias_all(`lb:web:config:service),
  alias_all(`lb:web:config:service_abbr),

  alias_all(`lb:web:credentials:password_management),

  clauses(`{

    /**
     * Change a password
     */
    service_by_prefix["/user/change-password"] = x,
    change_password_service(x).
  })

} <- .

32.2.1.2. Service Parameters

A change password service supports the following optional service_parameter:

credentials_url

URL of credentials service (default: http://localhost:55183/admin/credentials)

This property can be configured either on the service, in the lb-web-server.config handler section, or in the global ServiceBlox section.

32.2.2. Reset Password

ServiceBlox also comes with services to reset passwords. Application developers should develop a user interface supporting the following process flow:

  1. The reset password service is not authenticated. A user can invoke the reset password service with their username or email address.

  2. The reset password request generates a token that is stored in the database as a reset password request for the given user.

  3. A configurable email is sent using the email service to the user.

  4. The user can click on a link in the email, which brings them to a client-side page where they can enter the token in the UI.

  5. Optional: once the password has been changed, a confirmation email is sent to the user.

In order to reset the password of a user, the client must send a request to the reset password service of ServiceBlox. The reset password service is a ServiceBlox service which accepts a JSON or protobuf request defined by the following protocol:

message ResetPasswordRequest
{
  optional string user_name = 1;
  optional string email = 2;
}

Tip

Passwords can be reset either by email address or by username. If the username is specified, then the email address is ignored.

The protocol to confirm the reset of a password is defined as:

message ConfirmResetPasswordRequest
{
  required string change_token = 1;
  required string new_password = 2;
}

32.2.2.1. Service Configuration

A reset password service is configured with ServiceBlox by creating a reset_password_service. In the example below an email is sent to the user, containing the user name, the token, and the date/time until which the token is valid. Additionally, the service_parameter "notify" is set to false, which means that no confirmation email is sent to the user once the password has been reset. An SES email service is used for sending the email regarding the reset of the forgotten password.

block(`service_config) {
  alias_all(`lb:web:config:service),
  alias_all(`lb:web:config:service_abbr),

  alias_all(`lb:web:credentials:password_management),
  alias_all(`lb:web:email:service_config),

  clauses(`{

    /**
     * Reset a forgotten password
     */
    service_by_prefix["/user/reset-password"] = x,
    reset_password_service(x) {
      service_parameter["email_template"] =
        "User: {USER}\nToken: {TOKEN}\nValid until: {VALID}\nTime: {TIME}",
      service_parameter["email_from"] = "support@logicblox.com"
    }.

    /**
     * Do not confirm the reset of a forgotten password
     */
    service_by_prefix["/user/confirm-reset-password"] = x,
    confirm_reset_password_service(x) {
      service_parameter["notify"] = false
    }.

    /**
     * Email service only hosted on internal group. This can
     * only be used with manual testing, because AWS credentials are
     * needed.
     */
    service_by_group["/admin/email", "lb:web:internal"] = x,
    ses_email_service(x).
  })

} <- .

32.2.2.2. Service Parameters

All of the properties listed in this section can be configured either on the service, in the lb-web-server.config handler section, or in the global ServiceBlox section.

A reset password service supports the following service_parameter:

Required service_parameter
email_template

The template for the email. Supports {USER}, {TOKEN}, {IP}, {VALID} and {TIME}.

email_from

The address where emails are sent from.

Optional service_parameter
Parameter Description Default
credentials_url The URL of the credentials service http://localhost:55183/admin/credentials
valid_hours The number of hours a reset token is valid. 4 hours
email_url The URL of the email service. http://localhost:55183/admin/email
email_subject Subject of the reset password email. Password reset

A confirm reset password service supports the following service_parameter:

Required service_parameter
email_template

The template for the email. Supports {USER}, {TOKEN}, {IP}, {VALID} and {TIME}.

email_from

The address where emails are sent from.

Optional service_parameter
Parameter Description Default
credentials_url

The URL of the credentials service.

http://localhost:55183/admin/credentials

notify

Setting whether user should be emailed when the password is changed.

true
email_url

The URL of the email service.

http://localhost:55183/admin/email
email_subject

Subject of the reset password email.

Password change notification

32.3. ConnectBlox Services

The ConnectBlox extension exposes some of the lower-level functionality of LogicBlox through typical protobuf web services. These services map very closely to functionality found in the lb command line tool, but are accessible over the web and are secured in the same way as other ServiceBlox services.

32.3.1. Configuring Services

The ConnectBlox services are subtypes of the default_protobuf_service entity. They each have their own custom_handler and protobuf request and response message (which are documented in the following service specific sections). The schemata for these services are in the lb:web:connectblox:services module. A snippet follows.

connectblox_service(x) -> default_protobuf_service(x).
// defaults to the workspace hosting the service
// You can use regular expressions to match multiple workspaces.
connectblox_service_workspaces[x] = ws -> connectblox_service(x), string(ws).

list_workspaces_service(x) -> default_protobuf_service(x).

pred_info_service(x) -> connectblox_service(x).
list_predicates_service(x) -> connectblox_service(x).
exec_service(x) -> connectblox_service(x).

Examples of configuring these services can be found in the ServiceBlox repository under lb-web-samples/lb-web-connectblox. An example of configuring the list-predicates service follows.

service_by_prefix["/list-predicates"] = x,
list_predicates_service(x) {
  auth_realm[] = "list-predicates-realm"
}.

Another feature of these services is the ability to specify which workspace to run against. For example, if your services are hosted in a workspace called services and you want to list predicates in a workspace called staging, you can configure the service to allow access to the staging workspace. You do this by specifying a regular expression that matches the workspaces to which the service has access. For the previous example, you could do either of the following:

service_by_prefix["/list-predicates"] = x,
list_predicates_service(x) {
  auth_realm[] = "list-predicates-realm",
  connectblox_service_workspaces[] = "services|staging" // only matches services or staging
}.

service_by_prefix["/list-predicates"] = x,
list_predicates_service(x) {
  auth_realm[] = "list-predicates-realm",
  connectblox_service_workspaces[] = ".*" // matches any workspace name
}.

If you do not specify a value for the connectblox_service_workspaces predicate, the service only has access to the workspace in which it is hosted.

32.3.2. Services Protobuf Schema

import "blox/connect/ConnectBlox.proto";
import "blox/connect/BloxCommand.proto";
import "blox/common/Common.proto";
package web.connectblox;

option java_package = "com.logicblox.web.connectblox";

message ExecRequest {
  required blox.connect.ExecBlock execute = 1;
  optional string workspace_name = 2;
}

message ExecResponse {
  required blox.connect.ExecBlockResponse response = 1;
  optional Error error = 10;
}

message ListWorkspacesRequest
{
}

message ListWorkspacesResponse
{
  optional blox.connect.ListWorkSpacesResponse workspaces = 1;
  optional Error error = 10;
}

message PredicateInfoRequest
{
  required string qualified_name = 1;
  optional string workspace_name = 2;
}

message PredicateInfoResponse
{
  optional blox.common.protocol.PredicateInfo info = 1;
  optional Error error = 2;
}

message ListPredicatesRequest
{
  repeated string qualified_name = 1; // if empty return all predicates
  optional string workspace_name = 2;
}

message ListPredicatesResponse
{
  repeated blox.common.protocol.PredicateInfo info = 1;
  optional Error error = 2;
}

message Error {
  // English error, not necessarily suitable for presentation to the
  // user. If there was an error, then the error_code field is always
  // set as well.
  optional string error = 1; // change this to 'message'
  optional string error_code = 2;
}

32.3.3. Services Reference

web:connectblox:services:list_workspaces_service

Request: ListWorkspacesRequest. Response: ListWorkspacesResponse.

Lists all available workspaces.

web:connectblox:services:pred_info_service

Request: PredicateInfoRequest. Response: PredicateInfoResponse.

Returns information about a single predicate.

web:connectblox:services:list_predicates_service

Request: ListPredicatesRequest. Response: ListPredicatesResponse.

Returns predicate information about some or all predicates in a workspace.

web:connectblox:services:exec_service

Request: ExecRequest. Response: ExecResponse.

Executes a snippet of logic against a workspace.
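
As a quick smoke test, a JSON-encoded request can be posted to one of these services with a tool such as curl. The sketch below assumes the /list-predicates configuration from the earlier example, a ServiceBlox server listening on port 8080, and the JSON encoding of protobuf services described in Chapter 26; leaving qualified_name out of the request returns all predicates:

curl -X POST http://localhost:8080/list-predicates \
     -H 'Content-Type: application/json' \
     -d '{"workspace_name": "staging"}'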

Part V. Tools

Chapter 33. LogicBlox Command Reference

The main tool for manipulating your LogicBlox database is lb. The various subcommands of lb allow you to interact with the database, manipulate and inspect the state of ServiceBlox services, create build configurations for projects, and execute tests. The sections in this chapter provide detailed descriptions of how to use these commands. You can also retrieve the same information by typing lb --help or lb subcommand --help at the command line.

Note

Invoking just lb puts the tool in interactive mode: it prints a prompt, and you can enter individual commands. (The prompt is lb, followed by the name of the current workspace, if there is one, followed by >).

If you type the command help, you will get the list of commands, together with a list of additional commands. These additional commands differ from those listed by lb --help, and are not documented in this manual.

If you invoke lb filename, the tool interprets the commands in the file one by one; the additional commands available are those of interactive mode. The name of the file must end with the suffix .lb.
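
For instance, a file demo.lb (a hypothetical name) could drive a small session using the same commands you would type at the interactive prompt; a minimal sketch:

create demo --overwrite
addblock demo 'person(x) -> string(x).'
exec demo '+person("alice").'
print demo person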

33.1. Database Services Management

33.1.1. lb services

33.1.1.1. Syntax

lb services [-h]  ...

33.1.1.2. Description

manage lb services

33.1.1.3. Optional Arguments

-h --help

show this help message and exit

33.1.1.4. Commands

print

list processes of running services

processes

list processes related to running services by heuristic

restart

stop and start a service

start

start a service

status

print the status of services

stop

stop a running service

33.1.2. lb services print

33.1.2.1. Syntax

lb services print [-h]

33.1.2.2. Description

list processes of running services

33.1.2.3. Optional Arguments

-h --help

show this help message and exit

33.1.3. lb services processes

33.1.3.1. Syntax

lb services processes [-h]

33.1.3.2. Description

list processes related to running services by heuristic

33.1.3.3. Optional Arguments

-h --help

show this help message and exit

33.1.4. lb services restart

33.1.4.1. Syntax

lb services restart [-h]

33.1.4.2. Description

stop and start a service

33.1.4.3. Optional Arguments

-h --help

show this help message and exit

33.1.5. lb services start

33.1.5.1. Syntax

lb services start [-h]

33.1.5.2. Description

start a service

33.1.5.3. Optional Arguments

-h --help

show this help message and exit

33.1.6. lb services status

33.1.6.1. Syntax

lb services status [-h]

33.1.6.2. Description

print the status of services

33.1.6.3. Optional Arguments

-h --help

show this help message and exit

33.1.7. lb services stop

33.1.7.1. Syntax

lb services stop [-h]

33.1.7.2. Description

stop a running service

33.1.7.3. Optional Arguments

-h --help

show this help message and exit

33.2. Workspace Commands

33.2.1. lb addblock

33.2.1.1. Syntax

lb addblock [-h] [-f FILE] [--name NAME] [--inactive]
            [--language LANGUAGE] [--loglevel LOGLEVEL] [-L]
            [--cwd [DIR]] [--timeout TIMEOUT] [--exclusive]
            [--commit-mode {diskcommit,softcommit}] [--branch [NAME]]
            [-m PREDICATE] [--readonly]
            WORKSPACE [LOGIC]

33.2.1.2. Description

add active or inactive block to workspace

33.2.1.3. Positional Arguments

workspace

name of workspace

logic

logic to add

33.2.1.4. Optional Arguments

-h --help

show this help message and exit

-f --file

logic file to add

--name

name of block (default: unique name)

--inactive

add block as inactive block

--language

source language of block: lb (default) or lb0

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

--timeout -t

transaction timeout (in milliseconds)

--exclusive

set this flag if this request should be the only one running in the workspace

--commit-mode

set to either softcommit or diskcommit

--branch

named workspace branch to use for transaction

-m --monitor

print assertions and retractions for predicate at end of transaction

--readonly

execute this command in a read-only transaction
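
For example (the workspace, block, and file names are illustrative):

lb addblock myws 'person(x) -> string(x).' --name schema
lb addblock myws -f rules.logic --inactive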

33.2.2. lb addlib

33.2.2.1. Syntax

lb addlib [-h] [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]
          [--timeout TIMEOUT] [--exclusive]
          [--commit-mode {diskcommit,softcommit}] [--branch [NAME]]
          [-m PREDICATE] [--readonly]
          WORKSPACE NAME

33.2.2.2. Description

add library to workspace

33.2.2.3. Positional Arguments

workspace

name of workspace

name

name of library to add

33.2.2.4. Optional Arguments

-h --help

show this help message and exit

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

--timeout -t

transaction timeout (in milliseconds)

--exclusive

set this flag if this request should be the only one running in the workspace

--commit-mode

set to either softcommit or diskcommit

--branch

named workspace branch to use for transaction

-m --monitor

print assertions and retractions for predicate at end of transaction

--readonly

execute this command in a read-only transaction

33.2.3. lb addproject

33.2.3.1. Syntax

lb addproject [-h] [--norecurse] [--nocopy] [--libpath LIBPATH]
              [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]
              [--timeout TIMEOUT] [--exclusive]
              [--commit-mode {diskcommit,softcommit}] [--branch [NAME]]
              [-m PREDICATE] [--readonly]
              WORKSPACE DIR

33.2.3.2. Description

add project to workspace

33.2.3.3. Positional Arguments

workspace

name of workspace

dir

project directory path

33.2.3.4. Optional Arguments

-h --help

show this help message and exit

--norecurse

avoid finding libraries recursively

--nocopy

skip copy of level 1 workspace

--libpath

the path for lb libraries that this project depends on

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

--timeout -t

transaction timeout (in milliseconds)

--exclusive

set this flag if this request should be the only one running in the workspace

--commit-mode

set to either softcommit or diskcommit

--branch

named workspace branch to use for transaction

-m --monitor

print assertions and retractions for predicate at end of transaction

--readonly

execute this command in a read-only transaction
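
For example, to add a compiled project from the directory ./my-project (an illustrative path), resolving its library dependencies against ./libs:

lb addproject myws ./my-project --libpath ./libs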

33.2.4. lb aborttransaction

33.2.4.1. Syntax

lb aborttransaction [-h] [WORKSPACE] TID

33.2.4.2. Description

abort a transaction

33.2.4.3. Positional Arguments

workspace

workspace

tid

transaction id

33.2.4.4. Optional Arguments

-h --help

show this help message and exit

33.2.5. lb branch

33.2.5.1. Syntax

lb branch [-h] [--parent PARENT] [--overwrite] [--loglevel LOGLEVEL]
          [-L] [--cwd [DIR]]
          WORKSPACE BRANCH

33.2.5.2. Description

create a new named branch

33.2.5.3. Positional Arguments

workspace

name of workspace

name

name of new branch

33.2.5.4. Optional Arguments

-h --help

show this help message and exit

--parent

branch to create new named branch from, if not default

--overwrite -o

overwrite if the branch already exists

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths
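
For example (workspace and branch names are illustrative):

lb branch myws experiment
lb branch myws hotfix --parent experiment --overwrite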

33.2.6. lb branches

33.2.6.1. Syntax

lb branches [-h] [--all] [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]
            WORKSPACE

33.2.6.2. Description

list named branches

33.2.6.3. Positional Arguments

workspace

name of workspace

33.2.6.4. Optional Arguments

-h --help

show this help message and exit

--all

also list branches created for automatic backups

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

33.2.7. lb copy-remote

33.2.7.1. Syntax

lb copy-remote [-h] [--remote-workspace REMOTE-WORKSPACE]
               [--remote-port REMOTE-PORT]
               NAME HOST

33.2.7.2. Description

make a copy of a remote workspace

33.2.7.3. Positional Arguments

name

name of the local workspace

host

host name of the remote LogicBlox server

33.2.7.4. Optional Arguments

-h --help

show this help message and exit

--remote-workspace

name of the remote workspace

--remote-port

port of the remote LogicBlox server

33.2.8. lb compileblock

33.2.8.1. Syntax

lb compileblock [-h] [-f FILE] [--name NAME] [--inactive]
                [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]
                [--timeout TIMEOUT] [--exclusive]
                [--commit-mode {diskcommit,softcommit}]
                [--branch [NAME]] [-m PREDICATE] [--readonly]
                WORKSPACE [LOGIC]

33.2.8.2. Description

test compiling active or inactive block

33.2.8.3. Positional Arguments

workspace

name of workspace

logic

logic to compile

33.2.8.4. Optional Arguments

-h --help

show this help message and exit

-f --file

logic file to compile

--name

name of block (default: unique name)

--inactive

compile the block as an inactive block

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

--timeout -t

transaction timeout (in milliseconds)

--exclusive

set this flag if this request should be the only one running in the workspace

--commit-mode

set to either softcommit or diskcommit

--branch

named workspace branch to use for transaction

-m --monitor

print assertions and retractions for predicate at end of transaction

--readonly

execute this command in a read-only transaction

33.2.9. lb create

33.2.9.1. Syntax

lb create [-h] [--unique] [--overwrite] [--libs LIBS]
          [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]
          [NAME]

33.2.9.2. Description

create new workspace

33.2.9.3. Positional Arguments

name

name or prefix of the new workspace

33.2.9.4. Optional Arguments

-h --help

show this help message and exit

--unique

create a workspace with a unique name

--overwrite

delete existing workspace with the same name, if it exists

--libs

comma separated list of libs to load

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths
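
For example (names are illustrative); with --unique the name given is used as a prefix of the generated workspace name:

lb create demo --overwrite
lb create scratch --unique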

33.2.10. lb delete

33.2.10.1. Syntax

lb delete [-h] [-f] [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]
          NAME [NAME ...]

33.2.10.2. Description

delete workspace

33.2.10.3. Positional Arguments

name

name of workspace to delete

33.2.10.4. Optional Arguments

-h --help

show this help message and exit

-f --force

ignore non-existent workspaces

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

33.2.11. lb delete-branch

33.2.11.1. Syntax

lb delete-branch [-h] [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]
                 WORKSPACE BRANCH

33.2.11.2. Description

delete a named branch

33.2.11.3. Positional Arguments

workspace

name of workspace

name

name of branch to delete

33.2.11.4. Optional Arguments

-h --help

show this help message and exit

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

33.2.12. lb exec

33.2.12.1. Syntax

lb exec [-h] [--blockname BLOCKNAME] [--bind-branch BIND-BRANCH]
        [-f FILE] [--raw] [--exclude-ids] [--csv]
        [--delimiter DELIMITER] [--print [PRED]] [--language LANGUAGE]
        [--loglevel LOGLEVEL] [-L] [--cwd [DIR]] [--timeout TIMEOUT]
        [--exclusive] [--commit-mode {diskcommit,softcommit}]
        [--branch [NAME]] [-m PREDICATE] [--readonly]
        WORKSPACE [LOGIC]

33.2.12.2. Description

execute logic and optionally print results

33.2.12.3. Positional Arguments

workspace

name of workspace

logic

logic to add. If logic starts with a '-', please use '--' before the logic to get correct parsing behavior

33.2.12.4. Optional Arguments

-h --help

show this help message and exit

--blockname

name of block

--bind-branch

string with a list of branch bindings. Example: 'branch1=x123, branch2=y456'

-f --file

logic file to execute

--raw

print query result without escaping

--exclude-ids

output only refmode values of entities

--csv

print query result to csv files

--delimiter

delimiter when printing query result to csv files

--print

print local predicate PRED. Default '_'

--language

source language of block: lb (default) or lb0

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

--timeout -t

transaction timeout (in milliseconds)

--exclusive

set this flag if this request should be the only one running in the workspace

--commit-mode

set to either softcommit or diskcommit

--branch

named workspace branch to use for transaction

-m --monitor

print assertions and retractions for predicate at end of transaction

--readonly

execute this command in a read-only transaction
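
For example, assuming a workspace myws with a person predicate (both illustrative), the first command asserts a fact and the second queries it into the local predicate _ and prints it:

lb exec myws '+person("alice").'
lb exec myws '_(x) <- person(x).' --print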

33.2.13. lb execblock

33.2.13.1. Syntax

lb execblock [-h] [--raw] [--exclude-ids] [--csv]
             [--delimiter DELIMITER] [--print [PRED]] [--input INPUT]
             [--bind-branch BIND-BRANCH] [--loglevel LOGLEVEL] [-L]
             [--cwd [DIR]] [--timeout TIMEOUT] [--exclusive]
             [--commit-mode {diskcommit,softcommit}] [--branch [NAME]]
             [-m PREDICATE] [--readonly]
             WORKSPACE NAME

33.2.13.2. Description

execute inactive block and optionally print results

33.2.13.3. Positional Arguments

workspace

name of workspace

name

name of inactive block to execute

33.2.13.4. Optional Arguments

-h --help

show this help message and exit

--raw

print query result without escaping

--exclude-ids

output only refmode values of entities

--csv

print query result to csv files

--delimiter

delimiter when printing query result to csv files

--print

print local predicate PRED. Default '_'

--input

string with a list of bindings for local variables in the block. Only strings are supported. Example: '_in1=string1, _in2=string2'

--bind-branch

string with a list of branch bindings. Example: 'branch1=x123, branch2=y456'

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

--timeout -t

transaction timeout (in milliseconds)

--exclusive

set this flag if this request should be the only one running in the workspace

--commit-mode

set to either softcommit or diskcommit

--branch

named workspace branch to use for transaction

-m --monitor

print assertions and retractions for predicate at end of transaction

--readonly

execute this command in a read-only transaction
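
For example, assuming an inactive block named my_block with a local variable _in1 (both illustrative):

lb execblock myws my_block --input '_in1=alice' --print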

33.2.14. lb export-protobuf

33.2.14.1. Syntax

lb export-protobuf [-h] [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]
                   [--timeout TIMEOUT] [--exclusive]
                   [--commit-mode {diskcommit,softcommit}]
                   [--branch [NAME]] [-m PREDICATE] [--readonly]
                   WORKSPACE PROTOCOL MSGTYPE FILE

33.2.14.2. Description

export protobuf message to a file

33.2.14.3. Positional Arguments

workspace

name of workspace

protocol

name of the protocol to use

msgType

protobuf message type of the data file

file

protobuf message data file

33.2.14.4. Optional Arguments

-h --help

show this help message and exit

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

--timeout -t

transaction timeout (in milliseconds)

--exclusive

set this flag if this request should be the only one running in the workspace

--commit-mode

set to either softcommit or diskcommit

--branch

named workspace branch to use for transaction

-m --monitor

print assertions and retractions for predicate at end of transaction

--readonly

execute this command in a read-only transaction

33.2.15. lb export-protobuf-schema

33.2.15.1. Syntax

lb export-protobuf-schema [-h] [--protocol PROTOCOL] [-i [INCLUDE]]
                          [-e [EXCLUDE]] [--loglevel LOGLEVEL] [-L]
                          [--cwd [DIR]] [--timeout TIMEOUT]
                          [--exclusive]
                          [--commit-mode {diskcommit,softcommit}]
                          [--branch [NAME]] [-m PREDICATE] [--readonly]
                          WORKSPACE

33.2.15.2. Description

get installed protocol descriptors

33.2.15.3. Positional Arguments

workspace

name of workspace

33.2.15.4. Optional Arguments

-h --help

show this help message and exit

--protocol

save descriptor for selected protocols (may be specified multiple times)

-i --include

regular expression of protocol names to include

-e --exclude

regular expression of protocol names to exclude

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

--timeout -t

transaction timeout (in milliseconds)

--exclusive

set this flag if this request should be the only one running in the workspace

--commit-mode

set to either softcommit or diskcommit

--branch

named workspace branch to use for transaction

-m --monitor

print assertions and retractions for predicate at end of transaction

--readonly

execute this command in a read-only transaction

33.2.16. lb filepath

33.2.16.1. Syntax

lb filepath [-h] [-i] [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]
            WORKSPACE

33.2.16.2. Description

find file system path of workspace

33.2.16.3. Positional Arguments

workspace

name of workspace

33.2.16.4. Optional Arguments

-h --help

show this help message and exit

-i --inverse

find workspace name from path of workspace

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

33.2.17. lb import-protobuf

33.2.17.1. Syntax

lb import-protobuf [-h] [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]
                   [--timeout TIMEOUT] [--exclusive]
                   [--commit-mode {diskcommit,softcommit}]
                   [--branch [NAME]] [-m PREDICATE] [--readonly]
                   WORKSPACE PROTOCOL MSGTYPE FILE

33.2.17.2. Description

import protobuf message from a file

33.2.17.3. Positional Arguments

workspace

name of workspace

protocol

name of the protocol to use

msgType

protobuf message type of the data file

file

protobuf message data file

33.2.17.4. Optional Arguments

-h --help

show this help message and exit

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

--timeout -t

transaction timeout (in milliseconds)

--exclusive

set this flag if this request should be the only one running in the workspace

--commit-mode

set to either softcommit or diskcommit

--branch

named workspace branch to use for transaction

-m --monitor

print assertions and retractions for predicate at end of transaction

--readonly

execute this command in a read-only transaction

33.2.18. lb import-protobuf-schema

33.2.18.1. Syntax

lb import-protobuf-schema [-h] [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]
                          [--timeout TIMEOUT] [--exclusive]
                          [--commit-mode {diskcommit,softcommit}]
                          [--branch [NAME]] [-m PREDICATE] [--readonly]
                          WORKSPACE NAME FILE

33.2.18.2. Description

add protobuf specification from a file

33.2.18.3. Positional Arguments

workspace

name of workspace

name

name of message protocol to add

file

protobuf message descriptor file

33.2.18.4. Optional Arguments

-h --help

show this help message and exit

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

--timeout -t

transaction timeout (in milliseconds)

--exclusive

set this flag if this request should be the only one running in the workspace

--commit-mode

set to either softcommit or diskcommit

--branch

named workspace branch to use for transaction

-m --monitor

print assertions and retractions for predicate at end of transaction

--readonly

execute this command in a read-only transaction

33.2.19. lb import-xml

33.2.19.1. Syntax

lb import-xml [-h] [--schema SCHEMA] [--loglevel LOGLEVEL] [-L]
              [--cwd [DIR]] [--timeout TIMEOUT] [--exclusive]
              [--commit-mode {diskcommit,softcommit}] [--branch [NAME]]
              [-m PREDICATE] [--readonly]
              WORKSPACE FILE

33.2.19.2. Description

import XML document

33.2.19.3. Positional Arguments

workspace

name of workspace

file

path to the XML file to import

33.2.19.4. Optional Arguments

-h --help

show this help message and exit

--schema

name of schema (optional)

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

--timeout -t

transaction timeout (in milliseconds)

--exclusive

set this flag if this request should be the only one running in the workspace

--commit-mode

set to either softcommit or diskcommit

--branch

named workspace branch to use for transaction

-m --monitor

print assertions and retractions for predicate at end of transaction

--readonly

execute this command in a read-only transaction

33.2.20. lb import-xml-schema

33.2.20.1. Syntax

lb import-xml-schema [-h] [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]
                     [--timeout TIMEOUT] [--exclusive]
                     [--commit-mode {diskcommit,softcommit}]
                     [--branch [NAME]] [-m PREDICATE] [--readonly]
                     WORKSPACE NAME FILE

33.2.20.2. Description

add XML schema specification from a file

33.2.20.3. Positional Arguments

workspace

name of workspace

name

name for the schema

file

binary schema descriptor file, generated by importexport

33.2.20.4. Optional Arguments

-h --help

show this help message and exit

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

--timeout -t

transaction timeout (in milliseconds)

--exclusive

set this flag if this request should be the only one running in the workspace

--commit-mode

set to either softcommit or diskcommit

--branch

named workspace branch to use for transaction

-m --monitor

print assertions and retractions for predicate at end of transaction

--readonly

execute this command in a read-only transaction

33.2.21. lb info

33.2.21.1. Syntax

lb info [-h] [--json] NAME

33.2.21.2. Description

print information about a workspace

33.2.21.3. Positional Arguments

name

name of the workspace

33.2.21.4. Optional Arguments

-h --help

show this help message and exit

--json

print information in JSON format

33.2.22. lb list-blocks

33.2.22.1. Syntax

lb list-blocks [-h] [--inactive | --active] [--loglevel LOGLEVEL] [-L]
               [--cwd [DIR]]
               WORKSPACE

33.2.22.2. Description

list blocks in a workspace

33.2.22.3. Positional Arguments

workspace

name of workspace

33.2.22.4. Optional Arguments

-h --help

show this help message and exit

--inactive

list only inactive blocks

--active

list only active blocks

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

33.2.23. lb list

33.2.23.1. Syntax

lb list [-h] [--kind KIND] [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]
        [--timeout TIMEOUT] [--exclusive]
        [--commit-mode {diskcommit,softcommit}] [--branch [NAME]]
        [-m PREDICATE]
        WORKSPACE

33.2.23.2. Description

list predicates in workspace

33.2.23.3. Positional Arguments

workspace

name of workspace

33.2.23.4. Optional Arguments

-h --help

show this help message and exit

--kind

kind of predicate. The basic kinds are: "edb", "idb", "built-in" and "runtime-internal". Additionally, "predefined" stands for the disjunction "built-in ; runtime-internal". The kind can be specified by a disjunction of conjunctions, where each conjunct can be negated. For example, "edb ; idb" shows only EDB and IDB predicates: this specification is equivalent to "!predefined" (a negation). "!edb , !runtime-internal , !built-in" (a conjunction) is equivalent to "idb". The default is "edb ; idb".

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

--timeout -t

transaction timeout (in milliseconds)

--exclusive

set this flag if this request should be the only one running in the workspace

--commit-mode

set to either softcommit or diskcommit

--branch

named workspace branch to use for transaction

-m --monitor

print assertions and retractions for predicate at end of transaction
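
For example (the workspace name is illustrative); the second command restricts the listing to EDB predicates:

lb list myws
lb list myws --kind 'edb'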

33.2.24. lb popcount

33.2.24.1. Syntax

lb popcount [-h] [-p PREDICATE] [-i [INCLUDE]] [-e [EXCLUDE]] [-d]
            [--estimate] [--include-default] [--kind KIND]
            [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]
            [--timeout TIMEOUT] [--exclusive]
            [--commit-mode {diskcommit,softcommit}] [--branch [NAME]]
            [-m PREDICATE]
            WORKSPACE [NAME [NAME ...]]

33.2.24.2. Description

print popcount for all, or specified, predicates

33.2.24.3. Positional Arguments

workspace

name of workspace

name

name of predicate

33.2.24.4. Optional Arguments

-h --help

show this help message and exit

-p --predicate

print popcount for specific predicate

-i --include

regular expression [Perl-syntax] for predicates to include

-e --exclude

regular expression [Perl-syntax] for predicates to exclude

-d --density

print density when available

--estimate

print estimated rather than actual size

--include-default

for default-valued predicates, use size of default layer

--kind

kind of predicate. The basic kinds are: "edb", "idb", "built-in" and "runtime-internal". Additionally, "predefined" stands for the disjunction "built-in ; runtime-internal". The kind can be specified by a disjunction of conjunctions, where each conjunct can be negated. For example, "edb ; idb" shows only EDB and IDB predicates: this specification is equivalent to "!predefined" (a negation). "!edb , !runtime-internal , !built-in" (a conjunction) is equivalent to "idb". The default is "edb ; idb".

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

--timeout -t

transaction timeout (in milliseconds)

--exclusive

set this flag if this request should be the only one running in the workspace

--commit-mode

set to either softcommit or diskcommit

--branch

named workspace branch to use for transaction

-m --monitor

print assertions and retractions for predicate at end of transaction
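
For example, to print popcounts with densities for all predicates whose names match an illustrative sales: prefix:

lb popcount myws -d -i 'sales:.*'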

33.2.25. lb predinfo

33.2.25.1. Syntax

lb predinfo [-h] [--transitive] [--omit PREDICATE] [--all]
            [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]
            [--timeout TIMEOUT] [--exclusive]
            [--commit-mode {diskcommit,softcommit}] [--branch [NAME]]
            [-m PREDICATE]
            WORKSPACE [NAME [NAME ...]]

33.2.25.2. Description

print information about predicates in a workspace

33.2.25.3. Positional Arguments

workspace

name of workspace

name

name of predicate

33.2.25.4. Optional Arguments

-h --help

show this help message and exit

--transitive

include predicates that use this predicate

--omit

predicates to not include

--all

return info for all user-predicates. [Deprecated]

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

--timeout -t

transaction timeout (in milliseconds)

--exclusive

set this flag if this request should be the only one running in the workspace

--commit-mode

set to either softcommit or diskcommit

--branch

named workspace branch to use for transaction

-m --monitor

print assertions and retractions for predicate at end of transaction

33.2.26. lb print-block

33.2.26.1. Syntax

lb print-block [-h] [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]
               WORKSPACE BLOCK

33.2.26.2. Description

print the logic for a block

33.2.26.3. Positional Arguments

workspace

name of workspace

block

name of block

33.2.26.4. Optional Arguments

-h --help

show this help message and exit

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

33.2.27. lb print-rules

33.2.27.1. Syntax

lb print-rules [-h] [--branch [NAME]] [--dependent] [--internal]
               [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]
               WORKSPACE PREDICATE

33.2.27.2. Description

print the rules defining a predicate

33.2.27.3. Positional Arguments

workspace

name of workspace

predicate

name of predicate

33.2.27.4. Optional Arguments

-h --help

show this help message and exit

--branch

named workspace branch to use

--dependent

print rules that use the predicate in the body

--internal

print rules in internal runtime format

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths
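
For example (workspace and predicate names are illustrative); the second command shows the rules that use the predicate in their body:

lb print-rules myws person
lb print-rules myws person --dependent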

33.2.28. lb print

33.2.28.1. Syntax

lb print [-h] [--exclude-ids] [--raw] [--loglevel LOGLEVEL] [-L]
         [--cwd [DIR]] [--timeout TIMEOUT] [--exclusive]
         [--commit-mode {diskcommit,softcommit}] [--branch [NAME]]
         [-m PREDICATE]
         WORKSPACE NAME

33.2.28.2. Description

print facts of predicate

33.2.28.3. Positional Arguments

workspace

name of workspace

name

name of predicate

33.2.28.4. Optional Arguments

-h --help

show this help message and exit

--exclude-ids

output only refmode values of entities

--raw

print results without escaping

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

--timeout -t

transaction timeout (in milliseconds)

--exclusive

set this flag if this request should be the only one running in the workspace

--commit-mode

set to either softcommit or diskcommit

--branch

named workspace branch to use for transaction

-m --monitor

print assertions and retractions for predicate at end of transaction
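
For example, to print the facts of an illustrative predicate person, showing only the refmode values of entities:

lb print myws person --exclude-ids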

33.2.29. lb query

33.2.29.1. Syntax

lb query [-h] [-f FILE] [--raw] [--exclude-ids] [--csv]
         [--delimiter DELIMITER] [--print [PRED]] [--language LANGUAGE]
         [--loglevel LOGLEVEL] [-L] [--cwd [DIR]] [--timeout TIMEOUT]
         [--exclusive] [--commit-mode {diskcommit,softcommit}]
         [--branch [NAME]] [-m PREDICATE]
         WORKSPACE [LOGIC]

33.2.29.2. Description

execute logic and print results

33.2.29.3. Positional Arguments

workspace

name of workspace

logic

logic to execute. If logic starts with a '-', please use '--' before the logic to get correct parsing behavior

33.2.29.4. Optional Arguments

-h --help

show this help message and exit

-f --file

logic file to execute

--raw

print query result without escaping

--exclude-ids

output only refmode values of entities

--csv

print query result to csv files

--delimiter

delimiter when printing query result to csv files

--print

print local predicate PRED. Default '_'

--language

source language of block: lb (default) or lb0

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

--timeout -t

transaction timeout (in milliseconds)

--exclusive

set this flag if this request should be the only one running in the workspace

--commit-mode

set to either softcommit or diskcommit

--branch

named workspace branch to use for transaction

-m --monitor

print assertions and retractions for predicate at end of transaction
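
For example (workspace and file names are illustrative); the second command writes the results to CSV files using '|' as the delimiter:

lb query myws '_(x) <- person(x).'
lb query myws -f report.logic --csv --delimiter '|'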

33.2.30. lb raw

33.2.30.1. Syntax

lb raw [-h] [-v] [--is-admin] REQUEST

33.2.30.2. Description

send raw request to server [for debugging]

33.2.30.3. Positional Arguments

request

string representation of Request message

33.2.30.4. Optional Arguments

-h --help

show this help message and exit

-v --verbose

print Request and Response

--is-admin

send the request as an admin request

33.2.31. lb removeblock

33.2.31.1. Syntax

lb removeblock [-h] [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]
               [--timeout TIMEOUT] [--exclusive]
               [--commit-mode {diskcommit,softcommit}]
               [--branch [NAME]] [-m PREDICATE] [--readonly]
               WORKSPACE BLOCKNAME [BLOCKNAME ...]

33.2.31.2. Description

remove a block from the workspace

33.2.31.3. Positional Arguments

workspace

name of workspace

blockname

name of block to remove

33.2.31.4. Optional Arguments

-h --help

show this help message and exit

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

--timeout -t

transaction timeout (in milliseconds)

--exclusive

set this flag if this request should be the only one running in the workspace

--commit-mode

set to either softcommit or diskcommit

--branch

named workspace branch to use for transaction

-m --monitor

print assertions and retractions for predicate at end of transaction

--readonly

execute this command in a read-only transaction

33.2.32. lb status

33.2.32.1. Syntax

lb status [-h] [--active] [--all] [--debug] [WORKSPACE [WORKSPACE ...]]

33.2.32.2. Description

server status

33.2.32.3. Positional Arguments

workspace

workspace(s) to be queried

33.2.32.4. Optional Arguments

-h --help

show this help message and exit

--active

show active requests

--all

show all requests (active and queued)

--debug

show debug details

33.2.33. lb version

33.2.33.1. Syntax

lb version [-h] [-d] [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]
           [WORKSPACE]

33.2.33.2. Description

print version

33.2.33.3. Positional Arguments

workspace

name of workspace

33.2.33.4. Optional Arguments

-h --help

show this help message and exit

-d --detail

print detailed version

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

33.2.34. lb workspaces

33.2.34.1. Syntax

lb workspaces [-h] [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]

33.2.34.2. Description

list workspaces

33.2.34.3. Optional Arguments

-h --help

show this help message and exit

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths

33.3. Replication

33.3.1. lb promote-mirror

33.3.1.1. Syntax

lb promote-mirror [-h] NAME

33.3.1.2. Description

promote a mirror workspace to master

33.3.1.3. Positional Arguments

name

name of the local workspace

33.3.1.4. Optional Arguments

-h --help

show this help message and exit

33.3.2. lb start-mirror

33.3.2.1. Syntax

lb start-mirror [-h] [--remote-workspace REMOTE-WORKSPACE]
                [--remote-port REMOTE-PORT]
                NAME HOST

33.3.2.2. Description

start or resume mirroring a remote workspace

33.3.2.3. Positional Arguments

name

name of the local workspace

host

host name of the remote LogicBlox server

33.3.2.4. Optional Arguments

-h --help

show this help message and exit

--remote-workspace

name of the remote workspace

--remote-port

port of the remote LogicBlox server

33.3.3. lb stop-mirror

33.3.3.1. Syntax

lb stop-mirror [-h] NAME

33.3.3.2. Description

stop mirroring a remote workspace

33.3.3.3. Positional Arguments

name

name of the local workspace

33.3.3.4. Optional Arguments

-h --help

show this help message and exit

33.4. Unit Testing

33.4.1. lb unit

33.4.1.1. Syntax

lb unit [-h] [--suite SUITE [SUITE ...]]
        [--suite-dir SUITEDIR [SUITEDIR ...]] [--test TEST [TEST ...]]
        [--progress] [--list] [--time] [--sequential]
        [--threads THREADS]
        [--exclude-test EXCLUDETEST [EXCLUDETEST ...]]
        [--exclude EXCLUDE [EXCLUDE ...]] [--no-ignore] [--no-cleanup]
        [--default-fixtures]

33.4.1.2. Description

run unit test suites

33.4.1.3. Optional Arguments

-h --help

show this help message and exit

--suite

list of test suites to run

--suite-dir

directory containing suites

--test

list of tests to run

--progress -p

print each test name as it runs

--list -l

display list of tests without running them

--time

print timing information for each test

--sequential -s

run tests sequentially

--threads -t

set number of threads, default to 1

--exclude-test

list of tests to exclude

--exclude -e

list of suites to exclude

--no-ignore

do not ignore suites that have 'suite.ignore' files

--no-cleanup

do not run teardown after each test

--default-fixtures

run default test fixtures if no setup or teardown files exist
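
For example, to run all suites under an illustrative directory test/suites, printing each test name as it runs, and then to re-run a single test sequentially:

lb unit --suite-dir test/suites --progress
lb unit --suite-dir test/suites --test my_test --sequential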

33.5. Other Commands

33.5.1. lb batch-script

33.5.1.1. Syntax

lb batch-script [-h] [--branch [NAME]] [-t] [-r] [-f FILE]
                [--loglevel LOGLEVEL] [-L] [--cwd [DIR]]
                WORKSPACE [SCRIPT]

33.5.1.2. Description

execute a dlbatch script on the server

33.5.1.3. Positional Arguments

workspace

name of workspace

script

script commands to be executed

33.5.1.4. Optional Arguments

-h --help

show this help message and exit

--branch

named workspace branch to use for script

-t --transactional

execute commands inside a transaction

-r --return-data

return data created by server

-f --file

file containing script commands to execute

--loglevel

log level to change to. Valid levels are 'error', 'warning', 'info', 'perf', 'perf_detail' or 'debug'. A level may refer to a specific scope using @, and multiple specifications may be concatenated by colons. Examples: 'info', 'error@transport', 'info:debug@EvaluateRules'

-L --log

transfer log from server and print

--cwd

use current working directory. Used to resolve file predicate paths
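
For example, to run a script of dlbatch commands from an illustrative file commands.txt inside a single transaction, returning any data created by the server:

lb batch-script myws -f commands.txt --transactional --return-data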

33.5.2. lb compile

33.5.2.1. Syntax

lb compile [-h]  ...

33.5.2.2. Description

compile LogicBlox logic

33.5.2.3. Optional Arguments

-h --help

show this help message and exit

33.5.2.4. Compile Commands

file

Compile an individual file for testing.

project

Compile a LogicBlox project.

33.5.3. lb compile file

33.5.3.1. Syntax

lb compile file [-h] [--serialize SERIALIZE]
                [--out-format {logiql,proto,xml}]
                [--stage {final,initial}]
                [--lifetime {database,transaction}]
                [LOGIC-FILE]

33.5.3.2. Description

Compile an individual file for testing.

33.5.3.3. Positional Arguments

logic-file

logic (.logic) file to compile

33.5.3.4. Optional Arguments

-h --help

show this help message and exit

--serialize

serialize binary protobuf output to a file (requires --out-format proto)

--out-format

select the format for pretty-printing the compiled output

--stage

select the transaction stage used when compiling

--lifetime

select the logic lifetime used when compiling

33.5.4. lb compile project

33.5.4.1. Syntax

lb compile project [-h] [--out-dir OUT-DIR] [--libpath LIBPATH]
                   [--progress] [--explain] [--clean]
                   [--max-problems MAX-PROBLEMS]
                   PROJECT-FILE

33.5.4.2. Description

Compile a LogicBlox project.

33.5.4.3. Positional Arguments