LogicBlox 3.10 Reference Manual


I. Language
1. Introduction
2. Values and Types
3. Predicates
3.1. Type declarations
3.2. Functional predicates
3.3. Entity predicates
3.4. Constructor Predicates
3.5. Ordered Predicates
3.6. File Predicates
3.7. One-to-one mappings
3.8. Implicitly declared predicates
3.9. Predicate properties
4. Lexical Syntax
4.1. Introduction
4.2. General
4.3. White space and comments
4.4. Keywords
4.5. Identifiers
4.6. Operators
4.7. Literals
5. Expressions
5.1. Constants
5.2. Variables
5.3. Arithmetic operations
5.4. Applications
5.5. Parenthesized expressions
6. Formulas
6.1. Comparison
6.2. Atoms
6.3. Conjunction and disjunction
6.4. Negation
6.5. Precedence
6.6. Parenthesized formulas
7. Rules
7.1. Aggregation
7.2. Series functions
7.3. Incremental evaluation
7.4. Event rules
8. Constraints
8.1. Syntax
9. Typing
9.1. Predicate declarations
9.2. Entity declarations
9.3. Examples of Implicit entity declarations
9.4. Interpreting predicate declarations
9.5. Predicate type inference
9.6. Type checking
10. Transactions
10.1. Life-time of a transaction
10.2. Stages
10.3. Semantics
11. Updates
11.1. Delta predicates
11.2. Restrictions
11.3. Pulse predicates
12. Hierarchical Syntax
12.1. Formal description
13. MoReBlox
13.1. Level 1 predicates and their declarations
13.2. Level 1 derivation rules
13.3. Well-formedness criteria for generic Datalog
13.4. Examples
14. Separate Compilation
14.1. A LogicBlox Project
14.2. Compiling your project
14.3. Installing a project into a workspace
14.4. Bytecode file format
14.5. Summary file format
15. Modules
15.1. ConcreteBlox
16. Default-Value Predicates
16.1. Declaring a default value
16.2. Functional determination
16.3. Functionally determined predicates
16.4. Restrictions for predicates with default values
17. Provenance
17.1. Recording and querying provenance
17.2. Language constructs for which provenance is not defined
17.3. Provenance rewrite as separate compilation
18. Concurrency Control
18.1. Introduction
18.2. Element-level locking
18.3. Isolation levels
18.4. Log shipping
II. Tools
19. Testing
19.1. Basic BloxUnit
20. lb-config
20.1. Getting Started
21. lb-base-project
21.1. Building and Testing
21.2. Project Structure
21.3. Demo
22. Hierarchical Import/Export
22.1. Using Hierarchical Import/Export
22.2. Hierarchical Import/Export in Logicblox 3.9
22.3. Set semantics for repeated fields
23. BloxWeb
23.1. Introduction
23.2. Installing and Running BloxWeb
23.3. Implementing ProtoBuf/JSON Services
23.4. Service Configuration Reference
23.5. Plugin Logic
23.6. Implementing Global ProtoBuf/JSON Services
23.7. Implementing Delimited File Services
23.8. Dynamic Delimited File Services
23.9. Configuring Proxy Services
23.10. Implementing Custom Services
23.11. Authentication
23.12. Extensions
23.13. Transport Methods
23.14. Bloxweb Batch Language
23.15. Configuration
23.16. CORS Rules
23.17. Specifications
24. Application Console
24.1. Installation
24.2. Configuration
25. Program Analysis
25.1. Usage
25.2. Analysis workspace
26. Profiling, Monitoring and Tuning
26.1. bloxtop
26.2. Understanding Query Execution
26.3. Pre-compiled queries
26.4. Contention reporting
26.5. Long-running rules reporting
26.6. Cycle execution graph
26.7. Detailed Analysis of Contention Issues
26.8. Guidelines for Monitoring System Resources and Logging
27. XPath Query Translation
27.1. Usage
III. Measure Service
28. Concepts
28.1. OLAP
28.2. Measure service
28.3. Dimensions
28.4. Measures
29. Configuration
29.1. Dimensions
29.2. Metrics
30. Primitive queries
30.1. Attributes
30.2. Metrics
30.3. Terms
31. Aggregation queries
32. Filtering and dicing
32.1. Filtering
32.2. Dicing
33. Measure Expression Grammar
34. Spreading
34.1. Concepts
34.2. Update structure
34.3. Direct updates
34.4. Indirect spreads
34.5. Spread-by-even
34.6. Spread-by-ratio
34.7. Spread-by-percent-parent
34.8. Spread-by-query
34.9. Removal
Glossary
IV. Blade Application Framework
35. Workbook Framework
35.1. Getting Started
35.2. Building Blade applications
35.3. Blade configuration files
35.4. Schema Definition
35.5. WorkBook Template Properties
35.6. Commit and Refresh
35.7. lb-workbook command
35.8. Workbook Services
35.9. Configuring the blox-applet-server
35.10. Migrating from pre-3.9 to 3.9.x
36. Blade Tips and Tricks
36.1. Invoking Protobuf Services from Blade
36.2. Setting and re-setting the sorting of views
V. Administration
37. Backup and Copy
38. Workspace Corruption and Consistency
38.1. Intensional (IDB) predicate consistency check
VI. Appendix
A. Built-in predicates
A.1. Primitive type conversion
A.2. Comparison Operations
A.3. Math Operations
A.4. Floating Point Functions
A.5. Canonical values of built-in types
A.6. Datetime predicates
A.7. String predicates
A.8. Ordered Entity Operations
A.9. Numeric Ranges
A.10. Boolean Operations
B. Compiler Errors and Warnings
B.1. MULTIPLE_VALUES
B.2. IDB_META
B.3. EDB_RULE
B.4. INCONSISTENT_EQUALITY
B.5. MULTIPLE_RULES_NONCYCLIC
B.6. DIV_ZERO
B.7. NO_DECLARATION
B.8. UNKNOWN_COMPILER_OPTION
B.9. CONSTRUCTOR_ILLEGAL_VALUETYPE
B.10. SIMILAR_VAR
B.11. SUBTYPE_PRIM
B.12. SUBTYPE_MULTI
B.13. SUBTYPE_PRIM
B.14. TYPE_INFER
B.15. PULSE_NONPULSE_SUPER
B.16. NONPULSE_PULSE_SUPER
B.17. PULSE_CONSTRAINT
B.18. FUNC_SEMICOLON_DEPRECATED
B.19. ONETOONE_DEFAULT_VALUE
B.20. DYNAMIC_TYPE_CONSTRAINT
B.21. SCALABLE_SPARSE_ENTITY_ONLY
B.22. STORAGE_MODEL_TOP_ENTITY
B.23. POLYMORPHIC_LITERAL
B.24. COMP_UNORDERED
B.25. META_MODEL_DEPRECATED
B.26. Skolem functions
B.27. Auto-numbered predicates
B.28. Module system
B.29. Separate compilation
B.30. Hierarchical syntax
B.31. Delta logic
B.32. Aggregations
B.33. Default values
B.34. Entities
B.35. Recursion
B.36. Incremental Evaluation
B.37. File predicates
B.38. Derived-only predicates
C. Platform Environment Variables
C.1. General Environment Variables
C.2. Compiler Environment Variables
C.3. Runtime Environment Variables
C.4. Deployment Environment Variables
D. blox:compiler API specification
D.1. blox:compiler
D.2. blox:compiler:project
D.3. blox:compiler:block
D.4. blox:compiler:predicate
D.5. blox:compiler:entity
D.6. blox:compiler:clause
D.7. blox:compiler:rule
D.8. blox:compiler:constraint
D.9. blox:compiler:externalAgg
D.10. blox:compiler:formula
D.11. blox:compiler:atom
D.12. blox:compiler:expr
D.13. blox:compiler:constant
D.14. blox:compiler:vardecl
D.15. blox:analysis:varname
D.16. blox:compiler:region
E. blox:compiler:internal API specification
E.1. blox:compiler:internal:code
E.2. blox:compiler:internal:symbol
E.3. blox:compiler:internal:block
E.4. blox:compiler:internal:predicate
E.5. blox:compiler:internal:entity
E.6. blox:compiler:internal:clause
E.7. blox:compiler:internal:expr
E.8. blox:compiler:internal:externalAgg
E.9. blox:compiler:internal:range
E.10. blox:compiler:internal:application
E.11. blox:compiler:internal:binaryExpr
F. BloxAnalysis libraries API specification
F.1. blox:analysis:dependencyInfo library
F.2. blox:analysis:dependencyInfo:rule
F.3. blox:analysis:dependencyInfo:predicate
F.4. blox:analysis:backward_slicing
G. Blade User Interface Testing Framework
G.1. Test Suite Structure
G.2. Tutorial
G.3. Command Options
G.4. Remote Testing
G.5. Command Reference

Part I. Language

Chapter 1. Introduction

Scope

This manual describes the core platform of LogicBlox. LogicBlox uses a language called DatalogLB, and the bulk of this manual describes that programming language in detail. Additionally, it covers several important tools that are used to interact with the system: how to build and run DatalogLB programs, how to test programs using lb-unit, and how to analyze and improve the performance of your programs.

Database programming

When you work with the LogicBlox platform, you store data in a workspace and then analyze and update that data. The data you store can be anything that is relevant to your business, for example sales and inventory data, analysis results, and forecasting results.

The data in a database is stored as a large set of facts. A fact is written as a predicate followed by a number of values in parentheses. For example, the fact sold("squids", 1995, 100) might mean that your company sold 100 squids in 1995, and the fact bought("lemons", 1996, 18) might mean that your company bought 18 lemons in 1996. In the first example, the predicate is "sold", and the values are "squids", 1995, and 100.

You update and analyze the data in a workspace by writing code in the DatalogLB language. For example, you can write DatalogLB code that scans all the "sold" facts and produces new facts like eversold("squids", 15021).

Chapter 2. Values and Types

The facts in a workspace have a predicate and a number of values. For example, in the fact sold("squids", 100), the two values are the textual string "squids" and the integer 100. Furthermore, each such value has a type. For example, the type of "squids" is string, and the type of 100 is int[64]. This chapter describes the different kinds of values that you can have in a workspace.

Booleans

There are two boolean values, true and false.

Integers

Integer values are the usual mathematical integers such as 0, 25, and -10. In Datalog, every integer value has an integer type. Every integer type is associated with a range of integers, and every integer value is within the range of its associated type.

There are eight integer types available. Each type has a signedness and a precision. The signedness is either signed or unsigned, and the precision is one of 8, 16, 32, and 64. An integer type is written as either int or uint, depending on whether it is signed, followed by the precision in square brackets. That is, the eight integer types are written: int[8], int[16], int[32], int[64], uint[8], uint[16], uint[32], and uint[64].

The range of an unsigned integer with precision n is from 0 to 2^n - 1. The range of a signed integer with precision n is from -(2^(n-1)) to 2^(n-1) - 1. In tabular form, the ranges are as follows:

Table 2.1. Integer type ranges

Type       Minimum                Maximum
uint[8]    0                      255
uint[16]   0                      65535
uint[32]   0                      4294967295
uint[64]   0                      18446744073709551615
int[8]     -128                   127
int[16]    -32768                 32767
int[32]    -2147483648            2147483647
int[64]    -9223372036854775808   9223372036854775807

Floating-point numbers

The system supports four kinds of IEEE 754 floating-point numbers: 32-bit binary, 64-bit binary, 64-bit decimal, and 128-bit decimal. A detailed description of floating-point numbers is beyond the scope of this document, but here are a few rough notes.

A decimal number is written with an integer part and a fractional decimal part; the base of its exponent is 10, just as with standard scientific notation. The integer and fractional parts are separated by a dot, and the literal carries a required suffix d. For example, a decimal number can be written as 3.14d.

A binary floating-point number is written with an integer part, optionally followed by a fractional part prefixed by a dot and/or an exponent over base 10 prefixed by E (internally the number is represented with a base-2 exponent), and an optional suffix f. For example, a floating-point number can be written as 2.71 or 2.71f with a decimal part, 2E3 or 2E3f with an exponent part (equivalent to 2000.0), or 2.71E3 or 2.71E3f with both decimal and exponent parts (equivalent to 2710.0). Binary floating-point computations run much more quickly than decimal computations, but they can have unintuitive behavior.

The number of bits in the floating-point type determines how much space the computer uses to represent the number. Some of those bits are used for the significant digits of the number, whereas others are used for the exponent. Floating-point numbers with fewer bits are faster to compute with, but they have more severe round-off errors.

The two binary floating-point types are written as float[32] and float[64]. The two decimal floating-point types are written as decimal[64] and decimal[128].

Strings

A string is a finite sequence of Unicode characters. For example, "hello" and "Bob" are strings.

There are two ways of writing strings. In the first, a string is a sequence of Unicode characters, not including line breaks, between two quote characters. If two of these strings are written adjacent to each other, even when separated by line breaks, they are concatenated together. For example, writing "foo" "bar" reduces to the single string literal "foobar".

The second form of string is any sequence of Unicode characters, including line breaks, between two groups of three quote characters. For example,


        """foo
        bar""".
      

Datetime values

A datetime is the combination of a date and a time of day, with a resolution of one second. For example, August 12, 1980, at 4:13, is a datetime.

Entities

Entities provide a way for you to define your own kinds of values. For example, instead of having to use the string "Georgia" to refer to the state of Georgia, you could define an entity type for states and then create an entity for each state. Using entities instead of more primitive types such as strings and integers makes a program easier to understand, because the program's data structures correspond better with your own mental model of how the data is arranged. Additionally, using entities helps catch errors such as confusing two different states named "Georgia" with each other.

Every entity has an entity type and an identity. The entity types available in a program are defined by the user. Define as many or as few as makes sense for the programming task being approached. The identity of an entity is stored internally in the system and is used to determine when two entities are the same or not. Whenever this document writes about a new entity being created, that new entity has an identity that is different from the identity of any other entity already in the database.

Entity types are written as entity[`id], where id is an identifier indicating the name of the entity. For example, entity[`state] is an entity type with the name "state". When it's unambiguous, such a type is written as just the identifier. For example, sometimes entity[`state] is written as just state.

An entity type can be a subentity of another entity type. Part of declaring an entity type is declaring which other entity types it is a subentity of. The subentity relation is transitively closed. That is, if A is a subentity of B, and B is a subentity of C, then A is a subentity of C.

The superentity relationship is the inverse of the subentity relationship. Whenever A is a subentity of B, by definition B is a superentity of A.

Note that for any program without compile errors, the subentity relationship will include no cycles or self references. Further, whenever two entity types have any common superentities at all, one of those superentities is a subentity of all the others.
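
For example, entity types for people and employees, with employee declared as a subentity of person, can be written as follows (a sketch using the declaration syntax described in the chapters on predicates and typing; the names are illustrative):

person(x) -> .
employee(x) -> person(x).   // employee is a subentity of person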

The type hierarchy

This section reviews the kinds of types available in LB Datalog and defines some relationships and operations over types that are used throughout the rest of the manual.

Types

The following types are available in LB Datalog:

  • The bottom type, bottom.
  • The top type, top.
  • The numeric types, uint[8], uint[16], uint[32], uint[64], int[8], int[16], int[32], int[64], float[32], float[64], decimal[64], and decimal[128].
  • The non-numeric primitive types, which are boolean, datetime, and string.
  • Entity types, written entity[`id] for some identifier id.
  • Intersection types, written intersection(T), where T is a set of at least two types that are all numeric types, non-numeric primitive types, or entity types. The type intersection(T) is said to be an intersection over the types T. None of the types in the intersection can be a subtype of any of the others, and at most one of the types can be a numeric type.

A primitive type is a numeric type or any of the non-numeric primitive types.

Subtypes

Some types are subtypes of other types. A type S is a subtype of a type T if and only if one of the following is true:

  • S and T are the same type.
  • S is bottom.
  • T is top.
  • S and T are entity types, and S is a subentity of T.
  • S is the numeric type skind[sprec], T is the numeric type tkind[tprec], sprec is at most tprec, and skind occurs no later than tkind in the following list: uint, int, float, decimal.
  • S is an intersection of the types SS, and one of the types in SS is a subtype of T.
  • T is an intersection of the types TT, and S is a subtype of each type in TT.

Greatest lower bounds

The greatest lower bound of types T and S is the largest type that is a subtype of both T and S. It can be calculated as follows.

First, if either type is a subtype of the other, then the subtype is the greatest lower bound. Suppose, then, that neither is a subtype of the other.

Second, if S is the numeric type skind[sprec], and T is the numeric type tkind[tprec], the greatest lower bound is the numeric type rkind[rprec], where rkind and rprec are computed as follows. The precision, rprec, is the smaller of sprec and tprec. The kind, rkind, is the first kind in the following list that is the same as either tkind or skind: uint, int, float, decimal.

In all other cases, start by finding the component types of T and S as follows. The component types of a primitive type or an entity type are the set containing only that type. The component types of an intersection type are the set of types in the intersection.

Next, find the non-redundant component types of T and S as follows. Include in the set any non-numeric component type such that there is no other component type that is a subtype of it. Additionally, if any of the component types are numeric types, include the greatest lower bound of all the numeric types, which will itself be a single numeric type.

If the set of non-redundant component types has exactly one element, then that element is the greatest lower bound of S and T. Otherwise, the greatest lower bound is the intersection type over the set of non-redundant component types.
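
For example, the greatest lower bound of int[32] and uint[64] is computed with the numeric rule: neither type is a subtype of the other, the smaller precision is 32, and uint occurs before int in the list, so the result is uint[32]. Indeed, uint[32] is a subtype of both int[32] and uint[64].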

Least upper bounds

The least upper bound of two types T and S is the smallest type that is a supertype of T and S. It can be calculated as follows.

If T is a subtype of S, then the least upper bound is S. Likewise, if S is a subtype of T, then the least upper bound is T.

If T is an intersection over TT, then compute the set TTS containing the least upper bound of each element of TT with S. The least upper bound of T and S is then the greatest lower bound of the types in TTS.

Likewise if S is an intersection type.

If S is the numeric type skind[sprec] and T is the numeric type tkind[tprec], then the least upper bound is the numeric type rkind[rprec] computed as follows. The precision, rprec, is the larger of sprec and tprec. The kind, rkind, is the last kind in the following list that is either skind or tkind: uint, int, float, decimal.

If S is an entity type and T is an entity type, and they have a common superentity, then the least upper bound of S and T is the least such superentity.

If none of the above cases apply, then the least upper bound of S and T is top.
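
For example, the least upper bound of int[16] and uint[32] is computed with the numeric rule: the larger precision is 32, and int occurs after uint in the list, so the result is int[32]. Both int[16] and uint[32] are subtypes of int[32].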

Chapter 3. Predicates

Predicates separate facts into different kinds. The two facts bought("squid", 100) and sold("squid", 100) have the same values but very different predicates. For the first one, the predicate is 'bought', and for the second the predicate is 'sold'. This chapter describes all the kinds of predicates available in Datalog code.

3.1. Type declarations

Subtyping

Type declaration of any predicate is a subtype.

3.2. Functional predicates

A functional predicate divides the arguments of a predicate into n key arguments and one value argument. The Cartesian product of the key arguments is the domain of the functional predicate. The combination of all key arguments is also referred to as the key. The value argument is the codomain of the functional predicate. Introducing a functional predicate defines a functional dependency between the key arguments and the value argument. That is, for every combination of key arguments there can be only one value.

The domain of a functional predicate is called the key. The codomain of a functional predicate is called the value.

For all x1, ..., xn in X1, ..., Xn and y1, y2 in Y, it holds that if p[x1, ..., xn] = y1 and p[x1, ..., xn] = y2, then y1 = y2.

This property is guaranteed at run-time. If the functional dependency of a functional predicate is violated, then an exception is reported and the current transaction is aborted.

A variable in a key position is referred to as key variable.

Functional predicates have an alternative syntax p(x;y). Note that the semicolon distinguishes a functional predicate from a normal predicate.
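
For example, a functional predicate mapping an item and a year to a quantity could be declared as follows (a sketch with illustrative names):

sold[item, year] = quantity -> string(item), int[32](year), int[32](quantity).

// The same predicate can alternatively be written with the semicolon syntax:
// sold(item, year; quantity).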

3.3. Entity predicates

Entity predicates are unary predicates that are treated specially.

Only entity variables are allowed to be existential in the head of a rule.

Every entity element has an internal, unique representation (since entities are normal unary predicates, this is also known as the key).

An entity can optionally have a reference mode, which provides a user-accessible identifier to each entity.

3.3.1. Reference mode predicates

Reference mode predicates are special functional predicates associated with an entity. Reference mode predicates are used to refer to a specific entity element. For example, the reference mode predicate person:name associated with a person entity could be used to look up a specific person by name.

An entity does not necessarily need to have a reference-mode predicate, but since it is not possible to refer to a specific entity element without a reference-mode predicate, almost all practical entities have an associated reference-mode predicate.

Reference-mode predicates are injective functions from the entity type they are associated with to a value, which has to be of a primitive type. The value of a reference-mode predicate cannot be an entity. Because of the special nature of reference-mode predicates, there is a special syntax for declaring them.

p(x), p(x:s) -> string(s)

Uses of the reference-mode predicate can use the colon syntax, or the notation for functional predicates: p[x] = s. The colon syntax cannot be used for functional predicates that are not reference modes.

A reference-mode predicate p[x] = y is an injective (one-to-one), non-surjective functional predicate. This means that reference-mode predicates, compared to general functional predicates, provide the guarantee that for all elements y of Y there is at most one x in X such that p[x] = y.

An entity can have only one reference mode predicate.

Subtypes inherit the reference-mode predicate of their supertype. Subtypes cannot declare an additional reference-mode predicate.

Restriction: the value of a reference-mode predicate has to be a primitive type. Using a different entity as the value is currently not supported.
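
As a concrete sketch, the person entity and its name reference mode (used in several later examples) can be declared and looked up as follows; the bobsAge predicate and the person:age declaration are illustrative only:

person(x), person:name(x:n) -> string(n).
person:age[x] = a -> person(x), int[32](a).

// The two rules below are equivalent; the first uses the colon syntax,
// the second uses the functional notation:
bobsAge[] = a <- person:name(p:"Bob"), person:age[p] = a.
bobsAge[] = a <- person:name[p] = "Bob", person:age[p] = a.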

3.3.1.1. Auto-numbered reference modes

An auto-numbered reference mode allows the engine to automatically assign a reference value to each entity. The following declares the predicate q and its auto-numbered reference mode qId:

q(x), qId(x:i) -> uint[32](i).
lang:autoNumbered(`qId).

An auto-numbered reference mode must have a value of type uint[32].

3.4. Constructor Predicates

A constructor predicate acts as a one-to-one (injective) function that maps a multi-dimensional key to elements of an entity. The key arguments can be primitive types or entity types. Currently the value type of a constructor predicate must be an entity that is declared scalable and has no reference mode. Constructor predicates are a generalization of the concept of a reference-mode predicate, where a single primitive value has a one-to-one relationship with an entity element.

When used in the head of a rule a constructor predicate must have an existentially quantified variable for the value argument. This means that the value argument cannot occur in the body of the rule. If a value does not already exist for a given key in the constructor predicate, then a new entity element will be created; otherwise the existing entity element is used.

Both table and array predicates can be declared as constructor predicates. Entity and refmode predicates cannot be used as constructor predicates. Constructors are declared using the lang:constructor property:

person(x) -> .
by_name[f,l]=x -> string(f), string(l), person(x).
lang:constructor(`by_name).
lang:physical:storageModel[`person] = "ScalableSparse".

One can use a constructor predicate to help create new entity values. For instance:

+person(p),
+by_name["Jane","Doe"] = p, ... <- ...

The above rule will create a new person element if there is no fact in the predicate by_name with the key ["Jane","Doe"]. If there is such a fact, then p is bound to the existing person element.

Constructor predicates can also be used to derive new entity elements in IDB (non-delta) rules.

person(p),
by_name[f,l]=p 
  <- input_names(f,l).

The above rule will derive a new person entity for every pair of strings in input_names. The advantage of using such rules instead of delta rules is that, as is true for all IDB rules, the data in person and by_name is automatically and incrementally maintained to be consistent with any changes to input_names.

When a constructor atom is used to bind an existentially quantified variable in the head of a rule, it must not share that variable with any other constructor atoms. For instance, the following is disallowed:

ssn[s]=p -> string(s), person(p).
lang:constructor(`ssn).

person(p),
by_name[f,l]=p,
ssn[s]=p
  <- input_data(f,l,s).

3.5. Ordered Predicates

Any entity predicate entity_p can be declared to support "ordering" using the following fact assertion:

lang:ordered(`entity_p).

Note that the ordering supported for entity_p is insertion ordering. An ordered predicate entity_p has the following predicates automatically defined for it:

ordered:first[`entity_p][] = p1
Binds p1 to the first inserted element in entity_p.
ordered:last[`entity_p][] = p2
Binds p2 to the last inserted element in entity_p.
ordered:next[`entity_p][curr] = next
Binds next to the element inserted after curr.
ordered:offset[`entity_p][p1, p2] = num
Binds num to the difference in the insertion order of p1 and p2.

The following alternatives are deprecated and not supported in modules:

entity_p:first[] = p1
Deprecated alternative for ordered:first[`entity_p][] = p1
entity_p:last[] = p2
Deprecated alternative for ordered:last[`entity_p][] = p2
entity_p:next[curr] = next
Deprecated alternative for ordered:next[`entity_p][curr] = next
entity_p:offset[p1, p2] = num
Deprecated alternative for ordered:offset[`entity_p][p1, p2] = num

Variable binding in ordered predicates.

Proper usage of ordered predicates involves binding the argument variables to appropriate entities.

  • entity_p:first[] = p1 and entity_p:last[] = p1: p1 does not have to be otherwise bound.
  • entity_p:next[curr] = p1: At least one of curr or p1 should be bound in another atom. For example, the following declaration of rule p does not properly bind variables, since neither curr nor x is bound in another atom in the body.
    person(x),person:name(x:n) -> string(n).
    female(x) -> person(x).
    
    lang:ordered(`female).
    
    p(x) <- female:next[curr] = x.
    
    Additionally, a bound variable should have a type that has a common supertype with entity_p. For instance, building on the previous example, female is an ordered predicate. Thus, any argument to female:next should have a type that has a common supertype with female. The following rule declarations all have correctly typed variables:
    male(x) -> person(x).
    girl(x) -> female(x).
    
    q(x) <- female:next[curr] = x, girl(curr). 
    r(x) <- female:next[curr] = x, female(curr).
    s(x) <- female:next[curr] = x, male(curr).
    t(x) <- female:next[curr] = x, person(curr).
    
    It is clear that in q and r, curr has the types girl and female, respectively. The common supertype obtained is female in both cases. In both s and t, the common supertype between the type of curr and female is person.

    The following rule, however, does not have a correctly typed curr, since there is no common supertype between female and random:

    random(x) -> .
    u(x) <- female:next[curr] = x, random(curr). 
    

    Note that it is not necessary to bind variables with unary predicates. The following rule is correct, as well:

    ancestor(x,y) -> person(x), person(y).
    
    u(x) <- female:next[curr] = x, ancestor(_,curr).
    

  • entity_p:offset[p1,p2] = num requires that at least two of its three argument variables should be bound. The following three rules demonstrate the three possible, legal, binding combinations:
    u(f1)     <- female:offset[f1,f2] = _, female(f1), female(f2).
    v[f1] = x <- female:offset[f1,f2] = x, female(f2), x=2.
    w[f1] = x <- female:offset[f1,_] = x, female(f1), x=2.
    

Ordered predicates are not allowed on the left-hand side of delta rules.

3.6. File Predicates

The system supports a variety of file predicates. If a predicate is declared to be a file predicate, then the facts of this predicate correspond to the contents of a file. File predicates are a convenient way of reading data from and writing data to external files. As opposed to the old import mechanism, file predicates provide a lot of flexibility in how the data in a file is processed.

As a simple example, consider the following logic that copies a file. The input file 'input.csv' is represented by the predicate '_in', which has two arguments, of type string and integer. The conversion from values in the file to these primitive types happens automatically. The output file 'output.csv' is represented by the predicate '_out'. Every number is incremented by one, just to demonstrate how logic can be used to manipulate the data.

_in(s,x) -> string(s), int[32](x).
lang:physical:storageModel[`_in] = "DelimitedFile".
lang:physical:filePath[`_in] = "input.csv".
lang:physical:hasColumnNames[`_in] = false.

_out(s,x) -> string(s), int[32](x).
lang:physical:storageModel[`_out] = "DelimitedFile".
lang:physical:filePath[`_out] = "output.csv".
lang:physical:hasColumnNames[`_out] = false.

_out(s, y) <-
   _in(s, x), y = x + 1.

If 'input.csv' contains the following data:

John,43
Mary,25
Bill,14

then after executing this logic, the file 'output.csv' will contain:

John,44
Mary,26
Bill,15

As a slightly more useful example, the following interactive script (see the Section "BloxBatch scripting") defines a small schema for persons and imports the 'input.csv' file above to define persons and their ages. This example uses automatic ref-mode conversion.

create testws --overwrite

transaction
addBlock <doc>
   person(x), person:name(x:s) -> string(s).
   person:age[x] = i -> person(x), int[32](i).
</doc>
commit

transaction
exec <doc>
   _in(s,x) -> string(s), int[32](x).
   lang:physical:storageModel[`_in] = "DelimitedFile".
   lang:physical:filePath[`_in] = "input.csv".
   lang:physical:hasColumnNames[`_in] = false.

   +person(s), +person:age[s] = x <- _in(s, x).
</doc>
commit

An overview of the various features supported by file predicates:

  • File predicates support the storage models "DelimitedFile", "BinaryFile", and "RawFile". The last two are mostly useful for exchanging data between different LogicBlox workspaces.
  • The delimiter used between columns can be configured:
    lang:physical:delimiter[`p] = ";"'.
    
  • Delimited files support column names. Column names are configured using the 'lang:physical:columnNames' property. The value is a string of names separated by the delimiter of the file predicate. When specifying column names, it is currently necessary to also explicitly define the delimiter of the predicate.
    lang:physical:delimiter[`p] = ","'.
    lang:physical:columnNames[`p] = "a,b,c".
    
  • The file predicate can be configured to include line numbers. In this case, the first argument of the file predicate represents the line number of a fact. The second argument is the first column, and so on. This is only applicable to reading files using file predicates.
    lang:physical:lineNumbers[`p] = true.
    
  • String values from the file are automatically converted to the primitive type arguments of the file predicate. Arguments of type entity are not supported. While the automatic conversion is convenient, it does not do any error handling or reporting. If custom error handling is necessary, then we recommend using strings as the arguments of the file predicate.
  • File predicates are supported in queries (transaction lifetime blocks) and pre-compiled queries (database lifetime blocks, with transaction-lifetime predicates).

3.7. One-to-one mappings

Reference-mode predicates are one-to-one mappings. It is possible to declare other functional predicates explicitly as one-to-one functions.

Limitations: the engine does not currently enforce the one-to-one property.

3.8. Implicitly declared predicates

For the predicates that are implicitly declared for ordered predicates, see the section on ordered predicates.

3.9. Predicate properties

Predicate properties must be set in the same block where a predicate is being declared. Once set, a property cannot be changed without rebuilding the workspace.

3.9.1. Physical properties

  • lang:physical:capacity
  • lang:physical:partitioning
  • lang:physical:storageModel

Entities have a capacity

The capacity can be set in the block where an entity is initially declared. It is not possible to change the capacity of an entity.
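
For instance, a capacity can be set in the block that declares an entity roughly as follows (a sketch; the bracketed value syntax follows the value-taking property form shown later in this chapter, and the value itself is illustrative):

person(x) -> .
lang:physical:capacity[`person] = 1000000.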

3.9.2. Logical properties

Logical properties can be set using the following format:

logical_property_decl ::=
  predicate_property '(' '`' id ')' '.'
; val_predicate_property '[' '`' id ']' '=' value '.'

predicate_property ::=
  lang:autoNumbered
; lang:defaultValue
; lang:derivationType
; lang:entity
; lang:pulse
; lang:constructor
; lang:disjoint
; lang:ordered

val_predicate_property :=
  lang:isEntity
; lang:isPulse
; lang:lockingPolicy
; lang:isOrdered

id indicates the predicate for which the property is being set. If the property requires a value (for val_predicate_property), then the appropriate value should be provided.
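
For example, the two syntactic forms look like this (a sketch reusing properties shown earlier in this chapter; month is an illustrative entity, and the second line uses a physical property solely to illustrate the value-taking form):

month(x) -> .
lang:ordered(`month).                                   // predicate_property form
lang:physical:storageModel[`month] = "ScalableSparse".  // value-taking form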

Chapter 4. Lexical Syntax

4.1. Introduction

Datalog is a text format that must be parsed before it can be interpreted. This chapter describes the first step of parsing, which is lexical analysis. Lexical analysis converts a flat text file into a sequence of tokens. Later chapters show how these sequences of tokens are interpreted as various kinds of syntax.

4.2. General

Datalog programs are represented as Unicode text encoded as UTF-8. Programs consist of a number of tokens. Each token is matched using the longest match principle: whenever adding more input characters to a valid token will result in a new valid token, the longer token is the one used. For example, '>=' is parsed as a single token '>=' rather than as two tokens '>' and '='.

In this manual, the notation U+nnnn is used to indicate the Unicode code point numbered nnnn in hex. For example, U+0020 is the space character. More frequently, a character or string is described by writing it in single quotes. For example, 'A' is the same character as U+0041.

4.3. White space and comments

White space and comments are used to lay out code and to separate tokens that would otherwise combine due to the longest match principle. White space and comments are immediately discarded after being parsed.

White space is any sequence of the following characters: space (U+0020), tab (U+0009), form feed (U+000C), carriage return (U+000D), or line feed (U+000A).

A comment can be written in either of two ways. One way is to start with a slash and an asterisk ('/*'). Such a comment continues until the first instance of an asterisk followed by a slash ('*/'). The second way is to start with two slashes ('//'). In that case, the comment extends to the end of the line. Here are two examples of comments.

// This is a comment

/* This is
   a multi-
   line comment
*/

4.4. Keywords

The following character sequences are keywords in Datalog.

not exists true false

4.5. Identifiers

An identifier is a sequence of characters where each character is a Unicode letter, a Unicode numeric digit, a dollar sign ('$'), an underscore ('_'), or a colon (':'). The first character of an identifier cannot be a digit. Here are some examples of identifiers:

x
y
cost
sales_2010
PriceStoreSku
sku:cost

4.6. Operators

The following sequences are used as operators in Datalog.

.
::
:
,
;
<-
->
=
<
>
!=
<=
>=
(
)
/
-
+
*
^
@
[
]
!
min=
max=
+=
|=
&=

4.7. Literals

4.7.1. Integer Literals

An integer literal is a sequence of one or more decimal digits. Here are some example integer literals:

0
123
42

4.7.2. Floating-Point Literals

A floating-point literal specifies an IEEE 754 floating-point number. It is given as an integer literal, followed by an optional decimal part, followed by an optional exponent part. Either the decimal part or the exponent part must be specified, or it will not be parsed as a floating-point literal.

The decimal part, if present, is indicated with a period ('.') followed by another integer literal. The exponent part, if specified, is indicated with the letter 'e' or 'E', followed by an optional plus ('+') or minus ('-') sign, followed by an integer literal.

Here are some example floating-point literals:

31.555
31e12
31.555e-12

4.7.3. Boolean Literals

There are two boolean literals, 'true' and 'false'.

true
false

4.7.4. String Literals

A string literal is a double quote character ('"', U+0022), followed by zero or more character specifiers, followed by another double quote character. Each character specifier determines one character that will be included in the string. The possible character specifiers are as follows:

  • Any character except a double quote, a backslash ('\', U+005C), or a newline (U+000A). The character specifies itself for inclusion in the string.

  • '\"', indicating a double quote character (U+0022).

  • '\b', indicating a backspace (U+0008).

  • '\t', indicating a tab character (U+0009).

  • '\n', indicating a newline character (U+000A).

  • '\f', indicating a form feed character (U+000C).

  • '\r', indicating a carriage return character (U+000D).

  • '\\', indicating a single back slash (U+005C).

  • '\'', indicating a single quote character (U+0027).

  • '\u' followed by exactly four hexadecimal digits, indicating the Unicode character with the code point given by those hex digits. Hexadecimal digits that are letters may be given in upper or lower case.

The following are some example string literals:
"hello, world"
""
"He said, \"It's only logical.\"\n"
"\uDEADbeef"

4.7.5. Predicate Literals

A predicate literal is a back quote ('`', U+0060) followed by an identifier. For example:

`p
`q
`parent

4.7.6. Datetime Literals

A date and time literal is specified as a hash sign ('#', U+0023), followed by a date specifier, followed by an optional time specifier, followed by an optional time-zone specifier, followed by another hash sign. The date is specified as an integer month value, followed by a slash ('/', U+002F), followed by an integer day value, followed by another slash, followed by an integer year value.

The time value, if present, starts with a space (' ', U+0020), followed by an integer for the hour, a colon (':', U+003A), an integer for the minutes, and then optionally a colon and an integer for the seconds.

The time zone, if present, starts with a space and is followed by a textual string indicating the time zone. Several different formats for the time zone are accepted.

Here are some example date and time literals:

#05/08/1989#
#05/08/1989 14:30#
#05/08/1989 4:30:02#
#05/08/1989 GMT#
#05/08/1989 -0400#

Chapter 5. Expressions

An expression is a kind of syntax to compute values. For example, '3+4' is an expression that adds the value '3' with the value '4' to produce the value '7'. Another example is 'x+1', which adds whatever value is in 'x' to '1' to produce a new value.

This chapter describes the different kinds of expressions you can write as well as how those expressions compute a value.

In general, note that it is also possible for an expression not to compute anything. For example, 5/0 doesn't compute any result. Thus all expressions compute either one or zero values.

5.1. Constants

expr ::= constant .
constant ::= string | boolean | number | datetime .
number ::= real | integer .

Several kinds of literals may be used as constant expressions. The value of the expression is the same as the value indicated by the literal. The kinds of literals for which this is supported are: strings, booleans, integers, floating-point literals, and date and time literals. Here are some example constants:

"hello"
12
#05/08/1989#

5.2. Variables

expr ::= identifier .

A variable can be used as an expression. The value of the expression is the same as the value referred to by that variable. Some example variable expressions are:

x
y
cost

5.3. Arithmetic operations

expr ::= expr "+" expr .
expr ::= expr "-" expr .
expr ::= expr "*" expr .
expr ::= expr "/" expr .

When parsing, multiplication and division have higher precedence than addition and subtraction. For example, 'x+y*z' parses the same as 'x+(y*z)', because the '*' operator has higher precedence than the '+' operator. Remaining ambiguities are resolved by associating to the left. For example, 'x+y+z' parses the same as '(x+y)+z'.

An arithmetic expression evaluates to its usual numeric meaning if both of the arguments are numbers and that meaning exists. For example, '+' is for addition, and the expression '3+4' evaluates to the number 7. As an exception, the result of division by two integers is always an integer. If numeric division results in a fraction, then the result of the expression is that fraction rounded toward zero. For example, the result of '-4/-3' is 1.

Additionally, '+' is used to mean string concatenation. If both arguments are strings, then the result of the expression is the string that is the concatenation of the two arguments. For example, '("abc" + "def")' evaluates to the string "abcdef".

Division by zero does not have any result. For example, the expression '5/0' does not evaluate to anything at all.

The type of the expression depends on the type of the two values being operated over and on whether or not the arguments are integer constant expressions.

Integer constant expressions include integer constants and, recursively, addition, subtraction, multiplication, and division of other integer constant expressions. For example, 0, 5, 3+4, and 3-(1+2) are integer constant expressions. If both arguments to the operation are integer constant expressions, then the type of the expression is determined by the result of the expression. If the result is negative, then the expression's type is a signed integer type, and otherwise the expression's type is an unsigned integer type. The precision of the expression's type is the smallest precision such that the range of the resulting type includes the result, or 64 if no precision suffices.

It is a compile error for the right side of a division expression to be an integer constant expression that evaluates to 0.

If either argument is not an integer constant expression, then the type of the expression is determined by the types of the arguments. If they are both strings, the result is a string, and otherwise the result is a numeric type. The precision of the numeric type is the greater of the precisions of the input types, unless that precision is less than 32-bit, in which case the precision is 32-bit. The kind of numeric type is the last kind in the following list that one of the arguments is a member of: unsigned integer, integer, binary floating-point, decimal floating-point. Here are some examples of the type resulting from arithmetic over two other numeric types.

Table 5.1. Example arithmetic result types

Argument type   Argument type   Result type
int[32]         int[32]         int[32]
int[32]         uint[32]        int[32]
int[8]          int[8]          int[32]
float[32]       decimal[64]     decimal[64]

If the range of the type of the expression does not include the integer value of the result, then the expression has no result.

Here are a few examples of arithmetic operations:

3 + 4
x * 5
x + borderWidth

5.4. Applications

expr ::= identifier "[" ( expr ("," expr)* )? "]" .

An application looks up a fact in the database based on all of the values in the fact except the last one, and it returns the last value as the value of the application. For example, the application 'sold["squids", 1995]' might look in the database and find the fact sold("squids", 1995, 100), in which case the value of the expression is 100.

Applications are only allowed to be written for predicates that are functional. It is an error to write an application for a non-functional predicate.

The expressions between square brackets ('[', ']') are called arguments to the application. The number of arguments supplied must be one less than the number of arguments to the predicate. It is an error to specify any other number of arguments.

The application matches those facts in the database where the initial sequence of values in those facts is equal to the entire sequence of values that the application arguments evaluate to. Because the predicate is functional, the system ensures that there is no more than one such fact. If there is exactly one such fact, then the value of the application is the last value in that fact. If there are no such facts, then the expression has no value.

Here are a few example applications:

cost["squids", 1995]
cost[item, year]
cost[bestseller[year], year+1]

5.5. Parenthesized expressions

expr ::= "(" expr ")" .

An expression may be enclosed in parentheses. The value resulting from such an expression is the same as the value resulting from the enclosed expression. Parenthesized expressions are useful for overcoming precedence. For example, '2*3+4' has a different result from '2*(3+4)', because the parentheses cause the operations to happen in a different order.

Chapter 6. Formulas

This chapter describes the various kinds of formulas that are supported in Datalog. Formulas are a general construct that evaluate to true or false according to the information currently in the database. When used in the body of a rule, they are used to select parts of the database that are true under some condition. When used in the head of a rule, they are used to assert that new information is true. When used in a constraint, they check whether something is true or not, and if it's not, they raise an integrity violation.

6.1. Comparison

formula ::= expr (comp_oper expr)+ .
comp_oper ::= "=" | "!=" | "<" | ">" | "<=" | ">=" .

A comparison compares two or more expressions with each other. The formula holds true if for each operator, the expressions on each side evaluate to a value, and those two values compare to each other in the indicated way.

If the comparison operator is equality ("="), then the two values must be the same, and if the operator is inequality ("!="), the two values must be different. Equality and inequality can hold true for values of any type.

The ordering operators "<", ">", "<=", and ">=" test whether the two values are ordered in the indicated way. Ordering operations can only hold true when applied to values that have an ordering, such as numbers, strings, and dates.

Only the first comparison operator in a comparison formula can be "=" or "!=". All of the other ones must be an ordering operator.

Here are some example comparisons:

x = y
y > 0
0 <= p < 100

6.2. Atoms

formula ::= identifier "(" argument_list ")" .

argument_list ::= ( expr ("," expr)* )?
              |   identifier ":" (identifier | constant) .

An atom holds true if it describes a fact that is present in the database. For example, if the fact p(7) is in the database, then the atom p(3+4) holds true.

The fact an atom describes has two components: a predicate and a list of values. The predicate is indicated by the identifier at the beginning of the atom. The list of values is indicated in one of two ways.

The first way the list of values can be indicated is as a list of expressions separated by commas. This syntax can be used with any predicate, and it indicates that the values in the list are the values the expressions evaluate to. For example, the atom p(3+4) indicates a predicate named "p" and a single value resulting from evaluating 3+4.

The second way the list of values can be indicated is with two expressions separated by a colon. This syntax can only be used if the predicate is a reference mode predicate, and the two expressions are tightly restricted in their form. The first expression must be an identifier (indicating a variable expression), and the second expression must be either an identifier or a constant. As an example, the atom person:name(bob:"Bob") indicates a predicate named "person:name" and two values: whatever the variable bob refers to, and the string "Bob".

With either syntax, if any of the supplied expressions do not evaluate to a value, then the atom as a whole does not hold true.

6.3. Conjunction and disjunction

formula ::= formula "," formula .
formula ::= formula ";" formula .

A conjunction is written as two formulas separated by a comma. A conjunction is true whenever both of the formulas are true. For example, 3<4,4<5 is true, because both formulas are true. However, 3<4,4>5 is not true, because only the first of the two formulas is true.

A disjunction is written as two formulas separated by a semicolon. A disjunction is true if either one of the two formulas is true. For example, 3<4;4>5 is true, because the first of the two formulas is true.

6.4. Negation

formula ::= "!" formula .

A negation holds true when the negated formula does not hold true. For example, !(0>1) is a true formula.

Any variable used within a negation must be used only once within the negation, or else must also be used somewhere outside the negation in a binding context.

Additionally, negation is only allowed when the platform can determine a way to stratify all rules and constraints that use negation. Stratification means that the truth value of any fact in the database is well-defined. There are no facts that are in the database if and only if they are not in the database.

Here are some example negations:

!p(x,y)
!(p(x),q(x))

6.5. Precedence

Negation has higher precedence than conjunction, and conjunction has higher precedence than disjunction. For example, p();q(),r() parses as a disjunction and !a(),b() parses as a conjunction.

When precedence does not fully disambiguate a parse, association is to the left. For example, p(),q(),r() is a conjunction of p(),q() with r().

6.6. Parenthesized formulas

formula ::= "(" formula ")"

A formula surrounded by parentheses is also a formula. The two formulas have the same truth value. For example, due to the use of parentheses, the following formula is a conjunction:

(p(); q()), r()

Chapter 7. Rules

Rules scan the facts in the database and produce new facts. This chapter gives the syntax and semantics of rules.

7.1. Aggregation

General form

result[] = x <-
   agg<<x = method()>>
     input(y).

An aggregation computes a function across all the information in the database matching some formula. The information to be aggregated over is specified by a formula.

V=count()
Calculates the count of facts, V, in a collection of facts. If the collection is empty, then the aggregation doesn't calculate anything. Thus, the calculated count will never be 0.
V1=min(V2) V1=max(V2)
Calculates the minimum or maximum value, V1, given a collection of values identified by V2.
V1=min(V2) K1=min(V2,K2) V1=max(V2) K1=max(V2,K2)
Calculates the minimum or maximum value, V1, and the K1 element that is associated with the minimum or maximum value, given the collection of values identified by V2 and associated elements K2. In other words, this calculates the min and argmin, or the max and argmax. More than one argmin or argmax can be calculated.
V1=top[N](V2) V1=bottom[N](V2)
Calculates the top values or bottom values, V1, given a collection of values identified by V2. The number of values returned is identified by N and must be an ordered entity in the key of the output.
V1=top[N](V2) K1=top[N](V2,K2) V1=bottom[N](V2) K1=bottom[N](V2,K2)
Calculates the top values or bottom values, V1, and the K1 elements that are associated with the top or bottom values, given the collection of values identified by V2 and associated elements K2. In other words, this calculates the top and argtop, or the bottom and argbottom. More than one argtop or argbottom can be calculated.
V1=ambig(V2)
Returns a value, V1, if all values in a collection are the same. Collection is identified by V2.
V1=and(V2) V1=or(V2)
Returns a value, V1, which is the boolean AND or OR of all values in collection identified by V2.
V1=concat(D,V2)
Returns a string, V1, which is the concatenation of a collection of strings identified by V2. The delimiter is identified by D.
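
For example, following the general form above, a count and a min aggregation can be written like this (a sketch; the sale and price predicates, their declarations, and the result predicates are illustrative):

numSales[sku] = n <-
   agg<<n = count()>>
     sale(sku, _).

lowestPrice[sku] = p <-
   agg<<p = min(v)>>
     price[sku, _] = v.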

7.2. Series functions

Series functions are a collection of aggregations designed to work on time series data. A time series is a predicate of the form

sales[week,d0,...,dn] = v 

where week is a variable ranging over an ordered entity representing the time dimension, and d0 through dn may be other dimensions indexing multiple time series in the same predicate. The type of the value v is usually a numeric type such as float or integer. Unlike aggregations, which often reduce the dimensionality of their inputs, series functions usually transform a time series into another time series, acting like maps.

Similarly to the aggregations, the general form of time series functions is:

result[w,s]=v <- series<<v = method[w](i)>> input[w,s]=i.
val,alpha=singleESOpt[t](v)

INPUT: timeSeries[t,...]=v

OUTPUT: alpha:timeSeries[ ... ]=alpha, timeSeries:smoothed[t, ... ]=val

Calculates single exponential smoothing of the input time series indexed by time dimension t, as well as an optimal alpha parameter that was inferred and applied for each series in the input to produce the output. Both the alpha and the smoothed series are returned. Downhill simplex is used to compute the optimal alpha based on minimizing the prediction error. Note that the alpha:timeSeries predicate has the same dimensionality as the timeSeries:smoothed without the time dimension, i.e., there is one alpha parameter for each series.
vf,holtTrend,alpha,beta=linearESDOpt[t](v)

INPUT: sales[t,...] =v

OUTPUT: alpha[...] = alpha, beta[...] = beta, holtTrend[t,...]=holtTrend, holtDeviates[t,...]=vf

Calculates linear exponential smoothing, using the Holt method, optimizing the alpha and beta parameters. Also returns the trend and deviates with the same dimensionality as the input series. Downhill simplex is used to compute the optimal parameters based on minimizing the prediction error.
m=median[t](v) m=medianS[t](v)

INPUT: timeSeries[t, ...]=v

OUTPUT: median[...]=m

For each time series in the input predicate timeSeries, where t is the time dimension, computes the median of the values in the time series. There are two functions: median, which is used when the time series is known to have no gaps or missing values along the time dimension t; and medianS which can handle input time series that may contain gaps in their data. Both series functions use the linear median algorithm: the former is slightly faster, but not accurate for non-contiguous time series.
t2=permSort[t](v) t2=permSortS[t](v)

INPUT: timeSeries[t, ...]=v

OUTPUT: permutation[t,...]=t2

Sorts a time series indexed by the ordered time dimension t. The result predicate is a permutation, which uses the time dimension t to give the index of the first, second, and so on, value in the input time series. For example,
first[...]=t -> tdim(t), extradim(ed). 
first[...]=t <- timeSeries[t,...]=_, not exists timeSeries[t2,...], tdim:next[t2]=t.
first[...]=t <- timeSeries[t,...]=_, tdim:first[]=t.

permutation[t,...]=v2 <- series<<v2=permSort[t](v)>> timeSeries[t,...]=v. 
The expression permutation[tdim:next[first[...]]] gives the time index of the second smallest number in the input series.

Note that, just like median, the sort has two forms, permSort and permSortS, where the first can only be applied to time series that do not have gaps in their data, while the second, slightly less efficient, works on any time series.

outputReal,outputImg=fft[t](inputReal,inputImg)

INPUT: inReal[t,...]=inputReal,inImg[t,...]=inputImg

OUTPUT: outReal[t,...]=outputReal, outImg[t,...]=outputImg

For each time series indexed by t, computes a discrete Fourier transform. The inputs consist of two time series whose values contain the real and imaginary part; the outputs are also two time series predicates (real and imaginary). The time dimension of the input predicates, t, becomes the frequency dimension on the output predicates, where the earliest time value represents the lowest frequency, and so on.
outputReal,outputImg=inversefft[t](inputReal,inputImg)

INPUT: inReal[t,...]=inputReal,inImg[t,...]=inputImg

OUTPUT: outReal[t,...]=outputReal, outImg[t,...]=outputImg

For each time series indexed by t, computes the inverse discrete Fourier transform. The inputs consist of two time series whose values contain the real and imaginary part; the outputs are also two time series predicates (real and imaginary). The frequency dimension of the input predicates, t, becomes the time dimension on the output predicates, where the lowest frequency value represents the earliest time, and so on.

7.3. Incremental evaluation

Installed, non-delta logic rules are evaluated incrementally by the engine, which means that computed results are stored, and future transactions only recompute changes based on changes in the predicates in the body of the rule. That is, existing results that are not invalidated are assumed to still be correct. For non-deterministic features, storing those results is actually incorrect, since they can change in every single transaction. Therefore, such non-deterministic logic features can be used only in logic that is fully evaluated, which means any rule in a query, or delta rules in queries and installed blocks. If you do use these features, then you need to know that the application might work with results that are logically incorrect.

The compiler reports the following errors if you use non-deterministic language features in installed derivation rules:

INCR_NONDET:
  isExpired(qo) <- validUntilDate[qo] = dt, datetime:now[] = now, dt < now.

INCR_CHOICE:
  useMe[]=i <- choice<<i=any(unused)>> item:isUsed@prev(unused;false).

INCR_PULSE:
  q(x, y) <- p(x, y).   // where p is a pulse predicate

INCR_DELTA:
  q(x, y) <- +p(x, y).

7.4. Event rules

Some rules are considered event rules. A rule is an event rule if the following is true (a sketch follows the list below):

  • the run stage of the enclosing block is INITIAL
  • or the rule contains no deltas, all predicates in the head of the rule are pulse predicates, and
    • the block has transaction lifetime.
    • or there is at least one pulse predicate in the body of the rule.
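
For instance, the following sketch (all predicate names are illustrative) qualifies as an event rule under the second criterion: the rule contains no deltas, its head predicate is a pulse predicate, and its body mentions a pulse predicate.

button(b) -> .
buttonClicked(b) -> button(b).
lang:pulse(`buttonClicked).
clickHandled(b) -> button(b).
lang:pulse(`clickHandled).

// Installed rule: no deltas, pulse predicate in the head, pulse predicate in the body.
clickHandled(b) <- buttonClicked(b).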

Chapter 8. Constraints

Constraints tell the system to verify that some formula holds true in the database. Some constraints are checked by the compiler and will thus hold true no matter what data the program is run against. The remaining constraints are checked at run time, whenever a transaction is committed. If any of the constraints fail, the transaction is aborted.

8.1. Syntax

The most common form of a constraint is formula1 -> formula2. The system checks that whenever formula1 holds true, formula2 also holds true. Logically, a constraint f1 -> f2 is equivalent to !(f1, !f2). Note how f2 is negated.
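
For example, the following runtime constraint (with illustrative predicate names) states that no person can be their own parent: whenever parentof(x, y) holds, x and y must differ.

parentof(x, y) -> x != y.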

Chapter 9. Typing

Datalog programs are type checked. The system finds a type for every predicate, variable, and expression, and it verifies that the types are used consistently. Doing so prevents many kinds of little mistakes, such as adding two product names instead of adding two product prices. This chapter describes how the system assigns and checks the types for non-generic code. For typing of generic code see the chapter on MoReBlox.

9.1. Predicate declarations

A predicate declaration tells the system that a predicate exists and provides typing information about that predicate. When the system type checks your code, the first thing it does is identify all the predicate declarations.

Predicate declarations are written as constraints. For example, here is a predicate declaration:

parentof(x, y) -> person(x), person(y).

In this example, the predicate 'parentof' is declared, and its arguments are declared to both be of type 'person'.

The specific requirements for a constraint to be a predicate declaration are as follows.

The left-hand side of the constraint determines what predicates are being declared. It must either be a single atom, in which case the predicate of that atom is being declared, or it must be two atoms, in which case an entity is being declared along with its reference-mode conversion. If it's a single atom, then every argument to the atom must be a distinct variable. If it's two atoms, the first atom declares the entity predicate, and it must be of the form 'p(x)' for some p and x. The second atom declares the reference-mode conversion predicate, and it must be of the form 'q(x:id)'. The second atom must reuse the same variable 'x' that is used in the first atom.

The right-hand side of the constraint must be either empty, a single unary atom with a variable as its argument, or a conjunction where one of the formulas is a unary atom with a variable as its argument. For a reference-mode declaration, there must be exactly one atom on the right-hand side, and its argument should be the second argument to the second atom on the left-hand side.

All variables that appear in the right-hand side of the constraint must also appear in the left-hand side of the constraint.

It is possible to have more than one declaration for the same predicate. Those multiple declarations can be exact duplicates, or some of them can be more specific than others, or they can each provide different information about the predicate.

Here are some more examples of predicate declarations:

earnings(r, a) -> region(r), int[64](a).
expenditures(r, a) -> region(r).
expenditures(r, a) -> int[64](a).
expenditures(r, a) -> int[64](a), a >= 0.
person(x), person:eid(x:id) -> int[64](id).
region(r) -> area(r).
area(r) -> .
success() -> .

Note that even though all of the above constraints can be used to determine the types of predicates, some of them cannot be completely statically guaranteed. For instance, the compiler cannot guarantee statically that expenditures tuples have a number >= 0 as their second value. This type of constraint is maintained at runtime.

9.2. Entity declarations

A unary predicate can be declared to also be an entity predicate in two ways: explicitly and implicitly. Either way, the predicate must still have a declaration. In the latter case, that declaration also serves to make the predicate be an entity predicate.

An explicit entity declaration directly indicates whether a predicate is an entity or not. The syntax is as follows:

explicit_entity_decl ::=
  'lang:isEntity' '[' '`' id ']' '=' ( 'true' | 'false' )'.'
| 'lang:entity' '(' '`' id ')' '.'

In the latter case, where lang:entity is used, the predicate is declared to be an entity. In the former case, the predicate is an entity if the right-hand side of the equality is true, and otherwise the predicate is explicitly not an entity.

For example, here is a declaration that 'person' is explicitly an entity type. Note that there is also a predicate declaration for 'person' in addition to the entity declaration.

person(p) -> .
lang:entity(`person).

On the other hand, here is an example declaring that instock is explicitly not an entity:

instock(sku) -> .
lang:isEntity[`instock] = false.

If there is no explicit declaration for a predicate, then it is possible to implicitly declare it as an entity. To do so, provide an entity declaration for the predicate. An entity declaration is a constraint with several restrictions. The left-hand side of the constraint must be a unary atom for the predicate being declared as an entity and whose argument is a variable name. The right-hand side of the constraint must either be empty or a single unary atom. If there is a unary atom on the right-hand side, it must be for a predicate that is either implicitly or explicitly declared as an entity, and its argument must be the same variable used as an argument to the atom on the left-hand side.

If the right-hand side of an entity declaration is empty, then the predicate is implicitly declared to be a top-level entity type. Otherwise, it is declared to be a subtype of all the entities referenced in the declaration's right-hand side, but only if one additional consideration is met: the predicate must be used somewhere in the right-hand side of another type declaration.

It is a compile error for a program to include two entity declarations for the same entity with different superentities. Likewise, it is a compile error for a program to include, for the same entity, both a top-level entity declaration and a declaration that is not a top-level declaration.

9.3. Examples of Implicit entity declarations

This section gives a few examples of implicit entity declarations.

As a simple example, in the following code, 'area' is a top-level entity type but 'region' is just an ordinary predicate and not an entity.

area(a) -> .
region(r) -> area(r). // not an entity

In the following, longer, program, 'region' is explicitly declared to be an entity type, and so it is:

area(a) -> .
region(r) -> area(r).
lang:isEntity[`region] = true.  // explicit declaration

In the following example, 'region' is used in the right-hand side of a type declaration, so it becomes an entity:

area(a) -> .
region(r) -> area(r).  // implicit entity declaration
earnings(r, a) -> region(r), int[64](a).

As another example, in the following code, 'region' is not an entity predicate. The explicit declaration takes precedence over what would otherwise be an implicit declaration.

area(a) -> .
region(r) -> area(r).  // not an entity
earnings(r, a) -> region(r), int[64](a).
lang:isEntity[`region] = false.

Finally, in the following example, A and B are declared as entity, but C is not. A is an entity because it has a declaration with an empty right-hand side. B is an entity because it has a declaration with a non-empty right-hand side, and it is also used on the right-hand side of the declaration for C. However, even though C has an entity declaration, it is never used on the right-hand side of a declaration, so C is not an entity.

A(x) -> .
B(x) -> A(x).
C(x) -> B(x).

9.4. Interpreting predicate declarations

Predicate declarations do more than indicate that a predicate exists and can be used. Whenever a predicate declaration has a right-hand side where all of the atoms are for either built-in types or for entities, those atoms declare type restrictions on the corresponding positions in the predicate. For example, consider the following declarations:

person(p) -> .
parentof(a, b) -> person(a), person(b).

The first declaration is an entity declaration, and it declares that 'person' is a top-level entity. The second declaration is a predicate declaration. It declares that 'parentof' is a two-argument predicate, and that each argument to the predicate must be a person.

If a predicate declaration's right-hand side uses no atoms where the predicate is a type, then that declaration does not provide any type information for the predicate, and it is run as an ordinary constraint. It still declares that the predicate exists and that it has a specific number of arguments.

If at least one atom in a declaration's right-hand side is for a type, and that atom's argument is a variable that appears in the declaration's left-hand side, then the declaration contributes to the declared type of the predicate. Each type atom in the right-hand side whose argument is a variable declares that the associated position in the atom in the left-hand side has a type that is a subtype of the type of the atom.

For any equality in the right-hand side between two variables, any type information declared for one of the variables also applies to the other. For example, the following declaration declares that each argument to evaluatedBy is a person.

evaluatedBy(a, b) -> person(a), a = b.

If multiple declarations are given for the same predicate, then they must all declare the predicate to have the same number of arguments. The type bound for each argument is the intersection of the types given for that argument by the various declarations. If any of the arguments are given a type, then all of them must be given a type by some declaration or other.

9.5. Predicate type inference

All predicates that are used in a program must either have a predicate type declaration or have at least one derivation rule. If the program includes type declarations for the predicate, then those type declarations must specify a type for each argument to the predicate.

For any predicate that has no type declaration, a type will be inferred for it from the derivation rules that use that predicate in their right-hand side. Predicate type inference attempts to choose the most specific type for the predicate that will allow the program to type check. This attempt often succeeds, but not always, and even if it succeeds, it might choose types other than the ones you intended. If you want to be certain, then supply a type declaration.

As an example, in the following program, predicate 'parentof' is declared to have each argument being a person. Predicate 'ancestorof' does not have a type declaration, so its type is inferred from the derivation rules. The inferred type in this case is that 'ancestorof' has two arguments, each of which is also a person.

person(x) -> .

parentof(x, y) -> person(x), person(y).

ancestorof(x, y) <- parentof(x, y).
ancestorof(x, z) <- ancestorof(x, y), parentof(y, z).

9.6. Type checking

Once the compiler has determined the entity types and has determined the types of all predicates, it uses those types to check for common errors in the program's rules and constraints. There are a variety of checks that the compiler applies.

One kind of check is type consistency. If a variable is used in one part of a formula to bind values of one type, and it is used in another part of the formula to bind values of another type, then the two types must have at least some values in common. For example, in the following code, variable 'b' is used to bind both a person and a 32-bit integer, so the code has a type consistency error.

person(x) -> .
parentof(x, y) -> person(x), person(y).
likesNumber(x, y) -> person(x), int[32](y).

p(a) <- parentof(a, b), likesNumber(a, b). // Type consistency error

Another check the system performs is that an asserted fact in the head of a rule will not violate the type of the fact's predicate. For example, in the following code, there is an attempt to assert that a person is a mother. However, the programmer accidentally mixed up the variables in the body of the rule. Since the 'motherof' predicate is declared to only have females in its first argument, the code has a type-too-big error.

person(x) -> .
female(x) -> person(x).
parentof(x, y) -> person(x), person(y).
motherof(x, y) -> female(x), person(y).

motherof(a, b) <- parentof(a, b), female(b). // Type-too-big error

Chapter 10. Transactions

Transactions are the way the data in a workspace changes. You create a transaction, add some Datalog code, and then commit the transaction. If the transaction succeeds, then the workspace will be updated with a new set of facts. If the transaction fails for any reason, then the workspace will go back to exactly how it was before the transaction was attempted.

10.1. Life-time of a transaction

  1. A transaction is created as either read-only, read, or write.

  2. External programs can modify extensional predicates. Also, the engine can run update logic (see the chapter on Updates) to update extensional predicates. This stage is called the initial stage.

  3. The transaction is committed or aborted.

    • Committing a transaction triggers re-evaluation of the installed logic. Changes are written to the workspace after this. This stage is called the final stage.

    • Aborting a transaction restores the state of all predicates in the open workspace to the state they were in before the beginning of the transaction. The installed logic programs are never evaluated.

10.2. Stages

During a transaction, logic may be evaluated during the initial stage or the final stage. Logic programs that run during the initial stage are programs that are run only once. That is, the logic is supplied to the engine when the transaction is in the initial stage and after that the logic is discarded. Logic programs that are intended to be run in the initial stage are called queries. In contrast, logic that runs during the final stage is added to a workspace and will be evaluated during all final stages of all future transactions until the logic is removed. These blocks are also referred to as installed blocks. The collection of installed blocks is also known as the installed program.

Predicates declared in an installed program are essentially materialized views. The final stage updates the materialized views incrementally based on the changes made to the database during the initial stage.

The stage during which logic is evaluated is also known as the run stage. For queries, the run stage is the initial stage. For installed blocks, the run stage is the final stage.

The facts of a predicate change during the life-time of a transaction. Extensional predicates are changed by update operations; intensional predicates are changed by re-evaluation of the logic based on those external changes.

LogicBlox allows logic to refer to the state of a predicate during an earlier stage of a transaction using a stage suffix. For a predicate p, the following stage suffixes are available:

p@previous, p@prev

The stage of a predicate when the current transaction was created is called the previous stage.

This stage suffix can be used for logic that will be evaluated during the initial or final stage.

p@initial, p@init

The p@initial stage suffix refers to the predicate p during the initial stage.

In the final stage, p@initial refers to predicate p at the end of the initial stage.

For queries, a predicate p without stage suffix defaults to p@initial.

p@final

The p@final stage suffix refers to the predicate p during the final stage.

For installed logic, a predicate p without stage suffix defaults to p@final.

Note that logic is never evaluated in stage @override or stage @previous. Only the stage suffixes @initial and @final are actual run stages.
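
As a small sketch (predicate names are hypothetical), a query could compare a predicate's state at the start of the transaction with its initial-stage state:

// Query logic (run stage initial): record skus whose price changed so far in
// this transaction. Without a suffix, price defaults to price@initial here.
+priceChanged(sku) <- price[sku] = p, price@prev[sku] = q, p != q.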

functor_expr ::=
  functor stage

functor_expr ::=
  IDENTIFIER stage

stage ::=
   '@' '-' INTEGER_CONSTANT
 | '@' INTEGER_CONSTANT
 | '@' IDENTIFIER

The temporal ordering on stages is: previous < initial < override < final.

10.2.1. Restrictions

  • For a run stage s, no stage suffix from a stage later than s is allowed, either in the head or in the body of a rule.

  • No head predicate stage may be earlier than the run stage.
    • For example, p@previous(x) <- ... is illegal.
    • And p@initial(x) <- ... is illegal when the run stage is final.

10.3. Semantics

The following logic rules define the relationship between predicates with stage suffixes.

p[x] = v <-
  p@previous[x] = v,
  !-p@initial[x] = _.

p[x] = v <-
  +p@initial[x] = v.

p[x] = v <-
  ^p@initial[x] = v.

Chapter 11. Updates

A limitation of ordinary logic rules is that they cannot express update rules (also called triggers or event/action rules). An update rule makes changes to extensional (i.e. editable) predicates (instead of to only intensional (i.e. non-editable) predicates like ordinary logical rules do). Update rules are useful for many real database applications where update patterns are needed that are more complex than the simple extensional/intensional paradigm can provide. Various proposals have been made by researchers for supplementing Datalog with update rules, but DatalogLB's approach is somewhat novel -- it tries to remain even more strictly logical than these approaches.

11.1. Delta predicates

Every predicate p has a set of associated delta predicates that reflect changes to be made to the facts of predicate p.

Delta predicates are a combination of a predicate name and a delta modifier. For example, for a predicate p, the predicate +p denotes the facts to be added to predicate p.

atom =
   delta functor_expr '(' ')'

atom =
   delta functor_expr '(' argument_list ')'

delta =
   '+' | '*' | '-' | '^'

There are four delta modifiers:

+

The Insert modifier is used to add facts to a predicate.

-

The Delete modifier is used to remove facts from a predicate.

*

The Update modifier is used to change values of a functional predicate. The update modifier is not allowed on a non-functional predicate.

^

The Upsert modifier inserts or updates a fact, depending on whether or not the key already exists.

Example 11.1. Delta modifier

If an employee is promoted into management, this change to the database could be expressed as:

+manager(e) <- employee(e), employee:name(e:"John Smith").
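
The remaining modifiers can be sketched in the same style; salary is an illustrative functional predicate.

// Delete: John Smith is no longer a manager.
-manager(e) <- employee(e), employee:name(e:"John Smith").

// Update: change the value of an existing key of the functional predicate salary.
*salary[e] = 90000 <- employee(e), employee:name(e:"John Smith").

// Upsert: set the salary whether or not a value already exists for this key.
^salary[e] = 90000 <- employee(e), employee:name(e:"John Smith").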

11.2. Restrictions

It is an error for a predicate at one stage to depend on any predicate at a later stage.
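
For example, the following sketch (with hypothetical predicates p and q) is not legal, because query logic runs in the initial stage while the body refers to the final stage:

+q(x) <- p@final(x).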

An insert into an entity always has to be accompanied by an insert into its reference-mode predicate, and vice versa.

The reason for this is that reference-modes are injective functions.
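
For example (an illustrative sketch reusing the reference-mode declaration style from the Typing chapter), creating a new person requires asserting both the entity and its reference mode together:

person(x), person:eid(x:id) -> int[64](id).

// Both inserts must appear together; either one alone is an error.
+person(p), +person:eid(p:42).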

The following additional restrictions apply to delta predicates:

  • The predicate inside a delta predicate may be a stage predicate.
  • When the run stage is initial:
    • Every head predicate of a rule must be a delta predicate (or a local predicate -- this is a "block" issue).
    • Every predicate in a fact assertion must be a delta predicate (i.e. not local).
  • When the run stage is final:
    • Fact assertions are not yet supported.

11.3. Pulse predicates

A pulse predicate is a predicate that has delta arrays but no 'main' array: it is always empty at the previous stage and can only have assertions (no retractions). The assertions are discarded after the final stage is evaluated. Pulse predicates are useful for representing one-time events such as a button click.

Declaring a pulse predicate:

lang:pulse(`trigger).
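
A small sketch (predicate names are illustrative) combines the declaration with a delta rule that reacts to the pulse:

item(x) -> .
trigger(x) -> item(x).
lang:pulse(`trigger).
everTriggered(x) -> item(x).

// Record permanently that an item has been triggered at least once.
+everTriggered(x) <- trigger(x).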

Chapter 12. Hierarchical Syntax

To a first approximation, you can think of hierarchical syntax as a mechanism to avoid writing the same arguments to predicates over and over again. Its use is best illustrated through examples. Imagine you are inserting a new person into the workspace. Without hierarchical syntax you might write some code that looks like the following:

+person(p), 
+firstname[p]="John",
+lastname[p]="Doe",
+street[p]="1384 West Peachtree Street",
+city[p]="Atlanta"

If you instead use hierarchical syntax, you can write the same thing, but avoid having to keep repeating the use of 'p' everywhere:

+person(p) {
  +firstname("John"),
  +lastname("Doe"),
  +street("1384 West Peachtree Street"),    
  +city("Atlanta")
}

Based upon the arguments provided and the type of 'p', the compiler figures out that it needs to insert 'p' as the first argument of all the atoms between the braces. Internally, the compiler then desugars the hierarchical version of this example to exactly the same logic as the non-hierarchical version. Alternatively, if you prefer to emphasize the functional nature of the predicates you could have written the example as:

+person(p) {
  +firstname[]="John",
  +lastname[]="Doe",
  +street[]="1384 West Peachtree Street",    
  +city[]="Atlanta"
}

Furthermore, 'p' is redundant because it is only used once so we can replace it with an underscore:

+person(_) {
  +firstname[]="John",
  +lastname[]="Doe",
  +street[]="1384 West Peachtree Street",    
  +city[]="Atlanta"
}

Using underscore in such situations avoids the need to have unique names when creating multiple instances of some entity with its associated data. For example, before you would write:

+person(p1),
+firstname[p1]="John",
+lastname[p1]="Doe",
+street[p1]="1384 West Peachtree Street",
+city[p1]="Atlanta"
+person(p2),
+firstname[p2]="Jane",
+lastname[p2]="Doe",
+street[p2]="1384 West Peachtree Street",
+city[p2]="Atlanta"

Here, because it is necessary to distinguish the links between the persons and their associated relationships, we must choose to use distinct variable names, 'p1' and 'p2'. Using hierarchical syntax, we can avoid this by just using underscore:

+person(_) {
  +firstname[]="John",
  +lastname[]="Doe",
  +street[]="1384 West Peachtree Street",   
  +city[]="Atlanta"
},
+person(_) {
  +firstname[]="Jane",
  +lastname[]="Doe",
  +street[]="1384 West Peachtree Street",   
  +city[]="Atlanta"
}

We call the formula just before the curly-braces the "head" of the hierarchical formula and the conjunction of atoms between the curly-braces the "body" of the hierarchical formula. Currently, we only allow conjunctions of atoms as the heads and bodies of hierarchical syntax. We may relax this restriction in the future based upon some additional study and user-provided use cases.

Here is a small example of how we could simplify some code using a conjunction of atoms in the head of a hierarchical formula. First we define a small schema:

person(p) ->.
car(c) ->.
name[p]=s -> person(p), string(s).
brand[c]=s -> car(c), string(s).
driven_by(c, p) -> car(c), person(p).

Previously, you might have written some logic like the following:

person(p),
car(c),
name[p]="Prefect",
brand[c]="Ford",
driven_by(c, p)

Now this logic could be written as the following hierarchical formula:

(person(p), car(c)) {
  name[]="Prefect",
  brand[]="Ford",
  driven_by()
}

Again, based upon the arguments you have supplied in the hierarchical body, and the types of 'p' and 'c' in the hierarchical head, the compiler determines that a use of 'p' must be inserted into 'name', that a use of 'c' must be inserted into 'brand', and that both 'c' and 'p' must be inserted into 'driven_by'.

Inside of hierarchical formulas, we also allow the use of what we call hierarchical expressions. A hierarchical expression looks just like a hierarchical formula, but may be written anywhere we can write an expression like 'x' or 'foo[y]'. The only restriction is that the head of a hierarchical expression must be a single atom that is an entity. For example, you can use a hierarchical expression to simplify the code

+person(p),
+firstname[p]="John",
+lastname[p]="Doe",
+address(a),
+home[p]=a,
+street[a]="1384 West Peachtree Street",
+city[a]="Atlanta"

to become

+person(_) {
  +firstname[]="John",
  +lastname[]="Doe",
  +home[]= +address(_) {
             +street[]="1384 West Peachtree Street"
             +city[]="Atlanta"
           }
}

When discussing aspects of hierarchical syntax that are not specific to either hierarchical formulas or hierarchical expressions, they will be referred to as hierarchicals.

There are limits to the compiler's ability to determine how to interpret hierarchical syntax. Given a hierarchical, the first thing the compiler does is collect a set of what we call "candidate" expressions from the head. Presently, candidate expressions can be variables, constants, or integer constant expressions. For example, given the hierarchical

(person(p), age[p]=42) { ... }

the candidate expressions would be "p" and "42". To ensure that there is always a unique interpretation of a hierarchical, we require the types of the candidate expressions to be disjoint. Roughly, you can understand disjoint to mean that the type of a candidate expression cannot be a subtype of the type of another candidate expression, or vice versa. For example, the compiler will disallow the code fragment

(person(p1), person(p2)) { ... }

because 'p1' and 'p2' both have the type 'person' and therefore the compiler cannot decide when it should choose to insert a use of 'p1' rather than a use of 'p2'. Writing the above logic would cause a HIER_AMBIGUOUS_HEAD_BINDING error to be reported.

Once the compiler has determined an unambiguous set of candidate expressions, it will then start examining the atoms in the body of the hierarchical. If an atom in the hierarchical's body already has the correct number of arguments for the defined predicate, the atom will be left alone. If the atom has fewer user-supplied arguments than expected, the compiler will begin the process of resolving the arguments. To make it easier to understand how this process works, we will use a rather contrived example. Consider the following schema:

a(x) ->.
b(x) ->.
c(x) ->.
d(x) ->.
e(x) ->.
f(x) ->.

foo[x, y, z, w]=u -> a(x), b(y), c(z), d(w), e(u).

We will now step through the process used to resolve the arguments to the atom 'foo' in the following logic:

(a(x), c(y), f(z)) { foo[w, u]=v }, b(w), d(u), e(v).

The first thing the compiler does is note that the value argument of the atom has already been specified. Therefore, the compiler will ignore the value argument for the rest of the resolution process. Next, the compiler will start by simply ignoring the key arguments the user has supplied. We can visualize the current state of the 'foo' atom as follows:

foo[•, •, •, •]=v

Here • represents argument positions to be filled, "holes" essentially. Now, the compiler will fill in all of the argument positions that have expected types that are not disjoint from the candidate expressions. In this example, the candidate expressions are 'x', 'y', and 'z' with types 'a', 'c', and 'f' respectively. Because the first argument of 'foo' is expecting an expression of type 'a' and the third argument of 'foo' is expecting an expression of type 'c', the compiler will insert 'x' and 'y' into these positions:

foo[x, •, y, •]=v

It is worth emphasizing that the compiler will only ever insert a candidate expression into a single hole and that candidate expressions may go unused, such as 'z'. Next, the compiler will take the two arguments the user supplied to 'foo' and fill them into the remaining holes left to right. This means 'w' will be inserted in the second argument and 'u' into the fourth argument:

foo[x, w, y, u]=v

Because all argument positions of the predicate 'foo' are now filled, the resolution process is considered successful. However, if we had started with a slightly different example, resolution could have failed at a few different points. For example, if the user had written

(a(x), c(y), f(z)) { foo[w]=v }, b(w), d(u), e(v).

The compiler would still have filled in the candidate expressions like so:

foo[x, •, y, •]=v

However, the compiler will then notice that the user has supplied only one key argument, 'w', while there are still two holes to fill. It would report this as a HIER_ATOM_TOO_FEW_SUPPLIED error. Similarly, if the user had written

(a(x), c(y), f(z)) { foo[w, u, u]=v }, b(w), d(u), e(v).

Again, the compiler will begin by inserting the two candidate expressions:

foo[x, •, y, •]=v

This time, the compiler will notice that the user supplied three key arguments ('w', 'u', 'u'), while there are only two holes to be filled. This will be reported as a HIER_ATOM_TOO_MANY_SUPPLIED error. The resolution process could have also failed if 'foo' had been declared differently. For example, if 'foo' had been declared as

foo[x, y, z, w]=u -> a(x), a(y), c(z), d(w), e(u).

and then the user wrote the logic

(a(x), c(y), f(z)) { foo[u]=v }, b(w), d(u), e(v).

the resolution process would fail. The reason is that the compiler cannot determine whether to insert the 'x' as the first argument or as the second argument of 'foo':

foo[•, •, •, •]=v

Again, this is because a candidate expression will only be inserted into a single hole per atom. This kind of failure will be reported as a HIER_AMBIGUOUS_BODY_ATOM error. Finally, we will explain a little more about the special status of the value argument in the resolution of a functional predicate. If 'foo' had been declared as:

foo[x, y, z, w]=u -> a(x), b(y), c(z), d(w), a(u).

The compiler would be able to resolve

(a(x), c(y), f(z)) { foo[w, u]=x }, b(w), d(u), e(v).

because when it begins filling in the candidate expressions, as described above, the compiler has already noted that the value argument is specified:

foo[•, •, •, •]=x

Therefore, at this point there is only one hole where 'x' could be inserted and satisfy the typing requirements. So there is no ambiguity. It is also very important to understand that the compiler is only able to resolve this example because it had the syntactic hint that 'x' was to be used as the value argument. If the logic had been written as

(a(x), c(y), f(z)) { foo(w, u, x) }, b(w), d(u), e(v).

then when the compiler reaches the point where it starts determining the insertion of candidate expressions, the atom would look like:

foo[•, •, •, •]=•

This is because the compiler no longer knows that 'x' is intended to be the value argument, so there is now a hole in the value argument. Furthermore, there are now two possible holes, the first argument and the value argument, where the candidate expression 'x' could be inserted. Consequently, the compiler will report an ambiguity error.

12.1. Formal description

Hierarchical syntax extends the language with the following grammatical constructions:

hierarchical        ::= hier_head '{' hier_body '}'; 
hier_head           ::= atom 
                      | '(' hier_head_conjunct ')';
hier_head_conjunct  ::= atom
                      | hier_head_conjunct ',' atom;
hier_body           ::= hier_atom 
                      | hier_body ',' hier_atom;
hier_atom           ::= (delta)? predname ('@' stage)? '(' hier_expr_list ')'
                      | (delta)? predname ('@' stage)? '[' hier_expr_list ']' '=' hier_expr;
hier_expr_list      ::= hier_expr
                      | hier_expr_list ',' hier_expr;
hier_expr           ::= hierarchical
                      | expr;

Chapter 13. MoReBlox

MoReBlox stands for Modular Reusable Blox. MoReBlox allows programmers to define Datalog rules and constraints with variable predicate names. For example, MoReBlox allows a programmer to declare a rule that derives facts from predicate P to predicate Q, where both P and Q are variables, not concrete predicate names. The programmer can then specify which concrete predicates P and Q should be instantiated with.

A MoReBlox program resembles a regular Datalog program: it is composed of rules and constraints. However, the data that MoReBlox rules and constraints compute over are elements of non-generic Datalog programs---predicates, rules, etc. Thus, a MoReBlox rule can be used to derive, or generate, more (non-generic) predicates, as well as rules and constraints associated with those predicates.

We identify generic versus non-generic Datalog programs using a level number:

  • Regular Datalog programs are level 0 programs
  • Generic programs can be level n programs, for all n >= 1.
  • The data a level n program computes over, where n >= 1, comprises elements of programs at level n-1. That is, a level n program can use elements of level n-1 program to derive more (elements of) level n-1 programs.

For the LogicBlox 3.4 release, we only support generic programs at level 1. Thus, from here on, we use "generic" to imply "level 1".

There are two main goals to MoReBlox:

  • Reusable relationships.

    A Datalog rule (or constraint) describes the relationship between several predicates: the predicates in the body help derive the data for the predicates in the head; the predicates in the body are related via logical operators such as conjunction, negation, etc. By allowing these predicates to be variables, and allowing these variables to be instantiated by different, concrete predicates, MoReBlox allows programmers to capture such relationships in a reusable manner.

  • Separate Type-checking

    MoReBlox aims to guarantee that a rule containing variable predicates, i.e., a generic rule, is always going to result in a well-typed rule, for all possible concrete substitutions of its variables. Any possible typing errors need to be reported to the implementer of the generic rule, not the user. Thus, these typing checks on generic rules need to be done without knowing these concrete predicate instantiations.

13.1. Level 1 predicates and their declarations

We also refer to level 1 predicates as generic predicates. Generic predicates store facts about level 0 programs. For example, the following is the declaration of the level 1 entity predicate, as well as two of its subtypes, entity and primitive_type:

predicate(p) --> .
entity(p) --> predicate(p).
primitive_type(p) --> predicate(p).

Note that level 1 constraints are declared with -->, whereas level 0 constraints are declared using ->.

predicate and entity store information about the predicates and entities in level 0 programs. For instance, for the following level 0 program, predicate would contain an element representing the predicate person (as would entity), as well as an element representing the predicate parent:

person(x) -> .
parent(x,y) -> person(x), person(y).

predicate, entity, and primitive_type are pre-defined level 1 generic predicates, whose data are populated by the LogicBlox compiler/runtime, based on which level 0 programs are active. There are a number of other pre-defined predicates, such as rule, constraint, etc.

Programmers may also define their own generic predicates very much the same way they define level 0 predicates. The difference is that a level 1 predicate can only contain level 0 program elements as data, and thus its types must be level 1 entity predicates: predicate, entity, etc. Additionally, primitive values (int[8], datetime) can be used as level 1 data, as long as these values are stored in level 1 predicates.

13.1.1. Predicate signatures

MoReBlox enhances the notion of types with predicate signatures. In level 1 type declarations, predicate signatures declare extra constraints on variables of type predicate or some subtype thereof (e.g., entity or primitive_type). The following is an example of a type declaration using predicate signatures:

binary_pred(p) --> predicate(p :: (T1,T2)), entity(T1), entity(T2).

The above declaration says that binary_pred contains elements of type predicate. Additionally, the variable p is a signatured variable. It is declared to be a predicate with the signature (T1,T2). This states that every p must be a predicate with two keys of type T1 and T2. This signature requirement is considered part of the type of binary_pred. All programs using binary_pred are checked at compile-time to verify that all elements stored in binary_pred would satisfy this signature requirement.

13.1.1.1. Signatured variable syntax

A signatured variable can be declared with the following syntax:

signatured_variable ::= varname '::' signature;
signature           ::= '(' (type)* ')'
                      | '[' (type)* ']' '=' type
                      ;
type                ::= varname
                      | '`' predicatename
                      ;

Signatured variables can be used wherever normal variables are used. The predicate signatures are used both in type-checking and in runtime constraint checking and filtering (when signatured variables appear in generic rules, for example).
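
For example, the following sketch (same_keyed is an illustrative name) uses a signatured variable in the body of a generic rule to filter binary_pred down to predicates whose two key types are the same entity:

same_keyed(P) --> predicate(P :: (T,T)), entity(T).

same_keyed(P) <-- binary_pred(P :: (T,T)), entity(T).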

13.1.2. Type-based signature constraints

Generic type declarations are verified statically -- analogous to how type declarations in level 0 programs are constraints that are verified statically. MoReBlox supports an additional form of constraint that is recognized as a statically-checkable type declaration. These declarations involve the use of predicate signatures:

generic_type_decl ::= predname '(' varname* ')' '-->' pred_sig_conjunct ;
pred_sig_conjunct ::= pred_sig
                    | pred_sig_conjunct ',' pred_sig 
                    ;
pred_sig          ::= constraint_type '(' varname ')'
                    | constraint_type '(' signatured_variable ')' ;
constraint_type   ::= 'predicate' 
                    | 'entity'
                    | 'primitive_type'
                    ;

Note that it is okay for a pred_sig to constrain a variable to be predicate, entity, or primitive_type, without any signature specification. For instance, in the example declaration above, entity(T1) constrains T1 to be an entity without further specification.

Since signatured type declarations are verified statically, it would be a compile-time error to add a level 0 predicate into binary_pred that does not meet the declared predicate signature. For instance, the following will be rejected by the MoReBlox compiler:

+binary_pred(`person).

13.2. Level 1 derivation rules

We refer to level 1 derivation rules as generic rules. These rules are used to derive facts for generic predicates, and are declared with <--. For instance, the following is a generic rule that derives elements of binary_pred into also_binary:

also_binary(p) <-- binary_pred(p).

Only generic predicates can appear in a generic derivation rule, whereas only non-generic, level 0 predicates can appear in a level 0 derivation rule.

13.2.1. Deriving, or generating, level 0 programs

The main goal of defining a generic rule is to derive, or generate, level 0 Datalog programs. MoReBlox provides a "quoting" mechanism that makes deriving code easy. The following rule derives a level 0 rule, which derives data into predicate foo, from all (level 0) predicates in binary_pred:

anon_block(`{ foo(x,y) <- P(x,y). })
   <-- binary_pred(P).

Note that placing the quoted code inside of anon_block indicates that the generated code should be placed in a new block of some system-chosen unique name.

Given two (level 0) predicates in binary_pred, for example, parent and grandparent, compiling the above generic rule would result in the following two derived, level 0, rules:

foo(x,y) <- parent(x,y).
foo(x,y) <- grandparent(x,y).

Any level 0 rules and constraints can appear inside of the quote. When quoted code appears in the head of a generic rule, we call such a rule a code-generating rule.

Quoting is in fact syntactic sugar, so that programmers do not have to specify derived/generated code fully relationally.

13.2.1.1. Deriving new predicates

The above example generates rules deriving data into the same level 0 predicate, foo. More often, given a set of level 0 predicates, it is desirable to create new predicates that are defined in terms of the given level 0 predicates. MoReBlox allows this by allowing existential variables in the head. For example, given the following program:

another_binary[P]=B --> predicate(P::(T1,T2)), 
                        predicate(B::(T1,T2)),
                        entity(T1), entity(T2).

another_binary[P]=B,
anon_block(`{ B(x,y) <- P(x,y). })
   <-- binary_pred(P).

The above declares a generic predicate another_binary, which maps a binary predicate to another binary predicate with the exact same type (note the same predicate signature for P and B).

The above generic rule says that, for every element in binary_pred, derive a pair (P,B) into another_binary. Additionally, generate the rule that derives every pair in P into B. Assuming that there are two predicates in binary_pred, parent, and grandparent, then there will be two rules generated:

generated_name_1(x,y) <- parent(x,y).
generated_name_2(x,y) <- grandparent(x,y).

To refer to the generated predicates (which have been renamed), one can use another_binary[`parent] or another_binary[`grandparent].
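
For instance, a level 0 rule could use the generated counterpart of parent as follows (copyOfParent is an illustrative name):

copyOfParent(x,y) <- another_binary[`parent](x,y).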

Using Head Existentials

Notice that the variable B is not bound in the body of the rule. This is not allowed in level 0 Datalog programs. However, it is an essential feature of MoReBlox. Variables such as B are called head-existential variables. Such variables need to be created for every fact in the body of the rule. This is to say, for the above rule, for every entity P, a predicate is created to represent B.

Head existentials are not dissimilar to delta rules. Delta rules also give programmers the ability to "create" entities to satisfy certain relations. Thus, we could also write:

+predicate(B), +another_binary[P]=B,
  ... 
  <-- binary_pred(P).

However, the difference between delta rules and head-existential rules is that the consistency between predicates in a head-existential rule is automatically managed by the runtime. For example, in the head-existential rule, if an entity P has been retracted from binary_pred, the corresponding B in predicate, as well as the corresponding pair P,B in another_binary, will also be retracted. This is not the case for delta rules: such retractions must be manually maintained by programmers (using another delta rule, for example.)

It should not be a surprise, then, that head-existential rules are crucial for MoReBlox as a code generation facility. MoReBlox heavily leverages head existentials to create new data in level 1 predicates: new rules, constraints, predicates, etc.

In fact, the quoted part in the head of the rule is desugared into a regular, level 1 formula that contains head-existential variables.

13.2.1.2. Variable name scoping

Note that the above example did not generate two rules both deriving into a single predicate named B. Instead, fresh names are generated to replace B. This is because B, in the quoted code, is bound to the variable B, also bound by predicate(B). If a predicate is bound to a predicate variable, then its name is similarly bound to the name of that predicate variable. Since a new predicate B is created for each entity P, it also has a fresh, generated name, which is then used to replace B in the quoted code.

Names that are not bound to a variable name in the enclosing level 1 rule are assumed to be concrete, non-variable names.

13.3. Well-formedness criteria for generic Datalog

A well-formed generic Datalog program must follow all well-formedness rules of regular, non-generic Datalog. That is, it must bind its variables correctly, it must not contain illegal recursion, etc. Additionally, the quoted program, i.e. the program within `{ ... }, must also be type-correct for all possible versions of its generated code. For instance, given the following program:

make_implies(P) --> predicate(P).
derivedFrom[P]=D --> predicate(P), predicate(D).

derivedFrom[P]=D,
anon_block(`{ D(x) <- P(x). })
   <-- make_implies(P).

The MoReBlox type-checker will verify that the quoted code will be type-correct for all possible instantiations of the quoted code. That is, for all possible predicate P that can be the data of make_implies, the generated code should be well-typed.

For this particular example, the quoted code will not always be type-correct. The reason is the following: any predicate can be in make_implies: unary, binary, etc. For instance, we can populate make_implies as follows:

person(x) -> .
parent(x,y) -> person(x), person(y).

+make_implies(`person).
+make_implies(`parent).

For each of the two predicates in make_implies, the quoted code will be generated:

generated_name_1(x) <- person(x).
generated_name_2(x) <- parent(x).

Clearly, the second rule is not type safe. The MoReBlox compiler reasons in the abstract sense that such a situation may occur, and rejects the definition of the quoted code as not type-safe.

13.3.1. Type-checking quoted code

The MoReBlox compiler checks the potential type-safety of quoted code without actually generating any quoted code. It reasons abstractly about the type-safety using the information it has about predicate variables (e.g., P and D in the above example). Thus, in order for a quoted piece of code that uses predicate variables to type-check, additional information about those predicate variables must be known, e.g., their arity, their argument types, etc. MoReBlox extracts this information from predicate signatures. To make a piece of quoted code type-check, proper predicate signatures must be specified. For instance, to make the above quoted code type-check, we need to modify the declarations of make_implies and derivedFrom:

make_implies(P) --> predicate(P::(T)), entity(T).
derivedFrom[P]=D --> predicate(P::(T)), predicate(D::(T)), entity(T).

The above modified constraints would give the type checker the information that make_implies only contains unary predicates, and that derivedFrom would always produce a predicate with the same arity and type as its key. The constraints would also cause the compiler to throw an error on inserting the predicate parent into make_implies.

For the LB 3.4 release, we will focus on the most frequently occurring and most difficult to catch problems from an implementation perspective: name checking, variable binding checkings, type checking, and recursion checks.

  • Name Check
    • Predicates referenced must exist. This means predicates referenced in `{...} must either be concrete, level 0 predicates, or predicate variables bound in the same rule.
    • Predicate arity, key-arity, and one-to-one'ness will be checked. When a predicate variable is involved, if no signature is specified for the predicate variable, then the errors PREDICATE_UNKNOWN_ARITY, PREDICATE_UNKNOWN_KEY_ARITY, and PREDICATE_UNKNOWN_ONETOONE will be thrown. This indicates that a predicate variable cannot be used in an atom without knowing these arity properties.
  • Variable Binding

    All generated code must obey the current variable binding rules.

    Implied restriction: only full clauses -- rules or constraints -- can be generated or appear in `{...}. Checking the variable binding of open fragments of code is nearly impossible unless programmers give extremely detailed instructions on how each variable should be bound.

  • Type Check

    All typing properties currently checked are maintained. Type-checking crucially depends on predicate signature declarations.

  • Recursion Check

    We need to verify that generated code will not induce unsupported recursions:

    • Recursion through negation.
    • Recursion through aggregation.
    • Recursion through entity creation.
    • Recursion through table predicates.
    • Recursion through stages.

    There are two types of rules as far as recursion is concerned:

    • Predicate in the head of the rule is existential (generated). This is an easy case, as we are creating a new rule for a predicate that is being generated, i.e., it does not have other predicates depending on it. Thus, any recursion through this predicate can only be created within the rules in the same quote `{...} construct.
    • Predicate in the head of the rule is universally quantified, i.e. it appears in the body of the rule. This could mean that either the predicate is an existing predicate defined in a level 0 Datalog program, or the predicate is a generated level 0 predicate created by another code-generating rule.

    A universally quantified predicate could have been used in another rule as a dependency (i.e. in the body of that rule). Putting this predicate in the head of another rule causes dependencies to be created from this predicate to others, and thus can cause undesirable recursion.

    For the moment, we disallow any universally quantified predicates to appear in the head of a generated rule. This eliminates any chance of a cycle being generated, since dependencies can only be created from an existential (freshly generated) predicate to others.

The following properties will be checked for LB 3.5:

  • Stage Check
  • Statelog Check

13.3.2. Other well-formedness rules

  • Existential predicates must be defined by quoted code `{...}. That is to say, if a predicate variable is to be created by a rule, then it must be done while rules or constraints defining it are being declared through `{...}, as well.
  • Quote construct `{...} can only appear in the head of a rule.
  • Quoted construct cannot redefine existing level 0 predicates. That is, concrete predicates cannot appear in the head of a quoted rule, or the LHS of a quoted constraint.
  • Code-generating rules must be conjunctive: only conjunctions, no negation or disjunction in either the head or the body of such rules.
  • Only variables of type predicate or some subtype thereof can be existential in the head of a generic rule. That is, only predicates, and associated declarations and rules via `{...}, can be created.
  • Existential predicate variable must appear in the value position of a generic predicate. This means there is always a functional dependency from existing predicates to existential, i.e., generated, predicates. It is not correct to use an existential predicate variable as the key of a generic predicate.
  • A generic rule must only make use of generic predicates or primitives and primitive built-ins. A rule cannot mix level 0 and level 1 predicates.
  • A rule without a body must explicitly use <-- to indicate that it is generic. Otherwise it is considered a level 0 rule.

13.4. Examples

Transitive closure

transitive[P]=TP --> predicate(P::(T,T)), 
                      predicate(TP::(T,T)), entity(T).
make_transitive(P) --> predicate(P::(T,T)),entity(T).                             

transitive[P]=TP,
anon_block(`{ TP(x,y) <- P(x,y).
   TP(x,y) <- P(x,z), TP(z,y). })
   <-- make_transitive(P).

To make use of a transitive version of a predicate, say, parent, one must add parent into make_transitive:

parent(x,y) -> person(x), person(y).

+make_transitive(`parent) <--.

Now, one can refer to the transitive version of parent in any level 0 program as follows:

something(x,y) <- transitive[`parent](x,y).

Generic dimension mapping

In sales, data is often stored for a point in a multi-dimensional space. For instance, the three common dimensions are calendar, product, and location. Each dimension may have multiple "levels". The calendar dimension may have multiple levels: day, week, month, etc.; the product dimension may have the levels: sku, style, class, etc.; and the location dimension may have levels store, city, state, etc. These levels are often represented in the LogicBlox schema as predicates:

// calendar dimension
day(x), day:id(x:i) -> int[32](i).
month(x), month:id(x:i) -> int[32](i).
year(x), year:id(x:i) -> int[32](i).

// product dimension.
sku(x) -> .
style(x) -> .
subclass(x) -> .
class(x) -> .

// location dimension.
store(x) -> .
city(x) -> .
state(x) -> .
country(x) -> .

Predicates are often defined to map a level to its immediately higher level along the same dimension:

// mappings for calendar dimension.
day2month[d]=m -> day(d), month(m).
month2year[m]=y -> month(m), year(y).

// mappings for the location dimension
store2city[s]=c -> store(s), city(c).
city2state[c]=s -> city(c), state(s).
state2country[s]=c -> state(s), country(c).

// mappings for the product dimension
sku2style[s]=sty -> sku(s), style(sty).
style2subclass[s]=sc -> style(s), subclass(sc).
subclass2class[sc]=c -> subclass(sc), class(c).          

In order to map a level to a non-immediate higher level, however, programmers often have to define specialized predicates: day2year, store2state, store2country, city2country, etc. Using MoReBlox, we can define a generic mapping function that can map a level to any of its higher levels in the same dimension.

First, we define a dimension level as a level 1 concept:

dim_level(T) --> entity(T).

// user-defined predicates that map one level to another.
// e.g. day2month[d]=m, would be a singlemap for the calendar dimension.
dim_singlemap(P) --> predicate(P :: [L1]=L2), dim_level(L1), dim_level(L2).

We can then refine the concept of dim_level to the calendar dimension, and populate it with predicates representing levels along the calendar dimension:

cal_level(c) --> dim_level(c).
lang:entity(`cal_level) <-- .

+cal_level(e) <-- entity(e), e = `day.
+cal_level(e) <-- entity(e), e = `month.
+cal_level(e) <-- entity(e), e = `year.

// specialize dim_map for calendar dimension.
+dim_singlemap(`day2month) <-- .
+dim_singlemap(`month2year) <-- .

Similarly, we define notions of product dimension and location dimension:

// product dimension
prod_level(p) --> dim_level(p).
lang:entity(`prod_level) <-- .

// location dimension
loc_level(l) --> dim_level(l).
lang:entity(`loc_level) <-- .

We can then populate them with the appropriate predicates:

+prod_level(e) <-- entity(e), e = `sku.
+prod_level(e) <-- entity(e), e = `style.
+prod_level(e) <-- entity(e), e = `subclass.
+prod_level(e) <-- entity(e), e = `class.

// specialize dim_map for product dimension.
+dim_singlemap(`sku2style) <-- .
+dim_singlemap(`style2subclass) <-- .
+dim_singlemap(`subclass2class) <-- .

+loc_level(e) <-- entity(e), e = `store.
+loc_level(e) <-- entity(e), e = `city.
+loc_level(e) <-- entity(e), e = `state.
+loc_level(e) <-- entity(e), e = `country.

// specialize dim_map for location dimension.
+dim_singlemap(`store2city) <-- .
+dim_singlemap(`city2state) <-- .
+dim_singlemap(`state2country) <-- .

To define a generic mapping from one level to another, we define the level 1 predicate dim_map, which would take two levels, and return a predicate that maps values from one level to another:

dim_map[L1,L2]=P --> dim_level(L1), dim_level(L2), predicate(P :: [L1]=L2).

dim_map can then be defined as follows:

// base case for mapping: if comparing values of the same dimension L, use '='
dim_map[L,L]=E,
anon_block(
`{
    E[t1]=t2 -> L(t1), L(t2).
    E[t1]=t2 <- t1 = t2.

    lang:derivationType[E]="Derived".
})
    <-- dim_level(L).
 
// inductive case:    
// If for some intermediate level Temp there is a dim_map[L1,Temp]=L1ToTemp,
// and there is a single-level map TempToL2 from Temp to L2, then
// L1 can be mapped to L2 by joining L1ToTemp and TempToL2.
dim_map[L1,L2]=L1toL2,
anon_block(
`{
    L1toL2[x]=y <- L1ToTemp[x]=z,
    TempToL2[z]=y.
 })
    <--  dim_map[L1,Temp]=L1ToTemp,
         dim_singlemap(TempToL2 :: [Temp]=L2).

One can use a specific mapping by applying dim_map to predicates representing specific levels. For instance, the following (not very interesting) query retrieves all classes that some sku maps to:

_(c) <- dim_map[`sku,`class][s]=c.
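
Similarly, another illustrative query pairs each store with the country it rolls up to, using the location-dimension mappings defined above:

_(s,ctry) <- dim_map[`store,`country][s]=ctry.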

Generic aggregations

In forecasting or planning applications, data often needs to be aggregated over multiple dimensions. The three common dimensions in retail are calendar, product, and location (the same dimensions used in the previous example for the generic mapping of dimension levels). A combination of one level from each dimension is called an "intersection". For instance, <sku,city,day> is an intersection, and <class,state,month> is another.

Data (such as sales) is often stored only for the base intersection of these levels. For instance, the following predicate stores the sales data for the base intersection <sku,store,day>:

sales[sk,st,d]=amt -> sku(sk), store(st), day(d).

In order to compute the total sales for a higher intersection, such as <sku,city,day>, one must define an aggregation:

sku_city_day_sales[sku,city,day]=amt
   <- agg<<amt=total(a)>> sales[sku,st,day]=a,
                          store2city[st]=city.

Similarly, to compute the total sales over a different intersection, such as <class,state,month>, one must define a similar aggregation, using mapping predicates to map sku to class, store to state, and day to month:

class_state_month_sales[cls,state,mo]=amt
   <- agg<<amt=total(a)>> sales[sku,str,day]=a,
                          sku2style[sku]=sty, style2subclass[sty]=scls, subclass2class[scls]=cls,
                          store2city[str]=city, city2state[city]=state,
                          day2month[day]=mo.

MoReBlox allows us to define such an aggregation once, generically. In order to do so, we first define the notion of an intersection as a level 1 predicate, and we also define the notion of an aggregation function. Note that we make use of the dimension levels defined in the previous example:

agg_intersection(Cal,Prod,Loc) --> cal_level(Cal), prod_level(Prod), loc_level(Loc).

// function used to do the base aggregation
agg_function(F) --> predicate(F :: [`sku,`store,`day]=`float[64]).

We can populate the intersections we are interested in aggregating over as follows:

+agg_intersection(m,sku,st) 
   <-- cal_level(m), m = `month,
       prod_level(sku), sku = `sku,
       loc_level(st), st = `store.

+agg_intersection(m,sku,st) 
   <-- cal_level(m), m = `month,
       prod_level(sku), sku = `subclass,
       loc_level(st), st = `state.

+agg_intersection(m,sku,st) 
   <-- cal_level(m), m = `year,
       prod_level(sku), sku = `class,
       loc_level(st), st = `country.

We can also define the function we want to aggregate to be sales, as follows:

+agg_function(`sales) <-- .

Now we can define a generic aggregate that, given a level in each of the dimensions, returns an aggregation predicate that maps keys in the given levels to a sales number (float[64]):

agg_total[Cal,Prod,Loc]=AGG 
    --> cal_level(Cal), prod_level(Prod), loc_level(Loc), 
        predicate(AGG :: [Cal,Prod,Loc]=`float[64]).

agg_total[Cal,Prod,Loc]=AGG,
anon_block(`{
    AGG[cal,prod,loc]=sum 
       <- agg<<sum=total(y)>>
          F[sku,store,day]=y,
          Day2Cal[day]=cal,
          Sku2Prod[sku]=prod,
          Store2Loc[store]=loc.                
 })
    <-- agg_intersection(Cal,Prod,Loc),
        dim_map[`day,Cal]=Day2Cal, predicate(Day2Cal :: [`day]=Cal),
        dim_map[`sku,Prod]=Sku2Prod, predicate(Sku2Prod :: [`sku]=Prod),
        dim_map[`store,Loc]=Store2Loc, predicate(Store2Loc :: [`store]=Loc),
        agg_function(F :: [`sku,`store,`day]=`float[64]).

To query the total sales of a particular intersection, such as <month,sku,store>, one can write:

_(m,s,st,total) 
    <- agg_total[`month,`sku,`store][m,s,st]=total.
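
Any other registered intersection can be queried in the same way; for instance, the <year,class,country> intersection added above:

_(y,c,ctry,total)
    <- agg_total[`year,`class,`country][y,c,ctry]=total.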

Generic spreading

Spreading is a specific case of view-update. When the value of an aggregation, e.g., the total of sales over a certain intersection, is changed, we want to change the base values that sum up to the aggregated value, so that the new aggregated value is consistent with the base values. The following example shows two spreading operations defined on the generic aggregations defined in the previous example.

The first rule says that if the aggregated total is changed to a new value, then the base values should be changed proportionally. The second rule says that if the aggregated value is deleted, then all base values involved in that aggregation should be deleted, as well.

anon_block(`{
   // proportionally spread down the new aggregated value.
   ^F[sku,store,day]=new_val
      <- ^AGG@initial[cal,prod,loc]=new_sum,
         AGG@previous[cal,prod,loc]=old_sum,
         F@previous[sku,store,day]=old_val,
         Day2Cal[day]=cal,
         Sku2Prod[sku]=prod,
         Store2Loc[store]=loc,
         new_val = ( old_val / old_sum ) * new_sum.

   // retract all values that contribute to a deleted aggregated value.
   -F[sku,store,day]=old_val
      <- -AGG@initial[cal,prod,loc]=_,
         !(+AGG@initial[cal,prod,loc]=_),
         AGG@previous[cal,prod,loc]=_,
         F@previous[sku,store,day]=old_val,
         Day2Cal[day]=cal,
         Sku2Prod[sku]=prod,
         Store2Loc[store]=loc.         
})
   <-- agg_total[Cal,Prod,Loc]=AGG, predicate(AGG :: [Cal,Prod,Loc]=`float[64]),
       agg_function(F :: [`sku,`store,`day]=`float[64]),
       agg_intersection(Cal,Prod,Loc),
       dim_map[`day,Cal]=Day2Cal, predicate(Day2Cal :: [`day]=Cal),
       dim_map[`sku,Prod]=Sku2Prod, predicate(Sku2Prod :: [`sku]=Prod),
       dim_map[`store,Loc]=Store2Loc, predicate(Store2Loc :: [`store]=Loc).

Desugaring quoted code

The quoting mechanism, `{...}, is in fact just syntactic sugar for the convenience of programmers. All quoted code desugars into fairly straightforward generic Datalog rules, in which the relational representation of the quoted code's AST is derived by the generic rules.

For instance, given the following generic rule:

another_binary[P]=B,
anon_block(`{ B(x,y) <- P(x,y). })
  <-- binary_pred(P).

The desugared form is:

another_binary[P]=B,
rule(R), // represents the rule to be derived

// describes the head of the rule being derived.
rule_head[R]=head, formula_type[head]="atom",
atom_pred[head]=B, atom_numkeys[head]=2,
atom_arg[head,0]=ha1, expr_type[ha1]="varname", name2string[ha1]="x",
atom_arg[head,1]=ha2, expr_type[ha2]="varname", name2string[ha2]="y",

// describes the body of the rule being derived.
rule_body[R]=body, formula_type[body]="atom",
atom_pred[body]=P, atom_numkeys[body]=2,
atom_arg[body,0]=ba1, expr_type[ba1]="varname", name2string[ba1]="x",
atom_arg[body,1]=ba2, expr_type[ba2]="varname", name2string[ba2]="y"
  <-- binary_pred(P).

Built-in generic predicates

Built-in generic predicates are defined and populated to describe basic relationships between level 0 program data: they are relational representations of the AST (abstract syntax tree) of a level 0 program, as well as of typing properties of level 0 programs. For readers familiar with level 0 Datalog programs, the structure of these entities, and how they would be populated given a particular level 0 program, should be intuitive.

At the programmer level, the basic type of data that generic Datalog programs compute over is the predicate. Thus, the following are the programmer-visible built-in generic predicates:

predicate(P) --> .
entity(P) --> predicate(P).
primitive_type(P) --> predicate(P).
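
For example, these built-ins can be used to define further level 1 concepts. The following sketch (ent_map is a hypothetical concept, written with the same signature syntax as dim_singlemap above) collects every functional predicate that maps one entity to another:

ent_map(P) --> predicate(P :: [E1]=E2), entity(E1), entity(E2).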

Compiler-visible generic predicates

The predicates below are not available to programmers writing generic programs. They are only used by the MoReBlox compiler to store information about existing and generated programs.

Entities

code(c) --> .

program(P) --> code(P).
comp_unit(C) --> code(C). 

clause(C) --> code(C).
rule(R) --> clause(R).
external_agg(E) --> rule(E).
constraint(C) --> clause(C).

formula(F) --> code(F).
composite_formula(F) --> formula(F).
comparison(C) --> formula(C).
negation(F) --> formula(F).
atom(A) --> formula(A).
delta_atom(A) --> atom(A).

expr(E) --> code(E).
constant(C) --> expr(C).
string_const(C) --> constant(C).
int_const(C) --> constant(C).
real_const(C) --> constant(C).
bool_const(C) --> constant(C).
datetime_const(C) --> constant(C).
name(N) --> expr(N).
varname(N) --> name(N).
predicatename(N) --> name(N).
application(A) --> expr(A).
delta_appl(D) --> application(D).
binary_expr(E) --> expr(E).
not_expr(E) --> expr(E).

Non-entity predicates

In addition to entities, there are a number of level 1, non-entity predicates that relate these entities to one another. These are not available to the generic programmer.

pred2compUnit[c]=cu -> predicate(c), comp_unit(cu). 
compunit_name[cu]=n -> comp_unit(cu), string(n).
compUnit2program[cu]=p -> comp_unit(cu), program(p). 
code2compUnit[c]=cu -> code(c), comp_unit(cu). 

type[p,i]=t -> predicate(p), uint[8](i), string(t).
pred_arity[p]=c -> predicate(p), uint[8](c).
pred_numkeys[p]=k -> predicate(p), uint[8](k). 
predicate_name[p]=n -> predicate(p), name(n).

rule_head[r]=f -> rule(r), formula(f).
rule_body[r]=f -> rule(r), formula(f).

cons_lhs[c]=f -> constraint(c), formula(f).
cons_rhs[c]=f -> constraint(c), formula(f).

formula_type[f]=s -> formula(f), string(s).

subformula(f,sf) -> composite_formula(f),formula(sf).
negated_formula[nf]=f -> negation(nf), formula(f).

comparison_lhs[c]=f -> comparison(c), expr(f).
comparison_rhs[c]=f -> comparison(c), expr(f).

atom_pred[a]=p -> atom(a), predicate(p).
atom_predname[a]=s -> atom(a), string(s).
atom_arg[a,i]=arg -> atom(a), uint[8](i), expr(arg).
atom_numkeys[a]=i -> atom(a), uint[8](i).
atom_delta_type[a]=d -> delta_atom(a), string(d).
atom_onetoone(a) -> atom(a).

agg_lib[e]=s -> external_agg(e), string(s).
agg_op[e]=s -> external_agg(e), string(s).
agg_numvar[e]=n -> external_agg(e), uint[8](n).
agg_opvar[e,i]=v -> external_agg(e),uint[8](i),expr(v).

appl_pred[a]=p -> application(a), predicate(p).
appl_predname[a]=s -> application(a), string(s).
appl_arg[a,i]=arg -> application(a), uint[8](i), expr(arg).
appl_delta_type[a]=d -> delta_appl(a), string(d).

expr_type[e]=s -> expr(e), string(s).

binaryexpr_lhs[b]=e -> binary_expr(b), expr(e).
binaryexpr_rhs[b]=e -> binary_expr(b), expr(e).

string_const_val[s]=v -> string_const(s), string(v).
int_const_val[i]=v -> int_const(i), int[32](v).
bool_const_val[b]=v -> bool_const(b), boolean(v).
real_const_val[r]=v -> real_const(r), float[32](v).
datetime_const_val[d]=v -> datetime_const(d), datetime(v).

name2string[n]=s -> name(n), string(s).

code_level[p]=i -> code(p), uint[8](i).

from_source(cu) -> comp_unit(cu).
start_pos[c]=i -> code(c), uint[16](i).
end_pos[c]=i -> code(c), uint[16](i).

Chapter 14. Separate Compilation

Separate compilation allows a LogicBlox project to be compiled without requiring a workspace. Separate compilation can improve the development cycle in two ways. First, disassociating the compilation of logic from the creation or addition of logic in a live workspace means faster compilation times. Second, separate compilation supports incremental compilation: when a logic file is changed, only that file and the files that depend on it are recompiled.

There are three components to using separate compilation:

  • Organizing your logic into a project
  • Compiling the project
  • Installing the compiled project into a workspace

We discuss each of these aspects in the following sections.

14.1. A LogicBlox Project

A LogicBlox project has two components:

  • a directory, with possibly subdirectories, containing the logic files of the project
  • a project description file

The project description file specifies the files, libraries, and modules that need to be compiled as part of this project. Currently, the file is just a simple comma-separated text file. Each line is of the following form:

<name>, <type indicator>

  • <name> Name of the item. For files and modules it can be either an absolute path or the path relative to the directory containing the project description file. However, despite allowing absolute paths, all files must be contained within the same directory as the project description file, or in subdirectories of that directory.
  • <type indicator> The type of the name. The following values are allowed:
    • projectname: used to specify the name of the project. Project names must be valid Datalog identifiers.
    • active: for installed blocks.
    • inactive: for stored queries.
    • inactiveAfterFixpoint: for stored queries intended to be activated after the end of stage final.
    • execute: for level 1 logic, this file will be compiled and installed into the level 1 compilation workspace at compile time. For level 0 logic, the file will be compiled, and will be executed when the project is installed into a workspace.
    • module: indicates that the name on the first field is that of a directory containing datalog modules.
    • library: indicates that the name on the first field is that of a library. The name of the library is the same as the projectname specified in the library's project description file. The environment variable LB_LIBRARY_PATH, or, alternatively, the command line option -libPath can be used to specify a path of directories to be recursively searched for libraries. By default, the $LOGICBLOX_HOME/BlockResources directory is always included in the search for libraries.
    • severities: severity declarations for various error codes. There can be only one severities declaration file per project. The severity declarations in the severities file apply to all logic in a project. It is possible for individual logic files to increase the project-wide severity declaration; however, an individual logic file cannot decrease a project-wide severity. To change the severity level of a code from its default to a different level, you can use the following four types of declarations in the severities file:
      // do not report CODE at all
      lang:compiler:disableWarning:CODE[]=true.
      
      // report CODE as a warning only
      lang:compiler:disableError:CODE[]=true.
      
      // report CODE as an error
      lang:compiler:error:CODE[]=true.
      
      // report CODE as a warning
      lang:compiler:warning:CODE[]=true.
      

The project file can contain whitespace, which is ignored by the compiler. Comments are specified by starting a line with two forward slashes. A comment must exist by itself on a line.

As of now, the entries are specified in the required order of compilation, with the programmer managing the compilation dependencies for non-module code. For example, the following project file ("project.txt") separately compiles a library, some legacy files and a module directory:

// This is a comment
// This specifies that this project is named example
example, projectname
system:baseBootstrap,library

// An active legacy-code block
b1.logic,active

// This is a directory containing modules
employees,module

b2.logic,active
b3.logic,active

The compiler will compile each entry in the project description file in the order they are specified in the file. Other than libraries, which are all installed first, this is also the order in which the logic will be installed and executed. However, for a directory containing module code, the dependencies within the modules are computed and the correct ordering is determined automatically.

14.2. Compiling your project

A project can be compiled using the following command on Unix/Mac:

$ bloxcompiler -compileProject project.txt [-explain] [-progress] [-outDir directory] [-libPath path] [-clean]

If you are using a version where these scripts are not available, the compiler jar file needs to be invoked explicitly, using the following command:

$ java -jar $LOGICBLOX_HOME/bin/BloxCompiler.jar -compileProject project.txt [-explain] [-progress] [-outDir directory] [-libPath path] [-clean]

The options have the following meanings:

  • -outDir directory: the directory containing the output of the compilation.
  • -libPath path: the library path. If none is specified, then $LOGICBLOX_HOME/BlockResources is used.
  • -clean: no incremental compilation is used. The entire project is compiled from scratch.
  • -progress: show compilation progress and timing information.
  • -explain: show information about the incremental recompilation decisions the compiler is making.

If there are no compilation errors, this step produces a bytecode file file.lbb for each logic file file.logic. Additionally, it produces a summary file, LB_SUMMARY.lbp. Each bytecode file is generated in the same directory as the corresponding source file, unless the -outDir option is used.
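
For the example project above, and assuming -outDir is not used, the resulting layout might look like the following hypothetical listing (the contents of the employees module directory are elided):

project.txt
LB_SUMMARY.lbp
b1.logic
b1.lbb
b2.logic
b2.lbb
b3.logic
b3.lbb
employees/
  ... (module .logic files and their .lbb files)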

Compilation is incremental based upon a number of heuristics. If any of the libraries referenced by the project have changed, that is, they have a newer timestamp than the time the project was last compiled, the entire project must be recompiled. Individual files are recompiled based upon the following rules

  1. If a clean build is specified, the file will be recompiled.
  2. If there is no corresponding bytecode file, the logic must be recompiled.
  3. If the source text is newer than the existing bytecode file, it must be recompiled.
  4. If a predicate compiled in an earlier file has changed, and the existing bytecode file for this file indicates that the file references that predicate, then the file must be recompiled.

Incremental compilation will also attempt to optimistically reuse level one compilation workspaces, if possible. Level one compilation workspaces can be reused if

  • A clean build was not specified.
  • No level one code has changed or been recompiled.
  • No level zero predicate signatures have changed, been added or removed.

Unfortunately, because the compiler may need to decide whether to reuse a level one compilation workspace before it can determine whether all of these conditions are met, it is possible, for some projects, that certain sequences of edits cause incremental compilation to fail. If the compiler detects that it should not have reused the compilation workspace, it will report an error, and on the next compile it will use a fresh compilation workspace.

You can get feedback from the compiler on why it makes certain incremental compilation decisions by passing it the -explain option.

14.3. Installing a project into a workspace

Once the project has been compiled, the following command can be used to install the project into a workspace:

$ bloxbatch -db workspace -installProject -dir <outDir>

<outDir> should be the output directory of the compiled project. Note that if the project references a library that has not already been installed, that library will be searched for and installed if found. If the library cannot be found, installation will fail. $LOGICBLOX_HOME/BlockResources is always searched; additional directories are searched if the LB_LIBRARY_PATH environment variable is set or the -libPath command line option is given.

14.4. Bytecode file format

The Datalog bytecode format is loosely based on the Java bytecode format:

  • Magic String: A sequence of 4 hexadecimal numbers (4 bytes) which serves as an identifying string. This sequence is "datalog" (0d a7 a1 09).
  • Version Number: 4 bytes, used to specify a bytecode file format version number
  • Summary size: 4 bytes, interpreted as an integer depicting the number of bytes used by the summary message following it.
  • Summary: A protocol buffer message of type CompilationUnitSummary, containing a succinct description of the predicates declared in this compilation unit.
  • Code size: 4 bytes, interpreted as an integer depicting the number of bytes used by the code message following it.
  • Code: A protocol buffer message of type CompilationUnit, containing all the logic code in LB0 (Protocol Buffer) format for the block.

14.5. Summary file format

The summary file format is similar to the bytecode file format:

  • Magic String: A sequence of 4 hexadecimal numbers (4 bytes) which serves as an identifying string. This sequence is "database" (da 7a ba 5e).
  • Version Number: 4 bytes, used to specify a summary file format version number.
  • Project name size: 4 bytes, used to specify the length of the project name field.
  • Project name: A UTF-8 encoded string with the length specified by the previous field.
  • Summary message size: 4 bytes, interpreted as an integer depicting the number of bytes used by the project summary message following it.
  • Summary message: A protocol buffer message of type ProjectSummary, containing a graph of dependencies within this project.

Chapter 15. Modules

Modular design and data abstraction are important in the development of large-scale software systems. To help developers structure large Datalog projects into manageable pieces, the language provides a module system.

The DatalogLB module system provides several specific benefits to developers. The most significant is that logic written using the module system is incrementally recompiled based on dependencies automatically extracted from the module definitions. Therefore, if you edit one concrete block, separate compilation will only recompile that block and the blocks that depend upon it.

Another benefit is that the module system allows predicates defined in other concrete blocks or namespaces to be aliased to shorter names, making logic more concise and easier to read.

Finally, the module system provides for hiding and sealing predicates defined in a concrete block. Only those predicates explicitly exported are visible from other concrete blocks, and predicates may also be declared sealed, which prevents other concrete blocks from deltaing or deriving into them. This makes it possible to be sure that someone else does not intentionally or accidentally add new logic for a predicate, invalidating your assumed invariants.

15.1. ConcreteBlox

A concrete block is a set of clauses along with a set of exported predicates and a set of alias declarations.

An exported predicate is a predicate declared by the concrete block that may be used by other concrete blocks. Some of these exported predicates may be declared to be sealed. The contents of a sealed predicate may be observed by other concrete blocks, but they may not insert into that predicate or include new rules that derive into that predicate.

Alias declarations are used by a concrete block to give alternate names to predicates, blocks, or namespaces when writing the clauses that comprise the concrete block.

Finally, it is also possible for a concrete block to be declared as inactive, that is, it is not part of the active installed program. These concrete blocks can be scheduled for execution as needed.

15.1.1. Writing your very first concrete block project

The first step is to create a new directory to hold your project. We will call this directory test. Once we have a directory to hold the project, we will create inside of that directory a project file for the compiler to read. We will call our project file project.txt. Inside of this file we will enter in the following text:

myproject, projectname
mylib, module

Like other project files used in separate compilation, the format is a name followed by a comma and then a type indicator. In this case, the project file we have written says that the directory mylib contains a module-based project. Now that we have created the project file, we will create the directory mylib inside of the test directory.

Inside of the mylib directory, create a file called A.logic for our first concrete block. Concrete blocks must be written in files with a .logic extension for the compiler to recognize them. Inside of A.logic we will write the following:

block(`A) {
  export(`{ p(x) -> . }),
  clauses(`{
    q(x) -> .
  })
} <-- .

This declares a concrete block named A that defines two entities, p and q. Additionally, the concrete block contains a declaration, export(`{ p(x) -> . }), stating that the entity p is exported for use by other concrete blocks. That means other concrete blocks may write A:p to refer to the entity p defined in concrete block A.

There are a few specific things to notice about what we have written. First, you may notice that the concrete block looks like a MoReBlox rule. This is intentional, but we do not currently support mixing MoReBlox code and concrete blocks. However, this is planned for a future release. Second, the syntax of the concrete block suggests that we are using hierarchical syntax. Again this is intentional, but we currently do not support writing concrete blocks in a desugared form. This may change in future releases.

It is also important to note that the name of a concrete block must match the name of the file in which it is defined, minus the extension. That is, any concrete block called name must be inside of a file called name.logic. This restriction is necessary so the compiler can easily determine which logic files need to be reparsed and recompiled when files that they depend upon change.

Now we will create a second concrete block in a new file called B.logic within the mylib directory:

block(`B) {
  export(`{ r[x]=y ->  A:p(x), int[64](y). }),
  alias(`mylib:A:p, `otherp),
  clauses(`{
    p(x) -> otherp(x).
  })
} <-- .

This declares a concrete block named B that defines a subtype entity p that is not exported and a functional predicate r that is exported.

This concrete block introduces the aliasing functionality provided by ConcreteBlox. The declaration alias(`mylib:A:p, `otherp) states that inside of this concrete block, whenever we use the predicate name otherp, we are in fact referring to the predicate mylib:A:p.

Next, inside of the mylib directory, we'll create a directory named util. This directory creates a new namespace called util. If you are familiar with Java, this is similar to how it maps directories to package names. However, unlike Java, concrete blocks do not need to specify which namespace they live within. To refer to concrete blocks within a specific namespace, the name of the concrete block must be prefixed with the namespace, separated by a colon. The directory specified in the project file (here, mylib, specified in project.txt) is treated as the root namespace.

Inside of the util directory we will create another logic file called C.logic:

block(`C) {
  alias_all(`mylib:A),
  clauses(`{
    f[x] = y -> p(x), int[64](y).
  })
} <-- .

Because the concrete block C is contained within the namespace util, its fully qualified name is util:C. It does not export any predicates.

In the definition of C, we have used the other form of aliasing offered by ConcreteBlox. The concrete block gives the alias declaration alias_all(`mylib:A), which allows all predicates within the concrete block mylib:A to be used without any prefix. That is, the predicate mylib:A:p may be, and is, referenced simply by writing p within the concrete block C.

Finally, inside of the util directory, we will also create a file D.logic containing:

block(`D) {
  inactive(),
  clauses(`{
    +mylib:A:p(x).
  })
} <-- .

This defines a concrete block D that, like C, does not export any predicates. However, unlike C, this block has been declared inactive by writing the declaration inactive(). Scheduling the execution of this block will cause a new instance of the entity p, defined in the concrete block mylib:A, to be created. You may also specify that a block is active by writing the declaration active() instead, but blocks default to being active if you provide no declaration.
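
At this point, the project directory we have built up looks like this:

test/
  project.txt
  mylib/
    A.logic
    B.logic
    util/
      C.logic
      D.logic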

Now that we have created all these files for our project, we can compile it. We do this by running something like the following within the test directory that we created:

bloxcompiler -compileProject project.txt

If you did not make any mistakes in defining the project or concrete blocks, you will now have a file called LB_SUMMARY.lbp in the test directory. This is called the project summary file and contains the information the runtime needs to install the project into a workspace. You may also notice that within the mylib and util directories there are now .lbb files corresponding to the logic files we have written.

Now we can write a small BloxBatch script to install our project into a workspace. Inside of the test directory, we will create a file called install.lb with the following contents:

create --unique
transaction
  installProject --dir .
commit
transaction
  exec --storedBlock mylib:util:D
commit
transaction
  print mylib:A:p
commit

In the first transaction, we use installProject --dir . to tell the runtime to look for the project summary file (LB_SUMMARY.lbp) in the current directory. In the second transaction, we use exec --storedBlock mylib:util:D to execute the code in the inactive concrete block mylib:util:D. Finally, in the last transaction, we use print mylib:A:p to observe that executing mylib:util:D did create a new instance of the entity p contained within concrete block mylib:A.

15.1.2. Names

One of the most significant differences between writing logic in a concrete block and in a legacy block is that colons in predicate names now have semantic meaning. A predicate name like foo that does not contain colons is called a simple name. A predicate name like bar:baz that does contain a colon is called a qualified name.

We may sometimes refer to part of a qualified name up to some colon as a prefix of the name. For example, a prefix of the qualified name bar:baz is bar and the qualified name a:b:c has the prefixes a:b and a.

15.1.3. Name trees

The process of finding a specific predicate from a given predicate name is performed using a structure that we call name trees. Qualified names in a module project can be seen to form a tree where the edges of the tree are simple names.

For example, suppose we created a module project in the directory project containing the file foo.logic and the directories foo and bar, where foo contains the files one.logic, two.logic and bar contains the file three.logic. Visually the directory structure will look like:
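
project/
  foo.logic
  foo/
    one.logic
    two.logic
  bar/
    three.logic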

Given the above project, the name tree corresponding to the root of the project would look something like:

The circles represent namespaces and the boxes represent exported predicates. There are a number of differences between the two trees to note.

First, the name of the module directory, project, is used as the single edge from the root of the name tree.

Second, notice that all concrete blocks have become namespaces. For name resolution purposes, concrete blocks can be seen as defining their own namespaces.

Third, there is only one edge labeled foo from the project node of the name tree. This is because the project directory contains both a directory and a concrete block named foo as children. Each edge from a given parent node to its children must have a distinct name, so we cannot have two edges labeled foo. Therefore, the name tree merges the two into a single node. This can only ever happen when there is a directory and a concrete block with the same name and the same parent. Because the project is structured around the filesystem, it is never possible to have two namespaces with the same name or two concrete blocks with the same name as children of the same node.

This name tree is what we call the project name tree, because it is the name computed from the root of the project. Each namespace has its own corresponding name tree.

The fully qualified name of a concrete block or predicate is defined by joining together all the simple names found on the path from the root of the project name tree to that concrete block or predicate. So in the example project name tree above, the fully qualified name of the concrete block two is project:foo:two and the fully qualified name of the predicate p contained in the concrete block three is project:bar:three:p.

Determining the node a name points to in any given name tree is very simple. Split the name into a list of simple names by removing the colons. Then starting at the root of the name tree, follow the edges given by the simple names. In the example name tree above, to find the node corresponding to the name project:bar:three:p we would start at the root of the tree, follow the project edge, then the bar edge, then the three edge, and finally the p edge.

Starting from the project name tree, we can recursively construct the name trees for each of its descendent nodes. Given a name tree, to construct the name tree that corresponds to one of its immediate children, we just add edges from the root to each of that child's children using the same labels.

To make it easier to visualize and describe how this transformation takes place, we will label the nodes in the following examples with numbers. These numbers are merely for illustrative purposes and do not correspond to anything written by the user or used internally by the compiler. We will start with an extremely simple example and progress to more complicated ones.

Here, the project name tree just contains a single namespace child foo. To obtain the name tree used by the node named foo, labeled with 1, we simply add an edge labeled with p from the root to foo's only child, the node labeled with 2:

Technically, at this point, we no longer have a tree, but a directed acyclic graph. Notice now in the new name tree, we have two possible ways to name the predicate p: as foo:p and as p. Therefore, when writing logic in the concrete block foo we may refer to the same predicate either way depending on aesthetics or readability.

Now let us consider a slightly more complicated example:

Now suppose we want to find the name tree for the node named foo:bar, labeled with a 2. We start by constructing the name tree for the node named foo, labeled with 1. This involves adding edges between the root and all of the children of this node. So we add an edge labeled bar from the root to the node labeled 2 and an edge labeled p from the root to the node labeled 4.

Next, we just repeat the process, but instead for the children of the bar namespace, the node labeled with 2. This means simply adding an edge labeled with p from the root to the node labeled with 3.

Again, note how there are many ways to refer to the same namespace or predicate. For example, foo:bar:q, bar:q, and q can all be used to refer to the same predicate.

However, the process is not always quite so simple. In some cases it is possible for a node in the name tree to become inaccessible. When this happens we say that the namespace or predicate that is no longer accessible is shadowed. As an example, suppose we started with the following project name tree:

Now let us construct the name tree for the node named foo:foo, that is the node labeled with 2. We start by constructing the name tree for the node named foo, labeled with 1.

Because a node's child edges must all be distinct, when we add an edge labeled foo from the root node to the node labeled with 2, the original edge labeled with foo that connects the root to the node labeled with 1 becomes shadowed. We have indicated which parts of the name tree are no longer accessible by coloring them grey. In practice, they are no longer even part of the name tree, but we include them here for illustrative purposes. Next, we repeat the process to obtain the name tree relative to foo:foo.

Again, because we need to add an edge labeled with p from the root node to the node labeled with 3, the original edge labeled p from the root node to the node labeled with 4 becomes inaccessible, and that predicate becomes shadowed.

15.1.4. Aliasing

Alias declarations are interpreted as instructions to add new edges to the root of a name tree. Unlike what we have seen above, aliases are not allowed to cause shadowing. Doing so will result in an ALIAS_PREDICATE or ALIAS_NAMESPACE error.

Given the following project name tree:

The name tree for foo looks like the following:

Now suppose the concrete block foo had the alias declaration alias(`bar:q, `q). This would result in the following name tree:

Note that if foo had the alias declaration alias(`bar:q, `p) it would result in an ALIAS_PREDICATE error. It is also possible to alias namespaces. For example, if foo also contained the alias declaration alias(`bar, `baz), the resulting name tree would look like:

It is even possible for a concrete block to alias its own name. However, this will only have an effect on naming predicates that the concrete block exports.

The edges added by aliasing are all added independently of each other. This means it is not possible to alias the result of another aliasing operation. This is done to ensure that the order in which aliases are written does not affect the outcome.
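
For instance, a sketch using the foo/bar example above (shortq is a hypothetical alias name): the following pair of declarations is fine, because both are resolved against foo's original name tree:

alias(`bar, `baz),
alias(`bar:q, `shortq)

Writing alias(`baz:q, `shortq) instead would not work, because baz only comes into existence as the result of the first alias.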

ConcreteBlox also provides the declaration alias_all. This declaration takes all of the children of the given namespace and adds edges to the root with the same names. So for example, given the following project name tree:

The name tree for foo would be:

If foo had the alias declaration alias_all(`bar), the resulting name tree would be:

15.1.5. Name resolution

One of the first steps in compiling a concrete block is for the compiler to resolve the names of all predicates to fully-qualified names. This is done by walking over all the predicate names in the logic defined by the concrete block and rewriting them based upon the following process.

  1. If the name can be found in the concrete block's name tree, we replace that predicate name with its fully qualified name.

  2. If not, we check whether it is a primitive or built-in predicate.

  3. If not, we check whether it is a predicate defined in a legacy block that was compiled prior to this concrete block.

  4. If not, if the predicate has a simple name, we assume that it is a predicate defined within this concrete block, but not exported. If the predicate's name is bar and the fully-qualified name of its concrete block is foo then its fully qualified name becomes foo:bar.

  5. If not, we report a BLOCK_UNKNOWN_PREDICATE error. If there is a predicate with a similar enough name in the name tree, we may report a BLOCK_UNKNOWN_PREDICATE_TYPO error.
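
As an illustration, consider the concrete block B from the example project in Section 15.1.1: the name otherp is found in B's name tree via its alias and is rewritten to mylib:A:p (step 1); int[64] is recognized as a primitive type (step 2); and the simple name p does not appear in the name tree, so it is assumed to be a local, unexported predicate and becomes mylib:B:p (step 4).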

15.1.6. Block stage and lifetime

By default, logic defined in concrete blocks is active logic. However, you may override this behavior using the inactive() or execute() directives. For instance, the following defines an execute block:

block(`B) {
  execute(),
  clauses(`{
    +A:foo("a").
  })
} <-- .

Chapter 16. Default-Value Predicates

Functional predicates can be assigned a default value. Using default values carefully can give a large performance improvement to your programs, because any fact in a predicate that has the default value is not explicitly stored.

16.1. Declaring a default value

A default value for a predicate is declared as follows.

lang:defaultValue[`predName] = value.

After such a declaration, the predicate named predName is considered declared at all possible key values. For any key where the program does not provide an explicit value, the predicate's value will be the default value.
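
For example, a minimal sketch in the style of the examples below, giving the functional predicate f a default value of 0:

a(x) -> .
f[x] = y -> a(x), int[32](y).
lang:defaultValue[`f] = 0.

After these declarations, f[x] is 0 for every element x of a for which no explicit value has been derived or asserted, and only the non-default values are stored.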

16.2. Functional determination

A functional atom is said to be functionally determined if there exists a functional dependency between the atom's keys and the atom's value. That is, for any combination of input keys, the atom will only ever have one value.

Example 16.1. 

In the following rule, because f[x] = y is functionally determined, functional dependency checking can be disabled because it is impossible to get a functional dependency error.

f[x] = y <- g[x] = y.

In the following rule, because f[x] = y is not functionally determined, it is necessary to do functional dependency checks. For example, if predicate 'g' has facts g[1, 2] = 3 and g[1, 3] = 4, then a functional dependency error is necessary for f[1] with values 3 and 4.

f[x] = y <- g[x, _] = y.

16.3. Functionally determined predicates

A predicate 'p' is functionally determined if for all rules and disjuncts defining predicate 'p', the head occurrences of that predicate are functionally determined. Furthermore, because the analysis is conservative, each of these rules or disjuncts must be disjoint from the result of all other rules or disjuncts.

Example 16.2. 

In the following two rules, the head occurrences of 'f' are individually functionally determined, but the rules are not guaranteed to be mutually exclusive. Hence, we cannot be certain that the predicate 'f' is functionally determined.

f[x] = y <- g[x] = y.
f[x] = y <- h[x] = y.

If a predicate is functionally determined, then it is not necessary to do functional dependency checking during evaluation. This is an important run-time optimization.

Currently, if a predicate is defined using multiple rules or disjuncts, then it is necessary to explicitly declare that the predicate is disjoint in order for the functional determination analysis to be performed. By this declaration, the user guarantees that the rules are mutually exclusive. This is not currently verified by the compiler. A predicate is declared disjoint using the predicate property lang:disjoint.

lang:disjoint(`f).

Warning

In the following example, there will not be a functional dependency violation if predicate 'f' is incorrectly declared as disjoint. It is undefined what facts predicate 'f' contains after executing this program.

create /tmp/db --overwrite

transaction
addBlock <doc>
   a(x) -> .
   g[x] = y -> a(x), int[32](y).
   h[x] = y -> a(x), int[32](y).

   f[x] = y <- g[x] = y.
   f[x] = y <- h[x] = y.

   lang:disjoint(`f).
</doc>
commit

transaction
exec <doc>
   +a(x), +g[x] = 2, +h[x] = 3.
</doc>
commit

close

16.4. Restrictions for predicates with default values

Predicates with default values currently must be functionally determined. If a rule derives a predicate with a default value, but the head occurrence of that predicate is not functionally determined by the rule body, then a compile-time error is reported.

  • Cannot be computed using recursive deduction rules.
  • Cannot be used in negation.
  • Rules computing facts for predicates with a default value need to be unique derivations.
  • Insertion and retraction do not make sense for predicates with default values.

Chapter 17. Provenance

17.1. Recording and querying provenance

Recording of provenance information during program evaluation can be turned on by using the meta-property lang:provenance. There is also an option that controls whether constant fields of rules are included in the provenance predicates (mostly for future use).

lang:provenance[] = true.

lang:provenance:recordConstants[]=true.

When the aforementioned properties are set, new predicates are created by the compiler to store the provenance of each rule in the original program. In order to access the provenance of a specific fact in a predicate, one can use the 'provenance' command as follows:

provenance <doc>
+provenance[`predicateName](values ...).
</doc>

This provenance query will return the steps involved in deriving the fact predicateName(values ...). For example, consider the following program:

transaction
addBlock <doc>
  lang:provenance[]=true.

  person(x), pname(x:n) -> string(n).
  parent(x,y) -> person(x), person(y).

  ancestor(x,y) <- parent(x,y).
  ancestor(x,y) <- parent(x,z), ancestor(z,y).
</doc>
commit

transaction
exec <doc>
  +parent("bob","mary").
  +parent("mary", "joe").
  +parent("bob", "mandy").
  +parent("mandy","jill").
  +parent("jill", "jake").
</doc>
commit

Then, the query:

provenance <doc>
  +provenance[`ancestor]("bob","jake").
</doc>

would return:

Provenance information:
  ancestor(bob,jake) <- parent(bob,mandy),ancestor(mandy,jake).
  ancestor(jill,jake) <- parent(jill,jake).
  ancestor(mandy,jake) <- parent(mandy,jill),ancestor(jill,jake).

17.2. Language constructs for which provenance is not defined

For programs involving negation, provenance derivations are only explored up to the step where the negation is applied. Intuitively, this is because there are no particular derivations that justify the ‘non-existence’ of a particular fact in a predicate. For example, if we have the program:

t(x) <- v(x,_).
r(x) <- s(x), !t(x).

+s(1), +s(2), +v(1,1), +v(1,2).

the provenance query:

provenance <doc>
  +provenance[`r](2).
</doc>

produces the output:

Provenance information:
r(2) <- s(2),!(t(2)).

Aggregate functions are treated similarly to negation, i.e., provenance is explored up to the point where they are used but not any further, e.g., not into the rules deriving facts into the predicate over which the aggregation is applied.

The current provenance implementation handles rules with insertion (delta) atoms; retract and upsert atoms will be considered in the next release.

Disjunctive rules are always converted to DNF for provenance queries, to be able to determine which disjunct produced the derivation of a fact.

17.3. Provenance rewrite as separate compilation

Starting from release 3.6, the provenance rewrite can be invoked during Separate Compilation by using the -provenance option. For example:

java -jar BloxCompiler.jar -compileProject project.txt -provenance

When this method is followed, the compiler options for provenance (e.g., lang:provenance[] = true.) should not be used in individual files. To ensure that the provenance rewrite is never applied to a program that has already been rewritten to record provenance, the -provenance option implicitly invokes the -clean option to delete existing .lbb files and compile programs from scratch. Note that, when the project includes multiple files and the provenance rewrite has been invoked during separate compilation, all predicate declarations are contained in the bytecode file corresponding to the first file of the project.

The resulting bytecode files can be executed (as with other results of separate compilation) using the bloxbatch option -buildProject -dir <project root directory>, and provenance queries can be performed on the result of their execution, using the syntax explained above.

In frontend unit tests for provenance as separate compilation the following line should be included in the project.txt file, to indicate that the provenance rewrite should be performed on all files in the project.

// @Provenance

This is equivalent to specifying the -provenance option on the command line.

Chapter 18. Concurrency Control

18.1. Introduction

Datalog programs run faster when parts of them can run on multiple processors in parallel. This chapter describes some techniques you can use to get more concurrency in your programs.

18.2. Element-level locking

We have introduced a more fine-grained locking policy, called element-level locking. Until now, our only locking policy was predicate-level, which means that if one transaction is writing to any fact of a predicate 'p', then this would conflict with all transactions reading or writing any facts of the same predicate 'p'. This rather coarse-grained locking policy was a major limitation for building applications on our platform that require concurrent transactions.

Element-level locking refines the predicate-level locking policy to individual entity elements. If a predicate is locked by element, then instead of acquiring a lock on the entire predicate, the engine locks only a single entity element for each individual fact that is read or written.

The locking policy used for a predicate can be declared by using the meta-property lang:lockingPolicy. The default value for database lifetime predicates is still locking by predicate. To refine the policy to element-level locking, set the locking policy to "ByElement". For a predicate to be locked by element, it is necessary for one of the entity arguments to be locked by element. The compiler automatically sets the locking policy of a predicate to "ByElement" if one of its entity arguments is locked by element. In general, this means that only the locking policy of entities needs to be declared explicitly. For example, in the following schema, both 'person' and 'parent' will use element-level locking.

person(x) -> .
lang:lockingPolicy[`person] = "ByElement".

parent(x, y) -> person(x), person(y).

When using element-level locking, for every individual fact in a predicate, only one entity will be locked. Currently, the engine selects an argument to lock based first on the order of registration of the entity in the workspace (predicates registered earlier are preferred) and second on the order of the arguments (from left to right). For example, given the following facts in the predicate 'parent', the person 'Mary' or the person 'Bill' is locked when one of these facts is read while evaluating a query.

+parent("Mary", "John").
+parent("Bill", "John").

To help analyze concurrency problems, we have introduced an option to log the evaluation of rules that might cause contention between different transactions (see the chapter Profiling, Monitoring and Tuning).

18.3. Isolation levels

In LogicBlox 3.3 we have implemented support for standard isolation levels. Isolation levels define when changes to the database made in one transaction become visible in other, possibly concurrent, transactions. In principle, the most desirable isolation level is that all transactions appear to be executed in complete isolation. This would require extensive locking and cause major contention between different threads, so instead the ANSI/ISO SQL standard defines a number of more relaxed isolation levels. These isolation levels mostly control what kind of locks are acquired and for how long the lock is kept by a transaction.

18.3.1. Supported Levels

We now support the following ANSI/ISO SQL isolation levels:

  • Repeatable Read

  • Read Committed

  • Read Uncommitted

And additionally two levels specific to our platform:

  • Less Repeatable Read

  • Single User

Specific to our platform, we add the isolation level 'Less Repeatable Read', which is a slight relaxation of 'Repeatable Read', but more strict than 'Read Committed'. The reason for this isolation level is that on our platform complete applications run inside of the database. In this scenario, 'Repeatable Read' soon degenerates into predicate-level locking, which causes too much contention.

LogicBlox applications (in particular those using UIBlox) typically use element-level locking to allow concurrent users of the application. Unfortunately, managing concurrency is costly, which very negatively affects large data loads, which are typically not concurrent. To avoid the overhead of concurrency management for such data loads, we have introduced an isolation level called SINGLE_USER. This isolation level minimizes concurrency management and has been shown to speed up typical data loads by a factor of 4.

Similar to many other database systems, we do not support the standard isolation level 'Serializable' (which corresponds to full isolation). Instead, we are actively working on introducing Snapshot Isolation, which is scheduled for release in LogicBlox 3.5.

18.3.2. Configuration

The isolation level should in general be selected based on the nature of applications, or could be set specifically for certain operations (such as reporting or logging, where isolation is sometimes less important). For this reason, the isolation level can be configured globally (by setting the environment variable LB_DEFAULT_ISOLATION_LEVEL) and can be set per transaction (via the method TransactionParameters::isolationLevel).

Most of the isolation levels are available in LogicBlox mainly for consistency and completeness. We do not expect that developers will actively configure the isolation level.

18.4. Log shipping

When using snapshot isolation, it is possible to maintain one or more replicated workspaces via "log shipping", which involves copying the log files from a source workspace to one or more destination workspaces. This can be useful in a couple of scenarios:

  • Scaling in situations where an application needs to provide a large number of users with read access to the workspace.

  • High availability, where it is possible to fail over to a different instance if something happens to the original instance.

18.4.1. Configuration

For a given workspace to be the source in a log shipping configuration, it is necessary to create a folder called "outgoingLog" in the workspace. The presence of such a folder is detected by the log writer, which will then move processed log files into this folder. The files are moved into this folder atomically, using the internal commit timestamp as part of the name, so that the files will be ordered lexicographically according to commit time.

Each destination workspace should have a folder called "incomingLog", where copies of the log files should be placed. It is important that the log files be copied into this folder atomically, and in commit order. Whenever the log writer is running, it will detect the presence of such a folder, and apply all the log files to the workspace. Since the log writer is not a continuously running daemon, it is necessary to pulse it if there is not an active process that has the workspace open. This can be achieved by simply calling bloxbatch to open the workspace.

There is a sample script called propdeltas.sh in the LogicBlox install directory, which handles the copying of log files from a source workspace to a single destination workspace. It relies on being able to use the "cp" and "mv" commands, and so requires the workspaces to have a shared storage configuration (such as with EC2). This script can be modified to use scp and ssh for remote configurations, and for multiple destination configurations.

Note that there is no built-in protection against processing write transactions in the destination workspace. A workspace that is the destination of a log shipping configuration should not be used for such transactions, since the resulting state will be inconsistent. Protecting against such use is thus a deployment task.

Part II. Tools

Chapter 19. Testing

19.1. Basic BloxUnit

Recently we have extended BloxUnit to support a new kind of test suite, entirely based on the scripting language many of you already know from bloxbatch -interactive. It turns out that many testing tasks require more precise control over transactions, workspace creation, data initialization, etc. This is what the new suite supports.

A suite is a set of .lb files, which are scripts in the same language as used by bloxbatch -interactive. A suite can (optionally) have two special scripts: setUp.lb and tearDown.lb. There are no requirements on the filenames of the other .lb files. Every .lb file is executed wrapped in setUp.lb and tearDown.lb. For example, a suite consisting of the files:

setUp.lb
tearDown.lb
simple.lb
advanced.lb
original-bug-report.lb

will be executed by BloxUnit as:

setUp.lb - simple.lb - tearDown.lb
setUp.lb - advanced.lb - tearDown.lb
setUp.lb - original-bug-report.lb - tearDown.lb

That is, if you concatenate the three files and run them with bloxbatch -interactive, you will run exactly the code that BloxUnit runs. You will probably recognize this idea from other unit-testing frameworks (JUnit, etc.). BloxUnit does nothing other than execute these scripts and observe whether they failed. In particular, BloxUnit does not create a workspace, does not create a transaction, and does not do any automatic initialization.

Typically, setUp.lb would create a workspace using:

create --unique

Typically, tearDown.lb would destroy the workspace using:

close --destroy

This means that the tests are executed with an open, but empty, workspace. If all tests exercise a certain installed program, you could decide to install the logic in setUp.lb:

create --unique

transaction
addBlock -B main <doc>
      a(x) -> .
</doc>
commit

Or, of course you can put the logic in a file, similar to the current suite.program:

transaction
addBlock --file schema.logic
commit

Of course, you can also import data, execute logic, etc.

In the tests, assertions happen through normal Datalog constraints or the 'comparePredicates' command. For example, you could use two constraints to do the comparison:

foo:expected(x, y) -> foo(x, y).
foo(x, y) -> foo:expected(x, y).

19.1.1. Notes

  • tearDown is always invoked, even if the test failed. In this way, workspaces can be removed.
  • It's an error for a test (the combination of setUp, test, and tearDown) not to terminate the current transaction or workspace. However, we 'fix' the session if this does happen, so that other tests in the same session are not messed up.
  • Between setUp, test, and tearDown it is in principle okay to keep a transaction open. However, if the test failed, then we terminate the transaction before invoking tearDown. This might make tearDown fail, but since we already failed anyway, that's not a big deal.
  • Many bloxunit command-line parameters are not applicable. We will try to organize the parameters one way or another to make this clear. Currently, the main parameters that do apply are ones that relate to enabling/disabling specific suites and tests. For most parameters it's obvious from the nature of the new suites that they do not apply.
  • For examples, go to the BloxUnit directory and search for .lb files.

19.1.2. Testing for Failure

We've implemented a simple method for writing tests that are supposed to trigger exceptions. It will probably be replaced at some point with a fancier mechanism, but it does the job for now.

The idea is to set the variable bloxunit:expectException using the set command. BloxUnit will then confirm that the suite throws an exception, and check whether the thrown exception is of the right type.

It's useful to set the expected exception right before the command that should cause the exception, not on the first line of the test. There is currently no mechanism to avoid the false success of a test caused by a *later* command that happens to throw the same exception. A simple trick is to just not include any further commands (since these are not supposed to execute anyway). BloxUnit will clean up any open transactions for you.

Example:

transaction
addBlock <doc>
  f[] = x -> int[32](x).
  lang:defaultValue[`f] = 0.
</doc>
commit

transaction
exec <doc>
  ^f[] = 2.
</doc>

set bloxunit:expectException LogicException
exec <doc>
  ^f[] = 3.
</doc>

19.1.3. Continuing Testing after Failure

We've implemented a feature in BloxCommandLib that facilitates testing for failure. We already had the bloxunit specific feature:

$ set bloxunit:expectException PredicateException

But the problem is that you sometimes want to continue testing after a failure. For this, you need the shell to continue executing commands. This capability actually already existed (raiseInLoop), but could not be changed through commands. So there is now something similar to 'set -e' in the shell, except that the shell already has a 'set' command, so the new command is called 'option'.

For example, to test a commit that needs to fail:

option --raise false
commit
option --raise true
abort

Notice that, for now, it is up to you to confirm in one way or another that the commit actually failed (although in this case the abort implicitly guarantees it).

Chapter 20. lb-config

20.1. Getting Started

lb-config is a tool to configure a Makefile for building, testing, and installing LogiQL projects and libraries. Once the developer has created a config.py file, declaring the projects, libraries, dependencies, workspaces, and tests for a project, running lb-config will generate a new Makefile.

In most cases, the following commands will compile the project, run tests, and install any files when applicable:

make              ## compile
make check        ## run tests
make install      ## install
make dist         ## generate source distribution

20.1.1. Installing lb-config

Tip

From LogicBlox 3.10.3 onwards, lb-config is part of the LogicBlox distribution, so these installation steps are no longer necessary.

20.1.1.1. Install from Binary

  1. Download the binary release of lb-config.

  2. Extract the archive to a folder. We'll refer to the folder to which you extracted lb-config as $(lb-config) elsewhere in the documentation.

  3. Add $(lb-config)/bin to the PATH environment variable.

20.1.1.2. Install from Source

  1. Clone lb-config repository from http://bitbucket.org/logicblox/buildlib into $(lb-config-src) directory.

  2. Enter $(lb-config-src) directory and execute ./buildlb --prefix=$(lb-config) && make install

  3. Add $(lb-config)/bin to the PATH environment variable.

20.1.2. Creating config.py

The config.py script is fundamentally a Python script, but its code is intended to be declarative: it declares the libraries, workspaces, and test targets.

The following is an example of a config.py script that declares a LogicBlox library and a workspace into which that library will be installed:

Example 20.1. 

from lbconfig.api import *

lbconfig_package(
    'my-app-name',
    default_prefix='/opt/logicblox/my-app',
    default_targets=['foo', 'bar'])

depends_on(
    logicblox_dep,
    jacoco='/path/to/jacoco',
    guava={'default_path': '/path/to/guava', 'help': 'Description for this parameter'})

lb_library(
  name='my-lb-project',
  srcdir='datalog/my-lb-project',
  srcgen=['my_protobuf_message_proto.logic'],
  deps=['bloxweb']
)

check_lb_workspace(
  name='my-workspace',
  libraries=['my-lb-project']
)

The first thing to note is that all public functions are imported from lbconfig.api. Next, the package metadata, its dependencies, the LB library, and the workspace are declared. Since this is done in a declarative way, the order in which these are defined does not change the outcome of the build.

In the following sections, each of the above functions is described. For a detailed overview of all supported functions, please refer to the lbconfig API documentation.

20.1.2.1. lbconfig_package

The lbconfig_package function is used to declare some of the package’s metadata, such as its name and default installation directory. It is also used to set the default targets.

By default, the install location is the local out directory. This can be changed using the default_prefix parameter of lbconfig_package.

To set an install location when configuring the build, use the --prefix command line argument:

lb-config --prefix /install/here

We will further refer to the installation prefix as $(prefix).

20.1.2.2. depends_on

Tasks often depend on other resources. For instance, an LB library might depend on other LB libraries, or Java code on external Java files or JARs. In the previous example, the LB library 'my-lb-project' depends on bloxweb.

To declare dependencies in lb-config, the depends_on function can be used as shown above. lbconfig.api has two predefined dependencies that can be used: logicblox_dep and bloxweb_dep. Other dependencies can be added by listing them as named parameters, as the example shows.

For each declared dependency, lb-config will create a variable in the Makefile. As a result, we can refer to the guava path using $(guava). Second, this will create command-line arguments to lb-config so that the path for each dependency can be changed. For the example, lb-config will accept a --with-guava parameter. The example below shows how to configure the build for a different guava path:

lb-config --with-guava=/some/other/path/to/guava

Finally, declaring dependencies causes lb-config to check that each of the paths exists. If a path does not exist, the build does not proceed.

20.1.2.3. lb_library

The lb_library function is used to declare a LogiQL project that should be compiled and installed into $(prefix). In the example above, a project named my-lb-project is compiled in the datalog/my-lb-project directory. The name of the library is the name of the project file, without the .project extension.

The srcgen parameter specifies the source files that have been generated, such as protobuf, LogiQL, Java, or Python files. Since these files are generated from other files, they need not and will not be included in source distributions.

By default, lb_library will install the compiled project to $(prefix) on a call to make install. For a library that does not need to be installed, such as libraries for testing purposes, one should use the check_lb_library function instead, which takes similar parameters.
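
For instance, a test-only library could be declared as follows (a sketch; the library name and directory are hypothetical):

check_lb_library(
  name='my-test-lib',
  srcdir='datalog/my-test-lib',
  deps=['my-lb-project']
)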

20.1.2.4. check_lb_workspace

The check_lb_workspace function is used to declare a workspace that should be built on a call to make check. In the example above, the workspace is named ‘my-workspace’ and the ‘my-lb-project’ library is installed into the workspace.

lb-config creates the following make targets for dealing with workspaces:

  • check-lb-workspaces: creates all workspaces declared in config.py
  • check-ws-foobar: creates the workspace declared with the name 'foobar'
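
For example, for the workspace declared in Example 20.1 (and assuming the check-ws-<name> naming pattern described above), the generated targets can be invoked as:

make check-lb-workspaces        ## build every workspace declared in config.py
make check-ws-my-workspace      ## build only the workspace named 'my-workspace'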

20.1.3. Extending lb-config

Extensions to the current lb-config functions, or new functions, should be defined in lbconfig_local.py.

lbconfig_local.py:

def my_special_rule(foo, bar):
    …

config.py:

from lbconfig.api import *
from lbconfig_local import *


my_special_rule('foo', 'bar')

Chapter 21. lb-base-project

lb-base-project is distributed together with the LogicBlox platform from release 3.10.5 onwards to help developers start a new project. It is structured according to the LogicBlox reference architecture, contains sample configurations for building and deploying the project, as well as examples of common services and their interactions with a JavaScript UI. Projects that follow the structure of lb-base-project are automatically compatible with builds.logicblox.com, the continuous build application for LogicBlox projects.

21.1. Building and Testing

You can find this starter project in the base-project folder of the LogicBlox distribution.

In order to use this project, lb-services need to be started:

$ lb-services start

To build the application simply run:

$ lb-config
$ make

To run the tests, you simply run:

$ make check

To run the application and access it via the web, run:

$ ./run

Or to start nginx without rebuilding the workspace:

$ ./run --keep

You can access the demo application in your browser from http://localhost:8000. Log in with user user1 and password password.

21.2. Project Structure

The project contains the following directories:

  • conf: this directory contains nginx configuration files for local testing (with ./run)
  • doc: documentation for the project
  • src: contains the LogiQL source code, one module per directory. Directories that end with _services are modules that expose either custom LogiQL services or delimited file services. Each module directory contains a .project file with the same name, and a directory with the same name containing the actual .logic and .proto files (see the example layout after this list).
  • test: contains unit tests and test-related files.
  • www: contains static web resources that form the front-end of the application, typically consisting of HTML, CSS, images and JavaScript files.
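
For illustration, the sales module and its services module from the demo below would be laid out roughly as follows (the file names here are only an example):

src/
  sales/
    sales.project
    sales/
      sales.logic
  sales_services/
    sales_services.project
    sales_services/
      sales.logic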

The root directory contains the following files:

  • README.md: describes the project and how to develop and run it.
  • config.py: contains the lb-config specification for building the project. You can find more information in the chapter on lb-config.
  • run: a script used for running the application locally. It will build the project, deploy it to the run_workspace workspace, start bloxweb services, and then run nginx serving the web UI on http://localhost:8000. The run script can also be run with --keep to skip rebuilding the workspace.

21.3. Demo

This project also contains a small demo application with examples of common services and their interactions with a Javascript UI. The demo application contains the following:

  • Sample hierarchies (sources in src/hierarchy/hierarchy/*.logic):
    • calendar.logic
    • location.logic
    • product.logic
  • Delimited file services that expose the above hierarchy under /hierarchy/* (sources in src/hierarchy_services/hierarchy_services/*.logic):
    • calendar.logic
    • location.logic
    • product.logic
  • Some example sales measures (sources in src/sales/sales/sales.logic)
  • Delimited file services that expose the above sales measures under /sales/* (sources in src/sales_services/sales_services/sales.logic).
  • A sample bloxweb protobuf JSON service that exposes location data (sources in src/json_services):
    • protocols/location.proto: defines the Request/Response protobuf message
    • json_services/location_json.logic: implements the location services
    • json_services/service_config.logic: exposes location service to /json/location as well as in an authenticated way under /json/location_authenticated (for demo purposes only).
  • A sample batch script (in test/test.batch) that is executed by the ./run script to import the sample data into the workspace.
  • A minimal web UI with authentication (username: user1, password: password) that calls the /json/location_authenticated and measure service on load. Note that the user is automatically redirected to the login page only when a web service is called that requires authentication. To disable having to login, simply disable calls to authenticated services.

Chapter 22. Hierarchical Import/Export

Many applications store hierarchical data in the workspace. For instance, the following schema describes hierarchical information about a person:

block (`addrbook) {
  export ( `{
    Person(x), PersonId(x:id) -> uint[32](id).
    Person_age[x]=y -> Person(x), uint[32](y).
    Person_address[x]=y -> Person(x), addrbook:Address(y).

    Address(x), AddressId(x:id) -> uint[32](id).
    Address_city[x]=y -> Address(x), string(y).
    Address_state[x]=y -> Address(x), string(y).
 } ), ...

Ad hoc techniques for importing or exporting such data from the workspace can be complex and non-performant. For import, inherently hierarchical data needs to be flattened and imported into the workspace; for export, flat data extracted from the workspace needs to have its hierarchical structure reconstructed. In addition, import via delta rules or export via queries have negative performance implications.

Hierarchical import/export is designed to address this problem. Hierarchical import/export allows you to give a hierarchical description of your data as a Google Protocol Buffer Message. You can then write rules that pull data from the message into your working schema (for import), or derive data into the message from your working schema (for export).

22.1. Using Hierarchical Import/Export

Requirement: In order to use hierarchical import/export, the workspace must be created with the block protocol. The following command illustrates how to create such a workspace:

bloxbatch -db /tmp/db -create -blocks protocol

There are four steps in using hierarchical import/export:

  1. Provide a specification of your data as a Google protocol buffer message.
  2. Add a directive to your project file that will generate logic to represent your newly declared message types and ensure the runtime is aware of these message types.
  3. Write rules that derive data from your message schema to your working schema, or vice versa.
  4. Use either bloxbatch commands or LDBC methods to begin the import/export of your data.

The second step uses new features first available on platform version 3.10. Users of older platforms should see the section Hierarchical Import/Export in Logicblox 3.9.

We use the above Person schema as an example to illustrate how one can import/export from that schema.

22.1.1. Defining A Protocol Buffer Message Specification

This section demonstrates how to build a protobuf schema for representing information about a person, including her name and (some information about) her address. The following protocol buffer message specification describes such information:

package addrbook;

message Person {
  required uint32 age=1;
  required Address address=2;
}

message Address {
  required string city=1;
  required string state=2;
}

Google Protocol Buffer supports a number of native types that can be used directly to represent the data types supported by our workspace: e.g. uint32 translates directly to a uint[32] in our workspace.

22.1.2. Importing the Protocol Message Specification

Adding the following line to your project file generates the definitions for Person and Address shown above and associates the descriptor with a name, myproto, that the runtime uses to identify this family of message types.

myProject, project
person.proto, proto, descName=myproto

Several options can be given in the third column of the proto directive. These are described as follows.

  • descName=name: The name under which the descriptor is registered; the runtime uses this name to identify this family of message types.
  • lifetime=transaction|database: Optional. Describes whether logical representation of the protobuf messages should have transaction or database lifetime. Default is transaction.
  • storageModel=model: Optional. Storage model for generated entities, e.g. ScalableSparse.
  • protoPath=path: Optional. Search path for message types included in .proto files via import statements.
  • namespace={old1->new1,old2->new2,...}: Optional. A map rewriting top-level namespaces for generated logic.
  • legacyLogic=true|false: Optional, default false. When true, specifies that logic should be generated as flat files instead of modules. For forward compatibility, predicate names are identical whether or not legacyLogic is set. This is most useful in the case that recursive protobuf declarations would lead to illegal recursive modules.
  • dropPackages=p,q,r,...: Optional, default google.protobuf,blox.options,blox.internal. Specifies that logic should not be generated for the given protobuf packages. This can be useful when including third-party protobuf packages containing types that are not valid in LogiQL, or when a package is included twice via different proto project directives.

If two or more .proto files will create logic in the same namespace, it is necessary to import them together by listing them in the left column of a single proto directive. For example, suppose we refactored the message declarations above into two .proto files. The following directive will import messages from both and also rename the top-level Datalog package used for generated logic.

myProject, project
person_only.proto addr_only.proto, proto, descName=myproto namespace={addrbook->foo}

The resulting predicate declarations are as follows:

block (`foo) {
  export ( `{
    Address(x), AddressId(x:id) -> uint[32](id).
    Address_city[x]=y -> Address(x), string(y).
    Address_state[x]=y -> Address(x), string(y).

    Person(x), PersonId(x:id) -> uint[32](id).
    Person_age[x]=y -> Person(x), uint[32](y).
    Person_address[x]=y -> Person(x), foo:Address(y).
 } ), ...

22.1.3. Exchanging Data Between a Message and a Workspace

You are responsible for writing rules that populate the message schema with the data from your workspace. This is written using regular Datalog logic. Below is an example of how to derive addrbook:Person and addrbook:Address entities for export, from corresponding person and address entities declared in a workspace.

begin_export() -> . lang:pulse(`begin_export).

addrbook:Person(p_out),
addrbook:Person_age[p_out]=age,
addrbook:Address(a_out),
addrbook:Person_address[p_out]=a_out,
extract_address(a_in, a_out)
  <- addrbook:person(p_in), addrbook:person_age[p_in]=age,
     addrbook:person_address[p_in]=a_in, begin_export().

extract_address(a_in, a_out) ->
     addrbook:address(a_in), addrbook:Address(a_out).
lang:pulse(`extract_address).

addrbook:Address_city[a_out]=city,
addrbook:Address_state[a_out]=state
   <- extract_address(a_in, a_out),
      addrbook:address_city[a_in]=city,
      addrbook:address_state[a_in]=state.

The above rules are written with the assumption that they would be installed as a pre-compiled block, called when necessary to export a message. For this reason, they include a pulse predicate, begin_export, which can be used to control when the rules generating message data should be evaluated.

Similar rules can be written to take data from the message predicates to your workspace.

22.1.4. Exporting/Importing A Message

To be able to read an exported message, or to construct a message to import, you need to use code generated by protoc, the compiler distributed with the Google Protocol Buffers toolkit. protoc can generate messaging APIs for a number of different languages, such as C++, Python, and Java. To read more about the use of such APIs, please consult the protocol buffer manual.
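
For example, generating the Python API for the address book protocol is a standard protoc invocation (not specific to LogicBlox):

protoc --python_out=. person.proto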

22.1.4.1. Export

Exporting data from a workspace requires that you first evaluate the data exchange rules that derive data from your workspace into the message schema. Whether this is done by invoking a pre-compiled block or via one-off executions is the programmer's choice. Assuming that data is available in the message schema, the following demonstrates how to export that data into a message:

  • In bloxbatch -interactive, the protoExport command takes the following parameters:
    • --proto <descriptor-name>: <descriptor-name> indicates the registered name of the descriptor.
    • --msgType <msg-type>: In the same descriptor file, there may be multiple messages (e.g., addrbook:Person, addrbook:Address). <msg-type> indicates which message should be the root message being exported.
    • --block <block-name>: The block of the message schema.
    • --file <exported-data>: In bloxbatch -interactive, a message can only be exported into a file. This parameter indicates the file name storing the exported message.
    protoExport --proto myproto --msgType addrbook:Person --file persons.exported
       --block addrbook
    
  • In C++ or Python, the protoExport command must be invoked on a block, and returns a string. The first parameter to protoExport is the type of message being exported, and the second the descriptor name as registered:
    msg = ws.getBlock("addrbook").protoExport("addrbook:Person", "myproto")
    

22.1.4.2. Import

Importing data into a workspace works similarly to export. A protocol buffer message must be constructed first, using the protoc generated code from your message specification.

  • In bloxbatch -interactive, a message can only be imported from a file. The command protoImport takes parameters similar to protoExport. The following example shows how to import a message stored in test.exported, where the message type is addrbook:Person, with the registered descriptor myproto and block addrbook:
    protoImport --file test.exported --msgType addrbook:Person
      --proto myproto --block addrbook
    
  • Using C++ or Python, a message can be constructed using the protoc-generated API and stored in a string object. It can then be imported into the workspace as follows:
    ws.getBlock("persons").protoImport("addrbook:Person", "myproto")
    

22.2.  Hierarchical Import/Export in Logicblox 3.9

Hierarchical import/export is similar in earlier platform versions (through 3.9), except that the user must manually create and manage protobuf descriptor files. There are five steps for importing or exporting data, some of which are the same as in 3.10, and some of which are different.

  1. Provide a specification of your data as a Google protocol buffer message. (As above.)
  2. Invoke the provided tool, $LOGICBLOX_HOME/bin/proto2datalog.sh, on your message specification. This will generate a legacy logic schema definition for your message and a binary protobuf descriptor file. (Different step.)
  3. Write rules that derive data from your message schema to your working schema, or vice versa. (As above.)
  4. Install the message schema, register the descriptor. (Different step.)
  5. Use either bloxbatch commands or LDBC methods to begin the import/export of your data. (As above.)

The different steps are described below.

22.2.1. Processing the Protocol Buffer Message with proto2datalog.sh

A compiler tool, proto2datalog.sh, generates a Datalog schema definition representing the message specification, as well as a descriptor of the message. The following command turns the message specification, stored in my_person.proto, into a descriptor, my_proto.descriptor, as well as the schema declaration, my_person.logic:

proto2datalog.sh my_person.proto my_proto.descriptor -file my_person.logic

The tool generates the following schema definition for the protocol buffer message specification, and stores it in my_person.logic:

my_person(x) -> .
my_person:age[x]=y -> my_person(x), uint[32](y).
my_person:my_address[x]=y -> my_person(x), my_address(y).
my_address(x) -> .
my_address:city[x]=y -> my_address(x), string(y).
my_address:state[x]=y -> my_address(x), string(y).

Note that the naming scheme used here is different from above, reflecting the encoding of protobuf messages as legacy logic.

proto2datalog.sh optionally takes the following parameters:

  • -lifetime transaction: this option would cause all predicates in the message schema to be pulse predicates: data is only available in these predicates within the same transaction in which they are populated.
  • -storageModel model: this option specifies that all entities in the message schema have the storage model specified in the option, e.g., ScalableSparse.
  • -namespace ns: this option specifies that all predicates in the message schema should be prefixed by the namespace given in this option. By default, if the protobuf message contains a "package" declaration, the name of the package is used as the prefix (with '.'s replaced by ':'s).

For example, the following call to the tool with these parameters generates the schema definition shown below:

proto2datalog.sh my_person.proto my_proto.descriptor -file my_person.logic
   -lifetime transaction -storageModel ScalableSparse -namespace foo

lang:block:predicateLifetime[]="TransactionLifetime".
foo:my_person(x) -> .
lang:physical:storageModel[`foo:my_person]="ScalableSparse".
foo:my_person:age[x]=y -> foo:my_person(x), uint[32](y).
foo:my_person:my_address[x]=y -> foo:my_person(x), foo:my_address(y).

foo:my_address(x) -> .
lang:physical:storageModel[`foo:my_address]="ScalableSparse".
foo:my_address:city[x]=y -> foo:my_address(x), string(y).
foo:my_address:state[x]=y -> foo:my_address(x), string(y).

22.2.2. Install Message Schema and Descriptor

Before exporting/importing a message, the generated message schema and the import/export rules must be installed in the workspace. Next, you must register the message descriptor with the workspace. The following examples illustrate how to register a descriptor file, <descriptor>, with the workspace using the name MyPerson:

  • In bloxbatch -interactive
    protoAddSpec --file <descriptor> --name MyPerson
    
  • In C++ and Python, assuming ws is a WorkSpace object:
    // read descriptor file into String descriptor
    ws.addMessageProtocol("MyPerson", descriptor);
    

22.2.3. Migrating Logic Referencing Generated Predicates

As shown above, hierarchical import/export on platform 3.10 generates predicate names according to the scheme, package:MessageType_field. This naming convention is consistent with treating protobuf packages as modules, and message types and fields as predicates in such modules. Older versions of the platform tools (before 3.10) generated legacy logic and followed the naming convention package:MessageType:field.
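
For example, using the address book protocol from earlier in this chapter, the same field is referenced as follows under the two conventions:

// legacy logic naming (platform 3.9 and earlier)
addrbook:Person:age[p] = age

// module logic naming (platform 3.10)
addrbook:Person_age[p] = age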

The lb-migrate tool can aid upgrades from legacy- to module-logic protobuf schemas. Invoking

lb-migrate -p schema.proto foo.logic

updates predicates in foo.logic and

lb-migrate -p schema.proto dir

updates all .logic files in dir and its subdirectories. See lb-migrate -h for more usage information.

Additionally, the script $LOGICBLOX_HOME/bin/migrateBR conveniently updates references to message predicates declared in BlockResources. Here, running

cd $PROJECT
migrateBR

will update BlockResources predicates in $PROJECT.

22.3.  Set semantics for repeated fields

In LogicBlox 3.10, protobuf repeated fields may be annotated to indicate that they should be represented as unordered sets instead of indexed predicates. This eliminates the need to generate or track indices. For example, the protobuf declaration

    repeated string foo = 1 [(blox.options.set) = true];

is represented in logic by

     A_foo(x, y) -> A(x), string(y).

Chapter 23. BloxWeb

23.1. Introduction

BloxWeb is a framework for developing services hosted by LogicBlox workspaces. BloxWeb services offer business functionality to user-interfaces, integration scripts, and 3rd party applications. Typically, services provide access to the data stored in LogicBlox workspaces, for example to provide data to charts and tables, or for forms to create and modify entities. This chapter documents the implementation and configuration of such services.

BloxWeb is an extensible framework that comes with a few different types of services that should meet the needs of most applications:

  • Protocol buffer (protobuf) services are HTTP services that are invoked using an HTTP POST request. The request contains a binary protobuf or textual JSON message. The service returns the protobuf or JSON result of the invocation as an HTTP response message. This type of service is similar to other service frameworks that resemble remote procedure calls, such as JSON-based services used in AJAX applications, SOAP and XML-RPC. In BloxWeb, the schemas of the request and response messages are precisely specified by a protobuf protocol. Optionally, messages can be encoded as a JSON string, to support access from web browsers. The services can be accessed by any HTTP client, including browsers, LogicBlox workspaces, or really any application that understands the HTTP protocol and is able to encode and decode protocol buffers or JSON messages. Protobuf services are typically used for online communication between different components of a system developed with LogicBlox.

  • Delimited file services are HTTP services that can be accessed by GET, POST as well as PUT requests. The service uses delimited files as the data format. Data is retrieved from the database using GET requests. Using POST requests, data in the database can be updated, and using PUT requests data can be replaced. Delimited file services are typically used for integration purposes, for example to import sales, or export forecast data. Delimited file services are based on a REST architecture.

  • Global protobuf services are protobuf services that are implemented by distributing incoming requests to a series of other services (typically partitions of a distributed system). The responses from the individual services are merged into a single response of the global service. Global services are useful to implement functionality that does not conform to the partitioning strategy used, for example to search globally for products meeting certain criteria.

  • Proxy services act as a simple proxy for a service hosted on a different machine. Proxy services can be used to require authentication on top of existing unauthenticated services, or can be used to provide access to a distributed service-oriented system on a single host.

  • Custom services are supported as plugins to the BloxWeb service container. Custom services are written in Java and can virtually do anything. They are not restricted to using LogicBlox databases. The implementation of custom services is facilitated by some abstract implementations of certain service patterns, such as protobuf services.

BloxWeb is based on the HTTP protocol, but the HTTP protocol is in some cases only used as an encoding of messages exchanged between a client and the server. Also, as opposed to normal HTTP usage, BloxWeb supports separate transportation of the content of a request, for example by referring to an AWS S3 object. BloxWeb supports the following transport methods:

  • The BloxWeb server can be used as a normal HTTP server, where a client opens TCP socket connections to the server and then sends the content of the request over the socket (e.g. a protobuf message or a delimited file).

  • To facilitate importing and exporting huge files, the BloxWeb server implements a small extension of the HTTP protocol to allow the body of a request to reside in S3, the storage service of AWS. The service container will transparently download the file from S3 and provide the content to the service just as if it was part of the original request. No application work is needed to enable S3 support on specific services.

  • The BloxWeb server can also be configured to retrieve HTTP requests from a message queue. It currently supports SQS, the queue service from Amazon, and RabbitMQ. This method of invoking a service is useful to decouple systems and to support very long running requests (e.g. data loads). The response is sent to the client over another queue, along with some information to correlate it to the original request.

BloxWeb supports different authentication methods, where the different methods have benefits when used from the relatively hostile environment of a browser versus non-browser applications running in the controlled environment of a machine.

23.2. Installing and Running BloxWeb

The binary package for BloxWeb is relocatable and can simply be uncompressed in some location. At LogicBlox we standardize on /opt/logicblox/bloxweb. This is also the default installation location when building from source.

$ sudo mkdir -p /opt/logicblox
$ sudo chown -R $USER /opt/logicblox
$ sudo chgrp -R $USER /opt/logicblox
$ cd /opt/logicblox
$ rm -rf bloxweb
$ tar zxvf ~/Downloads/bloxweb-*.tgz
$ mv bloxweb-* bloxweb

The main executable of BloxWeb is called bloxweb. To be able to execute bloxweb from any location, add the bin directory to the PATH environment variable:

$ export PATH=/opt/logicblox/bloxweb/bin:$PATH

The BloxWeb server can be started manually with the command start. The BloxWeb server requires that lb-services have been started already.

$ lb-services start
$ bloxweb start

Once the server is started the bloxweb tool allows starting and stopping services and doing further configuration actions without having to stop the BloxWeb server. The --help option lists all the available commands and options. Commonly used commands are:

$ bloxweb start-services
$ bloxweb list-services
$ bloxweb loglevel debug@transaction
$ bloxweb log messages
$ bloxweb log no-messages

BloxWeb is started automatically by lb-services if the BLOXWEB_HOME environment variable is set to the installation prefix of BloxWeb (e.g. /opt/logicblox/bloxweb). If you need to set this environment variable, but do not want lb-services to start bloxweb, then the server can be disabled by setting the configuration setting enabled = false in the $LB_DEPLOYMENT_HOME/config/bloxweb.config configuration file. If BloxWeb is started by lb-services, then the log file can be found at $LB_DEPLOYMENT_HOME/logs/current/bloxweb.log.

23.3. Implementing ProtoBuf/JSON Services

This section explains how to implement protocol buffer and JSON services in BloxWeb. We illustrate this using a simple service that given the name of a timezone returns the current time in that timezone.

The first step of the implementation of a service is the definition of the protocol used between the client and the service. This protocol serves as documentation of the service, but can also be used to generate source code artifacts used by the client and the service. The protocol is specified as a protobuf schema (see the protobuf language guide for a detailed reference on protocol specifications).

For the time service, this protocol is:

package time;

message Request
{
  required string timezone = 1;
}

message Response
{
  optional string answer = 1;
  optional string error = 2;
}   

In JSON syntax, a request for the time in UTC is {"timezone" : "UTC"}. At the time of writing, the answer in JSON syntax would have been {"answer": "2012/11/13 00:19 +00:00"}.

Next, we need to write the Datalog rules for the actual implementation of the service. To use protobuf messages in Datalog, a Datalog schema needs to be generated from the protocol. Usually this is taken care of by the build system of a project (see the Chapter on hierarchical import/export). Ignoring some compiler directives, the generated Datalog schema for the time protocol is:

time:Request(x), time:RequestId(x:id) -> uint[32](id).
time:Request:timezone[x] = y -> time:Request(x), string(y).

time:Response(x), time:ResponseId(x:id) -> uint[32](id).
time:Response:answer[x] = y -> time:Response(x), string(y).
time:Response:error[x] = y -> time:Response(x), string(y).

When the BloxWeb service container receives an HTTP request for a service, the server imports the protobuf message contained in the body of the HTTP request into the workspace that hosts the service. This request message is typically a pulse entity, which means that it does not persist in the workspace after the transaction. The import of the example request for the current time in UTC is equivalent to executing the following logic:

+time:Request(_) {
  +time:Request:timezone[] = "UTC"
}.

The service implementation consists of delta rules that trigger when a request entity element is created. To respond to the request, the delta rules create a message in the response protocol, which is then exported from the workspace by the BloxWeb service container. This all happens in a single transaction. The server returns the response to the client in the body of an HTTP message. For the UTC example, the delta logic to create the literal response message would be:

+time:Response(_) {
  +time:Response:answer[] = "2012/11/13 00:19 +00:00"
}.

Of course, the actual implementation needs to trigger from the actual request, and also consider the current time. One common complication in the implementation of the service is that the logic needs to make sure to always return a response. To guarantee this, it is useful to introduce separate, intermediate predicates for the result of the request. In the following example, we introduced an answer predicate for this purpose. The first rule computes the answer for the given timezone request. The second rule populates a successful response, while the third rule generates an error message if no answer could be computed.

block(`time) {

  clauses(`{

    answer[req] = s -> time:Request(req), string(s).
    lang:pulse(`answer).

    // determine the answer for the requested timezone
    +answer[req] = s
      <-
      +time:Request:timezone[req] = tz,
      datetime:now[] = dt,
      datetime:formatTZ[dt, "%Y/%m/%d %H:%M %Q", tz] = s.

    // use constructor for creating a response message
    lang:constructor(`cons).
    lang:pulse(`cons).
    cons[req] = resp -> time:Request(req), time:Response(resp).

    // create the response message from the answer
    +cons[req] = resp,
    +time:Response(resp),
    +time:Response:answer[resp] = s
      <-
      +answer[req] = s.

    // create the error response message if there is no answer
    +cons[req] = resp,
    +time:Response(resp),
    +time:Response:error[resp] = "not a valid timezone: " + tz
      <-
      +time:Request:timezone[req] = tz,
      !+answer[req] = _.

  })

} <-- . 

BloxWeb finds services to host by scanning workspaces for service configurations. A workspace can host an arbitrary number of services, each of which is defined by a service entity. For the timezone service, the configuration uses the subtype of service for protobuf services, called default_protobuf_service.

block(`service_config) {

  alias_all(`bloxweb:config:service),
  alias_all(`bloxweb:config:service_abbr),
  alias_all(`bloxweb:config:protobuf),
  alias_all(`bloxweb:config:protobuf_abbr),

  clauses(`{

    service_by_prefix["/time"] = x,
    default_protobuf_service(x) {
      protobuf_protocol[] = "time",
      protobuf_request_message[] = "Request",
      protobuf_response_message[] = "Response"
    }.

  })

} <-- . 

23.3.1. Writing Automated Tests using Python

BloxWeb ProtoBuf/JSON services are standard HTTP services, so in principle any HTTP service testing tool can be used. BloxWeb comes with a small Python library of convenient abstractions to invoke services, and we recommend writing automated tests using this library.

  • The bloxweb.admin.Client class allows access to the admin services of BloxWeb. This can be convenient to isolate services from testsuites.

  • The bloxweb.service.Client class allows a ProtoBuf request to be built and sent to the service. It dynamically builds the required Python classes from the descriptor that it fetches from the BloxWeb admin services.

The Python client sends and receives binary protobufs by default. It can be used to test services with BINARY or AUTO encoding. JSON is supported at a lower level.

A simple Python testsuite needs the following imports:

#! /usr/bin/env python

import sys
import os
import unittest

sys.path.insert(0, '%s/lib/python' % os.environ.get('LOGICBLOX_HOME'))
sys.path.insert(0, '%s/lib/python' % os.environ.get('BLOXWEB_HOME'))

import bloxweb.testcase
import bloxweb.service
import bloxweb.admin

There are two main testcase classes: bloxweb.testcase.PrototypeWorkspaceTestCase and bloxweb.testcase.TestCase. We generally recommend using the prototype workspace testcase, because it prevents interference between different tests. For truly stateless services, the simple TestCase class can be used and will be significantly faster.

A simple testsuite for the time service:

class TestTimeService(bloxweb.testcase.PrototypeWorkspaceTestCase):

    prototype = "/workspace-name"

    def setUp(self):
        super(TestTimeService, self).setUp()
        self.client = bloxweb.service.Client("localhost", 8080, "/time")

    def test_utc(self):
        req = self.client.dynamic_request()
        req.timezone = "UTC"
        response = self.client.dynamic_call(req)
        self.assertHasField(response, "answer") 

The bloxweb.service.Client class also provides support for testing authenticated services. Notice that the cookie jar needs to be manually assigned to the service that requires authentication:

import bloxweb.credentials

class AuthenticatedTestTimeService(bloxweb.testcase.PrototypeWorkspaceTestCase):

    prototype = "/workspace-name"

    def setUp(self):
        super(AuthenticatedTestTimeService, self).setUp()
        self.client = bloxweb.service.Client("localhost", 8080, "/atime")
        self.login_client = bloxweb.service.Client("localhost", 8080, "/login")
        self.client.jar = self.login_client.jar

    def test_login_works(self):
        credentials_client = bloxweb.credentials.Client()
        credentials_client.set_password("user", "password")
        self.login_client.login("user", "password", "time_auth")

        req = self.client.dynamic_request()
        req.timezone = "EST"
        response = self.client.dynamic_call(req)
        self.assertHasField(response, "answer") 

23.4. Service Configuration Reference

Service types:

  • delim_service (bloxweb:config:delim): Delimited file service hosted by the workspace that contains the configuration of the service.
  • default_protobuf_service (bloxweb:config:protobuf): Default protobuf service hosted by the workspace that contains the configuration of the service.
  • global_protobuf_service (bloxweb:config:global_protobuf): Global protobuf service.
  • exact_proxy (bloxweb:config:proxy): Exact proxy service.
  • transparent_proxy (bloxweb:config:proxy): Transparent proxy service.

Configuration options applicable to all services:

  • service_prefix (string, required): Path of the URL where the BloxWeb service container will make a service available.
  • auth_realm (string, optional): Name of the realm used for authenticating users of this service (see the authentication section for further details).
  • custom_handler (string, optional): Name of a custom handler for this service. The handler needs to be configured in the configuration file bloxweb.config as a section [handler:name].
  • disabled_status (uint[32], optional): Sets the BloxWeb service to be disabled. The value defines the status code that will be returned to clients that try to access the disabled service.
  • service_parameter (string, string; optional): Key/value pairs that are passed as parameters to service handlers.
  • group (string, optional): The group to which the service belongs. The default is the unnamed group.

Workspace services are delimited-file services and protobuf services.

Configuration options on workspace services:

  • inactive_block_name (string, optional): Name of an inactive block to be executed when serving a request to this service.
  • readonly (optional): Marks a service as read-only, which means that database transactions executed by this service will use a read-only setting.
  • exclusive (optional): Marks a service as exclusive, which means that database transactions executed by this service will acquire an exclusive lock on the database.
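
As an illustration only (placed inside a service_config block like the earlier examples), a read-only protobuf service that executes an inactive block might be configured as follows; the prefix, protocol, and block names are hypothetical, and the functional form of inactive_block_name is assumed from the list above:

    service_by_prefix["/report"] = x,
    default_protobuf_service(x) {
      readonly(),
      inactive_block_name[] = "report_block",
      protobuf_protocol[] = "report",
      protobuf_request_message[] = "Request",
      protobuf_response_message[] = "Response"
    }.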

Configuration options for all types of protobuf services (default and global):

  • protobuf_protocol (string, optional): Name of the protocol. This is a convenience predicate to set both protobuf_request_protocol and protobuf_response_protocol.
  • protobuf_request_protocol (string, required): Name of the protocol for requests.
  • protobuf_response_protocol (string, required): Name of the protocol for responses.
  • protobuf_message (string, optional): Name of the message. This is a convenience predicate to set both protobuf_request_message and protobuf_response_message. This option is not commonly used, because usually the request and response messages have a different type.
  • protobuf_request_message (string, required): Message type of the request.
  • protobuf_response_message (string, required): Message type of the response.
  • protobuf_encoding (string, optional):

The encoding used by the service. This convenience predicate simply sets the options protobuf_request_encoding and protobuf_response_encoding. Supported encodings are:

  • auto: services will accept either binary protobuf or textual JSON requests or responses, according to the Content-Type and Accept HTTP headers of the request. Services with the auto encoding fall back to binary protobuf if the header is not present. This allows services to be used both by browsers and LB workspaces. This is the recommended encoding to use.

  • json: if services are only meant to be used from a browser. The request is required to use application/json content-type. The response will have the same content-type.

  • binary: only supports binary protobuf messages, which is suitable for services that are never used from browsers. The content-type is application/octet-stream.

  • protobuf_request_encoding (string, optional): The encoding used for the request (see protobuf_encoding for the list of supported encodings).
  • protobuf_response_encoding (string, optional): The encoding used for the response (see protobuf_encoding for the list of supported encodings).

Configuration options for global protobuf services:

  • global_protobuf_target_uri (string, required): Convenience predicate that associates target URIs with a global protobuf service. This will automatically create the service_clients for URIs using a TCP transport protocol and associate the service_clients with the global_protobuf_service.
  • global_protobuf_target (service_client, required): Collection of service_client entities to target with a global protobuf service.

Configuration options for delimited-file services:

  • file_binding (string, required): Name of the file_binding the service uses.

Configuration options for all proxy services (exact and transparent):

  • proxy_target (string, required)
  • proxy_host (string, optional)

Configuration options for transparent proxy services:

  • proxy_prefix (string, optional)
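
As an illustration only (again inside a service_config block), an exact proxy forwarding a single prefix to a service on another host might look like this; the prefix and target URI are hypothetical, and the functional form of proxy_target is assumed from the list above:

    service_by_prefix["/remote-time"] = x,
    exact_proxy(x) {
      proxy_target[] = "http://internal-host:8080/time"
    }.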

23.4.1. Admission Control

By default, BloxWeb services requests as soon as they are submitted, and will issue as many concurrent requests as there are worker threads to service those requests (as described in Server Configuration). For a mix of read-only and update services, this can sometimes result in poor performance, depending on the type of concurrency control used by the workspace, and the transaction times resulting from the services. In some cases, it is desirable to have BloxWeb order the execution of the services, such that read-only requests are run concurrently, while update requests are run exclusively. Besides resulting in better performance from avoiding transaction aborts, this can also result in performance gains from disabling concurrency control on the workspace.

This can be achieved by configuring services to use specific admission queues. All requests to services using the same admission queue are executed using a policy whereby:

  • Requests are submitted for execution in the order received.
  • Requests for read-only services can run concurrently.
  • Requests for update services wait until all currently running requests are complete.
  • Requests for read-only services are only submitted for execution after any currently running write requests are complete.

23.4.1.1. Configuring Admission Control

Configuring a service to use an admission queue is done by adding the AdmissionQueue property to the service_parameter option when configuring the service. The value of this property is an arbitrary name, used to group services into the same queue. For instance, the following service configuration shows two services using the same admission queue, with one being read-only while the other can update the state of the workspace.

block(`service_config) {

  alias_all(`bloxweb:config:service),
  alias_all(`bloxweb:config:service_abbr),
  alias_all(`bloxweb:config:protobuf),
  alias_all(`bloxweb:config:protobuf_abbr),

  clauses(`{

    service_by_prefix["/readonly-service"] = x,
    service_parameter[x,"AdmissionQueue"] = "myqueue",
    default_protobuf_service(x) {
      readonly(),
      protobuf_protocol[] = "time",
      protobuf_encoding[] = "binary",
      protobuf_request_message[] = "Request",
      protobuf_response_message[] = "Response"
    }.

    service_by_prefix["/update-service"] = x,
    service_parameter[x,"AdmissionQueue"] = "myqueue",
    default_protobuf_service(x) {
      protobuf_protocol[] = "settime",
      protobuf_request_message[] = "Request",
      protobuf_response_message[] = "Response"
    }.

  })

} <-- .           
           

23.5. Plugin Logic

Plugin logic allows Bloxweb to invoke LogiQL-generation components transparently. Fundamentally, plugin logic is a high-level specification of the expected contents of a predicate on a call to a particular service.

This is useful when a bloxweb service depends on the generation of LogiQL from another component, such as when exporting measure predicates using the delimited file service. In this case, a plugin logic specification would specify that, for a particular delimited file, a particular predicate must be populated with the sales aggregated by store, year, and SKU.

Every plugin logic has a name as an identifier, and a plugin that will be responsible for fulfilling its requirements. In addition, plugin logic also allows for a set of configuration parameters.

plugin_logic_by_name["sales-week-store-sku"] = lt,
plugin_logic(lt) {
  executor[] = "measure",
  param["measure_str"] = "Sales"
} <- .

The plugin logic above declares that a predicate with the virtual name "sales-week-store-sku" will be populated with the result of calling the "measure" plugin with the parameter measure_str="Sales".

We can then create a delimited file service that will export the sales-week-store-sku predicate:

service_by_prefix["/sales"] = x,
delim_service(x) {
  delim_file_binding[] = "sales"
}
  <- .

file_definition_by_name["sales"] = fd,
file_definition(fd) {
  file_delimiter[] = "|",
  column_headers[] = "SKU,STORE,WEEK,SALES",
  column_formats[] = "alphanum,alphanum,alphanum,float"
}.

file_binding_by_name["sales"] = fb,
file_binding(fb) {
  file_binding_definition_name[] = "sales", 
  predicate_binding_by_plugin_predicate["sales-week-store-sku"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "SKU,STORE,WEEK,SALES"
    }
}.

It is important to note that this allows the delimited file service to export the content of predicates generated by components other than the measure service. The same file binding above could be used to export sales data from some hypothetical plugin that downloads such data from the web:

plugin_logic_by_name["sales-week-store-sku"] = lt,
plugin_logic(lt) {
  executor[] = "web-downloader",
  param["url"] = "http://some.place/sales"
} <- .

Similarly, the same plugin logic from above could be used by a service other than the delimited file service, for example, a hypothetical charting service.

From the examples above, we see that the parameters for plugin logic will vary per plugin. Each plugin will define which parameters are accepted. Thus, to use plugin logic generators, it is essential to be aware of how the plugin interface is designed.

23.5.1. Binding Services to Plugin Logic

Although the above examples declare a plugin logic, it is still necessary to declare when the plugin logic generator should generate code.

There are two methods to bind a service to a plugin logic: an explicit way and an implicit way. We generally use the explicit binding when we know the names of the predicates that will be populated by plugin logic. When we don't know the names of the predicates populated by plugin logic, as is the case for the measure service plugin logic, we should use the implicit binding method.

We explicitly bind the plugin logic to a service using the execute_on predicate, as shown below. The execute_on predicate will trigger the code generation of the plugin logic on every call to a service. For the example below, every call to the service at /sales will trigger the plugin logic sales-week-store-sku.

plugin_logic_by_name["sales-week-store-sku"] = lt,
plugin_logic(lt) {
  executor[] = "measure",
  execute_on['/sales'],
  param["measure_str"] = "Sales"
} <- .

The implicit binding of plugin logic to services depends on the service. For the delimited file service, this binding occurs using the predicate_binding_by_plugin_predicate predicate when defining the file binding. In the example below, we are binding the predicate resulting from the plugin logic sales-week-store-sku to the file binding.

file_binding_by_name["sales"] = fb,
file_binding(fb) {
  file_binding_definition_name[] = "sales", 
  predicate_binding_by_plugin_predicate["sales-week-store-sku"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "SKU,STORE,WEEK,SALES"
    }
}.

23.5.2. Incremental Configuration of Plugin Logic

Plugin logic can be configured incrementally. Plugin logic is first loaded and configured on handler startup, but it is not necessary to fully configure the plugin logic in Datalog code. We can pass parameters to the plugin logic on each HTTP call to services that use plugin logic.

Consider a delimited file service to export sales data filtered by some maximum sales value. One way to configure this service is to use the following plugin logic:

plugin_logic_by_name["sales-week-store-sku"] = lt,
plugin_logic(lt) {
  executor[] = "measure",
  param["measure_str"] = "filter Sales by <= max_sales : float",
  param["max_sales"] = "300"
} <- .

Here, we have fully configured the plugin logic, including the maximum sales value. Alternatively, we could achieve the same result using the following plugin logic:

plugin_logic_by_name["sales-week-store-sku"] = lt,
plugin_logic(lt) {
  executor[] = "measure",
  param["measure_str"] = "filter Sales by <= max_sales : float"
} <- .

and calling this service using /sales?max_sales=300. Finally, we could even pass the measure_str parameter only on the request, with the following plugin logic:

plugin_logic_by_name["sales-week-store-sku"] = lt, plugin_logic(lt) { executor[] = "measure" } <- .

and call the service using (with the URL properly encoded) /sales?measure_str=filter Sales by <= max_sales : float&max_sales=300. The only constraint here is that the key signature of the metric being generated needs to match the column types in the file binding.
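
For example, using curl (assuming the service is hosted on localhost:8080), the encoded request might look like:

$ curl 'http://localhost:8080/sales?measure_str=filter%20Sales%20by%20%3C%3D%20max_sales%20%3A%20float&max_sales=300'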

23.5.3. Using Measure Service Plugin Logic with Delimited File Service

23.5.3.1. Measure Service Plugin Logic

The main parameter to the measure service plugin logic generator is the measure_str parameter. This parameter is a full specification of a measure expression and fully defines the metric we want to export, modulo some filtering parameter values that can be set later.

Examples of values for the measure_str parameter are:

- "Sales"
- "Returns"
- "filter Sales by <= max_sales : float"

Depending on the measure expression set in measure_str, the measure service plugin logic generator will accept parameters for its filters. For the third measure expression above, the plugin will, thus, accept a max_sales parameter.

23.5.3.2. Exploring Dynamic/Static Capabilities

We will now show how plugin logic allows for static, semi-static, or semi-dynamic configurations of delimited files with measure expressions.

For all cases, we will use the delimited file service specification below. We will access the delimited file using a GET to /filter and the delimited file will have the columns SKU|STORE|WEEK|VALUE.

service_by_prefix["/filter"] = x,
delim_service(x) {
  delim_file_binding[] = "filter"
}.

file_definition_by_name["filter"] = fd,
file_definition(fd) {
  file_delimiter[] = "|",
  column_headers[] = "SKU,STORE,WEEK,VALUE",
  column_formats[] = "alphanum,alphanum,alphanum,float"
}.

Next, we declare the predicate binding. This binds the predicate generated by the measure service to the columns of the delimited file. Note that it is necessary for a developer to specify how to map the keys of the predicate generated by the measure service to the columns in the delimited file. This means that the developer writing the file binding should know the rules that the plugin (in this case, the measure service) uses to create the key signature for its generated predicates.

file_binding_by_name["filter"] = fb,
file_binding(fb) {
  file_binding_definition_name[] = "filter", 
  predicate_binding_by_plugin_predicate["plugin-filter"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "WEEK,STORE,SKU,VALUE"
    }
}.

23.5.3.2.1. Fully-Static Case

Let's start with the fully static configuration of a delimited file measure. We first declare the plugin logic. Essentially, this is just a specification of how we should invoke the measure service so that it generates the expected predicates. Here, we fully configure the measure, including the value of the max_sales parameter.

plugin_logic_by_name["plugin-filter"] = spec,
plugin_logic(spec) {
  plugin_predicate(),
  executor[] = "measure",
  param["measure_str"] = "filter Sales by <= max_sales : float",
  param["max_sales"] = "300"
}.

Since all the configuration for the measure has been set, we access the service simply with a GET to /filter.

23.5.3.2.2. Semi-Static Case

In this case, we'll use the following plugin logic, which is identical to the one above except that the "max_sales" param is omitted and may be supplied by the request:

plugin_logic_by_name["plugin-filter"] = spec,
plugin_logic(spec) {
  plugin_predicate(),
  executor[] = "measure",
  param["measure_str"] = "filter Sales by <= max_sales : float",
  allow_override("max_sales")
}.

23.5.3.2.2.1. Allowing for parameters in the request

The allow_override predicate indicates that requests can set new values for the max_sales parameter. allow_override also accepts parameter names containing one * wildcard character, such as max_*, *_max, or max_*_foo.

Similarly, the allow_accumulate predicate indicates that new values for a parameter should be added to the existing values for the parameter.

Since the "max" parameter has not yet been set, we just pass this on our request to the service: /filter?max=300.

23.5.3.2.3. Semi-Dynamic Case

In this case, our plugin logic will provide no parameters to the measure service:

plugin_logic_by_name["plugin-filter"] = spec,
plugin_logic(spec) {
  plugin_predicate(),
  executor[] = "measure"
  allow_override("*")
}.

To access this delimited file we must thus pass it the "measure_str" parameter (the URL would need to be properly encoded): /filter?measure_str=filter Sales by <= max_sales : float&max_sales=300.

23.5.3.3. Joining Two Metrics

Plugin logic also allows us to export a delimited file that is a join of two metrics. The following file binding binds a sales and a returns metric on the SKU, STORE, and WEEK columns.

file_binding_by_name["sales-returns"] = fb,
file_binding(fb) {
  file_binding_definition_name[] = "sales-returns", 
  predicate_binding_by_plugin_predicate["plugin-sales"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "SKU,STORE,WEEK,SALES"
    },
  predicate_binding_by_plugin_predicate["plugin-returns"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "SKU,STORE,WEEK,RETURNS"
    }
}.

We define the plugin-sales and plugin-returns plugin logic as:

plugin_logic_by_name["plugin-sales"] = spec,
plugin_logic(spec) {
  plugin_predicate(),
  executor[] = "measure",
  param["measure_str"] = "Sales"
}.

plugin_logic_by_name["plugin-returns"] = spec,
plugin_logic(spec) {
  plugin_predicate(),
  executor[] = "measure",
  param["measure_str"] = "Returns"
}.

As can be seen, it is possible to join any predicate with a measure predicate, including real EDB or IDB predicates, or other plugin predicates created by other plugin logic.
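
For instance, the following sketch binds the plugin-sales metric together with a hypothetical EDB predicate measure:returns, which is bound directly by name; the file definition registered under the name sales-returns is assumed to exist:

file_binding_by_name["sales-returns-edb"] = fb,
file_binding(fb) {
  file_binding_definition_name[] = "sales-returns",
  // metric generated by plugin logic
  predicate_binding_by_plugin_predicate["plugin-sales"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "SKU,STORE,WEEK,SALES"
    },
  // hypothetical EDB predicate bound by name
  predicate_binding_by_name["measure:returns"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "SKU,STORE,WEEK,RETURNS"
    }
}.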

23.5.3.3.1. Renaming parameters

Let's now consider the case where we want to pass in the measure_str parameter to both the plugin-sales and plugin-returns plugin logic.

In this case, we would define the file binding as:

file_binding_by_name["sales-returns"] = fb,
file_binding(fb) {
  file_binding_definition_name[] = "sales-returns", 
  predicate_binding_by_plugin_predicate_index[0, "plugin-measure"] =
    predicate_binding(_) {
      plugin_logic_param_rename["sales-measure-str"] = "measure_str",
      predicate_binding_columns[] = "WEEK,STORE,SKU,SALES"
    },
  predicate_binding_by_plugin_predicate_index[1, "plugin-measure"] =
    predicate_binding(_) {
      plugin_logic_param_rename["returns-measure-str"] = "measure_str",
      predicate_binding_columns[] = "WEEK,STORE,SKU,RETURNS"
    }
}.

The plugin_logic_param_rename predicates indicate that the returns-measure-str and sales-measure-str parameters should be renamed to measure_str before being passed to the plugin logic generator. This allows us to make the following request:

/sales-returns?sales-measure-str=...&returns-measure-str=...

23.6. Implementing Global ProtoBuf/JSON Services

Global protobuf services support broadcasting a request to a set of other services (usually partitions of a partitioned database) and combining the results of the individual services into a single response of the global service.

As an example, we will use a database of products. Product data is typically partitioned by product category in retail planning applications, which means that it might not be possible to easily find all products that satisfy properties that do not relate to product categories. For example, consider that an application needs a search facility to find all products with a given minimum price.

The database schema for products:

block(`schema) {

  export(`{

    product(x), product_id(x:s) -> string(s).
    product_price[x] = v -> product(x), uint[32](v).

  })

} <-- . 

The following protocol of the global service has a minimum price field on the request, and returns a list of products. The list of results is important here: the generic global protobuf service by default concatenates all results, which works particularly well for search services (less so for global aggregation services).

message SearchRequest
{
  required uint32 min_price = 1;
}

message SearchResponse
{ 
  repeated Product product = 1;
}

message Product
{
  required string description = 1;
  required uint32 price = 2;
} 

The services on the individual partitions of the distributed system can use the same protocol. The implementation of the local service is fairly straightforward. The first rule finds all products that match the search criteria, creates protobuf Product messages for these products, and collects them in results. The second rule creates a response for every request. The third rule populates the products of the response. Note that the separation between the second and the third rule is important; otherwise no response would be returned if a search does not match any products.

block(`search) {

  alias_all(`schema),

  clauses(`{

    lang:pulse(`results).
    results(p, req) -> Product(p), SearchRequest(req).

    lang:constructor(`cons).
    lang:pulse(`cons).
    cons[req] = resp -> SearchRequest(req), SearchResponse(resp).

    +results(x, req),
    +Product(x),
    +Product:description[x] = s,
    +Product:price[x] = actual
      <-
      +SearchRequest:min_price[req] = v,
      product_price[p] = actual,
      actual >= v,
      product_id[p] = s.

    +cons[req] = resp,
    +SearchResponse(resp),
    +SearchResponse:product[resp, i] = p
      <-
      +SearchRequest:min_price[req] = v,
      +results(p, req),
      +ProductId[p] = i.
  })

} <-- . 

The configuration of the global service is a bit more involved, because it specifies which services to target. The target services are specified by their URLs.

block(`service_config) {

  alias_all(`bloxweb:config:service),
  alias_all(`bloxweb:config:service_abbr),
  alias_all(`bloxweb:config:protobuf),
  alias_all(`bloxweb:config:protobuf_abbr),
  alias_all(`bloxweb:config:global_protobuf),
  alias_all(`bloxweb:config:global_protobuf_abbr),

  clauses(`{

    service_by_prefix["/protobuf-global-search/search"] = x,
    global_protobuf_service(x) {
      protobuf_protocol[] = "search",
      protobuf_request_message[] = "SearchRequest",
      protobuf_response_message[] = "SearchResponse",

      global_protobuf_target_uri("http://localhost:8080/protobuf-global-search/partition/1"),
      global_protobuf_target_uri("http://localhost:8080/protobuf-global-search/partition/2")
    }.

  })
} <--.

As an example, the following log illustrates a search across two partitions on a database of highly rated products on Amazon.

$ echo '{"min_price" : 30}' | bloxweb-client call-json 'http://localhost:8080/search'
-----------------  request (/search) -----------------
min_price: 30

----------------- response (/partition/1) -----------------
product { description: "Food Thermometer"         price: 97 }
product { description: "Gluten-free Pancake Mix"  price: 41 }
product { description: "Forehead Flashlight"      price: 32 }

----------------- response (/partition/2) -----------------
product { description: "Three Wolf Moon T-Shirt"  price: 35 }
product { description: "Portable Gas Grill"       price: 134 }

------------------- response (/search) --------------------
product { description: "Food Thermometer"         price: 97 }
product { description: "Gluten-free Pancake Mix"  price: 41 }
product { description: "Forehead Flashlight"      price: 32 }
product { description: "Three Wolf Moon T-Shirt"  price: 35 }
product { description: "Portable Gas Grill"       price: 134 }

Complete executable examples of global protobuf services are available in the bloxweb-samples package (see protobuf-global-*).

23.7. Implementing Delimited File Services

23.7.1. Introduction

Delimited file services are HTTP services that offer delimited files for download to export data from LogicBlox workspaces (GET) and support uploading delimited files for importing data (POST/PUT). BloxWeb provides a built-in handler for defining such delimited file services at a very high level. This section describes how to configure and use these services.

We will use a simple example of multi-dimensional sales data to introduce delimited file services. Consider a workspace with the following schema defined for the hierarchy and measures.

block(`hierarchy) {
  export(`{
    sku(x), sku_id(x:s) -> string(s).
    store(x), store_id(x:s) -> string(s).
    week(x), week_id(x:s) -> string(s).
  })
} <-- . 
block(`measure) {
  alias_all(`hierarchy),
  export(`{
    sales[x, y, z] = v -> sku(x), store(y), week(z), int[64](v).
  })
} <-- . 

For this application the customer uses a delimited file for sales data, as in the following example.

SKU     | STORE       | WEEK | SALES
apples  | atlanta     | W1   | 10
oranges | atlanta     | W2   | 15
apples  | portland    | W1   | 20
oranges | portland    | W2   | 5

We shall define a delimited file service to import data in this delimited file format into the sku, store, week and sales predicates, as well as a service to export from these predicates to a delimited file in this format.

A delimited file service is defined by three parts:

  • File definition which defines the format of a file, such as header names, column formats, optional columns, and the delimiter character that is used.

  • File binding which specifies how columns in a delimited file are bound to predicates in the workspace. This file binding is a high-level, bi-directional specification, which means that it can be used for both the import and the export.

  • Service configuration which defines the service to be hosted by the BloxWeb service container.

The BloxWeb programming interface for defining these parts follows.

23.7.2. Programming Interface

The BloxWeb handler for delimited file services uses predicates in the bloxweb:delim namespaces. File definitions are defined in bloxweb:delim:schema and predicate bindings in bloxweb:delim:binding. To avoid cluttering logic, and to make logic more readable, it is good practice to use aliases.

23.7.2.1. File Definition

A delimited file is defined by creating a bloxweb:delim:schema:file_definition element and populating interface predicates, and then saving this by name in bloxweb:delim:schema:file_definition_by_name. Example code:

block(`files) {
  alias_all(`bloxweb:delim:schema),
  alias_all(`bloxweb:delim:schema_abbr),

  clauses(`{
    file_definition_by_name["sales"] = fd,
    file_definition(fd) {
      file_delimiter[] = "|",
      column_headers[] = "SKU,STORE,WEEK,SALES",
      column_formats[] = "alphanum,alphanum,alphanum,integer"
    }.
  })
} <-- . 

Required file definition interface settings

  • file_delimiter (bloxweb:delim:schema) - Delimiter character.
  • column_headers (bloxweb:delim:schema_abbr) - Comma-separated list of file headers.
  • column_formats (bloxweb:delim:schema_abbr) - Comma-separated list of column formats (see the following table for supported column formats).

Optional file definition interface settings

  • file_columns_required (bloxweb:delim:schema_abbr) - Comma-separated list of required columns. Will make all columns that are not required optional.
  • file_column_required (bloxweb:delim:schema_abbr) - Set the column with this header as required. Will make all columns that are not required optional.
  • file_columns_optional (bloxweb:delim:schema_abbr) - Comma-separated list of optional columns.
  • file_column_optional (bloxweb:delim:schema_abbr) - Set the column with this header as optional.
  • file_columns_can_be_absent (bloxweb:delim:schema_abbr) - Comma-separated list of columns that can be optionally absent.
  • file_column_can_be_absent (bloxweb:delim:schema_abbr) - Set the column with this header to be optionally absent.
  • file_column_format (bloxweb:delim:schema_abbr) - Set the format of a column by its header.

Supported column formats

  • alphanum (string) - Alpha-numeric string that maps to a string.
  • integer (int[64]) - Integer number.
  • string (string) - Non-empty string (after trimming whitespace from left and right).
  • string* (string) - Possibly empty string. Cannot be used for an optional column. Whitespace is not trimmed.
  • 0+ (uint[64]) - Non-negative integer.
  • 1+ (uint[64]) - Positive integer.
  • float (float[64]) - Floating-point number.
  • boolean(t;f) (boolean) - Boolean value, where t and f are literal case-insensitive specifications of the expected format for true and false. For example, this could be (1;0), (t;f) or (true;false). Values that differ from these two options are considered invalid.
  • datetime(format) (datetime) - Datetime value, serialized to and from string with datetime:format and datetime:parse using the format string, e.g. datetime('%m/%d/%y'). See the documentation on built-ins.
  • date(format) (datetime) - Date value, serialized to and from string with datetime:format and datetime:parse using the format string, e.g. datetime('%m/%d/%y'). See the documentation on built-ins. The translation builds a datetime object in the database for the given date at 12:00 DST.

23.7.2.2. File Binding

The file binding for a delimited file is defined by creating a bloxweb:delim:binding:file_binding element, populating its interface predicates, and then saving it by name in bloxweb:delim:binding:file_binding_by_name. Example code with the most basic binding to one predicate:

block(`server_init) {
  alias_all(`bloxweb:delim:binding),
  alias_all(`bloxweb:delim:binding_abbr),

  clauses(`{
    file_binding_by_name["sales"] = fb,
    file_binding(fb) {
      file_binding_definition_name[] = "sales",
      predicate_binding_by_name["measure:sales"] =
        predicate_binding(_) {
          predicate_binding_columns[] = "SKU,STORE,WEEK,SALES"
        }
    }.
  })
} <-- .

This binding will support import to the sales predicate, assuming that entity elements already exist in sku, store, and week. It is common to optionally add elements to all entity types; this is defined by populating the bloxweb:delim:binding_abbr:file_binding_entity_creation predicate. Example:

    file_binding(fb) {
      ...
      file_binding_entity_creation[] = "accumulate",
      ...
    }

As shown in the example above, a file binding consists of one or more predicate bindings. File binding configurations can be applied to all predicate bindings, as in the entity creation example, or to individual predicate bindings.

Optional file binding settings applying to all predicate bindings

  • file_binding_predicate_columns (bloxweb:delim:binding_abbr) - Comma-separated list of column headers. Applies to all predicate bindings for this file binding.

  • file_binding_entity_creation (bloxweb:delim:binding_abbr) - Set entity creation for all predicate bindings of a file binding. Supported values are:

    • none: do not create entity elements. This is the default value if no entity creation is specified.

    • accumulate: add new elements that do not exist previously.

    If entity creation is configured on the file binding, then it is recursively applied to all predicate bindings in this file binding, and the setting on each predicate binding in turn applies to all of its column bindings.

  • file_binding_column_entity_creation (bloxweb:delim:binding_abbr) - Default entity creation from the file binding for one specific argument.

The index argument for options that apply to a particular argument is a zero-based index into the arguments of the predicate. For example, to enable entity creation for the sku and store entities, use:

    file_binding(fb) {
      ...
      file_binding_column_entity_creation[0] = "accumulate",
      file_binding_column_entity_creation[1] = "accumulate",
      ...
    }

Options applying to individual predicate bindings

  • predicate_binding_columns (bloxweb:delim:binding_abbr) - Comma-separated list of column headers. Multiple columns can be combined into one column binding by separating their headers with a semi-colon. A column binding transformation must be provided to combine multiple columns into one value.

  • predicate_binding_entity_creation (bloxweb:delim:binding_abbr) - Set entity creation for all column bindings of this predicate binding. See file_binding_entity_creation for the supported values.

  • predicate_binding_export (bloxweb:delim:binding) - Specify that this predicate should be ignored when constructing a .csv file. This is useful if imports populate auxiliary predicates that cannot provide a bi-directional transformation.

  • predicate_binding_filter (bloxweb:delim:binding) - Specify that a predicate binding acts as a filter on import. Note: filter predicate bindings do not allow entity creation.

  • column_binding_by_arg (bloxweb:delim:binding) - Column binding that captures how column(s) from a delimited file map to an argument of a predicate.

Column binding options

  • column_binding_transform (bloxweb:delim:binding) - Bi-directional transformation to apply to the value(s) for this column binding.

  • column_binding_entity_creation (bloxweb:delim:binding) - Set entity creation for this column binding. See file_binding_entity_creation for the supported values.

23.7.2.3. Column Transforms

It is often necessary or convenient to combine column values, or to modify column values, prior to writing values to predicates in the workspace. Column binding transforms provide a collection of built-in functions that can combine and transform column values. There are currently three transform functions: concat, lookup, and substring. The following is an example of a binding where column headers are combined with a column binding transform. Consider adding a year column to the data file; we will combine the week and year columns into one week refmode value on import, and decompose the week refmode value into week and year columns on export:

SKU     | STORE       | WEEK | YEAR | SALES
apples  | atlanta     | W1   | 2012 | 10
oranges | atlanta     | W2   | 2012 | 15
apples  | portland    | W1   | 2012 | 20
oranges | portland    | W2   | 2012 | 5

File binding:

    ...
    file_binding_by_name["sales"] = fb,
    file_binding(fb) {
      file_binding_definition_name[] = "sales",
      predicate_binding_by_name["sample:sales:sales"] =
        predicate_binding(_) {
          predicate_binding_columns[] = "SKU,STORE,YEAR;WEEK,SALES",
          column_binding_by_arg[2] =
            column_binding(_) {
              column_binding_transform[] = "concat(-)"
            }
        }
    }.
    ...

Here the concat transform composes a single week refmode value, using the '-' character as delimiter (for example, '2012-W1'), from the week and year columns on import, and on export decomposes the week refmode value into week and year column values.

Column transform functions and arguments
concat(delim_char)

Import: composes string concatenation of all column values for column binding using required delim_char argument.

Export: decomposes predicate value to column values for column binding using required delim_char argument.

lookup(p,[p_inv])

Import: uses the column value as key to look up in predicate p. Required argument p is the name of a functional predicate with exactly one key argument which has the type of the column format (string,int[32],float[64], etc). The second argument is optional for import.

Export: uses predicate value as key to look up column value in p_inv. Required argument p_inv is the name of a functional predicate with exactly one key argument which has the type of the predicate value. Both arguments are required for export.

substring(start,len)

Import: creates substring of column value that starts at character position start and is len characters long. Both arguments are required.

Export: not implemented. Note that this means the export will fail unless predicate export is specifically ignored with predicate_binding_export option.
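
As an illustration, the following sketch maps the WEEK column through a lookup table; the functional predicates week_by_label and label_by_week are hypothetical and would have to be provided by the application:

    ...
    predicate_binding_by_name["sample:sales:sales"] =
      predicate_binding(_) {
        predicate_binding_columns[] = "SKU,STORE,WEEK,SALES",
        column_binding_by_arg[2] =
          column_binding(_) {
            // import: look up the WEEK column value in week_by_label
            // export: look up the stored value in label_by_week
            column_binding_transform[] = "lookup(week_by_label,label_by_week)"
          }
      }
    ...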

Column transforms can be sequenced with a semi-colon separator. For example:

          ...
          predicate_binding_columns[] = "SKU,STORE,YEAR;WEEK,SALES",
          column_binding_by_arg[2] =
            column_binding(_) {
              column_binding_transform[] = "concat(-);lookup(p,p_inv)"
            }
          ...
      

On import this will first concatenate year and week column values and then look up the result in p; on export this will first look up the predicate value in p_inv and then decompose the result to create year and week column values.

23.7.2.4. Service Configuration

A delimited file service is configured with BloxWeb by creating a delim_service. Example:

block(`service_config) {

  alias_all(`bloxweb:config:service),
  alias_all(`bloxweb:config:service_abbr),
  alias_all(`bloxweb:config:delim),

  clauses(`{

    service_by_prefix["/sales"] = x,
    delim_service(x) {
      delim_file_binding[] = "sales"
    }.

  })

} <-- .

23.7.2.5. Using Service Configuration

23.7.2.5.1. Import

...

23.7.2.5.2. Export

...

23.7.2.6. Advanced Predicate Binding

TODO: filtering, optional columns, ...

TODO: default values. incremental.

23.7.2.7. Advanced Service Configuration

TODO: exclusive, read only txns. optional exec block, ...

23.7.2.8. Testing

TODO

23.7.2.9. Invalid Records

As of BloxWeb 3.10.2, records that fail to import are reported back to the user. The format of the report is the same as the import file, including headers. If no records failed to import, a file with only the headers is returned. This feature is enabled by default.

Returning error records does have a small performance penalty, as we must write and then return the file containing the bad records. If for some reason you wish to disable the feature, you can specify the --ignore-bad-records flag on bloxweb-client, or simply not specify an output_file or output_url in the batch. See the section Accessing via HTTP below for how to disable the feature when accessing via HTTP.

23.7.2.9.1. Causes

The resulting data that reports which records were not imported contains all the columns of the original import, plus two additional columns describing why the records were not imported. The first column, CAUSE, contains a human-readable string such as "SKU is a required column.". The last column, CAUSE_CODE, contains a constant string value identifying the error type for easy parsing. The error codes and their descriptions are listed below.

Error codes

  • REQUIRED_COLUMN - The column is defined as required but is missing from the file.
  • WRONG_FORMAT - The format of the data is incorrect.
  • DOES_NOT_EXIST - The entity or record referenced by the data does not exist.

23.7.2.9.2. Accessing via HTTP

If you are not using bloxweb-client or batch, you can still enable or disable this feature by using the return_errors query string parameter. Since the feature is enabled by default, simply accessing the URL as normal will return the bad records. If you do not wish to return the error records, set the return_errors query parameter to "0".

23.7.2.9.3. Aborting Transactions

You can optionally configure a service to abort transactions if any rows fail to import. The default is NOT to abort, i.e. good records are imported and bad records are ignored. You can configure this option per service or per individual request. To configure a service to abort transactions by default, assert the bloxweb:config:delim:abort_on_error predicate; all transactions for the configured service will then abort if any error records are produced. This setting can still be overridden by individual requests via the following methods:

  • If you are using a batch configuration, you can set the abort_on_error field of the ImportDelim message to true.

  • If you are using the bloxweb-client command line tool, you can specify the --abort flag.

  • If you are using raw HTTP, you can set the abort_on_error query parameter to "1" to abort, or "0" to not abort.

In each case the request-level setting takes precedence over the bloxweb:config:delim:abort_on_error predicate.
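
For example, a delimited file service that defaults to aborting might be configured as follows. This is a minimal sketch: it assumes that abort_on_error can be asserted as a parameterless setting on the service entity, analogous to the other bloxweb:config:delim settings; check the bloxweb:config:delim namespace for the exact form.

service_by_prefix["/sales"] = x,
delim_service(x) {
  delim_file_binding[] = "sales",
  // assumption: the exact form of asserting bloxweb:config:delim:abort_on_error may differ
  abort_on_error()
}.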

23.8. Dynamic Delimited File Services

While delimited file services allow us to statically configure a file binding and, thus, create one service for each file binding, we can also import and export delimited files by specifying the file binding as a request parameter named file_binding.

To host the dynamic delimited file service we use the dynamic_delim_service predicate, as shown below. This configuration allows us to export and/or import delimited files by accessing /delim?file_binding=..., where ... is a JSON representation of the FileBinding protobuf message.

service_by_prefix["/delim"] = x,
dynamic_delim_service(x)
  <- .

23.8.1. Building FileBinding Messages

A FileBinding message describes the file structure that we want to export (FileDefinition) and how to build the file from different predicates (PredicateBinding). The optional entity_creation field allows you to set the default entity creation policy for all columns of all predicate bindings.

message FileBinding
{
  required FileDefinition file = 2;

  repeated PredicateBinding binding = 3;
  optional string entity_creation = 4;
}

The FileDefinition describes the format of the file and the options on the columns. If the file_columns_required field is empty, all columns are considered required unless otherwise specified. Conversely, if the field is not empty, all columns not included in it are considered optional.

message FileDefinition
{
  required string delimiter = 2;
  required string column_headers = 3;
  required string column_formats = 4;

  optional string file_columns_required = 5;
  optional string file_columns_optional = 6;
  optional string file_columns_can_be_absent = 7;

}

Finally, a PredicateBinding specifies how a predicate's arguments bind to the columns in the file definition.

message PredicateBinding
{
  required string predicate_name = 1;
  required string predicate_binding_columns = 2;
  repeated ColumnBinding column = 3;
  
  optional bool export = 4 [default = true];
  optional bool filter = 5 [default = false];
  optional bool incremental = 6 [ default = false ];
  optional bool negated = 7 [default = false];

  optional bloxweb.internal.PluginLogic plugin_logic = 8;
  optional string entity_creation = 9;
}

Example 23.1. 

The following JSON file definition message describes a file with three columns, "PERSON", "FATHER", and "MOTHER".

"file": {
  "delimiter": "|",
  "column_headers": "PERSON,FATHER,MOTHER",
  "column_formats": "alphanum,alphanum,alphanum"
}

The following two PredicateBinding messages specify how to build the PERSON, FATHER, and MOTHER columns from two predicates person_father and person_mother:

"binding":[ {
    "predicate_name": "person_father",
    "predicate_binding_columns": "PERSON,FATHER"
  },
  {
    "predicate_name": "person_mother",
    "predicate_binding_columns": "PERSON,MOTHER"
  }]

Combining the messages, we build the file binding message that should be sent as the file_binding parameter to the dynamic delimited file service:

{
    "file": {
      "delimiter": "|",
      "column_headers": "PERSON,FATHER,MOTHER",
      "column_formats": "alphanum,alphanum,alphanum"
    },
    "binding": [
      {
        "predicate_name": "person_father",
        "predicate_binding_columns": "PERSON,FATHER"
      },
      {
        "predicate_name": "person_mother",
        "predicate_binding_columns": "PERSON,MOTHER"
      }]
}

23.8.2. Exporting Predicates Populated by Plugin Logic

For the case in which we know the name of the predicates that will be populated by plugin logic, the file binding is the same as the previous example.

To trigger the plugin logic generator, however, we pass another request parameter whose name starts with plugin_logic, containing a PluginLogic protobuf message in JSON format.

Below is the message spec for PluginLogic:

message PluginLogic {
  message Param {
    required string key = 1;
    required string value = 2;
  }
  message ParamRename {
    required string original_name = 1;
    required string renamed_name = 2;
  }

  required string name = 1;
  required string plugin_name = 2;
  repeated Param param = 3 [(blox.options.set) = true];
  repeated ParamRename rename_param = 4 [(blox.options.set) = true];
  repeated string allow_override = 5 [(blox.options.set) = true];
  repeated string allow_accumulate = 6 [(blox.options.set) = true];
}

Example 23.2. 

Thus, assuming that person_mother is a predicate populated by a plugin logic, we would pass a request parameter named plugin_logic_mother with the contents:

{
  "name": "plugin-ancestor",
  "plugin_name": "mother-plugin",
  "param": [
    {
      "key": "some-param",
      "value": "some-param-value"
    },
    {
      "key": "another-param",
      "value": "value"
    }]
}

When the plugin logic generator itself defines the predicate names, so that we do not know in advance which predicate will be populated by the generator, we instead pass the PluginLogic message as the plugin_logic field of the PredicateBinding message.

Example 23.3. 

Thus, for a hypothetical plugin logic that generates the predicate name, our file binding would be:

{
    "file": {
      "delimiter": "|",
      "column_headers": "PERSON,FATHER,MOTHER",
      "column_formats": "alphanum,alphanum,alphanum"
    },
    "binding": [
      {
        "predicate_name": "person_father",
        "predicate_binding_columns": "PERSON,FATHER"
      },
      {
        "predicate_name": "person-mother-unknown-predicate",
        "predicate_binding_columns": "PERSON,MOTHER",
        "plugin_logic": {
           "name": "plugin-ancestor",
           "plugin_name": "mother-plugin",
           "param": [
             {
               "key": "some-param",
               "value": "some-param-value"
             },
             {
               "key": "another-param",
               "value": "value"
             }]
         }
      }]
}

23.8.3. Exporting Measure Service Data using Dynamic Delimited Files

The following is a sample file_binding parameter to export sales and returns in a delimited file with header SKU|WEEK|STORE|SALES|RETURNS filtered by max sales and max returns. In this example the plugin logic configuration specified in measure_str assumes there is a measure service data set with Sales and Returns metrics with dimensions product, calendar, and location.

{
  "file": {
    "delimiter": "|",
    "column_headers": "SKU,WEEK,STORE,SALES,RETURNS",
    "column_formats": "alphanum,alphanum,alphanum,float,float"
  },
  "binding": [
    {
      "predicate_name": "sales",
      "predicate_binding_columns": "SKU,WEEK,STORE,SALES",
      "plugin_logic": {
        "name": "sales"
        "plugin_name": "measure"
        "param": [
          {
            "key": "measure_str"
            "value": "filter Sales by <= max_sales : float"
          },
          {
            "key": "max_sales"
            "value": "100"
          }]
      }
    },
    {
      "predicate_name": "returns",
      "predicate_binding_columns": "SKU,WEEK,STORE,RETURNS",
      "plugin_logic": {
        "name": "returns",
        "plugin_name": "measure",
        "param": [
          {
            "key": "measure_str"
            "value": "filter Returns by <= max_returns : float"
          },
          {
            "key": "max_returns"
            "value": "100"
          }]
      }            
    }]
}

23.9. Configuring Proxy Services

BloxWeb offers two kinds of proxy services: exact proxies and transparent proxies. An exact proxy forwards a request to one specific service to a different service, possibly hosted on a different machine. A transparent proxy forwards all requests sent to a certain URL prefix to a given host.

Exact Proxies.  The following example (see bloxweb-samples/protobuf-proxy-auth) illustrates how to configure an authenticated exact proxy service.

block(`service_config) {

  alias_all(`bloxweb:config:service),
  alias_all(`bloxweb:config:service_abbr),
  alias_all(`bloxweb:config:auth),
  alias_all(`bloxweb:config:auth_abbr),
  alias_all(`bloxweb:config:proxy),

  clauses(`{

    service_by_prefix["/time-proxy-auth"] = x,
    exact_proxy(x) {
      proxy_target[] = "http://localhost:8080/time-unauth",
      auth_realm[] = "time_auth_proxy"
    }.

    realm_by_name["time_auth_proxy"] = x,
    realm(x) {
      realm_config[] = "default-password"
    }.

  })
} <-- .

Transparent Proxies.  The following example illustrates how to use a transparent proxy. A request to /promo/foo on this host will be forwarded to http://example.com/foo. The host option is used for virtual host support. By setting the host, the HTTP header Host will be set in the forwarded request. The prefix option indicates what prefix of the original URL needs to be removed from the forwarded request.

block(`service_config) {

  alias_all(`bloxweb:config:service),
  alias_all(`bloxweb:config:service_abbr),
  alias_all(`bloxweb:config:proxy),

  clauses(`{

    service_by_prefix["/promo/*"] = x,
    transparent_proxy(x) {
      proxy_target[] = "http://example.com",
      proxy_host[] = "example.com",
      proxy_prefix[] = "/promo"
    }.

  })
} <-- . 

Authentication.  BloxWeb proxies currently do not support proxying to authenticated services. For stateful session authentication this is not possible; for RSA-SHA512 authentication there are some small issues preventing this from working.

23.10. Implementing Custom Services

Custom services need to be implemented in Java as implementations of the BloxWeb Handler interface. The Handler interface is similar to the standard HttpServlet class.

23.10.1. Custom ProtoBuf Services

Custom ProtoBuf services have a ProtoBuf interface, but process the JSON or ProtoBuf messages in a different way than importing them straight into a LogicBlox workspace, as the normal ProtoBuf services do. A few abstractions are available to help with the implementation of such services.

General Custom ProtoBuf Services

The ProtoBufHandler class helps with the input and output aspects of protobuf services. It makes no assumptions on what is done with the messages. This class implements support for gzip compression and handling JSON-formatted messages. It also logs requests and responses based on the configuration of the server, and handles error reporting of incorrect messages.

All ProtoBuf services should use this abstraction.

Subclasses of the ProtoBufHandler need to implement the method handle(Exchange, ProtoBufExchange). The ProtoBufExchange class is used to manage the parsing of the request message and communication of the response message. The goal of the ProtoBufExchange class is to make sure that messages do not get parsed repeatedly. The subclass of ProtoBufHandler can obtain the request message using protoExchange.getRequestMessage(). In return, the subclass is required to set the response message on the ProtoBufExchange. The response can be set in two ways:

protoExchange.setResponseMessage(msg)
protoExchange.setResponseBytes(bytes)

If implementations work with Message objects, then the preferred way of setting the response is setResponseMessage, because this avoids having to parse the bytes again in case a JSON response is needed or the message needs to be logged. The bytes variant is preferred if a subclass only has a binary serialization of the message, since this avoids parsing the message when it does not need to be logged and the response is not formatted as a JSON message.

By default, all Message objects are instances of DynamicMessage. The DynamicMessage objects are created using the descriptors that are in the workspace. This means that the message cannot be cast to classes generated by the ProtoBuf compiler (protoc). To address this, implementations can override two more methods on the ProtoBufHandler:

protected Message.Builder getRequestBuilder()
protected Message.Builder getResponseBuilder()

This helps the performance of the service, and it also allows the subclass to cast messages to the generated message classes.

ProtoBuf Services using ConnectBlox

The AbstractProtoBufHandler, which extends ProtoBufHandler, implements support for ProtoBuf services that are implemented by executing ConnectBlox requests. The subclasses determine what actual ConnectBlox requests to execute. This abstraction helps with the correct execution of ConnectBlox requests, handling of errors that might be triggered by the ConnectBlox request, and instrumenting ConnectBlox requests to handle correlation with database logs and monitoring predicate changes. Implementations based on the AbstractProtoBufHandler need to implement two methods: buildTransaction, to construct the ConnectBlox transaction to execute, and buildResponse, to extract a ProtoBuf response from a ConnectBlox response.

ProtoBuf services that use ConnectBlox should use this abstraction.

23.11. Authentication

By default, BloxWeb services do not require authentication. They can, however, be restricted to authenticated users if the service configuration includes an authentication realm to which BloxWeb has access. Just like a service, an authentication realm is specified in a workspace according to the BloxWeb protocol.

BloxWeb comes with two default realm configurations, which are types of realms that can be instantiated in workspaces. The use of realms is advised to avoid potential security problems and help with future compatibility issues. The two realm configurations are default-password, for stateful authentication with usernames and passwords, and default-signature, for stateless authentication with RSA-SHA512 signatures. BloxWeb also supports SAML Authentication to rely on third parties to authenticate users.

23.11.1. Credentials Database and Service

BloxWeb contains a library that implements local storage of all user-related data, such as usernames, passwords, public keys, email addresses and locales. This library can be used in applications by including the library bloxweb_credentials in a project.

The bloxweb command-line tool has support for administration of users. It supports the following commands:

$ bloxweb import-users users.dlm
$ bloxweb export-users users.dlm
$ bloxweb set-password john.smith
enter password for user 'john.smith':
$ bloxweb list-users
john.smith
james.clark

The library will cause the workspace that includes it to host the following delimited-file service:

  • /admin/users for files with headers:
    • USER - required
    • DEFAULT_LOCALE - optional
    • EMAIL - optional
    • ACTIVE - optional
    • PASSWORD - optional
    • PUBLIC_KEY - optional

The PASSWORD column is required to be a bcrypt hash of the clear text password. Importing plain text passwords to these services will not work. We enforce the usage of separate bcrypt hashing to discourage transferring and storing files with plain text passwords. Passwords can be hashed using:

$ echo "password" | bloxweb-client bcrypt
$2a$10$f3yGStD8yx57jXtZixP4bOtVuTjlRCLJl1HnCwy6HuB60cmJN799i

Or by giving the bloxweb-client bcrypt command a full file:

$ cat passwords.txt 
aaa
bbb
$ bloxweb-client bcrypt -i passwords.txt 
$2a$10$2f1UDZiqp1yHSQpwL5ciYe69lKiA.n22pvhPVtmBI/B6vB7JzPzAW
$2a$10$B4lyNNnWhjtBgmk9lOUi8Ofq2N.GrEjOX8N1XZ0A7gECpOCsO3J0m

23.11.2. Stateful Authentication using Passwords and Sessions

BloxWeb comes with a default realm configuration for authentication using passwords.

  • The realm configuration uses a session key that is stored in a cookie that expires after a reasonable amount of time. The default name of the session key is "lb-session-key_" plus the realm name. This value can be configured by the realm declaration.

  • The realm configuration stores passwords using the secure bcrypt hashing algorithm with salt. It uses a service to retrieve the credentials from a workspace; by default this is /admin/users. This service cannot currently be proxied to a different machine, but support for this is planned.

To use the default password realm configuration, configure a service as follows:

block(`service_config) {

  alias_all(`bloxweb:config:service),
  alias_all(`bloxweb:config:service_abbr),
  alias_all(`bloxweb:config:protobuf),
  alias_all(`bloxweb:config:protobuf_abbr),
  alias_all(`bloxweb:config:auth),
  alias_all(`bloxweb:config:auth_abbr),

  clauses(`{

    service_by_prefix["/time"] = x,
    default_protobuf_service(x) {
      ...
      auth_realm[] = "realm_name"
    }.

    realm_by_name["realm_name"] = x,
    realm(x) {
      realm_config[] = "default-password",
      realm_session_key[] = "time_session_key"
    }.
  })

} <-- .

This will declare a realm called "realm_name" using the default password configuration. In order to be authenticated in this realm, a client needs to send a request to the login service of BloxWeb. The login service is a BloxWeb service that accepts JSON or protobuf requests defined by the following protocol:

message Request
{
  required string realm = 1;
  optional string username = 2;
  optional string password = 3;
  optional bool logout = 4;
}

message Response
{
  required bool success = 1;
  optional string key = 2;
  optional string exception = 3;
}

Upon successful login, a key is returned to the client to be stored in a cookie. Since this realm overrides the default session key name, the cookie will be named "time_session_key" (instead of the default "lb-session-key_realm_name"). In the server, a security context is created which will allow subsequent requests with this cookie, or with an HTTP header of the same name, to access services in this realm.

The security context will expire when more than seconds_to_live seconds elapse without a request using this security context, or after a call to the login service with the key for the context as a cookie or HTTP header and the attribute logout set to true.

23.11.3. Stateless Authentication using RSA-SHA

For services that are not accessed from a browser, but instead are used from non-browser applications deployed on some machine, LogicBlox advises using an authentication mechanism called RSA-SHA512 (for reference, the closely related HMAC-SHA1 and RSA-SHA1 are more commonly used, but these have weaker security properties and key management complications).

Clients of the web service compute a string-to-sign based on a hash of the content of a request and some important aspects of the HTTP headers, such as the HTTP method, query-string, etc. The goal of this string-to-sign is to be specific enough so that the HTTP request cannot be altered by an attacker to send a critically different request to the server with the same signature. Clients of the web service compute a signature using their private key and the string-to-sign. This signature is passed to the web-service in the standard header used for HTTP authentication. Although we do use SSL sockets, the signature does not expose any secret information if intercepted.

On the server-side, the web service computes the same string-to-sign and verifies with the public key of the user that the client originally did use the correct private key to sign the request. We also verify that the date of the request is within a certain skew (to be determined, but probably 15 minutes).

This method of authentication and ensuring message integrity is similar to the authentication methods of many current mainstream, highly security-sensitive web services, notably Google Apps and the AWS S3 REST authentication scheme, called HMAC-SHA1 (see http://docs.amazonwebservices.com/AmazonS3/latest/dev/RESTAuthentication.html). LogicBlox uses asymmetric cryptography (public/private keys) rather than symmetric cryptography (private keys on both client and server), as is the case in HMAC-SHA1. This reduces the problem of secure key management, because our servers will only contain the public key of the client.

23.11.3.1. Usage

Generate an RSA key-pair as follows:

$ openssl genrsa -out priv.pem 2048
$ openssl rsa -in priv.pem -out pub.pem -pubout
$ openssl pkcs8 -topk8 -inform PEM -outform PEM -in priv.pem -out priv-pkcs.pem -nocrypt

The priv-pkcs.pem key is intended for deployment on the client-side (the server of a client using the BloxWeb service). The BloxWeb client library supports reading this private key format directly. The pub.pem file should be deployed on the server hosting the service to authenticate the user.
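
A realm using this mechanism is declared in the workspace like any other realm, using the default-signature configuration mentioned earlier (a minimal sketch; the realm name rsa_realm is arbitrary):

realm_by_name["rsa_realm"] = x,
realm(x) {
  realm_config[] = "default-signature"
}.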

See protobuf-auth-rsa in the bloxweb-samples package.

23.11.4. Customizing Authentication Realms and Mechanisms

An authentication realm is an instance of an authentication realm configuration, and it is uniquely identified by its name in a BloxWeb instance. If multiple realms are configured with the same name, a warning is raised and the realms are not hosted. The realm accepts several configuration options, which are listed in the realm configuration section of bloxweb.config. For example, this code can be used to set the session timeout for a default-password realm to be one hour:

realm_by_name["time_auth_proxy"] = x,
realm(x) {
  realm_config[] = "default-password",
  realm_option["seconds_to_live"] = "3600"
}.

23.11.5. Using Authenticated User Information in Services

The authenticated user for a service call is available in the pulse predicate bloxweb:auth:username[] = username.
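
For example, a service's active logic could record which users have called it. This is a minimal sketch; users_seen is a hypothetical predicate declared as users_seen(u) -> string(u).

// must be installed as active logic, since bloxweb:auth:username is only available there
+users_seen(u) <- bloxweb:auth:username[] = u.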

Computation of the bloxweb:auth:username predicate is implemented in active logic. This means that the authenticated user is currently not available in inactive blocks. The HTTP header information is available though, so it is possible to directly use the HTTP headers to obtain information on the authenticated user.

Security Considerations.  The authenticated user information should only be trusted if the service is actually authenticated. For services that do not have authentication configured, clients can claim to be any user, because the information is based on an HTTP header (x-logicblox-username). The authentication facilities initialize this information for each request.

Supported Services.  The authenticated user is available to protobuf as well as delimited file services. If a proxy service is authenticated, then the authenticated user information is forwarded to the target service. Custom service handlers need to make sure to import the HTTP request information to make this information available.

23.11.6. SAML Authentication

SAML is an XML standard for applications to securely communicate authentication and authorization information. The most common application of SAML is browser single sign-on (SSO), which lets web applications use a trusted, external application to manage user information and execute the authentication procedure. The web application is commonly called the service provider, or SP. The external application that manages users and executes authentication is called the identity provider, or IDP. The key advantage of single sign-on is that the service provider never gets to see the secret credentials of the user, and user information does not need to be separately maintained for every web application.

BloxWeb has native support for SAML 2.0 authentication. SAML is really a collection of different authentication methods, though, and not all of these methods make sense for BloxWeb. The remainder of this paragraph clarifies the level of support for SAML experts. BloxWeb only uses the Authentication Request Protocol. BloxWeb submits the authentication request using an HTTP redirect to the identity provider (profile urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect). The identity provider is instructed to use an HTTP POST to the assertion consumer service hosted by BloxWeb (profile urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST).

SAML authentication is enabled by including a [saml:somename] section in the bloxweb.config file. The SAML section supports the following configuration options:

Required settings

  • request - Path of URL for triggering the initial authentication request to the identity provider. This is the URL that users need to visit to trigger authentication. For example, if the path is configured to be /sso/request, then on a local machine the user would need to visit http://localhost:8080/sso/request. From this URL they will be redirected to the identity provider. If the user was already authenticated (for example with a cookie for the identity provider), then the identity provider will normally immediately send the user back to the application. Due to the automatic sequence of redirects, this entry-point can be configured as the front-page of the application.

  • response - Path of URL that the identity provider will use to confirm successful authentication. It is important that this setting is consistent with the assertion_consumer_service configuration.

  • redirect - URL to redirect the user to after successful authentication. For testing purposes it can be helpful to use /login/user?realm=test, which shows the current username.

  • realm - Name of the authentication realm that this SAML configuration will authenticate users to. Services that use this realm can be accessed by the user after successful authentication. The realm is separately configured in the workspace as a normal BloxWeb authentication realm. It has to use the realm configuration default-saml.

  • meta - Path of URL for hosting XML metadata for the service provider (an EntityDescriptor) that can be used by some identity providers to register the service provider.

  • alias_idp - Alias name given to the identity provider. This alias is used to find a certificate (.cer format) for the identity provider. It is only used by BloxWeb and is not used in the exchange of information with the identity provider.

  • alias_sp - Alias name given to this service provider. This alias is used to find public/private keys (.pem format) for the service provider. It is only used by BloxWeb and is not used in the exchange of information with the identity provider.

  • entity_id_idp - Identity of the identity provider. Depending on the registration procedure with the identity provider, this can normally be found in the XML metadata for the identity provider, as the entityID attribute of the EntityDescriptor element.

  • entity_id_sp - Identity of the service provider. This is the name used by the identity provider to refer to the service provider, and is typically agreed upon in some way during the registration procedure of the service provider with the identity provider. This can in simple cases be identical to the alias_sp. In some cases identity providers have specific requirements for the id, such as URIs, in which case the entity_id_sp cannot be the same as the alias_sp.

  • sso_service - URL hosted by the IDP that a user will be redirected to after visiting the request URL explained previously. This can normally be found in the XML metadata of the identity provider. Be careful to select the Location for the redirect binding, for example: <SingleSignOnService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect" Location="https://idp.testshib.org/idp/profile/SAML2/Redirect/SSO"/>

  • assertion_consumer_service - Full URL that the IDP will use to confirm successful authentication. The path of the URL should be the same as the response path, unless a proxy is used that rewrites paths.

  • attribute_map - List of pairs, where the first element is the local identifier to use to refer to a certain attribute of a user, and the second element is the standard object identifier, which is the recommended way to refer to attributes returned by the IDP to the SP. The local identifier uid is special in BloxWeb and will be used as the username. This is the only required attribute. See the example configuration below for how to determine what attributes are available.

Optional settings

  • keydir - Path of directory where encryption keys and certificates are kept. Defaults to the global BloxWeb key directory.

  • log_level - Log level for the SAML libraries. Possible values are: off, error, warning, and message. Defaults to warning.

  • logdir - Directory for log files used by the SAML libraries. Defaults to the global log directory (LB_DEPLOYMENT_HOME/logs/current).
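
In the workspace, the realm named in the realm setting is declared like any other BloxWeb realm, using the default-saml configuration (a sketch; the realm name test matches the /login/user?realm=test example above):

realm_by_name["test"] = x,
realm(x) {
  realm_config[] = "default-saml"
}.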

In addition to the configuration options, the following cryptographic keys are needed:

  • alias_idp.cer - Certificate of the IDP. The certificate is used to verify that authentication confirmations really do originate from the intended identity provider. The certificate is not a secret, and is usually available in the XML metadata of the identity provider. This file should contain something similar to:

    -----BEGIN CERTIFICATE-----
    MIIEDjCCAvagAwIBAgIBADANBgkqhkiG9w0BAQUFADBnMQswCQYDVQQGEwJVUzEV
    ...
    8K/qhmFT2nIQi538n6rVYLeWj8Bbnl+ev0peYzxFyF5sQA==
    -----END CERTIFICATE-----

  • alias_sp.pem - Private key for the service provider. This is used to sign the request to the identity provider. The identity provider uses a certificate generated from this key (see next item) to validate that the authentication request does indeed originate from the registered service provider. This file should contain something similar to:

    -----BEGIN PRIVATE KEY-----
    MIICdwIBADANBgkqhkiG9w0BAQEFAASCAmEwggJdAgEAAoGBANXOgiM+NVoAIiZY
    ...
    Opkj9UzXZ+nvzfk=
    -----END PRIVATE KEY-----

  • alias_sp.cer - Certificate file for the service provider. The format is the same as the certificate for the identity provider.

23.11.6.1. Testing SAML with TestShib

This section describes step by step how to use a free online SAML testing service called TestShib with BloxWeb. Clearly this service should never be used in any actual application, but it is useful as an exercise in deploying SAML, without having to install and configure an identity provider as well.

Choose an Alias.  First, decide on an alias for your web application, which we will refer to as the service provider (SP). This example will use logicblox-abc from now on. Do pick a different alias, because the TestShib testing service will use this as the account name, and different users of this guide would interfere with each other!

Private Key.  We begin with generating a private key using the s3lib-keygen tool (which uses OpenSSL). This private key is used by the service provider to sign requests to the identity provider, which can in this way confirm that authentication requests only originate from approved service providers.

$ s3lib-keygen logicblox-abc

Certificate.  While the SAML standard could have chosen to simply hand the public key of the key pair to the identity provider, it instead uses certificates to obtain additional evidence about the identity of the service provider. Therefore, we next need to create a certificate for the private key. In this example we use a self-signed certificate. Note that the certificate is never used in a browser, so the increasingly severe browser warnings for self-signed certificates are not a concern here. The openssl tool will ask for a few pieces of information, which do not matter for this example; simply hitting enter is sufficient.

$ openssl req -new -x509 -key logicblox-abc.pem -out logicblox-abc.cer -days 1095

If you later want to inspect certificate files (either the one just generated, or the one from the IDP that we will obtain later), then you can use openssl as well:

$ openssl x509 -in logicblox-abc.cer -text -noout

Key Deployment.  Copy the logicblox-abc.pem and logicblox-abc.cer files to your standard key directory, which normally is $HOME/.s3lib-keys.

$ cp logicblox-abc.cer logicblox-abc.pem ~/.s3lib-keys

Collect IDP Metadata.  Before we can configure BloxWeb we need to collect some information on the identity provider. The TestShib configuration instructions link to an XML document that describes the TestShib services: testshib-providers.xml. The XML document contains two EntityDescriptors: an identity provider (for testing service providers, which is what we are doing here), and a service provider (for testing identity providers, which is not the purpose of this guide). We need to collect three pieces of information:

  • The entityID of the IDP, which can be found in this XML tag: <EntityDescriptor entityID="https://idp.testshib.org/idp/shibboleth">. We will use this value in the configuration of BloxWeb.

  • The URL to redirect users to when authentication is needed, which can be found in this XML tag: <SingleSignOnService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect" Location="https://idp.testshib.org/idp/profile/SAML2/Redirect/SSO"/>. We will use this value in the configuration of BloxWeb.

  • The certificate of the IDP, which the service provider will use to validate that it is not being tricked by somebody posing as the IDP. The certificate can be found as the first <KeyDescriptor> in the <IDPSSODescriptor> tag. You need to copy the content of <ds:X509Certificate>...</ds:X509Certificate> and create a file testshib.cer that looks like the following example (which is actually the TestShib certificate). Make sure that the file is formatted exactly in this way.

    -----BEGIN CERTIFICATE-----
    MIIEDjCCAvagAwIBAgIBADANBgkqhkiG9w0BAQUFADBnMQswCQYDVQQGEwJVUzEV
    MBMGA1UECBMMUGVubnN5bHZhbmlhMRMwEQYDVQQHEwpQaXR0c2J1cmdoMREwDwYD
    VQQKEwhUZXN0U2hpYjEZMBcGA1UEAxMQaWRwLnRlc3RzaGliLm9yZzAeFw0wNjA4
    MzAyMTEyMjVaFw0xNjA4MjcyMTEyMjVaMGcxCzAJBgNVBAYTAlVTMRUwEwYDVQQI
    EwxQZW5uc3lsdmFuaWExEzARBgNVBAcTClBpdHRzYnVyZ2gxETAPBgNVBAoTCFRl
    c3RTaGliMRkwFwYDVQQDExBpZHAudGVzdHNoaWIub3JnMIIBIjANBgkqhkiG9w0B
    AQEFAAOCAQ8AMIIBCgKCAQEArYkCGuTmJp9eAOSGHwRJo1SNatB5ZOKqDM9ysg7C
    yVTDClcpu93gSP10nH4gkCZOlnESNgttg0r+MqL8tfJC6ybddEFB3YBo8PZajKSe
    3OQ01Ow3yT4I+Wdg1tsTpSge9gEz7SrC07EkYmHuPtd71CHiUaCWDv+xVfUQX0aT
    NPFmDixzUjoYzbGDrtAyCqA8f9CN2txIfJnpHE6q6CmKcoLADS4UrNPlhHSzd614
    kR/JYiks0K4kbRqCQF0Dv0P5Di+rEfefC6glV8ysC8dB5/9nb0yh/ojRuJGmgMWH
    gWk6h0ihjihqiu4jACovUZ7vVOCgSE5Ipn7OIwqd93zp2wIDAQABo4HEMIHBMB0G
    A1UdDgQWBBSsBQ869nh83KqZr5jArr4/7b+QazCBkQYDVR0jBIGJMIGGgBSsBQ86
    9nh83KqZr5jArr4/7b+Qa6FrpGkwZzELMAkGA1UEBhMCVVMxFTATBgNVBAgTDFBl
    bm5zeWx2YW5pYTETMBEGA1UEBxMKUGl0dHNidXJnaDERMA8GA1UEChMIVGVzdFNo
    aWIxGTAXBgNVBAMTEGlkcC50ZXN0c2hpYi5vcmeCAQAwDAYDVR0TBAUwAwEB/zAN
    BgkqhkiG9w0BAQUFAAOCAQEAjR29PhrCbk8qLN5MFfSVk98t3CT9jHZoYxd8QMRL
    I4j7iYQxXiGJTT1FXs1nd4Rha9un+LqTfeMMYqISdDDI6tv8iNpkOAvZZUosVkUo
    93pv1T0RPz35hcHHYq2yee59HJOco2bFlcsH8JBXRSRrJ3Q7Eut+z9uo80JdGNJ4
    /SJy5UorZ8KazGj16lfJhOBXldgrhppQBb0Nq6HKHguqmwRfJ+WkxemZXzhediAj
    Geka8nz8JjwxpUjAiSWYKLtJhGEaTqCYxCCX2Dw+dOTqUzHOZ7WKv4JXPK5G/Uhr
    8K/qhmFT2nIQi538n6rVYLeWj8Bbnl+ev0peYzxFyF5sQA==
    -----END CERTIFICATE-----

    You can check if the certificate is saved correctly by printing the certificate info:

    openssl x509 -in testshib.cer -text -noout

    This should include the following: Issuer: C=US, ST=Pennsylvania, L=Pittsburgh, O=TestShib, CN=idp.testshib.org. Now copy this certificate to the standard key directory as well:

    $ cp testshib.cer ~/.s3lib-keys

Minimal BloxWeb Configuration.  Now we have all the information needed to configure BloxWeb. Create a configuration file bloxweb-testshib.config (or alternatively put this directly in LB_DEPLOYMENT_HOME/config/bloxweb.config).

[saml:testshib]
request = /sso/request
response = /sso/response
redirect = /login/user?realm=test
realm = test
meta = /sso/metadata

alias_idp = testshib
alias_sp = logicblox-abc

entity_id_sp = logicblox-abc
entity_id_idp = https://idp.testshib.org/idp/shibboleth

sso_service = https://idp.testshib.org/idp/profile/SAML2/Redirect/SSO
assertion_consumer_service = http://localhost:8080/sso/response

attribute_map = uid urn:oid:0.9.2342.19200300.100.1.1

Except for attribute_map, all the configuration settings have been discussed in the previous steps. The attribute_map here is only an initial attempt: urn:oid:0.9.2342.19200300.100.1.1 is the standard object identifier for user identifiers and is used by all SAML providers. The IDP will offer more information on the user, but these attributes are not formally documented, so we need to trigger an authentication request before we can discover them. Some SAML identity providers do publish this information in their documentation.

Start BloxWeb. 

$ bloxweb start bloxweb-testshib.config

Register Service Provider with TestShib.  BloxWeb is now hosting an XML metadata file at the configured meta path. This file can be used to register the service provider with TestShib. Download this file and store it as logicblox-abc.xml. You can do this either using a browser or with the following command.

$ curl http://localhost:8080/sso/metadata > logicblox-abc.xml

On the TestShib website the service provider can now be registered with this XML file. Visit the metadata upload form for this and upload the logicblox-abc.xml file.

First Authentication.  Everything is now set up to attempt the first authentication. Point your browser at http://localhost:8080/sso/request. This will redirect to the TestShib website, where you can log in with one of the suggested accounts. Pick the username and password myself. After confirming, the browser will return to the BloxWeb-hosted application. Most likely, you will now see the error Could not find realm : test. This indicates that the SAML request was processed correctly, but that there is no authentication realm named test. The error appears simply because this guide does not cover hosting an actual service with a realm named test. While the error is perhaps unsatisfying, it means that the SAML configuration was successful (TODO: working on supporting static configuration of realms so we can get a fully successful authentication).

Configuring more attributes.  In the bloxweb.log or the terminal (depending on how you started bloxweb), two tables are printed as the result of this authentication request. The first table corresponds to the actual configured attribute_map, which currently only contains uid.

|------------------------------------------------------------------------------|
| name                           | value                                       |
|------------------------------------------------------------------------------|
| uid                            | myself                                      |
|------------------------------------------------------------------------------|

The second table lists all attributes returned by the IDP.

|---------------------------------------------------------------------------------------------------------------|
| oid                                | friendly name              | value                                       |
|---------------------------------------------------------------------------------------------------------------|
| urn:oid:0.9.2342.19200300.100.1.1  | uid                        | myself                                      |
| urn:oid:1.3.6.1.4.1.5923.1.1.1.1   | eduPersonAffiliation       | Member                                      |
| urn:oid:1.3.6.1.4.1.5923.1.1.1.6   | eduPersonPrincipalName     | myself@testshib.org                         |
| urn:oid:2.5.4.4                    | sn                         | And I                                       |
| urn:oid:1.3.6.1.4.1.5923.1.1.1.9   | eduPersonScopedAffiliation | Member@testshib.org                         |
| urn:oid:2.5.4.42                   | givenName                  | Me Myself                                   |
| urn:oid:1.3.6.1.4.1.5923.1.1.1.7   | eduPersonEntitlement       | urn:mace:dir:entitlement:common-lib-terms   |
| urn:oid:2.5.4.3                    | cn                         | Me Myself And I                             |
| urn:oid:1.3.6.1.4.1.5923.1.1.1.10  | eduPersonTargetedID        | <saml2:NameID Format="urn:oasis:names:tc... |
| urn:oid:2.5.4.20                   | telephoneNumber            | 555-5555                                    |
|---------------------------------------------------------------------------------------------------------------|

Based on this information we can now extend the attribute_map configuration to have more attributes that are relevant to the service provider. For this, modify bloxweb-testshib.config to use the following setting for attribute_map:

attribute_map = \
  uid urn:oid:0.9.2342.19200300.100.1.1 \
  cn urn:oid:2.5.4.3 \
  phone urn:oid:2.5.4.20 \
  sn urn:oid:2.5.4.4 \
  givenName urn:oid:2.5.4.42

Note that BloxWeb does not currently support exposing user attributes to service implementations, so configuring additional attributes is of limited use for now, but we expect this to be supported soon. After restarting BloxWeb, the next authentication request will show a more detailed attribute table:

|------------------------------------------------------------------------------|
| name                           | value                                       |
|------------------------------------------------------------------------------|
| uid                            | myself                                      |
| phone                          | 555-5555                                    |
| sn                             | And I                                       |
| cn                             | Me Myself And I                             |
| givenName                      | Me Myself                                   |
|------------------------------------------------------------------------------|

23.12. Extensions

23.12.1. Email Service

The automatic configuration of the email service supports handler, protocol, request and response message.

The email service is a BloxWeb service which accepts a JSON or protobuf request defined by the following protocol:

message SendEmailRequest
{
  required string from = 1;
  repeated string reply_to = 2;

  repeated string to = 3;
  repeated string cc = 4;
  repeated string bcc = 5;

  required string subject = 6;
  required string body = 7;
}
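
For illustration, a JSON request matching this protocol can be put together with a few lines of Python; the addresses, subject, and body below are placeholders, and the resulting document is what a client would POST to the configured email service:

import json

# A minimal SendEmailRequest body; all values are placeholders.
# Only "from", "subject", and "body" are required by the protocol.
request = {
    "from": "noreply@example.com",
    "to": ["someone@example.com"],
    "subject": "Hello from BloxWeb",
    "body": "This message was sent via the email service.",
}
print(json.dumps(request, indent=2))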

23.12.1.1. Service configuration via SMTP and SES

The email service can be configured either using SMTP or Amazon Simple Email Service (SES).

Configuring an email service using SES is straightforward, as the example below illustrates. The SES configuration uses the AWS SES API directly; it is therefore possible to use IAM (Amazon's Identity and Access Management) roles for authentication.

block(`service_config) {

  alias_all(`bloxweb:config:service),
  alias_all(`bloxweb:config:service_abbr),

  alias_all(`bloxweb:email:service_config),

  clauses(`{

    /**
     * Email service only hosted on internal group. This can
     * only be used with manual testing, because AWS credentials are
     * needed.
     */
    service_by_group["/admin/email", "bloxweb:internal"] = x,
    ses_email_service(x).
  })

} <-- .

The example below illustrates how to configure an email service via SMTP, by setting the configuration parameters directly in the service.

block(`service_config) {

  alias_all(`bloxweb:config:service),
  alias_all(`bloxweb:config:service_abbr),

  alias_all(`bloxweb:email:service_config),

  clauses(`{

    service_by_group["/admin/email", "bloxweb:internal"] = x,
    smtp_email_service(x) {
      service_parameter["smtp_server"]="smtp.gmail.com",
      service_parameter["smtp_server_port"]="587",
      service_parameter["smtp_server_user"]="...",
      service_parameter["smtp_server_pwd"]="....",
      service_parameter["smtp_server_auth"]="true",
      service_parameter["smtp_server_tls"]="true"
    }.

  })

} <-- .

Tip

You will need to include the bloxweb_email library in your project.

The tables below list the configuration parameters for using SMTP as well as SES. All the properties listed below can be configured on the service, in the bloxweb.config handler section, or in the global BloxWeb section.

Email service using SMTP
Required parameters
smtp_server Hostname of the SMTP server.
smtp_server_port Port of the SMTP server.
smtp_server_user User account for sending email.
smtp_server_pwd Password of user for sending email.

Email service using SMTP
Optional parameters
smtp_server_auth Use authentication or not (true/false).
smtp_server_ssl Use SSL SMTPS protocol (true/false).
smtp_server_tls Use TLS protocol (true/false).
debug Prints detailed information to the log while communicating with the mail server. This can be useful if you experience problems in sending emails.

Email service using SES
Optional authentication parameters
access_key AWS access key.
secret_key AWS secret key.
iam_role Use IAM EC2 instance role (set to any value).
env_credentials Use environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_KEY.

By default environment credentials are tried first, followed by IAM roles.

Example configuration using Gmail:

smtp_server = smtp.gmail.com
smtp_server_port = 587
smtp_server_user = ...
smtp_server_pwd = ...
smtp_server_auth = true
smtp_server_tls = true

23.12.2. Password Management

BloxWeb comes with services for changing and resetting passwords. The automatic configuration of these services supports handler, protocol, request and response message. Other service configuration aspects (such as the service group or authentication realm) can be configured by the developer on the service.

23.12.2.1. Change Password

In order to change the password of a user, the client needs to send a request to the change password service of BloxWeb. The change password service is a BloxWeb service which accepts a JSON or protobuf request defined by the following protocol:

message ChangePasswordRequest
{
  required string user_name = 1;
  required string current_password = 2;
  required string new_password = 3;
}
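
For illustration, the following Python snippet builds a JSON body for this request; the user name and passwords are placeholders:

import json

# A ChangePasswordRequest body with placeholder values.
request = {
    "user_name": "jdoe",
    "current_password": "old-secret",
    "new_password": "new-secret",
}
print(json.dumps(request))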

Service Configuration

A change password service is configured with BloxWeb by creating a change_password_service. Example:

block(`service_config) {
  alias_all(`bloxweb:config:service),
  alias_all(`bloxweb:config:service_abbr),

  alias_all(`bloxweb:credentials:password_management),

  clauses(`{

    /**
     * Change a password
     */
    service_by_prefix["/user/change-password"] = x,
    change_password_service(x).
  })

} <-- .
Service Parameters

A change password service supports the following optional service_parameter:

credentials_url URL of the credentials service (default: http://localhost:55183/admin/credentials)

This property can be configured on the service, in the bloxweb.config handler section, or in the global BloxWeb section.

23.12.2.2. Reset Password

BloxWeb also comes with services to reset passwords. Application developers should develop a user interface supporting the following process flow:

  1. The reset password service is not authenticated. A user can invoke the reset password service with their username or email address.

  2. The reset password request generates a token that is stored in the database as a reset password request for the given user.

  3. A configurable email is sent using the email service to the user.

  4. The user can click on a link in the email, which brings them to a client-side page where they can enter the token in the UI.

  5. Optional: once the password has been changed, a confirmation email is sent to the user.

In order to reset the password of a user, the client needs to send a request to the reset password service of BloxWeb. The reset password service is a BloxWeb service which accepts a JSON or protobuf request defined by the following protocol:

message ResetPasswordRequest
{
  optional string user_name = 1;
  optional string email = 2;
}

Tip

Passwords can be reset either by email or username. If the username is specified, then the email address is ignored.

The protocol to confirm the reset of a password is defined as:

message ConfirmResetPasswordRequest
{
  required string change_token = 1;
  required string new_password = 2;
}
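
For illustration, the following Python sketch shows the two JSON bodies involved in this flow; the user name, token, and password are placeholders, and the token value would in practice come from the email sent in step 3:

import json

# Step 1: request a reset token by username (an email address could be
# supplied via the "email" field instead).
reset_request = {"user_name": "jdoe"}

# Step 2: confirm the reset with the token the user received by email.
confirm_request = {
    "change_token": "token-from-email",
    "new_password": "new-secret",
}

print(json.dumps(reset_request))
print(json.dumps(confirm_request))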

Service Configuration

A reset password service is configured with BloxWeb by creating a reset_password_service. In the example below an email is sent to the user, containing the user name, the token, and the date/time until which the token is valid. Additionally, the service_parameter "notify" is set to false, which means that no confirmation email is sent to the user once the password has been reset. An SES email service is used to send the email for the forgotten-password reset.

block(`service_config) {
  alias_all(`bloxweb:config:service),
  alias_all(`bloxweb:config:service_abbr),

  alias_all(`bloxweb:credentials:password_management),
  alias_all(`bloxweb:email:service_config),

  clauses(`{

    /**
     * Reset a forgotten password
     */
    service_by_prefix["/user/reset-password"] = x,
    reset_password_service(x) {
      service_parameter["email_template"] =
        "User: {USER}\nToken: {TOKEN}\nValid until: {VALID}\nTime: {TIME}",
      service_parameter["email_from"] = "support@logicblox.com"
    }.

    /**
     * Do not confirm the reset of a forgotten password
     */
    service_by_prefix["/user/confirm-reset-password"] = x,
    confirm_reset_password_service(x) {
      service_parameter["notify"] = false
    }.

    /**
     * Email service only hosted on internal group. This can
     * only be used with manual testing, because AWS credentials are
     * needed.
     */
    service_by_group["/admin/email", "bloxweb:internal"] = x,
    ses_email_service(x).
  })

} <-- .
Service Parameters

All of the properties listed in this section can be configured on the service, in the bloxweb.config handler section, or in the global BloxWeb section.

A reset password service supports the following service_parameter:

Required service_parameter
email_template The template for the email. Supports {USER}, {TOKEN}, {IP}, {VALID} and {TIME}.
email_from The address where emails are sent from.
Optional service_parameter
credentials_url The URL of the credentials service. Default: http://localhost:55183/admin/credentials
valid_hours The number of hours a reset token is valid. Default: 4 hours
email_url The URL of the email service. Default: http://localhost:55183/admin/email
email_subject Subject of the reset password email. Default: Password reset

A confirm reset password service supports the following service_parameter:

Required service_parameter
email_template The template for the email. Supports {USER}, {TOKEN}, {IP}, {VALID} and {TIME}.
email_from The address where emails are sent from.
Optional service_parameter
credentials_url The URL of the credentials service. Default: http://localhost:55183/admin/credentials
notify Whether the user should be emailed when the password is changed. Default: true
email_url The URL of the email service. Default: http://localhost:55183/admin/email
email_subject Subject of the reset password email. Default: Password change notification

23.12.3. ConnectBlox Services

The ConnectBlox extension exposes some of the lower-level functionality of LogicBlox through ordinary protobuf web services. These services map very closely to functionality found in the 'lb' command-line tool, but are accessible over the web and are secured in the same way as other BloxWeb services.

23.12.3.1. Installing The Extension

The extension ships with BloxWeb but it isn't installed by default. In order to install the handlers, you must run the following command after BloxWeb has started:

$ bloxweb install-jar $BLOXWEB_HOME/lib/java/bloxweb-connectblox.jar

23.12.3.2. Configuring Services

The ConnectBlox services are subtypes of the default_protobuf_service entity. They each have their own custom_handler and protobuf request and response messages (which are documented in the following service-specific sections). The schema for these services is in the bloxweb:connectblox:services module; a snippet follows.

connectblox_service(x) -> default_protobuf_service(x).
// defaults to the workspace hosting the service
// You can use regular expressions to match multiple workspaces.
connectblox_service_workspaces[x] = ws -> connectblox_service(x), string(ws).

list_workspaces_service(x) -> default_protobuf_service(x).

pred_info_service(x) -> connectblox_service(x).
list_predicates_service(x) -> connectblox_service(x).
exec_service(x) -> connectblox_service(x).

Examples of configuring these services can be found in the BloxWeb repository under bloxweb-samples/bloxweb-connectblox. An example of configuring the list-predicates service follows.

service_by_prefix["/list-predicates"] = x,
list_predicates_service(x) {
  auth_realm[] = "list-predicates-realm"
}.

Another feature of these services is the ability to specify which workspace you wish to run against. For example, if your services are hosted in a workspace called 'services' and you want to list predicates in a workspace called 'staging', you can do so by configuring the service to allow access to the staging workspace. You do that by specifying a regular expression that matches the workspaces to which that service has access. For the previous example, you could do either of the following:

service_by_prefix["/list-predicates"] = x,
list_predicates_service(x) {
  auth_realm[] = "list-predicates-realm",
  connectblox_service_workspaces[] = "services|staging" // only matches services or staging
}.

service_by_prefix["/list-predicates"] = x,
list_predicates_service(x) {
  auth_realm[] = "list-predicates-realm",
  connectblox_service_workspaces[] = ".*" // matches any workspace name
}.

If you do not specify a value for the connectblox_service_workspaces predicate, the service only has access to the workspace in which it is hosted.

23.12.3.3. Services Protobuf Schema

import "blox/connect/ConnectBlox.proto";
import "blox/connect/BloxCommand.proto";
import "blox/common/Common.proto";
package bloxweb.connectblox;

option java_package = "com.logicblox.bloxweb.connectblox";

message ExecRequest {
  required blox.connect.ExecBlock execute = 1;
  optional string workspace_name = 2;
}

message ExecResponse {
  required blox.connect.ExecBlockResponse response = 1;
  optional Error error = 10;
}

message ListWorkspacesRequest
{
}

message ListWorkspacesResponse
{
  optional blox.connect.ListWorkSpacesResponse workspaces = 1;
  optional Error error = 10;
}

message PredicateInfoRequest
{
  required string qualified_name = 1;
  optional string workspace_name = 2;
}

message PredicateInfoResponse
{
  optional blox.common.protocol.PredicateInfo info = 1;
  optional Error error = 2;
}

message ListPredicatesRequest
{
  repeated string qualified_name = 1; // if empty return all predicates
  optional string workspace_name = 2;
}

message ListPredicatesResponse
{
  repeated blox.common.protocol.PredicateInfo info = 1;
  optional Error error = 2;
}

message Error {
  // English error, not necessarily suitable for presentation to the
  // user. If there was an error, then the error_code field is always
  // set as well.
  optional string error = 1; // change this to 'message'
  optional string error_code = 2;
}

23.12.3.4. Services Reference

service request message response message description
bloxweb:connectblox:services:list_workspaces_service ListWorkspacesRequest ListWorkspacesResponse Lists all available workspaces.
bloxweb:connectblox:services:pred_info_service PredicateInfoRequest PredicateInfoResponse Returns predicate information about a predicate.
bloxweb:connectblox:services:list_predicates_service ListPredicatesRequest ListPredicatesResponse Returns predicate information about some or all predicates in a workspace.
bloxweb:connectblox:services:exec_service ExecRequest ExecResponse Executes a snippet of logic against a workspace.
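
For illustration, and assuming the service accepts JSON-encoded requests like other protobuf services, a ListPredicatesRequest for the list-predicates service configured earlier could be generated as follows; the predicate and workspace names are placeholders:

import json

# A ListPredicatesRequest asking for two predicates in the 'staging'
# workspace (placeholders). Leaving qualified_name empty would return
# information on all predicates.
request = {
    "qualified_name": ["sales", "returns"],
    "workspace_name": "staging",
}
print(json.dumps(request))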

23.13. Transport Methods

23.13.1. Using Amazon Simple Storage Service (S3)

Normally, an HTTP request contains a request line, headers, and the content of the request. For POST and PUT requests to delimited file services, the content would be the CSV data. For extremely large files this is undesirable, because a connection failure would force redoing the entire upload. Also, a file might already be available in highly redundant distributed storage, which makes it unattractive to send the data directly from the client to a service.

BloxWeb supports transferring large data files separately from HTTP requests. This makes it possible to have an interface that resembles the REST architecture even when the data files exceed the volume of data one would normally want to transfer over a single socket in a single request.

BloxWeb uses a custom HTTP header, x-blox-content-uri, for this. The idea of the header is that the content is not actually in the body of the HTTP request, but is stored as an S3 object located at the specified URI. The other HTTP headers of the request or response still apply to the content at this URI (e.g. the content-type header of the HTTP request specifies the content-type of the S3 object). The header x-blox-response-content-uri is used to indicate that the response is expected not in the HTTP response itself, but at a certain URI. The content-uri support is applicable to all transport mechanisms, so it can be used on TCP requests as well as queue requests.

Users do not need to be aware of the HTTP headers. BloxWeb client APIs and the bloxweb-client tool internally use the headers to communicate with the BloxWeb server when importing and exporting delimited files using S3.

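For illustration only, the following Python sketch shows what a raw request using these headers could look like; the service path, bucket, and content type are placeholders, and in practice the client tools described above set these headers for you:

import http.client

# POST to a (hypothetical) delimited file service, with the CSV content
# stored in S3 instead of the request body, and the response also
# redirected to S3.
conn = http.client.HTTPConnection("localhost", 8080)
conn.request(
    "POST",
    "/delim-basic/sales",
    body=b"",
    headers={
        "content-type": "text/csv",
        "x-blox-content-uri": "s3://mybucket/sales.csv",
        "x-blox-response-content-uri": "s3://mybucket/response.csv",
    },
)
print(conn.getresponse().status)
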
BloxWeb uses a high performance S3 library developed by LogicBlox, called s3lib, to upload and download files to and from S3. The library can also be used through a separate command-line tool called s3tool. The s3lib library uses encryption keys to encrypt data stored in S3. Because s3tool, bloxweb-client and the BloxWeb server use s3lib, it is necessary to set up s3lib encryption keys before using any of those tools with S3. The following section explains how to manage the encryption keys used by s3lib.

23.13.1.1. Managing s3lib keys

The s3lib library uses asymmetric encryption keys (public/private RSA keys) to encrypt and decrypt data that is stored in S3. Data is encrypted locally prior to uploading to S3, and it is decrypted locally after downloading, so it is never transferred in clear text. The asymmetric encryption method is used to simplify key management: the uploader of a file only needs access to the public key (the private key is still needed to decrypt upon download).

Encryption is currently not an optional feature, so before the tools can be used, encryption keys need to be generated and configured. The s3lib library uses a directory of .pem text files to store the encryption keys. On EC2 instances, this directory should be in an in-memory file system to protect the keys from being available on disk. The s3lib tool and BloxWeb manage encryption keys by a key alias, which is the name of the .pem file in the key directory. It is important that key aliases are unique and consistent across machines, otherwise decryption will fail and data-loss will occur. A key pair can be generated using:

$ s3lib-keygen mykey

This command generates a file mykey.pem in the current working directory. The key alias is thus mykey. The file contains the public as well as the private key, in the following format:

-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAvS53DLCAPMQv3mBb/G/D
...
-----END PUBLIC KEY-----

-----BEGIN PRIVATE KEY-----
MIIEvAIBADANBgkqhkiG9w0BAQEFAASCBKYwggSiAgEAAoIBAQC9LncMsIA8xC/e
...
-----END PRIVATE KEY-----

For upload clients that only need the public key, the private key section can be removed from the file.

The s3lib by default uses the directory ~/.s3lib-keys for encryption keys, but all commands and tools can change the default setting. The generated .pem file should be put in the encryption key directory.

$ mkdir -p ~/.s3lib-keys
$ mv mykey.pem ~/.s3lib-keys

23.13.1.2. Using s3tool

The s3tool is a command-line wrapper around s3lib. It has three major commands that allow you to upload a local file to S3, to download an S3 file, and to verify whether an S3 file exists. These commands take a command-line option --keydir to change the default directory where the keys are stored. They also accept a variety of command-line options to configure retry behavior and influence performance. To review these options, use:

$ s3tool upload --help
$ s3tool download --help
$ s3tool exists --help

The s3tool needs AWS credentials to access S3. Currently, it uses the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_KEY, or falls back to credentials delivered via the EC2 meta-data service (usually set using EC2 instance roles).

After credentials have been set up and the mykey key has been generated and stored in ~/.s3lib-keys, s3tool can upload a file as follows:

$ s3tool upload s3://bucket/AS400.jpg -i AS400.jpg --key mykey

The upload command will encrypt the file using mykey and then will send the encrypted file to the S3 object. The s3tool also attaches meta-data to the S3 object to identify the key with which it is encrypted. To download the file back:

$ s3tool download s3://bucket/AS400.jpg -o AS400-2.jpg

Note that the download command automatically determines what key to use to decrypt from the meta-data attached to the S3 object.

23.13.1.3. Using S3 in BloxWeb

The following sections describe how to configure the BloxWeb components that can use S3. In all cases it is assumed that encryption keys were already generated according to the Managing s3lib keys section.

BloxWeb Server

The BloxWeb server can be configured to access S3 to retrieve content of requests and to store responses. Just like s3tool, it needs AWS credentials and needs access to the directory where the keys are stored. These configurations are set with an [s3:default] section in bloxweb.config. For example, the following section defines the directory holding encryption keys and the credentials to access AWS:

[s3:default]
keydir = ...
access_key = ...
secret_key = ...
#iam_role =
#env =

Note that keydir could have been specified outside the [s3:default] section because variables of the global section are inherited by sections (see details in BloxWeb Configuration). Also, instead of explicitly specifying the access and secret keys, it is possible to use AWS Identity and Access Management (IAM) by setting the iam_role variable (the value of the variable is ignored and the default role is used). Finally, setting the env variable requests the BloxWeb server to load access and secret keys from environment variables.

After setting AWS credentials and keydir, the BloxWeb server is ready to use S3 on requests. When the server is instructed with the x-blox-content-uri HTTP header to download a file from S3, it will determine the key to decrypt the file from the meta-data stored in the S3 object. It will search for the key in keydir. When the server needs to store response content in S3 due to a x-blox-response-content-uri HTTP header, it expects an additional HTTP header called x-blox-response-content-enckey. This header defines the alias of the key to use to encrypt the file before sending to S3 (the alias is also stored in S3 meta-data).

BloxWeb Client

The BloxWeb client accepts the same configuration options as the server, but in the bloxweb-client.config file instead. Additionally, it supports the keyname variable in the global section. This variable defines the key alias to use by default for the commands export-delim (to encrypt the exported file) and import-delim (to encrypt bad records if requested; the decryption key for the file to import is determined by meta-data). The key to use in these commands can be overridden with the command line option --key.

BloxWeb Batch Language

The BloxWeb client can be used to execute batches specified in the BloxWeb Batch Language. Since batch specifications are executed with bloxweb-client, the configurations available in bloxweb-client.config are immediately supported. Furthermore, batch specifications can override the values of the config files in two ways:

  • The BatchConfig message can specify the keydir and keyname to be used globally in the batch specification.

  • Most statements that can make use of S3 can individually override the global setting for the key name using the key attribute. Currently, S3Upload, ImportDelim, and ExportDelim can specify the key to use. Note that S3Download uses S3 but the key is determined by meta-data.

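For example, a batch specification could set a global key directory and key alias and override the key for a single upload, roughly as follows (a sketch using the bloxweb.batch_pb2 module; the directory, bucket, and key aliases are placeholders, and the BatchConfig field names are assumed to be keydir and keyname as described above):

import bloxweb.batch_pb2

batch_spec = bloxweb.batch_pb2.Spec()

# Global key settings for the whole batch (placeholder values).
batch_spec.config.keydir = '/mnt/keys'
batch_spec.config.keyname = 'mykey'

# A single upload that overrides the global key alias.
up_stm = batch_spec.stm.simple.s3_upload
up_stm.file = 'sales.csv'
up_stm.url = 's3://mybucket/sales.csv'
up_stm.key = 'otherkey'
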
23.13.2. Using SQS

TODO this is very rough!

To configure a pair of SQS queues as an end-point, add a configuration section to the bloxweb.config file in LB_DEPLOYMENT_HOME/config.

[sqs:sample]
request_queue_name = bloxweb-sample-request
response_queue_name = bloxweb-sample-response
access_key = ...
secret_key = ...

To invoke a service via SQS, you can use the bloxweb-client tool. The bloxweb-client tool uses a separate configuration located in LB_DEPLOYMENT_HOME/config/bloxweb-client.config. The configuration is identical to the configuration for the server:

[sqs:sample]
request_queue_name = bloxweb-test-request
response_queue_name = bloxweb-test-response
access_key = ...
secret_key = ...

A JSON service can now be invoked over SQS as follows:

$ echo '{"min_price" : 30}' | bloxweb-client call-json /search --format --config sqs:sample

Configuration options:

AWS credentials are specified as one of the following options:
access_key secret_key Configure access keys and secret keys to use.
iam_role Use IAM instance roles to obtain credentials. This option only works on EC2 instances. The value of the iam_role currently does not matter because EC2 supports only a single IAM role per instance.
env_credentials Use credentials set by environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_KEY. The value of env_credentials does not matter: the setting serves as a flag to enable the use of environment variables for credentials.
Queues can be configured using short names as well as full URLs:
request_queue_name response_queue_name The short names of the request and response queues. The queues are required to be in the same account as the credentials used. The endpoint is configured by the global (not section-specific) setting sqs_endpoint. If the queues do not exist, then BloxWeb will attempt to create them.
request_queue_url response_queue_url Full request and response queue URLs. SQS URLs have the form https://sqs.us-east-1.amazonaws.com/accountid/queuename.

23.13.3. Using RabbitMQ

RabbitMQ is an alternative queuing implementation to SQS. The primary differences are:

  • RabbitMQ is much faster and completely consistent (i.e. it guarantees exactly-once delivery), whereas SQS can sometimes deliver messages more than once.
  • RabbitMQ can be run locally, but must be deployed to an EC2 server as part of the application's deployment.

23.13.3.1. Installing RabbitMQ

To install RabbitMQ on your development machine, it is recommended that you download the generic-unix version of RabbitMQ. The lb-config extension for RabbitMQ is designed to work with this distribution. All you need to do is download the file and unzip it; no installation is required, and all commands to start it can be found in the sbin folder.

The easiest way to start RabbitMQ is to run RABBITMQ_HOME/sbin/rabbitmq-server. Shut it down by pressing Ctrl+C in that terminal.

23.13.3.2. Configuring RabbitMQ

RabbitMQ requires very little configuration. Once you have started the server, add a section to both your bloxweb.config and bloxweb-client.config that specifies the request and response queues and an AMQP endpoint for the RabbitMQ server. An endpoint looks just like a normal URL, except that it starts with "amqp://" instead of "http://" and must include a port number. The default port for RabbitMQ is 5672.

An example configuration is shown below:

[rabbitmq:sample]
request_queue_name = bloxweb-sample-request
response_queue_name = bloxweb-sample-response
endpoint = 'amqp://localhost:5672'

A JSON service can now be invoked over RabbitMQ as follows:

$ echo '{"min_price" : 30}' | bloxweb-client call-json /search --format --config rabbitmq:sample

23.14. Bloxweb Batch Language

The bloxweb-client tool allows users to execute batch jobs that will interact with BloxWeb services. To this end, bloxweb-client provides a powerful batch specification language.

The main elements of the batch language are statements; a batch specification is thus composed of a set of statements. Statements come in two kinds: simple statements and control-flow statements.

Simple Statements are single, indivisible, concrete batch operations. The table below lists all the simple statements supported by the BloxWeb batch language.

Simple Statements
ImportDelim Imports a delimited file to the delimited file service.
ExportDelim Exports a delimited file from the delimited file service.
S3Upload Uploads a file to an S3 file system.
S3Download Downloads a file from an S3 file system.
CallProto Calls a custom BloxWeb service.
ClearQueue Clears SQS queues.
ListQueue Lists SQS queues.
Exec Executes command line statements at the local machine.
Echo Writes strings to standard output.
Fail Does not execute anything, returns ERROR status.
Noop Does not execute anything, returns SUCCESS status.

Control-flow statements, on the other hand, specify how to control the execution of other statements. The table below lists all the control-flow statements supported by the BloxWeb batch language.

Control-Flow Statements
StmSeq Executes a sequence of statements in order, synchronously.
StmPar Executes a set of statements asynchronously, in parallel.
StmTryCatch Emulates a try/catch control-flow.
StmTxn Executes a set of statements as a transaction, aborting the transaction on ERROR.

23.14.1. Executing a Batch Specification

To execute a batch, where batch-spec is the name of a file containing a batch specification message, run:

$ bloxweb-client batch -i batch-spec

The Spec message is composed of an optional BatchConfig message and an Stm message, as defined in the protocol below.

message Spec
{
  optional BatchConfig config = 1;
  required Stm stm = 2;
}

The Stm message specifies the statements to be executed. Every simple statement or control-flow statement is wrapped within a Stm message, which is defined by the following protocol.

message Stm
{
  optional StmSeq seq = 1;
  optional StmTxn transaction = 2;
  optional StmPar parallel = 3;
  optional StmSimple simple = 4;
  optional StmTryCatch try_catch = 5;

  // Result (exception and end_time are only set if status is set)
  optional Status status = 6;
  optional string exception = 7;
  optional string end_time = 8;

  // Optional field to describe the purpose of a Stm
  optional string description = 9;

  // Result
  optional string start_time = 10;
}

The following example shows how to create a Spec message using the BloxWeb batch Python module (bloxweb.batch_pb2), executing either a simple statement or a control-flow statement.

from google.protobuf import text_format
import bloxweb.batch_pb2

batch_spec = bloxweb.batch_pb2.Spec()
exec_stm = batch_spec.stm   # the statement to be executed

# either: execute a simple statement ...
exec_stm.simple.echo.text = 'echo this'

# ... or: execute a control-flow statement
stm1 = exec_stm.seq.stm.add()
stm1.simple.echo.text = 'echo this'
stm2 = exec_stm.seq.stm.add()
stm2.simple.echo.text = 'echo this too'

spec_str = text_format.MessageToString(batch_spec)
with open('my-batch.batch', 'w') as f:
    f.write(spec_str)

23.14.1.1. Execution Results

After execution, the batch command will return the original Spec message populated with the result of the execution. Each child Stm within control-flow statements will be populated with the result of its execution. Simple statements will have a status of either SUCCESS or ERROR. Control-flow statements will have one of the following statuses:

DNS If the statement was not executed.
SUCCESS If all child statements return SUCCESS.
ERROR If all child statements return ERROR.
PARTIAL_ERROR If some child statements return with SUCCESS and others with ERROR.

On failure, the return Stm message will also return the exception field initialized with the exception that caused the failure.

Every executed statement has a start_time as well as an end_time, which allows the calculation of the statement execution time.

23.14.2. Simple Statements

The StmSimple message is used to specify the concrete operation to be executed. The protocol defining the StmSimple message is shown below. It takes one, and only one, of the concrete statements outlined above (Echo, Fail, ImportDelim, ExportDelim, etc.).

message StmSimple
{
  optional Echo echo = 1;
  optional Fail fail = 2;

  optional ImportDelim import_delim = 3;
  optional ExportDelim export_delim = 4;

  optional S3Upload s3_upload = 5;
  optional S3Download s3_download = 6;

  optional CallProto call_proto = 7;

  optional ClearQueue clear_queue = 8;
  optional ListQueue list_queue = 9;

  optional Noop noop = 10;

  optional Exec exec_stm = 11;
}

23.14.2.1. Echo

The Echo statement prints a string to the standard error stream. This is useful for debugging purposes.

message Echo
{
  required string text = 1;
}

The following example shows how to create a Spec message with a simple Echo statement:

batch = bloxweb.batch_pb2.Spec()
stm = batch.stm
stm.simple.echo.text = 'Print this to stderr!'

23.14.2.2. CallProto

Below is the protocol defining the CallProto message. The CallProto statement performs a POST operation to a BloxWeb service, specified in the service parameter, with the contents of the input parameter.

message CallProto
{
  required string service = 1;

  // config is deprecated, use config in Transport
  optional string config = 2;
  optional TransportConfig transport = 7;

  required ProtoBufEncoding encoding = 3;
  optional bool gzip = 4;
  required Input input = 5;
  optional string output_file = 6;

  // do a best attempt to format the output for human consumption
  optional bool format = 8;
}

The Input message is defined as below. It can be a sequence of bytes, a string, the contents of a file, or the contents of a resource at a certain url.

message Input
{
  optional bytes binary = 1;
  optional string text = 2;
  optional string file = 3;
  optional string url = 4;
}

The content of the input resource should be a protobuf message, in either binary or JSON format. This is specified in the encoding parameter.

The response, if any, will be written to the output_file. If the response is a JSON message, the boolean format option will format the response in a human-readable way.

The following sample Python code creates a CallProto message to be included in a batch request:

call_proto_stm = stm.simple.call_proto
call_proto_stm.service = 'multipart-json'
call_proto_stm.encoding = bloxweb.batch_pb2.ProtoBufEncoding.JSON
call_proto_stm.input.file = 'input_file.json'
call_proto_stm.output_file = 'save-to-this-file.txt'

23.14.2.3. ImportDelim

The ImportDelim statement uploads a file to a BloxWeb service using a POST or a PUT HTTP request. Similar to CallProto, the service to connect to is specified via the service parameter, and the input file via the input parameter. In general, this statement should be used to import data via the delimited file services. If the full parameter is true, it performs a PUT request to the delimited file service, which fully replaces the existing data with the data provided as input. If the full parameter is set to false, it performs a POST request to the delimited file service, which only updates the workspace by inserting the content of the input file.

message ImportDelim
{
  required string service = 1;

  // Either data_file or data_url is required
  optional string data_file = 2;
  optional string data_url = 3;
  optional Input input = 8;

  // Config is deprecated, use config in Transport
  optional string config = 4;
  optional TransportConfig transport = 6;

  optional bool gzip = 5;

  // Default value for full is false, which means POST. Setting full
  // to true will use the PUT method.
  optional bool full = 7;

  // How to return the error records, if desired
  optional string out_file = 9;
  optional string out_url = 10;

  // encryption key to use for out file
  optional string key = 11;

  // whether or not a bad record should abort the transaction
  optional bool abort_on_error = 12;
}

The following example shows how to create an ImportDelim statement, posting data from an S3 file to a service hosted at /delim-basic/sales.

import_stm = stm.simple.import_delim
import_stm.service = 'http://host:8080/delim-basic/sales'
import_stm.input.url = 's3://testserver.request/delim-basic/more-sales.csv'
import_stm.full = False

23.14.2.4. ExportDelim

Similar to ImportDelim, ExportDelim performs a GET request to a BloxWeb service. The data_file and data_url parameters specify where to save the exported data. The data_file parameter should specify a local file path to save to; the data_url parameter should specify an S3 URL. Finally, the key parameter specifies the key to use when encrypting the exported file.

Similarly to ImportDelim, the service used with ExportDelim will, in general, be a delimited file service.

message ExportDelim
{
  required string service = 1;
  optional string data_file = 2;
  optional string data_url = 3;

  // encryption key to use for exported file
  optional string key = 4;

  // Config is deprecated, use config in Transport
  optional string config = 5;
  optional TransportConfig transport = 7;

   optional bool gzip = 6;
}

The following is an example of creating an ExportDelim statement requesting data from a service hosted at /delim-basic/sales and saving the output to an S3 file.

export_stm = stm.simple.export_delim
export_stm.service = '/delim-basic/sales'
export_stm.data_url = 's3://testserver.request/delim-basic/more-sales.csv'

23.14.2.5. S3Upload

The S3Upload statement can be used to upload a file to S3. The file parameter should specify a file on the local disk, while the url parameter should specify a URL for the destination S3 file. If a key is specified, it will be used to encrypt the file.

message S3Upload
{
  required string file = 1;
  required string url = 2;
  optional string key = 3;
  optional string config = 4;
}

The following example creates an S3Upload statement in Python:

up_stm = stm.simple.s3_upload
up_stm.file = 'path-to-local-file'
up_stm.url = 's3://amazon.com/path'

23.14.2.6. S3Download

The S3Download statement can be used to download a file from S3 to local disk. The url parameter should specify an S3 URL for the file to be downloaded, and the file parameter should specify a path on the local disk to save to.

message S3Download
{
  required string file = 1;
  required string url = 2;
  optional string config = 3;
}

The following example creates an S3Download statement in Python:

down_stm = stm.simple.s3_download
down_stm.file = 'path-to-local-file'
down_stm.url = 's3://amazon.com/path'

23.14.2.7. Exec

The Exec statement executes an external process. The command to be executed should be specified as a sequence of strings, just as in the Java Runtime.exec(String[] cmdarray) method.

The timeout parameter specifies how much time, in seconds, the batch should wait for the command to terminate. A value of -1 disables the timeout. If the destroy_on_timeout parameter is set to true, the command will be interrupted on timeout; if it is false, the batch will stop waiting for the command and return ERROR, but will not interrupt the process.

The parameters pipe_stdout_to and pipe_stderr_to specify how to redirect standard output and standard error. Both parameters have one of the following values:

  • “-”, redirecting output to standard output;

  • a file name, redirecting output to the file;

  • “logger”, redirecting output to the log file. In this case, errors will be logged at WARN level, and normal output at INFO level.

message Exec
{
  repeated string command = 1; // command and its arguments
  required int64 timeout = 2; // use -1 to ignore timeout
  optional bool destroy_on_timeout = 3;

  // by default, stdout and stderr will be ignored
  // the following pipe parameters accept a file name,
  // '-' for stdout, and 'logger' to use default logger
  optional string pipe_stdout_to = 4;
  optional string pipe_stderr_to = 5;
}

The following example executes the command cat test and writes the output to a file called file.txt:

exec_stm = stm.simple.exec_stm
exec_stm.command.append('cat')
exec_stm.command.append('test')
exec_stm.pipe_stdout_to = 'file.txt'
exec_stm.timeout = -1

23.14.2.8. ClearQueue

The ClearQueue statement clears all messages from a queue. The sqs parameter specifies the information about the endpoints, with the names or URLs of the request and response queues. To clear the request queue, simply set the request parameter to true; to clear the response queue, set the response parameter to true. To not only clear the queues but also delete them, set the delete parameter to true.

message ClearQueue
{
  // sqs is deprecated
  optional SQSTransportConfig sqs = 1;
  optional bool request = 2;
  optional bool response = 3;
  optional bool delete = 4;
  optional TransportConfig transport = 5;
}

The following example clears the request queue of a transport configuration:

clear_stm = stm.simple.clear_queue
clear_stm.sqs.request_queue_name = 'request-queue'
clear_stm.sqs.response_queue_name = 'response-queue'
clear_stm.request = True

23.14.2.9. ListQueue

The ListQueue statement lists the messages on a request or response queue. Similar to the ClearQueue statement, the ListQueue statement specifies the queue endpoints using the sqs parameter. The boolean request and response parameters specify which queues should be listed. Finally, the max_messages parameter sets a limit on the number of messages that will be listed. The messages are printed to standard output.

message ListQueue
{
  // sqs is deprecated
  optional SQSTransportConfig sqs = 1;
  optional bool request = 2;
  optional bool response = 3;
  optional uint32 max_messages = 4;
  optional TransportConfig transport = 5;
}
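
The following sketch lists up to ten messages from the request queue, following the same pattern as the ClearQueue example above (queue names are placeholders):

list_stm = stm.simple.list_queue
list_stm.sqs.request_queue_name = 'request-queue'
list_stm.sqs.response_queue_name = 'response-queue'
list_stm.request = True
list_stm.max_messages = 10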

23.14.2.10. Noop

The Noop statement does not execute anything. It takes no parameters. This is mostly useful when generating batch scripts from code.

23.14.2.11. Fail

The Fail statement does not execute anything and returns an ERROR status with an error message. The error message is set as the text parameter.

message Fail
{
  required string text = 1;
}

Here is an example of a Fail statement:

fail_stm = stm.simple.fail
fail_stm.text = 'failure message'

This is useful in combination with the StmTryCatch control statement.

23.14.3. Control-flow Statements

23.14.3.1. StmSeq

The StmSeq message takes a list of statements, as shown below. The next statement is started only after the current statement has finished with SUCCESS. Otherwise, execution is interrupted and statements not executed are marked as such (DNS status).

message StmSeq
{
  repeated Stm stm = 1;
}

Using the bloxweb-batch Python module, we can create a StmSeq like so:

seq_stm = stm.seq
stm1 = seq_stm.stm.add()
stm1 ... # configure stm1
stm2 = seq_stm.stm.add()
stm2 ... # configure stm2

23.14.3.2. StmPar

Executes a set of statements asynchronously, in parallel. There is no guarantee of which statement will execute first. Statements that fail will be marked with ERROR or PARTIAL_ERROR, but will not interrupt the execution of other statements.

message StmPar
{
  optional uint32 max = 1;
  repeated Stm stm = 2;
}

Creating a StmPar with the Python module is simple:

par_stm = stm.parallel
stm1 = par_stm.stm.add()
stm1 ... # configure stm1
stm2 = par_stm.stm.add()
stm2 ... # configure stm2

23.14.3.3. StmTryCatch

Emulates a try/catch control flow. It attempts to execute a try statement; if it fails, it executes a catch statement. The catch statement is generally used to clean up or work around the first failure. If the try statement succeeds, execution is finished and returns SUCCESS. Otherwise, StmTryCatch returns the result of the execution of the catch statement.

message StmTryCatch
{
  required Stm try = 1;
  required Stm catch = 2;
}

Here is an example of how to create a StmTryCatch:

try_catch_stm = stm.try_catch
try_catch_stm.try_stm ...   # configure try_stm
try_catch_stm.catch_stm ... # configure catch_stm

To surface a more useful error message when the try statement fails, the catch statement can simply be a Fail statement:

catch_stm = try_catch_stm.catch_stm
catch_stm.simple.fail.text = 'a more pertinent error message'

23.14.3.4. StmTxn

The StmTxn executes a statement as a transaction, aborting the transaction on ERROR. This requires that all simple statements within the StmTxn be "abortable".

Tip

Currently, only the ImportDelim and ExportDelim statements are abortable, so only those statements should be used with StmTxn. Furthermore, all ImportDelim, ExportDelim and transaction services must be within the same workspace.

message StmTxn
{
  required string service = 1;

  // config is deprecated, use config in the transport message
  optional string config = 2;
  optional TransportConfig transport = 5;

  repeated Stm stm = 3;

  // Result
  optional string transaction_id = 4;
}

The most important parameter when configuring a StmTxn is the transaction service. This service must be a transaction service created by the delim_txn_service predicate:

service_by_prefix["/multipart/txn/*"] = x,
delim_txn_service(x).

The stm parameter in StmTxn is used to specify the statements within the transaction.

After execution, the transaction_id field will contain the id of this transaction. This can be used to query the BloxWeb log for more details on this transaction.

The following example creates a StmTxn with two import statements:

txn_stm = stm.transaction
txn_stm.service = '/multipart/txn'

import_delim1 = txn_stm.stm.add().simple.import_delim
import_delim1.service = 'http://host:8080/multipart/sales'
### continue configuring import_delim1

import_delim2 = txn_stm.stm.add().simple.import_delim
import_delim2.service = 'http://host:8080/multipart/returns'
### continue configuring import_delim2

23.14.4. Writing Protobuf Messages by Hand

While Protobuf messages can be created using the Python or Java API, the Protobuf text format is also easy to write by hand.

These are rules for writing protobuf messages by hand:

  • each message is encoded as { sequence-of-fields-and-values };

  • if the field is a primitive type, it is encoded as field_name : value

  • if the field is a message type, it is encoded as field_name { message_contents }

  • the message types are inferred from the definition of the message protocol in the .proto file

The example below illustrates a message that specifies a batch specification executing an Exec statement and an Echo statement in sequence:

stm {
    seq {
        stm {
            simple {
                exec_stm {
                    command: "cat"
                    command: "some-file"
                    timeout: 10
                }
            }
        }
        stm {
            simple {
                echo {
                    text: "echo this!"
                }
            }
        }
    }
}

23.15. Configuration

The BloxWeb server and client can be configured statically using config files. The format of BloxWeb config files follows the HierarchicalINIConfiguration of the Apache Commons Configuration project. In essence, they are INI files with named sections and a global section (the variables defined before the first section). For example, the following simple config file defines a global variable and a section with two local variables:

global_variable = some_value

[section_type:section_name]
section_variable_1 = some_value
section_variable_2 = some_value  

At startup, the initial configuration is created by composing the available config files. The BloxWeb server first loads the default bloxweb.config file that is included in the distribution. Then, it loads the bloxweb.config file found in $LB_DEPLOYMENT/config, if it exists. Finally, it loads the config file passed as a parameter to the command line, if any. When a file is loaded, it overrides existing config variables, so it is possible to refine the default configuration. Composition is at the level of variables, but sections are used to identify variables. For example, if the following config file is loaded after the previous example, both global_variable and section_variable_1 will be refined, but section_variable_2 will stay the same. Note that the process for bloxweb-client is the same, but using bloxweb-client.config instead.

global_variable = new_value

[section_type:section_name]
section_variable_1 = new_value  

BloxWeb supports variable substitution in config files. A variable reference has the form $(VARNAME). The following variables are currently supported and are substituted when a config file is loaded:

  • CONFIG_DIR: the fully qualified path of the directory containing the config file being loaded.
  • LB_DEPLOYMENT_HOME: the contents of the environment variable.
  • LOGICBLOX_HOME: the contents of the environment variable.
  • HOME: the contents of the environment variable.
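
For example (the variable names below are illustrative only), a config file can use these substitutions to refer to files relative to well-known locations:

log_directory = $(LB_DEPLOYMENT_HOME)/logs

[section_type:section_name]
local_file = $(CONFIG_DIR)/data/example.txt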

The bloxweb tool has commands that allow certain aspects of the configuration to be modified dynamically, that is, after the BloxWeb server has been started. The install-config command takes a config file in the BloxWeb config format and composes it with the currently running configuration.

$ bloxweb install-config my.config  

Currently, it is only possible to load new handlers and static workspaces (see the Server configuration section for details on the section types that are interpreted by the server). The install-handler command is a subset of install-config in that it will only load new handlers and will ignore other sections.

One problem with using these commands is that some sections may need to reference files, such as the class of a custom handler. In order for BloxWeb to locate these files, they need to have a fully qualified path, which is not portable. BloxWeb solves this problem with the install-jar command.

$ bloxweb install-jar CustomHandlerRelease.jar 

The jar file contains all files that may need to be referenced. Furthermore, it contains a bloxweb.config file in its top directory. The BloxWeb server will open the jar and load this config file. The advantage of this approach is that all paths in the config file are resolved relative to the jar file, so it becomes simple to reference custom handlers, for example, in a portable manner.
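
As a rough illustration (every file name except bloxweb.config is hypothetical), such a jar might be laid out as follows, with the config file referring to the other entries by paths relative to the jar:

CustomHandlerRelease.jar
    bloxweb.config             # loaded by the BloxWeb server on install-jar
    lib/my-custom-handler.jar  # referenced from bloxweb.config by a relative path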

23.15.1. Server

TODO

23.15.1.1. Endpoints

TODO

23.15.1.2. Handlers

TODO

23.15.1.3. Realms

TODO

23.15.1.4. Service Groups

BloxWeb supports service groups to provide the flexibility to host different versions of a service at a single URI and to facilitate authentication enforcement. It is possible, for example, to provide authenticated and non-authenticated versions of a service at the same URI, but on different ports. Suppose we want to expose the time service in authenticated and non-authenticated versions at the same URI. These would be the steps:

  1. Services can declare the group to which they belong. A group is a simple string that identifies services. By default, services belong to the special group bloxweb:no-group. Here, we declare a non-authenticated version of the time service at the /time URI in the public group. We also declare an authenticated version at the same URI but in the private group.

        service_by_group["/time", "public"] = x,
        default_protobuf_service(x) {
          protobuf_protocol[] = "time",
          ...
        }.
        service_by_group["/time", "private"] = x,
        default_protobuf_service(x) {
          protobuf_protocol[] = "time",
          ...
          auth_realm[] = "realm_name"
        }.
     
  2. Endpoints can declare the groups that they host. They can also enforce that all services in those groups must be authenticated. In our example, we create in bloxweb.config two endpoint configurations to host the public and private groups: the clear TCP endpoint hosts the public group; the ssl TCP endpoint hosts the private group. Furthermore, the ssl endpoint declares that it requires authentication, which will make the BloxWeb server verify that all services that declare to be in the private group indeed have authentication support.

    [tcp:clear]
    port = 8080
    groups = public
    
    [tcp:ssl]
    ssl = true
    port = 8443
    groups = private
    requires_authentication = true

With this configuration in place, clients accessing the /time URI via TCP port 8080 will be directed to the non-authenticated version of the service. A client accessing /time via TCP port 8443 will be directed to the authenticated version instead.

To support backwards compatibility, service and endpoint groups are optional. If a service does not declare a group, it automatically belongs to the special group bloxweb:no-group. Similarly, if an endpoint does not declare the groups it hosts, it automatically hosts only the group bloxweb:no-group.

This is a summary of how the BloxWeb server interprets the group configuration of services and endpoints.

  • A service can belong to one or more groups; an endpoint hosts one or more groups. The special bloxweb:no-group group is assigned to services and endpoints that do not declare groups. Endpoints can explicitly list bloxweb:no-group next to other groups (see the sketch after this list).

  • For each endpoint, service prefixes must be unambiguous. That is, in the set of services hosted by an endpoint (taken from all the groups it hosts), two services cannot declare the same prefix. The BloxWeb server issues a warning and ignores all services that conflict.

  • If a service belongs to a group that is not supported by any endpoint (including the bloxweb:no-group group), there's a warning and the service is not hosted.

  • If a service without authentication belongs to a group hosted by an endpoint that requires authentication, there's a warning and the service is not hosted in that particular endpoint (but it may be hosted on other endpoints).
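
For instance, here is a minimal sketch of an endpoint that hosts an explicit group alongside the default group (the comma-separated list syntax is an assumption):

[tcp:clear]
port = 8080
groups = public, bloxweb:no-group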

23.15.1.5. Static Workspaces

TODO

23.15.2. Client

bloxweb-client.config

Environment variables:

  • BLOXWEB_CLIENT_JVM_ARGS can be used to configure the JVM arguments of all bloxweb-client invocations. Example:

    export BLOXWEB_CLIENT_JVM_ARGS="-Xmx200m"

23.16. CORS Rules

Cross-origin resource sharing (CORS) defines a way for client web applications that are loaded in one domain to interact with resources in a different domain. With CORS support in Bloxweb, you can build rich client-side web applications and selectively allow cross-origin access to your Bloxweb services.

For Bloxweb and LogiQL, this essentially allows a rich web application running on one server to access services from a Bloxweb server on a different server.

To enable CORS on Bloxweb, we create CORS rules. These rules fundamentally declare the origins that will be allowed to access a particular service. The following LogiQL rule declares a CORS rule that, when applied to a service, will allow remote calls from the origin http://foo.com for GET requests.

cors:rule_by_name["allow_foo_GET"] = r,
  cors:rule(r) {
    cors:allowed_origin("http://foo.com")
  }
    <- .

To apply a certain rule to a service, we use the cors:with_rule_by_name predicate:

service_by_prefix["/foo-service"] = x,
default_protobuf_service(x) {
  protobuf_protocol[] = "foo-proto",
  protobuf_request_message[] = "Request",
  protobuf_response_message[] = "Response",
  cors:with_rule_by_name[] = "allow_foo_GET"
}.

It is also possible to configure origins using wildcards, other HTTP methods, and allow for non-standard headers. Here are the predicates that allow for this:

  • allowed_origin(r, origin): origin can be a string with one wildcard, such as '*foo', 'foo*', or 'foo*bar'. The '*' string will allow all origins;

  • allowed_method(r, method): method should be a string describing an HTTP method ('GET', 'POST', 'PUSH', etc);

  • allowed_header(r, header): header should be the name of an http request header that the client will be allowed to send.

The following rule shows how to use these predicates:

cors:rule_by_name["allow_PUSH_POST"] = r,
cors:rule(r) {
  cors:allowed_origin("http://*foo"),
  cors:allowed_method("GET"),
  cors:allowed_method("PUSH"),
  cors:allowed_header("Header1"),
  cors:allowed_header("Header2")
}
  <- .

23.17. Specifications

23.17.1. Service Configuration

TODO. Meanwhile, use service configuration reference.

23.17.2. Session Authentication

TODO

23.17.2.1. Login Service Protocol

TODO

23.17.3. RSA-SHA Authentication

TODO

Chapter 24. Application Console

The application console is a web-based tool, hosted at http://console.logicblox.com/, which allows browsing workspaces and issuing queries against them.

24.1. Installation

The Application Console requires installing a workspace that keeps authentication information and service configuration. This setup will be an option for automatically deployed applications. This section describes manual installation.

The scripts to install this on the server are provided as a binary package or as part of the integration bundle [add file path to query-project folder].

  • Copy the bloxweb.config file to $LB_DEPLOYMENT_HOME/bloxweb.config, or append its content to your bloxweb.config file and restart bloxweb

  • Install the project with

                  bash scripts/install.sh [url-pattern]
                

    This will overwrite the query-ws workspace if it exists but will migrate authentication credentials. The url-pattern parameter above should be a pattern that matches the URL that the query tool html server is running on. If this is not set correctly, cross-origin requests (CORS) will fail. If you are using the application console at http://console.logicblox.com/ the pattern should be *console.logicblox.com.

    Caution

    Note we use a * wildcard in the pattern. The application console is hosted through https, so http://console.logicblox.com will NOT match the origin of the query.

  • Create users and password for the query tool by running the following command

                  bloxweb set-password -c [username] --url http://localhost:55183/query/credentials
                

If you are not using the integration bundle or an automatically deployed application, bloxweb needs to load the connectblox handlers to serve queries from the application console. The install.sh script makes sure those handlers are loaded, but it needs to be run again when bloxweb is restarted, or you need to enable those handlers manually in bloxweb.config.

24.2. Configuration

All configuration is covered by the installation documentation when you install the application console services manually. If you use an automatically deployed application, you still need to create authentication credentials for the application console. This requires access to the server and is done with the following command.

              bloxweb set-password -c [username] --url http://localhost:55183/query/credentials
            

Chapter 25. Program Analysis

The program analysis tool provides a way for users to gain insight into their DatalogLB programs by writing custom queries. The tool works by storing the metadata of the DatalogLB project of interest in a LogicBlox workspace; queries about the project can then be written in DatalogLB itself.

25.1. Usage

The program analysis tool can be invoked to import metadata for a separately compiled project into an LB workspace using the following command:

% bloxanalysis --projectDir <ProjectDir> --workspace <AnalysisWorkspaceDir> 

The program analysis tool can also be invoked to import metadata from a set of blocks that have been extracted from an LB workspace using the following commands:

% bloxbatch -db <WorkspaceDir> -extractInstalledBlocks -dir <ExtractedDir>

% bloxanalysis --extractedDir <ExtractedDir> --workspace <AnalysisWorkspaceDir>

One of --projectDir or --extractedDir must always be specified. If no workspace name is specified, an automatically-generated, unique name is used. The full usage of bloxanalysis is as follows:

  • -D,--dir <ProjectDir>: directory containing the compiled LogicBlox project to be analyzed, or a directory containing blocks that have been extracted from a workspace through the -extractInstalledBlocks bloxbatch command. (Default: ".")
  • -W,--workspace <WorkspaceDir>: workspace where project meta-data should be stored. (Default: "analysis_workspace")
  • -U,--useExistingWorkspace: import metadata in an existing workspace, instead of creating a new one. Needs to be specified together with --workspace
  • --printScript <ScriptFile>: file to save bloxbatch interactive commands to
  • -H,--help: prints usage information about command-line options
  • -P,--progress: prints the progress of project import
  • -S, --connectBloxSessionTimeout: Session timeout, for very large projects whose import may take longer than the default timeout
  • -L, --disableAnalysisLibraries: Don't install the add-on analysis library that computes various useful dependencies between predicates and the rules deriving into them. (Default: false)

There are also some additional options that are only activated in "developer mode" (i.e., through the option --develop). These additional options are described below:

  • --connectBloxServer: specify a ConnectBlox server to use.
  • --debug: prints debugging information. (Default: false)
  • --debugDetailFileName: set logLevel to debugDetail@factbus and store output in specified file.
  • --declOnly: only import predicate declarations. (Default: false)
  • --develop: activates these developer options. (Default: false)
  • --exportOnly: export to protobuf but don't compare with imported protobuf. (Default: false)
  • --protobufImportOnly: only import to imported schema/moreblox:base. (Default: false)
  • --testImport : verify correctness of import, by exporting from MoReBlox schema to protobuf and comparing with imported protobuf. (Default: false)

25.2. Analysis workspace

The program metadata is stored in the analysis workspace according to the compiler metamodel. This consists of two parts, the external, which is meant for writing analysis queries over Datalog programs, and the internal, which contains some additional information that may be useful for compiler/BloxAnalysis developers and other expert users. Predicate names in the former are prefixed with the namespace blox:compiler, while the ones in the latter are prefixed with blox:compiler:internal. The API specification of this metamodel is described at:

There are also some analysis libraries, built on top of the compiler metamodel. One of them, blox:analysis:dependencyInfo, computes direct and transitive dependencies between rules and predicates as well as transitive super type relations. It is installed by default when a project is imported through BloxAnalysis (unless specified otherwise through --disableAnalysisLibraries).

Chapter 26. Profiling, Monitoring and Tuning

26.1. bloxtop

BloxTop (introduced in LogicBlox 3.4) is an interactive text-mode utility that shows how the logic engine is spending its time. It can be started separately from an existing LogicBlox process and connects live to already running processes. This makes it a very convenient profiling tool for various deployment scenarios. Although the report is largely based on internal implementation details, it reports sufficient information to understand which rules and which factbus connections are taking a lot of time.

26.2. Understanding Query Execution

To be written. External resources:

26.3. Pre-compiled queries

LogicBlox 3.3 introduces support for pre-compiled queries. Pre-compiled queries are queries that are stored in the workspace, typically during the build of the workspace. When a pre-compiled query is executed, the engine does not have to fully compile and analyze the query, but can immediately execute the logic based on cached execution data structures. This gives pre-compiled queries a major performance advantage over ad-hoc queries.

The block that contains the pre-compiled query has database lifetime, but it is not part of the installed program. That is, the blocks are inactive. For queries, the data of the predicates declared in the block should not have database lifetime; this is declared as follows inside the block:

lang:block:predicateLifetime[] = "TransactionLifetime".

With this declaration, the data of the predicates in the query is transaction-specific and the predicates become unlocked, essentially turning them into transaction-lifetime predicates.
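
For illustration, here is a minimal sketch of what such a query block might look like (the predicates person_age and adult are hypothetical and not part of any LogicBlox library; person_age is assumed to be an installed, database-lifetime predicate). The block can then be stored and executed with the bloxbatch commands shown below.

// foo.logic -- a hypothetical pre-compiled query block
lang:block:predicateLifetime[] = "TransactionLifetime".

// Query predicate with transaction lifetime.
adult(name) -> string(name).
adult(name) <- person_age[name] = a, a >= 18.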

To store a query using bloxbatch, use storeBlock:

$ bloxbatch -db workspace -storeBlock -file foo.logic

When using the scripting language of 'bloxbatch -interactive' (see the Section BloxBatch scripting), use the 'addBlock' command with the parameter 'active' set to 'false'.

addBlock --file foo.logic --active false

After storing a pre-compiled query in the workspace, the query can be executed using bloxbatch with the 'query' option and a 'name' parameter:

$ bloxbatch -db workspace -query -name foo

When using the LDBC API, a stored query can be executed by retrieving the block of the query using getBlock, which can then be executed.

Pre-compiled queries support normal (array) predicates, table predicates as well as file predicates.

26.4. Contention reporting

It can be difficult to understand what the cause is of contention in a concurrent application. To help with understanding the locking behavior of an application, the engine now supports a monitoring feature that reports logic rules that acquire many locks. To enable this feature, set the environment variable 'LB_MONITOR_LOCKS', where the value is the limit on the number of locks. To print the reports to the log, the logging level needs to be at least 'info@watch'.

$ export LB_MONITOR_LOCKS=50
$ bloxbatch .... -logLevel info@watch

26.5. Long-running rules reporting

To help with profiling applications, we have added a feature to easily monitor which rules are taking a long time to evaluate. To enable this feature, set the environment variable 'LB_MONITOR_RULE_TIME', where the value is the limit in number of seconds. To print the reports to the log, the logging level needs to be at least 'debug@watch'.

$ export LB_MONITOR_RULE_TIME=2
$ bloxbatch .... -logLevel debug@watch

26.6. Cycle execution graph

To help understand what predicates and rules are causing recursion, we have added a logging category that prints precise, yet concise, information on why a set of rules is recursive, and what the dependencies between rules are. The required logging level is 'debugDetail', and the category is 'execCycleGraph'.

$ bloxbatch .... -logLevel debugDetail@execCycleGraph

26.7. Detailed Analysis of Contention Issues

The notes below walk through, in some detail, how locks are acquired during logic evaluation.

Locks are acquired by normal bus connections. By looking at the factbus connections, it becomes visible which elements will be locked.

An example to understand locking:

create testws --overwrite

transaction
addBlock -B main <doc>
  a(x) -> .
  p(x) -> a(x).
  q(x) -> a(x).
  lang:pulse(`q).
  lang:lockingPolicy[`a] = "ByElement".
</doc>
commit

transaction
watch p
watch q
commit

logLevel debugDetail
transaction
addBlock <doc>
  +p(x) <- +a(x).
  +q(x) <- +a(x).
</doc>
commit

transaction
exec <doc>
  +a(x).
</doc>
commit

Running with -logLevel debugDetail@factbus prints the bus connections. ElementLockBusConnections are responsible for locking. There are two kinds of locks: exclusive (for writing) and shared (for reading). Although both shared locks and exclusive locks are harmful, many contention issues can be solved by avoiding exclusive locks.

Ordered predicates with costs:
  0: +a( >x )
       cost: 1 (increment 1), number of facts: 1
   2: elementLock exclusive +p( x )
   3: +p( x )
   4: +p( x )
FactBus variables:
   x: uint[32]
FactBus connections:
   0: ArrayIteratorBusConnection<blox_boolean,false>[ >x ]*
   1: TypeLookupBusConnection=a[ x ]
   2: ElementLockBusConnection<a>[ x ]
   3: ArrayUpdateBusConnection<blox_boolean,false>[ x ]
   4: ClauseIdBusConnection[ ]

'elementLock exclusive +p( x )' means that the engine needs to acquire a write lock on some argument of 'p'. The ElementLockBusConnection shows which variable was picked: ElementLockBusConnection<a>[ x ]. This distinction becomes important if p has more than one argument. Only one of these arguments will be locked when writing to 'p'.

-logLevel debugDetail@lockset can be used to print all locks that are acquired.

By watching predicates, it becomes visible when the locks are acquired, and on which elements. Below, you can see that entity element with id '1' of entity 'a' is locked.

/--- running fact bus ---\
rhs 0: +a
        x=[1]1
rhs 1: +a
        x=[1]1
acquired exclusive lock, id=1 lockset=txlocks_a.lck
rhs 2: +p
        x=[1]1
lhs 3: +p
        x=[1]1
rhs 0: +a
        FAIL
\--- 1 facts updated ---/

Finally, notice that since 'q' is a pulse predicate, no write locks are acquired when writing to 'q'. The same holds for transaction-lifetime (query) predicates.

26.8. Guidelines for Monitoring System Resources and Logging

26.8.1. vmstat

The Linux tool vmstat gives an overall impression of the use of system resources. Using tee allows you to see the output as well as log it to a file.

$  vmstat -n 10 | tee vmstat.log

26.8.2. top

Running the Linux tool top gives the memory consumption and CPU resources used per process. Again, we use tee to see the output as well as log it to a file.

$ top -b -d 10 -M | tee top.log

26.8.3. Logic Evaluation

If detailed analysis of logic evaluation performance is necessary, then run with logLevel debugDetail. This produces a lot of data, but it prevents the problem of the person analyzing the log not having sufficient information. How the logging level needs to be set depends on the setup:

  • bloxbatch:
    $ bloxbatch .... -logLevel debugDetail > debugDetail.log
    
  • bloxbatch -interactive:
    create testws --overwrite
    
    logLevel debugDetail
    transaction
    ...
    
  • process manager:
    TODO
    

26.8.4. Run-time profiling

Run bloxtop during the experiments and capture the output at the end of the experiment. Do this in a big terminal, with a small font to get as much information on the screen as possible.

$ bloxtop

At the end, stop refreshing the profile using 'space' and copy-paste the current screen into a file.

Chapter 27. XPath Query Translation

Many applications use the XPath query language to select nodes from XML documents. For instance, running XPath query /library/book/author on the following XML document returns the set of author XML elements corresponding to Nick Hornby and Haruki Murakami.

<library>
  <book>
    <title>High Fidelity</title>
    <author>Nick Hornby</author>
  </book>
  <book>
    <title>A Wild Sheep Chase</title>
    <author>Haruki Murakami</author>
  </book>
</library>

The xpath2logic tool translates a subset of the XPath language to datalog, allowing such queries to be run over XML documents that have been imported into a workspace as predicates. xpath2logic is a stand-alone command-line program that takes a plain-text XPath query as its argument and produces a plain-text set of logic rules as its output. Output is always written to stdout.

xpath2logic represents XPath expressions as pairs of datalog predicates. For example, the program invocation

xpath2logic --qname theQuery --xpathexpr /a

produces the logic shown below, representing /a by the predicates theQuery_valToInput and theQuery_domain. Here theQuery_valToInput(X,Y) indicates that XML element X is in the result set produced by running the query /a starting from XML element Y. To avoid unnecessary computation, result sets are only computed starting from desired XML elements, specified by the IDB predicate theQuery_domain(X). As a convenience, an EDB relation theQuery_edbDomain(X) is also produced to help explicitly populate theQuery_domain(X).

 
// Query EDB Domain Relation: theQuery_edbDomain
// Query Map: theQuery_valToInput
// Semantics: Set
//
// XPath text: /a
//
// Original path expression: (./(child::/(Filter (NameTest "a"))))
//
// Post-optimization path expression: (child::/(NameFilter "a"))

theQuery_domain(X) -> system:xml:element(X).

theQuery_valToInput(X,Y) ->
        system:xml:element(X),
        system:xml:element(Y).

generated_query_b0584a7971a59293_domain(X) <- 
        theQuery_domain(X).

theQuery_valToInput(Y,X) <- 
        generated_query_b0584a7971a59293_valToInput(Y,X), 
        theQuery_domain(X). 

theQuery_edbDomain(X) -> system:xml:element(X).

theQuery_domain(X) <- theQuery_edbDomain(X).
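
To use the generated logic, the query domain has to be populated, after which theQuery_valToInput can be read like any other predicate. The delta rule below is a minimal sketch; it assumes the XML document has already been imported so that system:xml:element is populated, and it requests results for every XML element in the workspace:

// Run as delta logic in a transaction (for example via the interactive
// 'exec' command) to populate the EDB domain relation.
+theQuery_edbDomain(x) <- system:xml:element(x).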

27.1. Usage

As described below, xpath2logic accepts several command line arguments.

--qname name

Set the predicate family name for top level query.

--xpathexpr expr

Translate literal expression expr.

--help

Display usage information.

--readable

Attempt to generate human-readable logic by adding line breaks. If --readable is not specified, logic rules are printed on single lines. When translating multiple XPath expressions, redundant logic is sometimes generated for common subexpressions (e.g., /library/book/title and /library/book/author share the subexpression /library). Single-line formatting allows redundant rules to be removed with sort -u.

--checkparse

Check that an input expression is in the subset of XPath understood by xpath2logic. An exit status of zero indicates a successful parse.

--genlogic (default)

Generate logic.

--verbose

Print diagnostic information.

--set (default)

Generate logic representing XPath query results as unordered sets.

--seq (experimental)

Generate logic representing XPath query results as ordered sets. Experimental and unsupported.

[file name]

Specify a file containing an XPath expression to translate.

At most one of --checkparse or --genlogic is allowed. At most one of --set or --seq is allowed. At most one expression may be specified using the --xpathexpr flag or by giving a file name. If an expression is not given by --xpathexpr or in a file, xpath2logic expects input on stdin.
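
For example (a hypothetical invocation), an expression can be piped in on stdin:

$ echo "/library/book/author" | xpath2logic --qname authorQuery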

Part III. Measure Service

Chapter 28. Concepts

28.1. OLAP

*** Introduction to the ideas behind OLAP here ***

28.2. Measure service

The measure service is our implementation of online analytical processing (OLAP), which is used for analyzing multidimensional data.

In the LogicBlox approach to OLAP there are two primary concepts: dimensions and measures.

28.3. Dimensions

A dimension is something we can use to classify data points. For example, we can collect data at different members of a time dimension. Or we can collect data at different members of a spatial dimension. Time and space are two fairly traditional dimensions, but for our purposes we can treat any discrete set as a dimension, such as employees, products, colors, etc. In principle, it would be possible to support continuous and infinite sets as dimensions, but that is not something the measure service currently supports.

Beyond simply allowing us to classify data points, a dimension is allowed to have additional internal structure. For example, we can choose to measure time in days or months, or locations in terms of cities or states. To do so we introduce the concept of levels.

In a time dimension we might have a level for "Days", a level for "Months" and a level for "Years". Levels have members which are the points at which we can measure data. For example, the level "Days" might contain the member "May 2nd, 2013". And the level "Months" might contain the member "June 2013". We can then measure data at these different members.

Figure 28.1. Calendar dimension levels and members

Calendar dimension levels and members


However, once we start having data at multiple levels, it can be difficult to compare it. Therefore, in addition to specifying the levels of a dimension, it is useful to provide relationships between these levels. For example, we might say that it is possible to relate members of the "Day" level to members of the "Month" level, and then members of the "Month" level to members of the "Year" level.

Figure 28.2. Calendar dimension level relationships

Calendar dimension level relationships


It might also be possible to directly relate members of the "Day" level to the "Year" level, but as long as such a relationship commutes with the one from "Day" to "Month" and "Month" to "Year" it is unnecessary.

Figure 28.3. Calendar dimension level relationships extended

Calendar dimension level relationships extended


However, dimensions need not be strictly linear. For example, we could add a "Season" level to the dimension and provide additional relationships between "Month" and "Season" and "Season" and "Year".

Figure 28.4. Calendar dimension extended with "Season" level

Calendar dimension extended with "Season" level


For some dimensions, it also makes sense to provide a mapping from a level to itself.

Example 28.1. 

Consider a dimension for measuring data with respect to the different parts in a manufacturing process. It is natural for the members of a "Parts" level to be related to other parts in the same level. For example, a "Gear" may be part of an "Engine".

Figure 28.5. Example of dimension with a self-relationship

Example of dimension with a self-relationship


Because we allow relationships between any two levels in a dimension, they can be thought of as a graph. However, we do place some structural limitations on the graph to ensure that it has the properties needed for well-defined OLAP queries:

  1. The first requirement is that the graph must not contain any cycles that involve more than one node. This allows for relationships like the one for "Parts" described above, but rules out mappings that form larger cycles. For example, the Vatican is a country within the city Rome, which is within the region of Lazio, which is within the country Italy. We can illustrate the relationship between members as

    Figure 28.6. Location dimension levels and members

    Location dimension levels and members


    However, if we established those relationships between the levels, we would get a dimension looking like:

    Figure 28.7. Disallowed relationship among Location dimension levels

    Disallowed relationship among Location dimension levels


    This dimension's structure has a cycle involving three nodes, which is not allowed.

  2. The second requirement is that the transitive closure of the directed edge relationship must form a meet-semilattice. That is, graph reachability becomes the less-than-or-equal relationship (≤) of a partial order, and there is a least element ⊥ such that ⊥ ≤ l for all levels l in the dimension. Furthermore, for every pair of levels l1 and l2, there must exist a meet (greatest lower bound).

Generally, dimensions are also described in terms of what are called hierarchies. A hierarchy can be thought of as a named path through the dimension graph. Hierarchies provide a useful modeling option for dimensions and they can also be used to direct some operations involving dimensions to use a specific path.

Another important concept in modeling with dimensions is attributes. Attributes can be thought of as functions or properties of the members of a level. An attribute is generally used to allow metadata concerning a member to be queried. For example, the name and the label of a member might be two separate attributes of a level.

Dimensions can be grouped together to form what we call a dimensionality. For example, if you have a "Time" dimension and a "Location" dimension, the tuple (Time, Location) is a dimensionality.

Note

Note that because we currently do not allow a dimension to occur more than once in a dimensionality, dimensionalities like (Time, Location) and (Location, Time) are isomorphic, and for most purposes, we can treat them as being identical.

For many purposes, we can think of dimensionalities as sets rather than tuples. The fact that dimensionalities can be treated as sets allows us to order them using the subset relation. Consequently, for a closed OLAP model, dimensionalities form a complete lattice, with the empty dimensionality () as the top element and the set containing all defined dimensions as the bottom element.

Figure 28.8. Example dimensionality lattice ordering

Example dimensionality lattice ordering

Similarly, we can also group levels together to form what we call intersections.

Example 28.2. 

Given the level "Day" and the level "State", the tuple (Day, State) is an intersection. Analogous to dimensionalities, we do not allow multiple levels from the same dimension to appear in the same intersection. Again, this means that intersection such as (Day, State) and (State, Day) are essentially isomorphic.


It is often useful to think of intersections as maps from dimensions to levels within them. As such, every intersection has a corresponding dimensionality. Additionally, treating intersections as maps means we can lift the meet-semilattices from each dimension to an ordering and meet-semilattice on intersections. The top element is the empty intersection (), and if the OLAP model is closed, the bottom element is the intersection containing the bottom levels of the dimensions in the model. It does not necessarily form a complete lattice because the meet-semilattice from each individual dimension is not guaranteed to have all joins.

Figure 28.9. Example intersection lattice ordering

Example intersection lattice ordering


Sometimes we will refer to an intersection as being a base or leaf intersection. This is an intersection defined solely in terms of the bottom elements of the respective dimensions.

Finally, we can also group members from levels together to form what we call positions. Again, as with dimensionalities and intersections, a position can only contain a single member of a given dimension.

28.4. Measures

A measure is a map from the positions of some intersection to a value or, less frequently, to a set of values or no values at all. The canonical OLAP example is the Sales measure, which gives a decimal data value for each position of the intersection (Sku, Store, Week).

Every measure is defined by a measure expression. Measure expressions are discussed below, but for intuition it's worth considering two kinds of measures. Metrics are measures defined by the contents of a provided LogiQL predicate. You can think of metrics as the input data to the measure service. In contrast, aggregations define a measure by adding up values in other measures, which might be metrics. Putting this together, we might define Sales at (Sku, Store, Week) as a metric referencing the LogiQL predicate companydata:sales and use an aggregation measure expression to "roll up" sales figures into a measure at the intersection (Sku, Region, Year).

It is not strictly necessary that a measure be a function from positions to values. It is allowable to have a set of values for each position. These measures can be used in queries, but our current reporting mechanism can only handle functional measures. Therefore, some filtering or aggregation is necessary to obtain a report from these relational measures.

Furthermore, it is not necessary that a measure contain data. A measure can consist entirely of a set of positions. This is isomorphic to having a dense measure of boolean values, but is more space efficient. While these measures may be used in queries, given that there is no data within them, it is not possible to directly query them in a report.

Finally, it is also possible for measures to be parameterized so that their behavior can be adjusted on a per query basis. Because the parametrization mechanism is closely tied to implementation details, we do not cover it in depth here.

Chapter 29. Configuration

The measure service data model is described by a protocol buffer message (if you are not familiar with Protocol Buffers, please visit Google's Developer Guide). This message is stored in and read from the workspace that backs the measure service. There are a number of options for creating the message. One is to construct it in another program or tool and import it. Another is to directly populate the protocol buffer message predicates using LogiQL. Finally, there is a high-level LogiQL library that can be used to describe a measure model more concisely. We'll focus on using that library in this chapter.

Regardless of how you populate the model predicates, you'll need to install the bloxweb:measure_service library into your workspace.

This chapter assumes you are already familiar with configuring WebServiceBlox via LogiQL. In general, your service definition will look something like the following, where you may adjust the path "/measure" to whatever makes sense for your application.

service_by_prefix["/measure"]=x,
protobuf_service(x) {
  custom_handler[] = "measure-service",
  protobuf_protocol[] = "bloxweb:measure_service",
  protobuf_request_message[] = "Request",
  protobuf_response_message[] = "Response"
}.

Also, when using the measure configuration library, you'll want to include somewhere in your project the line:

+measure:config:enable().

This will start the process of populating your model.

29.1. Dimensions

The simplest possible dimension we can define would look something like the following in LogiQL:

+measure:config:dimension(_) {
  +measure:config:dimension_hasName[]="Calendar"    (1)
}.

+measure:config:hierarchy_byDimName["Calendar", "Default"] = calHier,    (2)
+measure:config:hierarchy(calHier) {
  +measure:config:level["day"] = "hierarchies:calendar:day"
}.

1

clause stating that there exists a dimension with the name "Calendar".

2

states that the "Calendar" dimension has a hierarchy named "Default", with a single level called "day". Furthermore, it states that in LogiQL the level "day" is represented by the entity type "hierarchies:calendar:day".

At this point our dimension looks like the following:

Figure 29.1. Minimal Calendar dimension

Minimal Calendar dimension


Note

At present the configuration library requires that a dimension have at least one hierarchy. In practice a dimension does not need to have any hierarchies, but the configuration library does not support that currently.

Building on our minimal "Calendar" dimension, we can construct a more interesting dimension:

+measure:config:dimension(_) {
  +measure:config:dimension_hasName[]="Calendar",
  +measure:config:dimension_hasCaption[]="Calendar"        (1)
}.

+measure:config:hierarchy_byDimName["Calendar", "Default"] = calHier,
+measure:config:hierarchy(calHier) {
  +measure:config:level["day"] = "hierarchies:calendar:day",
  +measure:config:attribute["day","id","STRING"]="hierarchies:calendar:day_id",
  +measure:config:level["month"] = "hierarchies:calendar:month",                     (2)
  +measure:config:attribute["month","id","STRING"]="hierarchies:calendar:month_id",  (3)
  +measure:config:levelMap["day", "month"] = "hierarchies:calendar:day2month"        (4)
}.

1

indicates that the "Calendar" dimension should be displayed as "Calendar" by giving it a caption. Captions are never used in queries, and are only additional metadata that a client can use to format its output.

2

a second level is added to the "Default" hierarchy.

3

both levels also have attributes. The attribute lines state that "day" and "month" both have attributes "id" bound to the LogiQL functions "hierarchies:calendar:day_id" and "hierarchies:calendar:month_id" respectively.

4

declaration that there is a mapping from "day" to "month" via the "hierarchies:calendar:day2month" predicate.

Graphically our dimension now looks like the following:

Figure 29.2. Simple Calendar dimension

Simple Calendar dimension


Finally we can make the dimension even more complex.

+measure:config:dimension(_) {
  +measure:config:dimension_hasName[]="Calendar",
  +measure:config:dimension_hasCaption[]="Calendar",
  +measure:config:dimension_hasKind[]="TIME",    (3)
  +measure:config:dimension_hasDefaultHierarchy[]="Chronological"    (2)
}.

+measure:config:hierarchy_byDimName["Calendar", "Chronological"] = chronHier,    (1)
+measure:config:hierarchy(chronHier) {
  +measure:config:level["day"] = "hierarchies:calendar:day",
  +measure:config:attribute["day","id","STRING"]="hierarchies:calendar:day_id",
  +measure:config:level["month"] = "hierarchies:calendar:month",
  +measure:config:attribute["month","id","STRING"]="hierarchies:calendar:month_id",
  +measure:config:level["year"] = "hierarchies:calendar:year",
  +measure:config:attribute["year","id","STRING"]="hierarchies:calendar:year_id",
  +measure:config:levelMap["day", "month"] = "hierarchies:calendar:day2month",
  +measure:config:levelMap["month", "year"] = "hierarchies:calendar:month2year",
}.

+measure:config:hierarchy_byDimName["Calendar", "Seasonal"] = seasHier,
+measure:config:hierarchy(seasHier) {
  +measure:config:level["day"] = "hierarchies:calendar:day",
  +measure:config:attribute["day","id","STRING"]="hierarchies:calendar:day_id",
  +measure:config:level["month"] = "hierarchies:calendar:month",
  +measure:config:attribute["month","id","STRING"]="hierarchies:calendar:month_id",
  +measure:config:level["season"] = "hierarchies:calendar:season",
  +measure:config:attribute["season","id","STRING"]="hierarchies:calendar:season_id",
  +measure:config:level["year"] = "hierarchies:calendar:year",
  +measure:config:attribute["year","id","STRING"]="hierarchies:calendar:year_id",
  +measure:config:levelMap["day", "month"] = "hierarchies:calendar:day2month",
  +measure:config:levelMap["month", "season"] = "hierarchies:calendar:month2season"
  +measure:config:levelMap["season", "year"] = "hierarchies:calendar:season2year"
}.

1

we have added a second hierarchy to the calendar dimension, renaming the original "Default" to "Chronological".

2

We have also specified that when choosing paths through the dimension, it should default to the "Chronological" rather than "Seasonal" hierarchy.

3

the kind of the dimension is defined as "TIME". As with captions, the kind of dimension is not used in any way during query evaluation, and is just a hint to a client on how it might want to format the data. A complete list of the known kinds can be found in the measure service protocol.

Figure 29.3. Finished Calendar dimension

Finished Calendar dimension


Finally, lets look at how we would define a recursive dimension:

+measure:config:dimension(_) {
  +measure:config:dimension_hasName[]="Inventory",
  +measure:config:dimension_hasEdge("part", "hierarchies:inventory:part2part", "part")    (1)
}.

+measure:config:hierarchy_byDimName["Inventory", "Default"] = calHier,
+measure:config:hierarchy(calHier) {
  +measure:config:level["Parts"] = "hierarchies:inventory:parts",
  +measure:config:level["Warehouse"] = "hierarchies:inventory:warehouse,
  +measure:config:levelMap["Parts", "Warehouse"] = "hierarchies:inventory:parts2warehouse"
}.

1

the hierarchy looks much the same as it did in the "Calendar" examples. The difference is that we have specified an edge from the "Parts" level to the "Parts" level, mapped via the "hierarchies:inventory:part2part" predicate.

Graphically, the "Inventory" dimension looks like the following:

Figure 29.4. Recursive Inventory dimension

Recursive Inventory dimension


Note

It is possible to define arbitrary edges among levels using the "measure:config:dimension_hasEdge" predicate. However, as described in the Concepts chapter, the dimension must still meet the expected well-formedness conditions.

29.2. Metrics

Defining a metric is relatively straightforward. The definition of the canonical Sales metric in LogiQL would be:

+measure:config:metric("Sales") {                          (1)
  +measure:config:metric_usesPredicate[]="data:sales",     (2)
  +measure:config:metric_hasIntersection[]="Product.sku,Location.store,Calendar.week",    (3)
  +measure:config:metric_hasType[]="FLOAT"
}.

1

declares that the name of the metric is "Sales".

2

indicates that the metric is defined in terms of the LogiQL predicate data:sales.

3

declares the intersection of the metric to be (Product.sku,Location.store,Calendar.week).

Note

It is important not to include non-essential whitespace in the value of measure:config:metric_hasIntersection. The intersection is a comma-separated tuple whose elements each name a dimension and a level, separated by a period. If it is unambiguous to which dimension a level belongs, it is possible to omit the dimension and the period prefix.
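
For instance, assuming the levels sku, store and week each occur in only one dimension, the intersection above could equivalently be written without the dimension prefixes:

+measure:config:metric_hasIntersection[]="sku,store,week"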

Finally, the measure:config:metric_hasType line indicates that the measurements in this metric are values of floating-point type.

Currently the following measurement types are supported:

  • STRING
  • INT
  • FLOAT
  • DECIMAL
  • BOOLEAN.

Another metric we might define is product availability:

+measure:config:metric("Avail") {
  +measure:config:metric_hasCaption[]="Availability",      (1)
  +measure:config:metric_usesPredicate[]="data:avail",
  +measure:config:metric_hasIntersection[]="Product.sku,Location.store,Calendar.week"     (2)
}.

1

definition of a metric called Avail with a display caption of Availability.

2

it is defined for the same intersection as the Sales metric, but in this case we have omitted the type declaration. This means that this is a position-only metric, simply encoding whether a given product was available at a particular store during some week.

Chapter 30. Primitive queries

All queries in this chapter and the following chapters are based upon the sample measure application found in the samples/simple1 directory. All the queries are executed using the measure service query-tool that allows directly querying the measure service with the textual representation of protocol buffers. All the examples should transliterate straightforwardly to JSON.

30.1. Attributes

One of the simplest possible queries is to just ask for the contents of an attribute. Because this example query is written in the textual serialization of the protocol buffer format it can be a bit verbose, and there are a number of details to understand:

kind: QUERY                                                (1)
query_request {                                            (2)
  report_name: "attr" return_row_numbers: false            (3)
  measure { kind: ATTRIBUTE                                (4)
    attribute {                                            (5)
      qualified_level { dimension: "Product" level: "sku" }
      attribute: "id"
    }
  }
}

1

kind: QUERY is used to indicate that this message is a query, as opposed to some other kind of request.

2

opens the actual query request itself.

3

this line includes two directives to the measure service. The part report_name: "attr" names the report generated by this query "attr". In general, different queries should have different report names. The part return_row_numbers: false directs the measure service not to return columns that include the row numbers. Row numbers are generally only useful when returning sparse data, and would make the example responses more difficult to read, so unless otherwise needed we will disable them for all examples.

4

the start of the measure expression that we are querying. The part kind: ATTRIBUTE states that the contents of the given measure message correspond to an attribute and that therefore the attribute field must be populated.

5

specification of the attribute in terms of a qualified level (a dimension and a level), along with the attribute name itself, id.

From this query we'll get a response like:

report_column {
  string_column {
    value: "Scope 20oz"
    value: "Crest 4oz"
    value: "Colgate 20oz"
    value: "Charmin 12 rolls"
  }
}

For the purposes of the measure expression query language, a level's attribute can be viewed as a kind of one-dimensional metric at the intersection consisting of only that level.

30.2. Metrics

One of the other simplest possible queries is to query the contents of a metric:

kind: QUERY
query_request {
  report_name: "name" return_row_numbers: false
  measure { kind: METRIC                                   (1)
    metric { name: "Sales" }                               (2)
  }
}

1

the field kind: METRIC is used to indicate that this measure expression query is a metric and that the metric field must be populated.

2

inside the actual metric message we supply the name: "Sales" field to indicate that we want the contents of the Sales metric.

This query yields:

report_column {
  float_column {
    value: 200.0
    value: 180.0
    value: 150.0
    value: 110.0
    value: 200.0
    value: 210.0
    value: 100.0
    value: 100.0
    value: 100.0
    value: 105.0
    value: 50.0
    value: 58.0
    value: 70.0
    value: 100.0
    value: 70.0
    value: 75.0
    value: 100.0
    value: 120.0
    value: 300.0
    value: 139.0
    value: 300.0
    value: 209.0
  }
}

which are the values contained in the functional predicate backing the Sales metric. However, these values can be a bit difficult to interpret because the metric Sales is itself at the intersection (sku, store, week) and we have no indication as to how the values we've received correspond to those dimensions.

The solution is to add what we call "key requests" to the query:

kind: QUERY
query_request {
  report_name: "name_keyed" return_row_numbers: false
  key { qualified_level { dimension: "Product" level: "sku" } attribute: "id" }
  key { qualified_level { dimension: "Location" level: "store" } attribute: "id" }
  key { qualified_level { dimension: "Calendar" level: "week" } attribute: "id" }(1)
  measure { kind: METRIC
    metric { name: "Sales" }
  }
}

1

the three key messages request that we return the id attribute of each level in the measure expression's intersection. Each of these attributes is returned as an additional column.

Note

It is only possible to make key requests against levels that are in the intersection of the given query. For example, we could not ask for an attribute for the level month because the query is at the intersection (sku, store, week).

Making the above query with key requests yields the following:

report_column {
  string_column {
    value: "Scope 20oz"
    value: "Scope 20oz"
    value: "Scope 20oz"
    value: "Scope 20oz"
    value: "Scope 20oz"
    value: "Scope 20oz"
    value: "Scope 20oz"
    value: "Scope 20oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Colgate 20oz"
    value: "Colgate 20oz"
    value: "Charmin 12 rolls"
    value: "Charmin 12 rolls"
  }
}
report_column {
  string_column {
    value: "Atlanta, GA"
    value: "Atlanta, GA"
    value: "Charlotte, NC"
    value: "Charlotte, NC"
    value: "Jacksonville, FL"
    value: "Jacksonville, FL"
    value: "Lansing, MI"
    value: "Lansing, MI"
    value: "Atlanta, GA"
    value: "Atlanta, GA"
    value: "Charlotte, NC"
    value: "Charlotte, NC"
    value: "Lansing, MI"
    value: "Lansing, MI"
    value: "Detroit, MI"
    value: "Detroit, MI"
    value: "Chicago, IL"
    value: "Chicago, IL"
    value: "Atlanta, GA"
    value: "Atlanta, GA"
    value: "Atlanta, GA"
    value: "Atlanta, GA"
  }
}
report_column {
  string_column {
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
  }
}
report_column {
  float_column {
    value: 200.0
    value: 180.0
    value: 150.0
    value: 110.0
    value: 200.0
    value: 210.0
    value: 100.0
    value: 100.0
    value: 100.0
    value: 105.0
    value: 50.0
    value: 58.0
    value: 70.0
    value: 100.0
    value: 70.0
    value: 75.0
    value: 100.0
    value: 120.0
    value: 300.0
    value: 139.0
    value: 300.0
    value: 209.0
  }
}

While the response is column oriented, it is still much easier to interpret these results than the raw numbers.

30.3. Terms

The final sort of primitive measure query expression is the term expression. Term expressions have the nullary intersection (), and therefore there are no keys to request.

kind: QUERY
query_request {
  report_name: "term" return_row_numbers: false
  measure { kind: TERM                                     (1)
    term { kind: CONSTANT constant { int_constant: 42 } }  (2)
  }
}

1

we use the field kind: TERM to indicate that the measure expression is of the kind term.

2

the value expression is defined to be the integer constant 42.

Making this term expression query yields the following:

report_column {
  int_column {
    value: 42
  }
}

This may not seem like a particularly useful query to make, but it is a foundational building block that can be used in more complex queries later on.

Chapter 31. Aggregation queries

One of the most common measure service queries is requesting the computation of an aggregation. Here is a simple query to compute the total sales across all products, locations, and times.

kind: QUERY
query_request {
  report_name: "agg_total_name_all" return_row_numbers: false
  measure { kind: AGGREGATION                              (1)
    aggregation {
      method: TOTAL                                        (2)
      expr { kind: METRIC metric { name: "Sales" } }       (3)
      grouping { kind: ALL dimension: "Product" }          (4)
      grouping { kind: ALL dimension: "Location" }
      grouping { kind: ALL dimension: "Calendar" }
    }
  }
}

1

the measure has kind "AGGREGATION" to indicate that the "aggregation" field is populated with an "AggExpr" message.

2

Within the "AggExpr" message we have set the "method" field to "TOTAL" so that the aggregation computes the total of the enclosed measure expression.

3

Within the "AggExpr" message we find the "expr" field which is our first example of a nested measure expression. Within it we have provided a "MeasureExpr" message that is a measure query for the "Sales" metric itself.

4

Finally, the "AggExpr" message contains three "Grouping" messages. Groupings allow for controlling how an aggregation moves a measure expression from one intersection to a higher intersection (in terms of the intersection lattice). When using the "ALL" grouping, a dimension is specified to project away, computing the aggregate for "all" members in that dimension.

Because, in this example, we have chosen to project away all the dimensions, the result will have the nullary intersection and will simply be a single number. Consequently, there is no point in including any key requests.

Note

Currently, the measure service supports the following aggregation methods: COLLECT, AMBIG, TOTAL, MIN, MAX, COUNT, MODE, and COUNT_DISTINCT.

Sending the query above to the server gives us the following response:

report_column {
  float_column {
    value: 3046.0
  }
}

If we are instead interested in the total number of products ever sold, rather than how much money we have collected, we need only change the aggregation method to COUNT:

kind: QUERY
query_request {
  report_name: "agg_count_name_all" return_row_numbers: false
  measure { kind: AGGREGATION
    aggregation {
      method: COUNT                                        (1)
      expr { kind: METRIC metric { name: "Sales" } }
      grouping { kind: ALL dimension: "Product" }
      grouping { kind: ALL dimension: "Location" }
      grouping { kind: ALL dimension: "Calendar" }
    }
  }
}

1

aggregation method is changed to COUNT.

Executing the query yields the following result:

report_column {
  int_column {
    value: 22
  }
}

Note that we can aggregate over other kinds of measure expressions, rather than just over metrics. For example:

kind: QUERY
query_request {
  report_name: "agg_count_attr_all" return_row_numbers: false
  measure { kind: AGGREGATION
    aggregation {
      method: COUNT
      expr { kind: ATTRIBUTE
        attribute { qualified_level { dimension: "Product" level: "sku" }
                    attribute: "id" }    (1)
      }
      grouping { kind: ALL dimension: "Product" }
    }
  }
}

1

here we count the number of products that we have by aggregating over the sku level's id attribute.

Executing the query yields the following result:

report_column {
  int_column {
    value: 4
  }
}

Always aggregating to a single value does not generally provide for interesting insights into your data, so we also have the option to aggregate to a specific intersection:

kind: QUERY
query_request {
  report_name: "agg_total_name_map" return_row_numbers: false
  key { qualified_level { dimension: "Product" level: "class" } attribute: "id" }    (2)
  key { qualified_level { dimension: "Location" level: "region" } attribute: "id" }
  key { qualified_level { dimension: "Calendar" level: "month" } attribute: "id" }
  measure { kind: AGGREGATION
    aggregation {
      method: TOTAL
      expr { kind: METRIC metric { name: "Sales" } }
      grouping { kind: MAP dimension: "Product" level: "class" }    (1)
      grouping { kind: MAP dimension: "Location" level: "region" }
      grouping { kind: MAP dimension: "Calendar" level: "month" }
    }
  }
}

1

instead of using the ALL grouping we use the MAP grouping that directs the measure service to aggregate the data for a given dimension to a specific level. In this example we have requested that we aggregate to the class level along the Product dimension. The resulting measure expression has an intersection of (class, region, month).

2

because we are not completely aggregating away all the dimensions this time, we include key requests to make the output more comprehensible.

Executing the query yields:

report_column {
  string_column {
    value: "Bathroom"
    value: "Bathroom"
  }
}
report_column {
  string_column {
    value: "South"
    value: "Midwest"
  }
}
report_column {
  string_column {
    value: "January"
    value: "January"
  }
}
report_column {
  float_column {
    value: 2311.0
    value: 735.0
  }
}

Alternatively, we could have asked for the maximum sales value at the (class, region, month) intersection.

kind: QUERY
query_request {
  report_name: "agg_max_name_map" return_row_numbers: false
  key { qualified_level { dimension: "Product" level: "class" } attribute: "id" }
  key { qualified_level { dimension: "Location" level: "region" } attribute: "id" }
  key { qualified_level { dimension: "Calendar" level: "month" } attribute: "id" }
  measure { kind: AGGREGATION
    aggregation {
      method: MAX
      expr { kind: METRIC metric { name: "Sales" } }
      grouping { kind: MAP dimension: "Product" level: "class" }
      grouping { kind: MAP dimension: "Location" level: "region" }
      grouping { kind: MAP dimension: "Calendar" level: "month" }
    }
  }
}

Executing this query yields:

report_column {
  string_column {
    value: "Bathroom"
    value: "Bathroom"
  }
}
report_column {
  string_column {
    value: "South"
    value: "Midwest"
  }
}
report_column {
  string_column {
    value: "January"
    value: "January"
  }
}
report_column {
  float_column {
    value: 300.0
    value: 120.0
  }
}

We can also freely mix ALL and MAP groupings, as well as simply leave a dimension alone:

kind: QUERY
query_request {
  report_name: "agg_max_name_mixed" return_row_numbers: false
  key { qualified_level { dimension: "Product" level: "sku" } attribute: "id" }    (1)
  key { qualified_level { dimension: "Calendar" level: "month" } attribute: "id" }    (2)
  measure { kind: AGGREGATION
    aggregation {
      method: MAX
      expr { kind: METRIC metric { name: "Sales" } }
      grouping { kind: ALL dimension: "Location" }
      grouping { kind: MAP dimension: "Calendar" level: "month" }
    }
  }
}

1

here we are projecting away the Location dimension, and keeping all of the members at the sku level

2

and we are aggregating up from the week level to the month level in the Calendar dimension.

Executing the query yields:

report_column {
  string_column {
    value: "Scope 20oz"
    value: "Crest 4oz"
    value: "Colgate 20oz"
    value: "Charmin 12 rolls"
  }
}
report_column {
  string_column {
    value: "January"
    value: "January"
    value: "January"
    value: "January"
  }
}
report_column {
  float_column {
    value: 210.0
    value: 120.0
    value: 300.0
    value: 300.0
  }
}

Finally, because we may aggregate over arbitrary measure expressions, it is also possible to aggregate another aggregation.

kind: QUERY
query_request {
  report_name: "agg_max_count_name" return_row_numbers: false
  key { qualified_level { dimension: "Product" level: "sku" } attribute: "id" }
  key { qualified_level { dimension: "Calendar" level: "month" } attribute: "id" }
  measure { kind: AGGREGATION
    aggregation {        (2)
      method: MAX
      expr { kind: AGGREGATION
        aggregation {    (1)
          method: COUNT
          expr { kind: METRIC metric { name: "Sales" } }
          grouping { kind: MAP dimension: "Location" level: "region" }
          grouping { kind: MAP dimension: "Calendar" level: "month" }
        }
      }
      grouping { kind: ALL dimension: "Location" }
    }
  }
}

1

here, we first count up all the sales made at the intersection (sku, region, month)

2

then we compute the maximum number of sales made across all locations at the intersection (sku, month).

Executing the query yields the following:

report_column {
  string_column {
    value: "Scope 20oz"
    value: "Crest 4oz"
    value: "Colgate 20oz"
    value: "Charmin 12 rolls"
  }
}
report_column {
  string_column {
    value: "January"
    value: "January"
    value: "January"
    value: "January"
  }
}
report_column {
  int_column {
    value: 6
    value: 6
    value: 2
    value: 2
  }
}

Note

There are still some restrictions on which measure expressions we may aggregate over. For example, it doesn't make sense to compute the total of a string- or boolean-valued metric. Doing so will result in an error response from the measure service.

Chapter 32. Filtering and dicing

Beyond aggregating, there are two other ways to restrict the result of a query. The first is filtering, which restricts the result based upon the values of a measure expression. Naturally, filtering does not make sense on position-only measure expressions. The second is dicing, which restricts the result based upon the positions in the measure expression.

32.1. Filtering

One of the simplest filters is to filter values based upon a single comparison operation.

kind: QUERY
query_request {
  report_name: "filter_name_greater" return_row_numbers: false
  key { qualified_level { dimension: "Product" level: "sku" } attribute: "id" }
  key { qualified_level { dimension: "Location" level: "store" } attribute: "id" }
  key { qualified_level { dimension: "Calendar" level: "week" } attribute: "id" }
  measure { kind: FILTER     (1)
    filter {                 (2)
      expr{ kind: METRIC
        metric { name: "Sales" }
      }
      comparison { op: GREATER_THAN term { kind: CONSTANT constant { float_constant: 200.0 } } }    (3)
    }
  }
}

1

here, we have taken the Sales metric and wrapped it in a filter measure expression, as indicated by the use of the kind FILTER.

2

a filter measure expression requires us to provide a filter field in the MeasureExpr message.

3

to the FilterExpr message we add a Comparison, where we have set the comparison operation to GREATER_THAN and supplied a Term message for the floating-point constant 200.0.

Executing this query yields the following:

report_column {
  string_column {
    value: "Scope 20oz"
    value: "Colgate 20oz"
    value: "Charmin 12 rolls"
    value: "Charmin 12 rolls"
  }
}
report_column {
  string_column {
    value: "Jacksonville, FL"
    value: "Atlanta, GA"
    value: "Atlanta, GA"
    value: "Atlanta, GA"
  }
}
report_column {
  string_column {
    value: "1150125"
    value: "1150118"
    value: "1150118"
    value: "1150125"
  }
}
report_column {
  float_column {
    value: 210.0
    value: 300.0
    value: 300.0
    value: 209.0
  }
}

As we can see, we've only returned results for those positions where the Sales value was larger than 200.

It is possible to supply multiple conditions if we want to further constrain the result.

kind: QUERY
query_request {
  report_name: "filter_name_greater_less" return_row_numbers: false
  key { qualified_level { dimension: "Product" level: "sku" } attribute: "id" }
  key { qualified_level { dimension: "Location" level: "store" } attribute: "id" }
  key { qualified_level { dimension: "Calendar" level: "week" } attribute: "id" }
  measure { kind: FILTER
    filter { 
      expr{ kind: METRIC
        metric { name: "Sales" }
      }
      comparison { op: GREATER_THAN term { kind: CONSTANT constant { float_constant: 200.0 } } }
      comparison { op: LESS_THAN term { kind: CONSTANT constant { float_constant: 300.0 } } }    (1)
    }
  }
}

1

In this query, we have added a second filter condition restricting the result to those positions whose values are less than 300.

The following comparison operations are currently provided:

  • EQUALS
  • NOT_EQUALS
  • LESS_THAN
  • LESS_OR_EQUALS
  • GREATER_THAN
  • GREATER_OR_EQUALS
  • LIKE

Note

All comparison operations can be used with all types of data, except LIKE, which can only be used with string data and string values.

Executing the above query yields:

report_column {
  string_column {
    value: "Scope 20oz"
    value: "Charmin 12 rolls"
  }
}
report_column {
  string_column {
    value: "Jacksonville, FL"
    value: "Atlanta, GA"
  }
}
report_column {
  string_column {
    value: "1150125"
    value: "1150125"
  }
}
report_column {
  float_column {
    value: 210.0
    value: 209.0
  }
}

As expected, this response lacks the positions that had values of 300 before.

As with aggregations, it is possible to apply filters to any other valid measure expression. For example, we can filter an attribute rather than a metric.

kind: QUERY
query_request {
  report_name: "filter_attr_disj" return_row_numbers: false
  measure { kind: FILTER
    filter {
      expr { kind: ATTRIBUTE
        attribute {
          qualified_level { dimension: "Calendar" level: "month" }
          attribute: "id"
        }
      }
      comparison { op: LIKE term { kind: CONSTANT constant { string_constant: "J%" } } }    (1)
      comparison { op: LIKE term { kind: CONSTANT constant { string_constant: "M%" } } }
      is_disjunction: true    (2)
    }
  }
}

1

here we have used two Comparisons with the LIKE operator. As such, we are comparing the values of the month level's id attribute with the pattern strings "J%" and "M%".

2

we've also set the is_disjunction field to true to indicate that, rather than requiring the values to match both of these patterns (which would be impossible), they need only match at least one of the conditions.

Executing this query yields the following:

report_column {
  string_column {
    value: "January"
    value: "March"
    value: "May"
    value: "June"
    value: "July"
  }
}

As expected, these are all the months whose names start with a J or an M.

32.2. Dicing

As we noted in the beginning of this chapter, dicing involves restricting the set of results by position rather than value. How do we select the positions to return? We use the positions found in another measure expression.

kind: QUERY
query_request {
  report_name: "dice_name_metric" return_row_numbers: false
  key { qualified_level { dimension: "Product" level: "sku" } attribute: "id" }
  key { qualified_level { dimension: "Location" level: "store" } attribute: "id" }
  key { qualified_level { dimension: "Calendar" level: "week" } attribute: "id" }
  measure { kind: DICE        (1)
    dice { 
      expr { kind: METRIC     (2)
        metric { name: "Sales" }
      }
      dicer { kind: METRIC    (3)
        metric { name: "Returns" }
      }
    }
  }
}

1

in this example, we have placed a DiceExpr in the dice field, as indicated by the MeasureExpr kind DICE.

2

we've set the expr field of the DiceExpr to be the measure expression for the Sales metric that we've been using as our running example.

3

we have then added another MeasureExpr for the repeated dicer field. In this case we are using the Returns metric.

Executing the above query yields the following:

report_column {
  string_column {
    value: "Scope 20oz"
    value: "Scope 20oz"
    value: "Scope 20oz"
    value: "Scope 20oz"
    value: "Scope 20oz"
    value: "Scope 20oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Colgate 20oz"
    value: "Colgate 20oz"
    value: "Charmin 12 rolls"
    value: "Charmin 12 rolls"
  }
}
report_column {
  string_column {
    value: "Atlanta, GA"
    value: "Atlanta, GA"
    value: "Charlotte, NC"
    value: "Charlotte, NC"
    value: "Lansing, MI"
    value: "Lansing, MI"
    value: "Atlanta, GA"
    value: "Atlanta, GA"
    value: "Charlotte, NC"
    value: "Charlotte, NC"
    value: "Lansing, MI"
    value: "Lansing, MI"
    value: "Atlanta, GA"
    value: "Atlanta, GA"
    value: "Atlanta, GA"
    value: "Atlanta, GA"
  }
}
report_column {
  string_column {
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
  }
}
report_column {
  float_column {
    value: 200.0
    value: 180.0
    value: 150.0
    value: 110.0
    value: 100.0
    value: 100.0
    value: 100.0
    value: 105.0
    value: 50.0
    value: 58.0
    value: 70.0
    value: 100.0
    value: 300.0
    value: 139.0
    value: 300.0
    value: 209.0
  }
}

Here, we are seeing the sales results for only those positions where there were also returns made. The values are all from the Sales metric, but we've only used those positions in Sales metric that are also in the Returns metric, regardless of whatever value they may have in the Returns metric.

However, it isn't necessary that the measure expression being diced have the same intersection as the dicer expressions. They could even have completely disjoint intersections, but then the result would be guaranteed to be empty.

kind: QUERY
query_request {
  report_name: "dice_name_filter" return_row_numbers: false
  key { qualified_level { dimension: "Product" level: "sku" } attribute: "id" }
  key { qualified_level { dimension: "Location" level: "store" } attribute: "id" }
  key { qualified_level { dimension: "Calendar" level: "week" } attribute: "id" }
  measure { kind: DICE
    dice { 
      expr { kind: METRIC
        metric { name: "Sales" }
      }
      dicer { kind: FILTER    (1)
        filter {
          expr { kind: ATTRIBUTE 
            attribute {
              qualified_level { dimension: "Location" level: "region" }
              attribute: "id"
            }
          }
          comparison { op: EQUALS term { kind: CONSTANT constant{ string_constant: "Midwest" } } }
        }
      }
    }
  }
}

1

in this example, we are using a filtered attribute as the dicer. First, we filter the region attribute down to exactly the member that represents the Midwest, and then we use the resulting measure expression as the dicer.

In this case the diced expression is at the intersection (sku, store, week) while the dicer expression is at the intersection (region). Naïvely, you might expect that the resulting query would be empty because these are two distinct intersections. However, when we execute the query we get back:

report_column {
  string_column {
    value: "Scope 20oz"
    value: "Scope 20oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
  }
}
report_column {
  string_column {
    value: "Lansing, MI"
    value: "Lansing, MI"
    value: "Lansing, MI"
    value: "Lansing, MI"
    value: "Detroit, MI"
    value: "Detroit, MI"
    value: "Chicago, IL"
    value: "Chicago, IL"
  }
}
report_column {
  string_column {
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
  }
}
report_column {
  float_column {
    value: 100.0
    value: 100.0
    value: 70.0
    value: 100.0
    value: 70.0
    value: 75.0
    value: 100.0
    value: 120.0
  }
}

What has happened is that the measure service knows how to relate positions in (sku, store, week) to positions in (region). This is because the store level maps up to the region level. So the result includes only those positions for stores that can be mapped to the Midwest region.

Similar to filtering, we can add multiple dicer expressions to a DiceExpr.

kind: QUERY
query_request {
  report_name: "dice_name_filter_conj" return_row_numbers: false
  key { qualified_level { dimension: "Product" level: "sku" } attribute: "id" }
  key { qualified_level { dimension: "Location" level: "store" } attribute: "id" }
  key { qualified_level { dimension: "Calendar" level: "week" } attribute: "id" }
  measure { kind: DICE
    dice { 
      expr { kind: METRIC
        metric { name: "Sales" }
      }
      dicer { kind: METRIC                                 (1)
        metric { name: "Returns" }
      }
      dicer { kind: FILTER                                 (2)
        filter {
          expr { kind: ATTRIBUTE 
            attribute {
              qualified_level { dimension: "Location" level: "region" }
              attribute: "id"
            }
          }
          comparison { op: EQUALS term { kind: CONSTANT constant { string_constant: "Midwest" } } }
        }
      }
    }
  }
}

1

here we are using the Returns metric for the dicer field.

2

and here we are also using a filtered attribute as the dicer.

Executing this query yields the following:

report_column {
  string_column {
    value: "Scope 20oz"
    value: "Scope 20oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
  }
}
report_column {
  string_column {
    value: "Lansing, MI"
    value: "Lansing, MI"
    value: "Lansing, MI"
    value: "Lansing, MI"
  }
}
report_column {
  string_column {
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
  }
}
report_column {
  float_column {
    value: 100.0
    value: 100.0
    value: 70.0
    value: 100.0
  }
}

Also like filtering, we have the option to allow a disjunction of dicer expressions.

kind: QUERY
query_request {
  report_name: "dice_name_filter_disj" return_row_numbers: false
  key { qualified_level { dimension: "Product" level: "sku" } attribute: "id" }
  key { qualified_level { dimension: "Location" level: "store" } attribute: "id" }
  key { qualified_level { dimension: "Calendar" level: "week" } attribute: "id" }
  measure { kind: DICE
    dice { 
      expr { kind: METRIC
        metric { name: "Sales" }
      }
      dicer { kind: METRIC
        metric { name: "Returns" }
      }
      dicer { kind: FILTER
        filter {
          expr { kind: ATTRIBUTE 
            attribute {
              qualified_level { dimension: "Location" level: "region" }
              attribute: "id"
            }
          }
          comparison { op: EQUALS term { kind: CONSTANT constant { string_constant: "Midwest" } } }
        }
      }
      is_disjunction: true     (1)
    }
  }
}

1

we've set the is_disjunction field to true to indicate that, rather than requiring positions to satisfy both dicers, they need only satisfy at least one of them.

Executing this query yields the following:

report_column {
  string_column {
    value: "Scope 20oz"
    value: "Scope 20oz"
    value: "Scope 20oz"
    value: "Scope 20oz"
    value: "Scope 20oz"
    value: "Scope 20oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Crest 4oz"
    value: "Colgate 20oz"
    value: "Colgate 20oz"
    value: "Charmin 12 rolls"
    value: "Charmin 12 rolls"
  }
}
report_column {
  string_column {
    value: "Atlanta, GA"
    value: "Atlanta, GA"
    value: "Charlotte, NC"
    value: "Charlotte, NC"
    value: "Lansing, MI"
    value: "Lansing, MI"
    value: "Atlanta, GA"
    value: "Atlanta, GA"
    value: "Charlotte, NC"
    value: "Charlotte, NC"
    value: "Lansing, MI"
    value: "Lansing, MI"
    value: "Detroit, MI"
    value: "Detroit, MI"
    value: "Chicago, IL"
    value: "Chicago, IL"
    value: "Atlanta, GA"
    value: "Atlanta, GA"
    value: "Atlanta, GA"
    value: "Atlanta, GA"
  }
}
report_column {
  string_column {
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
    value: "1150118"
    value: "1150125"
  }
}
report_column {
  float_column {
    value: 200.0
    value: 180.0
    value: 150.0
    value: 110.0
    value: 100.0
    value: 100.0
    value: 100.0
    value: 105.0
    value: 50.0
    value: 58.0
    value: 70.0
    value: 100.0
    value: 70.0
    value: 75.0
    value: 100.0
    value: 120.0
    value: 300.0
    value: 139.0
    value: 300.0
    value: 209.0
  }
}

Finally, when it comes to mapping between intersections, it is possible to map down as well as up.

kind: QUERY
query_request {
  report_name: "dice_agg_filter" return_row_numbers: false
  key { qualified_level { dimension: "Product" level: "class" } attribute: "id" }
  key { qualified_level { dimension: "Location" level: "region" } attribute: "id" }
  key { qualified_level { dimension: "Calendar" level: "month" } attribute: "id" }
  measure { kind: DICE
    dice { 
      expr { kind: AGGREGATION                             (1)
        aggregation {
          method: TOTAL
            expr { kind: METRIC metric { name: "Sales" } }
            grouping { kind: MAP dimension: "Product" level: "class" }
            grouping { kind: MAP dimension: "Location" level: "region" }
            grouping { kind: MAP dimension: "Calendar" level: "month" }
        }
      } 
      dicer { kind: FILTER                                 (2)
        filter {
          expr { kind: ATTRIBUTE 
            attribute {
              qualified_level { dimension: "Location" level: "store" }
              attribute: "id"
            }
          }
          comparison { op: LIKE term { kind: CONSTANT constant { string_constant: "%Atlanta%" } } }
        }
      }
    }
  }
}
        

1

Here, the aggregation measure expression being diced is at the intersection (class, region, month)

2

while the dicer expression is at (store).

Executing the query yields the following:

report_column {
  string_column {
    value: "Bathroom"
  }
}
report_column {
  string_column {
    value: "South"
  }
}
report_column {
  string_column {
    value: "January"
  }
}
report_column {
  float_column {
    value: 2311.0
  }
}
      

Here we can see that the reported positions have been restricted to those where the region maps down to a store in Atlanta. This rules out all the regions other than the South, as they do not contain any cities named Atlanta.

Chapter 33. Measure Expression Grammar

In many contexts the measure service allows concisely specifying a measure query expression using the following grammar:
<ident> := <letter> (<letter> | [0-9])*

<type> := string | int | float | decimal | boolean

<level> := <ident>                 // Unqualified level name
         | <ident>.<ident>         // Level name qualified with dimension
         | <ident>.<ident>.<ident> // Level name qualified with dimension and hierarchy.

<intersection> := (<level>, ..., <level>) // May be nullary 

<integer-literal> := 0 | [1-9][0-9]*
<fractional-literal> := <integer-literal> . [0-9]+  

<constant> := <integer-literal>                                  // Integer constant
            | <integer-literal>d | <fractional-literal> d?       // Decimal constant
            | <integer-literal>f | <fractional-literal> f        // Float constant
            | <fractional-literal> [eE][+-]?<integer-literal> f? 
            | true | false                                       // Boolean constant
            | "..."                                              // String constant

<term> := <constant>        // constant
        | <ident> : <type>  // param 

<expr> := (<expr>)                       // parens
        | <ident>                        // metric 
        | <level>.<ident>                // attribute
        | <ident>(<expr>, ..., <expr>)   // operator
        | -<expr>                        // sugar for negate(<expr>) 
        | <expr> + <expr>                // sugar for add(<expr>, <expr>) 
        | <expr> - <expr>                // sugar for subtract(<expr>, <expr>) 
        | <expr> * <expr>                // sugar for multiply(<expr>, <expr>) 
        | <expr> / <expr>                // sugar for divide(<expr>, <expr>) 
        | <expr> # <intersection>        // widen
        | <agg-method> <expr> <groupings> // aggregation 
        | filter <expr> by <comparisons> // filter
        | dice <expr> by <dicers>        // dice
        | <expr> & ... & <expr>          // intersection
        | <expr> | ... | <expr>          // union
        | <term>                         // terms

<comparisons> := <comparison> and ... and <comparison> 
               | <comparison> or ... or <comparison>

<dicers> := <expr> and ... and <expr>
          | <expr> or ... or <expr>

<comparison> := = <term> | != <term> | < <term> | <= <term> | > <term> | >= <term> | like <term>

<agg-method> := count | total | collect | ambig | min | max | mode | count_distinct

<groupings> :=  // empty 
             | by <grouping>+ 

<grouping> := all <ident>  // project dimension 
            | to <level>   // rollup
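
For illustration, here are a few expressions that are well-formed with respect to this grammar and correspond to the aggregation, filter, and dice queries shown in the previous chapters. The metric and level names are those of the running Sales example, and the exact textual syntax accepted in a given context may differ in minor details:

total Sales by to Product.class to Location.region to Calendar.month
filter Sales by > 200f and < 300f
dice Sales by Returns and filter Location.region.id by = "Midwest"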

Chapter 34. Spreading

34.1. Concepts

Using the measure service, it's possible for users to update the contents of a measure and the system will reflect the edit in the metrics from which that measure is defined. In the simplest case the user might update a metric directly. In a more complex situation the user might update a measure defined by aggregation, in which case the system must spread the new aggregate value among the various target values used to compute that aggregate. Because there are different ways to do this, the programmer must specify a spreading policy to guide the update.

For instance, consider the following measures, which purport to count the enemies in classic video games and are defined over a single simple dimension. Observe that the measure "BadGuyPopulation at intersection (Game)" is computed using a TOTAL aggregation over the metric measure "BadGuyPopulation at intersection (Foe)".

Table 34.1. Dimension BadGuy

Positions at level Foe    Positions at level Game
Bob-omb                   Super Mario Bros.
Goomba                    Super Mario Bros.
Koopa Troopa              Super Mario Bros.
Moblin                    The Legend of Zelda
Like Like                 The Legend of Zelda


Table 34.2. Metric measure BadGuyPopulation at intersection (Foe)

Position        Value
Bob-omb         25.0
Goomba          155.0
Koopa Troopa    90.0
Moblin          38.0
Like Like       12.0


Table 34.3. Measure BadGuyPopulation at intersection (Game)

Position               Value
Super Mario Bros.      270.0
The Legend of Zelda    50.0


Measure "BadGuyPopulation at (Foe)" is a metric, it's easy to update. If the user requests a modification increasing the Moblin count to 40 it's clear how to change both the underlying metric and aggregated measures:

Table 34.4. Update to metric measure BadGuyPopulation at intersection (Foe)

Position    Value
Moblin      40.0


Table 34.5. Metric measure BadGuyPopulation at intersection (Foe)

Position        Value
Bob-omb         25.0
Goomba          155.0
Koopa Troopa    90.0
Moblin          40.0
Like Like       12.0


Table 34.6. Measure BadGuyPopulation at intersection (Game)

Position               Value
Super Mario Bros.      270.0
The Legend of Zelda    52.0


Updating a metric might correspond to correcting previously incorrect data. In contrast, updating non-metric measures is often used as part of planning applications to answer questions about hypothetical scenarios. For instance: "How much should each store's profits change in order to support an 8% increase in sales?" or "How many Bob-ombs would you need to double the number of enemies in Super Mario Bros.?"

To answer the latter question we can update the computed measure as follows:

Table 34.7. Update to measure BadGuyPopulation at intersection (Game)

Position             Value
Super Mario Bros.    540.0

Table 34.8. Measure BadGuyPopulation at intersection (Game)

Position               Value
Super Mario Bros.      540.0
The Legend of Zelda    52.0

But it's unclear how to adjust the underlying metric. Two reasonable policies are spread-by-even and spread-by-ratio. The former distributes "new" values evenly across each position that contributes to the updated value. In this case the total increase of 270.0 (540.0 − 270.0) is split evenly across the three positions that roll up to "Super Mario Bros.", so each of their population values is increased by 90.0, yielding the following:

Table 34.9. Metric measure BadGuyPopulation at intersection (Foe)

Position        Value
Bob-omb         115.0
Goomba          245.0
Koopa Troopa    180.0
Moblin          40.0
Like Like       12.0


In contrast, spread-by-ratio increases values in the underlying distribution in proportion to their initial distribution. Here the update doubles the "Super Mario Bros." total (from 270.0 to 540.0), so each contributing value is also doubled. Under spread-by-ratio the metric would be updated as follows:

Table 34.10. Metric measure BadGuyPopulation at intersection (Foe)

Position        Value
Bob-omb         50.0
Goomba          310.0
Koopa Troopa    180.0
Moblin          40.0
Like Like       12.0


The following sections show how users can request such updates, how to select a spreading policy, and how to delete data.

34.2. Update structure

Measure updates are made using tabular data exchange services to invoke the measure plugin. The measure plugin takes two inputs: csv-formatted data values, and an update expression defining a spreading policy. Typically these inputs are passed using an HTTP POST method: the spreading policy is encoded in the POST's URI, and the csv values are uploaded as data with mime type "content-type: text/csv". Measure update requests return diagnostic information, not a modified table. A common interaction pattern is to request an update, via POST, followed by a query, via GET, to display updated measures.
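
As a rough sketch of this interaction pattern, an exchange might look like the following. The host, the service paths, and the way the UpdateExpr and query are encoded into the URIs are hypothetical placeholders; only the POST-then-GET pattern and the text/csv content type come from the description above.

# POST the csv values; the spreading policy (an UpdateExpr) travels in the request URI.
curl -X POST \
     -H "Content-Type: text/csv" \
     --data-binary @moblin-update.csv \
     "http://localhost:8080/measure-update?<url-encoded UpdateExpr>"

# Follow up with a GET query to display the updated measures.
curl "http://localhost:8080/measure-query?<url-encoded QueryRequest>"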

Spread definitions are written as protobuf messages of type UpdateExpr, shown below. This message type contains four fields. An UpdateExpr translates requested updates at the intersection given by field inter into actual modifications of the metric named by field metric. Field kind indicates whether the update will delete values from the metric (REMOVE) or alter existing values (SPREAD). As indicated above, there can be many different ways to spread values during an update; field transform defines such a spreading policy.

message UpdateExpr
{
  /**
   * Enumerate the kind of updates that are supported.
   */
  enum Kind { SPREAD = 1; REMOVE = 2; }

  message Transform {
    enum SpreadKind {
      EVEN = 1;
      RATIO = 2;
      PERCENT_PARENT = 3;
      QUERY = 4;
    } 

    required SpreadKind spread_kind = 1; 
    repeated QualifiedLevel distribution = 2 [(blox.options.set) = true];
    optional PercentParent percent_parent = 3;
    optional Query query = 4;
  }
  
  required Kind kind = 1; 
  required string metric = 2; 
  required Intersection inter = 3;
  repeated Transform transform = 4; 
} 

34.3. Direct updates

The simplest form of update is to directly modify a metric. In the model above, we might use the update message,

kind: SPREAD
metric: "Population"
inter {
  qualified_level {
    dimension : "BadGuy"
    level : "Foe"
  }
} 

and csv data,

Foe|Population
Moblin|54000.0

to increase the number of Moblins in the table. Observe that the metric named by metric is defined at the intersection identified by inter. No Transform needs to be specified because the update applies directly to a metric.

34.4. Indirect spreads

Updates at intersections other than that of the base metric require the user to guide value spreading. The examples above show spreads from intersection (Game) to a metric at intersection (Foe). These particular spreading policies are realized using the following update expressions.

kind: SPREAD
metric: "Population"
inter {
  qualified_level {
    dimension : "BadGuy"
    level : "Game"
  }
}
transform {
  spread_kind : EVEN
  distribution {
    dimension : "BadGuy"
    level : "Foe"
  }
} 

kind: SPREAD
metric: "Population"
inter {
  qualified_level {
    dimension : "BadGuy"
    level : "Game"
  }
}
transform {
  spread_kind : RATIO
  distribution {
    dimension : "BadGuy"
    level : "Foe"
  }
} 

Spreading policies are defined by an ordered sequence of Transform messages that specify what policy to use for spread calculations (field spread_kind) and which levels change during the spread (repeated field distribution). Multiple Transform messages may be composed, allowing you to spread, for example, first by ratio and then by even at different points in a hierarchy, as sketched below.
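
For example, a composed spreading policy might look roughly as follows. Here Series is a hypothetical intermediate level between Game and Foe, used only to illustrate how an ordered sequence of Transform messages is written; the BadGuy dimension defined earlier has no such level.

transform {
  spread_kind : RATIO
  distribution {
    dimension : "BadGuy"
    level : "Series"   # hypothetical intermediate level, for illustration only
  }
}
transform {
  spread_kind : EVEN
  distribution {
    dimension : "BadGuy"
    level : "Foe"
  }
}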

34.5. Spread-by-even

Under spread-by-even, updated values are distributed equally among each member of the target of a transform. For instance, suppose members A, B, and C roll up to member X. An update that increases X by 12.0 will increase each of A, B, and C by 4.0.

34.6. Spread-by-ratio

Under spread-by-ratio, updated values are distributed proportionally among each member of the target of a transform. For instance, suppose members A, B, and C roll-up to member X. If an update increases the value of X by a multiplicative factor of 4.0 then each value corresponding to A, B, and C will also increase by a multiplicative factor of 4.0.

As a special case, spread-by-ratio behaves like spread-by-even when all values in the underlying metric are 0.0.

34.7. Spread-by-percent-parent

Under spread-by-percent-parent the user specifies an intersection that serves as a source of "parent" values.

    message PercentParent {
      required Intersection inter = 1;
    } 

The parent level is assumed to be the total of the level being updated, and updates cause sibling values to change so as to maintain a constant parent value. For percent-parent transforms, field distribution should be empty.

Suppose that positions A, B, and C map to X and D maps to Y. Further consider a metric with values (A, 10.0) (B, 20.0) (C, 30.0) (D, 40.0). Now suppose the user requests a percent-parent update with a parent intersection corresponding to X and Y and the modification (A, 35.0). This will result in a new metric with values (A, 35.0) (B, 10.0) (C, 15.0) (D, 40.0). To understand why, note that the new values preserve A + B + C = 60, where 60 is the sum of all the values that roll up to X: the remaining 25.0 (60.0 − 35.0) is split between B and C in their original 20:30 ratio. D does not change, as the sum of the values that roll up to Y did not change.
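
In the BadGuy model above, a transform requesting a percent-parent spread could be sketched as follows, with the Game level supplying the parent values and, per the note above, no distribution levels. Whether Game is the appropriate parent intersection for a particular update depends on the application.

transform {
  spread_kind : PERCENT_PARENT
  percent_parent {
    inter {
      qualified_level {
        dimension : "BadGuy"
        level : "Game"
      }
    }
  }
}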

34.8. Spread-by-query

Spread-by-query allows the user to specify a measure expression that is used to directly compute a measure as the result of a transform. The expression is specified in the following message, which is stored in the Transform's query field.

    message Query {
      required MeasureExpr expr = 1;
    } 

The specified expression may refer to two special metrics: New, which denotes the newly updated values, and Target, which denotes the old values belonging to the metric being updated.
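
A minimal sketch of such a transform simply takes the newly updated values as-is by referencing the New metric; more useful policies would combine New and Target in a larger measure expression. The sketch below only combines the Transform and Query messages defined above and is not a recommended policy in itself.

transform {
  spread_kind : QUERY
  query {
    expr { kind: METRIC metric { name: "New" } }
  }
}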

34.9. Removal

Removal expressions are specified with an UpdateExpr message whose field kind is REMOVE, together with the name of the base metric and the intersection at which the positions are provided in the csv data; no spreading transforms should be specified. The csv rows are positions at the specified intersection that should be removed.

If the intersection is the base intersection, the given positions are removed from the base metric. This message,

kind: REMOVE
metric: "BadGuyPopulation"
inter {
  qualified_level {
    dimension : "BadGuy"
    level : "Foe"
  }
} 

and csv data,

Foe
Bob-omb
Moblin

request that the rows for the Foe positions Bob-omb and Moblin be removed from BadGuyPopulation, resulting in the updated base metric:

Table 34.11. Metric measure BadGuyPopulation at intersection (Foe) after removing Bob-omb and Moblin

Position        Value
Goomba          155.0
Koopa Troopa    90.0
Like Like       12.0


Note that the rows are completely removed, instead of their values being reset to 0.0.

If the intersection is above the base intersection, then all base positions that map to the csv positions are removed from the base metric. This message,

kind: REMOVE
metric: "BadGuyPopulation"
inter {
  qualified_level {
    dimension : "BadGuy"
    level : "Game"
  }
} 

and csv data,

Game
The Legend of Zelda

requests that the rows for all Foe positions that map to The Legend of Zelda at level Game be removed from BadGuyPopulation, resulting in the updated base metric:

Table 34.12. Metric measure BadGuyPopulation at intersection (Foe) after removing The Legend of Zelda at level Game

Position        Value
Bob-omb         25.0
Goomba          155.0
Koopa Troopa    90.0


Glossary

Argument

A named parameter of a query.

See Also Level.

Attribute

A function on members of a level.

See Also Level.

Dimension

A set of levels which represent the same concept at different levels of granularity.

See Also Level.

Dimensionality

A tuple of dimensions.

See Also Dimension.

Hierarchy

A subset of a dimension, along with a mapping relation which connects the levels in a chain.

See Also Dimension, Level.

Intersection

A tuple of levels.

See Also Dimension, Level.

Key

An attribute of a level at which the measure expression in a query is evaluated.

See Also Attribute, Level, Measure expression.

Level

An entity type which belongs to a dimension.

See Also Dimension.

Measure

A mapping from positions of some intersection to values, sets of values, or nothing.

See Also Dimension, Intersection.

Measure expression

An expression which defines a measure to be queried. Measure expressions can be parameterized with arguments.

Member

An instance of a level.

Metric

A metric is a "primitive" measure defined directly by reference to a LogiQL predicate or a dialog.

Position

A tuple of members.

Predicate

The fundamental data structure to store data in a LogicBlox workspace. A predicate contains a set of facts.

Query

A list of measure expressions defined at the same intersection, along with the keys and arguments, which is processed by the measure service to produce a report.

See Also Measure expression, Intersection.

Report

The result of a query, as a list of indexed columns, one for each key and measure expression in the query.

Part IV. Blade Application Framework

Chapter 35. Workbook Framework

The new workbook framework, based on services, is introduced in LogicBlox 3.10, but can also be used in combination with LogicBlox 3.9. The services-based architecture allows application developers to partition the master workspace and therefore supports much larger applications than the "old workbook framework".

The lb-workbook command-line tool can be used by developers to communicate with the workbook application, e.g. to import data (users, position access, or template access), but also to run actions such as creating, refreshing, and deleting workbooks (individually or in batch).

The documentation of the workbook services testing API can be found here.

35.1. Getting Started

Environment Variables

The binary distribution of the workbook framework is included in the LogicBlox 3.10 package and all the necessary environment variables are set automatically. When using LogicBlox 3.9 in combination with the new workbook (services) framework, the binary distribution has to be downloaded separately and the following environment variables need to be set:

  • LB_WORKBOOK_HOME: pointing to the folder of the extracted binary.
  • PATH
    export PATH=$LB_WORKBOOK_HOME/bin:$PATH
    
  • LB_LIBRARY_PATH: set this one locally in your build script from LB_WORKBOOK_HOME
    export LB_LIBRARY_PATH=$LB_WORKBOOK_HOME/share:$BLOXWEB_HOME/share
    

BloxWeb configuration

Update the bloxweb configuration with the handlers below. Add the configuration section to the bloxweb.config file in your lb_deployment directory, usually ~/lb_deployment/config/bloxweb.config or /data/lb_deployment/config/bloxweb.config. Depending on your installation location, the section you add should look like:

[handler:workbook-framework:workbook-action-service] 1
jar = <YourInstallDir>/workbook-framework/lib/java/WorkbookFrameworkServices.jar
classname = com.logicblox.workbook.services.WorkbookActionHandler

[handler:bcrypt-credentials] 2
classname = com.logicblox.bloxweb.authentication.BCryptCredentialsHandler

[realm-config:default-password] 3
class = com.logicblox.bloxweb.authentication.PasswordBCryptAuthenticationMechanism
stateful = true

1

Workbook Services handler plugin.

2

The bcrypt handler converts a clear text password into a hashed password on requests that update passwords. Otherwise, it behaves like a normal protobuf service.

3

Default configuration for authentication using passwords. More information on stateful authentication can be found here.

Building the Blade Application Workspace

Generating the configuration files from a Blade project requires that the Blade application workspace be built. This is a one-time activity that needs to be performed after each LogicBlox update. To build the Blade application workspace, run the following commands:

% cd $LOGICBLOX_HOME/blade
% ./build_ws

This will decompress the file containing the Blade source and build the Blade application workspace using the LogicBlox compiler.

Tip

Note that lb-services needs to be running in order to build the Blade application workspace.

35.2. Building Blade applications

The build process of a Blade application usually consists of the following steps:

  1. Generate configuration files from the Blade project
  2. Build master and template workspaces
  3. Import the master as well as the template workspaces into ConnectBlox for further processing
  4. Register template workspaces with master workspace
  5. Import application data
  6. Import users
  7. Import position access for all the users
  8. Import template access for all the users
  9. Create workbooks

Below you can find an example of a simple build script that performs steps 1-4 from the list above for an application with a workbook template called "planner". This script creates template workspaces for both workbook template types, namely partitioned and user specific. In the following sections each of these build steps is described in detail.

Tip

The workbook framework supports two types of templates, Partitioned and UserSpecific. The different template types define different ways of determining the initial set of entity positions that will be copied into a workbook from the master workspace, as well as which new entity positions, if any, will be added to a workbook during a refresh operation. You can find more information on these two workbook types here.

Example 35.1. Example build script part 1 - building master and template workspaces

function generate()
     {
     #generate configuration files from Blade project in folder bladeProject
     #in destination folder test_app
       genAppFilesFromBlade bladeProject test_app --overwrite --rebuildProjectWS
     }

function add_bloxweb_cred_lib()
    {
    #add bloxweb credentials library to project.txt file for inclusion in master ws
    echo bloxweb_credentials,library>>test_app/master/src/project.txt
    }
     
function build()
    {
    #build master and template workspaces
      cd test_app
      if test "$clean" = "true"; then
	 ./clean_all
      fi
      ./build_all
      cd ..
    }
    
function import_workspaces()
    {
      #import master workspace into ConnectBlox
      lb-workbook import-master --app test_app test_app/master/workspace
      
      #import template workspaces into ConnectBlox
      for tmpl in planner_Partitioned planner_UserSpecific; do
	lb-workbook import-template --template $tmpl --app test_app test_app/templates/$tmpl/workspace
      done
    }

function link()
    {
      #register template workspaces with master workspace
      for tmpl in planner_Partitioned planner_UserSpecific; do
	lb-workbook link-template --template $tmpl --app test_app
      done
    }

More information on the genAppFilesFromBlade script and all the configuration options can be found here.

The end goal of the build is to create workbooks that can be accessed by users. Before workbooks can be built, users have to be imported into the master workspace. The workbook framework makes use of bloxweb user authentication; all the commands described in the Authentication section can also be used with Blade applications. In the example below a file called users.dlm is imported into the test application test_app. Before the users are imported, the configuration that makes the service use the default password mechanism is added to the master workspace. More information on the default password configuration can be found here.

Example 35.2. Example build script part 2 - importing users

function add_default-password-config()
    {
    #add default-password configuration and restart bloxweb services
    lb addblock /blade/test_app/master -f default-password-config.logic --loglevel info
    bloxweb start-services
    }

function import-users()
     {
     #import users from file users.dlm
     bloxweb import-users users.dlm
     }

The users have to be provided in a pipe-delimited file with the following headers:

  • USER - required
  • DEFAULT_LOCALE - optional
  • EMAIL - optional
  • ACTIVE - optional
  • PASSWORD - optional
  • PUBLIC_KEY - optional

More information on the bloxweb import-users command can be found here.

The content of users.dlm for the test application can be found in the example below.

Example 35.3. Content of users.dlm

USER|DEFAULT_LOCALE
john|en_US
mary|en_US

The template access of a user determines which workbooks can be created for that user during workbook creation (batch or single workbook creation). Template access provides a method for allowing or disallowing access to templates within the Blade project without having to set position access to "Deny" for all positions for a particular user. Template access is set by inserting facts into the blox:workbook:schema:template:access predicate, via a delimited-file-service. In the example below a file called template-access.dlm is imported into the test application test_app.

Example 35.4. Example build script part 3 - importing template access

function import-template-access()
     {
     #import template access from file template-access.dlm
     lb-workbook import-template-access --app test_app template-access.dlm --replace
     }

The template access has to be provided in a pipe-delimited file with the following headers: USER|TEMPLATE. The content of template-access.dlm for the test application can be found in the example below.

Example 35.5. Content of template-access.dlm

USER|TEMPLATE
john|planner_Partitioned
john|planner_UserSpecific
mary|planner_Partitioned
mary|planner_UserSpecific

All the arguments of the lb-workbook import-template-access command are explained in detail here.

Position access is used to manage access to specific entity elements within workbooks. Position access is stored in the blox:WorkBookTemplate:accessOverride predicate and can be managed via a delimited-file-service. In the example below the position access for the calendar hierarchy is imported from a file called position-access-calendar.dlm, and for the product hierarchy from a file called position-access-product.dlm. The different strategies to consider for position access policies are described here.

Example 35.6. Example build script part 4 - importing position access

function import-position-access()
     {
     #import position access for calendar hierarchy from file position-access-calendar.dlm
     lb-workbook import-position-access --app test_app --template __master --level Calendar:Month position-access-calendar.dlm --replace
     
     #import position access for product hierarchy from file position-access-product.dlm
     lb-workbook import-position-access --app test_app --template __master --level Product:Department position-access-product.dlm --replace
     }

The position access has to be provided in a pipe-delimited file with the following headers: USER|POSITION_ID|LEVEL. The content of position-access-calendar.dlm for the test application can be found in the example below.

Example 35.7. Content of position-access-calendar.dlm

USER|POSITION_ID|LEVEL
john|Jan 2012|Read
mary|Feb 2012|Read
john|Mar 2012|Write
mary|Apr 2012|Write
john|May 2012|Write
mary|May 2012|Write

Example 35.8. Content of position-access-product.dlm

USER|POSITION_ID|LEVEL
john|shoes|Deny
john|shirts|Write
mary|shoes|Write
mary|shirts|Deny

All the arguments of the lb-workbook import-position-access command are explained in detail here.

The creation of workbooks is the final step of a simple build process. Workbooks can be built one by one or in batch. In the example below all possible workbooks are created during a batch process.

Example 35.9. Example build script part 5 - create workbooks

function create_workbooks_batch()
     {
     #create all workbooks
     lb-workbook create-workbook-batch --app test_app --execute
     }

Once a workbook is built, it is possible to launch it manually. The workspace path of a workbook can be determined by running the lb workspaces command. Another option is to run the lb-workbook list-workbooks command, which prints detailed information on all workbooks (you can find more information on the lb-workbook list-workbooks command here).
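
For example, assuming list-workbooks accepts the same --app option as the other lb-workbook commands used in this chapter, the workbooks of the test application could be listed with:

lb-workbook list-workbooks --app test_app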

Example 35.10. Example on launching a workbook manually

launch.sh -db $(lb filepath <filepath> )

The following section gives an overview of the generated Blade configuration files and how developers can manipulate these files for their own projects.

35.3. Blade configuration files

The set of configuration files can be generated using the genAppFilesFromBlade script (in the $LOGICBLOX_HOME/bin/ directory). This script will create configuration files and scripts used to build the application’s master workspace and a workbook template workspace for each workflow defined in the Blade project. The genAppFilesFromBlade script has the following format:

genAppFilesFromBlade destination_directory {--options}

The following options are available when running the script:

  • --overwrite: Forces an overwrite if the destination directory exists. All files except for WorkBookTemplate.logic in the master/src/__app/ext/ and templates/*/src/__app/ext/ directories will be replaced if the --overwrite option is specified. If the destination directory already exists and --overwrite is not specified, the script will exit without doing anything.

Tip

The WorkBookTemplate.logic file in the ext/ directory is used to define custom extensions to the generated configuration files that will not be overwritten by the Blade conversion script. You can find more information on this topic here.

  • --rebuildProjectWS: Forces the rebuilding of the Blade project workspace.
  • --masterType olap|datalog: Allows the user to specify whether the master configuration will use LogicBlox OLAP support. The default masterType is "datalog".
  • --securityPolicy master|template: Allows the user to specify the type of security policy. The "template" choice results in the script generating a security policy for each template, giving the user the ability to have a different security policy for each template rather than applying the single master security policy to all templates. The default securityPolicy is "master".
  • --templateType partitioned|user|all: Allows the user to specify the type(s) of templates that will be generated. The "partitioned" choice will result in the script generating only the partitioned templates. The "user" choice will result in the script generating only the "UserSpecific" (user workbook) templates. The default templateType is "all".
  • --noActions: Specifies that actions should not be processed and the associated logic files not be built. This can be helpful in determining whether the other project properties (measures, rules, etc.) are such that the master and/or template workspaces compile and build as expected.
  • --clean: Specifies that any existing workspaces should be deleted so that incremental compilation will not occur. If the --clean switch is not used, incremental compilation will determine what has changed and re-compile as appropriate.
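
For instance, reusing the Blade project folder and destination directory from the build script earlier in this chapter, one might generate only the partitioned templates for an OLAP master with per-template security policies (this particular combination of options is illustrative only):

genAppFilesFromBlade bladeProject test_app --overwrite --masterType olap --securityPolicy template --templateType partitioned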

The set of configuration files and folders generated by this script are described in detail in the next sections.

35.3.1. Top Level Application Directory

At the top level, all configuration files for one application are stored in a single directory structure. The top level application directory contains the following:

test_app/
    master/ 1
    templates/ 2
    build_all 3
    clean_all 4

1

directory containing configuration files and scripts used to build the master workspace

2

directory containing configuration files and scripts used to build all workbook templates, each template in it's own sub-directory of templates/

3

a shell script which will build workspaces for the master and all workbook templates

4

a shell script which will delete any workspaces and intermediate files for the master and workbook templates

35.3.2. Master Configuration Directory

master/
    build/ 1
    src/ 2
    workspace/ 3
    build.lb 4
    build_ws 5
    clean_ws  6

1

directory containing intermediate files used to build the master workspace

2

directory containing all the master configuration files

3

directory in which the master workspace files are placed by the build_ws script

4

LogicBlox batch script used by build_ws to generate the master workspace

5

shell script which compiles the configuration files and builds the master workspace

6

shell script which deletes the master workspace and any intermediate files generated by the build scripts

35.3.3. Main Master Configuration Files

The master/src/ directory contains a file called project.txt, which is the project description file that is used to incrementally compile all Datalog configuration files. This project file must include the system:app, system:gui, system:protocol, blox:workbook, and blox:wbt:security (or blox:wbt:olap_security if the master uses the OLAP calculation engine) system libraries. You may need to modify this file to add application actions and Datalog rule blocks.
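
For reference, a minimal sketch of the library entries in such a project.txt might look as follows, assuming the name,library entry format used by the build script earlier in this chapter; a real project file additionally lists the application's own blocks and rule files:

system:app,library
system:gui,library
system:protocol,library
blox:workbook,library
blox:wbt:security,library
bloxweb_credentials,library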

The __app directory contains all of the master configuration files that an application developer can modify. This directory is in place to provide a unique __app namespace prefix for all generated configuration blocks, to prevent potential name collisions with other application rule blocks.

Beneath the master/src/__app/ directory are the following files and folders:
actions/ directory containing all application actions that are installed in the master workspace as inactive blocks. There may be *.logic and *.rules files in this directory. All *.logic files contain Datalog rules. All *.rules files contain measure rules that were exported from a Blade project and should have a corresponding translation to a Datalog .logic file.
actions/__init__.logic file generated from a Blade project or hand coded to associate action rule blocks with action names invoked by the user-interface framework.
blocks/ directory containing all application Datalog rule blocks that are installed as active rules in the master workspace.
Entities.logic file containing all entity declarations and properties as well as entity mapping predicate declarations that are installed into the master workspace.
ext/ directory containing configuration extensions that are not overwritten by the genAppFilesFromBlade script.
ext/PreActionInit.logic file which can contain Datalog rules that are executed when the master workspace is being built just before actions are compiled and installed into the workspace. This allows initialization of data, such as remote reference information, needed before actions can be installed.
ext/WorkBookTemplate.logic optional file which can be used to add extensions to the WorkBookTemplate properties created by either of the generation scripts. This file is not replaced when the generation scripts are run, and thus is a safe place to maintain extensions which won’t be overwritten. Example extensions include additional commit or refresh policies and additional commit or refresh groups needed by the application.
gen/GenBlocks.logic file which is currently required to work around some sequencing issues with how the workbook framework generates rule blocks. It should always contain blox:wbt:_genBlocks[] = true <-- .
InitEntities.logic file containing any entity initialization required before data predicates are registered, for example adding entity elements used as default values for some data predicates.
Measure.rules file which contains the measure rules that are installed into a master workspace that uses the OLAP calculation engine. This file can be missing or empty for a master workspace that doesn’t use the OLAP engine.
OlapProperties.logic file containing OLAP data modeling properties for measures, dimensions, levels, and level maps. This is needed to associate OLAP schema to the core Datalog schema if the master is using the OLAP calculation engine. It can be empty or missing otherwise.
Predicates.logic file which contains Datalog declarations for all application data predicates along with their physical properties such as lang:defaultValue and lang:disjoint. All entity and entity mapping predicate declarations are in the Entities.logic file rather than in this file.
SchemaProperties.logic file which contains non-physical entity and predicate properties such as labels, entity is filtered, default aggregation, etc. If created by one of the generation scripts, this file will also contain other initializations such as floating point precision and composite agg/spread definitions.
TestUsers.logic file which is generated by the genAppFilesFromBlade script if the Blade project contains user definitions. It is optional and not used by default. It can be added to the project.txt file or otherwise executed in the master workspace to initialize test users.
WorkBookTemplate.logic file which, in the master configuration, contains only position access policy settings for the master and any workbook templates.

35.3.4. Template Configuration Files and Directories

The templates directory in the top level application directory contains a directory for each template configuration. As you can see below, the configuration files and scripts for workbook template workspaces are essentially the same as the master configuration files, with additional files for user interface information. The content of template configuration files could vary significantly from the master configuration files if things such as application schema need to vary.

templates/ 
    planner_Partitioned/
	build/ 
	src/
	workspace/
	build.lb
	build_ws
	build_ws.log
	clean_ws
    planner_UserSpecific/
	build/
	src/
	workspace/
	build.lb
	build_ws
	build_ws.log
	clean_ws
Beneath the templates/$template_name/src/__app/ directory are the following files, in addition to all of the files that are also generated for the master workspace and were described above:
Tasks.logic file which defines the tasks and views used by the legacy Java user-interface framework, together with the forms defined in the Views.logic file, to define user-interface structure and behavior.
Views.logic file which defines the structure of forms used by the legacy Java user-interface framework. Details are discussed in another section of this document.

35.4. Schema Definition

The workbook framework is designed to support both Datalog and OLAP application development models. Datalog application development exclusively uses entity and predicate declarations to define the application schema and Datalog rule blocks to define application behavior. OLAP application development primarily uses measures, dimensions, levels, and level maps to define the application schema and measure rules to define behavior. OLAP applications can also use Datalog schema definitions and rule blocks.

In order to support both models, the workbook framework relies on Datalog to define the application schema in terms of entities and predicates. If an application wishes to use an OLAP model, additional properties are set that indicate which entities are to be used as OLAP levels in dimensions, which predicates are to be used as OLAP level maps in dimensions, and which predicates are to be used as OLAP measures.

35.4.1. Entities.logic

The Entities.logic file, a level-0 Datalog file which is part of both master and template configurations, is the starting point for defining an application schema. This file should contain all entity declarations and physical properties. By convention, it should also contain declarations for predicates which will be used in aggregation hierarchies to map between related entities. For example:

Calendar:Month(e), Calendar:Month:id(e:r) -> string(r).
lang:entity(`Calendar:Month).
lang:physical:capacity[`Calendar:Month] = 32767.
lang:ordered(`Calendar:Month).

Calendar:Year(e), Calendar:Year:id(e:r) -> string(r).
lang:entity(`Calendar:Year).
lang:physical:capacity[`Calendar:Year] = 32767.
lang:ordered(`Calendar:Year).

Calendar:Week:month[f] = t ->
  Calendar:Week(f), Calendar:Month(t). 

By convention, related entities and mapping predicates are named with a common prefix, “Calendar:” in this example. However, this is only a convention and application developers can choose alternate conventions if desired.

35.4.2. Predicates.logic

The second major piece of schema definition is contained in the Predicates.logic file, another level-0 Datalog file. This file should contain Datalog declarations and physical properties for all data predicates (that is, all predicates that are neither entity nor mapping predicates) in the application. For example:

Sales[k0, k1, k2] = v ->
  Product:Sku(k0), Location:Store(k1), Calendar:Week(k2), float[32](v).
lang:defaultValue[`Sales] = 0.0.
lang:disjoint(`Sales).

35.4.3. InitEntities.logic

If a predicate has an entity value type and that predicate also has a default value, the LogicBlox compiler and runtime currently require that the default value be added to the entity before a predicate using the default value is declared. This limitation requires the workbook framework to first declare all entities (Entities.logic), then initialize the entities if necessary (InitEntities.logic), and only then declare predicates (Predicates.logic). Like Entities.logic and Predicates.logic, InitEntities.logic is a level-0 Datalog file. An example InitEntities.logic file is as follows:

+Calendar:Week(e), +Calendar:Week:id(e:id) <- id = "week_1".
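For illustration, a data predicate that relies on this ordering might look like the sketch below. The FirstSellingWeek predicate is hypothetical, and the exact syntax for entity-typed default values is covered in the Default-Value Predicates chapter; the sketch assumes the default is given via the entity's refmode value.

// Hypothetical predicate with an entity-typed value and a default value.
// The element "week_1" must already exist in Calendar:Week (see the
// InitEntities.logic example above) when this declaration is processed.
FirstSellingWeek[sku] = w ->
  Product:Sku(sku), Calendar:Week(w).
lang:defaultValue[`FirstSellingWeek] = "week_1".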

35.4.4. SchemaProperties.logic

The next part of defining an application data model is to specify non-physical entity and predicate properties such as labels, default aggregation method, and formatting. This is accomplished by setting blox:Entity and blox:Predicate properties in the level-1 Datalog file called SchemaProperties.logic.

Current entity properties which can be set in SchemaProperties.logic are label, isSecured, and isFiltered. The name of an entity used in this file must match the name of the entity as declared in the Entities.logic file. There is also a hierarchicalBase property that must be set. This property defines the lowest set of entities that can be reached from an entity via aggregation/spreading relationships. An example of setting entity properties in the SchemaProperties.logic file is as follows:

+blox:Entity(_)
{
   +blox:Entity:name[] = "Calendar:Month",
   +blox:Entity:label[] = "MONTH",
   +blox:Entity:isSecured(),
   +blox:Entity:isFiltered(),
   +blox:Entity:hierarchicalBase("Calendar:Week")
} <-- .

The isSecured property is used along with any position access policy defined for the master or template configuration to generate predicates and rules which manage the users who are allowed to access specific entity elements. Details are discussed in the Position Access Policies section of this document.

The isFiltered property is used to guide workbook construction rules. All unfiltered entities will be completely copied from master to workbook workspace while filtered entities in workbooks may have only a subset of elements from the master, depending on template configuration properties. More details are in the Workbook Template Properties section.

Currently supported predicate properties include:

  • label
  • defaultValue
  • defaultAggMethod
  • defaultSpreadMethod
  • allowedAggMethods
  • allowedSpreadMethods
  • format
  • readOnly
  • horizontalAlignment
  • percentAggBase
  • percentAggDimension
  • isPrimaryEntityMap
  • isSecondaryEntityMap

The name of a predicate in the SchemaProperties.logic file must match the name of a predicate declared in either the Entities.logic file (for mapping predicates) or the Predicates.logic file.

The developer is advised to be careful with the defaultValue property. This property currently must be specified in both Predicates.logic and SchemaProperties.logic and the values must be the same. This ensures that the commit and refresh rules generated by the framework will be correct and will not produce runtime errors or poor performance.
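For example, for the Sales predicate shown earlier in this chapter, the two settings must agree (note that Predicates.logic uses a literal of the value type, while SchemaProperties.logic uses a string):

// In Predicates.logic:
lang:defaultValue[`Sales] = 0.0.

// In SchemaProperties.logic:
+blox:Predicate(_)
{
   +blox:Predicate:name[] = "Sales",
   +blox:Predicate:defaultValue[] = "0.0"
} <-- .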

For predicates used as aggregation mappings between related entities, the developer will typically set a label and indicate whether the mapping is primary or secondary (there can be multiple mapping pathways between two entities, and the primary mappings define the preferred pathway). An example is as follows:

+blox:Predicate(_)
{
   +blox:Predicate:name[] = "Calendar:Week:month",
   +blox:Predicate:label[] = "WEEK -> MONTH",
   +blox:Predicate:isPrimaryEntityMap()
} <-- .    

An example of setting predicate properties for an application data predicate is as follows:

+blox:Predicate(_)
{
   +blox:Predicate:name[] = "Sales",
   +blox:Predicate:label[] = "Sales",
   +blox:Predicate:defaultValue[] = "0.0",
   +blox:Predicate:defaultAggMethod[] = "total",
   +blox:Predicate:defaultSpreadMethod[] = "ratioEven",
   +blox:Predicate:allowedAggMethods("total"),
   +blox:Predicate:allowedSpreadMethods("ratioEven"),
   +blox:Predicate:format[] = "0.00"
} <-- .     

The SchemaProperties.logic file created by either of the generation scripts contains floating point precision properties as well as composite spread and aggregation definitions. The developer may choose to keep this information in a separate file, as long as the appropriate modifications to the configuration’s project.txt file are in place and the floating point precision and composite agg/spread files are processed before SchemaProperties.logic is processed. An example of this information is as follows:

+blox:EngineProps:float32Precision[] = 0.0001 <-- .
+blox:EngineProps:float64Precision[] = 0.0001 <-- .

+blox:CompositeAggMethod(_)
{
   +blox:CompositeAggMethod:name[] = "PST",

   +blox:CompositeAggMethod:aggMethod[1] = "first",
   +blox:CompositeAggMethod:kind[1] = "Calendar",
   +blox:CompositeAggMethod:role[1] = "",

   +blox:CompositeAggMethod:aggMethod[2] = "total",
   +blox:CompositeAggMethod:kind[2] = "",
   +blox:CompositeAggMethod:role[2] = ""
} <-- .

+blox:CompositeSpreadMethod(_)
{
   +blox:CompositeSpreadMethod:name[] = "PET",

   +blox:CompositeSpreadMethod:spreadMethod[1] = "ratio",
   +blox:CompositeSpreadMethod:kind[1] = "",
   +blox:CompositeSpreadMethod:role[1] = "",

   +blox:CompositeSpreadMethod:spreadMethod[2] = "last",
   +blox:CompositeSpreadMethod:kind[2] = "Calendar",
   +blox:CompositeSpreadMethod:role[2] = ""
} <-- .     

35.4.5. OlapProperties.logic

Defining an OLAP data model for an application involves first declaring entities and predicates using Datalog rules before mapping those entities and predicates into OLAP modeling concepts. The OlapProperties.logic file contains level-1 Datalog rules which accomplish this mapping.

In order to define measures in this fashion, a new blox:Measure element must be created and the basePredicate property set to the name of the corresponding predicate (from the Predicates.logic file). By convention, the name of the measure is the same as the name of the predicate, but this is not required. For example:

+blox:Measure(_)
{
   +blox:Measure:name[] = "Sales",
   +blox:Measure:basePredicate[] = "Sales"
} <-- .     

OLAP data models group related levels and the mappings between those levels into a dimension. Each level corresponds to a single Datalog entity. Each level map corresponds to a Datalog predicate which maps between two entities. A dimension is defined by a set of base levels and has label, securityLevel, and partitionLevel properties. A dimension will contain all base levels plus all levels that are reachable from a base level via level maps.

The securityLevel property of a dimension is optional. If specified, it is used along with the blox:Entity:isSecured property and the position access policy for a workbook template to generate rules that compute user access rights for level elements. More details can be found in the Position Access Policies section. In general, all secured levels in a dimension will have access rights computed from the access rights defined for the security level of the dimension.

The partitionLevel property of a dimension is also optional. If specified, it is used along with other workbook template information to determine which elements for all levels in the dimension will be copied from master to workbook when a workbook is created. See Workbook Template Properties for details.

An example set of dimension, level, and level map properties that might be in an OlapProperties.logic file is below. Note that the blox:Level:entity value must be the full name of an entity declared in Entities.logic and that the blox:Level:mappingPred value must be the full name of a mapping predicate declared in Entities.logic. By convention, the name of the dimension matches the prefix for the entity and entity map names, but this is not required.

+blox:Dimension(Cal),
+blox:Dimension:name(Cal:"Calendar"),
+blox:Dimension:label[Cal] = "Calendar",
+blox:Dimension:securityLevel[Cal] = Month,
+blox:Dimension:partitionLevel[Cal] = Month,
+blox:Dimension:baseLevel(Cal,Week),

+blox:Level(Week),
+blox:Level:name(Week:"Week"),
+blox:Level:entity[Week] = "Calendar:Week",

+blox:Level(Month),
+blox:Level:name(Month:"Month"),
+blox:Level:entity[Month] = "Calendar:Month",

+blox:Level:mapsTo(Week, Month),
+blox:Level:primary(Week, Month),
+blox:Level:mappingPred[Week,Month] = "Calendar:Week:month"
   <-- . 

35.5. WorkBook Template Properties

The previous section has described the core application configuration details, providing the foundation of the workbook framework. The primary reason for the existence of the workbook framework is to define how workbooks are built from a master application workspace and how data moves between workbook and master during commit or refresh operations. This section details how this behavior can be configured by an application developer.

35.5.1. WorkBookTemplate.logic for Master Configurations

The workbook framework allows each workbook template to maintain a separate set of position access predicates. For example, the application may require that one template in the application manage position access rights at the "Month" level with the default being "Write" access, while another template in the application requires that position access rights be managed at the "Year" level with the default being "Read" access. The only workbook template property configured in the master configuration should be the position access policies for workbook templates. See Position Access Policies for more information.

Note

Note that the name of the WorkBookTemplate that defines the default position access policy in a master configuration must be “__master”.

An example WorkBookTemplate.logic file for a master configuration that defines a single position access policy which is shared between the master and two workbook templates is as follows:

+blox:WorkBookTemplate(_)
{
   +blox:WorkBookTemplate:name[] = "__master",

   +blox:WorkBookTemplate:positionAccessPolicy[] = +blox:PositionAccessPolicy(_)
   {
	+blox:PositionAccessPolicy:defaultAccess[] = "Write",
	+blox:PositionAccessPolicy:calcMode[] = "LeastRestrictive",
	+blox:PositionAccessPolicy:securityLevel("Calendar:Month"),
	+blox:PositionAccessPolicy:securityLevel("Location:District"),
	+blox:PositionAccessPolicy:securityLevel("Product:Department")
   }
} <-- .

+blox:WorkBookTemplate(_)
{
   +blox:WorkBookTemplate:name[] = "planner_UserSpecific",
   +blox:WorkBookTemplate:useMasterPositionAccessPolicy()
} <-- .

+blox:WorkBookTemplate(_)
{
   +blox:WorkBookTemplate:name[] = "planner_Partitioned",
   +blox:WorkBookTemplate:useMasterPositionAccessPolicy()
} <-- .

35.5.2. WorkBookTemplate.logic for Template Configurations

While master configurations allow multiple workbook templates to be created so that each one can define a separate position access policy, there should be only a single workbook template created in a template configuration. A workbook template should have:

  • a name: used when building workbooks from the template
  • a type: either Partitioned or UserSpecific
  • a calcEngine value: either Measure or Datalog

Optionally, a workbook template may have the following properties:

  • label
  • positionAccessPolicy
  • uiPolicy
  • refreshGroup
  • commitGroup
  • a set of partitionLevel values
  • and a set of securityLevel values

The name of a workbook template must match the template name in the master configuration’s WorkBookTemplate.logic file if the master configuration defines position access policies for the template. The name should also match the name of the sub-directory of templates/ in which all the template configuration files are located.

The label of a workbook template is currently not used, but is provided for user interfaces in a launchpad application to display some friendly label to users for organizing or otherwise identifying templates or workbooks built from a template.
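Putting these properties together, a minimal WorkBookTemplate.logic for a template configuration might look like the sketch below. The name, calcEngine, partitionLevel, and uiPolicy predicates appear elsewhere in this chapter; the type and label property predicates used here are assumed names based on the property list above, so check a generated configuration for the exact predicates.

+blox:WorkBookTemplate(_)
{
   +blox:WorkBookTemplate:name[] = "planner_Partitioned",
   +blox:WorkBookTemplate:type[] = "Partitioned",           // assumed predicate name
   +blox:WorkBookTemplate:label[] = "Partitioned Planner",  // assumed predicate name
   +blox:WorkBookTemplate:calcEngine[] = "Measure",
   +blox:WorkBookTemplate:partitionLevel["Calendar"] = "Month"
} <-- .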

35.5.3. Template Type

The workbook framework currently supports two types of templates, Partitioned and UserSpecific. The different template types define different ways of determining the initial set of entity positions that will be copied into a workbook from the master workspace, as well as what new entity positions, if any, will be added to a workbook during a refresh operation. In addition to template type, each entity’s isFiltered property and position access data contribute to how positions are added to a workbook.

Note that partitioned templates should use OLAP schema definition since the framework currently uses this information to determine which entities are partitioned. An entity is considered to be partitioned if it is associated with a level that is in a dimension that contains a partition level. The partition level can be set via the blox:Dimension:partitionLevel property (in the OlapProperties.logic file) or can be defined for the template via the blox:WorkBookTemplate:partitionLevel property (in the WorkBookTemplate.logic file).

The current rules that determine which positions to add to a workbook are as follows:

  • Unfiltered entities are always completely copied from the master regardless of template type or whether the entity is secured or not. An unfiltered entity is any entity that does not have the blox:Entity:isFiltered property set (in SchemaProperties.logic file). A secured entity is one that has the blox:Entity:isSecured property set.
  • For a partitioned template, filtered entities are copied by projecting up from the hierarchical base for the entity. All level map pathways between a base level and the target filtered level will be traversed to find all target level positions that can be reached from the base level positions.
  • For a partitioned template, positions for a filtered base level are determined by either projecting down using level maps from the positions in the partition level for the base level’s dimension (if a partition level is defined for the dimension) or copying all base positions from the master if the base level’s dimension has no partition level defined.
  • For a user-specific template, positions copied into the workbook for all filtered and secured entities are determined by inspecting the position access data for the entity. If a position has either read or write access for at least one of the users the workbook is being built for, that position will be included in the workbook.
  • For filtered but unsecured entities in a user-specific template, all entity positions will be copied from the master into the workbook.

Tip

Note that the framework does not consider position access data for secured but unpartitioned levels in a partitioned template.

35.5.4. WorkBook Access

For partitioned templates, a set of entities or levels are specified as partition levels for the configuration. Each workbook built from a partitioned template will have one element from each partition level in the template. A user will be granted access to a partitioned workbook if the user has either read or write access to at least one of the partition elements in the workbook.

For a user-specific template, each workbook is built for a specified set of users and only those users will be granted access to the workbook.

The set of users allowed to access a workbook is determined at the time a workbook is built. It is possible to update data in the master workspace to revoke a user’s access to existing workbooks.

Tip

It is currently not possible to grant new users access to an existing workbook.

35.5.5. Calculation Engine

Workbook templates support two different types of calculation engines: Measure or Datalog. A full discussion of the differences between these two engines is beyond the scope of this document. Below are a few guidelines that pertain to the workbook framework:

  • If the calculational behavior of your application is implemented with measure rules and programs (discussed in the Application Behavior section), then you must use the Measure engine, must define an OLAP data model, and must use the OLAP options for the project compilation file.
  • A template that uses the Measure calculation engine may still include Datalog rule blocks that implement part of the application’s calculation behavior.
  • A template that uses the Datalog engine may not use measure rules or programs but may use OLAP data modeling concepts. This could be useful if you wish to build a workbook template that uses OLAP levels and dimensions to support partitioned workbooks, but want to use Datalog as the core calculation language.

This property is set via the blox:WorkBookTemplate:calcEngine predicate in the WorkBookTemplate.logic file for a template configuration.

35.5.6. partitionLevel Property

A set of partition levels can be defined either as properties on dimensions in OlapProperties.logic or as properties on the template in WorkBookTemplate.logic. Any partition level information in the template will override information in a dimension if there is a conflict. The goal here is to eventually allow template configurations to share some parts of their configurations, such as the configuration of some dimensions and levels, and optionally override default settings provided by the shared components. The workbook framework does not yet support this kind of sharing unless you manually use symbolic file system links.

The partitionLevel property is keyed by the name of a dimension and uses the name of a level within the dimension as a value. An example is as follows:

+blox:WorkBookTemplate:partitionLevel["Calendar"] = "Month"

35.5.7. securityLevel Property

Templates have a securityLevel property that is intended to be used, like the partitionLevel property, to either set or override the securityLevel for a dimension. The framework does not currently use this property for the template. Use the dimension’s securityLevel property in OlapProperties.logic to define security levels.

35.5.8. User Interface Policy

The workbook framework has been designed to work with the future LogicBlox web-based user interface framework as it becomes capable of supporting multi-dimensional OLAP-style pivot tables. The workbook framework can also be configured to work with the legacy Java-based user interface framework. Each template has a uiPolicy property which contains a few flags which control this configuration.

If the legacyUi flag is present, the workbook framework will generate rules that manage view initialization and navigation using the legacy user interface framework. If the commitMenu or refreshMenu flags are present, users will be presented with a menu to access the default commit and refresh rules. The exitMenu flag can be used to add a menu item that will exit the app when it is selected.

An example of setting uiPolicy is as follows:

+blox:WorkBookTemplate:uiPolicy[] = +blox:UiPolicy(_)
{
    +blox:UiPolicy:legacyUi(),
    +blox:UiPolicy:commitMenu(),
    +blox:UiPolicy:refreshMenu(),
    +blox:UiPolicy:exitMenu()
}.

35.5.9. Position Access Policies

The developer may choose from two different strategies when considering position access policies and generate the appropriate logic through the use of the "--securityPolicy" command-line switch for genAppFilesFromBlade:

  • A single position access policy that will be applied to all templates and thus to all workbooks.
  • A position access policy for each template which will be applied to each set of workbooks belonging to each template.

When considering, setting, and verifying the position access policy, the developer should be aware of the following:

  • When a security level is set for any dimension (usually in Blade), accessRights predicates (blox:WorkBookTemplate:accessRights) are generated for all levels in that dimension. The current default value is Deny. Note that if no security level is set for any dimension, these predicates are not generated for any level and all positions will be included in any workbooks built.
  • Position access (blox:WorkBookTemplate:accessOverride) is set at the secured level only, for each user, via a level-1 predicate that in turn sets the accessRights predicate value. The algorithm will work up and down the hierarchy from the secured level to set the accessRights predicate values as appropriate. Consider an example Product dimension with levels Sku and Style. For a master wherein all templates are using the master position access policy and security is set at Product:Style, the logic imported into the master is as follows (giving Write access to all Styles):
    ^blox:WorkBookTemplate:accessOverride["__master", `Product:Style][u,r] = write <-
      system:app:User:name(u:"user1@logicblox.com"), Product:Style(r), system:app:SecurityLevel:name(write:"Write").
    
  • Once the accessOverride is set by importing the logic, the accessRights settings can be seen in the other levels of the secured dimension via:
    lb print <connectblox_ws_name> 'blox:WorkBookTemplate:accessRights["__master", `Product:Sku]'
    
  • The accessRights default is Deny. The defaultAccess may be changed by adding logic to ./<genAppFilesFromBlade_output_folder>/master/src/__app/ext/WorkBookTemplate.logic as in the example:
    +blox:WorkBookOverrides:positionAccessPolicy[] = P,
    +blox:PositionAccessPolicy(P),
    +blox:PositionAccessPolicy:defaultAccess[P] = "Read",
    +blox:PositionAccessPolicy:calcMode[P] = "LeastRestrictive",
    +blox:PositionAccessPolicy:securityLevel(P, "Product:Style"),
    +blox:PositionAccessPolicy:securityLevel(P, "Calendar:Month")
     <-- .
    When the blox:WorkBookTemplate:useMasterPositionAccessPolicy property is set, both master and templates use the master position access policy. In this case, the blox:WorkBookTemplate:accessRights["__master", `<Dim:Level>] facts are copied to blox:WorkBookTemplate:accessRights["<template_name>", `<Dim:Level>] in each template and workbook. In this case defaultAccess should be the same for both master and templates.

Tip

Setting defaultAccess differently for master and templates when the blox:WorkBookTemplate:useMasterPositionAccessPolicy property is set could result in odd behavior for accessRights.

35.6. Commit and Refresh

The commit and refresh abstractions of the workbook framework support high-level configuration of actions that bring data from the master workspace into a workbook (refresh) and vice versa (commit). The workbook framework supports a variety of configuration options to select the desired behavior of such refresh and commit actions.

By default, committing a workbook will not add new positions from the workbook into the master or delete positions from the master that were deleted from the workbook. The same holds true for refreshing a workbook: no new positions in the master will be added to the workbook, and positions deleted from the master will not be deleted from the workbook.

Refresh group.  A refresh group is a configuration of an action that can be executed by a user or a batch procedure to bring data from the master workspace into the workbook. A refresh group primarily contains a set of predicates whose data should be refreshed when the refresh group is executed. Refresh groups are associated with workbook templates, where a workbook template can have several refresh groups, and refresh groups can be shared among workbook templates.

Refresh policy.  A refresh group can have a refresh policy, which configures how positions (entity elements) are handled when executing the refresh. If a refresh group does not have a refresh policy, then the default refresh policy is used for this group. Refresh policies exist to help with reuse of configurations: typically an application needs only a limited number of refresh policies, while a variety of refresh groups might exist.

Commit group.  Similar to refresh groups, a commit group is a configuration of an action to apply data from a workbook to the master workspace. Commit groups primarily consist of a set of predicates, and have a commit policy, which is similar to a refresh policy.

35.6.1. Refresh Policies

Refresh Rules

Refresh policies consist of a collection of refresh rules. Refresh rules can be configured at three levels of granularity:

  • Rules can be configured for specific levels of a dimension.

    blox:RefreshPolicy:levelRule[RP,LNAME] = RR -->
      blox:RefreshPolicy(RP), string(LNAME), blox:RefreshRule(RR).
    

  • Rules can be configured for all levels of a dimension. If there is a dimension-specific rule, but no level-specific rule for a level, then the dimension-specific rule is used.

    blox:RefreshPolicy:dimensionRule[RP,DNAME] = RR -->
       blox:RefreshPolicy(RP), string(DNAME), blox:RefreshRule(RR).
    

  • A refresh policy has a default rule, which is used for levels that do not have a level-specific rule, nor a dimension-specific rule for the dimension of the level.

    blox:RefreshPolicy:defaultRule[RP] = RR -->
       blox:RefreshPolicy(RP), blox:RefreshRule(RR).
    

A refresh rule has three properties:

  • A boolean flag to specify whether positions should be added to the workbook if new positions are available in the master.

    blox:RefreshRule:addToWorkBook[RR] = V -->
      blox:RefreshRule(RR), boolean(V).
    

  • A boolean flag to specify whether positions should be removed from a workbook if they are no longer present in the master.

    blox:RefreshRule:deleteFromWorkBook[RR] = V -->
      blox:RefreshRule(RR), boolean(V).
    

  • A conflict resolution policy, which can be either "Abort" or "Override". The abort policy means that a refresh will be aborted if a new position is available in the master, but the workbook already contains a position with the same identifier that it did not obtain from the master. Override means that this is assumed to be the same position.

    blox:RefreshRule:conflictResolutionPolicy[RR] = CRP -->
       blox:RefreshRule(RR), blox:ConflictResolutionPolicy(CRP). 
    

It is rarely necessary to define refresh rules yourself. Three standard refresh rules are available that should meet most needs. These rules are:

name                addToWorkBook   deleteFromWorkBook   conflictResolutionPolicy
Default             false           false                Override
AddDeleteOverride   true            true                 Override
AddDeleteAbort      true            true                 Abort

Standard Refresh Policies

The workbook framework comes with two refresh policies that are so common that it is useful to have them generally available.

The default refresh policy is used by all refresh groups that do not have a refresh policy. The default refresh policy is available as the predicate blox:RefreshPolicy:default[]. It is defined as:

+blox:RefreshPolicy:default[] = P,
+blox:RefreshPolicy(P) {
  +blox:RefreshPolicy:defaultRule[P] = "Default"
} <-- .

The add/delete/override refresh policy is a refresh policy that by default adds and deletes positions for all levels. This refresh policy is defined as:

+blox:RefreshPolicy:addDeleteOverride[] = P,
+blox:RefreshPolicy(P) {
  +blox:RefreshPolicy:defaultRule[P] = "AddDeleteOverride"
} <-- .

Customizing Standard Refresh Policies

The standard refresh policies can be customized in user-specific configuration. This can get confusing though, because developers familiar with the standard rules might make incorrect assumptions about their behavior. If customization is needed, then we suggest creating a new refresh policy and explicitly configuring refresh groups to use this policy.
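As a sketch of that suggestion (using only the refresh group and refresh policy predicates shown in this section; the group name, measure, and policy contents are illustrative), a new policy can be defined inline on a custom refresh group:

+blox:WorkBookTemplate:refreshGroup(T, G),
+blox:RefreshGroup(G) {
  +blox:RefreshGroup:name[] = "refresh-calendar-add-delete",
  +blox:RefreshGroup:includeAllLevels(),
  +blox:RefreshGroup:includeAllLevelMaps(),
  +blox:RefreshGroup:measure("Sales"),

  // Custom policy: AddDeleteOverride for the Calendar dimension only,
  // the standard Default rule everywhere else.
  +blox:RefreshGroup:refreshPolicy[] = +blox:RefreshPolicy(_) {
     +blox:RefreshPolicy:defaultRule[] = "Default",
     +blox:RefreshPolicy:dimensionRule["Calendar"] = "AddDeleteOverride"
  }
}
  <--
  blox:WorkBookTemplate:name[T] = "planner_Partitioned".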

To make refresh groups that use the default refresh policy apply the AddDeleteOverride refresh rule to a specific entity, a rule like the following can be added:

+blox:RefreshPolicy:entityRule[CP,"Project:Project"] = "AddDeleteOverride"
  <--
  blox:RefreshPolicy:default[] = CP.

In workbook framework projects generated from Blade, this would typically be done in the src/__app/ext/WorkBookTemplate.logic file for the template configuration, but in principle the rule can be defined anywhere.

35.6.2. Refresh Groups

Default Refresh Group

Workbook framework projects that are generated from Blade projects already contain one default refresh group, which is defined as follows:

+blox:RefreshGroup(_) {
  +blox:RefreshGroup:name[] = "default",
  +blox:RefreshGroup:refreshPolicy[] = blox:RefreshPolicy:default[],
  +blox:RefreshGroup:includeAllLevels(),
  +blox:RefreshGroup:includeAllLevelMaps(),
  +blox:RefreshGroup:includeSystemPredicates(),
  +blox:RefreshGroup:includeLabelPredicates(),
  +blox:RefreshGroup:includeSecurityPredicates(),

  +blox:RefreshGroup:measure("..."),
  +blox:RefreshGroup:measure("...")
}

The measures included are determined as follows:

  • If there are any measures marked as refreshable, then the list of measures is the list of measures that are marked as refreshable and are used in this workbook template.

  • If there are no measures marked as refreshable, then the list of measures is the list of EDB measures that are used in this workbook template.

Custom Refresh Groups

Developers can define additional refresh groups. Typically the purpose of this would be to provide refresh actions that only refresh a subset of the EDB measures. The custom refresh group can be defined anywhere, but currently this is usually added to templates/<template_name>/src/__app/ext/WorkBookTemplate.logic. For example, a custom refresh group that only refreshes the Sales and Returns measures, but does include levels and level maps, can be defined as:

+blox:WorkBookTemplate:refreshGroup(T, G),
+blox:RefreshGroup(G) {
  +blox:RefreshGroup:name[] = "refresh-group-name",
  +blox:RefreshGroup:refreshPolicy[] = blox:RefreshPolicy:default[],
  +blox:RefreshGroup:includeAllLevels(),
  +blox:RefreshGroup:includeAllLevelMaps(),
  +blox:RefreshGroup:measure("Sales"),
  +blox:RefreshGroup:measure("Returns")
}
  <--
  blox:WorkBookTemplate:name[T] = "planner_Partitioned".

Refresh Group Properties

  • blox:RefreshGroup:refreshPolicy - The refresh policy to use for this group. If no policy is specified, then the default policy is used. Example:

    +blox:WorkBookTemplate:refreshGroup(T, G),
    +blox:RefreshGroup(G) {
      +blox:RefreshGroup:name[] = "test",
      ...
      +blox:RefreshGroup:refreshPolicy[] = P
    }
      <--
      blox:WorkBookTemplate:name[T] = "planner_Partitioned",
      blox:RefreshPolicy:addDeleteOverride[] = P.
    

  • blox:RefreshGroup:measure - The list of measure names (strings) that should be refreshed when the group is executed. Example:

    +blox:WorkBookTemplate:refreshGroup(T, G),
    +blox:RefreshGroup(G) {
      +blox:RefreshGroup:name[] = "refresh-sales",
      +blox:RefreshGroup:measure("Sales"),
      +blox:RefreshGroup:measure("Returns")
    }
      <--
      blox:WorkBookTemplate:name[T] = "planner_Partitioned".
    

  • blox:RefreshGroup:predicate - The list of predicate names (strings) that should be refreshed when the group is executed

  • blox:RefreshGroup:entity - The list of entity names (strings) that should be refreshed when the group is executed. All key and value entities for predicates or measures in the group are automatically included.

  • blox:RefreshGroup:includeAllLevels - Include all levels in the refresh rules for this group. This is short-hand for explicitly listing all levels using blox:RefreshGroup:entity.

  • blox:RefreshGroup:includeAllLevelMaps - Update all mapping predicates between levels when the group is executed. This is a short-hand for listing all mappings explicitly using blox:RefreshGroup:predicate

  • blox:RefreshGroup:includeUiState - Include view preferences in this refresh.

  • blox:RefreshGroup:includeLabelPredicates - Update all level label predicates when the group is executed

  • blox:RefreshGroup:includeSecurityPredicates - Update position access predicates when the group is executed

  • blox:RefreshGroup:includeSystemPredicates - Indicates if the following system predicates should be included in the refresh rules for this group

    • system:Predicate:label

    • system:Predicate:cellStyleName

    • system:Predicate:displayWidth

    • system:Predicate:valueConstraint

    • system:app:TranslationTable

  • blox:RefreshGroup:masterUrl - Specify the URL for the remote workspace containing data that will be refreshed into the workbook. If the refresh is to occur from the master workspace used to build the workbook, this property can (and should) be left blank. This is a string which has the form "lbns://remote_ws". These URLs must be initialized in the workbook to point to the physical location of the remote workspace by executing Datalog such as the following:

    +system:ExternalWorkSpace(e),
    +system:ExternalWorkSpace:url(e:"lbns://remote_ws").            
    ^system:ExternalWorkSpace:localPath[e] = "/path/to/remote_ws"
       <-
       system:ExternalWorkSpace:url(e:"lbns://remote_ws").
    

35.6.3. Commit Policies

Commit policies are configured in the same way as refresh policies, with commit rules taking the place of refresh rules. For example, the following rule applies the AddDeleteOverride rule to the Calendar:Week entity for the default commit policy:

+blox:CommitPolicy:entityRule[CP,"Calendar:Week"]="AddDeleteOverride"
  <--
  blox:CommitPolicy:default[] = CP.

35.6.4. Commit Groups

Commit group properties are the same as the refresh group properties, using "CommitGroup" rather than "RefreshGroup". For example:

+blox:WorkBookTemplate:commitGroup(T, G),
+blox:CommitGroup(G) {
  +blox:CommitGroup:name[] = "MY_CUSTOM_COMMIT_GROUP",
  +blox:CommitGroup:commitPolicy[] = P,
  +blox:CommitGroup:includeAllLevels(),
  +blox:CommitGroup:includeAllLevelMaps(),
  +blox:CommitGroup:includeSystemPredicates(),
  +blox:CommitGroup:includeLabelPredicates(),
  +blox:CommitGroup:includeSecurityPredicates(),

  +blox:CommitGroup:measure("COMMITTED_MEASURE1"),
  +blox:CommitGroup:measure("COMMITTED_MEASURE2"),
  +blox:CommitGroup:measure("COMMITTED_MEASURE3"),
  +blox:CommitGroup:measure("COMMITTED_MEASURE4"),
  +blox:CommitGroup:measure("COMMITTED_MEASURE5")
}
  <--
  blox:WorkBookTemplate:name[T] = "MyTemplate_Partitioned",
  blox:CommitPolicy:default[] = P.

35.6.5. Executing Custom Commit and Refresh Groups

Commit and refresh groups are inactive blocks which can be executed via bloxbatch, an .lb script, or lb exec as follows:

transaction
exec --storedBlock blox:RefreshGroup:namedBlock[\"MyTemplate_Partitioned:test\"] 
commit

or

transaction
exec --storedBlock blox:CommitGroup:namedBlock[\"MyTemplate_Partitioned:test\"]
commit

From within an application action, custom commit and refresh groups can be executed as follows:

^system:app:command[] = cmd <- cmd = "executeBlock(blox:CommitGroup:namedBlock[\"MyTemplate_Partitioned:test\"])".

or

^system:app:command[] = cmd <- cmd = "executeBlock(blox:RefreshGroup:namedBlock[\"MyTemplate_Partitioned:test\"])".

35.6.6. Executing Default Commit or Refresh Rules

There are two basic mechanisms for triggering a commit or refresh operation. The developer may choose to trigger the operation through the legacy Java user interface code. This is how the standard File->Commit and File->Refresh menus work and can also be used by application defined action buttons. These can be executed using one of the following formats:

  • Commit measures -

    ^system:gui:appCommand[] = "triggerCommit(default)".

  • Commit measures and ui -

    ^system:gui:appCommand[] = "triggerCommit(default,__legacy_ui)".

  • Refresh measures -

    ^system:gui:appCommand[] = "triggerRefresh(default)".

  • Refresh measures and ui -

    ^system:gui:appCommand[] = "triggerRefresh(default,__legacy_ui)".

The application commands triggerCommit and triggerRefresh will display a progress dialog while the operation is running, start the execution of data transfer between the master and workbook workspaces, wait for the operation to complete, and update any data cached in the user interface if necessary.

The argument to the trigger commands is a comma-separated list of commit or refresh group names (do not include spaces in this list). The special group name “__legacy_ui” is used to execute the built-in commit or refresh group for UI data that is provided by the framework. The “default” name is the name normally used for the default set of committable or refreshable measures for the workbook. You can also use the name of any application-defined commit or refresh group.
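For example, assuming the refresh-sales group defined earlier in this chapter is available in the workbook's template, the following would refresh the default measures, the refresh-sales group, and the UI data in one operation:

^system:gui:appCommand[] = "triggerRefresh(default,refresh-sales,__legacy_ui)".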

35.7. lb-workbook command

The lb-workbook command, introduced in LogicBlox 3.10, is a command-line tool for interacting with workbooks.

Tip

See lb-workbook --help for all the commands that are available.

The most common arguments of the lb-workbook command are listed in the table below.

--app NAME name of the application
--workspace NAME name of the master workspace. This option is only needed if the standard naming convention for workspaces (/blade/APP/master) is not used. Normally the --app argument is sufficient.
--template TEMPLATE run command for a specific workbook template only
--user USER run command for a specific user only
--host HOST hostname where workbook services are hosted (default: localhost)
--port PORT port where workbook services are hosted (default: 55183)

The table below lists the options that are only available for the batch commands.

--execute immediately execute the batch specification (default: false). When this option is omitted, the command instead produces a batch specification, which can then be included in a bigger batch specification.
--verbose generate a batch specification that prints output (default: false)
--sequential do not use parallelism in the batch (default value: false)
--timeout SECONDS the period in seconds that an exchange will wait for a response from the server (default value: 320)
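For example, with the create-workbook-batch command described below, the following (illustrative) invocation builds all default workbooks for one template immediately, printing progress and running the calls one at a time:

lb-workbook create-workbook-batch --app test_app --template planner_Partitioned --execute --verbose --sequential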

The Python workbook libraries support specifying the host and port of the services, which enables complete remote execution of all lb-workbook operations, except import-master and import-template.

lb-workbook import-master

imports master workspace into ConnectBlox. Usage:

lb-workbook import-master [-h] (--app NAME | --workspace NAME) dir

lb-workbook import-template

imports template workspace into ConnectBlox. Usage:

lb-workbook import-template [-h] --template TEMPLATE (--app NAME | --workspace NAME) dir

lb-workbook import-position-access

The service to import position access is a delimited-file service that requires a pipe delimited file as input with the following header USER|POSITION_ID|LEVEL. Usage:

lb-workbook import-position-access [-h] (--app NAME | --workspace NAME) [--host HOST] [--port PORT]
                                   --template TEMPLATE --level  LEVEL [--replace] [file]

The --replace option specifies whether existing data should be replaced or not. The default value of this argument is false.
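A hypothetical invocation is shown below; the file name, user, and position id are illustrative, and the LEVEL column holds the access value (Deny, Write, or Read):

# position_access.csv (pipe-delimited):
#   USER|POSITION_ID|LEVEL
#   user1@logicblox.com|style_1|Write

lb-workbook import-position-access --app test_app --template planner_Partitioned \
    --level Product:Style --replace position_access.csv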

lb-workbook import-template-access

The service to import template access is a delimited-file service that requires a pipe delimited file as input with the following header USER|TEMPLATE. Usage:

lb-workbook import-template-access [-h] (--app NAME | --workspace NAME) [--host HOST] [--port PORT] [--replace] [file]

The --replace option specifies whether existing data should be replaced or not. The default value of this argument is false.

lb-workbook list-templates

prints the following information on all the workbook templates that are available in the master workspace:

  • template id
  • workspace name
  • access policy (the default access per security level)
  • partition levels
  • commit groups
  • refresh groups

lb-workbook list-templates [-h] (--app NAME | --workspace NAME) [--host HOST] [--port PORT]

Example output:

id: "planner_Partitioned"
type: "Partitioned"
workspace_name: "/blade/test_app/templates/planner_Partitioned"
access_policy {
  security_level {
    level: "Calendar:Month"
    default_access: WRITE
  }
  security_level {
    level: "Location:District"
    default_access: WRITE
  }
  security_level {
    level: "Product:Department"
    default_access: WRITE
  }
  security_level {
    level: "Team:Team"
    default_access: WRITE
  }
}
partition_level: "Calendar:Month"
partition_level: "Product:Department"
commit_group {
  name: "__legacy_ui"
}
commit_group {
  name: "default"
}
refresh_group {
  name: "__build"
}
refresh_group {
  name: "__legacy_ui"
}
refresh_group {
  name: "default"
}

id: "planner_UserSpecific"
type: "UserSpecific"
workspace_name: "/blade/test_app/templates/planner_UserSpecific"
access_policy {
  security_level {
    level: "Calendar:Month"
    default_access: WRITE
  }
  security_level {
    level: "Location:District"
    default_access: WRITE
  }
  security_level {
    level: "Product:Department"
    default_access: WRITE
  }
  security_level {
    level: "Team:Team"
    default_access: WRITE
  }
}
partition_level: "Calendar:Month"
partition_level: "Calendar:Year"
partition_level: "Product:Department"
commit_group {
  name: "__legacy_ui"
}
commit_group {
  name: "default"
}
refresh_group {
  name: "__build"
}
refresh_group {
  name: "__legacy_ui"
}
refresh_group {
  name: "default"
}

lb-workbook list-template-access

depending on whether the template or user argument is used, this command either lists all the workbook templates that a specific user has access to, or all the users that have access to a specific template.

lb-workbook list-template-access [-h] [--template TEMPLATE] [--user USER] (--app NAME | --workspace NAME) [--host HOST] [--port PORT]
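For example (using user and application names that appear elsewhere in this chapter), to list the templates that a single user can access:

lb-workbook list-template-access --app test_app --user john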

lb-workbook list-batch

prints workbooks that would be created for a default batch.

lb-workbook list-batch [-h] (--app NAME | --workspace NAME) [--host HOST] [--port PORT] [--template TEMPLATE]

Example output:

--------------------------------------------
template: "planner_Partitioned"
partition_element {
  id: "Jan 2012"
  level: "Calendar:Month"
}
partition_element {
  id: "shoes"
  level: "Product:Department"
}
--------------------------------------------
template: "planner_UserSpecific"
username: "john"

lb-workbook list-workbooks

prints information on workbooks, such as:

  • workbook id
  • template
  • workspace name
  • workspace filepath
  • partition element (partitioned workbooks only) id and level
  • start build time
  • end build time

lb-workbook list-workbooks [-h] (--app NAME | --workspace NAME) [--host HOST] [--port PORT] [--user USER]

Example output:

--------------------------------------------
id: 0
template: "planner_Partitioned"
workspace_name: "/blade/test_app/master/workbooks/planner_Partitioned/0_18360_1406316455"
workspace_filepath: "/home/user/lb_deployment/workspaces/xxxx"
partition_element {
  id: "Jan 2012"
  level: "Calendar:Month"
}
partition_element {
  id: "shoes"
  level: "Product:Department"
}
start_build_time: "2013-03-04 03:35:34-05:00"
end_build_time: "2013-03-04 03:35:40-05:00"

--------------------------------------------

lb-workbook export-template-access

exports workbook template access from the master workspace. The service to export template access is a delimited-file service that results in a pipe delimited file with the following header: USER|TEMPLATE. Usage:

lb-workbook export-template-access [-h] (--app NAME | --workspace NAME) [--host HOST] [--port PORT] [file]

lb-workbook export-position-access

exports position access file from master workspace (based on predicate blox:WorkBookTemplate:accessOverride). The service to export position access is a delimited-file service that results in a pipe delimited file with the following header: USER|POSITION_ID|LEVEL. Usage:

lb-workbook export-position-access [-h] (--app NAME | --workspace NAME) [--host HOST] [--port PORT]
                                   --template TEMPLATE --level LEVEL [--full] [file]

The --full option specifies whether default facts should be included in the export. The default value of this argument is false.

lb-workbook create-workbook

creates either a user-specific workbook (based on a user name as input) or a partitioned workbook (based on one or more partition elements as input). Usage:

lb-workbook create-workbook [-h] (--app NAME | --workspace NAME) [--host HOST] [--port PORT]
                            --template TEMPLATE [--user USER] [--elem LEVEL ID]

Tip

The --elem argument can be repeated multiple times if a workbook is based on multiple partition elements; see the example below:

lb-workbook create-workbook --app test_app --template planner_Partitioned --elem Calendar:Month "Jan 2012" --elem Product:Department shoes

lb-workbook create-workbook-batch

creates workbooks during a batch. Usage:

lb-workbook create-workbook-batch [-h] (--app NAME | --workspace NAME) [--host HOST] [--port PORT]
                                  [--template TEMPLATE] [--execute] [--verbose] [--sequential]
                                  [--timeout TIMEOUT] [file]

When this command is run without the --execute option, it can be used to generate a batch specification, which can then be included in a bigger batch specification.

An example batch specification producing 3 workbooks (1 user specific and 2 partitioned) would look like the following:

stm {
  parallel {
    stm {
      simple {
        call_proto {
          service: "http://localhost:8080/blade/test_app/master/workbook-action-service"
          encoding: JSON
          input {
            text: "{\"create\": {\"workbook\": {\"partition_element\": [{\"id\": \"Jan 2012\", \"level\": \"Calendar:Month\"},
	    {\"id\": \"shoes\", \"level\": \"Product:Department\"}], \"template\": \"planner_Partitioned\"}}}"
          }
        }
      }
      description: "create workbook planner_Partitioned for Calendar:Month(Jan 2012), Product:Department(shoes)"
    }
    stm {
      simple {
        call_proto {
          service: "http://localhost:8080/blade/test_app/master/workbook-action-service"
          encoding: JSON
          input {
            text: "{\"create\": {\"workbook\": {\"partition_element\": [{\"id\": \"Jan 2012\", \"level\": \"Calendar:Month\"},
	    {\"id\": \"shirts\", \"level\": \"Product:Department\"}], \"template\": \"planner_Partitioned\"}}}"
          }
        }
      }
      description: "create workbook planner_Partitioned for Calendar:Month(Jan 2012), Product:Department(shirts)"
    }
    stm {
      simple {
        call_proto {
          service: "http://localhost:8080/blade/test_app/master/workbook-action-service"
          encoding: JSON
          input {
            text: "{\"create\": {\"workbook\": {\"username\": \"mary\", \"template\": \"planner_UserSpecific\"}}}"
          }
        }
      }
      description: "create workbook planner_UserSpecific for mary"
    }
  }
}

Tip

For heavier batches, set the --timeout value reasonably high.
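For example (the value is illustrative; tune it to your batch size), a 30-minute timeout can be passed when executing the batch:

lb-workbook create-workbook-batch --app test_app --execute --timeout 1800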

lb-workbook delete-workbook

deletes a specific workbook. Usage:

lb-workbook delete-workbook [-h] (--app NAME | --workspace NAME) [--host HOST] [--port PORT] --id ID

lb-workbook delete-workbook-batch

delete workbooks during a batch. Usage:

lb-workbook delete-workbook-batch [-h] (--app NAME | --workspace NAME) [--host HOST] [--port PORT]
                                  [--user USER] [--template TEMPLATE] [--execute] [--verbose]
                                  [--sequential] [--timeout TIMEOUT] [file]

lb-workbook refresh-workbook

refreshes a specific workbook. Usage:

lb-workbook refresh-workbook [-h] (--app NAME | --workspace NAME) [--host HOST] [--port PORT]
                             --id ID [--group GROUP]

The optional --group argument specifies the refresh group that should be executed. By default the refresh groups 'default' and '__legacy_ui' are executed.
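For example, to refresh only the hypothetical refresh-sales group defined earlier in this chapter for workbook 0:

lb-workbook refresh-workbook --app test_app --id 0 --group refresh-sales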

lb-workbook refresh-workbook-batch

refresh workbooks during a batch. Usage:

lb-workbook refresh-workbook-batch [-h] (--app NAME | --workspace NAME) [--host HOST] [--port PORT]
                                   [--user USER] [--group GROUP] [--template TEMPLATE] [--execute]
                                   [--verbose] [--sequential] [--timeout TIMEOUT] [file]

lb-workbook exec-workbook

executes one or more inactive blocks in a workbook. Usage:

lb-workbook exec-workbook [-h] (--app NAME | --workspace NAME) [--host HOST] [--port PORT]
                                    [--id ID] --blocks [BLOCKS [BLOCKS ...]]

lb-workbook exec-workbook-batch

executes inactive blocks across multiple workbooks. Usage:

lb-workbook exec-workbook-batch [-h] (--app NAME | --workspace NAME) [--host HOST] [--port PORT]
                                       [--user USER] --block NAME [NAME ...] [--template TEMPLATE] [--execute]
                                       [--verbose] [--sequential] [--timeout SECONDS] 
                                       [file]

lb-workbook import-users

Note

Deprecated from LogicBlox 3.10.2 onwards. Please use bloxweb import-users instead. More information on the bloxweb import-users command can be found in the BloxWeb chapter of this manual.

imports users into the master workspace. The service to import users is a delimited-file service that requires a pipe delimited file as input with the following header USER|DEFAULT_LOCALE. Usage:

lb-workbook import-users [-h] (--app NAME | --workspace NAME) [--host HOST] [--port PORT] [--replace] [file]

lb-workbook export-users

Note

Deprecated from LogicBlox 3.10.2 onwards. Please use bloxweb export-users instead. More information on the bloxweb export-users command can be found in the BloxWeb chapter of this manual.

exports users from the master workspace. The service to export users is a delimited-file service that results in a pipe delimited file with the following header: USER|DEFAULT_LOCALE. Usage:

lb-workbook export-users [-h] (--app NAME | --workspace NAME) [--host HOST] [--port PORT] [file]

35.8. Workbook Services

35.8.1. Protocols

The protocols that are included in the distribution can be found in the $LOGICBLOX_HOME/workbook-framework/lib/protobuf folder:

  • /blox/workbook/services/model.proto

  • /blox/workbook/services/rpc.proto

35.8.2. URLS

Services are hosted at the following URLs:

  • $ws/workbook-service - workbook information (non-batch). Protocol: rpc.proto:WorkbookRequest. Service groups: bloxweb:public (port 8080), bloxweb:internal (port 55183).
  • $ws/workbook-template-service - workbook template information (non-batch). Protocol: rpc.proto:TemplateRequest. Service groups: bloxweb:public (port 8080), bloxweb:internal (port 55183).
  • $ws/workbook-action-service - workbook actions (build, delete, refresh, commit). Protocol: rpc.proto:WorkbookActionRequest. Service group: bloxweb:internal (port 55183).
  • $ws/workbook-hierarchy-service - hierarchy information (such as positions). Protocol: rpc.proto:HierarchyRequest. Service group: bloxweb:internal (port 55183).
  • $ws/authorization/users - user information. Format: USER|DEFAULT_LOCALE. Service group: bloxweb:internal (port 55183).
  • $ws/authorization/template_access - template access. Format: USER|TEMPLATE. Service group: bloxweb:internal (port 55183).
  • $ws/authorization/position_access/$template/$level - position access. Format: USER|POSITION_ID|LEVEL, where level is Deny, Write, or Read. Service group: bloxweb:internal (port 55183).

35.8.3. Authentication

In the default configuration, the workbook services hosted on port 8080 are not authenticated. In deployments these workbook services need to be configured to use authentication. This is an application-specific concern, because the authentication realms and methods of authentication will differ per application. For more information on authentication realms and their configuration, see the documentation on BloxWeb Authentication. It is also useful to understand BloxWeb Service Groups.

The workbook services are set up to make it easy to configure authentication. The predicate blox:workbook:services:common:service_config:client_service contains the two services that by default are available on port 8080. These services are available on port 8080 because they are in the service group bloxweb:public, and the default bloxweb.config hosts services in this group on port 8080. To add an authentication realm to these services, the application configuration should include an IDB rule as follows.

bloxweb:config:service:auth_realm[x] = "realm_name"
  <-
  blox:workbook:services:common:service_config:client_service(x).

To rule out service configuration mistakes, it is always useful to configure the TCP endpoint to require authentication on all services. The following example configuration will guarantee that no services will ever be available on the port of tcp:public that are not authenticated. In the $LB_DEPLOYMENT_HOME/config/bloxweb.config configuration file, include:

[tcp:public]
requires_authentication = true

For applications that use additional endpoints, for example SQS or RabbitMQ queues, the group bloxweb:internal can be included on these endpoints to make the batch-related workbook services available there. If the rest of the application does not use service groups, then the endpoint can also list the bloxweb:no-group group to host the other services. For example:

[sqs:sample]
request_queue_name = sample-request
response_queue_name = sample-response
groups = bloxweb:internal, bloxweb:no-group

35.9. Configuring the blox-applet-server

The blox-applet-server can be hosted by LogicBlox applications and used on the back-end to launch workbooks. The blox-applet-server is automatically started by lb-services if the blox-applet-server configuration files contain a setting enabled = true.

Configuration Files

The blox-applet-server is configured via the blox-applet-server.config file. The default file can be found in $LOGICBLOX_HOME/config/blox-applet-server.config. This file is not intended for editing. The user may override the default settings by placing a copy in $LB_DEPLOYMENT_HOME/config and modifying it as desired.

Configuration Options

Option Description
enabled Boolean (true/false) indicating whether the blox-applet-server should be started by lb-services.
port = 9090 The port to which the blox-applet-server will attach to listen for incoming connections. There is normally no reason to change this option.
processmanager = localhost:52569 The processmanager address to which the blox-applet-server will connect. Port 52569 is the default port of the processmanager. There is normally no reason to change this option.
jvm_args Options to pass to the JVM (typically used to set the maximum heap size).
enableTestForm = true Enables the page hosted at localhost:9090/test/applet that can be useful for testing the applet and workbook hosting.
allowDeferredResponse = true Boolean indicating whether to allow the blox-applet-server to defer responses.
minSessionInactivityOverrideMin = 5 The number of minutes with no activity that must pass before a user is allowed to override a session.
maxRequestWaitMsec = 60000 The maximum time in milliseconds that the blox-applet-server will wait for a response to a request.
traceRequests = false Boolean indicating whether the blox-applet-server will log the tracing of requests.
traceInvokerCalls = false Boolean indicating whether the blox-applet-server will trace invoker calls.
logExceptions = true Boolean indicating whether the blox-applet-server will log exceptions.
validateWorkSpaceAccess = throwException The action the blox-applet-server will take when workspace access validation fails. Values can be:
  • throwException

  • ignore

  • autoTerminateSession

  • log

sessionExpirationCheckSec = 30 The number of seconds between times the blox-applet-server will check for session expiration.
sessionInactivityTimeoutMin = 60 The number of minutes of inactivity that must pass before a session is automatically expired. This relates directly to sessionExpirationCheckSec.
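
As an illustration, a copy of the file placed in $LB_DEPLOYMENT_HOME/config/blox-applet-server.config might be modified along the following lines (the values shown are illustrative, not recommendations):

enabled = true
jvm_args = -Xmx2g
sessionInactivityTimeoutMin = 120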

35.10. Migrating from pre-3.9 to 3.9.x

As the LogicBlox core engine is refined and more features are supported, support for certain methods and older, less precise syntax is removed. This section outlines the methods that have changed from pre-3.9 releases to 3.9.x and explains how to use the new methods and/or syntax.

Changes to Workbook Commits and Refreshes

The legacy "commitWorkBook" and "refreshWorkBook" commands do not work with the new 3.9 and beyond framework. Please see the Commit and Refresh Groups section for details.

UI Support for the Master Workspace

By default, the master workspace is built without UI support. This reduces the size of the workspace, increasing the speed with which it is built and the speed with which it can be accessed. As a result of this change, the practice (sometimes used in older LogicBlox versions) of users directly accessing the master workspace is no longer supported.

Blade and the 3.9.x Workbook Schema Configuration Files

The genAppFilesFromBlade script can be used to generate workbook template configuration files from an existing Blade project (see here for more information on how to build Blade applications).

Actions

The 3.9.x and beyond workbooks framework incorporates a number of changes to how workbook actions are handled. These changes are summarized below:

  • Measure language actions are no longer supported. The developer may use "bloxbatch -convertMeasureRules" to convert measure actions to Datalog.

    Tip

    Note that the genAppFilesFromBlade script performs this conversion automatically.

  • All actions are now pre-compiled. This means compilation checking is now performed during the building of the application workspaces. Actions with errors will prevent the build from completing successfully, even if they are not used at runtime.
  • The genAppFilesFromBlade script attempts to include actions in a workbook template config only if all measures/predicates in the action exist in the template. Developers can also manually define which actions are to be included in which templates (including the master) in the "Workflow Actions" node in Blade.

    Tip

    Note that once the user decides to manually configure which actions are to be included in which templates, this must be done for all templates including the master.

Remote References

The use of remote references (used most often for refresh and commit) has changed in 3.9.x and beyond as compared to 3.7.x. Below is a list of these changes and how they might change development tasks:

  • References with the ":remote" prefix are no longer generated for all measures and levels. This results in smaller workspaces and a quicker and smoother feel for the user.
  • If the application is to use remote references, they must be declared in a block. When remote references are required for use in actions, the developer may choose to place the logic that defines the remote references in ./<genAppFilesFromBladeOutputFolder>/templates/<template_name>/scr/__app/ext/PreActionInit.logic.
  • These remote references should always be local predicates in inactive blocks.
  • Installing remote reference declarations in active blocks means that opening the local workspace also requires opening the remote workspace.
  • Below is an example of how this logic looks for committing local "Sales" to remote "Sales_TY_Retail":
    // this should be set up before refs created.  deployWorkBookApp builds the reference to system:ExternalWorkSpace (next 3 lines).
    +system:ExternalWorkSpace(e),
    +system:ExternalWorkSpace:url(e:"lbns://master"),
    +system:ExternalWorkSpace:localPath[e] = "/path/to/master".
    
    _master:Week(e), _master:Week:id(e:id) -> string(id).
    lang:remoteRef[`_master:Week] = "lbns://master#Calendar:Week".
    lang:remoteRef[`_master:Week:id] = "lbns://master#Calendar:Week:id".
    
    _master:Sku(e), _master:Sku:id(e:id) -> string(id).
    lang:remoteRef[`_master:Sku] = "lbns://master#Product:Sku".
    lang:remoteRef[`_master:Sku:id] = "lbns://master#Product:Sku:id".
    
    _master:Store(w), _master:Store:id(w:id) -> string(id).
    lang:remoteRef[`_master:Store] = "lbns://master#Location:Store".
    lang:remoteRef[`_master:Store:id] = "lbns://master#Location:Store:id".
    
    _master:Sales[sk,st,wk] = v ->
       _master:Sku(sk), _master:Store(st), _master:Week(wk), float[32](v).
    lang:remoteRef[`_master:Sales] = "lbns://master#Sales_TY_Retail".
    	// note name difference in master...
    
    // commit local Sales to Sales_TY_Retail in master
    ^_master:Sales[r_sk, r_st, r_wk] = v <-
       Sales[sk, st, wk] = v,
       Product:Sku:id(sk:skn), _master:Sku:id(r_sk:skn),
       Location:Store:id(st:stn), _master:Store:id(r_st:stn),
       Calendar:Week:id(wk:wkn), _master:Week:id(r_wk:wkn).
    

Element Labels, Position Access, and Lock Predicates

Label, position access, and lock predicates are now generated with level-1 rules as needed by the developer. Predicates such as Product:Sku:label, Product:Sku:label_es_MX, and Product:Sku:positionAccess no longer exist.

  • Actions which insert and/or update positions must be updated to use the new approach.
  • To reference these predicates in rules they must be looked up via their level-1 names as in the following example:
    +blox:Locale:labelPred[`Calendar:Week, "en_US"][w5] = "Week 5"
    <- Calendar:Week:id(w5:"week_5").
    
    +blox:Locale:labelPred[`Calendar:Week, "es_MX"][w5] = "Week 5 (MX)"
    <- Calendar:Week:id(w5:"week_5").
    
    // first param here is name of the template since each template can
    // have separate access rights
    ^blox:WorkBookTemplate:accessOverride[
       "__master", `Location:District][u,d] = read
       <-
          system:app:User:name[u] = "user1",
          Location:District:id[d] = "district_9",
          system:app:SecurityLevel:name[read] = "Read".
    
  • Import scripts must be updated to use these new level-1 names for the label predicates as in the following example:
    option,typeReportingLevel,warning
    option,quotedValues,true
    option,escapeQuotedValues,true
    
    fromFile,$(SCRIPT_DIR)/../Data/Product.csv,Sku,Sku,SkuLabel,SkuLabel,Style,Style,StyleLabel,StyleLabel,TtlProd,TtlProd,TtlProdLabel,TtlProdLabel
    
    toPredicate,Product:Sku,Sku
    toPredicate,blox:Locale:labelPred[`Product:Sku,"en_US"],Sku,SkuLabel
    toPredicate,Product:Style,Style
    toPredicate,blox:Locale:labelPred[`Product:Style,"en_US"],Style,StyleLabel
    toPredicate,Product:TtlProd,TtlProd
    toPredicate,blox:Locale:labelPred[`Product:TtlProd,"en_US"],TtlProd,TtlProdLabel
    toPredicate,Product:Sku:style,Sku,Style
    toPredicate,Product:Style:ttlProd,Style,TtlProd
    

Batch Operations

The legacy batch scheduler and the Portal Admin application are not supported in 3.9.x and beyond. The developer must use ConnectBlox to build workspaces and cronned batch jobs for repeated batches.

Performance

Building workbook template workspaces can be slow: a template workspace build can take two minutes for a small test application and 7-8 minutes or more for large applications. The building of the workbooks (children), however, proceeds very quickly. This is a result of two key facts:

  • Due to the template workspace strategy, the building of the child workbook is basically a simple copy of the template workspace and the populating of hierarchies and data.
  • The building of the child workbooks proceeds asynchronously. As a result, multiple workbooks build in parallel. As an example, in one test in which thirteen workbooks were built, all thirteen workbooks built in three minutes, whereas prior to 3.9.x they built in 1.5 minutes each.

Subtype Refmodes

Prior to 3.9, entities that are subtypes of another entity could have their own refmode predicate. Now all subtypes use the refmode predicate from their root type. This is most commonly seen with the legacy UI entities such as system:gui:grid:Grid and system:gui:button:Button. If the application makes use of rules which use refmode predicates for these entities, the base type refmode must be used, as in the following example where:

system:gui:grid:gridId(g:id)

is replaced with:

system:gui:guiComponentId(g:id), system:gui:grid:Grid(g)

Note that it is probably a bad idea in general to use guiComponentId in any application rules. It is difficult to predict what the ID value is for any UI component. Instead, use the system:gui:viewPath property, which uses strings and should therefore be less fragile.

Master Workspace Import/Export

The master workspace of a Blade-based application contains a few kinds of data that users typically want to preserve when the application is upgraded or re-deployed. These are:

  • User-specific UI preferences
  • Position access and template access
  • Hierarchies (positions, mappings and position labels)
  • Base predicates of measures

To help import/export this data, the LogicBlox distribution contains a library of shell functions: $LOGICBLOX_HOME/libexec/deploy-library.sh

All functions take two arguments: the name of a master workspace (the ConnectBlox workspace name, not the file path) and the directory of the data. The content of this directory can be version-specific and will probably be changed over time, so users should not depend on whatever is in these directories.

Examples:

  • export_preferences /blade/app/master data/preferences
  • export_access /blade/app/master data/access
  • export_hierarchies /blade/app/master data/hierarchies
  • export_measure_base_predicates /blade/app/master data/measure-base-predicates

  • import_preferences /blade/app/master data/preferences
  • import_access /blade/app/master data/access
  • import_hierarchies /blade/app/master data/hierarchies
  • import_measure_base_predicates /blade/app/master data/measure-base-predicates

User grid preferences (stored and committed by the user) are saved in the master workspace. Default ("__corporate") preferences are stored in the template workspace. No default preferences are saved in the master workspace. A typical export script would be:

#! /bin/bash
source $LOGICBLOX_HOME/libexec/deploy-library.sh

ws=/blade/app/master
export_preferences $ws data/preferences
…

This will export the user preferences from the master workspace. The "import_preferences" function can then be used to import these preferences into a new master, after which a refresh run across all workbooks will refresh them into freshly-built workspaces.
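
A corresponding import script might look as follows (a sketch using the same illustrative workspace name and data directories as above):

#! /bin/bash
source $LOGICBLOX_HOME/libexec/deploy-library.sh

ws=/blade/app/master
import_preferences $ws data/preferences
import_access $ws data/access
import_hierarchies $ws data/hierarchies
import_measure_base_predicates $ws data/measure-base-predicates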

Chapter 36. Blade Tips and Tricks

36.1. Invoking Protobuf Services from Blade

Note

The features described in this section were introduced in LogicBlox 3.10.9.

Blade applications support a simple mechanism for invoking BloxWeb services, analogous to the ability to run the bloxweb-client call-json command from a Blade workspace.

Refreshing a grid

The UI command refreshUi can be used to trigger a grid to update in response to external data changes (e.g. when an external process has updated data in a workspace).

Example 36.1. 

Example of how a grid with the name NetSales can be triggered to be refreshed:
+system:gui:appCommand[]="refreshUi(NetSales.grid)".

The refreshUi command takes either the ID of a single grid or form, or a comma-delimited list of grid and form IDs (no spaces are allowed). If an ID refers to a form, every grid on that form is found and its data refreshed. Otherwise, only the specified grid gets refreshed.
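
For example, a single command that refreshes two grids at once (the second grid name is made up for this sketch):

+system:gui:appCommand[]="refreshUi(NetSales.grid,GrossMargin.grid)".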

Note

Please note that currently only the grid and form components support the refreshUi command.

Polling Progress Dialog

To support asynchronous updates to the UI of Blade applications, the existing progress dialog has been extended to allow it to poll a workspace to detect when a certain task has completed. The polling is done by executing a LogiQL query that must return a single scalar value in an output predicate named "_".

Example 36.2. 

_[] = requestStatus[].
// where requestStatus has been previously declared as
//       requestStatus[] = s -> string(s).

The polling interval can be specified, along with an optional timeout and optional message to be displayed if a timeout occurs. Initialization and finalization commands can be used to specify LogiQL queries that are executed prior to the start of the polling and after the polling has finished.

Example 36.3. 

Below is an example of how to configure and start a polling progress dialog. In this example, we first clear a requestStatus predicate, display the progress dialog, and start polling the workspace once a second. If the statusQuery returns a result within 10 seconds, the polling will stop. Otherwise a timeout message will be displayed and the polling will stop. Finally, we execute a LogiQL query that will request the UI to refresh the grid identified by the name "NetSales.grid". This allows the user to start an external process that asynchronously updates a workspace and wait for the update to complete.

^system:gui:ProgressDialog:message[dlg] =  "Optimization request submitted...",
^system:gui:ProgressDialog:title[dlg] =  "Waiting",
^system:gui:ProgressDialog:state[dlg] =  open,
^system:gui:ProgressDialog:allowInterruption[dlg] =  false,
^system:gui:ProgressDialog:indeterminate[dlg] =  true,
^system:gui:ProgressDialog:initCommand[dlg] =  init,
^system:gui:ProgressDialog:finalizeCommand[dlg] =  finalize,
^system:gui:ProgressDialog:statusQuery[dlg] =  status,
^system:gui:ProgressDialog:statusDialogTitle[dlg] =  "Optimization Response",
^system:gui:ProgressDialog:uiUpdateInterval[dlg] = 1000, // poll every second
^system:gui:ProgressDialog:statusTimeout[dlg] = 10000, // timeout after 10 seconds
^system:gui:ProgressDialog:timeoutMessage[dlg] = "Request Timed Out" <-
      system:gui:ProgressDialog:name@previous(dlg:"default"),
      system:gui:DialogState:name@previous(open:"Open"),
      status = "_[] = requestStatus[].",
      init = "-requestStatus[] = s <- requestStatus@previous[] = s.",
      finalize = "+system:gui:appCommand[]=\"refreshUi(NetSales.grid)\".".         

Invoking Bloxweb Services

Invoking a BloxWeb service can be done by using the system:gui:appCommand mechanism or by embedding the call in a progress dialog. If the system:gui:appCommand mechanism is used, the service call will be synchronous, with the UI suspended until the service call completes. The progress dialog support allows the service invocation to be either synchronous or asynchronous and also supports a timeout. In both cases, the service invocation is configured with the URI of the service to invoke and a JSON string that encodes the parameters passed to the service.

Note

There is no way to parse the JSON response from the service. If a service needs to provide feedback to the UI of a Blade app, it will need to write results into workspace predicates which are displayed in the UI.

Example 36.4. 

Example of the appCommand mechanism, executing a LogiQL query:

   ^system:gui:appCommand[] = cmd <- 
        cmd = "invokeService(" + uri + ", " + json + " )", 
        uri = "http://localhost:8080" + wsPath + /mdo_reopt",
        wsPath = connectblox_workspace_path[workspace:me[]], 
        json = "{\"reopt\":{ src_ws_path: \"" + wsPath + "\" }}".    

To call BloxWeb services from a progress dialog, the serviceUri and serviceJson properties need to be set. The following properties are optional:

  • serviceAllowsCancel: if this property is set to true, the service invocation will be asynchronous. If the property is set to false (the default), the service call will be made synchronously.
  • uiUpdateInterval: this property (in milliseconds) sets the polling interval used until the timeout is exceeded or the service finishes.
  • timeoutMessage: this property can be used to specify an optional message to be displayed if a timeout occurs.
  • statusTimeout: this property can be used to set the timeout (in milliseconds).

Example 36.5. 

Example of how to configure a progress dialog to invoke a service asynchronously:

^system:gui:ProgressDialog:message[dlg] =  "Optimization request submitted ($(TIME))...",
^system:gui:ProgressDialog:title[dlg] =  "Waiting",
^system:gui:ProgressDialog:state[dlg] =  open,
^system:gui:ProgressDialog:allowInterruption[dlg] =  false,
^system:gui:ProgressDialog:indeterminate[dlg] =  true,
^system:gui:ProgressDialog:initCommand[dlg] =  init,
^system:gui:ProgressDialog:finalizeCommand[dlg] =  finalize,
^system:gui:ProgressDialog:statusQuery[dlg] =  status,
^system:gui:ProgressDialog:statusDialogTitle[dlg] =  "Optimization Response",
^system:gui:ProgressDialog:uiUpdateInterval[dlg] = 1000, // check every 1 second
^system:gui:ProgressDialog:statusTimeout[dlg] = 10000,  // timeout after 10 seconds
^system:gui:ProgressDialog:timeoutMessage[dlg] = "Request Timed Out",
^system:gui:ProgressDialog:serviceUri[dlg] =  uri,
^system:gui:ProgressDialog:serviceJson[dlg] =  json,
^system:gui:ProgressDialog:serviceAllowsCancel[dlg] = true <-
        system:gui:ProgressDialog:name@previous(dlg:"default"),
        system:gui:DialogState:name@previous(open:"Open"),
        status = "_[] = mdoRequestStatus[].",
        init = "-mdoRequestStatus[] = s <- mdoRequestStatus@previous[] = s.",
        finalize = "+system:gui:appCommand[]=\"refreshUi(NetSales.grid)\".",
        uri = "http://localhost:8080" + wsPath + "/mdo_reopt",
        wsPath = connectblox_workspace_path[workspace:me[]],
        json = "{\"reopt\":{ src_ws_path: \"" + wsPath + "\" }}".   

When invoking an asynchronous service, a "Cancel" button will be made available to the user. Clicking "Cancel" will try to stop the service invocation.

No "Cancel" button is displayed to the user when the service call is made synchronously. The UI will be suspended until the service finishes.

Tip

If the progress dialog message property contains the string "$(TIME)", this string will be replaced with a timer showing the elapsed hours, minutes, and seconds since the start of the service invocation.

36.2. Setting and re-setting the sorting of views

Note

The features described in this section were introduced in LogicBlox 3.10.9.

The sorting of a grid can be set or reset by running a "Datalog Action". Below you can find an overview of all the supported commands, together with some examples.

Tip

The grid is referenced by a string similar to the string in the system:gui:grid:Grid predicate. The __Main.sp.SecondPanel. part is removed when referring to a grid.

Example 36.6. 

In this example, the setSortingSpec() function is used to set the sorting of the grid called "SampleGrid":

^system:gui:appCommand(;"uiCommand(SampleGrid.grid.setSortingSpec(\"nullfalse#Descending;25;col;Location:Store=Store3;\"))").

Let's have a closer look at what this command does exactly:

  1. nulltrue / nullfalse: indicates whether or not the sort was done using the sort dialog. If the value is nulltrue, then multiple columns are specified in the rest of the string. For example:
    ^system:gui:appCommand(;"uiCommand(SampleGrid.grid.setSortingSpec(\"nulltrue#Ascending;25;col;Location:Store=Store1!Descending;25;col;Location:Store=Store2!Ascending;25;col;Location:Store=Store3;\"))").
    
  2. Descending / Ascending: sort type
  3. 25 / <a number>: indicates the total number of elements being sorted. In the example, the sorting is done on a column and there are 25 elements in the column to be sorted.
  4. col / row: indicates the direction in which the sorting is done. In the example, the sorting is done on a column.
  5. 'Location:Store=Store3': the level and the element in the hierarchy that is being sorted.

Example 36.7. 

In this example, the resetSorting() function is used to remove the sorting in a grid, similar to what the "remove sort" option in the toolbar would do:

^system:gui:appCommand(;"uiCommand(SampleGrid.grid.resetSorting())").

Example 36.8. 

The restoreCorporateDefaultSorting() function in this example restores the sorting of the grid to the corporate default sorting:

^system:gui:appCommand(;"uiCommand(SampleGrid.grid.restoreCorporateDefaultSorting())").

Example 36.9. 

Finally, the restoreSorting() function can be used to restore the user-stored sorting (<user_name>:default:<grid_name>.grid). The sorting does not get updated if no user-stored sorting fact exists.

^system:gui:appCommand(;"uiCommand(SampleGrid.grid.restoreSorting())").

Note

Please note that for the commands that restore the default or user-specific sorting to work, the grids have to be in the same state as the sorting information that is committed to the workspace.

Part V. Administration

Chapter 37. Backup and Copy

Copying the workspace of a live system, for example to take a snapshot of the workspace for backup purposes, needs to be done carefully to prevent the copied workspace from being corrupted. We have implemented a new feature, based on the new memory manager (so only available on Linux), that can make a copy of a workspace even when the system cannot be taken down entirely. This feature is available as 'bloxbatch -hotCopy'. The following example copies the workspace in directory 'workspace' to a directory 'test-workspace'.

bloxbatch -hotCopy workspace:test-workspace

When running bloxbatch -hotCopy, the engine will spend up to ten minutes waiting for an idle period with no transactions. (The length of the waiting period is adjustable by setting the environment variable BLOX_HOTCOPY_NICE_SECONDS). When an idle period occurs, or when the waiting period ends, it will grab an exclusive lock, which will prevent new transactions from starting. Active transactions will be allowed to complete, after which the copy is performed, and the lock is released. Incoming transactions will now be accepted again.
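
For example (the value here is illustrative), the waiting period can be shortened to one minute for the copy shown above:

$ BLOX_HOTCOPY_NICE_SECONDS=60 bloxbatch -hotCopy workspace:test-workspace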

An idle period can also be enforced via the process manager. This needs to happen in several steps. First, incoming transactions need to be disallowed using 'forbidTransaction'. Next, a period of waiting is necessary to finish the currently active transactions. If the active transactions take too much time, then they might need to be aborted. After performing the hot copy via bloxbatch, incoming transactions can be allowed again.

$ processmanager --port ... --adminPort ... --forbidTransactions
$ processmanager --port ... --adminPort ... --wait --timeout 10
$ processmanager --port ... --adminPort ... --abort --timeout 10
$ bloxbatch -hotCopy workspace:test-workspace
$ processmanager --port ... --adminPort ... --allowTransactions

Chapter 38. Workspace Corruption and Consistency

Although our goal is to completely prevent workspace corruption issues, occasionally a workspace still gets into a corrupted state. We have implemented a tool to check the workspace for signs of corruption, and optionally address some corruptions by reinitializing internal statistics on data-structures, which might resolve the corruption problems.

This feature is available as the 'check' option in bloxbatch, and cannot be run with any other transactions concurrently accessing the workspace. The check feature currently implements two separate checks: type hierarchies are checked for consistency, and the meta-data on internal data structures is verified.

$ bloxbatch -db workspace -check

To attempt to fix a workspace, add the 'fixCounts' option:

$ bloxbatch -db workspace -check -fixCounts

38.1. Intensional (IDB) predicate consistency check

Intensional (IDB) predicates are defined using normal (non-delta) logic rules. Rules for IDB predicates with database lifetime are incrementally evaluated, which means that once the IDB predicate has been evaluated the facts are stored in the database, and in every transaction only the facts that might have changed are updated, deleted, or asserted. The correctness of incremental evaluation relies heavily on the engine considering all possible changes that could affect the computed facts. Also, it is crucial for the developer to only use deterministic language features in rules for IDB predicates (see Section Compile-time error reporting).

Unfortunately, we occasionally find problems with the implementation of incremental evaluation. To help analyze these problems, we have implemented a tool that checks if the IDB predicates of a workspace have the correct facts. The check works in three steps: first, the workspace that needs to be checked is copied to a separate workspace. In this copied workspace, all IDB predicates are completely re-evaluated using the option 'runAllLogic'. Finally, the incrementally computed IDB predicates in the original workspace can be compared to the freshly computed ones in the new workspace.

$ cp -R workspace check-workspace   # use hotCopy for a live system!
$ bloxbatch -db check-workspace -runAllLogic
$ bloxbatch -db workspace -compareIDB check-workspace

Part VI. Appendix

Appendix A. Built-in predicates

A.1. Primitive type conversion

General form

<FromTypeName1>:<ToTypeName2>:convert[from] = to ->
   <FromTypeName1>(from), <ToTypeName2>(to).

Note that type names using [] e.g. "uint[32]" are written without the brackets e.g. "uint32".

Example

foo[] = x -> uint[32](x).
bar[] = y -> string(y).

^bar[] = str
<-
   str = uint32:string:convert[int],
   int = +foo[].

Polymorphic form

blox:lang:<ToTypeName2>[from] = to ->
   <FromTypeName1>(from), <ToTypeName2>(to).

The polymorphic form requires the type of the result <ToTypeName2> to appear in the name of the conversion function, but doesn't require the type of the argument <FromTypeName1>. This form is compiled into the corresponding <FromTypeName1>:<ToTypeName2>:convert function. Note that type names using [], e.g. "uint[32]", are written without the brackets, e.g. "uint32". The first letter of the type name is written in upper case (for example, blox:lang:toUint32 converts to uint[32]). In separately compiled files, the "blox:lang:" module prefix can be aliased away.

Example

u32[] = u -> uint[32](u).
i8[] = i -> int[8](i).
str[] = s -> string(s).

^u32[] = u
<-
   u = blox:lang:toUint32[i],
   i = +i8[].

^u32[] = u
<-
   u = blox:lang:toUint32[s],
   s = +str[].

A.2. Comparison Operations

Where T is any type.

T:eq_2[arg1] = arg2          -> T(arg1), T(arg2).
T:eq_3[arg1, arg2] = result  -> T(arg1), T(arg2), boolean(result). 
T:ne_2(arg1, arg2)           -> T(arg1), T(arg2). 
T:ne_3[arg1, arg2] = result  -> T(arg1), T(arg2), boolean(result).
T:lt_2(arg1, arg2)           -> T(arg1), T(arg2).
T:lt_3[arg1, arg2] = result  -> T(arg1), T(arg2), boolean(result).
T:gt_2(arg1, arg2)           -> T(arg1), T(arg2).
T:gt_3[arg1, arg2] = result  -> T(arg1), T(arg2), boolean(result).
T:le_2(arg1, arg2)           -> T(arg1), T(arg2).
T:le_3[arg1, arg2] = result  -> T(arg1), T(arg2), boolean(result).
T:ge_2(arg1, arg2)           -> T(arg1), T(arg2).
T:ge_3[arg1, arg2] = result  -> T(arg1), T(arg2), boolean(result).
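
As a sketch of how the instantiated forms can be used directly (the predicates age, minor, and isMinor are invented for this example; int32 is the bracket-free name of int[32], as described above):

age[p] = a -> string(p), int[32](a).
minor(p) -> string(p).
isMinor[p] = b -> string(p), boolean(b).

// holds only when the comparison succeeds
minor(p) <- age[p] = a, int32:lt_2(a, 18).

// yields an explicit boolean result
isMinor[p] = b <- age[p] = a, int32:lt_3[a, 18] = b.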

A.3. Math Operations

Where T is any numeric type.

T:add[arg1, arg2] = result       -> T(arg1), T(arg2), T(result).
T:subtract[arg1, arg2] = result  -> T(arg1), T(arg2), T(result).
T:multiply[arg1, arg2] = result  -> T(arg1), T(arg2), T(result).
T:divide[arg1, arg2] = result    -> T(arg1), T(arg2), T(result).
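
A minimal sketch using one of these instantiations (the predicates below are invented; float64:add is equivalent to writing price[] + tax[]):

price[] = v -> float[64](v).
tax[]   = v -> float[64](v).
total[] = v -> float[64](v).

total[] = v <- p = price[], t = tax[], v = float64:add[p, t].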

A.4. Floating Point Functions

Where T is a floating point type.

T:log[arg] = result          -> T(arg), T(result).
T:log10[arg] = result        -> T(arg), T(result).
T:exp[arg] = result          -> T(arg), T(result).
T:sqrt[arg] = result         -> T(arg), T(result).
T:tan[arg] = result          -> T(arg), T(result).
T:cos[arg] = result          -> T(arg), T(result).
T:sin[arg] = result          -> T(arg), T(result).
T:ceil[arg] = result         -> T(arg), T(result).
T:floor[arg] = result        -> T(arg), T(result).
T:abs[arg] = result          -> T(arg), T(result).
T:pow[base, power] = result  -> T(base), T(power), T(result).
T:isNan(arg)                 -> T(arg).
T:notNan(arg)                -> T(arg).
T:isFinite(arg)              -> T(arg).
T:notFinite(arg)             -> T(arg).
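
For instance, a sketch computing a square root with the float[64] instantiation (the predicate names are invented):

reading[] = v -> float[64](v).
root[]    = v -> float[64](v).

root[] = v <- x = reading[], v = float64:sqrt[x].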

A.5. Canonical values of built-in types

For primitive types and entities there is an implicitly declared predicate :canonicalElement that returns zero, or the entity corresponding to the zero index.

A.6. Datetime predicates

A.6.1. datetime:now

Declaration:

datetime:now[]= dt -> datetime(dt).

datetime:now produces the current time.

This is a non-deterministic builtin, so it cannot be used in IDB rules.

A.6.2. datetime:create

Datetimes can be constructed by using the datetime:create predicate. For example:

datetime:create[1989,5,8,0,0,0]
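
A small sketch of using it in a rule (assuming, as the example suggests, that the six keys are year, month, day, hour, minute, and second):

releaseDate[] = dt -> datetime(dt).
releaseDate[] = dt <- dt = datetime:create[1989,5,8,0,0,0].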

A.6.3. datetime:format and datetime:formatTZ

Declaration:

datetime:format[dt, format]= s -> datetime(dt), string(format), string(s).

datetime:formatTZ[dt, format, tz] = s -> datetime(dt), string(format),
string(tz), string(s).

datetime:format formats a datetime into a string according to the specified datetime format; datetime:formatTZ does the same, but interprets the datetime in the given timezone.

The table below lists all the supported date facet format flags.

Format Specifier   Description   Example
%a Abbreviated weekday name "Mon" => Monday
%A Long weekday name "Monday"
%b Abbreviated month name "Feb" => February
%B Full month name "February"
%d Day of the month as decimal 01 to 31
%D Equivalent to %m/%d/%y
%G This has the same format and value as %Y, except that if the ISO week number belongs to the previous or next year, that year is used instead.
%g Like %G, but without century.
%j Day of year as decimal from 001 to 366 for leap years, 001 - 365 for non-leap years. "060" => Feb-29
%m Month of the year as a decimal 01 to 12 "01" => January
%u The day of the week as a decimal, range 1 to 7, Monday being 1.
%U The week number of the current year as a decimal number, range 00 to 53, starting with the first Sunday as the first day of week 01. In 2005, Jan 1st falls on a Saturday, so therefore it falls within week 00 of 2005 (week 00 spans 2004-Dec-26 to 2005-Jan-01. This also happens to be week 53 of 2004).
date d(2005, Jan, 1); // Saturday
// with format %U
ss << d; // "00"
d += day(1); // Sunday
ss << d; // "01" beginning of week 1
%V The ISO 8601:1988 week number of the current year as a decimal number, range 01 to 53, where week 1 is the first week that has at least 4 days in the current year, and with Monday as the first day of the week.
%w Weekday as decimal number 0 to 6 "0" => Sunday
%W Week number 00 to 53 where Monday is first day of week 1
date d(2005, Jan, 2); // Sunday
// with format %W
ss << d; // "00"
d += day(1); // Monday
ss << d; // "01" beginning of week 1
%y Two digit year "05" => 2005
%Y Four digit year "2005"
%Y-%b-%d Default date format "2005-Apr-01"
%Y%m%d ISO format "20050401"
%Y-%m-%d ISO extended format "2005-04-01"

The table below lists all the supported time facet format flags.

Format Specifier   Description   Example
%- Placeholder for the sign of a duration. Only displays when the duration is negative. "-13:15:16"
%+ Placeholder for the sign of a duration. Always displays for both positive and negative. "+13:15:16"
%f Fractional seconds are always used, even when their value is zero "13:15:16.000000"
%F Fractional seconds are used only when their value is not zero.
"13:15:16"
"05:04:03.001234"
%H The hour as a decimal number using a 24-hour clock (range 00 to 23).
%I The hour as a decimal number using a 12-hour clock (range 01 to 12).
%k The hour (24-hour clock) as a decimal number (range 0 to 23); single digits are preceded by a blank.
%l The hour (12-hour clock) as a decimal number (range 1 to 12); single digits are preceded by a blank.
%M The minute as a decimal number (range 00 to 59).
%O The number of hours in a time duration as a decimal number (range 0 to max. representable duration); single digits are preceded by a zero.
%p Either `AM' or `PM' according to the given time value, or the corresponding strings for the current locale.
%P Like %p but in lowercase: `am' or `pm' or a corresponding string for the current locale.
%r The time in a.m. or p.m. notation. In the POSIX locale this is equivalent to `%I:%M:%S %p'
%R The time in 24-hour notation (%H:%M)
%s Seconds with fractional seconds. "59.000000"
%S Seconds only "59"
%T The time in 24-hour notation (%H:%M:%S)
%q ISO time zone (output only). "-0700" // Mountain Standard Time
%Q ISO extended time zone (output only). "-05:00" // Eastern Standard Time
%z Abbreviated time zone (output only). "MST" // Mountain Standard Time
%Z Full time zone name (output only). "EDT" // Eastern Daylight Time
%ZP Posix time zone string "EST-05EDT+01,M4.1.0/02:00,M10.5.0/02:00"

More information on the Boost library can be found here.
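
For example, a query (the output predicate follows the "_" convention used elsewhere in this manual) that formats the current time using flags from the tables above:

_[] = s <- now = datetime:now[],
           s = datetime:format[now, "%Y-%m-%d %H:%M:%S"].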

A.6.4. datetime:parse

Declaration:

datetime:parse[s, format]= dt -> string(s), string(format), datetime(dt).

datetime:parse parses a string representing a datetime according to a specified datetime format.

The number of supported formatting patterns is limited compared to datetime:format.

%d        Day of month as decimal 1-31
%m        Month of year as decimal 1-12
%b        Abbreviated month name (example: Feb)
%q        Quarter of year as decimal 1-4
%Y        Year as 4 digits
%y        Year as 2 digits
%H        Hours as 2 digits in 24-hour clock (0-23)
%M        Minutes as 2 digits (0-59)
%S        Seconds as 2 digits (0-59)
%Z or %Q  Timezone
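
For example, a query parsing a fixed string with these flags (the literal is illustrative):

_[] = dt <- dt = datetime:parse["2005-04-01 13:45:00", "%Y-%m-%d %H:%M:%S"].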

A.6.5. datetime:import

Declaration:

datetime:import[ts] = dt -> int[64](ts), datetime(dt).

datetime:import converts a Unix epoch timestamp into a datetime; the timestamp represents the number of seconds since 1/1/1970.

A.6.6. datetime:export

Declaration:

datetime:export[dt] = ts -> datetime(dt), int[64](ts).

datetime:export converts a datetime into a Unix epoch timestamp; the timestamp represents the number of seconds since 1/1/1970.
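
As a sketch, exporting a constructed datetime and then importing the resulting timestamp should reproduce the original value:

_[] = dt2 <- dt  = datetime:create[1989,5,8,0,0,0],
             ts  = datetime:export[dt],
             dt2 = datetime:import[ts].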

A.6.7. datetime:add

Declaration:

datetime:add[old, offset, resolution] = new ->
    datetime(old), int[64](offset), string(resolution), datetime(new).

datetime:add adds time to a datetime; the resolution is a string representing the resolution of the offset, for instance "days", "seconds" or "months". For example:

datetime:add[datetime:now[], 7, "days"]

results in the datetime for a week from now.

A.6.8. datetime:subtract

Declaration:

datetime:subtract[old, offset, resolution] = new ->
    datetime(old), int[32](offset), string(resolution), datetime(new).

datetime:subtract subtracts time from a datetime; the resolution is a string representing the resolution of the offset, for instance "days", "seconds" or "months".

A.6.9. datetime:part and datetime:partTZ

Declaration:

datetime:part[dt, component] = value ->
    datetime(dt), string(component), int[32](value).

datetime:partTZ[dt, component, tz] = value ->
    datetime(dt), string(component), string(tz), int[32](value).

datetime:part extracts a component from a datetime and returns it as an integer; available components include "year", "month", "day", "hour", "minute" and "second". For example:

datetime:part[datetime:now[], "day"]

or with a timezone:

datetime:part[datetime:now[], "day", "CET"]

A.6.10. datetime:offset

Declaration:

datetime:offset[dtFrom, dtTo, resolution] = offset ->
    datetime(dtFrom), datetime(dtTo), string(resolution), int[64](offset).

datetime:offset calculates the offset for two dates in a certain resolution. Available resolutions: "years", "months", "days", "hours", "minutes" and "seconds". For example:

_(hours) <-
  today = datetime:now[],
  tomorrow = datetime:add[today, 1, "days"],
  datetime:offset[today, tomorrow, "hours"] = hours.

(This returns 24).

A.7. String predicates

A.7.1. string:add

Declaration:

string:add[s1, s2] = s3 -> string(s1), string(s2), string(s3).

string:add concatenates two strings and is a functional predicate alternative to the "+" operator for strings. The following are equivalent:

fullname[] = name <- name = string:add["Mr ", "Pete"].
fullname[] = name <- name = "Mr " + "Pete".

A.7.2. string:upper

Declaration:

string:upper[s] = supper -> string(s), string(supper).

string:upper uppercases a string, i.e. it turns "pete" into "PETE". For instance:

string:upper["pete"] = upper_pete

A.7.3. string:lower

Declaration:

string:lower[s] = slower -> string(s), string(slower).

string:lower lowercases a string, i.e. it turns "PeTe" into "pete". For instance:

string:lower["pete"] = lower_pete

A.7.4. string:substring

Declaration:

string:substring[s, pos, len] = substr -> string(s),
  uint[32](pos), uint[32](len), string(substr).

string:substring calculates the substring of a string. For instance:

string:substring["Hello there", 0, 5] = hellostr

binds "Hello" to hellostr.

A.7.5. string:length

Declaration:

string:length[s] = n -> string(s), uint[32](n).

string:length calculates the length of a string. For instance:

string:length["Hello"] = n

binds 5 to n.

A.7.6. string:like

Declaration:

string:like(s, pattern) -> string(s), string(pattern).

string:like checks if a string matches a wildcard pattern.

The pattern is a string in which underscore ('_') represents any single character and percent ('%') represents any sequence of zero or more characters. The two special characters can be escaped to search for them.

For instance, the following hold:

string:like("Hello", "%ello"),
string:like("Hello", "%ll%"),
string:like("Hello", "H%l%").

A.7.7. string:replace

Declaration:

string:replace[s, subs, replaces] = s2 -> string(s),
  string(subs), string(replaces), string(s2).

string:replace replaces a substring with another string. For instance:

string:replace["Hello world", "world", "earth"] = msg

binds "Hello earth" to msg.

A.7.8. string:split

Declaration:

string:split[s, sep, idx] = tok -> string(s), string(sep),
  uint[32](idx), string(tok).

string:split splits a string into parts. For instance:

_(idx, part) <- string:split["path/to/someplace", "/", idx] = part.

returns:

0, path
1, to
2, someplace

A.8. Ordered Entity Operations

Where T is any ordered entity type.

T:next[before] = after     -> T(before), T(after).
T:offset[from, to] = dist  -> T(from), T(to), int[32](dist).
T:first[] = elem           -> T(elem).
T:last[] = elem            -> T(elem).
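
A hedged sketch of usage, assuming quarter is an ordered entity type declared elsewhere in the program:

// the quarter immediately following a given quarter
following[q] = q2 -> quarter(q), quarter(q2).
following[q] = q2 <- quarter:next[q] = q2.

// distance (in positions) from the first quarter
fromStart[q] = d -> quarter(q), int[32](d).
fromStart[q] = d <- quarter:first[] = f, quarter:offset[f, q] = d.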

A.9. Numeric Ranges

Where T is any numeric type, range generates a set of elements from start to end varying by increment.

T:range(start, end, increment, element)  -> T(start), T(end), T(increment), T(element).
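
For example, a sketch enumerating the odd integers between 1 and 10 with the int[64] instantiation (the predicate odd is invented):

odd(x) -> int[64](x).
odd(x) <- int64:range(1, 10, 2, x).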

A.10. Boolean Operations

boolean:bitand[arg1, arg2] = result  -> boolean(arg1), boolean(arg2), boolean(result).
boolean:bitor[arg1, arg2] = result   -> boolean(arg1), boolean(arg2), boolean(result).
boolean:bitxor[arg1, arg2] = result  -> boolean(arg1), boolean(arg2), boolean(result).
boolean:bitnot[arg] = result         -> boolean(arg), boolean(result).
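
A minimal sketch (the predicate names are invented) combining two boolean-valued functional predicates:

p[] = b -> boolean(b).
q[] = b -> boolean(b).
both[] = b -> boolean(b).

both[] = b <- x = p[], y = q[], b = boolean:bitand[x, y].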

Appendix B. Compiler Errors and Warnings

B.1. MULTIPLE_VALUES

Predicates with multiple value arguments are only allowed in query blocks, not active blocks.

a(x) -> .
q(x;y,z) -> a(x), a(y), a(z).
*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block multiple-values: line 2: error: Predicate  'q' with database lifetime cannot have 
more than one value argument. (code: MULTIPLE_VALUES)
    q(x;y,z) -> a(x), a(y), a(z).
    ^^^^^^^^

1 ERROR (BloxCompiler version 66466_f4a494eeb3f9)

B.2. IDB_META

B.3. EDB_RULE

By default, writing IDB rules for EDB predicates will cause the runtime to clear the contents of the EDB and change the predicate's derivation type to "DerivedAndStored". To avoid this happening accidentally, it is possible to specify the derivation type of an EDB predicate as "Extensional". This will indicate to the compiler that it is intended for the predicate to always be an EDB.

If an IDB rule is written for an EDB predicate with a derivation type of "Extensional", the compiler will detect this and report an EDB_RULE error.

e(x) -> int[64](x).
lang:derivationType[`e]="Extensional".

e(x) <- x = 3. // EDB_RULE error

B.4. INCONSISTENT_EQUALITY

    

B.5. MULTIPLE_RULES_NONCYCLIC

    

B.6. DIV_ZERO

This error is reported when the compiler is certain that a divide by zero will occur in the given logic. The analysis used by the compiler is conservative and will likely not detect some divisions by zero.

p(x) -> int[64](x).

p(x) <- x = 10/0.  // DIV_ZERO error

p(x) <- x = 8/y, y = 0. // no DIV_ZERO error 

B.7. NO_DECLARATION

The compiler will infer the predicate declarations for some predicates that appear in logic. However, sometimes this can lead to the compiler not reporting an error, or at least the real error, when a predicate name is misspelled. The pedantic warning NO_DECLARATION reports when a predicate declaration has been inferred for a predicate, which could be a sign that the predicate's name has been misspelled.

foo(x) -> int[64](x).
foa(x) <- x = 3.  // NO_DECLARATION

B.8. UNKNOWN_COMPILER_OPTION

    

B.9. CONSTRUCTOR_ILLEGAL_VALUETYPE

If a predicate is declared to be a constructor for an entity type which is not scalable, this error is triggered. This issue can usually be resolved by marking the entity predicate to have the storage type "ScalableSparse". For instance:

lang:physical:storageModel[`myentity] = "ScalableSparse".
myentity(me) -> .

lang:constructor(`myentity_cons).
myentity_cons[id] = me -> string(id), myentity(me).

B.10. SIMILAR_VAR

One source of errors is the misspelling of variable names. Although this mistake is sometimes caught through the VARIABLE_SOLITARY warning, when the misspelling is common, these errors can be elusive to catch. The pedantic warning SIMILAR_VAR is reported when two variables are similar to each other based upon the number of edits needed to convert one variable to the other.

The maximum edit distance used for determining similarity can be set through the environment variable BLOXCOMPILER_EDIT_DISTANCE. The default maximum edit distance is 2. Variables shorter than the maximum edit distance will not be considered.

a(x) -> int[32](x).
a(foa) <- foo=5, fob=3, foa = foo + fob.
// SIMILAR_VAR: foa and foo, foa and fob, foo and fob.      

B.11. SUBTYPE_PRIM

    

B.12. SUBTYPE_MULTI

The compiler reports this warning when an entity is declared with more than one superentity. The current treatment of entities with more than a single superentity is sometimes inconsistent and can lead to subtle errors. Entities with more than one superentity will be disallowed in the future.

    

B.13. SUBTYPE_PRIM

    

B.14. TYPE_INFER

    

B.15. PULSE_NONPULSE_SUPER

    

B.16. NONPULSE_PULSE_SUPER

    

B.17. PULSE_CONSTRAINT

    

B.18. FUNC_SEMICOLON_DEPRECATED

    

B.19. ONETOONE_DEFAULT_VALUE

    

B.20. DYNAMIC_TYPE_CONSTRAINT

Some integrity constraints, such as type declarations, can be checked entirely statically, while some must be deferred and checked at runtime. These runtime checks carry a performance penalty. However, it can be easy to write integrity constraints that appear to be type declarations but are not, or that contain subformulas that must be checked dynamically.

The pedantic warning DYNAMIC_TYPE_CONSTRAINT informs the user which formulas in a constraint that appears to be a type declaration may need to be checked at runtime.

p(x,y) -> int[32](x), int[32](y).
r(x) -> int[32](x).

p(x,y) -> r(x), r(y). // DYNAMIC_TYPE_CONSTRAINT: r(x), r(y)

B.21. SCALABLE_SPARSE_ENTITY_ONLY

    

B.22. STORAGE_MODEL_TOP_ENTITY

    

B.23. POLYMORPHIC_LITERAL

Polymorphic predicates such as 'eq_2' and 'add' are not true predicates and exist only as a convenience for the user. They are always converted to a specific typed predicate by the compiler, such as 'int32:eq_2' or 'float64:add', by using the inferred types of the predicate's arguments. However, when writing one of these polymorphic predicates as a literal, it may not be possible to infer from the context the typed predicate to which it should be converted. Therefore, using polymorphic predicates in predicate literals is disallowed. If you know what type the predicate should be instantiated with, write the specific typed predicate literal instead.

    

B.24. COMP_UNORDERED

The language does not allow comparison other than = and != on entities that are not ordered. If you get this error, then you probably want to compare some value associated to the entity instead (typically the ref-mode). In some cases, perhaps you want to declare the entity to be ordered.
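
An illustrative sketch (the predicates are invented for this example): comparing two product entities directly with < would be rejected with COMP_UNORDERED, so the refmode values are compared instead:

product(p), product:sku(p:s) -> string(s).
precedes(p1, p2) -> product(p1), product(p2).

// compare the refmode values rather than the (unordered) entities
precedes(p1, p2) <- product:sku(p1:s1), product:sku(p2:s2), s1 < s2.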

B.25. META_MODEL_DEPRECATED

Before LogicBlox 3.3 and 3.4, most of the meta properties for predicates were initialized using delta logic that manipulates the meta model. The problem with this approach is that the compiler checks for more and more semantic problems that require knowledge of these properties before the logic is evaluated.

While this is only a warning in 3.4, we strongly suggest addressing these problems. The error report includes advice on how to address the problem. We are planning to completely disallow the delta logic method of setting predicate properties in 3.5.

product(x) -> .
discount[x] = y -> product(x), float[32](y).

^system:Predicate:isUniquelyDerived[p] = true <-
  +system:Predicate:fullName[p ] = "discount".
*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block meta-model-deprecated: line 4: error: predicate 
'system:Predicate:isUniquelyDerived' is not declared at level 0 (code: 
PREDICATE_UNDECLARED)
    ^system:Predicate:isUniquelyDerived[p] = true <-
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1 ERROR (BloxCompiler version 66466_f4a494eeb3f9)

B.26. Skolem functions

B.26.1. VAR_UNBOUND_CONSTRUCTOR_NO_ENT

Introduced in LogicBlox 3.6

Was VAR_UNBOUND_SKOLEM_NO_ENT prior to LogicBlox 3.9.

    

B.26.2. ENTITY_CREATE_CONSTRUCTOR

Introduced in LogicBlox 3.6

Was ENTITY_CREATE_SKOLEM prior to LogicBlox 3.9.

    

B.26.3. CONSTRUCTOR_DEFAULT_VALUE

Introduced in LogicBlox 3.7

Was SKOLEM_DEFAULT_VALUE prior to LogicBlox 3.9.

    

B.26.4. MULTI_CONSTRUCTOR_ONE_VAR

Introduced in LogicBlox 3.7

Was MULTI_SKOLEM_ONE_VAR prior to LogicBlox 3.9.

    

B.27. Auto-numbered predicates

B.27.1. AUTONUMBERED_WRONG_ARITY

    

B.27.2. AUTONUMBERED_ENTITY_KEY

    

B.27.3. AUTONUMBERED_VALUE

    

B.27.4. AUTONUMBERED_HEAD

    

B.27.5. AUTONUMBERED_NOT_REFMODE

    

B.27.6. LANGAUTONUMBERED_INCONSISTENT

    

B.27.7. LANGAUTONUMBERED_WS

    

B.27.8. LANGAUTONUMBERED_NOT_WS

    

B.28. Module system

B.28.1. BLOCK_PARSE

    

B.28.2. BLOCK_NAME

    

B.28.3. BLOCK_UNKNOWN_PREDICATE

    

B.28.4. BLOCK_UNKNOWN_PREDICATE_TYPO

    

B.28.5. BLOCK_PREDICATE_SEALED

    

B.28.6. BLOCK_PREDICATE_SEALED_SUPERTYPE

    

B.28.7. BLOCK_NO_MODULE

    

B.28.8. BLOCK_OPEN

    

B.28.9. BLOCK_REDUNDANT_EXPORT

    

B.28.10. BLOCK_EXPORT_INCONSISTENT

    

B.28.11. BLOCK_UNREALIZED_EXPORT

    

B.28.12. BLOCK_ALREADY_ACTIVE

    

B.28.13. BLOCK_ALREAD_INACTIVE

    

B.28.14. BLOCK_PREDICATE_EXPR

    

B.28.15. BLOCK_SEPARATE_COMPILATION

    

B.28.16. BLOCK_SIMPLE_NAME

    

B.28.17. ALIAS_PREDICATE

    

B.28.18. ALIAS_NAMESPACE

    

B.28.19. ALIAS_UNKNOWN_ALL

    

B.28.20. ALIAS_UNKNOWN_PREDICATE

    

B.28.21. ALIAS_UNKNOWN_NAMESPACE

    

B.28.22. ALIAS_UNKNOWN_EITHER

    

B.29. Separate compilation

B.29.1. PROJECT_NOT_FOUND

    

B.29.2. PROJECT_NOT_FILE

    

B.29.3. PROJECT_CLOSE

    

B.29.4. PROJECT_NOT_ACCESSIBLE

    

B.29.5. PROJECT_IO_EXCEPTION

    

B.29.6. PROJECT_FORMAT

    

B.29.7. PROJECT_QUALIFIER

    

B.29.8. PROJECT_SUMMARY_CONFLICT

    

B.29.9. PROJECT_MODULE_NOT_FOUND

    

B.29.10. PROJECT_MODULE_NOT_DIRECTORY

    

B.29.11. PROJECT_MODULE_UNKNOWN_FILE

    

B.29.12. PROJECT_MODULE_INVALID_SUMMARY

    

B.29.13. PROJECT_DEPENDENCY_CYCLE

    

B.29.14. PROJECT_INVALID_EXECUTE

    

B.29.15. LIBRARY_NOT_FOUND

    

B.29.16. INVALID_LIBRARY

    

B.30. Hierarchical syntax

B.30.1. HIER_NO_ENCLOSING

This error is reported if you write a hierarchical expression that is not inside a hierarchical formula.

person(p) -> .
name[p]=s -> person(p), string(s).

address(a) -> .
street[a]=s -> address(a), string(s).

home[a]=p -> person(p), address(a).

favorite(p) -> person(p).

person(p),
name[p]="Bill",
home[address(_) { street("Anywhereville") }]=p
<- favorite(p).  
*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block HIER_NO_ENCLOSING: line 13: error: this hierarchical expression does not have an 
enclosing hierarchical formula. (code: HIER_NO_ENCLOSING)
    home[address(_) { street("Anywhereville") }]=p
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1 ERROR (BloxCompiler version 66466_f4a494eeb3f9)

B.30.2. HIER_HEAD_NOT_ATOM

This error is reported if you write a formula in the head of a hierarchical that is not an atom.

t() ->.
a(x) ->.
foo(x, y) -> a(x), int[64](y).

t() <- (x < 1, a(_)) { foo(x) }.
*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block HIER_HEAD_NOT_ATOM: line 5: error: in the current implementation of hierarchical 
syntax, the head of a hierarchical formula, must consist entirely of conjunctions of 
atoms. (code: HIER_HEAD_NOT_ATOM)
    t() <- (x < 1, a(_)) { foo(x) }.
    
            ^^^^^

block HIER_HEAD_NOT_ATOM: line 5: error: in the current implementation of hierarchical 
syntax, the head of a hierarchical formula, must consist entirely of conjunctions of 
atoms. (code: HIER_HEAD_NOT_ATOM)
    t() <- (x < 1, a(_)) { foo(x) }.
    
            ^^^^^

2 ERRORS (BloxCompiler version 66466_f4a494eeb3f9)

B.30.3. HIER_NO_CANDIDATES

This error is reported if there are no candidate expressions in the head of a hierarchical.

t() ->.
foo(x) -> int[64](x).

t() <- t() { foo() }.
*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block HIER_NO_CANDIDATES: line 4: error: there are no candidate expressions in the head 
of the enclosing hierarchical formula, but there are fewer user supplied arguments than 
the predicate foo expects. (code: HIER_NO_CANDIDATES)
    t() <- t() { foo() }.
    
                 ^^^^^

1 ERROR (BloxCompiler version 66466_f4a494eeb3f9)

B.30.4. HIER_CANDIDATE_NO_TYPE

This error is reported when no type can be inferred for a candidate expression in the hierarchical head.

    

B.30.5. HIER_ARG_TYPE_NOT_KNOWN

This error is reported when a predicate without a declaration is used in the body of a hierarchical. Because the predicate has no declaration, there is no way to be sure what the correct number and type of arguments should be for this predicate and whether it requires any insertions from the hierarchical head.

    

B.30.6. HIER_UNSUPPORTED_EXPR

Currently, candidate expressions may only be constants or variables. This error is reported when the compiler finds some other sort of expression in the head of a hierarchical.

t() ->.
foo(x) -> int[64](x).

t() <- foo(2 + 3) { t() }.
*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************

B.30.7. HIER_AMBIGUOUS_HEAD_BINDING

This error is reported if there are two or more candidate expressions in the head of a hierarchical that have non-disjoint types. Roughly, disjoint means that the types of the expressions are not subtypes of each other.

a(x) ->.
foo(x, y) -> a(x), a(y).
t() ->.

t() <- foo(x, y) { foo(x, y) }.

*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block HIER_AMBIGUOUS_HEAD_BINDING: line 5: error: the head of this hierarchical formula 
contains two binding with non-disjoint types: y and x.  Therefore, it is not possible to 
unambiguously determine which expression should be used for insertion in the body of the 
hierarchical formula. (code: HIER_AMBIGUOUS_HEAD_BINDING)
    t() <- foo(x, y) { foo(x, y) }.
           ^^^^^^^^^^^^^^^^^^^^^^^

1 ERROR (BloxCompiler version 66466_f4a494eeb3f9)

B.30.8. HIER_AMBIGIOUS_BODY

This error is reported in the event that there are two or more possible positions in which a candidate expression could be inserted into a hierarchical atom.

a(x) ->.
foo(x, y) -> a(x), a(y).
t() ->.

t() <- a(x) { foo(x) }.

*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block HIER_AMBIGIOUS_BODY: line 5: error: there are 2 possible positions where the 
expression x could be inserted into the atom in this hierarchical formula.  Therefore, it 
is not possible to unambiguously determine where to insert the expression. (code: 
HIER_AMBIGUOUS_BODY_ATOM)
    t() <- a(x) { foo(x) }.
                  ^^^^^^

1 ERROR (BloxCompiler version 66466_f4a494eeb3f9)

B.30.9. HIER_EXPR_NOT_ENTITY

This error is reported when the head of a hierarchical expression is not a single atom that is an entity.

person(p) -> .

address(a) -> .
street[a]=s -> address(a), string(s).

home[p]=a -> person(p), address(a).

favorite(p) -> person(p).

foo(a, p) -> address(a), person(p).

person(p) {
  home( foo(_, _) { street("Anywhereville") })
} <- favorite(p).
*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block HIER_EXPR_NOT_ENTITY: line 12: error: the head of this this hierarchical 
expression, foo(_, _), is not an entity. (code: HIER_EXPR_NOT_ENTITY)
    person(p) {
      home( foo(_, _) { street("Anywhereville") })
    } <- favorite(p).
    
    ^

1 ERROR (BloxCompiler version 66466_f4a494eeb3f9)

B.30.10. HIER_ATOM_TOO_MANY_SUPPLIED

This error is reported when the user has supplied more arguments to a hierarchical atom than are needed.

a(x) ->.
b(x) ->.
foo(x, y) -> a(x), b(y).
t() ->.

t() <- (a(_), b(y)) { foo(y) }.

*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block HIER_ATOM_TOO_MANY_SUPPLIED: line 6: error: this hierarchial atom contains 1 user 
supplied key(s), yet only 0 user supplied key(s) are needed in this context (code: 
HIER_ATOM_TOO_MANY_SUPPLIED)
    t() <- (a(_), b(y)) { foo(y) }.
                          ^^^^^^

1 ERROR (BloxCompiler version 66466_f4a494eeb3f9)

B.30.11. HIER_ATOM_TOO_FEW_SUPPLIED

This error is reported when the user has supplied fewer arguments to a hierarchical atom than are needed.

a(x) ->.
b(x) ->.
foo(x, y) -> a(x), b(y).
t() ->.

t() <- a(_) { foo() }.

*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block HIER_ATOM_TOO_FEW_SUPPLIED: line 6: error: this hierarchial atom contains 0 user 
supplied key(s), yet 1 user supplied key(s) are needed in this context (code: 
HIER_ATOM_TOO_FEW_SUPPLIED)
    t() <- a(_) { foo() }.
                  ^^^^^

1 ERROR (BloxCompiler version 66466_f4a494eeb3f9)

B.31. Delta logic

B.31.1. DELTA_UNGUARDED

We have added a compile-time warning for installed delta rules that can considerably harm performance. Generally, it is a bad idea to install a delta rule that does not have any delta or pulse predicates in its body. Such rules run on every transaction, which can be expensive, but even worse: in concurrent systems a write lock is necessary on the predicates in the head of the rule, which, combined with predicate-level locking, can potentially make your system largely non-concurrent.

Two examples:
retailLogic: line 5: warning: installed delta rules are evaluated in
every transaction if they are not guarded by at least one delta or
pulse predicate in the body of the rule. Because this requires a write
lock on the predicate, this can have significant impact on
performance, in particular in concurrent systems. (code:
DELTA_UNGUARDED)

^filterTestPred(sk, w; ) <- Product:Sku(sk), Calendar:Week(w), Calendar:Week:id(w; "week_2").
suite.program: line 23: warning: installed delta rules are evaluated
in every transaction if they are not guarded by at least one delta or
pulse predicate in the body of the rule. Because this requires a write
lock on the predicate, this can have significant impact on
performance, in particular in concurrent systems. (code:
DELTA_UNGUARDED)

+map(x;y) <- dimX:val(x:2), dimY:val(y:2).
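
To avoid the warning, guard the rule with at least one delta or pulse atom so that it only fires in transactions where relevant data changes. A minimal sketch with hypothetical predicates:

a(x) -> .
source[x] = y -> a(x), int[64](y).
cache[x] = y -> a(x), int[64](y).

// Guarded delta rule: fires only in transactions where 'source' changes.
^cache[x] = y <- +source[x] = y.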

B.31.2. NEGATED_UNGUARDED

This error occurs when you have written a delta rule, to be executed at a stage other than initial, that references a negated atom from an earlier stage, and that atom contains existentially quantified variables. For example,

+p(x) <- +r(x), !q@prev(y, x). 

If this delta rule is compiled to run at stage final, this rule would trigger the error. Here, the contents of the predicate 'q' are being queried for the previous stage and 'q' references the existentially quantified variable 'y'.

The reason this error is reported is because the current runtime cannot execute negated formulas with existentially quantified variables directly. Instead, the compiler generates a projection predicate that materializes the negated atom. For example, the compiler would rewrite the above delta rule as

+p(x) <- +r(x), !_projection(x).
+_projection(x) <- q@prev(y, x).

The problem with this rewrite is that, in this specific case, the projection rule generated by the compiler would have produced a DELTA_UNGUARDED warning if the logic had been written this way originally. To avoid having the compiler rewrite a program into one that may perform badly for no apparent reason, we instead report an error.

The error can be resolved by manually providing the projection. If the original example were rewritten as shown above, a DELTA_UNGUARDED warning would be reported, but the logic would compile. In practice, it is better to write the projection so that it includes a reference to a relevant delta or pulse predicate in its body. For example, given our original example, the following logic would probably be a reasonable replacement:

+p(x) <- +r(x), !_projection(x).
+_projection(x) <- +r(x), q@prev(y, x).

Here we have guarded the projection rule with a reference to '+r(x)'.

It is possible that in the future the runtime will be able to execute the original rule directly, eliminating the need for projection predicates and obviating this error and the manual rewrite.

B.32. Aggregations

B.32.1. AGG_PARSE

This error is reported when the compiler cannot parse the aggregate operations.

B.32.2. AGG_INVALID_OP

    

B.32.3. AGG_INVALID_ARGS

    

B.32.4. AGG_DISJ

    

B.32.5. AGG_MULTIPLE_DEFS

    

B.32.6. AGG_TYPE_NOT_EXACT

    

B.32.7. COMPLEX_EXP_IN_AGG_HEAD

    

B.32.8. AGG_KEY_OUTPUT

This error is reported when an output of an aggregation is used as a key in a head atom. We presently cannot evaluate such logic, so we reject this at compile-time.
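
A hypothetical sketch of a rule that exhibits this situation (the predicate names and the use of the agg library are illustrative):

a(x) -> .
v[x] = n -> a(x), int[64](n).
seenTotal(s) -> int[64](s).

// The aggregation output 's' is used as a key argument of the head atom
// 'seenTotal', which cannot be evaluated.
seenTotal(s) <- agg<<s = total(n)>> v[_] = n.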

B.32.9. AGG_VALUE_NON_OUTPUT

This error is reported when the value arguments of a functional predicate in the head of an aggregation are not outputs to the aggregation. We presently cannot evaluate such logic, so we reject this at compile-time.
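
A hypothetical sketch of a rule of this shape (illustrative only):

a(x) -> .
p(x, y) -> a(x), a(y).
pick[x] = y -> a(x), a(y).

// The value argument 'y' of the functional head predicate 'pick' is an
// ordinary body variable, not an output of the aggregation.
pick[x] = y <- agg<<n = count()>> p(x, y).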

B.33. Default values

B.33.1. DEFAULT_DISJOINT

If a predicate has a default value, then the predicate must be functionally determined. However, the functional dependency analysis the compiler currently uses cannot determine whether the rules or disjuncts defining a predicate overlap. Therefore, by default it skips the analysis for predicates defined by multiple rules or disjuncts.

If the compiler encounters a predicate with a default value defined by multiple rules or disjuncts, it will emit this warning to provide an explanation as to why you may be seeing DEFAULT_FUNCTIONAL errors and how you may be able to resolve them.
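
If you know that the rules do not overlap, you can follow the compiler's suggestion and assert this explicitly. A minimal sketch, based on the first example in the next section:

a(x) -> .
p(x) -> a(x).
q(x) -> a(x).

f[x] = y -> a(x), int[32](y).
lang:defaultValue[`f] = 0.

// Assert that the two rules below derive disjoint sets of keys.
lang:disjoint(`f).

f[x] = y <- p(x), y = 2.
f[x] = y <- q(x), y = 5.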

B.33.2. DEFAULT_FUNCTIONAL

If a predicate has a default value, then the predicate must be functionally determined. This is necessary because the implementation of default values currently does not support integrity checking of facts. If a predicate is functionally determined, then we know that integrity violations are impossible, so we can simply omit the check.

a(x) -> .
p(x) -> a(x).
q(x) -> a(x).

f[x] = y -> a(x), int[32](y).
lang:defaultValue[`f] = 0.

f[x] = y <- p(x), y = 2.
f[x] = y <- q(x), y = 5.
*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block DEFAULT_FUNCTIONAL-1: unknown line: error: predicate 'f' has a default value, but 
is defined by multiple rules or disjuncts.  The compiler could not determine whether 
these rules or disjuncts overlap.  Consequently, the compiler did not analyze the logic 
to determine whether 'f' is functionally determined.  If you know that the rules or 
disjuncts for this predicate do not overlap, add 'lang:disjoint(`f)' to your logic. 
(code: DEFAULT_DISJOINT)

1 ERROR (BloxCompiler version 66466_f4a494eeb3f9)

a(x) -> .
p(x, y) -> a(x), a(y).

f[x] = y -> a(x), int[32](y).
lang:defaultValue[`f] = 0.

f[x] = y <- p(x, _), y = 2.
*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block DEFAULT_FUNCTIONAL-2: line 7: warning: predicate 'f' has a default value, but this 
rule does not uniquely determine its value from its keys (code: DEFAULT_UNIQUE)
    f[x] = y <- p(x, _), y = 2.
    
    ^^^^^^^^^^^^^^^^^^^^^^^^^^

block DEFAULT_FUNCTIONAL-2: line 7: warning: predicate 'f' has a default value, but this 
rule does not uniquely determine its value from its keys (code: DEFAULT_UNIQUE)
    f[x] = y <- p(x, _), y = 2.
    
    ^^^^^^^^^^^^^^^^^^^^^^^^^^


B.33.3. DEFAULTVALUE_MULTIHEAD

    

B.33.4. DEFAULT_RECURSION

Predicates that are defined using recursive rules can only have a default value if the recursion is linear (through an ordered entity). You will get this error if the recursion is not linear and recursively computes a predicate that does have a default value.

For general recursion, it is unclear what the meaning of the recursive rules is if they compute a predicate with a default value. This issue is similar to how negation on a recursively computed predicate is disallowed.

a(x) -> .
lang:ordered(`a).

f[x] = y -> a(x), int[64](y).
lang:defaultValue[`f] = 0.
lang:disjoint[`f] = true.

f[x] = 1 <- a:last[] = x.
f[x] = y <- y = f[a:next[x]] + 1.

// TODO causes a false positive in safety analysis
lang:compiler:warning:SPECIFIC_STARRED_EDGE_IN_SAFETY_GRAPH_CYCLE[] = false.
*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block defaultvalue-recursion: line 6: error: predicate 'lang:disjoint' has arity 1, but 
is used here with 2 arguments (code: PREDICATE_WRONG_ARITY)
    lang:disjoint[`f] = true.
    ^^^^^^^^^^^^^^^^^^^^^^^^

block defaultvalue-recursion: line 6: error: predicate 'lang:disjoint' has arity 1, but 
is used here with 2 arguments (code: PREDICATE_WRONG_ARITY)
    lang:disjoint[`f] = true.
    ^^^^^^^^^^^^^^^^^^^^^^^^

2 ERRORS (BloxCompiler version 66466_f4a494eeb3f9)

B.33.5. DEFAULTVALUE_TABLE

Declaring a default value for a functional predicate makes this predicate a total function. Currently, we only support default values on functional predicates with a finite domain. Table predicates (predicates with a primitive type in the key) do not have a finite domain.

p[x] = y -> int[32](x), int[32](y).
lang:defaultValue[`p] = 0.
*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block defaultvalue-table: line 2: error: predicate 'p' is a table predicate (has a 
primitive type in the key) and has a default value, a combination which is not supported. 
(code: DEFAULTVALUE_TABLE)
    lang:defaultValue[`p] = 0.
    ^^^^^^^^^^^^^^^^^^^^^^^^^

1 ERROR (BloxCompiler version 66466_f4a494eeb3f9)
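
In contrast, a functional predicate whose keys are entity-typed (and therefore have a finite domain) does not run into this restriction; compare the DEFAULT_FUNCTIONAL examples above:

a(x) -> .
p[x] = y -> a(x), int[32](y).
lang:defaultValue[`p] = 0.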

B.34. Entities

B.34.1. CAPACITY_ENTITY

Only entities have a capacity property, which determines the number of elements an entity can have. For a normal predicate, there is no restriction on its capacity. Normal predicates can hold all the facts allowed by the cross-product of the types of the arguments of the predicate.

a(x) -> .
p(x,y) -> a(x), a(y).
lang:physical:capacity[`p] = 256.
*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block capacity-entity: line 3: error: predicate 'p' is not an entity and the capacity 
property only applies to entities (code: CAPACITY_ENTITY)
    lang:physical:capacity[`p] = 256.
    
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1 ERROR (BloxCompiler version 66466_f4a494eeb3f9)
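
For example, setting the capacity on the entity itself is the intended use of the property:

a(x) -> .
lang:physical:capacity[`a] = 256.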

B.34.2. PULSE_ENTITY_IN_DB_LIFETIME

A pulse (event) entity exists only for the duration of a transaction, so it cannot be used as the argument type of a predicate with database lifetime.

a(x) -> .
p(x) -> a(x).

lang:pulse(`a).
*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block pulse-entity: line 2: error: event entity 'a' cannot be used as an argument of 
database lifetime predicate 'p' (code: PULSE_ENTITY_IN_DB_LIFETIME)
    p(x) -> a(x).
    ^^^^

block pulse-entity: line 2: warning: A constraint with a pulse or delta on its RHS very 
likely needs a pulse or delta in its LHS (code: PULSE_CONSTRAINT)
    p(x) -> a(x).
    ^^^^^^^^^^^^

1 ERROR,  1 WARNING (BloxCompiler version 66466_f4a494eeb3f9)

B.34.2.1. ENTITY_ONLY_USED_AS

      

B.35. Recursion

B.35.1. CALC_SUB_RECURSION

Calculated subtypes are not supported if they are defined using recursive rules. The exception to this is scalable calculated subtypes, which do support recursion. However, scalable entities are currently only a prototype, so we do not suggest addressing this limitation in this way.

a(x) -> .
b(x) -> a(x).
c(x) -> b(x).

lang:entity(`c).
p(x,y) -> a(x),b(y).

p(x,y) <- a(x),b(y).
b(x) <- p(x,_).
*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block calc-subtype-recursion: line 9: error: calculated subtype 'b' is recursively 
defined. This is not supported for non-scalable entities: 
   ----
   'b@FINAL' depends on (via rule at line 9 of block calc-subtype-recursion: 'b(x)
  <- p(x, _)')
   'p@FINAL' depends on (via rule at line 8 of block calc-subtype-recursion: 'p(x, y)
  <- a(x), b(y)')
   'b@FINAL'
   ----
 (code: CALC_SUB_RECURSION)
    b(x) <- p(x,_).
    
    ^^^^^^^^^^^^^^

1 ERROR (BloxCompiler version 66466_f4a494eeb3f9)

B.35.2. DISJUNCTIVE_LINEAR_RECURSION

The linear recursion optimization applied by our engine has always been applied too eagerly. In particular, recursion was analyzed as linearly recursive even though there was a disjunction involved in the recursion. This results in two mutually recursive rules that do not, in general, compute the correct results.

In future releases we will address this problem, but due to the optimization's widespread use we cannot disable it right now. To make sure the risk involved in this optimization is visible, we give a warning for the problematic disjunction.

block __MeasureFormula_R1_F0: line 1: warning: performing the linear
recursion optimization on this rule resulted in a disjunction of
clauses. In some instances, evaluating multiple clauses in a mutually
linear recursive fashion may result in incorrect results with the
current logic execution engine.  To avoid this you can turn off the
linear recursion optimization by adding
'lang:compiler:optimization["linearRecursion"]=false.' to your logic.
(code: DISJUNCTIVE_LINEAR_RECURSION)

B.36. Incremental Evaluation

B.36.1. INCR_NONDET

Installed, non-delta logic rules are evaluated incrementally by the engine: computed results are stored, and future transactions only recompute changes based on changes to the predicates in the body of the rule. That is, existing results that have not been invalidated are assumed to still be correct. For non-deterministic features, storing those results is actually incorrect, since they can change in every single transaction. Therefore, such non-deterministic logic features can be used only in logic that is fully evaluated, namely any rule in a query, and delta rules in queries and installed blocks. If you do use these features, then you need to know that the application might work with results that are logically incorrect.

p[] := datetime:now[].
*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block nondet: line 1: error: unexpected token ':' (code: UNEXPECTED_TOKEN)
    p[] := datetime:now[].
    
        ^

1 ERROR (BloxCompiler version 66466_f4a494eeb3f9)
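
In fully evaluated logic, the same built-in is acceptable. A minimal sketch using a delta rule, with hypothetical predicate names:

event(x) -> int[64](x).
lang:pulse(`event).

lastRun[] = t -> datetime(t).

// Delta rules are fully evaluated, so the non-deterministic value is
// recomputed in every transaction in which an event arrives.
^lastRun[] = t <- event(_), datetime:now[] = t.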

B.37. File predicates

B.37.1. RAW_FILE_PREDICATE_ARGS

Raw file predicates are used to represent the complete content of a file as a string. They must have exactly one value argument of type string, and no key arguments at all.

create /tmp/db --overwrite

transaction
exec <doc>
   _file[x] =y -> string(x), string(y).

   lang:physical:storageModel[`_file] = "RawFile".
   lang:physical:filePath[`_file] = "file.csv".
</doc>
commit
*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block __block0: line 3: error: raw file predicate '$__block0:_file' must have a value of 
type string and no key arguments (code: RAW_FILE_PREDICATE_ARGS)
       lang:physical:storageModel[`_file] = "RawFile".
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1 ERROR (BloxCompiler version 66466_f4a494eeb3f9)
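
A valid raw file predicate therefore has a single string value argument and no keys, for example:

_file[] = contents -> string(contents).

lang:physical:storageModel[`_file] = "RawFile".
lang:physical:filePath[`_file] = "file.csv".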


B.37.2. FILE_PREDICATE_ARGS

All arguments of a file predicate must have a primitive type, such as a string or an integer.

create /tmp/db --overwrite

transaction
exec <doc>
   e(x) -> .
   _file(x,y) -> string(x),e(y).

   lang:physical:storageModel[`_file] = "DelimitedFile".
   lang:physical:filePath[`_file] = "file.csv".
</doc>
commit
*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block __block0: line 4: error: argument type 'e' for file predicate '$__block0:_file' is 
not a primitive type (code: FILE_PREDICATE_ARGS)
       lang:physical:storageModel[`_file] = "DelimitedFile".
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

block __block0: line 1: warning: every predicate declared in a command or query must be 
local: 'e'.  Support for non-local predicate declarations in command or query will be 
removed in LogicBlox 4.1 (code: NON_LOCAL_PREDICATE_DECLARATION_DEPRECATED)
       e(x) -> .
       ^^^^

1 ERROR,  1 WARNING (BloxCompiler version 66466_f4a494eeb3f9)
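
Replacing the entity-typed argument with a primitive type resolves the error, for example:

_file(x, y) -> string(x), int[64](y).

lang:physical:storageModel[`_file] = "DelimitedFile".
lang:physical:filePath[`_file] = "file.csv".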


B.37.3. FILE_PREDICATE_STAGES

File predicates do not support stage modifiers.

fp-stage2.logic: line 9: error: file predicate '_file' is
used with a stage modifier, but file predicates do not support stages
(code: FILE_PREDICATE_STAGES)

+_fun(x,y) <- _file@prev(x,y).
^^^^^^^^^^^^^^^

B.37.4. FILE_PREDICATE_DELTA

File predicates do not support delta modifiers.

fp.logic: line 9: error: file predicate '_file' cannot have a
delta modifier in the body of a rule (code: FILE_PREDICATE_DELTA)
+a(x,y) <- +_file(x,y).
^^^^^^^^^^^

B.37.5. FILE_PREDICATE_RECURSION

File predicates cannot be used in recursion.

fp-recurse8.lb: line 10: error: file predicate '_file' cannot be used in recursion:
----
'_file' depends implicitly on (via implicit frame rule: '_file@FINAL <- +_file@FINAL')
'+_file' depends on (via rule at line 10 of block fp-recurse8: '+_file(x, y) <- +a(x, y)')
'+a' depends on (via rule at line 8 of block fp-recurse8: '+a(x, y) <- _file(x, y)')
'_file'
----
(code: FILE_PREDICATE_RECURSION)
+_file(x,y) <- +a(x,y).
^^^^^^^^^^^^^^^^^^^^^^

B.37.6. FILE_PREDICATE_INACTIVE

File predicates can only be used in inactive blocks.

B.38. Derived-only predicates

B.38.1. DERONLY_SINGLE

      

B.38.2. DERONLY_NEGATIVE

      

B.38.3. DERONLY_PULSE

Derived-only pulse predicates may only be defined via event rules.

      

B.38.4. DERONLY_NO_RULES

A derived-only predicate needs to be defined using at least one rule. If a derived-only predicate is used, but there are no rules defined for it, then this error is reported.

a(x) -> . 
p(x, y) -> a(x), a(y). 
lang:derivationType[`p] = "Derived". 

q(x, y) <- p(x, y).
*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block DERONLY_NO_RULES: line 5: error: predicate 'p' is a derived-only predicate that 
needs to be unfolded, but no rules are defined for this predicate. (code: 
DERONLY_NO_RULES)
    q(x, y) <- p(x, y).
    ^^^^^^^^^^^^^^^^^^

block DERONLY_NO_RULES: line 5: error: predicate 'p' is a derived-only predicate that 
needs to be unfolded, but no rules are defined for this predicate. (code: 
DERONLY_NO_RULES)
    q(x, y) <- p(x, y).
    ^^^^^^^^^^^^^^^^^^

2 ERRORS (BloxCompiler version 66466_f4a494eeb3f9)

B.38.5. DEFAULTVALUE_DERONLY

Derived-only predicates do not support default values.

deronly-not-supported.logic: line 5: error: predicate 'f' is a
derived-only predicate and has a default value, a combination which is
not supported. (code: DEFAULTVALUE_DERONLY)
lang:defaultValue[`f] = 0.
^^^^^^^^^^^^^^^^^^^^^^^^^

B.38.6. DERONLY_RECURSION

Derived-only predicates cannot be recursively defined.

lang:derivationType[`p] = "Derived".
lang:derivationType[`q] = "Derived".

p(x) <- x = 3.
p(x) <- x = 5, q(x).
q(x) <- p(x).
*******************************************************************
Warning: BloxBatch is deprecated and will not be supported in LogicBlox 4.0.
Please use 'lb' instead of 'bloxbatch'.
*******************************************************************
block deronly-recursion: line 5: error: predicate 'p' is a derived-only predicate 
recursively defined using its own values, through the following rules:
   ----
   'p@FINAL' depends on (via rule at line 5 of block deronly-recursion: 'p(5)
  <- , q(5)')
   'q@FINAL' depends on (via rule at line 6 of block deronly-recursion: 'q(x)
  <- p(x)')
   'p@FINAL'
   ----
 (code: DERONLY_RECURSION)
    p(x) <- x = 5, q(x).
    ^^^^^^^^^^^^^^^^^^^

block deronly-recursion: line 6: error: predicate 'q' is a derived-only predicate 
recursively defined using its own values, through the following rules:
   ----
   'q@FINAL' depends on (via rule at line 6 of block deronly-recursion: 'q(x)
  <- p(x)')
   'p@FINAL' depends on (via rule at line 5 of block deronly-recursion: 'p(5)
  <- , q(5)')
   'q@FINAL'
   ----
 (code: DERONLY_RECURSION)
    q(x) <- p(x).
    
    ^^^^^^^^^^^^

2 ERRORS (BloxCompiler version 66466_f4a494eeb3f9)

Appendix C. Platform Environment Variables

C.1. General Environment Variables

C.1.1. LOGICBLOX_HOME

Directory where LogicBlox is installed (for example, it has a bin subdirectory).

C.1.2. JAVA_HOME

Directory where Java is installed. This is a standard environment variable used by many Java-based applications.

C.1.3. LD_LIBRARY_PATH

C.2. Compiler Environment Variables

C.2.1. LB_BLOXCOMPILER_SERVER_PORT

If set and the compiler is started in server mode, the compiler will attempt to use the value of this environment variable as the port on which it listens for connections, rather than the default (???).

If set and LB_BLOXCOMPILER_SERVER has also been set, the runtime will use the value of this environment variable as the port on which to contact the compiler server, rather than the default.

C.2.2. BG_PRINT_GENERATED

Setting this environment variable (to any value) triggers the BloxCompiler to print all code generated by MoReBlox.

C.2.3. BG_USE_SPARSE_TYPE

C.2.4. BloxGenericsDebug

C.2.5. BloxGenericsStdErr

C.2.6. LB_AGG_NO_PARSE

Directs the compiler to treat all aggregates or P2Ps as if it does not understand them. This will, among other things, prevent the compiler from attempting to parse them for correctness.

C.2.7. LB_BLOXCOMPILER_DEBUG

C.2.8. LB_BLOXCOMPILER_LOG_CONF

C.2.9. LB_BLOXCOMPILER_PEDANTIC

If set to ERROR or error, the compiler will produce additional errors for code style issues. If set to any other value, it will report the same code style issues as warnings.

C.2.10. LB_BLOXCOMPILER_PROFILE

C.2.11. LB_DETERMINISTIC_NAMES

If this environment variable is set, then when the compiler generates fresh names, it will use shorter names and always start at a fixed index rather than a random one. This is sometimes useful for debugging the difference between two runs without having to account for different names being chosen. However, it can sometimes cause failures when compiling MoReBlox programs, as two different running instances of the compiler may end up generating names that conflict.

C.2.12. BLOXCOMPILER_EDIT_DISTANCE

Setting this environment variable to a natural number informs the compiler how close a match is required when reporting errors like BLOCK_UNKNOWN_PREDICATE_TYPO that inform you that you may have misspelled something. Setting the value to 0 will turn off these error messages, as it means an exact match is required. A value of 1 allows matches with a Levenshtein edit distance of 1, which means that one insertion, deletion, or substitution is allowed. A value of 2 allows matches with a Levenshtein edit distance of 2, and so on.

C.2.13. LB_GEN_MOREBLOX_SCRIPT

C.2.14. LB_NEGATED_EXISTENTIALS

C.2.15. LB_PERFORM_DNF

Setting this environment variable to any value other than "true" will inform the compiler not to perform the DNF transformation. At this time, turning off the DNF transformation is only useful for experimental purposes, as it is not yet possible to execute the code the compiler produces if disjunctions are not removed.

C.2.16. LB_PROVENANCE_DEBUG

C.2.17. LB_SAFETY_DEBUG

C.2.18. LB_MOREBLOX_DEBUG

C.2.19. LB_MOREBLOX_STDERR

C.2.20. LB_MOREBLOX_PROFILE

C.2.21. USE_SCALABLE_TYPE

C.3. Runtime Environment Variables

C.3.1. LB_MEM

Amount of memory to be used by the BloxPagerDaemon (e.g. LB_MEM="6G").

C.3.2. LB_LOCK_TIMEOUT

Configure the time (in milliseconds) the engine will wait for a lock before aborting. LB_LOCK_TIMEOUT=0 will do no-wait locking, i.e. fail immediately.

This environment variable should not be used, except for performance experiments.

C.3.3. LB_MONITOR_LOCK_WAIT_TIME

If this variable is set (time in seconds), a log message will be generated at the warning log level if the time it takes to acquire a predicate lock exceeds this value. The warning message contains details about the lock and the wait time.

C.3.4. BLOX_PM_PROCESSING_TIMEOUT

Maximum number of milliseconds a transaction is allowed to run. The value 0 means there is no limit, i.e. transactions may run indefinitely.

C.3.5. LB_SYNC_PAGES

If set (to any value), synchronize database files to disk after every commit. Currently, the LogicBlox runtime does not synchronize to disk by default. We recommend that this environment variable be set to avoid data corruption.

C.3.6. LB_SYNC_DELTA_PAGES

If set (to any value), synchronize database transaction files to disk after a commit. For delta files, this is less useful, and we currently do not suggest using this setting.

C.3.7. LB_USE_INDEXES

If set to true, enable the experimental alternative index feature.

C.3.8. LB_EXCEPTION_STACK_TRACE

For every exception thrown internally in the runtime, print the stack trace when it is caught. This includes exceptions that are not necessarily real errors, since exceptions are also used for transaction aborts and constraint violations.

This environment variable should not be used, except for debugging purposes.

C.3.9. BLOX_LOG_PREFIX

The prefix used for all messages printed to the logger. See the documentation on log levels.

C.3.10. LB_BLOXPAGER_LOGFILE

Logfile used by the BloxPagerDaemon.

C.3.11. LB_PAGER_FAILSAFE_DIR

Directory to use when page tables no longer fit in the available shared memory. This should never happen if a machine is configured correctly. If it does happen, then the shared memory needs to be increased, or LB_MEM needs to be set to restrict the amount of memory used. If the failsafe directory is used, performance will be reduced and failures may occur if the filesystem becomes full.

LB_PAGER_FAILSAFE_DIR should not be on a volume that has a high risk of filling up: filling up the root volume could cause a system crash, and workspace corruption can occur when bloxpager is not able to write a page because the disk is full.

You can check the /tmp directory for very large files whose names start with LB_. The presence of these files is a sign that the overflow location has been used. Since the overflow files are NOT automatically cleaned by the pager, you should clean them up manually by shutting down all lb services and then removing them like any other file.

Default: /tmp

C.3.12. LB_LIBJVM

Full path to the library libjvm.so, libjvm.dylib, or jvm.dll. This environment variable can be used if there are persistent problems with the runtime not using the right JVM.

C.3.13. LB_DEFAULT_ISOLATION_LEVEL

Acceptable values are SINGLE_USER, READ_UNCOMMITTED, READ_COMMITTED, LESSER_REPEATABLE_READ, SNAPSHOT_ISOLATION, REPEATABLE_READ.

You should probably stick to SINGLE_USER and LESSER_REPEATABLE_READ (the latter is the default).

C.3.14. LB_BLOXCOMPILER_SERVER

Setting this environment variable informs the runtime to connect to a compilation server for compilation requests rather than using an in-process JVM instance and JNI calls.

C.3.15. LB_BLOXCOMPILER_SERVER_HOSTNAME

If set and if LB_BLOXCOMPILER_SERVER has been set, the runtime will use the value of this environment variable as the hostname of the computer to connect to for compilation requests.

C.3.16. LB_JVM_ARGS

If the runtime is not using the compiler in server mode, it will pass the contents of this environment variable as arguments to the in-process JVM it starts.

Note: if the LB_BLOXCOMPILER_SERVER variable is set, then this environment variable will be ignored. If you are using a compilation server, you will need to specify the desired arguments when invoking java to start the server.

C.4. Deployment Environment Variables

C.4.1. LB_DEPLOYMENT_HOME (introduced in 3.9)

Location for log files, configuration files, and temporary files to keep track of processes.

C.4.2. LB_LOGDIR

Default location (directory) of log files used by LogicBlox services. The default value is $LB_DEPLOYMENT_HOME/logs

C.4.3. LB_APPLET_CONFIG (introduced in 3.9)

Full path to the configuration file that customizes the configuration of the blox-applet-server. This variable can also be set to the empty string, which indicates to the lb-services script that blox-applet-server should be started, but no custom configuration is used.

C.4.4. LB_WORKSPACES (introduced in 3.9)

Space-separated list of full paths to workspaces deployed on a server. This is used for capturing the server state.

Note: we plan to soon only require this configuration for non-ConnectBlox workspaces.

Appendix D. blox:compiler API specification

The blox:compiler API is the part of the compiler metamodel that is intended to be used in users' analysis queries. Its "top-level" module, blox:compiler, contains declarations of predicates corresponding to various DatalogLB constructs (e.g., "predicate", "rule", etc.). Moreover, for most of these constructs there are individual modules containing predicates representing properties of the corresponding construct (e.g., the name of a predicate is declared as blox:compiler:predicate:qualifiedName, the head of a rule is blox:compiler:rule:head, etc.).
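
For example, an analysis query might combine these predicates to list the predicates referenced in rule bodies. A minimal sketch (the local predicate name is hypothetical, and it assumes an analysis workspace as described in the Program Analysis chapter):

_predUsedInRuleBody(Name) <-
   blox:compiler:rule:body[_] = Body,
   blox:compiler:formula:containsPred(Body, Pred),
   blox:compiler:predicate:qualifiedName[Pred] = Name.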

D.1. blox:compiler

block(`compiler) {
  export(`{
    code(P) -> .
    project(P) -> code(P).
    block(B) -> code(B).
    clause(C) -> code(C).
    constraint(C) -> clause(C).

    rule(R) -> clause(R).
    externalAgg(Agg) -> rule(Agg).
    knownAgg(Agg) -> externalAgg(Agg).

    /*********** formula structure **********************/
    formula(F) -> code(F).
    quantifiedFormula(F) -> formula(F).
    exists(F) -> quantifiedFormula(F).

    /* A composite formula, i.e., a conjunction or disjunction of 
       smaller formulas. */
    compositeFormula(F) -> formula(F).
    comparison(F) -> formula(F).
    negation(F) -> formula(F).
    atom(F) -> formula(F).
    deltaAtom(F) -> atom(F).

    /*********** expression structure **********************/
    expr(E) -> code(E).

    /*********** Constants *********************************/
    constant(C) -> expr(C).
    stringConst(C) -> constant(C).
    intConst(C) -> constant(C).
    uintConst(C) -> intConst(C).
    realConst(C) -> constant(C).
    boolConst(C) -> constant(C).
    datetimeConst(C) -> constant(C).

    /********* Region *********************/
    region(X) -> .
   
    /********* Predicates, types etc **********/
    predicate(P) -> .

    type(P) -> predicate(P).
    entity(P) -> type(P).
    primitiveType(P) -> type(P).
    numericType(P) -> primitiveType(P).

    /*********** Variable Declaration ***********/
    vardecl(V) -> .


    /*********** Symbols ***********/
    symbol(E) -> expr(E).

    predicatename(N) -> symbol(N).
    blockname(N) -> symbol(N).

    /*********** Variable Identifier ***********/
    varname(V) -> symbol(V).

    aggOperation(O) -> code(O).

    /********** Application *********************************/    
    application(A) -> expr(A).
    deltaApplication(A) -> application(A).

    /********** Binary Expressions *************************/
    binaryExpr(B) -> expr(B).

    /********** Other Expressions *************************/
    notExpr(E) -> expr(E).
    predicateExpr(E) -> expr(E).

    /********* Source code position representation *********************/
    range(R) -> .

    /* Clause sets, needed for MoReBlox compilation */
    clauseSet(CSet), clauseSet_ref(CSet:i) -> uint[32](i).
    anonBlock(CL) -> clauseSet(CL).

  }),
  clauses(`{

    lang:entity(`code).
    lang:entity(`project).
    lang:entity(`region).
    lang:entity(`block).
    lang:entity(`type).
    lang:entity(`entity).
    lang:entity(`primitiveType).
    lang:entity(`numericType).
    lang:entity(`clause).
    lang:entity(`constraint).
    lang:entity(`rule).
    lang:entity(`externalAgg).
    lang:entity(`knownAgg).
    lang:entity(`predicate).
    lang:entity(`constant).
    lang:entity(`stringConst).
    lang:entity(`intConst).
    lang:entity(`boolConst).
    lang:entity(`realConst).
    lang:entity(`datetimeConst).
    lang:entity(`uintConst).
    lang:entity(`varname).
    lang:entity(`comparison).
    lang:entity(`expr).
    lang:entity(`formula).
    lang:entity(`compositeFormula).
    lang:entity(`quantifiedFormula).
    lang:entity(`negation).
    lang:entity(`atom).
    lang:entity(`deltaAtom).
    lang:entity(`vardecl).

    lang:physical:storageModel[`code] = "ScalableSparse".
    lang:physical:storageModel[`region] = "ScalableSparse".
    lang:physical:storageModel[`predicate] = "ScalableSparse".
    lang:physical:storageModel[`vardecl] = "ScalableSparse".

    lang:physical:capacity[`code] = 20000000.

    !(!primitiveType(_)).
    !(!numericType(_)).

    lang:entity(`aggOperation).
    lang:entity(`predicatename).
    lang:entity(`blockname).
    lang:entity(`deltaApplication).
    lang:entity(`binaryExpr).

  }) 
} <-- .

D.2. blox:compiler:project

block(`project) {
  export(`{
    /* The name of this project. */
    name[Prog] = Name -> project(Prog), string(Name).

    /* The libraries on which this project depends. */
    dependsOn(P,L) -> project(P), project(L).
  }),
  clauses(`{
    /* Each Project has a name */
    project(P) -> name[P] = _.
  })
} <-- . 

D.3. blox:compiler:block

block(`block) {
  export(`{
    /* The name of a block. */
    name[Block]=Name -> block(Block), string(Name).

    /* The display name of a block. */
    displayName[Block]=DName -> block(Block), string(DName).

    /* The level of a block. */
    level[Block]=Level -> block(Block), uint[32](Level).

    /* The file name of a block (i.e., the name of the file in which it is defined). */
    fileName[Block]=FileName -> block(Block), string(FileName).

    /* Whether this is a legacy block, i.e., it is not part of any module. */
    isLegacy(Block) -> block(Block).

    /* The "default" lifetime of predicates declared in this block (can be overridden 
       by lifetime declarations for specific predicates). */
    predLifetime[Block]=Lifetime -> block(Block), LIFETIME:TYPE(Lifetime).

    /* Whether this block is inactive. */
    isInactive(Block) -> block(Block).

    /* Whether this block is active. */
    isActive(Block) -> block(Block).

    /* Whether this block is an "execute" block. */
    isExecute(Block) -> block(Block).

    /* Whether this block is inactive after fixpoint. Inactive blocks are usually 
       executed at stage initial. But an inactive block after fixpoint can be 
       executed at stage final. */
    isInactiveAfterFixpoint(Block) -> block(Block).

    /* The stage of this block (PREVIOUS, INITIAL, OVERRIDE or FINAL). */
    stage[Block]=S -> block(Block), STAGE:MODIFIER(S).

    /* The lifetime of this block (DatabaseLifetime, SessionLifetime or TransactionLifetime). */
    lifetime[Block]=L -> block(Block), LIFETIME:TYPE(L).

    /* The project (entity) in which this block belongs. */
    inProject[Block]=Proj -> block(Block), project(Proj).    

    /* The name of the project in which this block belongs. */
    inProjectName[Block]=ProjName -> block(Block), string(ProjName).    

    /* The blocks that depend on this one (if any). */
    hasDependent(Block,DependentBlock) -> block(Block), block(DependentBlock).

    /** convenience predicates **/
    /* Get the id/internal representation for a block using its name and level. */
    byNameLevel[Name,Level]=Block -> string(Name), uint[32](Level), block(Block).

    /* Whether this block corresponds to some logic in a source file. */
    fromSource(Block) -> block(Block).

  }),
  clauses(`{

    block(Block) -> name[Block] = _, level[Block] = _.

    fromSource(Block) -> fileName[Block] = _.

    byNameLevel[Name,level] = Block
    <- 
        name[Block] = Name, 
        level[Block] = level.

    lang:derivationType[`isInactive]="DerivedAndStored".
    lang:derivationType[`isActive]="DerivedAndStored".
    lang:derivationType[`isExecute]="DerivedAndStored".
    lang:derivationType[`isInactiveAfterFixpoint]="DerivedAndStored".

  })
} <-- . 

D.4. blox:compiler:predicate

block (`predicate) {
  export(`{
    /* The name of this predicate without a namespace prefix. */
    localName[P]=N -> predicate(P), string(N).

    /* The fully qualified name of this predicate, i.e., with a namespace prefix. */
    qualifiedName[P]=N -> predicate(P), string(N).

    /* The display name of this predicate as a string. Used to provide a more 
       user-friendly name for predicates that are generated from MoReBlox programs. */
    displayName[P]=Name -> predicate(P), string(Name).

    /* The arity of this predicate. */
    arity[P]=A -> predicate(P), uint[8](A).

    /* The number of key arguments of this predicate. */
    keyArity[P]=K -> predicate(P), uint[8](K).

    /* The level of this predicate (i.e., the level of the program in which it is declared). */
    level[P]=Level -> predicate(P), uint[32](Level).

    /* The type of the ith argument of this predicate. */
    argType[P,i]=ArgType -> predicate(P), uint[8](i), type(ArgType).


    /* The block in which this predicate was declared. */
    //declarationBlock[P]=B -> predicate(P), block(B).
    declarationBlock(P,B) -> predicate(P), block(B).

    /* Whether this predicate is local. */
    isLocal(P) -> predicate(P).

    /* Whether this predicate is a refmode for some entity. */
    isRefMode(P) -> predicate(P).

    /* Whether this predicate is polymorphic. */
    isPolymorphic(P) -> predicate(P).

    /* Whether this predicate is a functional mapping between two ordered entities. */
    isOrderedFunctionalMapping(P) -> predicate(P).

    /* Whether this is a file predicate. */
    isFilePredicate(P) -> predicate(P).

    /* Whether this is a pulse predicate. */
    isPulsePredicate(P) -> predicate(P).

    /* Whether this predicate is sealed. */
    isSealed(P) -> predicate(P).

    /* Whether this predicate is one-to-one. */
    isOneToOne(P) -> predicate(P).

    /* Whether this is a skolem predicate. */
    isSkolem(P) -> predicate(P).

    /* Whether this predicate is an auto-numbered refmode. */
    isAutoNumbered(P) -> predicate(P).

    /* Whether this predicate is synthetic. */
    isSynthetic(P) -> predicate(P).

    /* Whether this is a calculated predicate. */
    isCalculated(P) -> predicate(P).

    /* Whether this is a built-in predicate. */
    isBuiltIn(P) -> predicate(P).

    /* The default value of a predicate (if it has one). */
    defaultValue[P]=D -> predicate(P), constant(D).

    /* The derivation type of a predicate (i.e., one of NOT_DERIVED, 
       EXTENSIONAL, DERIVED, DERIVED_AND_STORED). */
    derivationType[P]=D -> predicate(P), DERIVATION:TYPE(D).

    /* The locking policy for this predicate (i.e., one of BY_PREDICATE, BY_ELEMENT, UNLOCKED). */
    lockingPolicy[P]=S -> predicate(P), LOCKING:POLICY(S).

    /* The storage model for this predicate (i.e., one of SPARSE, DENSE, DELTA_SPARSE, CHUNKED, 
       DELIMITED_FILE, BINARY_FILE, RAW_FILE, SCALABLE_SPARSE, TCP_STREAM). */
    storageModel[P]=S -> predicate(P), STORAGE:MODEL(S).

    /* The partitioning for this predicate (i.e., one of PARTITIONED, REPLICATED, MASTER_ONLY, FRAGMENT). */
    partitioning[P]=S -> predicate(P), PARTITIONING:SCHEME(S).


    /** file predicate properties **/
    /* The file path for a file predicate. */
    filePath[P]=S -> predicate(P), string(S).

    /* The delimiter for a file predicate. */
    delimiter[P]=D -> predicate(P), string(D).

    /* Whether this file predicate has line numbers. */
    hasLineNumbers(P) -> predicate(P).

    /* Whether this file predicate has column names. */
    hasColumnNames(P) -> predicate(P).

    /* The delimiter for the column names for this file predicate. */
    columnNameDelimiter[P]=S -> predicate(P), string(S).


    /** Skolem (constructor) for importing or creating new predicates, based on their name and level **/
    /* Get the id/internal representation for a predicate using its name and level. */
    byNameLevel[Name, Level] = Pred -> string(Name), uint[32](Level), predicate(Pred).    

    /**** Methods related to protobuf import ****/
    /* Whether this predicate was actually imported from a predicate 
       declaration in a source file, or is a built-in or some other 
       predicate that was necessary for the import to work. */
    hasDeclaration(P) -> predicate(P).

    /* To be deprecated, but for now is needed for protobuf export/compare. */
    isOwned(P) -> predicate(P).    

    /* Whether this is a scalar predicate. */
    isScalar(P) -> predicate(P).

    /* Whether this is a functional predicate. */
    isFunctional(P) -> predicate(P).

    /* Whether this is not a functional predicate. */
    isNotFunctional(P) -> predicate(P).

    /* Whether this is a comparison predicate. */
    isComparisonPredicate(P) -> predicate(P).

    hasTransactionLifetime(P) -> predicate(P).

  }),

  clauses(`{
    lang:constructor(`byNameLevel).

    /* Unary (non-entity) Predicates */
  
    lang:isEntity[`isLocal]=false.
    lang:isEntity[`isRefMode] = false.
    lang:isEntity[`isFilePredicate] = false.
    lang:isEntity[`hasDeclaration] = false.

    /* Mandatory Role Constraints */

    predicate(P) -> localName[P] = _, qualifiedName[P] = _.
    predicate(P) -> arity[P] = _, keyArity[P] = _.

    /* Arity-related Constraints */

    hasTransactionLifetime(P) 
    <- 
       isPulsePredicate(P) ;
       (declarationBlock(P,B), 
        ((isLocal(P), block:stage[B]=STAGE:INITIAL[]) ;
         (block:isInactive(B))
        )
       ).

    hasArgsUpToIndex(P, i)
    <-
        hasArgsUpToIndex(P, i - 1),
        argType[P, i] = _.

    hasArgsUpToIndex(P, 0)
    <-
        argType[P, 0] = _.

    arity[P] = m -> m = 0 ; hasArgsUpToIndex(P, m - 1) ; isPolymorphic(P).

    // arity[P] = m, m  > 0 -> argType[P, m - 1] = _.
    // argType[P, i] = _ -> i = 0 ; argType[P, i - 1] = _.

    argType[P, i] = _ -> i >= 0, i < arity[P].

    /* Various Constraints */

    declarationBlock(P,_) -> hasDeclaration(P).
    hasDeclaration(P) -> declarationBlock(P,_).
    isAutoNumbered(P) -> isRefMode(P).

    /* Type Constraints */

    // By convention we set these as below - in protobuf message entities don't actually 
    // have an arity field
    type(P) -> arity[P] = 1, keyArity[P] = 1.

    /* File-predicates */

    filePath[P] = _   ; 
    delimiter[P] = _  ;
    hasLineNumbers(P) ; 
    hasColumnNames(P) ;
    columnNameDelimiter[P] = _ 
        -> isFilePredicate(P).

    /* Utility predicates */

    isFunctional(Predicate)
    <-
        arity[Predicate] > keyArity[Predicate].

    isNotFunctional(Predicate)
    <-
        arity[Predicate] <= keyArity[Predicate].

    isScalar(Predicate)
    <-
       arity[Predicate] = 1,
       keyArity[Predicate] = 0.

    /* Produce Type self-referencing Argument Type Info */
    argType[P,0] = P <- type(P).

  })
} <-- .

D.5. blox:compiler:entity

block (`entity) {
  alias_all(`predicate),
  export(`{
    /** entity properties **/
    /* The direct supertypes of this entity type. */
    hasSuperType(Type,Super) -> type(Type), type(Super).

    /* Whether this is a top type, i.e., does not have any supertype. */
    isTopType(P) -> type(P).

    /* The refmode for this entity (if it has one). */
    hasRefMode[Ent]=Ref -> entity(Ent), predicate(Ref).

    /* Whether this entity is ordered. */
    isOrdered(P) -> entity(P).

    /* The capacity of this entity (as specified in the source file through lang:physical:capacity). */
    capacity[P]=C -> entity(P), uint[64](C).

    /* The top type that is a (direct or transitive) supertype of this entity type. */
    hasBaseType[Type]=BaseType -> type(Type), type(BaseType).

  }),

  clauses(`{
    lang:isEntity[`isOrdered] = false.
    lang:isEntity[`isTopType]=false.

    /* Type Constraints */

    isTopType(P) -> !hasSuperType(P, _).
    type(P), !hasSuperType(P, _) -> isTopType(P).

    hasBaseType[_] = BaseType -> isTopType(BaseType).    

    hasBaseType[Pred]=Base
      <-  hasSuperType(Pred,Base), isTopType(Base)
        ; hasSuperType(Pred,Super), hasBaseType[Super]=Base.

    hasRefMode[Entity]=RefMode
      <-   isRefMode(RefMode), argType[RefMode,0]=Entity, entity(Entity)
         ; hasBaseType[Entity]=Base, hasRefMode[Base]=RefMode, entity(Entity).

  }) 
} <-- .

D.6. blox:compiler:clause

block(`clause) {
  export(`{
    /* The block in which this clause belongs. */
    inBlock[Clause] = Block -> clause(Clause), block(Block).

    /* The region that this clause occupies. */
    hasRegion[Clause] = Reg -> clause(Clause), region(Reg).

    /* Print position information for this clause as a string. */
    regionAsString[Clause] = PosString -> clause(Clause), string(PosString).

    /* Print this clause as a string. */
    toString[Clause] = String -> clause(Clause), string(String).

    /* The quantified variables in this clause. Should not 
       include variables that are quantified in the head and
       body formulas, i.e., these should only be the common
       variables between the head and body of a rule. */
    quantifiesVar(Clause,Var) -> clause(Clause), vardecl(Var).
  }),
  clauses(`{

    clause(C) -> inBlock[C] = _ ; rule:viewRuleInBlock[C] = _.
    clause(C) -> toString[C] = _.
    clause(C) -> hasRegion[C] = _, regionAsString[C] = _.
    clause(C) -> rule(C) ; constraint(C).

    regionAsString[c] = s
    <-
       hasRegion[c] = Region,
       region:startLine[Region] = Line,
       region:startColumn[Region] = Col,
       region:inBlock[Region] = BlockName,
       s = "at block " + BlockName + 
            ", line:col " + 
            uint32:string:convert[Line] + ":" + uint32:string:convert[Col] + "\n".
  })

} <-- .

D.7. blox:compiler:rule

block(`rule) {
  export(`{
    /* Print this rule as a string. */
    toString[Rule] = repr -> rule(Rule), string(repr).

    viewRuleInBlock[Rule] = Block -> rule(Rule), block(Block).

    /* The formula comprising the head of a rule */
    head[Rule]=Head -> rule(Rule), formula(Head).
    /* The formula comprising the body of a rule */
    body[Rule]=Body -> rule(Rule), formula(Body).

    /* Linearly recursive rules */
    linearlyRecursive(r) -> rule(r).
    /* Uniquely derived rules */
    uniquelyDerived(r) -> rule(r).

    /* Alternate Indexes */
    head_alt_index(Rule,Head) -> rule(Rule), formula(Head).
    body_alt_index(Rule,Body) -> rule(Rule), formula(Body).
  }),
  clauses(`{
    lang:isEntity[`linearlyRecursive] = false.
    lang:isEntity[`uniquelyDerived] = false.

    rule(r) -> head[r] = _, body[r] = _.

    formula:inClause[f] = r <- !formula:isEmpty(f), (head[r] = f ; body[r] = f).

    head_alt_index(Rule,Head) <- head[Rule] = Head.
    body_alt_index(Rule,Body) <- body[Rule] = Body.

    toString[Rule] = s
    <-
       head[Rule] = Head,
       body[Rule] = Body,
       formula:toString[Head] = hs,
       formula:toString[Body] = bs,
       s = hs + " <- " + bs + ".\n".

  })
} <-- .

D.8. blox:compiler:constraint

block(`constraint) {

  export(`{
    /* The left-hand side of a constraint. */
    lhs[Cons]=LHS -> constraint(Cons), formula(LHS).
    /* The right-hand side of a constraint. */
    rhs[Cons]=RHS -> constraint(Cons), formula(RHS).

    /* Print this constraint as a string. */
    toString[Constraint]=String -> constraint(Constraint), string(String).
  }),
  clauses(`{

    constraint(c) -> lhs[c] = _, rhs[c] = _.

    formula:inClause[f] = c <- !formula:isEmpty(f), (lhs[c] = f ; rhs[c] = f).

    toString[Constraint]=s
    <-
       lhs[Constraint]=Lhs,
       rhs[Constraint]=Rhs,
       formula:toString[Lhs]=ls,
       formula:toString[Rhs]=rs,
       s = ls + " -> " + rs + ".\n".

  })
} <--.

D.9. blox:compiler:externalAgg

block(`externalAgg) {

  export(`{
    /* The library of this aggregate (i.e., "agg" or "choice"). */
    lib[Agg] = S -> externalAgg(Agg), string(S).

    /* The name of the aggregate operation for this aggregate clause as a string, e.g., min, max, count etc */
    opString[Agg] = S -> externalAgg(Agg), string(S).
  }),
  clauses(`{

    externalAgg(A) -> lib[A] = _, opString[A] = _.
  })
} <--.

D.10. blox:compiler:formula

block(`formula) {
  export(`{

    /* The clause in which this formula appears */
    inClause[F] = C -> formula(F), clause(C).

    /* Textual Representation */
    toString[F] = repr -> formula(F), string(repr).

    /* The type of a formula as a string, i.e., conjunction, disjunction or negation */
    hasType[N] = S -> formula(N), type:type(S).

    /* The region occupied by a formula. */
    regionOf[Formula] = Reg -> formula(Formula), region(Reg).

    /* Outer is a negated formula, Inner is the formula inside the negation */
    negatesInnerFormula[Outer] = Inner -> negation(Outer), formula(Inner).
    negationOf[Inner] = Outer -> negation(Outer), formula(Inner).

    /* Outer is a quantified formula, Inner is the formula inside the quantifiers of F */
    quantifiesInnerFormula[Outer] = Inner -> quantifiedFormula(Outer), formula(Inner).
    quantifiedBy[Inner] = Outer -> quantifiedFormula(Outer), formula(Inner).

    /* Outer is a composite formula, Inner is one of the simpler formulas that comprise F */
    subFormula(Outer,Inner) -> compositeFormula(Outer), formula(Inner).
    superFormula(Inner,Outer) -> compositeFormula(Outer), formula(Inner).

    /* Ordered subformulas */
    firstSubFormula[Outer] = Inner -> compositeFormula(Outer), formula(Inner).
    nextSubFormula[Prev,Outer] = Inner -> compositeFormula(Outer), formula(Inner), formula(Prev).
    existsSubFormulaParentOfLeftRightPair[Prev,Outer] = Inner -> compositeFormula(Outer), formula(Inner), formula(Prev).

    EMPTY_CONJUNCTION[] = F -> compositeFormula(F).
    EMPTY_DISJUNCTION[] = F -> compositeFormula(F).
    
    //////////////// Useful Predicates for analysis queries ////////////////

    /* Atoms appearing in a formula */
    hasAtom[Formula,Atom] = negated -> formula(Formula), atom(Atom), boolean(negated). 
    hasAtom_alt[Atom,Formula] = negated -> formula(Formula), atom(Atom), boolean(negated). 

    /* DELTA_INSERT or DELTA_UPSERT atoms appearing in a formula */
    hasAssertAtom(Formula,Atom) -> formula(Formula), deltaAtom(Atom).

    /* Atoms of a particular delta type appearing in a formula */
    hasDeltaAtom[Formula,Atom] = DeltaType -> formula(Formula), deltaAtom(Atom), DELTA:MODIFIER(DeltaType).

    /* Predicates whose atoms appear in a formula */
    containsPred(Formula,Pred) -> formula(Formula), predicate(Pred).

    /* Predicates whose non-delta atoms appear in a formula */
    containsPredNoDeltas(Formula,Pred) -> formula(Formula), predicate(Pred).

    /* The quantified variables for this quantified formula */
    quantifiesVar(Formula,Var) -> quantifiedFormula(Formula), vardecl(Var).

    /* The formula is empty */
    isEmpty(Formula) -> formula(Formula).
  }),
  clauses(`{

//    lang:isEntity[`isEmpty] = false.

    /* The empty formulas are shared, thus may have multiple regions/clauses */

    EMPTY_CONJUNCTION[] = F -> compositeFormula(F), !hasAtom[F, _] = _, hasType[F] = type:CONJUNCTION[].
    EMPTY_DISJUNCTION[] = F -> compositeFormula(F), !hasAtom[F, _] = _, hasType[F] = type:DISJUNCTION[].

    isEmpty(F) <- formula(F), !hasAtom[F, _] = _.

    !(!EMPTY_CONJUNCTION[] = _).
    !(!EMPTY_DISJUNCTION[] = _).
  
    /* Constraints */

    negatesInnerFormula[_] = f -> atom(f) ; atom(quantifiesInnerFormula[f]).

    negation(F) -> hasType[F] = type:NEGATION[].
    hasType[F] = type:NEGATION[] -> negation(F).

    hasDeltaAtom[Formula,Atom] = _ -> hasAtom[Formula,Atom] = _.

    formula(F) -> toString[F] = _.
    formula(F), !(inClause[F] = _) -> isEmpty(F).
    compositeFormula(F) -> hasType[F] = _.

    /* Transitively computing the clauses */

    inClause[F] = clause
    <-
        inClause[CF] = clause, (subFormula(CF, F) ; negatesInnerFormula[CF] = F ; quantifiesInnerFormula[CF] = F).

    /* Predicate hasAtom and variants */

    hasAtom_alt[F, F] = false
    <-
        atom(F).

    hasAtom_alt[A, F] = negated
    <-
        subFormula(F, SF),
        hasAtom_alt[A, SF] = negated.

    hasAtom_alt[A, Outer] = flip
    <-
        negationOf[Inner] = Outer,
        hasAtom_alt[A, Inner] = negated,
        ((negated = false, flip = true); (negated = true, flip = false)).

    hasAtom_alt[A, Outer] = negated
    <-
        quantifiedBy[Inner] = Outer,
        hasAtom_alt[A, Inner] =  negated.

    hasAtom[Formula, Atom] = neg <- hasAtom_alt[Atom, Formula] = neg.

    hasDeltaAtom[Formula, Atom] = DeltaType
    <-
       hasAtom[Formula,Atom] = _, atom:deltaType[Atom] = DeltaType.

    hasAssertAtom(Head,Atom)
    <- 
       hasDeltaAtom[Head,Atom] = DELTA:INSERT[]
     ; hasDeltaAtom[Head,Atom] = DELTA:UPSERT[].

    /* Alternate Indices */

    superFormula(Inner,Outer) <- subFormula(Outer,Inner).
    quantifiedBy[Inner] = Outer <- quantifiesInnerFormula[Outer] = Inner.
    negationOf[Inner] = Outer <- negatesInnerFormula[Outer] = Inner.

    /* Predicate Related */

    containsPred(Formula,Pred)
    <-
       hasAtom[Formula,Atom] = _, atom:pred[Atom] = Pred.

    containsPredNoDeltas(Formula,Pred)
    <- 
       hasAtom[Formula,Atom] = _, atom:pred[Atom] = Pred, !atom:deltaType(Atom,_).

    /* Textual Representation */

    _builderOf[F] = B -> string:builder(B), compositeFormula(F).
    _nodeOf[SF, F] = N -> string:builder:node(N), compositeFormula(F), formula(SF).

    lang:constructor(`_builderOf).
    lang:constructor(`_nodeOf).

    string:builder(B), _builderOf[F] = B
    <-
        compositeFormula(F), !isEmpty(F).

    string:builder:delim[B] = ","
    <-
        _builderOf[F] = B,
        hasType[F] = type:CONJUNCTION[].

    string:builder:delim[B] = ";"
    <-
        _builderOf[F] = B,
        hasType[F] = type:DISJUNCTION[].

    string:builder:node(N), 
    _nodeOf[Inner, Outer] = N
    <-
        subFormula(Outer,Inner).

    _nodeOf_alt[Outer, Inner] = N <- _nodeOf[Inner, Outer] = N.

    string:builder:node:valueOf[N] = str
    <- 
       _nodeOf_alt[_, Inner] = N, toString[Inner] = str.

    toString[F] = str
    <-
       hasType[F] = type:CONJUNCTION[],
       _builderOf[F] = B,
       string:builder:toString[B] = str.

    toString[F] = "(" + str + ")"
    <-
       hasType[F] = type:DISJUNCTION[],
       _builderOf[F] = B,
       string:builder:toString[B] = str.

    string:builder:firstNode(B, N)
    <-
       firstSubFormula[Outer] = Inner, 
       _builderOf[Outer] = B,
       _nodeOf[Inner, Outer] = N.

    string:builder:nextNode(P, B, N)
    <-
       nextSubFormula[Prev, Outer] = Inner,
       _builderOf[Outer] = B,
       _nodeOf[Inner, Outer] = N,
       _nodeOf[Prev, Outer] = P.

    toString[Atom] = s
    <- 
       atom:toString[Atom] = s.

    toString[Outer] = "!(" + s + ")"
    <- 
       negatesInnerFormula[Outer] = Inner,
       toString[Inner] = s.

    toString[Outer] = s
    <-
       quantifiesInnerFormula[Outer] = Inner,
       toString[Inner]=s.   

    toString[Formula] = ""
    <-
       isEmpty(Formula).

  })
} <-- .
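
As a rough illustration of how these exports might be consumed, the hypothetical
rules below collect the names of predicates that some formula asserts (via
hasAssertAtom) and the names of predicates whose atoms occur under a negation
(via the boolean value of hasAtom). The predicates assertedPredName and
negatedPredName are introduced only for this sketch, and the rules are written
as if they lived in the same block, so the exported predicates appear
unqualified.

    assertedPredName(Name) -> string(Name).
    negatedPredName(Name)  -> string(Name).

    /* Names of predicates asserted (DELTA_INSERT or DELTA_UPSERT) by some formula. */
    assertedPredName(Name)
    <-
       hasAssertAtom(_, Atom),
       atom:predName[Atom] = Name.

    /* Names of predicates whose atoms occur under a negation in some formula. */
    negatedPredName(Name)
    <-
       hasAtom[_, Atom] = true,
       atom:predName[Atom] = Name.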

D.11. blox:compiler:atom

block(`atom) {
  export(`{
    /* The predicate for this atom. */
    pred[Atom]=Pred -> atom(Atom), predicate(Pred).
    /* Alternative index for pred. */
    pred_alt_index(Atom,Pred) -> atom(Atom), predicate(Pred).

    /* The name of the predicate for this atom. */
    predName[Atom]=PredName -> atom(Atom), string(PredName).
    /* The number of arguments for this atom. */
    arity[Atom]=Args -> atom(Atom), uint[8](Args).
    /* The number of key arguments for this atom. */
    keyArity[Atom]=Keys -> atom(Atom), uint[8](Keys).
    /* The ith argument of this atom. */
    arg[Atom,i]=Arg -> atom(Atom), uint[8](i), expr(Arg).
    //non_fun_arg(Atom,i,Arg) -> atom(Atom), uint[8](i), expr(Arg).
    /* The delta type for this delta atom (DELTA_INSERT, DELTA_DELETE, DELTA_UPSERT or DELTA_UPSERT_DEFAULT). */
    deltaType[Atom]=dlt -> deltaAtom(Atom), DELTA:MODIFIER(dlt).
    /* The textual prefix for this atom's delta type ("+", "-", "^", or "" for non-delta atoms). */
    deltaToString[Atom]=dlt -> atom(Atom), string(dlt).

    /* Whether this atom is one-to-one. */
    oneToOne(Atom) -> atom(Atom).
    /* The stage of this atom (PREVIOUS, INITIAL, OVERRIDE or FINAL). */
    stage[Atom]=stg -> atom(Atom), STAGE:MODIFIER(stg).
    /* The textual suffix for this atom's stage ("@prev", "@init", "@override", or "" for FINAL). */
    stageToString[Atom]=stg -> atom(Atom), string(stg).

    /* The index of the last argument of this atom. */
    maxAtomArgIndex[Atom]=i -> atom(Atom), uint[8](i).

    /* The display name for this atom. */
    displayName[P]=S -> atom(P), string(S).    

    /* Whether this atom is a negated inverse. */
    isNegatedInverse(A) -> atom(A).

    /* Print this atom as a string. */
    toString[Atom]=String -> atom(Atom), string(String).
  }),

  clauses(`{

    /* A distinct expression node is generated per atom occurrence: a constant expression may not be shared between atom argument positions. */

    _exprIn(E, A, i) <- arg[A, i] = E.
    _exprIn(E, A, i), _exprIn(E, B, j), constant(E) -> A = B, i = j.

    /* Mandatory Role Constraints */

    atom(A) -> pred[A] = _, predName[A] = _.
    atom(A) -> arity[A] = _, keyArity[A] = _.
    atom(A) -> stage[A] = _.

    atom(A) -> toString[A] = _.
    deltaAtom(A) -> deltaType[A] = _.
    
    /* Arity-related Constraints */
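    /* hasArgsUpToIndex(A, i) holds when arguments 0 through i of atom A are all
       present; together with the constraints that follow it ensures that an
       atom's arguments form a contiguous range 0 .. arity-1. */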

    hasArgsUpToIndex(A, i)
    <-
        hasArgsUpToIndex(A, i - 1),
        arg[A, i] = _.

    hasArgsUpToIndex(A, 0)
    <-
        arg[A, 0] = _.

    arity[A] = m -> m = 0 ; hasArgsUpToIndex(A, m - 1).
    arg[A, i] = _ -> i >= 0, i < arity[A].

    arg[_, _] = expr -> constant(expr) ; varname(expr).

    /* Utility-predicates */

    arity[Atom] = Arity
    <-
       pred[Atom]=Pred,
       predicate:arity[Pred]=Arity.

    pred_alt_index(Atom,Predicate)
    <- 
       pred[Atom]=Predicate.

    maxAtomArgIndex[Atom]=i
    <- 
       arity[Atom]=Arity,
       Arity > 0,
       m = Arity-1,
       uint32:uint8:convert[m]=i.

    /* Textual Representation */
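    /* An atom is rendered as delta prefix + predicate name + stage suffix +
       argument list, e.g. "+person(x)" or, for a functional predicate,
       "sales@prev[sku] = v". */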

    toString[Atom] = deltaToString[Atom] + PredName + stageToString[Atom] + argsToString[Atom]
    <-
       atom:pred[Atom] = Pred,
       (
         (!predicate:displayName[Pred]=_, atom:predName[Atom]=PredName) ;
          predicate:displayName[Pred]=PredName
       ).

    stageToString[Atom] = Stage
    <-
      (stage[Atom] = STAGE:PREVIOUS[], Stage = "@prev") ;
      (stage[Atom] = STAGE:INITIAL[],  Stage = "@init") ;
      (stage[Atom] = STAGE:OVERRIDE[], Stage = "@override") ; 
      (stage[Atom] = STAGE:FINAL[],    Stage = "").

    deltaToString[Atom] = Delta
    <-
      (deltaType[Atom] = DELTA:INSERT[], Delta = "+") ;
      (deltaType[Atom] = DELTA:DELETE[], Delta = "-") ;
      (deltaType[Atom] = DELTA:UPSERT[], Delta = "^") ;
      (deltaType[Atom] = DELTA:UPSERT_DEFAULT[], Delta = "^") ;
      (atom(Atom), !deltaType[Atom] = _, Delta = "").

    argToStr[Atom,i]=s
    <-
       arg[Atom,i]=arg, expr:toString[arg] = s.

    argsToStringUpToIndex(Atom,i,s)
    <-
      (i = 0, 
       argToStr[Atom,i]=s
      )  
      ; 
      (i > 0, 
       argToStr[Atom,i]=as,
       argsToStringUpToIndex(Atom,i-1,str),
       s = str + "," + as
      ).

    argsToString[Atom] = "()"
    <-
       arity[Atom] = 0.

    argsToString[Atom] = "[] = " + argToStr[Atom, 0]
    <-
       pred[Atom] = Pred,  
       predicate:isFunctional(Pred),
       maxAtomArgIndex[Atom] = 0.

    argsToString[Atom] = "[" + keys + "] = " + argToStr[Atom, value]
    <-
       pred[Atom] = Pred,
       predicate:isFunctional(Pred),
       maxAtomArgIndex[Atom] = value,
       argsToStringUpToIndex(Atom, value - 1, keys),
       value > 0.

    argsToString[Atom] = "(" + args + ")"
    <-
       pred[Atom] = Pred,
       !predicate:isFunctional(Pred),
       maxAtomArgIndex[Atom] = m,
       argsToStringUpToIndex(Atom, m, args).
       
  })
} <--.
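
In the same spirit, another block could refer to these exports using the atom:
prefix, just as the formula logic above uses atom:pred and atom:toString. The
hypothetical rule below pairs each delta atom's delta prefix with its rendered
text; deltaAtomText exists only for this sketch.

    deltaAtomText(Prefix, Text) -> string(Prefix), string(Text).

    /* The delta prefix ("+", "-" or "^") and rendered text of every delta atom. */
    deltaAtomText(Prefix, Text)
    <-
       atom:deltaType[A] = _,
       atom:deltaToString[A] = Prefix,
       atom:toString[A] = Text.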

D.12. blox