Chapter 22. Hierarchical Import/Export

Many applications store hierarchical data in the workspace. For instance, the following schema describes hierarchical information about a person:

block (`addrbook) {
  export ( `{
    person(x), person_id(x:id) -> int(id).
    person_age[x]=y -> person(x), int(y).
    person_address[x]=y -> person(x), addrbook:address(y).

    address(x), address_id(x:id) -> int(id).
    address_city[x]=y -> address(x), string(y).
    address_state[x]=y -> address(x), string(y).
 } ), ...

Ad hoc techniques for importing such data into the workspace, or exporting it from the workspace, can be complex and slow. For import, inherently hierarchical data must be flattened before it is loaded into the workspace; for export, flat data extracted from the workspace must have its hierarchical structure reconstructed. In addition, importing via delta rules and exporting via queries both carry performance costs.
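
To make the flattening and reconstruction work concrete, here is a minimal Python sketch of the ad hoc approach for the person schema. It is an illustration only, not part of the platform: the nested dictionaries, the functions flatten and rebuild, and the integer ids are all assumptions made for this example.

```python
from itertools import count

def flatten(people):
    """Flatten nested person records into flat, id-keyed relations."""
    person_ids, address_ids = count(1), count(1)
    rel = {"person_age": {}, "person_address": {},
           "address_city": {}, "address_state": {}}
    for p in people:
        pid, aid = next(person_ids), next(address_ids)
        rel["person_age"][pid] = p["age"]
        rel["person_address"][pid] = aid
        rel["address_city"][aid] = p["address"]["city"]
        rel["address_state"][aid] = p["address"]["state"]
    return rel

def rebuild(rel):
    """Reconstruct the nested records from the flat relations."""
    people = []
    for pid in sorted(rel["person_age"]):
        aid = rel["person_address"][pid]
        people.append({
            "age": rel["person_age"][pid],
            "address": {"city": rel["address_city"][aid],
                        "state": rel["address_state"][aid]},
        })
    return people

# Hypothetical sample data for the sketch.
people = [{"age": 34, "address": {"city": "Atlanta", "state": "GA"}},
          {"age": 27, "address": {"city": "Austin", "state": "TX"}}]
flat = flatten(people)
```

Even for two message types, both directions need hand-written id bookkeeping, and that bookkeeping grows with every level of nesting.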

Hierarchical import/export is designed to address this problem. It allows you to give a hierarchical description of your data as a Google Protocol Buffer message. You can then write rules that pull data from the message into your working schema (for import), or derive data into the message from your working schema (for export).

22.1. Using Hierarchical Import/Export

There are four steps in using hierarchical import/export:

  1. Providing a specification of your data as a Google protocol buffer message.
  2. Adding a directive to your project file that will generate logic to represent your newly declared message types and ensure the runtime system is aware of these message types. (Project files are described in Chapter 23, LogiQL Project.)
  3. Writing rules that derive data from your message schema to your working schema, or vice versa.
  4. Using either lb commands or services (Chapter 26, Protobuf and Global Protobuf Services) to import/export your data.

In the remainder of this chapter we will use the above person schema as a running example.

22.1.1. Defining A Protocol Buffer Message Specification

This section demonstrates how to build a protobuf schema for representing information about a person, including her age and (some information about) her address. The following protocol buffer message specification describes such information:

package addrbook;

message Person {
  required uint32 age=1;
  required Address address=2;
}

message Address {
  required string city=1;
  required string state=2;
}

In what follows, we will assume this specification is the contents of file person.proto.

22.1.2. Importing the Protocol Message Specification

Adding the following line to your project file generates logic for the Person and Address message types defined above and associates their descriptor with a name, myproto, that the runtime system uses to identify this family of message types.

myProject, project
person.proto, proto, descName=myproto

Several options can be given in the third field of the proto directive. These are described below.

descName=name

Required. Sets the descriptor name associated with imported protobuf message types.

lifetime=transaction or lifetime=database

Optional. Describes whether the logical representation of the protobuf messages should have transaction or database lifetime. Default is database.

derivationType=edb or derivationType=idb

Optional. Determines whether the generated predicates will be declared as EDBs or IDBs. Declaring them as IDBs only makes sense if you will only ever be exporting data from the predicates, and not importing data. Default is edb.

protoPath=path

Optional. Search path for message types included in .proto files via import statements.

namespace={old1->new1 , old2->new2 , ...}

Optional. A map rewriting top-level namespaces for generated logic.

legacyLogic=true or legacyLogic=false

Optional. The default is false. When true, specifies that logic should be generated as flat files instead of modules. For forward compatibility, predicate names are identical whether or not legacyLogic is set. This is most useful in the case when recursive protobuf declarations would lead to illegal recursive modules.

dropPackages=p,q,r,...

Optional. The default is google.protobuf, blox.options, blox.internal. Specifies that logic should not be generated for the given protobuf packages. This can be useful when including third-party protobuf packages containing types that are not valid in LogiQL, or when a package is included twice via different proto project directives.

If two or more .proto files will create logic in the same namespace, it is necessary to import them together by listing them in the left column of a single proto directive. For example, suppose we refactored the message declarations above into two .proto files. The following directive will import messages from both and also rename the top-level LogiQL package used for generated logic.

myProject, project
person_only.proto addr_only.proto, proto, descName=myproto namespace={addrbook->foo}

The resulting predicate declarations are as follows:

block (`foo) {
  export ( `{
    Address(x), AddressId(x:id) -> int(id).
    Address_city[x]=y -> Address(x), string(y).
    Address_state[x]=y -> Address(x), string(y).

    Person(x), PersonId(x:id) -> int(id).
    Person_age[x]=y -> Person(x), int(y).
    Person_address[x]=y -> Person(x), foo:Address(y).
 } ), ...
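
The effect of the namespace option on predicate names can be sketched in a few lines of Python. This is an illustration of the renaming only, not the generator's implementation: the function rewrite_namespace and the sample name other:Thing are hypothetical, and the sketch assumes that only the first segment of a qualified name is rewritten, as the phrase "top-level namespaces" suggests.

```python
def rewrite_namespace(qualified_name, mapping):
    """Rewrite only the top-level namespace segment of a qualified name."""
    top, sep, rest = qualified_name.partition(":")
    if not sep:
        return qualified_name  # unqualified name: nothing to rewrite
    return mapping.get(top, top) + sep + rest

mapping = {"addrbook": "foo"}  # mirrors namespace={addrbook->foo}
names = ["addrbook:Person", "addrbook:Person_age", "other:Thing"]
rewritten = [rewrite_namespace(n, mapping) for n in names]
```

Names whose top-level namespace does not appear in the map are left unchanged.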

22.1.3. Exchanging Data Between a Message and a Workspace

You are responsible for writing the rules that populate the message schema with data from your workspace. These rules are written in regular LogiQL. Below is an example of how to derive addrbook:Person and addrbook:Address entities for export from the corresponding person and address entities declared in a workspace.

begin_export() -> .
lang:pulse(`begin_export).

addrbook:Person(p_out),
addrbook:Person_age[p_out] = age,
addrbook:Address(a_out),
addrbook:Person_address[p_out] = a_out,
extract_address(a_in, a_out) <-
    addrbook:person(p_in), addrbook:person_age[p_in] = age,
    addrbook:person_address[p_in] = a_in, begin_export().

extract_address(a_in, a_out) ->
    addrbook:address(a_in), addrbook:Address(a_out).
lang:pulse(`extract_address).

addrbook:Address_city[a_out] = city,
addrbook:Address_state[a_out] = state <-
    extract_address(a_in, a_out),
    addrbook:address_city[a_in] = city,
    addrbook:address_state[a_in] = state.

The above rules are written with the assumption that they will be installed as a pre-compiled block, which is called whenever a message needs to be exported. For this reason they include a pulse predicate, begin_export, which can be used to control when the rules generating message data are evaluated.

Similar rules can be written to take data from the message predicates to your workspace.

22.1.4. Exporting/Importing a Message

To read an exported message, or to construct a message to import, you need code generated by protoc, the compiler distributed with Google Protocol Buffers. protoc can generate messaging APIs for a number of languages, such as C++, Python, and Java. To read more about the use of such APIs, please consult the protocol buffer manual.

Export

Exporting data from a workspace requires that you first evaluate the data exchange rules that convert data from your workspace to the message schema. Whether this is done by invoking a pre-compiled block or via one-off executions is the programmer's choice. Once the data is available in the message schema, you can use the lb export-proto command to export that data into a message.

Import

Importing data into a workspace works similarly to exporting. A protocol buffer message must be constructed first, using the code generated by protoc from your message specification. The lb import-proto command can then be used to import the message into a workspace.
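
For readers curious about what a constructed message looks like, the following Python sketch hand-encodes a Person from person.proto in the standard protobuf binary wire format. It is a stand-in for illustration, not the recommended route: in practice you would use the protoc-generated classes, and this chapter does not state which serialization lb import-proto expects. The helper names (varint, field_varint, field_bytes, encode_address, encode_person) and the sample values are assumptions of this sketch.

```python
def varint(n):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def field_varint(number, value):
    """Field with wire type 0 (varint), used here for uint32."""
    return varint((number << 3) | 0) + varint(value)

def field_bytes(number, payload):
    """Field with wire type 2 (length-delimited): strings and sub-messages."""
    return varint((number << 3) | 2) + varint(len(payload)) + payload

def encode_address(city, state):
    # Address: city is field 1, state is field 2 (see person.proto).
    return (field_bytes(1, city.encode("utf-8"))
            + field_bytes(2, state.encode("utf-8")))

def encode_person(age, city, state):
    # Person: age is field 1, the nested Address is field 2.
    return field_varint(1, age) + field_bytes(2, encode_address(city, state))

message = encode_person(34, "Atlanta", "GA")
```

The nested Address is embedded as a length-delimited field inside Person, which is how the message carries the hierarchy that the flat workspace predicates do not.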

22.2. Set Semantics for Repeated Fields

Protobuf repeated fields may be annotated to indicate that they should be represented as unordered sets instead of indexed predicates. This eliminates the need to generate or track indices. For example, the following field declaration in a message named A

repeated string foo = 1 [(blox.options.set) = true];

is represented in logic by

A_foo(x, y) -> A(x), string(y).
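
One consequence of the set representation is that a relation holds each (x, y) pair at most once, so duplicate values collapse and element order is not recorded. The Python sketch below contrasts the two shapes. The exact form of the generated indexed predicate is not shown in this chapter, so the indexed dictionary is an assumed shape, and the sample values are hypothetical.

```python
values = ["red", "green", "red"]  # a repeated string field on one message "x"

# Indexed representation (assumed shape): (entity, position) -> value.
# Order and duplicates are preserved, but indices must be generated and tracked.
indexed = {("x", i): v for i, v in enumerate(values)}

# Set representation, as in A_foo(x, y): a plain relation of (x, y) pairs.
as_set = {("x", v) for v in values}
```

If the order or multiplicity of the repeated values matters to your application, the set annotation is probably not appropriate for that field.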