Chapter 41. Database Branching

LogicBlox 4 adds database branching to the database engine. Branching provides LogicBlox databases with capabilities similar to versioning in other database systems, or branching and tagging in source code management systems. The LogicBlox branching functionality offers several advantages over versioning in other database systems:

  • Branching is fast. It happens in constant time regardless of database size.
  • Branches are perfectly isolated from each other and can have independent data and schemata.
  • Branches can be created from other branches, forming a tree of database changes.
  • Full database functionality is available on all branches (data queries and updates, schema changes, branching, etc.).

Branching is used internally for concurrency control in LogicBlox databases. Some branching capabilities are also exposed via command-line utilities and lb-workflow, so they can be used for applications requiring functionality such as:

  • "what-if" analysis or other tasks that require transactions to run for hours, days, or weeks;
  • live programming where schema or rule changes performed by one user must be isolated from other users’ activities in the same database;
  • application upgrades with simple backout of changes;
  • batch checkpoints to provide recovery or restart boundaries.

Although it is similar, LogicBlox database branching behaves somewhat differently from either branching or tagging in source code management systems like Mercurial. In a LogicBlox database, each transaction that modifies the state of the database (schema or data) will create a new version of the database. Each branch name in a LogicBlox database can be seen as a pointer to a specific database version. Modifying the state of the database effectively clones the most recent version of the branch being modified to create a new version of the database, applies a set of changes to the new version, and moves the branch pointer to the new version. This would be similar to a source code management system that combined branching and tagging, where a modification to a branch would automatically move the tag to the most recent branch state.

Each transaction that operates against the database works in a single branch only (data from other branches can be read, but not written). LogicBlox commands and protocols make it possible to specify the branch being operated on. If a branch is not specified, a transaction will operate on the default branch, whose name is master. Transactions on the master branch and transactions on other branches have the same behavior: the master branch is a pointer to a specific database version, and changes to the master branch will clone the version referred to by that pointer, apply the changes, and move the master branch pointer to the new version.

The LogicBlox branching implementation allows the default branch of a database to be changed. All this does is move the master branch pointer of the database to the most recent version of a different branch. Afterwards, all database operations that do not explicitly mention a branch will work from the version referred to by the new master pointer (creating a new version and moving the master pointer as discussed above). The branch used as the new default branch remains in the database. Its pointer stays at the same database version regardless of changes made to the master branch, unless some transaction explicitly targets it. Be aware that changing the default database branch makes it impossible to access data from the database version pointed to by the master branch prior to the change, unless you explicitly create a new branch from the master branch before changing the default branch.

41.1. Special Branches

When a new database is created, it has a default branch called master. All database operations will target the master branch unless otherwise specified via command options or protocol parameters.

With current LogicBlox releases, the lb branches command lists a special branch called nil. This branch is for internal use only and should be hidden from applications. No application should issue any command that explicitly refers to the nil branch; doing so could render the workspace unusable.

The --all option to the lb branches command will show all the branches created by an application, the master and nil branches discussed above, and also a set of branches whose names look like timestamps.

$ lb branches /test --all
master
branch_1
nil
2015-07-02 19:25:10,917476+00:00
2015-07-02 19:24:53,182987+00:00
2015-07-02 19:24:10,650640+00:00
2015-07-02 19:23:53,784605+00:00
2015-07-02 19:23:29,734409+00:00

The branches with timestamp names are created automatically to refer to the database versions created by transactions at particular times. They can be used to recover a previous database state (see the replace-default-branch command discussed below). Currently only 64 of these automatic branches are maintained. Once more than 64 have been created, the database starts pruning them, keeping more versions close to the current time and progressively fewer, more widely spaced versions farther back in time. The current version pointed to by any branch (named or default) is never pruned.
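
When choosing a recovery point for lb replace-default-branch, the timestamp-named branches can be filtered programmatically. The following Python sketch assumes the name format shown in the listing above (date, time with a comma before the microseconds, and a UTC offset); the comment on the GetBranchNames message later in this chapter shows a different naming style, so check what your release actually produces. The helper names parse_backup_name and latest_backup_before are hypothetical, not part of any LogicBlox tool.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional

# Assumed format of automatic backup branch names, taken from the sample
# `lb branches --all` output: "2015-07-02 19:25:10,917476+00:00".
BACKUP_FORMAT = "%Y-%m-%d %H:%M:%S,%f%z"


def parse_backup_name(name: str) -> Optional[datetime]:
    """Return the timestamp encoded in a backup branch name, or None for
    ordinary branch names such as "master" or "branch_1"."""
    try:
        return datetime.strptime(name, BACKUP_FORMAT)
    except ValueError:
        return None


def latest_backup_before(names: Iterable[str], cutoff: datetime) -> Optional[str]:
    """Pick the most recent automatic backup created at or before `cutoff`."""
    candidates = []
    for name in names:
        ts = parse_backup_name(name)
        if ts is not None and ts <= cutoff:
            candidates.append((ts, name))
    return max(candidates)[1] if candidates else None


listing = [
    "master",
    "branch_1",
    "nil",
    "2015-07-02 19:25:10,917476+00:00",
    "2015-07-02 19:24:53,182987+00:00",
    "2015-07-02 19:24:10,650640+00:00",
    "2015-07-02 19:23:53,784605+00:00",
    "2015-07-02 19:23:29,734409+00:00",
]
cutoff = datetime(2015, 7, 2, 19, 24, 30, tzinfo=timezone.utc)
print(latest_backup_before(listing, cutoff))  # 2015-07-02 19:24:10,650640+00:00
```

The selected name contains spaces, so it must be quoted when passed to lb replace-default-branch, as in the command examples later in this chapter.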

41.2. Illustration

To illustrate the relationships between branches and versions in a LogicBlox database, consider the following sequence of operations. We use the following conventions:

  • Above each image is a description of the database operation and the result depicted in the image.
  • Versions of the database are represented as cylinders with the version number inside.
  • Dotted lines from branch name to a database version indicate the pointer to the most recent version for each branch.
  • The default branch name is displayed as "master".
  • Solid arrows between database versions indicate a transaction commit that creates a new version from a prior version.
  • The default branch and version are highlighted in red.

1. Create a new database → default branch points to version 0
2. Modify the default branch → default branch points to version 1
3. Create a new branch, branch_1, from the default branch → default branch and branch_1 both point to version 1
4. Modify the default branch → default branch points to version 2, branch_1 stays at version 1
5. Modify branch_1 → default branch remains at version 2, branch_1 moves to version 3
6. Create a branch called branch_2 from branch_1 → default branch remains at version 2, branch_1 and branch_2 both point to version 3
7. Change the default branch to branch_1 → default branch, branch_1, and branch_2 all point to version 3
8. Modify branch_2 → branch_2 moves to version 4, default branch and branch_1 remain at version 3
9. Modify the default branch → default branch moves to version 5, branch_1 remains at version 3, and branch_2 remains at version 4
10. Delete branch_1 → branch_1 pointer to version 3 is removed, default branch remains at version 5, and branch_2 remains at version 4
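
The pointer semantics above can be captured in a toy model. The following Python sketch is illustrative only and is not LogicBlox code: versions are integers, each branch name maps to a version, every modifying transaction clones the branch's current version into a new one, and changing the default branch merely copies a pointer. Replaying the ten steps reproduces the version numbers listed above.

```python
class BranchModel:
    """Toy model of LogicBlox branch pointers (illustrative only)."""

    def __init__(self) -> None:
        self.next_version = 0
        self.parent_of = {}  # version -> version it was cloned from
        self.branches = {"master": self._new_version(None)}

    def _new_version(self, parent):
        version = self.next_version
        self.next_version += 1
        self.parent_of[version] = parent
        return version

    def modify(self, branch="master"):
        # Clone the branch's current version, "apply changes", move the pointer.
        self.branches[branch] = self._new_version(self.branches[branch])

    def branch(self, name, parent="master"):
        # Creating a branch only copies a pointer, so it is constant time.
        self.branches[name] = self.branches[parent]

    def replace_default_branch(self, name):
        # master now points at the other branch's version; the old master
        # version is no longer reachable through any branch name.
        self.branches["master"] = self.branches[name]

    def delete_branch(self, name):
        if name == "master":
            raise ValueError("the default branch cannot be deleted")
        del self.branches[name]


db = BranchModel()                         # 1. master -> 0
db.modify()                                # 2. master -> 1
db.branch("branch_1")                      # 3. master, branch_1 -> 1
db.modify()                                # 4. master -> 2
db.modify("branch_1")                      # 5. branch_1 -> 3
db.branch("branch_2", parent="branch_1")   # 6. branch_2 -> 3
db.replace_default_branch("branch_1")      # 7. master -> 3
db.modify("branch_2")                      # 8. branch_2 -> 4
db.modify()                                # 9. master -> 5
db.delete_branch("branch_1")               # 10. branch_1 removed
print(db.branches)                         # {'master': 5, 'branch_2': 4}
```

Version 2 still exists in the model's history (parent_of) but no branch name reaches it, which mirrors the warning about changing the default branch without first saving the old master version.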

41.3. Branches in LogiQL Rules

LogiQL rules can refer to predicates across branches using the pred@branch_name syntax. Current LogicBlox releases have the following restrictions on cross-branch rules:

  • Branch names must conform to predicate naming restrictions. They must start with an alphabetic character and contain only alphanumerics or underscores. You can create branch names that don't follow these rules, but if you do so you will not be able to refer to those branches in LogiQL rules. Note also that the name of a branch cannot be identical to the name of a transaction stage (see Section 19.4.4, “Stage suffixes”).
  • The rules cannot be active in the database. They can be added to the database as inactive rules to be executed as needed or can be executed dynamically (e.g., via the lb exec command).
  • Cross-branch rules can only write to the "current" branch for the transaction, either the default branch for the database or a named branch for the transaction, if specified. Predicates in the head of a rule must not include the @branch_name specification. For example,
    lb exec /test '^sales[] = 123.' --branch branch_name
    is legal while
    lb exec /test '^sales@branch_name[] = 123.' --branch branch_name
    is not.
  • A rule can read from more than one branch. For example, the command
    lb exec /test '^sales2[] = sales1@master[] + sales3@branch3[].' --branch branch2
    is legal.
  • All predicates in the rules must be defined in the branch of the transaction that executes the rules. (This does not mean that the predicates must be declared in that particular branch: a predicate that is defined in a branch is also defined in its descendants.)

    A related restriction is that you cannot install a LogiQL library that defines a predicate schema and also includes an inactive block with cross-branch rules that use the schema. The schema must be installed in a separate transaction before the inactive cross-branch rules are installed.

  • In general, schemas are allowed to vary across branches, but a predicate that is used in a cross-branch LogiQL rule must have the same signature in each of the relevant branches (i.e., the branch in which the rule is executed, and the branches that are mentioned in the rules via @branch_name, either explicitly or by being bound to a branch alias).

    Sometimes a predicate is an IDB predicate in one branch, and an EDB predicate in another. This presents no problems to cross-branch referencing, as long as the signature is the same.

  • The @branch_name syntax means that you can't look at other stages of predicates (e.g., @previous) in a foreign branch. The following is illegal:
    lb exec /test '^sales[] = sales@branch3@previous[].'
  • Entity elements have internal indices maintained for them. These are maintained independently across branches, which means that you can't use entity-typed variables across branches in a rule. If you execute the following commands to create a database with a simple schema
    $ lb create --overwrite /test
    $ lb addblock /test 'sku(s), sku:id(s:n) -> string(n). sales[s] = v -> sku(s), int(v).'
    $ lb exec /test '+sku(s), +sku:id(s:"sku1"), ^sales[s] = 123.'
    $ lb branch /test branch1
    then the command
    lb exec /test '^sales[s] = 222 <- sales@branch1[s] != 0.'
    will be illegal: the variable s is of an entity type, so it cannot be used in two branches. Instead, you must map entities from one branch to entities in the other, for example by using refmodes:
    lb exec /test '^sales[s2] = 222 <- sales@branch1[s] != 0, sku:id@branch1(s:sid), sku:id(s2:sid).'
    See Section 8.6, “Foreign Predicates”, for more details.
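
The naming restriction in the first bullet can be checked before a branch is created. This is a minimal Python sketch that assumes the rule exactly as stated and reads "alphabetic" and "alphanumeric" as ASCII. RESERVED_STAGES is a hypothetical, deliberately incomplete set containing only the @previous stage mentioned above; consult Section 19.4.4 for the stage names your release defines.

```python
import re

# Assumption: ASCII letters and digits only; the rule above does not
# specify a character set.
BRANCH_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Hypothetical, incomplete set of stage names (see "Stage suffixes").
RESERVED_STAGES = {"previous"}


def usable_in_logiql(branch: str, reserved=RESERVED_STAGES) -> bool:
    """True if `branch` can appear in pred@branch syntax under the rules above."""
    return BRANCH_NAME.fullmatch(branch) is not None and branch not in reserved


for name in ["master", "branch_1", "2015-07-02 19:25:10,917476+00:00",
             "_tmp", "previous"]:
    print(f"{name!r}: {usable_in_logiql(name)}")
```

Under these rules, the automatically created timestamp branches cannot be referenced from LogiQL rules, although they remain usable with the command-line utilities.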

Example 41.1. Importing entities from a different branch

If an entity predicate is associated with a constructor predicate, we can import entities across branches by importing from the constructor predicate.

create --unique

addblock <doc>
  e(v) -> .
  c[u] = v -> string(u), e(v).
  lang:constructor(`c).
</doc>

branch A
branch B

exec --branch A <doc>
  +c["alpha"] = _.
  +c["beta" ] = _.
</doc>
echo -- c 0:
print c

exec --branch B <doc>
  +c["gamma"] = _.
</doc>

exec <doc>
  +c["delta"] = _.
</doc>
echo -- c 1:
print c

exec <doc>
  +c[x] = _ <- c@A[x] = _.
</doc>
echo -- c 2:
print c

exec --branch A <doc>
  +c[x] = _ <- c@B[x] = _.
</doc>

exec <doc>
  +c[x] = _ <- c@A[x] = _.
</doc>
echo -- c 3:
print c

close --destroy

The results are:

created workspace 'unique_workspace_2016-11-16-20-13-20'
added block 'block_1Z1B38UR'
-- c 0:
-- c 1:
"delta" [10000000006]
-- c 2:
"alpha" [10000000001]
"beta"  [10000000000]
"delta" [10000000006]
-- c 3:
"alpha" [10000000001]
"beta"  [10000000000]
"delta" [10000000006]
"gamma" [10000000002]
deleted workspace 'unique_workspace_2016-11-16-20-13-20'
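
A toy model shows why the import goes through the constructor key rather than through entity values. In the following Python sketch (illustrative only, not how LogicBlox stores entities), each branch allocates its own internal entity indices, so the same key receives unrelated indices on different branches. Copying keys and letting the importing branch construct its own entities reproduces the key sets printed above; the index values (hypothetical starting points 100, 200, 300) will not match the LogicBlox output.

```python
class Branch:
    """Toy branch holding a constructor map: key -> branch-local entity index."""

    def __init__(self, name: str, first_index: int) -> None:
        self.name = name
        self._next_index = first_index  # hypothetical branch-local allocator
        self.c = {}

    def construct(self, key: str) -> int:
        # Like `+c[key] = _`: reuse the entity for an existing key, else allocate.
        if key not in self.c:
            self.c[key] = self._next_index
            self._next_index += 1
        return self.c[key]

    def import_from(self, other: "Branch") -> None:
        # Like `+c[x] = _ <- c@other[x] = _`: only keys cross branches;
        # entity indices are allocated by the importing branch.
        for key in list(other.c):
            self.construct(key)


master, a, b = Branch("master", 100), Branch("A", 200), Branch("B", 300)
a.construct("alpha")      # exec --branch A
a.construct("beta")
b.construct("gamma")      # exec --branch B
master.construct("delta") # exec on the default branch
master.import_from(a)     # c 2: alpha, beta, delta
a.import_from(b)          # A now also contains gamma
master.import_from(a)     # c 3: alpha, beta, delta, gamma
print(sorted(master.c))   # ['alpha', 'beta', 'delta', 'gamma']
print(master.c["alpha"] != a.c["alpha"])  # True: indices are branch-local
```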

Branch aliases

The @branch tag need not name an existing branch. It may be a branch alias, i.e., a variable that must be bound to a concrete branch name at execution time.

This feature is particularly useful in an inactive block, which may be activated in a number of different situations and refer to different branches, depending on the circumstances.

Example 41.2. Using branch aliases

The following lb script declares four branches which contain compatible declarations of predicate p.

On branch D there are two additional predicates, q and r, and an inactive block, TEST, which fills these predicates with the contents of p from other branches. These branches are unspecified, i.e., they are represented by branch aliases.

On each of the branches A, B and C predicate p is populated with different contents.

The inactive block is executed on branch E (an offshoot of D) after binding the two branch aliases to A and B. (Note that the argument of the --bind-branch option must not contain any whitespace.)

The intention is to populate q with the contents of p@A, and r with the contents of p@B.

Finally, the inactive block is executed on branch D, but this time both branch aliases are bound to C, so both q and r will be filled from the same source.

create --unique

branch A
branch B
branch C
branch D

addblock --branch A <doc>
p(x) -> int(x).
</doc>

addblock --branch B <doc>
p(x) -> int(x).
</doc>

addblock --branch C <doc>
p(x) -> int(x).
</doc>

addblock --branch D <doc>
p(x) -> int(x).
q(x) -> int(x).
r(x) -> int(x).
</doc>

addblock --branch D --name TEST --inactive <doc>
+q(x) <- p@Alias(x).
+r(x) <- p@AnotherAlias(x).
</doc>

exec --branch A <doc>
+p(1).  +p(2). +p(3).
</doc>

exec --branch B <doc>
+p(3). +p(4).  +p(5).
</doc>

exec --branch C <doc>
+p(6). +p(7).
</doc>

branch E --parent D

exec --branch E --bind-branch Alias=A,AnotherAlias=B --storedBlock TEST

exec --branch D --bind-branch Alias=C,AnotherAlias=C --storedBlock TEST

echo D q:
print --branch D q
echo D r:
print --branch D r
echo E q:
print --branch E q
echo E r:
print --branch E r

close --destroy

The output is:

created workspace 'unique_workspace_2016-06-13-23-57-44'
added block 'block_1Z1C3KPN'
added block 'block_1Z1CXOWI'
added block 'block_1Z1DQ0PC'
added block 'block_1Z1EI6GM'
added block 'TEST'
D q:
6
7
D r:
6
7
E q:
1
2
3
E r:
3
4
5
deleted workspace 'unique_workspace_2016-06-13-23-57-44'
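
The effect of --bind-branch can be mimicked outside the database to check expectations. The following Python sketch is a hypothetical model, not part of lb: it parses a binding argument of the form used above (comma-separated Alias=Branch pairs, rejected if it contains whitespace, per the note above), resolves the aliases in the TEST block, and recomputes q and r from the contents of p on each branch.

```python
def parse_bindings(arg: str) -> dict:
    """Parse "Alias=A,AnotherAlias=B" into {"Alias": "A", "AnotherAlias": "B"}."""
    if any(ch.isspace() for ch in arg):
        raise ValueError("binding argument must not contain whitespace")
    bindings = {}
    for pair in arg.split(","):
        alias, sep, branch = pair.partition("=")
        if not sep or not alias or not branch:
            raise ValueError(f"malformed binding: {pair!r}")
        bindings[alias] = branch
    return bindings


# Contents of p on each branch after the exec commands in Example 41.2.
P = {"A": {1, 2, 3}, "B": {3, 4, 5}, "C": {6, 7}}

# Block TEST:  +q(x) <- p@Alias(x).   +r(x) <- p@AnotherAlias(x).
TEST_BLOCK = {"q": "Alias", "r": "AnotherAlias"}


def run_test_block(arg: str) -> dict:
    """Return the sorted contents q and r receive, assuming both start empty."""
    bindings = parse_bindings(arg)
    return {head: sorted(P[bindings[alias]]) for head, alias in TEST_BLOCK.items()}


print(run_test_block("Alias=A,AnotherAlias=B"))  # branch E
print(run_test_block("Alias=C,AnotherAlias=C"))  # branch D
```

The two printed dictionaries match the E and D sections of the output above.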

41.4. Command-line Utilities

The lb utility includes several commands to query and manipulate branches:

lb branches

List the branches currently in the database. This requires a workspace name. The --all option lists both application-defined and internally generated branches; without --all, only application-defined branches are listed.

Note

An internal branch named nil will currently be listed. This branch should not be used by applications.

lb branch

Create a new named branch from either the current default branch or another named branch. This command requires the workspace name and the new branch name. The --parent option creates the new branch from the most recent version of another named branch instead of the most recent version of the current default branch. The --overwrite option replaces an existing branch with the same name, if there is one. This is an atomic operation, so other clients never observe a moment in which the branch does not exist.

lb delete-branch

Remove a named branch from a database. This requires a workspace name and the name of an existing branch. A branch may be deleted even if it is the parent of another branch: the parent only matters when a new branch is first created, to determine which database version to clone for it.

The default branch, named master, cannot be deleted. An error will be reported by the lb delete-branch command if you try.

Note

The special branch nil should not be deleted. Current releases allow this, but it can make the workspace unusable. None of the automatically generated backup branches should be explicitly deleted either.

lb replace-default-branch

Change the default branch to point to the most recent database version of a different named branch. All subsequent database operations that do not explicitly refer to a branch will start from this version. This requires a workspace name and the name of an existing branch.

Be aware that this command makes the master branch point to a different database version. The version previously pointed to by master becomes inaccessible unless you explicitly create a named branch from master immediately before changing the default branch. Also note that if other clients are accessing the database while you do this, they may modify the default branch between the time you create the new branch and the time the default branch is replaced. Any such changes will be left in a database version that is no longer accessible.
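
Because the old default version is lost unless it is saved first, a script that changes the default branch can always create a backup branch immediately beforehand. The following Python sketch assumes the lb executable is on PATH and uses only the lb branch and lb replace-default-branch forms shown in this chapter; the backup branch name master_backup is a hypothetical choice. It runs in dry-run mode by default, and it does not close the race with concurrent clients described above.

```python
import subprocess
from typing import List


def safe_replace_commands(workspace: str, new_default: str, backup: str) -> List[List[str]]:
    """Build the lb invocations: save master under `backup`, then replace it."""
    return [
        # Keep a named pointer to the current default version...
        ["lb", "branch", workspace, backup],
        # ...then move master to the latest version of `new_default`.
        ["lb", "replace-default-branch", workspace, new_default],
    ]


def safe_replace(workspace: str, new_default: str, backup: str,
                 dry_run: bool = True) -> None:
    for cmd in safe_replace_commands(workspace, new_default, backup):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the script before the replacement if creating
            # the backup fails (presumably, e.g., when the name already exists).
            subprocess.run(cmd, check=True)


safe_replace("/test", "branch_1", "master_backup")
```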

In addition, all of the lb commands that manipulate data or schemata in a workspace accept a --branch option that identifies the branch on which to operate. If not specified, the default branch called master is used.

41.4.1. Command Examples

A new database has just the special nil branch plus the master default branch.

$ lb create --overwrite /test
$ lb branches /test
master
nil

Create a new branch called branch_1 from the default branch (master).

$ lb branch /test branch_1
$ lb branches /test
master
nil
branch_1

Add a schema to branch_1, leaving the default branch (master) alone.

$ lb addblock --branch branch_1 /test 'sku(s), sku:id(s:id) -> string(id). sales[sk]=v -> sku(sk), int(v).'
$ lb print /test sku
# error: Could not find predicate sku
$ lb print --branch branch_1 /test sku
# ok

Add data to branch_1, leaving default branch (master) alone.

$ lb exec --branch branch_1 /test '+sku(s), +sku:id(s:id), +sales[s]=10 <- id="sku_1";id="sku_2".'
$ lb print --branch branch_1 /test sales
[10000000004] "sku_2" 10
[10000000005] "sku_1" 10

Create branch_2 from branch_1 and show it has branch_1's data.

$ lb branch --parent branch_1 /test branch_2
$ lb print --branch branch_2 /test sales
[10000000004] "sku_2" 10
[10000000005] "sku_1" 10

Update/insert/delete data in branch_2. branch_1 remains unchanged.

$ lb exec --branch branch_2 /test '^sales[sk]=20 <- sku:id(sk:"sku_2").'
$ lb exec --branch branch_2 /test '-sales[sk]=v <- sku:id(sk:"sku_1"), sales@previous[sk]=v.'
$ lb exec --branch branch_2 /test '+sku(sk), +sku:id(sk:"sku_3"), ^sales[sk]=30.'
$ lb print --branch branch_2 /test sales
[10000000004] "sku_2" 20
[10000000007] "sku_3" 30
$ lb print --branch branch_1 /test sales
[10000000004] "sku_2" 10
[10000000005] "sku_1" 10

Change the default branch to branch_1 and show that printing from the default branch shows the data from branch_1.

$ lb replace-default-branch /test branch_1
$ lb print /test sales
[10000000004] "sku_2" 10
[10000000005] "sku_1" 10

Change the default branch to one of the automatically created backup branches (one that was created before any schema modifications happened).

$ lb branches /test --all
master
branch_2
nil
branch_1
2016-03-10 16:15:01,518449+00:00
2016-03-10 16:11:30,197487+00:00
$ lb replace-default-branch /test "2016-03-10 16:11:30,197487+00:00"
$ lb print /test sku
# error: Could not find predicate sku

Delete branch_1. This just removes the pointer from branch_1 to some database version. No other branches or versions are changed.

$ lb delete-branch /test branch_1

41.5. Protobuf Interfaces

Database branching functions can be accessed via a set of service interfaces built into the LogicBlox platform. Applications can interact directly with these interfaces via the lb web-client command, or from other languages, if necessary. The relevant pieces of the protocol specification are shown below. In this specification, the GetBranchNames* messages correspond to the lb branches command, the CreateNamedBranch* messages correspond to the lb branch command, the CloseNamedBranch* messages correspond to the lb delete-branch command, and the RevertDatabase* messages correspond to the lb replace-default-branch command.

/**
 * Request to retrieve a list of available workspace versions.  This will
 * include both named branches, and (if include_auto_versions is true)
 * automatic backups which have names like "2015.10.30.13.11.58.992634".
 */
message GetBranchNames {
   required string workspace = 1;
   required bool include_auto_versions = 2;
}

/**
 * Response for GetBranchNames request, containing the list of available
 * workspace versions.
 */
message GetBranchNamesResponse {
   repeated string names = 1;
}

/**
 * Request to create a new named branch of the workspace.  Fails if
 * a branch of that name already exists (unless overwrite is true).
 */
message CreateNamedBranch {
   required string workspace = 1;
   required string branch = 2;
   optional string from_branch = 3;
   optional bool overwrite = 4;
}

message CreateNamedBranchResponse {
}

/**
 * Request to close a branch of the workspace.  All data associated with that
 * branch is irretrievably lost.  Fails if a branch of that name does not
 * exist.
 */
message CloseNamedBranch {
   required string workspace = 1;
   required string branch = 2;
}

message CloseNamedBranchResponse {
}

/**
 * Request to revert the database to a previous version.  The specified branch
 * will become the new default branch.  The old default branch will remain
 * accessible for some period of time as a backup version.
 */
message RevertDatabase {
   required string workspace = 1;
   required string older_branch = 2;
}

message RevertDatabaseResponse {
}

41.5.1. Installing the Services

Before the branching services can be used, they must be configured and installed. See the LogicBlox Administration Guide for details on service configuration. For the purposes of this introduction, start by creating the following files in the same directory:

branch-services.config:
[workspace:s_example]
service_get_branch_names = $(CONFIG_DIR)/get_branch_names.json
service_create_branch = $(CONFIG_DIR)/create_branch.json
service_delete_branch = $(CONFIG_DIR)/delete_branch.json
service_replace_default_branch = $(CONFIG_DIR)/replace_default_branch.json

get_branch_names.json:
{
  "handler" : "lb:web:connectblox:get_branch_names",
  "prefix"  : "/s_example/get-branch-names",
  "group" : "lb:web:internal",
  "http_method": "POST",
  "request_protocol"  : "ConnectBlox",
  "request_message"   : "GetBranchNames",
  "response_protocol" : "ConnectBlox",
  "response_message"  : "GetBranchNamesResponse"
}

create_branch.json:
{
  "handler" : "lb:web:connectblox:create_branch",
  "prefix"  : "/s_example/create-branch",
  "group" : "lb:web:internal",
  "http_method": "POST",
  "request_protocol"  : "ConnectBlox",
  "request_message"   : "CreateNamedBranch",
  "response_protocol" : "ConnectBlox",
  "response_message"  : "CreateNamedBranchResponse"
}

delete_branch.json:
{
  "handler" : "lb:web:connectblox:close_branch",
  "prefix"  : "/s_example/delete-branch",
  "group" : "lb:web:internal",
  "http_method": "POST",
  "request_protocol"  : "ConnectBlox",
  "request_message"   : "CloseNamedBranch",
  "response_protocol" : "ConnectBlox",
  "response_message"  : "CloseNamedBranchResponse"
}

replace_default_branch.json:
{
  "handler" : "lb:web:connectblox:replace_default_branch",
  "prefix"  : "/s_example/replace-default-branch",
  "group" : "lb:web:internal",
  "http_method": "POST",
  "request_protocol"  : "ConnectBlox",
  "request_message"   : "RevertDatabase",
  "response_protocol" : "ConnectBlox",
  "response_message"  : "RevertDatabaseResponse"
}

Then execute the following to install the service handlers (this assumes LogicBlox services are already running).

$ lb web-server load --config branch-services.config
installed static workspace s_example

If the services were correctly installed, they will appear when using the lb web-server list --services command.

$ lb web-server list --services
---------------------------------- ------------- -------------- ----------------- ----------------- ------- -----------
      Prefix                        HTTP Method    Workspace          Groups       Disabled Status   Realm   Endpoints
---------------------------------- ------------- -------------- ----------------- ----------------- ------- -----------
 /s_example/create-branch            POST          s_example       lb:web:internal   enabled          internal
 /s_example/delete-branch            POST          s_example       lb:web:internal   enabled          internal
 /s_example/get-branch-names         POST          s_example       lb:web:internal   enabled          internal
 /s_example/replace-default-branch   POST          s_example       lb:web:internal   enabled          internal

In the branch-services.config file, s_example is the name of a static workspace that hosts the services; it can be changed if desired. The *.json files above contain configuration information for each service. The prefix property can be changed as well. The group property specifies access control for the service; the lb:web:internal value means that the service can be accessed without authentication on the default lb-web internal port (55183). See the LogicBlox Administration Guide for more information on service access control. All the other properties in the *.json files should remain unchanged. The $(CONFIG_DIR) references in the branch-services.config file indicate that the *.json files are located in the same directory as the branch-services.config file.
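
Since the four service descriptors differ only in handler, prefix, and message names, they can be generated rather than written by hand. The following Python sketch writes the same four *.json files shown above, plus branch-services.config, into a directory of your choice. The workspace name and prefixes are the example values from this section; the helper names are hypothetical.

```python
import json
from pathlib import Path

WORKSPACE = "s_example"  # static workspace hosting the services (example value)

# (config key, file stem, handler suffix, URL suffix, request message)
SERVICES = [
    ("service_get_branch_names", "get_branch_names", "get_branch_names",
     "get-branch-names", "GetBranchNames"),
    ("service_create_branch", "create_branch", "create_branch",
     "create-branch", "CreateNamedBranch"),
    ("service_delete_branch", "delete_branch", "close_branch",
     "delete-branch", "CloseNamedBranch"),
    ("service_replace_default_branch", "replace_default_branch",
     "replace_default_branch", "replace-default-branch", "RevertDatabase"),
]


def service_descriptor(handler: str, url: str, message: str) -> dict:
    """Build one service descriptor with the fixed properties shown above."""
    return {
        "handler": f"lb:web:connectblox:{handler}",
        "prefix": f"/{WORKSPACE}/{url}",
        "group": "lb:web:internal",  # no authentication on the internal port
        "http_method": "POST",
        "request_protocol": "ConnectBlox",
        "request_message": message,
        "response_protocol": "ConnectBlox",
        "response_message": message + "Response",
    }


def write_config(directory: Path) -> None:
    """Write the *.json descriptors and branch-services.config into `directory`."""
    directory.mkdir(parents=True, exist_ok=True)
    lines = [f"[workspace:{WORKSPACE}]"]
    for key, stem, handler, url, message in SERVICES:
        descriptor = service_descriptor(handler, url, message)
        (directory / f"{stem}.json").write_text(json.dumps(descriptor, indent=2) + "\n")
        lines.append(f"{key} = $(CONFIG_DIR)/{stem}.json")
    (directory / "branch-services.config").write_text("\n".join(lines) + "\n")
```

For example, write_config(Path("branch-services")) produces a directory that can be passed to lb web-server load --config as shown above.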

The content of the branch-services.config file can be added to the default lb-web-server.config file to cause these services to be initialized each time LogicBlox services are started on a machine. Just make sure that the *.json files are in the same directory as the lb-web-server.config file, or adjust their paths appropriately.

41.5.2. Example

As an example, JSON can be used to send a GetBranchNames request to a workspace by using the lb web-client command.

$ echo '{ "request": { "workspace": "/test", "include_auto_versions": false } }' | lb web-client call http://localhost:55183/s_example/get-branch-names
{"response": {"names": ["master","nil"]}}
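
The same request can be sent from other languages. This is a minimal Python sketch using only the standard library. It assumes that the services were installed with the example prefix above, that lb-web listens on its default internal port 55183 on localhost, and that other ConnectBlox messages use the same {"request": ...} JSON envelope shown for GetBranchNames (only that one is confirmed by the example). The Content-Type header is also an assumption.

```python
import json
import urllib.request

# Assumed values: example service prefix on the default internal lb-web port.
BASE_URL = "http://localhost:55183/s_example"


def build_request(**fields) -> bytes:
    """Wrap message fields in the {"request": ...} envelope used above."""
    return json.dumps({"request": fields}).encode("utf-8")


def call(service: str, **fields) -> dict:
    """POST a ConnectBlox message to `service` and return the decoded JSON reply."""
    req = urllib.request.Request(
        f"{BASE_URL}/{service}",
        data=build_request(**fields),
        # Assumption: lb-web accepts a JSON body labeled with this content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires a running lb-web server with the branch services installed):
#   reply = call("get-branch-names", workspace="/test", include_auto_versions=False)
#   print(reply["response"]["names"])
#   call("create-branch", workspace="/test", branch="branch_1", from_branch="master")
```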