Chapter 5. Backup & Recovery

5.1. Exporting Databases using Hotcopy

LogicBlox supports hotcopy for making copies of the database that are consistent snapshots. With hotcopy, the database can be copied while transactions are being processed and potentially modifying the database state. The state of the database copy will be identical to that of the source database at the moment the hotcopy process started, even though the copying takes place in parallel with concurrent write transactions.

Note: before LogicBlox 4.3.6.1, write transactions on the source database were blocked during hotcopy operations due to an oversight in the implementation.

As with other database systems, it is not possible to naively make a file system copy of the database while it is processing transactions. File system copies are not atomic operations, so the resulting database would not be a consistent snapshot and would likely be unusable (corrupted).

To hotcopy a database to a backup location, use lb export-workspace. For example, to export the database test to the local directory /mnt/backup/copy_of_test, use:

$ lb export-workspace test /mnt/backup/copy_of_test

The database is copied using a socket connection to the database server, so the database server process does not need permission to write to the target location.

The hotcopy operation can also be executed from a different machine. If lb-server is running on a machine serv1, we can hotcopy a database from serv1 directly to the file system of another machine, for example remote1. The LogicBlox database software needs to be installed on remote1 as well, and remote1 needs to be able to communicate with the lb-server running on serv1. To verify that communication is possible, use

user@remote1$ lb --hostname serv1 status

on remote1. If the lb-server on serv1 listens on a non-standard port, the --port option can be used. It might be necessary to use the fully-qualified DNS name of serv1 or its IP address. For example, to create a local copy of the database test on serv1 in the directory /mnt/backup/copy_of_test on the machine remote1, use the following command on remote1:

user@remote1$ lb --hostname serv1 export-workspace test /mnt/backup/copy_of_test

5.2. Importing Databases

To restore a database that has been exported, use lb import-workspace. For example, to import the database located in the directory /mnt/backup/copy_of_test into the lb-server as workspace restored, use:

$ lb import-workspace restored /mnt/backup/copy_of_test

The import-workspace operation requires the lb-server process to have read permission to the source database files. Due to the direct usage of the file system by the server, importing databases from a remote server is currently not supported.

LogicBlox databases are stored in the workspaces directory in the same format as they are stored after export, albeit under a different directory name. It is thus possible, though not encouraged, to simply copy an exported database into the lb-server workspaces directory, but only while lb-server is not running. This can be exploited to directly import a database stored on a remote server without creating a local copy first. Before copying into the lb-server workspaces directory, obtain the directory name where the database will be stored via lb filepath test. Then create this directory and copy the database files into it. Once lb-server is started, the database test will be available.
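
A minimal sketch of this procedure, assuming the exported database lives in /mnt/backup/copy_of_test and that lb-server is stopped and started by whatever mechanism your deployment uses:

$ dir=$(lb filepath test)                  # ask where workspace 'test' will be stored (lb-server must be running)
# stop lb-server here, by whatever mechanism your deployment uses
$ mkdir -p "$dir"
$ cp -a /mnt/backup/copy_of_test/. "$dir"  # copy the exported database files into place
# start lb-server again; workspace 'test' is now available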

5.3. Backup Strategies

Cloud Deployments

When the LogicBlox database is deployed on cloud infrastructure such as AWS, EBS snapshots of the device containing the workspaces are a good method for making backups. The snapshotting facility of cloud block devices is incremental, so taking snapshots regularly should be efficient. Note, though, that fully initializing a volume from a snapshot can take many hours, so if an application is restored from a snapshot, its performance can initially be significantly worse.

As long as the snapshotting facility used is an atomic operation that creates a consistent snapshot of the file system, the database can be actively processing transactions when the snapshot procedure is started.
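
For example, such a snapshot can be triggered with the AWS CLI. This is a hedged sketch: the volume ID below is a placeholder for the EBS volume that actually backs your workspaces directory:

$ aws ec2 create-snapshot \
    --volume-id vol-0123456789abcdef0 \
    --description "LogicBlox workspaces backup"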

If a striped RAID configuration is used, then it is not possible to obtain a consistent snapshot by snapshotting the striped EBS volumes separately.

As an alternative to storing databases on EBS volumes, databases can also be stored on local instance storage. Because instance storage can easily be lost, it is best to combine this with replication.

Non-Cloud Deployments

For large databases, it may not be feasible to use full copies of the database as a backup solution. However, if the database is relatively small, then hotcopy can be used for backing up workspaces. LogicBlox currently does not support incrementally updating copies of the database to a new version.
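
For example, a nightly cron job could take a dated full hotcopy of the workspace; the backup path here is illustrative:

$ lb export-workspace test /mnt/backup/test-$(date +%F)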

If the file system supports taking atomic snapshots (e.g., btrfs), or if the device layer supports snapshotting in cooperation with the file system (e.g., LVM with ext3/ext4), then it is also possible to take snapshots of the partition containing the LB_DEPLOYMENT_HOME/workspaces directory. Because of the way LogicBlox organizes data and commits transactions, it is safe to take atomic snapshots of the file system for backup at any time, even while write transactions are being processed concurrently. Note that simply copying the workspace folders is not an atomic operation and can easily create a corrupted workspace.
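
A hedged sketch of such a snapshot using LVM; the volume group (vg0), logical volume (lb-data), and mount point names are assumptions:

$ lvcreate --snapshot --size 10G --name lb-backup /dev/vg0/lb-data   # atomic snapshot of the volume holding the workspaces
$ mount -o ro /dev/vg0/lb-backup /mnt/lb-snapshot                    # mount the snapshot read-only
$ rsync -a /mnt/lb-snapshot/ /mnt/backup/workspaces/                 # copy the frozen state to backup storage
$ umount /mnt/lb-snapshot
$ lvremove -f /dev/vg0/lb-backup                                     # release the snapshot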

Tagging Database Versions for Point-in-time Recovery

LogicBlox supports tagging database versions, which can be used as a form of point-in-time recovery. If a database has multiple branches, each branch has to be tagged separately. Until they are deleted, tagged versions use disk space in the database files. Depending on the application and its update pattern, this could grow the database fairly quickly.

Backup vs Replication

Replication (discussed in the next section) is in principle the best method for keeping an up-to-date copy of the database available for recovery or failover. Combined with tagging, it also supports point-in-time recovery. However, if a deployment uses replication as its only recovery strategy, then effectively just one copy of the database is kept: if an operation somehow corrupts the primary as well as the secondary databases, recovery to an earlier point in time is not possible and no recovery option is available at all. We recommend combining replication with a backup strategy that keeps strictly separate physical copies of the database.

5.4. Replication

LogicBlox supports online replication, where transactions committed to a workspace on a primary server are automatically copied to a read-only workspace on one or more secondary servers. The replicated workspaces can answer read-only queries. If the primary server fails, one of the secondary servers can be “promoted” to the primary role.

5.4.1. Overview

This section gives a short tour of the replication facilities. In the examples below, we assume that there are two machines with host names primary and secondary, both running lb-server. The machine secondary must be able to connect to TCP port 5518 on primary. We use the prefix primary$ to denote commands executed on the primary server, and secondary$ for commands on the secondary server.

Note

You can try out replication on a single machine by running multiple lb-server instances. For example, the following starts an instance that listens on port 2000 and stores workspaces underneath /tmp/secondary:

$ lb-server --logfile /dev/stderr -p 2000 --adminPort 2001 --workspaceFolder /tmp/secondary

You can connect to this instance using lb’s --port flag, e.g. lb --port 2000 print test foo.

We start by creating a simple workspace named test on the primary:

primary$ lb create test
primary$ lb addblock test 'foo(x) -> int(x).'
added block 'block_1Z2N9P2J'
primary$ lb exec test '+foo(x) <- int:range(1, 10, 1, x).'

We can now tell the secondary server to start replicating workspace test:

secondary$ lb start-mirror test primary

When this command returns, the secondary will have a complete copy of the latest version of the workspace.

Note

With LogicBlox replication, the secondary servers connect to the primary, not the other way around. This is to make configuration easier: we do not need to change the configuration of the primary as secondaries come and go. If the primary were responsible for connecting to the secondaries, it would always need to have an up-to-date list of secondaries.

However, once the connection is established, it is the primary that pushes new versions of the workspace to the secondaries; secondaries do not need to poll continuously for updates.

The secondary cannot be modified, but it can answer read-only queries, e.g.

secondary$ lb print test foo
1
…
10

Note that lb query needs the flag --readonly:

secondary$ lb query --readonly test '_[x] = x * x <- foo(x).'
/--------------- _ ---------------\
1  1
2  4
…
10 100
\--------------- _ ---------------/

After replication has started, the secondaries automatically receive new versions of the workspace from the primary during or after every transaction. For example:

primary$ lb exec test '+foo(11).'

secondary$ lb query --readonly test '_[] = x <- agg<<x = max(y)>> foo(y).'
11

You can use the command lb info to get information about the status of a workspace. For instance, on the primary, it will show something like this:

primary$ lb info test
commit_version: 4
is_mirror: false
active_mirror_count: 1
…

The line commit_version: 4 shows that there have been four transactions in this workspace; is_mirror: false denotes that this is a primary workspace; and active_mirror_count: 1 indicates that one secondary server is currently replicating this workspace. Here is the output on the secondary:

secondary$ lb info test
commit_version: 4
is_mirror: true
is_mirroring: true
pages_received: 154
versions_received: 4
…

Here, the commit version is also 4, meaning that the secondary is up to date. The line is_mirroring: true shows that the secondary is currently connected to the primary. There are also some connection statistics: the secondary has received 4 versions and 154 database pages (each of which is 32 KiB).
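
These fields are also convenient for scripted monitoring. A small hedged sketch that reports when the secondary has lost its connection to the primary:

secondary$ lb info test | grep -q 'is_mirroring: true' || echo 'replication is down'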

You can temporarily stop replication using lb stop-mirror:

secondary$ lb stop-mirror test

secondary$ lb info test | grep is_mirroring
is_mirroring: false

In this state, the secondary does not receive updates from the primary. It can still answer read-only queries. You can resume replication in the same way you started the initial replication, using lb start-mirror:

secondary$ lb start-mirror test primary

Resumption is incremental: the primary will only send the delta between the most recent versions on the secondary and primary.

The final operation on a secondary workspace is to promote it, turning it into a regular (primary) workspace that can be modified and that can be replicated. This is done as follows:

secondary$ lb promote-mirror test

secondary$ lb info test | grep is_mirror
is_mirror: false

This operation is irreversible: it is not possible to demote a primary.
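
Once promoted, the workspace accepts writes like any other primary workspace, e.g.

secondary$ lb exec test '+foo(12).'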

5.4.2. Synchronous and Asynchronous Replication

LogicBlox supports synchronous as well as asynchronous replication. With synchronous replication, a transaction using diskcommit will not complete until the transaction is durably replicated to all connected mirrors.

The behavior for each combination of replication_mode and commit_mode on the primary is as follows:

  • replication_mode synchronous, commit_mode diskcommit: Transactions do not complete until they have been written to the primary disk and have been replicated to the mirrors.

  • replication_mode synchronous, commit_mode softcommit: Transactions complete before they have been written to the primary disk. Replication might also still be in progress. This configuration is currently not useful.

  • replication_mode asynchronous, commit_mode diskcommit: Transactions do not complete until they have been written to the primary disk. Replication might still be in progress.

  • replication_mode asynchronous, commit_mode softcommit: Transactions complete before they have been written to the primary disk. Replication might also still be in progress.

In addition to the configuration on the primary server, the commit_mode setting on the secondary server can be used to refine the durability requirements. The commit_mode on the secondary determines when replication is considered complete on a mirror. If the setting is softcommit, replication is considered complete before the transaction has been durably written to disk on the secondary. If the setting is diskcommit, the changes are written to disk before replication is considered complete.

Synchronous replication of transactions only starts after an initial full replication of the database has completed. The purpose of this is to avoid blocking write transactions during the initial full replication, which could take many hours if the database is large.
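
The exact mechanism for setting replication_mode and commit_mode depends on the deployment. Purely as an illustration, assuming they appear as plain key/value entries in the lb-server configuration (the syntax here is an assumption; consult the configuration reference for your release):

replication_mode = synchronous   # hypothetical syntax; consult your configuration reference
commit_mode = diskcommit         # hypothetical syntax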

5.4.3. Monitoring and Database Information

Progress tracking

Commit version and commit timestamp

5.4.4. Limitations

Replication currently has a number of limitations:

  • There is no automatic failover yet: secondaries are not automatically promoted if the primary fails, and there is no election mechanism to select a secondary.

  • Secondaries do not yet automatically reconnect to the primary if the primary is restarted or the connection is interrupted. As a workaround, you can call lb start-mirror periodically from a cron job or similar (see the sketch after this list); the command does nothing if replication is already active.

  • Secondaries are read-only. All changes must be performed on the primary workspaces.

  • While resuming replication is incremental (only the changes, rather than the full workspace, are sent), the transfer of the initial version is not. Thus, if the initial lb start-mirror command is interrupted during the creation of a new secondary, a subsequent call to lb start-mirror must start over from scratch.

  • The TCP connection between the primary and the secondaries is unencrypted. Thus, if they need to communicate across untrusted network links, it is important to use IPsec or a similar technology to establish a secure channel between the machines.
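
As a sketch of the reconnection workaround mentioned above, a crontab entry on the secondary could retry mirroring of workspace test every minute; the command is a no-op while mirroring is already active (this assumes lb is on cron's PATH):

* * * * * lb start-mirror test primary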