LogicBlox 4.0.6

Release Date: January 7th 2014

Executive Summary

LogicBlox 4.0.6 introduces support for automatic parallel evaluation of a query across multiple cores. Domain parallelism, a mechanism for splitting data into pieces such that a query can be evaluated on each piece in parallel, allows LogicBlox to scale its performance with the number of cores available in a single machine. Applications should expect dramatic performance improvements in both data load and query evaluation.

In addition to domain parallelism, 4.0.6 includes enhancements to the services framework that enforce stricter checking of incoming service messages, as well as support for specifying Java manifest files in the lb config tool.

What's New

Domain Parallelism

We are excited to introduce domain parallelism with LogicBlox 4.0.6. With domain parallelism, the data needed for a given query is split into pieces on the fly, so that the query can be evaluated on all pieces in parallel, using all available processor cores. Applications built using LogicBlox 4.0.6 can expect significant performance gains on multi-core machines for data load, queries, and inactive blocks.
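The idea of splitting a query's domain and evaluating each piece in parallel can be illustrated with a minimal Python sketch. This is not how LogicBlox implements domain parallelism; the function names (split_domain, total_per_customer, parallel_totals) and the sample data are hypothetical, and Python threads only show the structure of the approach rather than a real multi-core speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split_domain(keys, pieces):
    """Partition a key domain into contiguous, roughly equal ranges."""
    keys = sorted(keys)
    size = max(1, -(-len(keys) // pieces))  # ceiling division, at least 1
    return [keys[i:i + size] for i in range(0, len(keys), size)]


def total_per_customer(orders, customer_ids):
    """The 'query': sum order amounts for one piece of the customer domain."""
    wanted = set(customer_ids)
    totals = {}
    for customer, amount in orders:
        if customer in wanted:
            totals[customer] = totals.get(customer, 0) + amount
    return totals


def parallel_totals(orders, pieces=4):
    """Evaluate the query on every piece concurrently and merge the results."""
    domain = {customer for customer, _ in orders}
    parts = split_domain(domain, pieces)
    result = {}
    with ThreadPoolExecutor(max_workers=pieces) as pool:
        partials = pool.map(lambda part: total_per_customer(orders, part), parts)
        for partial in partials:
            # Pieces are disjoint, so no key appears in two partial results.
            result.update(partial)
    return result


# Hypothetical sample data: (customer, amount) pairs.
orders = [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 3), ("bob", 1)]
print(parallel_totals(orders, pieces=2))
```

Because the pieces of the domain are disjoint, the partial results can be merged without any coordination between workers, which is what lets this style of evaluation scale with the number of cores.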

Figure 2 and Figure 3 illustrate the performance of LogicBlox 4.0.6 on the data load of TPC-H, a standard decision support benchmark. The scale factor on the x-axis corresponds to the size of the data set in gigabytes. The dark blue line in both charts is LogicBlox. The systems we compare ourselves to are PostgreSQL, MySQL, Amazon Redshift, MonetDB, and a commercial in-memory column store (legend left out for anonymization). We compare the duration of the load on two different AWS EC2 instance types: the hs1.8xlarge instance has 8 physical cores, and the cr1.8xlarge instance has 16 cores. All experiments are configured to use all cores. With domain parallelism, LogicBlox completes the data load faster than all other systems, and the gap widens when more processor cores are available.

Figure 2.  TPC-H Data Load - cr1.8xlarge

Figure 3.  TPC-H Data Load - hs1.8xlarge

Figure 4 illustrates the performance gains of LogicBlox 4.0.6 compared to LogicBlox 3.x for the 22 queries of TPC-H, translated into LogiQL. The combined benefit of domain parallelism, our novel join algorithm, and accurate cost estimation results in a dramatic speed-up in total query evaluation time.

Figure 4. LogicBlox 4.0.6 vs. 3.x for 22 TPC-H queries

LogicBlox has also evaluated domain parallelism on several other benchmarks, such as a variety of graph queries and queries using complex joins. We have consistently seen query performance scale well with the number of available cores. Some of these experiments were previously presented at the LogicBlox User Days 2013.

Note

Domain parallelism is currently enabled only for data loading and for the evaluation of queries and inactive blocks. Evaluation and maintenance of IDB predicates will be parallelized in a subsequent release.

Services Framework

A protobuf service can now specify whether it parses JSON strictly or loosely. The default behavior, strict, causes an error if the JSON contains fields that are not specified in the protobuf specification. To specify that JSON messages should be parsed loosely, use the json_parsing predicate in the service configuration. The following example illustrates how to specify a service using loose parsing.

Example 3. Example of loose JSON parsing of a service

service_by_prefix["/loose-time"] = x,
    default_protobuf_service(x) {
      protobuf_protocol[] = "time",
      json_parsing[] = "loose",
      protobuf_request_message[] = "Request",
      protobuf_response_message[] = "Response"
    }.
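The difference between the two modes can be sketched in Python as follows. This is only an illustration of the semantics described above, not the services framework's code: the function parse_request and the field set REQUEST_FIELDS are hypothetical, and the assumption that loose parsing silently drops undeclared fields is ours, since the text above only states that strict parsing rejects them.

```python
import json

# Hypothetical stand-in for the fields declared in a protobuf "Request" message.
REQUEST_FIELDS = {"timezone"}


def parse_request(text, mode="strict"):
    """Parse a JSON object, handling undeclared fields according to mode."""
    if mode not in ("strict", "loose"):
        raise ValueError("unknown parsing mode: %r" % mode)
    message = json.loads(text)
    unknown = sorted(set(message) - REQUEST_FIELDS)
    if unknown and mode == "strict":
        # Strict parsing: any field missing from the specification is an error.
        raise ValueError("unknown fields: " + ", ".join(unknown))
    # Loose parsing (assumed behavior): keep only the declared fields.
    return {name: value for name, value in message.items() if name in REQUEST_FIELDS}


payload = '{"timezone": "UTC", "debug": true}'
print(parse_request(payload, mode="loose"))
```

Calling parse_request(payload) with the default strict mode raises a ValueError for the undeclared debug field, which mirrors the stricter checking of incoming service messages introduced in this release.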

Developer Tools

The jar() function of the LB configuration tool now takes a manifest parameter that specifies how to generate the manifest. The manifest can be taken from a static file, or it can be generated from a main class and the jars on the classpath.

Corrected Issues

The issues listed below have been corrected since the 4.0.5 release.

  • The default search path for Measure Service handlers has changed to $(...)/lib/java/handlers, so that the LB web-server no longer loads .jar files unintentionally.
  • The lb services status command now prints the status OFF when services are not started.

Installation and Upgrade Information

Installing LogicBlox 4.0.6 is as simple as following the steps outlined below:

  1. Download the installation package
  2. Extract the tarball in <YourPreferredInstallDirectory>
  3. Run the following command:
    source <YourPreferredInstallDirectory>/logicblox-4.0.6/etc/profile.d/logicblox.sh
    
    NOTE: this script will set all the necessary environment variables. You might want to add this command to your .bashrc.

Release Information

Table 3. 

Server requirements
  • Operating System: 64-bit Linux
  • Java Runtime Environment 1.7. You can check your Java version by executing java -version.
  • Python 2.7