LogicBlox 4.0

Release Date: July 2nd 2013

Executive Summary

LogicBlox 4.0 unveils the foundation for the LogicBlox Smart Database technology. The Smart Database delivers raw performance, a simplified programming model and reduced database administration overhead, together with full ACID-compliance.

  • Innovations in the core query evaluation algorithm and the concurrency model enable LogicBlox 4.0 to outperform LogicBlox 3.x across a wide spectrum of query workloads: transactional, analytical, and graph. For certain query workloads, LogicBlox 4.0 outperforms industry-leading systems.
  • Innovations in query optimization and continued improvements to the programming model reduce the need to manually tune both queries and the database. Preliminary benchmarks show that for natural queries (those that have not been hand-optimized by a human expert), LogicBlox 4.0 delivers query evaluation times competitive with those of hand-optimized queries on LogicBlox 3.x.
  • Beyond full ACID compliance, LogicBlox 4.0 provides full serializability as the default isolation level, so programmers no longer have to reconcile anomalies that can arise under weaker isolation levels.

LogicBlox 4.0 supports applications built using the BloxWeb service-oriented architecture. Some modification of logic may be necessary. Please refer to the migration guide for details.

What's New

The more appropriate question for this release is, "What's not new?" The LogicBlox 4.0 runtime was reinvented from the ground up to improve raw query evaluation as well as concurrent read/write transaction performance. We show benchmarks that demonstrate the improvements in query evaluation and the handling of concurrent transactions. LogicBlox 4.0 continues to evolve the language, as well. The number of parameters a programmer needs to tune in order to meet an application's performance requirements has been dramatically reduced. We outline these simplifications. We also use benchmarks to demonstrate the effectiveness of automatic query optimization and tuning over manual efforts in LogicBlox 4.0.

On Performance

On Query Evaluation

One of the core innovations in LogicBlox 4.0 is its join algorithm. We demonstrate its current performance characteristics using three sets of benchmarks: TPC-H; realistic queries, extracted from an existing application, that proved problematic for LogicBlox 3.10; and the 4-clique query, which demonstrates the applicability of LogicBlox 4.0 to graph algorithms.

Figure 1 compares the performance of LogicBlox 4.0 on TPC-H against LogicBlox 3.10 and LogicBlox 4.0-parallel. (Parallel evaluation is not included in LogicBlox 4.0, but is expected in a subsequent release.) TPC-H is the industry-standard benchmark suite for OLAP databases; it includes 22 complex queries over large volumes of data. These queries are representative of the queries seen in typical LogicBlox applications. Figure 1 shows that LogicBlox 4.0 takes only half as long as LogicBlox 3.10 to complete the entire TPC-H suite. The median query time is 24.6 seconds on LogicBlox 4.0, compared with 115.1 seconds on LogicBlox 3.10.

Figure 1. TPC-H Scale 10

Figure 2 compares the performance of LogicBlox 4.0 and LogicBlox 3.10 on two queries extracted from an existing project. These queries were found to cause application performance issues on LogicBlox 3.10. The figure shows that LogicBlox 4.0 improves on LogicBlox 3.10 performance by an order of magnitude. It further shows that running the same query a second time (columns 2 and 4) yields additional gains, because the indexes created during the query's first run are reused.
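The effect of reusing indexes on a second run can be illustrated with a toy Python model. This is a hypothetical sketch, not the LogicBlox storage engine: the `Relation` class and its on-demand hash indexes are assumptions made purely for illustration. The first lookup on a column pays for a full scan to build an index; later lookups on that column reuse it.

```python
class Relation:
    """Toy relation that builds per-column hash indexes on demand (hypothetical model)."""

    def __init__(self, rows):
        self.rows = list(rows)
        self._indexes = {}        # column position -> {value: [matching rows]}
        self.index_builds = 0     # counts how many indexes have been built

    def lookup(self, column, value):
        index = self._indexes.get(column)
        if index is None:
            # First query on this column: scan every row once to build the index.
            index = {}
            for row in self.rows:
                index.setdefault(row[column], []).append(row)
            self._indexes[column] = index
            self.index_builds += 1
        # Subsequent queries on the same column reuse the existing index.
        return index.get(value, [])
```

Two lookups on the same column trigger only one index build; the second query skips the scan entirely, which is the kind of saving the second-run columns reflect.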

Figure 2. Sample problematic queries on 3.10

Finally, we demonstrate the performance of LogicBlox 4.0 on a graph query, 4-clique. A clique is a set of vertices in which every vertex is connected to every other vertex; the 4-clique query finds cliques of four vertices. Like many graph queries, 4-clique makes heavy use of self-joins, and specialized databases have been built specifically to support such queries. As shown in Figure 3, LogicBlox 4.0 (red line) outperforms PostgreSQL, Amazon Redshift, a commercial in-memory column store, and graph databases. This benchmark illustrates the unique promise of the LogicBlox database to unify the processing of transactional, analytical, and graph workloads.
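To see why 4-clique stresses the join engine, here is a minimal Python sketch that enumerates 4-cliques by repeatedly joining partial cliques against the edge relation. It illustrates the self-join pattern only and is not the LogicBlox join algorithm; the function name `four_cliques` and the vertex-ordering trick (emitting each clique once as a < b < c < d) are our own choices.

```python
def four_cliques(edges):
    """Return every 4-clique of an undirected graph as a sorted tuple (a, b, c, d)."""
    # Store each undirected edge once, as (low, high), dropping self-loops.
    edge = {(min(u, v), max(u, v)) for u, v in edges if u != v}
    # Index: vertex -> set of its higher-numbered neighbours.
    higher = {}
    for u, v in edge:
        higher.setdefault(u, set()).add(v)

    cliques = []
    for a in sorted(higher):
        for b in sorted(higher[a]):                   # join 1: edge(a, b)
            ab = higher[a] & higher.get(b, set())     # joins 2-3: edge(a, c), edge(b, c)
            for c in sorted(ab):
                abc = ab & higher.get(c, set())       # joins 4-6: edge(a|b|c, d)
                for d in sorted(abc):
                    cliques.append((a, b, c, d))
    return cliques
```

Each level of nesting is another join of the same edge relation with itself, which is why the query is a useful stress test for multi-way self-joins. On a complete graph of five vertices, the function returns the five 4-vertex subsets.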

Figure 3. 4-Clique

On Concurrency

LogicBlox 4.0 implements a novel, lock-free concurrency model. This model is key to enabling high transaction throughput for mixed workloads of concurrent read and write, long- and short-running transactions. LogicBlox 4.0 currently supports fully concurrent read transactions, while serializing writes. Concurrent writes are planned for future releases.
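The general pattern of concurrent reads with serialized writes can be sketched in Python as a single-writer store that publishes immutable snapshots. This is a simplified, hypothetical model, not the LogicBlox 4.0 implementation: readers here take no lock, while writers are serialized by an ordinary lock, and `SnapshotStore` is a name invented for this example.

```python
import threading
from types import MappingProxyType


class SnapshotStore:
    """Hypothetical single-writer, multi-reader store built on immutable snapshots."""

    def __init__(self):
        self._version = MappingProxyType({})   # current committed, read-only snapshot
        self._write_lock = threading.Lock()    # serializes writers; readers never take it

    def read(self):
        # A read transaction captures the current snapshot; later commits
        # never modify it, so no lock is required.
        return self._version

    def write(self, updates):
        # Write transactions run one at a time.
        with self._write_lock:
            new_state = dict(self._version)    # copy-on-write
            new_state.update(updates)
            # Publishing is a single reference rebinding: readers see either
            # the old snapshot or the new one, never a partial update.
            self._version = MappingProxyType(new_state)
```

Because a reader holds on to the snapshot it started with, a long-running read is unaffected by writes that commit after it begins, and adding readers does not contend with the writer.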

Figure 4 shows the scalability of LogicBlox 4.0 using a micro-benchmark that measures the number of trivial read/write transactions per second on a small database as the number of threads available to the runtime increases. The blue and yellow lines show that for concurrent access with all readers, or with one writer alongside readers, LogicBlox 4.0 achieves nearly perfect, linear speedup as more threads are allocated to it. The green line shows the impact of performing ad hoc queries without pre-compilation; its performance reflects the cost of compiling queries. Ad hoc queries are unlikely among users of deployed applications; they are more likely to affect application support personnel, who may use them to inspect the state of an application.

Figure 4. Perfect scaling of read/write concurrent transactions

We also include three benchmarks showing transaction throughput at various levels of connectivity:

  • Figure 5 shows the read-only transaction throughput of LogicBlox 4.0 when transactions are executed directly against the lowest-level C++ API into the workspace. This benchmark is a good indicator of what the runtime is capable of, independent of the overhead imposed by higher-level layers such as the database server (ConnectBlox) and the service container (BloxWeb).
  • Figure 6 shows the throughput of a simple read-only transaction executed through the database server layer. Throughput still scales nearly perfectly with the number of threads. Compared with Figure 5, however, throughput drops by around 25%, which can be attributed to the overhead of the database server, communication over TCP sockets, and the additional resources needed to run separate client applications.
  • Figure 7 shows the throughput of a slightly more involved read-only transaction, which takes an input parameter from the client and produces output data. Scaling with respect to the number of threads remains near-perfect, but the additional overhead of marshalling input and output data lowers throughput.

Both Figure 6 and Figure 7 further demonstrate that LogicBlox 4.0 achieves orders of magnitude improvement in transaction throughput over LogicBlox 3.10.

Figure 5. Transaction throughput at the lowest database connectivity layer

Figure 6. Transaction throughput of a simple read-only transaction through the ConnectBlox server layer

Figure 7. Transaction throughput of a more involved read-only transaction through the ConnectBlox server layer

On Simplifying Programming Model and Database Administration

LogicBlox 4.0 includes changes to the programming language that reduce the number of choices a programmer has to make to produce a correct, performant program.

Simplified numeric types

LogicBlox 4.0 supports a single decimal type: a fixed-point decimal type that replaces the floating-point decimal types decimal[64] and decimal[128]. Fixed-point decimals give exact results for addition and subtraction, and allow certain aggregations to be optimized. The fixed-point decimal type is also recommended over the floating-point types float[32] and float[64].
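The difference between floating-point and fixed-point arithmetic is easy to see in Python. The sketch below represents fixed-point values as integer counts of a fixed unit; the scale of four fractional digits is an arbitrary assumption, and nothing here reflects how LogicBlox stores its decimal type internally.

```python
from decimal import Decimal

SCALE = 10_000  # hypothetical scale: four fractional digits


def to_fixed(text):
    """Parse a decimal string into an integer count of 1/SCALE units."""
    scaled = Decimal(text) * SCALE
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{text} needs more than 4 fractional digits")
    return int(scaled)


def from_fixed(units):
    """Convert integer units back to an exact Decimal."""
    return Decimal(units) / SCALE


# Binary floating point cannot represent 0.1 exactly, so errors accumulate.
float_sum = sum([0.1] * 10)        # not exactly 1.0
float_diff = 0.3 - 0.1 - 0.2       # not exactly 0.0

# With fixed-point units, addition and subtraction are plain integer arithmetic.
fixed_sum = sum([to_fixed("0.1")] * 10)
fixed_diff = to_fixed("0.3") - to_fixed("0.1") - to_fixed("0.2")
```

Because sums of fixed-point values are sums of integers, they are exact and independent of summation order, which is one reason such a representation lends itself to optimizing aggregations.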

Furthermore, LogicBlox 4.0 implements aggressive data compression. Thus, it is no longer necessary to declare integer types of various bit-width in an attempt to save storage space. We encourage the use of int[64].

Simplified predicate properties

LogicBlox 4.0 no longer requires the declaration of certain predicate properties regarding data storage and locking. The following predicate properties are no longer necessary, and will be removed in subsequent 4.x releases:

  • entity capacities
  • storage models
  • locking policies
  • scalable types
  • supplemental indexes

On automatic query optimization

We conclude with a benchmark illustrating the effectiveness of the automated query optimization provided by LogicBlox 4.0. The first two columns of Figure 8 ("query1-first" and "query1-second") show the running times of a query on 3.10 and on 4.0, the first and second time it is run. The second two columns ("query2-first" and "query2-second") show the running times of a logically equivalent query that has been hand-tuned for 3.10.

The chart illustrates that without any hand-tuning, LogicBlox 4.0 achieves a 4x performance gain over LogicBlox 3.10. After the effort of hand-tuning, however, LogicBlox 3.10 evaluates the query faster than the non-parallel LogicBlox 4.0 (but still slower than the parallel version, expected in subsequent 4.x releases).

The key takeaway from the benchmark is that LogicBlox 4.0 can provide very good performance without programmers incurring the cost of hand-tuning their queries. In addition to letting programmers express logic more naturally, system-generated optimizations can adapt, re-optimizing as the characteristics of the data in the database change. We encourage programmers to evaluate the performance of their natural queries on LogicBlox 4.0 without applying any manual optimization.

Figure 8. Effect of Tuning

Known Limitations and Discontinued Features

LogicBlox 4.0 supports only applications built on the service-oriented framework. Blade-based applications are not currently supported.

Additionally, the following features are not supported, but are planned for subsequent releases:

  • Delimited File Services can only be used to import/export simple files. Features such as error reporting or optional columns are not yet supported.
  • Mathematical programming and machine learning extensions.
  • Default value predicates.
  • Ordered entities.
  • Replace and remove block.
  • choice and ambig predicate-to-predicate mappings.
  • String concatenation aggregations.
  • argmin and argmax. To get the argument, compute the minimum or maximum with an aggregation, then use an additional join to find the entries that have that value.
  • There is limited support for user-level logging. More detailed logging levels will be supported in subsequent releases.
  • Strings stored in predicates can be at most 255 characters long. Post-4.0 versions will remove this limit.
  • Predicates can have arity at most 64. Post-4.0 versions will remove this limit.
  • A few primitive predicates are missing. These include: boolean:hash, int64:hash, floatXX:hash, datetime:hash, float64:isFinite, and floatXX:round2.
  • We do not detect whether transactions that are marked as read-only are actually read-only. Changes made by transactions marked as read-only are usually not committed to the database; occasionally, however, we do commit such transactions (so that subsequent transactions can use indices created in them). Until we verify that read-only transactions do not change the database, it is the user's responsibility to ensure that they do not.
  • For functional EDB predicates, retractions are performed based on key match only, rather than on key-value match. For example, -a["foo"]="bar" removes a["foo"]="quz"; i.e., it behaves like -a["foo"]=_. This will be corrected in a future release.
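Two of the items above, key-only retraction and the argmin workaround, can be modeled with a Python dictionary standing in for a functional predicate `a[key] = value`. These helpers are hypothetical illustrations of the described semantics, not LogicBlox APIs.

```python
def retract_key_value(pred, key, value):
    """Key-value retraction: remove a[key] only if it currently maps to value."""
    if key in pred and pred[key] == value:
        del pred[key]


def retract_key_only(pred, key, value):
    """Key-only retraction, the 4.0 behavior described above.

    The value argument is ignored, so -a["foo"]="bar" also removes
    a["foo"]="quz", just like -a["foo"]=_.
    """
    pred.pop(key, None)


def argmin_via_join(pred):
    """argmin workaround: aggregate the minimum, then join it back.

    Returns every key whose value equals the minimum, in sorted order,
    so ties yield more than one key. Raises ValueError if pred is empty.
    """
    min_value = min(pred.values())                               # step 1: aggregation
    return sorted(k for k, v in pred.items() if v == min_value)  # step 2: join
```

Applied to `a = {"foo": "quz"}`, `retract_key_value(a, "foo", "bar")` leaves `a` unchanged, whereas `retract_key_only` empties it, which is the discrepancy that a future release will correct.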

The following features are removed, and will not be supported in subsequent releases:

  • Floating-point decimal primitive types (decimal[64] and decimal[128]) are replaced by a single fixed-point decimal type.
  • Primitive type color is not supported.
  • Meta-predicates such as system:Predicate are not supported. Future versions of meta-predicates will likely differ substantially from how meta-predicates are handled in 3.x.
  • MoReBlox. MoReBlox continues to be supported in the LogicBlox 3.x series, but support for generic programming in future releases is being redesigned and will differ enough that existing MoReBlox programs will require a rewrite.
  • MochaBlox and WrappedControls. While these continue to be supported in the LogicBlox 3.x series, we encourage application programmers to use the user interface framework of their choice and to communicate with the workspace over the service interface.

Installation and Upgrade information

Installing LogicBlox 4.0 is as simple as following the steps outlined below:

  1. Download the installation package.
  2. Extract the tarball into <YourPreferredInstallDirectory>.
  3. Run the following command:
    source <YourPreferredInstallDirectory>/logicblox-4.0.0/etc/profile.d/logicblox.sh
    
    NOTE: this script will set all the necessary environment variables. You might want to add this command to your .bashrc.

Release Information

Table 9. Server requirements

  Component                    Requirement
  Operating system             64-bit Linux
  Java Runtime Environment     1.7, update 11 or higher
  Python                       2.6.4