LogicBlox 4.4.0

Release Date: January 3rd 2017

Executive Summary

LogicBlox 4.4 introduces significant performance improvements to both the database and the tools:

  • The LogicBlox 4.4 database uses write-optimized data structures. Applications whose workloads have random update behavior should see significant performance improvements.

  • IDB rules defining min/max aggregations should see much improved performance during incremental maintenance.

  • Better export performance for TDX services that export default-valued predicates.

  • Faster installation of large workflows.

LogicBlox 4.4 also introduces several exciting features for both end-users and developers:

  • Improvements to the extract-example command for isolating test cases from an existing database.

  • New options --estimate and --include-default for the lb popcount command.

  • New features for Modeler-js based applications, such as:

    • Support for form mode, as a new data rendering option.

    • A new tool bar to easily switch between the different view modes.

    • Ability to add level members directly from the grid.

    • Usability improvements to the axis configuration: users can now change it quickly by multi-selecting the measures and levels to place on the view.

Please refer to LogicBlox 4.4.0 - New Features Playlist for video highlights of the new Modeler-js features.

What's New

Database

  • Write-optimized data structures: Before LogicBlox 4.4, the LogicBlox database used a relatively straightforward B+-tree-based data structure for predicate storage. B+-trees are known to perform poorly when inserts or updates are random, i.e., when the changes are not localized in the sorted data but spread uniformly or randomly across it. Note that even when updates to a given predicate are sequential, the corresponding updates to indices over that predicate may be random. LogicBlox 4.4 introduces a write-optimized data structure we call 'alpha tree', which is based on B-epsilon-trees, a write-optimized variant of B+-trees. Workloads with random update behavior will see significantly improved performance.

    Example 67. 

    For example, in one of our partners' applications, data is continuously read from a queue: every iteration drains the entire queue and imports the data into the database as a single batch. The batch size grows with the load on the system (this is called dynamic batching). Our internal benchmarks test such a system with fixed batch sizes of 100, 1K, 5K, and 50K rows, to confirm that performance indeed improves with bigger batches and that, ideally, the performance for a given batch size stays constant over time. For LogicBlox versions before 4.4 this was not the case.

    The following charts show the batch duration in seconds (vertical) for different batch sizes (100, 1K, 5K, 50K) across the duration of the experiment (horizontal).

    • The red line represents timing using LogicBlox 4.3.15
    • The orange line represents timing using LogicBlox 4.3.16
    • The blue line represents timing using LogicBlox 4.4.0, using write-optimized data structures

    The performance gain for this type of workload is substantial, and we look forward to seeing the applications that take advantage of this new capability.

    For readers interested in the technical details of write-optimized data structures, we recommend the academic publication "An Introduction to Bε-trees and Write-Optimization", available at http://supertech.csail.mit.edu/papers/BenderFaJa15.pdf.

  • Much improved performance for incremental maintenance of min/max aggregations (defined using IDB rules): In our benchmarks we see a 50% performance improvement in workloads that involve min/max aggregations (and that also include other logic). This optimization does not apply to min/max aggregations used in queries.

  • Optimization of the ConnectBlox QueryPredicate command:

    • The command no longer makes a complete scan of the predicate per column, among other improvements.

    • Clients can now disable the expensive random-access entity refmode lookup when they do not need this information.
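The buffering idea behind B-epsilon-trees can be illustrated with a toy model. The Python sketch below is a minimal, single-level illustration under our own assumptions (a hypothetical BUFFER_SIZE, fixed pivots, one dict per leaf, and None standing for a delete); it is not the alpha tree used by LogicBlox. Random updates are first appended to a buffer in the internal node and are only pushed down to the leaves in per-child batches once the buffer is full, so several scattered updates to the same leaf cost a single leaf write.

```python
from bisect import bisect_right

BUFFER_SIZE = 4  # hypothetical capacity of the internal node's message buffer


class Leaf:
    def __init__(self):
        self.items = {}   # key -> value: the facts stored in this leaf
        self.writes = 0   # number of batched writes applied to this leaf

    def apply(self, messages):
        # One leaf write applies a whole batch of buffered messages.
        self.writes += 1
        for key, value in messages:
            if value is None:
                self.items.pop(key, None)  # None models a delete
            else:
                self.items[key] = value


class BufferedNode:
    """A single internal node with a message buffer over several leaves."""

    def __init__(self, pivots):
        self.pivots = pivots  # sorted split keys
        self.children = [Leaf() for _ in range(len(pivots) + 1)]
        self.buffer = []      # pending (key, value) messages, oldest first

    def child_index(self, key):
        return bisect_right(self.pivots, key)

    def upsert(self, key, value):
        # Updates land in the buffer instead of touching a leaf immediately.
        self.buffer.append((key, value))
        if len(self.buffer) >= BUFFER_SIZE:
            self.flush()

    def flush(self):
        # Group buffered messages by target child; push each group in one write.
        batches = {}
        for key, value in self.buffer:
            batches.setdefault(self.child_index(key), []).append((key, value))
        for idx, messages in batches.items():
            self.children[idx].apply(messages)
        self.buffer = []

    def lookup(self, key):
        # The newest buffered message for a key shadows the leaf contents.
        for k, v in reversed(self.buffer):
            if k == key:
                return v
        return self.children[self.child_index(key)].items.get(key)
```

For example, upserting the keys 150, 5, 250, 120, 7, 260, 130 and 8 into BufferedNode(pivots=[100, 200]) triggers two flushes and six leaf writes instead of one write per update; with a realistic buffer size and many more updates per leaf, the savings grow accordingly.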

Services Framework

  • The semantics of exporting default-valued predicates bound to optional TDX columns is now consistent with that of non-default-valued predicates: if the predicate has the default value for a row, TDX now exports the empty string instead of the default value. Export performance involving default-valued predicates is expected to improve with this change.

    Example 68. 

    For example, consider this simple model, together with file and file binding definitions.

        sku(x), sku_id(x:id) -> int(id).
        sales[x] = v -> sku(x), int(v).
        returns[x] = v -> sku(x), int(v).
        lang:defaultValue[`returns] = 0.
    
        file_definition_by_name["file"] = fd,
        file_definition(fd) {
          file_delimiter[] = "|",
          column_headers[] = "SKU, SALES, RETURNS",
          column_formats[] = "integer, integer, integer",
          file_columns_optional[] = "RETURNS"
        }.
    
        file_binding_by_name["file"] = fb,
        file_binding(fb) {
          file_binding_definition_name[] = "file",
          predicate_binding_by_name["sales"] =
            predicate_binding(_) {
              predicate_binding_columns[] = "SKU, SALES"
            },
          predicate_binding_by_name["returns"] =
            predicate_binding(_) {
              predicate_binding_columns[] = "SKU, RETURNS"
            }
        }.

    A service exposing this file binding exports a row for each tuple in the sales predicate. Previously, the RETURNS value in each row was either the value from returns or, if there were no returns for that SKU, 0. With the new semantics, RETURNS is the empty string when there are no returns for that SKU.

    Note

    This is a backwards-incompatible change. Please refer to the Upgrade Information section for how to keep the previous behavior.

  • lb web-client now supports customizable timeouts for asynchronous transactions. Timeouts can be set at the transaction level or per request; a request timeout takes precedence over the transaction timeout.
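The difference between the old and new export semantics for the RETURNS column in Example 68 can be modeled in a few lines of Python. This is only a sketch of the observable behavior, not TDX itself: the dictionaries stand in for the sales and returns predicates (returns holds only non-default facts), and the hypothetical file_default parameter mimics the effect of the file_binding_default_value setting described under Upgrade Information.

```python
def export_rows(sales, returns, default_value=0, file_default=""):
    """Export one '|'-delimited row per SKU in `sales` (4.4 semantics).

    SKUs absent from `returns` logically have `default_value`; for those
    rows the RETURNS column gets `file_default` (the empty string unless
    overridden). Header rows are omitted in this sketch.
    """
    rows = []
    for sku in sorted(sales):
        ret = returns.get(sku, default_value)
        ret_cell = file_default if ret == default_value else str(ret)
        rows.append(f"{sku}|{sales[sku]}|{ret_cell}")
    return rows


def export_rows_pre_44(sales, returns, default_value=0):
    """Pre-4.4 semantics: the default value itself is written out."""
    return [
        f"{sku}|{sales[sku]}|{returns.get(sku, default_value)}"
        for sku in sorted(sales)
    ]
```

For instance, with sales = {1: 10, 2: 20} and returns = {2: 3}, export_rows produces "1|10|" and "2|20|3", while passing file_default="0" reproduces the pre-4.4 output "1|10|0" and "2|20|3".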

Measure Service

  • Inlining analysis improvements: It is now possible to configure inlining behavior with the inline annotation.

  • New CubiQL rewriting infrastructure: Until all rewrites have been converted from Java to Scala, the rewriting behavior can be configured in lb-measure-service.config via the optimizer field. See the template configuration file for all possible options.

  • Optimized report generation:

    • The Measure Service now generates lighter weight reporting logic for QueryRequests for data at an intersection involving a single level.

    • Converting data retrieved from the runtime is now constant time in the result size rather than linear time.

      Note

      The exception is decimal data. To get constant-time conversion of decimal data, the binary_decimal_columns field of the QueryRequest message must now be set to true. This will become the default in 4.4.1.

    • It is now possible to query only the non-default values of a CubiQL expression. The measure service provides the default value, so the client does not need to infer it. To enable this, set the default_values field of the QueryRequest message to true. This will become the default in 4.4.1.
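When only non-default values are returned together with the default value, a client that needs a dense result can rebuild it itself. The Python sketch below assumes a simplified result shape (a list of members per level, a dict of non-default values keyed by member tuples, and the reported default); this shape is hypothetical and does not reflect the actual QueryRequest/QueryResponse message layout.

```python
from itertools import product


def densify(keys_per_level, non_default, default):
    """Expand a sparse result into one value per combination of members.

    keys_per_level: one list of level members per level in the intersection.
    non_default: dict mapping member tuples to their non-default values.
    default: the default value reported alongside the sparse result.
    """
    return {
        key: non_default.get(key, default)
        for key in product(*keys_per_level)
    }
```

For example, densifying {("sku1", "w2"): 5.0} over [["sku1", "sku2"], ["w1", "w2"]] with default 0.0 yields four entries, three of them 0.0. Keeping the result sparse is usually preferable, which is the point of the new option; densifying is only worthwhile when a consumer really needs every cell.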

Modeler-js

  • Forms: In addition to pivot tables and charts, Modeler-js now also allows users to view and edit data using forms. To enter form mode, users can select the Details option, available from the right-click menu on the level headers, or click the form icon on the new tool bar. In form mode, all levels that were previously on the rows or columns are automatically moved to the slice, so that the user can easily change the selection. Users can also add, remove, or rearrange the visible measures on the form without leaving the view by switching to configuration mode.

    It is also possible to configure a sheet to be displayed by default as a form, by setting the viewMode to form in the grid configuration of the sheet.

    Example 69. 

    In the grid configuration below we display the City level on the rows and the Measures pill on the columns. The Details option is available as a right-click option on the header cells.

    Notice that the City level is moved to the slice in form mode. You can switch between cities by changing the value on the slice.

    To display the view as a form by default, extend the sheet configuration that earlier displayed the data in the grid with the viewMode and config.form settings shown below:

    {
      "id": "your-sheet-id",
      "pivotConfig": {},
      "views": {
        ...
        "grid": {
          "module": "PivotGrid",
          "viewMode": "form",
          "config": {
            "form": {
              "axis": "Y"
            }
          }
        }
      },
      ...
    }

    The viewMode option specifies that the data is displayed as a form. We also need to specify which axis of the grid is turned into the form; this axis (the Y axis in our example) must be the one that holds the Measures pill.

    Additionally, the selector configuration option can be used to specify that a specific level member should be pre-selected on the slice. In the example below, we specify that the level member city-2 from the Location:City level should be selected by default:

    {
      ...,
      "selector": {
        "Location:City": "city-2"
      },
      ...
    }

    Note

    Currently, form mode can only be opened if the sheet has an axis holding the Measures pill and/or attributes.

  • Tool bar: A collapsible tool bar is now available for each sheet. Its buttons let users quickly switch between view modes, such as grid, chart, or form, and into and out of configuration mode. The tool bar is now displayed on all sheets by default, but users can collapse it to maximize their working space.

    Example 70. 

    To display the tool bar collapsed by default, set the new menuBarOpen option in the sheet configuration to false, as illustrated in the example below:

    {
      "id": "your-sheet-id",
      "title": "your title",
      "menuBarOpen": false,
      "pivotConfig": {},
      ...
    }

  • Support for multi-selection and deselection in axis configuration panel: Multiple measures and levels can now be added in one go from the improved axis configuration panel. The new configuration window allows users to search through both the levels and measures at the same time, and select multiple values at once. Measures and levels can also be removed from the view again by de-selecting them in the axis configuration panel.

  • Level Member Creation from the Grid: It is no longer necessary to write custom logic to let users add new level members. To add a new level member, users can now right-click on the header of the level to which they would like to add a member and select the new option + Create <level_label>. This opens the new level member creation form.

    Example 71. 

    To add a new SKU from the view below, the user can right-click on the header displaying all the SKUs and click on the Create SKU button, as shown in the figure below.

    The user can then enter a label for the SKU and, optionally, an ID if it should differ from the label.

    To enable this feature in a modeler-based application, the editSchema property needs to be set to true in the modelingFeatures section of the configuration passed to AuthenticatedModelerController.

  • Revised Modeling Features Configuration: The modelingFeatures section which is part of the configuration passed to AuthenticatedModelerController has been cleaned up. The following options were removed:

    • addLevelMembers
    • editLevelMembers
    • addLevels
    • ruleEditor
    • addMetrics
    • editMetrics

    and are replaced by the following options:

    • editSchema: allows users to create, modify, and delete levels, measures, and level members.
    • editRules: allows users to use the "Formula Editor", which is available from the cogwheel menu.
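Applications upgrading an existing modelingFeatures section need to translate the removed options. The Python sketch below is a hypothetical migration helper operating on a plain dict; the mapping (the level, level member, and measure options to editSchema, ruleEditor to editRules) is our reading of the option descriptions above. Note that editSchema is broader than any single removed option, so enabling it may grant more than an application previously allowed.

```python
# Hypothetical mapping from the removed modelingFeatures options to their
# replacements, inferred from the editSchema/editRules descriptions.
OPTION_MAP = {
    "addLevelMembers": "editSchema",
    "editLevelMembers": "editSchema",
    "addLevels": "editSchema",
    "addMetrics": "editSchema",
    "editMetrics": "editSchema",
    "ruleEditor": "editRules",
}


def migrate_modeling_features(old):
    """Return a copy of a modelingFeatures dict using the 4.4 option names.

    Keys that were not removed are copied unchanged. A replacement option is
    set to True if any of the removed options it replaces was truthy;
    otherwise it is left as already configured (or absent).
    """
    new = {k: v for k, v in old.items() if k not in OPTION_MAP}
    for removed, replacement in OPTION_MAP.items():
        if old.get(removed):
            new[replacement] = True
    return new
```

For example, {"addLevels": True, "ruleEditor": False} becomes {"editSchema": True}.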

Workflow

  • The lb-workflow install command now populates the workflow schema by default using CSV files and file predicates. This can reduce installation time considerably compared to generating large amounts of delta logic, which had to be compiled and executed.

Developer Tools

  • Improvements to the extract-example command: We have improved a previously undocumented feature for isolating test cases from an existing database. The lb extract-example command can be used to isolate a logic rule, its input data, and possibly incremental updates. This is useful for isolating performance problems or database bugs. The command takes care of generating a standalone schema, IDB rules, and export and import logic, and also supports capturing data for incremental updates.

    Take a look at the reference manual for more detailed usage information and examples.

  • New option --estimate for the lb popcount command: With the introduction of write-optimized data structures, the popcount of a predicate can no longer be determined in constant time; the facts in the predicate must be counted explicitly. For large databases this can be expensive, in particular if the database is bigger than the available memory. An estimate of the popcount can still be returned in constant time, and lb popcount now provides the --estimate option for this. For analyzing the size of the database, the estimate may actually be more useful, because it more closely reflects the space used by the predicate. The estimated popcount is expected to be within 10% of the precise popcount.

    Note

    The precise popcount is still the default, so the --estimate option may need to be added in some places to avoid performance problems.

  • New option --include-default for the lb popcount command: With this option, the popcount returned for default-valued predicates includes the default facts (which are not stored, but are logically facts). As a result, the popcount of a default-valued predicate is always the product of the popcounts of the entities in its key, and does not change when the number of non-default facts changes.

  • The lb delete command now has a --force option that does not report an error if the workspace to be deleted does not exist.

  • The lb branch command now supports the ws@branch syntax.

    Example 72. 

    For example, to create a branch named bar from foo, it is now possible to run the following command:

    $ lb branch db@foo bar
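The --include-default arithmetic described above can be checked by hand. The Python sketch below models, for a hypothetical default-valued predicate keyed by SKU and week, what the two popcount variants report; the entity counts are made-up numbers, and the function does not call lb popcount.

```python
from math import prod


def popcount(stored_facts, key_entity_counts, include_default=False):
    """Model the popcount of a default-valued predicate.

    stored_facts: number of non-default facts actually stored.
    key_entity_counts: popcounts of the entity types in the predicate's key.
    """
    if include_default:
        # Every key tuple logically has a fact, stored or default.
        return prod(key_entity_counts)
    return stored_facts
```

With 1000 SKUs, 52 weeks, and 12000 stored non-default facts, the plain popcount is 12000, while the --include-default popcount is 1000 × 52 = 52000 and stays at 52000 no matter how many non-default facts are stored.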

Corrected Issues

The issues listed below have been corrected since the 4.3.17 release:

  • Resolved incorrect functional dependency violations caused by a bug in incremental prefix join: Incremental maintenance using prefix join was introduced in LogicBlox 4.3.15 and significantly improves the performance of rules that meet the prefix-join requirements. The implementation had a subtle bug that could lead to incorrect functional dependency violation errors. This issue has been fixed, and the fix is also available in all patch releases of earlier versions.

  • The command lb branches no longer lists the special built-in branch nil.

  • Fixed an issue in the TDX generator that prevented file column names from being LogiQL literals, such as numbers or boolean literals.

  • Fixed an issue where TDX asynchronous transactions behaved synchronously while decompressing files sent via x-blox-content-uri. File decompression is now done in a separate thread.

  • Fixed an issue with lb web-server not starting on machines with 64 or more cores.

  • Measure Service:

    • Corrected point analysis on metrics with transformations.

    • Tuned heuristic of when rollups are inlined.

    • Level members can now be created and have their attributes initialized at the same time.

  • Modeler-js:

    • When adding a measure to a view using the axis configuration panel, the Measure display option is now selected by default, instead of the Attribute display option.

    • Resolved a performance issue when resizing a column or row on a grid with very large headers.

    • Improved error checking of the measure model configuration files:

      • Developers now receive an exception if a level is listed in the intersection configuration file but not in the dimension or level configuration files. Previously, the missing level or dimension was generated automatically.

      • Duplicate entries in the hierarchy configuration file now cause an exception.

    • The link cell type now also accepts URLs with the # character in it.

    • Resolved an issue where resizing the cell width of a column would also reset the height.

    • Resolved an issue where, under certain circumstances, numeric cells without facts that had specific formatting configured displayed "$0.00" instead of being blank.

    • Resolved an issue where headers were merged incorrectly when both the parent and the child level had the same label.

    • Resolved an issue where cells could be left in "protected" state after a multi-cell edit in deferred calc mode.

    • The behavior of opening and closing the conditional formatting window is now consistent with the behavior of the filter dialog.

    • A couple of small improvements to keyboard navigation.

Installation and Upgrade Information

Installation Instructions

Install LogicBlox 4.4.0 by following the steps outlined below:

  1. Download the installation package.
  2. Extract the tarball into <YourPreferredInstallDirectory>.
  3. Run the following command:
    source <YourPreferredInstallDirectory>/logicblox-4.4.0/etc/profile.d/logicblox.sh
    NOTE: this script will set all the necessary environment variables. You might want to add this command to your .bashrc.

Upgrade Information

  • To keep the old behavior for exporting default-valued predicates bound to optional TDX columns, developers can use the file_binding_default_value configuration on the file binding to change the default column value from the empty string to the LogiQL default value of the predicate.

    Example 73. 

    For example, adding this line to the file binding of the example above would export a zero instead of the empty string for the RETURNS column:

    file_binding_default_value[fb, "RETURNS"] = "0"

Release Information

Server requirements

  • Operating System: 64-bit Linux; OSX 10.10+ is supported for local development.
  • Java Runtime Environment 8
  • Python 2.7 or higher

Client requirements

  • Applications using modeler-js User Interface Components: Google Chrome
  • Requirements for applications using non-modeler-js components may vary per application.