Importing and Exporting Data with TDX

Thus far we’ve looked at using delta updates to insert data. This is a great way to get started and insert small amounts of data, e.g. via the REPL. However, when larger amounts of data have to be imported into, or exported from, a LogicBlox workspace, you’re better off using LogicBlox’s TDX (Tabular Data eXchange) services. TDX makes it easy to create, with a single LogiQL definition, high-performance web services that support both data import and export in a comma-separated value (CSV) format. These imports and exports are primarily useful for data integration: for instance, to move data from some other database or legacy system into LogicBlox, or vice versa.

Let’s have a look at how you can define such a TDX service.

We’ll base our example on our running example of Fred’s Olde Ice Cream Shoppe emporium, which we started in the LogiQL in 30 minutes tutorial and have since built out.

In this installment we’ll add some simple TDX services to easily import large quantities of data into our workspace and export them back out. The full source code for our project can be found on BitBucket.

We will be creating two services for this purpose:

  1. Ice cream prices and costs
  2. Ice cream sales data

To add these services we need to do three things:

  1. Extend our .project file to add a dependency on the lb_web library and to register the new tdx module that we’ll create.
  2. Create the .logic files that implement the TDX services. We’ll put these in the tdx folder of our project (same as the module we added to the .project file).
  3. Extend our config.py file to add LB web as a dependency and to easily load our new services through make.

Extending our project file

LogicBlox has support for reusable libraries that can be “imported” by listing them in your project file. A few of those libraries are shipped with LogicBlox and for TDX we’ll be using the lb_web library, which we can use by adding the following line to our .project file:

lb_web, library

In addition to this, we’ll also create a new LogicBlox module that we’ll call tdx (with .logic files stored in tdx/*.logic):

tdx, module

Our new project file now looks as follows:

application, projectname
lb_web, library
core, module
tdx, module

Defining TDX services

Next, we’ll move on to defining the actual services. Let’s start with a service to define ice creams and set their prices and costs. A service definition consists of three parts:

  1. The file definition: this defines the separator and columns in your CSV data and the types of those fields.
  2. The file binding: this maps the columns defined in your file definition to predicates in your workspace.
  3. The service configuration: this maps your service definition to a URL.

Our CSV files will look as follows:

ICECREAM|COST|PRICE
Popsicle Lemon|25|50
Fruit Sundae|120|200
Mango Sorbet|40|70
Cone Chocolate|50|80
Cone Vanilla|44|70

The result of this should be the equivalent of the first part of our old data/init_data.logic file:

//lang:logiql
+core:icecream:cost["Popsicle Lemon"] = 25.
+core:icecream:cost["Fruit Sundae"] = 120.
+core:icecream:cost["Mango Sorbet"] = 40.
+core:icecream:cost["Cone Chocolate"] = 50.
+core:icecream:cost["Cone Vanilla"] = 44.
+core:icecream:price["Popsicle Lemon"] = 50.
+core:icecream:price["Fruit Sundae"] = 200.
+core:icecream:price["Mango Sorbet"] = 70.
+core:icecream:price["Cone Chocolate"] = 80.
+core:icecream:price["Cone Vanilla"] = 70.

Here’s what the first part — the file definition — looks like:

//lang:logiql
file_definition_by_name["tdx/icecream"] = fd,
file_definition(fd) {
  file_delimiter[] = "|",
  column_headers[] = "ICECREAM,COST,PRICE",
  column_formats[] = "string,int,int"
}.

This should read quite naturally: we’re defining a service with the symbolic name tdx/icecream. It defines a “file” using | as a delimiter. Its columns are named “ICECREAM”, “COST” and “PRICE”; the first holds a textual value, and the second and third numeric values. If you’ve never come across hierarchical syntax before (the { ... } syntax), it is primarily used as syntactic sugar for the following:

//lang:logiql
file_definition_by_name["tdx/icecream"] = fd,
file_definition(fd),
file_delimiter[fd] = "|",
column_headers[fd] = "ICECREAM,COST,PRICE",
column_formats[fd] = "string,int,int".

That is: insert fd as a first argument to all predicates within the curly-braced block. For details, see the reference manual chapter on hierarchical syntax.

Alright, that was easy.

Next, the file binding, which maps the columns in the file definition to predicates:

//lang:logiql
file_binding_by_name["tdx/icecream"] = fb,
file_binding(fb) {
  file_binding_definition_name[] = "tdx/icecream",
  file_binding_entity_creation[] = "accumulate",
  predicate_binding_by_name["core:icecream:cost"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "ICECREAM,COST"
    },
  predicate_binding_by_name["core:icecream:price"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "ICECREAM,PRICE"
    }
}.

In this code we’re creating a file binding for the file definition tdx/icecream. The file_binding_entity_creation attribute defines what should happen for entities that don’t already exist in the workspace. Setting this to accumulate will automatically create these entities, and setting it to none will make the import fail for entries that reference non-existing entities. We’d like entities to be created automatically, so we picked accumulate.

The predicate_binding_by_name entries define which of the file column names should be used as arguments to which predicates. For instance, our example CSV file contains this line:

Popsicle Lemon|25|50

Based on our first predicate_binding_by_name mapping, Popsicle Lemon will be used as the first argument to the core:icecream:cost predicate, and 25 as its second. Since cost is a functional predicate, this means that Popsicle Lemon will be its key and 25 its value. Effectively, the following code will run:

+core:icecream:cost["Popsicle Lemon"] = 25.

The mapping for the price works exactly the same, but uses the PRICE column as the second argument instead.
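To make this mapping concrete, here’s a small Python sketch of the idea. This is plain Python, not anything TDX ships; the function and variable names are made up for illustration, and only the predicate and column names come from our example:

```python
def bind_row(row, headers, delimiter, bindings):
    """Mimic how a file binding maps one delimited line to predicate
    assertions: for each predicate, the listed columns become its
    arguments, the last column acting as the functional value."""
    fields = dict(zip(headers, row.split(delimiter)))
    assertions = []
    for predicate, columns in bindings.items():
        *key_cols, value_col = columns
        key = tuple(fields[c] for c in key_cols)
        assertions.append((predicate, key, fields[value_col]))
    return assertions

# The bindings from our file_binding definition above.
bindings = {
    "core:icecream:cost": ["ICECREAM", "COST"],
    "core:icecream:price": ["ICECREAM", "PRICE"],
}
for assertion in bind_row("Popsicle Lemon|25|50",
                          ["ICECREAM", "COST", "PRICE"], "|", bindings):
    print(assertion)
# → ('core:icecream:cost', ('Popsicle Lemon',), '25')
#   ('core:icecream:price', ('Popsicle Lemon',), '50')
```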

And then the last piece of the puzzle: mapping our service to an actual URL:

//lang:logiql
service_by_prefix["/tdx/icecream"] = x,
delim_service(x) {
  delim_file_binding[] = "tdx/icecream"
}.

which makes our service available at /tdx/icecream (e.g. http://localhost:8080/tdx/icecream).

All that’s left to do is wrap this all in LogiQL module boilerplate, aliasing all the predicate namespaces we’re using:

//lang:logiql
block(`icecream) {
  alias_all(`lb:web:delim:schema),
  alias_all(`lb:web:delim:schema_abbr),

  alias_all(`lb:web:delim:binding),
  alias_all(`lb:web:delim:binding_abbr),

  alias_all(`lb:web:config:service),
  alias_all(`lb:web:config:service_abbr),
  alias_all(`lb:web:config:delim),

  clauses(`{
    file_definition_by_name["tdx/icecream"] = fd,
    file_definition(fd) {
      file_delimiter[] = "|",
      column_headers[] = "ICECREAM,COST,PRICE",
      column_formats[] = "string,int,int"
    }.

    file_binding_by_name["tdx/icecream"] = fb,
    file_binding(fb) {
      file_binding_definition_name[] = "tdx/icecream",
      file_binding_entity_creation[] = "accumulate",
      predicate_binding_by_name["core:icecream:cost"] =
        predicate_binding(_) {
          predicate_binding_columns[] = "ICECREAM,COST"
        },
      predicate_binding_by_name["core:icecream:price"] =
        predicate_binding(_) {
          predicate_binding_columns[] = "ICECREAM,PRICE"
        }
    }.

    service_by_prefix["/tdx/icecream"] = x,
    delim_service(x) {
      delim_file_binding[] = "tdx/icecream"
    }.
  })
} <-- .

Adding a Sales Service

A TDX service for sales doesn’t look much different:

//lang:logiql
file_definition_by_name["tdx/sales"] = fd,
file_definition(fd) {
  file_delimiter[] = "|",
  column_headers[] = "ICECREAM,WEEK,SALES",
  column_formats[] = "string,int,int"
}.

file_binding_by_name["tdx/sales"] = fb,
file_binding(fb) {
  file_binding_definition_name[] = "tdx/sales",
  file_binding_entity_creation[] = "accumulate",
  predicate_binding_by_name["core:sales:week_sales"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "ICECREAM,WEEK,SALES"
    }
}.

service_by_prefix["/tdx/sales"] = x,
delim_service(x) {
  delim_file_binding[] = "tdx/sales"
}.

That’s it. Again we define a three-column file, but this time with “ICECREAM”, “WEEK” and “SALES” columns. These are mapped to a single predicate, week_sales, where the first two columns form the key and the last one the value. That is:

ICECREAM|WEEK|SALES
Popsicle Lemon|1|100

Would result in something like this:

//lang:logiql
+core:sales:week_sales["Popsicle Lemon", 1] = 100.

And just like the previous service, this one can be used for both import and export.
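As a rough mental model of the export direction, here’s a hypothetical Python sketch. It is not how TDX is implemented; the field quoting and row sorting simply mirror what TDX exports look like, and the function name is ours:

```python
def export_rows(week_sales, delimiter="|"):
    """Render a {(icecream, week): sales} mapping as delimited lines with
    quoted fields, sorted by key, mirroring a TDX export of week_sales."""
    lines = [delimiter.join(["ICECREAM", "WEEK", "SALES"])]
    for (icecream, week), sales in sorted(week_sales.items()):
        fields = [icecream, str(week), str(sales)]
        lines.append(delimiter.join('"%s"' % f for f in fields))
    return "\n".join(lines)

print(export_rows({("Popsicle Lemon", 1): 100, ("Cone Vanilla", 1): 257}))
# → ICECREAM|WEEK|SALES
#   "Cone Vanilla"|"1"|"257"
#   "Popsicle Lemon"|"1"|"100"
```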

Extending config.py

We have to make a few minor changes to our config.py file used for building our project:

  1. We have to add lb_web as a dependency to the project and our application library.
  2. For convenience we’ll add a make start-services target that compiles your project, loads it into a workspace and starts the TDX services.

Adding the lb_web dependency is as easy as tweaking our depends_on and lb_library calls:

//lang:python
depends_on(logicblox_dep, lb_web_dep)

lb_library(name='application', srcdir='.', deps = ['lb_web'])

The extra make target can be added as follows:

//lang:python
rule('start-services', ['check-ws-application'], [
  'lb web-server load-services'
], True)

The lb web-server load-services command scans all workspaces for web services and installs them into LB web.

Trying it out

Let’s try our new service! To compile and load, run:

make start-services

Now let’s call our web service. We can do this in various ways, e.g. using curl:

curl http://localhost:8080/tdx/icecream

This performs an HTTP GET request, that is, a request to export data via our TDX service (similarly, PUT is used for import). It returns:

$ curl http://localhost:8080/tdx/icecream
ICECREAM|COST|PRICE

So, just the header at this point, because we don’t have any data in our workspace yet. Alternatively, we can also use lb web-client export for this purpose, which has some fancy features:

$ lb web-client export http://localhost:8080/tdx/icecream
ICECREAM|COST|PRICE

The result is still the same, though.

So let’s import some data. In a file named ic.csv we’ll put some familiar sample data:

ICECREAM|COST|PRICE
Popsicle Lemon|25|50
Fruit Sundae|120|200
Mango Sorbet|40|70
Cone Chocolate|50|80
Cone Vanilla|44|70

Then we’ll run lb web-client import on that:

$ lb web-client import -i ic.csv http://localhost:8080/tdx/icecream
ICECREAM|COST|PRICE|CAUSE|CAUSE_CODE

This returns another CSV file listing any entries that failed to import, along with the reason for each failure. Luckily, this list is empty, so we’re good. Let’s see if the import was successful:

$ lb web-client export http://localhost:8080/tdx/icecream
ICECREAM|COST|PRICE
"Cone Chocolate"|"50"|"80"
"Cone Vanilla"|"44"|"70"
"Fruit Sundae"|"120"|"200"
"Mango Sorbet"|"40"|"70"
"Popsicle Lemon"|"25"|"50"

We’ve got data!
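As an aside, because the import response is itself a CSV, a script driving these imports can detect failures mechanically by looking for rows beyond the header. A hypothetical Python helper (the function name is ours, not part of lb web-client):

```python
def import_succeeded(error_csv: str) -> bool:
    """A TDX import responds with a CSV of rejected rows (the input columns
    plus CAUSE and CAUSE_CODE); a header-only response means every row
    was imported successfully."""
    lines = [line for line in error_csv.strip().splitlines() if line]
    return len(lines) <= 1

# Header-only response, like the one from our ic.csv import above.
print(import_succeeded("ICECREAM|COST|PRICE|CAUSE|CAUSE_CODE\n"))
# → True
```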

Let’s do the same for sales. Create a sales.csv:

ICECREAM|WEEK|SALES
Popsicle Lemon|1|122
Fruit Sundae|1|88
Mango Sorbet|1|72
Cone Chocolate|1|4
Cone Vanilla|1|257
Popsicle Lemon|2|112
Fruit Sundae|2|60
Mango Sorbet|2|44
Cone Chocolate|2|9
Cone Vanilla|2|200

And import it and verify that it worked:

$ lb web-client import -i sales.csv http://localhost:8080/tdx/sales
ICECREAM|WEEK|SALES|CAUSE|CAUSE_CODE

$ lb web-client export http://localhost:8080/tdx/sales
ICECREAM|WEEK|SALES
"Cone Chocolate"|"1"|"4"
"Cone Chocolate"|"2"|"9"
"Cone Vanilla"|"1"|"257"
"Cone Vanilla"|"2"|"200"
"Fruit Sundae"|"1"|"88"
"Fruit Sundae"|"2"|"60"
"Mango Sorbet"|"1"|"72"
"Mango Sorbet"|"2"|"44"
"Popsicle Lemon"|"1"|"122"
"Popsicle Lemon"|"2"|"112"

Yay!

TDX has many advanced features, like the ability to import multiple files in a single transaction. For these and other features, please refer to the reference manual about TDX.

Happy data integration!


© Copyright 2019. Infor. All rights reserved.
