Thus far we’ve looked at using delta updates to insert data. This is a great way to get started and to insert small amounts of data, e.g. via the REPL. However, when larger amounts of data have to be imported into, or exported from, a LogicBlox workspace, you’re better off using LogicBlox’s TDX (Tabular Data eXchange) services. TDX makes it easy, with a single LogiQL definition, to create high-performance web services that support both data import and export in a comma-separated value (CSV) format. These imports and exports are primarily useful for data integration: for instance, to move data from some other database or legacy system into LogicBlox, or vice versa.
Let’s have a look at how you can define such a TDX service.
We’ll base our example on our running example of Fred’s Olde Ice Cream Shoppe emporium. We started this example in the LogiQL in 30 minutes tutorial, and have since built it out:
- By moving our code into a LogicBlox project
- By defining custom types (entities)
- By using delta logic rules
In this installment we’ll add some simple TDX services to easily import and export large quantities of data to and from our workspace. The full source code for our project can be found on BitBucket.
We will be creating two services for this purpose:
- Ice cream prices and costs
- Ice cream sales data
To add these services we need to do three things:
- Extend our .project file to add a dependency on the lb_web library and to add the new tdx module that we’ll create.
- Create the .logic files that implement the TDX services. We’ll put these in the tdx folder of our project (same as the module we added to the .project file).
- Extend our config.py file to add LB web as a dependency and to easily load our new services through make.
Extending our project file
LogicBlox has support for reusable libraries that can be “imported” by listing them in your project file. A few of those libraries are shipped with LogicBlox, and for TDX we’ll be using the lb_web library, which we can use by adding the following line to our .project file:
lb_web, library
In addition to this, we’ll also create a new LogicBlox module that we’ll call tdx (with its .logic files stored in tdx/*.logic):
tdx, module
Our new project file now looks as follows:
application, projectname
lb_web, library
core, module
tdx, module
Defining TDX services
Next, we’ll move on to defining the actual services. Let’s start with a service to define ice creams, set their prices and costs. A service definition consists of three parts:
- The file definition: this defines the separator and columns in your CSV data and the types of those fields.
- The file binding: this maps the columns defined in your file definition to predicates in your workspace.
- The service URL mapping: this maps your service definition to a URL.
Our CSV files will look as follows:
ICECREAM|COST|PRICE
Popsicle Lemon|25|50
Fruit Sundae|120|200
Mango Sorbet|40|70
Cone Chocolate|50|80
Cone Vanilla|44|70
The result of this should be the equivalent of the first part of our old data/init_data.logic
file:
//lang:logiql
+core:icecream:cost["Popsicle Lemon"] = 25.
+core:icecream:cost["Fruit Sundae"] = 120.
+core:icecream:cost["Mango Sorbet"] = 40.
+core:icecream:cost["Cone Chocolate"] = 50.
+core:icecream:cost["Cone Vanilla"] = 44.
+core:icecream:price["Popsicle Lemon"] = 50.
+core:icecream:price["Fruit Sundae"] = 200.
+core:icecream:price["Mango Sorbet"] = 70.
+core:icecream:price["Cone Chocolate"] = 80.
+core:icecream:price["Cone Vanilla"] = 70.
Here’s what the first part — the file definition — looks like:
//lang:logiql
file_definition_by_name["tdx/icecream"] = fd,
file_definition(fd) {
file_delimiter[] = "|",
column_headers[] = "ICECREAM,COST,PRICE",
column_formats[] = "string,int,int"
}.
This should read quite naturally: we’re defining a file definition with the symbolic name tdx/icecream. It describes a “file” that uses | as its delimiter, whose columns are named “ICECREAM”, “COST” and “PRICE”, where the first holds a textual value and the second and third hold numeric values. If you’ve never come across hierarchical syntax before (the { ... } syntax), it is primarily used as syntactic sugar for the following:
//lang:logiql
file_definition_by_name["tdx/icecream"] = fd,
file_definition(fd),
file_delimiter[fd] = "|",
column_headers[fd] = "ICECREAM,COST,PRICE",
column_formats[fd] = "string,int,int".
That is: insert fd as the first argument to all predicates within the curly-braced block. For details, see the reference manual chapter on hierarchical syntax.
Alright, that was easy.
Next, the file binding, which maps the columns in the file definition to predicates in our workspace:
//lang:logiql
file_binding_by_name["tdx/icecream"] = fb,
file_binding(fb) {
file_binding_definition_name[] = "tdx/icecream",
file_binding_entity_creation[] = "accumulate",
predicate_binding_by_name["core:icecream:cost"] =
predicate_binding(_) {
predicate_binding_columns[] = "ICECREAM,COST"
},
predicate_binding_by_name["core:icecream:price"] =
predicate_binding(_) {
predicate_binding_columns[] = "ICECREAM,PRICE"
}
}.
In this code we’re creating a file binding for the file definition tdx/icecream. The file_binding_entity_creation attribute defines what should happen for entities that don’t already exist in the workspace. Setting this to accumulate will automatically create these entities, and setting it to none will make the import fail for entries that reference non-existing entities. We’d like entities to be created automatically, so we picked accumulate.
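For comparison, here is a sketch of what the same binding would look like with strict entity handling. This variant is not part of our project; it only illustrates the none setting, under which rows that mention ice creams not already in the workspace are rejected rather than created:
//lang:logiql
// sketch only: the same file binding, but with strict entity handling --
// rows referencing unknown ice creams would fail to import
file_binding_by_name["tdx/icecream"] = fb,
file_binding(fb) {
  file_binding_definition_name[] = "tdx/icecream",
  file_binding_entity_creation[] = "none",
  predicate_binding_by_name["core:icecream:cost"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "ICECREAM,COST"
    },
  predicate_binding_by_name["core:icecream:price"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "ICECREAM,PRICE"
    }
}.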
The predicate_binding_by_name entries define which of the file columns should be used as arguments to which predicates. For instance, in our example CSV file we have this line:
Popsicle Lemon|25|50
Based on our first predicate_binding_by_name mapping, Popsicle Lemon will be used as the first argument to the core:icecream:cost predicate, and 25 as its second. Since cost is a functional predicate, this means that Popsicle Lemon will be its key and 25 its value. Effectively, this means the following code will run:
+core:icecream:cost["Popsicle Lemon"] = 25.
The mapping for the price works exactly the same, but uses PRICE as the second argument.
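For the same CSV row, the price binding therefore has the effect of:
//lang:logiql
+core:icecream:price["Popsicle Lemon"] = 50.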
And then the last bit of the puzzle: mapping our service to an actual URL:
//lang:logiql
service_by_prefix["/tdx/icecream"] = x,
delim_service(x) {
delim_file_binding[] = "tdx/icecream"
}.
which makes our service available at /tdx/icecream (e.g. http://localhost:8080/tdx/icecream).
All that’s left to do is to wrap all of this in LogiQL module boilerplate, aliasing all the predicate namespaces we’re using:
//lang:logiql
block(`icecream) {
alias_all(`lb:web:delim:schema),
alias_all(`lb:web:delim:schema_abbr),
alias_all(`lb:web:delim:binding),
alias_all(`lb:web:delim:binding_abbr),
alias_all(`lb:web:config:service),
alias_all(`lb:web:config:service_abbr),
alias_all(`lb:web:config:delim),
clauses(`{
file_definition_by_name["tdx/icecream"] = fd,
file_definition(fd) {
file_delimiter[] = "|",
column_headers[] = "ICECREAM,COST,PRICE",
column_formats[] = "string,int,int"
}.
file_binding_by_name["tdx/icecream"] = fb,
file_binding(fb) {
file_binding_definition_name[] = "tdx/icecream",
file_binding_entity_creation[] = "accumulate",
predicate_binding_by_name["core:icecream:cost"] =
predicate_binding(_) {
predicate_binding_columns[] = "ICECREAM,COST"
},
predicate_binding_by_name["core:icecream:price"] =
predicate_binding(_) {
predicate_binding_columns[] = "ICECREAM,PRICE"
}
}.
service_by_prefix["/tdx/icecream"] = x,
delim_service(x) {
delim_file_binding[] = "tdx/icecream"
}.
})
} <-- .
Adding a sales service
A TDX service for sales doesn’t look much different (it goes in its own .logic file in the tdx folder, wrapped in the same kind of module boilerplate):
//lang:logiql
file_definition_by_name["tdx/sales"] = fd,
file_definition(fd) {
file_delimiter[] = "|",
column_headers[] = "ICECREAM,WEEK,SALES",
column_formats[] = "string,int,int"
}.
file_binding_by_name["tdx/sales"] = fb,
file_binding(fb) {
file_binding_definition_name[] = "tdx/sales",
file_binding_entity_creation[] = "accumulate",
predicate_binding_by_name["core:sales:week_sales"] =
predicate_binding(_) {
predicate_binding_columns[] = "ICECREAM,WEEK,SALES"
}
}.
service_by_prefix["/tdx/sales"] = x,
delim_service(x) {
delim_file_binding[] = "tdx/sales"
}.
That’s it. Again we define a three-column file, this time with “ICECREAM”, “WEEK” and “SALES” columns. These are then mapped to a single predicate, week_sales, where the first two columns are used as the key and the last one as the value. That is:
ICECREAM|WEEK|SALES
Popsicle Lemon|1|100
Would result in something like this:
//lang:logiql
+core:sales:week_sales["Popsicle Lemon", 1] = 100.
And just like the previous service this one can now be used to do both import and export.
Extending config.py
We have to make a few minor changes to the config.py file used for building our project:
- We have to add lb_web as a dependency to the project and to our application library.
- For convenience, we’ll add a make start-services target that compiles the project, loads it into a workspace, and starts the TDX services.
Adding the lb_web dependency is as easy as tweaking our depends_on and lb_library calls:
//lang:python
depends_on(logicblox_dep, lb_web_dep)
lb_library(name='application', srcdir='.', deps = ['lb_web'])
The extra make target can be added as follows:
//lang:python
rule('start-services', ['check-ws-application'], [
'lb web-server load-services'
], True)
The lb web-server load-services command used there scans all workspaces for web services and installs them into LB web.
Trying it out
Let’s try our new services! To compile the project, load it into a workspace, and start the services, run:
make start-services
Now let’s call our web service. We can do this in various ways, e.g. using curl:
curl http://localhost:8080/tdx/icecream
This will perform an HTTP GET request, that is: a request to export data via our TDX service (similarly, PUT is used for import). This will return:
$ curl http://localhost:8080/tdx/icecream
ICECREAM|COST|PRICE
So, just the header at this point, because we don’t have any data in our workspace yet. Alternatively, we can also use lb web-client export
for this purpose, which has some fancy features:
$ lb web-client export http://localhost:8080/tdx/icecream
ICECREAM|COST|PRICE
The result is still the same, though.
So let’s import some data. In a file named ic.csv
we’ll put some familiar sample data:
ICECREAM|COST|PRICE
Popsicle Lemon|25|50
Fruit Sundae|120|200
Mango Sorbet|40|70
Cone Chocolate|50|80
Cone Vanilla|44|70
Then we’ll run lb web-client import
on that:
$ lb web-client import -i ic.csv http://localhost:8080/tdx/icecream
ICECREAM|COST|PRICE|CAUSE|CAUSE_CODE
This will return another CSV file listing any entries that failed to import, along with the reason they failed. Luckily this list is empty, so we’re good. Let’s check whether the import was successful:
$ lb web-client export http://localhost:8080/tdx/icecream
ICECREAM|COST|PRICE
"Cone Chocolate"|"50"|"80"
"Cone Vanilla"|"44"|"70"
"Fruit Sundae"|"120"|"200"
"Mango Sorbet"|"40"|"70"
"Popsicle Lemon"|"25"|"50"
We’ve got data!
Let’s do the same for sales. Create a sales.csv
:
ICECREAM|WEEK|SALES
Popsicle Lemon|1|122
Fruit Sundae|1|88
Mango Sorbet|1|72
Cone Chocolate|1|4
Cone Vanilla|1|257
Popsicle Lemon|2|112
Fruit Sundae|2|60
Mango Sorbet|2|44
Cone Chocolate|2|9
Cone Vanilla|2|200
And import it and verify that it worked:
$ lb web-client import -i sales.csv http://localhost:8080/tdx/sales
ICECREAM|WEEK|SALES|CAUSE|CAUSE_CODE
$ lb web-client export http://localhost:8080/tdx/sales
ICECREAM|WEEK|SALES
"Cone Chocolate"|"1"|"4"
"Cone Chocolate"|"2"|"9"
"Cone Vanilla"|"1"|"257"
"Cone Vanilla"|"2"|"200"
"Fruit Sundae"|"1"|"88"
"Fruit Sundae"|"2"|"60"
"Mango Sorbet"|"1"|"72"
"Mango Sorbet"|"2"|"44"
"Popsicle Lemon"|"1"|"122"
"Popsicle Lemon"|"2"|"112"
Yay!
TDX has many advanced features, like the ability to import multiple files in a single transaction. For these and other features, please refer to the TDX chapter of the reference manual.
Happy data integration!