Chapter 27. Data Exchange Services

27.1. Configuring Tabular Data Exchange Services

Tabular data exchange (TDX) services are HTTP services that export data from LogicBlox workspaces by offering delimited files for download (GET) and import data by accepting uploads of delimited files (POST/PUT). ServiceBlox provides a built-in handler for defining such tabular data exchange services at a very high level. This section describes how to configure and use these services.

To introduce TDX file services we will use a simple example of multi-dimensional sales data. Consider a workspace with the following schema defined for the hierarchy and measures.

block(`hierarchy) {
  export(`{
    sku(x), sku_id(x:s) -> string(s).
    store(x), store_id(x:s) -> string(s).
    week(x), week_id(x:s) -> string(s).
  })
} <--. 
block(`measure) {
  alias_all(`hierarchy),
  export(`{
    sales[x, y, z] = v -> sku(x), store(y), week(z), int(v).
  })
} <--. 

For this application the customer uses a delimited file for sales data, as in the following example.

SKU     | STORE       | WEEK | SALES
apples  | atlanta     | W1   | 10
oranges | atlanta     | W2   | 15
apples  | portland    | W1   | 20
oranges | portland    | W2   | 5

We shall define a tabular data exchange service to import data in this delimited file format to the sku, store, week and sales predicates. We shall also define a service to export from these predicates to a delimited file in this format.

A tabular data exchange service is defined by three parts:

  • File definition, which defines the format of a file, such as header names, column formats, optional columns, and the delimiter character that is used.

  • File binding, which specifies how columns in a delimited file are bound to predicates in the workspace. A file binding is a high-level, bi-directional specification, which means that it can be used for both import and export.

  • Service configuration, which defines the service to be hosted by the ServiceBlox service container.

The ServiceBlox programming interface for defining these parts is described below. The ServiceBlox handler for tabular data exchange services uses predicates in the lb:web:delim namespaces. File definitions are declared in lb:web:delim:schema and predicate bindings in lb:web:delim:binding. To avoid cluttering the logic, and to make it more readable, it is good practice to use aliases for these namespaces.

27.1.1. File Definition

A delimited file is defined by creating a lb:web:delim:schema:file_definition element and populating interface predicates, and then saving this by name in lb:web:delim:schema:file_definition_by_name. Example code:

block(`files) {
  alias_all(`lb:web:delim:schema),
  alias_all(`lb:web:delim:schema_abbr),

  clauses(`{
    file_definition_by_name["sales"] = fd,
    file_definition(fd) {
      file_delimiter[] = "|",
      column_headers[] = "SKU,STORE,WEEK,SALES",
      column_formats[] = "alphanum,alphanum,alphanum,integer"
    }.
  })
} <--. 
Required file definition interface settings
file_delimiter lb:web:delim:schema

Delimiter character.

column_headers lb:web:delim:schema_abbr

Comma separated list of file headers.

column_formats lb:web:delim:schema_abbr

Comma separated list of column formats (see the table below for supported column formats).

The order of the column names in column_headers must match the order of the column formats in column_formats but does not necessarily correspond to the order of the columns in the delimited file.

Optional file definition interface settings
file_mode lb:web:delim:schema

Specification of how to handle quoting and escaping (see File Modes).

file_columns_required lb:web:delim:schema_abbr

Comma-separated list of required columns. Will make all columns that are not listed optional.

file_column_required lb:web:delim:schema_abbr

Set the column with this header as required. Will make all columns that are not listed optional.

file_columns_optional lb:web:delim:schema_abbr

Comma-separated list of optional columns (i.e., missing values are allowed).

file_column_optional lb:web:delim:schema_abbr

Set the column with this header as optional (i.e., missing values are allowed).

file_columns_can_be_absent lb:web:delim:schema_abbr

Comma-separated list of columns that can be optionally absent (i.e., missing values are allowed, and, moreover, the entire column may be missing from the file on imports).

file_column_can_be_absent lb:web:delim:schema_abbr

Set the column with this header to be optionally absent (i.e., missing values are allowed, and, moreover, the entire column may be missing from the file on imports).

file_column_format lb:web:delim:schema_abbr

Set the format of a column by its header.

column_description lb:web:delim:schema

A textual description for a column. This information is displayed by the meta-service and can be used to document file definitions.

The file_column_format setting provides an alternative way to specify column formats. For example,

file_column_formats["SKU"] = "alphanum"

specifies that the format of the SKU column is alphanumeric. The column_formats setting may be omitted, provided that file_column_format is used to specify the format of each column listed in the column_headers setting.
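
For instance, the "sales" file definition could equivalently specify its formats per column (a sketch):

file_definition_by_name["sales"] = fd,
file_definition(fd) {
  file_delimiter[] = "|",
  column_headers[] = "SKU,STORE,WEEK,SALES",
  file_column_format["SKU"] = "alphanum",
  file_column_format["STORE"] = "alphanum",
  file_column_format["WEEK"] = "alphanum",
  file_column_format["SALES"] = "integer"
}.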

The following table lists the currently supported formats. TDX will enforce these formats. During an import, it is an error if a data value does not conform to the specified format (e.g., if -1 is bound to a positive integer column). During exports, if data in the predicate does not conform to the format, it is simply filtered out from the exported file.

Supported column formats
Syntax Description Details
alphanum

Alphanumeric string that maps to a string.

The string is trimmed. Only letters and numbers are valid.

string

A string that must not be empty.

The string is trimmed. Must not be empty after trimming.

string*

A possibly empty string.

The string is trimmed. Can be empty after trimming. Cannot be used for an optional column.

raw_string

A string that is not processed by TDX.

The string is not trimmed. Can be empty. Cannot be used for an optional column.

char

A string that has a single character.

The string is trimmed. Can be empty; otherwise it must contain a single character.

uuid

A Universally Unique Identifier (UUID) string.

The string is trimmed. Only UUIDs are valid.

integer

Integer number.

0+

Non-negative integer.

1+ or >0

Positive integer.

decimal

Decimal number.

0.0+

Non-negative decimal.

>0.0

Positive decimal.

float

Floating-point number.

0.0f+

Non-negative float.

>0.0f

Positive float.

boolean(t;f)

Boolean value.

The literals t and f are case-insensitive specifications of the expected formats for true and false. For example, this could be (1;0), (t;f) or (true;false). Values that are different from these two options are considered invalid.

datetime(format)

Datetime value.

The value is serialized to and from string with datetime:formatTZ and datetime:parse using the format string, e.g., datetime('%m/%d/%y'). See documentation on built-ins.

date(format)

Date value.

The value is serialized to and from string with datetime:formatTZ and datetime:parse using the format string, e.g., date('%m/%d/%y'). See documentation on built-ins. Imports create datetime objects in the UTC timezone.
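
For example, a file definition can mix several of these formats (a sketch; the column names and the boolean and date format arguments are only illustrative):

file_definition_by_name["sku-status"] = fd,
file_definition(fd) {
  file_delimiter[] = "|",
  column_headers[] = "SKU,ACTIVE,LAST_SOLD",
  column_formats[] = "alphanum,boolean(Y;N),date('%Y-%m-%d')"
}.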

Format validations

Additional validations can be added to most formats, as illustrated in the code below:

  file_definition_by_name["sales"] = fd,
  file_definition(fd) {
    file_delimiter[] = "|",
    column_headers[] = "SKU,SALES",
    column_formats[] = "string([a-zA-Z0-9]*),float(>=0; <=20; precision 2)"
  }.

In the example above, the string column defines a regular expression, [a-zA-Z0-9]*, indicating that it only accepts alphanumeric characters. We also have a float column defining three validations, separated by semicolons:

  • >=0: indicating it must be non-negative.
  • <=20: indicating it must not be greater than 20.
  • precision 2: indicating this float number must not have more than 2 digits after the decimal point.

The following table lists currently available validations:

Format Description
>

Greater Than (example >0). Accepted in any numerical format.

>=

Greater Than or Equal To (example >=0). Accepted in any numerical format.

<

Less Than (example <0). Accepted in any numerical format.

<=

Less Than or Equal To (example <=0). Accepted in any numerical format.

precision

Maximum number of digits after the decimal point. For example, precision 2 will accept 3, 3.1 and 3.14, but reject 3.141. Accepted in the float and decimal formats. Note: integer formats accept this validation, but it has no effect on them.

Regular Expression

If the validation does not match any of the above, it defaults to a regular expression. Regular expressions are accepted in all formats except boolean, date, datetime and uuid. Regular expressions are strings and must be escaped like any string literal. The following example defines a string column with a simple regular expression:

  file_definition_by_name["sales"] = fd,
  file_definition(fd) {
    file_delimiter[] = "|",
    column_headers[] = "SKU,SALES",
    column_formats[] = "string([a-zA-Z0-9]*),float"
  }.

Special attention is needed if the regular expression includes comma (,) or semicolon (;) characters, which are not allowed in the above notation. If needed, a regular expression with these characters must be defined using the auxiliary format_regex predicate and referenced by name in the format, as illustrated below:

  file_definition_by_name["sales"] = fd,
  file_definition(fd) {
    file_delimiter[] = "|",
    column_headers[] = "SKU,SALES",
    column_formats[] = "string(exp1),float",
    format_regex["exp1"] = "[a-zA-Z]{0,9}"
  }.

Optional Columns

TDX provides support for dealing with missing values, such as the following file:

SKU     | STORE       | WEEK | YEAR | SALES
apples  | atlanta     | W1   | 2012 |
oranges | atlanta     | W2   | 2012 | 15
apples  | portland    | W1   | 2012 |
oranges | portland    | W2   | 2012 | 5

In TDX, a column can be "required", "optional", or "possibly absent". If a column is required, this means that it must be present in the file, and every row must specify a value for that column. If a column is optional, this means that it must be present in the file, but some rows may have a missing value for that column. If a column is allowed to be absent, this means that missing values are allowed, and, furthermore, the entire column may be absent from the file on imports.

By default, all columns mentioned in the file definition are required columns. To change this, we can use the file_columns_required or the file_columns_optional predicate, as shown below. When the file_columns_required predicate is used, all columns not specified in that predicate are treated as optional columns. When the file_columns_optional predicate is used, all columns not specified in that predicate are treated as required columns.

file_definition_by_name["sales"] = fd,
file_definition(fd) {
  file_delimiter[] = "|",
  column_headers[] = "SKU,STORE,WEEK,SALES",
  column_formats[] = "alphanum,alphanum,alphanum,integer",
  file_columns_required[] = "SKU,STORE,WEEK"
}.

Possibly absent columns are specified using file_columns_can_be_absent, as in the following example:

file_definition_by_name["sales-returns"] = fd,
file_definition(fd) {
  file_delimiter[] = "|",
  column_headers[] = "SKU,STORE,WEEK,SALES,RETURNS",
  column_formats[] = "alphanum,alphanum,alphanum,integer",
  file_columns_can_be_absent[] = "SALES,RETURNS"
}.

In this example, the SALES and RETURNS columns may or may not be present in the file. If they are present, then, in any given row, a value may but need not be given for each of these columns.

File Modes

TDX allows the customization of how data values are quoted and escaped in import and export services. The file mode specification consists of a string with zero or more of the following entries.

Supported file mode entries
Syntax Import Export
raw

Import text as is (will break if the string contains the delimiter or a line break).

Export text as is (will produce invalid CSV when the string contains the delimiter or a line break).

quote=c

Parse c as the quote character (c must be a single character, but not a whitespace character).

Use c as the quote character.

unix

Removes quotes and unescapes all escaped characters, i.e., \n, \r, \t and the quote character.

Adds quotes and escapes all characters that require escaping.

excel

Removes quotes and replaces doubled quote characters with a single quote character.

Adds quotes, doubles the quote character, and escapes \n, \r, \t.

Escaping is applied upon export, and unescaping upon import. In the unix style, escaping is done by prefixing the quote character, line-feed and carriage-return with a backslash (e.g., the value foo"bar is exported as "foo\"bar"). The backslashes are removed on imports. In excel mode, escaping is done by doubling the quote character (e.g., "foo""bar").

The default file mode is unix quote=" which means that by default Unix escaping rules are used, and values are enclosed in double quotes. That is, imports require values to be either not quoted or quoted with ", and exports always quote with ". For example, a file with a single string column named X behaves like the following by default:

// import
X
foo
"bar"

// export
X
"foo"
"bar"

This behavior can be changed statically with the file_mode attribute. For example, the following definition will export the file with ' as the quote character.

file_definition_by_name[name] = fd,
file_definition(fd) {
  file_delimiter[] = "|",
  file_mode[] = "quote='",
  column_headers[] = "X",
  column_formats[] = "string"
}.

This allows the following interaction (note that double quotes are now simply part of the record):

// import
X
foo
"bar"
'baz'

// export
X
'foo'
'"bar"'
'baz'

By setting the file_mode to raw, no quotes are added on export, and import values are treated completely as part of the record:

// import
X
foo
"bar"
'baz'

// export
X
foo
"bar"
'baz'

Finally, it is possible to override the static default by using a URL parameter to the TDX service:

// POST to /file?tdx_file_mode="raw"
X
foo
"bar"
'baz'

// GET to /file?tdx_file_mode="quote=*"
X
*foo*
*"bar"*
*'baz'*

27.1.2. File Binding

The file binding for a delimited file is defined by creating a lb:web:delim:binding:file_binding element, populating the interface predicates, and then saving this by name in lb:web:delim:binding:file_binding_by_name. Here is an example that shows a basic binding to one predicate:

block(`server_init) {
  alias_all(`lb:web:delim:binding),
  alias_all(`lb:web:delim:binding_abbr),

  clauses(`{
    file_binding_by_name["sales"] = fb,
    file_binding(fb) {
      file_binding_definition_name[] = "sales",
      predicate_binding_by_name["measure:sales"] =
        predicate_binding(_) {
          predicate_binding_columns[] = "SKU,STORE,WEEK,SALES"
        }
    }.
  })
} <--.

The predicate_binding_columns is used to map file column names onto the attributes of the predicate, in order. In this example, columns "SKU,STORE,WEEK,SALES" map to a predicate defined as measure:sales[sku, store, week] = sales. Some of the predicate attributes, such as the keys in this example, may have entity types, provided that the entity type has a reference-mode predicate. In this case, the values in the associated file column are treated as refmode values for the entity type in question (see Section 27.1.5, “Import and Export” below).

The following table shows how TDX maps primitive LogiQL types into column formats. For each primitive type, the table lists the compatible column formats. For example, int primitives can be bound directly onto integer, 0+ and >0 columns, but int128 primitives can only be bound onto uuid columns. When the TDX generator encounters incompatible types it attempts to convert the primitive into some compatible primitive using standard from:to:convert predicates. The last column of the table shows the conversions supported by TDX. For example, an int primitive bound to a float column will be first converted using int:float:convert, and an int128 bound to uuid will be converted with int128:string:convert.

Supported column format bindings and primitive conversions
Primitive LogiQL type | Column formats | Primitive conversions
int | integer, 0+, >0 | decimal, float, string
float | float, 0.0f+, >0.0f | decimal, int, string
decimal | decimal, 0.0+, >0.0 | float, string
string | raw_string, string, string*, alphanum, char, uuid | boolean, datetime, decimal, float, int, int128
boolean | boolean | string
datetime | date, datetime | string
int128 | uuid | string

In general, a file binding can have any number of associated predicate bindings. When multiple predicate bindings are specified, then each predicate gets populated independently on import; on export, the predicates are joined to give rise to rows of the output file. See the section on import and export below for more details.
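
For example, a file binding for the "sales-returns" file definition shown earlier could bind two measure predicates (measure:returns is a hypothetical predicate, analogous to measure:sales):

file_binding_by_name["sales-returns"] = fb,
file_binding(fb) {
  file_binding_definition_name[] = "sales-returns",
  predicate_binding_by_name["measure:sales"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "SKU,STORE,WEEK,SALES"
    },
  // measure:returns is hypothetical, used only to illustrate a second binding
  predicate_binding_by_name["measure:returns"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "SKU,STORE,WEEK,RETURNS"
    }
}.

On import, each of the two predicates is populated independently from its own columns; on export, their partial rows are combined to produce the rows of the output file (see Section 27.1.5, “Import and Export” for the exact join semantics).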

The "sales" file binding defined at the beginning of this section supports importing into the sales predicate, assuming that entity elements already exist in sku, store, and week. It is common to also allow new entity elements to be created for all entity types; this is configured by populating the lb:web:delim:binding_abbr:file_binding_entity_creation predicate. Example:

file_binding(fb) {
  ...
  file_binding_entity_creation[] = "accumulate",
  ...
}

File binding configurations may apply to all predicate bindings, as in the above entity creation example, or to individual predicate bindings.

Required file binding settings
file_binding_definition_name

Specifies the file definition that this file binding is associated with.

predicate_binding_by_name

Specifies predicate bindings associated with this file binding.

Optional file binding settings
file_binding_predicate_columns lb:web:delim:binding_abbr

Comma-separated list of column headers. Applies to all predicate bindings for this file binding.

file_binding_entity_creation lb:web:delim:binding_abbr

Set entity creation for all predicate bindings of a file binding. The supported values are:

  • none: do not create entity elements. This is the default value if no entity creation is specified.
  • accumulate: add new elements that did not exist previously.
  • ignore: skip lines with non-existing entities. This can be useful for partitioned deployments but can be dangerous as well, as there is no way to distinguish typos from entities that are not on that partition.

If entity creation is configured on the file binding, then it is recursively applied to all predicate bindings in this file binding. The setting on the predicate binding will recursively apply to all column bindings of the predicate binding.

file_binding_column_entity_creation lb:web:delim:binding_abbr

Sets a default entity creation policy, at the file binding level, for one specific argument (identified by index). Applies to all predicate bindings for this file binding.

file_binding_ignore_idb_on_import lb:web:delim:binding

Specifies that predicate bindings that bind IDB predicates should be ignored on imports. Importing into IDBs is not possible because IDBs cannot be directly populated. This allows binding to IDBs that contain information to join on exports while reusing the file binding for imports into the remaining predicate bindings.

file_binding_import_default_value lb:web:delim:binding

By default, the value that represents an empty optional column on imports is the empty string. This option allows the definition of a different string to be deemed the empty value. On imports, an optional column with either the empty string or the default value is deemed missing and will not be imported. This allows the definition of "zero-stripping" policies to ignore import rows with certain values.

file_binding_export_default_value lb:web:delim:binding

By default, the value that represents an empty optional column on exports is the empty string. This option allows the definition of a different string to be deemed the empty value. On exports, an optional column without a value will be exported with the default value instead of with an empty string. This allows the definition of "zero-filling" policies.

file_binding_default_value lb:web:delim:binding

A shorthand to apply the same value to both file_binding_import_default_value and file_binding_export_default_value.
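
For example, a "zero-stripping"/"zero-filling" policy could be sketched as follows (assuming, as with the other file-binding-level settings, that the option takes a single string value):

file_binding(fb) {
  ...
  // assumption: the default value is a single string applied to the optional columns
  file_binding_default_value[] = "0",
  ...
}

With such a setting, rows whose optional column contains "0" would be treated as missing on import, and missing values would be exported as "0".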

The index argument for options that apply to a particular argument is a zero-based index into the arguments of the bound predicate. For example, to enable entity creation for sku and store entities use:

file_binding(fb) {
  ...
  file_binding_column_entity_creation[0] = "accumulate",
  file_binding_column_entity_creation[1] = "accumulate",
  ...
}
Options applying to individual predicate bindings
predicate_binding_columns lb:web:delim:binding_abbr

Comma-separated list of column headers. Multiple columns can be combined into a single component by separating them with a semicolon. A column-binding transformation must be provided to combine multiple columns into one value (see below).

predicate_binding_entity_creation lb:web:delim:binding_abbr

Set entity creation for all column bindings. See file_binding_entity_creation for the supported values. The setting on a predicate binding overrides the setting on the file binding, and will recursively apply to all column bindings of the current predicate binding.

predicate_binding_export_only lb:web:delim:binding

Specifies that this predicate binding should be applied only for exports, and should be ignored for imports. This is useful to specify export filters which do not apply for imports. See Section 27.1.5, “Import and Export” below for more details.

predicate_binding_import_only lb:web:delim:binding

Specifies that this predicate binding should be applied only for imports, and should be ignored for exports. This is useful to populate auxiliary predicates that cannot provide a bi-directional transformation. See Section 27.1.5, “Import and Export” below for more details.

predicate_binding_no_retraction_on_post lb:web:delim:binding

The default behavior on POST requests for predicate bindings that bind optional columns is to retract existing values for the keys if the value column is left empty. This behavior may be undesirable in certain circumstances. This flag can then be used to specify that empty optional columns should not cause a retraction on POSTs.

predicate_binding_filter lb:web:delim:filter

A filter to be applied on imports (use export_only predicate bindings to filter exports). The filter must be a predicate with the same arity as the columns being filtered. Only values that exist in the filter are imported on POSTs or PUTs. Furthermore, on PUTs, only values in the filter are retracted if they are not present in the new file.

column_binding_by_arg lb:web:delim:binding

Column binding to describe more precisely how column(s) from a delimited file map to an argument of a predicate.

Column binding options
column_binding_import_function, column_binding_export_function lb:web:delim:binding

Transformation functions to apply to value(s) of this column binding, see the section called “Transform Functions” below.

column_binding_entity_creation lb:web:delim:binding

Set entity creation for this column binding. See file_binding_entity_creation for the supported values.

Transform Functions

When importing a delimited file, it is often necessary or convenient to combine column values, or to perform a modification of column values prior to inserting them into predicates in a workspace. Similar manipulations may also be called for when exporting to a delimited file. TDX allows developers to specify column binding transform functions to be applied during imports and/or exports.

In the following we will introduce transform functions by means of examples. Suppose that we want to import this simple sales file and that the application model is the one shown in the following listing.

SKU     | STORE       | WEEK | YEAR | SALES
apples  | atlanta     | W1   | 2012 | 10
oranges | atlanta     | W2   | 2012 | 15
apples  | portland    | W1   | 2012 | 20
oranges | portland    | W2   | 2012 | 5

sku(x),   sku_id(x:id)   -> string(id).
store(x), store_id(x:id) -> string(id).
week(x),  week_id(x:id)  -> string(id).

sales[sku, store, week] = v -> sku(sku), store(store), week(week), int(v).

Now suppose that there are small mismatches between the file format and data on one side, and the schema of our application on the other. For example, we may want the names of all stores to be in upper-case characters. One possible way to transform the import file is to bind the STORE column to the refmode of sample:sales:store entities and apply the string:upper function upon import. This can be specified with the following file binding:

...
file_binding_by_name["sales"] = fb,
file_binding(fb) {
  file_binding_definition_name[] = "sales",
  file_binding_entity_creation[] = "accumulate",
  predicate_binding_by_name["sample:sales:store"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "STORE",
      column_binding_by_arg[0] =
        column_binding(_) {
          column_binding_import_function[] = "string:upper"
        }
    }
}.
...

Note that import functions take the column values as their keys and produce a value that is then stored in the predicate being bound. This binding is thus roughly equivalent to applying string:upper[STORE] = id and then using id as the refmode of a store entity.

This file binding would create stores, but we are also interested in creating sales values. The problem now is that in the workspace weeks are identified by a concatenation of the year and the week id (for example, '2012-W1'). The solution is to define a concat_dash function that concatenates two strings with a dash ('-'). Then, we apply this function to the values of the YEAR and WEEK columns, and use the resulting value as the refmode of the week-of-year entity:

concat_dash[s1, s2] = s1 + "-" + s2.
lang:derivationType[`concat_dash]="Derived".

...
file_binding_by_name["sales"] = fb,
file_binding(fb) {
  file_binding_definition_name[] = "sales",
  file_binding_entity_creation[] = "accumulate",
  predicate_binding_by_name["sample:sales:sales"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "SKU, STORE, YEAR;WEEK, SALES",
      column_binding_by_arg[2] =
        column_binding(_) {
          column_binding_import_function[] = "sample:sales:services:concat_dash"
        },
      column_binding_by_arg[1] =
        column_binding(_) {
          column_binding_import_function[] = "string:upper"
        }
    }
}.
...

The above file binding will concatenate YEAR and WEEK as well as apply string:upper to the STORE column. Note that column_binding_by_arg is indexed by the components of predicate_binding_columns, which are separated by commas (,). If a component contains multiple columns, because the transform function takes multiple arguments, the columns are separated by semicolons (;). Also note that any function can be used, including built-ins (such as string:upper), derived functions, and materialized functions.

This file binding supports importing files, but it would fail for exports. The problem is that we need to specify the inverse of concat_dash, that is, how to decompose the week-of-year entity refmode into the WEEK and YEAR columns on export. But before attending to this issue, let us see how we could extend the binding to export the names of store entities in lower case. This is accomplished by using the inverse export function, string:lower:

    ...
      column_binding_by_arg[1] =
        column_binding(_) {
          column_binding_import_function[] = "string:upper",
          column_binding_export_function[] = "string:lower"
        }
...

Note that export functions are applied with respect to the predicate being bound. This binding is thus roughly equivalent to applying store_id(x:id), string:lower[id] = STORE. This also means that we need inverse export functions when exporting to multiple columns, as in the case of WEEK and YEAR, because we will have to bind the entity refmode to the value of the function, and the multiple columns to the keys. So to implement our requirement, we define a split_dash function that splits a string into two strings by the '-' character, and then apply this function as an inverse to the binding. The resulting code for our example is the following:

concat_dash[s1, s2] = s1 + "-" + s2.
lang:derivationType[`concat_dash]="Derived".

split_dash[s1, s2] = s -> string(s1), string(s2), string(s).
split_dash[s1, s2] = s <-
  string:split[s, "-", 0] = s1,
  string:split[s, "-", 1] = s2.
lang:derivationType[`split_dash]="Derived".

...
file_binding_by_name["sales"] = fb,
file_binding(fb) {
  file_binding_definition_name[] = "sales",
  file_binding_entity_creation[] = "accumulate",
  predicate_binding_by_name["sample:sales:sales"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "SKU, STORE, YEAR;WEEK, SALES",
      column_binding_by_arg[2] =
        column_binding(_) {
          column_binding_import_function[] = "sample:sales:services:concat_dash",
          column_binding_export_function_inverse[] = "sample:sales:services:split_dash"
        },
      column_binding_by_arg[1] =
        column_binding(_) {
          column_binding_import_function[] = "string:upper",
          column_binding_export_function[] = "string:lower"
        }
    }
}.
...

Predicate binding helpers

Adding alias_all(`lb:web:delim:tdx_helpers) to the service block exposes the helper predicates below for use in TDX specifications:

Predicate setting helper Description Example

binds_pred

Associate a predicate with a list of comma-separated column headers.

binds_pred["product:sku:label"] = "SKU_NBR,SKU_DESC".

accumulate_entity

Set entity creation for a column.

accumulate_entity("CLASS").

decumulate_entity

Unset entity creation for a column.

decumulate_entity("SKU").

export_only

Only bind this predicate when exporting.

export_only("product:sku").

import_only

Only bind this predicate when importing.

import_only("product:sku:imported").

transform_col_import

Transformation function to apply to column(s) on import.

transform_col_import("SHRINK_AMT", "string:upper").

transform_col_export

Transformation function to apply to column(s) on export.

transform_col_export("SHRINK_AMT", "string:lower").

transform_col

A set of two transformation functions to apply to column(s), the first one on import and the second one on export.

transform_col("SHRINK_AMT", "string:upper", "string:lower").

The example below illustrates a file definition using TDX helpers.

block(`product) {

  alias_all(`lb:web:delim:tdx_helpers),
...

  clauses(`{
...

  fb(fb),
  file_binding_by_name[product_file_definition_name[]] = fb,
  file_binding(fb) {
    file_binding_definition_name[] = product_file_definition_name[],
    accumulate_entity("SKU_NBR"),
    binds_pred["product:sku:label"] = "SKU_NBR,SKU_DESC",
    binds_pred["product:sku"] = "SKU_NBR",
    binds_pred["product:sku:imported"] = "SKU_NBR",
    export_only("product:sku"),
    import_only("product:sku:imported"),

  }.

  })
} <-- . 

27.1.3. File Row Offsets

TDX exposes a file offset as a special (reserved) column header named TDX_OFFSET, which can then be used in predicate bindings. For example, in this code we bind the sales file, which has a single column, to the predicate measures:sales, which is a functional predicate from int to int.

sales[x] = v -> int(x), int(v).

file_definition_by_name["sales"] = fd,
file_definition(fd) {
  file_delimiter[] = "|",
  column_headers[] = "SALES",
  column_formats[] = "int"
},
file_binding_by_name["sales"] = fb,
file_binding(fb) {
  file_binding_definition_name[] = "sales",
  predicate_binding_by_name["measures:sales"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "TDX_OFFSET, SALES"
    }
}.

Upon import, the offset of each row in the file is bound to the TDX_OFFSET column and then used to populate measures:sales. Upon export, the value in the column bound to TDX_OFFSET is ignored (equivalent to measures:sales[_] = SALES).

TDX_OFFSET works as any integer column: it can be bound to an entity by refmode, accumulated, be subject to transformation functions, etc. The value of the offset is the number of bytes from the beginning of the file up to the row being imported and is, therefore, guaranteed to be unique and monotonically increasing within the file being imported, but not guaranteed to be consecutive.
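
For example, offsets can be preserved in the workspace by binding TDX_OFFSET to the refmode of an entity and accumulating it (line, line_id and sales_by_line are hypothetical predicates):

line(x), line_id(x:i) -> int(i).
sales_by_line[x] = v -> line(x), int(v).

file_binding_by_name["sales-by-line"] = fb,
file_binding(fb) {
  file_binding_definition_name[] = "sales",
  // create line entities for offsets not seen before
  file_binding_entity_creation[] = "accumulate",
  predicate_binding_by_name["measures:sales_by_line"] =
    predicate_binding(_) {
      predicate_binding_columns[] = "TDX_OFFSET, SALES"
    }
}.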

27.1.4. Service Configuration

A tabular data exchange service is configured with ServiceBlox by creating a delim_service. Example:

block(`service_config) {

  alias_all(`lb:web:config:service),
  alias_all(`lb:web:config:service_abbr),
  alias_all(`lb:web:config:delim),

  clauses(`{

    service_by_prefix["/sales"] = x,
    delim_service(x) {
      delim_file_binding[] = "sales"
    }.

  })

} <--.

27.1.5. Import and Export

Import is performed as follows. If one of the columns mentioned in the file definition is missing from the input file, and the column was not specified as possibly absent, the input file is rejected. The file may contain additional columns that are not referred to in the configuration, and these columns are simply ignored. The order in which columns appear in the input .csv file is irrelevant: columns are identified by column name, as listed in the file header.

If a row of the file has a missing value in one of the required columns, or if the value in some column is not of the required type, the row is considered invalid (see the section called “Invalid Records” below for more detail on error handling). If the file binding includes predicate bindings that are filters, then all rows that do not pass the filter are ignored. For the remaining rows, row import is performed for each predicate binding that is not export_only and that involves only columns for which the current row has no missing optional values. Predicate bindings that bind IDBs are ignored (if file_binding_ignore_idb_on_import is set) or cause an error otherwise. Implicit conversions (such as from strings to numeric values) are performed where needed. If the predicate binding involves an import function that cannot be successfully applied to the given column values, the row is considered invalid. In particular, if one of the predicate attributes has an entity type and a value does not occur in the associated refmode predicate, either a new entity is created with the given refmode value (if entity creation is enabled via the accumulate policy), or an error message is generated (if entity creation is not enabled).

Export is performed as follows. Predicate bindings that are import_only are ignored, as well as all predicate bindings that are filters. The remaining predicate bindings are handled as follows: for each predicate that is associated with a predicate binding, a set of partial rows is produced (partial, because they only have the columns that occur in the predicate binding). This may involve applying export functions if present, and refmode lookups in case of entities. Next, a join is applied to these sets of partial rows, for all the predicates whose associated predicate binding refers only to required columns of the CSV file. Finally, if there are any predicates whose associated predicate binding refers to one or more optional columns of the .csv file, then the partial rows produced from those predicates are used (via an outer join) to provide potentially missing values for the optional columns.

Invalid Records

Records that failed to import are reported back to the user. The format of the report is the same as in the import file, including headers, plus two additional columns that indicate the error (see the section called “Causes”). If no records failed to import, the server returns a file with only the headers. This feature is enabled by default.

Returning error records does have a small performance penalty, as the server must write and then return the file containing the bad records. If for some reason you wish to disable the feature, you can specify the --ignore-bad-records flag on lb web-client, or simply omit the output_file or output_url in the batch configuration. See the section called “Accessing via HTTP” for how to disable the feature when accessing the service via HTTP.

Causes

The resulting data that reports which records were not imported contains all the columns of the original import plus two additional columns describing why the records were not imported. The first additional column, CAUSE, contains a human-readable string such as "'SKU' is a required column.". The second, CAUSE_CODE, contains a constant string value identifying the error type, for easy parsing. Here are the error codes, along with their descriptions.

Error Code Description
REQUIRED_COLUMN

A column defined as required is missing from the file.

WRONG_FORMAT

An import data value is invalid with respect to the column format.

DOES_NOT_EXIST

An entity or record referenced by the data does not exist.

MALFORMED_ROW

The row could not be parsed because it contains invalid characters or a different number of columns than the rest of the file.

FAILED_FUNCTION_APPLICATION

An import function application resulted in an undefined value.

FAILED_PRIMITIVE_CONVERSION

A primitive conversion resulted in an undefined value. Conversions are used when adjusting import values for import functions and refmode types.

Accessing via HTTP

If you are not using lb web-client or batch, you can still enable or disable this feature by using the tdx_return_errors query string parameter. Since the feature is enabled by default, simply accessing the URL as normal will return the bad records. If you do not wish to return the error records, set the tdx_return_errors query parameter to 0 or false. The value on_error can also be used to make the server return the error file only if import errors were detected.

Partial Imports and Aborting Transactions

TDX by default aborts an import transaction if any row fails to import. If invalid records are requested, these are still returned, even if the transaction aborts. You can optionally configure a service to allow partial imports, i.e., import good records and ignore bad records. You can configure this option by service or by individual request.

To configure a service to allow partial imports by default, you must assert into the lb:web:config:delim:allow_partial_import predicate. All transactions for the configured service will then import valid rows even if some rows produce errors. This setting can still be overridden by individual requests via the following methods, all of which take precedence over the lb:web:config:delim:allow_partial_import predicate.

If you are using a batch configuration, you can set the abort_on_error field of the ImportDelim message; if you are using the lb web-client command line tool, you can specify either the --abort flag or the --allow-partial-import flag; finally, if you are using raw HTTP, you can set the tdx_allow_partial_import query parameter equal to 1 or true to allow, 0 or false to abort.
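
For example, a service-level default could be sketched as follows (assuming allow_partial_import is asserted for the service entity, with lb:web:config:delim aliased as in the service configuration example of Section 27.1.4):

service_by_prefix["/sales"] = x,
delim_service(x) {
  delim_file_binding[] = "sales"
},
// assumption: allow_partial_import holds for services that default to partial imports
allow_partial_import(x).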

27.2. Dynamic Tabular Data Exchange Services

While TDX services allow us to statically configure a file binding, and thus create one service for each file binding, we can also import and export delimited files by specifying the file binding as a request parameter named file_binding.

To host the dynamic tabular data exchange service we use the dynamic_delim_service predicate, as shown below. This configuration allows us to export or import delimited files by accessing /delim?file_binding=..., where ... is a JSON representation of the FileBinding protobuf message.

service_by_prefix["/delim"] = x,
dynamic_delim_service(x)
  <- .

27.2.1. Building FileBinding Messages

A FileBinding message describes the file structure that we want to export (FileDefinition) and how to build the file from different predicates (PredicateBinding). The optional entity_creation field allows you to set the default entity creation policy for all columns of all predicate bindings.

message FileBinding
{
  required FileDefinition file = 2;

  repeated PredicateBinding binding = 3;
  optional string entity_creation = 4;
}

The FileDefinition describes the format of the file and the options on the columns. If the file_columns_required field is empty, then all columns are considered required (unless otherwise specified). Conversely, if file_columns_required is not empty, all columns that are not mentioned in it are considered optional.

message FileDefinition
{
  required string delimiter = 2;
  required string column_headers = 3;
  required string column_formats = 4;

  optional string file_columns_required = 5;
  optional string file_columns_optional = 6;
  optional string file_columns_can_be_absent = 7;
  optional string mode = 8;

}
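
For instance, the following FileDefinition fragment (in the JSON notation used below) lists only the SKU column as required, so SALES becomes optional; the column names are illustrative:

"file": {
  "delimiter": "|",
  "column_headers": "SKU,SALES",
  "column_formats": "alphanum,integer",
  "file_columns_required": "SKU"
}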

Finally, a PredicateBinding specifies how predicate columns bind to the columns in the file definition.

message PredicateBinding
{
  required string predicate_name = 1;
  required string predicate_binding_columns = 2;
  repeated ColumnBinding column = 3;

  optional bool export = 4 [default = true];
  optional bool import = 5 [default = true];
  optional bool no_retraction_on_post = 11 [default = false];

  optional string entity_creation = 9;

  repeated Filter filter = 10;
}

Example 27.1. 

The following JSON file definition message describes a file with three columns, "PERSON", "FATHER", and "MOTHER".

"file": {
  "delimiter": "|",
  "column_headers": "PERSON,FATHER,MOTHER",
  "column_formats": "alphanum,alphanum,alphanum"
} 

The following two PredicateBinding messages specify how to build the PERSON, FATHER, and MOTHER columns from predicates person_father and person_mother:

"binding":[ {
    "predicate_name": "person_father",
    "predicate_binding_columns": "PERSON,FATHER"
  },
  {
    "predicate_name": "person_mother",
    "predicate_binding_columns": "PERSON,MOTHER"
  }] 

Combining the messages together, we build the file binding message that should be sent as the file_binding parameter to the dynamic delimited file service:

{
    "file": {
      "delimiter": "|",
      "column_headers": "PERSON,FATHER,MOTHER",
      "column_formats": "alphanum,alphanum,alphanum"
    },
    "binding": [
      {
        "predicate_name": "person_father",
        "predicate_binding_columns": "PERSON,FATHER"
      },
      {
        "predicate_name": "person_mother",
        "predicate_binding_columns": "PERSON,MOTHER"
      }
    ]
}