Chapter 14. Series

LogiQL provides a special form of rule to support generation of series from a given iterator function.

All series rules have the following general structure:

R(x, v) <-
    series<< v = Func<initParam>[index](value) >>
       phi(x, initParam, index, value).

where

  • <initParam> and (value) are optional (in the sense that they do not appear in all the specific forms of series rules);
  • Func is a function that generates the series;
  • phi is a formula that includes occurrences of the following variables:
    • variables that appear in the head of the rule (here schematically represented as x);
    • the initialization parameters of Func (if any);
    • variables used to index the elements of the series (here schematically represented as index);
    • variables used as arguments to the generator function, if any (here schematically represented as value).

Func, the generator function, can be thought of as a wrapper for the following two functions:

state = Func_init(initParam)
Initializes the generator state from the initial parameters.
(state', v) = Func_next(state, value)
Computes the next generator state and output from the previous state and the current value.

14.1. Semantics

The semantics of series can be described as follows. First, the body is wrapped in an auxiliary predicate:

R%tmp(x, initParam, index, value) <- phi(x, initParam, index, value).

Then we populate R via the following procedure:

for each (x, initParam, _, _) in R%tmp do:
  state := Func_init(initParam)
  for each (index, value) s.t. R%tmp(x, initParam, index, value), in sorted order, do:
    (state, v) := Func_next(state, value)
    insert R(x, v)

The outer loop iterates over the groups (when a group-by is used, see below); the inner loop generates the series of results for each group.
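The procedure above can be modelled in a few lines of Python. This is only a sketch of the described semantics, not the LogiQL implementation: the names run_series, sum_init and sum_next are hypothetical, and the running-sum generator is chosen purely for illustration.

```python
from itertools import groupby

def run_series(rows, func_init, func_next):
    """Model of the series procedure.

    rows holds (x, init_param, index, value) tuples, i.e. the contents
    of the auxiliary predicate R%tmp. Returns the (x, v) pairs for R.
    """
    result = []
    # Sorting by (x, init_param, index) keeps each group together and
    # orders the rows of a group by index.
    ordered = sorted(rows, key=lambda row: (row[0], row[1], row[2]))
    for (x, init_param), group in groupby(ordered, key=lambda row: (row[0], row[1])):
        state = func_init(init_param)            # outer loop: one state per group
        for _, _, _index, value in group:        # inner loop: sorted by index
            state, v = func_next(state, value)
            result.append((x, v))
    return result

# Hypothetical generator: a running sum that starts from init_param.
def sum_init(init_param):
    return init_param

def sum_next(state, value):
    new_state = state + value
    return new_state, new_state

rows = [("a", 0, 2, 4), ("a", 0, 1, 1), ("b", 10, 1, 5)]
print(run_series(rows, sum_init, sum_next))
```

Group "a" is processed in index order (1, then 2) regardless of the order in which its rows were supplied, and group "b" starts from its own freshly initialized state.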

14.2. runtotal

Introduction

The running total aggregation computes an accumulated total over a time series. For example, the following table illustrates how the runtotal aggregation computes the total sales at a given date from a predicate that contains day-by-day sales:

day            Aug 1   Aug 2   Aug 3   Aug 4   Aug 5   Aug 6   Aug 7   Aug 8
sales              1       4       3       6      -2       8       0       2
acc. sales         1       5       8      14      12      20      20      22

In LogiQL, the accumulated sales can be defined using the running total series aggregation as follows:

Example 14.1. Running total series

sales[day]     = t -> int(day), decimal(t).
acc_sales[day] = t -> int(day), decimal(t).

acc_sales[day] = t <-
   series<< t = runtotal[day](sls) >>  sales[day] = sls.
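For comparison, the same accumulation can be reproduced outside LogiQL. The following Python sketch uses the sales figures from the table in the introduction; representing days as the integers 1 to 8 is an assumption made for brevity.

```python
from itertools import accumulate

# Day-by-day sales, keyed by day of August.
sales = {1: 1, 2: 4, 3: 3, 4: 6, 5: -2, 6: 8, 7: 0, 8: 2}

days = sorted(sales)                          # accumulate in day order
totals = accumulate(sales[d] for d in days)   # running sums, one per day
acc_sales = dict(zip(days, totals))
print(acc_sales)
```

Each entry of acc_sales corresponds to one value in the "acc. sales" row of the table.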

There is often a need to compute multiple running totals, for example separately for each location, product, or bank account. This is known as a group-by (cf. a similar mechanism in sorting, as illustrated in Example 13.6, “Sorting with group-by”). The following LogiQL rule shows how the accumulated sales can be computed separately for each stock keeping unit:

sales[sku, day] = t -> sku(sku), int(day), decimal(t).
acc_sales_by_sku[sku, day] = t -> sku(sku), int(day), decimal(t).

acc_sales_by_sku[sku, day] = t <-
   series<< t = runtotal[day](sls) >>  sales[sku, day] = sls.

Semantically, this is equivalent to the following normal total aggregation, but the running total is computed more efficiently by not repeating the computation of intermediate totals.

day(x) -> int(x).

acc_sales_by_sku[sku, day1] = t <-
   agg<< t = total(sls) >>
      sales[sku, day2] = sls,
      day2 <= day1,
      day(day2),
      day(day1).
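The difference in cost can be illustrated with a Python sketch. Both functions below are hypothetical models, not LogiQL internals, and the comparison assumes that totals are only requested for days on which the SKU has a sale.

```python
from collections import defaultdict

def running_totals(sales):
    """One pass per group: each total reuses the previous one."""
    by_sku = defaultdict(list)
    for (sku, day), amount in sales.items():
        by_sku[sku].append((day, amount))
    acc = {}
    for sku, entries in by_sku.items():
        total = 0
        for day, amount in sorted(entries):   # accumulate in day order
            total += amount
            acc[(sku, day)] = total
    return acc

def naive_totals(sales):
    """Re-sums all earlier days for every (sku, day) pair."""
    return {
        (sku, day1): sum(amount for (s, day2), amount in sales.items()
                         if s == sku and day2 <= day1)
        for (sku, day1) in sales
    }

sales = {("pen", 1): 3, ("pen", 2): 5, ("ink", 1): 10,
         ("pen", 4): -1, ("ink", 3): 2}
print(running_totals(sales) == naive_totals(sales))
```

The two functions agree on the result, but naive_totals scans the whole input once per output value, whereas running_totals touches each sale once after sorting.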

The runtotal aggregation provides a mechanism for resetting the accumulated total at specific points in time, for example at the beginning of each month. The following extends the sales example with resets:

Example 14.2. Running total series with resets

sales_runtotal_by_sku[sku, day] = t <-
   series<<t = runtotal[day](sls) resets at reset[sku, day] = sls>>
      sales[sku, day] = sls.

Predicate reset could have any other name, of course.

Please see the section called “Detailed Usage for Reset functionality” for more information.

The following table illustrates how the reset functionality works in a simple case:

day            Aug 1   Aug 2   Aug 3   Aug 4   Aug 5   Aug 6   Aug 7   Aug 8
sales              1       4       3       6      -2       8       0       2
reset                                    100
acc. sales         1       5       8     100      98     106     106     108

Detailed usage

The runtotal body (i.e., the part that follows >>) must have only one atom, and the atom must refer to a single-valued functional predicate whose value must be of a summable type (int, decimal, or float). All the key variables in the body atom must appear in the head of the rule.

The running total aggregation requires the time argument (day in the example) to be the rightmost argument of the predicate.

While we use the terms “time series” and “time argument” for convenience, the time argument is not required to be a datetime or to represent time.

If the runtotal predicate has key arguments other than the time argument, then the other key arguments function as a group-by (sku in the example).

The time series argument must be of a primitive type. All primitive types have a sort order (e.g., for integers 1 < 2, for strings “a” < “ab”, for datetime 2015-09-11 15:00:00 < 2015-09-11 16:00:00), and the order of these values determines the order in which values are accumulated.

Detailed Usage for Reset functionality

If the reset functionality is used, then the reset atom and the body atom must have the same signature and must use the same variable names in the same order. Adding a reset changes the computation of the accumulated total in the following way:

  a. If both the reset predicate and the body predicate have a value, then the generated value is the reset value.
  b. If reset has no value and body has a value, then the generated value is the previous generated value plus the body value.
  c. If reset has a value and body has no value, then no value is generated. Additionally, the reset value is treated as "the previous generated value" for the next generated value.
  d. If neither reset nor body has a value, then no value is generated.

The following table shows an example of the different scenarios:

day            Aug 1   Aug 2   Aug 3   Aug 4   Aug 5   Aug 6   Aug 7
sales              1       4       3               4               5
reset                      2                               3
case             (b)     (a)     (b)     (d)     (b)     (c)     (b)
acc. sales         1       2       5               9               8
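The four cases can be traced with a short Python model. This is a sketch of the behaviour described above for a single group; the function name runtotal_with_resets and the starting total of 0 are assumptions.

```python
def runtotal_with_resets(sales, reset):
    """Apply cases (a) to (d) over the days present in either mapping."""
    acc = {}
    previous = 0                               # assumed starting total
    for day in sorted(set(sales) | set(reset)):
        if day in reset and day in sales:      # (a) the reset value wins
            previous = reset[day]
            acc[day] = previous
        elif day in sales:                     # (b) accumulate
            previous += sales[day]
            acc[day] = previous
        else:                                  # (c) reset only: no output,
            previous = reset[day]              #     but the total restarts
        # (d) days in neither mapping are never visited
    return acc

sales = {1: 1, 2: 4, 3: 3, 5: 4, 7: 5}
reset = {2: 2, 6: 3}
print(runtotal_with_resets(sales, reset))
```

With the data from the table, days 4 and 6 produce no value, and day 7 continues from the reset value 3 set on day 6.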

This handling of resets will not suit every application, but it is designed so that it can easily be adapted to other requirements.

For example, to include reset values in the resulting running total:

acc_sales[sku, day] = v <- intermediate[sku, day] = v.
acc_sales[sku, day] = v <- reset[sku, day] = v.

intermediate[sku, day] = t <-
   series<<t = runtotal[day](sls) resets at reset[sku, day] = sls>>
      sales[sku, day] = sls.

This formulation does not cause a functional dependency violation when the sales predicate also has an incremental value on a reset day: by rule (a), the series then generates the reset value, so both acc_sales rules derive the same value.
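The absence of a conflict can be checked with a small Python model of the two acc_sales rules for a single sku. The helper names and the starting total of 0 are assumptions made for this sketch.

```python
def series_with_resets(sales, reset):
    """Running total with resets; emits a value only on days with sales."""
    acc, previous = {}, 0                      # assumed starting total
    for day in sorted(set(sales) | set(reset)):
        previous = reset[day] if day in reset else previous + sales[day]
        if day in sales:
            acc[day] = previous
    return acc

def acc_sales(sales, reset):
    """Union of the series output and the reset values."""
    result = series_with_resets(sales, reset)
    for day, value in reset.items():
        # A functional predicate holds one value per key, so both rules
        # must agree wherever they both derive a value.
        if result.setdefault(day, value) != value:
            raise ValueError(f"functional dependency violation on day {day}")
    return result

sales = {1: 1, 2: 4, 3: 3, 5: 4, 7: 5}
reset = {2: 2, 6: 3}
print(acc_sales(sales, reset))
```

Day 2 has both a sale and a reset, yet no error is raised, because the series already yields the reset value there. Day 6, which has only a reset, now appears in the result.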

If the incremental value should be added to the reset value, then the reset predicate can be computed separately. For example:

reset_incr[sku, day] = r + s <- reset[sku, day] = r, sales[sku, day] = s.
reset_incr[sku, day] = reset[sku, day] <- !sales[sku, day] = _.

intermediate[sku, day] = t <-
   series<<t = runtotal[day](sls) resets at reset_incr[sku, day] = sls>>
      sales[sku, day] = sls.
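The effect of the two reset_incr rules can be seen in isolation with the following Python sketch for a single sku. The dictionary representation is an assumption of the sketch.

```python
def reset_incr(sales, reset):
    """Reset value plus that day's sales, or the bare reset value if no sale."""
    return {day: value + sales.get(day, 0) for day, value in reset.items()}

sales = {1: 1, 2: 4, 3: 3, 5: 4, 7: 5}
reset = {2: 2, 6: 3}
print(reset_incr(sales, reset))   # day 2 becomes 2 + 4, day 6 stays 3
```

Because the increment is folded into the reset value beforehand, the series itself can keep treating a reset as a plain replacement of the running total.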

14.3. rndnum

LogiQL provides a number of functions that allow the user to generate a collection of random numbers drawn from a particular distribution. Here is a simple example:

s3[st, w] = v -> store(st), week(w), float(v).
s3[st, w] = v <-
   series<< v = rnd_binomial<1, m, seed>[w] >>
      week(w), store(st), m = store:med[st], seed = store:number[st].

The predicate s3 contains a set of random numbers drawn from a Bernoulli distribution (binomial with n = 1) with success probability store:med[st] for each store st. All numbers along the time series dimension [w] will be drawn from the same distribution. In general, the series function has the form v = rnd_distrname_type<x1, ..., xn, s>[index], where x1 through xn are parameter values specific to the distribution in question, and s is an integer representing the random generator seed. The use of the seed is mandatory. The variables x1, ..., xn, s must be bound in the body of the series rule (the part that follows >>).

The currently available generators for distributions are as follows:

rnd_uniform_int<min, max, seed> = z -> int(min), int(max), int(z), int(seed)
Uniform discrete distribution for the interval [min,max].
rnd_uniform_real<min, max, seed> = z -> float(min), float(max), float(z), int(seed)
Uniform distribution U(min,max).
rnd_binomial<n, p, seed> = z -> int(n), float(p), float(z), int(seed)
Binomial distribution with parameters n and p.
rnd_cauchy<mu, x0, seed> = z -> float(mu), float(x0), float(z), int(seed)
Cauchy distribution with location parameter x0 and scale parameter mu.
rnd_poisson<lambda, seed> = z -> float(lambda), int(z), int(seed)
Poisson distribution with parameter lambda.
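The role of the seed can be illustrated with Python's standard random module. This is a model of the series structure only: the function name is hypothetical, and Python's generator is not the one used by LogiQL, so the actual numbers will differ.

```python
import random

def rnd_uniform_int_series(groups, indices):
    """groups maps a key to (lo, hi, seed); indices are the series index values."""
    out = {}
    for key, (lo, hi, seed) in groups.items():
        rng = random.Random(seed)          # state initialized from the seed
        for i in sorted(indices):          # one draw per index, in sorted order
            out[(key, i)] = rng.randint(lo, hi)
    return out

groups = {"store1": (500, 1000, 132), "store2": (500, 1000, 135)}
first = rnd_uniform_int_series(groups, [1, 2, 3])
second = rnd_uniform_int_series(groups, [3, 1, 2])
print(first == second)
```

Each group gets its own generator state, so the same seed and the same set of index values reproduce the same series no matter how the input is ordered.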

Example 14.3. Random number generators

emp(fname, lname, serial) -> string(fname), string(lname), int(serial).
emp("TJ",    "Green",       132).
emp("Dan",   "Olteanu",     135).
emp("Todd",  "Veldhuizen",  323).
emp("Geoff", "Washburn",     41).
emp("Benny", "Kimelfeld",  5936).

sample(m) -> int(m).
sample(1).
sample(2).
sample(3).
sample(4).
sample(5).

salary[fname, lname, year, sample] = amount ->
   string(fname), string(lname), int(amount), int(year), int(sample).
salary[fname, lname, _, sample] = _ -> emp(fname, lname, _), sample(sample).

height[fname, lname, year, sample] = h ->
   string(fname), string(lname), float(h), int(year), int(sample).
height[fname, lname, _, sample] = _ -> emp(fname, lname, _), sample(sample).

// uniform int
salary[f, l, 2000, i] = s <-
   series<< s = rnd_uniform_int<500, 1000, serial>[i] >>
      emp(f, l, serial), sample(i).

// uniform real
height[f, l, 2000, i] = h <-
   series<< h = rnd_uniform_real<3f, max, serial>[i] >>
      max = 6f, emp(f, l, serial), sample(i).

// binomial
salary[f, l, 2001, i] = s <-
   series<< s = rnd_binomial<t, p, serial>[i] >>
      t = 5, p = 0.2f, emp(f, l, serial), sample(i).

Please note that the line that immediately follows the declaration of salary or height is not a declaration, but a more general constraint: see Chapter 16, Constraints.