## Chapter 14. Series

LogiQL provides a special form of rule to support generation of series from a given iterator function.

```SeriesRule =
Atom  "<-"  "series" "<<" VariableName "=" Generator ">>" Formula "."

Generator = RuntotalGenerator | RandomGenerator .

RuntotalGenerator = "runtotal" "[" Index "]" [ Value ] [ "resets" "at" Atom ] .
RandomGenerator = DistributionName "<" InitialParameters ">" [ Index ] .

Index = VariableName .
Value = "(" VariableName ")" .```

(Information about `DistributionName` and `InitialParameters` can be found in Section 14.3, “Random number series”.)

All `series` rules have the following general structure:

```R(x, v) <-
series<< v = Func<initParam>[index](value) >>
phi(x, initParam, index, value).```

where

• `<initParam>` and `(value)` are optional (in the sense that they do not appear in all the specific forms of `series` rules);
• `Func` is a function that generates the series;
• `phi` is a formula that includes occurrences of the following variables:
• variables that appear in the head of the rule (here schematically represented as `x`);
• the initialization parameters of `Func` (if any);
• variables used to index the elements of the series (here schematically represented as `index`);
• variables used as arguments to the generator function, if any (here schematically represented as `value`).

`Func`, the generator function, can be thought of as a wrapper for the following two functions:

`state = Func_init(initParam)`
Initializes the generator state from the initial parameters.
`(state', v) = Func_next(state, value)`
Computes the next generator state and output from the previous state and the current value.

## 14.1. Semantics

The semantics of `series` can be described as follows. First, the body is wrapped in an auxiliary predicate:

`R%tmp(x, initParam, index, value) <- phi(x, initParam, index, value).`

Then we populate `R` via the following procedure:

```for each (x, initParam, _, _) in R%tmp do:
    state := Func_init(initParam)
    for each (index, value) s.t. R%tmp(x, initParam, index, value), in sorted order, do:
        (state, v) := Func_next(state, value)
        insert R(x, v)```

The outer loop iterates over the groups (if `group-by` is used, see below); the inner loop generates the sequence (series) of results for each group.
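The procedure above can be modelled in Python as a minimal sketch. The names `evaluate_series`, `runtotal_init` and `runtotal_next` are hypothetical and are not part of LogiQL. The sketch also assumes that the index is kept in the result, as it is in the heads of the `runtotal` rules shown later, and that a running total starts from zero.

```python
from collections import defaultdict


def evaluate_series(tmp_rows, func_init, func_next):
    """Model of the series procedure.

    tmp_rows: iterable of (x, init_param, index, value) tuples,
              standing in for the facts of the auxiliary predicate R%tmp.
    Returns a list of (x, index, v) tuples standing in for R.
    """
    # Outer loop: one group per distinct (x, init_param).
    groups = defaultdict(list)
    for x, init_param, index, value in tmp_rows:
        groups[(x, init_param)].append((index, value))

    result = []
    for (x, init_param), entries in groups.items():
        state = func_init(init_param)
        # Inner loop: visit the entries of the group in index order.
        for index, value in sorted(entries, key=lambda e: e[0]):
            state, v = func_next(state, value)
            result.append((x, index, v))
    return result


def runtotal_init(_init_param):
    # Assumption: a running total has no parameters and starts at zero.
    return 0


def runtotal_next(state, value):
    total = state + value
    return total, total  # (new state, generated value)


rows = [
    ("sku1", None, 2, 4),
    ("sku1", None, 1, 1),
    ("sku2", None, 1, 10),
    ("sku1", None, 3, 3),
]
out = evaluate_series(rows, runtotal_init, runtotal_next)
print(sorted(out))
```

Each group gets its own state, so the totals of `sku1` and `sku2` never mix.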

## 14.2. `runtotal`

### Introduction

The running total aggregation computes an accumulated total over a time series. For example, the following table illustrates how the `runtotal` aggregation computes the total sales at a given date from a predicate that contains day-by-day sales:

| day | Aug 1 | Aug 2 | Aug 3 | Aug 4 | Aug 5 | Aug 6 | Aug 7 | Aug 8 |
|---|---|---|---|---|---|---|---|---|
| sales | 1 | 4 | 3 | 6 | -2 | 8 | 0 | 2 |
| acc. sales | 1 | 5 | 8 | 14 | 12 | 20 | 20 | 22 |
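As a quick check of the table, the accumulated row can be reproduced outside LogiQL. The following Python sketch uses the standard `itertools.accumulate`; the variable names are illustrative only.

```python
from itertools import accumulate

days = [f"Aug {d}" for d in range(1, 9)]
sales = [1, 4, 3, 6, -2, 8, 0, 2]

# Each entry is the sum of all sales up to and including that day.
acc_sales = list(accumulate(sales))

for day, s, a in zip(days, sales, acc_sales):
    print(f"{day}: sales={s:>2}  acc. sales={a:>2}")
```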

In LogiQL, the accumulated sales can be defined using the running total series aggregation as follows:

Example 14.1. Running total series

```sales[day]     = t -> int(day), decimal(t).
acc_sales[day] = t -> int(day), decimal(t).

acc_sales[day] = t <-
series<< t = runtotal[day](sls) >>  sales[day] = sls.```

There is often a need to compute multiple running totals, for example separately for each location, product, or bank account. This is known as a group-by (cf. a similar mechanism in sorting, as illustrated in Example 13.6, “Sorting with `group-by`”). The following LogiQL rule shows how the accumulated sales can be computed separately for each stock keeping unit:

```sales[sku, day] = t -> sku(sku), int(day), decimal(t).
acc_sales_by_sku[sku, day] = t -> sku(sku), int(day), decimal(t).

acc_sales_by_sku[sku, day] = t <-
series<< t = runtotal[day](sls) >>  sales[sku, day] = sls.```

Semantically, this is equivalent to the following normal total aggregation, but the running total is computed more efficiently by not repeating the computation of intermediate totals.

```day(x) -> int(x).

acc_sales[sku, day1] = t <-
agg<< t = total(sls) >>
sales[sku, day2] = sls,
day2 <= day1,
day(day2),
day(day1).```
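The efficiency difference between the two formulations can be illustrated with a small Python model. This is a sketch under simplifying assumptions: the function names are hypothetical, the data is made up, and every SKU has a sales entry for every day, so both formulations are defined for the same keys.

```python
from collections import defaultdict

# (sku, day) -> sales; illustrative data only.
sales = {
    ("a", 1): 1, ("a", 2): 4, ("a", 3): 3,
    ("b", 1): 10, ("b", 2): 0, ("b", 3): 5,
}


def acc_by_total(sales):
    """Total aggregation: re-sums every earlier day for each (sku, day)."""
    return {
        (sku, day1): sum(
            v for (sku2, day2), v in sales.items()
            if sku2 == sku and day2 <= day1
        )
        for (sku, day1) in sales
    }


def acc_by_runtotal(sales):
    """Running total: one pass in (sku, day) order, reusing the previous total."""
    running = defaultdict(int)
    result = {}
    for sku, day in sorted(sales):
        running[sku] += sales[(sku, day)]
        result[(sku, day)] = running[sku]
    return result


print(acc_by_runtotal(sales))
```

Both functions return the same mapping; the first does work proportional to the square of the number of days per SKU, the second visits each entry once after sorting.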

The `runtotal` aggregation provides a mechanism for resetting the accumulated total at specific points in time, for example at the beginning of each month. The following extends the sales example with resets:

Example 14.2. Running total series with resets

```sales_runtotal_by_sku[sku, day] = t <-
series<<t = runtotal[day](sls) resets at reset[sku, day] = sls>>
sales[sku, day] = sls.```

Predicate `reset` could have any other name, of course.

Please see the section called “Detailed Usage for Reset functionality” for more information.

The following table illustrates how the reset functionality works in a simple case:

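| day | Aug 1 | Aug 2 | Aug 3 | Aug 4 | Aug 5 | Aug 6 | Aug 7 | Aug 8 |
|---|---|---|---|---|---|---|---|---|
| sales | 1 | 4 | 3 | 6 | -2 | 8 | 0 | 2 |
| reset | | | | 100 | | | | |
| acc. sales | 1 | 5 | 8 | 100 | 98 | 106 | 106 | 108 |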

### Detailed usage

The `runtotal` body (i.e., the part that follows `>>`) must have only one atom, and the atom must refer to a single-valued functional predicate whose value must be of a summable type (`int`, `decimal`, or `float`). All the key variables in the body atom must appear in the head of the rule.

The running total aggregation requires the time argument (`day` in the example) to be the rightmost key argument of the predicate.

While we use the terms “time series” and “time argument” for convenience, the time argument is not required to be a datetime or to represent time.

If the runtotal predicate has key arguments other than the time argument, then the other key arguments function as a group-by (`sku` in the example).

The time series argument must be of a primitive type. All primitive types have a sort order (e.g., for integers `1 < 2`, for strings `"a" < "ab"`, for datetime `2015-09-11 15:00:00 < 2015-09-11 16:00:00`), and the order of these values determines the order in which values are accumulated.
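The role of the sort order can be illustrated with a short Python sketch. It assumes that Python's ordering of `datetime` values matches the ordering described above; the data is made up.

```python
from datetime import datetime
from itertools import accumulate

# Facts are listed in arbitrary order; only the key order matters.
sales = {
    datetime(2015, 9, 11, 16): 4,
    datetime(2015, 9, 11, 15): 1,
    datetime(2015, 9, 12, 9): 3,
}

ordered = sorted(sales)  # ascending order of the time argument
acc = dict(zip(ordered, accumulate(sales[k] for k in ordered)))

for when, total in acc.items():
    print(when, total)
```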

### Detailed Usage for Reset functionality

If the reset functionality is used, then the reset atom and the body atom must have the same signature and must use the same variable names with the same order. Adding a reset changes the computation of the accumulated total in the following way:

• (a) If both the reset predicate and the body predicate have a value, then the generated value is the reset value.
• (b) If reset has no value and body has a value, then the generated value is the previous generated value plus the body value.
• (c) If reset has a value and body has no value, then no value is generated. Additionally, the reset value is treated as "the previous generated value" for the next generated value.
• (d) If neither reset nor body has a value, then no value is generated.

The following table shows an example of the different scenarios:

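| day | Aug 1 | Aug 2 | Aug 3 | Aug 4 | Aug 5 | Aug 6 | Aug 7 |
|---|---|---|---|---|---|---|---|
| sales | 1 | 4 | 3 | | 4 | | 5 |
| reset | | 2 | | | | 3 | |
| acc. sales | (b) 1 | (a) 2 | (b) 5 | (d) | (b) 9 | (c) | (b) 8 |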
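The four cases listed above can be modelled in Python as follows. This is a sketch, not the engine's implementation: `runtotal_with_resets` is a hypothetical name, days are represented as plain integers, and the total is assumed to start from zero.

```python
def runtotal_with_resets(sales, reset):
    """sales, reset: dicts mapping a day to a value. Returns day -> accumulated value."""
    result = {}
    previous = 0  # assumption: the running total starts at zero
    # Days with neither a reset nor a body value never appear here,
    # so no value is generated for them (case d).
    for day in sorted(set(sales) | set(reset)):
        if day in reset:
            previous = reset[day]
            if day in sales:
                # Case (a): both have a value, the reset value is generated.
                result[day] = previous
            # Case (c): reset only, nothing is generated, but the reset
            # value becomes the "previous generated value".
        else:
            # Case (b): body only, previous generated value plus body value.
            previous += sales[day]
            result[day] = previous
    return result


# The table above, with Aug 1 .. Aug 7 written as 1 .. 7.
sales = {1: 1, 2: 4, 3: 3, 5: 4, 7: 5}
reset = {2: 2, 6: 3}
acc_sales = runtotal_with_resets(sales, reset)
print(acc_sales)
```

Days 4 and 6 are absent from the result, matching the empty cells of the table.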

This way of handling resets will not suit every application, but it is designed so that it can easily be adapted to the behavior that an application requires.

For example, to include reset values in the resulting running total:

```acc_sales[sku, day] = v <- intermediate[sku, day] = v.
acc_sales[sku, day] = v <- reset[sku, day] = v.

intermediate[sku, day] = t <-
series<<t = runtotal[day](sls) resets at reset[sku, day] = sls>>
sales[sku, day] = sls.```

This formulation will not result in a functional dependency violation when the sales predicate also has an incremental value for a specific day, because of rule (a).

If the incremental value should be added to the reset value, then the reset predicate can be computed separately. For example:

```reset_incr[sku, day] = reset[sku, day] + sales[sku, day].
reset_incr[sku, day] = reset[sku, day] <- !sales[sku, day] = _.

intermediate[sku, day] = t <-
series<<t = runtotal[day](sls) resets at reset_incr[sku, day] = sls>>
sales[sku, day] = sls.```
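The two adaptations above can be compared with a small Python model. It is a sketch under stated assumptions: `runtotal_with_resets` is a hypothetical helper that mirrors the reset cases described in this section, days are plain integers, the total starts from zero, and the data is the same as in the scenario table.

```python
def runtotal_with_resets(sales, reset):
    """Hypothetical model of runtotal with resets: day -> accumulated value."""
    result, previous = {}, 0
    for day in sorted(set(sales) | set(reset)):
        if day in reset:
            previous = reset[day]
            if day in sales:
                result[day] = previous   # reset and body: reset value
        else:
            previous += sales[day]       # body only: previous + body
            result[day] = previous
    return result


sales = {1: 1, 2: 4, 3: 3, 5: 4, 7: 5}
reset = {2: 2, 6: 3}

# Variant 1: include the reset values in the result (union of both rules).
intermediate = runtotal_with_resets(sales, reset)
acc_sales = {**intermediate, **reset}
# Where both rules produce a value for the same day, the values agree,
# so the union is still a function of the day.
conflicts = [d for d in reset if d in intermediate and intermediate[d] != reset[d]]

# Variant 2: add the incremental value to the reset value first.
reset_incr = {day: r + sales.get(day, 0) for day, r in reset.items()}
intermediate_incr = runtotal_with_resets(sales, reset_incr)

print(acc_sales)
print(intermediate_incr)
```

In the second variant the reset on day 2 becomes 2 + 4 = 6, and every later total up to the next reset shifts accordingly.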

## 14.3. Random number series

LogiQL provides a number of functions that allow the user to generate a collection of random numbers drawn from a particular distribution. Here is a simple example:

```s3[st, w] = v -> store(st), week(w), float(v).
s3[st, w] = v <-
series<< v = rnd_binomial<1, m, seed>[w] >>
week(w), store(st), m = store:med[st], seed = store:number[st].```

The predicate `s3` contains a set of random numbers drawn from a Bernoulli distribution (binomial with `n = 1`) whose parameter `p` is `store:med[st]`, for each store `st`. All numbers along the time series dimension `[w]` are drawn from the same distribution. In general, the series function has the form `v = rnd_distrname_type<x1, ..., xn, s>[index]`, where `x1` through `xn` are parameter values specific to the distribution in question, and `s` is an integer representing the random generator seed. The seed is mandatory. The variables `x1`, ..., `xn`, `s` must be bound on the right-hand side of the `series` rule.
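The role of the seed can be illustrated with a Python model of a per-group generator. This is only a sketch: the function names are hypothetical, Python's `random.Random` stands in for whatever generator LogiQL uses internally, and the numbers it produces will not match those produced by LogiQL.

```python
import random


def binomial_init(init_param):
    """Build the generator state from the initial parameters (n, p, seed)."""
    n, p, seed = init_param
    return (n, p, random.Random(seed))


def binomial_next(state):
    """Draw one binomial value as the number of successes in n trials."""
    n, p, rng = state
    draw = float(sum(rng.random() < p for _ in range(n)))
    return state, draw


def series_for_store(p, seed, weeks):
    """One series per store: the same distribution along the week dimension."""
    state = binomial_init((1, p, seed))  # n = 1 gives a Bernoulli draw
    out = {}
    for w in sorted(weeks):
        state, out[w] = binomial_next(state)
    return out


first = series_for_store(0.3, seed=42, weeks=range(1, 6))
again = series_for_store(0.3, seed=42, weeks=range(1, 6))
print(first)
```

Because the state is initialized from the seed, evaluating the same group twice yields the same series, while different seeds give independent series per store.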

The currently available generators for distributions are as follows:

``` rnd_uniform_int<min, max, seed> = z -> int(min), int(max), int(z), int(seed) ```
Uniform discrete distribution for the interval `[min,max]`.
``` rnd_uniform_real<min, max, seed> = z -> float(min), float(max), float(z), int(seed) ```
Uniform distribution `U(min,max)`.
``` rnd_binomial<n, p, seed> = z -> int(n), float(p), float(z), int(seed) ```
Binomial distribution with parameters `n` and `p`.
``` rnd_cauchy<mu, x0, seed> = z -> float(mu), float(x0), float(z), int(seed) ```
Cauchy distribution with location parameter `x0` and scale parameter `mu`.
```rnd_poisson<lambda, seed> = z -> float(lambda), int(z), int(seed) ```
Poisson distribution with parameter `lambda`.

Example 14.3. Random number generators

```emp(fname, lname, serial) -> string(fname), string(lname), int(serial).
emp("TJ",    "Green",       132).
emp("Dan",   "Olteanu",     135).
emp("Todd",  "Veldhuizen",  323).
emp("Geoff", "Washburn",     41).
emp("Benny", "Kimelfeld",  5936).

sample(m) -> int(m).
sample(1).
sample(2).
sample(3).
sample(4).
sample(5).

salary[fname, lname, year, sample] = amount ->
string(fname), string(lname), int(amount), int(year), int(sample).
salary[fname, lname, _, sample] = _ -> emp(fname, lname, _), sample(sample).

height[fname, lname, year, sample] = h ->
string(fname), string(lname), float(h), int(year), int(sample).
height[fname, lname, _, sample] = _ -> emp(fname, lname, _), sample(sample).

// uniform int
salary[f, l, 2000, i] = s <-
series<< s = rnd_uniform_int<500, 1000, serial>[i] >>
emp(f, l, serial), sample(i).

// uniform real
height[f, l, 2000, i] = h <-
series<< h = rnd_uniform_real<3f, max, serial>[i] >>
max = 6f, emp(f, l, serial), sample(i).

// binomial
salary[f, l, 2001, i] = s <-
series<< s = rnd_binomial<t, p, serial>[i] >>
t = 5, p = 0.2f, emp(f, l, serial), sample(i).```

Please note that the line that immediately follows the declaration of `salary` or `height` is not a declaration, but a more general constraint: see Chapter 18, Constraints.