Chapter 4. Lexical Properties

Note

In the text of the manual we usually use this font for quoted text. This allows us to clearly distinguish, e.g., between the letter A, the letter A in single quotes ('A') and the letter A in double quotes ("A").

LogiQL programs are represented as Unicode text encoded as UTF-8. Programs consist of a number of tokens. Each token is matched using the longest match principle: whenever adding more input characters to a valid token will result in a new valid token, the longer token is the one used. For example, >= is parsed as a single token >= rather than as the two tokens > and =.
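The longest match principle can be illustrated with a small sketch. The Python function below is not part of LogiQL or its compiler; `split_operators` is a hypothetical helper that only knows the operator tokens listed in Section 4.3 and always prefers the longest operator that fits at the current position.

```python
# Operator tokens as listed in Section 4.3 of this manual.
OPERATORS = [
    ".", "::", ":", ",", ";", "<-", "->", "=", "<", ">",
    "!=", "<=", ">=", "(", ")", "/", "-", "+", "*", "^",
    "@", "[", "]", "!",
]


def split_operators(text):
    """Split a string of operators and white space into operator tokens.

    Hypothetical helper for illustration only: it does not handle
    identifiers, literals, or comments.
    """
    # Trying longer candidates first implements the longest match principle.
    candidates = sorted(OPERATORS, key=len, reverse=True)
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for op in candidates:
            if text.startswith(op, pos):
                tokens.append(op)
                pos += len(op)
                break
        else:
            raise ValueError(f"no operator matches at position {pos}")
    return tokens


print(split_operators(">="))   # ['>=']
print(split_operators("> ="))  # ['>', '=']
```

White space between > and = prevents the two characters from combining, which is why the second call yields two tokens.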

In this manual, the notation U+nnnn is used to indicate the Unicode code point whose hexadecimal number is nnnn. For example, U+0020 is the space character. More frequently, a character is described by quoting it. For example, A is the same character as U+0041. (Note that U+0041 is just a way to refer to the character in the text: you cannot actually replace an occurrence of A with U+0041 in a LogiQL program.)

4.1. Identifiers

Identifier = BasicIdentifier { ":" BasicIdentifier } .

BasicIdentifier = ("_" | Letter) { "_" | Letter | Digit } .

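An identifier is a sequence of characters where each character is a Unicode letter, a Unicode numeric digit, an underscore (_), or a colon (:). However, the first character cannot be a digit or a colon. The grammar above is more specific about colons: a colon only separates two basic identifiers, so it must be followed by a letter or an underscore, and an identifier cannot end with a colon.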

Example 4.1. Identifiers

x
y
cost
sales_2010
PriceStoreSku
sku:cost

The identifier whose name consists of a single underscore character (_) is called the anonymous variable. It is used to denote the existence of a value that is not referenced by the clause containing it. (See Section 9.2, “Variables” for more information.)
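As a rough illustration of the Identifier grammar, the following Python sketch checks whether a string has the required shape. It rests on assumptions that are not stated in this manual: Letter is approximated by Python's `str.isalpha()` and Digit by `str.isdecimal()`, and the function names are hypothetical.

```python
def is_basic_identifier(text):
    """BasicIdentifier = ("_" | Letter) { "_" | Letter | Digit } ."""
    if not text:
        return False
    first, rest = text[0], text[1:]
    # Assumption: Letter ~ str.isalpha(), Digit ~ str.isdecimal().
    if not (first == "_" or first.isalpha()):
        return False
    return all(c == "_" or c.isalpha() or c.isdecimal() for c in rest)


def is_identifier(text):
    """Identifier = BasicIdentifier { ":" BasicIdentifier } ."""
    # Every colon-separated component must itself be a basic identifier,
    # so empty components (leading, trailing, or doubled colons) are rejected.
    return all(is_basic_identifier(part) for part in text.split(":"))


for candidate in ["sales_2010", "sku:cost", "_", "2010_sales", ":cost", "sku:"]:
    print(candidate, is_identifier(candidate))
```

The first three candidates are accepted and the last three are rejected, because a component starts with a digit or is empty.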

4.2. Literals

In this section we describe the formats in which literals of various types can be expressed directly in LogiQL. The general idea is that the type of a literal should be immediately visible to both the user and the compiler. Please note that when numbers are printed by the system (e.g., when listing the contents of a predicate), they will not appear in precisely this format. For example, 1f is a floating-point literal, 1d and 1.1d are decimal fixed-point literals, and 1 is an integer literal. They would be printed out as 1.0, 1, 1.1 and 1, respectively. See also the section called “X:Y:convert” and the section called “string:T:convert”.

Integer Literals

IntegerLiteral = DecimalIntegerLiteral
               | HexadecimalIntegerLiteral | BinaryIntegerLiteral .

DecimalIntegerLiteral     = [ "-" ] DecimalDigit { DecimalDigit } .
HexadecimalIntegerLiteral = "0x" HexadecimalDigit { HexadecimalDigit } .
BinaryIntegerLiteral      = "0b" BinaryDigit { BinaryDigit } .

DecimalDigit     = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" .
HexadecimalDigit = "A" | "B" | "C" | "D" | "E" | "F"
                 | "a" | "b" | "c" | "d" | "e" | "f" | DecimalDigit .
BinaryDigit      = "0" | "1" .

A decimal integer literal is a sequence of one or more decimal digits, optionally preceded by a minus sign. The value must be in the range of the int type. (See the section called “int” for details.)

Integers may also be written as hexadecimal or binary literals. Hexadecimal literals are prefixed with 0x and binary literals are prefixed with 0b. Unlike decimal integer literals, they are interpreted as unsigned integers. Treating them as unsigned makes it more convenient to write literals involving the high bit of the underlying integer. For example, if they were interpreted as signed integers, instead of being able to write 0xFFFFFFFFFFFFFFFF to express the integer where all bits are set, the only option available would be to write -1.

Example 4.2. Integer literals

0
-123
42
0b01010
0xABCDEF0
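The relationship between 0xFFFFFFFFFFFFFFFF and -1 can be sketched in Python. The sketch assumes that the int type is 64 bits wide in two's complement, which is suggested by the 16-digit example above but is specified in the section called “int”, not here. The name `bit_pattern` is hypothetical, and the function does not check that a decimal literal is within the range of the int type.

```python
import re

BITS = 64                  # assumption: int is a 64-bit two's complement type
MASK = (1 << BITS) - 1


def bit_pattern(literal):
    """Return the BITS-wide bit pattern of an integer literal as a non-negative int."""
    if re.fullmatch(r"0x[0-9A-Fa-f]+", literal):
        value = int(literal[2:], 16)   # hexadecimal: read as unsigned
    elif re.fullmatch(r"0b[01]+", literal):
        value = int(literal[2:], 2)    # binary: read as unsigned
    elif re.fullmatch(r"-?[0-9]+", literal):
        value = int(literal, 10)       # decimal: may carry a minus sign
    else:
        raise ValueError(f"not an integer literal: {literal!r}")
    # Masking maps a negative value onto its two's complement bit pattern.
    return value & MASK


print(hex(bit_pattern("-1")))                                      # 0xffffffffffffffff
print(bit_pattern("0xFFFFFFFFFFFFFFFF") == bit_pattern("-1"))      # True
```

Under that assumption both spellings denote the same bit pattern; the hexadecimal form simply makes the individual bits visible.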

int128 Literals

Int128Literal = [ "-" ] DecimalDigit { DecimalDigit } "q" .

DecimalDigit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" .

An int128 literal is a sequence of one or more decimal digits followed by q. The sequence of digits may optionally be preceded by a minus sign. The value must be in the range of the int128 type. (See the section called “int128” for details.)

Decimal Fixed-Point Literals

FixedPointLiteral = IntegerPart "d"
                  | [ IntegerPart ] "." DecimalPart [ "d" ] .

IntegerPart = DecimalIntegerLiteral .
DecimalPart = DecimalIntegerLiteral .

A decimal fixed-point literal specifies a decimal number. It is given as a decimal integer literal, followed by a period character and a decimal part, followed by an optional letter d. If d is given, the decimal part can be omitted. If the decimal part is given, the integer part can be omitted.

The value must be in the range of the decimal type. (See the section called “decimal” for details.)

Example 4.3. Fixed-point literals

31.555
31.555d
31d
.555
.555d

Binary Floating-Point Literals

FloatingPointLiteral = IntegerPart Exponent
                     | [ IntegerPart ] "." DecimalPart Exponent
                     | IntegerPart [ Exponent ] "f"
                     | [ IntegerPart ] "." DecimalPart [ Exponent ] "f" .

Exponent = ("e" | "E") [ "+" | "-" ] DecimalIntegerLiteral .

IntegerPart = DecimalIntegerLiteral .
DecimalPart = DecimalIntegerLiteral .

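A binary floating-point literal specifies an IEEE 754 floating-point number. It is given as an integer literal, followed by an optional decimal part, followed by an optional exponent part, followed by the letter f. As the grammar shows, the integer part may be omitted when a decimal part is present, and the letter f may be omitted when an exponent is present.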

The decimal part, if present, is indicated with a period (.) followed by another integer literal. The exponent part, if specified, is indicated with the letter e or E, followed by an optional plus (+) or minus (-) sign, followed by an integer literal.

See the section called “float” for details about the range and precision of floating-point numbers.

Example 4.4. Floating point literals

31.555f
31e12f
31e12
.0e12
31.555e-12f
31f

Although f is not strictly necessary when the exponent is given, we recommend that it never be omitted. Otherwise editing a floating-point literal might easily turn it into a decimal literal, thus causing a type error.
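The following Python sketch classifies numeric literals according to the grammars in this section, and shows how dropping the exponent from a literal without f changes its type. It is an approximation under stated assumptions: the decimal part and the exponent digits are taken to be unsigned digit sequences, range checks are ignored, and `classify` is a hypothetical name.

```python
import re

INT = r"-?[0-9]+"                 # DecimalIntegerLiteral
FRAC = r"\.[0-9]+"                # "." DecimalPart (assumed unsigned)
EXP = r"[eE][+-]?[0-9]+"          # Exponent (digits assumed unsigned)
MANTISSA = rf"(?:{INT}|(?:{INT})?{FRAC})"

PATTERNS = [
    ("int", rf"(?:{INT}|0x[0-9A-Fa-f]+|0b[01]+)"),
    ("int128", rf"{INT}q"),
    # IntegerPart "d"  |  [ IntegerPart ] "." DecimalPart [ "d" ]
    ("decimal", rf"(?:{INT}d|(?:{INT})?{FRAC}d?)"),
    # An exponent, the letter f, or both must follow the mantissa.
    ("float", rf"{MANTISSA}(?:{EXP}f?|f)"),
]


def classify(literal):
    """Return the kind of numeric literal, or None if it matches no grammar."""
    for kind, pattern in PATTERNS:
        if re.fullmatch(pattern, literal):
            return kind
    return None


print(classify("31.555e-12"))   # float
print(classify("31.555"))       # decimal
print(classify("31.555f"))      # float
```

Removing the exponent from 31.555e-12 leaves 31.555, which is a decimal literal, whereas 31.555e-12f stays a floating-point literal after the same edit.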

Boolean Literals

There are two boolean literals, true and false.

true
false

String Literals

A single-line string literal is a double quote character (", U+0022), followed by zero or more character specifiers, followed by another double quote character.

A multi-line string literal is three double quote characters ("""), followed by zero or more character specifiers including unescaped newline characters, followed by another three double quote characters. (A multi-line string literal can, but need not, span multiple lines. See also the note below.)

Each character specifier determines one character that will be included in the string. The possible character specifiers are as follows:

  • Any character except a double quote, a backslash (\, U+005C), or a newline (U+000A). The character specifies itself for inclusion in the string.
  • \", indicating a double quote character (U+0022).
  • \a, indicating an alert (U+0007).
  • \b, indicating a backspace (U+0008).
  • \f, indicating a form feed character (U+000C).
  • \n, indicating a newline (line feed) character (U+000A).
  • \r, indicating a carriage return character (U+000D).
  • \t, indicating a tab character (U+0009).
  • \v, indicating a vertical tab character (U+000B).
  • \\, indicating a single backslash (U+005C) (but see the note below!).
  • \', indicating a single quote character (U+0027).
  • \u followed by exactly four hexadecimal digits, indicating the Unicode character with the code point given by those hex digits. Hexadecimal digits that are letters may be given in upper or lower case.

Note

Some special characters may occur directly in a string. For example, if the a in "a b" is followed by a literal tab character rather than a space, then the string is equivalent to "a\tb".

In a multi-line string literal backslash characters are not escaped, so """\\\\""" contains four backslashes, unlike "\\\\", which contains two. """\\\""" is perfectly legal (three backslashes), whereas "\\\" is not (and would trigger an error).

One consequence of this is that p("""\""""). is equivalent to p("\\\""), rather than p("\""). So a multi-line string literal can contain up to two double quotes in a sequence. For example, p(""""""""). is equivalent to p("\"\""). (However, adding one more double quote would cause a compilation error, which is perhaps not surprising.)

Another consequence is that if you want to insert, say, a tab character into a multi-line string literal, then you cannot do it with a \t. Instead, you must make sure that a real tab character is present between the triple double quotes.

Note also that a multi-line string literal is converted to a normal string, so, for example,

"""ab
c"""

is exactly equivalent to "ab\nc".

Example 4.5. String literals

"hello, world"
""
"He said, \"It's only logical.\"\n"
"\uDEADbeef"
"""This is just
one string"""
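The difference between the two kinds of string literals can be sketched in Python as follows. Both functions take the text between the delimiting quotes and return the resulting string. The names are hypothetical, and rejecting an unknown escape with an error is an assumption, since this section does not say how such input is handled.

```python
SIMPLE_ESCAPES = {
    '"': '"', "a": "\a", "b": "\b", "f": "\f", "n": "\n",
    "r": "\r", "t": "\t", "v": "\v", "\\": "\\", "'": "'",
}
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_single_line(body):
    """Decode the text between the quotes of a single-line string literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '"' or ch == "\n":
            raise ValueError(f"unescaped quote or newline at position {i}")
        if ch != "\\":
            out.append(ch)          # the character specifies itself
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("backslash at end of string")
        code = body[i + 1]
        if code == "u":
            digits = body[i + 2:i + 6]
            if len(digits) != 4 or not set(digits) <= HEX_DIGITS:
                raise ValueError("\\u needs exactly four hexadecimal digits")
            out.append(chr(int(digits, 16)))
            i += 6
        elif code in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[code])
            i += 2
        else:
            # Assumption: an unknown escape is an error.
            raise ValueError(f"unknown escape \\{code}")
    return "".join(out)


def decode_multi_line(body):
    """In a multi-line string literal backslashes are not escapes."""
    return body


four = "\\" * 4                          # four backslash characters
print(len(decode_single_line(four)))     # 2
print(len(decode_multi_line(four)))      # 4
```

Three backslashes in a row fail in `decode_single_line` because the last one has nothing to escape, which mirrors the remark that "\\\" is not a legal literal.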

Predicate Literals

A predicate literal is a back quote (`, U+0060) followed by an identifier. For example:

Example 4.6. Predicate literals

`p
`q
`parent

4.3. Operators

The following character sequences are used as operators in LogiQL:

. :: : , ; <- -> = < >
!= <= >= ( ) / - + * ^
@ [ ] !

4.4. Keywords

The following character sequences are keywords in LogiQL:

true false

Note that other identifiers, such as the names of data types (e.g., datetime), clause names in the module system (e.g., aliases) and aggregation rule types (e.g., seq) are treated specially by LogiQL but are not reserved words. To make your program more readable, you should avoid using these identifiers for other purposes.

4.5. White space and comments

White space and comments are used to lay out code and to separate tokens that would otherwise combine due to the longest match principle. White space and comments are immediately discarded after being parsed.

White space is any sequence of the following characters: space (U+0020), tab (U+0009), form feed (U+000C), carriage return (U+000D), or line feed (U+000A, i.e., newline).

A comment can be written in either of two ways. One way is to start with a slash and an asterisk (/*). Such a comment continues until the first instance of an asterisk followed by a slash (*/). The second way is to start with two slashes (//): the comment then extends to the end of the line. Here are two examples of comments.

Example 4.7. Syntax of comments

// This is a comment

/* This is
   a multi-
   line comment
*/
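The two comment forms can be sketched with a short Python function. This is a simplification, not the behavior of the LogiQL compiler: `strip_comments` is a hypothetical helper that ignores string literals (so comment markers inside a string would be misread), and replacing a block comment by a single space is a choice made here so that neighboring tokens stay separated.

```python
def strip_comments(source):
    """Remove /* ... */ and // comments from source text.

    Simplified sketch: string literals are not recognized.
    """
    out = []
    pos = 0
    while pos < len(source):
        if source.startswith("/*", pos):
            # The comment ends at the FIRST "*/", so comments do not nest.
            end = source.find("*/", pos + 2)
            if end == -1:
                raise ValueError("unterminated /* comment")
            out.append(" ")
            pos = end + 2
        elif source.startswith("//", pos):
            # Skip to the end of the line; the newline itself is kept.
            end = source.find("\n", pos)
            pos = len(source) if end == -1 else end
        else:
            out.append(source[pos])
            pos += 1
    return "".join(out)


print(repr(strip_comments("a /* note */ b")))   # 'a   b'
print(repr(strip_comments("a // note\nb")))     # 'a \nb'
```

Because a block comment stops at the first */, the input /* a /* b */ c */ leaves the text c */ behind.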