The provenance of data (how, and through which relations and operations, a piece of data arrived in a query's result) can be applied to both debugging queries and optimizing them. The dissertation work of LogicBlox team members Grigoris Karvounarakis and TJ Green advanced the state of the art in query provenance by developing semiring-based provenance. In this recent article for SIGMOD Record, Grigoris and TJ summarize their work and describe how it has been applied at LogicBlox.
Abstract: We present an overview of the literature on querying semiring-annotated data, a notion we introduced five years ago in a paper with Val Tannen. First, we show that positive relational algebra calculations for various forms of annotated relations, as well as provenance models for such queries, are particular cases of the same general algorithm involving commutative semirings. For this reason, we present a formal framework for answering queries on data with annotations from commutative semirings, and propose a comprehensive provenance representation based on semirings of polynomials. We extend these considerations to XQuery views over annotated, unordered XML data, and show that the semiring framework suffices for a large positive fragment of XQuery applied to such data. Finally, we conclude with a brief overview of the large body of work that builds upon these results, including both extensions to the theoretical foundations and uses in practical applications.
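To make the central idea of the abstract concrete, here is a minimal Python sketch of positive relational algebra over semiring-annotated relations. It is an illustration only, not code from the article or from LogicBlox. The names `Semiring`, `union`, `project`, `join`, and the sample relations `R` and `S` are all assumptions made for this example. Each tuple carries an annotation: alternative derivations are combined with the semiring's addition, and joint use of tuples is combined with its multiplication.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Semiring:
    """A commutative semiring (hypothetical helper type for this sketch)."""
    zero: Any
    one: Any
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]


# Two standard instances: Booleans give set semantics, naturals give bag semantics.
BOOL = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
NAT = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)

# An annotated relation maps each tuple to its annotation.
Relation = Dict[Tuple, Any]


def union(K: Semiring, r: Relation, s: Relation) -> Relation:
    """Union adds the annotations of tuples that appear in both inputs."""
    out = dict(r)
    for t, a in s.items():
        out[t] = K.plus(out.get(t, K.zero), a)
    return out


def project(K: Semiring, r: Relation, cols: Tuple[int, ...]) -> Relation:
    """Projection adds the annotations of tuples that collapse to the same result."""
    out: Relation = {}
    for t, a in r.items():
        key = tuple(t[i] for i in cols)
        out[key] = K.plus(out.get(key, K.zero), a)
    return out


def join(K: Semiring, r: Relation, s: Relation, r_col: int, s_col: int) -> Relation:
    """Equi-join multiplies the annotations of the tuples being combined."""
    out: Relation = {}
    for t, a in r.items():
        for u, b in s.items():
            if t[r_col] == u[s_col]:
                key = t + u
                out[key] = K.plus(out.get(key, K.zero), K.times(a, b))
    return out


# Sample data (made up): annotations are multiplicities under NAT.
R = {("a", "b"): 2, ("a", "c"): 3}
S = {("b", "x"): 1, ("c", "x"): 4}

# Same query, two semirings: join R and S, then keep the first and last columns.
paths_nat = project(NAT, join(NAT, R, S, 1, 0), (0, 3))

R_bool = {t: True for t in R}
S_bool = {t: True for t in S}
paths_bool = project(BOOL, join(BOOL, R_bool, S_bool, 1, 0), (0, 3))

print(paths_nat)   # {('a', 'x'): 14}, i.e. 2*1 + 3*4
print(paths_bool)  # {('a', 'x'): True}
```

The query code is identical in both runs; only the semiring changes. Replacing the annotations with polynomials over tuple identifiers would, in the same spirit, record how each result was derived rather than how many times, which is the provenance representation the abstract refers to.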