Congratulations to LogicBlox team member Zografoula Vagena on her recent publication at SIGMOD 2013, the premier international conference in data management. Written in collaboration with researchers from Rice University and IBM Almaden Research Center, the paper describes how to use a declarative SQL extension to specify, simulate, and query Markov chains. We are excited to see more applications of the declarative, in-database approach to computing!
Abstract: This paper describes the SimSQL system, which allows for SQL-based specification, simulation, and querying of database-valued Markov chains, i.e., chains whose value at any time step comprises the contents of an entire database. SimSQL extends the earlier Monte Carlo database system (MCDB), which permitted Monte Carlo simulation of static database-valued random variables. Like MCDB, SimSQL uses user-specified “VG functions” to generate the simulated data values that are the building blocks of a simulated database. The enhanced functionality of SimSQL is enabled by the ability to parametrize VG functions using stochastic tables, so that one stochastic database can be used to parametrize the generation of another stochastic database, which can parametrize another, and so on. Other key extensions include the ability to explicitly define recursive versions of a stochastic table and the ability to execute the simulation in a MapReduce environment. We focus on applying SimSQL to Bayesian machine learning.
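To make the idea of a database-valued Markov chain more concrete, here is a minimal Python sketch. It is not SimSQL syntax and does not reflect how SimSQL is implemented; the names `normal_vg`, `initial_table`, `next_table`, and `simulate_chain` are hypothetical, and a single in-memory list of rows stands in for an entire database. The sketch only illustrates the recursive pattern from the abstract: each version of a stochastic table is generated by a VG-style function whose parameters come from the previous version.

```python
import random


def normal_vg(rng, mean, std):
    """Hypothetical stand-in for a VG function: draw one simulated value."""
    return rng.gauss(mean, std)


def initial_table(rng, n_rows):
    """Version 0 of the stochastic table, parametrized by constants."""
    return [{"id": i, "value": normal_vg(rng, 0.0, 1.0)} for i in range(n_rows)]


def next_table(rng, prev_table):
    """Version i of the table, parametrized by the rows of version i-1.

    Because each version depends only on the one before it, the sequence
    of tables forms a Markov chain whose state is a whole table.
    """
    return [
        {"id": row["id"], "value": normal_vg(rng, row["value"], 1.0)}
        for row in prev_table
    ]


def simulate_chain(n_rows, n_steps, seed=0):
    """Simulate the chain and return every version of the table."""
    rng = random.Random(seed)
    chain = [initial_table(rng, n_rows)]
    for _ in range(n_steps):
        chain.append(next_table(rng, chain[-1]))
    return chain


# Simulate 5 steps of a 3-row table, then "query" the chain:
# here, the average value of the table at each time step.
chain = simulate_chain(n_rows=3, n_steps=5, seed=42)
step_means = [sum(row["value"] for row in table) / len(table) for table in chain]
```

In this toy version the query is an ordinary list comprehension over the simulated tables. In the system the abstract describes, both the recursive table definition and the query would instead be expressed declaratively in SQL.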