I love query languages for many reasons, but mostly because of their semantics. Wait, come back! In contrast to most systems programming languages (whose semantics can be quite esoteric), the semantics of a query (given some inputs) are precisely its outcome -- rows in tables. Hence when we write a query, we directly engage with its semantics: we simply say what we mean. This makes it easy and natural to reason about whether our queries are correct: that is, whether they mean what we intended them to mean.
Query languages have traditionally been applied to a relatively narrow domains: historically, data at rest in data stores; more recently, data in motion through continuous, "streaming" query frameworks. Why stop here? Could query languages do for a notoriously complex domain such as distributed systems programming what they have done so successfully for data management? How would they need to evolve to become expressive enough to capture the programs that we need to write, while retaining a simple enough semantics to allow mere mortals to reason about their correctness?
I will attempt to answer these questions (and raise many others) by describing a query language for distributed programming called Dedalus. Like traditional query languages, Dedalus abstracts away many of the details we typically associate with programming, making data and time first-class citizens and relegating computation to a subordinate role, characterizing how data is allowed to change as it moves through space and time. As we will see, this shift allows programmers to directly reason about distributed correctness properties such as consistency and fault-tolerance, and lays the foundations for powerful program analysis and repair tools (such as Blazes and LDFI), as well as successive generations of data-centric programming languages (including Bloom, Edelweiss and Eve).
UNIVERSITY OF CALIFORNIA SANTA CRUZ
Peter Alvaro is an Assistant Professor of Computer Science at the University of California Santa Cruz. His research focuses on using data-centric languages and analysis techniques to build and reason about data-intensive distributed systems, in order to make them scalable, predictable and robust to the failures and nondeterminism endemic to large-scale distribution. Peter is the creator of the Dedalus language and co-creator of the Bloom language.
While pursuing his PhD at while UC Berkeley, Peter co-developed and taught Programming the Cloud, an undergraduate course that explored distributed systems concepts through the lens of software development. Prior to attending Berkeley, Peter worked as a Senior Software Engineer in the data analytics team at Ask.com. Peter's principal research interests are databases, distributed systems and programming languages.