Samsara: Declarative Machine Learning on Distributed Dataflow Systems
We present Samsara, a domain-specific language for declarative machine learning in cluster environments. Samsara lets users specify programs with a set of common matrix abstractions and linear algebraic operations, similar to R or MATLAB, and then compiles, optimizes, and executes these programs on distributed dataflow systems. Its aim is to let mathematicians and data scientists leverage the scalability of these systems through common declarative abstractions, while drastically reducing the need for detailed knowledge of the underlying systems' programming model and execution scheme. Samsara is part of the Apache Mahout library and supports backends such as Apache Spark and Apache Flink.
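To make the "specify, then compile, optimize, and execute" pipeline concrete, the following is a minimal sketch in plain Python. It is not Samsara's API: the class names (`Matrix`, `Transpose`, `Times`, `TransposeTimesSelf`) and the functions `optimize` and `execute` are hypothetical and exist only for illustration. The sketch assumes a row-partitioned matrix and shows one kind of rewrite an optimizer of this sort could apply: recognizing the expression AᵀA and replacing it with a fused operator that sums per-partition outer products, so the transpose is never materialized.

```python
# Illustrative sketch only: plain Python, NOT Samsara's actual API.
# All names below are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Matrix:
    # Leaf of the logical plan: a tuple of partitions, each a tuple of rows.
    partitions: tuple


@dataclass(frozen=True)
class Transpose:
    child: object


@dataclass(frozen=True)
class Times:
    left: object
    right: object


@dataclass(frozen=True)
class TransposeTimesSelf:
    # Hypothetical fused physical operator for A^T A.
    child: object


def optimize(plan):
    """Rewrite the logical plan: Times(Transpose(A), A) becomes a fused operator."""
    if isinstance(plan, Times):
        left, right = optimize(plan.left), optimize(plan.right)
        if isinstance(left, Transpose) and left.child == right:
            return TransposeTimesSelf(right)
        return Times(left, right)
    if isinstance(plan, Transpose):
        return Transpose(optimize(plan.child))
    return plan


def gram_of_partition(rows, n):
    """Sum of outer products of the rows held by one partition."""
    acc = [[0.0] * n for _ in range(n)]
    for row in rows:
        for i in range(n):
            for j in range(n):
                acc[i][j] += row[i] * row[j]
    return acc


def execute(plan):
    """Execute the fused operator; other operators are out of scope for this sketch."""
    if not (isinstance(plan, TransposeTimesSelf) and isinstance(plan.child, Matrix)):
        raise NotImplementedError("sketch covers only the fused A^T A operator")
    parts = plan.child.partitions
    n = len(parts[0][0])
    # "Map" step: each partition computes its partial result independently.
    partials = [gram_of_partition(rows, n) for rows in parts]
    # "Reduce" step: partial results are summed element-wise.
    result = [[0.0] * n for _ in range(n)]
    for partial in partials:
        for i in range(n):
            for j in range(n):
                result[i][j] += partial[i][j]
    return result


# A 3x2 matrix split into two row partitions.
A = Matrix(partitions=(((1.0, 2.0), (3.0, 4.0)), ((5.0, 6.0),)))
plan = optimize(Times(Transpose(A), A))
gram = execute(plan)
```

The user-facing expression stays purely algebraic (`Times(Transpose(A), A)`), while the choice of how to evaluate it over partitions is left to the optimizer and executor. This separation is the point of a declarative abstraction.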
Sebastian Schelter is a senior researcher in the Database Systems and Information Management Group at TU Berlin. His research focuses on the technical side of data mining and machine learning and spans systems research, scalable algorithms, and the application of data mining to domains such as the web and social networks.