Optimizing Scala Collections with Macros
Author: Georgios Kollias (University of Athens, Grece)
Supervisor: Prof. Yannis Smaragdakis (University of Athens, Grece), Vlad Ureche (EPFL)
Completed: May 2013
High-order functions such as map, filter and foreach provide a very convenient way to work with collections, but this convenience often comes at a cost of performance. Typical use cases for collection processing involve anonymous functions, which usually compile down to JVM anonymous classes introducing noticeable runtime overhead.
With the advent of macros in Scala 2.10, the programmer can control how exactly the compiler is compiling invocations of given methods. By declaring collections methods as macros, it becomes possible to transform their usages into plain while loops, bringing the performance back to normal.
Within this project your work will be to: 1) devise a mini-benchmark to measure the state of the art performance of collections, 2) rewrite popular collection methods as macros, 3) assess the performance improvements.
Master’s Thesis: PDF
Github repository: https://github.com/geo-kollias/declosurify
Shadow Embedding for Slick
We address the problem of integrating general purpose programming languages with relational databases. An approach to solving this problem is using raw strings to represent SQL statements. This approach leads to run-time errors and security vulnerabilities like SQL injection. The second approach is integrating the query in a host language. The most well-known example of the second approach is LINQ. This approach provides static checking of types and syntax during compilation.
Scala Benchmarking Suite
Author: Ngoc Duy Pham
Supervisor: Aleksandar Prokopec
Completed: December 2011
Scala is growing dramatically and thereby needs a benchmarking tool to guarantee its performance and reliability. The Scala Benchmarking Suite (SBS) is a tool developed to satisfy this request. It allows users to write micro-benchmarks detecting the performance regression with statistical rigor in a way just as simple as the way they write unit tests. In addition, users can have SBS profile typical metrics during benchmark runs, such as method invocations, number of boxings, memory consumption, etc.
And finally, SBS comes with the implementation of a bottleneck finding algorithm, which combines bytecode instrumentation and statistically rigorous performance regression detection. The algorithm has the ability to dynamically and programmatically point out the piece of code that causes a performance drop without the needs for manual effort from users.
Report: PDF
Distributed Collections DSL
Author: Stefan Ackermann
Supervisor: Vojin Jovanovic
Completed: work in progress
Contemporary large scale data processing frameworks either provide higher level abstractions that increase expressiveness while sacrificing performance or provide low level APIs that require significant development effort to program and maintain. The goal of this project is to provide a domain specific language (DSL) with a high level API, similar to Scala Collections [1], for distributed data parallel processing. Although it would have a high level API, at the same time, it would use domain knowledge to generate highly optimized code for existing large scale processing frameworks like (Hadoop [2] or Nephele/PACTs [3]). It would be implemented as a building block for other DSLs inside the Delite framework [4].
The DSL implementation would use/extend the interface from existing Delite Collections DSL, use existing optimization techniques and then generate code that targets a specific processing framework. This project would innovate in following aspects:
- Show that the DSL based approach will allow peak performance while keeping high level of abstraction in large scale data processing
- Provide code portability between different processing frameworks with minimal development effort
[1] Scala Collections http://docs.scala-lang.org/overviews/collections/introduction.html
[2] Hadoop http://hadoop.apache.org
[3] Nephele/PACTs http://www.stratosphere.eu
[4] Delite http://stanford-ppl.github.com/Delite/index.html
Report: PDF
Presentation: PDF
Multi-stage Programming with Scala Macros
Authors: Ngoc Duy Pham, Vladimir Nikolaev, Vera Salvisberg
Multi-stage programming (staging) [1] is a promissing aproach for developing high-performance programs with a high-level programming model. LMS [2] (type-dirven staging) complements staging with type-safety and modularaity through the use of abstract data types. This approach is great for library/DSL developers but library/DSL users often face complex type errors that are hard to understand.
This project aims to use metaprogramming with Scala macros to provide simple interface for library/DSL users while giving the full abstract data type approach to library/DSL developers. Each LMS program will be pre-processed with a macro that replaces method calls with their DSL representatives. This will allow simple type checking (without abstract types) and thus improve the interface for numerous DSLs. Additionaly the macro will analyze static methods imported into the DSL and try to convert them into the DSL representation. For example, calling `Math.pow(x, 3)` will be automatically imported into the DSL if its interface allows for all its building blocks.
Prerequisites: Good knowledge of Scala, Basic knowledge about the type systems, Basics of compiler construction
References:
[1] Multi-stage programming: http://www.cs.rice.edu/~taha/MSP/
[2] LMS – Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs
Offered by Vojin Jovanovic
Paper: PDF