Completed Semester Projects ‒ LAMP ‐ EPFL

Using Reified Types for Specialization

Author: Nicolas Alexander Stucki
Supervisor: Vlad Ureche
Completed: June 2013

Generic code increases programmer productivity as it increases code reuse. For example, the LinkedList abstraction can be used in many contexts, from storing a list of numbers to storing representations of files on the disk. Unfortunately this comes at the expense of performance, as the generic code needs a common representation for all types. The common representation is usually a pointer to heap data. But for value types, such as integers, bytes and even value classes (see SIP-15) this leads to significant overheads, as they need to be allocated as objects on the heap and then pointed to, breaking data locality and adding an extra indirection.

To overcome this performance loss, many programming languages and virtual machines perform code specialization, which creates a copy of the generic class for each value type and adapts the data representation.

Scala does not require type parameters to be known at compile time, thus allowing truly-generic code to be generated. In this case, type information is not necessary and will not available in the generated bytecode. But this prohibits programmers from using specialized code from generic code, which might be desirable.

Scala can overcome the loss of type information in generic classes by attaching types in so-called ClassTags (formerly known as Manifests). This information can be used to dispatch to the correct specialized class implementation, and can be made either as a compiler plugin or as a macro. Either implementation is fine, as long as the syntactic overhead is reasonable. For this project one should research what is the best implementation and prepare a set of benchmarks that clearly show the performance hit of dispatching based on the ClassTag.

Report: PDF (also: the paper presenting this project at Scala Workshop 2013)
Github repository: https://github.com/nicolasstucki/specialized

Scala Benchmark: Invariant Verifier

Author: Pamela Delgado
Supervisor: Aleksandar Prokopec
Completed: January 2012

Invariant verifier is a tool used to test parallel programs by verifying rules about them. This tool instruments the code to track method invocations and does so based on the representation of rules which are first order logic sentences. This project has been developed as part of the Scala Benchmark suite.

Report: PDF

Parallelized Collaborative Filtering with Menthor

Authors: Olivier Blanvillain and Louis Bliss
Supervisor: Heather Miller and Philipp Haller
Completed: January 2012

We took a Hadoop implementation of collaborative filtering (used for example by Amazon to suggest other “similar” items to shoppers), developed a new strategy for parallelization, and implemented it both in Java using ExecutorServices and on top of Menthor. They report on impressive speed-ups and offer a nice discussion on Java vs Scala parallel performance for their application.

Report: PDF

Optimizing Menthor with Parallel Collections

Author: Florian Gysin
Supervisor: Heather Miller and Philipp Haller
Completed: January 2012

We drastically improved the parallel performance of the Menthor framework using parallel collections. We report some nice speed-ups, and report on (a) the choice of data structures and (b) communication patterns on parallel performance.

Report: PDF

Parallel Natural Language Processing Algorithms in Scala

Author: Stanislav Peshterliev
Supervisor: Heather Miller and Philipp Haller
Completed: January 2012

This project reports on the parallel implementation of algorithms for Natural Language Processing, which are used to produce a system for classifying movie reviews. Several algorithms were implemented– parallelized Maximum Entropy, parallelized Naive Bayes, and feature selection via Information Gain. Using these algorithms, the results show good both speed-ups and good quality results across data sets of real movie reviews.

Report: PDF

Parallel Machine Learning: Collaborative Filtering via Alternating Least Squares

Author: Bruno Studer
Supervisor: Heather Miller
Completed: June 2012

This project reports on the parallel implementation of challenging collaborative filtering algorithm based on the Alternating Least Squares Method of Multipliers using both Scala’s parallel collections and the Menthor framework. Results show speed ups across many cores, on differing architectures, all on a real dataset of movie reviews.

Report: PDF

A Non-Blocking Concurrent Queue Algorithm

Author: Bruno Didot
Supervisor: Aleksandar Prokopec
Completed: June 2012

The algorithm implements the concurrent ordered queue as an unrolled singly-linked list. We show how the algorithms works, the details of the implementation, we demonstrate its correctness and finally show the performance compared to the Java concurrent linked queue.

Report: PDF

Parallel Machine Learning: An Expectation Maximization Algorithm for Gaussian Mixture Models

Author: Pierre Grydbeck
Supervisor: Heather Miller
Completed: June 2012

This project reports on the parallel implementation of a challenging Expectation-Maximization-based Gaussian classifier, and shows speed ups across different parallel frameworks. Three implementations are compared; a sequential Scala implementation, a Scala parallel collections implementation, and an implementation on top of Menthor. Results shows speed-ups across each, and a nice comparison of the different programming models. Futhermore, improved results were obtained by extending the Menthor framework with an additional “crunchToOne” combinator.

Report: PDF

FlowPools: A Lock-Free Concurrent Dataflow Abstraction

Author: Tobias Schlatter
Supervisor: Heather Miller and Aleksandar Prokopec
Completed: June 2012

This project covers a integral part of a novel new data structure for concurrent dataflow computation that was the subject of a submission to LCPC2012. The submission, which was joint work between Aleksandar Prokopec, Heather Miller, Tobias Schlatter, Philipp Haller and Martin Odersky can be described as follows:

Implementing correct and deterministic parallel programs is challenging. Even though concurrency constructs exist in popular programming languages to facilitate the task of deterministic parallel programming, they are often too low level, or do not compose well due to underlying blocking mechanisms. In this paper, we present the design and implementation of a fundamental data structure for composable deterministic parallel dataflow computation through the use of functional programming abstractions. Additionally, we provide a correctness proof, showing that the implementation is linearizable, lock-free, and deterministic. Finally, we show experimental results which compare our FlowPool against corresponding operations on other concurrent data structures, and show that in addition to offering new capabilities, FlowPools reduce insertion time by 49-54% on a 4-core i7 machine with respect to comparable concurrent queue data structures in the Java standard library.

The project report corresponding to the integral part of the submission, Multi-Lane FlowPools, can be described as follows:

FlowPools, are a powerful way to express dataflow logic in highly parallelized applications. The original paper proposes two ways of implementing a FlowPool: Single-Lane FlowPools (SLFP) and Multi-Lane FlowPools (MLFP). While SLFPs showed decent performance overall, insertion operations do not scale. MLFPs solve this limitation as benchmarks discussed in the above paper have shown. This report goes into the details of the implementation of MLFPs and lays out the benchmarking results from the above paper to show that MLFPs reduce insertion time by 49−54% on a 4-core i7 machine with respect to comparable concurrent queue data structures in the Java standard library.

Report: PDF

Miniboxing: An Encoding for Specialization

Author: Cristian Talau
Supervisor: Vlad Ureche
Completed: June 2012

In the presence of parametric polymorphism, erasure-based languages such as Java and Scala handle primitives (boolean values, integers and floating point numbers) in a suboptimal way: in order to provide a uniform representation on the low level, all primitive values are stored in heap objects, in a process known as boxing. This leads to access overheads, wasteful usage of the heap space and broken cache locality.

Specialization enables Scala to optimally handle primitive values in the context of generic classes, but this is done at the expense of duplicating classes up to 10 times for each type parameter, for each of the 9 Scala primitive types and heap objects. This prevents specialization of key classes in the Scala library, such as Function2, List, Map and so on.

This project aims at testing the hypothesis that encoding several primitive types into a larger stack-based primitive type can maintain the performance of specialized code, while dramatically decreasing the generated bytecode size.

Report: PDF
Github repository: https://github.com/miniboxing/miniboxing-plugin

Scaladoc Diagrams for Class Hierarchies

Author: Damien Obrist
Supervisor: Vlad Ureche
Completed: June 2012

On numerous occasions it has been pointed out that the class hierarchy of the Scala standard library is hard to follow. This project aims at addressing the problem by automatically generating interactive class hierarchy diagrams in the scaladoc tool.

Report: PDF
Status: diagrams are integrated in scaladoc and are used by many projects documenting their API