Using Reified Types for Specialization
Author: Nicolas Alexander Stucki
Supervisor: Vlad Ureche
Completed: June 2013
Generic code increases programmer productivity as it increases code reuse. For example, the LinkedList abstraction can be used in many contexts, from storing a list of numbers to storing representations of files on the disk. Unfortunately this comes at the expense of performance, as the generic code needs a common representation for all types. The common representation is usually a pointer to heap data. But for value types, such as integers, bytes and even value classes (see SIP-15) this leads to significant overheads, as they need to be allocated as objects on the heap and then pointed to, breaking data locality and adding an extra indirection.
To overcome this performance loss, many programming languages and virtual machines perform code specialization, which creates a copy of the generic class for each value type and adapts the data representation.
Scala does not require type parameters to be known at compile time, thus allowing truly-generic code to be generated. In this case, type information is not necessary and will not available in the generated bytecode. But this prohibits programmers from using specialized code from generic code, which might be desirable.
Scala can overcome the loss of type information in generic classes by attaching types in so-called ClassTags (formerly known as Manifests). This information can be used to dispatch to the correct specialized class implementation, and can be made either as a compiler plugin or as a macro. Either implementation is fine, as long as the syntactic overhead is reasonable. For this project one should research what is the best implementation and prepare a set of benchmarks that clearly show the performance hit of dispatching based on the ClassTag.
Report: PDF (also: the paper presenting this project at Scala Workshop 2013)
Github repository: https://github.com/nicolasstucki/specialized
Scala Benchmark: Invariant Verifier
Author: Pamela Delgado
Supervisor: Aleksandar Prokopec
Completed: January 2012
Invariant verifier is a tool used to test parallel programs by verifying rules about them. This tool instruments the code to track method invocations and does so based on the representation of rules which are first order logic sentences. This project has been developed as part of the Scala Benchmark suite.
Report: PDF
Parallelized Collaborative Filtering with Menthor
Authors: Olivier Blanvillain and Louis Bliss
Supervisor: Heather Miller and Philipp Haller
Completed: January 2012
We took a Hadoop implementation of collaborative filtering (used for example by Amazon to suggest other “similar” items to shoppers), developed a new strategy for parallelization, and implemented it both in Java using ExecutorServices and on top of Menthor. They report on impressive speed-ups and offer a nice discussion on Java vs Scala parallel performance for their application.
Report: PDF
Optimizing Menthor with Parallel Collections
Author: Florian Gysin
Supervisor: Heather Miller and Philipp Haller
Completed: January 2012
We drastically improved the parallel performance of the Menthor framework using parallel collections. We report some nice speed-ups, and report on (a) the choice of data structures and (b) communication patterns on parallel performance.
Report: PDF
Parallel Natural Language Processing Algorithms in Scala
Author: Stanislav Peshterliev
Supervisor: Heather Miller and Philipp Haller
Completed: January 2012
This project reports on the parallel implementation of algorithms for Natural Language Processing, which are used to produce a system for classifying movie reviews. Several algorithms were implemented– parallelized Maximum Entropy, parallelized Naive Bayes, and feature selection via Information Gain. Using these algorithms, the results show good both speed-ups and good quality results across data sets of real movie reviews.
Report: PDF
Parallel Machine Learning: Collaborative Filtering via Alternating Least Squares
Author: Bruno Studer
Supervisor: Heather Miller
Completed: June 2012
This project reports on the parallel implementation of challenging collaborative filtering algorithm based on the Alternating Least Squares Method of Multipliers using both Scala’s parallel collections and the Menthor framework. Results show speed ups across many cores, on differing architectures, all on a real dataset of movie reviews.
Report: PDF
A Non-Blocking Concurrent Queue Algorithm
Author: Bruno Didot
Supervisor: Aleksandar Prokopec
Completed: June 2012
The algorithm implements the concurrent ordered queue as an unrolled singly-linked list. We show how the algorithms works, the details of the implementation, we demonstrate its correctness and finally show the performance compared to the Java concurrent linked queue.
Report: PDF
Parallel Machine Learning: An Expectation Maximization Algorithm for Gaussian Mixture Models
Author: Pierre Grydbeck
Supervisor: Heather Miller
Completed: June 2012
This project reports on the parallel implementation of a challenging Expectation-Maximization-based Gaussian classifier, and shows speed ups across different parallel frameworks. Three implementations are compared; a sequential Scala implementation, a Scala parallel collections implementation, and an implementation on top of Menthor. Results shows speed-ups across each, and a nice comparison of the different programming models. Futhermore, improved results were obtained by extending the Menthor framework with an additional “crunchToOne” combinator.
Report: PDF
FlowPools: A Lock-Free Concurrent Dataflow Abstraction
Author: Tobias Schlatter
Supervisor: Heather Miller and Aleksandar Prokopec
Completed: June 2012
This project covers a integral part of a novel new data structure for concurrent dataflow computation that was the subject of a submission to LCPC2012. The submission, which was joint work between Aleksandar Prokopec, Heather Miller, Tobias Schlatter, Philipp Haller and Martin Odersky can be described as follows:
Implementing correct and deterministic parallel programs is challenging. Even though concurrency constructs exist in popular programming languages to facilitate the task of deterministic parallel programming, they are often too low level, or do not compose well due to underlying blocking mechanisms. In this paper, we present the design and implementation of a fundamental data structure for composable deterministic parallel dataflow computation through the use of functional programming abstractions. Additionally, we provide a correctness proof, showing that the implementation is linearizable, lock-free, and deterministic. Finally, we show experimental results which compare our FlowPool against corresponding operations on other concurrent data structures, and show that in addition to offering new capabilities, FlowPools reduce insertion time by 49-54% on a 4-core i7 machine with respect to comparable concurrent queue data structures in the Java standard library.
The project report corresponding to the integral part of the submission, Multi-Lane FlowPools, can be described as follows:
FlowPools, are a powerful way to express dataflow logic in highly parallelized applications. The original paper proposes two ways of implementing a FlowPool: Single-Lane FlowPools (SLFP) and Multi-Lane FlowPools (MLFP). While SLFPs showed decent performance overall, insertion operations do not scale. MLFPs solve this limitation as benchmarks discussed in the above paper have shown. This report goes into the details of the implementation of MLFPs and lays out the benchmarking results from the above paper to show that MLFPs reduce insertion time by 49−54% on a 4-core i7 machine with respect to comparable concurrent queue data structures in the Java standard library.
Report: PDF
Miniboxing: An Encoding for Specialization
Author: Cristian Talau
Supervisor: Vlad Ureche
Completed: June 2012
In the presence of parametric polymorphism, erasure-based languages such as Java and Scala handle primitives (boolean values, integers and floating point numbers) in a suboptimal way: in order to provide a uniform representation on the low level, all primitive values are stored in heap objects, in a process known as boxing. This leads to access overheads, wasteful usage of the heap space and broken cache locality.
Specialization enables Scala to optimally handle primitive values in the context of generic classes, but this is done at the expense of duplicating classes up to 10 times for each type parameter, for each of the 9 Scala primitive types and heap objects. This prevents specialization of key classes in the Scala library, such as Function2, List, Map and so on.
This project aims at testing the hypothesis that encoding several primitive types into a larger stack-based primitive type can maintain the performance of specialized code, while dramatically decreasing the generated bytecode size.
Report: PDF
Github repository: https://github.com/miniboxing/miniboxing-plugin
Scaladoc Diagrams for Class Hierarchies
Author: Damien Obrist
Supervisor: Vlad Ureche
Completed: June 2012
On numerous occasions it has been pointed out that the class hierarchy of the Scala standard library is hard to follow. This project aims at addressing the problem by automatically generating interactive class hierarchy diagrams in the scaladoc tool.
Report: PDF
Status: diagrams are integrated in scaladoc and are used by many projects documenting their API