Student Projects

If you are interested in working with us, here are some additional projects that we would be happy to work on together!

Arrow of time in algorithmic tasks
The recent success of transformers in natural language processing has led to research on their reasoning capabilities. This line of research usually focuses on how learning occurs in transformers trained from scratch on a specific algorithmic task. Surprisingly, even on a task as simple as addition, training transformers from scratch does not succeed out of the box, and non-trivial modifications are required. These modifications are task-specific and take the form of either modifying the data, for example by reordering it or adding metadata, or modifying model components, such as the positional encoding. For addition in particular, writing the digits in reverse order helps transformers learn the task. In this project, we aim to develop a general training procedure that can handle different algorithmic tasks by considering generalized orderings of the data. The primary objective is to benchmark such a training procedure on various algorithmic tasks and compare it with solutions from the literature.
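As a concrete illustration of the reversed-digit trick mentioned above, a minimal data-generation sketch could look as follows; the function name and the exact string format are hypothetical choices for illustration, not a specific benchmark from the literature.

```python
import random

def make_addition_example(max_digits=3, reverse=True):
    """Generate one addition example as a text string.

    With reverse=True, the digits of both operands and the answer are
    written least-significant first, the reordering that has been
    observed to help transformers learn addition.
    """
    a = random.randint(0, 10**max_digits - 1)
    b = random.randint(0, 10**max_digits - 1)
    fmt = (lambda n: str(n)[::-1]) if reverse else str
    return f"{fmt(a)}+{fmt(b)}={fmt(a + b)}"

# Example output: "321+54=861" encodes 123 + 45 = 168 with reversed digits.
print(make_addition_example())
```

A generalized ordering of the data, as proposed in the project, would replace the simple digit reversal above with other permutations of the input and output tokens.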

Contact person: Oguz Kaan Yüksel

Sparse principal component analysis for interpretability
With the rapid deployment of neural networks, interpretability is becoming increasingly important. Dictionary learning is a simple approach for extracting features from transformers’ hidden representations. In this project, we will study an even simpler approach based on sparse principal component analysis. Our main goal is to evaluate how well such approaches explain the representations of transformers and of neural networks trained with self-supervised objectives.
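As a rough sketch of the kind of analysis involved, sparse PCA can be run directly on a matrix of hidden representations; the data below is a random placeholder and the hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import SparsePCA

# Placeholder for hidden representations collected from a transformer,
# e.g. one row per token with the model's hidden dimension as columns.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(2048, 768))

# Sparse PCA extracts components with few non-zero loadings; the sparsity
# level is controlled by the L1 penalty `alpha` (value here is illustrative).
spca = SparsePCA(n_components=32, alpha=1.0, random_state=0)
codes = spca.fit_transform(hidden_states)

# Each row of `components_` is a candidate interpretable direction; its
# non-zero entries indicate which hidden dimensions it reads from.
sparsity = np.mean(spca.components_ == 0)
print(f"fraction of zero loadings: {sparsity:.2f}")
```

Compared with dictionary learning, this keeps the components orthogonal-like and the optimization simpler, which is exactly the trade-off the project would examine.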

Contact person: Oguz Kaan Yüksel

Longer or More?
Language models are trained on large corpora of text whose size is measured by the number of tokens (T), rather than by the classical notion of sample size (S). The token count can be seen as the product of two quantities: the number of documents (D) and the length of each document (L). However, since tokens from the same document are correlated with one another, the i.i.d. assumption central to generalization theory does not hold, and the relationship between (T) and (S) is unclear. In this project, we will study how (D) and (L) influence learning in order to define a notion of effective sample size (S). The main goal will be to identify scaling laws with respect to (D) and (L), rather than (T), using baby language models on real-world data and in synthetic settings such as Markovian languages.
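As a sketch of what identifying such scaling laws might involve, one could fit a simple multiplicative power law in (D) and (L) to measured validation losses; the functional form and the numbers below are purely illustrative assumptions, not results.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(X, a, alpha, beta, c):
    """Hypothetical power law in the number of documents D and document length L."""
    D, L = X
    return a * D**(-alpha) * L**(-beta) + c

# Placeholder measurements: validation losses of baby language models trained
# on a grid of (D, L); real values would come from the project's experiments.
D = np.array([1e3, 1e3, 1e4, 1e4, 1e5, 1e5])
L = np.array([128, 1024, 128, 1024, 128, 1024])
loss = np.array([4.1, 3.8, 3.4, 3.2, 3.0, 2.9])

# Fit the exponents; comparing alpha and beta indicates whether more documents
# or longer documents drive improvement, i.e. how an effective sample size scales.
params, _ = curve_fit(scaling_law, (D, L), loss, p0=(10.0, 0.1, 0.1, 2.0), maxfev=10000)
a, alpha, beta, c = params
print(f"alpha (documents) = {alpha:.3f}, beta (length) = {beta:.3f}")
```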

Contact person: Oguz Kaan Yüksel