This page lists the projects currently open at the Laboratory for the History of Science and Technology (LHST). If you are interested in working on one of the projects listed below, please write directly to the contact person, with Prof. Baudry cc’ed in your email.
The project descriptions are only brief outlines, and we are generally flexible about the particulars.
This project, conducted in collaboration with the Institut des Humanités en Médecine (IHM), forms part of the SNSF-funded research initiative MEDIF, led by Dr. Aude Fauvel and Prof. Rémy Amouroux. MEDIF investigates the collective contributions of the first female doctors in France and French-speaking Switzerland to the development of medical theory and practice between 1870 and 1940.
The proposed project focuses on compiling and analyzing a dataset of women doctors listed in the Rosenwald Guides, published in France between 1887 and 1940. By applying quantitative methods, the project aims to examine their numbers, medical specializations, modes of practice (e.g., clinical settings vs. private practice), geographic distribution, marital status, and whether they worked in partnership with others. The goal is to shed light on the dynamics of women’s professionalization in a historically male-dominated field.
The primary corpus comprises 47 guidebooks in French, all digitized and available via Gallica, the digital platform of the Bibliothèque nationale de France. The OCR quality of the texts varies, and part of the project will involve evaluating and refining the textual data. The analysis will aim to identify trends over time in the visibility and distribution of women doctors, as reflected in the guides.
This project will be carried out in collaboration with Mikhaël Moreau (IHM), Dr. Amélie Puche (IHM), and Prof. Jérôme Baudry (EPFL).
Project type: Semester project or Master’s thesis
Prerequisites: Experience with text mining; strong data analysis skills; knowledge of NLP and computational linguistics; interest in history and the social sciences; reading proficiency in French.
Contact: Jérôme Baudry (jerome.baudry@epfl.ch)
The project investigates how digital tools can be used to study the development of botany, forestry, and agricultural knowledge in 19th-century France through the case of the acclimatization of Eucalyptus, an Australian plant first described in 1788 by Charles Louis L’Héritier.
The corpus includes 3,660 books published between 1791 and 1914 that mention eucalyptus, available in bulk downloads with good OCR quality. These are books on medicine, agriculture, botany, history, geography, and literature available on Gallica, the online platform of the French national library (BnF).
The project aims to use text mining to trace the evolution of publications dealing with the eucalyptus throughout the 19th century and to identify the types of properties this tree is associated with (e.g. medicinal, forestry, agricultural, cleansing). To do this, the student will have to:
- harvest the corpus through Gallica’s API;
- assemble it into a comprehensive DataFrame with relevant metadata;
- create a database and carry out textual analysis, which can include:
a) co-occurrence networks;
b) topic-modeling;
c) sentiment analysis;
d) diachronic evolution of issues and topics associated with eucalyptus;
e) possibly, named entity recognition to construct knowledge graphs.
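As an illustration of the co-occurrence and diachronic steps listed above, here is a minimal sketch that counts property-related words appearing near "eucalyptus", grouped by decade. It is not the project's prescribed method: the keyword lexicon `PROPERTY_TERMS`, the function names, the window size, and the two toy records are all assumptions made for this example, and the real corpus would come from Gallica rather than from hard-coded strings.

```python
import re
from collections import Counter, defaultdict

# Hypothetical lexicon mapping property categories to French keywords (an assumption).
PROPERTY_TERMS = {
    "medicinal": {"tonique", "fièvre", "fébrifuge"},
    "forestry": {"bois", "forêt", "reboisement"},
    "cleansing": {"assainir", "marais", "miasmes"},
}

def tokenize(text):
    """Lowercase a text and split it into runs of letters (accented letters kept)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def property_counts_by_decade(docs, target="eucalyptus", window=5):
    """Count property terms found within `window` tokens of `target`, per decade."""
    counts = defaultdict(Counter)
    for year, text in docs:
        tokens = tokenize(text)
        decade = year // 10 * 10
        for i, token in enumerate(tokens):
            if token != target:
                continue
            # Context: up to `window` tokens on each side of the target word.
            context = tokens[max(0, i - window): i + window + 1]
            for category, terms in PROPERTY_TERMS.items():
                counts[decade][category] += sum(t in terms for t in context)
    return counts

# Toy records standing in for (publication year, OCR text) rows of the corpus.
toy_docs = [
    (1868, "L'eucalyptus est un tonique contre la fièvre des marais."),
    (1875, "Le bois de l'eucalyptus convient au reboisement."),
]
result = property_counts_by_decade(toy_docs)
for decade in sorted(result):
    print(decade, dict(result[decade]))
```

A fixed token window is only one possible definition of co-occurrence; sentence-level or paragraph-level windows, or a lemmatized vocabulary, may suit noisy OCR text better.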
This student project is part of a broader environmental history research project: the study of the environmental, technical, scientific, and social transformations brought about by the planting of Eucalyptus, a “new” tree in the Mediterranean region, from the second half of the nineteenth century through the twentieth century.
Project type: semester project or master’s thesis.
Prerequisites: Prior experience in text mining; solid data analysis skills; knowledge of LLMs a plus; interest in environmental history a plus; language skills in French.
Possibility to work in a group? Yes.
Contact: Elisabeth Davin-Mortier (elisabeth.davin-mortier@epfl.ch)
The project investigates how digital tools can be used to study the dynamics of innovation in science and technology, from the eighteenth century to today. Innovation—the production of the new—is often said to be radical and path-breaking; yet, what can history teach us about the actual rhythms of innovation? Looking at past centuries of technological development, do we see continuity or discontinuity? Can we identify waves of innovation (and imitation)? How new is the new and how did people strategically describe and draw technology to present it as new? How did inventors in the past address the dangers and potential negative consequences of their activity? Analyzing and/or building a corpus of patents of invention, you will choose and apply state-of-the-art NLP/machine learning methods and/or use your statistical and data science knowledge.
Below are some more detailed examples of themes for semester projects and master’s theses, but you can also propose your own questions. You will work with an interdisciplinary team of historians and computer scientists.
– Tracing international patent flows: Since the 19th century, as economic relationships became more and more globalized, individuals and corporations have increasingly patented their inventions in many countries simultaneously. Available statistics do not make it possible to answer questions such as: Which patents had counterparts in other countries? In which countries were patents covering the same technology to be found? Did the inventors usually patent in their country of residence first, or did they choose other countries? A computational analysis of digitized patent documents might help in answering such questions. While the textual descriptions of the inventions needed to be translated into the language of each country, and adapted to its legal system, the drawings contained in the documents tended to be reused. Using computer vision techniques to match patents from different countries that feature the same drawings would shed light on the historical dynamics of international patenting and technology flows.
– Classifying patents and technology: Categorizing innovation is difficult. Categories are static, innovation is dynamic. Innovation can and does happen in-between established categories. Yet, it is very important to categorize patents, because the dynamic of innovation and the logic of taking out patents differ according to technology and to industry. However, such labels are usually missing from the datasets. The availability of the textual description of patents presents a great opportunity to address the challenge of classification.
– Extending the geographical scope of possible investigations: Most studies relying on the full text of historical patents rely on those issued by the United States of America, because of their easy availability in digitized form. To investigate similar questions for other countries, the available scanned material would need to be prepared and processed to be turned into clean digital full text, and results from off-the-shelf optical character recognition (OCR) software are of varying and sometimes questionable quality. This would be an interesting challenge for people interested in computer vision and OCR, and in bringing about a less one-sided view of innovation.
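To make the drawing-matching idea from the first theme above more concrete, here is a minimal sketch of one very simple technique, an average hash compared by Hamming distance. This is a stand-in chosen for brevity, not a method the lab has committed to, and the tiny grayscale grids below are invented toy data. Real patent scans would first need to be loaded, converted to grayscale, and downscaled with an imaging library.

```python
def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(int(value > mean) for value in flat)

def hamming(a, b):
    """Number of positions at which two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))

# Toy 3x3 grayscale grids standing in for downscaled patent drawings (assumptions).
drawing_fr = [[10, 200, 10], [10, 200, 10], [10, 200, 10]]
drawing_us = [[30, 220, 30], [30, 220, 30], [30, 210, 30]]  # same figure, brighter scan
unrelated = [[200, 200, 200], [10, 10, 10], [10, 10, 10]]

same = hamming(average_hash(drawing_fr), average_hash(drawing_us))
different = hamming(average_hash(drawing_fr), average_hash(unrelated))
print(same, different)
```

A small distance suggests the two scans show the same drawing. Hashes of this kind tolerate brightness changes but not rotation, cropping, or redrawn figures, so a project would likely need more robust features.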
Project type: semester project or master’s thesis.
Prerequisites: prior experience in either text mining, NLP or computer vision; solid skills in data analysis and Python; prior experience with large datasets and/or working remotely on a server is a plus.
Contact: Jérôme Baudry (jerome.baudry@epfl.ch)
Open Science is an international movement aiming at making all scientific research productions—publications, data, software, methods—freely accessible to all people in society: researchers, amateurs, policy makers, industries, as well as artists, journalists, and activists. Open Science across the world relies heavily on the design and development of dedicated infrastructures, mostly platforms: digital libraries, data repositories (“as open as possible, as closed as necessary”), directories, online journals, web archives, computational services, MOOCs, content management systems (CMS), collaborative version control, etc.
These platforms are loosely bound as a network. For example, a publication in an online journal may refer, via a persistent identifier (such as a DOI), to a dataset hosted on a given repository. Likewise, an open science search engine may harvest the metadata of libraries and directories to index available publications.
The shape of the open science ecosystem online and the nature of the links that tie platforms together are not well known yet. The aim of this project is to crawl the web to identify the links between platforms, to characterize their nature, and to generate an interactive map of the open science network.
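The link-identification step could be sketched as follows, using only the Python standard library. The sketch parses already-downloaded HTML and counts hyperlinks between distinct hosts. It does not fetch anything from the web, and the page URLs, the DOI, and the names `LinkExtractor` and `platform_edges` are placeholders invented for this example.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collect the href values of all <a> tags in an HTML document."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def platform_edges(pages):
    """Map {page URL: HTML} to a Counter of (source host, target host) links."""
    edges = Counter()
    for page_url, html in pages.items():
        parser = LinkExtractor()
        parser.feed(html)
        source = urlparse(page_url).netloc
        for href in parser.hrefs:
            # Resolve relative links against the page URL before reading the host.
            target = urlparse(urljoin(page_url, href)).netloc
            if target and target != source:
                edges[(source, target)] += 1
    return edges

# Toy pages standing in for crawled content (all URLs are placeholders).
toy_pages = {
    "https://journal.example.org/article/1":
        '<a href="https://doi.org/10.1234/abcd">Dataset</a> <a href="/about">About</a>',
    "https://search.example.net/":
        '<a href="https://journal.example.org/article/1">Article</a>',
}
edges = platform_edges(toy_pages)
for (source, target), weight in edges.items():
    print(source, "->", target, weight)
```

The resulting weighted edge list could feed a graph library or an interactive visualization. Characterizing the nature of each link (citation, metadata harvesting, identifier resolution) would require further analysis beyond counting hyperlinks.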
Project type: Master (research project, optional research project, or master’s project).
Possibility to work in a group? Yes.
Contact: Simon Dumas Primbault (simon.dumasprimbault@epfl.ch)
For some years now, the term ‘ecosystem’ has been used by a number of research stakeholders to refer, in very different ways, to the environment in which their practice takes place. The numerous uses of the semantic field of ecology to understand the digital transition of research environments are, however, very diverse and very polarised.
The first part of this project will be to assemble a vast heterogeneous corpus comprising writings of very different genres (scientific articles, policy briefs, reports, op-eds) as well as oral documents (courses, conferences, speeches) and metadata alone. This corpus will be built up by harvesting targeted sources: databases of scientific articles, crawling of research networks and infrastructures, archives of administrative institutions, course and conference repositories. Note: for the first instantiation of the project, a smaller and less heterogeneous corpus can be assembled.
From there, the project can take two different but complementary directions:
- Use NLP to perform an automated systematic literature review in order to trace the emergence of the term, and other relevant semantic fields, their circulation and their crystallization.
- Use network analysis in order to create a multi-layered network of documents, authors, concepts, institutions and disciplines and produce a diachronic map of the emergence and circulation of ecological discourse on research.
Project type: Master (research project, optional research project, or master’s project).
Possibility to work in a group? Yes.
Contact: Simon Dumas Primbault (simon.dumasprimbault@epfl.ch)