Text Mining and Linguistic Computing - DHLAB - EPFL

How can we make large corpora of text searchable?

How can we find names of people and places in texts?

How can we detect and classify events?

How can we retrieve citations in academic papers?

How can we detect authorship?

How can we detect language changes?

What are the linguistic effects of machine translation?

Publications

Navigating through 200 years of historical newspapers

Y. Rochat; M. Ehrmann; V. Buntinx; C. Bornet; F. Kaplan

This paper aims to describe and explain the processes behind the creation of a digital library composed of two Swiss newspapers, namely Gazette de Lausanne (1798-1998) and Journal de Genève (1826-1998), covering an almost two-century period. We developed a general purpose application giving access to this cultural heritage asset; a large variety of users (e.g. historians, journalists, linguists and the general public) can search through the content of around 4 million articles via an innovative interface. Moreover, users are offered different strategies to navigate through the collection: lexical and temporal lookup, n-gram viewer and named entities.

2016

International Conference on Digital Preservation (IPRES), Bern, Switzerland, October 3-6, 2016.
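Two of the navigation strategies named in the abstract, lexical and temporal lookup and the n-gram viewer, can be illustrated at toy scale. What follows is a minimal sketch and not the system described in the paper. The three-article corpus, the whitespace tokenizer, and the names ARTICLES, build_index, lookup and ngram_viewer are all hypothetical. An inverted index maps each term to (year, article id) postings so that a query can be restricted to a date range. The viewer reports how often an n-gram occurs in a given year, relative to all n-grams of the same length from that year.

```python
from collections import Counter, defaultdict

# Hypothetical mini-corpus of (article_id, year, text) records; the real
# archive holds around 4 million articles.
ARTICLES = [
    (1, 1850, "le chemin de fer arrive à Lausanne"),
    (2, 1850, "débat sur le chemin de fer au Grand Conseil"),
    (3, 1900, "le tramway et le chemin de fer"),
]


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


def build_index(articles):
    """Inverted index mapping each term to a set of (year, article_id) postings."""
    index = defaultdict(set)
    for art_id, year, text in articles:
        for term in tokenize(text):
            index[term].add((year, art_id))
    return index


def lookup(index, term, start, end):
    """Lexical + temporal lookup: ids of articles containing `term` with start <= year <= end."""
    postings = index.get(term.lower(), set())
    return sorted(art_id for year, art_id in postings if start <= year <= end)


def ngram_viewer(articles, phrase):
    """Per-year relative frequency of `phrase` among all n-grams of the same length."""
    key = " ".join(tokenize(phrase))
    n = len(tokenize(phrase))
    hits, totals = Counter(), Counter()
    for _, year, text in articles:
        toks = tokenize(text)
        grams = [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]
        hits[year] += grams.count(key)
        totals[year] += len(grams)
    # Skip years with no n-grams of this length to avoid dividing by zero.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


index = build_index(ARTICLES)
print(lookup(index, "fer", 1800, 1860))         # [1, 2]
print(ngram_viewer(ARTICLES, "chemin de fer"))  # 1850: 2 of 12 trigrams, 1900: 1 of 5
```

Dividing by the yearly total rather than reporting raw counts prevents years that simply have more printed text from dominating the curve. At the scale of millions of articles, a real system would probably precompute these per-year counts rather than rebuild them for each query.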

Il pleut des chats et des chiens: Google et l’impérialisme linguistique (It's raining cats and dogs: Google and linguistic imperialism)

F. Kaplan; D. Kianfar

At the beginning of last December, anyone who asked Google Translate for the Italian equivalent of the French sentence "Cette fille est jolie" ("This girl is pretty") received a strange suggestion: Questa ragazza è abbastanza, literally "This girl is fairly". The beauty had been lost in translation. How can one of the world's best-performing machine translation systems, backed by a unique linguistic capital of billions of sentences, make such a glaring error? The answer is simple: it goes through English. "Jolie" can be translated as pretty, which means both "good-looking" and "fairly". The second sense corresponds to the Italian abbastanza.

2015
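The failure described in the abstract can be reproduced with a deliberately naive model. This is a toy sketch that assumes word-by-word translation using invented dictionaries and sense weights (FR_EN, EN_IT and the weights are all hypothetical). It does not describe how Google Translate works internally. It only shows how routing through English can introduce an ambiguity that the French source word "jolie" does not have, and how that ambiguity can then be resolved the wrong way.

```python
# Toy pivot translation French -> English -> Italian. The dictionaries and the
# sense weights are invented for illustration; they are not Google's data.
FR_EN = {
    "cette": {"this": 1.0},
    "fille": {"girl": 1.0},
    "est": {"is": 1.0},
    "jolie": {"pretty": 1.0},  # unambiguous in French: "good-looking"
}

EN_IT = {
    "this": {"questa": 1.0},
    "girl": {"ragazza": 1.0},
    "is": {"è": 1.0},
    # "pretty" is ambiguous in English: adjective ("good-looking") or adverb
    # ("fairly"). The hypothetical weights favour the adverb sense.
    "pretty": {"abbastanza": 0.6, "carina": 0.4},
}


def translate(words, table):
    """Word-by-word translation keeping only the highest-weighted candidate."""
    return [max(table[w], key=table[w].get) for w in words]


def pivot_translate(words, source_to_pivot, pivot_to_target):
    """Translate through an intermediate language; no context survives the hop."""
    return translate(translate(words, source_to_pivot), pivot_to_target)


sentence = "cette fille est jolie".split()
print(" ".join(pivot_translate(sentence, FR_EN, EN_IT)))  # questa ragazza è abbastanza
```

Each hop keeps only its single best candidate. By the time the second hop has to choose between the adjective and adverb senses of "pretty", the information that the word came from the adjective "jolie" has already been lost.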

La question de la langue à l’époque de Google (The question of language in the age of Google)

F. Kaplan

In 2012, Google generated revenue of 50 billion dollars, an impressive financial result for a company founded only some fifteen years ago. Fifty billion dollars a year amounts to about 140 million dollars a day, or more than 5 million dollars an hour. If you read this chapter in about ten minutes, Google will meanwhile have earned almost a million dollars in revenue. What does Google sell to achieve such impressive financial performance? Google sells words, millions of words.

Digital Studies: Organologie des savoirs et technologies de la connaissance; Limoges: Fyp, 2014. p. 143-156.

ISBN: 978-2-364051089
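The per-day, per-hour and per-ten-minute figures in the abstract come from simple division. The quick check below assumes, as a simplification, that the 50 billion dollars are spread evenly over a 365-day year.

```python
# Assumption: 2012 revenue of 50 billion dollars, spread evenly over 365 days.
ANNUAL_REVENUE_USD = 50e9

per_day = ANNUAL_REVENUE_USD / 365
per_hour = per_day / 24
per_ten_minutes = per_hour / 6  # six ten-minute slots per hour

print(f"per day:         {per_day / 1e6:.0f} million USD")           # 137
print(f"per hour:        {per_hour / 1e6:.1f} million USD")          # 5.7
print(f"per ten minutes: {per_ten_minutes / 1e3:.0f} thousand USD")  # 951
```

The hourly figure works out to about 5.7 million dollars. That rate is what makes ten minutes of reading worth almost a million dollars.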

Linguistic Capitalism and Algorithmic Mediation

F. Kaplan

Google’s highly successful business model is based on selling words that appear in search queries. Organizing several million auctions per minute, the company has created the first global linguistic market and demonstrated that linguistic capitalism is a lucrative business domain, one in which billions of dollars can be realized per year. Google’s services need to be interpreted from this perspective. This article argues that linguistic capitalism implies not an economy of attention but an economy of expression. As several million users worldwide express themselves daily through one of Google’s interfaces, the texts they produce are systematically mediated by algorithms. In this new context, natural languages could progressively evolve to seamlessly integrate the linguistic biases of algorithms and the economic constraints of the global linguistic economy.

Representations

2014

Vol. 127, num. 1, p. 57-63.
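The abstract describes Google selling words through millions of auctions but does not specify the auction mechanism. The sketch below is a simplified illustration only. It assumes the generalized second-price (GSP) model that is commonly used to describe sponsored-search auctions. It ignores the quality scores, reserve prices and minimum increments that a real ad platform would apply. The Bid class, the gsp_auction function and the bid amounts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    advertiser: str
    amount: float  # hypothetical maximum price per click, in dollars


def gsp_auction(bids, slots):
    """Simplified generalized second-price auction for a single keyword.

    Ads are ranked by bid; the winner of each slot pays the next-highest bid
    (or 0.0 if nobody bid below it). Returns (advertiser, price) per slot.
    """
    ranked = sorted(bids, key=lambda b: b.amount, reverse=True)
    results = []
    for i in range(min(slots, len(ranked))):
        price = ranked[i + 1].amount if i + 1 < len(ranked) else 0.0
        results.append((ranked[i].advertiser, price))
    return results


bids = [Bid("A", 2.50), Bid("B", 1.75), Bid("C", 0.90)]
print(gsp_auction(bids, slots=2))  # [('A', 1.75), ('B', 0.9)]
```

In this model, the price of a word is not fixed by the seller. It is set anew in each auction by whoever else happens to be bidding on that word.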
