Project Details:
Sentence Annotation Tool
Laboratory: LSIR | Semester | Proposal
KEYWORDS
Natural Language Processing; Software Engineering; Python
CONTEXT
It seems as though every day there are new and exciting problems that people have taught computers to solve. But there are still many tasks that computers cannot perform, particularly in the realm of understanding human language. Statistical and machine learning methods have proven to be an effective way to approach these problems, but these techniques often work better when the algorithms are given pointers to what is relevant in a dataset. For natural language, these pointers often come in the form of annotations: metadata that provides additional information about the text.
Theoretical and computational linguistics focus on unraveling the deeper nature of language and capturing the computational properties of linguistic structures. Text annotation can take various forms depending on the problem to be solved. The most common problems in which text annotation is essential include entity extraction, text structure analysis, syntactic parsing, and argument extraction. The nature of the problem largely determines the level of detail of the annotated corpus, that is, the level of abstraction at which annotations are attached: word level, sentence level, paragraph level, and so on.
GOAL
The goal of this project is to implement a text annotation tool, specifically a sentence-level annotator. The tool must be integrated into the existing SciLens web application (http://scilens.epfl.ch/demo) as an additional feature. With this feature, the user will be able to add sentence-level annotations to a given corpus or article. The system should let the user select a complete sentence in the displayed article and choose from a set of predefined tags for each new annotation. In the backend, the application should store the annotations in an efficient and reusable way. Lastly, the application should show users the articles they have already annotated, present the annotations in a usable way, and allow them to be modified.
This is a purely engineering project in the context of natural language processing and machine learning. The candidate will contribute to a machine learning web application (SciLens) and become familiar with text parsing and storage technologies. The final programming deliverable should be a production-ready, integrated implementation of the sentence-level annotation tool described above.
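To make the requirements above more concrete, here is a minimal Python sketch of how sentence selection, predefined tags, and reusable storage might fit together. It is not the SciLens implementation. The tag names, the `SentenceAnnotation` fields, the helper functions, and the regex-based sentence splitter are all assumptions made for illustration. A real tool would likely use a proper sentence segmenter and a database instead of JSON strings.

```python
import json
import re
from dataclasses import dataclass, asdict

# Hypothetical predefined tag set; the real tags depend on the use case.
ALLOWED_TAGS = {"claim", "evidence", "dubious"}

# Naive sentence boundary (assumption): '.', '!' or '?' followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentence_spans(text):
    """Return (start, end) character offsets of each sentence in text."""
    spans, start = [], 0
    for match in _BOUNDARY.finditer(text):
        spans.append((start, match.start()))
        start = match.end()
    if start < len(text):
        spans.append((start, len(text)))
    return spans


def snap_to_sentence(text, offset):
    """Expand a click or selection offset to the whole enclosing sentence."""
    for start, end in sentence_spans(text):
        if start <= offset < end:
            return start, end
    raise ValueError("offset does not fall inside a sentence")


@dataclass
class SentenceAnnotation:
    """Stand-off annotation: stores offsets and a tag, not the text itself."""
    article_id: str
    start: int
    end: int
    tag: str
    annotator: str


def annotate(text, article_id, offset, tag, annotator):
    """Create an annotation for the sentence containing offset."""
    if tag not in ALLOWED_TAGS:
        raise ValueError(f"unknown tag: {tag!r}")
    start, end = snap_to_sentence(text, offset)
    return SentenceAnnotation(article_id, start, end, tag, annotator)


def to_json(annotations):
    """Serialize annotations so they can be stored and reused later."""
    return json.dumps([asdict(a) for a in annotations])


def from_json(payload):
    """Load previously stored annotations, e.g. to display or modify them."""
    return [SentenceAnnotation(**item) for item in json.loads(payload)]
```

Storing character offsets rather than copies of the sentence text keeps the records compact and lets several annotations refer to the same article. The trade-off is that the offsets become invalid if the article text changes.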
WORK PLAN
1. Study existing implementations of word-level annotation in a web interface (investigate potential open source projects that solve this problem: hypothesis.org, prodigy, etc.).
2. Implement sentence-level annotation integrated into the existing SciLens web application.
3. Sketch multiple use cases (e.g., entity annotation, dubious claim detection).
4. Run a crowdsourcing experiment with some of these use cases.