Automated Text Analysis of Legal/Technical Documents

Many organisations are asking how automated text analysis could help them make better use of their document resources. Rapid progress in this field, driven in particular by large technology companies and by artificial intelligence, has raised high expectations. However, the conditions differ considerably from one setting to another. Large Internet platforms, for example, have very large data sets and extensive user feedback with which to train text analysis algorithms. In many concrete industrial and administrative applications, by contrast, the data sets are limited, and user feedback, if available at all, can only be obtained from experts.
The goal of this project is to explore to what extent this general technical progress in document analysis can be profitably applied in specialised organisational settings, and ultimately to make the available information more usable for end users and to relieve them of routine tasks. The student will test whether standard text analysis methods (e.g. entity recognition, information extraction, classification, topic detection) can directly improve how the documents are currently used.
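To give a rough idea of the simplest possible baseline for two of the methods named above, the following sketch does rule-based entity recognition and a frequency-based stand-in for topic detection using only the Python standard library. The sample text, the regular expressions, the stopword list and the function names are all assumptions made for illustration; the actual project would likely rely on dedicated NLP libraries and real documents.

```python
import re
from collections import Counter

# Hypothetical sample text, invented for illustration only.
SAMPLE = (
    "Pursuant to Article 12 of the Agreement dated 2021-03-15, the supplier "
    "shall deliver the equipment. Article 14 governs liability of the supplier."
)

# Assumed patterns: article references and ISO-style dates.
ARTICLE_RE = re.compile(r"\bArticle\s+(\d+)\b")
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "of", "to", "shall", "dated", "pursuant"}


def extract_entities(text):
    """Return article numbers and dates found by the regex rules."""
    return {
        "articles": ARTICLE_RE.findall(text),
        "dates": DATE_RE.findall(text),
    }


def top_terms(text, n=3):
    """Return the n most frequent non-stopword terms as (term, count) pairs."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return counts.most_common(n)


if __name__ == "__main__":
    print(extract_entities(SAMPLE))
    print(top_terms(SAMPLE))
```

A baseline of this kind is mainly useful as a point of comparison: with limited data and expert-only feedback, it helps show whether a learned model actually improves on simple hand-written rules.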

Deliverables: codebase with documentation

PREREQUISITES

  • Familiarity with Python
  • Creativity, enthusiasm, initiative and a proactive attitude
  • Knowledge of Linux and related tools

PREFERRED, BUT NOT REQUIRED

  • Experience in Machine Learning
  • Experience in Natural Language Processing

Send me your CV: [email protected].