HIPE: Identifying Historical People, Places and other Entities

How good are systems at recognizing and disambiguating named entities in multilingual historical documents? Which approaches work best?

HIPE is a series of evaluation campaigns on named entity (NE) processing in historical documents in multiple languages. The overall objective is to assess and advance the development of robust, adaptable and transferable NE processing systems in order to support information extraction and text understanding of cultural heritage data.

Contact persons: Maud Ehrmann (EPFL), Matteo Romanello (UNIL), Simon Clematide (UZH).

CLEF HIPE 2022

In 2022, following the success of the first evaluation lab, CLEF-HIPE-2020 (see below), we organized HIPE-2022, an extended evaluation campaign hosted by CLEF on NE processing in historical documents of various types and languages.

Compared to the first edition, HIPE-2022 introduces several novelties:

  • a new document type alongside historical newspapers, namely classical commentaries;
  • a broader language spectrum, with 5 languages for historical newspapers (compared with 3 in the previous edition) and 3 for classical commentaries;
  • the challenge posed by heterogeneous annotation tag sets and guidelines.

Data, code, participation guidelines:

Related publications:

Extended Overview of HIPE-2022: Named Entity Recognition and Linking in Multilingual Historical Documents

M. Ehrmann; M. Romanello; S. Najem-Meyer; A. Doucet; S. Clematide 

2022. 13th Conference and Labs of the Evaluation Forum (CLEF 2022), Bologna, Italy, 5-8 Sept 2022. DOI : 10.5281/zenodo.6979577.

Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents

M. Ehrmann; M. Romanello; A. Doucet; S. Clematide 

2022-04-05. 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10-14, 2022. p. 347-354. DOI : 10.1007/978-3-030-99739-7_44.

HIPE-2022 Shared Task Named Entity Datasets

M. Ehrmann; M. Romanello; A. Doucet; S. Clematide 

2022.

Named Entity Recognition and Classification in Historical Documents: A Survey

M. Ehrmann; A. Hamdi; E. Linhares Pontes; M. Romanello; A. Doucet 

ACM Computing Surveys. 2021-09-21. Vol. 56, num. 2, p. 27.

CLEF HIPE 2020

In 2020, we organized HIPE-2020, the first evaluation campaign on named entity processing in historical newspapers in French, German and English. It took place in the context of the impresso project and was run as a CLEF 2020 Evaluation Lab.

As the first of its kind on such material, the shared task brought together 13 enthusiastic teams who submitted a total of 75 runs for 5 different task bundles on named entity recognition and entity linking. The main conclusion of this edition was that neural approaches can achieve good performance on historical named entity recognition and classification (NERC) when provided with enough training data. However, progress is still needed to further improve performance, to adequately handle OCR noise and small-data settings, and to better address entity linking.

Data, code, participation guidelines:

HIPE-2020 Workshop at CLEF-2020

The results of participating teams appear in the CEUR working notes and were presented at the CLEF 2020 conference (Thessaloniki, held online) in September 2020.

See the recorded presentations:

Related publications:

Extended Overview of CLEF HIPE 2020: Named Entity Processing on Historical Newspapers

M. Ehrmann; M. Romanello; A. Flückiger; S. Clematide 

2020-10-21. 11th Conference and Labs of the Evaluation Forum (CLEF 2020), [Online event], 22-25 September, 2020. DOI : 10.5281/zenodo.4117566.

Overview of CLEF HIPE 2020: Named Entity Recognition and Linking on Historical Newspapers

M. Ehrmann; M. Romanello; A. Flückiger; S. Clematide 

2020-09-15. 11th International Conference of the CLEF Association – CLEF 2020, Thessaloniki, Greece, September 22–25, 2020. p. 288–310. DOI : 10.1007/978-3-030-58219-7_21.

CLEF-HIPE-2020 – Shared Task Participation Guidelines

M. Ehrmann; M. Romanello; S. Clematide; A. Flückiger 

2020.

CLEF-HIPE-2020 Shared Task Named Entity Datasets

M. Ehrmann; M. Romanello; S. Clematide; A. Flückiger 

2020.

Impresso Named Entity Annotation Guidelines (CLEF-HIPE-2020)

M. Ehrmann; C. Watter; M. Romanello; C. Simon; A. Flückiger 

2020.

//zenodo.org/deposit/3706857.

Introducing the CLEF 2020 HIPE Shared Task: Named Entity Recognition and Linking on Historical Newspapers

M. Ehrmann; M. Romanello; S. Bircher; S. Clematide 

2020-04-08. ECIR 2020 : 42nd European Conference on Information Retrieval, Lisbon, Portugal, April 14-17, 2020. p. 524-532. DOI : 10.1007/978-3-030-45442-5_68.

Diachronic Evaluation of NER Systems on Old Newspapers

M. Ehrmann; G. Colavizza; Y. Rochat; F. Kaplan 

2016. 13th Conference on Natural Language Processing (KONVENS 2016), Bochum, Germany, September 19-21, 2016. p. 97-107.