How good are systems at recognizing and disambiguating named entities in multilingual historical documents? Which approach works best?
HIPE is a series of evaluation campaigns on named entity (NE) processing of historical documents in multiple languages. The overall objective is to assess and advance the development of robust, adaptable and transferable NE processing systems in order to support information extraction and text understanding of cultural heritage data.
Contact persons: Maud Ehrmann (EPFL), Matteo Romanello (UNIL), Simon Clematide (UZH).
CLEF HIPE 2022
In 2022, following the success of the first evaluation lab, CLEF-HIPE-2020 (see below), we organized HIPE-2022, an extended evaluation campaign hosted by CLEF on NE processing of historical documents of various types and languages.
Compared to the first edition, HIPE-2022 introduces several novelties:
- a new document type alongside historical newspapers, namely classical commentaries;
- a broader range of languages: 5 for historical newspapers (versus 3 in the previous edition) and 3 for classical commentaries;
- the challenge of heterogeneous annotation tag sets and guidelines.
Data, code and participation guidelines:
- HIPE-2022 data is based on six primary datasets from several European cultural heritage projects and from previous research projects of the HIPE organisers. See the HIPE-2022-data repository.
- HIPE-2022 participation guidelines.
- HIPE-scorer
Related publications:
Extended Overview of HIPE-2022: Named Entity Recognition and Linking in Multilingual Historical Documents
2022. 13th Conference and Labs of the Evaluation Forum (CLEF 2022), Bologna, Italy, 5-8 Sept 2022. DOI: 10.5281/zenodo.6979577.
Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents
2022-04-05. 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10-14, 2022. p. 347-354. DOI: 10.1007/978-3-030-99739-7_44.
HIPE-2022 Shared Task Named Entity Datasets
2022.
Named Entity Recognition and Classification in Historical Documents: A Survey
ACM Computing Surveys. 2021-09-21. Vol. 56, num. 2, p. 27.
CLEF HIPE 2020
In 2020, we organized HIPE-2020, the first evaluation campaign on named entity processing of historical newspapers in French, German and English. Organized in the context of the impresso project, it ran as a CLEF 2020 Evaluation Lab.
As the first of its kind on such material, the shared task brought together 13 teams, who submitted a total of 75 runs for 5 different task bundles on named entity recognition and entity linking. The main conclusion of this edition was that neural approaches can achieve good performance on historical named entity recognition and classification (NERC) when enough training data is available, but that further progress is needed to improve performance, adequately handle OCR noise and small-data settings, and better address entity linking.
Data, code and participation guidelines:
- HIPE-2020 data GitHub repository and Zenodo record.
- HIPE-2020 Participation Guidelines.
- CLEF-HIPE-2020-eval toolkit.
HIPE-2020 Workshop at CLEF-2020
Results of participating teams appear in the CEUR working notes and were presented at the CLEF 2020 conference (Thessaloniki, held online) in September 2020.
Recordings of the presentations were made available.
Related publications:
Extended Overview of CLEF HIPE 2020: Named Entity Processing on Historical Newspapers
2020-10-21. 11th Conference and Labs of the Evaluation Forum (CLEF 2020), [Online event], 22-25 September, 2020. DOI: 10.5281/zenodo.4117566.
Overview of CLEF HIPE 2020: Named Entity Recognition and Linking on Historical Newspapers
2020-09-15. 11th International Conference of the CLEF Association – CLEF 2020, Thessaloniki, Greece, September 22–25, 2020. p. 288–310. DOI: 10.1007/978-3-030-58219-7_21.
CLEF-HIPE-2020 – Shared Task Participation Guidelines
2020
Impresso Named Entity Annotation Guidelines (CLEF-HIPE-2020)
2020
//zenodo.org/deposit/3706857.
Introducing the CLEF 2020 HIPE Shared Task: Named Entity Recognition and Linking on Historical Newspapers
2020-04-08. ECIR 2020: 42nd European Conference on Information Retrieval, Lisbon, Portugal, April 14-17, 2020. p. 524-532. DOI: 10.1007/978-3-030-45442-5_68.