ICDAR2017 Competition on Handwritten Text Recognition on the READ Dataset (ICDAR2017 HTR)


Track 1 - Train-A data set and Train-B data set

Train-A data set: A set of pages with manually revised baselines and their corresponding transcripts. This batch is small: 50 pages.

Train-B data set: A set of pages without any layout or text line information. The corresponding transcripts are provided at page level with line breaks. It has 10k pages, though for convenience it is divided into two 5k-page batches.

In order to test the PAGE format required for the two tracks, participants can submit a tar file containing the PAGE files with the transcripts included in each line for the last 5 pages of the Train-A set (000046.xml-000050.xml).
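As an illustration only, packaging those five PAGE files into a tar archive could look like the following Python sketch. The function names, the directory argument, the flat archive layout and the uncompressed tar mode are assumptions made for this example; the organizers do not specify them here.

```python
import tarfile
from pathlib import Path


def last_five_page_names(first=46, last=50):
    """Return the PAGE file names 000046.xml .. 000050.xml (six-digit, zero-padded)."""
    return [f"{n:06d}.xml" for n in range(first, last + 1)]


def build_submission(page_dir, tar_path):
    """Pack the five PAGE files found in page_dir into a tar archive.

    Hypothetical helper: the flat layout inside the archive is an assumption.
    """
    page_dir = Path(page_dir)
    names = last_five_page_names()
    missing = [n for n in names if not (page_dir / n).is_file()]
    if missing:
        raise FileNotFoundError(f"missing PAGE files: {missing}")
    with tarfile.open(str(tar_path), "w") as tar:
        for name in names:
            # arcname stores each file at the archive root, without its directory
            tar.add(str(page_dir / name), arcname=name)
    return names
```

Calling `build_submission("my_page_output", "submission.tar")` would then produce an archive holding exactly the five files, and raise an error if any of them is absent.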

Also, a baseline system is available at the link below. This baseline system is trained using only the first 40 pages of the Train-A set. The next 5 pages are used as validation and the last 5 as test.
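The 40/5/5 partition of Train-A used by the baseline can be sketched in Python as follows. This is a minimal sketch that assumes the 50 pages are named consecutively from 000001.xml to 000050.xml, which is consistent with the file names quoted above but is not stated explicitly; the function name is hypothetical.

```python
def split_train_a(num_pages=50, n_train=40, n_val=5):
    """Split the Train-A page names into train, validation and test lists.

    Assumes consecutive six-digit file names 000001.xml .. 000050.xml.
    """
    names = [f"{n:06d}.xml" for n in range(1, num_pages + 1)]
    train = names[:n_train]                      # first 40 pages
    val = names[n_train:n_train + n_val]         # next 5 pages
    test = names[n_train + n_val:]               # last 5 pages
    return train, val, test
```

Under that naming assumption, the test list coincides with the five pages requested for the PAGE format check.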

  • Download training data Train-A (you have to log in and follow the competition first)
  • Download training data Train-B (you have to log in and follow the competition first)
  • Baseline System (you have to log in and follow the competition first)
  • Submit a new method for evaluation (you have to log in and follow the competition first)
  • View results for all available methods

News

30/6/2017
Dear participants,
The test data is now available for both the traditional track and the advanced track.

28/4/2017
Dear participants,
Remember to add your email address to the followers of this competition if you want to be continuously informed of news.

28/4/2017
Dear participants,
There is a remark regarding the data provided for this competition:
In this edition, the quality (and the resolution) of the images for some batches is not as good as in previous editions. For the preparation of this competition, we received the images that you have available, and the Ground-Truth (GT) was prepared for these images by taking advantage of existing GT material (transcripts).
This issue may affect both the training data and the test data. For the test data, note that the images come from different collections, and therefore the image resolution may not be the same for all test images.
Regarding resolution, low-resolution images are very frequent in archives (thousands of images, according to the archives involved in READ). This is because many collections were scanned some time ago, and some of them are currently not being scanned again (documents not currently available, low budgets, different priorities, ...). So this is a real problem, affecting many collections residing in archives, that needs to be addressed.
We apologize for not providing this information in advance.

3/4/2017 The training data is now available

7/2/2017 ICDAR2017 Competition on Handwritten Text Recognition announcement

Important Dates

3 April 2017: competition opens

3 April 2017: training data available

15 June 2017: registration deadline

30 June 2017: test data available

14 July 2017: deadline for submitting results on the test data

Organizers

Verónica Romero

[Universitat Politècnica de València] 

Member of the Pattern Recognition and Human Language Technology Research Center of the Universitat Politècnica de València.

Enrique Vidal

[Universitat Politècnica de València] 

PhD in Physics from the Universitat de València (Spain), 1985. Full professor of Computer Science at the Universitat Politècnica de València. Member of the IEEE and a fellow of the IAPR.

Joan Andreu Sanchez

[Universitat Politècnica de València] 

Joan Andreu Sanchez is a professor at the Universitat Politècnica de València and a researcher in the Pattern Recognition and Human Language Technologies (PRHLT) research center.