ICDAR2017 Competition on Handwritten Text Recognition on the READ Dataset (ICDAR2017 HTR)


Track 3 - Test-B: Advanced Track

This track corresponds to a more realistic but more challenging scenario.

The test pages contain only the geometry of the regions in which text lines must be detected and recognized. This set consists of two parts:

  • Test-B1: the same batch of images as in Test-A, but with PAGE files that do not contain baseline information.

  • Test-B2: a completely new batch of images.

Participants must submit a tar file containing PAGE files for all of Test-B, including the detected lines and the corresponding recognized text. All Test-B pages must be processed using exactly the same pipeline. Test-B1 is used as a control; therefore, it is not allowed to use the baselines from Test-A, or even the number of lines in each region, to aid the detection. The winner will be decided solely on the results on batch Test-B2.
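As an illustration of the packaging step, the following is a minimal sketch that collects PAGE files into a tar archive. The function name pack_page_files, the directory layout, and the archive name are hypothetical; the required internal structure of the archive is not specified here, so the paths stored in the tar are only an assumption.

```python
import tarfile
from pathlib import Path


def pack_page_files(page_dir, archive_path):
    """Store every .xml file found under page_dir in an uncompressed tar archive.

    Returns the list of paths as stored in the archive (relative to page_dir).
    The layout inside the archive is an assumption, not a competition rule.
    """
    page_dir = Path(page_dir)
    stored = []
    with tarfile.open(str(archive_path), "w") as tar:
        for xml_file in sorted(page_dir.rglob("*.xml")):
            arcname = xml_file.relative_to(page_dir).as_posix()
            tar.add(str(xml_file), arcname=arcname)
            stored.append(arcname)
    return stored
```

For example, pack_page_files("results/Test-B", "submission.tar") would archive all PAGE files found under a (hypothetical) results/Test-B directory, covering both Test-B1 and Test-B2 output from the same pipeline.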

Evaluation will be performed with BLEU at region level, concatenating the lines provided by the participants. Because the reading order of the lines affects the score, participants must take care to include the coordinates of the lines in the PAGE files. The reading order will be defined using these coordinates (left to right and top to bottom). If several lines have the same coordinates, the reading order will be defined by the order of the XML TextLine elements in the PAGE files.
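The reading-order rule can be sketched as follows. This is only an illustration, not the organizers' evaluation tool: the exact sort key is not given above, so the sketch assumes that lines are ordered by the top-most y coordinate of their polygon and then by the left-most x coordinate. Python's sort is stable, so lines with identical keys keep their XML order. All function names and the sample page are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(element):
    """Return the tag name without its XML namespace."""
    return element.tag.rsplit("}", 1)[-1]


def children(element, name):
    """Direct children of element whose local tag name equals name."""
    return [child for child in element if local_name(child) == name]


def line_key(text_line):
    """Assumed sort key: (top-most y, left-most x) of the line polygon."""
    coords = children(text_line, "Coords")[0]
    points = [tuple(int(v) for v in pair.split(","))
              for pair in coords.get("points").split()]
    return (min(y for _, y in points), min(x for x, _ in points))


def line_text(text_line):
    """Text of the first TextEquiv/Unicode child, or an empty string."""
    for equiv in children(text_line, "TextEquiv"):
        for unicode_el in children(equiv, "Unicode"):
            return unicode_el.text or ""
    return ""


def region_texts(page_xml):
    """Map each TextRegion id to its lines concatenated in reading order."""
    root = ET.fromstring(page_xml)
    texts = {}
    for element in root.iter():
        if local_name(element) != "TextRegion":
            continue
        # sorted() is stable: equal keys keep the order of the XML elements.
        ordered = sorted(children(element, "TextLine"), key=line_key)
        texts[element.get("id")] = " ".join(line_text(l) for l in ordered)
    return texts


# Hypothetical page: the lines appear in the XML in the wrong visual order.
SAMPLE_PAGE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="1000" imageHeight="1000">
    <TextRegion id="r1">
      <Coords points="0,0 900,0 900,400 0,400"/>
      <TextLine id="l2">
        <Coords points="10,200 800,200 800,260 10,260"/>
        <TextEquiv><Unicode>second line</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="l1">
        <Coords points="10,100 800,100 800,160 10,160"/>
        <TextEquiv><Unicode>first line</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

print(region_texts(SAMPLE_PAGE))  # {'r1': 'first line second line'}
```

The concatenated string per region is what a region-level BLEU score would then be computed on. A submission that omits or misplaces line coordinates would change this string and therefore the score.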

News

30/6/2017
Dear participants,
the test data is now available for both the traditional track and the advanced track.

28/4/2017
Dear participants,
Remember to add your email address to the followers of this competition if you want to be continuously informed of news.

28/4/2017
Dear participants,
There is a remark regarding the data provided for this competition:
In this edition, the quality (and the resolution) of the images for some batches is not as good as in previous editions. For the preparation of this competition, we received the images that you have available, and the Ground-Truth (GT) was prepared for these images by taking advantage of existing GT material (transcripts).
This issue may affect both the training data and the test data. For the test data, please note that the images come from different collections and therefore the image resolution may not be the same for all test images.
Regarding the resolution of the images, low-resolution images are very frequent in archives (thousands of images, according to the archives involved in READ). This is because many collections were scanned some time ago and some of them are not currently being scanned again (documents not currently available, low budgets, different priorities, ...). So, this is a real problem that needs to be addressed for many collections residing in archives.
We apologize for not providing you with this information in advance.

3/4/2017 The training data is now available

7/2/2017 Announcement of the ICDAR2017 Competition on Handwritten Text Recognition on the READ Dataset

Important Dates

3 April 2017: competition opens

3 April 2017: training data available

15 June 2017: registration deadline

30 June 2017: test data available

14 July 2017: deadline for submitting results on the test data

Organizers

Verónica Romero

[Universitat Politècnica de València] 

Member of the Pattern Recognition and Human Language Technology Research Center of the Universitat Politècnica de València.

Enrique Vidal

[Universitat Politècnica de València] 

PhD in Physics from the Universitat de València (Spain), 1985. Full professor of Computer Science at the Universitat Politècnica de València. Member of the IEEE and a fellow of the IAPR.

Joan Andreu Sanchez

[Universitat Politècnica de València] 

Joan Andreu Sanchez is a professor at the Universitat Politècnica de València and a researcher in the Pattern Recognition and Human Language Technology (PRHLT) research center.