Member of the Pattern Recognition and Human Language Technology Research Center of the Universitat Politècnica de València.
ICDAR2017 Competition on Handwritten Text Recognition on the READ Dataset (ICDAR2017 HTR)
The test pages contain only the geometry of the regions in which text lines must be detected and recognized. This set consists of two parts:
- Test-B1: The same batch of images as in Test-A, but with PAGE files that do not contain baseline information.
- Test-B2: A completely new batch of images.
Participants must submit a tar file containing PAGE files for all Test-B pages, including the detected lines and their corresponding recognized text. All of Test-B must be processed using exactly the same pipeline. Test-B1 is used as a control; therefore, it is not allowed to use the baselines from Test-A, or even the number of lines in each region, to aid detection. The winner will be decided solely on the results for batch Test-B2.
Evaluation will be performed with BLEU at region level, concatenating the lines provided by the participants. The reading order of the lines affects the score, so participants must take care to include the coordinates of the lines in the PAGE files. The reading order will be defined using these coordinates (left to right and top to bottom). If several lines have the same coordinates, the reading order will be defined by the order of the XML TextLine elements in the PAGE files.
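The reading-order rule can be sketched in Python as follows. This is only an illustration, not the organizers' evaluation code: it assumes that each TextLine carries a Coords element with a `points` attribute of the form "x,y x,y ...", and that "left to right and top to bottom" means sorting by the top edge of the line's bounding box and then by its left edge. The helper names (`local_name`, `line_origin`, `lines_in_reading_order`) and the sample region are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample region: lines "d" and "c" share identical coordinates.
SAMPLE = """<TextRegion xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15" id="r1">
  <TextLine id="b"><Coords points="10,200 300,200 300,240 10,240"/></TextLine>
  <TextLine id="a"><Coords points="10,50 300,50 300,90 10,90"/></TextLine>
  <TextLine id="d"><Coords points="10,120 300,120 300,160 10,160"/></TextLine>
  <TextLine id="c"><Coords points="10,120 300,120 300,160 10,160"/></TextLine>
</TextRegion>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def line_origin(text_line):
    """Return (top, left) of a TextLine's bounding box.

    Assumption: the sort key is the top edge first, then the left edge.
    """
    for child in text_line:
        if local_name(child.tag) == "Coords":
            points = [p.split(",") for p in child.get("points", "").split()]
            if not points:
                break
            xs = [float(x) for x, _ in points]
            ys = [float(y) for _, y in points]
            return (min(ys), min(xs))
    raise ValueError("TextLine without usable Coords")


def lines_in_reading_order(region):
    """Sort the TextLine children of a region by their coordinates."""
    lines = [el for el in region if local_name(el.tag) == "TextLine"]
    # sorted() is stable: lines with equal keys keep their XML order.
    return sorted(lines, key=line_origin)


if __name__ == "__main__":
    region = ET.fromstring(SAMPLE)
    print([line.get("id") for line in lines_in_reading_order(region)])
```

Because Python's `sorted` is stable, the two lines with identical coordinates ("d" and "c") keep the order in which they appear in the XML, which mirrors the tie-breaking rule described above.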
The test data is now available for both the traditional track and the advanced track.
Remember to add your email address to the followers of this competition if you want to be kept continuously informed of news.
There is a remark regarding the data provided for this competition:
In this edition, the quality (and the resolution) of the images for some batches is not as good as in previous editions. For the preparation of this competition, we received the images that you have available, and the Ground-Truth (GT) was prepared for these images by taking advantage of existing GT material (transcripts).
This issue may affect both the training data and the test data. For the test data, please note that the images come from different collections, and therefore the image resolution may not be the same for all test images.
Regarding the resolution of the images, low-resolution images are very frequent in archives (thousands of images, according to the archives involved in READ). This is because many collections were scanned some time ago and some of them are not currently being scanned again (documents not currently available, low budgets, different priorities, ...). So this is a real problem, affecting many collections held in archives, that needs to be addressed.
We apologize for not providing this information in advance.
3/4/2017 The training data is now available
7/2/2017 Announcement of the ICDAR2017 Competition on Handwritten Text Recognition on the READ Dataset
3 April 2017: competition opens
3 April 2017: training data available
15 June 2017: registration deadline
30 June 2017: test data available
14 July 2017: deadline for submitting results on the test data