Member of the Pattern Recognition and Human Language Technology Research Center of the Universitat Politècnica de València.
ICDAR2017 Competition on Handwritten Text Recognition on the READ Dataset (ICDAR2017 HTR)
Train-B data set: Dataset of pages without any layout or text line information. The corresponding transcripts are provided at page level, with line breaks. It contains 10k pages, divided for convenience into two batches of 5k pages.
To test the PAGE format required for the two tracks, participants can submit a tar file containing the PAGE files, with the transcript included in each text line, for the last 5 pages of the Train-A set (000046.xml-000050.xml).
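The following is a minimal sketch of how such a test submission could be packaged, assuming the five PAGE files sit in one local directory under the names listed above. Before archiving, it checks that every TextLine element carries a non-empty TextEquiv/Unicode transcript. The helper names (`lines_missing_text`, `make_submission_tar`) are hypothetical. Namespaces are ignored on purpose, since different PAGE schema versions use different ones. This is not an official validator.

```python
import tarfile
import xml.etree.ElementTree as ET
from pathlib import Path

# The last 5 pages of Train-A, as named in the competition text.
PAGE_FILES = [f"{i:06d}.xml" for i in range(46, 51)]


def local_name(tag: str) -> str:
    """Strip the XML namespace so the check works across PAGE schema versions."""
    return tag.rsplit("}", 1)[-1]


def lines_missing_text(xml_path: Path) -> list[str]:
    """Return the ids of TextLine elements that have no non-empty Unicode transcript."""
    root = ET.parse(xml_path).getroot()
    missing = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Collect TextEquiv/Unicode texts that are direct children of this TextLine
        # (Word-level TextEquiv elements nested deeper are not counted).
        texts = [
            u.text
            for te in elem if local_name(te.tag) == "TextEquiv"
            for u in te if local_name(u.tag) == "Unicode"
        ]
        if not any(t and t.strip() for t in texts):
            missing.append(elem.get("id", "?"))
    return missing


def make_submission_tar(page_dir: str, out_path: str) -> None:
    """Check each PAGE file, then write them all into an uncompressed tar file."""
    root = Path(page_dir)
    with tarfile.open(out_path, "w") as tar:
        for name in PAGE_FILES:
            path = root / name
            missing = lines_missing_text(path)
            if missing:
                raise ValueError(f"{name}: text lines without transcript: {missing}")
            # Store files at the top level of the archive, without directory prefixes.
            tar.add(path, arcname=name)
```

For example, `make_submission_tar("train_a_pages", "submission.tar")` (directory name hypothetical) stops with a `ValueError` that names the first file containing an untranscribed line, so the problem can be fixed before uploading.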
A baseline system is also available at the link below. It is trained using only the first 40 pages of the Train-A set; the next 5 pages are used for validation and the last 5 for testing.
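The baseline's data split can be written out as a short sketch. It assumes that the Train-A pages are named with six-digit numbers from 000001.xml to 000050.xml. The text only confirms the last five names, so this numbering is an assumption. The function name `baseline_split` is hypothetical.

```python
def baseline_split(page_ids: list[str]) -> dict[str, list[str]]:
    """Split the 50 Train-A pages as the baseline does: 40 train, 5 validation, 5 test."""
    pages = sorted(page_ids)  # zero-padded names sort in page order
    if len(pages) != 50:
        raise ValueError(f"expected 50 Train-A pages, got {len(pages)}")
    return {
        "train": pages[:40],         # pages 1-40
        "validation": pages[40:45],  # pages 41-45
        "test": pages[45:],          # pages 46-50, the same pages used for the PAGE format test
    }


# Assumed naming scheme: 000001.xml ... 000050.xml
split = baseline_split([f"{i:06d}.xml" for i in range(1, 51)])
```

With this naming scheme, the test portion consists of exactly the files 000046.xml-000050.xml, the pages mentioned for the PAGE format test.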
- Download training data Train-A (you have to log in and follow the competition first)
- Download training data Train-B (you have to log in and follow the competition first)
- Baseline System (you have to log in and follow the competition first)
- Submit a new method for evaluation (you have to log in and follow the competition first)
- View results for all available methods
The test data is now available for both the traditional track and the advanced track.
Remember to add your email address to the followers of this competition if you want to be kept continuously informed of news.
Please note the following remark regarding the data provided for this competition:
In this edition, the quality (and the resolution) of the images in some batches is not as good as in previous editions. For the preparation of this competition, we received the images that are now available to you, and the Ground-Truth (GT) was prepared for these images by making use of existing GT material (transcripts).
This issue may affect both the training data and the test data. Regarding the test data, please note that the images come from different collections, so the image resolution may not be the same for all test images.
Regarding image resolution, low-resolution images are very frequent in archives (thousands of images, according to the archives involved in READ). This is because many collections were scanned some time ago, and some of these collections are currently not being rescanned (documents no longer available, low budgets, different priorities, ...). This is therefore a real problem that must be addressed for many collections held in archives.
We apologize for not providing this information in advance.
3/4/2017 The training data is now available
7/2/2017 Announcement of the ICDAR2017 Competition on Handwritten Text Recognition on the READ Dataset
3 April 2017: competition opens
3 April 2017: training data available
15 June 2017: registration deadline
30 June 2017: test data available
14 July 2017: deadline for submitting results on the test data