Markus Diem is a senior scientist at the Computer Vision Lab, TU Wien, Austria. His research interests are Cultural Heritage Applications and Document Analysis.
ICDAR 2017 Competition on Baseline Detection (cBAD)
The database of Track A [Simple Documents] consists of 755 images extracted from 9 different archival collections. The dataset comprises images with accompanying PAGE XML files. Each PAGE XML contains text regions, e.g. paragraphs, so layout analysis or text detection need not be performed on this dataset. Only handwritten text is present and the dataset contains no tables. The ground truth of the test set will be released after all submitted methods have been evaluated and the final results have been made public.
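Since the text regions are already given in the PAGE XML, a method can read them directly instead of detecting them. The following is a minimal sketch of doing so, not the organisers' reference implementation. It assumes the PAGE layout in which each `TextRegion` has a `Coords` child with a `points` attribute of the form `x1,y1 x2,y2 ...`; the sample document, its file name, and the helper names are hypothetical. Tags are matched by local name so that the schema version in the namespace does not matter.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PAGE XML fragment used only for illustration.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="1000" imageHeight="800">
    <TextRegion id="r1">
      <Coords points="10,10 500,10 500,200 10,200"/>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points):
    """Turn 'x1,y1 x2,y2 ...' into a list of (x, y) integer tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def read_text_regions(xml_text):
    """Map each TextRegion id to its polygon (assumes a Coords 'points' attribute)."""
    root = ET.fromstring(xml_text)
    regions = {}
    for element in root.iter():
        if local_name(element.tag) != "TextRegion":
            continue
        # Only look at direct children so that nested elements' Coords are ignored.
        for child in element:
            if local_name(child.tag) == "Coords" and child.get("points"):
                regions[element.get("id")] = parse_points(child.get("points"))
    return regions


if __name__ == "__main__":
    for region_id, polygon in read_text_regions(SAMPLE).items():
        print(region_id, polygon)
```

A baseline detector for Track A could then restrict its search to the pixels inside each returned polygon.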
Track B [Complex Documents] contains mixed documents. Though most documents are handwritten, this track also contains printed documents, book covers, empty pages, and tables. While Track A has locally skewed text-lines, text-lines in Track B are rotated up to 180°.
PAGE XML documentation with reference implementations
Competition description
A new evaluation scheme is introduced that measures errors using baselines. A detailed description of the evaluation scheme is available here. The evaluation tool that will be used for the competition is available as a standalone jar.
Submission Protocol
The submitted result file has to be a compressed tar file (.tar.gz) containing a separate result file (imagename.txt or imagename.xml) for each image in the test set. The folder structure must not be changed! If the test-set contains 3 different images (in 2 subfolders):
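One way to build such an archive while keeping the folder structure intact is sketched below. It is an illustration only, not an official submission tool: the function name `pack_results`, the local directory `results/`, and the archive name `submission.tar.gz` are hypothetical. It assumes the result files already sit in a directory that mirrors the test set's subfolders.

```python
import tarfile
from pathlib import Path


def pack_results(result_dir, archive_path):
    """Pack all .txt/.xml result files under result_dir into a .tar.gz archive.

    Paths inside the archive are kept relative to result_dir, so the
    subfolder layout of the test set is preserved unchanged.
    Returns the list of archive member names that were written.
    """
    result_dir = Path(result_dir)
    members = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(result_dir.rglob("*")):
            if path.is_file() and path.suffix in {".txt", ".xml"}:
                arcname = path.relative_to(result_dir).as_posix()
                tar.add(path, arcname=arcname)
                members.append(arcname)
    return members


if __name__ == "__main__":
    # Hypothetical paths; adjust to wherever the result files were written.
    if Path("results").is_dir():
        for name in pack_results("results", "submission.tar.gz"):
            print(name)
```

Listing the archive afterwards, for example with `tar -tzf submission.tar.gz`, is a quick check that every image has exactly one result file in the expected subfolder.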
Subtrack 1.1 - Track A [Simple Documents]
14th June 2017
Submission deadline extended
24th May 2017
Training sets updated
There is a second competition on baseline detection
1st Mar. 2017
Datasets are online
17th Jan. 2017
Competition site is online
18th June 2017
07th July 2017
10th-15th Nov. 2017