ICDAR2017 Competition on Reading Chinese Text in the Wild
The aim of Task I is to localize text lines in test images. Participants can train their detectors on the provided training set. Extra data is allowed, but must be reported during submission. Fine-tuning models that are pretrained on ImageNet is also allowed.
For each image in the test set, a UTF-8 encoded text file should be provided, named according to the following convention:
task1_<image_name>.txt
Each line of the text file describes one detected text line, consisting of the coordinates of a quadrilateral followed by a confidence score. Vertices are listed in clockwise order.
<x1>,<y1>,<x2>,<y2>,<x3>,<y3>,<x4>,<y4>,<score>
<x1>,<y1>,<x2>,<y2>,<x3>,<y3>,<x4>,<y4>,<score>
...
Participants need to submit a single zip file that contains all the result files.
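For concreteness, here is a minimal Python sketch of how a Task 1 submission might be assembled. The dictionary layout, directory name, and the `write_task1_results` helper are illustrative assumptions, not part of any official toolkit.

```python
import os
import zipfile

def write_task1_results(results, out_dir="task1_results"):
    """Write one task1_<image_name>.txt per image (illustrative sketch).

    `results` is assumed to map an image name (without extension) to a
    list of detections, each an 8-tuple of quadrilateral vertices in
    clockwise order followed by a confidence score.
    """
    os.makedirs(out_dir, exist_ok=True)
    for image_name, detections in results.items():
        path = os.path.join(out_dir, f"task1_{image_name}.txt")
        with open(path, "w", encoding="utf-8") as f:
            for (x1, y1, x2, y2, x3, y3, x4, y4, score) in detections:
                f.write(f"{x1},{y1},{x2},{y2},{x3},{y3},{x4},{y4},{score}\n")
    # Bundle all result files into a single zip for submission.
    with zipfile.ZipFile("task1_submission.zip", "w") as zf:
        for name in os.listdir(out_dir):
            zf.write(os.path.join(out_dir, name), arcname=name)
```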
The evaluation protocol follows PASCAL VOC, which adopts mean Average Precision (mAP) as the primary metric. The original mAP is defined on axis-aligned boxes, while text in our dataset may be multi-oriented. Therefore, we will calculate intersection-over-union (IoU) over polygons rather than axis-aligned rectangles.
In our evaluation protocol, a detected polygon is marked as a true positive if 1) its IoU with a ground truth polygon is over 0.5 and 2) that ground truth polygon has not already been matched to another detection. The rest of our protocol follows the PASCAL VOC protocol. We will calculate both Average Precision and Average Recall, and take AP as the primary metric. Note that this metric is the same as mAP, since there is only one category.
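The matching rule can be sketched in Python using shapely for polygon intersection and union. This is an illustrative reimplementation of the stated rules, not the official evaluation script; in particular, the greedy confidence-ordered loop is an assumption about how ties are resolved.

```python
from shapely.geometry import Polygon

def polygon_iou(quad_a, quad_b):
    """IoU of two quadrilaterals given as [(x, y), ...] vertex lists."""
    pa, pb = Polygon(quad_a), Polygon(quad_b)
    if not pa.is_valid or not pb.is_valid:  # e.g. self-intersecting quads
        return 0.0
    inter = pa.intersection(pb).area
    union = pa.union(pb).area
    return inter / union if union > 0 else 0.0

def match_detections(detections, ground_truths, iou_thresh=0.5):
    """Greedy matching: detections are visited in descending confidence
    order, and each ground truth polygon absorbs at most one detection.

    `detections` is a list of (quad, score) pairs; `ground_truths` is a
    list of quads. Returns a true/false flag per visited detection,
    ready for the usual PASCAL VOC AP computation.
    """
    matched_gt = set()
    tp_flags = []
    for quad, score in sorted(detections, key=lambda d: -d[1]):
        best_iou, best_gt = 0.0, None
        for gi, gt_quad in enumerate(ground_truths):
            if gi in matched_gt:  # already claimed by another detection
                continue
            iou = polygon_iou(quad, gt_quad)
            if iou > best_iou:
                best_iou, best_gt = iou, gi
        if best_gt is not None and best_iou > iou_thresh:
            matched_gt.add(best_gt)
            tp_flags.append(True)
        else:
            tp_flags.append(False)
    return tp_flags
```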
The aim of Task II is to localize and recognize text lines in test images. Participants can train their models on the provided training set. Extra data is allowed, but must be reported during submission. Fine-tuning models that are pretrained on ImageNet is also allowed.
For each image in the test set, a UTF-8 encoded text file should be provided, named according to the following convention:
task2_<image_name>.txt
Each line of the text file contains the position and recognized text of a text line. File format:
<x1>,<y1>,<x2>,<y2>,<x3>,<y3>,<x4>,<y4>,<recognized_text>
<x1>,<y1>,<x2>,<y2>,<x3>,<y3>,<x4>,<y4>,<recognized_text>
...
Participants need to submit a single zip file that contains all the result files.
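Writing Task 2 result files follows the same pattern as Task 1, with the recognized text in place of the score. A brief sketch (the `write_task2_result` helper and its argument layout are assumptions for illustration):

```python
import os

def write_task2_result(image_name, recognitions, out_dir="task2_results"):
    """Write task2_<image_name>.txt for one image (illustrative sketch).

    `recognitions` is assumed to be a list of (quad, text) pairs, where
    quad holds the 8 clockwise vertex coordinates.
    """
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"task2_{image_name}.txt")
    # UTF-8 encoding matters here: the recognized text is largely Chinese.
    with open(path, "w", encoding="utf-8") as f:
        for quad, text in recognitions:
            f.write(",".join(str(v) for v in quad) + "," + text + "\n")
```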
First, every detected quadrilateral is matched to the ground truth quadrilateral with which it has the maximum IoU, or to null if none has IoU over 0.5. If multiple detections are matched to the same ground truth, only the one with the maximum IoU is kept and the rest are matched to null. Then, the edit distances between all matching pairs are calculated. If a detection is matched to null, an empty string is taken as its ground truth text. The edit distances are summed and divided by the number of test images. The resulting average edit distance is taken as the primary metric (lower is better).
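The metric itself is straightforward to reproduce. Below is a hedged Python sketch: `edit_distance` is a standard Levenshtein implementation, and `average_edit_distance` assumes the matching pairs (including null matches paired with empty strings) have already been collected per image as described above; the official script may differ in details.

```python
def edit_distance(a, b):
    """Levenshtein distance via single-row dynamic programming."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # row for the empty prefix of `a`
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev holds dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (a[i - 1] != b[j - 1]))      # substitution
            prev = cur
    return dp[n]

def average_edit_distance(pairs_per_image):
    """`pairs_per_image` has one entry per test image, each entry a list
    of (recognized_text, gt_text) matching pairs; detections matched to
    null carry an empty ground truth string.
    """
    total = sum(edit_distance(rec, gt)
                for pairs in pairs_per_image
                for rec, gt in pairs)
    return total / len(pairs_per_image)
```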