ICDAR2017 Competition on Reading Chinese Text in the Wild
The aim of Task I is to localize text lines in test images. Participants can train their detectors on the provided training set. Extra data is allowed, but must be reported during submission. Fine-tuning models that are pretrained on ImageNet is also allowed.
For each image in the test set, a UTF-8 encoded text file should be provided, named according to the following convention:
task1_<image_name>.txt
Each line of the text file describes one detected text line, consisting of the coordinates of a quadrilateral followed by a confidence score. Vertices are listed in clockwise order.
<x1>,<y1>,<x2>,<y2>,<x3>,<y3>,<x4>,<y4>,<score>
<x1>,<y1>,<x2>,<y2>,<x3>,<y3>,<x4>,<y4>,<score>
...
Participants need to submit a single zip file that contains all the result files.
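For concreteness, here is a minimal Python sketch of how a Task 1 submission might be assembled. The dictionary layout, directory name, and the `write_task1_results` helper are illustrative assumptions, not part of any official toolkit.

```python
import os
import zipfile

def write_task1_results(results, out_dir="task1_results"):
    """Write one task1_<image_name>.txt per image (illustrative sketch).

    `results` is assumed to map an image name (without extension) to a
    list of detections, each an 8-tuple of quadrilateral vertices in
    clockwise order followed by a confidence score.
    """
    os.makedirs(out_dir, exist_ok=True)
    for image_name, detections in results.items():
        path = os.path.join(out_dir, f"task1_{image_name}.txt")
        with open(path, "w", encoding="utf-8") as f:
            for (x1, y1, x2, y2, x3, y3, x4, y4, score) in detections:
                f.write(f"{x1},{y1},{x2},{y2},{x3},{y3},{x4},{y4},{score}\n")
    # Bundle all result files into a single zip for submission.
    with zipfile.ZipFile("task1_submission.zip", "w") as zf:
        for name in os.listdir(out_dir):
            zf.write(os.path.join(out_dir, name), arcname=name)
```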
The evaluation protocol follows PASCAL VOC, which adopts mean Average Precision (mAP) as the primary metric. The original mAP is defined on axis-aligned boxes, while text in our dataset may be multi-oriented. Therefore, we will calculate intersection-over-union (IoU) over polygons rather than axis-aligned rectangles.
In our evaluation protocol, a detected polygon is marked as a true positive if 1) its IoU with a ground truth polygon is over 0.5 and 2) that ground truth polygon has not already been matched to another detection. The rest of our protocol follows the PASCAL VOC protocol. We will calculate both Average Precision and Average Recall, and take AP as the primary metric. Note that this metric is the same as mAP, since there is only one category.
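The matching rule can be sketched in Python using shapely for polygon intersection and union. This is an illustrative reimplementation of the stated rules, not the official evaluation script; in particular, the greedy confidence-ordered loop is an assumption about how ties are resolved.

```python
from shapely.geometry import Polygon

def polygon_iou(quad_a, quad_b):
    """IoU of two quadrilaterals given as [(x, y), ...] vertex lists."""
    pa, pb = Polygon(quad_a), Polygon(quad_b)
    if not pa.is_valid or not pb.is_valid:  # e.g. self-intersecting quads
        return 0.0
    inter = pa.intersection(pb).area
    union = pa.union(pb).area
    return inter / union if union > 0 else 0.0

def match_detections(detections, ground_truths, iou_thresh=0.5):
    """Greedy matching: detections are visited in descending confidence
    order, and each ground truth polygon absorbs at most one detection.

    `detections` is a list of (quad, score) pairs; `ground_truths` is a
    list of quads. Returns a true/false flag per visited detection,
    ready for the usual PASCAL VOC AP computation.
    """
    matched_gt = set()
    tp_flags = []
    for quad, score in sorted(detections, key=lambda d: -d[1]):
        best_iou, best_gt = 0.0, None
        for gi, gt_quad in enumerate(ground_truths):
            if gi in matched_gt:  # already claimed by another detection
                continue
            iou = polygon_iou(quad, gt_quad)
            if iou > best_iou:
                best_iou, best_gt = iou, gi
        if best_gt is not None and best_iou > iou_thresh:
            matched_gt.add(best_gt)
            tp_flags.append(True)
        else:
            tp_flags.append(False)
    return tp_flags
```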
The aim of Task II is to localize and recognize text lines in test images. Participants can train their models on the provided training set. Extra data is allowed, but must be reported during submission. Fine-tuning models that are pretrained on ImageNet is also allowed.
For each image in the test set, a UTF-8 encoded text file should be provided, named according to the following convention:
task2_<image_name>.txt
Each line of the text file contains the position and recognized text of a text line. File format:
<x1>,<y1>,<x2>,<y2>,<x3>,<y3>,<x4>,<y4>,<recognized_text>
<x1>,<y1>,<x2>,<y2>,<x3>,<y3>,<x4>,<y4>,<recognized_text>
...
Participants need to submit a single zip file that contains all the result files.
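Writing Task 2 result files follows the same pattern as Task 1, with the recognized text in place of the score. A brief sketch (the `write_task2_result` helper and its argument layout are assumptions for illustration):

```python
import os

def write_task2_result(image_name, recognitions, out_dir="task2_results"):
    """Write task2_<image_name>.txt for one image (illustrative sketch).

    `recognitions` is assumed to be a list of (quad, text) pairs, where
    quad holds the 8 clockwise vertex coordinates.
    """
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"task2_{image_name}.txt")
    # UTF-8 encoding matters here: the recognized text is largely Chinese.
    with open(path, "w", encoding="utf-8") as f:
        for quad, text in recognitions:
            f.write(",".join(str(v) for v in quad) + "," + text + "\n")
```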
First, every detected quadrilateral is matched to the ground truth quadrilateral with which it has the maximum IoU, or to null if none has IoU over 0.5. If multiple detections are matched to the same ground truth, only the one with the maximum IoU is kept and the rest are matched to null. Then, the edit distances between all matching pairs are calculated. If a detection is matched to null, an empty string is taken as its ground truth text. The edit distances are summed and divided by the number of test images. The resulting average edit distance is taken as the primary metric (lower is better).
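The metric itself is straightforward to reproduce. Below is a hedged Python sketch: `edit_distance` is a standard Levenshtein implementation, and `average_edit_distance` assumes the matching pairs (including null matches paired with empty strings) have already been collected per image as described above; the official script may differ in details.

```python
def edit_distance(a, b):
    """Levenshtein distance via single-row dynamic programming."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # row for the empty prefix of `a`
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev holds dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (a[i - 1] != b[j - 1]))      # substitution
            prev = cur
    return dp[n]

def average_edit_distance(pairs_per_image):
    """`pairs_per_image` has one entry per test image, each entry a list
    of (recognized_text, gt_text) matching pairs; detections matched to
    null carry an empty ground truth string.
    """
    total = sum(edit_distance(rec, gt)
                for pairs in pairs_per_image
                for rec, gt in pairs)
    return total / len(pairs_per_image)
```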