ICDAR2017 Competition on Reading Chinese Text in the Wild
Our competition is based on a dataset of more than 12,000 images. Most of the images are collected in the wild by phone cameras. Some are screenshots. The images exhibit various kinds of scenes, including street views, posters, menus, indoor scenes, and screenshots of phone apps.
Every image in the dataset is annotated with text line positions and text transcripts. Each position is represented by a quadrilateral whose four vertices are listed in clockwise order, starting from the top-left vertex.
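The clockwise vertex order can be sanity-checked with the shoelace formula. The following is a minimal sketch, not part of the competition toolkit. It assumes image coordinates with the origin at the top-left and y growing downward (an assumption, not stated above), in which case a visually clockwise polygon has a positive signed area. The function names `signed_area` and `is_clockwise` are hypothetical, and the check does not verify that the first vertex is the top-left one.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def signed_area(quad: List[Point]) -> float:
    """Shoelace formula: signed area of a polygon given as (x, y) vertices."""
    total = 0.0
    n = len(quad)
    for i in range(n):
        x1, y1 = quad[i]
        x2, y2 = quad[(i + 1) % n]  # wrap around to the first vertex
        total += x1 * y2 - x2 * y1
    return total / 2.0


def is_clockwise(quad: List[Point]) -> bool:
    """True if the vertices run clockwise on screen.

    Assumes y grows downward (image coordinates), so a positive
    signed area corresponds to a visually clockwise order.
    """
    return signed_area(quad) > 0


# Hypothetical box: top-left, top-right, bottom-right, bottom-left.
box = [(0, 0), (10, 0), (10, 5), (0, 5)]
print(is_clockwise(box))
```

A degenerate quadrilateral (zero area) is reported as not clockwise, so callers may want to treat that case separately.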
Each dataset image is associated with a groundtruth text file with the same file name, i.e.
Each line in the groundtruth file represents the annotation for one text instance (a word or sentence). The file format is:
<x1>,<y1>,<x2>,<y2>,<x3>,<y3>,<x4>,<y4>,<difficult>,"<transcript>"
<x1>,<y1>,<x2>,<y2>,<x3>,<y3>,<x4>,<y4>,<difficult>,"<transcript>"
...
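One annotation in this format could be parsed roughly as follows. This is a sketch under stated assumptions rather than the organizers' reference code: it assumes that coordinates are integers, that `<difficult>` is `0` or `1`, and that the transcript is the last field and may itself contain commas. The names `TextInstance` and `parse_gt_line` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class TextInstance(NamedTuple):
    quad: List[Tuple[int, int]]  # four (x, y) vertices, in file order
    difficult: bool
    transcript: str


def parse_gt_line(line: str) -> TextInstance:
    """Parse one annotation: 8 coordinates, a difficult flag, a quoted transcript."""
    # Split into at most 10 fields so commas inside the transcript survive.
    fields = line.strip().split(",", 9)
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    # Assumption: coordinates are integer pixel values.
    coords = [int(v) for v in fields[:8]]
    quad = list(zip(coords[0::2], coords[1::2]))
    # Assumption: the flag is "1" for difficult instances, "0" otherwise.
    difficult = fields[8].strip() == "1"
    transcript = fields[9].strip()
    # Remove the surrounding double quotes if both are present.
    if len(transcript) >= 2 and transcript[0] == '"' and transcript[-1] == '"':
        transcript = transcript[1:-1]
    return TextInstance(quad, difficult, transcript)


# Made-up sample annotation, for illustration only.
sample = '10,20,110,20,110,60,10,60,0,"你好, world"'
print(parse_gt_line(sample))
```

Limiting the split to nine separators keeps the parser independent of the transcript content. Any quote-escaping convention inside the transcript is not specified here, so the sketch leaves the transcript body untouched.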