Skip to content

Latest commit

 

History

History
29 lines (21 loc) · 2.46 KB

README.md

File metadata and controls

29 lines (21 loc) · 2.46 KB

COCO-CN

COCO-CN is a bilingual image description dataset enriching MS-COCO with manually written Chinese sentences and tags. The new dataset can be used for multiple tasks including image tagging, captioning and retrieval, all in a cross-lingual setting.

Chinese sentences COCO-CN train COCO-CN val COCO-CN test
human written
human translation
machine translation (baidu)

coco-cn annotation examples

Progress

  • version 201805: 20,341 images (training / validation / test: 18,341 / 1,000 / 1,000), associated with 22,218 manually written Chinese sentences and 5,000 manually translated sentences. Data is freely available at HuggingFace.
  • Precomputed image features: ResNext-101
  • COCO-CN-Results-Viewer: A lightweight tool to inspect the results of different image captioning systems on the COCO-CN test set, developed by Emiel van Miltenburg at the Tilburg University.
  • NUS-WIDE100: An extra test set.

Citation

If you find COCO-CN useful, please consider citing the following paper: