
The CoSyn tool leverages synthetic data to help open-source AI models excel at understanding complex, text-rich images such as medical diagrams.
Key Details
- Penn Engineering and the Allen Institute for AI developed CoSyn to generate scientific charts and diagrams as training data for open-source vision-language models.
- CoSyn-400K includes over 400,000 synthetic images and 2.7 million sets of instructions, spanning scientific charts, chemical structures, and more.
- CoSyn-trained models outperformed proprietary systems, including GPT-4V and Gemini 1.5 Flash, on seven benchmarks.
- A small synthetic dataset (7,000 images) allowed their model to beat others trained on millions of real images for the NutritionQA benchmark.
- The approach eliminates copyright risks and supports wide, open-source access.
Why It Matters

Source
EurekAlert