
The CoSyn tool leverages synthetic data to help open-source AI models excel at understanding complex, text-rich images such as medical diagrams.
Key Details
- Penn Engineering and the Allen Institute for AI developed CoSyn to generate scientific charts and diagrams as training data for open-source vision-language models.
- CoSyn-400K includes over 400,000 synthetic images and 2.7 million sets of instructions, spanning scientific charts, chemical structures, and more.
- CoSyn-trained models outperformed proprietary systems, including GPT-4V and Gemini 1.5 Flash, on seven benchmarks.
- On the NutritionQA benchmark, a model trained on a small synthetic dataset (7,000 images) beat models trained on millions of real images.
- The approach eliminates copyright risks and supports wide, open-source access.
Why It Matters

Source
EurekAlert