
The CoSyn tool leverages synthetic data to help open-source AI models excel at understanding complex, text-rich images such as medical diagrams.
Key Details
- Penn Engineering and the Allen Institute for AI developed CoSyn to generate scientific charts and diagrams as training data for open-source vision-language models.
- CoSyn-400K includes over 400,000 synthetic images and 2.7 million sets of instructions, spanning scientific charts, chemical structures, and more.
- CoSyn-trained models outperformed proprietary systems, including GPT-4V and Gemini 1.5 Flash, on seven benchmarks.
- A small synthetic dataset (7,000 images) allowed their model to beat others trained on millions of real images for the NutritionQA benchmark.
- The approach eliminates copyright risks and supports wide, open-source access.
Why It Matters

Source
EurekAlert