
The CoSyn tool leverages synthetic data to help open-source AI models excel at understanding complex, text-rich images such as medical diagrams.
Key Details
- Penn Engineering and the Allen Institute for AI developed CoSyn to generate scientific charts and diagrams as training data for open-source vision-language models.
- CoSyn-400K includes over 400,000 synthetic images and 2.7 million sets of instructions, spanning scientific charts, chemical structures, and more.
- CoSyn-trained models outperformed proprietary systems, including GPT-4V and Gemini 1.5 Flash, on seven benchmarks.
- A small synthetic dataset (7,000 images) allowed their model to beat others trained on millions of real images for the NutritionQA benchmark.
- The approach eliminates copyright risks and supports wide, open-source access.
Why It Matters
Synthetic data generation tools such as CoSyn can broaden access to advanced image understanding for medical and radiology AI, improving model accuracy while addressing data scarcity and copyright barriers. This supports innovation in clinical decision support and scientific research relevant to radiology professionals.

Source
EurekAlert