
The CoSyn tool leverages synthetic data to help open-source AI models excel at understanding complex, text-rich images such as medical diagrams.
Key Details
- Penn Engineering and the Allen Institute for AI developed CoSyn to generate scientific charts and diagrams as training data for open-source vision-language models.
- CoSyn-400K includes over 400,000 synthetic images and 2.7 million sets of instructions, spanning scientific charts, chemical structures, and more.
- CoSyn-trained models outperformed proprietary systems, including GPT-4V and Gemini 1.5 Flash, on seven benchmarks.
- A small synthetic dataset (7,000 images) allowed the researchers' model to beat others trained on millions of real images on the NutritionQA benchmark.
- The approach eliminates copyright risks and supports wide, open-source access.
Source
EurekAlert