Automated ultrasound with AI for osteophyte grading in hand osteoarthritis: comparison with expert rheumatologist assessment.

May 21, 2026

Authors

Frederiksen BA, Hammer HB, Terslev L, Danielsen MA, Savarimuthu TR, Weber ABH, Just SA

Affiliations (8)

  • Department of Medicine, Section of Rheumatology, Svendborg Hospital - Odense University Hospital, Baagøes Allé 15, Svendborg, DK-5700, Denmark.
  • Center for Treatment of Rheumatic and Musculoskeletal Diseases (REMEDY), Diakonhjemmet Hospital, Oslo, Norway.
  • Faculty of Medicine, University of Oslo, Oslo, Norway.
  • Center for Rheumatology and Spine Disease, Rigshospitalet, Glostrup, Denmark.
  • Maersk Mc-Kinney Møller Institute, University of Southern Denmark, Odense, Denmark.
  • ROPCA ApS, Odense, Denmark.
  • Department of Medicine, Section of Rheumatology, Svendborg Hospital - Odense University Hospital, Baagøes Allé 15, Svendborg, DK-5700, Denmark.
  • ROPCA ApS, Odense, Denmark.

Abstract

The objective of this study was to characterise the agreement of the CE-certified automated robotic ultrasound system ARTHUR v.2.0, combined with the AI model DIANA v.2.0, for grading osteophytes in hand osteoarthritis (OA), using expert rheumatologist assessment as the reference standard. Thirty patients with hand OA underwent ultrasound of the metacarpophalangeal (MCP), proximal interphalangeal (PIP), and distal interphalangeal (DIP) joints, first with ARTHUR v.2.0 and subsequently by an expert rheumatologist. Osteophytes were graded (0-3) using the OMERACT system. Agreement was assessed using weighted Cohen's kappa (κ), Percent Exact Agreement (PEA), Percent Close Agreement (PCA), sensitivity, and specificity. Comparisons were made against both the primary rheumatologist and an independent blinded external assessor (EA). ARTHUR v.2.0 successfully scanned 703/840 joints (83.7%), with lower success in DIP joints. Agreement between ARTHUR+DIANA and the rheumatologist showed κ = 0.49, PEA 53.7%, and PCA 90.7%. Compared with the EA, the automated system showed κ = 0.46 and PCA 90.5%. Agreement between the rheumatologist and the EA showed κ = 0.67 and PCA 97.5%. Binary agreement on disease presence (grade ≥ 2) between the automated system and the rheumatologist was 81.6%, although sensitivity was limited (36.5%). ARTHUR v.2.0 combined with DIANA v.2.0 achieved binary agreement comparable to expert-expert comparisons, although with limited sensitivity, and moderate agreement on the full 0-3 OMERACT scale. A decomposition analysis indicated that acquisition-related variability contributed substantially to the overall discrepancy. Refinement of AI threshold calibration and distal joint acquisition, and evaluation in larger and more diverse cohorts, are warranted to further improve sensitivity for osteophyte detection.
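The agreement metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' analysis code. Several details are assumptions, since the abstract does not state them: the kappa weighting scheme (linear weights are used here, with quadratic available via `power=2`), the definition of PCA as agreement within one grade, and all function names and the toy grades. The only detail taken from the abstract is the binary cut-off of grade ≥ 2.

```python
from collections import Counter


def weighted_kappa(a, b, n_grades=4, power=1):
    """Weighted Cohen's kappa for ordinal grades 0..n_grades-1.

    power=1 gives linear weights, power=2 quadratic weights
    (assumption: the abstract does not say which scheme was used).
    Undefined (division by zero) if both raters use a single grade only.
    """
    n = len(a)
    observed = Counter(zip(a, b))      # joint counts of (grade_a, grade_b)
    marg_a, marg_b = Counter(a), Counter(b)
    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(n_grades):
        for j in range(n_grades):
            w = (abs(i - j) / (n_grades - 1)) ** power  # disagreement weight
            obs_disagreement += w * observed[(i, j)] / n
            exp_disagreement += w * marg_a[i] * marg_b[j] / (n * n)
    return 1.0 - obs_disagreement / exp_disagreement


def percent_exact_agreement(a, b):
    """Share of joints (in %) given the identical grade by both raters."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def percent_close_agreement(a, b):
    """Share of joints (in %) graded within one grade of each other
    (assumed definition of PCA)."""
    return 100.0 * sum(abs(x - y) <= 1 for x, y in zip(a, b)) / len(a)


def sensitivity_specificity(reference, test, threshold=2):
    """Binary sensitivity and specificity with 'disease present' = grade >= threshold."""
    tp = fn = tn = fp = 0
    for r, t in zip(reference, test):
        if r >= threshold:
            if t >= threshold:
                tp += 1
            else:
                fn += 1
        else:
            if t >= threshold:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy grades for eight joints (hypothetical, not study data).
expert = [0, 1, 2, 3, 2, 1, 0, 3]
automated = [0, 1, 1, 3, 2, 2, 0, 1]

kappa = weighted_kappa(expert, automated)
pea = percent_exact_agreement(expert, automated)
pca = percent_close_agreement(expert, automated)
sens, spec = sensitivity_specificity(expert, automated)
```

A pattern like the one reported, with high PCA but modest PEA and low sensitivity, arises when most disagreements are one grade apart and cluster around the grade 1/2 boundary, so that near-misses on the ordinal scale become misses on the binary one.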

Topics

Journal Article
