Sort by:
Page 461 of 7497488 results

He F, Chen S, Liu X, Yang X, Qin X

pubmed logopapersJul 14 2025
Accurately assessing the risk stratification of cN0 papillary thyroid carcinoma (PTC) preoperatively aids in making treatment decisions. We integrated preoperative ultrasound and cytological images of patients to develop and validate a multimodal deep learning (DL) model for non-invasive assessment of N0 PTC risk stratification before surgery. In this retrospective multicenter group study, we developed a comprehensive DL model based on ultrasound and cytological images. The model was trained and validated on 890 PTC patients undergoing thyroidectomy and lymph node dissection across five medical centers. The testing group included 107 patients from one medical center. We analyzed the model's performance, including the area under the receiver operating characteristic curve, accuracy, sensitivity, and specificity. The combined DL model demonstrated strong performance, with an area under the curve (AUC) of 0.922 (0.866-0.979) in the internal validation group and an AUC of 0.845 (0.794-0.895) in the testing group. The diagnostic performance of the combined DL model surpassed that of clinical models. Image region heatmaps assisted in interpreting the diagnosis of risk stratification. The multimodal DL model based on ultrasound and cytological images can accurately determine the risk stratification of N0 PTC and guide treatment decisions.

Reschke P, Koch V, Gruenewald LD, Bachir AA, Gotta J, Booz C, Alrahmoun MA, Strecker R, Nickel D, D'Angelo T, Dahm DM, Konrad P, Solim LA, Holzer M, Al-Saleh S, Scholtz JE, Sommer CM, Hammerstingl RM, Eichler K, Vogl TJ, Leistner DM, Haberkorn SM, Mahmoudi S

pubmed logopapersJul 14 2025
This study aims to evaluate the effectiveness of a deep learning (DL)-enhanced four-fold parallel acquisition technique (P4) in improving prostate MR image quality while optimizing scan efficiency compared to the traditional two-fold parallel acquisition technique (P2). Patients undergoing prostate MRI with DL-enhanced acquisitions were analyzed from January 2024 to July 2024. The participants prospectively received T2-weighted sequences in all imaging planes using both P2 and P4. Three independent readers assessed image quality, signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR). Significant differences in contrast and gray-level properties between P2 and P4 were identified through radiomics analysis (p <.05). A total of 51 participants (mean age 69.4 years ± 10.5 years) underwent P2 and P4 imaging. P4 demonstrated higher CNR and SNR values compared to P2 (p <.001). P4 was consistently rated superior to P2, demonstrating enhanced image quality and greater diagnostic precision across all evaluated categories (p <.001). Furthermore, radiomics analysis confirmed that P4 significantly altered structural and textural differentiation in comparison to P2. The P4 protocol reduced T2w scan times by 50.8%, from 11:48 min to 5:48 min (p <.001). In conclusion, P4 imaging enhances diagnostic quality and reduces scan times, improving workflow efficiency, and potentially contributing to a more patient-centered and sustainable radiology practice.

Cheng J, Zhao F, Wu Z, Yuan X, Wang L, Gilmore JH, Lin W, Zhang X, Li G

pubmed logopapersJul 14 2025
Inspired by the remarkable success of attention mechanisms in various applications, there is a growing need to adapt the Transformer architecture from conventional Euclidean domains to non-Euclidean spaces commonly encountered in medical imaging. Structures such as brain cortical surfaces, represented by triangular meshes, exhibit spherical topology and present unique challenges. To address this, we propose the Spherical Transformer (STF), a versatile backbone that leverages self-attention for analyzing cortical surface data. Our approach involves mapping cortical surfaces onto a sphere, dividing them into overlapping patches, and tokenizing both patches and vertices. By performing self-attention at patch and vertex levels, the model simultaneously captures global dependencies and preserves fine-grained contextual information within each patch. Overlapping regions between neighboring patches naturally enable efficient cross-patch information sharing. To handle longitudinal cortical surface data, we introduce the spatiotemporal self-attention mechanism, which jointly captures spatial context and temporal developmental patterns within a single layer. This innovation enhances the representational power of the model, making it well-suited for dynamic surface data. We evaluate the Spherical Transformer on key tasks, including cognition prediction at the surface level and two vertex-level tasks: cortical surface parcellation and cortical property map prediction. Across these applications, our model consistently outperforms state-of-the-art methods, demonstrating its ability to effectively model global dependencies and preserve detailed spatial information. The results highlight its potential as a general-purpose framework for cortical surface analysis.

Saadat Behzadi, Danial Sharifrazi, Bita Mesbahzadeh, Javad Hassannataj Joloudarid, Roohallah Alizadehsani

arxiv logopreprintJul 14 2025
Objectives: Timely and accurate detection of colorectal polyps plays a crucial role in diagnosing and preventing colorectal cancer, a major cause of mortality worldwide. This study introduces a new, lightweight, and efficient framework for polyp detection that combines the Local Outlier Factor (LOF) algorithm for filtering noisy data with the YOLO-v11n deep learning model. Study design: An experimental study leveraging deep learning and outlier removal techniques across multiple public datasets. Methods: The proposed approach was tested on five diverse and publicly available datasets: CVC-ColonDB, CVC-ClinicDB, Kvasir-SEG, ETIS, and EndoScene. Since these datasets originally lacked bounding box annotations, we converted their segmentation masks into suitable detection labels. To enhance the robustness and generalizability of our model, we apply 5-fold cross-validation and remove anomalous samples using the LOF method configured with 30 neighbors and a contamination ratio of 5%. Cleaned data are then fed into YOLO-v11n, a fast and resource-efficient object detection architecture optimized for real-time applications. We train the model using a combination of modern augmentation strategies to improve detection accuracy under diverse conditions. Results: Our approach significantly improves polyp localization performance, achieving a precision of 95.83%, recall of 91.85%, F1-score of 93.48%, [email protected] of 96.48%, and [email protected]:0.95 of 77.75%. Compared to previous YOLO-based methods, our model demonstrates enhanced accuracy and efficiency. Conclusions: These results suggest that the proposed method is well-suited for real-time colonoscopy support in clinical settings. Overall, the study underscores how crucial data preprocessing and model efficiency are when designing effective AI systems for medical imaging.

Jiaxu Zheng, Meiman He, Xuhui Tang, Xiong Wang, Tuoyu Cao, Tianyi Zeng, Lichi Zhang, Chenyu You

arxiv logopreprintJul 14 2025
Magnetic Resonance (MR) imaging plays an essential role in contemporary clinical diagnostics. It is increasingly integrated into advanced therapeutic workflows, such as hybrid Positron Emission Tomography/Magnetic Resonance (PET/MR) imaging and MR-only radiation therapy. These integrated approaches are critically dependent on accurate estimation of radiation attenuation, which is typically facilitated by synthesizing Computed Tomography (CT) images from MR scans to generate attenuation maps. However, existing MR-to-CT synthesis methods for whole-body imaging often suffer from poor spatial alignment between the generated CT and input MR images, and insufficient image quality for reliable use in downstream clinical tasks. In this paper, we present a novel 3D Wavelet Latent Diffusion Model (3D-WLDM) that addresses these limitations by performing modality translation in a learned latent space. By incorporating a Wavelet Residual Module into the encoder-decoder architecture, we enhance the capture and reconstruction of fine-scale features across image and latent spaces. To preserve anatomical integrity during the diffusion process, we disentangle structural and modality-specific characteristics and anchor the structural component to prevent warping. We also introduce a Dual Skip Connection Attention mechanism within the diffusion model, enabling the generation of high-resolution CT images with improved representation of bony structures and soft-tissue contrast.

Guohao Huo, Ruiting Dai, Hao Tang

arxiv logopreprintJul 14 2025
Brain tumor segmentation plays a critical role in clinical diagnosis and treatment planning, yet the variability in imaging quality across different MRI scanners presents significant challenges to model generalization. To address this, we propose the Edge Iterative MRI Lesion Localization System (EdgeIMLocSys), which integrates Continuous Learning from Human Feedback to adaptively fine-tune segmentation models based on clinician feedback, thereby enhancing robustness to scanner-specific imaging characteristics. Central to this system is the Graph-based Multi-Modal Interaction Lightweight Network for Brain Tumor Segmentation (GMLN-BTS), which employs a Modality-Aware Adaptive Encoder (M2AE) to extract multi-scale semantic features efficiently, and a Graph-based Multi-Modal Collaborative Interaction Module (G2MCIM) to model complementary cross-modal relationships via graph structures. Additionally, we introduce a novel Voxel Refinement UpSampling Module (VRUM) that synergistically combines linear interpolation and multi-scale transposed convolutions to suppress artifacts while preserving high-frequency details, improving segmentation boundary accuracy. Our proposed GMLN-BTS model achieves a Dice score of 85.1% on the BraTS2017 dataset with only 4.58 million parameters, representing a 98% reduction compared to mainstream 3D Transformer models, and significantly outperforms existing lightweight approaches. This work demonstrates a synergistic breakthrough in achieving high-accuracy, resource-efficient brain tumor segmentation suitable for deployment in resource-constrained clinical environments.

Tang M, Wang H, Wang S, Wali E, Gutbrod J, Singh A, Landeras L, Janich MA, Mor-Avi V, Patel AR, Patel H

pubmed logopapersJul 14 2025
Phase-sensitive inversion recovery late gadolinium enhancement (LGE) improves tissue contrast, however it is challenging to combine with a free-breathing acquisition. Deep-learning (DL) algorithms have growing applications in cardiac magnetic resonance imaging (CMR) to improve image quality. We compared a novel combination of a free-breathing single-shot phase-sensitive LGE with respiratory triggering (FB-PS) sequence with DL noise reduction reconstruction algorithm to a conventional segmented phase-sensitive LGE acquired during breath holding (BH-PS). 61 adult subjects (29 male, age 51 ± 15) underwent clinical CMR (1.5 T) with the FB-PS sequence and the conventional BH-PS sequence. DL noise reduction was incorporated into the image reconstruction pipeline. Qualitative metrics included image quality, artifact severity, diagnostic confidence. Quantitative metrics included septal-blood border sharpness, LGE sharpness, blood-myocardium apparent contrast-to-noise ratio (CNR), LGE-myocardium CNR, LGE apparent signal-to-noise ratio (SNR), and LGE burden. The sequences were compared via paired t-tests. 27 subjects had positive LGE. Average time to acquire a slice for FB-PS was 4-12 s versus ~32-38 s for BH-PS (including breath instructions and break time in between breath hold). FB-PS with medium DL noise reduction had better image quality (FB-PS 3.0 ± 0.7 vs. BH-PS 1.5 ± 0.6, p < 0.0001), less artifact (4.8 ± 0.5 vs. 3.4 ± 1.1, p < 0.0001), and higher diagnostic confidence (4.0 ± 0.6 vs. 2.6 ± 0.8, p < 0.0001). Septum sharpness in FB-PS with DL reconstruction versus BH-PS was not significantly different. There was no significant difference in LGE sharpness or LGE burden. FB-PS had superior blood-myocardium CNR (17.2 ± 6.9 vs. 16.4 ± 6.0, p = 0.040), LGE-myocardium CNR (12.1 ± 7.2 vs. 10.4 ± 6.6, p = 0.054), and LGE SNR (59.8 ± 26.8 vs. 31.2 ± 24.1, p < 0.001); these metrics further improved with DL noise reduction. A FB-PS sequence shortens scan time by over 5-fold and reduces motion artifact. Combined with a DL noise reduction algorithm, FB-PS provides better or similar image quality compared to BH-PS. This is a promising solution for patients who cannot hold their breath.

Lindholz M, Burdenski A, Ruppel R, Schulze-Weddige S, Baumgärtner GL, Schobert I, Haack AM, Eminovic S, Milnik A, Hamm CA, Frisch A, Penzkofer T

pubmed logopapersJul 14 2025
Radiology reports can change during workflows, especially when residents draft preliminary versions that attending physicians finalize. We explored how large language models (LLMs) and embedding techniques can categorize these changes into textual, semantic, or clinically actionable types. We evaluated 400 adult CT reports drafted by residents against finalized versions by attending physicians. Changes were rated on a five-point scale from no changes to critical ones. We examined open-source LLMs alongside traditional metrics like normalized word differences, Levenshtein and Jaccard similarity, and text embedding similarity. Model performance was assessed using quadratic weighted Cohen's kappa (κ), (balanced) accuracy, F<sub>1</sub>, precision, and recall. Inter-rater reliability among evaluators was excellent (κ = 0.990). Of the reports analyzed, 1.3 % contained critical changes. The tested methods showed significant performance differences (P < 0.001). The Qwen3-235B-A22B model using a zero-shot prompt, most closely aligned with human assessments of changes in clinical reports, achieving a κ of 0.822 (SD 0.031). The best conventional metric, word difference, had a κ of 0.732 (SD 0.048), the difference between the two showed statistical significance in unadjusted post-hoc tests (P = 0.038) but lost significance after adjusting for multiple testing (P = 0.064). Embedding models underperformed compared to LLMs and classical methods, showing statistical significance in most cases. Large language models like Qwen3-235B-A22B demonstrated moderate to strong alignment with expert evaluations of the clinical significance of changes in radiology reports. LLMs outperformed embedding methods and traditional string and word approaches, achieving statistical significance in most instances. This demonstrates their potential as tools to support peer review.

Mingda Zhang

arxiv logopreprintJul 14 2025
Precise segmentation of brain tumors from magnetic resonance imaging (MRI) is essential for neuro-oncology diagnosis and treatment planning. Despite advances in deep learning methods, automatic segmentation remains challenging due to tumor morphological heterogeneity and complex three-dimensional spatial relationships. Current techniques primarily rely on visual features extracted from MRI sequences while underutilizing semantic knowledge embedded in medical reports. This research presents a multi-level fusion architecture that integrates pixel-level, feature-level, and semantic-level information, facilitating comprehensive processing from low-level data to high-level concepts. The semantic-level fusion pathway combines the semantic understanding capabilities of Contrastive Language-Image Pre-training (CLIP) models with the spatial feature extraction advantages of 3D U-Net through three mechanisms: 3D-2D semantic bridging, cross-modal semantic guidance, and semantic-based attention mechanisms. Experimental validation on the BraTS 2020 dataset demonstrates that the proposed model achieves an overall Dice coefficient of 0.8567, representing a 4.8% improvement compared to traditional 3D U-Net, with a 7.3% Dice coefficient increase in the clinically important enhancing tumor (ET) region.

Guan, H., Hou, P. C., Hong, P., Wang, L., Zhang, W., Du, X., Zhou, Z., Zhou, L.

medrxiv logopreprintJul 14 2025
Recent advances in vision-language models (VLMs) have enabled automatic radiology report generation, yet current evaluation methods remain limited to general-purpose NLP metrics or coarse classification-based clinical scores. In this study, we propose a clinically informed evaluation framework for VLM-generated radiology reports that goes beyond traditional performance measures. We define a taxonomy of 12 radiology-specific error types, each annotated with clinical risk levels (low, medium, high) in collaboration with physicians. Using this framework, we conduct a comprehensive error analysis of three representative VLMs, i.e., DeepSeek VL2, CXR-LLaVA, and CheXagent, on 685 gold-standard, expert-annotated MIMIC-CXR cases. We further introduce a risk-aware evaluation metric, the Clinical Risk-weighted Error Score for Text-generation (CREST), to quantify safety impact. Our findings reveal critical model vulnerabilities, common error patterns, and condition-specific risk profiles, offering actionable insights for model development and deployment. This work establishes a safety-centric foundation for evaluating and improving medical report generation models. The source code of our evaluation framework, including CREST computation and error taxonomy analysis, is available at https://github.com/guanharry/VLM-CREST.
Page 461 of 7497488 results
Show
per page

Ready to Sharpen Your Edge?

Subscribe to join 7,500+ peers who rely on RadAI Slice. Get the essential weekly briefing that empowers you to navigate the future of radiology.

We respect your privacy. Unsubscribe at any time.