Development and evaluation of a convolutional neural network model for sex prediction using cephalometric radiographs and cranial photographs.
Authors
Affiliations (7)
- Doctoral Program of Medical Science, Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia.
- Nursing Department, Poltekkes Kemenkes Pontianak, Pontianak, Indonesia.
- Forensic Odontology Department, Faculty of Dental Medicine, Universitas Airlangga, Surabaya, Indonesia.
- Biomedical Engineering Study Program, Physics Department, Faculty of Science and Technology, Universitas Airlangga, Surabaya, Indonesia.
- Information Systems Department, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia.
- Forensics and Medicolegal Department, Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia. [email protected].
- Master of Forensic Sciences Program, Postgraduate School, Universitas Airlangga, Surabaya, Indonesia. [email protected].
Abstract
Accurately determining sex from features such as facial bone profiles and teeth is crucial for identifying unknown victims. Lateral cephalometric radiographs effectively depict the lateral cranial structure, aiding the development of computational identification models. This study develops and evaluates a sex prediction model based on cephalometric radiographs using several convolutional neural network (CNN) architectures. The primary goal is to evaluate model performance both on standardized radiographic data and on real-world cranial photographs that simulate forensic applications.

Six CNN architectures (VGG16, VGG19, MobileNetV2, ResNet50V2, InceptionV3, and InceptionResNetV2) were trained and validated on 340 cephalometric images of Indonesian individuals aged 18 to 40 years. The data were divided into training (70%), validation (15%), and testing (15%) subsets, and data augmentation was applied to mitigate class imbalance. An additional set of 40 cranial photographs of anatomical specimens was used to evaluate the models' generalizability. Model performance was measured by accuracy, precision, recall, and F1-score.

The 340 cephalometric images comprised 255 females and 85 males. On cephalometric data, VGG19 and ResNet50V2 achieved high F1-scores of 95% for females and 83% for males, respectively, highlighting their strong class-specific performance. Although overall accuracy exceeded 90%, the F1-score better reflected model performance on this imbalanced dataset. Performance dropped notably on cranial photographs, particularly for female samples: although InceptionResNetV2 achieved the highest F1-score on cranial photographs (62%), misclassification of females remained substantial. Confusion matrices and per-class metrics further revealed persistent issues with class imbalance and generalization across imaging modalities.
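The abstract notes that the F1-score reflected model performance better than accuracy on this imbalanced dataset. The short sketch below illustrates why, using hypothetical confusion-matrix counts (not the study's reported results) that mirror the roughly 3:1 female-to-male ratio and a 51-image test split (15% of 340):

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical test split: 38 females, 13 males. A classifier biased
# toward the majority class labels most images "female":
tp_f, fp_f, fn_f = 38, 10, 0   # female class: all found, 10 males mislabeled
tp_m, fp_m, fn_m = 3, 0, 10    # male class: only 3 of 13 males recovered

accuracy = (tp_f + tp_m) / 51
_, _, f1_female = precision_recall_f1(tp_f, fp_f, fn_f)
_, _, f1_male = precision_recall_f1(tp_m, fp_m, fn_m)

print(f"accuracy:  {accuracy:.2f}")   # 0.80 looks respectable...
print(f"F1 female: {f1_female:.2f}")  # 0.88
print(f"F1 male:   {f1_male:.2f}")    # 0.38: the minority class collapses
```

An aggregate accuracy of 80% hides the near-total failure on the minority (male) class, which the per-class F1-scores expose directly.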
Basic CNN models perform well on standardized cephalometric images but less effectively on photographic cranial images, indicating a domain shift between imaging modalities that limits generalizability. Improving real-world forensic performance will require further optimization and more diverse training data.
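The 70/15/15 train/validation/test division described in the abstract is typically done per class to preserve the female-to-male ratio in each subset. The helper below is an illustrative sketch of such a stratified split; the function, seed, and rounding scheme are assumptions, not the authors' published code.

```python
import random

def stratified_split(items, labels, fracs=(0.70, 0.15, 0.15), seed=42):
    """Split items into train/val/test subsets, preserving per-class ratios.
    Illustrative only; the paper does not publish its exact split procedure."""
    rng = random.Random(seed)
    by_class = {}
    for item, lab in zip(items, labels):
        by_class.setdefault(lab, []).append(item)
    train, val, test = [], [], []
    for group in by_class.values():
        rng.shuffle(group)
        n = len(group)
        n_train = round(fracs[0] * n)
        n_val = round(fracs[1] * n)
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test

# 340 images: 255 female ("F") and 85 male ("M"), as reported above.
items = [f"img_{i:03d}" for i in range(340)]
labels = ["F"] * 255 + ["M"] * 85
train, val, test = stratified_split(items, labels)
print(len(train), len(val), len(test))  # roughly 238 / 51 / 51
```

Because each class is shuffled and sliced independently, both subsets retain the 3:1 imbalance, which is why the abstract pairs the split with data augmentation to mitigate it.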