Stanford University

AbdCTBench

Learning Clinical Biomarker Representations from Abdominal Surface Geometry

Fourteenth International Conference on Learning Representations (ICLR) 2026

Overview

Fig 1: Sample 2D abdominal surface meshes from the AbdCTBench dataset. These CT-derived surface geometries demonstrate the range of external anatomical features used to predict internal body composition biomarkers without radiation exposure. Complete biomarker details for these images are provided in the Appendix.

Body composition analysis through CT and MRI imaging provides critical insights for cardiometabolic health assessment but remains limited by accessibility barriers including radiation exposure, high costs, and infrastructure requirements.

We present AbdCTBench, a large-scale dataset containing 23,506 CT-derived abdominal surface meshes from 18,719 patients, paired with 87 comorbidity labels, 31 specific diagnosis codes, and 16 CT-derived biomarkers. Our key insight is that external surface geometry is predictive of internal tissue composition, enabling accessible health screening through consumer devices.

We establish comprehensive benchmarks across seven computer vision architectures (ResNet-18/34/50, DenseNet-121, EfficientNet-B0, ViT-Small, and Swin Transformer-Base), demonstrating that models can learn robust surface-to-biomarker representations directly from 2D mesh projections. Our best-performing models achieve clinically relevant accuracy: age prediction with MAE 6.22 years (R²=0.757), mortality prediction with AUROC 0.839, and diabetes (with chronic complications) detection with AUROC 0.801.

Key Statistics

  • 23,506 CT-derived surface meshes
  • 18,719 unique patients
  • 87 HCC comorbidity labels
  • 31 ICD-10 diagnosis codes
  • 16 CT-derived biomarkers
  • 10 benchmark tasks
  • 7 architectures evaluated

Dataset

Fig 2: AbdCTBench dataset overview showing the pipeline from CT scans to surface mesh extraction and biomarker prediction.

Collection and Curation

AbdCTBench is a comprehensive dataset derived from 23,506 abdominal CTs of 18,719 patients (≈1.26 scans per patient), representing one of the largest CT-derived biomarker datasets for abdominal composition analysis. The scans were acquired between August 11, 2003, and September 9, 2021, under IRB approval.

Processing proceeded in two parallel phases:

  • Surface mesh rendering: DICOM image series were converted to STL files, then to 2D PNG images of size 384×384 via PyVista
  • Biomarker calculation: DICOM series were processed by OSCAR, which creates segmentation masks to calculate metrics at vertebral levels (L1-L5, T10-T12) and organ-specific regions (liver, spleen, kidneys, aorta)
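The mesh-rendering step above can be illustrated conceptually. The released pipeline renders STL meshes to 384×384 PNGs with PyVista; the sketch below is only a minimal NumPy stand-in for the underlying idea of orthographically projecting a 3D surface onto a fixed-size 2D depth image (the function and parameter names are illustrative, not from the released code):

```python
import numpy as np

def project_to_depth_image(vertices, size=384):
    """Orthographically project 3D surface points onto a size x size
    depth image: x/y become pixel coordinates, z becomes intensity.
    Illustrative only -- the released pipeline renders STL meshes
    with PyVista instead."""
    v = np.asarray(vertices, dtype=float)
    # Normalize x/y into [0, size-1] pixel coordinates.
    xy = v[:, :2]
    xy = (xy - xy.min(axis=0)) / np.ptp(xy, axis=0)  # ptp = max - min
    px = np.clip((xy * (size - 1)).astype(int), 0, size - 1)
    # Normalize depth (z) into [0, 1].
    z = (v[:, 2] - v[:, 2].min()) / np.ptp(v[:, 2])
    img = np.zeros((size, size))
    # Keep the nearest (largest-z) point per pixel, like a z-buffer.
    for (x, y), depth in zip(px, z):
        img[y, x] = max(img[y, x], depth)
    return img

# Toy example: points on a hemisphere (stand-in for an abdominal surface).
rng = np.random.default_rng(0)
pts = rng.normal(size=(5000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
pts = pts[pts[:, 2] > 0]  # keep the front-facing half
depth_img = project_to_depth_image(pts, size=384)
print(depth_img.shape)  # (384, 384)
```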

Dataset Statistics

Demographics

  • Mean age: 55.3 years (SD: 16.51)
  • 56.8% female, 43.2% male (relatively balanced sex distribution)

Clinical Conditions

  • Essential hypertension: 53.7%
  • Type 2 Diabetes: 44.6%
  • Impaired glucose tolerance: 38.0%
  • Tobacco use: 26.8%
  • Myocardial Infarction: 23.1%

HCC Comorbidities

  • Average 1.8 HCC conditions per patient
  • HCC 108 (Vascular Disease): 22.6%
  • HCC 19 (Diabetes w/o complications): 13.0%
  • HCC 12 (Cancers): 10.9%

Biomarkers

  • Agatston calcium score: 1200.9 ± 3126.5
  • Kidney median HU: 90.0 ± 58.8
  • Spleen volume: 223.9 ± 127.2 cm³
  • Comprehensive adipose tissue analysis

Data Splits

The dataset was split at the patient ID level to prevent data leakage:

  • Training: 70%
  • Validation: 20%
  • Test: 10%

All hyperparameter tuning and model selection used only the train and validation sets. The test set was held out exclusively for final evaluation.
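A patient-level split of this kind can be sketched as follows (a minimal illustration with hypothetical scan and patient IDs, not the released split script): shuffle the unique patient IDs, then assign every scan from a given patient to the same partition.

```python
import random

def patient_level_split(scan_to_patient, fracs=(0.7, 0.2, 0.1), seed=42):
    """Split scans into train/val/test so that all scans from one
    patient land in the same partition (prevents data leakage)."""
    patients = sorted(set(scan_to_patient.values()))
    random.Random(seed).shuffle(patients)
    n = len(patients)
    n_train = int(fracs[0] * n)
    n_val = int(fracs[1] * n)
    part = {}
    for i, pid in enumerate(patients):
        part[pid] = ("train" if i < n_train
                     else "val" if i < n_train + n_val
                     else "test")
    # Map each scan to its patient's partition.
    return {scan: part[pid] for scan, pid in scan_to_patient.items()}

# Toy example: 6 scans from 4 patients (IDs are hypothetical).
scans = {"s1": "p1", "s2": "p1", "s3": "p2", "s4": "p3", "s5": "p4", "s6": "p4"}
splits = patient_level_split(scans)
assert splits["s1"] == splits["s2"]  # same patient -> same partition
assert splits["s5"] == splits["s6"]
```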

HIPAA Compliance

The dataset was processed for HIPAA Safe Harbor compliance, removing all protected health information (PHI) for safe public release. All patient identifiers were replaced with randomized study IDs, and only anonymized abdominal surface meshes are distributed.

Benchmark Tasks

From AbdCTBench, we curate 10 biomarker prediction tasks from 2D surface mesh images. We design a single-target learning framework to benchmark selected architectures on biomarker prediction.

Prediction Tasks

| Task | Description | Prevalence / Target |
| --- | --- | --- |
| Mortality Prediction | Binary classification for patient death during follow-up | 11.4% |
| HCC 108 (Vascular Disease) | Binary classification for vascular disease | 22.6% |
| HCC 12 (Cancers) | Binary classification for breast, prostate, and other cancers | 10.9% |
| HCC 96 (Cardiac Arrhythmias) | Binary classification for cardiac arrhythmias | 9.0% |
| HCC 18 (Diabetes w/ Complications) | Binary classification for diabetes with chronic complications | 8.3% |
| HCC 111 (COPD) | Binary classification for chronic obstructive pulmonary disease | 7.1% |
| Calcium Score | Binary classification for Agatston score > 1000 | 21.2% |
| Myocardial Infarction | Binary classification for previous MI | 23.1% |
| Type 2 Diabetes | Binary classification for diabetes at scan time | 44.6% |
| Age Prediction | Regression for patient age at scan time | Mean: 55.3 years |

Architectures Evaluated

CNN Architectures

  • ResNet-18
  • ResNet-34
  • ResNet-50 (RadImageNet)
  • DenseNet-121
  • EfficientNet-B0

Vision Transformers

  • ViT-Small (DINOv2)
  • Swin Transformer-Base

Training Protocol

We establish a standardized training protocol for fair and reproducible comparison:

  • Optimizer: AdamW with weight decay 1×10⁻⁴
  • Learning Rate: Evaluated across 1×10⁻⁵, 1×10⁻⁴, 1×10⁻³
  • Batch Size: 16
  • Training: 100 epochs with early stopping (patience: 10)
  • Dropout: 0.2
  • Class Imbalance: Inverse frequency weighting, balanced batch sampling, threshold optimization
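Two of the imbalance-handling steps can be made concrete with a minimal sketch (an illustration under our own assumptions, not the benchmark code): inverse-frequency class weighting, and sweeping a decision threshold on validation data to maximize F1.

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Weight each class by 1 / frequency, normalized so the mean
    weight over samples is ~1. Rare classes get larger weights."""
    labels = np.asarray(labels)
    counts = np.bincount(labels, minlength=2)
    return len(labels) / (len(counts) * counts)

def f1_optimal_threshold(y_true, y_prob, grid=np.linspace(0.05, 0.95, 19)):
    """Pick the probability threshold that maximizes F1 on a
    validation set; apply that threshold at test time."""
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        y_pred = (y_prob >= t).astype(int)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Toy example with a rare positive class (cf. the 8.3% HCC-18 prevalence).
y = np.array([0] * 90 + [1] * 10)
print(inverse_frequency_weights(y))  # rare class gets the larger weight
```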

Results

Key Findings

  • Age prediction: MAE 6.223 years, R² = 0.757 (best: EfficientNet-B0)
  • Mortality prediction: AUROC 0.839 (best: ResNet-18)
  • Diabetes with chronic complications (HCC 18): AUROC 0.801 (best: Swin Transformer-Base)

Architectural Insights

  • Smaller architectures consistently matched or surpassed larger models: ResNet-18/34 and EfficientNet-B0 often outperformed ResNet-50 despite having fewer parameters
  • Medical-domain pretraining (RadImageNet) did not outperform standard ImageNet pretraining on this task, suggesting that surface geometry representations differ from typical medical imaging
  • Self-supervised pretraining (DINOv2) showed competitive performance but did not achieve the best results on any biomarker
  • Vision Transformers (ViT-Small, Swin) demonstrated robustness across tasks, with Swin achieving best performance on several biomarkers

Performance Summary

| Task | Best Model | Metric | Performance |
| --- | --- | --- | --- |
| Age | EfficientNet-B0 | MAE | 6.223 years |
| Mortality | ResNet-18 | AUROC | 0.839 |
| Calcium Score | ResNet-34 | AUROC | 0.848 |
| Myocardial Infarction | Swin Transformer-Base | AUROC | 0.742 |
| Type 2 Diabetes | ResNet-34 | AUROC | 0.742 |
| HCC 108 (Vascular Disease) | Swin Transformer-Base | AUROC | 0.768 |
| HCC 111 (COPD) | ResNet-18 | AUROC | 0.769 |
| HCC 18 (Diabetes w/ Chronic Complications) | Swin Transformer-Base | AUROC | 0.801 |
| HCC 96 (Cardiac Arrhythmias) | Swin Transformer-Base | AUROC | 0.770 |
| HCC 12 (Cancers) | ResNet-34 | AUROC | 0.591 |

Effective Learning of Representations from Abdominal Surface Geometry

The performance metrics above show no drastic differences between the architectures considered, demonstrating that effective representations of clinically relevant biomarkers can be learned from abdominal surface geometry. This holds across all architectures and biomarkers except HCC 12 (breast, prostate, and other cancers), for which external surface geometry is plausibly non-predictive from a clinical standpoint as well. To examine the learned representations further, we apply Gradient-Weighted Class Activation Mapping (Grad-CAM) to the input images.

Specifically, we load ResNet-18 (one of the best-performing models on HCC 18, diabetes with chronic complications) and apply Grad-CAM to its last convolutional layer to visualize the features learned from the surface geometry images. We draw a random sample of 100 images from the test set and select compelling examples to illustrate the learned representations. The heatmaps identify the surface regions the model attends to when making predictions. The F1-optimal threshold of 0.9 was used to binarize the classifier output.
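The core Grad-CAM computation is simple: global-average-pool the gradients of the target score with respect to the last convolutional feature maps to get per-channel weights, take the weighted sum of those maps, and apply a ReLU. Below is a minimal NumPy sketch of that step under our own assumptions (the paper's visualizations use a standard Grad-CAM implementation on the trained ResNet-18; the shapes here are illustrative):

```python
import numpy as np

def grad_cam(activations, gradients):
    """activations, gradients: (C, H, W) arrays from the last conv layer
    (activations from a forward pass, gradients of the class score from
    a backward pass). Returns an (H, W) heatmap normalized to [0, 1]."""
    # Per-channel importance: global average pooling over spatial dims.
    weights = gradients.mean(axis=(1, 2))                          # (C,)
    # Weighted combination of feature maps, then ReLU.
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0)
    if cam.max() > 0:
        cam /= cam.max()
    return cam  # upsample to 384x384 to overlay on the input mesh image

# Toy shapes matching ResNet-18's last conv block for a 384px input
# (512 channels, 384 / 32 = 12 spatial resolution).
rng = np.random.default_rng(0)
acts = rng.random((512, 12, 12))
grads = rng.normal(size=(512, 12, 12))
heatmap = grad_cam(acts, grads)
print(heatmap.shape)  # (12, 12)
```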

Fig 3: Grad-CAM visualizations of learned representations from abdominal surface geometry. The heatmaps highlight regions of interest that the ResNet-18 model focuses on for HCC-18 (Diabetes with Chronic Complications) prediction.

While Grad-CAM is a popular interpretability method, it has been shown to be unreliable (Kindermans et al., 2019). We provide these visualizations as hypothesis-generating only; they are not used to support any core claims. All main conclusions rely on quantitative performance metrics, and the paper draws no causal or mechanistic inferences from Grad-CAM.

Download & Access

We are working diligently to make the dataset publicly available as soon as possible. The release will include:

  • Full dataset (23,506 surface mesh images in PNG format)
  • 3D surface meshes obtained from CT DICOM image series in STL format
  • HIPAA-compliant de-identified labels: 87 comorbidities, 31 diagnoses, 16 biomarkers
  • Data splits (train/validation/test)
  • Evaluation protocols and scripts
  • Pretrained model checkpoints for all architectures
  • Complete DICOM-to-STL-to-PNG processing pipeline
  • Biomarker extraction scripts that use OSCAR

GitHub Repository: Link to repository will be provided shortly

For access requests or questions, please contact us through GitHub issues or email mahmedch@stanford.edu.

Citation

@inproceedings{abdctbench2026,
  title={AbdCTBench: Learning Clinical Biomarker Representations from Abdominal Surface Geometry},
  author={Chaudhry, Muhammad Ahmed and Bedi, Suhana and Lagari, Pola Lydia and Layden, Brian T and Galanter, William and Pyrros, Ayis and Koyejo, Sanmi},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}

Accepted to ICLR 2026. For correspondence, contact Muhammad Ahmed Chaudhry at mahmedch@stanford.edu.

OpenReview: https://openreview.net/forum?id=dKRAo0a9Gm