← OpenSpineConsortium · AI Imaging Workshop

Tutorial 6 — Writing an AI Imaging Paper

AI Imaging Workshop · the 11:30 AM "Writing an AI paper (datasets, evaluations, clinical paper, scoping review)" block.

You've installed the tools, looked inside the data, run a model, annotated, and trained. This tutorial is about turning that into a publishable contribution — the four paper types you'll write, how each is structured, and the evaluation and reproducibility standards reviewers actually enforce. Examples are drawn from the workshop's own CTSpinoPelvic1K benchmark and its real peer review.


1. Four kinds of AI imaging paper

| Type | Core question it answers | This workshop's example |
| --- | --- | --- |
| Dataset / benchmark | "Here is data + a way to measure progress." | CTSpinoPelvic1K |
| Method | "Here is a model/algorithm that does X better." | a new segmentation net |
| Clinical | "Does this AI change a patient-relevant outcome?" | LSTV detection in practice |
| Scoping review | "What does the literature cover, and where are the gaps?" | "ML for LSTV: a scoping review" |

Pick the type before you write — each has a different reviewer, structure, and bar.


2. The dataset/benchmark paper (what you're building)

A dataset paper is judged on rigor of construction, clarity of evaluation, and usability. Structure:

  1. Motivation / gap — what clinical or ML problem is unmet. (Ours: no CT benchmark for transitional anatomy at the lumbosacral junction.)
  2. Construction — exactly how the data was assembled, including the hard parts and the failure modes. Reproducible means a reader could rebuild it from your description.
  3. The labels — the schema (our 10 classes), who annotated, inter-rater agreement, and how uncertainty is handled.
  4. Splits — train/val/test, stratified by the rare subgroup so it appears in every split, and patient-grouped so a patient never spans splits.
  5. A baseline + evaluation — at minimum one model's numbers so the benchmark's difficulty is concrete.
  6. Accounting — one authoritative table of counts (patients, scans, masks, volumes, per-class), because counts reported at different granularities are the #1 source of reviewer confusion.
  7. Release — data + code + (ideally) trained checkpoints, under a clear licence.
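
The split rule in item 4 is easy to state and easy to get wrong. Below is a minimal sketch in plain Python, assuming each scan is a record with hypothetical keys `scan_id`, `patient_id`, and a boolean `rare` flag (for example, LSTV present). Patients, not scans, are shuffled, so a patient can never span splits. Rare and common patients are allocated separately, and val and test each receive at least one patient from a stratum that has three or more. The second function reports per-split counts at patient and scan granularity, the kind of numbers the item 6 accounting table should carry.

```python
import random
from collections import defaultdict

SPLITS = ("train", "val", "test")


def patient_grouped_split(scans, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return {scan_id: split} with patients grouped and the rare subgroup stratified.

    `scans` is a list of dicts with hypothetical keys 'scan_id', 'patient_id'
    and 'rare' (bool). `fractions` is (train, val, test); train takes the remainder.
    """
    # A patient counts as rare if any of their scans carries the rare finding.
    patient_rare = defaultdict(bool)
    for s in scans:
        patient_rare[s["patient_id"]] |= bool(s["rare"])

    # Stratify at patient level: allocate rare and common patients separately.
    strata = {True: [], False: []}
    for pid, rare in patient_rare.items():
        strata[rare].append(pid)

    rng = random.Random(seed)
    patient_split = {}
    for pids in strata.values():
        pids.sort()  # fixed order before the seeded shuffle, for reproducibility
        rng.shuffle(pids)
        n = len(pids)
        n_val = round(n * fractions[1])
        n_test = round(n * fractions[2])
        if n >= 3:  # with 3+ patients, val and test each get at least one
            n_val, n_test = max(1, n_val), max(1, n_test)
        groups = {
            "test": pids[:n_test],
            "val": pids[n_test:n_test + n_val],
            "train": pids[n_test + n_val:],
        }
        for split, members in groups.items():
            for pid in members:
                patient_split[pid] = split

    # Every scan inherits its patient's split, so no patient spans two splits.
    return {s["scan_id"]: patient_split[s["patient_id"]] for s in scans}


def split_counts(scans, assignment):
    """Per-split counts at patient and scan granularity for the accounting table."""
    patients = {name: set() for name in SPLITS}
    rare_patients = {name: set() for name in SPLITS}
    n_scans = dict.fromkeys(SPLITS, 0)
    for s in scans:
        split = assignment[s["scan_id"]]
        patients[split].add(s["patient_id"])
        n_scans[split] += 1
        if s["rare"]:
            rare_patients[split].add(s["patient_id"])
    return {
        name: {"patients": len(patients[name]), "scans": n_scans[name],
               "rare_patients": len(rare_patients[name])}
        for name in SPLITS
    }


if __name__ == "__main__":
    # Synthetic toy cohort: 40 patients, 2 scans each, every 5th patient rare.
    scans = [
        {"scan_id": f"p{p:02d}_s{k}", "patient_id": f"p{p:02d}", "rare": p % 5 == 0}
        for p in range(40) for k in range(2)
    ]
    assignment = patient_grouped_split(scans, seed=42)
    for name, row in split_counts(scans, assignment).items():
        print(name, row)
```

A fixed `seed` makes the split repeatable, but releasing the resulting patient-to-split list with the data spares readers from rerunning anything.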

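Items 3 and 5 both need a number. For segmentation labels, per-class Dice overlap, 2|A ∩ B| / (|A| + |B|), is one common choice, and the same function can score a baseline against the reference or one annotator against another. A minimal sketch, assuming integer label maps stored as NumPy arrays of equal shape; the function name is hypothetical, and returning NaN for a class absent from both maps (rather than 1.0) is one convention among several, so state whichever you pick.

```python
import numpy as np


def per_class_dice(pred, ref, labels):
    """Dice = 2|A ∩ B| / (|A| + |B|) for each label; NaN if the label is absent from both."""
    pred, ref = np.asarray(pred), np.asarray(ref)
    if pred.shape != ref.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {ref.shape}")
    scores = {}
    for label in labels:
        a, b = pred == label, ref == label
        denom = int(a.sum()) + int(b.sum())
        if denom == 0:
            scores[label] = float("nan")  # class absent in both: undefined, not perfect
        else:
            scores[label] = 2.0 * int(np.logical_and(a, b).sum()) / denom
    return scores


if __name__ == "__main__":
    # Toy 2-D example; real CT masks are 3-D, but the arithmetic is identical.
    ref = np.zeros((4, 4), dtype=np.uint8)
    ref[:2, :2] = 1              # reference: 4 voxels of label 1
    pred = np.zeros_like(ref)
    pred[:2, :1] = 1             # prediction: 2 of those 4 voxels
    scores = per_class_dice(pred, ref, labels=[1, 2])
    print(scores)                # label 1 -> 2*2/(2+4) = 0.667; label 2 absent -> nan
    print("mean over present classes:", np.nanmean(list(scores.values())))
```

For inter-rater agreement, pass two annotators' maps as `pred` and `ref`; Dice is symmetric, so the order does not matter. Averaging with `np.nanmean` skips classes absent from both maps.
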
Documentation matters as much as the data. A datasheet (motivation, composition, collection, preprocessing, uses, distribution) is increasingly expected.
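
One lightweight way to keep a datasheet from silently missing a section is to treat it as a structured record that refuses to export while any section is blank. A minimal sketch; the `Datasheet` class and its methods are hypothetical, and the six fields simply mirror the sections listed above.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class Datasheet:
    """Hypothetical container whose fields mirror the six datasheet sections."""
    motivation: str = ""
    composition: str = ""
    collection: str = ""
    preprocessing: str = ""
    uses: str = ""
    distribution: str = ""

    def missing_sections(self):
        # Names of sections that are still empty or whitespace-only.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_json(self):
        # Refuse to export an incomplete datasheet.
        missing = self.missing_sections()
        if missing:
            raise ValueError(f"datasheet incomplete, empty sections: {missing}")
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    sheet = Datasheet(motivation="No CT benchmark for transitional anatomy "
                                 "at the lumbosacral junction.")
    print(sheet.missing_sections())
```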


3. Evaluation you can defend

Most rejections are about evaluation, not ideas. The essentials:


4. The clinical paper

Different audience, different bar:


5. The scoping review

Maps a field's literature and gaps (broader than a systematic review):


6. What reviewers will push on (learned the hard way)

From this benchmark's actual review, the recurring asks — write to pre-empt them:


7. Reproducibility checklist (attach to any submission)
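
One item that fits any such checklist is a record of the software environment behind the reported numbers. A minimal sketch using only the standard library; `environment_record` is a hypothetical helper, and the package names in the usage line are placeholders for whatever your pipeline actually imports.

```python
import json
import platform
from importlib import metadata


def environment_record(packages, seed):
    """Snapshot of the software environment behind a set of reported numbers."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed here; record that explicitly
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "seed": seed,
        "packages": versions,
    }


if __name__ == "__main__":
    # Placeholder package list; replace with your pipeline's real dependencies.
    record = environment_record(["numpy", "nibabel"], seed=42)
    print(json.dumps(record, indent=2))
```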


Recap

This concludes the AI Imaging Workshop tutorial series. You can now set up an environment, read medical images, run and train segmentation models on the grid, annotate with AI, and frame the result as a paper.