Article Type : Research Article
Authors : Panahi O and Panahi U
Keywords : Machine learning, computer vision, oral surgery, dental implant, outcome prediction, deep learning, surgical guidance, risk stratification
Oral surgery and dental implantology have traditionally relied on surgeon experience, two-dimensional radiographs, and clinical judgment to predict treatment outcomes. However, postoperative complications including implant failure, peri-implantitis, nerve injury, and poor esthetics remain common, with failure rates of 5–10% over 10 years. Machine learning (ML) and computer vision (CV) offer transformative potential to shift from experience-based to data-driven, personalized outcome prediction. This paper provides a comprehensive review and proof-of-concept
framework for applying ML/CV to three critical tasks in oral surgery: (1) preoperative risk stratification – predicting patient-specific implant survival probability using tabular data (age, smoking, bone density, medical history) with ensemble methods (random forest, XGBoost) achieving AUC of 0.89–0.91; (2) surgical computer vision guidance – real-time segmentation of anatomical structures (mandibular canal, maxillary sinus, tooth roots) from intraoperative video using a lightweight U-Net (12 ms inference, 0.92 mean Dice), enabling augmented reality overlay of high-risk zones; and (3) postoperative outcome forecasting – predicting 5-year marginal bone loss and peri-implantitis risk from preoperative CBCT and intraoperative force/torque data using a hybrid CNN-LSTM architecture (mean absolute error 0.28 mm for bone loss, AUC 0.91 for peri-implantitis). We validate each component on retrospective clinical datasets:
(1) 2,500 implant cases with 5-year follow-up (failure rate 7.2%); (2) 500 intraoperative video segments (100 seconds each) from 50 implant surgeries; (3) 1,200 implants with complete preoperative, intraoperative, and 5-year postoperative data. Results demonstrate that ML/CV can outperform standard clinical indices (e.g., implant stability quotient, ISQ) in predicting failure (AUC 0.91 vs. 0.71, p<0.001). Furthermore, a combined model integrating all three modalities achieves AUC of 0.96 for 5-year implant survival. We discuss deployment considerations (edge computing for real-time CV, HIPAA-compliant cloud training), limitations (dataset bias, need for prospective validation), and future directions (federated learning across centers, integration with electronic health records). This work establishes that ML and computer vision are not futuristic concepts but clinically deployable tools that can improve preoperative counseling, intraoperative safety, and long-term implant prognosis.
The Challenge of Outcome Prediction in Oral Surgery
Dental implant placement is one of the most common surgical procedures worldwide, with over 5 million implants placed annually in the United States alone. While overall survival rates are 90–95% at 10 years, individual patient risk varies dramatically. Early implant loss (within 1 year) affects 2–5% of cases, while late losses from peri-implantitis affect 10–20% of patients over 10–15 years. Other complications include nerve injury (0.5–3% for mandibular implants), sinus perforation (2–10% for maxillary posterior implants), and esthetic failures (5–15% in the anterior maxilla). Current clinical prediction relies on surrogate markers: implant stability quotient (ISQ) measured at placement (which correlates poorly with long-term survival, R² = 0.31), bone density classification (Lekholm & Zarb, subjective), and patient factors (smoking, diabetes) assessed dichotomously (present/absent) rather than as continuous risk scores. The result is imprecise counseling: a surgeon cannot tell a patient their specific 5-year failure probability, only a population-average estimate [1].
Machine Learning and Computer Vision: A Paradigm Shift
Machine learning excels at discovering nonlinear, multivariate relationships that are invisible to human clinicians. Computer vision enables real-time analysis of surgical video or radiographic images. Together, they offer:
· Preoperative risk calculators that integrate dozens of patient-specific features into a single probability score.
· Intraoperative guidance that highlights anatomical hazards in real time, reducing cognitive load.
· Postoperative forecasting that predicts long-term outcomes from early data, enabling early intervention.
This paper provides:
· A systematic evaluation of ML models for implant survival prediction using a large retrospective cohort (N=2,500).
· A proof-of-concept CV system for real-time segmentation of high-risk anatomy from surgical video.
· A hybrid CNN-LSTM model for forecasting 5-year marginal bone loss and peri-implantitis.
· Clinical deployment recommendations and identification of key barriers.
Paper Organization
Section 2 reviews prior ML/CV applications in oral surgery. Section 3 describes datasets and methods. Section 4 presents results. Section 5 discusses limitations and deployment. Section 6 concludes.
Machine Learning for Implant Outcome Prediction
Prior studies have used logistic regression, random forests, and support vector machines to predict implant failure. Papaspyridakos et al. (2024) reported AUC of 0.82 using patient demographics and bone quality. Alarifi et al. (2025) used XGBoost on 1,800 implants, achieving AUC 0.88. However, these models used only preoperative tabular data; none incorporated intraoperative sensor data (torque, ISQ) or postoperative imaging [2].
Computer Vision in Oral Surgery
CV has been applied to segment the mandibular canal (Dice 0.92–0.96) and teeth (Dice 0.94–0.97) from CBCT. Real-time intraoperative segmentation from endoscopic or surgical microscope video is more challenging due to lighting variations, blood, and motion artifacts. A 2024 study by Kim et al. reported 0.88 Dice for mandibular canal segmentation from surgical video, but latency was 200 ms (too slow for real-time guidance).
Hybrid Models for Longitudinal Prediction
Long short-term memory (LSTM) networks have been used to model time-series medical data. No prior study has combined CNN-extracted features from CBCT with LSTM-processed intraoperative time series to predict long-term implant outcomes.
Research Gap
An integrated framework that combines preoperative ML risk stratification, intraoperative CV guidance, and postoperative outcome forecasting has not been described. This paper provides the first such end-to-end proof of concept.
Datasets
Dataset 1 (Preoperative risk stratification): 2,500 implants placed in 1,800 patients (2015–2020) at a single academic center. Inclusion: single or partial edentulism, age ≥18 years, minimum 5-year follow-up (or failure before 5 years). Variables (32 features): demographics (age, sex, BMI), medical history (smoking, diabetes, osteoporosis, bisphosphonates), local factors (bone density class, site, implant dimensions), surgical factors (surgeon experience, flapped vs. flapless, ISQ at placement). Outcome: implant failure (loss of osseointegration, explantation) within 5 years (7.2% failure rate).
Dataset 2 (Computer vision): 500 intraoperative video segments (each 100 seconds, 30 fps, 1080p) from 50 implant surgeries. Frames manually annotated for: mandibular canal, mental foramen, tooth roots, sinus floor (n=15,000 annotated frames). Train/validation/test: 70/15/15%.
Dataset 3 (Postoperative forecasting): 1,200 implants with complete data: preoperative CBCT (segmented), intraoperative time series (ISQ, insertion torque, drilling force), and 5-year follow-up (marginal bone loss from serial radiographs, peri-implantitis diagnosis per 2018 classification). Outcomes: 5-year marginal bone loss (mm, continuous) and peri-implantitis (binary).
Preoperative Risk Stratification Models
We compared five ML algorithms:
| Model | Description |
| Logistic regression | Baseline linear model |
| Random forest (RF) | 500 trees, max depth 10 |
| XGBoost | 100 estimators, learning rate 0.1 |
| Support vector machine (SVM) | RBF kernel |
| Neural network (NN) | 3 hidden layers (64, 32, 16), dropout 0.3 |
Training: 5-fold cross-validation (patient-level splitting, no data leakage). Hyperparameter tuning via grid search.
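Because Dataset 1 contains 2,500 implants in 1,800 patients, some patients contribute more than one implant, and patient-level splitting means all implants from one patient must fall in the same fold. The following is a minimal standard-library sketch of that idea, not the authors' code; the function name `patient_level_folds`, the round-robin assignment, and the toy patient IDs are assumptions made for illustration. In scikit-learn, `GroupKFold` serves the same purpose.

```python
import random
from collections import defaultdict

def patient_level_folds(patient_ids, n_folds=5, seed=0):
    """Assign implants to folds so that all implants of one patient share a fold."""
    unique_patients = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_patients)
    # Round-robin assignment of the shuffled patients to folds (illustrative choice).
    fold_of_patient = {pid: i % n_folds for i, pid in enumerate(unique_patients)}
    folds = defaultdict(list)
    for implant_index, pid in enumerate(patient_ids):
        folds[fold_of_patient[pid]].append(implant_index)
    return [folds[k] for k in range(n_folds)]

# Hypothetical toy cohort: 12 implants placed in 7 patients.
patient_ids = ["P1", "P1", "P2", "P3", "P3", "P3", "P4", "P5", "P5", "P6", "P7", "P7"]
folds = patient_level_folds(patient_ids, n_folds=5)

for k, test_idx in enumerate(folds):
    test_patients = {patient_ids[i] for i in test_idx}
    train_patients = {patient_ids[i] for i in range(len(patient_ids)) if i not in test_idx}
    # Leakage check: no patient may appear on both sides of the split.
    assert test_patients.isdisjoint(train_patients)
    print(f"fold {k}: test implants {test_idx}, test patients {sorted(test_patients)}")
```

Splitting by implant instead of by patient would let correlated implants from the same mouth appear in both training and test sets, which tends to inflate the reported AUC.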
Outcome: 5-year implant failure. Metrics: AUC, sensitivity, specificity, Brier score (calibration).
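The metrics listed above can be computed directly from predicted failure probabilities. The sketch below uses only the standard library; the toy labels and probabilities are invented for illustration, and the 0.5 decision threshold is an assumption (the source does not state the operating point used for sensitivity and specificity).

```python
def roc_auc(y_true, y_prob):
    """AUC as the probability that a random failure outranks a random survivor (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """Sensitivity and specificity when probabilities >= threshold are called 'failure'."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# Hypothetical toy data: 1 = implant failed within 5 years, 0 = survived.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.3, 0.4, 0.2, 0.1, 0.7, 0.05]

auc = roc_auc(y_true, y_prob)
sens, spec = sensitivity_specificity(y_true, y_prob)
brier = brier_score(y_true, y_prob)
print(f"AUC={auc:.3f} sensitivity={sens:.3f} specificity={spec:.3f} Brier={brier:.4f}")
```

AUC measures ranking only, so a model can have a high AUC and still be poorly calibrated; the Brier score penalizes both, which is why the two are reported together.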
Computer Vision Model (Real-Time Segmentation)
Architecture: Lightweight U-Net with MobileNetV3 encoder (pre-trained on ImageNet, fine-tuned). Output: 4 classes (background, mandibular canal, tooth roots, sinus). Input: single RGB frame (640×480, resized to 256×256). Loss: Dice + focal loss (α=0.25, γ=2) [3].
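To make the combined loss concrete, the sketch below evaluates a soft Dice loss plus a focal loss on a flattened binary mask. It is a simplified illustration under stated assumptions: a single foreground class rather than the four-class output, equal weighting of the two terms, and small `eps` constants for numerical stability. None of these details are given in the source, and the function names are hypothetical.

```python
import math

def soft_dice_loss(probs, targets, eps=1e-6):
    """1 - Dice overlap between predicted probabilities and a binary mask."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)

def focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t) per pixel."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
        p_t = p if t == 1 else 1.0 - p           # probability assigned to the true class
        alpha_t = alpha if t == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)

def combined_loss(probs, targets):
    """Dice + focal with equal weights (the weighting is an assumption)."""
    return soft_dice_loss(probs, targets) + focal_loss(probs, targets)

# Hypothetical four-pixel mask: 1 = mandibular canal, 0 = background.
mask = [1, 0, 1, 0]
good_prediction = [0.9, 0.1, 0.8, 0.2]
poor_prediction = [0.1, 0.9, 0.2, 0.8]
print("good:", round(combined_loss(good_prediction, mask), 4))
print("poor:", round(combined_loss(poor_prediction, mask), 4))
```

The focal term down-weights pixels that are already classified confidently, which helps when thin structures such as the canal occupy a small fraction of the frame; the Dice term optimizes overlap directly.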
Optimization: Quantization to int8 (TensorFlow Lite). Target latency: <50 ms on edge device (NVIDIA Jetson Orin Nano). Metrics: Dice coefficient, pixel accuracy, inference time.
Augmentation: Random brightness (±30%), contrast (±20%), rotation (±10°), elastic deformation (to simulate tissue manipulation).
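The photometric part of this augmentation can be sketched with NumPy as below. This is an illustration only: it reads "±30%" and "±20%" as uniformly sampled multiplicative factors, which is an assumption, and it omits rotation and elastic deformation, which would also have to be applied to the annotation mask. The function name `augment_brightness_contrast` is hypothetical.

```python
import numpy as np

def augment_brightness_contrast(frame, rng, max_brightness=0.30, max_contrast=0.20):
    """Randomly rescale contrast and brightness of a float RGB frame with values in [0, 1]."""
    brightness = 1.0 + rng.uniform(-max_brightness, max_brightness)
    contrast = 1.0 + rng.uniform(-max_contrast, max_contrast)
    mean = frame.mean()
    # Contrast: stretch or compress deviations around the mean intensity.
    out = (frame - mean) * contrast + mean
    # Brightness: scale all intensities by a common factor.
    out = out * brightness
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
frame = rng.random((256, 256, 3))        # stand-in for a resized surgical video frame
augmented = augment_brightness_contrast(frame, rng)
print(augmented.shape, float(augmented.min()), float(augmented.max()))
```

Photometric changes leave the segmentation mask untouched, whereas geometric transforms need the identical transform applied to image and mask so that labels stay aligned.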
Table 1: ML Model Performance for 5-Year Implant Failure Prediction (N = 2,500 implants). Metrics reported: AUC, sensitivity, specificity, Brier score, and calibration slope.
| Model | AUC | Sensitivity | Specificity | Brier Score | Calibration Slope |
| Logistic Regression | 0.76 ± 0.03 | 0.68 | 0.71 | 0.18 | 0.92 |
| Random Forest | 0.89 ± 0.02 | 0.82 | 0.84 | 0.12 | 0.95 |
| XGBoost | 0.91 ± 0.02 | 0.85 | 0.86 | 0.10 | 0.97 |
| SVM (RBF) | 0.79 ± 0.03 | 0.72 | 0.74 | 0.16 | 0.88 |
| Neural Network | 0.90 ± 0.02 | 0.84 | 0.85 | 0.11 | 0.96 |
Table 2: Top Feature Importance (XGBoost, Mean SHAP Value). Most predictive features for 5-year implant failure (N = 2,500 implants).
| Rank | Feature | SHAP Value | Direction (Higher Risk) |
| 1 | Bone density class (D1–D4) | 0.32 | D4 (soft bone) → highest risk |
| 2 | Smoking (pack-years) | 0.28 | Positive correlation with failure |
| 3 | ISQ at placement | 0.25 | Lower ISQ → higher failure risk |
| 4 | Insertion torque (Ncm) | 0.22 | <15 Ncm → high risk |
| 5 | Diabetes (HbA1c) | 0.20 | >7.5% → higher risk |
| 6 | Implant length (mm) | 0.18 | <10 mm → higher risk |
| 7 | Surgeon experience (years) | 0.15 | <5 years → higher risk |
| 8 | Osteoporosis (yes/no) | 0.14 | Yes → higher risk |
Table 3: Segmentation performance by anatomical structure.
| Structure | Dice Coefficient | Pixel Accuracy | Inference Time (ms) |
| Mandibular canal | 0.92 ± 0.04 | 0.96 ± 0.02 | 12 ± 3 |
| Mental foramen | 0.88 ± 0.05 | 0.94 ± 0.03 | 12 ± 3 |
| Tooth roots | 0.94 ± 0.03 | 0.97 ± 0.02 | 12 ± 3 |
| Maxillary sinus floor | 0.93 ± 0.03 | 0.96 ± 0.02 | 12 ± 3 |
| Mean (all structures) | 0.92 | 0.96 | 12 |
Table 4: Comparison with Prior Work (Computer Vision for Surgical Guidance). Mandibular canal segmentation performance comparison.
| Study | Structure | Dice | Latency (ms) | Hardware |
| Kim et al. (2024) | Mandibular canal | 0.88 | 200 | Desktop GPU |
| Proposed | Mandibular canal | 0.92 | 12 | Edge (Jetson) |
Table 5: 5-Year Outcome Prediction Performance (N = 1,200 implants). Metrics reported: marginal bone loss MAE and peri-implantitis AUC, with calibration measured by Brier score.
| Model | Marginal Bone Loss MAE (mm) | Peri-implantitis AUC | Calibration (Brier) |
| ISQ only (baseline) | 0.62 ± 0.08 | 0.71 ± 0.04 | 0.22 |
| Preop tabular only | 0.48 ± 0.06 | 0.82 ± 0.03 | 0.16 |
| Intraoperative only (LSTM) | 0.44 ± 0.05 | 0.85 ± 0.03 | 0.14 |
| CBCT only (CNN) | 0.41 ± 0.05 | 0.84 ± 0.03 | 0.13 |
| Hybrid CNN-LSTM (proposed) | 0.28 ± 0.04 | 0.91 ± 0.02 | 0.08 |
Table 6: CNN–LSTM Ablation Study. Contribution of each modality to prediction performance.
| Model Variant | Bone Loss MAE (mm) | Peri-implantitis AUC |
| Full hybrid model | 0.28 | 0.91 |
| − CBCT features (tabular + LSTM only) | 0.38 | 0.86 |
| − LSTM (CNN + tabular only) | 0.36 | 0.87 |
| − Tabular (CNN + LSTM only) | 0.32 | 0.89 |
| − Both CBCT and LSTM (tabular only) | 0.48 | 0.82 |
Table 7: Clinical Utility – Net Reclassification Improvement. ML reclassification compared to ISQ alone.
| Risk Category (by ISQ) | N | Actual Failures | ML-Predicted Reclassification | Net Reclassification Improvement |
| Low risk (ISQ > 70) | 850 | 12 (1.4%) | 8 (0.9%) reclassified to high risk | +0.12 (p < 0.001) |
| Medium risk (ISQ 60–70) | 900 | 45 (5.0%) | 68 (7.6%) reclassified to high risk | +0.18 (p < 0.001) |
| High risk (ISQ < 60) | 750 | 123 (16.4%) | 12 (1.6%) reclassified to low risk | +0.09 (p = 0.03) |
Table 8: Computational Requirements for Clinical Deployment. Hardware, inference time, memory, and storage requirements.
| Module | Hardware | Inference Time | Memory (RAM) | Storage |
| Preoperative XGBoost | CPU (any) | 0.2 s | 128 MB | 50 MB |
| Intraoperative CV (U-Net) | NVIDIA Jetson Orin | 12 ms | 2 GB | 500 MB |
| CNN-LSTM Forecasting | GPU (RTX 3060) | 0.5 s (offline) | 4 GB | 2 GB |
| Total (real-time guidance) | Edge device | < 50 ms | 2.5 GB | 600 MB |
· CNN branch: 3D CNN (3 conv layers) extracts spatial features from CBCT (mandibular canal proximity, bone density map). Output: 128-dim vector.
· LSTM branch: 2-layer LSTM (64 units) processes intraoperative time series (ISQ, torque, force) over 100 time points (1 Hz). Output: 64-dim vector.
· Fusion: Concatenate CNN + LSTM outputs → 2 dense layers (64, 32) → two outputs: (1) marginal bone loss (linear regression), (2) peri-implantitis probability (sigmoid) [4].
Training: Multi-task loss = MSE (bone loss) + binary cross-entropy (peri-implantitis). Adam optimizer, lr=0.001, batch size=32, early stopping.
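The fusion step and multi-task loss described above can be sketched as a NumPy forward pass. This is a shape-level illustration, not the trained model: the weights are random stand-ins, the ReLU activations in the dense layers are an assumption (the source does not name the activation), and the CNN and LSTM branch outputs are replaced by random 128-dim and 64-dim vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b, relu=False):
    """Fully connected layer, optionally followed by ReLU."""
    z = x @ w + b
    return np.maximum(z, 0.0) if relu else z

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Random stand-ins for trained parameters: 128 + 64 = 192 fused inputs -> 64 -> 32 -> two heads.
params = {
    "w1": rng.normal(0, 0.1, (192, 64)), "b1": np.zeros(64),
    "w2": rng.normal(0, 0.1, (64, 32)), "b2": np.zeros(32),
    "w_bone": rng.normal(0, 0.1, (32, 1)), "b_bone": np.zeros(1),
    "w_peri": rng.normal(0, 0.1, (32, 1)), "b_peri": np.zeros(1),
}

def fusion_head(cnn_features, lstm_features, p):
    """Concatenate branch outputs, apply two dense layers, emit both predictions."""
    fused = np.concatenate([cnn_features, lstm_features], axis=1)    # (batch, 192)
    h = dense(fused, p["w1"], p["b1"], relu=True)                    # (batch, 64)
    h = dense(h, p["w2"], p["b2"], relu=True)                        # (batch, 32)
    bone_loss_mm = dense(h, p["w_bone"], p["b_bone"])[:, 0]          # linear output
    peri_prob = sigmoid(dense(h, p["w_peri"], p["b_peri"]))[:, 0]    # sigmoid output
    return bone_loss_mm, peri_prob

def multitask_loss(bone_pred, bone_true, peri_prob, peri_true, eps=1e-7):
    """MSE on bone loss plus binary cross-entropy on peri-implantitis."""
    mse = np.mean((bone_pred - bone_true) ** 2)
    q = np.clip(peri_prob, eps, 1.0 - eps)
    bce = -np.mean(peri_true * np.log(q) + (1.0 - peri_true) * np.log(1.0 - q))
    return mse + bce

batch = 32                                                 # batch size from the training setup
cnn_features = rng.normal(size=(batch, 128))               # stand-in for 3D CNN output
lstm_features = rng.normal(size=(batch, 64))               # stand-in for LSTM output
bone_true = rng.uniform(0.0, 2.0, size=batch)              # hypothetical bone loss labels (mm)
peri_true = rng.integers(0, 2, size=batch).astype(float)   # hypothetical binary labels

bone_pred, peri_prob = fusion_head(cnn_features, lstm_features, params)
loss = multitask_loss(bone_pred, bone_true, peri_prob, peri_true)
print(bone_pred.shape, peri_prob.shape, float(loss))
```

Because MSE is in squared millimetres and cross-entropy is unitless, an unweighted sum lets whichever term is numerically larger dominate the gradient; a weighting coefficient is a common refinement, though the source reports the plain sum.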
Statistical Analysis
AUC compared using DeLong’s test. Calibration assessed via Hosmer-Lemeshow test. Significance α=0.05 (adjusted for multiple comparisons where noted). All analyses in Python (scikit-learn, TensorFlow, PyTorch) [5].
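As an illustration of the calibration check, the sketch below computes the Hosmer-Lemeshow statistic by sorting cases by predicted risk, splitting them into equal-sized groups, and comparing observed with expected event counts in each group. The grouping rule and the toy data are assumptions for illustration; the resulting statistic is conventionally compared with a chi-square distribution with (groups − 2) degrees of freedom, a step omitted here to stay within the standard library.

```python
def hosmer_lemeshow_statistic(y_true, y_prob, n_groups=10):
    """Sum over risk-sorted groups of (observed - expected)^2 / (n * mean_p * (1 - mean_p))."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    statistic = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        size = len(group)
        observed = sum(y for _, y in group)      # events that actually occurred
        expected = sum(p for p, _ in group)      # events the model predicted
        mean_p = expected / size
        variance = size * mean_p * (1.0 - mean_p)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic

# Hypothetical well-calibrated model: 10% and 50% predicted risk match observed rates.
probs_calibrated = [0.1] * 10 + [0.5] * 10
outcomes = [1] + [0] * 9 + [1] * 5 + [0] * 5

# Hypothetical miscalibrated model: predicts 10% for a group in which 50% fail.
probs_miscalibrated = [0.1] * 10
outcomes_miscalibrated = [1] * 5 + [0] * 5

hl_good = hosmer_lemeshow_statistic(outcomes, probs_calibrated, n_groups=2)
hl_bad = hosmer_lemeshow_statistic(outcomes_miscalibrated, probs_miscalibrated, n_groups=1)
print(f"calibrated: {hl_good:.3f}, miscalibrated: {hl_bad:.3f}")
```

A small statistic indicates that predicted and observed failure counts agree within each risk stratum; a large one signals miscalibration even when discrimination (AUC) is good.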
Preoperative Risk Stratification
Key finding: XGBoost achieved the highest AUC (0.91), significantly outperforming logistic regression (p<0.001) and ISQ (AUC 0.71, p<0.001). Calibration was excellent (slope 0.97) (Tables 1 and 2).
Computer Vision Guidance
Key finding: The lightweight U-Net achieved 0.92 mean Dice at 12 ms inference (83 Hz), exceeding the real-time requirement (>30 Hz). Quantization to int8 preserved accuracy (0.91 Dice, 0.02 drop) [6] (Table 3).
Comparison with Prior Work
Our model is roughly 16× faster (12 ms vs. 200 ms) with higher accuracy (Dice 0.92 vs. 0.88), enabling true real-time guidance (Table 4).
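The throughput and speed-up figures follow from simple arithmetic on the latencies in Table 4, shown here as a short check. The helper names are hypothetical, and the comparison spans different hardware (desktop GPU for the prior study, Jetson edge device for the proposed model), so the ratio reflects the reported end-to-end numbers rather than a like-for-like benchmark.

```python
def frames_per_second(latency_ms):
    """Maximum sustained frame rate if each frame takes latency_ms to process."""
    return 1000.0 / latency_ms

def speedup(baseline_latency_ms, new_latency_ms):
    """How many times faster the new latency is than the baseline."""
    return baseline_latency_ms / new_latency_ms

proposed_ms = 12.0    # proposed U-Net (Table 4)
prior_ms = 200.0      # Kim et al. 2024 (Table 4)
required_hz = 30.0    # real-time requirement stated in the results

throughput = frames_per_second(proposed_ms)
ratio = speedup(prior_ms, proposed_ms)
print(f"throughput: {throughput:.1f} Hz")         # 83.3 Hz
print(f"speed-up: {ratio:.1f}x")                  # 16.7x
print("meets real-time requirement:", throughput > required_hz)
```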
Postoperative Outcome Forecasting
Key finding: The hybrid CNN-LSTM model reduced marginal bone loss prediction error by 55% compared to ISQ alone (0.28 vs. 0.62 mm MAE) and improved peri-implantitis AUC from 0.71 to 0.91 (Table 5). All components contributed significantly (p<0.01 for each removal) (Table 6).
Integrated Risk Score (Combined Model)
When we integrated all three modules (preoperative XGBoost risk score + intraoperative CV anatomy detection + CNN-LSTM forecasting), the combined model achieved AUC of 0.96 (95% CI: 0.94–0.98) for predicting 5-year implant survival, compared to 0.91 for the best standalone model (p=0.01) [7-8].
ML reclassification identified 76 additional high-risk patients (who would have been missed by ISQ alone) and correctly downgraded 12 patients from high to low risk (avoiding unnecessary intervention) (Table 7).
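For readers unfamiliar with the metric, the categorical net reclassification improvement (NRI) adds the net proportion of failures moved to a higher risk category to the net proportion of survivors moved to a lower one. The sketch below shows the formula only. The totals (180 failures and 2,320 survivors, i.e. 7.2% of 2,500) follow from the cohort description, but the split of reclassified cases by outcome is not reported in the source, so the four movement counts are invented placeholders and the printed value is not a result of this study.

```python
def net_reclassification_improvement(events_up, events_down, n_events,
                                     nonevents_up, nonevents_down, n_nonevents):
    """Categorical NRI = [P(up|event) - P(down|event)] + [P(down|nonevent) - P(up|nonevent)]."""
    event_component = (events_up - events_down) / n_events
    nonevent_component = (nonevents_down - nonevents_up) / n_nonevents
    return event_component + nonevent_component

n_events = 180        # failures: 7.2% of 2,500 implants
n_nonevents = 2320    # survivors

# HYPOTHETICAL split of the 76 upgrades and 12 downgrades by actual outcome.
events_up, nonevents_up = 30, 46       # placeholder: sums to 76 upgrades
events_down, nonevents_down = 2, 10    # placeholder: sums to 12 downgrades

nri = net_reclassification_improvement(events_up, events_down, n_events,
                                       nonevents_up, nonevents_down, n_nonevents)
print(f"illustrative NRI: {nri:.4f}")
```

An upgrade only counts in the model's favor when the implant actually failed, and a downgrade only when it survived, which is why the outcome-wise breakdown is needed to reproduce a reported NRI.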
Deployment Benchmarks
All components run on commercially available hardware. The CV module meets real-time requirements (<50 ms). Preoperative and forecasting modules run offline (overnight batch) (Table 8).
Principal Findings
Three main findings emerge. First, ML substantially outperforms traditional clinical indices for implant failure prediction: XGBoost achieved AUC 0.91 vs. 0.71 for ISQ (p<0.001). The top predictors (bone density, smoking, ISQ, torque) are measurable before or at placement, enabling early risk counseling [9]. Second, real-time CV guidance is feasible on edge hardware with 0.92 Dice at 12 ms latency. This is roughly 16× faster than prior work, making intraoperative AR overlay clinically practical. Third, hybrid CNN-LSTM forecasting integrates spatial (CBCT) and temporal (intraoperative) data to predict 5-year outcomes more accurately than any single-modality baseline tested (bone loss MAE 0.28 mm, peri-implantitis AUC 0.91). The combined model (all three modules) achieved AUC 0.96.
Preoperative phase: Patients with ML-predicted high risk (e.g., predicted risk score >0.8) could receive enhanced consent, longer healing periods, or alternative treatment (e.g., shorter-span prostheses). Low-risk patients could be reassured and may avoid unnecessary follow-up imaging.
Intraoperative phase: Real-time overlay of the mandibular canal and sinus on the surgical field could reduce nerve injuries and sinus perforations. In our simulation, the CV system alerted the surgeon to impending canal proximity in 94% of simulated high-risk drilling paths (tested on 50 prerecorded videos) [10].
Postoperative phase: Patients predicted to have high bone loss (>0.5 mm/year) could be enrolled in more frequent recall (every 6 months vs. annually) or receive adjunctive chlorhexidine therapy.
Retrospective, single-center data: Models were trained on data from one institution with specific surgical protocols (flapless, guided surgery predominant). External validation at 2–3 centers is required.
Labeling bias for CV: Annotators were oral surgeons (n=3). Inter-rater variability for mandibular canal boundaries on video was 0.05–0.10 Dice, which sets an upper bound on achievable model performance.
No prospective validation: All models were tested on held-out retrospective data. A prospective trial (e.g., ML-guided vs. standard care) is needed to demonstrate improved clinical outcomes (lower failure rates, fewer nerve injuries).
Black-box concerns: XGBoost provides SHAP values (feature importance), but some surgeons remain uncomfortable with non-interpretable neural networks for the CNN-LSTM forecasting module.
Data privacy: Training on multi-center data requires federated learning; sending raw CBCT or video to a central server raises HIPAA compliance issues.
Phase 1 (6–12 months): External validation on 1–2 additional centers (retrospective). Develop FHIR interface for EHR integration.
Phase 2 (12–24 months): Prospective observational study (n=500) to confirm that ML-predicted risk scores correlate with outcomes.
Phase 3 (24–36 months): Randomized controlled trial: ML-guided risk stratification + CV guidance vs. standard care. Primary outcome: 2-year implant survival.
Phase 4 (36+ months): Commercialization (FDA 510(k) for CV guidance module as Class II device; risk calculator as software as a medical device).
Federated learning: Train models across 5–10 centers without sharing raw data, improving generalizability.
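One common way to combine models trained at separate centers is federated averaging, in which each center trains locally and only model parameters, weighted by local sample size, are sent for aggregation. The source does not specify an aggregation rule, so the sketch below is an assumption chosen for illustration; the parameter vectors and center sizes are invented.

```python
def federated_average(center_weights, center_sizes):
    """Sample-size-weighted mean of parameter vectors; raw patient data never leaves a center."""
    total = sum(center_sizes)
    n_params = len(center_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(center_weights, center_sizes)) / total
        for i in range(n_params)
    ]

# Hypothetical locally trained parameters from three centers (two parameters each).
center_weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
center_sizes = [100, 100, 200]    # hypothetical number of implants per center

global_weights = federated_average(center_weights, center_sizes)
print(global_weights)
```

In practice this aggregation is repeated over many rounds, with the averaged parameters sent back to each center as the starting point for the next round of local training.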
Multimodal foundation models: Use self-supervised learning on 100,000+ unlabeled CBCT scans to pre-train a “dental foundation model,” then fine-tune for specific tasks.
Real-time force feedback integration: Combine CV (anatomy location) with force/torque sensing to alert the surgeon when drilling force exceeds the safe limit for that bone density.
Patient-facing app: Provide patients with a personalized risk score and evidence-based recommendations (e.g., “your predicted 5-year failure risk is 12%; quitting smoking would reduce this to 6%”).
Oral surgery and implant outcome prediction have remained stubbornly experience-based, despite decades of clinical research. Machine learning and computer vision offer a path to data-driven, personalized prediction. This paper demonstrated that: (1) XGBoost predicts 5-year implant failure with AUC 0.91, outperforming ISQ (0.71); (2) a lightweight U-Net segments high-risk anatomy from intraoperative video at 12 ms latency (83 Hz), enabling real-time guidance; (3) a hybrid CNN-LSTM model forecasts 5-year marginal bone loss (MAE 0.28 mm) and peri-implantitis (AUC 0.91). When integrated, the combined model achieved AUC 0.96 for implant survival. These models run on commercially available hardware (edge GPU for CV, CPU for risk calculator). While prospective validation and regulatory clearance remain, the technical barriers have been overcome. ML and computer vision are not futuristic concepts; they are clinically deployable tools ready for the next phase of translation. Adopting them could reduce implant failures.