Towards Better-Calibrated ML Models for Reliable Network Intrusion Detection

Hussein Fawaz, Farah Ezzeddine, Silvia Giordano, Omran Ayoub
SUPSI, Switzerland

IEEE WiMob 2025
Other versions: [IEEE] [Code (forthcoming)]

TL;DR

Well-calibrated ML models make more reliable intrusion detection decisions. We combine calibration metrics with SHAP-based feature selection to improve the reliability of network intrusion detection systems (NIDS).

Why calibration matters in NIDS

Intrusion detection systems rely on probabilistic outputs to prioritize alerts. Poor calibration leads to overconfident false positives or missed attacks, and both are critical failures in operational environments.

Calibration-aware SHAP feature selection

We propose a feature selection framework that integrates:

  • SHAP values for explainability
  • Calibration metrics (Expected Calibration Error, or ECE, and Brier score) for reliability
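
As a rough illustration of the two calibration metrics named in the list above, the sketch below computes the Brier score and ECE for a binary detector. It assumes the model outputs a single attack probability per flow, uses equal-width confidence bins, and defaults to 10 bins. The function names and these choices are illustrative assumptions, not details taken from the paper.

```python
def brier_score(probs, labels):
    """Mean squared error between predicted attack probability and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE for a binary detector (assumed setup: equal-width confidence bins)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1.0 - p           # confidence in the predicted class
        idx = min(int(conf * n_bins), n_bins - 1)    # conf == 1.0 falls in the last bin
        bins[idx].append((conf, 1.0 if pred == y else 0.0))

    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(a for _, a in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece


# Toy example: the detector is 95% confident on every flow but right on only 3 of 5.
probs = [0.95, 0.95, 0.95, 0.95, 0.05]
labels = [1, 1, 0, 0, 0]
print(brier_score(probs, labels))
print(expected_calibration_error(probs, labels))
```

In the toy example every prediction lands in the top confidence bin, so the ECE is simply the gap between the average confidence and the fraction of correct predictions. Lower is better for both metrics.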

Features are selected not only for accuracy, but also for their contribution to well-calibrated predictions.
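
One way to picture selecting features for both accuracy and calibration is the sketch below. It is not the paper's algorithm: it assumes features have already been ranked (for example by mean absolute SHAP value), then picks the top-k prefix of that ranking that minimizes a weighted sum of validation error and ECE. The `evaluate` callback, the `alpha` weight, the feature names, and the toy numbers are all hypothetical.

```python
def select_calibration_aware(ranked_features, evaluate, alpha=0.5):
    """Pick the top-k prefix of a feature ranking with the best combined score.

    ranked_features: feature names, most important first (e.g. by mean |SHAP|).
    evaluate: hypothetical callback that trains and validates a model on the
              given subset and returns (error_rate, ece), both in [0, 1].
    alpha: assumed trade-off weight between detection error and calibration error.
    """
    best_subset, best_score = None, float("inf")
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        error_rate, ece = evaluate(subset)
        score = alpha * error_rate + (1.0 - alpha) * ece
        if score < best_score:
            best_subset, best_score = list(subset), score
    return best_subset, best_score


# Toy validation results keyed by subset size (made-up numbers, illustration only).
toy_results = {1: (0.20, 0.10), 2: (0.10, 0.04), 3: (0.09, 0.12)}


def toy_evaluate(subset):
    return toy_results[len(subset)]


features = ["flow_duration", "pkt_count", "dst_port"]  # hypothetical feature names
subset, score = select_calibration_aware(features, toy_evaluate)
print(subset, score)
```

In the toy data, adding the third feature lowers the error rate slightly but worsens ECE, so the combined score prefers the two-feature subset. An accuracy-only criterion would have kept all three.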

Experimental results

Across multiple intrusion detection datasets, our approach:

  • Significantly improves calibration
  • Maintains or improves detection performance
  • Produces more trustworthy confidence estimates

Why this matters

Reliable confidence estimates allow security operators to make better-informed decisions, reducing alert fatigue and operational risk.

Future work

We are extending this work with uncertainty quantification and conformal prediction to provide formal reliability guarantees.
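
For readers unfamiliar with conformal prediction, the following is a minimal textbook-style sketch of split conformal prediction for a binary detector. It is not the authors' planned method. It assumes a held-out calibration set of attack probabilities with 0/1 labels and uses one minus the probability of the true class as the nonconformity score. All function names and numbers are illustrative.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Return the score threshold targeting roughly (1 - alpha) coverage."""
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = sorted(
        1.0 - (p if y == 1 else 1.0 - p) for p, y in zip(cal_probs, cal_labels)
    )
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))  # 1-indexed rank of the conformal quantile
    if rank > n:
        return 1.0  # too few calibration points: every label is always included
    return scores[rank - 1]


def prediction_set(p_attack, qhat):
    """Labels (1 = attack, 0 = benign) whose nonconformity score is within qhat."""
    labels = set()
    if 1.0 - p_attack <= qhat:
        labels.add(1)
    if p_attack <= qhat:  # score for "benign" is 1 - (1 - p_attack)
        labels.add(0)
    return labels


# Made-up calibration data, illustration only.
cal_probs = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3, 0.95, 0.05, 0.6]
cal_labels = [1, 1, 0, 0, 1, 0, 1, 0, 1]
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
print(prediction_set(0.97, qhat))
```

A set containing both labels signals that the model cannot separate attack from benign at the requested coverage level. That kind of explicit uncertainty flag is something a single probability does not provide.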