
Elaris Computing Nexus



Reliable Cybersecurity Threat Detection through Probability Calibration in Multiclass Classification



Received On : 30 May 2025

Revised On : 06 July 2025

Accepted On : 24 July 2025

Published On : 05 August 2025

Volume 01, 2025

Pages : 108-118


Abstract

Reliably identifying the type of a cybersecurity threat with machine learning classifiers is difficult in multiclass settings. This study proposes a probability calibration method that combines class-wise and global normalization. The goal is to make predicted probabilities more reliable without reducing classification accuracy. The method is evaluated on three recent and diverse cybersecurity datasets: EMBER2024, CICAPT-IIoT2024, and UGRansome2024. Together they cover malware, IIoT attacks, and ransomware scenarios. We evaluated several standard classifiers (Logistic Regression, Random Forest, Support Vector Machine, and XGBoost) before and after applying our calibration approach. Across all datasets, the method consistently improved the key calibration metrics: Log Loss, Brier Score, and Expected Calibration Error (ECE). It did not lower Accuracy or F1 score, and in some cases it raised them slightly. On EMBER2024, ECE fell from 0.148 to 0.041. This indicates that the predicted probabilities align much more closely with the observed outcome frequencies. The study's visualizations also show that the calibrated models assigned more appropriate probabilities to the correct classes, mitigating both overconfidence and underconfidence. These properties matter in cybersecurity, where accurate probability estimates support risk prioritization, fewer false alarms, and better decision-making. Taken together, the quantitative and visual results make a strong case that the proposed calibration approach is reliable and practical for multiclass threat detection.
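The following minimal pure-Python sketch shows how the three calibration metrics named above (Log Loss, Brier Score, ECE) are commonly computed for multiclass probability outputs. It is not the paper's Class-Global Calibration method or its evaluation code. The function names are hypothetical, and the conventions are assumptions because the abstract does not state them. ECE is computed on the top-label confidence with 10 equal-width bins. The Brier score sums squared errors over classes before averaging over samples.

```python
import math
from typing import Sequence

# Hypothetical helper names; standard textbook metric definitions, not the paper's code.


def log_loss(probs: Sequence[Sequence[float]], labels: Sequence[int], eps: float = 1e-15) -> float:
    """Mean negative log-probability assigned to the true class (clipped to avoid log(0))."""
    total = 0.0
    for p, y in zip(probs, labels):
        total -= math.log(min(max(p[y], eps), 1.0))
    return total / len(labels)


def brier_score(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Multiclass Brier score: squared error against the one-hot label,
    summed over classes and averaged over samples (one common convention)."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)


def expected_calibration_error(
    probs: Sequence[Sequence[float]], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Top-label ECE: bin samples by their highest predicted probability and
    average |accuracy - mean confidence| per bin, weighted by bin size."""
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    correct_sums = [0.0] * n_bins
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=lambda k: p[k])  # predicted class index
        conf = p[pred]  # confidence = probability of the predicted class
        b = min(int(conf * n_bins), n_bins - 1)  # equal-width bin; conf == 1.0 goes to the last bin
        counts[b] += 1
        conf_sums[b] += conf
        correct_sums[b] += 1.0 if pred == y else 0.0
    n = len(labels)
    ece = 0.0
    for c, s_conf, s_corr in zip(counts, conf_sums, correct_sums):
        if c:  # skip empty bins
            ece += (c / n) * abs(s_corr / c - s_conf / c)
    return ece


if __name__ == "__main__":
    # Toy 3-class example with illustrative numbers, not data from the paper.
    probs = [[0.75, 0.15, 0.10], [0.05, 0.85, 0.10], [0.30, 0.25, 0.45], [0.65, 0.25, 0.10]]
    labels = [0, 1, 2, 1]
    print(f"Log Loss: {log_loss(probs, labels):.4f}")
    print(f"Brier:    {brier_score(probs, labels):.4f}")
    print(f"ECE:      {expected_calibration_error(probs, labels):.4f}")
```

Lower is better for all three metrics. A classifier that is always correct with probability 1.0 scores 0 on each. Log Loss and the Brier score reward probability mass on the true class. ECE reflects only the gap between confidence and accuracy, so a model can improve ECE without changing its predicted labels, which is consistent with calibration leaving Accuracy and F1 largely intact.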

Keywords

Probability Calibration, Multiclass Classification, Cybersecurity, EMBER2024, IIoT, Ransomware, Log Loss, Brier Score, Expected Calibration Error.

CRediT Author Statement

The author reviewed the results and approved the final version of the manuscript.

Acknowledgements

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Funding

No funding was received to assist with the preparation of this manuscript.

Ethics Declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Availability of Data and Materials

The datasets used in this study are publicly available and can be accessed through the following sources: EMBER2024 (https://emberdataset.com), CICAPT-IIoT2024 (https://www.unb.ca/cic/datasets/iiot.html), and UGRansome2024 (https://www.gti.ssr.upm.es/datasets/UGRansome). These datasets contain comprehensive cybersecurity features and labeled attack classes, which were used to train and evaluate the baseline classifiers and the proposed Class-Global Calibration (CGC) framework.

Author Information

Contributions

All authors contributed equally to the paper, and all authors have read and agreed to the published version of the manuscript.

Corresponding Author



Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution-NoDerivs license, a more restrictive license that allows the material to be redistributed commercially or non-commercially, provided that no changes whatsoever are made to the original, i.e., no derivative works are permitted. To view a copy of this license, visit: https://creativecommons.org/licenses/by-nc-nd/4.0/

Cite this Article

Abdulhaq Abildtrup, “Reliable Cybersecurity Threat Detection through Probability Calibration in Multiclass Classification”, Elaris Computing Nexus, pp. 108-118, 2025, doi: 10.65148/ECN/2025011.

Copyright

© 2025 Abdulhaq Abildtrup. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.