Explainable Natural Language Processing Models using Partial Dependence Plots with Random Forests


Elaris Computing Nexus

Received On: 20 March 2025

Revised On: 30 April 2025

Accepted On: 25 May 2025

Published On: 06 June 2025

Volume 01, 2025

Pages: 061-072


Abstract

The interpretability of natural language processing (NLP) models is essential for understanding their decision-making, particularly in ensemble-based models such as Random Forests. This paper examines how Partial Dependence Plots (PDPs) can be used to measure and visualize the influence of individual words on model predictions across a variety of NLP tasks. The datasets considered cover multi-class topic classification (20 Newsgroups, AG News), binary sentiment analysis (IMDB, Amazon Reviews), and SMS spam detection. Random Forest classifiers were trained on TF-IDF features, and PDPs were used to analyze key words representative of each class or sentiment. The findings indicate that class-specific and sentiment-bearing words have high partial dependence values and exert strong effects on the classes they predict, whereas generic words have moderate cross-class effects. The method provides both numerical and graphical insight into feature contributions, making model behavior easy to interpret without compromising predictive performance. Across datasets, PDPs showed consistent patterns, indicating the generality of the approach. The results highlight that PDPs are useful for discovering meaningful word-level relationships, identifying subtle interactions, and increasing model transparency. By generalizing the use of PDPs across several NLP domains, this work provides a viable framework for interpretable machine learning, allowing practitioners to apply models with confidence and to understand the factors that drive predictions. Overall, the proposed methodology bridges the gap between model performance and interpretability, making NLP systems more transparent and reliable.
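As a concrete illustration of the pipeline the abstract describes, the sketch below trains a Random Forest on TF-IDF features and computes the partial dependence of the prediction on a single word's weight. It is a minimal example using scikit-learn; the two-category slice of 20 Newsgroups, the vocabulary cap, the hyperparameters, and the probe word "orbit" are illustrative assumptions, not the authors' exact settings.

```python
# Minimal sketch of the described pipeline: Random Forest on TF-IDF
# features, then partial dependence of the prediction on one word.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import partial_dependence

# A small two-class slice of 20 Newsgroups keeps the demo fast.
data = fetch_20newsgroups(subset="train",
                          categories=["sci.space", "rec.autos"],
                          remove=("headers", "footers", "quotes"))

# TF-IDF features; the vocabulary cap keeps the dense matrix small,
# since partial_dependence expects a dense array.
vectorizer = TfidfVectorizer(max_features=2000, stop_words="english")
X = vectorizer.fit_transform(data.data).toarray()

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, data.target)

# Partial dependence of the predicted probability on one word's
# TF-IDF weight; "orbit" is assumed to survive the vocabulary cap.
idx = vectorizer.vocabulary_["orbit"]
pd_result = partial_dependence(forest, X, features=[idx], kind="average")
print("orbit", pd_result["average"][0])  # higher -> stronger class pull
```

Plotting the same quantity (e.g. with sklearn.inspection.PartialDependenceDisplay) yields the word-level curves discussed in the abstract: class-specific words show steep, one-sided dependence, while generic words stay nearly flat.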

Keywords

Natural Language Processing, IMDB, AG News, Amazon Reviews, Partial Dependence Plots.

CRediT Author Statement

The author reviewed the results and approved the final version of the manuscript.

Acknowledgements

The authors thank the reviewers for the time and effort they devoted to reviewing the manuscript.

Funding

No funding was received to assist with the preparation of this manuscript.

Ethics Declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Availability of Data and Materials

All datasets used in this study, including 20 Newsgroups, AG News, IMDB, Amazon Reviews, and SMS Spam Detection, are publicly available and have been described in detail within the article.

Author Information

Contributions

All authors contributed equally to the paper, and all authors have read and agreed to the published version of the manuscript.

Corresponding Author



Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NoDerivs license, a more restrictive license that allows the material to be redistributed commercially or non-commercially, provided no changes whatsoever are made to the original, i.e. no derivatives of the original work. To view a copy of this license, visit: https://creativecommons.org/licenses/by-nc-nd/4.0/

Cite this Article

Anandakumar Haldorai, “Explainable Natural Language Processing Models using Partial Dependence Plots with Random Forests”, Elaris Computing Nexus, pp. 061-072, 2025, doi: 10.65148/ECN/2025007.

Copyright

© 2025 Anandakumar Haldorai. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.