Abstract— Decision trees are widely recognized for their interpretability and computational efficiency. However, the choice of impurity function—typically entropy or Gini impurity—can significantly influence model performance, especially in high-dimensional or imbalanced data settings. We propose Adaptive Weighted Impurity (AWI), a novel impurity criterion that dynamically integrates entropy and Gini impurity through an adaptive, data-driven weighting mechanism. AWI retains the transparency of classical decision trees while enhancing classification accuracy, scalability, and robustness across diverse datasets. Extensive experiments on benchmark datasets (Iris, Titanic, MNIST) demonstrate that AWI consistently improves classification performance, reduces training time, and simplifies resulting models. Notably, AWI introduces no additional hyperparameters and can be seamlessly incorporated into existing decision tree frameworks. This makes it particularly suitable for resource-constrained environments and applications requiring low-latency inference, such as financial analytics, healthcare diagnostics, and embedded AI systems.

Impact Statement— The impurity criterion used in decision trees has a direct impact on model accuracy, training efficiency, and generalization. This work introduces Adaptive Weighted Impurity (AWI)—a hybrid impurity function that combines entropy and Gini via adaptive weighting. AWI consistently improves decision tree performance across a range of dataset sizes and class imbalances, without compromising interpretability. The proposed approach is especially valuable for real-world deployments in domains like medical diagnosis and financial risk modeling, where decision transparency, speed, and accuracy are critical.
Introduction
Decision trees are popular, interpretable machine learning models for classification and regression. They recursively split the data according to an impurity measure, typically entropy or Gini impurity, producing increasingly homogeneous groups.
Entropy is based on information theory and measures uncertainty; it is sensitive to class distribution but computationally heavier due to logarithmic calculations.
Gini impurity measures the probability of misclassifying a randomly chosen sample if it were labeled according to the node's class distribution; it is computationally efficient and more stable on imbalanced datasets.
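For concreteness, both measures can be computed directly from a node's class counts, as in the following minimal NumPy sketch (the helper names are ours, not part of any library):

import numpy as np

def entropy(counts):
    """Shannon entropy (in bits) of a node's class distribution."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()              # drop empty classes, normalize
    return -np.sum(p * np.log2(p))

def gini(counts):
    """Gini impurity of a node's class distribution."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)

# Example: a node with 8 samples of class 0 and 2 of class 1
print(entropy([8, 2]))   # ~0.722 bits
print(gini([8, 2]))      # 0.32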
Traditionally, the choice between entropy and Gini is fixed before training and applied uniformly at every node, rather than being tailored to the data reaching each node, which limits adaptability.
Proposed Method: Adaptive Weighted Impurity (AWI)
AWI is a new hybrid impurity measure that dynamically combines entropy and Gini impurity using an adaptive, data-driven weighting scheme. This aims to leverage the strengths of both metrics:
• Entropy-normalized weighting: increases entropy's influence when uncertainty is high, emphasizing splits that reduce class ambiguity.
• Class imbalance-aware weighting: increases Gini's weight when one class dominates, benefiting from its stability and efficiency on imbalanced data.
Formally, AWI(S) = w(S) · H(S) + (1 − w(S)) · G(S), where H(S) and G(S) denote the entropy and Gini impurity of node S, and w(S) ∈ [0, 1] adapts based on node characteristics.
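A minimal sketch of the criterion is given below, reusing the entropy and Gini helpers from the Introduction. Here the adaptive weight w(S) is taken to be the node's entropy normalized by its maximum value log2(k), so that balanced, uncertain nodes lean toward entropy and skewed nodes lean toward Gini; this specific choice of w(S) is an illustrative assumption consistent with the two weighting mechanisms above, not necessarily the exact rule evaluated in the experiments.

import numpy as np

def awi(counts):
    """Adaptive Weighted Impurity: AWI(S) = w(S)*H(S) + (1 - w(S))*G(S).

    Sketch only: w(S) is taken here to be the node's entropy normalized
    by log2(k), so balanced/uncertain nodes weight entropy more heavily
    and skewed nodes weight Gini more heavily (an assumed rule)."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    k = len(p)
    h = -np.sum(p * np.log2(p))               # entropy H(S)
    g = 1.0 - np.sum(p ** 2)                  # Gini G(S)
    w = h / np.log2(k) if k > 1 else 0.0      # adaptive weight w(S) in [0, 1]
    return w * h + (1.0 - w) * g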
Evaluation and Benefits
• AWI improves learning outcomes by adapting dynamically to different data distributions.
• Empirical tests show AWI performs well across datasets of varying sizes and class balances.
• It integrates smoothly with ensemble methods such as Random Forests and Gradient Boosted Trees.
• AWI enhances scalability and performance without extra tuning or added complexity; a split-selection sketch follows this list.
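To illustrate how AWI can act as a drop-in criterion inside an existing tree-growing routine, the sketch below searches for the best threshold on a single feature by minimizing the weighted child impurity. Any of the helpers sketched earlier (entropy, gini, or awi) can be passed without extra hyperparameters; real tree frameworks perform this search internally, so this is illustrative only.

import numpy as np

def best_split(x, y, impurity):
    """Find the threshold on feature x minimizing the weighted child
    impurity under the given criterion (e.g. entropy, gini, or awi)."""
    classes = np.unique(y)
    best_t, best_score = None, np.inf
    for t in np.unique(x)[:-1]:               # candidate thresholds
        left, right = y[x <= t], y[x > t]
        counts_l = [np.sum(left == c) for c in classes]
        counts_r = [np.sum(right == c) for c in classes]
        score = (len(left) * impurity(counts_l)
                 + len(right) * impurity(counts_r)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Usage (assuming the awi helper sketched above):
# t, s = best_split(X[:, 0], y, impurity=awi)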
Additional Notes
• Decision trees require regularization (pruning) to avoid overfitting.
• Entropy's computational cost can be significant on large datasets, whereas Gini is favored in real-time or large-scale contexts (a small timing sketch follows this list).
• AWI offers a practical, interpretable, and adaptable impurity measure that improves on traditional static choices.
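The relative cost of the two base measures can be checked with a quick timing sketch, assuming the entropy and gini helpers from the Introduction are in scope; absolute numbers depend on hardware and node size.

import timeit
import numpy as np

# Assumes the entropy() and gini() helpers sketched earlier are in scope.
counts = np.random.randint(1, 10_000, size=100)    # a 100-class node
t_entropy = timeit.timeit(lambda: entropy(counts), number=10_000)
t_gini = timeit.timeit(lambda: gini(counts), number=10_000)
print(f"entropy: {t_entropy:.3f}s   gini: {t_gini:.3f}s   (10k evaluations)")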
Conclusion
In this study, we proposed Adaptive Weighted Impurity (AWI)—a novel, hybrid impurity criterion that dynamically blends entropy and Gini impurity through a data-driven weighting strategy. By adapting to the statistical properties of each decision node, AWI offers a principled mechanism to balance the interpretability and performance of decision tree models.
Through both theoretical formulation and empirical evaluation on datasets of increasing complexity—Iris (balanced and simple), Titanic (real-world and imbalanced), and MNIST (large-scale and high-dimensional)—AWI consistently demonstrated superior or competitive performance across key metrics: classification accuracy, training time, tree depth, and memory efficiency.
Unlike traditional impurity functions that remain static throughout the tree, AWI intelligently adjusts its weighting scheme:
• Favoring entropy in balanced or uncertain distributions, enhancing sensitivity to subtle class distinctions;
• Leveraging Gini impurity in skewed datasets, promoting robustness and generalization.
This adaptive behavior results in shallower, more interpretable trees with a reduced tendency to overfit, making AWI especially suitable for domains where transparency and efficiency are critical.
Moreover, AWI is:
• Modular—requiring no changes to the underlying decision tree architecture;
• Lightweight—introducing negligible computational and memory overhead;
• Parameter-free—eliminating the need for additional tuning or configuration.
These attributes make AWI an attractive choice for deployment in real-world applications such as finance, healthcare, IoT, and edge computing, where models must be interpretable, efficient, and scalable.
In future work, AWI could be extended to ensemble methods such as Random Forests or Gradient Boosted Trees, and further explored in the context of feature importance analysis, cost-sensitive learning, or streaming data environments.