Over the past few years, the convergence of machine learning (ML) and artificial intelligence has transformed numerous industries, with agriculture among the major beneficiaries. This paper explores the use of ML algorithms for crop yield prediction and the optimization of agricultural operations, with a focus on the Indian context, where agriculture is an important sector. Crop yield depends heavily on environmental factors such as soil content, humidity, rainfall, and other cultivation parameters. Conventional approaches, such as historical averages, may fail to capture the dynamic behaviour of these factors and can therefore give inaccurate predictions. To overcome this limitation, the present research uses several supervised learning models, namely Random Forest, Naïve Bayes Classifier, and Decision Trees, to forecast crop yield and determine the appropriate crops for a given terrain under given weather conditions and soil characteristics. Results indicate that, in crop classification, the Naïve Bayes Classifier gives the best predictions of crop yield, with an accuracy of 99.74%. In addition, through the incorporation of IoT sensor inputs, the system provides farmers with data-driven information about crop locations and about optimal planting, watering, and fertilizing strategies. The research demonstrates that ML-based decision support systems can equip farmers with the means to maximize crop yields, minimize waste, and reduce their environmental footprint, ultimately leading to more sustainable and resilient agricultural practices.
Introduction
1. Background & Motivation
Agriculture is India's primary economic activity, employing a large population and contributing significantly to GDP.
Over 60% of land is used for agriculture, but traditional decision-making methods (e.g., personal judgment) ignore critical factors like soil health, leading to:
Poor crop selection
Fertilizer misuse
Soil degradation
To address this, Machine Learning (ML) offers data-driven solutions for predicting optimal crops and yield using environmental and soil parameters.
2. Research Objective
The goal is to demonstrate how ML can enable precision agriculture:
Analyze how soil conditions affect crops
Recommend suitable crops based on data patterns
Maximize yields using predictive analytics and environmental factors
3. Literature Review Highlights
Various ML and deep learning models have been applied for crop and yield prediction:
Random Forest (RF) and Naïve Bayes (NB) yielded over 90% accuracy in some studies.
SVM, CNN, LSTM, and GBR have shown promising results in yield and fruit counting.
Limitations include lack of soil data in some studies and computational complexity in hybrid models.
4. Methodology
A. Dataset
Used Kaggle Crop Recommendation Dataset with 2,200 samples and 8 environmental variables:
Extra Trees (ET): Adds randomness for better generalization
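The extra randomness in Extra Trees comes mainly from how split thresholds are chosen: instead of searching for the best cut-point, each candidate feature gets a threshold drawn at random, and the best of those random candidates is kept. The following is a minimal sketch of that single step, not the study's implementation. The function names, the toy rows, and the choice of Gini impurity as the score are assumptions made for illustration, and a full Extra Trees learner would also sample a random subset of features and build many trees.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_impurity(rows, labels, feature, threshold):
    """Weighted Gini impurity after splitting on rows[i][feature] < threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] < threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] >= threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(labels)

def extra_trees_split(rows, labels, rng):
    """Draw one random threshold per feature and keep the lowest-impurity one."""
    best = None
    for feature in range(len(rows[0])):
        values = [x[feature] for x in rows]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue  # constant feature, nothing to split on
        threshold = rng.uniform(lo, hi)  # random cut-point: the added randomness
        score = split_impurity(rows, labels, feature, threshold)
        if best is None or score < best[2]:
            best = (feature, threshold, score)
    return best

# Toy data (made up for illustration): columns are rainfall in mm and soil pH.
rows = [[200.0, 6.5], [220.0, 6.4], [60.0, 6.6], [80.0, 6.5]]
labels = ["rice", "rice", "lentil", "lentil"]
feature, threshold, score = extra_trees_split(rows, labels, random.Random(0))
print(feature, round(threshold, 2), round(score, 3))
```

Because the threshold is not optimized, individual trees are weaker but less correlated with one another, which is the usual argument for why averaging them generalizes well.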
5. Results & Insights
Top predictors of suitable crops:
Rainfall
Humidity
Soil nutrients: K, P, N
Temperature and pH also contributed, though less significantly.
ML models were effective in recommending crops tailored to soil and weather conditions.
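To make the recommendation step concrete, here is a minimal Gaussian Naïve Bayes sketch in plain Python. It is an illustration under stated assumptions rather than the study's code: the three features (rainfall, humidity, nitrogen), the sample values, and the function names are all hypothetical, and a real run would use the full dataset with a held-out test split.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for x, y in zip(rows, labels):
        grouped[y].append(x)
    model = {}
    for y, xs in grouped.items():
        n = len(xs)
        columns = list(zip(*xs))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[y] = (n / len(rows), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest log posterior for sample x."""
    best_label, best_score = None, -math.inf
    for y, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            # Log of the Gaussian density, assuming independent features.
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = y, score
    return best_label

# Hypothetical samples: [rainfall (mm), humidity (%), nitrogen content].
rows = [[220.0, 82.0, 80.0], [240.0, 80.0, 85.0], [200.0, 84.0, 78.0],
        [45.0, 65.0, 20.0], [50.0, 62.0, 18.0], [40.0, 68.0, 22.0]]
labels = ["rice", "rice", "rice", "lentil", "lentil", "lentil"]

model = fit_gaussian_nb(rows, labels)
print(predict(model, [210.0, 81.0, 79.0]))  # wet, humid, nitrogen-rich plot
print(predict(model, [48.0, 64.0, 19.0]))   # dry plot with low nitrogen
```

The model scores each crop by how typical the measured conditions are for that crop, so the recommendation is simply the crop whose training samples look most like the new plot.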
Conclusion
This study compares the effectiveness of several machine learning algorithms in identifying the best crop for given environmental and soil conditions. The models compared are LR, NB, SVM, KNN, DT, RF, BG, GB, and ET.
As the first objective, we examined which environmental variables contribute most to accurately predicting the crop. According to the feature importance ranking, rainfall and humidity were the leading predictors. Soil nutrient levels, i.e., Potassium (K), Phosphorus (P), and Nitrogen (N), also contributed strongly, indicating that crop yield and compatibility are largely determined by both weather and soil health. Temperature and pH contributed relatively less but remain important for a complete picture of compatibility between soil and crops.
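One common way to obtain such a ranking is permutation importance: shuffle one feature column at a time and measure how much accuracy drops. The sketch below illustrates the idea only; the text does not state which importance measure the study used, so the method, the nearest-centroid stand-in model, the function names, and the toy values are all assumptions. For brevity it scores on the training rows, whereas a real analysis would score on held-out data.

```python
import random

def fit_centroids(rows, labels):
    """Mean feature vector per class (a deliberately simple stand-in model)."""
    sums, counts = {}, {}
    for x, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Label of the closest centroid by squared Euclidean distance."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

def accuracy(centroids, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(predict(centroids, x) == y for x, y in zip(rows, labels))
    return hits / len(labels)

def permutation_importance(centroids, rows, labels, repeats=20, seed=0):
    """Mean accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(repeats):
            column = [x[j] for x in rows]
            rng.shuffle(column)
            shuffled = [x[:j] + [v] + x[j + 1:] for x, v in zip(rows, column)]
            drops.append(baseline - accuracy(centroids, shuffled, labels))
        importances.append(sum(drops) / repeats)
    return importances

# Toy data (made up): rainfall separates the crops, pH is the same in both.
rows = [[210.0, 6.4], [230.0, 6.5], [190.0, 6.6],
        [50.0, 6.4], [60.0, 6.5], [40.0, 6.6]]
labels = ["rice", "rice", "rice", "lentil", "lentil", "lentil"]

centroids = fit_centroids(rows, labels)
importances = permutation_importance(centroids, rows, labels)
for name, value in zip(["rainfall", "pH"], importances):
    print(name, round(value, 3))
```

In this toy set, shuffling rainfall breaks the link between the feature and the crop, so accuracy falls, while shuffling pH changes nothing because pH carries no class information here.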