Rapid urbanization has accelerated population growth in cities, and this growth is expected to continue in the developing cities of India. Gurgaon, one of India’s fastest-growing cities, has emerged as a major real estate hub thanks to its proximity to the capital, growing infrastructure, corporate presence, and improved connectivity. As a result, real estate decision-making has become very complex for buyers, investors, and people working there.
This study presents a comprehensive AI-driven framework that provides property price predictions, market analysis, and personalized recommendations, using a dataset of nearly 4,000 residential listings web-scraped from 99acres.com.
This research presents an end-to-end data science pipeline that incorporates rigorous preprocessing, including IQR-based outlier treatment, and advanced feature engineering to extract the required features, such as luxury amenities, servant room availability, and floor number. To identify the best-performing model, a multi-model comparison was also conducted, evaluating 11 algorithms, including linear regression, random forest, and XGBoost, across various encoding strategies such as one-hot and target encoding. The final R-squared score is approximately 0.90. In addition, a recommendation module was developed using cosine similarity on feature vectors and location advantages to provide users with relevant property alternatives. The system has been deployed as a Streamlit-based dashboard. Furthermore, an analytics module featuring interactive geo-maps provides real-time insights. The result is a functional real estate module, built from raw web data, that provides stakeholders with better insights.
Introduction
This study presents an AI-powered real estate platform for the Gurgaon housing market that combines house price prediction, market analysis, and property recommendations into a single Streamlit dashboard. The system addresses the challenges of property valuation by using machine learning instead of traditional broker-based methods.
The platform is built around three key modules:
Price Prediction Module: Uses machine learning models, particularly XGBoost and Random Forest, to estimate property prices based on features such as location, area, number of rooms, amenities, and luxury score.
Analyzer Module: Provides interactive visualizations, including heatmaps, geospatial maps, and charts, to help users understand market trends and price variations across Gurgaon.
Recommendation Module: Employs cosine similarity to recommend similar properties based on user preferences and property characteristics.
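The similarity ranking behind the Recommendation Module can be illustrated with a minimal sketch. The listing names, the three-element feature vectors, and the helper names `cosine_similarity` and `recommend` are hypothetical and chosen only for illustration; the features actually used by the platform are not reproduced here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

def recommend(target, listings, top_n=3):
    """Return (name, score) pairs for the listings most similar to `target`."""
    scores = [
        (name, cosine_similarity(listings[target], vec))
        for name, vec in listings.items()
        if name != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    return scores[:top_n]

# Hypothetical, already-scaled feature vectors: [area, bedrooms, luxury score]
listings = {
    "A": [1.0, 0.5, 0.2],
    "B": [0.9, 0.5, 0.3],
    "C": [0.1, 0.2, 0.9],
}
top = recommend("A", listings, top_n=2)
```

In this toy data, listing B is ranked ahead of C for target A because its feature vector points in nearly the same direction. Cosine similarity ignores vector magnitude, so features are typically scaled before comparison.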
The dataset consists of over 3,900 property listings scraped from 99acres.com, including flats and independent houses. Data preprocessing involved handling missing values, removing outliers using the Interquartile Range (IQR) method, and extracting useful features. Multiple regression models were evaluated, with XGBoost achieving the best performance after hyperparameter tuning and log transformation of prices.
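A minimal sketch of the IQR-based outlier filter followed by a log transformation of prices is given below. The sample prices, the 1.5 multiplier, the linear-interpolation quantile rule, and the use of `log1p` are assumptions made for illustration; the exact thresholds and transform used in the study are not stated.

```python
import math

def quantile(sorted_vals, q):
    """Quantile by linear interpolation between the two nearest ranks."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def iqr_bounds(values, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    s = sorted(values)
    q1, q3 = quantile(s, 0.25), quantile(s, 0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical listing prices; the last value is an obvious outlier.
prices = [0.8, 0.95, 1.1, 1.25, 1.4, 1.6, 1.9, 2.2, 31.5]

low, high = iqr_bounds(prices)
kept = [p for p in prices if low <= p <= high]

# Assumed transform: log(1 + price) compresses the right tail of the target.
log_prices = [math.log1p(p) for p in kept]
```

Predictions made on the transformed target are mapped back to the original price scale with the inverse, `math.expm1`.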
The system was integrated into a Streamlit web application using serialized machine learning models for real-time predictions. Experimental results showed strong performance, with an R² score of 0.90, MAE of 0.44 (log scale), and a recommendation similarity score of 0.85. Compared to other models such as Random Forest, Extra Trees, Linear Regression, and SVR, XGBoost delivered the highest prediction accuracy.
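For reference, the two regression metrics reported above can be computed as in the following sketch. The target and prediction values are hypothetical, and the function names are chosen for illustration rather than taken from the platform's code.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical log-scale targets and model predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
```

Because the MAE is reported on the log scale, it describes a relative rather than an absolute price error. Assuming a natural logarithm, an MAE of m corresponds to a typical multiplicative error factor of about exp(m).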
Conclusion
This research reports the development of an AI-based platform that analyzes, predicts, and recommends properties in the Gurgaon real estate market, built on a large dataset from 99acres. Moving beyond the usual guesswork, it presents data-based valuations. The XGBoost model, when paired with a log transformation of prices, significantly outperforms linear regression and SVR, achieving a high accuracy of R² = 0.90 [1][2][4][9].
The platform provides competitive price forecasts as well as relevant property recommendations. Complex machine learning models have been integrated into a user-friendly dashboard, making it equally accessible to the average homebuyer and to industry professionals. This constitutes a practical tool that helps navigate one of India’s most complex property markets by connecting analysis with real-world financial decisions. The solution is also scalable and may be deployed in other smart cities [11].
References
[1] S. Nagula, “Real Estate Price Prediction Using Machine Learning Models,” IJRASET, ISSN: 2321-9653, vol. 13, July 2025. https://doi.org/10.22214/ijraset.2025.72962
[2] T. Zhao, “Predicting House Prices Using Machine Learning Models,” Transactions on Computer Science and Intelligent Systems Research, ISSN: 2960-1800, vol. 9, AIDML 2025.
[3] R. Cellmer and K. Kobylińska, “Housing Price Prediction – Machine Geostatistical Methods,” Real Estate Management and Valuation, vol. 33, no. 1, 2025. https://doi.org/10.2478/remav-2025-0001
[4] O. Pastukh and V. Khomyshyn, “Using Ensemble Methods of Machine Learning to Predict Real Estate Prices,” arXiv:2504.04303, 2025. https://doi.org/10.48550/arXiv.2504.04303
[5] Singaravelu, Muthuselvan, et al., “Real Estate Price Prediction System Using Machine Learning Algorithm,” AIP Conference Proceedings, vol. 3175, no. 1, AIP Publishing LLC, 2025. https://doi.org/10.1063/5.0254265
[6] K. Singh, M. Mishra, and Er. S. Singh, “Content-based Recommender System Using Cosine Similarity,” IJRASET, ISSN: 2321-9653, vol. 12, Issue V, May 2024. https://doi.org/10.22214/ijraset.2024.61835
[7] Dr. K. Malpni, “Detecting Outliers for Single Dimensional Data Using Interquartile Range,” Journal of Engineering Research and Application, ISSN: 2248-9622, vol. 9, Issue 9, pp. 31–35, September 2019. https://doi.org/10.9790/9622-0909013135
[8] H. Sharma, H. Harsora, and B. Ogunleye, “An Optimal House Price Prediction Algorithm: XGBoost,” Analytics, vol. 3, pp. 30–45, 2024. https://doi.org/10.3390/analytics3010003
[9] H. Li, “House Price Prediction and Analysis Based on Random Forest and XGBoost Models,” Highlights in Business, Economics and Management, vol. 21, pp. 934–938, 2023. https://doi.org/10.54097/hbem.v21i.14837
[10] M. Geerts, M. Reusens, B. Baesens, S. vanden Broucke, and J. De Weerdt, “On the Performance of LLMs for Real Estate Appraisal,” in ECML PKDD 2025, Lecture Notes in Computer Science, vol. 16021, Springer, Cham, 2026. https://doi.org/10.48550/arXiv.2506.11812
[11] P. Gümmer, J. Rosenberger, M. Kraus, P. Zschech, and N. Hambauer, “Unveiling Location-Specific Price Drivers: A Two-Stage Cluster Analysis for Interpretable House Price Predictions,” arXiv:2508.03156, 2025. https://doi.org/10.48550/arXiv.2508.03156