How Data Science is Powering the Next Wave of Agro-Tourism in Rural India
Researchers from Symbiosis International University and Odia Generative AI used machine learning to identify key social, environmental, and economic factors driving sustainable agro-tourism in rural India. Their study found that family involvement, community participation, and eco-friendly practices are the strongest predictors of success for agro-tourism centers.
A new study by researchers from Symbiosis International (Deemed University), Pune, and Odia Generative AI, Berlin, has opened up a data-driven way to plan and develop sustainable agro-tourism centers in India. Combining fieldwork with machine learning, a branch of artificial intelligence, the research identifies the key factors that make agro-tourism successful, profitable, and eco-friendly. Conducted mainly in the Pune district of Maharashtra, the study shows how technology and community participation can work together to boost rural development.
Agro-Tourism: A Growing Force in Rural India
Agro-tourism is more than just a relaxing stay on a farm. It is a strategy that allows farmers to earn extra income while preserving their culture, land, and traditions. Visitors get to experience authentic rural life through activities like farming, local cuisine, and nature walks. According to the researchers, this model can help tackle challenges such as rural unemployment and migration by creating sustainable livelihoods close to home.
However, most previous studies focused on the concept of agro-tourism without using data to understand what truly drives its growth. This study changes that. It uses real-world data and advanced analytics to find out what makes an agro-tourism center successful and how these insights can guide future investments and policies.
How the Research Was Done
The researchers began with a thorough review of global and Indian studies on agro-tourism, collecting dozens of potential factors that might influence success. They grouped these into four main categories: demographic, environmental, economic, and socio-cultural. Demographic indicators included family involvement, training, accessibility, and accommodation. Environmental factors covered biodiversity, water conservation, and energy use. Economic indicators focused on expenses, investments, and employment, while socio-cultural ones emphasized community participation, heritage, and safety.
The fieldwork took place across 81 agro-tourism centers in Pune district, which is known for its mix of urban proximity and strong agricultural base. Farmers were interviewed through structured questionnaires designed to collect accurate and measurable data. The survey covered everything from local resources to social media usage. The questions were checked for reliability using Cronbach’s alpha, a standard measure of how consistently a set of related questionnaire items captures the same underlying concept, so that the analysis rested on internally consistent responses.
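For readers curious about what a Cronbach's alpha check involves, here is a minimal Python sketch of the standard formula. The survey scores are invented for illustration and are not from the study, and the function name `cronbach_alpha` is simply a label chosen here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of questionnaire items
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item (column) across respondents
    item_variance_sum = sum(variance(column) for column in zip(*responses))
    # Sample variance of each respondent's total score
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical 5-point answers from five respondents to three related questions
scores = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 3, 2],
    [1, 2, 1],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

Values close to 1 indicate that the items move together. A common rule of thumb treats roughly 0.7 and above as acceptable, though the article does not say which threshold the researchers applied.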
Once the data was collected, it was cleaned, standardized, and analyzed using machine learning. The key technique applied was LASSO (Least Absolute Shrinkage and Selection Operator), a regularization method that shrinks the coefficients of less informative variables all the way to zero, leaving only the strongest predictors. The team then tested four classification models (Logistic Regression, Decision Trees, Random Forest, and XGBoost) to determine which one best predicts agro-tourism success.
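The way LASSO drops variables can be illustrated with a small Python sketch. This is a generic linear LASSO solved by coordinate descent on synthetic data, not the researchers' pipeline. The four features, the penalty of 0.5, and the function names are all assumptions made for the example.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero; anything within the penalty becomes exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=50):
    """Minimise (1/2n) * squared error + penalty * sum of |weights|."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                others = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, penalty) / (z / n) if z > 0 else 0.0
    return w


# Synthetic data: only features 0 and 1 influence the target
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [3 * row[0] - 2 * row[1] + 0.1 * rng.gauss(0, 1) for row in X]

weights = lasso_coordinate_descent(X, y, penalty=0.5)
selected = [j for j, w_j in enumerate(weights) if w_j != 0.0]
print(weights, selected)
```

On toy data like this, the two irrelevant columns should end with a weight of exactly zero, which is the "selection" part of the name. The study applied the same idea at a larger scale, narrowing its candidate factors to those with non-zero weights.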
What the Data Revealed
From 86 original factors, the model identified 35 key indicators that have the biggest impact on agro-tourism growth. Among them, 14 were demographic, such as family involvement, professional training, participation in self-help groups, distance from the nearest city, and use of social media for promotion. Six environmental indicators, like water availability, conservation methods, and local crop yield, also proved crucial. On the economic side, daily employee expenses stood out, while in the socio-cultural category, safety measures were identified as key.
Initially, some of the machine learning models seemed to perform perfectly, achieving 100 percent accuracy, but this was a sign of “overfitting,” where a model memorizes the training data and fails to generalize to new cases. To fix this, the researchers applied a method called SMOTE (Synthetic Minority Over-Sampling Technique), which balanced the dataset by generating synthetic examples for underrepresented cases. Once the data was balanced, the results became far more realistic and reliable.
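The core idea behind SMOTE can be sketched in a few lines of Python: each synthetic example is placed at a random point on the line between a minority-class sample and one of its nearest minority-class neighbours. The points below are invented, and the helper `smote` is a simplified illustration rather than the implementation the researchers used. In practice, libraries such as imbalanced-learn provide a tested version.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared distance to the base point
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])  # one of the k nearest neighbours
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature data: eight majority-class centers, four minority-class
majority_count = 8
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]

new_points = smote(minority, n_new=majority_count - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Because every synthetic point lies between two real minority samples, the new examples stay inside the region the minority class already occupies instead of being exact copies.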
The standout performer was Logistic Regression, which achieved 98 percent accuracy with a ROC-AUC score of 0.99 in the 70–30 data split, and 99 percent accuracy in the 80–20 split. Cross-validation tests confirmed the model’s stability, making it the most reliable tool for predicting which agro-tourism centers are likely to succeed.
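The two metrics quoted above measure different things. Accuracy counts correct yes/no predictions at a fixed cutoff, while ROC-AUC is the probability that a randomly chosen successful center receives a higher score than a randomly chosen unsuccessful one. A minimal Python sketch, using made-up labels and scores rather than the study's data, shows how each is computed:

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of cases where the thresholded prediction matches the label."""
    predictions = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(predictions, y_true)) / len(y_true)


def roc_auc(y_true, y_prob):
    """Chance that a positive case outscores a negative one (ties count half)."""
    positives = [p for p, t in zip(y_prob, y_true) if t == 1]
    negatives = [p for p, t in zip(y_prob, y_true) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one case of each class")
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = successful center) and predicted probabilities
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.1]

acc = accuracy(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
print(acc, auc)
```

In this toy case the model gets 6 of 8 predictions right at the 0.5 cutoff, yet ranks 15 of the 16 positive-negative pairs correctly, which is why the two numbers can differ.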
Why It Matters
The findings show that the heart of agro-tourism lies in people and sustainability. Family participation, proper training, and community collaboration are just as vital as infrastructure or financial investment. Environmental care, such as responsible water management, clean energy use, and biodiversity protection, also plays a defining role in long-term success. Together, these factors create a balance between profit and preservation.
The study’s machine learning approach gives policymakers and farmers a way to make smarter decisions. Governments could use the model to identify promising regions for new agro-tourism projects or to plan training programs that strengthen local capacity. Farmers can use it to understand which factors they should focus on, like improving facilities, adopting green technologies, or marketing their centers through digital platforms.
A Data-Driven Future for Rural Tourism
The research marks a new step toward data-based rural development. It bridges tourism management, environmental sustainability, and artificial intelligence, fields that rarely come together in traditional planning. The study suggests that machine learning can identify patterns that humans might overlook, offering a scientific foundation for sustainable agro-tourism strategies.
The researchers acknowledge that their work is limited to the Pune district and suggest expanding it to other agro-climatic zones in India. Doing so could lead to the creation of regional models that adapt to local cultures, landscapes, and economies.
Ultimately, this study is a story of how technology and tradition can work hand in hand. By using data to strengthen human-centered practices, India can turn its farms into vibrant centers of culture, learning, and sustainability. Agro-tourism, the researchers conclude, has the power not only to support farmers but to reconnect people with the land, and data may be the key to unlocking that future.
First published in: Devdiscourse

