Smart-city data may become easier to use with LLM-powered dashboards

Large language models (LLMs) could help cities open complex environmental data to residents, policymakers and other non-technical users, but new research warns that the same systems may also create risks around trust, accuracy, transparency and long-term dependence on commercial AI providers. The researchers developed and tested a natural language dashboard where users can ask everyday questions about urban air quality and receive database-backed results and visualisations.

Published in Sustainability, the study, "Towards Democratising Urban Sustainability Data: An LLM-Enabled Natural Language Interface for Smart-City Air-Quality Decision Support", presents a proof-of-concept system that uses LLMs to translate natural language questions into executable database queries. The research uses smart-city air-quality data as a test case and finds that proprietary GPT-based models outperformed the evaluated open-source alternatives in producing accurate text-to-SQL results. It also raises wider concerns about governance, inclusivity, reproducibility and data quality.

Smart-city data remains hard to access for many users

Cities are collecting growing volumes of environmental data through sensors, monitoring networks and real-time digital infrastructure. Air-quality data is among the most policy-relevant forms of this information because it can support pollution monitoring, urban planning, public health decisions and community awareness.

However, the study identifies a persistent access problem. Urban data may be publicly valuable, yet it often remains difficult for non-specialists to use. Traditional databases require knowledge of structured query language, data schemas, filtering rules and visualisation tools, which means residents, journalists, planners and some public officials may depend on technical experts to extract answers from data that could otherwise inform daily decisions and public debate.

The study argues that natural language interfaces could reduce that barrier. Instead of writing database queries, users could ask questions such as where air pollution is highest, how pollutant levels changed over a period, or which locations show unusual readings. The system would then generate the database query, retrieve the relevant information and display the result in a usable format.

Air quality is used as a representative sustainability domain in the study because the data is abundant, policy-relevant and useful to many stakeholders. According to the study, the same design pattern could later be applied to other urban sustainability datasets, including transport, noise, energy or infrastructure performance.

The proposed dashboard is not presented as a finished public service. It is a proof of concept meant to test whether large language models can act as a natural language access layer for complex smart-city data. Its broader purpose is to show how AI could make urban data more inclusive while identifying the safeguards needed before such systems are used in public decision-making.

LLM dashboard turns natural language into data queries

The prototype allows users to submit questions through a web interface. The system sends the question, along with the database structure and output rules, to a large language model. The model returns a structured response containing a database query, a short description of the query and a classification of the expected output type. The generated query is then checked, executed against a smart-city database and used to return results for visualisation. The dashboard can present spatial data on an interactive map and time-based outputs as charts. Users can also access the data linked to the answer.
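The request flow described above can be sketched in a few lines of Python. This is an illustrative sketch, not the researchers' code: the table layout, the JSON keys, the output-type labels and the `fake_llm` stand-in are all assumptions, and SQLite replaces the PostgreSQL database so the example runs on its own.

```python
import json
import sqlite3

# Hypothetical schema text given to the model as context (not from the study).
SCHEMA = "CREATE TABLE readings (sensor_id TEXT, pollutant TEXT, value REAL, ts TEXT)"
ALLOWED_OUTPUT_TYPES = {"map", "chart", "table"}  # assumed labels


def build_prompt(question: str) -> str:
    """Combine the user question with the schema and the output rules."""
    return (
        "Schema:\n" + SCHEMA + "\n"
        "Return JSON with keys: sql, description, output_type.\n"
        "Question: " + question
    )


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned structured response."""
    return json.dumps({
        "sql": "SELECT sensor_id, AVG(value) FROM readings "
               "WHERE pollutant = 'PM2.5' GROUP BY sensor_id ORDER BY sensor_id",
        "description": "Average PM2.5 per sensor",
        "output_type": "table",
    })


def check_response(raw: str) -> dict:
    """Parse the model output and accept only a single read-only SELECT."""
    response = json.loads(raw)
    sql = response["sql"].strip().rstrip(";")
    if not sql.lower().startswith("select") or ";" in sql:
        raise ValueError("only a single SELECT statement is allowed")
    if response["output_type"] not in ALLOWED_OUTPUT_TYPES:
        raise ValueError("unexpected output type")
    response["sql"] = sql
    return response


def answer(conn: sqlite3.Connection, question: str):
    """Prompt the model, check its response, then run the query."""
    response = check_response(fake_llm(build_prompt(question)))
    rows = conn.execute(response["sql"]).fetchall()
    return response, rows


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with made-up readings."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?, ?)",
        [
            ("s1", "PM2.5", 10.0, "2024-01-01"),
            ("s1", "PM2.5", 20.0, "2024-01-02"),
            ("s2", "PM2.5", 30.0, "2024-01-01"),
            ("s1", "PM10", 99.0, "2024-01-01"),
        ],
    )
    return conn
```

The check step here only blocks non-SELECT and multi-statement queries. It does not confirm that the query answers the question that was asked, which is a separate problem.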

The system was built around a modular architecture involving a user interface, a web application, an LLM service and a PostgreSQL database. The database used spatial extensions to support location-based queries, while the front end used mapping and charting tools to display outputs. The researchers also incorporated earlier analytical dashboard functions, including sensor forecasting and network health monitoring. The natural language layer is therefore designed to sit alongside existing smart-city analytics, not replace them.

A key technical task in the study was text-to-SQL generation, the process of turning a natural language question into a valid SQL query that can retrieve the correct data from a relational database. The challenge is not just syntax. A query may run successfully but still answer the wrong question if it uses the wrong pollutant, time window, sensor group, aggregation method or location filter.

To test model performance, the researchers created a controlled benchmark of 100 natural language questions across operational, reporting, exploratory and trend-analysis categories. The questions covered major air pollutants including PM2.5, PM10, carbon monoxide, ozone and nitrogen dioxide. Each question type included alternative phrasings to test whether the models could handle variation in ordinary language. The evaluation measured whether the generated SQL both executed successfully and matched the intended meaning of the question. This is crucial because a query can look technically valid while still producing misleading results.
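The difference between a query that runs and a query that is right can be illustrated with a small scoring sketch. The article does not describe how the study judged whether a query matched the intended meaning, so comparing results against a hand-written reference query is an assumption here, as are the table, the data and the function names.

```python
import sqlite3


def run(conn, sql):
    """Return the query's rows in a stable order, or None if it fails to execute."""
    try:
        return sorted(conn.execute(sql).fetchall(), key=repr)
    except sqlite3.Error:
        return None


def score(conn, cases):
    """cases is a list of (generated_sql, reference_sql) pairs.

    Returns (share that executed, share whose rows matched the reference).
    """
    executed = correct = 0
    for generated, reference in cases:
        got = run(conn, generated)
        if got is None:
            continue  # failed to execute, so it cannot be correct either
        executed += 1
        if got == run(conn, reference):
            correct += 1
    return executed / len(cases), correct / len(cases)


def make_demo_db():
    """Tiny in-memory table with made-up readings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (pollutant TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [("PM2.5", 10.0), ("PM2.5", 20.0), ("PM10", 99.0)],
    )
    return conn


REFERENCE = "SELECT AVG(value) FROM readings WHERE pollutant = 'PM2.5'"
CASES = [
    (REFERENCE, REFERENCE),                                              # right answer
    ("SELECT AVG(value) FROM readings WHERE pollutant = 'PM10'", REFERENCE),  # runs, wrong pollutant
    ("SELEC AVG(value) FROM readings", REFERENCE),                       # syntax error
]
```

With these three cases, two queries execute but only one returns the reference result, which is the gap the benchmark was designed to expose.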

Proprietary GPT models led the benchmark, but trade-offs remain

The study compared proprietary GPT-based models with several open-source models, including LLaMA-based systems, Qwen and SQL-oriented models. The results showed a large performance gap between the evaluated model groups. Proprietary GPT-based models achieved the strongest observed accuracy and consistency in this specific text-to-SQL task. Open-source models performed less reliably and showed greater sensitivity to changes in question type and phrasing. After the benchmark, the highest-performing proprietary GPT model was selected for integration into the dashboard.

Why does this finding matter for city data systems? A natural language dashboard used by residents or public agencies must handle varied, informal and sometimes ambiguous questions. If a model fails when wording changes slightly, the system may not be reliable enough for real-world use.

Stronger performance also comes with governance concerns. Proprietary models are typically accessed through commercial APIs. Their training data, architecture and optimisation processes are not fully visible to users or public agencies, creating a tension between performance and transparency.

The researchers warn that a city system built around a commercial model could face vendor dependency. If pricing, access rules or model behaviour changes, public-sector users may find it costly or difficult to switch providers. Open-source models offer greater control and transparency, but the study found that the evaluated open models did not match the leading proprietary models for this domain-specific task. The findings thus point to a policy and procurement dilemma. Cities may be drawn to proprietary systems because they work better now, but relying on them for public data infrastructure could create long-term risks. The study suggests that abstraction frameworks could make it easier to switch models, but real-world replacement would still require testing and adjustment.
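One way to picture the abstraction idea is a thin interface that the dashboard calls instead of a specific vendor's client. The sketch below is hypothetical: the `TextToSQLModel` interface and both stub classes are invented for illustration and do not correspond to any framework named in the study.

```python
from typing import Protocol


class TextToSQLModel(Protocol):
    """Anything that can turn a question plus schema into SQL."""

    def generate_sql(self, question: str, schema: str) -> str: ...


class StubHostedModel:
    """Placeholder for a model reached through a commercial API."""

    def generate_sql(self, question: str, schema: str) -> str:
        return "SELECT 1  -- hosted model stub"


class StubLocalModel:
    """Placeholder for a self-hosted open model."""

    def generate_sql(self, question: str, schema: str) -> str:
        return "SELECT 1  -- local model stub"


def generate_query(model: TextToSQLModel, question: str, schema: str) -> str:
    """The dashboard depends only on the interface, not on a provider."""
    return model.generate_sql(question, schema)
```

Swapping `StubHostedModel()` for `StubLocalModel()` changes one argument, but as the study notes, a real replacement would still need its prompts and outputs re-tested.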

The research also highlights the importance of prompt design and schema context. Models performed the task after receiving structured instructions, database definitions and output constraints. This suggests that system design, not just model choice, shaped performance. The dashboard's reliability depends on careful prompting, validation and output handling.

Implications and limitations for AI-enabled sustainability data

LLM-enabled natural language interfaces could make urban environmental data more accessible, but only if they are built with transparency and validation at the centre. The technology can lower technical barriers, yet it also shifts some analytical responsibility to an AI system that users may not understand or be able to audit.

One risk is misplaced trust. A user may receive a polished visualisation or numerical result without knowing whether the generated query included all relevant sensors, applied the correct time range or used the right aggregation. In public-facing systems, that risk could affect journalism, planning, environmental monitoring and local policy choices.

The authors identify several safeguards. Systems could show the generated SQL for technical users while also providing a plain-language explanation for non-specialists. The explanation should make clear which data sources were used, what filters were applied and how the result was calculated. The system could also flag ambiguous questions and ask users to clarify before producing an answer.

Additional validation layers may also be needed. Rule-based checks could detect missing filters, unusual restrictions or inconsistent aggregation logic. Confidence signals or warnings could help users know when results need further review. These safeguards would be especially important when outputs are used for official decision support or public communication.
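A rule-based check of this kind could look like the sketch below. The rules, the keyword lists and the `ts` time column are assumptions chosen for illustration; the study does not specify its checks, and simple keyword matching like this would miss many real cases.

```python
import re

# Hypothetical vocabulary; assumes pollutant labels in the SQL match these terms.
POLLUTANT_TERMS = ("pm2.5", "pm10", "ozone", "carbon monoxide", "nitrogen dioxide")
TIME_WORDS = ("today", "yesterday", "week", "month", "year")


def review_query(question: str, sql: str, time_column: str = "ts") -> list:
    """Return plain-language warnings about a generated query."""
    q, s = question.lower(), sql.lower()
    warnings = []

    # Rule 1: a pollutant named in the question should appear in the query.
    for term in POLLUTANT_TERMS:
        if term in q and term not in s:
            warnings.append(f"question mentions {term} but the query does not")

    # Rule 2: a time period in the question should produce a time filter.
    parts = re.split(r"\bwhere\b", s, maxsplit=1)
    where_clause = parts[1] if len(parts) > 1 else ""
    has_time_filter = re.search(rf"\b{re.escape(time_column)}\b", where_clause)
    if any(word in q for word in TIME_WORDS) and not has_time_filter:
        warnings.append("question mentions a time period but the query has no time filter")

    # Rule 3: a request for an average should use an averaging function.
    if "average" in q and "avg(" not in s:
        warnings.append("question asks for an average but the query does not compute one")

    return warnings
```

For example, `review_query("What was the average PM2.5 last week?", "SELECT value FROM readings WHERE pollutant = 'PM10'")` returns three warnings, one per rule, while a query that filters on PM2.5, restricts `ts` and uses `AVG(` returns none.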

Data quality is another limitation. Even if the AI generates the correct query, results can still be wrong or misleading if sensors are noisy, missing data, delayed, poorly calibrated or unevenly distributed across the city. The researchers note that natural language access must be paired with sensor validation and data quality monitoring if it is to support operational sustainability workflows.

The study also has methodological limits. The benchmark used synthetic questions created for controlled testing rather than real user queries. That allowed consistent comparison across models, but real users may ask messier, more ambiguous or more locally specific questions. The authors say future work should include live deployments, user studies and naturalistic query collection to test usability, trust and accessibility across different stakeholder groups.

The research did not conduct formal user testing. Therefore, it establishes technical feasibility rather than proving that the system improves decision-making in practice. Future evaluations will need to examine whether residents, officials, journalists and community groups can use such systems effectively and whether they understand the limits of AI-generated answers.

First published in: Devdiscourse
