AI literacy key to unlocking cost savings in smart home energy management
A new study finds that AI literacy, rather than technical knowledge of home energy systems, determines whether households can effectively use large language models to reduce electricity bills.
The study, titled "Human-AI Collaboration in Large Language Model-Integrated Building Energy Management Systems: The Role of User Domain Knowledge and AI Literacy," analyzed how 85 participants interacted with a GPT-driven building energy management system in a structured behavioral experiment.
Equal interaction, unequal outcomes
The research team designed a role-playing experiment that simulated a real-world scenario. Participants were asked to assume the role of a homeowner in Austin, Texas, with access to detailed appliance-level energy consumption data collected at 15-minute intervals during peak summer months. The home operated under a time-of-use rate structure, meaning electricity prices varied by time of day.
Participants were tasked with identifying five behavioral changes that would reduce energy bills without compromising comfort or daily routines. They interacted with OpenAI’s GPT-4o model, which functioned as a prototype LLM-integrated building energy management system. The model had access to uploaded datasets and was expected to analyze appliance use patterns, identify high-consumption devices, and propose cost-saving strategies.
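The experimental setup hinges on time-of-use pricing: the same kilowatt-hour costs more during peak hours than off-peak. A minimal sketch of how a bill can be computed from 15-minute interval readings under such a tariff is below; the rates, peak window, and readings are illustrative assumptions, not values from the study.

```python
# Hedged sketch: computing cost from 15-minute appliance-level readings
# under a two-tier time-of-use (TOU) tariff. Rates and hours are
# illustrative assumptions only.
PEAK_RATE = 0.28      # $/kWh, assumed peak window 2pm-8pm
OFF_PEAK_RATE = 0.09  # $/kWh otherwise

def interval_cost(kwh: float, hour: int) -> float:
    """Cost of one 15-minute reading under the assumed TOU schedule."""
    rate = PEAK_RATE if 14 <= hour < 20 else OFF_PEAK_RATE
    return kwh * rate

# readings: (hour of day, kWh consumed in that 15-minute interval)
readings = [(13, 0.5), (15, 1.2), (18, 0.9), (22, 0.4)]
total = sum(interval_cost(kwh, hour) for hour, kwh in readings)
print(round(total, 4))  # 0.669
```

Shifting the two peak-hour readings to off-peak times would cut their cost by roughly two-thirds under these assumed rates, which is the kind of behavioral change participants were asked to find.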
Before the experiment, participants completed a screening survey that assessed two variables: domain knowledge in building energy and AI literacy. Domain knowledge captured self-reported understanding of home energy use, billing structures, appliances, and systems such as HVAC. AI literacy measured familiarity with generative AI tools, frequency of use, prompt engineering experience, and perceived expertise.
Participants were then divided into four groups: low domain knowledge and low AI literacy; low domain knowledge and high AI literacy; high domain knowledge and low AI literacy; and high domain knowledge and high AI literacy. The goal was to determine which combination produced the most effective human-AI collaboration.
Contrary to expectations, interaction patterns across all four groups were strikingly similar. Metrics such as total conversation turns, average prompt length, and the ratio between user prompts and GPT responses did not significantly differ. Most participants relied on short, direct prompts and allowed GPT to generate long, structured answers.
This uniformity extended to reasoning patterns. Whether participants were energy novices or experts, and whether they were AI beginners or frequent users, they displayed similar levels of information seeking, constraint articulation, solution evaluation, and commitment expression during the dialogue.
The finding challenges a common assumption in human-AI research: that subject matter experts will naturally engage in deeper, more critical exchanges with AI systems. In this experiment, domain knowledge alone did not translate into longer conversations, more complex prompts, or higher levels of analytical scrutiny.
The results suggest that LLMs may act as cognitive equalizers. When given access to structured data and a capable AI assistant, users across expertise levels tend to follow similar interaction paths. However, equal interaction did not mean equal performance.
AI literacy emerges as the critical factor
While 19 of the 20 measured metrics showed no statistically significant differences between groups, one stood out: appliance identification rate. This metric assessed whether participants correctly identified the key appliances with the highest energy-saving potential, based on expert-defined reference solutions.
The only statistically significant difference across groups was in this outcome measure, and it was driven by AI literacy, not domain knowledge.
Participants with higher AI literacy were more accurate in identifying the appliances that mattered most for cost reduction. These users were better able to interpret GPT’s output, discern which suggestions aligned with the data, and translate AI recommendations into correct strategic decisions.
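The appliance identification rate can be pictured as a simple overlap score between a participant's answer and the expert reference set. The sketch below is an assumption about how such a metric might be computed; the appliance names and the paper's exact scoring rule may differ.

```python
# Hedged sketch of an "appliance identification rate": the fraction of
# expert-identified high-impact appliances that a participant also named.
# Appliance names and scoring rule are illustrative assumptions.
def identification_rate(participant: set[str], expert: set[str]) -> float:
    """Share of the expert reference appliances the participant identified."""
    if not expert:
        return 0.0
    return len(participant & expert) / len(expert)

expert_reference = {"hvac", "pool pump", "water heater", "dryer"}
participant_answer = {"hvac", "pool pump", "refrigerator"}
print(identification_rate(participant_answer, expert_reference))  # 0.5
```

Note that extra, off-reference answers (here, the refrigerator) do not raise the score; only hits against the expert set count under this recall-style formulation.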
Importantly, this advantage did not stem from writing longer prompts or engaging in more turns of dialogue. High AI literacy participants did not interact more frequently or more verbosely. Instead, their edge appeared in how they processed the AI’s responses.
The study suggests that AI literacy operates primarily at the interpretation stage. Users who understand how generative models work are better equipped to critically evaluate output, recognize when suggestions are incomplete or misaligned, and adjust their reasoning accordingly.
On the other hand, participants with strong domain knowledge but lower AI literacy did not consistently outperform others. In some cases, they appeared to treat GPT as a confirmation tool rather than an active collaborator, accepting its suggestions without deeper interrogation.
The findings complicate the narrative that energy expertise alone is sufficient for effective AI collaboration. In the context of LLM-integrated building systems, knowing how to think with AI may matter more than knowing the subject matter itself.
Passive prompting and the risk of overreliance
Another key finding concerns user behavior. Across all groups, the median prompt length was brief, while GPT responses were often considerably longer. The imbalance in dialogue suggests that participants treated the model as a problem-solving engine rather than a co-creative partner.
This passive prompting style reflects a broader behavioral pattern. Many users posed a single directive question and accepted the first plausible set of recommendations. Few engaged in iterative refinement, challenged assumptions, or explored alternative strategies unless prompted by specific concerns.
The structure of GPT’s responses may have contributed to this dynamic. Comprehensive, multi-point answers can create cognitive overload, reducing users’ inclination to probe further. In such cases, verbose output may inadvertently suppress user agency.
The researchers argue that effective LLM-based energy systems should incorporate design features that promote active engagement. Adaptive output modes, modular recommendations, and guided follow-up prompts could help users move from passive consumption of AI advice to critical co-construction of solutions.
Measuring conversation quality with SCALE
To analyze human-AI interaction in detail, the researchers developed a structured evaluation framework called GPT-based Structured Context-Aware Language Evaluation, or SCALE.
Rather than relying on keyword matching, the framework assessed conversations across four weighted dimensions: explicitness, depth, consideration, and evidence. These criteria captured whether participants used appropriate domain terminology, demonstrated contextual reasoning, integrated concepts into decision-making, and supported claims with concrete details.
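A weighted multi-dimensional rubric like SCALE reduces, at its core, to a weighted average over per-dimension ratings. The sketch below assumes equal weights and a 0-1 rating scale for illustration; the paper's actual weights and rubric may differ.

```python
# Hedged sketch of SCALE-style scoring: a conversation turn is rated on
# four dimensions and the ratings are combined with weights. The equal
# weights and 0-1 scale are illustrative assumptions.
WEIGHTS = {"explicitness": 0.25, "depth": 0.25,
           "consideration": 0.25, "evidence": 0.25}

def scale_score(ratings: dict[str, float]) -> float:
    """Weighted average of the four dimension ratings (each in [0, 1])."""
    return sum(WEIGHTS[dim] * ratings[dim] for dim in WEIGHTS)

# Example: a turn that names the right appliance explicitly but offers
# little supporting evidence.
turn = {"explicitness": 0.8, "depth": 0.4,
        "consideration": 0.6, "evidence": 0.2}
print(scale_score(turn))  # 0.5
```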
SCALE was applied to both user prompts and GPT responses. The analysis revealed a consistent pattern: GPT’s responses scored high on energy consumption analysis, cost awareness, and technical knowledge, regardless of the participant group. In contrast, user contributions in technical reasoning and appliance-level analysis were often minimal.
This gap indicates that GPT compensated for areas where participants lacked technical input. The model’s baseline analytical performance was strong enough to identify obvious high-impact appliances, such as HVAC systems and pool pumps, in most cases.
However, GPT’s surface-level reasoning sometimes failed to uncover less obvious but still valuable strategies. In scenarios where participants actively questioned and redirected the model, outcomes improved. In cases where they did not, alignment with expert solutions dropped.
The research underscores that LLM performance alone does not guarantee optimal collaboration. The quality of outcomes depends on how users steer, evaluate, and refine AI outputs.
Educational gains and policy implications
The study also examined participants' perceptions. Post-experiment surveys showed that most participants found the GPT-assisted system easy to understand and educational. Self-reported understanding of home energy use and billing improved after the session, even though the interaction lasted only a single sitting.
A majority indicated they would consider using an AI-based system to manage their home energy patterns. Many also expressed a desire to learn techniques that could enhance GPT responses, suggesting awareness of a gap between basic usage and effective prompting.
These findings highlight the dual role of LLM-integrated building energy management systems as both decision support tools and learning platforms. By exposing users to appliance-level data and cost structures, such systems may enhance energy literacy alongside bill savings.
Utilities that collect granular consumption data could deploy LLM-integrated systems to help customers interpret usage patterns and adapt behavior under time-of-use pricing schemes. Policymakers could use AI-mediated interactions to assess how households respond to energy interventions and where knowledge gaps persist.
The study also acknowledges limitations. Participants were not actual residents of the test home, and the experiment was limited to a single session. Domain knowledge and AI literacy were self-reported, and the model used was a general-purpose LLM without domain-specific fine-tuning.
Future research may examine real homeowners over extended periods, integrate retrieval-augmented generation or agent-based frameworks, and explore demographic effects on interaction patterns.
FIRST PUBLISHED IN: Devdiscourse

