Generative AI is transforming web search, but transparency may be the next casualty

CO-EDP, VisionRI | Updated: 05-11-2025 10:03 IST | Created: 05-11-2025 10:03 IST

A new study provides the first systematic characterization of how web search is transforming in the age of generative AI. The research, titled “Characterizing Web Search in the Age of Generative AI” and published on arXiv, compares traditional ranking-based search engines like Google with generative search systems developed by OpenAI and Google to understand how these new models retrieve, synthesize, and present information differently.

The findings raise fundamental questions about accuracy, transparency, and diversity in the next generation of information retrieval, warning that the shift from “ranked lists” to “generated answers” is not a simple evolution; it is a paradigm shift that will redefine how humans access and trust online information.

Generative search changes what users see and how they see it

Traditional web search relies on ranked retrieval, displaying an ordered list of web pages for users to evaluate. Generative search, by contrast, produces a cohesive, conversational answer synthesized from multiple sources using large language models (LLMs).

The authors examined four generative engines from Google and OpenAI, comparing them to Google’s standard search interface across four different query domains. They assessed three dimensions (source coverage, knowledge provenance, and conceptual diversity) to uncover how generative engines reconstruct information.

Unlike conventional search results, where users can identify sources directly from snippets and links, generative engines blend retrieved content with model-internal knowledge. The output is not a list of references but a single, unified narrative. While this improves readability and reduces cognitive effort, it also obscures transparency, making it harder for users to verify where specific facts originate.

The researchers found that generative systems tend to broaden topical coverage, pulling from a wider set of sources than users typically explore manually. However, this comes at the cost of source visibility and content provenance, key foundations of search reliability.

A new kind of knowledge mix: Model memory meets the open web

Perhaps the most striking discovery is how LLM-powered search engines balance retrieved web knowledge against internal model knowledge. Traditional engines simply rank documents, but generative models merge live search data with knowledge embedded in the model during pre-training.

This blend can lead to novel but unverifiable synthesis. For instance, generative search may surface insights or interpretations not explicitly present in any single source, raising both promise and peril: it can generate comprehensive context, but also hallucinate or distort information when grounding fails.
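
To make this blend concrete, here is a minimal, hypothetical sketch of how a generative engine could interleave retrieved snippets with a model prompt. Every name below (Snippet, retrieve, build_grounded_prompt) is an illustrative assumption, not the study's method or any real engine's API.

```python
# Hypothetical sketch of retrieval blended with model memory.
# All names are illustrative assumptions, not a real engine's API.

from dataclasses import dataclass

@dataclass
class Snippet:
    url: str
    text: str

def retrieve(query: str) -> list[Snippet]:
    """Stand-in for live web retrieval; a real engine would query an index."""
    return [
        Snippet("https://example.org/a", "Source A states fact X."),
        Snippet("https://example.org/b", "Source B states fact Y."),
    ]

def build_grounded_prompt(query: str, snippets: list[Snippet]) -> str:
    """Interleave retrieved evidence with the question. Whatever the model
    adds beyond this evidence is drawn from its pre-trained memory; that is
    the blend the study says remains invisible to users."""
    evidence = "\n".join(f"[{i + 1}] {s.url}: {s.text}" for i, s in enumerate(snippets))
    return f"Evidence:\n{evidence}\n\nQuestion: {query}\nAnswer, citing [n] where possible:"

print(build_grounded_prompt("What is X?", retrieve("What is X?")))
```

In a real pipeline, this prompt would go to an LLM; any claim in the resulting answer that is not supported by the numbered evidence comes from the model's pre-trained memory, which is exactly the provenance gap the study highlights.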

The study stresses that evaluation metrics used for ranked search, such as precision, recall, or click-through rates, are no longer sufficient. Generative search needs new benchmarks that account for concept diversity, semantic coherence, and factual attribution. Without this shift, researchers argue, there’s a risk of overestimating the performance of AI-powered search systems.

The authors also note that concept diversity, the range of distinct ideas or themes present in responses, is often higher in generative results than in ranked lists. This means generative engines can capture more nuanced or interdisciplinary perspectives, but their underlying selection process remains opaque.
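
As a rough illustration of what a concept-diversity measure could look like, the toy function below approximates “concepts” as unique content words and scores a text by its distinct-concept ratio. The study's actual metric is not specified here, so treat this purely as an assumption.

```python
# Toy concept-diversity score: distinct "concepts" (crudely approximated as
# unique content words) divided by total content words. Purely illustrative;
# not the metric used in the paper.

import re

STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "are", "it", "that"}

def concept_diversity(text: str) -> float:
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return len(set(words)) / len(words) if words else 0.0

ranked_snippet = "The phone battery is good. The battery of the phone lasts."
generated = "Battery life is strong, though charging speed, durability, and price vary."
print(f"{concept_diversity(ranked_snippet):.2f} vs {concept_diversity(generated):.2f}")
```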

Rethinking search evaluation and trust in the AI era

The research outlines a roadmap for rethinking web search evaluation in the generative age. Traditional search quality metrics were designed for user behavior shaped by clickable lists, not AI-synthesized summaries. When users no longer choose between sources but instead read a single narrative, trust must be earned through transparency, not position.

The study recommends the development of hybrid evaluation frameworks that measure the following (a toy sketch appears after the list):

  • Coverage: How comprehensively the generated response reflects the diversity of web content.
  • Provenance: How clearly the system attributes its sources.
  • Knowledge balance: How much of the output relies on internal model memory versus live retrieval.
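
To make the list concrete, here is a hedged sketch of the three measures as simple ratios. The definitions below are illustrative assumptions for demonstration, not the study's own formulas.

```python
# Illustrative toy versions of the three proposed measures. These definitions
# are assumptions for demonstration, not the study's formulas.

def coverage(answer_concepts: set, web_concepts: set) -> float:
    """Share of the concepts present across retrieved web content that the
    generated response actually reflects."""
    return len(answer_concepts & web_concepts) / len(web_concepts) if web_concepts else 0.0

def provenance(cited_claims: int, total_claims: int) -> float:
    """Fraction of claims in the answer carrying an explicit source attribution."""
    return cited_claims / total_claims if total_claims else 0.0

def knowledge_balance(retrieval_grounded: int, total_claims: int) -> float:
    """Share of claims traceable to live retrieval; the remainder is presumed
    to come from the model's pre-trained memory."""
    return retrieval_grounded / total_claims if total_claims else 0.0

# Example: an answer with 8 claims, 6 of them cited, 5 traceable to retrieved pages.
print(coverage({"price", "battery"}, {"price", "battery", "camera", "weight"}),
      provenance(6, 8), knowledge_balance(5, 8))
```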

These frameworks would help regulators, developers, and information scientists assess generative engines more rigorously. The authors stress that future search systems must make source reliance explicit, revealing whether an answer stems from retrieved data, pre-trained memory, or both.

Ethical implications also loom large. Since generative answers can reshape user understanding, the study warns that bias and misinformation risks increase when the origins of knowledge are hidden. 

First published in: Devdiscourse