AI in Finance: LLMs disrupt financial forecasting with unmatched accuracy and speed

CO-EDP, VisionRI | Updated: 07-07-2025 09:34 IST | Created: 07-07-2025 09:34 IST

A new study published on arXiv offers a sweeping overview of how Large Language Models (LLMs) are transforming the financial landscape. It paints a compelling picture of an AI-driven financial future, where LLMs don’t just support decision-making but become integral to market analysis, portfolio strategy, and risk management.

Titled “Integrating Large Language Models in Financial Investments and Market Analysis: A Survey”, the paper categorizes current innovations into four dominant frameworks: LLM-based frameworks and pipelines, hybrid integration methods, fine-tuning and adaptation strategies, and agent-based architectures.

With a structured analysis of dozens of recent models and strategies, the paper underscores a decisive shift from traditional methods to AI-augmented decision-making in portfolio management, risk assessment, and financial forecasting.

How Are LLM-Based Frameworks and Pipelines Reshaping Traditional Financial Modeling?

LLM-based frameworks enable structured financial analysis by integrating diverse data sources into multi-stage decision pipelines. One of the most notable cases is MarketSenseAI, which employs GPT-4 to synthesize real-time financial news, fundamental data, and macroeconomic indicators into actionable investment signals. According to the survey, it achieved a 72% cumulative return over 15 months on S&P 100 stocks, significantly outperforming market baselines.
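To compare a figure like the 72% cumulative return over 15 months with per-year benchmarks, it helps to annualize it by geometric compounding. The snippet below is a minimal sketch: the function name is illustrative, and only the 72% and 15-month inputs come from the article.

```python
def annualized_return(cumulative_return: float, months: float) -> float:
    """Convert a cumulative return earned over `months` into a compound annual rate.

    Uses geometric compounding: (1 + R) ** (12 / months) - 1.
    """
    if months <= 0:
        raise ValueError("months must be positive")
    growth = 1.0 + cumulative_return
    if growth <= 0:
        raise ValueError("cumulative return must be greater than -100%")
    return growth ** (12.0 / months) - 1.0


if __name__ == "__main__":
    # Inputs taken from the article: 72% over 15 months.
    rate = annualized_return(0.72, 15)
    print(f"Annualized return: {rate:.1%}")
```

Under these assumptions the 72% figure works out to roughly 54% per year. That is a compounding identity, not an additional result reported by the survey.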

Another framework, Ploutos, merges textual and numerical data using interpretability mechanisms like rearview-mirror prompting and token weighting. It effectively rationalizes stock movements, a capability that bridges the critical gap between black-box AI outputs and analyst accountability.

ChainBuddy further democratizes AI integration by enabling non-experts to build customizable LLM workflows using natural language. GPT-InvestAR, by contrast, uses LLMs to distill insights from SEC filings and combines them with machine learning models to predict stock performance. This hybrid approach is reported to have outperformed the S&P 500 on returns.

Other frameworks such as LLMoE and FinLlama deploy mixture-of-experts architectures to optimize expert selection based on dynamic market inputs, integrating textual and quantitative data for enhanced predictive power. These frameworks illustrate how modular LLM systems are being tailored to task-specific financial objectives, with the aim of greater reliability and precision in investment analysis.
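The general idea behind mixture-of-experts routing can be sketched in a few lines: a gate scores each expert from the current market features, a softmax turns the scores into weights, and the final prediction is the weighted sum of expert outputs. This is a toy illustration of the generic technique, not the implementation used by LLMoE or any other framework named above; the expert functions, feature layout, and gate weights are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[Sequence[float]], float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of gate scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_predict(
    features: Sequence[float],
    experts: Sequence[Expert],
    gate_weights: Sequence[Sequence[float]],
) -> Tuple[float, List[float]]:
    """Return the gate-weighted prediction and the gate probabilities."""
    if len(experts) != len(gate_weights):
        raise ValueError("need one gate weight vector per expert")
    # Gate score for each expert: dot product of its weight vector with the features.
    scores = [sum(w * x for w, x in zip(weights, features)) for weights in gate_weights]
    gate = softmax(scores)
    prediction = sum(g * expert(features) for g, expert in zip(gate, experts))
    return prediction, gate


# Hypothetical toy experts that return fixed expected returns.
def momentum_expert(features: Sequence[float]) -> float:
    return 0.02


def defensive_expert(features: Sequence[float]) -> float:
    return -0.01


if __name__ == "__main__":
    # Hypothetical features: [news sentiment, price momentum].
    features = [0.5, 0.5]
    gate_weights = [[1.0, 1.0], [-1.0, -1.0]]  # illustrative, not learned
    prediction, gate = mixture_predict(
        features, [momentum_expert, defensive_expert], gate_weights
    )
    print(f"gate={gate}, prediction={prediction:.4f}")
```

In a real system the gate weights would be learned and the experts would be trained models. Here positive sentiment and momentum simply shift weight toward the momentum expert.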

Which Hybrid and Fine-Tuned Approaches Improve Financial Prediction Accuracy?

The paper details an expanding body of work combining traditional financial indicators with LLM-powered sentiment analysis, feature engineering, and signal extraction. One such example is the ChatGPT-based Investment Portfolio Selection method, where stocks were selected based on LLM-generated suggestions and then integrated into quantitatively optimized portfolios. These portfolios achieved superior Sharpe ratios and demonstrated resilience during market turbulence.
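For readers unfamiliar with the metric, the Sharpe ratio measures mean excess return per unit of return volatility, usually annualized. The following is a minimal sketch using only the Python standard library; the function name, the default of 252 trading periods per year, and the sample returns are assumptions for illustration, not data from the survey.

```python
import math
import statistics
from typing import Sequence


def sharpe_ratio(
    returns: Sequence[float],
    risk_free_rate: float = 0.0,
    periods_per_year: int = 252,
) -> float:
    """Annualized Sharpe ratio from per-period returns.

    `risk_free_rate` is an annual rate; it is spread evenly across periods.
    Requires at least two observations (sample standard deviation).
    """
    per_period_rf = risk_free_rate / periods_per_year
    excess = [r - per_period_rf for r in returns]
    volatility = statistics.stdev(excess)
    if volatility == 0:
        raise ValueError("returns have zero volatility; Sharpe ratio is undefined")
    return statistics.mean(excess) / volatility * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Hypothetical daily returns, for illustration only.
    daily_returns = [0.004, -0.002, 0.006, 0.001, -0.003, 0.005]
    print(f"Sharpe ratio: {sharpe_ratio(daily_returns):.2f}")
```

A higher value means more return for each unit of risk taken, which is why the metric is a common yardstick when comparing an LLM-assisted portfolio against a baseline.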

Other hybrid systems, such as MuSA, leverage multimodal sentiment analysis with reinforcement learning and LLMs to fine-tune portfolio weights based on real-time market changes. SEP (Summarize-Explain-Predict) uses self-reflective learning to offer explainable predictions by refining stock movement forecasts through human-like reasoning without requiring labeled data.

Fine-tuning efforts, such as those in FinLlama and LLaMA-2-based models, adapt pre-trained LLMs to financial domains using curated datasets. These models outperform general-purpose LLMs in sentiment classification and return prediction, especially when trained using domain-specific corpora like MD&A filings, analyst reports, and climate sentiment data.

The Stock-Chain framework exemplifies the power of retrieval-augmented generation (RAG) in combination with LLMs. Using a custom dataset (AlphaFin), it outperformed classical and domain-specific financial models with an annualized return of over 30%.

In the area of earnings-based forecasting, GPT-3.5 and LLaMA models fine-tuned using earnings call transcripts and analyst ratings have demonstrated strong performance in short-term return predictions. Emotional tone extraction from financial headlines using distilled RoBERTa models has also emerged as a novel predictor of price trends.

How Are Multi-Agent LLM Architectures Driving Autonomous Financial Decision-Making?

Perhaps the most transformative aspect of the survey is its deep dive into agent-based architectures. These systems simulate real-world trading environments using multiple specialized AI agents working in collaboration. The Optimized AI-Agent Collaboration framework employs agents with distinct responsibilities, such as sentiment analysis and fundamental data extraction, coordinated through either horizontal or vertical structures. This design improves performance in risk assessment and investment decision-making.

Another system, Alpha-GPT 2.0, incorporates human-in-the-loop design to allow analysts to refine model outputs interactively, thus aligning AI decisions more closely with expert judgment. The FS-ReasoningAgent takes a novel approach by separating factual and subjective reasoning through multiple LLM agents, adapting the weight of each input stream based on market conditions. This approach demonstrated superior performance in both bull and bear markets.
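The idea of reweighting factual and subjective input streams by market regime can be illustrated with a simple convex blend. This is a sketch of the general concept only, not FS-ReasoningAgent's actual mechanism: the regime labels, the weight table, and the score scale are all hypothetical.

```python
# Hypothetical weight placed on the factual stream in each market regime.
# The subjective stream receives the remainder (1 - weight).
FACTUAL_WEIGHT_BY_REGIME = {"bull": 0.4, "bear": 0.7}


def blend_signals(factual_score: float, subjective_score: float, regime: str) -> float:
    """Blend two signal scores (assumed to lie in [-1, 1]) using a regime-dependent weight."""
    if regime not in FACTUAL_WEIGHT_BY_REGIME:
        raise ValueError(f"unknown regime: {regime!r}")
    factual_weight = FACTUAL_WEIGHT_BY_REGIME[regime]
    return factual_weight * factual_score + (1.0 - factual_weight) * subjective_score


if __name__ == "__main__":
    # The same pair of scores yields a different blended signal in each regime.
    for regime in ("bull", "bear"):
        print(regime, round(blend_signals(0.8, -0.2, regime), 3))
```

In practice the weights would come from the agents' own reasoning or from validation data rather than a fixed table.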

FINCON uses a hierarchical agent-based design that combines Conceptual Verbal Reinforcement with dual-layer risk controls for portfolio management. Tested on stocks such as Tesla, Apple, and Amazon, it consistently outperformed traditional ETFs and deep reinforcement learning (DRL) baselines.

Simulation-based systems like StockAgent and TwinMarket replicate investor behaviors to test how AI agents respond to changing economic and sentiment conditions. TwinMarket, for example, models behavioral biases such as herding and overconfidence using BDI-based agents powered by GPT-4o. These simulations accurately reproduce macro-level financial phenomena such as boom-bust cycles and speculative bubbles.

Finally, the Agent Trading Arena evaluates LLMs’ numerical reasoning through visual data and reflection mechanisms. By combining chart-based insights with iterative performance analysis, this framework showcases how LLM agents can learn and evolve trading strategies in a dynamic market environment.

First published in: Devdiscourse