Are LLMs good in-context learners for financial sentiment analysis?

Financial sentiment analysis (FSA) is critical for investors, analysts, and policymakers, offering insights into market trends through news reports, financial statements, and social media. Traditionally, FSA relied on machine learning models trained on labeled datasets, which require domain-specific expertise to build. With the rise of large language models (LLMs), a new approach - in-context learning - is gaining traction as an alternative to fine-tuning. This method lets a model classify financial sentiment from examples supplied directly in the prompt, without retraining, making market assessments more adaptable and efficient.
A recent study, "Are Large Language Models Good In-Context Learners for Financial Sentiment Analysis?" by Xinyu Wei and Luojia Liu, published as a conference paper at the ICLR 2025 Workshop on Advances in Financial AI, investigates whether modern LLMs can accurately perform financial sentiment analysis through in-context learning rather than traditional fine-tuning. The study evaluates multiple state-of-the-art LLMs, including GPT-4, Claude 3.5, DeepSeek V3, and Llama-3.1, analyzing their effectiveness in classifying financial sentiment without requiring model retraining.
Challenges in financial sentiment analysis and the role of LLMs
Financial sentiment analysis presents unique challenges compared to generic sentiment analysis. Financial markets operate within a complex, jargon-heavy, and often subjective environment, making it difficult for generic language models to accurately assess sentiment. Words like “bull” and “bear”, neutral in everyday language, carry strongly positive and negative connotations, respectively, in financial contexts. Additionally, financial documents often feature ambiguous phrasing, where sentiment is not explicitly stated but must be inferred from subtle cues.
Traditional sentiment analysis approaches in finance have relied on pretrained financial models like FinBERT, which require fine-tuning on labeled datasets. However, acquiring sufficient high-quality labeled financial sentiment data is challenging due to domain-specific knowledge requirements. Moreover, fine-tuning LLMs with hundreds of billions of parameters is computationally expensive and impractical for most financial institutions.
This study explores whether LLMs can bypass these challenges by learning in context - using examples provided in prompts to generalize sentiment analysis capabilities. If successful, in-context learning would allow financial analysts to leverage LLMs without requiring expensive fine-tuning, making AI-driven sentiment analysis more accessible and adaptable.
In-context learning: A new paradigm for financial sentiment analysis
In-context learning allows LLMs to adapt to new tasks without modifying their internal weights. Instead of fine-tuning, the model is fed a set of labeled examples (demonstrations) within a prompt, enabling it to generalize sentiment classification for new inputs. The study evaluates four primary in-context sample selection strategies:
- Random Selection – Demonstration examples are randomly chosen.
- Distance-Based Selection – The most linguistically diverse examples are selected to maximize the range of financial expressions covered.
- Difficulty-Based Selection – The LLM itself selects the most challenging samples, ensuring it learns from edge cases.
- Clustering-Based Selection – A balanced approach that selects examples representative of different clusters of financial sentiment, ensuring diversity while maintaining relevance.
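The clustering-based strategy can be sketched in a few lines: embed the candidate examples, cluster the embeddings, and pick the example nearest each cluster centroid so the demonstration set covers distinct regions of the data. This is a minimal illustration, not the authors' implementation - the embedding model, distance metric, and cluster count are all assumptions here.

```python
import random


def dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20, seed=0):
    """Minimal k-means over plain Python float lists; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            idx = min(range(k), key=lambda i: dist(v, centroids[i]))
            clusters[idx].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep old centroid if a cluster empties out
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def select_demonstrations(examples, embeddings, k, seed=0):
    """Pick the example nearest each centroid: a diverse, representative set."""
    centroids = kmeans(embeddings, k, seed=seed)
    chosen = []
    for c in centroids:
        idx = min(range(len(embeddings)), key=lambda i: dist(embeddings[i], c))
        if examples[idx] not in chosen:
            chosen.append(examples[idx])
    return chosen
```

With toy 2-D "embeddings" forming two obvious groups, `select_demonstrations` with `k=2` returns one example from each group, which is exactly the balanced coverage the strategy aims for.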
The researchers tested these strategies on two real-world financial datasets - FiQA (Financial Question Answering) and Twitter financial news data - using ten LLMs from different AI providers, including Google’s Gemini, OpenAI’s GPT-4, Anthropic’s Claude, and DeepSeek V3.
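Whatever the selection strategy, the chosen demonstrations ultimately have to be packed into a single prompt alongside the new input. The sketch below shows one common few-shot template; the exact wording and format the authors used is not specified here, so the instruction line and field names are assumptions.

```python
def build_fsa_prompt(demonstrations, query):
    """Assemble a few-shot sentiment prompt: labeled examples, then the query.

    demonstrations: list of (text, label) pairs chosen by a selection strategy.
    query: the new financial text to classify.
    """
    lines = [
        "Classify the sentiment of each financial text as "
        "positive, negative, or neutral.",
        "",
    ]
    for text, label in demonstrations:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The prompt ends mid-pattern so the model completes the final label.
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)
```

For example, passing two labeled headlines and a new one produces a prompt ending in a bare `Sentiment:` line, which the LLM is expected to complete with a single label - no weight updates involved.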
Can LLMs learn financial sentiment without fine-tuning?
The study’s empirical results provide critical insights into the effectiveness of in-context learning for FSA:
- In-context learning consistently improved sentiment classification accuracy across multiple LLMs compared to zero-shot performance (where models classify sentiment without examples).
- Clustering-based sample selection yielded the best results, outperforming random and distance-based selection strategies, as it ensured balanced representation of sentiment expressions.
- LLMs performed well in classifying positive and negative sentiments but struggled with neutral sentiment, which often involves ambiguous or mixed financial language.
- Larger models (e.g., GPT-4, Claude 3.5) outperformed smaller models, demonstrating better generalization in financial contexts.
- Fine-tuned financial models still hold an advantage, but in-context learning allows general-purpose LLMs to achieve competitive accuracy without retraining, making them viable for financial analysis.
These findings suggest that while LLMs can effectively perform financial sentiment analysis through in-context learning, careful prompt design and sample selection significantly impact performance.
Implications for AI in finance: Moving toward adaptive AI systems
The study’s results highlight important implications for the future of AI-driven financial analysis. First, in-context learning could democratize access to AI-driven financial sentiment analysis. Instead of requiring domain-specific fine-tuned models, financial professionals could adapt general-purpose LLMs to market analysis tasks using well-designed prompts. This could lower costs and increase the adoption of AI in financial institutions.
Second, LLMs still face limitations, particularly in handling neutral sentiment. Neutrality in financial sentiment often arises when analysts hedge statements or when sentiment is mixed, making it harder for models to classify. This suggests that future research should explore hybrid models that combine in-context learning with domain-specific fine-tuning.
Third, as AI regulations and compliance frameworks evolve, the use of adaptive AI models in financial markets will require explainability and robustness. LLMs that learn in context without explicit training may raise concerns regarding model transparency and consistency in financial decision-making.
Finally, the study highlights that larger LLMs demonstrate stronger performance, but their efficiency must be balanced against computational costs and latency concerns. Financial firms may opt for customized AI pipelines that blend in-context learning with smaller fine-tuned models to achieve optimal performance.
Future research should focus on optimizing prompt engineering strategies, integrating domain-specific knowledge, and ensuring transparency in AI-driven financial decision-making.
First published in: Devdiscourse