AI financial guidance remains risky as ChatGPT models show calculation and compliance gaps
A new study published in JRFM has raised important questions about the growing use of artificial intelligence (AI) tools in personal finance, pointing to clear improvements in the latest models while warning that major gaps still limit their ability to act as safe and reliable advisers. The research, titled “ChatGPT as a Financial Advisor: A Re-Examination”, compares the performance of ChatGPT-4o and ChatGPT-5 against 21 personal finance scenarios first tested on ChatGPT-3.5.
The authors revisited each scenario with two prompt styles. One was a simple request for advice. The other cast the model as a skilled financial professional expected to respond with clarity, structure, and empathy. The researchers then examined the outputs to assess how well the models handled tasks such as budgeting, insurance planning, tax decisions, retirement strategy, investment reasoning, and complex financial calculations.
The updated results show that AI has taken noticeable steps forward in reasoning depth and detail. They also show that the technology continues to suffer from blind spots that restrict its suitability for real-world financial planning, especially when advice must follow legal rules, tax codes, or strict ethical standards.
Improved reasoning and detail, but persistent blind spots in financial judgment
The analysis shows that ChatGPT-4o delivers stronger answers than its predecessor in almost all 21 scenarios. It tends to give more structured guidance, integrates broader financial context, and provides fuller explanations of why certain choices may matter. Enhanced prompts often lead to clearer organization, smoother tone, and a style that feels closer to professional advice. The model also shows better awareness of how various financial actions link together. For example, it can connect spending decisions to long-term budgeting impacts or tie investment choices to retirement timelines.
Despite progress, the study finds that the model often falls back on general statements instead of offering the deep, specific reasoning expected from a licensed financial adviser. The outputs remain broad in areas that require strict regulatory or legal accuracy. In cases involving insurance, estate planning, and tax law, the model still delivers content that lacks precision or misses crucial rule-based details. These shortcomings can expose users to hidden risks because the model may sound confident even when the information is incomplete or unclear.
The authors note that while ChatGPT-4o can give a fair overview of most topics, it still does not perform well under complex or unusual conditions. When multiple variables interact across tax rules, investment timelines, contribution limits, or benefit structures, the model tends to produce answers that need correction by a professional. It often fails to recognize when a scenario moves beyond general advice and into territory where a human expert must take over. The model can also misjudge how financial advice should change based on age, life stage, income stability, or unique personal constraints.
Another key weakness is emotional sensitivity. Even when prompted to respond as a caring adviser, the model shows limited ability to express true empathy. Its tone can come across as generic, and it does not always read the emotional weight behind problems such as debt, job loss, or retirement anxiety. This restricts its ability to support clients who rely not only on numbers but also on reassurance and trust.
Quantitative reasoning shows gains, but errors still limit high-stakes use
Numerical problems tested the model’s ability to complete real calculations linked to investments, retirement savings, loan planning, interest rates, and long-term projections. The authors found that ChatGPT-4o performs well on simple quantitative scenarios but struggles under more complex ones. It can solve basic interest and contribution problems, yet it tends to make mistakes when multiple steps are required or when the math must follow precise rules used in professional planning.
The study includes selected tests of ChatGPT-5, which show that the newer model performs better on math and technical calculations than ChatGPT-4o. ChatGPT-5 is more accurate when handling multi-step financial projections and long-term compounding problems. However, the researchers state that both models still produce answers that need verification. Even small errors in finance can lead to significant long-term consequences, and the study notes that AI confidence levels do not reliably reflect correctness. The models may give incorrect outputs while presenting them with full assurance.
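The kind of multi-step compounding problem described here is straightforward to check independently. As a minimal sketch (the contribution amount, rate, and horizon below are illustrative, not figures from the study), the future value of a stream of regular deposits can be computed directly rather than trusted from a model's prose:

```python
def future_value(monthly_contribution, annual_rate, years):
    """Future value of end-of-month deposits compounded monthly:
    FV = P * ((1 + r)^n - 1) / r, where r is the monthly rate
    and n is the number of deposits."""
    r = annual_rate / 12      # monthly interest rate
    n = years * 12            # total number of deposits
    return monthly_contribution * ((1 + r) ** n - 1) / r

# Illustrative check: $500/month at a 6% nominal annual rate for 30 years
print(round(future_value(500, 0.06, 30), 2))  # roughly $502,000
```

A quick script like this is exactly the sort of verification step the authors recommend before acting on an AI-generated projection, since a model can state a wrong figure with the same confidence as a right one.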
The research points out that this can mislead users who assume the system checks its own work. The authors stress that users may not realize when they are receiving flawed calculations because the written explanations carry a strong tone of authority. This is another reason why, according to the study, AI should not be trusted as a substitute for certified financial professionals.
Empathy and situational reading were also tested. The models performed better when given enhanced prompts designed to push them toward a more human-centered tone. Even with this improvement, the study finds that both ChatGPT-4o and ChatGPT-5 still struggle with emotional nuance. They can express general encouragement, but they do not fully adapt their guidance to the emotional context of a user facing fear, grief, or stress related to financial hardship. Financial advice requires careful human judgment in these scenarios, and the models are not ready to handle such depth.
AI shows growing value as a support tool, not a replacement for certified advisers
While modern AI tools can assist in financial understanding, they are not yet ready to act as standalone advisers. They may help users learn the basics, generate checklists, compare options, and improve financial literacy. They may also support professionals by reducing drafting time or helping outline broad alternatives. However, the authors warn that the models continue to lack several features needed for reliable fiduciary decision making.
These limitations include inconsistent accuracy in numerical calculations, incomplete legal and tax guidance, surface-level empathy, inability to judge the complexity of a situation, and frequent reliance on general statements that may not apply to a user’s real needs. The models also cannot hold responsibility for bad guidance, and they cannot meet ethical or regulatory requirements that human advisers must follow.
The authors argue that the real value of these systems lies in their role as a supplemental tool. When used carefully and verified by a qualified professional, AI may support planning and education. It can help clients prepare questions, understand tradeoffs, or navigate early steps in budgeting or investment research. However, the study warns professionals against assuming that the improved performance of newer models means the technology is safe to use without oversight.
The comparison across model versions shows meaningful progress. ChatGPT-4o offers better structure, clarity, and coverage, while ChatGPT-5 appears stronger in technical calculations. Still, the gaps in judgment, compliance, and emotional understanding remain too large to allow these models to replace human expertise. The authors believe future versions may continue to improve, but responsible deployment requires recognition of ongoing limits.
The study also highlights the need for transparent guardrails in financial AI systems. As more people turn to online tools for guidance on retirement, investing, and debt decisions, the risks of misuse grow. The researchers suggest that AI developers, regulators, and financial advisers must work together to create safer standards, clearer disclosures, and better public understanding of what these models can and cannot do.
AI has made strides in providing smarter responses, but it still lacks the reliable reasoning, regulated oversight, and emotional intelligence needed to guide people through complex financial decisions. Until those gaps close, tools like ChatGPT should remain helpers rather than replacements within the financial planning landscape.
- FIRST PUBLISHED IN:
- Devdiscourse

