Breaking the AI bottleneck: Why retraining from scratch is becoming obsolete

CO-EDP, VisionRI | Updated: 06-03-2025 17:07 IST | Created: 06-03-2025 17:07 IST
Representative Image. Credit: ChatGPT

As machine learning models grow more complex and data scales exponentially, the computational cost of training new models has become a significant bottleneck. Traditionally, when new data is introduced, models are retrained from scratch, requiring vast computational resources and time. However, what if existing models could be updated efficiently without starting over? This is where continuous training offers a breakthrough.

A recent study titled “Same Accuracy, Twice as Fast: Continuous Training Surpasses Retraining from Scratch” by Eli Verwimp, Guy Hacohen, and Tinne Tuytelaars, posted on arXiv (2025), challenges the conventional approach of retraining AI models. The research demonstrates that leveraging previously trained models can reduce computational costs by up to 2.7x without sacrificing accuracy. By optimizing initialization, regularization, data selection, and learning rate schedules, continuous training can accelerate convergence while maintaining high performance.

The challenge of computationally expensive retraining

In many real-world applications, machine learning models must adapt to new data continuously. For instance, in computer vision tasks, a model trained on an image dataset might need to incorporate new classes or domains over time. The standard approach, retraining from scratch on the combined old and new datasets, ensures high accuracy but comes at a high computational cost.

This problem is particularly relevant in industry settings, where companies must frequently update AI models to accommodate new trends, behaviors, or categories. A model that takes weeks to train on a massive dataset must be retrained every time new data is introduced, consuming unnecessary GPU and storage resources.

Previous research in continual learning has focused on minimizing memory constraints and avoiding catastrophic forgetting, a phenomenon in which models lose previously learned knowledge when trained on new data. However, many of these approaches sacrifice accuracy for efficiency. This study, in contrast, takes a practical approach, assuming access to both old and new data and focusing solely on reducing computational costs while maintaining model performance.

How continuous training achieves faster convergence

The study introduces a novel evaluation framework that quantifies how continuous training can outperform retraining from scratch. The key insight is that instead of discarding an existing model and retraining from the ground up, continuing to train from a pre-existing model can significantly reduce training time while achieving the same final accuracy.

The researchers tested various optimization techniques to improve the effectiveness of continuous training:

Initialization Strategies:

  • Instead of randomly initializing a new model, they used a “shrink and perturb” technique, which slightly modifies the weights of the existing model while preserving its learned features. This prevents loss of plasticity and allows for faster adaptation to new data.
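As a rough illustration, a shrink-and-perturb step can be sketched in a few lines of NumPy: each weight tensor is scaled toward zero and a small amount of Gaussian noise is added. The shrink factor and noise scale below are illustrative defaults, not values taken from the paper:

```python
import numpy as np

def shrink_and_perturb(weights, shrink=0.4, noise_std=0.01, rng=None):
    """Scale each weight tensor toward zero, then add small Gaussian noise.

    Shrinking preserves the learned features' direction while restoring
    plasticity; the noise helps the model escape the old optimum.
    """
    rng = rng or np.random.default_rng(0)
    return [shrink * w + noise_std * rng.standard_normal(w.shape)
            for w in weights]
```

In practice this would be applied once to the pre-trained model's parameters before the continuation run begins.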

Regularization Methods:

  • They applied L2-init regularization, which maintains stability by encouraging the model’s parameters to remain close to their initial values while still adapting to new information.
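The key difference from ordinary weight decay is the anchor point: instead of penalizing distance from zero, L2-init penalizes distance from the starting (pre-trained) weights. A minimal sketch, with an illustrative penalty strength:

```python
import numpy as np

def l2_init_penalty(params, init_params, strength=1e-3):
    """L2 penalty on drift from the *initial* weights, not from zero.

    Added to the task loss, this keeps the continued model close to its
    pre-trained starting point while still allowing adaptation.
    """
    return strength * sum(np.sum((p - p0) ** 2)
                          for p, p0 in zip(params, init_params))
```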

Optimized Data Selection:

  • Instead of randomly mixing old and new data during training, they prioritized the most informative samples while reducing reliance on examples that were either too easy or too difficult to learn.
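One simple way to operationalize this idea, shown here as an assumption rather than the paper's exact criterion, is to rank samples by their current loss and keep only the middle band, discarding examples that are already mastered (very low loss) or likely too hard or noisy (very high loss):

```python
import numpy as np

def select_informative(losses, low_q=0.1, high_q=0.9):
    """Return indices of samples whose loss falls between two quantiles.

    Samples below the low quantile are "too easy" and above the high
    quantile "too hard"; both are dropped from the next training pass.
    """
    losses = np.asarray(losses)
    lo, hi = np.quantile(losses, [low_q, high_q])
    return np.where((losses >= lo) & (losses <= hi))[0]
```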

Adaptive Learning Rate Scheduling:

  • By adjusting learning rates dynamically, they allowed the model to rapidly converge without prolonged fine-tuning phases.
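The paper tunes the schedule for the shorter continuation budget; cosine decay is one common choice for such a schedule (used here as an assumption, not necessarily the authors' exact scheduler):

```python
import math

def cosine_lr(step, total_steps, base_lr=0.1, min_lr=1e-4):
    """Cosine decay from base_lr down to min_lr over the training budget."""
    t = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))
```

Because the continuation run needs fewer steps than scratch training, `total_steps` can be set much smaller, which is what lets the learning rate anneal quickly without a prolonged fine-tuning phase.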

These optimizations not only reduced training time but also ensured that models trained continuously could achieve the same or even higher accuracy compared to scratch training.
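A toy sketch of how these pieces might fit together in a single continuation loop; the quadratic objective, step count, and coefficients below are invented purely for illustration:

```python
import math
import numpy as np

def continue_training(theta_init, grad_fn, steps=200, base_lr=0.1,
                      min_lr=1e-4, l2_init=1e-2):
    """Toy continuation loop: start from pre-trained weights, anneal the
    learning rate with a cosine schedule, and pull the weights back toward
    theta_init (the L2-init regularizer)."""
    theta = theta_init.copy()
    for step in range(steps):
        t = step / steps
        lr = min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))
        # task gradient plus the L2-init pull toward the starting weights
        g = grad_fn(theta) + l2_init * (theta - theta_init)
        theta -= lr * g
    return theta

# Usage: minimize (theta - 3)^2, "continuing" from a pre-trained value 2.5
theta = continue_training(np.array([2.5]), lambda th: 2.0 * (th - 3.0))
```

Starting near the optimum rather than from a random point is precisely why the continuation run converges in a fraction of the steps.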

Results: Continuous training is up to 2.7x faster

The research evaluated continuous training on various image classification datasets, including CIFAR-10, CIFAR-100, ImageNet-100, ImageNet-200, and Adaptiope. The results showed a consistent reduction in computational time while maintaining accuracy.

For instance, in the CIFAR-100 (70+30) setup, where a model was initially trained on 70 classes and then updated with 30 additional classes:

  • Retraining from scratch took the longest.
  • A naïve continuous approach (simply continuing training without modifications) improved efficiency but still lagged.
  • The optimized continuous training approach achieved the same accuracy 2.7x faster, demonstrating the power of tailored optimization techniques.

The study also explored domain adaptation, in which new data comes from the same categories but with a different distribution (e.g., product images vs. real-world photos). Even in these cases, continuous training outperformed scratch training, though the speedup was slightly lower due to the need for additional adaptation.

Implications for the future of AI model training

The findings of this study have far-reaching implications for AI research and deployment. Reducing the computational cost of training machine learning models is crucial as:

  • Model sizes continue to grow, making full retraining increasingly impractical.
  • Sustainability concerns push for more energy-efficient AI training methods.
  • Industry applications demand faster model updates without losing accuracy.

By refining continuous training methods, AI developers can accelerate model updates, reduce energy consumption, and optimize resource allocation. The next steps for research include:

  • Further minimizing memory requirements, allowing models to adapt efficiently even with restricted access to old data.
  • Applying these techniques to large-scale transformer models, particularly in NLP and multimodal AI.
  • Improving transfer learning strategies, enabling models to generalize better across diverse tasks without excessive retraining.

Ultimately, continuous training represents a paradigm shift in AI model development, moving away from inefficient retraining cycles toward smarter, more sustainable learning processes. As AI continues to evolve, leveraging existing knowledge effectively rather than discarding and rebuilding models will be key to scaling machine learning for the future.

  • FIRST PUBLISHED IN:
  • Devdiscourse