
Over-training large language models may make them harder to fine-tune

Language models with extensive pre-training can exhibit catastrophic overtraining, where the performance of post-trained models degrades as the pre-training stage is extended. Credit: arXiv (2025). DOI: 10.48550/arxiv.2503.19206

A small team of AI researchers from Carnegie Mellon University, Stanford University, Harvard University and Princeton University, all in the U.S., has found that over-training large language models can make them harder to fine-tune. In their paper posted on the arXiv preprint server, the group compared the impact of different amounts of pre-training on a single LLM.

Over the past couple of years, as AI researchers have sought to make their products more “intelligent,” many have been guided by the mantra that the more training a model is given, the better it will be in the end. In this new study, the research team found evidence suggesting that there may be a point of diminishing returns in language model training.

The researchers came to this conclusion while testing the returns from training two versions of the LLM OLMo-1B: one pre-trained on 2.3 trillion tokens, the other on 3 trillion. They then compared the two by testing them on several benchmarks, such as ARC and AlpacaEval. In doing so, they found that the model trained on more tokens actually did worse, by as much as 3%.
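
A minimal sketch of this kind of comparison (not the authors' code) is shown below: take two checkpoints of the same architecture that differ only in pre-training budget, fine-tune each with an identical recipe, and score both on the same held-out data. The checkpoint IDs, example texts, and the use of held-out loss as a stand-in for benchmark accuracy are all illustrative assumptions.

```python
# Illustrative comparison of two pre-training budgets under identical fine-tuning.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def finetune_and_score(checkpoint, train_texts, eval_texts, lr=2e-5, device="cpu"):
    tok = AutoTokenizer.from_pretrained(checkpoint)
    if tok.pad_token is None:            # some causal-LM tokenizers lack a pad token
        tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)

    model.train()                        # identical fine-tuning pass for every checkpoint
    for text in train_texts:
        batch = tok(text, return_tensors="pt", truncation=True).to(device)
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        opt.step()
        opt.zero_grad()

    model.eval()                         # average held-out loss; lower is better
    total = 0.0
    with torch.no_grad():
        for text in eval_texts:
            batch = tok(text, return_tensors="pt", truncation=True).to(device)
            total += model(**batch, labels=batch["input_ids"]).loss.item()
    return total / max(len(eval_texts), 1)

# Placeholder IDs standing in for the 2.3-trillion- and 3-trillion-token runs.
train = ["Instruction: name a primary color. Response: Blue."]
evals = ["Instruction: name a weekday. Response: Tuesday."]
for ckpt in ("org/base-2.3T-tokens", "org/base-3T-tokens"):
    print(ckpt, finetune_and_score(ckpt, train, evals))
```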

Surprised by their findings, they ran more tests and found similar results, suggesting that there is some point at which more training starts to make models less “intelligent.” The research team calls it “catastrophic overtraining,” and suggests it is due to what they describe as “progressive sensitivity.”

They further suggest that as the number of pre-training tokens rises, the model becomes more fragile, meaning that fine-tuning, which can be viewed as adding noise, starts to reverse the gains that had been made up to that stress point.

Schematic to illustrate how the scaling of the optimal learning rate can affect model evaluations as a function of the pre-training tokens T. Credit: arXiv (2025). DOI: 10.48550/arxiv.2503.19206

To test their theory, they added Gaussian noise to some of the models' parameters and found that doing so led to the same type of performance degradation they had witnessed earlier. They have named this point of no return the “inflection point.” After that point, they suggest, any further training reduces the stability of the model, making it more difficult to tune in ways that are useful for a desired set of applications.
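
As a rough illustration of such a noise probe, the sketch below adds independent Gaussian noise of increasing standard deviation to every parameter of a model and records how a fixed evaluation loss degrades. The toy regression network and random data are stand-ins for a language-model checkpoint, not the paper's actual setup.

```python
# Toy Gaussian-noise perturbation probe: how much does a fixed evaluation loss
# degrade when every parameter is perturbed with N(0, sigma^2) noise?
import torch
import torch.nn as nn

def eval_loss(model, inputs, targets):
    """Mean-squared-error loss on a fixed evaluation batch."""
    with torch.no_grad():
        return nn.functional.mse_loss(model(inputs), targets).item()

def perturbed_loss(model, inputs, targets, sigma, seed=0):
    """Loss after adding N(0, sigma^2) noise to every parameter (weights restored afterwards)."""
    gen = torch.Generator().manual_seed(seed)
    backups = [p.detach().clone() for p in model.parameters()]
    with torch.no_grad():
        for p in model.parameters():
            p.add_(torch.randn(p.shape, generator=gen) * sigma)
    loss = eval_loss(model, inputs, targets)
    with torch.no_grad():
        for p, b in zip(model.parameters(), backups):
            p.copy_(b)
    return loss

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))  # toy stand-in
x, y = torch.randn(128, 16), torch.randn(128, 1)

base = eval_loss(model, x, y)
for sigma in (0.0, 0.01, 0.03, 0.1):
    print(f"sigma={sigma:<5} loss={perturbed_loss(model, x, y, sigma):.4f} (base {base:.4f})")
```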

The researchers conclude by suggesting that, moving forward, LLM developers may have to estimate how much pre-training is enough, or find other methods that allow additional training while pushing the inflection point further out.

More information:
Jacob Mitchell Springer et al, Overtrained Language Models Are Harder to Fine-Tune, arXiv (2025). DOI: 10.48550/arxiv.2503.19206

Journal information:
arXiv


© 2025 Science X Network

Citation:
Over-training large language models may make them harder to fine-tune (2025, April 14), retrieved 14 April 2025 from


