Next Big Leap in LLM/AI...
Worth reading and keeping an eye on...

Introducing Nested Learning: A new ML paradigm for continual learning

We introduce Nested Learning, a new approach to machine learning that views models as a set of smaller, nested optimization problems, each with its own internal workflow, in order to mitigate or even completely avoid the issue of "catastrophic forgetting", where learning new tasks sacrifices proficiency on old tasks.

The last decade has seen incredible progress in machine learning (ML), primarily driven by powerful neural network architectures and the algorithms used to train them. However, despite the success of large language models (LLMs), a few fundamental challenges persist, especially around continual learning: the ability of a model to actively acquire new knowledge and skills over time without forgetting old ones.

When it comes to continual learning and self-improvement, the human brain is the gold standard. It adapts through neuroplasticity, the remarkable capacity to change its structure in response to new experiences, memories, and learning. Without this ability, a person is limited to immediate context (as in anterograde amnesia). We see a similar limitation in current LLMs: their knowledge is confined to either the immediate context of their input window or the static information learned during pre-training.

The simple approach of continually updating a model's parameters with new data often leads to "catastrophic forgetting" (CF), where learning new tasks sacrifices proficiency on old tasks. Researchers traditionally combat CF through architectural tweaks or better optimization rules. However, for too long we have treated the model's architecture (the network structure) and the optimization algorithm (the training rule) as two separate things, which prevents us from achieving a truly unified, efficient learning system.
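To make the forgetting problem concrete, here is a minimal sketch (my own illustration, not the Nested Learning method): a tiny logistic-regression "model" on synthetic two-feature blobs stands in for an LLM and real tasks. Naively fine-tuning the same parameters on task B after task A drives accuracy on task A back toward chance.

```python
# Minimal catastrophic-forgetting demo (illustrative assumption: a 2-feature
# logistic-regression model and synthetic blobs stand in for an LLM and tasks).
import numpy as np

rng = np.random.default_rng(0)

def make_task(center):
    """Gaussian blob around `center`; label depends on which side of a line a point falls."""
    X = rng.normal(loc=center, scale=1.0, size=(400, 2))
    y = (X[:, 0] + X[:, 1] > 2 * center).astype(float)
    return X, y

def train(w, b, X, y, lr=0.3, epochs=500):
    """Plain full-batch gradient descent on cross-entropy; parameters are simply overwritten."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid predictions
        grad_w = X.T @ (p - y) / len(y)          # gradient of mean cross-entropy loss
        grad_b = np.mean(p - y)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def accuracy(w, b, X, y):
    return np.mean(((X @ w + b) > 0) == y)

# Two tasks whose decision boundaries conflict.
Xa, ya = make_task(center=-2.0)   # task A
Xb, yb = make_task(center=+2.0)   # task B

w, b = np.zeros(2), 0.0
w, b = train(w, b, Xa, ya)
print(f"after task A: acc_A={accuracy(w, b, Xa, ya):.2f}")

w, b = train(w, b, Xb, yb)        # naive continual update on task B only
print(f"after task B: acc_A={accuracy(w, b, Xa, ya):.2f}, "
      f"acc_B={accuracy(w, b, Xb, yb):.2f}")
```

Accuracy on task A starts high and then collapses to roughly chance after the task B update, because the single set of weights is overwritten to fit the new decision boundary. Mitigating exactly this kind of interference is what the Nested Learning work targets, by treating the architecture and the optimization rules as one system of nested problems rather than two separate things.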