Planning for emergent capabilities in LLMs

In my previous post on deploying the NVIDIA DGX SuperPOD with DGX GB200 systems, I explained how our approach needs to adjust to take advantage of its step-change in performance. Beyond the hardware itself, successfully deploying infrastructure like this involves optimizing software, for example by integrating NVIDIA’s CUDA Deep Neural Network library (cuDNN) and Transformer Engine layers with our own code bases, to enhance training and inference performance in different floating-point formats. It also involves evolving model architecture and scaling data for the capabilities we now have available.
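To make the software side a little more concrete, here's a minimal sketch of what FP8 mixed-precision integration with Transformer Engine can look like in PyTorch. The layer sizes and recipe settings are illustrative assumptions, not DeepL's actual configuration; the point is the pattern of wrapping supported layers in an FP8 autocast context.

```python
# Minimal sketch: running a Transformer block under FP8 autocast with
# NVIDIA Transformer Engine. Sizes and recipe values are illustrative
# assumptions, not DeepL's production configuration.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Hybrid FP8 recipe: E4M3 for the forward pass, E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(
    fp8_format=recipe.Format.HYBRID,
    amax_history_len=16,
    amax_compute_algo="max",
)

# A single Transformer block; dimensions chosen only for illustration.
layer = te.TransformerLayer(
    hidden_size=4096,
    ffn_hidden_size=16384,
    num_attention_heads=32,
).cuda()

# Input in (sequence, batch, hidden) layout, TE's default.
x = torch.randn(2048, 4, 4096, device="cuda")

# Inside this context, supported matmuls run in FP8; everything else
# falls back to the surrounding precision.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
```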

In this post, I'm looking ahead to capabilities that aren't yet apparent but will become so in the future. As models and data continue their apparently exponential growth, getting to grips with emergent capabilities in large language models (LLMs) will become increasingly central to AI research.

The definition of emergent capabilities in LLMs

"Emergent capabilities" describes the phenomenon whereby new capabilities appear in these models suddenly and unpredictably. As model size, computational power and training data scale up, you reach transition points where new capabilities unexpectedly materialize. This often takes the form of a model's performance on a particular task going from very poor to very good, in a way that was impossible to predict from what you'd observed before. Researchers are logging more and more cases of such emergent capabilities. It's a natural outcome of the reality that LLMs are more than the sum of their parts; what's difficult to predict is how much more.
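To make the shape of the phenomenon concrete, here's a toy sketch with purely synthetic numbers (not measurements from any real model): performance sits near zero across orders of magnitude of scale, then jumps sharply past some threshold.

```python
# Purely illustrative, synthetic numbers: task accuracy stays near zero
# across orders of magnitude of scale, then jumps sharply past a threshold.
# This mimics the shape of reported emergence curves; it is not real data.
import math

def toy_accuracy(params: float, threshold: float = 1e10,
                 sharpness: float = 6.0) -> float:
    """Logistic jump in log-parameter space around `threshold` parameters."""
    gap = math.log10(params) - math.log10(threshold)
    return 1.0 / (1.0 + math.exp(-sharpness * gap))

for params in [1e8, 1e9, 5e9, 1e10, 2e10, 1e11]:
    print(f"{params:>8.0e} params -> accuracy ~ {toy_accuracy(params):.2f}")
```

Run it and accuracy hovers near 0.00 from 1e8 to 1e9 parameters, then climbs past 0.85 within a factor of two of the threshold, which is exactly what makes the transition so hard to anticipate from earlier observations.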


What emergent capabilities mean for AI research

For AI researchers, the good news is that emergent capabilities mean previously impossible problems suddenly become feasible, and sometimes even relatively straightforward. The bad news is that, because you can’t predict when (or if) this will happen, you have to envisage using capabilities you don’t currently have. You need the ability to imagine problems that could be solved – but can’t be yet.

The simple solution would seem to be sticking with what you know: training models based on what LLMs are currently capable of, and waiting for new capabilities to emerge before you start using them. However, that approach probably restricts the emergence of new capabilities, because you're not trying to tackle the problems that would allow them to surface. It also means you risk solving problems in a way that becomes instantly obsolete once capabilities do change.

The alternative is to exercise your imagination and make bold but intelligent bets on problems worth solving that can conceivably be tackled, but which current capabilities fall short of. It's the type of creative conundrum that motivates a lot of people in our line of work – balancing what you do know with what you know you don't.


Future-feasible problems for Language AI

Clarify, the interactive language expert that we recently launched for DeepL Translator, is a great example of what emergent capabilities can enable. It depends on a model's ability to detect ambiguities in someone's intended meaning, to recognize when it needs to ask intelligent questions to resolve them, and to adjust a translation in real time based on the responses to those questions.

We didn’t have a model with such capabilities until we developed our next-gen LLMs in 2023 on our previous DGX SuperPOD with DGX H200 systems, known as Mercury. As we trained those models with new levels of compute power, the necessary capabilities suddenly emerged, and solving this problem became feasible.

DeepL’s new DGX SuperPOD with DGX GB200 systems represents another step-change in compute power that is likely to generate further new capabilities. As we scale up our models and data, we’ll keep identifying pressing problems that are worth solving, and betting strategically on when capabilities could emerge to help solve them.

I can think of three areas where there’s a strong case for making these bets: 

Enabling human-AI collaboration and interactive experiences

The first involves building on what we’ve achieved with Clarify, in order to further improve human-AI collaboration and build more sophisticated interactive experiences. This fuses our models’ language understanding with our users’ own grasp of their intention and meaning. It could have a transformative impact on the experience of AI translation, and it’s a very exciting area to keep exploring.

Personalizing translations to each user

The second focuses on the potential for personalizing translations to each user, in a way that reflects their individual characteristics, tone and preferences, but which still keeps the responsibility for quality and accuracy with DeepL. 

This would potentially mean every user having a customized DeepL model that reflects the way they naturally express themselves, the intentions they typically have when communicating, and the types of phrases that make them sound like themselves. The promise is to replicate this across languages when translating, so that a German speaker using Japanese doesn’t just use the correct terms, but expresses their personality and intention the way a Japanese speaker would, in a way that works through the medium of that culture.

Scaling collaboration across languages

The final area involves scaling collaboration between the models that we use for translating between different languages. 

When translating content into multiple languages, the typical method involves starting with one source text, usually in English, and translating it independently into each language. Each translation reveals someone’s preferences through the way they use our editing tools and features like Clarify. It might capture whether they prefer us to translate an idiom literally or use a local equivalent. However, these insights aren’t currently shared from one translation to another, so we effectively start from scratch with each language we translate into. AI can potentially fill this gap, inferring preferences across languages and generating initial translations that are better aligned with how someone likes to express themselves. If you provide guidance for an Italian translation, it could apply similar insights to Spanish, for example.
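As a thought experiment, here's a hypothetical sketch of what carrying preferences across target languages might look like. The types and functions are invented purely for illustration; they are not part of any DeepL product or API.

```python
# Hypothetical sketch of cross-lingual preference transfer. These types and
# functions are invented for illustration only; they do not exist in any
# DeepL product or API.
from dataclasses import dataclass

@dataclass
class Preference:
    """A preference inferred from a user's edits in one target language."""
    name: str          # e.g. "idioms"
    choice: str        # e.g. "local_equivalent" rather than "literal"
    learned_from: str  # the target language whose edits revealed it

def seed_translation(source_text: str, target_lang: str,
                     prefs: list[Preference]) -> dict:
    """Build a hypothetical translation request that carries preferences
    learned from *other* target languages. The hard part, which current
    models can't yet do reliably, is working out how each preference
    plays out in the new language; here we naively pass them all along."""
    return {
        "text": source_text,
        "target_lang": target_lang,
        "guidance": [f"{p.name}={p.choice} (learned from {p.learned_from})"
                     for p in prefs],
    }

# Edits to an Italian translation reveal a taste for local idioms...
prefs = [Preference("idioms", "local_equivalent", learned_from="it")]
# ...which then seeds the first draft of the Spanish translation.
print(seed_translation("It's raining cats and dogs.", "es", prefs))
```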

When the best way to solve a problem doesn’t yet exist

This last example illustrates the value of anticipating emergent capabilities over working with what you have. As things stand, it’s not feasible to work out what preferences expressed for one language mean for other languages: it would take an endless amount of manual work to go through all of the preferences that people might express when translating Japanese, and then determine how each would play out in Korean or Chinese, for example. With emergent capabilities, however, this could change quickly, and inferring preferences across languages could suddenly become relatively easy. It makes sense to wait for that shortcut rather than taking the more complicated and time-consuming approach now. Planning around possibilities is the best way of tackling the problem.


An invitation to imagine

By its nature, language is a long-tail problem, with lots of cultural nuances, subjective preferences and personal expressions. Human beings express themselves in a hugely rich variety of different ways, and a phrase or idiom can be very significant — even when it’s used by only a handful of people, or on only a handful of occasions. DeepL’s task is to navigate this variety and complexity without ever oversimplifying it. That requires an ambitious approach, stretching the boundaries of what’s possible now. Emergent capabilities in LLMs invite us to imagine new ways of doing this and to embrace the unique challenges that language introduces.


About the Author

A lifelong science and technology enthusiast, Stefan Mesken studied mathematics and computer science at the University of Münster, where he also worked as a researcher before taking on a full-time role as a data scientist to start his career in tech. He started working in AI in 2018, and joined DeepL’s research team in 2020. As Chief Scientist, he shapes the strategy and prioritization of DeepL’s research agenda and coordinates large-scale research initiatives while aligning research goals with engineering and product teams. One of Stefan’s proudest accomplishments is helping lead DeepL’s next-gen model program, a cornerstone in the company’s push toward advanced AI-driven communication. He remains deeply committed to developing world-class tools rooted in cutting-edge research and practical impact.

https://www.linkedin.com/in/stefan-mesken/
