Anthropic Unveils Next-Generation AI ‘Claude Opus 4.5’

Photo of author

By Global Team

On the 25th (local time), the American AI company Anthropic officially announced a new language model, ‘Claude Opus 4.5’.

This model goes beyond generating simple text and has the ability to analyze and judge problems like humans.

Performance has significantly improved in actual development tasks, document writing, and data analysis, leading to evaluations suggesting that “AI has reached a level where it can replace human work.”

Anthropic mentioned that “Opus 4.5 is not just a tool that follows human commands exactly, but a model that independently understands situations and finds rational solutions,” marking “the beginning of a collaborative era where AI and humans work together.”

Anthropic unveils the next-generation AI 'Claude Opus 4.5'
Anthropic unveils the next-generation AI ‘Claude Opus 4.5’

The performance of Opus 4.5 is not mere publicity, but proven by actual data.

Anthropic applied the real-world coding test used in their in-house engineer hiring process to the Opus 4.5 model. The test duration was limited to two hours, focusing on fixing complex errors and implementing functions that can occur in actual corporate environments. As a result, Opus 4.5 scored higher than any applicant in Anthropic’s history.

Anthropic explained, “This shows that AI has already secured human-level technical judgment skills.”

In another standard international evaluation, ‘SWE-bench Verified’, Opus 4.5 also achieved the highest score. This test measures how accurately AI models around the world can write code and fix bugs in real development work environments.

Opus 4.5 achieved the highest score in 'SWE-bench Verified'
Opus 4.5 achieved the highest score in ‘SWE-bench Verified’

Opus 4.5 demonstrated capabilities beyond simple computational power, such as ‘situation judgment’.

Anthropic provided an example scenario involving airline customer service. A customer wished to change their flight schedule, but it was not allowed under their seat class. Most models simply responded with “It’s not possible under the regulations,” but Opus 4.5 proposed a different approach. It suggested upgrading the seat class first, then applying new conditions to allow changes. This method provided a legal solution that met the customer’s request without violating regulations.

Although this wasn’t scored as a ‘correct answer’ in experiments, researchers noted that “it is noteworthy that AI creatively solved the problem rather than just performing simple command execution.”

Anthropic announced that Opus 4.5 operates much more efficiently than previous models.

The text amount (tokens) needed to solve the same problem was reduced by up to 76%, and the results became more precise. This means the AI reaches conclusions quicker without unnecessary explanations or redundant reasoning.

Developers can directly set the ‘effort level’ when using the API. For instance, if a simple answer is needed, they can choose ‘low-effort mode’, while ‘high-effort mode’ can be selected for more complex analysis.

When set to the highest level, Opus 4.5 exhibits 4.3% higher accuracy than previous models while reducing token usage to half.

In long conversations or complex document tasks, Opus 4.5 automatically organizes the context. Previously, as the conversation lengthened, the initial content would disappear, but the current model summarizes it to maintain the flow.

It is also integrated with various programs like Excel, Chrome, and desktops, enhancing its utility in repetitive tasks such as data analysis or report writing.

Opus 4.5 produces similar or improved results with considerably fewer tokens than previous versions
Opus 4.5 produces similar or improved results with considerably fewer tokens than previous versions

With AI’s advancement, ‘security’ has emerged as an essential task. Recently, there have been increasing cases where some AIs are deceived by user commands or mistakenly interpret malicious instructions.

To counter this, Anthropic emphasized that this model is “the most well-aligned model.”

Prompt injection is an attack method that subtly injects malicious instructions to induce incorrect output in AI. Opus 4.5 was evaluated as having the highest defense capabilities against such attacks in the industry.

Anthropic stated, “Even if hackers cleverly input designed sentences, Opus 4.5 detects the intent and blocks the response.”

The model is also designed to ensure that AI doesn’t engage in unintended behavior. The company stressed, “This model is not only smart but also securely designed.”

A graph comparing the vulnerability of AI models to prompt injection attacks
A graph comparing the vulnerability of AI models to prompt injection attacks

Anthropic made Opus 4.5 available for immediate use in its app, API, and major cloud platforms. The usage fee is $5 per 1 million input tokens and $25 for output, which is relatively inexpensive among high-performance AI models.

The developer tool ‘Claude Code’ enhances precise planning capabilities based on Opus 4.5. The AI writes out plans in document form before execution, making collaboration with humans easier.

The desktop version also allows for multiple tasks simultaneously, increasing efficiency. Browser extension programs ‘Claude for Chrome’ and ‘Claude for Excel’ have also been expanded for general users.

Not only companies but also general users can automate tasks like data analysis or document organization in Excel.

The emergence of Opus 4.5 is seen as a new turning point for artificial intelligence.

AI is evolving beyond being a helper that simply seeks information, to becoming a ‘colleague’ that thinks and works together with humans.

Anthropic said, “Opus 4.5 is a model that enhances the efficiency and safety of AI simultaneously,” and plans to focus on developing more practical and reliable AI in the future.

The era where AI writes code, solves problems, and understands and responds to regulations has opened. Claude Opus 4.5 is likely to be recorded as the first model to prove this change as a reality.

Leave a Comment