OpenAI Launches GPT-5.3-Codex Amidst AI Model Showdown

OpenAI’s New Model

OpenAI has released its latest and most powerful coding model, GPT-5.3-Codex, just 15 minutes after Anthropic released Claude Opus 4.6.

The new model shows a sense of aesthetic taste, as seen in two showcased demos, a racing game and a diving game, both with stylish designs.

Reportedly, GPT-5.3-Codex iteratively developed these games with minimal human intervention, consuming millions of tokens in the process.

In web development, besides producing a more attractive UI, it also shows a stronger understanding of user intent. Even when a prompt is unclear, it can fill in the missing logic and generate a fully functional website.

The design quality is indeed a step up from previous versions.

Its computer-use capabilities have also improved: it can now create PowerPoint presentations directly for finance professionals.

This extends to other professional tasks, particularly in knowledge-intensive roles, where it can effectively write documents and create spreadsheets.

Key highlights of the new model include:

Smarter: Scores 57% on SWE-Bench Pro, 76% on Terminal-Bench 2.0, and 64% on OSWorld.
More controllable: Accepts real-time guidance while a task is running, so users can adjust or update instructions along the way.
Faster: Completes tasks with less than half the tokens required by GPT-5.2-Codex, and generates tokens more than 25% faster.
More agent-like: Not only better at coding, but also proficient in computer operations.
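
To see what the two "Faster" figures could mean when combined, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that the time spent generating output is the token count divided by the generation speed, and it takes the stated figures at face value as bounds (half the tokens, 25% faster). The function name is hypothetical, and the result is not a number reported by OpenAI.

```python
def relative_task_time(token_ratio: float, speed_ratio: float) -> float:
    """Generation time of the new model relative to the old one.

    Assumes time = tokens / tokens_per_second and ignores tool calls,
    network latency, and anything else that is not token generation.
    """
    if token_ratio <= 0 or speed_ratio <= 0:
        raise ValueError("ratios must be positive")
    return token_ratio / speed_ratio


# Figures from the list above, taken as bounds:
# under half the tokens (ratio 0.5), over 25% faster (ratio 1.25).
ratio = relative_task_time(token_ratio=0.5, speed_ratio=1.25)
print(f"Relative generation time: {ratio:.2f}")  # prints 0.40
```

Under these assumptions, the same task would spend at most about 40% as long generating tokens as before. Real end-to-end time also depends on factors this sketch leaves out.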

A comparison chart shows significant improvement over the previous generation on almost every dimension.

Users are excited, noting that OpenAI has responded with a powerful release just a day after being targeted by Anthropic.

Two heavyweight coding models have arrived on the same day.

The comments section quickly divided into Anthropic and OpenAI supporters.

GPT-5.3-Codex

The most anticipated aspect is, of course, its coding ability. OpenAI claims that GPT-5.3-Codex achieves state-of-the-art (SOTA) results on SWE-Bench Pro, a benchmark designed for real-world software engineering. The benchmark covers four programming languages, and its tasks are harder, more varied, and closer to real production scenarios.

Additionally, there is a noticeable improvement in performance on Terminal-Bench 2.0.

Crucially, it achieves these results while using fewer tokens than any previous model.

Besides programming skills, another focus of the new Codex is computer use.

OSWorld is a benchmark for computer-use agents that requires models to complete a variety of productivity tasks in a visual desktop environment. Results show that GPT-5.3-Codex significantly surpasses earlier GPT models in this area.

In summary, GPT-5.3-Codex is not just a breakthrough in one specific capability but a comprehensive, agent-oriented advance across coding, front-end development, and computer operation.

Interestingly, GPT-5.3-Codex took part in its own training. OpenAI says it is the company's first model to help accelerate its own development: the Codex team used early versions of it to debug training, manage deployment, and evaluate test results.

The official report includes specific examples. During the training phase, the research team used Codex to monitor and debug training tasks, tracking model behavior changes, conducting in-depth analyses of interactions, and proposing improvements.

In data analysis, a data scientist collaborated with GPT-5.3-Codex to build a new data pipeline, visualizing results in ways that far exceed traditional dashboard tools. Subsequently, researchers analyzed these results with Codex, which distilled key insights from thousands of data points in under three minutes.

The engineering team leveraged Codex to optimize and adapt the testing and operational framework for GPT-5.3-Codex. When anomalies affecting user experience began to appear, team members used Codex to pinpoint defects related to context rendering, tracing the issue back to low cache hit rates.

Two More Things

The showdown with Anthropic is indeed exciting, but OpenAI has two other significant announcements worth noting.

1. Frontier: A platform to help businesses create “AI colleagues”

This is a major B2B (enterprise) initiative from OpenAI, aimed at integrating agents into corporate workflows. Its approach includes shared context, hands-on onboarding, practical learning driven by feedback, and clear permissions and boundaries. Notable companies such as HP, Intuit, Oracle, State Farm, Thermo Fisher, and Uber have already adopted Frontier.

2. AI for Science (AI4S): OpenAI collaborates with Ginkgo to reduce protein synthesis costs by 40% using GPT-5

Ginkgo is a company specializing in synthetic biology. It has integrated GPT-5 into an autonomous lab, where the model proposes experimental plans, scales up experiments, learns from the results, and decides on the next steps, forming a closed loop.
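
The propose, run, learn, decide cycle described above can be illustrated with a toy sketch. This is not Ginkgo's or OpenAI's system: the model is replaced by a simple hill-climbing rule, the lab by a made-up cost function, and every name here (`closed_loop`, `simulated_cost`) is hypothetical.

```python
from typing import Callable, List, Tuple


def closed_loop(
    run_experiment: Callable[[float], float],
    start: float,
    step: float,
    rounds: int,
) -> Tuple[float, float, List[Tuple[float, float]]]:
    """Propose -> run -> learn -> decide, repeated for a fixed number of rounds.

    Returns the best setting found, its cost, and the full history.
    """
    best_x, best_cost = start, run_experiment(start)
    history = [(best_x, best_cost)]
    for _ in range(rounds):
        # Propose: try one setting on each side of the current best.
        candidates = [best_x - step, best_x + step]
        # Run: execute each proposed experiment and record the result.
        results = [(x, run_experiment(x)) for x in candidates]
        history.extend(results)
        # Learn: find the cheapest result in this batch.
        x, cost = min(results, key=lambda r: r[1])
        # Decide: move if it improved, otherwise search more finely.
        if cost < best_cost:
            best_x, best_cost = x, cost
        else:
            step /= 2
    return best_x, best_cost, history


def simulated_cost(x: float) -> float:
    # Hypothetical stand-in for a real lab measurement; minimum at x = 3.
    return (x - 3.0) ** 2 + 1.0


best_x, best_cost, history = closed_loop(
    simulated_cost, start=0.0, step=1.0, rounds=10
)
print(best_x, best_cost, len(history))
```

The point of the sketch is only the shape of the loop: each round's results feed the next round's proposals without a person choosing the next experiment.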

2026 could be a pivotal year for the acceleration of AI4S.

However, while OpenAI is busy countering Anthropic, the online community is left dazzled by a series of new developments, with some voices expressing discontent.

Give me back 4o!!

As of now, OpenAI has not responded to complaints about the complete removal of GPT-4o.

Perhaps they are just too occupied with the competition against Anthropic.