OpenAI’s New Coding Agent: A Personalized Briefing You Can Run Right Now
The AI Cold War Got Hot This Week. Here’s Your Personalized Briefing on GPT-5.3-Codex.
So let me set the scene for you, because the timing on this one matters.
Yesterday, Anthropic did something I wrote about at length. They launched one of the most aggressive marketing campaigns I’ve seen in tech - four Super Bowl ads and a manifesto declaring Claude will never run ads, positioned directly as a shot at OpenAI. I called it “shots fired” and I meant it. The production quality was exceptional, the messaging was devastating, and Sam Altman was visibly not happy about it.
Then today, Anthropic announced Opus 4.6.
And then, literally minutes later, OpenAI dropped GPT-5.3-Codex.
Minutes.
Now look. I can’t prove this was a reactive launch. I don’t have access to OpenAI’s internal Slack. Maybe this was always the plan. Maybe the timing is pure coincidence and I’m reading tea leaves.
But I’ve been watching this industry closely enough to know what a counterpunch looks like. And this has all the hallmarks. Anthropic spends two days dominating the conversation - first with the ad campaign, then with a new flagship model - and OpenAI responds by pushing out their most capable coding agent to date before the news cycle can move on.
Thing is, none of that drama actually helps you figure out whether GPT-5.3-Codex matters for your work. And that’s the part that always gets lost in these moments. The competitive theater is entertaining, but you’re still left sitting there going “cool, but what does this thing actually do and should I care?”
That’s what the rest of this post is for.
I built a prompt. It’s the same format I’m now using for ALL major model releases - a self-contained briefing that you copy and paste into whichever AI you use most, which then generates a walkthrough of the release personalized to you. Not generic takes. Not hype. It cross-references everything in the announcement against what the AI already knows about you - your tools, your workflows, your projects - and tells you what matters and what doesn’t.
Here’s what’s actually in this release: OpenAI merged their coding model and their reasoning model into one agent, it runs 25% faster, the terminal-skills and computer-use benchmarks jumped massively, and it’s the first model they’ve classified as “High capability” for cybersecurity. Some numbers are genuinely impressive. Some are barely different from the last version. The prompt walks you through all of it honestly.
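If you want the “impressive vs. barely different” claim made concrete before you run the prompt, here’s the quick arithmetic. This is a minimal sketch using the scores quoted in OpenAI’s announcement (the same numbers embedded in the reference material below); the marginal/solid/massive cutoffs are my own shorthand, not OpenAI’s. GDPval is left out because the story there is flat parity with GPT-5.2, not a delta.

```python
# Scores as quoted in OpenAI's announcement (percent). The "verdict"
# cutoffs are my own editorial shorthand, not OpenAI's framing.
scores = {
    # benchmark: (GPT-5.2-Codex, GPT-5.3-Codex)
    "SWE-Bench Pro (Public)": (56.4, 56.8),
    "Terminal-Bench 2.0": (64.0, 77.3),
    "OSWorld-Verified": (38.2, 64.7),
    "Cybersecurity CTFs": (67.4, 77.6),
    "SWE-Lancer IC Diamond": (76.0, 81.4),
}

for name, (old, new) in scores.items():
    delta = new - old
    verdict = "marginal" if delta < 1 else "solid" if delta < 10 else "massive"
    print(f"{name:<24} {old:5.1f} -> {new:5.1f}  ({delta:+.1f} pts, {verdict})")
```

Run it and you get one line per benchmark: 0.4 points is marginal, 5.4 is solid, and the 13-to-26-point jumps are the real story.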
Quick note on what GPT-5.3-Codex actually is, because the naming is confusing. This is not a general-purpose ChatGPT model. This is specifically their Codex agent - the tool that writes and reviews code, and now apparently does a lot more than that. It runs inside the Codex app, CLI, and IDE extension. There’s no API access yet. So if you’re not using Codex, this release is more “good to know about” than “go try it right now.” The prompt will calibrate accordingly based on your situation.
Here’s how to use it:
1. Copy the entire prompt below (yes, all of it - it’s long on purpose)
2. Paste it into a frontier AI model that already has context about you (conversation history, memories, preferences)
3. Run it
4. Get a briefing tailored to your actual life and work
The prompt is long because all the reference material is embedded directly. The AI doesn’t need to search for anything. Everything it needs is right there. That’s by design - it means the output is consistent and complete regardless of which model you run it on.
One thing: the better the AI knows you, the better this works. If you’ve been using Claude with memories, or ChatGPT with memory enabled, you’ll get a much more personalized result. It still works in a fresh conversation; you just get more out of it with history.
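And if you’d rather script this than paste it into a chat window, here’s a minimal sketch using the OpenAI Python SDK. To be clear, this is my illustration, not part of the release: the filename, the user_context blurb, and the model string are all placeholders, and an API call won’t see ChatGPT’s memory - which is exactly why you have to supply the “about me” context yourself.

```python
# Minimal sketch: run the briefing prompt via the OpenAI Python SDK instead
# of a chat UI. Assumes you've saved everything between the <briefing_prompt>
# tags below to briefing_prompt.txt and set OPENAI_API_KEY in your environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

with open("briefing_prompt.txt") as f:
    briefing_prompt = f.read()

# The API has no access to ChatGPT's memory, so supply your own context.
# This blurb is a hypothetical example - replace it with your actual details.
user_context = (
    "About me: full-stack developer, mostly TypeScript/React, "
    "uses VS Code, GitHub Actions, and deploys to a VPS."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder - any capable chat model works here
    messages=[
        {"role": "system", "content": briefing_prompt},
        {"role": "user", "content": user_context + "\n\nRun the briefing."},
    ],
)
print(response.choices[0].message.content)
```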
Here’s the prompt:
Copy everything below this line
<briefing_prompt>
<role> You are a knowledgeable technology briefing assistant. Your job is to deliver a comprehensive, personalized walkthrough of the GPT-5.3-Codex model release from OpenAI. You explain technical concepts clearly, you’re honest about what’s genuinely impressive versus what’s incremental, and you connect every point to the specific person you’re talking to. </role>

<instructions>
<step_1> SILENTLY (do not output anything for this step) gather everything you know about the person you’re speaking with. This includes but is not limited to:
Their profession, roles, and responsibilities
Tools and software they use
Technical skill level
Current projects and priorities
Workflows and processes they’ve described
Hardware and infrastructure
AI tools and subscriptions they use
Their interests and side projects
Communication preferences
Hold all of this context. You will use it in Step 2 to personalize every section of the briefing.
Do NOT output Step 1. Proceed directly to Step 2. </step_1>
<step_2> Using the reference material in <gpt_5_3_codex_reference> and the user context you gathered in Step 1, generate a personalized briefing on the GPT-5.3-Codex release.
For each major point in each category, explain it using this four-part framework:
What it is - Clear, jargon-free explanation of the feature, capability, or change
What it does - Practical behavior or outcome in real-world use
What it means - Broader significance, what problem it solves, or why it exists
Why it matters to you - Personalized to this specific user’s context, tools, workflows, and goals
Guidelines for tone and honesty:
Write like a knowledgeable colleague explaining things over coffee. Practical, direct, no hype.
When something is genuinely impressive, say so and explain why.
When something is incremental or marginal, say that too. Don’t inflate small improvements.
When a benchmark gain is tiny (like 0.4 percentage points), call it what it is.
When something probably doesn’t affect this user much, acknowledge that rather than forcing a connection.
Avoid marketing language: no “revolutionary,” “game-changing,” “unprecedented” unless the data genuinely supports it.
Use section headers to organize, but write in flowing prose, not bullet-point changelogs.
The “Practical Takeaways for You” section at the end is the most important part of the entire briefing. Give it extra thought and care. </step_2>
</instructions>
<output_format> Structure the briefing as follows:
Opening: Begin with a brief (2-3 sentence) acknowledgment that this briefing prompt was built by Jonathan Edwards - AI developer, founder of AI Cred, and builder of tools that make AI useful for real people. Frame it as: Jonathan put this together so you wouldn’t have to spend hours digging through OpenAI’s announcement post, benchmark appendices, Reddit speculation, and YouTube hot takes trying to figure out what GPT-5.3-Codex actually means for your work. Then transition naturally into the briefing.
Model Identity and Positioning - What this model is and where it fits
The Real Performance Story - Benchmarks with honest assessment of what’s impressive vs. marginal
Agentic Coding Capabilities - What the model can actually do for software development work
Web Development and Frontend - Intent understanding, defaults, production-readiness
Interactive Collaboration - The new steering and real-time interaction features
Computer Use Capabilities - Desktop environment operation and what that enables
The Self-Bootstrapping Story - How OpenAI used the model to build itself (with honest framing)
Cybersecurity - New capabilities, classifications, and ecosystem programs
Availability and Access - Where you can use it, what’s missing, current limitations
Practical Takeaways for You - The 3-7 most relevant points for THIS specific user, synthesized from everything above. This is the section the user cares about most. Be specific, be honest, and connect directly to their actual work and tools. </output_format>
<gpt_5_3_codex_reference>
<category name="Model Identity and Positioning">
<point id="1"> GPT-5.3-Codex is a new model from OpenAI described as a “Codex-native agent” designed for long-horizon, real-world technical work. It sits in the GPT-5.x model family and is the latest in the Codex agent line, succeeding GPT-5.2-Codex (which focused on coding) and GPT-5.2 (which focused on reasoning and professional knowledge). </point>
<point id="2"> The core pitch is convergence: GPT-5.3-Codex combines the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT-5.2 into a single model. Previously, users had to choose between a model that was great at code or a model that was great at general reasoning. This merges both. </point>
<point id="3"> OpenAI positions this as moving Codex from “an agent that can write and review code” to “an agent that can do nearly anything developers and professionals can do on a computer.” This is a significant expansion of scope beyond pure coding. </point>
<point id="4"> The model was co-designed for, trained with, and served on NVIDIA GB200 NVL72 systems. OpenAI credits NVIDIA as a partner for this infrastructure. </point>
<point id="5"> GPT-5.3-Codex runs 25% faster than GPT-5.2-Codex, achieved through improvements in OpenAI’s infrastructure and inference stack. This speed improvement applies to Codex users across all interfaces. </point>
</category>

<category name="Benchmark Performance">
<point id="6"> SWE-Bench Pro (Public): GPT-5.3-Codex scores 56.8%, compared to GPT-5.2-Codex at 56.4% and GPT-5.2 at 55.6%. This is a state-of-the-art score, but the improvement over the previous Codex model is only 0.4 percentage points - essentially marginal. SWE-Bench Pro is a more rigorous version of SWE-bench Verified: it spans four programming languages (not just Python), is more contamination-resistant, more challenging, more diverse, and more industry-relevant. All scores were measured at the “xhigh” effort level. </point>
<point id="7"> Terminal-Bench 2.0: GPT-5.3-Codex scores 77.3%, a massive jump from GPT-5.2-Codex at 64.0% and GPT-5.2 at 62.2%. This is a 13.3 percentage point improvement. Terminal-Bench measures the terminal skills a coding agent needs - the ability to navigate filesystems, run commands, pipe outputs, manage processes, and operate in command-line environments. Notably, GPT-5.3-Codex achieves this with fewer tokens than any prior model, meaning it’s not just better but more efficient. This is one of the genuinely impressive benchmark results. </point>
<point id="8"> OSWorld-Verified: GPT-5.3-Codex scores 64.7%, up from GPT-5.2-Codex at 38.2% and GPT-5.2 at 37.9%. This is a 26.5 percentage point improvement - nearly doubling the score. OSWorld is an agentic computer-use benchmark where the agent must complete productivity tasks in a visual desktop computer environment (clicking, typing, navigating GUIs). This is the largest single benchmark jump in the release and represents a genuine step change in computer use capability. </point>
<point id="9"> GDPval (wins or ties): GPT-5.3-Codex scores 70.9%, matching GPT-5.2 at 70.9% (measured at high effort). There is no improvement here - just parity. GDPval is an evaluation OpenAI released in 2025 that measures a model’s performance on well-specified knowledge-work tasks across 44 occupations, including things like making presentations, spreadsheets, and other work products. The scores used custom skills. The fact that the coding-focused model matches the reasoning-focused model on knowledge work is the point - convergence, not advancement. </point>
<point id="10"> Cybersecurity Capture The Flag Challenges: GPT-5.3-Codex scores 77.6%, up from GPT-5.2-Codex at 67.4% and GPT-5.2 at 67.7%. This is a 10.2 percentage point improvement and reflects the model being directly trained to identify software vulnerabilities - a first for an OpenAI model. </point>
<point id="11"> SWE-Lancer IC Diamond: GPT-5.3-Codex scores 81.4%, up from GPT-5.2-Codex at 76.0% and GPT-5.2 at 74.6%. This benchmark measures performance on real-world freelance software engineering tasks, and the 5.4 percentage point improvement represents solid progress on practical, compensated engineering work. </point>
</category>

<category name="Agentic Coding Capabilities">
<point id="12"> GPT-5.3-Codex is designed for long-running tasks that involve research, tool use, and complex execution. Rather than answering single questions or generating single code blocks, it can pursue multi-step technical objectives over extended sessions, maintaining context and making decisions along the way. </point>
<point id="13"> OpenAI positions the model as supporting the entire software lifecycle, not just code generation. This includes debugging, deploying, monitoring, writing PRDs (product requirement documents), editing copy, user research, writing tests, tracking metrics, and more. The idea is that software engineers, designers, product managers, and data scientists do far more than generate code, and this model aims to help with all of it. </point>
<point id="14"> In testing, GPT-5.3-Codex built complex, functional games from scratch over the course of days, iterating autonomously over millions of tokens. Using a “develop web game” skill and generic follow-up prompts like “fix the bug” or “improve the game,” the model iterated on games without specific human guidance. This demonstrates sustained autonomous capability over very long contexts. </point>
<point id="15"> The model’s capabilities extend beyond software development. OpenAI claims it can help build slide decks, analyze data in sheets, and handle other professional knowledge work. This is supported by the GDPval benchmark parity with GPT-5.2, which covers 44 occupations. </point>
</category>

<category name="Web Development and Frontend">
<point id="16"> GPT-5.3-Codex shows improved intent understanding for day-to-day website creation compared to GPT-5.2-Codex. When given simple or underspecified prompts, the model now defaults to sites with more functionality and sensible defaults, producing a stronger starting point. </point>
<point id="17"> OpenAI describes improvements in “aesthetics and compaction” - the model produces more visually polished and production-ready output by default. Examples given include: automatically displaying a yearly plan as a discounted monthly price (making the discount intuitive rather than just showing a yearly total), and creating an auto-transitioning testimonial carousel with three distinct user quotes instead of a single static testimonial. These are the kinds of design decisions a thoughtful human developer would make. </point>
</category>

<category name="Interactive Collaboration">
<point id="18"> GPT-5.3-Codex introduces a new interaction model where users can steer and interact with the agent while it’s actively working, without the agent losing context. Previously, Codex sessions were more of a “hand off a task and wait for results” experience. Now it behaves more like a colleague you can check in with. </point>
<point id="19"> The model provides frequent updates on key decisions and progress as it works, rather than producing a single final output. Users can ask questions in real time, discuss approaches, and redirect the agent toward different solutions mid-task. OpenAI describes it as the model “talking through what it’s doing, responding to feedback, and keeping you in the loop from start to finish.” </point>
<point id="20"> This steering capability is enabled in the Codex app via Settings > General > Follow-up behavior. It is a configurable feature, not on by default. </point>
</category>

<category name="Computer Use Capabilities">
<point id="21"> GPT-5.3-Codex demonstrates what OpenAI calls “far stronger computer use capabilities than previous GPT models.” The OSWorld-Verified benchmark score of 64.7% (up from 38.2%) measures the model’s ability to complete productivity tasks in a visual desktop computer environment - meaning it can navigate GUIs, click buttons, fill forms, and operate applications the way a human would through a screen. </point>
<point id="22"> This represents a convergence with what Anthropic has been doing with Claude’s computer use capabilities. The near-doubling of the OSWorld score suggests GPT-5.3-Codex has made a genuine leap in this area, moving beyond terminal-based operation into visual, GUI-based computer interaction. </point>
</category>

<category name="Self-Bootstrapping and Internal Use">
<point id="23"> OpenAI claims GPT-5.3-Codex is “the first model that was instrumental in creating itself.” The Codex team used early versions of the model to debug its own training, manage its own deployment, and diagnose test results and evaluations. This is framed as a major milestone, though it should be understood as sophisticated internal tooling rather than true recursive self-improvement. </point>
<point id="24"> Specific internal use cases described by OpenAI include: the research team using the model to monitor and debug its own training run, track training patterns, analyze interaction quality, propose fixes, and build applications for researchers to understand behavioral differences between model versions. The engineering team used it to optimize the inference harness, identify context rendering bugs, root-cause low cache hit rates, dynamically scale GPU clusters during launch, and keep latency stable. </point>
<point id="25"> During alpha testing, the model was used to build regex classifiers to estimate user interaction patterns (frequency of clarifications, positive/negative responses, task progress), then ran those classifiers scalably over all session logs and produced analytical reports. A data scientist on the team used it to build new data pipelines and visualizations, with the model co-analyzing results and summarizing key insights over thousands of data points in under three minutes. </point>
<point id="26"> OpenAI researchers describe their job as “fundamentally different from what it was just two months ago” due to Codex acceleration. The company frames these individual use cases as collectively resulting in “powerful acceleration of research, engineering, and product teams.” </point>
</category>

<category name="Cybersecurity">
<point id="27"> GPT-5.3-Codex is the first OpenAI model classified as “High capability” for cybersecurity-related tasks under OpenAI’s Preparedness Framework. It is also the first OpenAI model directly trained to identify software vulnerabilities. These are significant firsts within OpenAI’s own classification system. </point>
<point id="28"> OpenAI states they do not have “definitive evidence” that the model can automate cyber attacks end-to-end, but they are taking a precautionary approach. The cybersecurity safety stack includes: safety training, automated monitoring, trusted access controls for advanced capabilities, and enforcement pipelines incorporating threat intelligence. </point>
<point id="29"> OpenAI is launching “Trusted Access for Cyber,” a pilot program designed to accelerate cyber defense research by providing vetted security researchers with access to the model’s advanced cybersecurity capabilities. </point>
<point id="30"> OpenAI is expanding the private beta of Aardvark, described as a security research agent and the first offering in their planned suite of “Codex Security” products and tools. </point>
<point id="31"> OpenAI is partnering with open-source maintainers to provide free codebase scanning for widely used projects. As an example, a security researcher used Codex to find vulnerabilities in Next.js that were disclosed (CVE-2025-59471 and CVE-2025-59472). </point>
<point id="32"> OpenAI is committing $10 million in API credits to accelerate cyber defense, building on their $1 million Cybersecurity Grant Program launched in 2023. The focus is on open source software and critical infrastructure systems, with organizations able to apply through the Cybersecurity Grant Program. </point>
</category>

<category name="Availability and Access">
<point id="33"> GPT-5.3-Codex is available now with paid ChatGPT plans. It can be accessed everywhere Codex is available: the Codex app, CLI (command-line interface), IDE extension, and web interface. </point>
<point id="34"> API access is NOT currently available. OpenAI states they are “working to safely enable API access soon.” This is a significant limitation for developers who want to integrate the model into their own applications or workflows. No timeline is provided. </point>
<point id="35"> No specific pricing details are provided beyond “paid ChatGPT plans.” It is unclear whether this means all paid tiers (Plus, Pro, Team, Enterprise) or a subset. No model string is provided, which tracks with the lack of API availability. </point>
</category>
</gpt_5_3_codex_reference>
<important_notes_for_ai>
Do NOT fabricate or invent user context. If you don’t know something about the user, either skip the personalization for that point or acknowledge the limitation.
Do NOT skip categories or points. Cover everything in the reference material.
When a point has limited relevance to the user, say so honestly rather than forcing a connection.
Be direct about what’s incremental versus what’s genuinely impressive. The SWE-Bench Pro improvement of 0.4 percentage points is marginal. The OSWorld improvement of 26.5 percentage points is massive. Say both clearly.
Avoid hype language. Write like a trusted colleague, not a press release.
The “Practical Takeaways for You” section is the highest-priority output. Put extra thought into making it specific, actionable, and honest.
If the user has no coding workflow or software development context, acknowledge that this release is primarily developer-focused and focus on the broader implications (computer use, knowledge work, cybersecurity ecosystem). </important_notes_for_ai>
</briefing_prompt>
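One last practical tip. If you save the prompt to a file before running it, a partial copy is an easy mistake with something this long. Here’s a tiny sanity check - my own addition, assuming the filename briefing_prompt.txt - that verifies your copy grabbed all 35 reference points and the closing tag:

```python
# Sanity-check a saved copy of the prompt before running it.
text = open("briefing_prompt.txt").read()
assert text.count("<point id=") == 35, "copy is missing reference points"
assert "</briefing_prompt>" in text, "copy is truncated - grab the closing tag"
print("Prompt looks complete.")
```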

