Hello fellow keepers of numbers,

It’s about time OpenAI joined the launch party. Anthropic was probably getting a little lonely. Anthropic and OpenAI are having their launch parties at Dave & Buster’s (iykyk), while Google and Microsoft are hosting theirs at Chuck E. Cheese. Used to be cool, but now it’s just sad.

Finally, we got OpenAI’s new model, GPT-5.5. And we got their new image model and workspace agents. Fun week for the people who still have that ChatGPT subscription, which might not be very many at this point.

We also finally got Frontier program access to Copilot Cowork (or will shortly as it’s rolling out). Plus, a demo of financial statement reporting comparing Claude Code via the desktop app with the Codex app (Opus 4.7 vs GPT-5.5, respectively).

THE LATEST

OpenAI launches GPT-5.5

Source: X post by @d4m1n

OpenAI launched GPT-5.5 alongside a higher-accuracy variant called GPT-5.5 Pro. The company describes GPT-5.5 as its most intuitive model yet, built specifically for long-running, multi-step work on a computer.

OpenAI positions GPT-5.5 as a step up in agentic coding, computer use, knowledge work, and early scientific research. The company says the model delivers higher-level intelligence without sacrificing speed compared to GPT-5.4, and uses fewer tokens to get the same work done.

OpenAI is treating the model's biological, chemical, and cybersecurity capabilities as High under its Preparedness Framework and says the release includes its strongest set of safeguards to date.

GPT-5.5 is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex. In Codex, the model comes with a 400K context window, with a Fast mode that generates tokens 1.5x faster at 2.5x the cost. API access will follow soon, with GPT-5.5 priced at $5 per million input tokens and $30 per million output tokens, and GPT-5.5 Pro priced at $30 per million input and $180 per million output.
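For anyone budgeting API usage, the per-token prices above translate into job costs pretty directly. Here's a minimal sketch using only the quoted GPT-5.5 and GPT-5.5 Pro rates; the token counts for the example job are hypothetical round numbers I picked for illustration, not anything OpenAI published.

```python
# Cost sketch at the quoted GPT-5.5 API rates (USD per million tokens).
# Prices are from the announcement; the job sizes below are hypothetical.

PRICES = {  # model -> (input rate, output rate), USD per 1M tokens
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.5-pro": (30.00, 180.00),
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one job at the quoted per-million-token rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate

# Hypothetical job: 200K tokens in (say, a big workpaper bundle), 20K tokens out.
base = job_cost("gpt-5.5", 200_000, 20_000)      # 1.00 + 0.60 = $1.60
pro = job_cost("gpt-5.5-pro", 200_000, 20_000)   # 6.00 + 3.60 = $9.60
print(f"GPT-5.5: ${base:.2f}  GPT-5.5 Pro: ${pro:.2f}")
```

Rough math like this is worth doing before committing to Pro: on the same workload, Pro runs 6x the price, so it only pays off where the extra accuracy genuinely matters.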

Why it’s important for us:

It's been a while since OpenAI dropped something that felt like a real leap. GPT-5.5 could be it. That said, I'm not buying the hype that this is suddenly the best model in the world. Based on the benchmarks, it's slightly ahead of Opus 4.7 in some areas and slightly behind in others.

It’s great news that ChatGPT feels competitive again. Lately it hadn't.

Benchmarks don't really matter for the rest of us, though. What matters is what it feels like to work with it. For me, the two things that count most are communication style and day-to-day file work. Excel, Word, financial statements, working with Google Drive or SharePoint, formatting, etc.

Claude has been really, really good at that stuff. ChatGPT has been pretty average. I'm curious if 5.5 has actually closed that gap or just moved some benchmark numbers around.

Worth noting: GPT-5.5 is now more expensive than Opus 4.7 in the API, and there's no GPT-5.5 mini yet (no Sonnet equivalent). The hope with smarter models is they get to the right answer faster, so the effective cost is actually lower. TBD.

In a regular chat, Opus 4.7 and GPT-5.5 probably feel pretty close. The real fight right now is Cowork / Claude Code vs the Codex app. If the models are genuinely competitive, Codex becomes a real contender. Cowork is still the easiest for people who don't live in a terminal, but the interfaces are simplifying fast enough that we're getting close to a world where living and working in these apps every day makes sense. Check out my demo below for some visuals.

OpenAI releases workspace agents in ChatGPT

OpenAI released workspace agents in ChatGPT, an evolution of GPTs that lets teams build shared agents for complex, long-running workflows. The agents run in the cloud, have access to files, tools, and memory, and operate inside the permissions and controls set by each organization.

Users build an agent by describing a workflow or dropping in a file, and ChatGPT walks them through defining steps, connecting tools, and testing. Agents can be scheduled, deployed in Slack, or invoked on demand, and they retain memory across runs. Templates are available for finance, sales, marketing, and other functions.

OpenAI says its own teams use workspace agents internally. The company's accounting team built one that prepares key parts of the month-end close, including journal entries, balance sheet reconciliations, and variance analysis, and generates workpapers with the underlying inputs and control totals.

Admins can control which tools and actions are available to user groups, require approval before sensitive actions, and manage who can build or share agents. OpenAI says built-in safeguards help agents resist prompt injection from misleading external content, and the Compliance API gives admins visibility into every agent's configuration and runs.

Workspace agents are available in research preview for ChatGPT Business, Enterprise, Edu, and Teachers plans, and will be free until May 6 when credit-based pricing begins. OpenAI says more is coming, including trigger-based starts, expanded tool integrations, and support in the Codex app.

Why it’s important for us:

Workspace agents seem very similar to Claude Managed Agents. Anthropic’s launch a few weeks ago got buried under Mythos, Opus 4.7, and Claude Routines. The OpenAI release seems much more polished and well thought out.

Claude Managed Agents live in the Anthropic console and feel disconnected from the rest of the Claude experience. You have to go to a dashboard that sits outside of the tools you and the team use day-to-day, which seems like a barrier for most firms.

OpenAI’s workspace agents live inside ChatGPT (and soon the Codex app too). They use ChatGPT connectors and apps to reach external tools. It just feels much more approachable for a firm to build a workspace agent right now than to use Claude Managed Agents.

If you’re familiar with Notion’s Custom Agents, this should look very similar. Right now, we’re trending towards this type of model for agents that do end-to-end tasks for your firm.

The big question mark is the credit-based pricing. It feels like a necessary evil for OpenAI, Notion, and others. Running agents will likely cost significant money and compute, so they need some form of variable pricing for it to be sustainable. But it’s going to be a huge black box for firms starting out with it.

A firm will need to run an agent for at least a few months before it has a real picture of the cost and can plan and budget appropriately. Managing each agent’s context, specific model (cheap and fast vs expensive and powerful), and scope will become a legitimate skill.

For firms looking at this, it’ll come down to what you want out of it. If you’re chasing short-term productivity, you’re probably better off having your team use Claude Cowork, Claude Code, or Codex. If you’re playing longer term and willing to invest, this could be a valuable approach for scaling a firm and becoming more profitable.

TRENDING NEWS

Microsoft expanded Copilot Cowork into its Frontier early-access program: It took much longer to reach the Frontier program than Microsoft initially suggested, which is slightly concerning. But this is a very exciting announcement. TBD on how successful it is, because there are some glaring differences compared to Claude Cowork, the biggest of which seems to be the lack of customization (MCPs, custom instructions, etc.).

Digits released an MCP server to connect AI tools directly to live ledger data: Another accounting vendor launching its MCP server for use with tools like Claude Cowork. After the same news from Intuit and Xero, this was a necessary step, and it's a great move.

Anthropic added interactive visuals to Claude Cowork: This has been available for several weeks in Claude Chat, and I've been loving it. Now you can create interactive charts within your Cowork tasks, where it already has access to all your files and context. This is an underrated update.

OpenAI launched ChatGPT Images 2.0: This is almost certainly the new state-of-the-art image model, overtaking Google's Nano Banana Pro. The examples I'm seeing from people are pretty mind-blowing.

Google introduced Workspace Intelligence, which uses Gemini to pull context across all your Google apps: This is the equivalent of Microsoft's Work IQ. Essentially, the large companies that have your data spanning a lot of apps (such as Drive, Gmail, Docs, and more) are automatically compiling that info to "train" the models used in your account. Sounds cool, but there are probably going to be some rough edges in practice.

Meta rolled out MCI (Model Capability Initiative), an internal tool that records keystrokes, mouse clicks, and screenshots of employees across hundreds of apps to train AI agents: Ohhhh boy... This one is really interesting. On one hand, there are some real benefits to this. Think of tools like Laurel that do something similar to help accountants capture their time and analyze efficiency, manual work, etc. But the framing here matters. You don't come right out and say it's to train AI models so they can eventually do the humans' work, even if that part feels obvious. But Zuck is gonna do Zuck things, I guess.

SpaceXAI secured an option to acquire AI coding company Cursor for $60B later this year: Cursor has been working on their own* models for a few years now (asterisk: they really just fine-tune other open source models to create powerful and fast coding models). Elon and SpaceXAI, the new combination of SpaceX and xAI, are now almost certainly going to acquire them to bring that power under Elon's umbrella of assets. Cursor is probably the most popular AI IDE right now, so it'll be interesting to follow if anything changes with the product.

Microsoft plans OpenClaw-inspired agent features for Copilot: This isn't shocking, but still felt interesting. Every AI company seems to be chasing the OpenClaw hype now. Essentially, they're just saying they want to put powerful AI agents in your pocket so you can have AI work for you from anywhere.

PUT IT TO WORK

Recently, Anthropic released a new design for the Claude desktop app. Claude Code in the desktop app looks a little cleaner. OpenAI also released some new updates for their Codex app.

It’s come to my attention recently that people actually still pay for ChatGPT, so I’m going to cover them here.

In all seriousness, there are some great things about GPT-5.5 and the Codex app. So I wanted to compare Codex and Claude Code performance on financial statement reporting and analysis.

I also wanted to show how simple and unintimidating the two interfaces can be for normal tasks. I think a lot of people would have you believe you need to work in a terminal and know scary commands to use them. If a dummy like me can learn them, you can too.

WEEKLY RANDOM

Over the weekend, a Chinese humanoid robot ran a half-marathon in Beijing. Well, actually a lot of them attempted a half-marathon. But one in particular broke the human record for fastest half-marathon. It’s aptly named Lightning.

Apparently, we’re entering the stage where we need “human records” and “robot records.” The Guinness Book of Human World Records will be a fun read from the ancient archives someday.

Is it even impressive for a robot to break a half-marathon record? My car doesn’t get tired when it drives 100 miles… And while we’re on cars, can a driverless Waymo enter this competition and smash Lightning’s world record? It’s still a robot if it’s not human-driven, I suppose. Does it have to have legs?

That being said, the year-over-year improvement is impressive. Apparently, it finished the same race in 2 hours 40 minutes last year, versus 50 minutes this year.

China seems to be alarmingly ahead of us in the robot race. They’re much further ahead of the US in the robot race than we are in the AI model race. They’re going to ship all their robots over here to steal our jobs while we’re still trying to figure out how to get Microsoft Copilot to find the right file in SharePoint.

Until next week, keep protecting those numbers.

Preston

Keep Reading