Hello fellow keepers of numbers,

2026 is going to be a wild ride. We’re getting model releases and new apps at a furious pace. This week, Anthropic launched Sonnet 4.6. It’s an update to Sonnet 4.5, and its performance now competes with the older Opus 4.5 model at a lower price and with higher usage limits. Google also launched Gemini 3.1 Pro, which competes with Opus 4.6 (the current industry leader) in most categories.

OpenAI also launched the Codex app a few weeks ago, so we take a look at the Codex AI model, the app, and how it compares to Claude Code and Claude Cowork. Plus, I tested Google’s new music model for fun, and the results were as bad as you’d expect (though that might’ve been bad prompting).

THE LATEST

Anthropic launches Sonnet 4.6 and upgrades Claude in Excel

Source: Gemini Nano Banana Pro / The AI Accountant

Anthropic launched Claude Sonnet 4.6 as its most capable mid-tier model, with upgrades in reasoning and coding. It also offers significant improvements in computer use, where the model operates software by clicking and typing. Sonnet 4.6 comes with an option for a 1M context window in beta.

Sonnet 4.6 keeps the same pricing as Sonnet 4.5 at $3 per million input tokens and $15 per million output tokens. It’s now the default model for Free and Pro users on Claude.ai and is available through the Anthropic API and major cloud platforms like AWS Bedrock and Google Vertex AI.
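For a sense of what those rates mean in practice, here’s a quick back-of-the-envelope sketch in Python using the quoted prices ($3 per million input tokens, $15 per million output tokens). The token counts in the example are made-up numbers for illustration, not real measurements.

```python
# Rough cost estimate for Sonnet 4.6 API calls, using the prices quoted above.
INPUT_PRICE_PER_M = 3.00    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single API call."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical example: feeding in a ~50k-token workbook and
# getting back a ~2k-token summary.
cost = estimate_cost(50_000, 2_000)
print(f"${cost:.2f}")  # 50k × $3/M + 2k × $15/M = $0.15 + $0.03 = $0.18
```

The takeaway: even a fairly large spreadsheet analysis costs pennies per call at these rates, which is why the API price point matters for anyone automating recurring work.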

Anthropic also expanded its Excel integration through the Claude in Excel add-in. It now supports MCP connectors, so Claude can pull live data into a workbook without leaving Excel, using Anthropic’s connector framework to respect existing permissions and access controls. This is available on Pro, Max, Team, and Enterprise plans.

Why it’s important for us:

Model launches are coming fast and furious now. This is a meaningful update for Claude users. Sonnet 4.6 is roughly equivalent to the older Opus 4.5, but it eats up far less of your usage limits and is much cheaper in the API.

If you’re a Max plan subscriber, you should probably continue to use Opus 4.6. But if you find your usage is getting consumed quickly (a common complaint with Opus 4.6) or you’re looking to complete a simple task, Sonnet 4.6 might be the better option.

The update for Claude in Excel is once again buried in the news announcement, but it feels massively important. I’ve yet to test this, but I do see the connectors. Notably, it’s only MCP connectors, so it looks like Google Drive, Gmail, SharePoint, and Outlook aren’t currently included. But I can see a world where that data becomes available directly in Excel through the Claude plug-in. In the meantime, you can pull information from other financial sources, including custom MCP connectors you create in Claude. As the connectors library in Claude grows, this will become increasingly useful.

Also, a quick note on the 1M context window… This is really big news that’ll fly under the radar. Right now, model quality degrades significantly when a conversation surpasses the 200k-token context window limit and the model has to compact the conversation. The jump to a 1M context window could be a major improvement. Unfortunately, it’s only available in the API right now, and you pay extra for usage beyond the 200k-token context window (instead of it being included under your current plan’s usage). So TBD on how good this will be until it becomes more readily available.

Google releases Gemini 3.1 Pro

Source: Google / Gemini 3.1 Pro: A smarter model for your most complex tasks

Google released Gemini 3.1 Pro, its latest flagship AI model built for complex reasoning and multi-step problem solving. The update incorporates the "Deep Think" capabilities Google introduced last week and marks the first time the company has used a .1 version increment instead of its usual .5 naming, signaling a faster release cadence.

The benchmark numbers are strong. Some benchmarks even report twice the score of Gemini 3, which was released just three months ago. Others report numbers similar to those of Opus 4.6, the current leader in most categories.

The model supports a 1M token context window with up to 64,000 tokens of output. It’s rolling out now in the Gemini app, NotebookLM, Google AI Studio, Vertex AI, Gemini Enterprise, Gemini CLI, and Android Studio. API pricing stays the same as Gemini 3 Pro, so developers get the upgrade at no additional cost.

Why it’s important for us:

I repeat, model launches are coming fast and furious. This is an eye-opening release. The benchmark numbers are quite impressive. However, I’ve said many times before that I personally put no weight on benchmarks. I prefer to use the model and form my own opinions based on communication style, output quality, and speed.

I’ve yet to run any tests on Gemini 3.1 Pro since it just launched, but I expect it to be an extremely smart model. I suspect it’s going to have a significant impact on anyone using it to code. Unfortunately, Google doesn’t yet have a strong competitor to Claude Code, Claude Cowork, or Codex (see the Codex app news).

OpenAI’s new Codex Mac app competes with Claude Code and Cowork

A few weeks ago, OpenAI introduced the Codex desktop app for macOS, a new interface for running and supervising multiple AI agents across projects. The app organizes work into projects and threads so you can run agents in parallel and switch between tasks without losing context.

Codex now supports agent skills, which Anthropic open-sourced two months ago. Agent skills are reusable bundles of instructions, resources, and scripts that tell agents how to run specific workflows, like generating spreadsheets, editing PDFs, deploying web apps, or pulling design assets from tools like Figma.

The app also adds “Automations,” which are scheduled jobs that let Codex run recurring tasks in the background, such as issue triage, test failure summaries, or daily reports, and place the results in a review queue for a human to approve.

Codex uses system-level sandboxing, so agents are restricted to a specific folder or branch. Any commands that need elevated permissions, like broader file access or network calls, require explicit approval or pre-approved project rules. The app is available now on macOS for ChatGPT Plus, Pro, Business, Enterprise, and Edu subscribers. Windows support is planned.

Why it’s important for us:

I’ve been sitting on this news for a couple of weeks because I wanted to properly test the app before providing my analysis. I’m going to break this down into a few different pieces to make it easier to understand. Admittedly, OpenAI has done a very poor job of making this simple for us… Buckle up for this one.

(1) What is Codex?

Codex currently means two things. First, it can mean OpenAI’s state-of-the-art coding model, currently called GPT-5.3-Codex. Their Codex AI model has historically been targeted at software developers and used almost entirely for coding. Now, Codex can also mean their new macOS application, called Codex.

(2) What is GPT-5.3-Codex, and why don’t I see it in my ChatGPT account?

GPT-5.3-Codex is OpenAI’s most powerful model. It’s their agentic model, now intended to compete with Opus 4.6. It seems as though OpenAI has recalibrated. They’ve likely seen the success of Claude Code and Claude Cowork over the last several months and decided that GPT-5.3-Codex needs to be more widely available. Hence, the Codex app.

“Why don’t I see GPT-5.3-Codex as an option in my ChatGPT account?” Hey friend, great question. Actually, OpenAI made the (seemingly) conscious decision to hide the GPT-5.3-Codex model behind a separate chatbot entirely. If you’re in ChatGPT, you’ll see a “Codex” icon on the left sidebar. If you click that, it’ll redirect you to another ChatGPT chatbot (aka GPT-5.3-Codex).

You’ll notice I didn’t really answer the question ‘why.’ I can’t answer for OpenAI’s stupidity. Sorry.

(3) How is the Codex app different from Claude Code and Claude Cowork?

Personally, I find the Codex app to be somewhat of a mix of the two. It’s not as simple as Claude Cowork (Codex is still built for developers). But the interface is a bit easier on the eyes than Claude Code (especially in the terminal).

Under the hood, the Codex app is essentially the same as Claude Code, except it uses GPT-5.3-Codex instead of Opus 4.6. In my opinion, the most meaningful difference is how OpenAI has chosen to package agent skills. It’s really simple to search for available skills and add them to your workspace, which is just a nice quality-of-life feature. It’s also easy to find and use the skills in the chat.

(4) Is it good?

I’ve been testing this on and off over the last few weeks, comparing it to tasks I’ve completed in Claude Code. The GPT-5.3-Codex model is really smart and quite comparable to Opus 4.6, which means it often completes my tasks with nearly the same quality and consistency as Claude Code.

I have two main complaints so far. (a) It asks a million questions before completing tasks. In one case, it asked me 9 clarifying questions before it started work. By question 7, I told it to “just f*****g do it.” Honestly not sure if this is a bug or feature. (b) I really don’t like the communication style of the model. It’s verbose and often extremely technical. I find I spend a lot of time just trying to understand what it’s telling me, whereas Claude Code is concise and simple almost every time. I realize this is somewhat personal preference.

The huge benefit of the Codex app right now is that the limits are far more generous. You get significantly more usage from the $20/month ChatGPT plan in the Codex app than you do from the $20/month Claude plan.

(5) Why should I care?

I say all this because so many people think ChatGPT isn’t even playing in the same ballpark as Claude right now. OpenAI is making strides here. They’ve seen the reception of Claude Code and Claude Cowork. They’re going to be in the game. They’re just behind right now, especially for non-developers. That’s how good Claude Cowork is at the moment.

PUT IT TO WORK

Anthropic expanded Claude Cowork to Windows last week. I thought it would be useful to cover the main differences between regular AI chats and something like Claude Code or Claude Cowork.

While the conversation is around Anthropic’s products, the same underlying fundamentals apply across AI models and apps. I think this will be important to understand moving forward because this is the direction the AI landscape is shifting.

WEEKLY RANDOM

Google added music generation to the Gemini app this week. It currently gives you a 30-second track with vocals, lyrics, and “album” cover art.

Now, I’m not one to speak to how to best prompt AI models for songs or videos. But the output I got from this was pretty hilariously awful. I know there are some popular AI music apps, like Suno, that have pretty good AI music models. I think my expectations might’ve been a bit too high for the new Google Lyria model…

I did at least have a lot of fun, even though I probably burned 30 minutes asking for ridiculous music. Below are a few results from the model. I’m fully aware how awful they are. All I did was put in the prompts below. I requested no edits. Now that you’ve been warned that your ears may bleed, take a listen.

Prompt:

i want the linkin park style of song. something that has the undertones of in the end. make it a song about a penguin who, against all odds, figures out how to fly.

Prompt:

create a rap in the same style of song as eminem. just like the song lose yourself. this is the chorus of the rap. it's about a lost puppy who’s 8 miles from his home in detroit. he's got one shot, one opportunity to get back home. he needs to capture it, otherwise he'll let it slip.

I can’t blame you if you didn’t listen. But for those that did, it’s obvious the model is pulling lyrics from the actual songs I referenced, even though it can’t replicate them outright due to copyright. In my Linkin Park song, it says, “I tried so hard and got so far, but in the end it doesn’t matter. I had to fall…” Almost a word-for-word rip of lyrics from “In the End.”

In my Eminem song, it uses the phrase “my feet don’t fail me now,” which I guess is close to “feet fail me not.”

So what’s the point? I don’t know. I guess it’s fun, and I also like the idea that Google might get sued because of something an idiot like me asked it to make about a penguin who learned how to fly.

The model kind of sucks though.

Until next week, keep protecting those numbers.

Preston
