Hello fellow keepers of numbers,
I keep waiting for a slow news week. I’m starting to think it won’t happen… This week had a bit of everything. New models, some accounting-related AI guidance, and some serious drama.
Yesterday, OpenAI launched GPT-5.4. We’re now up to 6 (!) different model releases by OpenAI since November 2025. For those interested: GPT-5.1, GPT-5.2, GPT-5.3-Codex, GPT-5.3-Codex-Spark, GPT-5.3 Instant, and GPT-5.4.
Google launched Gemini 3.1 Flash-Lite, which is a cheaper version of their flagship model. COSO released an audit-ready internal control framework for AI. And Anthropic has been designated a supply chain risk by the Department of War.
On a lighter note, I also walk through the Codex app, including a comparison to Claude Cowork. I’ve included a fun hack to bring over Claude Cowork’s free skills.
THE LATEST
OpenAI’s GPT-5.4 brings 1M-token context and desktop control

Source: Gemini Nano Banana 2 / The AI Accountant
OpenAI announced GPT-5.4 on March 5 as its new flagship “frontier” model for professional work across ChatGPT, the API, and the Codex app. GPT-5.4 introduces a 1M-token context window, which lets the model read and reason over very large inputs like full document collections, long email threads, or multi-tab spreadsheets in a single run instead of chunking them into smaller pieces.
The release also includes “computer-use” as a native capability, meaning agents powered by GPT-5.4 can control a computer. OpenAI is pairing this with new tooling like Tool Search, which dynamically loads only the tools an agent needs.
GPT-5.4 comes in multiple variants, including GPT-5.4 Pro, which is designed to handle the hardest problems. GPT-5.4 API pricing is $2.50 per million input tokens and $15 per million output tokens for the base model.
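For back-of-the-envelope budgeting, those per-token rates are easy to turn into a cost estimate. A minimal sketch, using the published base-model rates; the token counts in the example are illustrative assumptions, not measurements:

```python
# Rough API cost estimate at GPT-5.4 base-model rates
# ($2.50 per 1M input tokens, $15 per 1M output tokens).
INPUT_RATE = 2.50 / 1_000_000    # dollars per input token
OUTPUT_RATE = 15.00 / 1_000_000  # dollars per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated dollar cost of one API call."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: filling the full 1M-token context and getting a 5K-token answer
# costs roughly $2.58 per call.
print(f"${estimate_cost(1_000_000, 5_000):.2f}")
```

At these rates, one maxed-out context window runs about $2.50 in input alone, so the 1M-token window is powerful but not something to fill casually in high-volume automations.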
Why it’s important for us:
The speed of model releases is eye-opening. Both OpenAI and Anthropic have said their AI models are now creating future AI models. The pace of releases, each with impressive new capabilities, is strong evidence of that.
This model is comparable to, if not better than, Claude Opus 4.6. I’ve said this often recently, but accountants aren’t pushing the limits of these models. Not even close. So it’s tough to tell a difference between state-of-the-art Claude and ChatGPT models. A lot of it comes down to the communication style of the model. And as we’ve seen in recent weeks, a big decision point for people is the user interface for the agentic AI, like Claude Cowork.
I should also note the jump to a 1M-token context window. This is a big deal for us. It means we can upload more files, more context, and more instructions to the AI model without degrading its performance.
Personally, I still don’t really like the communication style of ChatGPT, even with early testing of GPT-5.4. I find my interactions with Claude more enjoyable. But that varies by user. As you’ll see below in the Put it to Work section, OpenAI has a (slightly less friendly-looking) competitor to Claude Cowork, so there are legitimate options.
Anthropic officially designated a supply chain risk by Department of War

Source: Anthropic / Where things stand with the Department of War
The Department of War issued a formal letter to Anthropic on March 4 designating the company a supply chain risk to U.S. national security under 10 USC 3252. The designation restricts Anthropic’s Claude models from being used as a direct part of Department of War contracts, though it does not affect Claude usage outside of those contracts.
The designation stems from two narrow usage exceptions that Anthropic maintains for its models: a prohibition on use in fully autonomous weapons and in mass domestic surveillance of Americans. Anthropic says the two sides had been in productive conversations about alternative arrangements before the formal designation was issued.
Anthropic CEO, Dario Amodei, said the company plans to challenge the designation in court, citing the statute's requirement that the government use the "least restrictive means necessary." In the interim, Anthropic said it will continue to offer its models to the Department of War and the national security community at nominal cost with dedicated engineering support.
Why it’s important for us:
This is the first time in the history of the United States that an American company has been designated a supply chain risk.
Not only is this an irrational punishment of Anthropic, but it’s also going to punish every company that has a contract with the Department of War and uses Claude. Now, any company using Claude will have to switch to another AI model for work on contracts with the Department of War and switch back to Claude for other contracts. The daily workflow just got very convoluted.
It also has a direct impact on national security. The U.S. military has been using Claude for a long while now. It has reportedly used Claude in the Iran war, even as recently as March 1.
Anthropic is being designated a supply chain risk for insisting on redlines that OpenAI has publicly supported. The same OpenAI that the Department of War partnered with just hours after the Anthropic negotiations fell apart. Now OpenAI is amending its own contract to secure the very redlines that got Anthropic the designation.
There will be no impact on accountants unless you’re contracted with the Department of War, in which case you cannot use Claude for anything related to that contract. You can still use Claude for all other work. It’s not immediately clear how big an impact this will have on Anthropic’s business, or whether the designation will hold up in court.
Google’s new Gemini Flash-Lite is the cheap, fast workhorse

Source: Google / Gemini 3.1 Flash-Lite: Built for intelligence at scale
Google announced Gemini 3.1 Flash-Lite, a new model in the Gemini 3 family designed as its fastest and most cost-efficient option, now rolling out in preview via the Gemini API in Google AI Studio and for enterprises through Vertex AI. It is priced at $0.25 per 1M input tokens and $1.50 per 1M output tokens. Google says it outperforms Gemini 2.5 Flash with roughly 2.5x faster time-to-first-token and 45% faster output speed while maintaining similar or better quality.
Flash-Lite is a multimodal model that can take in text, images, audio, video, and PDFs and produce text responses. It supports a 1M-token input context window and up to 64,000 output tokens, meaning it can process hundreds of pages of content or large batches of documents in a single request.
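The gap between those rates and flagship pricing is what makes batch document work attractive. A minimal sketch of the math; the per-document token counts are illustrative assumptions:

```python
# Rough cost of processing a batch of documents with Gemini 3.1 Flash-Lite
# ($0.25 per 1M input tokens, $1.50 per 1M output tokens).
IN_RATE = 0.25 / 1_000_000   # dollars per input token
OUT_RATE = 1.50 / 1_000_000  # dollars per output token

def batch_cost(docs: int, in_tokens_per_doc: int, out_tokens_per_doc: int) -> float:
    """Estimated dollar cost of running one prompt over `docs` documents."""
    return docs * (in_tokens_per_doc * IN_RATE + out_tokens_per_doc * OUT_RATE)

# Example: 1,000 invoices at ~2,000 input tokens and ~200 output tokens each
# comes to roughly $0.80 for the whole batch.
print(f"${batch_cost(1_000, 2_000, 200):.2f}")
```

That pricing is why a model like this ends up as the workhorse for high-volume extraction and classification jobs, with the flagship models reserved for the hard judgment calls.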
Why it’s important for us:
Gemini 3 and 3.1 Pro have flown a bit under the radar, but they’re up there with the best available models right now. Gemini 3.1 Flash-Lite is an extremely cheap and fast version of those same models.
This is most relevant for anyone using APIs for automations. Past Gemini models weren’t the best at tool calling (i.e., using a tool for web searches, searching folders in Google Drive or SharePoint, etc.). This’ll be something worth testing with the new Gemini 3.1 Flash-Lite model.
Overall, this shouldn’t be expected to compete with the smartest models. It’s not made for that. But it’s probably the most affordable smart model available via API.
COSO drops audit-ready GenAI controls playbook

Source: Gemini Nano Banana 2 / The AI Accountant
COSO released a new guide called Achieving Effective Internal Control Over Generative AI that applies its existing internal control framework to AI. The guide is designed for management, financial reporting teams, compliance, IT governance, internal audit, and external auditors.
Rather than targeting specific AI tools, the guide organizes AI use cases into eight categories based on what the AI is doing: ingesting data, transforming data, posting and processing transactions, orchestrating workflows, making judgments and forecasts, monitoring, tracking regulatory changes, and managing human-AI interaction. Each category gets its own set of controls and risk considerations mapped to the five COSO components.
Why it’s important for us:
It’s nice to see any sort of guidance at all in the accounting space related to AI. Most of our regulatory bodies and reputable sources have been silent on the topic, which makes it very tough for firms to understand when and how to use AI.
Most firms don’t have formal controls around AI right now. This guide gives a starting point that’s already mapped to a framework that auditors accept.
Ultimately, the guidance isn’t that different from the other controls around the financials. Log what goes in and comes out. Know who’s responsible for reviewing the work. Test if it’s doing what you expect. And keep track of when you change the model, the prompts, and/or the data you’re feeding it.
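Those four habits can be captured in something as simple as an append-only log. Here is a minimal sketch of what that might look like; the field names and CSV format are my own assumptions for illustration, not taken from the COSO guide:

```python
# Minimal AI usage audit log: record what went in and out, who reviewed it,
# and which model/prompt version was in use. Hashing the input and output
# lets you prove exactly what was processed without storing sensitive text
# in the log itself.
import csv
import hashlib
from datetime import datetime, timezone

LOG_FILE = "ai_audit_log.csv"
FIELDS = ["timestamp", "model_version", "prompt_version",
          "input_hash", "output_hash", "reviewer"]

def log_ai_call(model_version, prompt_version, prompt, response, reviewer):
    """Append one row describing a single AI interaction."""
    row = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "prompt_version": prompt_version,
        "input_hash": hashlib.sha256(prompt.encode()).hexdigest(),
        "output_hash": hashlib.sha256(response.encode()).hexdigest(),
        "reviewer": reviewer,
    }
    with open(LOG_FILE, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if f.tell() == 0:  # write the header only on the first row
            writer.writeheader()
        writer.writerow(row)
    return row
```

Even something this basic covers most of what an auditor will ask for: what the model saw, what it produced, who signed off, and when the model or prompt changed.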
PUT IT TO WORK
The Codex app recently launched for Windows. I covered the app in my newsletter a few weeks ago.
In this video, I go through the basics of the Codex app as compared to Claude Cowork. While the Codex app isn’t wrapped in as simple an interface, you can still accomplish similar tasks.

WEEKLY RANDOM
Anthropic and OpenAI are in quite the catfight, and it’s more personal now than ever. The fallout from the last week has been interesting to follow.
If you’re unfamiliar with the news, it was covered pretty well in this post from The Neuron.
The short version: Anthropic refused to let the military use Claude for domestic mass surveillance of Americans or fully autonomous weapons, the Pentagon cut off negotiations, and OpenAI struck a deal hours later.
Turns out most people don’t want to be surveilled. ChatGPT mobile uninstalls jumped 295% the day after the deal, and 1-star reviews increased 775%. Claude went from #42 to the #1 free app on both the Apple App Store and Google Play. It surpassed ChatGPT in U.S. daily downloads for the first time.
It wasn’t just users. OpenAI’s own employees expressed support for Anthropic. Some employees also decided to leave OpenAI shortly after the news, including a VP of Research.
Just days after the deal, Sam Altman admitted the deal “looked opportunistic and sloppy” and went back to amend the contract. I’m not sure where the renegotiations currently stand. Supposedly, Anthropic is already back in negotiations with the Pentagon.
There’s definitely bad blood between Anthropic and OpenAI. It’s certainly not getting any better after this. I’ll just continue to stand here and judge each of them from my moral high ground.
Until next week, keep protecting those numbers.
Preston
