Hello fellow keepers of numbers,

Some major news this week on several fronts. Anthropic released Opus 4.8, its new state-of-the-art model. At least until Anthropic releases Mythos in a few weeks... Byron also officially launched AI agents to prepare 1065s and 1120s. And OpenAI teamed up with Crete to build self-improving agents for tax prep.

Plus, stick around for a demo comparing Opus 4.8 and Sonnet 4.6 on building a slide deck for a year-end financial review.

THE LATEST

Anthropic releases Claude Opus 4.8

Source: Anthropic / Introducing Claude Opus 4.8

Anthropic released Claude Opus 4.8, an upgrade to its flagship model focused on more reliable code and better judgment when it acts on its own.

Anthropic says the model is roughly four times less likely than Opus 4.7 to let flaws in its own code slip through, and more likely to flag uncertainty than to state something it can't back up. It also shows a clear step up in controlling a browser and completing multi-step web tasks, outperforming both Opus 4.7 and GPT-5.5.

A new effort control in claude.ai and Cowork lets users trade speed for quality, spending more tokens on the harder problems.

Standard pricing stays the same as Opus 4.7 at $5 per million input tokens and $25 per million output tokens. Anthropic also cut fast-mode pricing to $10 and $50 per million, about three times cheaper than before. Opus 4.8 is available now across every plan from consumer to enterprise, and through the API as claude-opus-4-8.
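If you're trying to translate per-million-token pricing into something you can budget, here's a rough Python sketch. The rates are the ones quoted above; the token counts in the example are hypothetical, and it assumes simple linear pricing with no caching, batch, or other discounts.

```python
# Per-million-token rates quoted in the Opus 4.8 announcement (USD).
RATES = {
    "standard": {"input": 5.00, "output": 25.00},
    "fast": {"input": 10.00, "output": 50.00},
}


def request_cost(input_tokens: int, output_tokens: int, mode: str = "standard") -> float:
    """Estimate the USD cost of one request at the quoted per-million rates.

    Assumption: linear pricing only; ignores caching, batching, or plan-level discounts.
    """
    rate = RATES[mode]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


# Hypothetical request: a 40,000-token prompt (say, a trial balance plus notes)
# that comes back as a 4,000-token memo.
print(f"standard: ${request_cost(40_000, 4_000):.2f}")          # standard: $0.30
print(f"fast:     ${request_cost(40_000, 4_000, 'fast'):.2f}")  # fast:     $0.60
```

Output tokens cost five times as much as input tokens, so long drafts and decks add up faster than long prompts.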

Why it’s important for us:

This one was needed. Opus 4.7 never really clicked for a lot of us. It was too literal, you had to spell out every step in your prompt, and it wasn't great at just running with a task on its own. Opus 4.8 pulls things back toward how the older Opus models felt. More creative, more willing to figure things out, less hand-holding in the prompt.

What I care about most is that it's better at the everyday stuff. Slide decks, memos, financial analysis, writing. Based on the benchmarks and some of the early feedback, this version appears better at all of it. I'll keep testing it myself, but the early signs are good.

My one gripe is that this is several Opus updates in a row with nothing for Sonnet. Sonnet is the cheaper model, and it's what a lot of us run when we're trying not to blow through our usage limits. I'd love a Sonnet release with these same improvements so we're not stuck a generation behind on the model we lean on to keep costs down. For now, Anthropic is clearly focused on the high end.

Also, in this update, they’ve added effort levels to Cowork. You can tell the model how hard to work on a task. Nice lever if you're watching usage closely. It's also one more thing to learn, and I suspect most people will pick a setting and leave it there. Probably fine.

The big, hidden headline from the announcement is that Claude Mythos goes public in the next few weeks. That's the model Anthropic handed to cybersecurity pros and software vendors to patch systems before any wider release. I'm very curious what that looks like once it's out.

Byron launches AI agents for business tax prep

Source: ChatGPT Images 2.0 / The AI Accountant

Byron publicly launched an AI agent platform for business tax preparation aimed at CPA firms, alongside a $6.5 million seed round led by Square Peg with participation from Sorenson Capital and Liquid2 Ventures.

The platform takes on the chain of work between receiving client data and producing a finished return. It supports 1065, 1120, and 1120-S workflows, rolls prior-year returns and workpapers forward into Excel, requests missing information from clients, and generates review-ready output.

Byron partners with, and natively integrates with, Canopy and Truss, two platforms many firms already run. The company was built by AI and accounting specialists drawn from Amazon's artificial general intelligence team and Deloitte.

It arrives as the profession faces more than 120,000 open accounting and auditing roles a year, a shrinking pipeline of new CPAs, and a workforce nearing retirement.

Why it’s important for us:

The Byron announcement is underrated. Most of the AI tax prep vendors we've seen started on the individual side, building for 1040s first. Byron started on 1065s and 1120s. They've also already got partnerships with Canopy and Truss, which immediately puts them in front of a lot of firms.

What I really like is the focus on Excel for workpapers and syncing. So many new AI tools put their effort into building a whole new platform, and that requires significant change management. Accountants are notoriously bad with change.

The big question is how well Byron handles K-1s and K-3s. Some vendors are built entirely around K-1 processing, so if Byron handles the complex ones well, they’re going to eat up some of that market.

Unfortunately, the AI tax prep space is very disjointed right now. Maybe Byron will move into 1040s down the road. But as it stands, you'd run one vendor for business returns and a separate one for individual returns. Don't love that. Two subscriptions and two sets of integrations for what should be a single workflow.

I find the convergence of the tax space quite interesting. A year ago, you had to buy one tool for intake, another for internal workflow, and a third for tax prep. We’re seeing the tools converge, and there’s a lot of overlap in features already. I expect that to continue. There may be space for everyone, but I suspect a few vendors will come out as the obvious winners. It’s hard to predict who right now.

OpenAI and Crete build Tax AI, a self-improving tax prep system

Source: OpenAI / Building self-improving tax agents with Codex

OpenAI detailed Tax AI, a tax preparation software system it built with Crete, a network of more than 30 accounting firms. The two worked in a forward-deployed model, with OpenAI engineers embedded alongside Crete's accountants for six months to build the system around how the firms actually prepare returns.

Built on OpenAI's Codex coding model, Tax AI drafts returns and then rewrites its own code to get more accurate the more it runs. When a preparer corrects a draft, the system records the full trail from source document to final filed value. Recurring corrections become test cases, and Codex takes them on as scoped engineering tasks, writing and validating the fixes against a pass condition.

During the pilot, Tax AI processed 7,000 returns, focused on 1040 and 1041 filings and the complex, K-1-heavy work that consumes the most billable hours. OpenAI says drafts reach up to 97% accuracy, and the share of returns hitting 75% correct field completion rose from 25% at launch to 86% within six weeks. Practitioners reported cutting preparation time by about a third and raising throughput by roughly 50%.
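OpenAI doesn't spell out exactly how that "75% correct field completion" figure is scored, so treat this as a rough illustration of how a metric like it could work. It assumes (my assumption, not theirs) that each return is a set of named fields and the final filed return is the answer key. A draft counts as a hit if at least 75% of its fields match.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: field name -> value, e.g. {"line_3": "12500"}.
Return = Dict[str, str]


def field_accuracy(draft: Return, filed: Return) -> float:
    """Fraction of fields on the filed return that the draft got exactly right."""
    if not filed:
        return 0.0
    correct = sum(1 for field, value in filed.items() if draft.get(field) == value)
    return correct / len(filed)


def share_meeting_threshold(pairs: List[Tuple[Return, Return]], threshold: float = 0.75) -> float:
    """Share of (draft, filed) pairs whose field accuracy is at or above the threshold."""
    if not pairs:
        return 0.0
    hits = sum(1 for draft, filed in pairs if field_accuracy(draft, filed) >= threshold)
    return hits / len(pairs)


# Made-up example: one draft gets 3 of 4 fields right, the other gets 1 of 4.
filed = {"a": "1", "b": "2", "c": "3", "d": "4"}
draft1 = {"a": "1", "b": "2", "c": "3", "d": "0"}  # 75% of fields correct -> counts
draft2 = {"a": "1", "b": "9", "c": "9", "d": "9"}  # 25% of fields correct -> doesn't
print(share_meeting_threshold([(draft1, filed), (draft2, filed)]))  # 0.5
```

Note that a return can "hit 75%" and still need real fixes before filing, which is why the 97% draft accuracy and the 75% threshold share are two different numbers.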

Why it’s important for us:

The accounting space should be paying close attention to this. Crete used OpenAI engineers embedded inside its firms to build a self-improving tax prep system.

In practice, it’s a bunch of agents that handle the prep work once the client data comes in. The example they walk through is a Schedule E built from rental data. The agent takes whatever you give it (handwritten notes, PDFs, Excel schedules), drafts the form, then hands it to an accountant to review and, ultimately, file the return.

Building the initial architecture to even put this in place is incredibly difficult, so I shouldn’t make it sound trivial. And OpenAI doesn’t even really mention that part in the article. But what’s most interesting is the self-improvement loop built on top of the architecture.

Every step gets logged: the agent’s draft of the Schedule, the accountant’s edits during review, and the final filed return that’s treated as the source of truth. Codex (OpenAI’s agentic AI, like Claude Code) then compares all three, figures out where the agent went wrong, groups the errors, and proposes the fix itself for an engineer to review and approve.
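To make that loop a little more concrete, here's a simplified Python sketch of the "compare and group" step. This is not OpenAI's code. The data shapes, field names, and the rule that a field needs at least two corrections before it becomes a test case are all hypothetical. It compares the draft, the reviewed version, and the filed return, keeps every field where the draft was wrong, and turns recurring errors into test cases whose pass condition is "match the filed value."

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Correction:
    """One field where the agent's draft disagreed with the filed return (hypothetical schema)."""
    return_id: str
    form: str              # e.g. "Schedule E"
    field: str             # e.g. "rents"
    drafted: str
    filed: str             # final filed value, treated as the source of truth
    reviewer_edited: bool  # did the reviewer change this field during review?


def find_corrections(return_id: str, form: str, draft: Dict[str, str],
                     reviewed: Dict[str, str], filed: Dict[str, str]) -> List[Correction]:
    """Compare draft, reviewed, and filed versions; keep fields where the draft was wrong."""
    out = []
    for field, final_value in filed.items():
        drafted = draft.get(field, "")
        if drafted != final_value:
            out.append(Correction(return_id, form, field, drafted, final_value,
                                  reviewer_edited=reviewed.get(field, "") != drafted))
    return out


def recurring_test_cases(corrections: List[Correction], min_count: int = 2) -> List[dict]:
    """Group corrections by (form, field); recurring groups become regression test cases."""
    counts = Counter((c.form, c.field) for c in corrections)
    return [
        # Pass condition: a future draft of this return must produce the filed value.
        {"return_id": c.return_id, "form": c.form, "field": c.field, "expected": c.filed}
        for c in corrections
        if counts[(c.form, c.field)] >= min_count
    ]


# Made-up example: the agent under-reported rents on two different Schedule Es.
corrections = []
corrections += find_corrections("R1", "Schedule E",
                                draft={"rents": "12000", "repairs": "800"},
                                reviewed={"rents": "12500", "repairs": "800"},
                                filed={"rents": "12500", "repairs": "800"})
corrections += find_corrections("R2", "Schedule E",
                                draft={"rents": "9000", "repairs": "0"},
                                reviewed={"rents": "9600", "repairs": "0"},
                                filed={"rents": "9600", "repairs": "0"})
cases = recurring_test_cases(corrections)
print(len(cases), cases[0]["field"])  # 2 rents
```

In Crete's setup, the step after this is where Codex comes in: each group of test cases becomes a scoped task, and a proposed fix only goes to an engineer once it passes them.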

The engineers aren't writing the improvements here. Codex is. And in the charts they share in the article, accuracy climbs steeply within just a month or two.

I don’t think I buy the argument that SaaS is dying, but if you do, this is a pretty good example to point to. Crete now runs its own software doing the work they’d normally get from a vendor (or a few), and it improves on its own and is tailored to Crete’s specific workflows.

There’s obviously complexity in running your own software. They’ll likely need engineers to maintain and improve this for as long as they use it. And I suspect it was quite expensive to build with the help of OpenAI.

The model is really interesting though. Embed engineers inside a firm, build self-improving agents, and log every step of the review and final results.

If you buy from a vendor, you get what they built for the market at large. You also have to wait on their updates, which sometimes may fix one thing and break another. But build it like Crete and you shape exactly how it functions, teach it how you work, and the improvements are a direct result of your data and input.

The article is a heavy read with a lot of engineering and AI talk. But I’d highly recommend it to anyone in the accounting space because I think it’s important to understand, even if it’s just at a high level.

TRENDING NEWS

Anthropic plans to release its powerful Claude Mythos models more widely once stronger safeguards are in place, after a preview found roughly 10,000 critical software bugs in 30 days: I mentioned this in the Opus 4.8 take. The actual news is that it found bugs faster than anyone can patch them, with only a fraction fixed so far. Great for security review, a little terrifying for every vendor whose software your firm depends on.

EY launched a $1 billion, five-year initiative with Microsoft to push enterprise AI past pilots and into production: This is the same kind of move as the OpenAI and Crete story above, embedding people inside a firm to make AI stick, whether through custom software or just training teams to use agentic AI. It's also a real win for Copilot, which has been lagging behind ChatGPT and Claude, so landing EY and likely more Big 4 firms matters for Microsoft.

OpenAI added Windows support for Codex in the ChatGPT mobile app, so Windows users can now start, review, and steer tasks from their phone while the work keeps running on their computer: Windows users haven't been forgotten after all. You can kick off tasks and see status updates from your phone now on Windows. Computer use in Codex with GPT-5.5 has been getting better quickly, but the main issue is how much usage it consumes.

Anthropic raised $65 billion at a $965 billion valuation, the largest venture round in history and enough to pass OpenAI: The money goes toward compute, which is what sets the rate limits and pricing we all deal with in Claude. An IPO is clearly next.

OpenAI filed confidentially for an IPO that could value it near $1 trillion, with a possible September debut: Going public means we finally get to see the real numbers behind ChatGPT. Reports say they lost $1.22 for every $1 of revenue last quarter, and an IPO will force a lot more of that into the open. Mostly just interesting to watch.

Intuit cut about 3,000 employees, roughly 17% of its staff, to fund its push into AI: More news on the jobs front. Intuit is cutting staff even with revenue up 17% and profit up 48%, and they're crediting AI. It's reaching the accounting world now too.

Synthetic raised $10 million from Khosla to build fully autonomous, accrual-basis bookkeeping with no human bookkeepers: It's founded by Ian Crosby, who previously founded Bench. Personally, this feels like a shot in the dark. How viable is a bookkeeping tool with no human in the loop, especially compared to keeping humans in the loop and running things through Claude Cowork, Claude Code, Codex, or the other AI tools popping up for bookkeeping and CAS?

Pope Leo XIV released his first encyclical, a 42,000-word call for binding AI regulation, with Anthropic co-founder Chris Olah onstage at the Vatican: It's interesting that the Vatican picked only one AI lab to stand onstage with them, and that was Anthropic, who has been notably more vocal about the impact of AI on the world and building it safely. On the serious side, an encyclical like this carries weight with a huge number of people, so it'll likely push AI to the front of mind for a lot of them. That heats up the regulation conversation and gets more people paying attention to the space, which is a good thing overall.

Avalara named Hugo Sarrazin its new CEO, replacing co-founder Scott McFarlane, to accelerate its AI tax compliance push: A CEO brought in specifically to push AI tells you where the roadmap is headed, and where the space is going.

PUT IT TO WORK

In the Opus 4.8 news, I said feedback has been that it’s better at everyday work. So I put it to the test against Sonnet 4.6. I asked Cowork to take my fake company's 2025 financials and create a slide deck for the owner and COO. Same prompt for both Opus and Sonnet.

Both were impressive, but one was a clear winner.

Opus 4.8 vs Sonnet 4.6: Slide Deck for YE Financials - Watch Video

WEEKLY RANDOM

Uber's COO went on a podcast this week and admitted the company burned through its entire 2026 budget for Claude Code and Cursor in four months. At Uber, 95% of engineers use AI tools every month, 70% of their committed code is AI-generated, and Q1 R&D spend still jumped 17% to $951 million. And he conceded he can't cleanly tie all that spend to a better product.

I’d imagine the timeframe here is so short that it’s hard to draw meaningful conclusions. But I understand the complaint because it’s significant money to continue to spend without feeling confident it’ll translate into anything valuable.

I tend to think this is bad strategy rather than an AI problem. So many companies have just been throwing money at AI, cutting headcount, buying tools, and grading employees on token usage. But when it comes down to it, was there ever a real plan?

It’d be like buying the nicest grill on the planet and getting it to the perfect temperature for a nice juicy steak and some grilled veggies. Then you open the grill and nothing is there, because you trashed all your ingredients before you bought the grill and forgot to go to the store.

Opinions on AI are all over the place right now. People getting real value are doing the boring work first and gradually implementing it into their workflows. The ones who fired everyone and figured AI would just run the place are finding out how expensive “just use AI” actually is.

Until next week, keep protecting those numbers.

Preston

#033: Claude Opus 4.8, Byron, and Tax AI from OpenAI and Crete

The AI Accountant