Hello fellow keepers of numbers,

Major news this week, and a few things that could’ve flown under the radar. OpenAI previewed GPT-5.6, but it’s not available to the public yet. We have the U.S. government to thank for that.

The IRS finally issued guidance on AI. It’s not that helpful, which probably isn’t shocking. They did all but kill the hourly pricing model, though, which is probably for the best.

And Anthropic released Claude Tag. I know what you’re thinking, and yes, it’s an AI model that plays tag with kids. Not really. It’s actually just Claude as an AI employee inside Slack (and maybe Teams in the future?). Seems small, but I think this announcement is a signal of where things are trending over the next 6-12 months.

Plus, stick around for a demo of Cowork where I create a proposal from a meeting transcript in less than a minute. And some tips on Cowork projects, which have become one of my favorite things lately.

THE LATEST

OpenAI previews GPT-5.6 Sol, Terra, and Luna

Source: ChatGPT Images 2.0 / The AI Accountant

OpenAI previewed the GPT-5.6 series, a three-model family made up of Sol, the flagship; Terra, a balanced model for everyday work; and Luna, a fast, lower-cost option. The company says Terra matches the performance of GPT-5.5 at half the price, while Luna delivers strong capability at its lowest cost yet. Under a new naming system, the number marks the generation and the names mark capability tiers that can each improve on their own schedule.

GPT-5.6 adds a "max" setting that gives Sol more time to reason through hard problems, plus an "ultra" mode that splits work across helper agents to move faster on complex tasks. OpenAI reports gains in coding, scientific analysis, and cybersecurity, and says Sol is its strongest model yet for finding and fixing software vulnerabilities.

Ahead of launch, OpenAI previewed the models to the U.S. government and, at the government's request, is starting with a limited release to a small group of trusted partners before opening it up more broadly. The company says it does not want this kind of government sign-off to become a standard step for every release, and that it is working with the Administration on a clearer process for future models.

OpenAI says it plans to make the models broadly available to the public through ChatGPT, Codex, and the API in the coming weeks. Pricing per 1M tokens is $5 input and $30 output for Sol, $2.50 input and $15 output for Terra, and $1 input and $6 output for Luna.
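If you want to sanity-check what those rates mean for a real workload, here's a rough Python sketch. The prices are the preview numbers above. The function name and the monthly token counts are placeholders I made up for illustration, so swap in your own usage.

```python
# Preview prices from OpenAI's announcement, in USD per 1M tokens.
PRICES = {
    "Sol": {"input": 5.00, "output": 30.00},
    "Terra": {"input": 2.50, "output": 15.00},
    "Luna": {"input": 1.00, "output": 6.00},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a workload on the given tier."""
    rates = PRICES[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    # Rates are quoted per 1M tokens, so scale back down.
    return round(total / 1_000_000, 2)


if __name__ == "__main__":
    # Hypothetical monthly workload (an assumption, not a published figure):
    # 20M tokens in, 4M tokens out.
    for model in PRICES:
        cost = estimate_cost(model, 20_000_000, 4_000_000)
        print(f"{model}: ${cost:,.2f}")
```

On that made-up workload, Sol comes out to $220, Terra to $110, and Luna to $44 a month. Output tokens cost six times as much as input tokens on every tier, so long generated documents drive the bill more than long prompts do.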

Why it’s important for us:

This is the first major model release that left me a little bummed out, and it has nothing to do with the models. GPT-5.6 finally arrived, but it’s not available for most of us.

We seem to be heading toward a world where new models don't reach the rest of us first. The government is on high alert after Mythos, and frontier models are now getting vetted against some kind of security threshold before they go wide.

What are those thresholds, exactly? I don't know. I'm not sure OpenAI or Anthropic know either, which makes me think the government doesn't really know yet. I've said this before, but it's a good thing someone is finally paying attention. That was overdue. The open question is whether this is the right way to do it. OpenAI pretty clearly thinks it isn't. They said as much in their announcement.

So we can't actually test GPT-5.6 yet. You know I don't put much stock in benchmarks, but they're all we have right now. They look great. Sol is at or above Mythos level, which means we should expect it to land somewhere near where Fable 5 was, even if Fable only graced us for about 48 hours. I'm keeping expectations high, but I’ll hold the final verdict until I can actually use it.

OpenAI seems to have made the decision to copy Anthropic’s naming conventions. They used to be pretty bland with things like “GPT-5.5.” Now, we get Sol, Terra, and Luna.

Looking at benchmarks, they seem to line up roughly as follows:

Luna = Sonnet

Terra = Opus

Sol = Fable

Hard to know for sure until we get to use them. I actually like the shift in naming conventions though because it makes it easier to understand which model to use and when.

Sol’s pricing caught my attention. It benchmarks near Fable but prices much closer to Opus. Fable is almost double the price of Sol, so if Sol is actually on par with Fable then that’s pretty big news.

It's all speculation until we get our hands on it, because benchmarks only tell you so much. But this is a promising one, and I'm looking forward to testing it the day it opens up.

IRS issues its first AI guidelines for tax practitioners

Source: ChatGPT Images 2.0 / The AI Accountant

The IRS Office of Professional Responsibility issued Alert 2026-19, "Introductory Guidelines for Responsible AI Use in Federal Tax Practice," on June 24th. The alert does not create new rules but clarifies how the existing Circular 230 ethics standards apply to generative AI.

On due diligence, the IRS said practitioners must thoroughly review every AI-generated document before it reaches a client or the agency, verifying all facts, citations, and calculations, and cannot rely on AI alone. On billing, cost savings from AI should be passed on to clients rather than billed as time that was not spent, since charging for work AI performed or double-billing for AI-assisted tasks could violate Circular 230's prohibition on unconscionable fees.

The alert also covers confidentiality, competence, and firm oversight: client data must stay in secure, enterprise-approved tools, practitioners are expected to understand how their AI systems work and where they fail, and firms must put internal AI policies, staff training, and vendor vetting in place under Section 10.36, all of it documented.

Why it’s important for us:

Our regulatory overlords have finally broken their silence. They’re very, very late to the party, in typical fashion.

This is the first time we’ve received official AI guidance for the tax space. They punted on a lot of topics, but at least there’s something to lean on now.

The biggest takeaway was probably the most surprising one. They’ve all but killed hourly billing for AI-assisted work. If you’re saving time using AI and billing by the hour, the IRS now says those savings should go to the client, not onto the invoice. The move toward fixed-fee and value-based pricing was already well underway, and this pretty much cements it. Now the question is just how fast firms can get off the hourly model, especially the ones already using AI.

The rest was fairly straightforward guidance. Firms need to understand AI and how their teams are using it. Firms need an AI policy, something many have been saying for a long time now. And firms need to handle client data appropriately, although they didn’t say much about what that actually means for us.

What they did make clear is that you need enterprise-approved tools. The Team/Business or Enterprise plans in ChatGPT or Claude. No free plans. That was already obvious, but now it’s also in writing. Those plans keep your data out of model training and give you the SOC 2 Type II coverage you need.

Anthropic launches Claude Tag for Slack

Anthropic launched Claude Tag on June 23rd, a way for teams to delegate work to Claude by tagging @Claude inside a Slack channel. Administrators grant Claude access to specific channels and connect it to the tools and data they choose. Once tagged, Claude breaks a request into stages, works through them asynchronously, and replies in a thread. Each channel has a single shared Claude, which can pick up where the last person left off or schedule its own tasks over hours or days.

In a separate post, Anthropic explained the access model behind it, which it calls "agent identity." Rather than acting on behalf of whoever tagged it, Claude gets its own service accounts that admins provision, moving permissions from per-user to per-channel. A channel member without direct access to a connected tool can still have Claude use it if the channel grants that access, and each private channel gets a separate identity whose memory and access do not carry into other channels.

Administrators can scope Claude's tools and memory per channel, set token-spend limits, block outbound traffic to unapproved systems, and review a log of every task, memory write, and network call. Direct messages work differently, running on a user's own claude.ai account and credentials.

The company says tagging @Claude is now one of its main ways of working internally, with 65% of its product team's code created by the internal version of the tool. Claude Tag is available today in beta for Claude Enterprise and Team customers, runs on Opus 4.8, and replaces the existing Claude in Slack app, which administrators can migrate from within 30 days.

Why it’s important for us:

This one is bigger than it looks on the surface. I'd argue it's the first real foray into AI workers in the enterprise that actually looks safe enough to deploy.

Some people would say Claude Cowork is already an AI worker, and OpenClaw (and its clones) have been chasing the same idea. But Cowork is specific to each individual. Each person logs into their own desktop app, prompts it, and sets up their own scheduled tasks. And OpenClaw still has real reliability and safety questions in an enterprise setting.

Claude Tag is Cowork, but multiplayer. Think of it like creating a Cowork project for a client, then sharing it so the whole team prompts into the same project, sees one running thread of everyone’s requests, and manages the files and context together.

This is basically the same idea, just living in Slack, where each channel acts like its own Cowork project. A channel per client, an ops channel, an HR channel, and you scope Claude's permissions to each one or across several.

It also learns over time from the channels it sits in, the same way a good Cowork project builds context. You can manage the memories and context for each channel inside your Claude account.

They mentioned a new feature called “ambient behavior” that I’m interested to see in action. When you turn it on, Claude supposedly flags what it thinks you need to know without being asked. My read is that if a project went quiet because someone was out for two weeks, Claude could come back on its own to ask whether there's an update, and it could pull relevant context from other channels it has access to. If it works like I'm hoping, that's a real step up from scheduled tasks in Cowork.

I know several firms that use Slack for internal comms. But most don’t. Anthropic said in the post that they plan to expand to other places where teams work. I hope this makes it into Teams as well, but TBD.

The big downside right now is how this is billed. It’s not part of your normal usage. It’s billed through extra usage, which means each prompt is charged at API rates, and the bill could get expensive pretty quickly. It’s unfortunate, because I think this will keep a lot of firms from trying it. Firms will need a clear ROI before implementing this, but for teams committed to it, I think it’ll be well worth it.

TRENDING NEWS

Microsoft added finance-focused Copilot features to Excel, with reusable skills for work like DCFs, the close, and variance analysis, plus connectors to trusted data sources: They're adding a library of skills, very similar to the plugins that Anthropic publishes for users. They're still behind, but props to Microsoft for continuing to release nice features.

OpenAI updated GPT-5.5 Instant for more natural conversations and better handling of complex instructions: A minor update, but a nice quality-of-life bump for anyone firing quick questions into the chat. Paid users can hold onto 5.3 Instant for three months if the new one bugs them.

Digits connected its AI ledger to Ignition, Reach Reporting, and Karbon: Smart move to plug into the tools firms already run on, since that's the fastest way for Digits to land in firms that aren't going to rip out Karbon or Ignition. The bet is that everything else in your stack sits on top of their ledger.

Treasury's tax watchdog warned the IRS's own live chat and chatbot tools may give wrong or incomplete answers that push taxpayers toward filing incorrect returns: The timing is almost too good. The IRS just put out guidance saying you have to check AI outputs and keep a human in the loop, and now we find out their chatbot hallucinates. Either way, clients are leaning on these bots for tax answers before they ever call you, so expect to spend more time un-teaching whatever a chatbot told them.

OpenAI unveiled Jalapeno, its first in-house AI inference chip, built with Broadcom to run ChatGPT and Codex on its own hardware: Owning the chips could be a small differentiator for future OpenAI models. The bigger play is probably hardware itself, giving OpenAI a new revenue line off the chip over the long term.

Sakana AI launched Fugu, an orchestration model that routes each request to the best frontier model or assembles a team of them: Claude already does a version of this with workflows, but you're stuck inside its own models. Something model-agnostic that picks across providers is the interesting part, and I'd bet we see a lot more of it.

OpenAI made Codex in the ChatGPT mobile app generally available, with device pairing, notifications, and inline review comments: Codex on mobile has been my favorite development in recent memory. Still waiting on Claude to push this same functionality to improve on Dispatch.

PUT IT TO WORK

Most accountants notoriously suck at sales. I might also suffer from this affliction, even if I like to think I don’t. So just to be extra careful, I try to make sales as easy as possible. And I want the same for you, which is why I have today’s demo.

I walk through how to use Cowork to create a proposal in less than a minute from only a template and a meeting transcript. As a bonus, I also show how you can set up Cowork projects and use memories to improve your projects over time.

Claude Cowork Projects & Proposal Automation - Watch Video

WEEKLY RANDOM

Apple gave us a small but telling preview this week of where the AI boom starts to pinch the rest of us.

Apple just raised prices across its MacBook and iPad lineup. The reason is spiking memory and storage costs, which tie directly to AI data centers eating up the same components.

We've spent two years reading about AI capex in the abstract, billions of dollars in data centers and GPUs. Until now, it’s been news that mostly hasn’t impacted our day-to-day. But now it’s officially showing up as a higher price tag on the laptops and iPads we buy. It’s probably one of the more widely felt examples so far.

Just hours after Apple’s news, Microsoft said they’re raising the prices of Xbox game consoles. I expect we’ll see a lot more on this topic from other companies now that Apple and Microsoft have made these announcements.

The compute and memory pouring into AI are now competing with the rest of us for the same chips, and the rest of us are losing.

Plenty of people are already anti-AI for other reasons. This is probably going to be the biggest news yet for the anti-AI audience. Unfortunately, we’re heading down what seems like a very polarizing path with very few solutions currently in sight.

Until next week, keep protecting those numbers.

Preston

#037: GPT-5.6, IRS Guidance on AI, and Claude Tag
