Sunday 10 May 2026

Ask your LLM to work itself out of a job

When I was a fresh-faced consultant working at IBM in the 2010s, one of my first gigs was wrangling Microsoft Project and Excel in the PMO of a large greenfield IT program. There was a wave of investment into Australian resources at the time, which created a boom in IT work as these companies stood up their systems from scratch.

The PMO was the last place I wanted to be, and much to the chagrin of the Account Partner, I accepted it kicking and screaming. From day 1, I was determined to cut my 18-month assignment in half, by any means necessary.

The path of least resistance, or rather the only path available to me at the time, was to create an elaborate Excel spreadsheet that did all of the time and finance tracking and reporting with just a few clicks. What took me the better part of a week at the start was down to a couple of hours by month nine. I showed my manager how it worked so he could survive without me, and I lobbied the client to let me go. It worked (!) and soon I was doing more interesting work.

I feel this way about LLMs today.

The best tokens are expensive and there is no sign they are going to get cheaper any time soon. At the frontier, things are heading in the other direction. So I think it’s imperative that we use those tokens to their fullest extent by asking our LLMs to make themselves obsolete for certain tasks.

Not only is this more efficient, it’s also more consistent. You’re taking a non-deterministic language model and asking it to create something deterministic. There are different levels to this depending on whether something is a one-off, an occasional activity, or a daily workflow.

  1. One-off complex modelling: Create an HTML artifact. Say you want help deciding whether to keep your current car or buy a new one on a novated lease. Ask for a calculator with all the variables and other knobs you want to tweak and you can run as many scenarios as you like without burning another token. Yesterday, I linked to Thariq Shihipar’s article on the various ways you can use HTML with Claude Code. There are plenty of fresh ideas in there.
  2. Routine data task: Create a script. This is ideal when you want to do something complex more than a few times, with consistency. It’s especially useful when you want to process data without sending it to a cloud service. For example, say you want to categorise your monthly expenses from a CSV. Create a script you can run locally, and for bonus points ask for an HTML file that opens the output and visualises it for you. You could do this in Excel but it’s a lot of clicking; an LLM can create all of this with a well-crafted prompt or two. Once the basic script is in place, you can even automate running it. I wrote about a script I created to locally check my vulnerability to a supply chain attack, without exposing any secrets to an LLM.
  3. Something you do, or want to do, every day: Build an app. This is where things go from a couple of prompts to a serious project. It’s also where you will find the greatest leverage and have the most fun. In my case, I’ve spent north of $500 in API LLM credits building a Swift iOS and Mac app for my reading and writing workflow, and I have a backlog of more than 60 feature ideas and improvements that will probably cost me hundreds or thousands more to complete. This is a meaningful sum and it’s an involved process but once it’s built I get leverage from this for as long as I use and maintain it.
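The expense-categorisation idea in point 2 is a few dozen lines of code at most. Here is a minimal sketch in Python; the keywords, categories, and the `(description, amount)` row shape are made-up placeholders, and a real script would read your bank’s actual CSV columns with `csv.DictReader`:

```python
from collections import defaultdict

# Hypothetical keyword -> category map; adjust to your bank's descriptions.
CATEGORIES = {
    "woolworths": "Groceries",
    "coles": "Groceries",
    "bp": "Fuel",
    "netflix": "Subscriptions",
}

def categorise(description: str) -> str:
    """Match a transaction description against the keyword map."""
    desc = description.lower()
    for keyword, category in CATEGORIES.items():
        if keyword in desc:
            return category
    return "Uncategorised"

def summarise(rows):
    """Sum amounts per category from rows of (description, amount)."""
    totals = defaultdict(float)
    for description, amount in rows:
        totals[categorise(description)] += float(amount)
    return dict(totals)
```

Everything runs locally, nothing leaves your machine, and the output of `summarise` is exactly the kind of small dictionary an LLM-generated HTML file can render as a chart.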

I could have used an LLM to chat and reason about different scenarios for making a car buying decision. I could have (naively) used a coding LLM to mine my machine for secrets and tell me how exposed I am. I could have asked a coding LLM to publish my writing to GitHub every time. Instead I used the LLM to work itself out of these jobs.

I encourage you to do the same.

Saturday 9 May 2026

HTML is the ideal canvas for LLMs

Thariq Shihipar on X:

I’ve started preferring HTML as an output format instead of Markdown and increasingly see this being used by others on the Claude Code team, this is why.

I don’t know whether I, and many others, are just catching up to the latest model capabilities, or if it’s because the harnesses have levelled up via new skills and MCPs, but in the past fortnight something clicked for me, and now almost every chat feels like it could have an HTML element to it. Let me share a few examples:

  • After reading about rumoured CGT changes in Australia’s upcoming federal budget, I asked Claude to create a calculator so I could model the impact on my investments without back-and-forth in the chat.
  • I mocked up and iterated on several app screens for a new feature in just a few minutes, then asked Claude (the app) to write a document I could give to Claude Code to implement.
  • I asked Claude to generate an artifact covering a recent npm supply chain attack. This pulls together information from multiple sources into a single experience, and progressively reveals information so I can better comprehend it.

So far I’ve mostly created these in the Claude app, but after reading Thariq’s post I’m inspired to experiment with them in Claude Code too.

Wednesday 6 May 2026

AI is exhausting Mac mini and Mac Studio supply

Juli Clover, reporting for MacRumors:

Apple has removed more desktop Macs from its online store as the global memory shortage continues. Mac mini models with 32GB and 64GB of RAM are no longer available for purchase, nor is the M3 Ultra Mac Studio with 256GB RAM.

The M3 Ultra Mac Studio is now available only in a 96GB RAM configuration, with higher-tier options eliminated. Both M3 Mac Studio and M4 Max Mac Studio models have delivery estimates of 9 to 10 weeks.

Earlier this year, following the launch of Claude Dispatch (which requires an always-on computer), I said:

Ignoring tablets, the vast majority of computers purchased are laptops—around 70% according to IDC’s 3Q 25 quarterly data—exactly what you don’t want for an always-on AI assistant.

It’s hard to imagine every individual buying, and every employer issuing, an additional desktop computer just to house an AI assistant.

I still believe this is true, on the whole, but there is a genuine use case here and clearly Apple was caught flat-footed by a flood of AI-induced demand.

The more I use AI coding agents, the more appealing an always-on desktop is looking. Being blocked from commanding an agent to build a new feature, or fix a bug, from wherever I am, is starting to niggle at me.

I use an M4 MacBook Air for personal projects, one of which is an iOS app. When I’m not working, I close the lid and place it wherever—on a shelf, a countertop, the couch. From that point on, I can’t progress anything until I grab the laptop and open it again.

I know, I could force it not to sleep with a sudo command or an app like Amphetamine, but then I’d need to keep it plugged in when not in use, making it more like a desktop anyway.

I’m guessing most people buying these Mac desktops are using them for OpenClaw or Hermes or one of the other ‘claws’. I’m claw-curious, but the security and privacy tradeoffs have kept me from taking the plunge. I do, however, buy into a future where you can ask an agent to do real work for you from anywhere, and it has access to the tools, files, and internet services you would use yourself.

Via John Gruber

ChatGPT Had a Goblin Problem but Professionals Missed Out on the Fun

Ben Glickman, writing for the Wall Street Journal:

OpenAI recently gave its popular ChatGPT strict instructions: Stop talking about goblins.

Recent models of the artificial-intelligence chatbot have been bringing up the creatures in conversations with users seemingly out of the blue […]

Barron Roth, a 32-year-old product manager at a tech company, said the bot referred to a flaw in his code as a “classic little goblin.” He said he counted more than 20 times it mentioned goblins, without any prompting.

[…]

Part of the explanation lies in ChatGPT’s “personality” feature, which allows users to select from a handful of prompt instructions that give the chatbot more distinct characteristics. OpenAI said that for the bot’s “nerdy” personality, […] mentions of goblins in its GPT-5.4 model increased 3,881% from a previous version.

For the “professional” personality, the chatbot dialed back the fun. Goblins were mentioned 7% less.

Personally, I’d like to see more use of goblins in professional settings, not less.

Monday 4 May 2026

Mercedes-Benz admits they pressed the wrong buttons

Matt Adams, writing for Drive:

Mercedes-Benz joins the growing list of manufacturers listening to customers and admitting that touch-sensitive controls and burying controls in menus were mistakes. The German brand remains committed to offering large screens in its models, but has listened to its customers and will offer physical buttons for key functions in future.

More of this, please.

Wednesday 29 April 2026

GitHub Copilot closes the chapter on cheap AI

GitHub company news blog earlier this week:

Today, we are announcing that all GitHub Copilot plans will transition to usage-based billing on June 1, 2026.

This shouldn’t come as a surprise to anyone paying attention, but it’s still going to be very uncomfortable for businesses and enterprises that have based their budgets on heavily subsidised per-seat plans which will soon be a lot less useful.

Instead of counting premium requests, every Copilot plan will include a monthly allotment of GitHub AI Credits, with the option for paid plans to purchase additional usage. Usage will be calculated based on token consumption, including input, output, and cached tokens, using the listed API rates for each model.

I’m not a fan of these credit models. They’re akin to buying chips at a casino. Of course, this works for casinos, and it might work here, but AI compute isn’t (or at least shouldn’t be) gambling.

I’ve been using Claude Code via the API recently, and while it is gobsmackingly expensive, I actually like seeing my spend in dollars rather than hidden behind credits or an opaque subscription (although it irks me that everything is USD-denominated in Anthropic’s Console). I went as far as building a Mac menu bar app that tracks my spend in real time and does the currency conversion for me.

Because I can see what I’m spending, I’ve changed my behaviour to manage token burn: using /clear when I’m done with a task; editing .md docs to provide feedback on plans rather than sending multiple short messages; using lower cost subagents for smaller tasks; enabling 1hr prompt caching when I know I’m going to get interrupted during a session. These all add up to real cost savings.

Let me illustrate this with an example that might help those who have never seen the true cost of their AI use before. Imagine you are in the middle of a coding session with Opus 4.7 on the 1M context window, using the default 5-minute cache TTL. You’ve been working for an hour and have 400k tokens in the context window when you decide to go make a coffee. When you return 10 minutes later, the cache has expired, so when you write “back, what’s next” Claude rewrites the entire context window to the cache at a cost of USD $2.50 ($5 per million input tokens × 1.25 cache-write premium × 400k tokens). That is an expensive coffee!
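The arithmetic behind that figure, as a quick sanity check (the $5/MTok rate and 1.25× cache-write premium are the assumptions from the example above, not quoted pricing):

```python
# Assumed inputs from the coffee example: $5 per million input tokens,
# a 1.25x premium for writing tokens to the prompt cache, and a
# 400k-token context window re-cached after the TTL lapses.
price_per_mtok = 5.00       # USD per million input tokens
cache_write_premium = 1.25  # multiplier applied to cache writes
context_tokens = 400_000

cost = price_per_mtok * cache_write_premium * context_tokens / 1_000_000
print(f"${cost:.2f}")  # → $2.50
```

Scale the context to 800k tokens and the same coffee break costs $5.00, which is why clearing context you no longer need pays for itself.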

Base plan pricing is not changing. Copilot Pro remains $10/month, Pro+ remains $39/month, Business remains $19/user/month, and Enterprise remains $39/user/month.

I can say from experience that people will be shocked by how little they can get done with $10 or even $39 when these changes take effect.

Tuesday 14 April 2026

Glider Classic

New re-release of Glider. I have fond memories of the original Glider, though I can’t quite remember where I played it. Possibly on a Macintosh Classic or LC in my public primary school library.

What a perfect use of AI:

What changed in 2026? Weirdly, I decided to put Claude (the LLM, AI) on it. I pasted code into Claude’s context window and asked, “What issues do you see?” Very quickly, problems involving mixing coordinate systems (points versus pixels) led to a series of fairly small changes that brought Glider Classic back from the dead.

Via John Gruber

Monday 13 April 2026

Golden tickets

Utterly delightful collection of weekly Milwaukee bus tickets from the 1950s.

For an Australian comparison, the Powerhouse Collection contains NSW bus tickets from 1966-1970, which are primitive by contrast. The weekly tickets covering the same period are just as utilitarian.

I also stumbled on a 358-page(!) tome documenting paper bus tickets in country NSW.

Via John Gruber

Monday 30 March 2026

'I Saw Something New in San Francisco'

Ezra Klein, writing for the New York Times (gift link provided), surfaces some patterns I've noticed in my own use of AI, and in particular Claude, recently:

What makes A.I. truly persuasive isn't that it praises our ideas or insights, it's that it restates and extends them in a more compelling form than we initially offered, and does so while reflecting a polished image of ourselves back at us.

Continue reading →

LLMs can argue any direction

Andrej Karpathy on X:

Drafted a blog post

  • Used an LLM to meticulously improve the argument over 4 hours.
  • Wow, feeling great, it’s so convincing!
  • Fun idea let’s ask it to argue the opposite.
  • LLM demolishes the entire argument and convinces me that the opposite is in fact true. - lol

The LLMs may elicit an opinion when asked but are extremely competent in arguing almost any direction. This is actually super useful as a tool for forming your own opinions, just make sure to ask different directions and be careful with the sycophancy.

I have plenty of blog posts sitting in draft for this exact reason. Even though everyone is (rightly) focused on agents in 2026, there is still an art to good prompting.

Workstation Exposure Tool

Last week’s LiteLLM supply chain attack got me wondering how exposed I would have been if I’d installed the compromised package.

I had Claude build me a macOS-specific shell tool to check. I’ve called it the Workstation Exposure Tool and published it for anyone to download and use.

I wanted something that I could run locally without installing any packages, making network calls, reading credentials to memory, or sending sensitive data externally (for example, asking an LLM to look for exposed credentials across my machine).

My scan came up clean, so I created some mock credentials to test that it worked. The tool found them all, which was encouraging. A disclaimer, though: I have definitely not pushed the boundaries of the script, and I’m not making any claims about how comprehensive, reliable, or accurate the scan is. If you want to try it out, please read the code yourself to see how it works, and expect to hit bugs. Ctrl+C if it gets stuck while running.
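The core idea is simple pattern matching over local files. This is not the published tool, just a toy Python illustration of the approach, with two well-known credential formats as examples; a real scan would cover many more patterns and paths:

```python
import re
from pathlib import Path

# Toy patterns only -- the real tool's checks live in its published source.
PATTERNS = {
    "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Private key header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_file(path: Path) -> list[str]:
    """Return the names of any credential patterns found in a file.

    Reads locally only: no network calls, nothing sent anywhere.
    """
    try:
        text = path.read_text(errors="ignore")
    except OSError:
        return []
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]
```

Planting a mock key like `AKIA` followed by 16 uppercase characters in a test file is exactly the kind of check I used to convince myself the real tool was looking in the right places.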

Sunday 22 March 2026

Claude Dispatch

I first saw this when it popped up in the Claude sidebar on iOS a few days ago.

Message Claude from your phone on the way to work, then follow up from your desktop when you sit down […] Claude works on your desktop computer using the files, connectors, and plugins you’ve already set up in Cowork. Claude messages you the outcome—a spreadsheet, a memo, a comparison table—rather than showing you every step of the process.

This sounds good in theory but comes with an obvious catch.

Your computer must be awake and the app must be open for Claude to work on tasks

Despite all the excitement around claws, the requirement for an always-on desktop is a major handbrake on mainstream adoption.

Ignoring tablets, the vast majority of computers purchased are laptops—around 70% according to IDC’s 3Q 25 quarterly data—exactly what you don’t want for an always-on AI assistant.

It’s hard to imagine every individual buying, and every employer issuing, an additional desktop computer just to house an AI assistant.

Thursday 12 March 2026

Should You Be A Carpenter?

Demitri Spanos, in the first of a promised series of conversations with Casey Muratori called ‘Wading Through AI’:

I have many friends who are in the VC business — investors, managers, recruiters, whatever. I would be surprised if those people could climb down from the level of commitment that they have put into transforming the workforce with AI.

The hundreds of billions flowing into ‘transforming the workforce with AI’ is either a bet on fewer jobs, a bet that AI lifts revenues, or both.

The smart knowledge workers I know aren’t waiting for the outcome. They anticipate disruption and are trying to get ahead of it. I’ve seen a few reactions: using AI to build apps and diversify income, encapsulating experience into AI agent skills and marketing themselves as fractional hires, publishing first-time research papers on AI, and open sourcing AI-adjacent dev tools to establish credibility. No carpenters—yet.

Maybe this time we’re wise enough to know that when big cheques are written, they will get cashed.

Wednesday 11 March 2026

Bringing Code Review to Claude Code

Claude Blog:

Today we're introducing Code Review, which dispatches a team of agents on every PR to catch the bugs that skims miss, built for depth, not speed. It's the system we run on nearly every PR at Anthropic. Now in research preview for Team and Enterprise.

From the docs:

Code Review is billed based on token usage. Reviews average $15-25, scaling with PR size, codebase complexity, and how many issues require verification.

Continue reading →