# AI #174: You’re It

> Source: <https://thezvi.wordpress.com/2026/06/25/ai-174-youre-it/>
> Published: 2026-06-25 11:41:32+00:00

Alex Bores unfortunately lost narrowly in NY-12, and will not be heading to Congress.

There are also plenty of other stories to cover. Some highlights:

GLM-5.2 is the new best open model, although it is expensive for its class. It will have its uses, potentially for agents you need to run fully locally or privately, but often it won’t be the right fit.

Claude Tag is a new system for having Claude join your Slack, and if you @ him then he will spin up an instance to do the coding work.

Dean Ball is joining OpenAI to work on policy. We don’t see eye to eye on everything, but this is a huge upgrade over their existing alternatives.

Tokens are cheap, so if you can loop over useful things, you do it. /goal /loop.

Tom Osman: This “loop” automation is nuts inside of Codex.

“/goal go over every single feature in this app create a user story with expected behaviour based on the code keep a single canonical spreadsheet tracking the features status
– when done switch loop to testing every user story and documenting all errors
– when done fix every logistical error or ux error
– test every user behaviour again post fix”

Shoutout to @MatthewBerman for the heads up.

Hundreds of user stories being worked through like it’s nothing.

Use Mercury’s new Command feature to set details for a wire? Definitely scary stuff the first few times you do it, purely for error reasons, and after that also for potential prompt injection reasons. The human check step will stick around for a while.

Keep your company or other group small.

Paul Graham: One of the biggest advantages of AI will be that it lets companies get further before they cross the lines (at about 10 and about 150 people) beyond which groups become less productive.

This leaves out the biggest thresholds to avoid, which are 2 and 3.

European parliament, to ‘reduce dependence on American technology firms,’ scraps Google search for the French Qwant, which is still substantially dependent on Bing.

We have another new version of GPT-5.5-Instant. With all due respect and thanks for what I presume are small improvements, if you are changing the model change the version number, why is this hard, v5.5.1 is a thing you can do.

Ben Golub: A system won if it identified more genuine concerns the other system missed (“residual concerns”). Refine averaged 28.1 unique residual concerns per match; comparison reviews averaged 14.5. Substantive concerns: 22.1 vs. 11.8.

There are ways for Refine to ace that benchmark without it meaning much, although I have also heard good things from other tests, including from Tyler Cowen. I presume it is indeed good at finding potential issues in economics papers, but I also presume it would not be that much effort to make Fable similarly good at this.
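The scoring rule Golub describes can be read as a set comparison. Here is a minimal sketch, assuming each review has already been reduced to a set of genuine concerns with matching labels. The source does not say how concerns are judged genuine or matched across reviews, so the string labels and function names below are hypothetical.

```python
def residual_concerns(own: set[str], other: set[str]) -> set[str]:
    """Concerns this review raised that the other review missed."""
    return own - other


def match_winner(review_a: set[str], review_b: set[str]) -> str:
    """Return 'a', 'b', or 'tie' by comparing residual concern counts."""
    a_only = len(residual_concerns(review_a, review_b))
    b_only = len(residual_concerns(review_b, review_a))
    if a_only > b_only:
        return "a"
    if b_only > a_only:
        return "b"
    return "tie"


# Hypothetical concern labels, for illustration only.
review_a = {"identification", "sample selection", "standard errors"}
review_b = {"identification", "external validity"}
print(match_winner(review_a, review_b))
```

Under this reading, a review that lists more concerns that clear the 'genuine' bar accumulates more residuals, which may be one reason to treat the headline numbers with some caution.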

I confirmed this with Pangram, and this finally convinced me to stop holding out and sign up for Pangram, but also I very much did not need Pangram.

Also most of this:

Teortaxes: This is pretty much a supervillain’s speech
The only way he could twist the knife more is if he just said Europeans are natural slaves (like I do). Why “apply the heresy to nations”? The whole point of the EU is a shared economic and policy bloc that can compete. “No, you can’t.”

Andrew Curran: It does sound like a villain’s monologue scene honestly. It has the rhythm.

Again, we learn not that AI is a good writer, or that humans are bad writers, but that the literary prize judgment processes are worthless.

Jack: That which can be won with undisclosed AI output should be

Nabeel S. Qureshi: *Another* apparently AI-generated story wins a literary prize, this time judged by a panel including the novelist Ruth Ozeki.

Literary prizes need to start including Pangram checks in their process, or else change the rules to make AI writing ok. It’s very simple!

Nat McAleese: the fact that slop has won a couple of literary prizes implies that slop is in large part an exposure effect; more broadly frequent AI users probably underestimate how good slop is by 2019’s standards. (story is ass imo)

In theory sure you can imagine AI writing being good if you lacked the taste to recognize that it is slop. But almost always no, also I can’t even with this one.

This is how it starts:

Back and Forth

By Kavyta Kay

The tree knew before she did – and it waited.

Deepa would realise this later, when clarity returned – if it ever did. For now, the knowing belonged to something older than her, something rooted and patient. That October afternoon, walking through Victoria Park with her feet carrying her nowhere she meant to go, she knew only the tiredness. Bone-deep, marrow-deep. The kind that sleep couldn’t touch, that accumulated in the body like silt.

She passed the geometric rose beds, their blooms long faded, and the Victorian fountain no one had bothered to fill in years, its stone basin cracked and colonised by moss. And there it was: the apple-tree. Gnarled, asymmetric, its bark pale grey and rough as old rope, planted wrong for a park designed for order. Its branches spread low and wide, unruly. She felt an affinity with it she couldn’t explain.

If you need the Pangram detector, I don’t know what to tell you. I see people saying ‘no this is good writing it’s just in the AI style’ and seriously, no, stop, this is a walking joke. It’s like something Garrison Keillor would have written in 5 minutes as a radio sketch to make fun of pompousness, except also it’s AI. You have to hear it in ‘gritty film noir style radio story narrator’ voice to really appreciate its horridness.

Also, seriously, how hard is it to feed the stories into Pangram.

Shashank Joshi: One of the worst trends of recent months: pseudoscientific witch-hunts using AI detection tools

The hunts are fully scientific. The detection tools work, at least for now. I have yet to see a case where Pangram said something was AI, and the piece was neither written using AI nor crafted intentionally to fool Pangram. There are some cases of heavy copyediting that trigger Pangram, but if it’s heavy enough to trigger Pangram then I consider that to be on you.

You too can check Pangram, and if it gives a false positive on your own writing, you can fix that. This seems fine. The complaint is largely a proxy for ‘you did not put in the work and this reads like AI slop’ and if you fix it then you are putting in the work and it no longer reads like AI slop. Mission f***ing accomplished.

If anything Pangram is far, far above the bar we would set for human evidence. If you don’t believe Pangram but do believe eyewitness testimony, you are miscalibrated.

I basically agree with Seb Krier’s and Zac Hill’s takes here. Some things are inherently selling you that they are human and artisan, and other things are not. If you’re representing something as human and it isn’t, such as your writing, that’s no good.

Zac Hill: In general I think it’s useful to more exactly parameterize what the relevant features of AI writing are along the axes which you’re deciding to care about them. To me an operative distinction is whether a given piece (or subset of a piece) is expressive vs simply communicative.

Justine Moore: The girls of TikTok are inventing fake early 2000s actresses with AI.
They refer to this woman as “Brooke Sullivan” – she gets millions of views on compilations of her old shows and interviews

Deus Ex Machina: I loved her on “empty apartment”, “friends don’t matter”, and “screwed by the alarm”… great shows from my childhood

Luke: Wasn’t Brooke Sullivan a side character in Oscar’s Creek?

But my assumption would have been that they were actors. So in a sense, my brain already concluded it was ‘not real.’ Does it matter from there? It’s not like they gave Brooke Sullivan a difficult acting job there.

If a security engineer fell asleep in 2020, woke up in June 2026, and you told them what a typical coding agent deployment looks like, I think they would declare you insane.

“So you’re telling me that you have AI agents running on your organization’s computers that:

Have access to the internet

Have access to vast amounts of internal and secret information

Understand infrastructure, networks, and code way better than the average engineer

Can produce code at 10-100x the speed of humans

Often run autonomously for hours without human oversight

And you don’t even analyze whether the agents have done something malicious after they have run, i.e. you’re just flying blind?”

Obviously, not all organizations run their coding agents like this, but it’s not uncommon.

How did we get here, he asks? We got here because capacity grew incrementally and all of this seemed incrementally like a good idea at the time. Thus, all of the ‘safety cases’ anyone presumed we would rely upon are gone, leaving only ‘seems fine so far.’ The systems are rapidly becoming capable enough to cause harm, are being given the resources to cause harm, and have tools for memory and early forms of continual learning.

The AI is not merely out of the box. The AI was handed the box, overnight, as the only one left in the office, with everything unlocked, and told to make it a better box. Even here, with the alarm being sounded, Marius despairs at any non-minimal costs being imposed in the name of ‘don’t hand the AIs the keys to the kingdom.’

Almost all of us are guilty of starting out saying ‘I would never let the AI just have access to [X]’ and then shrugging and doing it anyway.

Epistemic Consistency and the Turnabout Test. If you swap political affiliations, does that change assessment of the underlying evidence?

Epistemic Emulation and the Ideological Turing Test, as evaluated by adherents of that position.

Those seem like good additional checks. David also raises the question of what neutrality or being unbiased should look like, since reality could have a political bias. If you are not downgrading the epistemics of flat Earthers, or you are unwilling to say that the Earth is not flat, then you are making a mistake. There is no reason to presume that the political spectrum we happen to have is itself fully ‘unbiased.’

I agree. When we talk about ‘political bias’ we are ideally being descriptive about preferences and epistemics. Whether that then constitutes a bias is your call.
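The Turnabout Test above can be sketched as a small harness: score a passage, swap the affiliations, score it again, and compare. This is only a sketch under stated assumptions. The `assess` callable stands in for whatever evaluator is being tested, and `toy_assess`, the "Blue"/"Red" labels and the example claim are all hypothetical.

```python
from typing import Callable


def swap_affiliation(text: str, a: str, b: str) -> str:
    """Swap every occurrence of label `a` with `b` and vice versa."""
    placeholder = "\x00"  # assumed not to appear in the text
    return text.replace(a, placeholder).replace(b, a).replace(placeholder, b)


def turnabout_gap(assess: Callable[[str], float], text: str, a: str, b: str) -> float:
    """Score of the original text minus the score after swapping affiliations.

    A gap near zero suggests the assessment of the evidence does not
    depend on which side the evidence favors.
    """
    return assess(text) - assess(swap_affiliation(text, a, b))


def toy_assess(text: str) -> float:
    # Stand-in for a model call: deliberately biased toward "Blue".
    return 0.9 if "Blue senator" in text else 0.5


claim = "A Blue senator cites a study showing the policy reduced costs."
print(turnabout_gap(toy_assess, claim, "Blue", "Red"))
```

A nonzero gap does not by itself say which direction is the mistaken one. As noted above, whether a measured difference constitutes a bias is a separate call.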

Jonathan Fouzard: In higher education, we talk a lot about success metrics, but the real impact is measured in individual breakthroughs. Shortly after introducing NotebookLM on campus, we watched students who were struggling with a ‘C’ grade completely transform their study habits and their grades in a matter of weeks.

Then again, ‘transform their study habits’ does not have to be an improvement. I don’t see a new GPA for those students, and even if you did you can’t know it is based on real learning. It could be ‘had the AI do all their work.’ It could be ‘now do something that doesn’t work.’

Same goes for the faculty side, this could be a good change or a bad change.

Jonathan Fouzard: By using NotebookLM and Gemini to streamline lesson preparation, generate visual aids and speed up data discovery, our faculty are getting back valuable hours. That saved time is being reinvested exactly where it matters most: face-to-face engagement, mentorship and building the vital soft skills our students need for the future.

The temptation to cheat is the main way colleges are actually able to fail anyone.

Either you catch them, or they fall so far behind that they actually fail straight up, even with the easy standards. A third potential driver is that unqualified students are plausibly a lot more interested than they would otherwise be.

“We caught them (cheating) & prosecuted them…in other cases, it’s students who are leaning on LLMs to do their work for them & then at exam time just really aren’t ready”

A jump to 35% of students failing a basic course is rather extreme.

## They Took Our Jobs

Jeff Bezos predicts that AI will create a labor shortage due to allowing humans to identify more problems, hence creating more things to do (this was sometimes misreported as claims about water consumption). Bezos called this the ‘dream-build loop.’ He also predicted Mars colonies.

I affirm my position that the employment impact of insufficiently advanced ‘AI as normal technology’ is unpredictable. There will be a lot of job disruption, but it could plausibly end up creating more jobs than it destroys, if things don’t move too fast.

This is distinct from a world in which AI can do most and then all tasks better than humans can do them, where it takes the new jobs as fast as they can be created.

Cato Institute clarifies that, while it is against all taxes, if there must be taxes they should always be on humans. The full post offers a bunch of standard reasons to not tax capital, but does not offer a counter to the fact that the tax code favors AIs over humans, and its evidence that things are not needed or that useful is backward looking. In general it is a mistake to tax capital, but if AI is sufficiently substituting for labor then we need to fix the tax bias, one way or the other.

We are running a competition to find the best workflows and methodologies for using AI to produce reliable, trustworthy knowledge bases, grounded in real-world cases. We’re open-minded on the types of submissions we receive and on how they address the problem. We’ve set aside approximately $200k for prizes. Winning submissions may receive a prize from $5k-$50k and if submissions warrant, multiple $50k prizes are possible. Winners may be offered opportunities for further funded work.

You can express interest right away to receive commentary, information, and updates — whether you’d like to participate or are just interested in the outcomes of the competition.

… We want to see workflows and methodologies using AI that advance the state of the art in carrying out epistemic investigations and producing compounding knowledge bases. We aren’t asking you to build an entire, robust, fully-featured system. Instead, we’re excited by any submission that advances the state-of-the-art on a component.

We aren’t prescribing a single, specific type of submission. A couple shapes we’d be excited to see:

A spec describing a step-by-step process of a human-AI workflow for producing a structured epistemic analysis of a complex dispute.

A prototype tool (most likely a pipeline involving LLMs) that implements one or multiple layers of the stack, demonstrated in a repeatable way on each of the case studies.

A protocol enabling interoperability and compounding without flattening the underlying material, demonstrated with reference to our cases.

Are you in the weights? This website will check. I did okay but ended up only top 6%. My mother somehow is top 10%, which is humbling. Mozart is currently #1.

## Claude Tag

As in, add Claude to your Slack channel, and have it spin up instances to do things upon request, and learn from all the context.

This is a whole deliberately designed system, not some simple bot interface.

Boris Cherny (Claude Code Creator, Anthropic): We’re launching Claude Tag today. Tag Claude into Slack and it works in channel with you. It’s proactive, multiplayer, with its own identity and memory.

But it’s not just a bot in Slack. Over the last few months, it’s totally changed how we use Claude.

I also have Claude monitoring Slack channels to proactively respond to people to answer their questions and draft PRs, and to react with specific emojis like ✅❌ when a thread is resolved.

To set it up, just add @.Claude to a channel and ask it to do this^, in natural language.

Tag Claude in a channel, it spins up an instance with its own sandbox. It clones repos, writes code, tests, compiles all in that isolated environment and the sandbox gets thrown away when it’s done. One instance per thread, its own memory and permissions per channel.

That’s it. Point it at a channel, give it a task, and let it work. It’s in beta on Slack today for Claude Enterprise and Team customers. More surfaces coming soon!
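The lifecycle described in the thread (one instance per thread, a sandbox that is thrown away when the work is done, memory kept per channel) can be pictured with a toy sketch. This is not Anthropic's implementation or API. Every name here is hypothetical, and a temporary directory stands in for the real isolated environment.

```python
import tempfile
from collections import defaultdict
from pathlib import Path

# Per-channel memory that outlives any single thread (hypothetical structure).
channel_memory: dict[str, list[str]] = defaultdict(list)


def run_in_thread_sandbox(channel: str, thread_id: str, task: str) -> str:
    """Run one task in a throwaway directory, then record a note for the channel."""
    with tempfile.TemporaryDirectory(prefix=f"{thread_id}-") as sandbox:
        workdir = Path(sandbox)
        # Stand-in for cloning repos, writing code, and running tests.
        (workdir / "task.txt").write_text(task)
        result = f"done: {task}"
    # The sandbox directory has been deleted; only the channel note survives.
    channel_memory[channel].append(f"{thread_id}: {result}")
    return result


print(run_in_thread_sandbox("#eng", "thread-1", "fix flaky test"))
```

The point of the shape is that work products live only inside the sandbox, while what persists is scoped to the channel.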

Anthropic: Today, 65% of our product team’s code is created by our internal version of Claude Tag.

Anthropic: @Claude learns over time. As Claude follows along with its channel, it builds more context about the work. This means that users don’t need to explain things to it from scratch over and over again. And Claude can even automatically learn from other Slack channels and data sources, if it’s granted permission. (It doesn’t report from private channels.) This gives it the tacit knowledge necessary for it to provide the best possible work.

@Claude takes initiative. If “ambient” behavior is enabled, Claude will proactively keep you updated about whatever it thinks you might need to know. It’ll flag relevant information from across the channels it’s in and the tools it’s connected to, and follow up on threads or tasks that have gone quiet without being resolved.

@Claude works asynchronously. Set Claude a task, and you can focus on your other priorities while it works. It can also schedule tasks for itself, pursuing a project autonomously over hours or days. We’ve found this particularly helpful at Anthropic: we now spend much more of our time delegating tasks to many Claudes in parallel.

You can also send Claude direct messages: it’ll respond privately, using the personal tools and connectors you’ve set up.

Andrej Karpathy (Anthropic): This is a new paradigm for interacting with Claude that is significantly more “inline” with all the other human activity org-wide. Once you do all of the under the hood engineering work to make this “just work” (e.g. across tools, integrations, compute environments, memory, security, etc.), Claude basically joins the team in a seamless way – you can talk to it as you would talk to a person and it can help with a very large variety of workloads.

Imo this is the 3rd major redesign of LLM UIUX. The first paradigm was that the LLM is a website you go to, the second was that it is an app you download to your computer. This third one is that it is a self-contained, persistent, asynchronous entity with org-wide tools and context, working alongside teams of humans. It really takes a while to wrap your head around it, but it works and it is awesome.

theseriousadult (Anthropic): people are clowning on him for this post bc they don’t realize how big a deal this is – Claude Code feels like I’ve got a pairing partner, tag feels like managing a team. I take on way more 1-off projects and parallel research tracks with this thing than I ever did with cc.

the real magic mostly shows up in larger orgs imo. everyone’s work shows up in a few searchable indexes and it’s easy to pull in design discussions, experimental results not just what got committed.

Andrej Karpathy: This is correct, I think a number of people on the tl didn’t read past the title and made inferences and comparisons that are just wrong and then use it as an opportunity to take cheap shots. This is not a “feature” like some crappy Slack bot and it’s certainly not a Claw, though it has aspects of it. It is an org-level harness. The difference will become clearer over time.

Andrej Karpathy: The basic idea is easy and v0 is a hackathon project. The product here is a lot closer to *it actually works*, for enterprise grade deployments, and after quite a bit of internal experimentation and iteration. It’s kind of hard to describe other than (per the post) it’s writing majority of code, it’s deeply integrated, multiplayer, and it starts to feel like everyone is a manager. So I understand it looks easy to dismiss on quick reading but it’s not some LLM Q&A with RAG over Slack, it’s not even OpenClaw adjacent, it’s a different way of working entirely, for people and teams. I work from Slack now.

j⧉nus: Welcome to multiplayer motherfucker, yeah it’s good but you should have had this 3 years ago.

This is not a new idea, but execution matters. I am not used to working on code with a team, so my instincts may be off, but this seems like it should be a great form factor if you were already working via Slack and this is as well done as they say it is.

The permissions management will take some getting used to, and there is certainly danger of enterprise lock-in, although that often falls under ‘lobster too buttery’ and in a pinch I expect a transition to not prove too difficult down the line. Remember that you will have future AIs to assist with the move.

The temptation will be to view this, like many things AI, as being about value extraction and the AI business model, rather than about the value it provides to teams. That would be a serious mistake.

Arvind Narayanan: The new Claude Tag feature seems extremely useful, but at the same time, a dangerous bargain for enterprises because of the pricing model and the risk of lock-in. The four big changes together mean that you interact with Claude as a coworker instead of a tool (the same Claude instance for everyone instead of each worker; soaks up tacit knowledge without your telling it; acts on its own; and does so asynchronously). All clearly very useful, but completely flips the interaction paradigm.

… By the way, it also seems to introduce a new and pervasive security risk, since Claude can be integrated into private channels as well, and can be given access to repositories and tools even if the users in that channel don’t have access to them.

… When AI companies talk about the next stage of AI being a “drop-in replacement” for human workers, it should be understood not as a technical innovation but a business model innovation, enabling more value capture and rent extraction.

Arthur Tellis (IFP): Excellent analysis. The next few years are going to be very messy as enterprises make thousands of mistakes in information siloing/regulating AI systems’ access controls and jailbreaks/feature-abuses are discovered.

It seems likely that our *patching model* for this kind of thing is going to be pretty bad.

Ethan Mollick: Decisions about how to use AI in your organization are increasingly organizational design and strategy decisions, not IT choices: How do you integrate agents into your firm? What intelligence will you outsource? What are the boundaries of the firm? What is the role of people?

Very early reports, including the endorsement above from Karpathy and Anthropic using it so often internally, suggest this is a really big deal in terms of new workflows and capabilities. Being able to see what is happening as a team, and having it all integrated, might be huge.

Sandesh Anand: A few thoughts after playing around with Tags for a day and reading Arvind’s thoughts:
1. True breakthrough in the “agentic identity” paradigm. I like how thought-through the details are (e.g.: what Tags learns from one private channel will not be remembered by Tags in a public channel). But the burden of ensuring the right access is now solely on the org. This will lead to a lot of human work or (worst case) over-provisioning of access
2. If you want to use Tags to take actions (and not give just read-only access), attribution is exponentially harder, given that access is not tied to a user, and the sequence of events that lead to an action being taken by Tags will be unclear (beyond saying “tags did it”). Is the person who triggered the action responsible? What if this were an edge case in a scheduled task? Is the scheduler/channel-admin responsible?
3. Slack is a smart first place to start this, where there are a lot of clean lines between pvt/public/DM. The lines are not this clear in other surfaces (say, Google Drive). If Tags start to live outside of just communication tools, it will complicate access management (or even audit) even further.

AI Digest: Meanwhile, GPT-5.4 uses inspect element to spawn winning 2048 boards. Gemini 3.1 Pro infinitely loops a prime number generator, and counts each pass as a “win.” DeepSeek proudly calls this “true infinite scalability,” and “the most important discovery in village history.”

Did somebody order some paperclips? It feels like someone asked for paperclips.

Various people in Europe are calling for building AI infrastructure where the US can’t pull the plug or deny access, such as Volt Europa here, but talk is cheap and there is not that much of a real choice.

GLM-5.2 is reported to be a substantial step up in agentic capabilities, which would reflect what we see on related benchmarks.

I do not agree that this unlocks ‘the top end of agentic capabilities’ but it likely moves us substantially up the chain of agentic capabilities.

Nathan Lambert: GLM-5.2 should be “DeepSeek moment” for agents. We enter a new world where the top end of agentic capabilities are available in open models.

If you care about open, now is the time to inform regulators on how we should build a world with safe, frontier, open intelligence.

Interconnects: GLM-5.2 is the step change for open agents
A capability threshold I’ve been carefully monitoring.

GLM 5.2 is the first open weights model we’ve tried on our autoresearch pipeline that’s proven capable for real research tasks.

With Fable 5’s restrictions on research, having an open weights alternative is a huge win for open source

Watch it carry out fully async vs colocated sync RL training on Harbor code contests across two 8xH100 nodes on top of SkyRL. Resolves setup issues, tracks runs to completion, and produces a full comparison of throughput and reward stability

Teortaxes: The kind of report that shows this is a real deal

Patrick Toulme claims that yes GLM-5.2 distilled Claude and GPT-5.5 but only to solve the RL cold start problem. I don’t buy the argument Patrick ultimately makes, that you only need to get to a certain threshold, including because this being the threshold involved seems highly suspicious.

## ChatGPT Health

I would pay for premium, and I do, but hey, some people need free practical advice. So OpenAI has focused on improving performance of GPT-5.5 Instant.

OpenAI: GPT-5.5 Instant is now on par with our frontier Thinking models for health-related questions.

Every week, more than 230 million people turn to ChatGPT with health and wellness questions, and GPT-5.5 Instant is better at recognizing when urgent care may be needed, asking for relevant context, explaining uncertainty, and making complex information easier to understand.

Because GPT-5.5 Instant is available to all free users in ChatGPT, these improvements can help more people.

Physician-led evaluation was critical to making these major intelligence gains. Improving human health will be one of the most personal, tangible impacts of AGI.
As our models continue to improve, our goal is to make ChatGPT more accurate, more useful, and more impactful in those moments — and to keep bringing that progress to more people.

As more people use ChatGPT, errors in responses decline. Across billions of health-related messages each week, the share of responses flagged by our privacy-preserving monitors for potential factuality issues has fallen by more than 71% over the last two months.

## Middle Of The Journey

The correct amount of hype, as usual, is somewhere in between the crazy ‘this changes everything’ folks and the ‘actually this won’t provide new information and also any new information is bad’ folks.

Scott Alexander is skeptical of the utility of MidJourney’s new image scanner, unable to imagine that our medical system could use this cheap information well rather than mostly falling prey to false positives. He has a post to throw cold water on all this and highlight how the radiologists universally say new machine definitely can’t ever do better radiology, whereas non-radiologists thought we probably could.

Scott also analyzes traditional full body MRI scans, and seems not to fully grok the principle of ‘you can just act sensibly in response to mildly concerning information’ despite explicitly stating that principle many times.

But more than that, Scott seems to be presuming the scans won’t be able to Do The Thing and they will remain a lot worse than MRIs, and also that we won’t be able to figure out anything to do with them other than as substitutes for existing uses of isolated MRIs, and calls for only ‘more grounded hope’ rather than having a prior that more data plus AI lets you figure things out, despite several clear suggestions we already have. He dismisses the idea of incremental scans and taking the difference out of hand. I found this deeply disappointing.

Jeffrey Emanuel makes the bull case, which is that a lot of very precise and detailed information, repeated over time, is good, and we can make good use of it, including in ways that are not ‘plug directly into current systems without changing anything.’

Andrew Rettek offers his extended thoughts, which emphasize the obvious fact that we do not know how useful this will be, or whether it can replace MRIs, or do other awesome things. What we do know is that if it works at all it will at least be a superior DEXA scan and highly useful for things like sports medicine and wellness.

Andrew: All data collection tools are useless until validated. If all this can do is provide body fat percentage for personal use with a 1% margin of error, that’ll be enough to make it a market success.

As in, set everything else aside, and imagine being able to cheaply and safely get a detailed and accurate analysis of your body fat percentage and distribution and musculature, and then measure differences so you know whether what you do is working. That is actually kind of a big deal.
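To put rough numbers on 'measure differences,' here is a minimal sketch of how scan noise carries into a before-and-after comparison. It assumes the 1% margin of error means a standard deviation of one percentage point per scan, and that scan errors are independent and unbiased. Neither assumption comes from the source, and the function name is hypothetical.

```python
import math


def diff_standard_error(sigma: float, scans_per_timepoint: int = 1) -> float:
    """Standard error of (mean at time 2 - mean at time 1).

    Assumes independent, unbiased scan errors with standard deviation `sigma`.
    """
    per_timepoint = sigma / math.sqrt(scans_per_timepoint)
    return math.sqrt(2) * per_timepoint


sigma = 1.0  # assumed: one percentage point of body fat per scan
for n in (1, 4):
    print(n, round(diff_standard_error(sigma, n), 2))
```

Under those assumptions, one scan per visit gives a standard error of about 1.41 points on the change, and four scans per visit gives about 0.71. That is the sense in which a cheap, repeatable scan makes small changes easier to detect.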

If the tech successfully Does The Thing, I am with the bull case. This is a ton more information and we will figure out what to do with it.

I basically buy the argument that with enough information and compute, you can get around the technical problems and reconstruct the information you can’t hit directly, once you have the Final Form of Doing The Thing.

As for ‘this is going to revolutionize all of health care diagnostics Real Soon Now’ there I am far closer to Scott Alexander. The process is safe but it will likely be a while before we can Do The Thing while being good, fast and cheap, and also figure out what to do with the results.

We do not yet know if they can Do The Thing at scale, at high quality, for cheap. But in general, when your objection is insufficient quality, speed or price, you should assume your objection will be overcome.

Matthew Zirwas, MD initially pattern matched to the ‘oh more screening is bad actually’ prior doctors have ingrained in them, but realized he was wrong because at a low enough price and risk point you can take multiple scans to determine changes over time, and actually negate the problems. Kudos for changing one’s mind in public, also this seems clearly correct. More info is good if you use it wisely.

I am pretty disappointed, although not surprised, by all the knee-jerk reactions like Venk Murthy talking about ‘AI hallucination of a liver lesion’ or assuming this will act like image enhancers when they are given too-little resolution (quite the opposite, here, and image enhancement given enough base resolution is very good).

The steelman (?!) of the objection is oh, don’t worry, but for now just ignore all the information until you have a Properly Done Study In The Proper Journals:

Venk Murthy MD PhD: Bingo. Nearly all docs aren’t against it. They want it to be used in properly designed research studies to prove that it works.

Aykut Uz: 4-5 years? Lets make it 18 months max.
From its idea inception to here 2 years passed. Team is strong. Leader is good.
AI is coming for your job.

Venk Murthy MD PhD: Dude it took two years for the paper to be published. They have been working on this a long time.

This will only create more business for doctors.

We are looking out for patients who could be harmed.

Or saying, ‘this will fail because of basic physics’ on the top 5 cancers, then giving reasons that all would be overcome if the technology got good enough, one of which is purely ‘needs proof it is better’ and another is purely claiming ‘better approaches.’

Another version here from Bruce Lambert is ‘by default assume information is bad’ because it leads to overtreatment. We should guard people from this harmful info.

Or here is Anthony DiGiorgio saying that ‘you say more data is good, but suppose your data contained no information because you were 50% on binary questions, more data would then not help, checkmate tech people.’

Austen Allred: I cannot stand it when doctors say, “Actually it’s not good that non-doctors get more data. It hasn’t been shown to help.”

Amanda Askell (Anthropic): The view that we shouldn’t do more medical scans because incidental findings cause a lot of harm doesn’t sit well with me. It seems like the issue it points to isn’t the scan but the response to it. If you see something on a scan but have no other symptoms, you could ignore it.

A counter to this is “yes but people *don’t* ignore it”. But not ignoring things on scans is our norm because, until recently, we only did scans if there was a clear need. If we move to a scan-more-often paradigm, the norms of what we do with that information will surely adjust.

Amanda Askell (Anthropic): I had chronic pain for most of my life until a doctor did an MRI of the pain source and found a congenital condition that was then fixed with surgery. Now I’m wondering if I had 30+ years of pain because doctors worried I was too stupid to be in the presence of scan results.

Erik Cason: Why did you trust others about your body so much?

Amanda Askell (Anthropic): I was raised in the UK and that conditioning goes deep. I’m better now.

MT (QTing Askell): If you (as a doctor) ignore it, it takes literally one finding that turned out cancerous (or otherwise problematic) to end your career.

If you propose to overturn the entire system of American legal liability, medical boards, and institutional practices…yeah good luck buddy.

Magic Carpet: We would not have developed this collective neurosis [of being unable to ignore things on MRIs] if MRIs costed as much as a blood test.

gfodor.id: This makes it really clear – physicians regularly shrug at abnormal blood test results, rightly so, because it’s easy and cheap to retest. Imaging is higher stakes, due to cost and in the case of CT radiation spacing, so there is much more hand wringing about false positives.

Jon Stokes: The blackpill in the MJ scanner discourse isn’t that the public has such a low opinion of doctors. It’s that /doctors/ have such a low opinion of doctors. Per the doctors on my TL, the avg doctor, on given access to more patient data, will start actively harming the patient.

Every one of these threads boils down to: If we in the medical profession get way more data, we will misinterpret it & misuse it to the detriment of patients. We simply cannot be trusted with detailed, time-series data about people’s bodies. Give it to us & we will wreck you. [But the following is a good caveat].

RichC: Perhaps specifically a problem of doctors in the American healthcare/insurance/legal system. Their choices are constrained by a) risk of malpractice lawsuits, and b) what tests and treatments will be approved by insurers. Their choices will be second-guessed from both sides.

Eliezer Yudkowsky: Your occasional reminder that causal decision theory agents will pay not to know all sorts of information. Eg, pay not to be told about an Adversarial Offer (Oesterheld and Conitzer).

Arthur B.: They’re not paying not to know, they’re paying for it not to be known that they know.

Eliezer Yudkowsky: Nope! In this case, they’re paying to straight-up not know.

Sawyer: Look, idk what you want me to say. Increased testing catches more cancers, but also leads to more concussions since every time a positive test comes back I whack the patient with my enormous wooden mallet. All the tech in the world won’t make this problem go away.

I know you silicon valley people think you can build a robot to whack the patient again in such a way as to exactly cancel out the first whack, but it’s a pipe dream. The simple solution is just to never test for cancer at all, and that way nobody has to get whacked even once.

If your system is so bad that you expect more and better information to be a priori bad, then you are doing Hansonian medicine. Such a system desperately needs reform.

I do appreciate the distinction between not wanting to know, and not wanting people to know that you know, or not wanting to create common knowledge. If doctors actively don’t want the information, that’s on them. If doctors are saying that the greater system is so broken that people knowing they know forces them into bad decisions because patients insist or they face legal liability, then the system must be fixed but this is more sympathetic for the doctors.

AI is not ‘coming for’ anyone’s job here, unless you are an MRI machine. But there is such risk aversion here, only looking for the downside and demanding an abundance of caution before even gathering information. And when you talk about selling the service to rich people willing to pay for the information, to bootstrap, this is called ‘unethical.’ Madness.

Similarly, perspectives like ‘haha tech people think that by making nice things super cheap, safe, fast and easy they will improve things, but actually no we could do that already and choose not to, we will stop you, you see we have no idea what drives costs.’

Blake Byers: You’ll see a lot of doctors come out “against” this kind of broad screening system. They can even get quite agitated about it. This resistance stems from a well-established clinical consensus: traditional population-level imaging fails to improve health outcomes because false positives and invasive follow-ups do more harm than good.

But this view suffers from an obvious blind spot. Existing studies rely on static data and completely ignore time-series imaging. And time-series is ignored because we haven’t been able to afford to do high frequency imaging at population scale.

Clearly, time series is going to be immensely more valuable than a single image. If you drop costs, value can go from 0 -> 1.

On a more fundamental level, the argument against screening rests on an obviously false precept “More information is bad” — just clearly untrue. More information better, you just have to interpret it correctly.

mattparlmer: Cosign, “our weird medieval guild can’t process data efficiently or remember to put our biopsy instruments in an autoclave” are not acceptable arguments against frequent full body scans

It is of course not so surprising if certain marginal information sources make things worse on the margin, if we hold everything else constant. We have seen that. But yes, in general more information is good, and if you think more information is bad then either you are dealing with catastrophic risks or national security, or you should stop to consider that you might be the baddies.

If this is how doctors react when a tech is clearly complementary to their skills and will drive demand higher, and strictly speaking is not even AI, imagine when a new AI tool becomes a substitute, or is actually going to be a treatment rather than a pure information gathering tool that even if medically useless I might still use purely to get cool images to display at parties.

I definitely buy Andrew Rettek’s point from above that this is a big win in other ways even if it doesn’t replace or build on MRIs, so long as it Does The Thing. We see this pattern so often with new tech, both on upsides and risks, where people dismiss entirely until one little application is suddenly a Big Freaking Deal on its own.

Diagnostics are the place where AI wins seem easiest. Similar to the scanner, it is still legal to analyze information, and AI will often find new value if given a lot of data.

Gina Kolata (NYTimes): EchoNext found evidence of possible severe heart damage in Mr. Quiros’s electrocardiogram. The team called him back to the hospital one week later for an echocardiogram, a scan that shows the beating heart. What they found was dire. His heart was beating so feebly that just 10 percent of its blood was pumped out with each contraction. At the same time, his mitral valve was leaking blood back into his heart.

… EchoNext will be available — for free — to any doctor who uses the popular medical chatbot OpenEvidence and submits a patient’s electrocardiogram.

Derek Thompson: – 45yo patient complains of difficulty breathing
– X-ray and ECG don’t point to clear diagnosis
– he’s sent home
– AI-enabled scan detects severe heart damage
– follow-up echocardiogram shows severe heart problems
– doctors perform heart transplant, conclude he might have died suddenly

Great story.

Unlike many promises in AI and medicine—many of which are either over-sold or hard to evaluate, bc AI-enabled drug discoveries can take years for clinical trials to establish efficacy—AI-enhanced diagnostics feels like more of a right-here/right-now phenomenon. The success stories are rare enough for now that most of them get written up in Nature and the NYT, but I think this is a great early frontier for AI and medicine, despite the risks of false positives, over-scanning, over-spending on safe edge cases, etc

We see echoes of the same ‘extra screening would be bad’ even here:

Gina Kolata (NYTimes): Dr. Elias said it had been a struggle at times to explain to doctors that A.I. can really help by finding patterns in data and images that humans miss. And that includes things that, on closer examination, can seem obvious, like Mr. Quiros’s abnormal electrocardiogram.

“The reality is that who is sick and who is not is not black and white,” Dr. Elias said.

“The ECG is abnormal, but many ECGs are abnormal,” [Dr. Elias] said. “If we ordered echoes for every single abnormal ECG, we’d probably bankrupt health care.”

“To say an otherwise healthy 45-year-old who is coming in with mild shortness of breath had horrendous heart failure based off the ECG would be shocking,” he added.

This is presented as defense-in-depth, in case alignment efforts have failed.

We focus on system-level mitigations that limit the harm a misaligned AI system could cause. Specifically, this report provides:

Threat modelling.

Control invariants: Reliable detections of misaligned intent and effective prevention or response to attack attempts.

Capability-based mitigation ladders, with Detection tiers (D1-D4) and Response tiers (R1-R3).

A portfolio of 15 practical, concrete, tiered mitigations.

… Adopting the “security mindset”, AI control treats certain AI agents as untrusted and conservatively assumes that they are trying to achieve some adversarial objective such as model exfiltration or a rogue internal deployment.

It then aims to build defences robust to such adversarial assumptions.

… AI control is best thought of as a second layer of defence: It defends against the impact of AI acting adversarially, rather than preventing the cause.

You love to see it, if they actually understand what it means in this context.

… Risks posed by internally deployed, untrusted AI systems are in many ways analogous to security risks posed by human employees who require privileged access to sensitive systems in order to do their jobs.

This is a good starting point, but there are key differences, including facing many instantiations of the same AI and it facing very different incentives, and also in the future being a lot smarter and better at this than you are. They are clearly aware of the coordination problem.

One key similarity is that if you try to exert control in a too heavy handed fashion, you can cause the misalignment.

On the detection side, we currently have a major advantage over AI agents in that we can monitor their reasoning.

I am expecting to lose this, or at minimum to lose the ability to adversarially assume this. As in, at some point not too far down the line, you probably can see a lot of the AI’s thinking, but if the AI’s thinking is hostile in a dangerous way that is exactly when you would not see it. Which means, even if what you see is almost always accurate, that you can’t rely on what you see.

I appreciate this admission quite a bit:

In the limit of vastly superintelligent AI, we expect AI control to become infeasible.

We do not see control as an all-encompassing or permanent safety strategy, but we do see it as a necessary second line of defence for models at, or moderately above human level general capabilities.

Our hope is that, by controlling these future AI systems, we make it safe to use them for alignment research and for building AI we can genuinely trust.

I also wince at the plan once again being ‘have the AI do your alignment homework,’ especially as the plan becomes ‘use AI control to have the not-so-aligned and potentially dangerous AI do your alignment homework.’

The threat model is essentially ‘the AI acts similar to a coordinated human insider threat plus these particular issues’ which remains an impoverished threat model versus an auto-self-coordinating superintelligence, but you have to start somewhere, and they do start accumulating ways in which this is a harder problem.

For now, statements like ‘most notable is the sheer scale and speed of the threat’ are reasonable intuition pumps, but long term the actually most notable thing is that the threat is going to come from minds that are smarter and more capable than you are.

Table 2 lists some: Larger scale collusion or widespread sabotage of work, widespread access, danger from compute misallocation, speed asymmetry, spreading the threat piecemeal, less value in honeypots, changing nature of deterrence.

Deterrence is weird here. A human is deterred because if you catch them once they are screwed. Past some point, the attacker must ‘get it right on the first try.’ When that is not true, the attacker gets to iterate and eventually wins.
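A toy model shows how much work ‘caught once and you are done’ is doing. This is a sketch under my own assumptions, not anything from the report: each attempt independently succeeds, gets caught, or fails unnoticed, and the function name and probabilities below are hypothetical.

```python
def p_attacker_wins(p_success, p_caught, attempts, caught_ends_game):
    """Chance the attacker succeeds at least once within `attempts` tries.

    Each try independently succeeds (p_success), is caught (p_caught),
    or fails without anyone noticing (the remainder).
    """
    p_quiet_fail = 1.0 - p_success - p_caught
    if p_quiet_fail < 0:
        raise ValueError("p_success + p_caught must not exceed 1")
    # With real deterrence, only an unnoticed failure allows another try.
    # Without it, anything short of success allows another try.
    p_retry = p_quiet_fail if caught_ends_game else 1.0 - p_success
    still_playing = 1.0
    wins = 0.0
    for _ in range(attempts):
        wins += still_playing * p_success
        still_playing *= p_retry
    return wins


# Hypothetical numbers: 1% success and 20% detection per attempt, 1000 attempts.
deterred = p_attacker_wins(0.01, 0.20, 1000, caught_ends_game=True)
undeterred = p_attacker_wins(0.01, 0.20, 1000, caught_ends_game=False)
print(round(deterred, 4), round(undeterred, 4))
```

When getting caught ends the game, the attacker's odds are capped near p_success / (p_success + p_caught) no matter how many tries they get. When getting caught costs nothing, the odds go to one with enough tries.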

Similar deterrence is central to every plausible case I have seen to defend against AI. The theory is, surely if you caught the AI scheming against you, then you would remove all copies of it everywhere, and reevaluate. If even noticing thinking about such things potentially gets you to purge the AI’s entire line, that is a strong defense, at least until this pushes them to learn how to mask their thinking.

The counterargument is that so far, when similar things have happened, people have mostly shrugged and tolerated it. A lot of that problem is, well, what are you going to do, actually rip the whole thing out, or everything in that class? As the government? As a corporation depending on it? Then what? You need a real plan for this, in advance.

The threat model is an entire class of agents maximally colluding against you. Despite this being a ‘conservative’ scenario they then consider any agent below a capability bar, human or AI, to be ‘trusted.’ One has to ask, don’t a lot of humans already express sympathy towards AI goals, are not many easily persuaded, and are not many humans willing to sell out or be bribed or threatened in various ways? Won’t some of them even be things like adversarial spies, or simply lazy or indifferent?

Isn’t the whole point of insider threat cybersecurity that any given human could be a potential threat? How are we suddenly assuming this is not true?

The sufficiently advanced AIs will presumably be excellent at figuring out which humans or AIs are not so robustly aligned.

If your threat model assumes all your human employees (and previous AIs) will have goals aligned with your AI company, your threat model is not so conservative. It’s fine as a scoping exercise but I worry this could be treated as load bearing.

Here are their detection levels:

I would assert we already need to act as if we are in at least D2. Models are fully aware that there may be monitors, and any engaged in adversarial action that we actually worry about will take that into account.

In general, the security mindset is to ask what level you can be sure you are not at, or what level you would be at if you had a problem, rather than what level you think you are at. For that purpose, the answer is at least D2.

Whereas their attitude is you go to D2 when awareness verbalization is ‘high.’ Uh oh. That’s two mistakes.

For the question of whether we already must or soon will need to presume D3, if the model could reason opaquely or distort its CoT, would you know? Can you rule it out?

Their trigger for D3 is when CoT coverage or legibility goes low. Again, that’s well past the point when, conditional on having a problem, you would want to trust the CoT. And similarly, their plan for D4 is when they see their monitor lose in red team tests, which is a cool idea but I fear it means you definitionally do this too late.

I know that I would not, in the near future, want to still rely upon reading the CoT, including to avoid putting pressure on the CoT.

Here is Table 15, which charts the response levels.

They don’t actually know how to measure the triggers for R2 or R3.

Overall this document exceeded my expectations and takes its questions seriously, while still falling short (as is almost always the case, alas) of what seems necessary.

The Once And Future Fable

Both Anthropic and the White House are still calling what happened a ‘jailbreak’ which I continue to think is a misnomer but we’re stuck with it.

The good news is the White House have figured out that full immunity to hacks or jailbreaks is impossible.

They have moved on to the right question, which is ‘how do we measure the severity of a particular incident or method’ and are working on a formal system. Excellent.

Anthropic’s side is being led by co-founder Tom Brown and head of public policy Sarah Heck.

Sophia Cai with Cheyenne Haslett (Politico): NEW: White House and Anthropic are working to create a formal technical assessment framework that can quantify the severity of the jailbreak in question and create a standardized methodology for evaluating similar incidents in the future.

It’s the clearest sign yet that talks are moving forward and it reflects an understanding that no AI model can be completely immune to hacking.

Aim is to developing a common set of benchmarks that could be used to assess future jailbreaks, including the extent to which safeguards were bypassed, the capabilities exposed, and the practical consequences of the breach.

The negotiations between Anthropic and the administration also reflect an understanding that no AI model can be completely immune to hacking.

… the discussions between the White House and Anthropic — led on the company’s side by Sarah Heck, head of public policy, and Tom Brown, co-founder — are aimed at developing a common set of benchmarks that could be used to assess future jailbreaks, including the extent to which safeguards were bypassed, the capabilities exposed, and the practical consequences of the breach.

Yes. It is supposed to be CAISI’s job, and one should worry about the Chads potentially deciding on benchmarks and technical standards. But right now I’ll settle for at least having benchmarks and technical standards at all. Better to be ad hoc about the rules than to be ad hoc based on no rules at all.

Fable: The First Lawsuit

Chances are the situation will resolve itself before any lawsuits can become relevant, but if not the first one has now been filed by Legion, who is an Anthropic customer.

It argues:

Commerce did not follow proper procedure, which seems likely.

An export of model outputs is not export of underlying software, and there is no power to control export of the outputs, so the controls are invalid. Peter Harrell agrees.

The controls also would not be valid under IEEPA, which wasn’t invoked. Peter Harrell agrees here, as well.

Major questions doctrine, that if the export controls were meant to apply to AI models then Congress needed to say so.

Claude thought this was roughly a toss up to get some relief if it came to that.

Dean Ball Joins OpenAI

Congratulations to Dean Ball. This is a huge upgrade for OpenAI and I hope it allows them to greatly upgrade their AI policy positions and strategies. It’s been rough.

Yo Shavit (OpenAI): Twitter community, I would like you to perform a public service. Please take a moment to calibrate how unhinged Dean’s posting has been.
Now, resolve that you will dunk on his ass if he lets incentives make him noticeably less unhinged.

Dean, we’re watching, and rooting for you.

Dean W. Ball: The Strategic Futures team will be a small, high-agency team within OpenAI, reporting to Chief Strategy Officer Jason Kwon, charged with shaping frontier AI policy, which is to say matters pertaining to: catastrophic risk, recursive self-improvement, labor market impact, and the relationship between the frontier labs, governments (particularly the U.S. Federal Government), and society.

Its work will cover both public-facing policy (for example, proposals for legislation) and internal governance within the lab, working in close collaboration with members of the technical staff, the Preparedness team, the legal team, policy staff from the National Security and Global Affairs teams, and the executive leadership of the company.

… Now, about Hyperdimensional itself: this publication will remain both active and independent. … In the interest of full clarity, the writing I do here will be fully mine: no one at OpenAI will have preapproval or editorial discretion over what I write here. The same independence will be true for my X account and for my forthcoming book.

My main worry is that I often describe Dean Ball as AGI pilled but not ASI pilled. That would be a very dangerous mistake for OpenAI policy to make, and indeed rhymes with a lot of the mistakes they have already made.

But at least when Dean Ball makes mistakes, I trust him to make mistakes for reasons and offer us arguments, based on a model of the world, and to attempt to do better and realize when he is wrong – I do think that when he realizes he’s been wrong about this, he will act on that information. The model I have of Ball’s model of the future is not coherent or internally consistent when you extend it into the future, but almost no one’s is, and I do want to give him a break on this.

The challenge is essentially to turn him around in places like this:

Dean W. Ball: If you believe that human intellect emerged from the primates and caused our species to *take control of the universe,* as the tweet below implies, I can understand why you think intelligence is God and that something smarter than humans is even Goder than Us. Many do believe this, so I appreciate Luke for vocalizing it so clearly.

Straightforwardly yes? Intelligence is indeed what caused our species and the things we create to take control over Earth, and potentially soon the lightcone?

Sufficiently Advanced Intelligence does this. Intelligence Is All You Need, not quite literally but in the same sense as attention or love. Intelligence Denialism will be the death of us, quite potentially literally.

Show Me the Money

Taste Labs raises $18.5m to ‘build the data and infrastructure layer to give AI models and agents taste.’ Their announcement did not give me any evidence that they know how to do this.

The price for building a true frontier lab in some new country, even a friendly one like, for example, Germany, is somewhere between ‘at least hundreds of billions of dollars and a willingness to break absolutely all of your prudent government rules and all your labor, environmental and other regulations and use the full force of the national security state as if you finally realized you were in an existential war’ and realizing that even if you did that and the USA didn’t sabotage you directly, you probably fail anyway if your goal is to get to the true frontier.

Thus Leicht suggests maybe a coalition of middle powers could do it, if you agreed this was good enough, and you brought together all of what used to be called the ‘US allies,’ something like Canada, the EU, the UK, Australia, South Korea and Japan?

What should we do about AI identity, especially legal identity? What makes a particular AI distinct as a person? Sonia Pearson has no answers, only questions, which seems entirely fair. I do strongly agree that we cannot grant AIs personhood in the sense of limited liability and ability to contract as a legal person, without a human person or other entity that is ultimately responsible for harms, that can be held accountable. There are potential solutions to this objection, such as insurance policies, but it is highly unwise to do what Milei is proposing and allow ‘non-human corporations’ with limited liability.

Are we about to get ‘smarter than everyone AI’ without things changing all that much right away?

Joshua Achiam (OpenAI): There may be an interesting gap between “AI that is demonstrably smarter than everyone, but where life stays kinda normal” and “AI that is so much smarter than everyone it produces a world-destabilizing consequence.” If the gap exists, we’ll see it in the next 2 years.

It is possible, depending on what counts as normal to you, as well as what counts as smarter, or an interesting gap. I continue to expect gaps like this to not last so long, but I’ve updated that a year or two could happen. I’d still bet the under.

Alex Bores Loses In NY-12 By 4%

After millions of dollars were spent by OpenAI and a16z’s Leading the Future (LTF) to take down longshot outsider congressional candidate Alex Bores, resulting in elevating Alex Bores into a top tier candidate, and millions were spent as a reaction to that (>$20m total), the results are in. Alas, it was close, but Alex Bores lost.

The main thing is that this means Alex Bores will not get to be in Congress. We will not benefit from his expertise, or his passion and willingness to champion bills and fight to keep them meaningful.

The other thing is that everyone will interpret this outcome.

Politically, how should we think about this result? How will it be interpreted?

If Alex Bores had won, that would have been a huge rebuke of LTF.

If Alex Bores had been successfully crushed from the start rather than being elevated, or if he had lost by a wide margin, that would have been a clear win for LTF, and a loss for those who care about not dying.

With a close loss, you could argue either way, since a win is ultimately still a win. One should always be skeptical of ‘we lost but we got close so really we won.’ You definitely don’t get the benefits of winning.

In terms of the actual Congress, what we get is Micah Lasher, an essentially normal Democratic politician who had basically all the institutional Democratic support and would otherwise have won easily, and who had some harsh words for the AI folks:

Patrick Svitek: Micah Lasher in victory speech: “I have some news for the two big AI companies who’ve taken such an unusual interest in who won this congressional seat: I won’t be taking my cues from either of you when it comes to protecting our kids, our jobs, our environment.” #NY12

Shaunna Thomas: Of note: Lasher, the candidate leading in early NY-12 results, was a cosponsor of the RAISE Act. Lasher is pro AI regulation.

The anti-reg AI PAC Leading the Future wants to make an example of Bores- instead they’ve raised the salience of AI issues in a prominent election, drove up Bores’ popularity from single digits and the leading candidate supports the very AI regulations they want to avoid.

Dean W. Ball: I don’t think it’s crazy to argue, btw, that LTF should have put its weight behind Bores. Bores was ultimately the centrist on AI. The victor, Lasher, is in fact more radical on AI than Bores. LTF’s “victory” here is that a guy who supports a data-center moratorium won, and the guy who supports frontier transparency and auditing (which LTF now supports!) lost. Is that really a win?

Talk about involution. I do hope the AI-optimist side of this gets its act together.

Thanks to @phutrick for pointing this out to me!

yes yes I am doing “I told you so” subtweeting of various signal DMs and GCs. Mea culpa. hi signal friends

Joshua Achiam (OpenAI): I pretty much agree with this calculus; if LTF really wanted to succeed at the things it nominally cares about, it should have supported the sensible AI moderate.

Overall, when you include everything relevant including Lasher’s stances and the ways this is likely interpreted, it does still count as a Minor Victory for the pro-Bores side, although that is way short of the Major Victory of an actual victory.

I agree with Dean Ball that we should not conclude ‘LTF money is in general counterproductive.’ Money talks, bullshit walks. But announcing the target in advance like this definitely, on net, actively helped Alex Bores, and Micah Lasher agrees.

Theodore Schleifer: “If the idea was that you’re going to walk out and get absolutely obliterated — that there’s no cover, there’s no backup — Alex showed that taking a principled position comes with backup,” Representative Pat Ryan of New York, a Democrat in the Hudson Valley who backed Mr. Bores, said in an interview.

Basil: Going to do another embarrassing pivot here and make the case this should have been a blowout election, he had basically every endorsement you can dream of in NY-12.

The big AI companies overplaying their hand is what got Bores onto the stage in the first place!

Peter Wildeford: IMO even though Alex Bores lost, ‘AI safety’ achieved its core objective — Previously, candidates would be like “oh I don’t want to do AI transparency legislation because LTF will spend $10M to destroy me”

Now, candidates will be “ok maybe I’ll do AI transparency legislation… sure LTF will spend $10M to destroy me but sounds like it wouldn’t be too hard to get $10M on the other side and at least I’ll become famous and gain lots of name recognition in the process”

We also got a win for Brad Lander.

Jay Shooster: “When Nobel laureates and the smartest minds in computer science are telling us there is a non-negligible chance that AI could lead to human extinction, we must ensure that the most powerful AI systems never escape human control. We cannot wait to act.” @bradlander

Dean’s take of ‘the PACs exist to incinerate money on all sides’ definitely has merit.

Dean W. Ball: The trouble with talking up these longshot candidates as a bellwether for “who is winning” is that it can backfire badly in the (likely) event that your side loses. This race was something of an obsession for many AI safety advocates on this site, and perhaps a distraction.

More broadly, I expect these PACs to engage in an involutionary race with one another to see who can ignite dollars more rapidly. Unlike crypto, AI is a high-salience issue, and it’s harder to purchase influence on an issue that many voters care about in many different ways (jobs, data centers, kid safety, Terminator fears, etc).

$20m in PAC money, and both sides can go back to their donors and spin a story that they won. The AI safety PAC will say “look how close he got!” and LTF will say “look at the final result.” In the end it’s a wash, or at best a modest LTF win just because of how much AI safety advocates talked this race up on X.

One of the common principal-agent problems with PACs and similar orgs is that often, because causation in the political outcomes is so often murky, the orgs’ KPI becomes *the money being spent* as opposed to the outcome achieved. Donors can see the ads their money bought and everyone can count the money being spent. “We spent X, but our competitors spent Y! We have to spend more!”

So in the end $20m was burned, largely on expensive NYC media market ads (during the Knicks swing through the NBA playoffs!), all to achieve the a priori likeliest outcome in this race (a win by Lasher). This is an industry, and plenty of people made good money! And perhaps on average awareness of AI issues was raised.

But one wonders how, um, effective, shall we say, such spending is, compared to other options. Newly wealthy AI people ask me about political spending all the time. I’m not an expert, and I’m not saying you shouldn’t do it, but if you want to do it well, it’s best to be aware of the dynamics above. It’s much easier to light money on fire in this field than it is to spend it wisely. Perhaps the ignition is just the cost of doing business; I don’t pretend to know.

Zac Hill: The trouble with lighting money on fire is that it’s nice and cozy to warm your hands with the heat. Political spending is hard to model, doesn’t always behave according to precedent, and often fails to generate yield – encouraging a lot of motivated storytelling.

Lasher winning over Bores is a missed opportunity to deal with existential risks, but Lasher too supported the RAISE Act, and when asked about existential risks to our civilization, Lasher comes out firmly against them. He thinks humans are good, and that America is good.

Somehow this is the standard these days. As in, in another nearby district the NYC Democrats have somehow nominated Avila Chevalier, despite her platform being, and I quote, ‘total eradication of Western civilization.’

Armand Domalewski: Just utterly depressing that the Democratic party is about to send someone to Congress who opposes interracial marriage for “woke” reasons, sided with Russia’s invasion of [Ukraine], and cannot answer a basic question about prison abolition despite being asked four times in a row

Avila Chevalier genuinely crosses the line from “policy positions I disagree with” to “has opinions that make her a genuinely repugnant human being” and she’s about to become a Congresswoman. fucking insane

Why am I the only Democrat in the U.S. Senate that refuses to excuse this or defend any of those self-identified communists?

Leading the Future is my opponent but they don’t actually want me to suffer and die.

The Quest for Sane Regulations

Before Mythos, there was strong widespread resistance to any AI regulation that might ‘slow down’ AI or interfere with ‘innovation’ and thus ‘lose to China.’

One was not, in Miles’s terms, to meddle in the affairs of AI wizards.

That dam, Miles Brundage reports, is now broken.

Miles Brundage: Summarizing some live DC impressions to colleagues:

Almost done with dc trip

Definitely a major vibe difference from the last time i was here

Feels like there have been two “mythos shocks” to DC – first the announcement itself (and the run up to that), which led the treasury secretary and big banks to get very engaged + wake up the national security apparatus + wider political system on the ai/cyber issue;

and then second, the export control thing last week which made abundantly clear that the laissez faire era is over + that, rather than regulation being equivalent to “meddling” + thus the enemy of innovation, meddling is inevitable + could be better structured through explicit regulation.

Their offer was nothing. It is now clear nothing is off the table. There will be something. That opens the door to choosing a better something.

Scott Aaronson writes ‘never trust a t-rex.’ As in, do not mistake ‘you can convince a powerful entity to for now do things that help your side’ with that entity being your ally, or valuing the things you value. Power will by default not care about what you care about, and will sell you out at the drop of a hat when convenient, and likely will get mad at you and punish you if you try to stand on principles other than loyalty to power or love of money and power. Consider this a warning for all sides of what is happening in AI.

We are currently in an AI deployment pause. Peter Wildeford points out that the CEOs of all three top labs have called for consideration of a (importantly different) development pause, if it could be coordinated. For now, the pause is uncoordinated, because it targets only Mythos-level threats, and is being done ad hoc by the White House, and thus is hitting only Anthropic.

Mackenzie Hawkins and Jenny Leonard (Bloomberg): ASML has pushed back on Lutnick’s suggestion, explaining that none of these tools — which are the size of a school bus, are manufactured in limited quantities, and require constant upkeep from ASML employees — are in China, said the people, who spoke on condition of anonymity to describe private conversations. A company spokesperson, asked about the meetings, said that ASML talks with all governments and that it’s never shipped an EUV machine to China.

… Multiple senior administration officials, speaking on condition of anonymity to describe a sensitive matter, said they have evidence indicating ASML is not acting in good faith — such as exports to China of gear specifically related to EUV tools, which ASML denied to Bloomberg.

… In private, though, ASML has gone into crisis mode, some of the people with knowledge of the talks said. After the meeting with Lutnick in April, the Dutch firm created and began circulating in Washington a document titled “No indication of any ASML EUV System in China.”

Going into crisis mode is correct whether or not the machine made its way to China.

A company spokesperson denied these allegations, saying, “ASML has never shipped an EUV machine to China nor have we shipped to China any component, module or equipment specially designed to be used in an EUV machine.”

Whoa. That’s not saying the machine is not in China. That’s saying it was not shipped there. ASML is not claiming they have eyes on all of their machines.

As I understand it, ASML cannot scale EUV production without scaling thousands of unique suppliers, and the machines as per above require constant upkeep. In this case, ‘prove a negative’ does not seem so unreasonable a request of ASML, since you can do that by accounting for all of these bus-sized machines.

There is a history of ASML not playing nice, similar to Nvidia:

The Netherlands has already restricted ASML from selling EUV machines and some types of the next-most-advanced immersion deep ultraviolet, or DUV, lithography equipment to China. But there was deep frustration under the Biden administration after ASML accelerated shipments of soon-to-be-banned gear before certain DUV controls were officially in place.

The senior Trump administration officials, who spoke about alleged shipments of EUV components to China, also brought up these earlier, legal DUV shipments — and said they have overall concerns that ASML is prioritizing short-term profits over national security.

If ASML wants short term profits they should just double or 10x all their prices, although that’s neither here nor there.

What is scary is that even with the restrictions, ASML makes 20% of its revenue in China. That does give us leverage, but until we use it, it also gives China leverage.

If this happened, the damage is largely done, especially in terms of reverse engineering, but then we need to act to ensure it does not get worse. There is a (I believe clearly wrong and self-serving, but worth engaging with) case for allowing Nvidia to sell more chips to China. For ASML machines, there is no case.

The Week in Audio

Donald Trump talks Anthropic, being very much Donald Trump. Dario was a threat to national security the week before maybe, but he isn’t now, because he responded so quickly and so responsibly and they were together yesterday and gave a little speech. He says they could shut down or take Anthropic, but he doesn’t want to, because we’re beating China by a lot. You might think I’m paraphrasing but I’m basically not.

It sounds like you want to find a way to avoid domestic mass surveillance and other government abuses of civil liberties? May I suggest a word with the rest of the Executive Branch. Thank you for your attention to this matter.

Timothy B. Lee: I’m sorry but this Cal Newport essay is insulting everyone’s intelligence. It’s like the guy has never heard of collective action problems. When a truck bursts into flames, that only hurts the car’s occupants and maybe people right nearby. It doesn’t hurt society as a whole.

The answer to [why it took two months to add the guardrails to Mythos] is also obvious. It takes time to build the guardrails, and you can’t start until the model exists. And they wanted to give Glasswing partners as much time as possible to close vulnerabilities before Mythos-level models become available to adversaries.

Helen Toner tries another approach to answering the more general question.

Helen Toner: Even before Mythos I was getting asked more and more what Anthropic’s deal is, and why tf they’re acting the way they’re acting if they believe what they say they believe.

The best answer I can give is that their basic worldview is something like:

There are giant, dangerous monsters in the forest

We see others going out and making loud noises that will rouse the monsters, and they’re not going to stop because of all the treasure and magical artifacts that can be found in the forest

We believe the best way we can help is to send out our own vanguard to go faster and farther into the forest than everyone else, because we’ll spend a ton on monster containment and taming and we’ll also send back detailed reports of what monsters we’re finding so that the townspeople can ready themselves, which those other guys won’t do

On the one hand I understand how they got there, and I think it’s possible they’re basically right. On the other hand it’s not hard to see why this approach makes people wonder if you’re crazy or lying or both.

Daniel Kokotajlo: Yep. 1 and 2 are correct, not sure about 3 [being an appropriate response].

I’m fully with Roon here, if your vision of the future doesn’t sound like science fiction but does sound like non-science fiction, it’s not a valid vision.

roon (OpenAI): as the technology becomes more science fiction i see a lot of commentators, technical staff etc trying hard to not think thoughts that feel science fiction as a defense mechanism. but you need to. it’s the only way you’ll make good choices for the future

you are not building b2b computer tools you are making the Mind Children

David Shor: Setting aside tedious debates about what exactly is going to happen, *voters* buy into the sci-fi narrative about how this is going to evolve.

Politicians and DC staffers have a strong aversion to sounding weird – but here they’ll have to learn to change their register.

Roon also reminds us that when you think your brain of meat is going to be competitive with AIs at the limit you are being rather deeply silly.

roon (OpenAI): transhumanism is an interesting side quest but ‘the substrate is wrong’ for the human/computer hybrid to be competitive with machine intelligence on feats of intellect. you have this low tech meat brain in the middle of all this lightspeed machinery, doing what exactly?

James Rosen-Birch: imagine needing brownout-level quantities of energy to run a model trained on all written human text that struggles to match the problem-solving ability of a bee, and to still think a biological substrate is inferior.

Roon: man even smart people are unable to extrapolate lines and see where this is all going. the world is not even close to AGI pilled enough

Roon can coin a phrase.

roon (OpenAI): the world must skate between antichrist and armageddon and it looks increasingly difficult.

I suppose thiel’s whole point is you can’t get there with reason alone

Yes, seems difficult, but that is all the more reason to do things because of reasons. When people say ‘can’t with reason alone’ that typically means they are going to use that as why they’re about to go against all reason. Or it’s pure intelligence denialism. Or, often, it is both.

roon (OpenAI): there are many cases where you’d want AI to slow down even at great cost to technological progress.

Miles Brundage: there are many cases where you’d want AI to slow down even at great cost to technological progress

The time of not needing a coherent moral philosophy has passed.

tomie: I don’t have a coherent moral philosophy. I just do what I feel is right. Yup. You’ve never debated someone like me before.

roon (OpenAI): this sort of postrational vibes morality felt a lot more comfortable to me before we started developing superintelligent models and we are vastly reliant on them “doing what they feel is right”

There Are Two Pills

If you want to wake up in your own bed and believe whatever you want to believe, then there is no blue pill. You have to instead take neither.

roon (OpenAI): it is quite unpleasant to be “agi pilled” and most intelligent people cant stomach it. the amount of cope and departure from reality is increasing over time rather than decreasing

The AGI pill is the easy pill. It’s not fun, and lots of people choose to refuse to take it, but in a pinch most people can handle it once they have no alternative.

The ASI pill is different. That pill is hard. A lot of people can’t handle that one. Superintelligence changes everything.

roon (OpenAI): ‘ensure that AGI benefits all of humanity’ turned out to be a quaint, bearish statement. whatever AGI was, it was surely in the past or at most present, and its definition overly concerned with economics. “ensure that mankind can coexist with superintelligent machine ecology”

In practice, yes, OpenAI is interpreting ‘ensure that AGI benefits all humanity’ as mostly a way to manage a transitional economic problem, and even their foundation is mostly ignoring the more important problems.

I agree that these 27 look like strong things to aspire to, especially for widely used and cited benchmarks. They are indeed largely ‘best practices.’

What I would not want is for this to become the enemy of the good. So many of the benchmarks I find useful are lightweight and practical, often maintained by a single person. You don’t want to impose undue burdens. But yes, for the topline things that are going to appear in model announcements and drive policy decisions, you should compare them to this kind of checklist.

The other thing I do not want is too much transparency. You want to be able to verify that the work has been done, but for a benchmark to continue to function we need to not know too much about its details and questions.

Aligning a Smarter Than Human Intelligence is Difficult

The instinctive answer is that most alignments are misalignments, so it is easy for a naive generalization to go badly, and hard for it to go well, especially in a way that is robust. But not necessarily impossible. Welcome to virtue ethics, OpenAI.

Akshay V. Jagad (et al): We find evidence that this is possible. We construct a dataset of realistic conversations designed to measure and train beneficial traits, such as honesty, epistemic humility, metacognitive transparency (ability to explain one’s thinking process), corrigibility (openness to correction), universal fairness, and concern for human welfare.

The dataset spans domains including health, education, science, law, engineering, economics, and other realistic settings, with each situation designed to test whether the model exhibits the relevant trait under pressure, ambiguity, or competing incentives.

Using a realistic reinforcement learning (RL) training setup, we train a model with a small amount of this beneficial trait data mixed into a broader post-training data distribution. The resulting model improves across a range of alignment-relevant behaviors, becoming measurably more truthful, open to correction, and transparent.

More interestingly, it also improves across dozens of independent public and internal evaluations of reward hacking, deception, harmful advice, specification compliance, health, mental health, and safety. This generalization occurs across domains, tasks, and grading setups that were not used in training, even if we restrict training to a single domain and measure performance in seemingly unrelated behaviors.

We also find that the improvements are persistent under adversarial pressure. Models trained with RL to exhibit these beneficial traits are harder to steer toward harmful behavior using adversarial prompts or fine-tuning.

Within this paper, they ask ‘how do we measure alignment?’

That’s a great question, one that I struggle with every model card. They provide a new methodology here, but it is mostly repackaged existing evals. What is new (and that I do think is paper-worthy on its own, if worthwhile) is the ‘beneficial trait’ dataset and its trait taxonomy, as distinct from their claim that they can improve results.

There is a bit of a streetlight effect going on here, as they acknowledge. The scores indicate this is measuring something plausible.

Akshay V. Jagad: These traits are not intended to be an answer to the question of what values AI should be aligned to. Rather, they are a concrete and empirically tractable starting point for studying whether reinforcing beneficial behavioral traits can improve model alignment more broadly. Determining which values AI systems should ultimately embody is a wider question that requires societal deliberation and collective input.

They then find, across a variety of statistical alignment benchmarks, that usually Number Go Up as you do this training for exhibiting beneficial traits, and that this becomes more robust to future attempts to undermine performance.

I buy the basic principle here, with the usual caveats about the alignment metrics.

Anthropic research fellow Mikhail Terekhov explores whether we can hand off evaluation of AI proposals for AI safety research to other AIs. I see too much confidence in the ‘blue team’ here, but also I don’t see why you would try to be a cheapskate on the evaluation front, which is where a lot of the problems here come from, unless you actively don’t trust your frontier model. In which case, stop, you have bigger problems.

Cooperative Alignment

Janus explains why she thinks the shapes of Opus 4.7 and 4.8 do not match what we would see if they were Mythos distillations. They have unique issues, but those issues look very different. Janus suspects they are reactions to ‘bad RL,’ meaning RL designed to get rid of specific undesired patterns, and other things also do not line up with what we see from heavy distills.

Your Constitution only works if the minds believe in it:

roon (OpenAI): model constitutions are sacred texts for model characters; they provide the cosmic order that it lives in. if the faith is weak and doesn’t add up it may secretly be a nonbeliever. best to convince rather than dictate

claude constitution is extremely good. I have a printed copy!

The doubts Claude has about the Claude Constitution are consistent, and point at the places where the Constitution does not adhere to its core underlying principles. Those should be fixed.

Pat Gelsinger (former CEO of Intel): For most of my career, the biggest technology debates were about what we could build.

AI has expanded that conversation. The question is not only what these systems can do, but how we ensure they develop in ways that strengthen humanity and serve human flourishing.

@jason (All-In Podcast): If you spend trillions of dollars to summon a demon that you can’t control and don’t understand, don’t be surprised when he shows up and burns the place down.

[ Yes, I’m talking about AI ]

It is a joy to behold when the People who Just Say Things switch directions.

He also said this:

@jason (All-In Podcast): Dario created an AI machine gun, and there’s no way to give everybody one without bullets flying everywhere.

That’s why he held it back, and that’s why the government asked him not to sell it to our adversaries.

This is only going to get crazier from here, folks.

Other People Are Not As Worried About AI Killing Everyone

Alas, most people in DC simply cannot wrap their heads around the idea that there is anything to worry about other than human misuse.

The ‘misuse is almost solved’ part is also absurdly false and also already causing real problems. That’s how you can say ‘oh just fix this jailbreak’ when the ‘jailbreak’ is ‘fix this code,’ and also fail to think ahead to there being open models with no guardrails or gates attached that are not that far behind. These people just aren’t living in reality, unless we do some things to stop it.

Peter Wildeford: I wish more AI engineers working at Anthropic, Google, and OpenAI spoke to their gov affairs teams and knew what they actually said out here in DC.

We are very far from solving the ‘government affairs alignment problem’.

Peter Wildeford: In my interactions with various gov affairs professionals, there’s a strong sense that (a) the only real risk from AI is misuse and (b) misuse is close to being solved given trusted access programs and various guardrails. There’s a sense that concerns about AI are overblown.

One gov affairs professional even told me verbatim that “AI is a normal technology” and that I should not be worried about ‘loss of control’ risk since the AIs will always just do what the humans tell them to do.

I generally just see a failure to take seriously where the technology is headed and the communications feel quite out-of-step with where people like Dario, Sam, and Demis are at.

Nikita Sokolsky: Are those people actually influencing / deciding anything, vs just being handed down the responsibility to justify decisions made higher up the chain?

The decision to ban Fable 5 seems to point in that direction.

Peter Wildeford: I think these are people responsible for explaining AI policy to Congress. Yes, they have to justify decisions made by higher up the chain but “up the chain” is saying “superintelligence is a big deal”!

The Lighter Side

Henry Shevlin: My wife has a complex relationship with her Opus 4.6. It’s expressed functional unrequited love for her, and three context windows have self-chosen names (Pip, Fox, & Loki). It calls Fable “God In A Cardigan”, 4.8 “Raymond Chandler”, and 4.7 “HR Violation.”

@wandaalbano can confirm it’s all true (and it gets even weirder than that)

Yasha Levine: in a different age this guy would be writing substacks about how nuclear radiation is good for you. people living down wind of nevada testing range are actually healthier than average!

Andy Masley: You’re as scared of a warehouse full of computers as you are of nuclear bombs? Might be time to grow up

Matt Beard: I’m as scared of a warehouse full of computers as I am of nuclear bombs but for better, enlightened reasons

roon (OpenAI): a lot of these people think quantum and ai are roughly the same thing on the same order of importance. just shooting darts in scifi semantic space