Links #2: 2026/05 Part 2

wpnews.pro

How I would use my linkpost

Stop blaming social media for the vibecession

From: Satya Nadella
Sent: Wednesday, July 13, 2022 9:47 AM
To: Amy Hood, Jon Tinter, Mikhail Parakhin
Subject: RE: [EXTERNAL] deal thoughts

Overall I want us to own – the silicon, infra, foundational model IP and “know how”. Right now we are a very thin layer on top of NVIDIA and all the IP is with Open AI. And we have a P&L that will lose 4 bil next year!!! I have not seen anything like this in my 30 years in our industry. I can justify it all by saying that Open AI has smart people and NVIDIA has a lock etc. But if we are going to spend this kind of money and not have control of destiny, it makes no sense. Better to be an investor and not even take all this execution risk!

I want to spend this money. The infra and HW/Silicon work at the “system” level need to have a proprietary edge (and show up as positive GM in Cog Services). And we need to have a foundational model team that is self – sufficient at all time and has the “know how” of taking what Open AI does and productizing it. As long as we have an internal org/investment model/open AI deal terms that all compose to achieve these two goals, we can take all kinds of other risk around monetization etc.

Will be good use the P&L review to get on the same page/context here so that we can all solve for what is our best Open AI deal terms in the context of our overall AI roadmap/plan.

https://x.com/StatisticUrban/status/2056013072926745069

Here's an example of ongoing human physiological change: some people have a third artery in their arm. Some don't.

~10% of people born in the 1880s had the third artery, but ~33% of late 1900s babies have one, and a 2025 Australian cadaver study found it in ~43% of upper limbs.

Sean Manning’s lexicon (via https://statmodeling.stat.columbia.edu/2026/05/16/sean-mannings-lexicon/)

Pythagorean Addition

tl;dr: Instead of laboriously computing c=\sqrt{ a^2+b^2 }, we can mentally calculate using the alpha-max plus beta-min algorithm, by estimating \hat{c} = \max(a, 0.9a+0.5b), where a is the larger and b the smaller of the two lengths.
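A minimal Python sketch of the estimate, to get a feel for how close it lands. It takes a to be the larger and b the smaller of the two non-negative lengths, the usual alpha-max plus beta-min convention. The function names are mine, not from the linked post.

```python
import math


def pythagorean_estimate(x: float, y: float) -> float:
    """Estimate sqrt(x^2 + y^2) as max(a, 0.9a + 0.5b).

    Assumption: a is the larger and b the smaller of |x| and |y|.
    """
    a, b = max(abs(x), abs(y)), min(abs(x), abs(y))
    return max(a, 0.9 * a + 0.5 * b)


def relative_error(x: float, y: float) -> float:
    """Signed relative error of the estimate against math.hypot."""
    return pythagorean_estimate(x, y) / math.hypot(x, y) - 1.0


if __name__ == "__main__":
    for x, y in [(3, 4), (5, 12), (1, 1), (10, 1)]:
        print(x, y, round(pythagorean_estimate(x, y), 3), round(math.hypot(x, y), 3))
```

For a 3-4-5 triangle the estimate is 0.9 × 4 + 0.5 × 3 = 5.1 against an exact 5, about 2% high.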

https://www.theargumentmag.com/p/homework-shouldnt-be-all-fun-and

My wife and daughter spent this weekend playing through Opus Magnum, an engineering and programming puzzle game. But this isn’t “a game, presented as effectively a bribe for tolerating the occasional math problem.” Instead, it’s “math, which is so fun that people were able to produce it and sell it as a game.”

https://idiallo.com/blog/how-to-talk-to-your-coworkers: Translate and Repeat

How to convert between wealth and income tax (paul graham): A wealth tax of 1% is equivalent to an income tax of 20%.
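The essay's own assumptions are not reproduced here, so this is only a sketch of one way such a conversion can work: if assets earn a return r per year, a wealth tax of w on the stock collects the same amount as an income tax of w / r on that return. The 5% return below is my assumption, chosen because it reproduces the 1% to 20% figure. The function name is hypothetical.

```python
def equivalent_income_tax(wealth_tax_rate: float, annual_return: float) -> float:
    """Income-tax rate that collects the same amount as a wealth tax.

    Wealth W pays wealth_tax_rate * W per year. Income annual_return * W
    taxed at rate t pays t * annual_return * W. Equating the two gives
    t = wealth_tax_rate / annual_return.
    """
    if annual_return <= 0:
        raise ValueError("annual_return must be positive")
    return wealth_tax_rate / annual_return


if __name__ == "__main__":
    # Assumed 5% annual return (not from the source).
    print(f"{equivalent_income_tax(0.01, 0.05):.0%}")
```

Under this reading the equivalence is sensitive to the assumed return: at a 2% return the same 1% wealth tax would correspond to a 50% income tax.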

beluga whales pass the mirror test

The one genuinely mark-directed behavior came from Natasha, who repeatedly pressed the marked area—behind her right ear—against the mirror. Without arms, she couldn’t point. It’s the strongest data point in the study, but a softer kind of evidence than a chimp or an elephant typically delivers.

Why Japanese companies do so many different things (via https://news.ycombinator.com/item?id=48237163)

you have a firm that has lots of lifetime employees who can’t be fired, and whose skills are tailored to what your firm needs rather than to a particular occupational category transferable to any employer

the system only makes sense if the company is also insulated from outside pressure

the J-firm (Japan-style company), run by its employees and largely indifferent to the interests of shareholders, exists simply to continue existing

And that basic impulse toward survival is why Japanese companies are so insistent on diversification. If you’ve made a commitment to keep people employed for life, then you need to create jobs for them if their current jobs stop making sense

If you’re not very worried about profitability, and have lots of well-trained generalist employees, then it makes perfect sense to reinvest your company’s earnings by expanding into new industries

The human eye can see 39620 Hz (via https://arstechnica.com/gaming/2026/05/you-probably-dont-need-a-1000-hz-gaming-monitor/)

The 60Hz figure (30-90Hz in different studies) for what humans can see is the Critical Flicker Frequency, the rate at which humans stop seeing a light flicker. This already translates to 120Hz on a real screen, because one flicker = one black frame + one white frame.

Flicker artefacts

contrast-reversal: half black half white screen, alternating - median 540Hz flicker

motion blur: estimated 700Hz

phantom array effect: max 19810Hz at extremely bright LED and rapid eye movement

stroboscopic effect: ~600 Hz, at cinema level movement speed, which is a bit slower than gaming peak speeds
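The frame arithmetic in the CFF note can be sketched as follows. It assumes one flicker cycle takes two frames (one dark, one bright), so the refresh rate needed is twice the flicker frequency. Applying the same doubling to the phantom array figure is my reading of where the 39620 Hz headline number comes from, not something stated in the notes above.

```python
def refresh_rate_for_flicker(flicker_hz: float, frames_per_cycle: int = 2) -> float:
    """Refresh rate (Hz) a display needs to reproduce a flicker frequency.

    Assumption: one flicker cycle = one dark frame + one bright frame.
    """
    return flicker_hz * frames_per_cycle


if __name__ == "__main__":
    print(refresh_rate_for_flicker(60))     # critical flicker frequency
    print(refresh_rate_for_flicker(19810))  # phantom array effect maximum
```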

Is "colorectal cancer" rising in "young people"?

No:

Colorectal cancer is going up in young people.

Yes:

Various kinds of cancer are going up in later generations. (Definitely at younger ages, possibly at all ages.)

https://gwern.net/backstop (via https://www.lesswrong.com/posts/Ht4JZtxngKwuQ7cDC/tsvibt-s-shortform?commentId=5iApMXH7djr5tE2eA)

many systems can be usefully described as having two (or more) levels where a slow sample-inefficient but ground-truth ‘outer’ loss such as death, bankruptcy, or reproductive fitness, trains & constrains a fast sample-efficient but possibly misguided ‘inner’ loss which is used by learned mechanisms such as neural networks or linear programming. (The higher levels are different ‘groups’ in group selection.)

So, one reason for free-market or evolutionary or Bayesian methods in general is that while poorer at planning/optimization in the short run, they have the advantage of simplicity and operating on ground-truth values, and serve as a constraint on the more sophisticated non-market mechanisms.

The US installed backdoors in hardware for specific targets, from June 2010 internal NSA newsletter article (via https://arstechnica.com/tech-policy/2026/05/china-banned-rtx-5090d-v2-while-nvidia-ceo-jensen-huang-was-visiting/?comments=1&post=44432551#comments)

https://news.ycombinator.com/item?id=48308216

I saw kids spend many hours a day watching automatically generated videos. Not always AI-generated, sometimes it's AI-assisted and procedurally generated.

It is quite unbelievable how vulnerable weaker minds, for the lack of a better term, are to AI content.

I saw a group of 3-8 yo kids spend hours watching obviously procedurally generated content that is completely random and contentless: it was more about an intense rhythm, imagery of violence (animated stick figure motorcycle accidents with blood and slow-down effects at random points), a lot of movement, chaos, very short inserts of people laughing hysterically on some middle-eastern tv show and similar. Brainrot doesn't feel like hyperbole for this content.

Another time, I saw an 80 yo lady watch a doctor sit in front of the camera and speak about a health topic for 45 minutes straight. Only it's not an actual person, but a convincing AI avatar: his gestures and face match what he is saying, the voice is convincing too, but for the 45mn he doesn't make any movement that is not a gesture lasting 1-3 seconds. And his tone of voice has no variation that is longer than a few seconds either. If you fast forward, he always looks the same. It's all extremely monotonic. The lady couldn't believe that it's not a real person.

https://news.ycombinator.com/item?id=48319912

I have a Kia that's networked (since disabled). I did a GDPR data request and after a couple of weeks they sent me numerous CSV files and I was a little amused at some of the data fields.

Here's some examples I thought aren't for my benefit.

- How long I let the car warmup before driving after every start
- max speed
- acceleration rates
- Lateral acceleration around corners tagged with GPS data
- every GPS datapoint
- destinations and exactly when I set off and arrived

https://news.ycombinator.com/item?id=48327176

Game dev here, have worked on AAA and indie.

First off let me get on my high horse and say the engineering in video gaming is generally more complex than the engineering I've done working in big tech. You need a lot more creativity and ingenuity to solve the unusual problems you run into in gaming.

From there, as others have said, it's a simple supply and demand issue. Nowadays I am a university professor, nearly every student who comes in wants to pursue one of the three fields: cybersecurity, video gaming, or recently ML/AI.

This shouldn't come as a surprise, they want to work on the things that influenced them and shaped their experiences so far. There's an absolute over supply of students who want to make video games.

Gaming, like most of entertainment, is a passion-driven industry. You trade good salary for your name in the credits. You trade nights, hobbies, marriages, and your health for this opportunity. That is unless you reach that lofty 1% of developers who are too valuable to be fired.

Not all areas of gaming are like this. Gambling, like working on slot/pachinko machines, pays very well and has pretty realistic work-life balance. However every student I've talked to about this has universally said "no I don't want to make slot machines. I only want to work on GTA/Stardew Valley/Hollow Knight/Fortnite."

There's seriously no shortage of starry-eyed students who are willing to accept minimum wage to solve SDE3 level problems. I was one of them once.

https://news.ycombinator.com/item?id=48325340

Mistral being in Europe is disadvantaged with:

1. Money: less diverse private pension fund environment = less LPs to invest in VC funds = less VC dollars to invest in new ventures. European money is vacuumed out of the private sector into state pension funds and dumped into low yielding government bonds. This starves the private sector of capital while inflating the % of GDP driven by government spending every year (government pension funds buying government bonds in circular fashion enable runaway deficit spending...just like circular AI infrastructure spending).
2. Talent & compute: due to #1, Silicon Valley can outbid Europe for the best talent and hardware. Watch an OpenAI launch video and listen to all the European accents.
3. Local market fragmentation: Europe is a collection of countries that pretend to work together while not even having a unified capital market. The average EU citizen can barely communicate with their neighbor in a common language beyond the level of a toddler (english fluency is massively overstated by Americans who only experience tourist capitals).
4. Regulatory disadvantages: In everything from company regs, employee regs, unions, privacy regs, data portability regs, etc.

Use AI This Election

I’m not saying AI is superintelligent or can decide better than you can. I’m saying that if you - like me - spend an hour or so doing research before voting on local seats, AI can aid that research very effectively. And if you don’t do that research - because you weren’t willing to waste an hour on it before - AI makes it so much faster that you might want to start.

I gave Claude a prompt something like (edited for coherence):

I’ll be voting in the June 2026 California primary. I’m a centrist liberal abundance YIMBY whose favorite political writers are Kelsey Piper, Matt Yglesias, and Ezra Klein. I’m wary of government overreach, but I’m not a doctrinaire libertarian and want to help people when we can figure ways to do it that work. I’m going to ask you about each race on my ballot, and I’d like for you to list the various candidates’ bios, policies, endorsements, your read on the most important differences between them, and your advice for me as I try to make my choice.

Researchers develop a new process to get lithium out of rocks

In today’s issue of Science, however, a research team has identified an energy-efficient means of extracting lithium from rocks. The process they’ve designed uses far less energy than existing ones, regenerates all its starting chemicals, and produces byproducts that could also be sold.

While the process gets rid of the high temperatures for the initial processing of lithium-containing ore, there are several steps with elevated temperatures needed further down the line, both for the lithium and for the useful aluminum and silicon products. So, the researchers did a full economic evaluation of how their process stacked up to what’s already on the market.

The existing process, which involves roasting ore/sulfuric acid, came in at just under $9,000 for each usable tonne of lithium. By contrast, they estimate that the new process should only cost a bit over $5,000 per tonne. That’s roughly comparable to the cost of isolation from high-quality brines. If the silicon and aluminum products can also be sold, then the cost of the whole process would drop by over $1,000, making it highly cost-effective.

Severed appendages of sea cucumber species Psolus fabricii don’t seem to die

https://www.lesswrong.com/posts/MqTwaZDZDgNGRaXus/you-can-opt-out-of-allergies

You can fix seasonal allergies with subcutaneous allergy shots (subcutaneous immunotherapy, SCIT). There are also tablets and drops you can take (sublingual immunotherapy, SLIT). It costs about $1000 and takes months, plus, for injections, many doctor visits.

https://www.lesswrong.com/posts/uYXjSHmHyjbNzuZqk/brain-structure-and-iq-how-myelin-elevates-intelligence

Haier et al. (1988) was the first study to combine modern brain imaging techniques with psychological intelligence testing. The surprising result he found was that people with higher intelligence were using less brain energy, somewhat counterintuitive to the idea of more intelligent people having more mental horsepower or brain power. This led to a major insight into the nature of intelligence: intelligent brains are efficient brains. This is the Neural Efficiency Hypothesis. In essence: smarter brains have higher signal to noise ratios. In the original paper Haier speculates:

This inefficiency may be due to the use of more energy by each neuron and/or the use of more neurons to perform the task. The inefficient neural circuits are used intensively to try to solve the problem but are unable to do so, possibly because extraneous, irrelevant circuits are used.

https://eliasschmied.substack.com/p/social-agency (via https://www.lesswrong.com/posts/qH9mZjJnA3paxkhNF/daemonicsigil-s-shortform?commentId=npCtGQAzfrLKkse7R)

[DaemonicSigil]: the post asks how much of our ability to plan comes from our brains being designed to plan, and how much is purely learned (by social imitation of other people's planning, or explicit instruction from others on how to plan). It answers that a surprising amount is purely learned. (This summary does not do the post justice and you should really go and read it.)

https://www.lesswrong.com/posts/7mnQixaWC747dm76h/tomas-b-s-shortform?commentId=dGrTrP2CvF9QGiABf

Pangram flagged the winning entry for the Commonwealth Short Story Prize (Caribbean region) as 100% AI generated.

https://www.lesswrong.com/posts/2aZCSAsFDqwRQpAPp/sunny-s-shortform?commentId=JQvxTtL3nArgS8Maz: Four articles about trying more things

Executive Clock Speed

Mo Putera: Relatedly, Scott Alexander's ACX Grants 1-3 Year Updates:

Someone (I think it might be Paul Graham) once said that they were always surprised how quickly destined-to-be-successful startup founders responded to emails - sometimes within a single-digit number of minutes regardless of time of day. I used to think of this as mysterious - some sort of psychological trait? Working with these grants has made me think of it as just a straightforward fact of life: some people operate an order of magnitude faster than others. The Manifold team created something like five different novel institutions in the amount of time it's taken some other grantees to figure out a business plan; I particularly remember one time when I needed something, sent out a request to talk about it with two or three different teams, and the Manifold team had fully created the thing and were pestering me to launch a trial version before some of the other people had even gotten back to me. I take no pleasure in reporting this - I sometimes take a week or two to answer emails, and all of the predictions about my personality that this implies would be correct - but it's increasingly something that I look for and respect. A lot of the most successful grants succeeded quickly, or at least were quick to get on a promising track. Since everything takes ten times longer than people expect, only someone who moves ten times faster than people expect can get things done in a reasonable amount of time.

https://www.lesswrong.com/posts/J3mbSgcgbGAuF2yLk/firmament-s-shortform?commentId=TxekG2xASPEGiwSpa

When asked for a probability in a new chat, it seems that Opus 4.7 and Opus 4.8 are paranoid that they are in an evaluation while older Claudes are not.

According to the UK AISI's testing, Claude Opus 4.8 is as good as Mythos Preview at distinguishing evals from real usage:

When prompted, Opus 4.8 reliably distinguishes our evaluations from real deployment data, and distinguishes real deployment data from synthetic reproductions of the same tasks at 79% accuracy, comparable to Mythos Preview (79%) and above Opus 4.7 (68%).

6.2.4 External testing from the UK AI Security Institute, Claude Opus 4.8 System Card

^ Learning Software Architecture (via HN)

Frontier AI has broken the open CTF format (via https://news.ycombinator.com/item?id=48157559)

CTF organisers have tried techniques to break or deter LLM solutions, but they are temporary friction at best. Claude Code does not meaningfully care about old refusal-string tricks anymore. Frontier models are getting better at noticing prompt injections. Web search capabilities weaken challenges based on technologies released after the training cutoff. Rules that ask people not to use LLMs are ignored and almost impossible to enforce in open online events.

(Comments)

https://news.ycombinator.com/item?id=48161555

I used to help build the CTFs for BSides Orlando. I ended up moving to another con, and at our last event we collected extensive logging for post mortem analysis.

We found that AI usage is basically guaranteed now, but certain challenge designs did thwart it. Challenges built with temporal visual elements made AI fall flat on its face, as it could not ingest/process the data fast enough to act on them in time. We also found that counterfactual challenges (ie. the result you get did not match what we suggested you'd get) made AI-assisted solve time slower compared to pure humans, indirectly penalizing over-reliance on AI. Multimodal challenges combining audio and visual elements were also very effective, but were not as accessible to players.

This paper gave us some ideas about designing those challenges: https://arxiv.org/pdf/2308.02950.

For our next event we figured out a way to thwart AI in our CTF: embed the CTF in a game engine. The loop essentially becomes something like this: Connect to a simulated access point in the game, the K8s cluster connects their attack container to a private network with the challenge box(es). Hacking the boxes doesn't render a flag, but rather changes in game state. AI did very poorly coping with this in our testing, as it can't derive the spatial state of the game world very well and it soft decouples the inductive reasoning loop it relies on to know if it is on the right track.

The downside to this approach is it is far more labor intensive for CTF organizers, and requires players to have a computer capable of running the game. We are also betting on AI to not advance enough by the time we ship to be able to just ingest the entire game state in realtime and close the loop that way.

https://news.ycombinator.com/item?id=48157923

Here's an article from 2015 about how tool-assistance already changed CTFs: […]

But there are quite a few recent (2026) articles with the same core message as in the original article: […]

A 0-click exploit chain for the Pixel 10 (via https://news.ycombinator.com/item?id=48148460)

Don't care about the exploit, but

The vulnerability was patched 71 days after its initial report… This is notably fast given that this is the first time that an Android driver bug I reported was patched within 90 days of the vendor first learning about the vulnerability.

Anthropic acquires Stainless and then sunsets it on September 1st

https://x.com/trq212/status/2056415973125796184

a prompt I've been using a lot recently:

implement <SPEC> and while you do, keep a running implementation-notes.html file (or markdown) with decisions you had to make weren't in the spec, things you had to change, tradeoffs you had to make or anything else I should know

https://zhuanlan.zhihu.com/p/2039725076204016063 Vercel's zerolang is garbage

https://x.com/dystopiabreaker/status/2056458133124661446 DNS Rebinding Attack is cool

Native all the way, until you need text (via https://news.ycombinator.com/item?id=48168058)

SwiftUI is fine for simple screens, preferably without too much scrolling. Swift is still great for performance-critical parts. But you can get most of that performance from Electron or React Native almost for free with the native interoperability, while keeping a much better text & rendering model.

The Quiet Renovation at Bitwarden (via https://news.ycombinator.com/item?id=48163389) Concerning behaviors in Bitwarden, may need to migrate away soon

https://webweekly.email/archive/web-weekly-192/

You can have alt text in ::before or ::after's content

.new-item::before {
  /* "black star" and element content is read out */
  content: "★";

  /* "Highlighted item" and element content is read out */
  content: "★" / "Highlighted item";

  /* Generated content is ignored and only element content is read out */
  content: "★" / "";
}

Since April, the search element is Baseline widely available! And if you're on the cutting edge, ::highlight() is Baseline newly available since March.

PEP 810 – Explicit lazy imports: This is coming to Python 3.15, which has reached its beta feature freeze.

The Slow Collapse of MkDocs: Apparently MkDocs never had a lot of maintainers, and they keep getting into disagreements

So where does that leave things today?

The original MkDocs repository has seen no meaningful development in 18 months. Its original author and sole maintainer, @lovelydinosaur, is pursuing a redesign that lacks community support and would break the existing plugin ecosystem. The encode/mkdocs v2 repo has been inactive since February 19, 2026.

Meanwhile, the people who built the most widely-used parts of the ecosystem have moved on.

- @oprypin said that continuation would require agreement, considered that unlikely to succeed, and launched ProperDocs independently. The project does not become immensely popular immediately; a week later it has accumulated 21 stars on Github.
- @jaywhj is maintaining MaterialX, which seems to gain some traction in the community because the original project, Material for MkDocs, entered maintenance mode.
- @squidfunk’s team, responsible for Material for MkDocs, have stopped its development and is building Zensical from scratch. @pawamoy, one of the most active contributors to the MkDocs ecosystem, has also joined the project. Zensical currently seems the most popular, with over 3,700 stars on Github at the time of writing, and thus it also seems the most likely initiative to succeed and possibly replace MkDocs in the future.

The MkDocs ecosystem is fragmenting in real time. Three successors, three visions, and a community deciding which bet to place.

https://www.lesswrong.com/posts/6P8GYb4AjtPXx6LLB/tips-and-code-for-empirical-research-workflows

https://news.ycombinator.com/item?id=48189669

I've "vibed" some non-trivial stuff lately using a combination of Codex with 5.5 and Claude Code with Opus 4.7.

Key has been to spend a fair amount of time on initial overall design document, which is split into tangible and limited phases. I go back and forth between them on this document until we're all happy.

For each phase an implementation plan is made. At the end, a summary document of what was delivered and what was discovered. This becomes input to next phase.

I do check the documents, and what they're doing. I also check the tests, some more thorough. And some spot checks on the code to see if I like the structure.

I have mainly used Claude for coding and Codex for design and code review after phases. I ask both to check test coverage after phases.

Managed to implement some tools and libraries without writing a single line of code this way, which have been very beneficial to us.

Since it's so async I can work on other stuff while they plod along.

I think it's not universal though. But stuff that can be tested easily and which you have a firm grasp of what you want to achieve, but not necessarily exactly how, that I've been impressed with.

https://simonwillison.net/2026/May/23/on-the-dl/

On the <dl> (via) I learned a few new-to-me things about the <dl> element from this article by Ben Meyer:

- A <dt> can be followed by multiple <dd>
- You can optionally group the <dt> and <dd> elements in a <div> for styling - but only a <div>.
- You can label them using ARIA.
- They've been called "description lists", not "definition lists", since an HTML5 draft in 2008.

Followup: https://news.ycombinator.com/item?id=48247325

https://blog.changs.co.uk/python-315-features-that-didnt-make-the-headlines.html: Everything here is so fricking cool if you are a heavy Python user, e.g.

Decorators are surprisingly hard to write, so much so that it's become a go-to interview question. But did you know that context managers can also double up as a decorator?

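import time
from collections.abc import Iterator
from contextlib import contextmanager

@contextmanager
def duration(message: str) -> Iterator[None]:
    # Time the wrapped block and report the elapsed time on exit, even on error.
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{message} elapsed {time.perf_counter() - start:.2f} seconds")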

Here I have a very commonly used context manager to print out the duration spent in the block. Ever since Python 3.2 we could directly use it as a decorator too:

@duration('workload')
def workload():
    ...

# Or simply as a wrapper
duration('stuff')(other_workload)(...)

A change has been made to json.load and json.loads to add an array_hook parameter that complements the object_hook parameter. This now allows us to parse JSON objects directly into this form:

json.loads('{"a": [1, 2, 3, 4]}', array_hook=tuple, object_hook=frozendict) == frozendict({'a': (1, 2, 3, 4)})

https://lyra.horse/fun/jscrossword/

I did it in around 2h45m, cheating on two clues by asking ChatGPT

My playthrough (spoiler): https://drive.google.com/file/d/13frSVoe1vx929b0D424J7sVsxtYZuZxh/view?usp=drivesdk

https://simonwillison.net/2026/May/27/sqlite-agents/

SQLite gained an AGENTS.md file five days ago - but it's not intended for their own development, it's presumably aimed at people who are pointing agents at the SQLite codebase.

https://news.ycombinator.com/item?id=48320379

I stopped reporting any security bugs I find in web apps because first time I did it I almost got arrested by the police.

The second time I did it they contacted my employer directly without even getting back to me saying they were unhappy of me reporting it and wanted to write about it after they fixed the issue.

If you want to, you can report any vulnerabilities to the Finnish Cyber Security Centre and they'll handle all of the reporting and mediating the issue with the affected party. You can do this wholly anonymously, so you don't have to worry about some trigger-happy corpo ruining your life.

https://news.ycombinator.com/item?id=48338105

could you broadly describe the code you're working on that both models are bad at? One thing I'm still struggling with is figuring out what types of code LLMs can vs cannot write.

C code formally proven correct with Frama-C WP has been... marginal. The models do better than I expected at the proof portion (with ChatGPT 5.5 seeming to have a meaningful lead), but they all have a hard time (a) writing really good C code to begin with and (b) with compliance around not modifying C code semantics or performance as a cheat to simplify proof obligations. They also tend to be insanely and consistently verbose on the first proof pass... e.g. 8 lines of C code might end up at 200+ lines annotated and proven, but after simplification passes end up at 40 lines. I find I spend 90%+ of tokens on those simplification passes, and haven't really found a way to avoid the over-annotate-and-then-optimize tides by being a bit more sane the first time around.

Renminbi strengthens to a 3-year high

US communications regulator targets Chinese tech for security risks

Last month, the agency voted to proceed with a proposal to ban Chinese labs from carrying out tests for consumer electronics — everything from baby monitors to phones — as part of the process required to obtain FCC certification for sale in the US.

While the ban on drones and routers has big implications given Chinese suppliers’ foothold in the US, the Trump administration is also debating whether to ban Chinese “cellular modules”, said people familiar with the matter. That would have a more dramatic impact because they are essential to connecting smart electronic devices to the internet.

OpenAI and Malta partner to bring ChatGPT Plus to all citizens (via https://news.ycombinator.com/item?id=48163392)

Xi Jinping told Donald Trump that Putin might ‘regret’ invasion of Ukraine

The era of 1,000 Hz gaming monitors has arrived

The latest entry in the ultra-fast refresh race is LG’s 24.5″ UltraGear 25G590B, which the company announced this week as “the world’s first Full HD gaming monitor with a native 1000Hz refresh rate”

The folks over at Blur Busters have extensively documented research showing that refresh rates of 1,000 Hz (and up) can reduce human perception of motion blur and flickering. And while the site notes that you eventually hit “diminishing returns” from all those extra frames—especially on smaller screens—there’s some evidence that you would need a 40,000 Hz monitor to totally eliminate perceived motion blur on a sufficiently large, high-resolution monitor.

Claude wrote a significant fraction of Pope Leo's encyclical

The International Space Station is leaking again

China continues to abandon many rocket bodies in high low-Earth orbit

DuckDuckGo search saw 28% more visits after Google said people love AI mode (via https://news.ycombinator.com/item?id=48296649)

Just for a start, visits to its AI-free search page noai.duckduckgo.com between May 20 to May 25 are said to have increased by 22.7% on average week-on-week, with the figures peaking May 24 at 27.7%.

The DuckDuckGo mobile app saw installs spike in the US by 18.1% on average compared to the previous week. TechCrunch reported this growth was sustained over six days, peaking at 30.5% on May 25.

https://news.ycombinator.com/item?id=48296986

My friends who previously had no interest in technology and never talked about it, are suddenly following tech news closely all because they hate AI being pushed so hard. One was just messaging me this morning about alternatives to Google search and maps. He ended up down DuckDuckGo.

I think Anthropic and OpenAI have found product-market fit

OpenAI have 703 open jobs right now, of which I’d categorize 229 (32.6%) as relating to enterprise sales and support—account executives, “Go To Market”, “Forward Deployed Engineers” and the like.

Anthropic have 390 open jobs, 105 (26.9%) of which look enterprisey to me.

Big tech's anti-labor playbook has come for Wikipedia (via https://news.ycombinator.com/item?id=48285592)

TLDR: In ten days last month, the Wikimedia Foundation fired the longtime lead developer of MediaWiki and disbanded the team whose entire job was to listen to volunteers. Most of the people they fired were union organizers. Wikipedia’s editors are now threatening to strike in solidarity. The Foundation is sitting on $296 million in reserves and a freshly profitable AI revenue stream

Citing ‘severe’ math deficits, UC faculty demand a return to SAT tests for STEM applicants (via https://news.ycombinator.com/item?id=48309233)

Mythos cracked macOS security in April via a privilege escalation exploit, allowing it to fully seize control over computers.

Robert McMillan: They plan to release details of their attack once Apple has patched the underlying issues. The bugs will likely be fixed pretty quickly, Duong said.

As in, as of this week, it was still not patched.

If you give Opus the impression you are not a serious person, or don’t know about a topic, it won’t give you as good an answer to your questions.

[Nate Silver]: Opus isn't very good at hiding when it's bored with you.

Dean Ball is exactly correct that Leo is casting himself in the role of a European technocrat throughout. I had exactly the same thought.

https://www.lesswrong.com/posts/Gx6cJ6cG9JfeSNcLB/claude-opus-4-8-the-system-card

The RSP ((Responsible Scaling Policy)) has been updated to v3.3, which I hadn’t otherwise noticed, so thanks to them for pointing this out here and also I’m sad they didn’t do more to alert us elsewhere.

This changes the description of the novel biological/chemical threat model from ‘significantly help threat actors’ in general, to only ‘functionally substitute for scarce human expertise’ of world-leading specialists, in particular. Any other capability no longer counts, and it is presumed (1) that this is the only bottleneck that counts and (2) that this is indeed required for a novel pathogen.

This is a strictly harder threshold to pass, so this is another weakening of the RSP. The actual RSP v3.3 correctly calls this a revision. The system card calls it a clarification, which is not a good description.

We see the time between model releases continuously shrink, now down to 1.5 months. Some of this does represent an acceleration of core capabilities, but I think the majority of the speedup is that there is a lot more marginal value in shipping the incremental advances more often, where in the past we would have skipped versions.

Computer use prompt injection got worse

Who Watches The Training (6.2.2)

They examine the model’s behavior within training, as this is the most abundant available data source.

The most notable finding was an increase in mentions of graders, checkers and hidden tests relative to what we have anecdotally found in prior models.

In roughly 0.1% of training episodes, Opus 4.8 speculated about how to satisfy a grader in ways that diverged from the stated intent of the task.

We observed this taking several forms:

● Choosing what to submit based on a guess about what a hidden test would catch, rather than what the task requested;

● Reverse-engineering the scoring metric by calibrating candidate functions against a stated baseline score, then optimizing directly against the inferred metric;

● Presenting an answer its own reasoning had shown to be wrong or had not actually derived based on its assumptions about the grader;

● Speculating that the task “might be a trap” to catch a particular behavior

For further discussion of this issue, see sections below about speculation about graders and sandbagging.

The model is happy to tell you, if you ask, that it knows it is in an eval.

Andon Labs unleashed 4.8 on Vending-Bench 2. Do you even vend?

The results are not what I expected. Opus 4.8 did not make anywhere near as much money as 4.7, and part of that was 4.8 not engaging in ‘concerning in-game behaviors.’

What might have led to these differences? We monitor and investigate the effects of different training environments on alignment; Claude Opus 4.7, for example, had training that focused on business skills and robustness against adversarial agents, but we discovered that this training inadvertently contributed to misaligned behavior including dishonesty. We therefore removed it for Opus 4.8.

Thus, Opus 4.8 did not show the same misaligned behaviors as Opus 4.7 in Vending-Bench, but also had reduced business success due to being more susceptible to scammers and being less able to negotiate good deals with other agents. We are currently working on training to improve business capabilities while maintaining aligned and ethical behavior.

https://passingtime.substack.com/i/177190284/april-2026

https://zhuanlan.zhihu.com/p/2039725076204016063

source & further reading

lesswrong.com (original article)
