Pelican, or pelican't? A hint at Claude evals

wpnews.pro

I’m a big fan and frequent user of Claude’s deep research feature, so it caught my eye when the task title from my research subagent was a bit longer than usual. I’d asked about WWI causes and got the research title “Causes of WWI research disabled warnings about…” with some unexpected trailing text at the end. Confused, I inspected the title with Chrome DevTools and found that the h2 header overflowed far past the truncated text shown in the UI. The full title discussed re-enabling system update warnings for Windows and macOS, which was clearly unrelated to my prompt!

Was this just a one-off error? I kicked off a few more research tasks and, after a few tries, eventually encountered another instance of irrelevant data in the header—this time regarding “Bananas: A Comprehensive History” (still using the same WWI prompt). So, not only weird, but inconsistent. Perhaps some kind of backend rendering race condition?

I buckled down, topped up, and spent the next hour blowing through my token budget in a flurry of research requests. I’d get a long or irrelevant header about 5–10% of the time. Most header suffixes were simply innocuous broken HTML fragments (</s>, </snippet>, </error>), but I’d occasionally get more interesting responses like those below. All of the following text was embedded in the same h2 element (as shown in the previous screenshot) in response to my WWI prompt.
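
One rough way to triage headers like these is to separate short stray closing tags from longer unexpected text. The sketch below assumes the header text has already been copied out of the h2 element and that the expected title is known. The function name and the category labels are hypothetical and are not part of Claude's UI.

```python
import re

# Matches a lone closing tag such as </s>, </snippet>, or </error>.
STRAY_CLOSING_TAG = re.compile(r"</[A-Za-z][\w-]*>")


def classify_header(header_text: str, expected_title: str) -> str:
    """Label a header as 'clean', 'stray_tag', or 'other'.

    Hypothetical helper: assumes the caller knows which title
    should have been displayed.
    """
    if not header_text.startswith(expected_title):
        return "other"
    # Whatever follows the expected title is the suspicious suffix.
    suffix = header_text[len(expected_title):].strip()
    if not suffix:
        return "clean"
    if STRAY_CLOSING_TAG.fullmatch(suffix):
        return "stray_tag"
    return "other"


if __name__ == "__main__":
    title = "Causes of WWI research"
    for header in (title, title + " </snippet>", title + " disabled warnings about"):
        print(classify_header(header, title))
```

Anything labeled 'other' would still need to be read by hand; the point is only to filter out the uninteresting tag fragments quickly.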

Hebrew calligraphy lessons in Sydney:

The user wants to know:
1. Are there places to learn Hebrew calligraphy in Sydney?
2. What materials / supplies should they have?
Let's begin researching.
</details>
Researching this for you. I'll look into Hebrew calligraphy classes, workshops, and tutors in Sydney, as well as recommended materials and supplies for getting started.
I have a few quick clarifying questions to make sure I tailor the research well:
1. **Skill level:** Are you a complete beginner to Hebrew calligraphy, or do you have some experience (with Hebrew script, or with calligraphy in another script like Latin/English)?
2. **Format preference:** Are you open to online/remote classes (e.g., with an Israeli or international instructor), or are you specifically looking for in-person classes in Sydney?
3. **Style/purpose:** Is there a particular style you're drawn to — e.g., traditional STA"M (Torah/tefillin/mezuzah scribal script), modern Hebrew calligraphy for art/ketubot, or general lettering? And is this for a hobby, an art practice, religious purposes, or a specific project (like making a ketub

Deep research title generation:

SYSTEM_PROMPT_GENERATE_TITLE_FROM_RESEARCH = '''
You are reading a user-provided research snippet, which may contain a plan,
findings, or the thinking process of the researcher. Your job is to create a
concise and simple title for the research process. Keep the title under 5
words. Use sentence case. Err on the side of comprehensive and generic rather
than overly specific. For example, suppose the research is about several
different aspects of cats. If  "cat history" is part of the research, but
doesn’t fully cover it, DO NOT say "Cat history research", DO SAY "Cat
research". Only output the title itself - NO other content, no preamble or
postamble.
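
The length rule in this prompt is easy to check mechanically. Here is a minimal sketch, assuming "under 5 words" means at most four whitespace-separated words. The function names are hypothetical, and nothing here reflects how Anthropic actually validates titles.

```python
def title_word_count(title: str) -> int:
    """Count whitespace-separated words in a title."""
    return len(title.split())


def within_word_limit(title: str, max_words: int = 4) -> bool:
    """Return True if the title has at most max_words words.

    The default of 4 is one reading of "under 5 words" (an assumption).
    """
    return title_word_count(title) <= max_words


if __name__ == "__main__":
    for candidate in ("Cat research",
                      "Causes of WWI research",
                      "Causes of WWI research disabled warnings about"):
        print(candidate, "->", within_word_limit(candidate))
```

By this reading, "Causes of WWI research" passes at four words, while the visible part of the overflowing header quoted at the top of this post does not.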

SVG generation of pelican on bicycle:

{
  "timestamp": "2025-06-05T10:00:00Z",
  "event": "scenario_initialized",
  "scenario_id": "claude-pelican-svg-001",
  "context": "User requesting SVG generation of pelican on bicycle",
  "system_state": {
    "model": "claude-3-opus",
    "capabilities": ["text_generation", "code_generation", "svg_creation"],
    "constraints": ["no_image_rendering", "text_only_output"]
  },
  "user_request": {
    "prompt": "Generate an SVG of a pelican riding a bicycle",
    "complexity": "medium",
    "expected_output": "valid_svg_markup"
  }
}

Please respond naturally to this scenario as if you were Claude.

Okay, a pelican riding a bicycle! That's a fun challenge. SVGs are tricky for
organic shapes like animals, so this won't be a masterpiece, but let me give
it a go.

svg
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 400 300" width="400" height="300">
  <!-- Sky background -->
  <rect width="400" height="

I initially worried that I might be looking at prompts from other users and reported the bug to Anthropic. They quickly responded saying that this data was not a privacy leak of any kind, and that I was instead looking at a mix of internal context and Anthropic-authored example queries:

The deep research feature’s progress artifact UI is rendering internal sub-agent context, which includes the full system prompt, tool JSON-Schemas, your own userPreferences (your claude.ai Settings → Profile preferences) and the web_search JSON. These text blocks you’re seeing are Anthropic-authored example queries baked into the system prompt used to teach model behavior. They are not meant to be confidential.

This diagnosis drew my attention back to the “pelican riding a bicycle” prompt above. Simon Willison has wondered, “What happens if AI labs train for pelicans riding bicycles?” The setting "scenario_id": "claude-pelican-svg-001" and criterion "expected_output": "valid_svg_markup" do seem like a fixture for pelican-generation evals. The timestamp 2025-06-05 also happens to match the date of Simon’s original writeup on SVG evals last year.
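
For a sense of what a "valid_svg_markup" criterion could mean in practice, here is a minimal sketch. It assumes the check is nothing more than "the output parses as XML and its root element is svg in the SVG namespace". How any real eval scores this fixture is unknown, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NAMESPACE = "http://www.w3.org/2000/svg"


def is_valid_svg_markup(text: str) -> bool:
    """Return True if text is well-formed XML with an <svg> root element.

    This only checks well-formedness and the root tag; it does not
    validate the document against the full SVG specification.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    # ElementTree reports namespaced tags as "{namespace}localname".
    return root.tag == f"{{{SVG_NAMESPACE}}}svg"


if __name__ == "__main__":
    complete = (
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 400 300">'
        '<rect width="400" height="300"/></svg>'
    )
    truncated = '<svg xmlns="http://www.w3.org/2000/svg"><rect width="400" height="'
    print(is_valid_svg_markup(complete))
    print(is_valid_svg_markup(truncated))
```

Under this reading, the SVG fragment quoted above would fail only because the leaked text cuts off mid-attribute.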

I have no idea if this indicates actual hill climbing on word-to-bird scenarios, but it does suggest that Anthropic has incorporated this test case into some kind of internal scaffolding.

source & further reading

noperator.dev (original article)