{"slug": "learnings-from-starting-an-ai-safety-research-team", "title": "Learnings from starting an AI safety research team", "summary": "A new AI safety research team within Arcadia Impact in London has formed over the past four months, growing to eight members who collaborate with the UK AISI alignment team. The team, led by research lead Andrew Draganov, secured funding through the AISI Alignment Project and additional compute funding from Coefficient Giving to pursue projects on model motivations, scalable oversight, and automated alignment research. The effort demonstrates that building a research team from scratch is possible without an established reputation in the field, though the team benefited from Arcadia's existing infrastructure and prior relationships from programs like LASR Labs.", "body_md": "This post’s goal is to distill our takeaways from building a research team (somewhat) from scratch over the past four months. We describe some context about our team, how it came about, and then provide some lessons learned.\n\n[Since](https://forum.effectivealtruism.org/posts/rAqKSSXankvys2Fzu/the-case-for-ai-safety-capacity-building-work) AI safety is becoming [more](https://forum.effectivealtruism.org/posts/jwwrC4n9H53doRjRH/ai-safety-talent-needs-in-2026-insights-for-field-building) and [more](https://www.lesswrong.com/posts/yw9B5jQazBKGLjize/ai-safety-undervalues-founders) entrepreneurial, we hope this is helpful for others trying to do the same.\n\n## 1. The team\n\nWe're a new alignment research team within Arcadia Impact, based in London. We’re a [team of 8](https://www.arcadiaimpact.org/alignment-research), working closely with members of the UK AISI alignment team. We currently have three main projects:\n\n- Understanding model motivations. 
This currently looks like:\n  - Trying to generate documents which fully describe a model’s behaviour (given just its behaviour).\n  - Producing an open analysis of alignment training techniques and ways this training could go wrong.\n- Doing scalable oversight for alignment. This includes validating debate protocols in practice and then trying to apply them to fuzzy alignment-relevant tasks.\n- Building pipelines for doing automated alignment research.\n\nWe're also hiring for two roles! More on this at the bottom.\n\n## 2. Context about how the team came about\n\nThe rest of this post is written from the perspective of Andrew Draganov (research lead & current programme manager on the team) and Erin Robertson (co-director of Arcadia).\n\nIn short, Arcadia Impact had been collaborating with AISI already, through [LASR Labs](https://www.lasrlabs.org/) and [ASET](https://www.linkedin.com/company/ai-safety-engineering-taskforce/). Our alignment team started by applying for the AISI [alignment project](https://alignmentproject.aisi.gov.uk/) funding, saying that we would hire a team of researchers to collaborate with their alignment team. Andrew was taking part in LASR at the time and was brought in to help with the application. His remit then widened as the number of things to do kept growing. Once our AISI funding was approved we began the process of hiring researchers, and also applied to Coefficient Giving for additional compute funding.\n\nA bit about Andrew, since it bears on how replicable this is. In his words:\n\n- I have a PhD in computer science/machine learning and was working as a postdoc in ML before doing LASR. This means I've spent a number of years talking shop about AI research, though not as many on AI safety specifically.\n- I'm not very well-known in the AI safety community! I only have one first-author AI safety paper (which was reasonably well-received but nothing crazy). 
I mention this because \"you need to be an established name to lead a research team\" is a reasonable thing to assume, but it wasn't really true here.\n\nFor anyone reading this post as a template, here are some things which may be specific to our situation and might not generalise cleanly:\n\n- We were immediately hiring 7 researchers to get started at the same time! This is highly unusual and probably not how teams normally get started.\n- Arcadia was already an established non-profit. We therefore already had visa sponsorship processes, office space, hiring systems, etc.\n  - There are fiscal sponsors which can do these tasks if you want to avoid figuring out the overhead yourself.\n- The Alignment Project, run by AISI, was our initial funder. This is a non-standard funder for many reasons, including that Arcadia already had a working relationship with AISI writ large. If you're aiming to first get funded by, say, Coefficient Giving then the dynamics may be different.\n- Having run LASR, we know a lot of people in the ecosystem quite well. This made hiring easier (and, indeed, over half of the team are LASR alumni).\n- We're doing technical AI safety; not governance, fieldbuilding, etc.\n\n## 3. Lessons learned\n\nGiven the above context, here is advice which we hope is immediately actionable by people looking to start AI safety orgs.\n\n### 3.1 Hiring\n\n*[Written from Andrew’s perspective]*\n\nI feel like our hiring went very well and I’m really excited about the team. But I also wasted a lot of time chasing leads of varying usefulness.\n\nFor one thing, everyone wants to measure 'crackedness' but it’s unclear how to do it. On that axis, the two highest-signal parts of our process were the work test and the references; if we'd relied on only those two, I think we'd have assessed raw research ability roughly as well as we did. 
The interviews were helpful in addition to that, but mostly to vibecheck for fit rather than to gauge ability.\n\nFor the work test, we paid 50 applicants ~$200 each to make a research proposal. We gave them 4 hours to do this, and the deliverable was just a PDF. We then graded the proposals anonymously. This feels in line with what the work actually looks like in the age of Claude Code. We’re happy to share the work test and grading template we used if someone is interested.\n\nHere are a few additional thoughts:\n\n- The various AI-safety talent scouts are *extremely* useful when it comes to hiring. This includes research fellowship research managers, people at BlueDot, people at 80K, etc.\n  - There’s just so much talent across the top fellowships. Our team ended up with 4 LASR alums, 1 MATS, 1 Astra, 1 Anthropic Fellow.\n  - Most of these fellowships now have extension programmes, where good people keep doing work until they get hired. Although we didn’t hire from this pool directly, the extensioners are probably the most useful group of candidates you can target – they are already vouched for and are looking for jobs!\n- I probably sent 50 cold emails trying to get people to apply. This was only useful insofar as it got me a meeting with the person (which it rarely did). If I was doing this over again, I would spend more time reaching out to various MATS, LASR, and Constellation research managers, ask them who they’d recommend, and then set up 1-1s with those people.\n\n### 3.2 Networking\n\n*[Written from Andrew’s perspective]*\n\nEven though it’s clear that building a good team requires a lot of networking, it was often hard to tell which networking was “worth it” and which wasn’t. Here are the things I’d prioritise if I was doing it again:\n\n**Obtaining an active endorsement from a well-known entity in your AI safety subfield**. I claim this is the most high-leverage thing you can do when building an org, and it was very useful for us. 
I define an active endorsement as one in which the senior person/org is going out of their way to vouch for you and will likely work with you once you start. At minimum, a written reference from a senior person goes a long way.\n\n- Note: Appeals to authority are lame. However, there's so much noise in AI safety and a big endorsement is immediately recognized. This helps with both funding applications and hiring. For instance, we would not have hired as effectively if we couldn’t leverage the AISI and Arcadia affiliations.\n\n**Trialing out big-picture ideas on senior community members**.\n\n- I had 2-3 meetings a day for several months pitching senior people on ideas regarding the org (research, position within the community, outreach, various deliverables) and hearing their takes.\n- These meetings were monotonically more useful as a function of how prepared I was (read: how much time I had spent understanding the other person’s worldview in advance).\n- I still cringe about the first time I was describing the goal of our new org and said we wanted to do “alignment research, both technical and conceptual”, to which the person responded “so… all of it?”. But I guess these initial stumbling blocks were necessary in order to get better at talking about the ~vision~.\n\n**Talking to funders**. In some sense, funders are scary: they know their shit, expect you to know yours, and are short on time. Also, you're cold-asking for a seemingly unreasonable amount of money. However, you're on the same team as them and should try to solicit funder opinions when available. They talk to a lot of disproportionately senior people, and I found their suggestions useful as a biased distillation of all those conversations.\n\n- Coefficient Giving is also excited about ambitious proposals, so don't pre-shrink your ask (and don't agonise over salary numbers). 
I wouldn’t expect to get rejected over a reasonable salary ask, and a quick survey of comparable roles at similar orgs is enough to calibrate.\n\n### 3.3 Trying to build a good team culture\n\n*[Written from Erin’s perspective, with context from running LASR Labs for multiple years]*\n\nSince the team’s just started, we’re not able to claim the culture is good (also, this is not really for us to say). Instead, here is how we thought about the process of establishing team culture prior to people joining. Parts are heavily influenced by the way this is done for LASR cohorts:\n\n**Onboard everyone at once (or failing that, hold a retreat).** Bringing people in together is a clean chance to set common norms and the way we want everyone thinking from day one. If you can't start everyone at once, then it’s useful to run a retreat at some point. This looks like letting people become friends, working on strategy together, and making concrete values.\n\n- For example, we wanted the team to think about our communication strategy, so we ran a session exploring how comparable orgs disseminate their work and left with concrete intentions for our own.\n\n**Get the team to shape the strategy.** We hired people based on them having good judgement, so we spent some time together figuring out our priorities. Specifically, we gave people a list of possible agendas and projects, spent the first week thinking hard about which to focus on, and built teams around people’s preferences.\n\n**Set expectations.** Collaborators, employees, and advisors all need to know what's being asked of them and how to thrive in their role. Be concrete early about time commitments, what good work looks like, the values you want people building, and who owns what.\n\n**Have two distinct management goals.** Reviewing success on tasks, and making people better at their job (e.g. coaching, habit forming, feedback). 
The second is often overlooked in early-stage teams but is an important way to keep the team happy and improve the productivity of the team over time.\n\n### Interested in working with us?\n\nWe're [hiring](https://www.arcadiaimpact.org/alignment-research)! Specifically, we're looking for an **Alignment Programme Manager**, a senior generalist to help build and run the team. We're also hiring a **Communications and Operations Associate** to shape how our research reaches stakeholders and to keep the team's operations running. Both will be based at the LISA office in central London, with visa sponsorship available.\n\nIf you think your skills don’t fit neatly into one of these descriptions but you think you’d be a good fit, please apply – we are flexible on the exact role and are more interested in finding good candidates! The deadline for applications is June 23rd.\n\nSimilarly, if you're working on related topics, please reach out! The easiest option is to send an email to andrew[at]arcadiaimpact[dot]org.", "url": "https://wpnews.pro/news/learnings-from-starting-an-ai-safety-research-team", "canonical_source": "https://www.lesswrong.com/posts/4onALBNDff2LFPyNZ/learnings-from-starting-an-ai-safety-research-team", "published_at": "2026-06-05 16:27:01+00:00", "updated_at": "2026-06-05 16:56:03.847203+00:00", "lang": "en", "topics": ["ai-safety", "ai-research", "artificial-intelligence", "machine-learning", "ai-ethics"], "entities": ["Arcadia Impact", "UK AISI"], "alternates": {"html": "https://wpnews.pro/news/learnings-from-starting-an-ai-safety-research-team", "markdown": "https://wpnews.pro/news/learnings-from-starting-an-ai-safety-research-team.md", "text": "https://wpnews.pro/news/learnings-from-starting-an-ai-safety-research-team.txt", "jsonld": "https://wpnews.pro/news/learnings-from-starting-an-ai-safety-research-team.jsonld"}}