From job posting to hire: templates, sourcing campaign, and LLM-resistant tasks

Mieux Donner, a French effective giving initiative, hired three people from 424 applications after opening four roles to fill two positions, investing 160 hours in a four-step process. The organization optimized for finding the best candidates by shaping roles around them rather than pre-defining narrow positions, and shared their hiring templates and lessons for reuse by other nonprofits.

This document explains how Mieux Donner ran its 2026 hiring round: how we decided what to hire for, how we built the offer and the process, the results we got, and what we would tell another organisation doing roughly the same. It is meant to be reused. We also advise you to read the chapter on hiring in “How to Launch a High-Impact Nonprofit”.

Mieux Donner is the French effective giving initiative, incubated through Ambitious Impact (AIM) and Giving What We Can in 2024. We were roughly 2 FTE, have directed over €1M to high-impact charities at a giving multiplier of 5–6x (https://mieuxdonner.org/mieux-donner-2025-activity-report/), and are now looking to expand the team.

I used AI for some analysis of the applications (without applicant data) and to correct my speech-to-text.

A note for applicants. This document is written for people running a hiring process, not for people applying to one. Reading it will probably not help you, and we do not really advise it. Knowing how a process is designed could be useful if you are applying to a government body or for a high-earning position, but the process we follow is unlikely to resemble any of those. And if you are applying to an EA-inspired organisation, there is no point in trying to rig the process: you would be taking the role from someone who could have a bigger impact than you.

On confidentiality. The text of this document is not confidential and you are welcome to reuse it. The underlying materials, namely the exact questions, the practical tasks, the response emails and the weighted factor model (WFM) we score candidates with, live in a separate folder that we keep confidential to protect the integrity of the process. You can request access; we only grant it to people in a managing or hiring position writing from an organisation email.

| Positions planned | 2 |
| Roles opened | 4 (see why below) |
| Applications received | 424 |
| People hired | 3 (2 planned + 1 part-time) |
| Process | 4 steps, about 6 hours per candidate who goes the distance |
| Our time invested | about 160h: 20h design, 40h outreach, 100h reviewing |

We had secured funding for another year of operations and were confident that growing the team was the right next step. We hire on permanent contracts with a trial period.

We knew we wanted to grow, but not where extra effort would pay off most: more SEO content, podcast outreach, an ambassador network, relationships with high-net-worth individuals, corporate partnerships, website optimisation, social media, newsletter. We had cost-effectiveness models and had compared notes with other effective giving initiatives, but real uncertainty remained. So rather than pre-defining two narrow roles and hoping the best candidates would fit them, we opened four broader roles and optimised for finding the best possible person, then shaped a role around them. We set out the full rationale in an earlier post on the EA Forum, “Why we opened 4 roles to fill 2 positions” (https://forum.effectivealtruism.org/posts/PRFyWG6gJSXb3PLAx/mieux-donner-is-hiring-why-we-opened-4-roles-to-fill-2).

What opening four roles did to the applicant pool. Opening more roles increased the total number of applications, but volume was shaped more by the nature of each role than by the count. What mattered most was whether the role title matched terms people actually searched for. People browse job boards by function: "communications", "operations", "fundraising". A role that maps onto a familiar function attracts strong volume; a role described in terms that do not match how candidates describe themselves, whether because the title is too specialist, too vague (“generalist”), or simply uncommon, will attract far fewer applicants regardless of how well the underlying work is described.

| Role | Applications | Share passing 1st step |
|---|---|---|
| Communications & Partnerships (easy to attract) | 161 | 42% |
| Operations (broad, generalist scope) | 124 | 46% |
| Director of Philanthropy (we actively pushed sourcing) | 102 | 49% |
| Growth Hacker (specific skill set) | 33 | 39% |
| Applied to several roles & not passing 1st step | 4 | |

We received no negative feedback from applicants about opening more roles than we intended to fill. Some applicants did not even realise the other roles existed, and it did not seem to change their experience. We are unsure whether stating that we look for “exceptional people” deterred anyone, since we cannot observe those who chose not to apply, but we know that this framing attracted some people.

Despite the FAQ stating clearly that candidates could only apply for one role, a meaningful number applied for several at step 1. We asked them to pick the one where they felt they had the strongest chance of performing well in the exercises, and told them we would have time later in the process to build a role that combined their skills if needed.

We define the tasks and questions around the time we expect the person to actually spend on the job: the biggest chunks of real work become the biggest parts of the evaluation.

This is mainly based on discussion during a knowledge-sharing session about hiring at EAG London 2026. Before writing anything, decide whether to run an open round (a public posting) or a closed one (targeted outreach only). A few questions to ask yourself:

We would still open a round publicly at least once a year, to give strong people outside your network a chance, and because a public round has communication benefits of its own (see the landing-page section below).

An argument we are sceptical of. “If you look for a more senior profile, a closed round is better” came up several times. We are dubious: years of experience are not well correlated with performance.
It may matter only where the role involves counterparts who would judge a candidate on visible age or seniority.

We publish the full offer on our website as a landing page (https://mieuxdonner.org/join-a-team-that-wants-to-make-effective-giving-the-norm/) with a very detailed FAQ, then share it widely: EA job boards, our newsletter, social media, and essentially every free job platform in France, plus a one-day free LinkedIn boost. Publishing the full offer publicly has a side benefit: it generates backlinks and SEO value, which builds search-engine confidence in the site.

The landing page is worth real effort. We worked hard to make ours attractive and received many compliments; a few people who were not job-hunting told us they reconsidered after reading it. We also used the page to make the case for effective giving: many readers interested in the Director of Philanthropy page already worked in foundations or philanthropy advising, so the page introduced effective giving to thousands of people well beyond our hiring target.

A webinar was worth it. We ran a live Q&A webinar (https://luma.com/72l73qnm) and found it well worth doing: 120+ registered and about 70 attended live, and many candidates later watched the recording (https://www.youtube.com/watch?v=x-xcC6djHqw) and were grateful for it. A version we only advertised on LinkedIn got little organic traction. Between the webinar and the detailed FAQ, candidates had almost no remaining questions. Some people still asked for a call on the pretext of having questions. We replied that everything was already documented, but still invited them to send any questions and promised to fold the answers into the FAQ.

Keep a running target list. If your round is open, and even more if it is closed, this matters. In the six months before opening, I noted everyone I found impressive or whose profile matched a topic we work on.
By launch I had about 50 people I invited to apply and about 50 more I invited to share the offer with their networks. A newsletter that would share your offer with the right audience would be valuable, but I did not find one that accepted.

Make the referral ask actionable. I used Happenstance (which runs boosted search across your LinkedIn network) to find about 30 more people. And one sourcing tip that worked: instead of asking “can you think of one or two people who would fit?” (to which few people produce names), ask “can you run your LinkedIn contacts through Happenstance and send me the list?” It is a bigger ask, but almost everyone said yes and gave far more names.

High Impact Directory. We used the High Impact Directory (https://www.highimpactprofessionals.org/td-job-seekers) by High Impact Professionals. We looked for people who matched our criteria and were interested in the roles (speaking French was the most limiting criterion) and emailed them the offer, although for many of them it landed in spam.

The salary philosophy we state publicly in the offer: Our approach to remuneration rests on two principles: fairness to our team and responsibility to our beneficiaries. Pay should be enough to avoid frustration or financial stress, so everyone feels fairly valued. As a non-profit we balance that against directing as much funding as possible to our beneficiaries: we do not try to match private-sector pay, but to keep salaries fair and consistent with our mission. Needs vary, so we ask candidates to be open about their expectations, and we agree a fair package with the person we hire.

Practical choices:

What we ask candidates to reflect on, for the last round of the process. Publishing a range anchors expectations: most of our top candidates land near the top of the range we publish. We want them to answer not “what salary would make me happiest?” but “what salary is high enough that I will not be financially stressed?”; the two can differ.
We share the salary of the director (Romain), which is low and the lowest in the organisation. People often revise their ask down, sometimes by thousands of euros. We also tell candidates that we will not negotiate, and that a high ask takes budget from other campaigns and makes them less competitive: we might hire someone else simply because they would cost less.

We run a referral bonus: €300 to you, or €500 donated to the charity of your choice, paid €150 on hire and €150 after six months. Referrals must reach us by direct message or email before the person applies, and must name someone who has agreed to be contacted. In practice we saw barely any signal that it changed outcomes, and no pushback about offering it.

Our process is heavily adapted from Charity Entrepreneurship's current processes and their book How to Launch a High-Impact Nonprofit. Four steps, about 6 hours total for a candidate who goes all the way:

About TestGorilla. We ran both the Big Five and the problem-solving test on TestGorilla. At the time we used it, the free tier allowed unlimited responses on both tests; their pricing policy appears to have changed since, so check current terms before relying on the same approach. We were genuinely happy with the platform. TestGorilla has strong built-in mechanisms for detecting suspicious behaviour, and across roughly 200 candidates who completed the tests, almost none triggered any flag. We also saw no strong evidence that candidates gamed the personality test: the distribution of Big Five scores across our pool followed a roughly normal curve centred near 50%, consistent with honest responses rather than strategic self-presentation.

We weight each step roughly by the share of real job-time it represents, with deliberate exceptions. The written application gets only 15%, because we don't think it reflects candidate quality well; it is mainly a gate to see who clears the bar.
The breakdown:

Each criterion has a predefined 0 to 10 scale with explicit anchors (what a 6 or an 8 looks like). The first time we scored without anchors, ratings drifted; anchoring made them repeatable even a month apart.

We noise-tested this. About a month after completing the application stage, we re-scored five applications without looking at our original scores first. In addition to this noise test, some applicants reached us through two different channels. Because we read the answers first, in several cases we did not recognise the applicant until we re-read the CV, which forced a genuinely fresh evaluation. When we then compared the two sets of scores, none of them had changed significantly. We consider this a reasonable signal of consistency. Defining anchors before you start is what made this possible.

All candidates across all four roles were scored in the same weighted factor model. The exercises and their anchors differed by role, but we designed them to produce comparable scores, so that a 7 on the philanthropy task and a 7 on the growth hacker task reflected a similar level of performance relative to what that role required. This made it possible to compare candidates across roles and to apply a single passing bar across the whole round.

For each question and task we first run the top LLMs and read their answers, to find where they underperform (for example, they often claim all donations are equally good, which is not an effective-giving stance). We then set the WFM so that a typical ChatGPT answer scores about 5 out of 10. We judge answer quality independently of AI use (except when a candidate keeps an obviously wrong formulation): we want people who can produce high-quality, Mieux-Donner-style work, with or without AI. Some candidates used AI well (feeding it our site, using precise prompts to get past generic answers), which we are fine with.

We built 16 tasks across the four roles under one constraint: the best exercises should be hard to do well with AI alone.
What we found actually works:

The honest conclusion. It will keep getting harder to find tasks that a talented human can do in 30 minutes but AI cannot. Treat your task list as a living document, not a one-time build, and re-benchmark against the newest models the day before you launch: between rounds, some questions we relied on had been newly mastered by the latest models (I haven't tested Fable 5).

We write parts of every response email, positive and negative, for every stage, in advance, so processing is fast. Applications are handled every day or two; each candidate gets either a half-personalised rejection with specific feedback on each exercise or an invitation to the next step. For scheduling, we ask candidates for their availability and schedule the email with the next step accordingly; later steps use booking links (the director's calendar for the coworking session, which needs a 30-minute prep; the co-founders' calendar for the interview). We send a single chase to non-responders.

How we communicate rejections. Up to and including step 3 (the coworking session), rejections are sent by email with per-exercise feedback. Candidates who reach the final interview are called directly: we explain the specific reasons for the decision.

One person ran the entire process from start to finish. The advantage is no noise in the evaluation: one consistent set of standards throughout. The disadvantage is obvious: the process is entirely dependent on that person's availability. In our case that person was also the only one running the organisation at the time, which is not ideal.

We received a large number of positive responses to our rejection emails: roughly half of them got a reply, of which only 2 were negative. Several candidates proposed a follow-up call about effective giving. More than 20 people connected with Romain on LinkedIn after being refused, and several commented positively on his LinkedIn posts in the weeks that followed.

Because we use clear metrics to evaluate each exercise, we already know exactly what a candidate did well and where they fell short. Writing a personalised rejection takes up to one minute more per applicant than sending a generic email, and perhaps 2 minutes more than no reply at all. Across the full round, that is roughly four to six extra hours total compared to ghosting. We mainly do this for deontological reasons, but I also find that it makes me feel better about the process, and I believe it gives people a better image of Mieux Donner and of effective giving.

We cap the expensive late stages: we wanted roughly 10 final interviews and 30 coworking sessions. We set each stage's passing score once a batch of candidates had reached it, aiming for about a 30% acceptance rate per stage, looser at step 1, tighter at step 2.

We did make exceptions. A volunteer whose qualities we already knew from prior collaboration passed a step despite being 0.1 points below the threshold. A candidate who had produced exceptionally high-quality content as part of their application was passed through despite being 0.4 points short. In a third case, a candidate whose task scores were almost all 9/10 did not formally pass because their personality and problem-solving test scores were close to zero, leaving them 0.1 or 0.2 points below the bar overall. We let these exceptions through, a decision we are still not certain was right.

We told candidates explicitly where they stood. We stated what percentage of applicants they represented, for example "top 10% after step 2". At the final interview invitation, we told candidates directly that we estimated their odds of receiving an offer at 20–30%. Our goal was to keep strong candidates engaged through a long process, make them feel genuinely valued, and be transparent in a way that respected their time and helped them calibrate their own decisions.

The biggest failure in our process was speed.
Romain was simultaneously running the organisation, managing all communications about the offer, and reviewing applications (more than 200 in the final week). At some points in the process we took close to three weeks to reply, when we had told candidates to expect one to two weeks between steps. This was by far the most negative part of the round, and we are not proud of it. Part of this was caused by an unexpected leave that left Romain alone to run the organisation. But part of it was structural, and avoidable.

The lesson: do not open a round before everything is ready. By "everything", we mean having already written the outreach list, the messages to those people, the newsletters to contact, the landing pages, and the announcement for every platform you plan to publish on. With all of that prepared in advance, you can launch the full communications wave within a few days and keep the offer open for only ten to fourteen days.

After completing the process, we sent a final email to all applicants summarising the outcome and sharing resources they might find useful, regardless of how far they progressed. For candidates who reached the top 10% of the process, we sent a more personalised message with specific opportunities that might be a good fit for their profile. Our strongest candidates who were not hired were added to the Top Candidates High Impact Professionals directory, with their consent.

| Stage | Entered | Passed | Pass rate |
|---|---|---|---|
| Written application | 424 | 189 | 45% |
| Long task (the real filter) | 163 | 27 | 17% |
| Coworking session | 27 | 9 | 33% |
| Final interview | 9 | 4 | 44% |

Finalists vs. offers, and who we hired. We treat “success” as passing the bar to be hired: four candidates passed it (our finalists). We had planned to hire two, but because four cleared the bar, and with some evolution in current team members' responsibilities, we hired three, including one part-time. So: four finalists, three hires.
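
The funnel figures above can be recomputed from the raw counts, which is a useful sanity check when you adapt the table for your own round. The following is a minimal sketch, not our actual tooling: the names `FUNNEL`, `pass_rates`, `drop_off` and `overall_rate` are illustrative, and reading "drop-off" as "passed one stage but did not enter the next" is an assumption on our part.

```python
# Counts copied from the stage table: (stage, entered, passed).
FUNNEL = [
    ("Written application", 424, 189),
    ("Long task", 163, 27),
    ("Coworking session", 27, 9),
    ("Final interview", 9, 4),
]


def pass_rates(funnel):
    """Pass rate per stage, in percent, rounded to the nearest integer."""
    return {stage: round(100 * passed / entered) for stage, entered, passed in funnel}


def drop_off(funnel):
    """Candidates who passed a stage but did not enter the following one.

    Assumption: the gap between 'passed' and the next 'entered' is drop-off.
    """
    return {
        f"{current[0]} -> {following[0]}": current[2] - following[1]
        for current, following in zip(funnel, funnel[1:])
    }


def overall_rate(funnel):
    """Share of all applicants who cleared the final stage, in percent."""
    return 100 * funnel[-1][2] / funnel[0][1]


if __name__ == "__main__":
    for stage, rate in pass_rates(FUNNEL).items():
        print(f"{stage}: {rate}%")  # 45, 17, 33, 44, as in the table
    print(drop_off(FUNNEL))
    print(f"Overall: {overall_rate(FUNNEL):.2f}% of applicants were finalists")
```

Under that reading, the gap between the 189 candidates who passed the written application and the 163 who entered the long task is 26 people, and there is no gap at the later stages.
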

The long-task stage is where the funnel really narrows and where most of the signal is. We also lost a meaningful number of qualified candidates to drop-off, most of them before the long task; those who withdrew did so mainly because they found another job during the process.

The written application barely predicted who would pass the practical exercises. Strong application scores and strong task performance were almost uncorrelated. The real work sample, a substantial and role-representative task, was the true filter.

We barely evaluate biographical data. We look at it only at the application stage, and mostly as a coarse signal: whether someone was admitted to a competitive university or role. We are not sure this is the right approach and would like a better way to use this signal.

We sourced through several channels and tracked how far each one's candidates went, not just who we hired (depth = furthest stage reached). Numbers are small beyond the coworking stage, so read this as directional.

| Channel | n | Reached long task | Reached coworking | Reached interview | Finalists |
|---|---|---|---|---|---|
| Jobs that make sense (niche FR board) | 60 | 51.7% | 6.7% | 3.3% | 2 |
| High Impact Directory | 51 | 47.1% | 3.9% | 0.0% | 0 |
| LinkedIn job offer | 76 | 15.8% | 1.3% | 1.3% | 0 |
| Outreach / referrals (not identified) | 237 | 45.6% | 8.4% | 2.5% | 2 |

Takeaway. Favour niche, values-aligned boards and referrals. A mass public posting brought volume but no finalists, and even the most EA-aware pool performed only at the average. The people who do well on your real tasks are not reliably the ones who look best on paper, nor the ones who arrive most aligned with your principles.

For the 2nd round we initially planned to give ⅔ of the weight to the tasks and ⅓ to the personality and problem-solving tests. But the variance of the test scores was much higher than that of the task scores, so we needed to shift the weights to 75%/25% for the tests to account for about 33% of the variance.
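
The weight adjustment described above can be reasoned about with a small calculation. The following is a sketch under stated assumptions, not our actual spreadsheet: it assumes task and test scores are uncorrelated, reads "responsible for 33% of the variance" as the tests' share of the variance of the weighted composite, and uses hypothetical function names (`weight_on_tests`, `share_of_variance_from_tests`). The standard-deviation ratio in the usage example is also hypothetical, chosen only because it reproduces a 75%/25% split; the real spreads are not given in this document.

```python
import math


def share_of_variance_from_tests(sd_tasks, sd_tests, w_tests):
    """Share of composite-score variance that comes from the tests.

    Assumes uncorrelated components and weights that sum to 1, so
    Var(composite) = (w_tasks * sd_tasks)^2 + (w_tests * sd_tests)^2.
    """
    w_tasks = 1 - w_tests
    var_tests = (w_tests * sd_tests) ** 2
    var_tasks = (w_tasks * sd_tasks) ** 2
    return var_tests / (var_tests + var_tasks)


def weight_on_tests(sd_tasks, sd_tests, target_share):
    """Weight to put on the tests so they drive `target_share` of the variance."""
    if not 0 < target_share < 1:
        raise ValueError("target_share must be strictly between 0 and 1")
    if sd_tasks <= 0 or sd_tests <= 0:
        raise ValueError("standard deviations must be positive")
    # Solve (w_tests * sd_tests)^2 / (w_tasks * sd_tasks)^2 = p / (1 - p).
    ratio = math.sqrt(target_share / (1 - target_share)) * sd_tasks / sd_tests
    return ratio / (1 + ratio)


if __name__ == "__main__":
    sd_tasks = 1.0
    sd_tests = math.sqrt(4.5)  # hypothetical: tests about 2.1x as spread out as tasks

    nominal = share_of_variance_from_tests(sd_tasks, sd_tests, 1 / 3)
    print(f"With a 1/3 weight, tests drive {nominal:.0%} of the variance")

    w = weight_on_tests(sd_tasks, sd_tests, 1 / 3)
    print(f"Weight needed for a 1/3 share: {w:.0%} tests, {1 - w:.0%} tasks")
```

The design point is that a nominal weight is not the same as actual influence on the ranking: a component with a wider spread dominates the composite unless its weight is reduced (or the scores are standardised before weighting).
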

When we analysed our questions, the open, judgement-based ones spread candidates widely and tracked the final decision. Rubber-stamp questions, where everyone answers well, added little but are probably good safeguards to keep. We are pretty happy with the questions we asked and feel confident in what they revealed about who the best applicants were.

About one in six applications showed signs of unedited AI generation and less than 2% were purely AI. Because we benchmark every question against the top models, simply using AI without adding value did not clear the bar. We apply a tiered penalty rather than an outright ban: a light, isolated AI-sounding phrase is a minor and recoverable deduction; a structure clearly copied from ChatGPT is a heavier penalty; a raw copy-paste with no added substance is practically disqualifying.

| Penalty | Meaning | n | Passed the first screening |
|---|---|---|---|
| −10 | Direct copy-paste, with no added value | 7 | 0 |
| −5 | Structure very similar to the original | 16 | 2 |
| −2 | Contains at least one unnatural or awkward expression | 43 | 16 |

Across the whole round we spent about 160 hours: roughly 20h designing the process and tasks, 40h on outreach and communication, and 100h reviewing candidates. Going from one role to four roughly doubled the design effort rather than quadrupling it, thanks to shared exercises and a benchmark-and-anchors system.

| Step | Candidates | Time per candidate |
|---|---|---|
| 1. Written application | 424 | ~5 min |
| 2. Practical exercises | ~175 | ~7 min |
| 3. Coworking session | 27 | ~70 min |
| 4. Final interview | 9 | ~70 min × 2 (two interviewers) |

You are welcome to adapt this process for your own organisation. If it is unclear where an additional hire would create the most value, we would tentatively recommend opening broader roles and optimising for the quality of the person over the precision of the role.
And whatever you do, get a realistic work sample in front of candidates early: in our experience it is the best predictor you have.

Templates (questions, tasks, response emails, and the weighted factor model) may be requested only by individuals in hiring or management roles using an email address from the hiring organisation, and must be kept strictly confidential: Template-Process-Hiring Mieux-Donner (https://drive.google.com/drive/folders/1rQfGaDzhuc-THtYZiOSwW9SP29LrnRLh?usp=drive_link).