Interviewing in the Age of AI


Given the speed at which AI models and tooling evolve, will engineers still write code – let alone review – in six months? And, if such a core skill disappears, should companies evolve their interviews?

While most companies have chosen the status quo (including the very companies leading this revolution [1]), some are embracing the new world and creating interviews where AI usage is allowed, encouraged, or even required. AI proficiency sometimes becomes the main subject of the interview.

In this article, I want to convince you that you should generally keep AI away from your interviews, and I will give you some concrete ways to adapt interviews to AI.

Two dimensions for good interviews: signal quality and cost to company

First dimension: signal quality. For a given set of skills, the best interview questions help you identify strong candidates, ignoring noise (e.g., aspects that are not critical for the role or easily teachable).

There are some sub-dimensions impacting signal quality:

Invulnerability to interview-specific preparation: if interview performance is primarily driven by the amount and effort of preparation that goes into it, you risk getting signals only about that trait.
Realism: while interviews should resemble day-to-day activities, realism is not an end in itself. Case in point: the infamous "algorithm & data structure" interview has proven remarkably resilient over the years, despite testing a skill that is rarely used directly on the job.
Equality: some candidates are better prepared for your interviews, because they have prior domain expertise, they paid for mentoring, they have more time, they found your interview questions online, or they know someone who went through the process recently. In an ideal world, the playing field is level for all candidates.
Difficulty: good interviews are usually difficult enough that the majority of candidates fail. Difficulty is achieved through multiple means. The best approach remains broad and ambiguous problems requiring multiple insights to solve.

Second dimension: cost to company. Interview questions require a significant time investment:

Designing a first draft and getting approval to experiment with it
Creating a scorecard across roles, levels, etc.
Testing it on some first internal and external candidates
Documenting and training interviewers

This investment has to be sustained across time, as questions and scorecards are continuously calibrated.

Cost to company has some sub-dimensions too:

Difficulty: creating questions is one thing; creating a difficult enough question is an even bigger challenge. Two useless extremes would be an interview so easy everyone passes, or one so hard nobody does. Both waste everyone's time (the company's, the interviewer's, the candidate's).
Appeal to candidate: interview processes that require too much time from the candidate risk turning away good engineers and hurting conversion rates. The same goes for boring interview questions (especially for take-homes). Questions say something about your engineering culture: bad questions can lower your chances of closing.

Those two dimensions are not fully independent. Difficulty, for instance, impacts both: difficult interviews let strong candidates shine, but might result in false negatives.

Interviews do not have to be perfect. There will always be false negatives and false positives. There isn't much you can do to identify false negatives. Having a good onboarding process, together with clear first semester milestones, ensures that you quickly manage out false positives.

A typology of interviews

Take-home interviews

Take-home: the candidate is asked to submit a solution to (1) an ambiguous problem (e.g., some product specifications), (2) complying with a few technical constraints (e.g., a shortlist of programming languages).

Take-home challenges are often followed by a review interview during which the candidate presents their work and is asked to make some modifications on the spot.

Signal quality: high (before AI)

They provide very broad signals (e.g., design, coding, attention to detail, testing).
Candidates who spend six hours or more on an exercise demonstrate motivation.

Cost to company: medium

Their assessment can be automated.
Since the artifact (usually code) can be reviewed asynchronously, they're easier to coordinate and calibrate.
They might turn away candidates.

As we'll see, they're very vulnerable to AI and motivated individuals.

Live exercise interviews

Live exercise (e.g., algorithm & data structure, live coding, system design, postmortem review, usually over one hour): the candidate is given a problem (e.g., "design Netflix's architecture", "write a rate-limiter") and solves it on the spot, in front of the interviewers.

Signal quality: medium

They're quite objective when designed and orchestrated properly.
The signals are narrower, though: each exercise usually focuses on one topic.

Cost to company: medium

You need a lot of different questions to be less vulnerable to candidate preparation.
To reduce costs, some companies use automated services [2].

Presentation interviews

Presentation (e.g., describe a project you drove, diagram an architecture, "tell us about a time when...") puts the candidate fully in control of selecting the problem and the answer.

Signal quality: low. The interview has more failure modes than other interview types:

The candidate has never worked on an interesting problem (e.g., they're too junior).
The candidate chooses an uninteresting problem.
The candidate overstates their impact or contribution [3].
The candidate under-prepares the presentation.
The candidate is a strong communicator, but not a strong doer.
The interviewer does not assess correctly because they lack domain knowledge.

Cost to company: low

There isn't much to prepare from a calibration standpoint.

There are many strategies to prevent and mitigate lower signal quality, in particular, asking the candidate to reflect on their solution (e.g., "what would you do differently") or asking hypothetical questions (e.g., "what if we change requirement X?"). In that case, the question becomes closer to an uncalibrated live exercise.

This interview requires a lot more effort and expertise from the interviewer.

Not an interview type: "come work with us"

Actual work: come work for us for a week (paid). Used by companies such as Linear.

Signal quality: high

Cost to company: high

Most companies mix interview types

Most companies use a mix of those interview types. Live exercises dominate, though.

| Interview type | Signal quality | Cost to company |
| --- | --- | --- |
| Live exercise | Medium | Medium |
| Take-home | High | Medium |
| Presentation | Low | Low |
| Actual work | High | High |

Unrelated to AI: you need to assume your questions will leak

It's only a matter of time before your questions leak. Websites such as Glassdoor list all your interview secrets. Some candidates go through your interview process just so that they can sell the questions. You could bury your head in the sand and ignore this, but then your interview signals will get weaker over time, and the main driver of interview performance will become "did you bother searching for our interview process".

There are multiple tactics to address this.

Tactic: Control the preparation. Level the playing field by either including one presentation in your mix, or by providing precise interview guidance (e.g., "system design will focus on databases", "algorithms will be about graphs") [4].

Tactic: Have many different questions for a given interview type and regularly archive old questions. If candidates can't accurately predict the question, they'll have to broaden their preparation, which is exactly what you want. Evidently, this is not free.

Tactic: Make it harder to leak. For example: bring candidates onsite, use whiteboards, and put the most vulnerable questions at the end of the process (fewer candidates reach them, so the probability of a leak is lower).

AI coding is threatening current interview models

| Interview type | Signal | Cost | Vulnerability to preparation/AI |
| --- | --- | --- | --- |
| Live exercise | Medium | Medium | High |
| Take-home | High | Medium | High |
| Presentation | Low | Low | Low |
| Actual work | High | High | Low |

(1) Take-homes become too easy for candidates, and too costly for companies. In 2026, most submissions are probably AI-generated or at least AI-aided. It is only a matter of time before your currently-resisting challenge is solved by the next model release.

Consequently, most candidates will pass this first step, and you'll have to spend a lot of time reviewing their submissions. You could be tempted to have AI review AI-generated take-home submissions, but that would be absurd.

AI coding shifted the cost of those interviews from the interviewee to the interviewers. Taking inspiration from Brandolini's law:

The amount of energy needed to refute bad code is an order of magnitude bigger than that needed to produce it.

(2) If software engineers spend less time crafting code, it seems natural to deprioritize live-coding exercises. We don't ask candidates to write machine code; we use higher-level languages. Wouldn't it make sense to adapt the tooling allowed during interviews to match what engineers use day-to-day?

(3) Once a question leaks, AI is a powerful coach. It used to be quite time- and resource-intensive to (a) find the interview questions and (b) prepare for them. Nowadays, AI gives candidates the most powerful (and cheapest!) help there is.

How the classical school evaluation model resisted technology

Here it is useful to make a parallel with the academic model. Having only studied in France, I will use it as the main example. Most French high school and college exams look the same:

No material (courses, books, etc.) allowed.
No tools authorized (in particular, calculators are very rarely allowed).
Content not known in advance (everything studied so far is fair game).
Content can't be guessed (each exam is different, and used only once).
Problems are broad and ambiguous. For instance, the queen of French literary exams is the dissertation [5], which involves writing a 5-10 page essay based on a one-sentence subject (e.g., "AI & software interviews"). This format has existed since 1830. Scientific exams are roughly the same: three or four ambiguous problems to solve.

Those "live exercises" are complemented with other forms of evaluation (e.g., take-home, multiple-choice knowledge questions, group exercise, presentations) but they're the exception, not the rule.

Re-using our typology:

Signal quality: high

The preparation space is very broad and requires sustained effort.

Cost: very high

A new subject (and scoring guidance) has to be designed for each exam.
All candidates go through the same exercise at the same time (totally impractical for company interviews).

What's fascinating about this model is that it hasn't changed that much, even with leapfrog improvements in cognitive tooling ("copy-pasting", Internet, calculator, solvers, etc.). I think the reasons are the same as the ones I will describe below: education should focus on foundational skills, not tools du jour. This approach is consistent with an Aristotelian model focused on judgment (phronesis) rather than memory (mneme).

Why companies should limit AI usage during interviews

A useful distinction: foundational vs. instrumental traits & skills

Foundational traits & skills are competencies, attitudes, habits that are hard or costly to build. For instance:

Raw intellectual capabilities
Deep expertise acquired through years of learning (distributed systems handling millions of requests per second, moving hundreds of microservices into a monolith or vice versa, etc.)
Second-order reasoning
Virtues, such as work ethic, integrity, resiliency

Foundational skills relate to "fundamentals", i.e., internalized knowledge that allows someone to identify, abstract, and solve a problem. Fundamentals provide the ground to fertilize and build more skills. Hiring for fundamentals lets you say "they're smart, they'll figure it out".

Instrumental skills are cheap or fast to grow. For instance:

Achieving an intermediate proficiency level in a programming language
Using a text editor adequately
Searching in documentation
Tweaking an AI prompt

When interviewing, you're often looking for signals from many instrumental skills (e.g., mastery of many tools) to validate a candidate's foundational trait (e.g., willingness to invest in productivity, structured learning).

Rationale 1: AI proficiency is not a foundational skill

While engineering tooling has consistently improved, interviews have remained largely the same: there is no low-code interview type, system design solutions use mostly basic, non-managed technologies, etc. The reason is intuitive: the best companies are not looking for proficiency in a single tool [6].

That's also why programming language expertise usually doesn't matter that much during interviews [7] ("[…] slots in Python?"). A programming language is just a tool, used to serve a higher purpose (problem-solving). There is evidently some nuance there: deep knowledge of programming languages and their tradeoffs says something about the candidate's expertise. The same goes for AI.

AI usage requires skills, at times quite nuanced (e.g., prompt/context engineering, MCP/skills definition, multi-agent workflows, harness engineering), but those are instrumental skills. They're not that difficult to teach and learn, and ultimately they require the same foundational skills as those required to write and review code, design scalable and resilient architectures, come up with creative product solutions, etc.

Companies are hiring brains, not hands that mindlessly type instructions to AI agents.

More than ever, interviews should focus on foundational skills that make software engineers great at their job [8]. Reviewing and producing are two sides of the same coin. Reviewing code, architecture, analysis, etc. requires skills similar to those used to write code, design an architecture, or analyze some data.

I don't think we'll stop reviewing code any time soon, because no matter the language, you need a human being to create and verify the business requirements (a sufficiently detailed spec is code).

Source: A very comprehensive and precise spec, CommitStrip

Rationale 2: AI hides foundational traits & skills

One cannot hire a hand; the whole man always comes with it.

– Peter Drucker

Here it is useful to take inspiration from Lewis Mumford's distinction between a tool, which is driven by the human worker, and a machine, which operates according to its own logic and, as it were, has agency. If your interview questions allow too much AI usage, it will become almost impossible to distinguish the engineer's unique contribution from that of the AI model.

Be wary of engineers who use AI like a "machine" rather than a "tool". AI represents a qualitative leap in productivity, far beyond a more powerful auto-complete. AI can be used to externalize most of the thinking. As models improve, what used to be humans' protected territory (e.g., "taste" [9]) is under attack. Fitts' list seems outdated.

Jacques Derrida analyzed Plato's description of the written word as a pharmakon to memory, i.e., both a remedy and a poison. AI is both a remedy (automate rote refactors, avoid wasting time learning a library's idiosyncrasies, etc.) and a poison (risk of foundational skills atrophy).

Interviews that over-emphasize AI risk evaluating the model ("machine"), not the human. Make sure the exercise you design emphasizes human reasoning as the main subject of the interview.

Rationale 3: AI is evolving too quickly

As Arthur Mensch (Mistral CEO) stated, every 12 months, AI models gain about a year of software engineering experience. Some people used to joke that AI agents' work was comparable to an intern's. You don't hear those jokes anymore.

Most companies just don't have the cycles to continuously create and maintain AI-resistant interview questions that force the candidate to exercise foundational skills. Trying to continuously come up with interview questions that still resist the best models is a losing battle when (1) models evolve every month and (2) you don't necessarily have access to all models. Anthropic's Designing AI resistant technical evaluations is a good case study of "fighting" the AI, as opposed to "fighting" candidates.

Trying to come up with a more difficult take-home is similar to coming up with more difficult mental calculation while allowing calculators.

Moreover, AI best practices evolve every month. Case in point: prompt engineering is becoming less important as models get better at understanding instructions. Whether the candidate has kept up with today's techniques is not a useful signal.

Fundamentals, on the other hand, haven't changed (definitionally).

Response to objections

You're not providing any data! (1) A true experiment (e.g., a randomized controlled trial) with statistical significance is an almost impossible task, and I don't think any company would be willing to accept the false negatives it might generate. (2) Most interview design decisions are based on abstract reasoning, not clinical-trial-style experiments.

What about using AI to cheat (for instance, during interviews)? Provided you explicitly state that AI tools are not allowed, using them should result in an immediate rejection. As Warren Buffett said:

In looking for people to hire, you look for three qualities: integrity, intelligence, and energy. And if you don't have the first, the other two will kill you. You think about it; it's true. If you hire somebody without [integrity], you really want them to be dumb and lazy.

– Warren Buffett

Should companies use AI to evaluate candidates? They shouldn't. This could be another article, but in summary: (1) It is ethically wrong: you're hiring a human being, a knowledge worker. A machine cannot evaluate everything. (2) AI evaluations are non-deterministic and known to hallucinate: you will need to review the AI's review anyway.

Concrete recommendations for companies

Do not allow AI usage during most interviews. Do not over-emphasize specific tools either. Focus on foundational skills.

Invest in live exercises. Live exercises don't have to be fake, boring, or low-signal. They don't have to be short either. Revisit your data structure & algorithm interview: it remains the most intellectually challenging one! Make sure the exercise requires human effort (cf. Plato, Letter VII, 344b-c).

Have a mix of interview types to cost-effectively get broad signals.

Adapt your take-home. Either explicitly forbid AI usage, or allow it but don't waste time reviewing AI output. Make sure the take-home is systematically followed by a live exercise based on it: ask the candidate to present their work, how they approached tradeoffs, change requirements, ask about scale, etc.

Have at least one interview that evaluates reviewing skills. Those interviews are less costly to come up with, and give very interesting signals. They're also lower stakes for candidates. For instance, ask the candidate to review: an AI plan, a postmortem, an existing codebase (Bug squash: An underrated interview question, Jake Zimmerman), a product requirements document, a tradeoff analysis, a system architecture.

Consider bringing candidates onsite. This is the simplest way to prevent cheating, and it makes it slightly harder to leak interview questions. Obviously this only works for RTO (return to the office) companies.

References

Thanks to my colleague JM who started the conversation in Interviewing in the AI Era Means Following One Problem End to End

I list a ton of articles on my charlax/engineering-management repository. Here's a selection:

How to Hire, Henry Ward
The Hiring Post (Quarrelsome), Thomas Ptacek
Your interviews shouldn’t be spoilable, Rafe Colburn

Here are some resources that inspired this article:

Designing AI resistant technical evaluations, Tristan Hume, Anthropic
A.I. Should Elevate Your Thinking, Not Replace It, Koshy John

The opinions stated in this article are my own, and not that of my employer. This article wasn't written by AI.

1. According to Anthropic's own hiring guidelines, the take-home should be completed "without Claude unless we indicate otherwise".
2. Something I am quite opposed to. But that's for another article.
3. "The most common hiring mistake is hiring good interviewers.", [How to Hire](https://medium.com/eshares-blog/how-to-hire-34f4ded5f176#.jxkz3wrs3), Henry Ward
4. In The Hiring Post (Quarrelsome), Thomas Ptacek recommends starting the process with 30-45 minutes of director-level time before any screening begins.
5. "The essay is the most personal and most elaborate form of the philosophy student's work.", Anatole de Monzie, Instructions ministérielles, 1925
6. Especially right now: "the rise of LLMs will further enhance the importance of skilled Expert Generalists, and thus incentivize enterprises to put more effort into identifying, and training people with these skills.", Expert Generalists, martinfowler.com
7. Tip to candidates: interpreted languages (Python, TypeScript, ~~Perl~~, etc.) being more terse, they are almost always a better choice for live coding interviews.
8. Put in the wrong human hands, AI considerably increases the blast radius of mistakes (the same applies to physical automation machines).
9. The Fitts list (1951) differentiated primarily human functions (detection, perception, judgment, induction, improvisation, long-term memory) and machine functions (speed, power, computation, replication, simultaneous operations, short-term memory). One must admit this distinction does not apply well to AI machines.

source & further reading

dein.fr — original article