# Python interview questions: what each one actually predicts on the job (2026)

> Source: <https://dev.to/fourleaf/python-interview-questions-what-each-one-actually-predicts-on-the-job-2026-27nc>
> Published: 2026-06-16 23:33:11+00:00

Canonical: this is a cross-post. The original lives at <https://four-leaf.ai/blog/python-interview-questions>.

You can find a hundred Python interview question lists in about ten seconds. Most of them are the same: here's the question, here's the answer, memorize it, good luck. Final Round AI's popular roundup runs to [95 questions](https://www.finalroundai.com/blog/python-interview-questions) in exactly that shape. Those lists optimize for the wrong thing.

I've sat on the interviewing side of enough Python screens to know what actually moves a decision, and it's almost never whether the candidate could recite the definition of a decorator. It's whether they could read a stack trace without flinching, whether they reached for a list comprehension or a four-line loop, whether they knew when a Pandas operation was about to blow up memory. Those signals don't show up on a flashcard.

This guide does something different. For every question, you get a short version of the strong answer, then the part that matters: what the question actually predicts about you on the job, and a trivia tax flag when the question rewards memorization more than skill. Use it to spend your prep hours where they count.

Python is everywhere in interviews because it's everywhere in work. In the [2025 Stack Overflow Developer Survey](https://survey.stackoverflow.co/2025/technology), 57.9 percent of developers reported using Python, up seven points in a single year, the largest jump of any major language. It sits behind only JavaScript, HTML/CSS, and SQL. If you're interviewing for software engineering, data science, ML, or analytics, a Python screen is close to guaranteed.

That ubiquity is also why generic question lists fail you. When a topic is this broad, a list of 95 questions has to stay shallow to cover the surface. You end up with fifteen variations on "what's the difference between a list and a tuple" and nothing on the questions that actually separate candidates: reading unfamiliar code, debugging under pressure, choosing the right data structure when it matters.

There's a second problem. Interviewers know these lists exist, and they've adjusted. In [interviewing.io's 2025 survey](https://interviewing.io/blog/how-is-ai-changing-interview-processes-not-much-and-a-whole-lot) of 67 interviewers (52 of them at FAANG companies), 81 percent suspected candidates of using AI to cheat and 75 percent believed AI assistance was letting weaker candidates pass interviews they'd otherwise fail. The response has been more follow-up questions, more "walk me through why you did that," more probing of whether you understand the code on the screen. A memorized answer survives the first question and falls apart on the second.

The goal is to study the questions that build transferable reasoning and to spot the pure trivia, so you can give the trivia five minutes instead of fifty.

Each question below carries two notes.

**Signal** is what a strong answer tells an interviewer about how you'd perform on the job. Data wrangling speed, debugging instinct, idiomatic style, library fluency, systems thinking. This is the reason the question gets asked, even when the interviewer couldn't articulate it.

**Trivia tax** is a flag for when a question mostly rewards having seen it before. These questions still get asked, so you should know the answers, but memorizing them teaches you nothing you'd use writing real code. Learn them fast and move on.

To be clear about method: the signal and trivia-tax calls here are editorial judgment from time spent on the interviewing side, not the output of a formal study. Where I cite numbers, they come from named public sources, linked inline. The example questions are drawn from real screens and from [Four-Leaf's](https://four-leaf.ai) own practice question bank.

This is where interviewers check whether you write Python or whether you write some other language using Python syntax. The questions look basic. The signal is in how idiomatic your answer is.

Lists are mutable, tuples are immutable and hashable, so tuples can be dictionary keys and set members while lists can't. The "when" matters more than the "what": tuples signal a fixed record (a coordinate, a row), lists signal a growing collection.

**Signal:** whether you think about mutability as a design choice, not just a property.

**Trivia tax:** partial. The definition is rote, but the "when would you use each" turns it into a real question.
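A minimal sketch of the hashability point, using made-up coordinate data. One caveat the short answer glosses over: a tuple is only hashable if everything inside it is hashable too.

```python
# Tuples are hashable, so they can serve as dict keys; lists cannot.
distances = {(0, 0): 0.0, (3, 4): 5.0}  # coordinate -> distance from origin


def try_list_key():
    """Return the error message raised when a list is used as a dict key."""
    try:
        {[3, 4]: 5.0}
    except TypeError as exc:
        return str(exc)
    return None


list_key_error = try_list_key()  # a TypeError message mentioning "unhashable"
```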

It builds a list in a single expression like `[x * 2 for x in nums if x > 0]`. The strong answer includes the "not": skip comprehensions when the logic needs multiple statements or side effects, and skip building a full list when a generator expression would stream the values lazily.

**Signal:** idiomatic style plus judgment about memory. A candidate who knows comprehensions but never knows when to stop will write unreadable nested ones.
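To make the list-versus-generator choice concrete, here is a small sketch with illustrative numbers. The only difference between the two forms is the brackets, but the second never materializes the intermediate list.

```python
nums = [3, -1, 4, -1, 5, -9, 2, 6]

# List comprehension: builds the whole result list in memory at once.
doubled = [x * 2 for x in nums if x > 0]

# Generator expression: same logic, but values are produced one at a time,
# so sum() never holds the full intermediate list.
total = sum(x * 2 for x in nums if x > 0)
```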

`*args` and `**kwargs`.

`*args` collects extra positional arguments into a tuple, `**kwargs` collects extra keyword arguments into a dict. You use them to write functions that forward arguments or accept a flexible signature.

**Signal:** low on its own. It matters with the follow-up: write a decorator that works on any function, which forces real use of both.

**Trivia tax:** yes, in isolation. Know it cold, spend no real time on it.

A decorator is a function that takes a function and returns a new function, used to wrap behavior like timing, logging, or caching without touching the original. A clean answer uses `functools.wraps` to preserve the wrapped function's name and docstring.

``` python
import functools
import time

def timed(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        print(f"{fn.__name__} took {time.perf_counter() - start:.4f}s")
        return result
    return wrapper
```

**Signal:** high. Decorators sit at the intersection of closures, first-class functions, and `*args`/`**kwargs`. A candidate who writes one cleanly understands a lot of Python at once. The `functools.wraps` detail separates people who've shipped decorators from people who've only read about them.

`is` and `==`?

`==` compares values, `is` compares identity (whether two names point to the same object). The trap is small-integer and string interning, where `a is b` can be `True` for `256` but `False` for `257`.

**Signal:** whether you understand that variables are references to objects. The interning trivia is a distraction.

**Trivia tax:** the interning edge case is pure trivia. The reference-model understanding underneath it is not.
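The reference model is easier to show with lists than with interned integers. This is a sketch with throwaway names, not a recommended idiom:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # equal contents, but a separate object
c = a           # a second name for the same object

same_value = a == b     # True: the contents match
same_object = a is b    # False: two distinct lists
alias = a is c          # True: both names refer to one list

c.append(4)     # mutating through one name is visible through the other
```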

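The default is evaluated once, at function definition, so `def f(x, acc=[])` shares one list across all calls. The fix is `acc=None` then `if acc is None: acc = []` inside the body. The shorter `acc = acc or []` mostly works, but it also swaps out an empty list the caller passed in on purpose.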

**Signal:** high, and underrated. This is a real bug that ships to production. A candidate who's hit it has written enough Python to have been burned, which is exactly the experience interviewers are probing for.
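A side-by-side sketch of the bug and the usual `None`-sentinel fix. The function names are hypothetical, chosen only for this illustration.

```python
def append_buggy(x, acc=[]):
    # The default list is created once, when the function is defined,
    # and then reused by every call that omits acc.
    acc.append(x)
    return acc


def append_fixed(x, acc=None):
    # A fresh list is created on every call that omits acc.
    if acc is None:
        acc = []
    acc.append(x)
    return acc


buggy_first = list(append_buggy(1))    # copy, to snapshot the shared list
buggy_second = list(append_buggy(2))   # the 1 from the first call is still there
fixed_first = append_fixed(1)
fixed_second = append_fixed(2)
```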

A generator produces values lazily with `yield`, holding only one value in memory at a time instead of building the whole sequence. You use them to process large or infinite streams without loading everything at once.

**Signal:** memory awareness and a grasp of laziness. Candidates who reach for generators on a "process this 10GB file" question are showing real instinct.
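A minimal sketch of the "infinite stream" case. The `squares` generator is a made-up example; `itertools.islice` takes a finite slice without ever asking for the rest.

```python
from itertools import islice


def squares():
    """Yield 0, 1, 4, 9, ... forever; nothing is computed until requested."""
    n = 0
    while True:
        yield n * n
        n += 1


first_five = list(islice(squares(), 5))
```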

The Global Interpreter Lock means only one thread executes Python bytecode at a time, so threads don't speed up CPU-bound work. For I/O-bound work threads still help (the GIL releases during I/O waits), and for CPU-bound parallelism you use `multiprocessing` or native extensions.

**Signal:** high for backend roles. The follow-up that matters is "so when would you use threads at all," which separates people who memorized "GIL bad" from people who understand the I/O-bound case.
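One way to answer the "when would you use threads" follow-up is a sketch like this, where `time.sleep` stands in for a blocking network or disk call (the `fake_io` helper is hypothetical). The four waits overlap, so the elapsed time is typically close to one delay rather than four.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id, delay=0.05):
    """Stand-in for a network or disk wait."""
    time.sleep(delay)   # sleeping releases the GIL, like real blocking I/O
    return task_id


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() preserves input order even though the tasks run concurrently.
    results = list(pool.map(fake_io, range(4)))
elapsed = time.perf_counter() - start   # usually near 0.05s, not 0.20s
```

Swap the sleep for a tight arithmetic loop and the speedup disappears, which is the CPU-bound half of the answer.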

`@staticmethod`, `@classmethod`, and an instance method?

Instance methods take `self`, class methods take `cls` and can construct or configure the class, static methods take neither and are just namespaced functions. Class methods are the idiomatic way to write alternative constructors.

**Signal:** moderate. The alternative-constructor use of `classmethod` is the part that shows real OOP fluency.

**Trivia tax:** partial. The definitions are rote, the "when would you use a classmethod" is not.
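Here is a sketch of all three method kinds on one hypothetical `Person` class, with the alternative constructor doing the interesting work:

```python
class Person:
    def __init__(self, name, birth_year):
        self.name = name
        self.birth_year = birth_year

    def age_in(self, year):
        # Instance method: needs the data on self.
        return year - self.birth_year

    @classmethod
    def from_csv_row(cls, row):
        # Alternative constructor: uses cls, so subclasses get the right type.
        name, birth_year = row.split(",")
        return cls(name.strip(), int(birth_year))

    @staticmethod
    def is_valid_year(year):
        # Namespaced helper: needs neither self nor cls.
        return 1900 <= year <= 2100


p = Person.from_csv_row("Ada, 1990")
```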

`if __name__ == "__main__":` do?

It guards code that should run only when the file is executed directly, not when it's imported as a module. Without it, your script's side effects fire on import.

**Signal:** low. It's a useful idiom but knowing it predicts almost nothing about engineering ability.

**Trivia tax:** yes. One of the most over-asked Python questions. Know the one-sentence answer, move on.

A shallow copy duplicates the outer object but shares references to nested objects, so mutating a nested list shows up in both copies. `copy.deepcopy` recursively duplicates everything.

**Signal:** moderate. Connects to the reference model and to a real bug class. Candidates who've debugged a shared-nested-object bug answer this with conviction.
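The shared-nested-object bug in a few lines, with an illustrative dict:

```python
import copy

original = {"tags": ["a", "b"]}
shallow = copy.copy(original)       # new outer dict, same inner list
deep = copy.deepcopy(original)      # new outer dict, new inner list

original["tags"].append("c")        # mutate the nested list

shares_inner = shallow["tags"] is original["tags"]
```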

Here the language matters less than the reasoning, but Python-specific tools (dicts, sets, `collections`, `heapq`, slicing) are exactly what interviewers want to see you reach for. Solving with the right standard-library tool is itself a signal.

Recurse: if an item is a list, recurse into it, otherwise add it. Use `isinstance(item, list)` rather than `type(item) == list` so subclasses work too.

**Signal:** clean recursion and the `isinstance` detail. The detail is small but it's the kind of correctness instinct that shows up everywhere in real code.
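A sketch of the recursive flatten, assuming the input nests only lists (tuples and other iterables are treated as leaf values here):

```python
def flatten(items):
    """Recursively flatten arbitrarily nested lists into one flat list."""
    flat = []
    for item in items:
        if isinstance(item, list):   # also True for list subclasses
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


flattened = flatten([1, [2, [3, 4]], [], [[5]]])
```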

Count character frequencies; a palindrome allows at most one character with an odd count. `collections.Counter` plus a single pass over the counts does it.

**Signal:** whether you reach for `Counter` instead of building a frequency dict by hand. Reinventing `Counter` isn't a sin, but it tells the interviewer you don't know the standard library well.
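The odd-count check as a sketch. It treats the string as given, so case and spaces count as characters; normalizing those first is a reasonable clarifying question to ask.

```python
from collections import Counter


def can_form_palindrome(s):
    """True if some rearrangement of s reads the same both ways."""
    odd_counts = sum(1 for count in Counter(s).values() if count % 2)
    return odd_counts <= 1
```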

Sliding window with a "missing" counter: expand the right edge until the window is valid, then shrink from the left while tracking the best window seen. This is a hard question and interviewers know it.

**Signal:** high. Sliding window is one of the highest-value patterns to internalize because it transfers across dozens of problems. Getting the shrink condition right under pressure is a strong signal.
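A sketch of the sliding window with a "missing" counter, assuming the question is the classic minimum-window-substring problem: find the shortest substring of `s` that contains every character of `t`, counting duplicates. The function name and signature are my own.

```python
from collections import Counter


def min_window(s, t):
    """Shortest substring of s containing all chars of t, or "" if none."""
    if not t or len(t) > len(s):
        return ""
    need = Counter(t)     # how many of each char the window still needs
    missing = len(t)      # total chars still missing from the window
    best_start, best_end = 0, 0   # best window is s[best_start:best_end]
    left = 0
    for right, ch in enumerate(s, 1):   # current window is s[left:right]
        if need[ch] > 0:
            missing -= 1
        need[ch] -= 1                   # negative means surplus in window
        if missing == 0:
            # Shrink from the left while the leftmost char is surplus.
            while need[s[left]] < 0:
                need[s[left]] += 1
                left += 1
            if best_end == 0 or right - left < best_end - best_start:
                best_start, best_end = left, right
            # Drop one required char so the search continues rightward.
            need[s[left]] += 1
            missing += 1
            left += 1
    return s[best_start:best_end]
```

Each index enters and leaves the window at most once, so the scan is linear in the length of `s`.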

In Python the shortcut is `collections.OrderedDict` with `move_to_end` on access and `popitem(last=False)` on eviction. The deeper answer is a hash map plus a doubly linked list, which is what you'd write if asked to do it without the standard library.

**Signal:** high, and it's a great question precisely because it has two valid altitudes. Knowing the `OrderedDict` trick shows Python fluency; being able to drop to the linked-list version shows you understand why it's O(1).
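The `OrderedDict` altitude, as a minimal sketch. The class shape (`get`/`put`, a fixed `capacity`) is the usual interview framing rather than anything prescribed here.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()   # least recently used first, newest last

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the least recently used


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" becomes the most recent entry
cache.put("c", 3)     # over capacity: evicts "b"
```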

Kahn's algorithm: compute in-degrees, start from the zero-in-degree nodes, and reduce neighbors' in-degrees as you remove nodes. If you can't process every node, there's a cycle.

**Signal:** graph reasoning and the cycle-detection insight. Comes up more than people expect because dependency ordering is a real problem (build systems, task schedulers).
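Kahn's algorithm as a sketch. The graph representation is an assumption: a dict mapping each node to the nodes that must come after it. Returning `None` on a cycle is one convention among several.

```python
from collections import deque


def topo_sort(graph):
    """Return a valid ordering of graph's nodes, or None if there is a cycle."""
    in_degree = {node: 0 for node in graph}
    for neighbors in graph.values():
        for n in neighbors:
            in_degree[n] = in_degree.get(n, 0) + 1

    # Start from every node that nothing points to.
    queue = deque(node for node, degree in in_degree.items() if degree == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for n in graph.get(node, ()):
            in_degree[n] -= 1
            if in_degree[n] == 0:
                queue.append(n)

    # Nodes on a cycle never reach in-degree zero, so they are never emitted.
    return order if len(order) == len(in_degree) else None


build_order = topo_sort({"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []})
cyclic = topo_sort({"x": ["y"], "y": ["x"]})
```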

Scan the grid; on each unvisited land cell, increment the count and flood-fill (DFS or BFS) to mark the connected region. Mutating visited cells in place avoids a separate visited structure.

**Signal:** grid traversal and DFS, both extremely common. The in-place-visited trick is a small efficiency signal.
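A sketch of the flood-fill count, assuming the usual framing where `1` is land, `0` is water, and cells connect in four directions. It uses an explicit stack instead of recursion, which sidesteps Python's recursion limit on large grids.

```python
def count_islands(grid):
    """Count 4-connected groups of 1s. Mutates grid, zeroing visited land."""
    if not grid:
        return 0
    rows, cols = len(grid), len(grid[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1:
                continue
            count += 1
            grid[r][c] = 0            # mark visited in place
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and grid[ny][nx] == 1:
                        grid[ny][nx] = 0
                        stack.append((ny, nx))
    return count


islands = count_islands([
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
])
```

Mutating the input is the efficiency trick, but it is worth saying out loud in an interview, since a caller may not expect their grid to be wiped.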

`list(dict.fromkeys(items))`. Dicts preserve insertion order since Python 3.7, so this is both correct and idiomatic. The naive answer rebuilds with a seen-set, which works but is more code.

**Signal:** idiomatic Python. The `dict.fromkeys` answer reliably surprises interviewers in a good way.

**Trivia tax:** partial. Knowing the one-liner is a bit of a party trick, but the underlying insight (dicts are ordered, sets aren't) is real.
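Both versions side by side, on made-up data. They agree as long as the items are hashable.

```python
items = ["b", "a", "b", "c", "a"]

# One-liner: dict keys are unique and keep first-seen order.
deduped = list(dict.fromkeys(items))


def dedupe_with_seen(seq):
    """The longer seen-set version of the same idea."""
    seen = set()
    out = []
    for x in seq:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out
```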

List append and index are O(1), list membership (`x in list`) is O(n), dict and set lookup are O(1) average. The trap is `x in some_list` inside a loop, which quietly makes an algorithm O(n^2).

**Signal:** high. This is the single most practically useful complexity knowledge, because the list-membership trap shows up in real code constantly. Candidates who instinctively switch a list to a set for membership checks are showing exactly the right reflex.

Maintain a min-heap of size k with `heapq`: push each number, and pop the smallest whenever the heap exceeds k. The top of the heap is your kth largest.

**Signal:** whether you know `heapq` exists and when a heap beats sorting. Sorting the whole stream is O(n log n) per query; the heap is O(n log k), which matters at scale.
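The size-k min-heap as a sketch. Raising on a short stream is my choice for the edge case; an interviewer may want something else.

```python
import heapq


def kth_largest(stream, k):
    """Return the kth largest value in stream using a size-k min-heap."""
    heap = []
    for x in stream:
        heapq.heappush(heap, x)
        if len(heap) > k:
            heapq.heappop(heap)    # drop the smallest, keep the k largest
    if len(heap) < k:
        raise ValueError("stream has fewer than k items")
    return heap[0]                 # smallest of the k largest
```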

For data and backend roles, library fluency often matters more than raw algorithms. These questions test whether you've used the tools, not just read about them.

`loc` and `iloc`?

`loc` selects by label, `iloc` selects by integer position. The bug they're probing for is chained indexing like `df[df.a > 0]['b'] = 1`, which can silently fail; the fix is a single `loc` call such as `df.loc[df.a > 0, 'b'] = 1`.

**Signal:** real Pandas mileage. Anyone who's used Pandas seriously has been bitten by the `SettingWithCopyWarning`, and mentioning it unprompted is a strong tell.

The operations run in compiled C over contiguous memory, avoiding Python's per-element interpreter overhead and object boxing. A loop over a DataFrame row by row can be hundreds of times slower than the vectorized equivalent.

**Signal:** high for data roles. The follow-up is usually "so how would you avoid iterating this DataFrame," and the strong answer reaches for vectorization, with `.apply` only as a last resort.

`apply` versus a vectorized operation in Pandas?

Prefer vectorized operations whenever they exist; `apply` runs a Python function per row or per group and loses the C-speed advantage. Reach for `apply` only when the logic genuinely can't be vectorized.

**Signal:** whether you treat `apply` as a convenience or a performance cliff. Candidates who reach for `apply` first are usually newer to Pandas.

`dropna` removes it, `fillna` replaces it, and the real answer is "it depends on why it's missing." The strong candidate asks whether the data is missing at random before choosing, because filling with a mean can distort a model.

**Signal:** high for data science. This is where statistical thinking shows through a Pandas question. The mechanical answer is easy; the judgment is the signal.

NumPy stretches arrays of compatible shapes so element-wise operations work without copying, comparing dimensions from the right and treating size-1 dimensions as stretchable. Adding a shape `(3,1)` array to a shape `(1,4)` array yields `(3,4)`.

**Signal:** real NumPy fluency. Broadcasting is the thing people either understand or fake, and a clean shape example is hard to fake.
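The `(3,1)` plus `(1,4)` case as a runnable sketch, assuming NumPy is installed. The values are arbitrary; the shapes are the point.

```python
import numpy as np

col = np.array([[0], [10], [20]])    # shape (3, 1)
row = np.array([[1, 2, 3, 4]])       # shape (1, 4)

# Each size-1 dimension is stretched to match the other array.
grid = col + row                      # shape (3, 4)
```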

`async`/`await` and when it helps.

`async`/`await` lets a single thread handle many I/O-bound tasks by suspending one while it waits and running another. It helps for network calls, database queries, and file I/O; it does nothing for CPU-bound work.

**Signal:** high for backend roles. The discriminating follow-up is "would async speed up a heavy computation," and the right answer is no, because it doesn't add parallelism.
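A minimal sketch of overlapping waits on one thread, with `asyncio.sleep` standing in for a network call (the `fake_fetch` coroutine is hypothetical):

```python
import asyncio


async def fake_fetch(name, delay=0.05):
    await asyncio.sleep(delay)     # yields control while "waiting on I/O"
    return f"{name}: done"


async def main():
    # All three waits overlap; gather returns results in argument order.
    return await asyncio.gather(
        fake_fetch("a"), fake_fetch("b"), fake_fetch("c")
    )


results = asyncio.run(main())
```

Replace the sleep with a long computation and the three calls run back to back, because nothing ever yields to the event loop.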

`requests` library do, and how do you handle a failed request?

It's the standard HTTP client. The mature answer covers `response.raise_for_status()`, timeouts (always set one), and retry logic with backoff for transient failures.

**Signal:** production instinct. Junior answers stop at `requests.get(url).json()`. Senior answers mention the timeout unprompted, because they've had a request hang forever in production.

Write a function named `test_*` with a plain `assert`. Use fixtures for shared setup, `parametrize` to run one test over many inputs, and `monkeypatch` or mocking to isolate external calls.

**Signal:** whether testing is a habit or an afterthought. Mentioning `parametrize` and fixtures unprompted signals someone who actually writes tests, not someone who's heard tests are good.

An object that defines `__enter__` and `__exit__`, used with `with` to guarantee cleanup (closing files, releasing locks) even if an exception fires. You can also write one with `contextlib.contextmanager` and a generator.

**Signal:** moderate to high. Knowing `with open(...)` is table stakes; being able to write your own context manager shows real depth.
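A sketch of the generator-based form. The `tracked` manager and the `events` log are made up for the demo; the `try`/`finally` around `yield` is what guarantees the cleanup.

```python
from contextlib import contextmanager

events = []


@contextmanager
def tracked(name):
    events.append(f"enter {name}")      # plays the role of __enter__
    try:
        yield name
    finally:
        events.append(f"exit {name}")   # runs even if the body raises


try:
    with tracked("job"):
        raise RuntimeError("boom")
except RuntimeError:
    pass   # the exception still propagates after cleanup
```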

Iterate over the file object line by line (`for line in f`), which streams rather than loading everything, or read in fixed-size chunks. For structured data, Pandas `read_csv` with `chunksize` gives you an iterator of DataFrames.

**Signal:** high. Memory-aware file handling is a real-world skill that pure algorithm questions miss entirely.
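Both streaming styles as a sketch, demonstrated on a tiny temporary file so it runs anywhere. The helper names and the 64 KiB default chunk size are illustrative choices.

```python
import os
import tempfile


def count_matching_lines(path, needle):
    """Count lines containing needle without loading the whole file."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:            # the file object streams one line at a time
            if needle in line:
                count += 1
    return count


def read_chunks(path, chunk_size=64 * 1024):
    """Yield fixed-size byte chunks; the last chunk may be shorter."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


# Small demo file; newline="\n" keeps the byte count the same on every OS.
with tempfile.NamedTemporaryFile(
    "w", suffix=".log", delete=False, encoding="utf-8", newline="\n"
) as tmp:
    tmp.write("ok\nerror: disk\nok\nerror: net\n")

error_count = count_matching_lines(tmp.name, "error")
total_bytes = sum(len(chunk) for chunk in read_chunks(tmp.name, chunk_size=8))
os.remove(tmp.name)
```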

This is the section the competitor lists skip, and it's the one that predicts the job best. On the job you read and fix far more code than you write from scratch. Good interviewers know it, so they show you broken code and watch how you reason.

The strong move is to profile before guessing: `cProfile` or even a few `time.perf_counter()` calls to find the actual hot spot. The most common real culprit is an O(n) membership test (`x in list`) inside a loop, fixable by switching to a set.

**Signal:** very high. Profiling before optimizing is the clearest separator between engineers who've worked on real performance problems and those who guess. Candidates who immediately start rewriting without measuring are showing you how they'd behave on the job.
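A sketch of what "profile first" can look like with the standard library. The two `overlap` functions are hypothetical stand-ins for a slow function and its fix; the profiler report is captured to a string so you can read which call dominates.

```python
import cProfile
import io
import pstats


def slow_overlap(a, b):
    return [x for x in a if x in b]        # b is a list: O(len(b)) per lookup


def fast_overlap(a, b):
    b_set = set(b)                          # one pass to build the set
    return [x for x in a if x in b_set]     # O(1) average per lookup


a = list(range(2000))
b = list(range(1000, 3000))

profiler = cProfile.Profile()
profiler.enable()
slow = slow_overlap(a, b)
fast = fast_overlap(a, b)
profiler.disable()

# Sort by cumulative time so the real hot spot lands at the top.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
```

Printing `report` shows `slow_overlap` taking most of the time, which is the evidence you want before touching any code.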

`KeyError` intermittently. How do you debug it?

Reproduce it, read the traceback to the exact line, then reason about why the key is sometimes absent (a race, a missing default, an assumption about input). Tools: `dict.get` with a default, `collections.defaultdict`, or a guard, depending on the cause.

**Signal:** high. Reading a traceback calmly and working from the bottom line up is a learnable skill that many candidates visibly lack. Watching someone debug is more informative than watching them code.

``` python
def add_item(item, items=[]):
    items.append(item)
    return items
```

The mutable default argument is shared across calls, so the list accumulates across every call that doesn't pass its own list. Fix with `items=None` and `items = items if items is not None else []`.

**Signal:** high. This is the mutable-default bug in disguise, and recognizing it on sight tells the interviewer you've been bitten before, which means real experience.

``` python
result = [x for row in matrix for x in row if x > 0]
```

It flattens a 2D matrix and keeps positive values. The two `for` clauses read left to right like nested loops, which trips up people who only ever write single-level comprehensions.

**Signal:** code-reading fluency. Being able to parse dense Python you didn't write is a daily-work skill that whiteboard questions never touch.
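If the clause order is hard to hold in your head, expanding the comprehension into explicit loops makes it mechanical. The sample `matrix` is made up.

```python
matrix = [[1, -2, 3], [-4, 5, -6]]

result = [x for row in matrix for x in row if x > 0]

# The same logic as explicit loops, in the same left-to-right order.
expanded = []
for row in matrix:          # first for clause is the outer loop
    for x in row:           # second for clause is the inner loop
        if x > 0:           # trailing if filters individual values
            expanded.append(x)
```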

Order-dependence between tests, shared mutable state, hardcoded paths, timezone or locale assumptions, and unpinned dependencies. The meta-signal is whether the candidate has a systematic checklist or just shrugs.

**Signal:** high for anyone past junior. Flaky-test debugging is a real and frustrating part of the job, and having a mental checklist is exactly the experience interviewers want.

Interviewers increasingly hand you working code and ask you to trace it, specifically because tracing is hard to fake with AI. The strong answer narrates state changes step by step and flags any line that would surprise a reader.

**Signal:** high, and rising. Given that 81 percent of interviewers in [interviewing.io's survey](https://interviewing.io/blog/how-is-ai-changing-interview-processes-not-much-and-a-whole-lot) suspect AI-assisted cheating, expect more code-reading and fewer blank-page prompts. The skill being tested is genuine comprehension.

For data science and ML roles, Python is the medium and the real questions are about statistics, modeling, and judgment. The interviewer wants to know you can turn a vague problem into clean code and defensible reasoning.

High bias means the model is too simple and underfits; high variance means it's too complex and overfits to noise. The tradeoff is choosing model complexity so test error is minimized, often with regularization to pull a complex model back.

**Signal:** foundational. Nearly every DS loop asks some version of this. A strong answer connects it to a concrete decision (why you'd add regularization), not just the textbook definition.

**Trivia tax:** partial. The definition is rote, but the "how would you act on it" is real.

L1 (Lasso) adds the absolute value of coefficients to the loss, which drives some to exactly zero and performs feature selection. L2 (Ridge) adds squared coefficients, which shrinks all of them smoothly without zeroing them out.

**Signal:** whether you understand the geometric reason L1 produces sparsity, not just that it does. The follow-up "why does L1 zero things out and L2 doesn't" separates memorizers from understanders.

Resampling (oversampling the minority, undersampling the majority, or SMOTE), class weights in the model, and crucially the right metric: accuracy is useless on a 99/1 split, so use precision, recall, F1, or AUC. The best answer starts with "what's the business cost of each error type."

**Signal:** high. This question rewards judgment over recipe. Candidates who jump straight to SMOTE without asking about the cost of false negatives are missing the point.

Precision is the fraction of positive predictions that are correct; recall is the fraction of actual positives you caught. Optimize precision when false positives are costly (spam filtering), recall when false negatives are costly (cancer screening).

**Signal:** high. The concrete examples are what matter. A candidate who can map precision and recall onto a real decision understands the metrics; one who only recites the formulas usually doesn't.
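The two definitions in plain Python, on a toy set of labels where `1` is the positive class. Returning `0.0` when a denominator is zero is one common convention, not the only one.

```python
def precision_recall(y_true, y_pred):
    """Compute (precision, recall) for binary labels where 1 is positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # caught positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # missed positives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
```

Here two of the four positive predictions are right (precision 0.5) and two of the three real positives were caught (recall about 0.67).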

Gradient descent walks the parameters downhill along the loss gradient, scaled by a learning rate. Batch uses the whole dataset per step (stable, slow), stochastic uses one example (noisy, fast), mini-batch splits the difference and is the standard in practice.

**Signal:** whether you understand the speed-versus-stability tradeoff and the role of the learning rate. The learning-rate sensitivity is the part that shows real training experience.
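Learning-rate sensitivity shows up even in one dimension. This sketch minimizes f(x) = (x - 3)^2, a toy function chosen because its gradient 2(x - 3) and its minimum at x = 3 are easy to check by hand.

```python
def gradient_descent(grad, x0, learning_rate, steps):
    """Repeatedly step against the gradient, scaled by the learning rate."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


def grad(x):
    return 2 * (x - 3)   # derivative of (x - 3)^2


# Small steps shrink the error by a factor of 0.8 each iteration.
converged = gradient_descent(grad, x0=0.0, learning_rate=0.1, steps=100)

# Too-large steps overshoot by more each time and the error grows.
diverged = gradient_descent(grad, x0=0.0, learning_rate=1.1, steps=100)
```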

It's an ensemble of decision trees trained on bootstrapped samples with random feature subsets, averaging their predictions to reduce variance. Choose it when you want a strong baseline with little tuning and some feature-importance insight, on tabular data.

**Signal:** moderate. Knowing the mechanism is table stakes; the "when would you choose it over gradient boosting" follow-up is where real modeling judgment shows.

A forward pass computes the prediction and loss; the backward pass uses the chain rule to compute how much each weight contributed to the loss, and the weights update in the direction that reduces it. It's the chain rule applied systematically across layers.

**Signal:** high for ML roles. The chain-rule framing is the discriminator. Candidates who can explain it without hand-waving understand what their framework is doing under `loss.backward()`.

The strong answer is a process, not an algorithm: understand the target and the business question, explore and clean the data, establish a simple baseline, then iterate with better features and models while validating honestly. Mentioning a baseline first is the senior tell.

**Signal:** very high, and the most realistic question in any DS loop. It maps directly to the actual job. Candidates who jump to "I'd train XGBoost" without mentioning a baseline or validation are showing inexperience.

Define the metric and minimum detectable effect, compute the sample size for adequate power before you start, randomize properly, then run until you hit that sample size rather than peeking and stopping at the first significant result. Peeking inflates false positives.

**Signal:** high for product DS roles. The "don't peek" insight is the one that separates people who've actually run experiments from those who've only read about p-values.

They map words to dense vectors where semantic similarity becomes geometric closeness, so "king" and "queen" sit near each other and analogies fall out of vector arithmetic. They let models transfer learned meaning instead of treating words as opaque IDs.

**Signal:** moderate for NLP-flavored roles. With LLMs now dominant, the more current follow-up is how embeddings relate to what a transformer learns, which tests whether you've kept up.

You don't have time for all of this. Spend it where the signal density is highest.

- `loc`/`iloc` and the `SettingWithCopyWarning`, vectorization versus `apply`, missing-data judgment, and precision and recall mapped to a real decision.
- `if __name__ == "__main__"`, reversing a string, reciting `*args` and `**kwargs`. Know the one-line answers, spend nothing more.

Reading answers builds recognition. It does not build the ability to produce a clean answer while someone watches and the clock runs. Those are different skills, and the gap between knowing your answer and delivering it under pressure is where good candidates lose offers.

The fix is to practice out loud, under something like real conditions. Explain your reasoning as you go, because interviewers score your thinking as much as your code, and because narrating your approach is exactly what the rise in AI-cheating suspicion has made interviewers want to hear. Solve a problem you haven't seen, talk through the tradeoffs, and get feedback on where your explanation went fuzzy.

That's the gap [Four-Leaf's voice mock interviews](https://four-leaf.ai) are built to close. You practice answering real questions out loud, get scored on substance and delivery, and drill the spots where you freeze, so the answer comes out clean when it counts. You can generate fresh Python questions by role and difficulty and run a full mock before your real one. The questions in this guide are a map of what gets tested. Practicing them out loud is how you turn the map into an offer.