Python interview questions: what each one actually predicts on the job (2026)

wpnews.pro

Canonical: this is a cross-post. The original lives at https://four-leaf.ai/blog/python-interview-questions.

You can find a hundred Python interview question lists in about ten seconds. Most of them are the same: here's the question, here's the answer, memorize it, good luck. Final Round AI's popular roundup runs to 95 questions in exactly that shape. Those lists optimize for the wrong thing.

I've sat on the interviewing side of enough Python screens to know what actually moves a decision, and it's almost never whether the candidate could recite the definition of a decorator. It's whether they could read a stack trace without flinching, whether they reached for a list comprehension or a four-line loop, whether they knew when a Pandas operation was about to blow up memory. Those signals don't show up on a flashcard.

This guide does something different. For every question, you get a short version of the strong answer, then the part that matters: what the question actually predicts about you on the job, and a trivia tax flag when the question rewards memorization more than skill. Use it to spend your prep hours where they count.

Python is everywhere in interviews because it's everywhere in work. In the 2025 Stack Overflow Developer Survey, 57.9 percent of developers reported using Python, up seven points in a single year, the largest jump of any major language. It sits behind only JavaScript, HTML/CSS, and SQL. If you're interviewing for software engineering, data science, ML, or analytics, a Python screen is close to guaranteed.

That ubiquity is also why generic question lists fail you. When a topic is this broad, a list of 95 questions has to stay shallow to cover the surface. You end up with fifteen variations on "what's the difference between a list and a tuple" and nothing on the questions that actually separate candidates: reading unfamiliar code, debugging under pressure, choosing the right data structure when it matters.

There's a second problem. Interviewers know these lists exist, and they've adjusted. In interviewing.io's 2025 survey of 67 interviewers (52 of them at FAANG companies), 81 percent suspected candidates of using AI to cheat and 75 percent believed AI assistance was letting weaker candidates pass interviews they'd otherwise fail. The response has been more follow-up questions, more "walk me through why you did that," more probing of whether you understand the code on the screen. A memorized answer survives the first question and falls apart on the second.

The goal is to study the questions that build transferable reasoning and to spot the pure trivia, so you can give the trivia five minutes instead of fifty.

Each question below carries two notes.

Signal is what a strong answer tells an interviewer about how you'd perform on the job. Data wrangling speed, debugging instinct, idiomatic style, library fluency, systems thinking. This is the reason the question gets asked, even when the interviewer couldn't articulate it.

Trivia tax is a flag for when a question mostly rewards having seen it before. These questions still get asked, so you should know the answers, but memorizing them teaches you nothing you'd use writing real code. Learn them fast and move on.

To be clear about method: the signal and trivia-tax calls here are editorial judgment from time spent on the interviewing side, not the output of a formal study. Where I cite numbers, they come from named public sources, linked inline. The example questions are drawn from real screens and from Four-Leaf's own practice question bank.

This is where interviewers check whether you write Python or whether you write some other language using Python syntax. The questions look basic. The signal is in how idiomatic your answer is.

Lists are mutable. Tuples are immutable, and a tuple is hashable as long as everything inside it is hashable, so tuples can be dictionary keys and set members while lists can't. The "when" matters more than the "what": tuples signal a fixed record (a coordinate, a row), lists signal a growing collection.
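A quick way to see the hashability rule is to call hash() and catch the TypeError. The helper name is_hashable is ours, purely for illustration:

```python
def is_hashable(obj) -> bool:
    """Return True if obj can be used as a dict key or set member."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# Coordinate tuples work fine as dictionary keys.
board = {(0, 0): "origin", (2, 3): "treasure"}

print(board[(2, 3)])             # treasure
print(is_hashable((1, 2)))       # True
print(is_hashable([1, 2]))       # False: lists are mutable
print(is_hashable((1, [2, 3])))  # False: this tuple contains a list
```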

Signal: whether you think about mutability as a design choice, not just a property.

Trivia tax: partial. The definition is rote, but the "when would you use each" turns it into a real question.

A list comprehension builds a list in a single expression, like [x * 2 for x in nums if x > 0]. The strong answer includes the "not": skip comprehensions when the logic needs multiple statements or side effects, and skip building a full list when a generator expression would stream the values lazily.
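A small sketch of the comprehension next to its lazy cousin, the generator expression. The sample data is made up; the size comparison just shows that a generator object doesn't grow with the number of values it will produce:

```python
import sys

nums = [3, -1, 4, -1, 5, -9, 2, 6]

# List comprehension: materializes every result in a new list.
doubled = [x * 2 for x in nums if x > 0]
print(doubled)  # [6, 8, 10, 4, 12]

# Generator expression: same syntax in parentheses, values produced on demand.
total = sum(x * 2 for x in nums if x > 0)
print(total)  # 40

# The generator object stays tiny no matter how many values it will yield.
big_list = [n * n for n in range(1_000_000)]
big_gen = (n * n for n in range(1_000_000))
print(sys.getsizeof(big_list) > sys.getsizeof(big_gen))  # True
```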

Signal: idiomatic style plus judgment about memory. A candidate who knows comprehensions but never knows when to stop will write unreadable nested ones.

What do *args and **kwargs do? *args collects extra positional arguments into a tuple, and **kwargs collects extra keyword arguments into a dict. You use them to write functions that forward arguments or accept a flexible signature.
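A minimal sketch of both uses, collecting and forwarding. The function names describe and call_with are hypothetical:

```python
def describe(*args, **kwargs):
    # args arrives as a tuple, kwargs as a dict.
    return args, kwargs


def call_with(fn, *args, **kwargs):
    """Forward whatever arguments we receive straight through to fn."""
    return fn(*args, **kwargs)


print(describe(1, 2, mode="fast"))                 # ((1, 2), {'mode': 'fast'})
print(call_with(sorted, [3, 1, 2], reverse=True))  # [3, 2, 1]
```

The forwarding pattern in call_with is the same one a general-purpose decorator's wrapper relies on.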

Signal: low on its own. It matters with the follow-up: write a decorator that works on any function, which forces real use of both.

Trivia tax: yes, in isolation. Know it cold, spend no real time on it.

A decorator is a function that takes a function and returns a new function, used to wrap behavior like timing, logging, or caching without touching the original. A clean answer uses functools.wraps to preserve the wrapped function's name and docstring.

import functools
import time

def timed(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        print(f"{fn.__name__} took {time.perf_counter() - start:.4f}s")
        return result
    return wrapper

Signal: high. Decorators sit at the intersection of closures, first-class functions, and *args/**kwargs. A candidate who writes one cleanly understands a lot of Python at once. The functools.wraps detail separates people who've shipped decorators from people who've only read about them.
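To make the functools.wraps point concrete, here is a sketch with two do-nothing decorators (hypothetical names no_wraps and with_wraps) that differ only in whether they use it:

```python
import functools


def no_wraps(fn):
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper


def with_wraps(fn):
    @functools.wraps(fn)  # copies __name__, __doc__, and more onto wrapper
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper


@no_wraps
def area(w, h):
    """Return the area of a rectangle."""
    return w * h


@with_wraps
def perimeter(w, h):
    """Return the perimeter of a rectangle."""
    return 2 * (w + h)


print(area.__name__, area.__doc__)            # wrapper None
print(perimeter.__name__, perimeter.__doc__)  # perimeter Return the perimeter of a rectangle.
print(area(2, 3), perimeter(2, 3))            # 6 10
```

Without wraps, tracebacks, logs, and help() all report the anonymous wrapper instead of the real function.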

What's the difference between is and ==? == compares values; is compares identity (whether two names point to the same object). The trap is small-integer and string interning: CPython caches small integers, so a is b can be True for two 256s but False for two 257s.
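The reference model underneath, without any interning edge cases, in a few lines:

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)  # True: equal contents
print(a is b)  # False: two separate list objects
print(a is c)  # True: c is just another name for a's object

c.append(4)    # mutating through one name is visible through the other
print(a)       # [1, 2, 3, 4]
```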

Signal: whether you understand that variables are references to objects. The interning trivia is a distraction.

Trivia tax: the interning edge case is pure trivia. The reference-model understanding underneath it is not.

A default argument value is evaluated once, at function definition, so def f(x, acc=[]) shares one list across all calls. The fix is acc=None, then if acc is None: acc = [] inside the body.
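A sketch of the bug and the fix side by side (function names are illustrative). Note the fix checks is None rather than truthiness, so an empty list the caller passes in on purpose is still used:

```python
def append_buggy(x, acc=[]):
    # The default list is created once, when `def` runs, and reused forever.
    acc.append(x)
    return acc


def append_fixed(x, acc=None):
    # A fresh list per call unless the caller supplies one.
    if acc is None:
        acc = []
    acc.append(x)
    return acc


first = append_buggy(1)
second = append_buggy(2)
print(second, first is second)  # [1, 2] True

print(append_fixed(1), append_fixed(2))  # [1] [2]
```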

Signal: high, and underrated. This is a real bug that ships to production. A candidate who's hit it has written enough Python to have been burned, which is exactly the experience interviewers are probing for.

A generator produces values lazily with yield, holding only the current value in memory instead of building the whole sequence. You use them to process large or infinite streams without loading everything into memory at once.
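A minimal sketch of an infinite generator. itertools.islice pulls only the values you ask for, so the "infinite" sequence is never built:

```python
import itertools
from typing import Iterator


def squares() -> Iterator[int]:
    """Infinite generator: each square is computed only when requested."""
    n = 0
    while True:
        yield n * n
        n += 1


gen = squares()
print(next(gen), next(gen), next(gen))  # 0 1 4

print(list(itertools.islice(squares(), 5)))  # [0, 1, 4, 9, 16]
```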

Signal: memory awareness and a grasp of laziness. Candidates who reach for generators on a "process this 10GB file" question are showing real instinct.

In CPython's default build, the Global Interpreter Lock means only one thread executes Python bytecode at a time, so threads don't speed up CPU-bound work. For I/O-bound work threads still help (the GIL is released during I/O waits), and for CPU-bound parallelism you use multiprocessing or native extensions.
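A sketch of the I/O-bound case, assuming time.sleep is a fair stand-in for a blocking network or disk wait (like real blocking I/O, it releases the GIL). Five 0.2-second waits on a thread pool should finish in roughly 0.2 seconds rather than 1 second; exact timings vary by machine:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay: float) -> float:
    # Stand-in for a blocking I/O call; the GIL is released while sleeping.
    time.sleep(delay)
    return delay


def run_threaded(delays: list[float]) -> tuple[list[float], float]:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(fake_io, delays))
    return results, time.perf_counter() - start


results, elapsed = run_threaded([0.2] * 5)
print(results, f"{elapsed:.2f}s")  # about 0.2s, not the ~1.0s a serial loop takes
```

Swap fake_io for pure-Python arithmetic and the speedup largely disappears on the default build, which is the cue to reach for multiprocessing.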

Signal: high for backend roles. The follow-up that matters is "so when would you use threads at all," which separates people who memorized "GIL bad" from people who understand the I/O-bound case.

What's the difference between @staticmethod, @classmethod, and an instance method? Instance methods take self; class methods take cls and can construct or configure the class; static methods take neither and are just namespaced functions. Class methods are the idiomatic way to write alternative constructors.
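A sketch of all three method kinds on one class. The Event class and its "name|YYYY-MM-DD" string format are invented for illustration; the subclass at the end shows why cls beats a hardcoded class name in an alternative constructor:

```python
from datetime import date


class Event:
    def __init__(self, name: str, day: date) -> None:
        self.name = name
        self.day = day

    def describe(self) -> str:
        # Instance method: works on one object via self.
        return f"{self.name} on {self.day.isoformat()}"

    @classmethod
    def from_string(cls, text: str) -> "Event":
        # Alternative constructor; "name|YYYY-MM-DD" is a made-up format.
        name, iso_day = text.split("|")
        return cls(name, date.fromisoformat(iso_day))

    @staticmethod
    def is_valid_name(name: str) -> bool:
        # Static method: no self or cls, just a function namespaced on the class.
        return bool(name.strip())


class Meetup(Event):
    pass


event = Event.from_string("launch|2026-03-01")
print(event.describe())  # launch on 2026-03-01

meetup = Meetup.from_string("pycon|2026-05-14")
print(type(meetup).__name__)     # Meetup
print(Event.is_valid_name("  "))  # False
```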

Signal: moderate. The alternative-constructor use of classmethod is the part that shows real OOP fluency.

Trivia tax: partial. The definitions are rote, the "when would you use a classmethod" is not.

What does if __name__ == "__main__": do? It guards code that should run only when the file is executed directly, not when it's imported as a module. Without it, your script's side effects fire on import.

Signal: low. It's a useful idiom but knowing it predicts almost nothing about engineering ability.

Trivia tax: yes. One of the most over-asked Python questions. Know the one-sentence answer, move on.

A shallow copy duplicates the outer object but shares references to nested objects, so mutating a nested list shows up in both copies. copy.deepcopy recursively duplicates everything.
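A short sketch using the standard copy module on a made-up config dict:

```python
import copy

original = {"name": "config", "tags": ["a", "b"]}
shallow = copy.copy(original)
deep = copy.deepcopy(original)

original["tags"].append("c")  # mutate the nested list through the original

print(shallow["tags"])  # ['a', 'b', 'c']: the nested list is shared
print(deep["tags"])     # ['a', 'b']: the nested list was duplicated
print(shallow is original, shallow["tags"] is original["tags"])  # False True
```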

Signal: moderate. Connects to the reference model and to a real bug class. Candidates who've debugged a shared-nested-object bug answer this with conviction.

Here the language matters less than the reasoning, but Python-specific tools (dicts, sets, collections, heapq, slicing) are exactly what interviewers want to see you reach for. Solving with the right standard-library tool is itself a signal.

To flatten an arbitrarily nested list, recurse: if an item is a list, recurse into it; otherwise add it to the output. Use isinstance(item, list) rather than type(item) == list so subclasses work too.
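One way to write it, as a sketch. TaggedList is a throwaway subclass that exists only to show the isinstance point:

```python
def flatten(items: list) -> list:
    """Recursively flatten arbitrarily nested lists into one flat list."""
    flat = []
    for item in items:
        if isinstance(item, list):  # also matches list subclasses
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


class TaggedList(list):
    """A list subclass, to show why isinstance beats type(...) == list."""


print(flatten([1, [2, [3, 4]], [], 5]))    # [1, 2, 3, 4, 5]
print(flatten([TaggedList([6, [7]]), 8]))  # [6, 7, 8]
```

For extremely deep nesting this can hit Python's recursion limit, at which point an explicit stack is the usual follow-up answer.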

Signal: clean recursion and the isinstance detail. The detail is small, but it's the kind of correctness instinct that shows up everywhere in real code.

To check whether a string's characters can be rearranged into a palindrome, count character frequencies: a palindrome allows at most one character with an odd count. collections.Counter plus a single pass over the counts does it.
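The Counter version is only a few lines (the function name is ours):

```python
from collections import Counter


def can_form_palindrome(s: str) -> bool:
    """True if the characters of s can be rearranged into a palindrome."""
    odd_counts = sum(1 for count in Counter(s).values() if count % 2)
    return odd_counts <= 1


print(can_form_palindrome("carrace"))  # True ("racecar")
print(can_form_palindrome("abc"))      # False
```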

Signal: whether you reach for Counter instead of building a frequency dict by hand. Reinventing Counter isn't a sin, but it tells the interviewer you don't know the standard library well.

For minimum window substring (the shortest window of s that contains every character of t), use a sliding window with a "missing" counter: expand the right edge until the window is valid, then shrink from the left while tracking the best window seen. This is a hard question, and interviewers know it.
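A sketch of the "missing"-counter approach, one common way to write it rather than the only one. need tracks how many more of each character the window still owes (negative means surplus), and missing counts required characters not yet covered:

```python
from collections import Counter


def min_window(s: str, t: str) -> str:
    """Shortest substring of s containing every character of t (with multiplicity)."""
    if not t:
        return ""
    need = Counter(t)
    missing = len(t)
    best_start, best_end = 0, 0  # empty slice means no valid window found yet
    left = 0
    for right, ch in enumerate(s, start=1):  # current window is s[left:right]
        if need[ch] > 0:
            missing -= 1
        need[ch] -= 1
        if missing == 0:
            # Shrink from the left while the leftmost character is surplus.
            while need[s[left]] < 0:
                need[s[left]] += 1
                left += 1
            if best_end == 0 or right - left < best_end - best_start:
                best_start, best_end = left, right
            # Give up one required character so the search continues for a shorter window.
            need[s[left]] += 1
            missing += 1
            left += 1
    return s[best_start:best_end]


print(min_window("ADOBECODEBANC", "ABC"))  # BANC
```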

Signal: high. Sliding window is one of the highest-value patterns to internalize because it transfers across dozens of problems. Getting the shrink condition right under pressure is a strong signal.

For an LRU cache, the Python shortcut is collections.OrderedDict with move_to_end on access and popitem(last=False) on eviction. The deeper answer is a hash map plus a doubly linked list, which is what you'd write if asked to do it without the standard library.
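A minimal sketch of the OrderedDict version. The get/put method names follow the usual interview phrasing and are our choice:

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # touching "a" makes "b" the least recently used
cache.put("c", 3)   # over capacity: "b" is evicted
print(cache.get("b"), cache.get("a"), cache.get("c"))  # None 1 3
```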

Signal: high, and it's a great question precisely because it has two valid altitudes. Knowing the OrderedDict trick shows Python fluency; being able to drop to the linked-list version shows you understand why it's O(1).

For topological sort, use Kahn's algorithm: compute in-degrees, start from the nodes with in-degree zero, and as you remove each node, decrement its neighbors' in-degrees, queueing any that drop to zero. If you can't process every node, there's a cycle.
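A sketch of Kahn's algorithm, assuming the graph maps each node to the nodes that must come after it. The build-step graph is a made-up example:

```python
from collections import deque


def topo_sort(graph: dict[str, list[str]]) -> list[str]:
    """Kahn's algorithm. graph maps each node to the nodes that depend on it.

    Raises ValueError if the graph contains a cycle.
    """
    indegree = {node: 0 for node in graph}
    for neighbors in graph.values():
        for nb in neighbors:
            indegree[nb] = indegree.get(nb, 0) + 1

    queue = deque(node for node, deg in indegree.items() if deg == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nb in graph.get(node, []):
            indegree[nb] -= 1
            if indegree[nb] == 0:  # all of nb's prerequisites are done
                queue.append(nb)

    if len(order) != len(indegree):  # some nodes never reached in-degree zero
        raise ValueError("graph has a cycle")
    return order


build_steps = {
    "fetch": ["build"],
    "build": ["test", "package"],
    "test": ["deploy"],
    "package": ["deploy"],
}
print(topo_sort(build_steps))  # ['fetch', 'build', 'test', 'package', 'deploy']
```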

Signal: graph reasoning and the cycle-detection insight. Comes up more than people expect because dependency ordering is a real problem (build systems, task schedulers).

To count islands in a grid, scan it; on each unvisited land cell, increment the count and flood-fill (DFS or BFS) to mark the whole connected region. Mutating visited cells in place avoids a separate visited structure.
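A sketch assuming a grid of 1s (land) and 0s (water) with 4-directional connectivity. It uses an iterative DFS with an explicit stack, one choice among several, which sidesteps Python's recursion limit on large islands:

```python
def count_islands(grid: list[list[int]]) -> int:
    """Count 4-connected regions of 1s. Marks visited land as 0, mutating grid."""
    if not grid:
        return 0
    rows, cols = len(grid), len(grid[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1:
                continue
            count += 1
            grid[r][c] = 0    # mark visited in place
            stack = [(r, c)]  # iterative DFS over this island
            while stack:
                cr, cc = stack.pop()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
                        grid[nr][nc] = 0
                        stack.append((nr, nc))
    return count


ocean = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
]
print(count_islands(ocean))  # 3
```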

Signal: grid traversal and DFS, both extremely common. The in-place-visited trick is a small efficiency signal.

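To remove duplicates while preserving order: list(dict.fromkeys(items)). Dicts preserve insertion order (guaranteed since Python 3.7), so this is both correct and idiomatic, as long as the items are hashable. The naive answer rebuilds the list with a seen-set, which works but is more code.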
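Both versions side by side on made-up data (dedupe_seen is a hypothetical name). Note that list(set(items)) would also remove duplicates but makes no ordering promise:

```python
items = ["b", "a", "b", "c", "a"]

# dict keys are unique and keep first-insertion order.
print(list(dict.fromkeys(items)))  # ['b', 'a', 'c']


def dedupe_seen(seq):
    """The longer seen-set version: same result, more code."""
    seen = set()
    out = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


print(dedupe_seen(items))  # ['b', 'a', 'c']
```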

Signal: idiomatic Python. The dict.fromkeys answer reliably surprises interviewers in a good way.

Trivia tax: partial. Knowing the one-liner is a bit of a party trick, but the underlying insight (dicts are ordered, sets aren't) is real.

List append is amortized O(1) and indexing is O(1), list membership (x in list) is O(n), and dict and set lookups are O(1) on average. The trap is x in some_list inside a loop, which quietly makes an algorithm O(n^2).
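A sketch of the trap and the one-line fix. The function names are illustrative, and the timings printed by timeit will vary by machine, so no expected numbers are given:

```python
import timeit


def common_slow(a: list, b: list) -> list:
    # `x in b` scans list b every time: O(len(a) * len(b)) overall.
    return [x for x in a if x in b]


def common_fast(a: list, b: list) -> list:
    # Build a set once (O(len(b))); each lookup is then O(1) on average.
    b_set = set(b)
    return [x for x in a if x in b_set]


a = list(range(2_000))
b = list(range(1_000, 3_000))
print(timeit.timeit(lambda: common_slow(a, b), number=1))
print(timeit.timeit(lambda: common_fast(a, b), number=1))
```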

Signal: high. This is the single most practically useful complexity knowledge, because the list-membership trap shows up in real code constantly. Candidates who instinctively switch a list to a set for membership checks are showing exactly the right reflex.

To track the kth largest element in a stream, maintain a min-heap of size k with heapq: push each number, and pop the smallest whenever the heap exceeds k. The root of the heap (heap[0]) is your kth largest.

Signal: whether you know heapq exists and when a heap beats sorting. Re-sorting the stream for every query costs O(n log n) each time; the heap costs O(log k) per new number (O(n log k) overall) and answers each query in O(1), which matters at scale.
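A minimal sketch of the size-k min-heap; the KthLargest class name and the choice to return None until k values have arrived are our assumptions:

```python
import heapq
from typing import Optional


class KthLargest:
    """Track the kth largest value seen so far in a stream."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.heap: list[int] = []  # min-heap holding the k largest values so far

    def add(self, value: int) -> Optional[int]:
        heapq.heappush(self.heap, value)
        if len(self.heap) > self.k:
            heapq.heappop(self.heap)  # drop the smallest; it can't be the kth largest
        # The root of a size-k min-heap is the kth largest overall.
        return self.heap[0] if len(self.heap) == self.k else None


tracker = KthLargest(3)
answers = [tracker.add(v) for v in [4, 5, 8, 2, 10]]
print(answers)  # [None, None, 4, 4, 5]
```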

For data and backend roles, library fluency often matters more than raw algorithms. These questions test whether you've used the tools, not just read about them.

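What's the difference between loc and iloc? loc selects by label; iloc selects by integer position. The bug they're probing for is chained indexing like df[df.a > 0]['b'] = 1, which can silently fail to change df because the assignment may land on a temporary copy; the fix is a single loc call, df.loc[df.a > 0, 'b'] = 1.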
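A small sketch on a made-up DataFrame (it assumes pandas is installed). The single .loc assignment at the end is the form that reliably writes to df itself:

```python
import pandas as pd

df = pd.DataFrame({"a": [3, -1, 5], "b": [0, 0, 0]}, index=["x", "y", "z"])

print(df.loc["y", "a"])  # -1: row label "y", column label "a"
print(df.iloc[1, 0])     # -1: second row, first column, by position

# One .loc call: boolean row mask plus column label, assigned in place.
df.loc[df["a"] > 0, "b"] = 1
print(df["b"].tolist())  # [1, 0, 1]
```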

Signal: real Pandas mileage. Anyone who's used Pandas seriously has been bitten by the SettingWithCopyWarning, and mentioning it unprompted is a strong tell.

Vectorized NumPy and Pandas operations run in compiled C over contiguous memory, avoiding Python's per-element interpreter overhead and object boxing. A loop over a DataFrame row by row can be hundreds of times slower than the vectorized equivalent.

Signal: high for data roles. The follow-up is usually "so how would you avoid iterating this DataFrame," and the strong answer reaches for vectorization, with .apply only as a last resort.
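A sketch of the "avoid iterating this DataFrame" rewrite on toy data (pandas assumed installed). Both produce the same numbers; the difference only shows up in runtime on large frames:

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 3, 2]})

# Row by row: one interpreted Python step, plus tuple creation, per row.
loop_totals = [row.price * row.qty for row in df.itertuples(index=False)]

# Vectorized: one column-wise multiply executed in compiled code.
df["total"] = df["price"] * df["qty"]

print(df["total"].tolist())  # [10.0, 60.0, 60.0]
```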

When should you use apply versus a vectorized operation in Pandas? Prefer vectorized operations whenever they exist; apply runs a Python function per element, per row, or per group and loses the C-speed advantage. Reach for apply only when the logic genuinely can't be vectorized.
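A sketch of a typical "reached for apply first" case and its vectorized rewrite with numpy.where. The pass/fail threshold of 60 is arbitrary:

```python
import numpy as np
import pandas as pd

scores = pd.Series([55, 72, 91, 38])

# apply: calls the Python lambda once per element.
labels_apply = scores.apply(lambda s: "pass" if s >= 60 else "fail")

# Vectorized: one comparison over the whole column, then np.where picks labels.
labels_vec = pd.Series(np.where(scores >= 60, "pass", "fail"), index=scores.index)

print(labels_apply.tolist())  # ['fail', 'pass', 'pass', 'fail']
```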

Signal: whether you treat apply as a convenience or a performance cliff. Candidates who reach for apply first are usually newer to Pandas.

For missing data, dropna removes it, fillna replaces it, and the real answer is "it depends on why it's missing." The strong candidate asks whether the data is missing at random before choosing, because filling with a mean can distort a model.
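The mechanics, on a made-up frame. Adding an indicator column before filling is one common way to keep the "why is it missing" signal available to a model; it is an option to discuss, not the single right answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [34, np.nan, 29, np.nan],
    "income": [50.0, 62.0, np.nan, 48.0],
})

# First look: share of missing values per column.
print(df.isna().mean())  # age 0.50, income 0.25

# Option 1: drop rows with any missing value (3 of these 4 rows disappear).
dropped = df.dropna()

# Option 2: fill each column with its mean, which can mask *why* values are missing.
filled = df.fillna(df.mean())

# Option 3: record missingness as a feature, then fill.
flagged = df.assign(age_missing=df["age"].isna()).fillna({"age": df["age"].mean()})
```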

Signal: high for data science. This is where statistical thinking shows through a Pandas question. The mechanical answer is easy; the judgment is the signal.

With broadcasting, NumPy stretches arrays of compatible shapes so element-wise operations work without copying data: it compares dimensions from the right and treats size-1 (or missing) dimensions as stretchable. Adding a shape (3,1) array to a shape (1,4) array yields a (3,4) result.
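The (3,1) plus (1,4) example in code, plus a pair of shapes that can't broadcast (NumPy assumed installed):

```python
import numpy as np

col = np.arange(3).reshape(3, 1)  # shape (3, 1): [[0], [1], [2]]
row = np.arange(4).reshape(1, 4)  # shape (1, 4): [[0, 1, 2, 3]]

grid = col * 10 + row  # both size-1 axes stretch; result has shape (3, 4)
print(grid.shape)      # (3, 4)
print(grid.tolist())   # [[0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23]]

# Trailing dimensions 3 and 4 differ and neither is 1, so this raises.
error = None
try:
    np.ones(3) + np.ones(4)
except ValueError as exc:
    error = str(exc)
print(error)
```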

Signal: real NumPy fluency. Broadcasting is the thing people either understand or fake, and a clean shape example is hard to fake.

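Explain async/await and when it helps. async/await lets a single thread handle many I/O-bound tasks by suspending one while it waits and running another. It helps for network calls and database queries made through async-aware libraries; ordinary blocking calls, including plain file reads, still stall the event loop unless you push them to a thread. It does nothing for CPU-bound work.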
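A sketch with asyncio.sleep standing in for awaiting a network response (an assumption: real code would await an async HTTP or database client here). Five 0.2-second waits run concurrently on one thread and should take about 0.2 seconds in total:

```python
import asyncio
import time


async def fake_request(delay: float) -> float:
    # asyncio.sleep stands in for awaiting a network response.
    await asyncio.sleep(delay)
    return delay


async def main() -> list[float]:
    # gather runs the five coroutines concurrently on a single thread.
    return await asyncio.gather(*(fake_request(0.2) for _ in range(5)))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")  # about 0.2s total, not ~1.0s
```

Replace the await with a pure-Python loop and the total time goes back to serial, which is the "would async speed up a heavy computation" answer in miniature.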

Signal: high for backend roles. The discriminating follow-up is "would async speed up a heavy computation," and the right answer is no, because it doesn't add parallelism.

What does the requests library do, and how do you handle a failed request? It's the de facto standard third-party HTTP client. The mature answer covers response.raise_for_status(), timeouts (always set one; requests doesn't apply a default), and retry logic with backoff for transient failures.
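A sketch of the mature answer. with_retries and fetch_json are hypothetical helpers, the backoff schedule is an arbitrary choice, and requests is imported inside fetch_json so the retry logic can be exercised here with a simulated flaky call instead of a real network request:

```python
import time


def with_retries(call, attempts: int = 3, base_delay: float = 0.5, retry_on=(Exception,)):
    """Call `call()`, retrying listed transient errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...


def fetch_json(url: str):
    import requests  # third-party: pip install requests

    response = requests.get(url, timeout=5)  # seconds; without it a call can hang
    response.raise_for_status()  # turn 4xx/5xx responses into requests.HTTPError
    return response.json()


# Real use might look like:
#   with_retries(lambda: fetch_json(url),
#                retry_on=(requests.ConnectionError, requests.Timeout))

# Simulated flaky call: fails twice, then succeeds.
attempts_seen = []


def flaky_call():
    attempts_seen.append(1)
    if len(attempts_seen) < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


result = with_retries(flaky_call, base_delay=0.01)
print(result, len(attempts_seen))  # ok 3
```

Retrying only connection errors and timeouts, not every HTTP error, avoids hammering a server that is correctly rejecting a bad request.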

Signal: production instinct. Junior answers stop at requests.get(url).json(). Senior answers mention the timeout unprompted, because they've had a request hang forever in production.

To write a test with pytest, define a function named test_* with a plain assert. Use fixtures for shared setup, @pytest.mark.parametrize to run one test over many inputs, and monkeypatch or mocking to isolate external calls.
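A sketch of a small test module using all three tools. slugify and current_user are hypothetical functions under test, and APP_USER is a made-up environment variable; run the file with the pytest command:

```python
import os

import pytest


def slugify(title: str) -> str:
    """Code under test (hypothetical): lowercase and hyphenate words."""
    return "-".join(title.lower().split())


def current_user() -> str:
    """Code under test (hypothetical): reads an environment variable."""
    return os.environ.get("APP_USER", "anonymous")


@pytest.fixture
def titles() -> list[str]:
    # Shared setup: any test that names `titles` as a parameter receives this list.
    return ["Hello World", "  Python  Tips "]


@pytest.mark.parametrize(
    ("raw", "expected"),
    [("Hello World", "hello-world"), ("  Python  Tips ", "python-tips"), ("single", "single")],
)
def test_slugify(raw: str, expected: str) -> None:
    assert slugify(raw) == expected


def test_slugs_have_no_spaces(titles: list[str]) -> None:
    assert all(" " not in slugify(t) for t in titles)


def test_current_user(monkeypatch: pytest.MonkeyPatch) -> None:
    # monkeypatch undoes the environment change automatically after the test.
    monkeypatch.setenv("APP_USER", "ada")
    assert current_user() == "ada"
```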

Signal: whether testing is a habit or an afterthought. Mentioning parametrize and fixtures unprompted signals someone who actually writes tests, not someone who's heard tests are good.

A context manager is an object that defines __enter__ and __exit__, used with a with statement to guarantee cleanup (closing files, releasing locks) even if an exception is raised. You can also write one with contextlib.contextmanager and a generator.
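Both styles in one sketch. Tracked and tracked are illustrative names that just log entry and exit, so you can see that cleanup runs, in reverse order, even when the body raises:

```python
from contextlib import contextmanager


class Tracked:
    """Class-based context manager that records when it is entered and exited."""

    def __init__(self, log: list[str], name: str) -> None:
        self.log, self.name = log, name

    def __enter__(self) -> "Tracked":
        self.log.append(f"enter {self.name}")
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.log.append(f"exit {self.name}")
        return False  # don't swallow the exception


@contextmanager
def tracked(log: list[str], name: str):
    """Generator-based equivalent: code after `yield` is the cleanup."""
    log.append(f"enter {name}")
    try:
        yield name
    finally:
        log.append(f"exit {name}")  # runs even if the with-body raises


log: list[str] = []
try:
    with Tracked(log, "lock"):
        with tracked(log, "file"):
            raise RuntimeError("boom")
except RuntimeError:
    pass

print(log)  # ['enter lock', 'enter file', 'exit file', 'exit lock']
```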

Signal: moderate to high. Knowing with open(...) is table stakes; being able to write your own context manager shows real depth.

To process a file too large for memory, iterate over the file object line by line (for line in f), which streams lines instead of loading everything at once, or read it in fixed-size chunks. For structured data, Pandas read_csv with chunksize gives you an iterator of DataFrames.
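A sketch of both streaming styles, using a tiny temporary log file as a stand-in for a large one. The function names and the "ERROR" filter are illustrative:

```python
import os
import tempfile
from typing import Iterator


def count_errors(path: str) -> int:
    """Count lines containing 'ERROR' while holding one line in memory at a time."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:  # the file object yields lines lazily
            if "ERROR" in line:
                count += 1
    return count


def iter_chunks(path: str, size: int = 1 << 20) -> Iterator[bytes]:
    """Yield fixed-size binary chunks (1 MiB by default)."""
    with open(path, "rb") as f:
        while chunk := f.read(size):
            yield chunk


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "app.log")
    with open(path, "w", encoding="utf-8") as f:
        f.write("INFO start\nERROR disk full\nINFO retry\nERROR timeout\n")
    errors = count_errors(path)
    total_bytes = sum(len(c) for c in iter_chunks(path, size=8))
    size_on_disk = os.path.getsize(path)

print(errors, total_bytes == size_on_disk)  # 2 True
```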

Signal: high. Memory-aware file handling is a real-world skill that pure algorithm questions miss entirely.

This is the section the competitor lists skip, and it's the one that predicts the job best. On the job you read and fix far more code than you write from scratch. Good interviewers know it, so they show you broken code and watch how you reason.

Handed slow code, the strong move is to profile before guessing: cProfile, or even a few time.perf_counter() calls, to find the actual hot spot. The most common real culprit is an O(n) membership test (x in list) inside a loop, fixable by switching to a set.
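A sketch of profiling with the standard library's cProfile and pstats. slow_dedupe is a deliberately slow example; because list membership checks aren't separate Python function calls, their cost shows up as slow_dedupe's own time in the report:

```python
import cProfile
import io
import pstats


def slow_dedupe(items: list) -> list:
    out = []
    for x in items:
        if x not in out:  # O(n) scan of a list, inside a loop
            out.append(x)
    return out


profiler = cProfile.Profile()
profiler.enable()
deduped = slow_dedupe(list(range(2_000)) * 2)
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```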

Signal: very high. Profiling before optimizing is the clearest separator between engineers who've worked on real performance problems and those who guess. Candidates who immediately start rewriting without measuring are showing you how they'd behave on the job.

Your code raises a KeyError intermittently. How do you debug it? Reproduce it, read the traceback to the exact line, then reason about why the key is sometimes absent (a race, a missing default, an assumption about input). Tools: dict.get with a default, collections.defaultdict, or a guard, depending on the cause.
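The three tools in a quick sketch on made-up data. Which one is right depends on the diagnosis: a default is fine for genuinely optional keys, but it can also hide a real bug upstream:

```python
from collections import defaultdict

config = {"host": "localhost"}
port = config.get("port", 8080)  # missing key -> default, no KeyError
print(port)  # 8080

groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)  # first access to a key creates an empty list
print(dict(groups))  # {'a': ['apple', 'avocado'], 'b': ['banana']}

# A guard makes the "sometimes absent" case explicit instead of hiding it.
if "timeout" not in config:
    print("timeout not configured; using fallback")
```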

Signal: high. Reading a traceback calmly and working from the bottom line up is a learnable skill that many candidates visibly lack. Watching someone debug is more informative than watching them code.

def add_item(item, items=[]):
    items.append(item)
    return items

The mutable default argument is shared across calls, so the list accumulates across every call that doesn't pass its own list. Fix with items=None and items = items if items is not None else [].

Signal: high. This is the mutable-default bug in disguise, and recognizing it on sight tells the interviewer you've been bitten before, which means real experience.

result = [x for row in matrix for x in row if x > 0]

It flattens a 2D matrix and keeps positive values. The two for clauses read left to right like nested loops, which trips up people who only ever write single-level comprehensions.
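The same logic unrolled into explicit loops, on a small made-up matrix. The for clauses appear in the same order in both forms:

```python
matrix = [[1, -2, 3], [-4, 5, 6]]

result = [x for row in matrix for x in row if x > 0]

# Equivalent nested loops: outer clause first, inner clause second.
expected = []
for row in matrix:
    for x in row:
        if x > 0:
            expected.append(x)

print(result)  # [1, 3, 5, 6]
```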

Signal: code-reading fluency. Being able to parse dense Python you didn't write is a daily-work skill that whiteboard questions never touch.

The usual causes of flaky tests are order-dependence between tests, shared mutable state, hardcoded paths, timezone or locale assumptions, and unpinned dependencies. The meta-signal is whether the candidate has a systematic checklist or just shrugs.

Signal: high for anyone past junior. Flaky-test debugging is a real and frustrating part of the job, and having a mental checklist is exactly the experience interviewers want.

Interviewers increasingly hand you working code and ask you to trace it, specifically because tracing is hard to fake with AI. The strong answer narrates state changes step by step and flags any line that would surprise a reader.

Signal: high, and rising. Given that 81 percent of interviewers in interviewing.io's survey suspect AI-assisted cheating, expect more code-reading and fewer blank-page prompts. The skill being tested is genuine comprehension.

For data science and ML roles, Python is the medium and the real questions are about statistics, modeling, and judgment. The interviewer wants to know you can turn a vague problem into clean code and defensible reasoning.

High bias means the model is too simple and underfits; high variance means it's too complex and overfits to noise. The tradeoff is choosing model complexity so test error is minimized, often with regularization to pull a complex model back.

Signal: foundational. Nearly every DS loop asks some version of this. A strong answer connects it to a concrete decision (why you'd add regularization), not just the textbook definition.

Trivia tax: partial. The definition is rote, but the "how would you act on it" is real.

L1 (Lasso) adds a penalty on the sum of the coefficients' absolute values to the loss, which drives some of them to exactly zero and so performs feature selection. L2 (Ridge) adds a penalty on the sum of squared coefficients, which shrinks all of them smoothly without zeroing them out.
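A simplified one-coefficient sketch shows why only L1 produces exact zeros. Assume the unpenalized estimate is z and we minimize ½(w − z)² plus a penalty. With λ|w| (L1) the minimizer is soft-thresholding, which snaps any |z| ≤ λ to exactly zero; with (λ/2)w² (L2) it is z / (1 + λ), which shrinks but never reaches zero for z ≠ 0. This is the single-feature (or orthonormal-features) case, not a general Lasso solver:

```python
import math


def lasso_1d(z: float, lam: float) -> float:
    """argmin over w of 0.5*(w - z)**2 + lam*|w|: soft-thresholding."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def ridge_1d(z: float, lam: float) -> float:
    """argmin over w of 0.5*(w - z)**2 + 0.5*lam*w**2: uniform shrinkage."""
    return z / (1.0 + lam)


for z in (2.0, 0.3):
    print(z, lasso_1d(z, 0.5), ridge_1d(z, 0.5))
# 2.0: lasso 1.5, ridge ~1.33 (both shrink)
# 0.3: lasso 0.0 exactly, ridge ~0.2 (only L1 zeroes it out)
```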

Signal: whether you understand the geometric reason L1 produces sparsity, not just that it does. The follow-up "why does L1 zero things out and L2 doesn't" separates memorizers from understanders.

For imbalanced classes, the tools are resampling (oversampling the minority, undersampling the majority, or SMOTE), class weights in the model, and crucially the right metric: accuracy is useless on a 99/1 split, where always predicting the majority class scores 99 percent, so use precision, recall, F1, or AUC. The best answer starts with "what's the business cost of each error type?"
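The accuracy trap in a few lines, on synthetic labels with 1 percent positives and a "model" that always predicts the majority class:

```python
y_true = [1] * 10 + [0] * 990  # 1% positives
y_pred = [0] * 1000            # always predict the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / sum(y_true)

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")  # accuracy=0.99 recall=0.00
```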

Signal: high. This question rewards judgment over recipe. Candidates who jump straight to SMOTE without asking about the cost of false negatives are missing the point.

Precision is the fraction of positive predictions that are correct; recall is the fraction of actual positives you caught. Optimize precision when false positives are costly (spam filtering), recall when false negatives are costly (cancer screening).
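A sketch of how the classification threshold trades one metric for the other, on made-up scores and labels. Returning 0.0 when there are no positive predictions is our convention, not a universal one:

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    preds = [s >= threshold for s in scores]
    tp = sum(p and y == 1 for p, y in zip(preds, labels))
    fp = sum(p and y == 0 for p, y in zip(preds, labels))
    fn = sum((not p) and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

p_high, r_high = precision_recall(labels, scores, 0.75)  # strict: favors precision
p_low, r_low = precision_recall(labels, scores, 0.25)    # lenient: favors recall
print(p_high, r_high)  # 1.0 0.5
print(round(p_low, 3), r_low)  # 0.667 1.0
```

A spam filter would sit near the strict threshold; a cancer screen would sit near the lenient one.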

Signal: high. The concrete examples are what matter. A candidate who can map precision and recall onto a real decision understands the metrics; one who only recites the formulas usually doesn't.

Gradient descent walks the parameters downhill along the loss gradient, scaled by a learning rate. Batch uses the whole dataset per step (stable, slow), stochastic uses one example (noisy, fast), mini-batch splits the difference and is the standard in practice.
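A toy sketch of learning-rate sensitivity: full-batch gradient descent on the one-parameter loss (w − 3)², whose gradient is 2(w − 3). Mini-batch training follows the same update rule but estimates the gradient from a subset of the data:

```python
def gradient_descent(lr: float, steps: int = 50, w: float = 0.0) -> float:
    """Minimize the toy loss (w - 3)**2 starting from w; returns the final w."""
    for _ in range(steps):
        grad = 2 * (w - 3)
        w -= lr * grad
    return w


print(gradient_descent(0.1))   # very close to 3.0: converges
print(gradient_descent(0.01))  # still far from 3.0 after 50 steps: too slow
print(gradient_descent(1.1))   # enormous error: each step overshoots and diverges
```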

Signal: whether you understand the speed-versus-stability tradeoff and the role of the learning rate. The learning-rate sensitivity is the part that shows real training experience.

A random forest is an ensemble of decision trees trained on bootstrapped samples with random feature subsets, averaging their predictions (or taking a majority vote) to reduce variance. Choose it on tabular data when you want a strong baseline with little tuning and some feature-importance insight.

Signal: moderate. Knowing the mechanism is table stakes; the "when would you choose it over gradient boosting" follow-up is where real modeling judgment shows.

A forward pass computes the prediction and loss; the backward pass uses the chain rule to compute how much each weight contributed to the loss, and the weights update in the direction that reduces it. It's the chain rule applied systematically across layers.
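A hand-worked sketch of the chain rule on a tiny made-up network, y = w2 · tanh(w1 · x) with squared-error loss, checked against central finite differences. Frameworks automate exactly this bookkeeping:

```python
import math


def forward(w1: float, w2: float, x: float, target: float):
    h = math.tanh(w1 * x)     # hidden activation
    y = w2 * h                # output
    loss = (y - target) ** 2  # squared error
    return h, y, loss


def backward(w1: float, w2: float, x: float, target: float):
    """Chain rule, applied from the loss back to each weight."""
    h, y, _ = forward(w1, w2, x, target)
    dloss_dy = 2 * (y - target)
    dloss_dw2 = dloss_dy * h                 # y = w2 * h
    dloss_dh = dloss_dy * w2
    dloss_dw1 = dloss_dh * (1 - h * h) * x   # tanh'(u) = 1 - tanh(u)**2, u = w1 * x
    return dloss_dw1, dloss_dw2


def numerical_grad(w1: float, w2: float, x: float, target: float, eps: float = 1e-6):
    """Central finite differences, used only to check the analytic gradients."""
    def loss(a: float, b: float) -> float:
        return forward(a, b, x, target)[2]

    g1 = (loss(w1 + eps, w2) - loss(w1 - eps, w2)) / (2 * eps)
    g2 = (loss(w1, w2 + eps) - loss(w1, w2 - eps)) / (2 * eps)
    return g1, g2


analytic = backward(0.5, -1.2, 2.0, 0.3)
numeric = numerical_grad(0.5, -1.2, 2.0, 0.3)
print(analytic)
print(numeric)  # should agree with the analytic pair to many decimal places
```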

Signal: high for ML roles. The chain-rule framing is the discriminator. Candidates who can explain it without hand-waving understand what their framework is doing under loss.backward().

The strong answer is a process, not an algorithm: understand the target and the business question, explore and clean the data, establish a simple baseline, then iterate with better features and models while validating honestly. Mentioning a baseline first is the senior tell.

Signal: very high, and the most realistic question in any DS loop. It maps directly to the actual job. Candidates who jump to "I'd train XGBoost" without mentioning a baseline or validation are showing inexperience.

Define the metric and minimum detectable effect, compute the sample size for adequate power before you start, randomize properly, then run until you hit that sample size rather than peeking and stopping at the first significant result. Peeking inflates false positives.
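A sketch of the "compute the sample size before you start" step for a conversion-rate test, using the common normal-approximation formula for two proportions. The 10 percent baseline and 2-point minimum detectable effect are made-up inputs:

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_baseline: float, mde: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users per arm for a two-sided test of two proportions.

    n = (z_{1-alpha/2} + z_{power})**2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)**2
    """
    p1, p2 = p_baseline, p_baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / mde ** 2)


n = sample_size_per_group(0.10, 0.02)
print(n)  # roughly 3,800 users per group; run to this size, then analyze
```

Halving the detectable effect roughly quadruples the required sample, which is why the MDE conversation has to happen before launch.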

Signal: high for product DS roles. The "don't peek" insight is the one that separates people who've actually run experiments from those who've only read about p-values.

They map words to dense vectors where semantic similarity becomes geometric closeness, so "king" and "queen" sit near each other and analogies fall out of vector arithmetic. They let models transfer learned meaning instead of treating words as opaque IDs.

Signal: moderate for NLP-flavored roles. With LLMs now dominant, the more current follow-up is how embeddings relate to what a transformer learns, which tests whether you've kept up.

You don't have time for all of this. Spend it where the signal density is highest.

For data roles, the highest-yield topics are loc/iloc and the SettingWithCopyWarning, vectorization versus apply, missing-data judgment, and precision and recall mapped to a real decision.

The lowest-yield topics are the trivia: if __name__ == "__main__", reversing a string, reciting *args and **kwargs. Know the one-line answers, spend nothing more.

Reading answers builds recognition. It does not build the ability to produce a clean answer while someone watches and the clock runs. Those are different skills, and the gap between knowing your answer and delivering it under pressure is where good candidates lose offers.

The fix is to practice out loud, under something like real conditions. Explain your reasoning as you go, because interviewers score your thinking as much as your code, and because narrating your approach is exactly what the rise in AI-cheating suspicion has made interviewers want to hear. Solve a problem you haven't seen, talk through the tradeoffs, and get feedback on where your explanation went fuzzy.

That's the gap Four-Leaf's voice mock interviews are built to close. You practice answering real questions out loud, get scored on substance and delivery, and drill the spots where you freeze, so the answer comes out clean when it counts. You can generate fresh Python questions by role and difficulty and run a full mock before your real one. The questions in this guide are a map of what gets tested. Practicing them out loud is how you turn the map into an offer.
