{"slug": "zen-and-the-art-of-machine-learning-research", "title": "Zen and the Art of Machine Learning Research", "summary": "A researcher argues that success in machine learning research depends more on discipline and consistent effort than innate talent, comparing the process to meditation. The author advises beginners to focus on fundamental concepts rather than chasing short-lived trends or benchmarks, and notes that experience can sometimes be counterproductive in the rapidly evolving field.", "body_md": "# Zen and the Art of Machine Learning Research\n\n### temperament over talent\n\nSo you want to do AI research? It’s true that no one *really* teaches you how. Not directly, anyway. But it turns out that the way to get started is pretty simple: some combination of (i) reading and (ii) building stuff. You can’t do one without the other. You become a researcher through the combination.\n\nIt turns out the process of becoming a great researcher is not unlike learning to meditate:\n\n**I.**\n\nThe way to get started is pretty simple, through some combination of\n\n(a) reading and learning, and\n\n(b) building stuff.\n\nYou can’t only do one. You’ll become a researcher through this combination.\n\nThere’s an old Zen saying that goes something like this –\n\non days we find insight, we sit.\n\non days we do not find insight, we sit.\n\nDoing research is basically like this. Scientific insights can come seemingly at random. Most days they will not come. An important trait for success is just putting in the time & effort. 
Like any other pursuit (music, sports, sales, etc.), if you want to become world-class, it will take a tremendous amount of discipline.\n\nNoam Shazeer makes a nice hat-tip to the inherent randomness of successful research ideas in the SwiGLU paper:\n\n“We offer no explanation as to why these architectures seem to work; we attribute their success, as all else, to divine benevolence.”\n\nA related comment is that *it’s possible to read too many papers*. If you want to solve a problem, the tried-and-true path to success is to attempt a solution, try it, reach a bottleneck, try to solve it, and only reach for literature when you’ve run out of ideas yourself.\n\n**II.**\n\nFine, but what should I work on?\n\nIf you’re just starting out, here’s my honest answer: I don’t think the exact topic matters much.\n\nThat said, I would warn you against choosing things that have been popular for less than six months. AI moves fast, but the fundamental ideas haven’t changed in forty years. If you want to make a career out of this, I wouldn’t advise you to think too hard about the concepts of 2026: harnesses, agents, context engineering, etc. These will change.\n\nInstead, you’ll learn more by going back to the basics: learn what cross-entropy is. Compute it by hand for a small distribution. Deeply understand SVD, to the point where you can start to visualize it in your head. Don’t think too much about RL for coding specifically; instead, learn the ideas behind policy gradients, why they’re useful, and why they’ve been popular for decades.\n\nOne more meta-comment: if the best possible outcome of your research project is a higher score on an existing benchmark, you are not going deep enough. 
Often, existing datasets won’t test new interesting capabilities.\n\n[Jason Wei makes a similar point](https://x.com/_jasonwei/status/1875268874859344349):\n\nAn underrated but occasionally make-or-break skill in AI research (that didn’t really exist ten years ago) is the ability to find a dataset that actually exercises a new method you are working on.\n\nAs for a concrete suggestion, I can’t make one; that has to come to you. Go deep, focus on the basics, and don’t chase benchmarks. Stay in the water and the ideas will come.\n\n**III.**\n\nin the beginner’s mind there are many possibilities; in the expert’s mind there are few\n\n– Suzuki\n\nSomething often repeated in Silicon Valley these days is how experience in AI research might actually be counterproductive to good research intuition in the modern day. I’ve observed parts of this up close; many researchers from the pre-scaling era remain interested in designing methods that work at a small scale but will obviously fail when tested at scale.\n\nOne really impressive thing about OpenAI is that most of the people running the company (on the technical side, at least) are under 35. Many of the important decision-makers behind ChatGPT are under 30. One thing we can take away from this is that since AI is such a nascent field (ChatGPT is less than four years old!) *no one has a huge advantage*, because no one has been working on it for very long.\n\nIn short, holding on to ideas for too long can actually be counterproductive. 
Stay open-minded and refuse to let ego cloud your judgement.\n\n**IV.**\n\nInspiration strikes when you least expect it.\n\nHere are two examples from history:\n\nThe discovery of the [structure of the benzene ring](https://en.wikipedia.org/wiki/August_Kekul%C3%A9) famously came in a dream: the structure had never been seen before, but was imagined as a snake biting its own tail.\n\n[Ozempic basically comes from lizards](https://www.sciencealert.com/ozempic-literally-came-from-a-monster-and-its-not-alone). The GLP-1 hormone it mimics was first found in the venom of the Gila monster, a desert lizard that eats just a few times a year. Somehow we figured out how to make this work for humans too.\n\nOne important takeaway is that *to do good research, you must do things other than research*. Most of my personal “aha moments” happened away from the keyboard, especially when going on walks.\n\nDarwin, Tesla, Feynman, Aristotle. Many great thinkers of history proclaimed the outsized benefits of stretching your legs and going for a little stroll. Even if you don’t do research, you should probably go on more walks.\n\n**V.**\n\nEven when inspiration strikes, nature may not be benevolent: even with a perfect implementation, our idea might just not be *true* in some fundamental sense. Or perhaps it was, or seems to be. When the results come in, how should we react?\n\nAnother principle we can borrow from Zen is (experimental) equanimity.\n\nWhen analyzing an experiment, we can channel the following mentality:\n\nDid it go well? *Great!*\n\nDid it go poorly? *Also great!*\n\nBoth outcomes teach you the same amount of information. In fact, it’s often possible to learn more from a string of negative results than a single positive result. “Wow, it’s still not working – incredible!” Now that’s a healthy attitude for research.\n\nThe converse of this is that you shouldn’t get that excited about good results. 
In fact, most good results come because of a bug; it’s not that the results themselves were good, it’s that you measured incorrectly, and convinced yourself. Everyone wants their ideas to work – and this is a good thing! – but one thing all experienced researchers share is extreme skepticism, especially in the face of outcomes that seem too good to be true. Unfortunately, they almost always are.\n\n**VI.**\n\nA flower does not think of competing with the flower beside it. It just blooms.\n\nResearch is extremely outcome-driven. Especially in academia, it’s easy to look at others’ successes on paper and turn to emotions.\n\nPeople succeed for different reasons. Some people get lucky. The academic reviewing process, in particular, is neither consistent nor fair. When new research comes out in your area that you admire, ask yourself the following question:\n\nAm I operating at the proper level of depth to have made this insight myself?\n\nNow there are two possible outcomes. If the answer is yes – great. Your process is sound, but you didn’t make this finding; you were busy, you were doing something else, but you could’ve.\n\nAnd if the answer is no – then take this as motivation to go deeper.\n\n**VII.**\n\nbefore enlightenment, chop wood, carry water. after enlightenment, chop wood, carry water.\n\nSuccessful projects typically involve hundreds of hours of gruntwork behind the scenes. Andrej Karpathy labeled a [nontrivial portion of ImageNet by hand](https://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/). The creators of [SWE-bench](https://arxiv.org/abs/2310.06770), who were ahead of their time in many ways, spent hundreds of hours painstakingly filtering GitHub data to get a small, tractable set of GitHub issues useful for evaluation.\n\nIf you look at the careers of great researchers, they likely spent lots of time working in obscurity before finding success. Get used to this. 
The more ambitious and forward-thinking an idea, the more work it may be to thoroughly implement and evaluate. This difficulty is a feature, not a bug.\n\n**VIII.**\n\nCollin Raffel, an amazing researcher whom I deeply respect, once mentioned that he thinks many ideas fail not because they’re bad ideas, but because the code has a bug that the researcher never found.\n\nIn general this is a really difficult problem, especially in the world of LLMs. A modern deep learning software stack is extremely complicated, and bugs can lie anywhere: in training, in inference, in harnesses, in data.\n\nIf something looks wrong, you cannot move on. You can and should log many metrics and strive to understand all of them. If some of the metrics look different than you expected, you need to figure out why, because something may be wrong. I’ve tweeted before that one of the most important traits in a researcher is [healthy paranoia](https://x.com/jxmnop/status/2062995349573382219). Be paranoid!\n\n**IX.**\n\nOne practical point is that most experiments that involve deep learning take too long. Training models can take weeks or months. These days, evaluating a model on a single task can take multiple days.\n\nEspecially when coding with agents, our instinct may be to spin up many experiments in parallel and let them all run at a slow cadence. Although simple parallelization helps to some degree, [context switching](https://en.wikipedia.org/wiki/Human_multitasking) is a harmful pattern.\n\nIt is of paramount importance that you design ergonomic research workflows that support fast experimental feedback. Shorten cold-start times for training, make small evals that return results quickly. I really admire Keller Jordan’s [nanoGPT speedrun](https://github.com/kellerjordan/modded-nanogpt) as an example of how much we can learn from fast iteration cycles.\n\n(This said, at the end of the day, some results take an unavoidably long time. 
When you can, maintaining state over multiple days and understanding last week’s experiments when they finish today is an incredibly useful skill.)\n\n**X.**\n\nCoding agents help you move faster, but they make two problems worse: we have a harder time understanding basic details, and we context switch more often. A good researcher actively works to fight against both forces.\n\nCodex can write a training script for you; it can even execute the script, babysit it while it’s running, interpret the results, and send them to you in an email. But maybe it ran into an error and shortened the system prompt without asking you. Maybe it shortened sequence lengths to get eval running in a reasonable time. Maybe it ran the wrong config because you didn’t specify.\n\nFrom an engineering perspective, these are all small errors with an easy fix. But from a scientific one, they’re grave: small omissions like this can materially change important results of papers and are therefore not acceptable. *Beware dragons*. Even if you didn’t write the code, if you want to understand your results, you need to understand the system that produced them.\n\nI’ll level with you – this is hard! It’s tempting to outsource understanding to the machine. For many applications, it’s faster. But doing good science requires learning how the entire system works, so that you can be sure observations about it are true. There’s no easy way around this.\n\n**XI.**\n\nTLDR: Talent isn’t all that it takes to become a successful researcher. *Temperament* is greatly underrated. 
Stay curious and persistent, remain thoughtful and meticulous, and the ideas will come.", "url": "https://wpnews.pro/news/zen-and-the-art-of-machine-learning-research", "canonical_source": "https://blog.jxmo.io/p/zen-and-the-art-of-machine-learning", "published_at": "2026-06-16 00:45:50+00:00", "updated_at": "2026-06-16 01:18:32.980977+00:00", "lang": "en", "topics": ["machine-learning", "ai-research", "ai-ethics"], "entities": ["Noam Shazeer", "Jason Wei", "OpenAI", "ChatGPT"], "alternates": {"html": "https://wpnews.pro/news/zen-and-the-art-of-machine-learning-research", "markdown": "https://wpnews.pro/news/zen-and-the-art-of-machine-learning-research.md", "text": "https://wpnews.pro/news/zen-and-the-art-of-machine-learning-research.txt", "jsonld": "https://wpnews.pro/news/zen-and-the-art-of-machine-learning-research.jsonld"}}