How Excel got agentic

Mukul Singh transitioned from Microsoft Research to the Office Product Group two years ago to deliver agentic AI capabilities in Excel, now known as edit with Copilot in Excel. The feature, initially developed as Excel Agent Mode, treats Excel as a low-resource programming language and has since expanded to PowerPoint, Word, and Outlook. Singh's move from pure research to product was driven by a desire to see his work deliver measurable impact, culminating in agentic tools that are changing how users work across Microsoft Office.

When Mukul Singh made the jump from pure research into product, it was a leap of faith. But he had an idea that he wanted to bring to life: delivering agentic AI capabilities in Excel (https://www.linkedin.com/posts/mukulsingh105_excel-agenticai-airesearch-ugcPost-7412568841274388480-2ypM/?utm_source=share&utm_medium=member_desktop&rcm=ACoAAASC7AYB7x9oOs7iVcAOnm5D7V1HcCwW7OY). While this was well before buzzwords like “the agentic AI era” had cultural cachet, the research was already headed in that direction. So armed with a prototype and a healthy dose of ambition, Singh made his pitch.

Two years on, he’s fully transitioned from his role in Microsoft Research to a new gig in the Office Product Group and successfully delivered the ability to edit with Copilot in Excel (previously known as Excel Agent Mode). Not ones to rest on their laurels, the team quickly went on to ship agentic capabilities in PowerPoint (https://support.microsoft.com/en-us/topic/edit-with-copilot-in-powerpoint-008f17aa-8e5f-4cda-ba6c-0588000bdad7), Word (https://support.microsoft.com/en-us/office/edit-with-copilot-in-word-647d5d14-eaec-4e8a-a574-7cefffa7f8f0), and Outlook (https://techcommunity.microsoft.com/blog/outlook/copilot-in-outlook-new-agentic-experiences-for-email-and-calendar/4499798). It’s changing the way people get work done, and it started with the hypothesis that Excel, at its heart, is a low-resource programming language.

We sat down with Singh to learn more about his journey from research to product, the science and research behind Office’s new agentic capabilities, and why Excel was the perfect testbed for agentic AI.

To kick us off, tell us a little about yourself.

I’m a researcher in the Office Product Group team.
I recently started a science team focusing on agents and AI for the Microsoft Office product portfolio. I was originally in Microsoft Research, working on research full-time, publishing papers, and very far away from the product world. To be honest, it’s been an incredible journey getting to go from research to then being embedded in the product space.

How long ago did you make that transition?

It was only two years ago, right at the cusp of when the Excel Agent Mode work started. In fact, that was the catalyst. I had no intention of ever moving out of academia and deep research. And, you know, Microsoft Research is one of the best labs in the industry that’s still connected to academia. So I was pretty okay with my life. I didn’t want anything else. It was the most perfect blend that I could hope for. And then this project in Excel started, and there were some initial discussions. I thought it sounded interesting, because one of the things that you always feel is missing in research is that you don’t see your work actually delivering value. You see it deliver a lot of theoretical value; you see it shape the direction of the world, per se. Other people might take your research direction and extend it in meaningful ways. Research does shape society in a way, but you don’t see any immediately measurable impact. So at that point, I felt like I wanted to work on something where I could see that happen, where I could watch the impact unfold in front of me.

What are some of the research questions that ultimately led to Excel Agent Mode, which we’re now calling edit with Copilot in Excel?

My research in MSR and all of my previous papers had been about AI for low-resource programming languages. I like to explain low-resource coding languages as languages that are very obscure and account for less than roughly 0.1% of the entire coding community. So AI models are generally bad at them because they just learn off the internet and known sources.
To give you an example, internally at Microsoft there’s a language called Kusto Query Language, or KQL (https://learn.microsoft.com/en-us/kusto/query/?view=microsoft-fabric). That’s just used for querying telemetry. Now, it’s a public language; it’s published. But no one outside of Microsoft uses it because why would they? We designed it for our database systems. So those are the types of languages that the models are bad at. All of my research was looking at how we could make models good at these languages, which they are not naturally good at. We need to drop in hints, cues, documentation, give it retry mechanisms and everything. When I was initially approached about this work in Excel, I was at first very skeptical of how that and my work might be related. I’ve done a lot of tabular research, sure, but it’s not Excel. But then they drew the parallel that, actually, Excel is just another low-resource programming language. It has its own internal language. It has its own engine. It’s just that the model doesn’t know it.

Yeah, in your LinkedIn post (https://www.linkedin.com/posts/mukulsingh105_excel-agenticai-airesearch-ugcPost-7412568841274388480-2ypM/), you talked a little bit about how Excel functions like a low-resource programming language. What really surprised you about the project when you dug in?

So, about the original vision for Excel Agent Mode (and, by the way, kudos to the team and their thinking): this was before any agents. Today, everything is agents, right? You can automate, you go to an app and assume that there’s a button somewhere that will do it for you. But that didn’t use to be the case. Excel, I just didn’t think of it as an AI-forward app. I thought of Excel as an app that adapts, right? That doesn’t need AI. But the vision at that point was that the team wanted to automate end-to-end user workflows. If I open Excel, anything I have in mind that I want to do, this agent I’m building should be able to do it. That was a very difficult vision to achieve.
In fact, that’s the reason that this project took so long and had so many setbacks: we set our ambition so high, before the models even had the capability to be good enough to do anything like this at scale. Say you want to add a pivot table, right? That’s a common task. I don’t know how to do it, but I know people do it. It’s just a five- to 10-line code snippet: whenever someone clicks the “Create pivot table” button, that code is run internally. Now, we could connect the model and it could generate all of that code, which is just some random JSON and JavaScript objects. To us, it’s like gibberish text, but it’s completely deterministic and exactly controls the behavior. And the good thing about Excel is that all of its surface is programmable. This is not true for all apps, by the way. Very few of Microsoft’s apps are truly programmable, but with Excel, you have to hand it to the engineers 40 years ago when they were setting it up: they made sure that everything is programmable end-to-end.

What are some of the twists and turns that the work has taken over the last two years?

When we started this project, the vision really hit home. We had insane videos that the product managers and designers made, showing how the product is today. But this was two years ago, right? And the videos were the same. So you see, the disconnect was that everyone, leadership included, bought in. Like, this is the future. We want to invest in this. It’s the right thing to do. And we got a very strong crew of people. I was brought in. We were all working on it, but the reality was that the models weren’t good enough. This was before the age of the reasoning models. The best model we had to work with was 4.0, and anyone who has played with models knows that 4.0 just cannot do long-chain tool calling.
It used to collapse after a couple of calls, and it was only able to do things like start a formula, start a problem, which at that time was still cool. And, bless our marketing team, they were able to make that such a good feature with the help of design and everything. But even that, to us, felt like, “Oh, we’re falling short.” And they were like, “No, this is still great.” The industry just wasn’t there yet. That was, I feel, the weakest moment of the project, where there was this doubt. Like, is this just too aspirational? Is it just not possible? Did we predict the future wrong, or will the models not even get there for like 20 years? We just didn’t know. We had promised a lot to customers on the backing of these researchers’ point of view, and now we were in a space where we didn’t feel confident that we could deliver. There were debates about whether we should just cut this project and rebrand it into something completely different. So it was indeed quite the journey. We were punching way above our weight class with very little evidence other than just the intuition of maybe three people who said, “We think this is where the world will end up.”

When did you get a sense that the models were headed in the right direction in terms of reasoning and long-chain tool calling?

I think that’s a very good question. When OpenAI announced their first series of reasoning models (https://openai.com/o1/), that was a pivotal moment for us. We tried the model, and it couldn’t solve any of the promised scenarios we had. But the traces showed signs of life. Like in the pivot table example: It tried to generate the pivot table code. It ran it, and it just gave back an error that the area where you’re trying to create the pivot table already has data in it. Now, the previous models at this point would either keep looping on this same error, just give up, or produce some other random gibberish text response and not recover. But o1 tried to recover.
That was the first sign of life that, OK, this is now working in a chain. It does something, looks at feedback, and tries to recover and keep trying until it succeeds. It went up to like 20, 30 tool calls, still not able to solve real complex tasks. But we thought that if we reinforced it with the right information, it might be able to do something.

This is backtracking a little bit, but when you ran up against the limitations of the models, when it seemed like maybe you were headed down the wrong path and the research had it wrong, what kept you going?

I’ve been in the space of generating code in low-resource programming languages for a while. I had even published papers for Excel in general, so I know the ecosystem. Everything’s built on top of Excel. So based on all of the research, we had a very strong intuition that this is where the models were headed. And our parallel was very simple: that coding agents, like what people use with GitHub Copilot and VS Code, and Excel should intuitively be no different. Their implementation and all the UX decisions are different, but they are the same in principle: it’s a surface in which I do manual work. I want that automated, and I’m doing that through code. Our parallel was always that Excel is an IDE. People don’t see it as an IDE. They see it as an app, but that’s just because it’s presented as an app. It’s really just an IDE. For analysts, it’s the equivalent of VS Code to a developer.

You’ve been talking about this, but I’m going to ask the question directly: What is it that makes Excel a really solid testbed for agentic AI?

I truly believe that Excel is the first large-scale enterprise app that was actually automated by an agent, meaning that the agent can take meaningful action in the app. So it doesn’t have fixed workflows. It’s actually controlling it. And what makes Excel the perfect testbed is that it’s programmable. All of it can be controlled by the agent.
It’s not just code, because that code has real consequences in the app that you can see. If I add a chart, I see a chart there, right? If I connect a formula to a different sheet, I see live updates. So it’s kind of a living environment. It’s a live environment in which I’m making changes. And if I make changes, the environment will undergo some change. And I’ll use that as feedback, so there’s observability. And that is the best testbed. An agent needs an action space that allows it to interact with an environment. And it needs feedback from the environment. Now, this seems trivial, but honestly, very few apps fit this charter perfectly. So this loop made Excel very interesting. All of the research across everything we’ve discussed spans 17 or 18 research papers going back two to three years. When someone looks at it at first, they might not even connect the dots. But there are like three papers on just the best representation of data for Excel sheets. And people will think, “Oh, that’s probably for storage and optimization and context querying.” But no, it’s just that we need a concise representation for the model. Excel actually showed the way for a lot of the apps in the industry in general that this is how an automation pattern works. At least within Microsoft: After Excel, PowerPoint followed. Then Word followed, the same pattern. And Outlook was recently announced: same pattern, same research crew delivering all of that, one after the other, just because all of the fundamentals are there.

So your team was involved in expanding the editing with Copilot functionality across the Microsoft Office product suite?

Yeah. After my stint at Excel, I kind of figured out that, you know what? This is kind of fun. In fact, it’s very similar to research where you get a vague abstract problem, you have no guidance, and you just try to go solve it. Talk to people. Figure things out.
It’s like the same environment; the only difference is that when it actually ships, you feel a sense of accomplishment, and you can point people to it. Like I can tell an Excel guru, “Open that side panel. Try a query. You’ll be amazed, and I built that.” So the moment that Excel was in a stable state, we handed it back to the feature team, the PMs and designers, to run with it. And then we moved on to the next surface, PowerPoint, and that seemed even more interesting. Ever since I moved into a Director of Science role inside the Office Product Group, where I’m overseeing all of the agent work, I’ve realized that it’s all based on very similar patterns. And it just so happens that all of my research aligned very closely with where the industry headed because, let’s be honest: Eight years ago, when I was starting my research on low-resource programming languages, there were just embedding models. At the time, it just seemed interesting. We had no idea the shape that agents would take. Going into this field, the models were already good at good languages, so what could I contribute there? The only place where I could meaningfully contribute was with the things the models were bad at. So I literally made a list of everything these models struggled with, and it just stood out that programming seemed useful. If it can write good code, someone somewhere will be happy.

Were there any hiccups along the way? Did you run into any unexpected technical challenges?

At first, we ran into the faster horses trap. We’d show customers these videos and ask, “As an analyst, will you be more productive with this tool?” And everyone said, “It needs to come back with an answer in 30 or 40 seconds. That’s how long I can wait. There’s no way in the world that I’m going to sit there for 10 minutes and leave it running. No one works like that.
I’d cancel that operation after 30 seconds.” Out of every customer we talked to, not a single person said that they were willing to go beyond one minute of wait time. Right. So we were optimizing for that. It needs to complete faster. If the model is making three errors and recovering, that’s not good enough. It shouldn’t make those three errors. And that sent us down paths of optimization that are very tricky. Like today, they’re still not solved; they’re that difficult. But we thought that was non-negotiable for the product. Then this company called Shortcut came out, and their app ran for 25 minutes on a single query. But the results were amazing. Initially people said, “Who’s going to wait 25 minutes?” But within a week, everyone said, “I now just log in, push the query, and just leave. I don’t need to do anything.” And we were like, what? But for every one of our customers that we talked to, their mindset immediately changed. They were asking for faster horses, but then someone delivered them a car. And now they understand that that’s exactly what they needed.

That’s really interesting. The challenges were really around customer expectations.

We went with the assumption that all our apps are IDEs. But the hypothesis starts to break down because people use these apps with nuance. And the moment we fail to understand that, it goes from a good to a bad experience. For Excel, it was that people are willing to wait for long-running tasks. But they need auditability. It can’t just work behind the scenes; it has to show its work. It needs to be computed on the sheet. Similarly, for PowerPoint, people really care about their brands and templates. The version of the PowerPoint agent that we initially shipped, which, if you asked me to rank them, I would say is the best version that we built, was the one that everyone hated.
We had optimized for a PowerPoint agent that works independently, like in a vacuum, because that’s the way I use PowerPoint. I start from scratch, no templates. I just have some information that I need to present, and I want to find the best way to do that. But there were other very real constraints that we had to understand. So PowerPoint really pivoted to take into account templates and brand guidelines. Similarly for Outlook, we found that people really care about confirmation. Like with Visual Studio and all these IDEs, they try to optimize for the least user engagement required, because that’s automation, right? You can leave it running and you don’t have to worry about it unless there’s something that you really need to look at. So they really try to cut down on the model asking something back. But in Outlook, the story is completely the reverse. People actually love it when the model gives them a list of things that it proposes, and you can just accept or reject each of them in a list. So like:

- Send this draft email to this person? Accept.
- Create this to-do list? Accept.
- Delete these seven emails? Reject.

And, again, it was an anti-pattern.

What can you tell us about the tech transfer process and how your research made its way into the product?

Yeah, that was actually tricky. I feel like, in general, a lot of people struggle with taking their research and landing it in a product. There’s a very weird gap, because the product teams are very direct about needing to talk to more researchers, and the research teams are keen to get the product group’s attention because everyone wants their work to be used and get funded. But there’s still this weird gap; it’s just not well connected. And the surprising thing is, it’s not because of a lack of will from either side. It’s because product always comes with so much nuance. The product teams can’t just blindly trust that the research teams are going to understand everything and build it.
And for the research team, it’s very hard to put themselves in the product team’s shoes and understand all the constraints. Maybe it needs to run in 10 seconds or less, or it needs to have auditability traces. That’s something that the research teams would feel is a very strange objective. But at least for Excel, I think the process was smoother because we operated as an integrated team.

What was the overall team dynamic like?

For Excel Agent Mode, it was a crew of just 10 or 11 people. Everyone had a title, but the titles meant nothing, right? There were PMs checking in PRs. There were devs writing papers. And there were researchers writing specs. There was just everyone on board working as if it were a startup, logged into a conference room and just checking in change after change. That was honestly the most fun time, when it was just people building stuff, coming up with ideas. But then we did do a proper handoff. Taking the research into the project became easier because everyone discussed everything openly, so everyone understood the same things.

What avenues for future research did this work open up?

I think a lot. Before this, I don’t think people were really thinking of agents for things that normal people use as apps. That just wasn’t a concept. And Excel has already paved the way for PowerPoint, Word, and Outlook, which are all entirely different surfaces, all just automating the app completely. There was significant uptake, even outside Microsoft, when people saw that, not only is it possible, but people love it. This project was a testbed that showed two very distinct things: that it was possible to automate an app through these models, which previously was not the accepted truth, and that there was a lot of appetite for it. Excel in its early days had a lot of flaws in the agent.
But people overlooked a lot of that just because of those scattered moments of brilliance where they were like, “Oh my god, it did something that I couldn’t even have imagined.” This has opened the door so that, I think over the next couple of years, all of these apps that we use day to day will have an agent, not to replace the app, but to be our go-to whenever we struggle to get something done. It showed the path, that this is possible for other apps, regardless of their complex surfaces.
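
For readers who want to see the loop Singh describes in concrete terms (the agent acts, the environment reports an error such as "the target area already has data", and the agent tries again), here is a minimal Python sketch. It is an illustration only, not the Excel implementation: `ToySheet`, `place_summary`, `scripted_policy`, and `run_agent` are hypothetical names invented for this example, the sheet is a plain dictionary, and the "model" is a hard-coded policy that simply moves to the next candidate location after each failure.

```python
from dataclasses import dataclass, field


class SheetError(Exception):
    """Feedback from the toy environment when an action cannot be applied."""


@dataclass
class ToySheet:
    """Hypothetical stand-in for a worksheet: maps addresses like 'A1' to values."""
    cells: dict = field(default_factory=dict)

    def place_summary(self, anchor, rows):
        """Write `rows` with top-left corner at `anchor`; refuse if cells are occupied.

        Assumes single-letter columns, which is enough for this illustration.
        """
        col, row = anchor[0], int(anchor[1:])
        targets = {
            f"{chr(ord(col) + c)}{row + r}": value
            for r, line in enumerate(rows)
            for c, value in enumerate(line)
        }
        occupied = sorted(addr for addr in targets if addr in self.cells)
        if occupied:
            raise SheetError(f"target range already has data at {occupied[0]}")
        self.cells.update(targets)


def scripted_policy(history):
    """Hypothetical 'model': pick the next candidate anchor after each failed attempt."""
    candidates = ["A1", "D1", "G1"]
    return candidates[min(len(history), len(candidates) - 1)]


def run_agent(sheet, rows, max_calls=5):
    """Act, observe the environment's feedback, and retry until success or budget."""
    history = []  # list of (anchor, observation) pairs, i.e. the trace
    for _ in range(max_calls):
        anchor = scripted_policy(history)
        try:
            sheet.place_summary(anchor, rows)
        except SheetError as err:
            history.append((anchor, str(err)))  # error text becomes feedback
            continue
        history.append((anchor, "ok"))
        break
    return history


# Source data occupies A1:B2, so the first attempt at A1 collides with it.
sheet = ToySheet({"A1": "Region", "B1": "Sales", "A2": "East", "B2": 10})
trace = run_agent(sheet, [["Region", "Total"], ["East", 10]])
for step in trace:
    print(step)
```

In this run the first attempt targets A1, collides with the existing data, and is recorded in the trace as an error; the second attempt targets D1 and succeeds. A real agent would replace `scripted_policy` with a model call that reads the error text and decides what to do next, which is the recovery behavior Singh says earlier models lacked.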