# Presentation: Choosing Your AI Copilot: Maximizing Developer Productivity

> Source: <https://www.infoq.com/presentations/choosing-ai-copilot/?utm_campaign=infoq_content&utm_source=infoq&utm_medium=feed&utm_term=global>
> Published: 2026-06-03 11:05:00+00:00

## Transcript

**Sepehr Khosravi:** I'm Sepehr. I'm a Machine Learning Platform Engineer at Coinbase. I also teach part-time at UC Berkeley, a couple of different AI classes. I'm also the founder of AI Scouts, which is a free camp I run for teens to learn AI. I used to work at a teddy bear company before coming into tech. This is my second career. Before hopping into choosing your favorite AI copilot, one thing I want to go over is seeing where do we stand today in terms of developer productivity and AI tools. Go ahead and answer the first question, which is going to be, what level of AI-assisted coding best describes you? We're seeing mostly intermediate people, around 50%. Five percent of people saying they don't use any AI at all, which is great. Thirteen percent being around beginner. We have a good amount of people being around advanced with 33% as well.

Then, second piece, what percentage of your daily code do you think is generated or assisted by AI? It seems like a good chunk of people are having the majority of their code, 50% to 75%, actually be generated by AI, which is larger than I've seen previously. Then third and final question, what developer productivity tools are you using the most frequently? Here, you can go ahead and type whatever you have, and it'll pop up as words. I see actually the biggest two here are Cursor and Claude. Third seems to be Copilot. Then some people just said code. That's great as well. I actually gave this talk recently at QCon in San Francisco. I have the numbers from that event. If you're curious to see it, this is what we had in SF. Around 24% of people were having 50% to 75% of their code generated by AI.

In comparison here, I think it was over 30% of you. Maybe even New York is a little bit more advanced than SF in terms of the AI usage. Then here, we only had 2% of people not using any AI tools at all. It might be a little bit higher in New York, but just interesting to see. I think this is the most interesting part. Most of the people in SF are still on Copilot, whereas here it seems most people are using Cursor and Claude, which is a great improvement. Where does the entire industry stand? This is the biggest developer survey I could find by Stack Overflow this year, where they surveyed 50,000 different software developers. What we found out from this is actually one in three developers are using AI once a month or less, which was less than I expected. This might be a little bit biased, because it is on Stack Overflow.

Maybe that audience uses it a little bit less than other audiences. Gives you a rough metric of where we might be at. Another thing that was super interesting is when it comes to AI tool sentiment, in 2025, we're using AI more than ever. The tools are better than ever. Yet the sentiment seems to have gone down. The past two years, the positive sentiment was 70% plus. In 2025, it's around only 60%. Why is this? I think it has a little bit to do with a lot of headlines that have come out this year. Zuckerberg went on and he said, AI will replace mid-level engineers by the end of 2025. We see a lot of other CEOs saying similar things and people talking about how great AI productivity tools are. I think that's a natural reaction to statements like that. The public sentiment might shift to the other side, where people start to refute those statements and say things like, AI's not all that it's hyped to be.

It's not as good as people are saying. That's why sentiment overall may go down. You see a lot of AI naysayers online. The reality probably lies somewhere in the middle. That's what we're going to try to gauge today and talk about the best ways to go about gaining that productivity.

## Objectives

The agenda is going to be talking about the current state of developer productivity tools. We're going to go over how to choose your AI copilot. We're going to go into Cursor and Claude Code, which are my top two picks. Then we're going to go over some lessons I had from a session with the Databricks CEO.

## The Current State of Developer Productivity

First of all, this is long-term research done by Stanford on developer productivity. Over 100,000 developers were included in it. I won't go into their exact methodology, but they went further than just commit number or PR number. They had manual reviewers look at code and see how much productivity was being gained. These are the results that they came up with. What they found out is that with AI-assisted tooling, developers are typically generating 30% to 40% more code than they were previously. Then when they took it a step further, they also found out that 15% to 25% of this code that ends up getting generated is oftentimes reworked. In the end, they estimated the net overall gain in software engineer productivity from using these different AI tools is about 15% to 20%. I would argue that that can be even higher for those who become experts in the tools and are using them in the right way.

At the minimum, probably, we can expect some number around there. In terms of tools for AI, for writing software, I think there are really three different tiers. The first tier is going to be these all-in-one tools for non-developers. I think this is where a lot of the headlines might come from too. This is where we actually do see 100x developer productivity or non-developer productivity. The classes I teach at Berkeley, for example, or the teens that I teach, these are people who don't have any technical background, wouldn't have been able to write any software before, and now they're launching full-on companies and generating revenue using these tools. I think for this, there really is a 100x increase. Then for us developers here, I think there are two tiers of tools. One is these IDE layers built on top of the foundational models. These are your Cursor, Copilot, Windsurf, IntelliJ, Cline, Google Antigravity.

These are built on top of the foundational layers and you can use them to help generate code for you. Then we have another layer of tools, which are these terminal-based tools built by the foundational model makers themselves, like Claude with Claude Code, ChatGPT with Codex, Gemini with Gemini CLI, and Kimi, and so forth. This is where you all ranked where your AI productivity level is today, and I hope that everybody here can move one level up, especially the people who are at none to beginner, start to adopt these AI tools or at least test them out a little bit.

## Choose Your AI Copilot

Going into it, tons of tools. What are people using? What should we be using? That same Stack Overflow survey states that 75% of people are actually still using Visual Studio Code as their main development tool. I see some people shaking their head, maybe at disbelief of why Visual Studio Code is still first. I was too when I first viewed this. They went even one level further, and of the people who are using Visual Studio Code, they asked what tool would you want to start using next? These are the top four tools that we saw, two of them being Claude Code and Cursor, which we're going to talk about, not because they were on this survey, but because I think those are probably the top two tools out there right now.

## Top Cursor Tips

First of all, I'm going to go into Cursor, top 10 Cursor tips, quick fire. Hopefully everybody can gain something from this, from beginner to expert. Number one, this is for complete beginners who hate AI or have never used AI. I would recommend you just start here. It's the Cursor Tab feature. If you just use Cursor, it gives you these auto-recommendations in gray, and you can hit tab, and it'll fill it out for you. It's Cursor's own custom model that they've built, and it adapts based on recent changes that you make, edits that you accept, and linting errors. A lot of times you'll generate like 10 to 20 lines of code just from hitting tab and not having to lift a finger, or just one finger. I would start off there. Tip number two is the Cursor Agent. This is like the main interface of Cursor where everything runs off of. You can select whichever AI model you want to use, type your commands into it, and it'll generate and edit code for you.

What's really great about this is the tooling that Cursor agents come with. For example, they can search your entire codebases. They can search the web. They have access to MCPs. They have access to your terminal, and it can run commands for you. I think this is part of what really makes Cursor great. Tip number three is a feature they launched semi-recently, actually, which is multi-agent mode. You can put in a prompt of something that you want an agent to do, command three different agents to run that same exact prompt. I especially use this every time a new model comes out. For example, like ChatGPT 5.2 came out recently. I want to see, do I want to shift off of Opus 4.5 that I'm currently using to this new model or not? For a while, I'll have the new model that comes out shadow what I was doing previously. For a week, I'll use it, and compare the outputs between the different models, and see which one I want to continue using.

Also, just for a sense of what these look like, I went ahead and generated the same UI using a bunch of different AIs. First of all, this is Claude Opus 4.5. I asked all of them to just generate a simple HTML landing page for an M5 Mac that's coming out. This is Claude Opus. It took 2 minutes and 30 seconds to generate this UI. This is ChatGPT 5.2. This one took 3 minutes and 8 seconds, a little bit busier than the previous one. This is Gemini 3 Pro, which took 51 seconds. I think right now, although Claude is probably the best for generating code in general, when it comes to UI, just straight creating UI off the bat, Gemini is probably the leader. Although on this one, it's dark. Then, finally, there's the Composer model. Composer is Cursor's custom model they built, and that is one of my favorite features of Cursor. Whereas all these other AIs are probably smarter than Composer is, where Composer stands out is its speed. It's built for quickly generating outputs. It did this in 24 seconds. Take your pick at what you want, but I think Composer might even be the best one here.

Tip number four is using Shift Tab and Plan Mode. When you're in Cursor, you can hit Shift Tab to switch between three different modes. Typically, you're in Agent Mode. That's like the basic one that makes all your code edits and whatnot. Another one is Ask Mode, where if you don't want it to edit any of your codebase, but you want to chat with it, get an idea of what you want to do with it, ask questions about your codebase, you could switch into Ask Mode. Then, third, we have a Plan Mode, where if you have a larger project that you want to put in, I typically recommend pulling up a Google Doc, writing out all your project specifications, and then putting it into Plan Mode. Cursor will go ahead and generate an MD file with its plan of what it is going to do. It's going to split it up into subtasks, and do each of the subtasks, complete it before moving on to the next one, and you can review its plan before it starts.

This is great for when you're taking on bigger tasks. It'll do something like this. Tip number five is turn on Cursor Sound. I think that this one's pretty underrated. A lot of coding with AI comes down to just waiting a couple minutes for your code to load, and then you'll switch off the tab and forget about it. With this, it gives you a little ding so you know when to come back with it. Actually, it's so big of a problem that YC recently invested in this company called brainrot IDE, where it will actually pop up TikTok videos and games for you to play and watch while your AI code is generating. I probably don't recommend downloading brainrot IDE, but turn on Cursor Sounds to deal with that same problem.

Tip number six is Cursor Commands. If you have a list of different functions that you run on your terminal, I really recommend using this. My favorite usage for this is a create-pr command where instead of having to type three different Git commands, you can just create a command that knows how to create a PR based on your recent changes, and it does all that process for you. You could even throw in how you want the PR description to look in there. It's just something that we do every day, but it's going to save you a couple minutes each time. Tip number seven is Cursor Rules. With Cursor Commands, you specifically call them to run a certain workflow. Cursor Rules don't work in the same way. You can specify when they apply, for example Always Apply. You can view Cursor Rules as a set of prompts. If there's a set of prompts that you're constantly using, you probably want to make it a Cursor Rule. It's just basically storing a prompt, and you can call it whenever you want.

You can have an Always Apply rule, so this is for general advice that you want Cursor to have for all of your chats. For example, if you don't want it to generate comments, you can put it in here so it doesn't generate any comments in your code. There's an Apply Intelligently type rule where it will decide based on whatever task you put in whether it should apply it for that specific prompt or not. Then there's another one where you can set it up to apply only to specific files. You can say, when you're editing these UI files, make sure you follow these UI formats in this rule written here. Then there's another one which is Apply Manually where you can just specifically call it whenever you want, otherwise Cursor is never going to apply it by itself. This is what that looks like. The nice thing about these rules is you can set them at a project level, you can set them at a user level, you can even share them with your teams.

You can also write it in this format called AGENTS.md, which is this format that a lot of these different AI tools are adopting now, like Codex. Most of the big ones are adopting it so that between different AI tools, you don't have to write new rules based on a new format for all of them. All of them will know how to read from AGENTS.md. That being said, Claude Code actually hasn't adopted this. Hopefully they will soon. For best practices of Cursor Rules, it's pretty similar to best practices for prompting in general, except you want to keep it under 500 lines for these rules. Split them up into multiple different rule files if you start to write really long ones. Yes, give it concrete examples, avoid vagueness. The same things you're doing when you're doing your regular prompting. A couple of my favorite Cursor Rules that you could do, one is like a refresh.md. If you have a bug that's persisting and you've tried and it's not working, you can write a set of rules to tell Cursor to zoom out, analyze the entire codebase from scratch. I use this one like three, four times. I tried some prompt and it wasn't able to figure it out.
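On the 500-line guideline for rule files mentioned above, here is a minimal sketch of a check you could run over a rules folder. It is not part of Cursor. The `.cursor/rules` path, the `.md`/`.mdc` extensions, and the function name `oversized_rule_files` are assumptions for illustration, so adjust them to wherever your rules actually live.

```python
from pathlib import Path

MAX_RULE_LINES = 500  # guideline quoted in the talk


def oversized_rule_files(rules_dir, max_lines=MAX_RULE_LINES):
    """Return (path, line_count) pairs for rule files longer than max_lines.

    Assumption: rule files are plain text ending in .md or .mdc.
    """
    oversized = []
    for path in sorted(Path(rules_dir).rglob("*")):
        if path.is_file() and path.suffix in {".md", ".mdc"}:
            line_count = len(path.read_text(encoding="utf-8").splitlines())
            if line_count > max_lines:
                oversized.append((str(path), line_count))
    return oversized


if __name__ == "__main__":
    # Hypothetical location; prints nothing if the folder does not exist.
    for path, count in oversized_rule_files(".cursor/rules"):
        print(f"{path}: {count} lines, consider splitting into smaller rules")
```

Running something like this in CI or a pre-commit hook is one way to notice when a rule file has grown enough that it may be worth splitting.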

Another one is like a no-comments.md. If you don't want extra comments, some of these AI tools just put way too many. You can remove that. Or like a prd.md. If you want to write company-specific documentation, this doesn't even have to be code, and you have a specific format that you're writing this in, you can create an MD for that. Then when you finish, it will generate the documentation, or like a PRD for you or whatever you need. Tip number eight is MCPs. This is where you really go from beginner AI to like expert AI, when you integrate all the different MCP tools and put these into your Cursor setup. A couple of the top ones that I recommend. One is, having an MCP for your document store is probably going to be the most important one, whether you're using Confluence, Google Docs, Notion, whatever it is. There's often a lot of gaps in our codebases and that's where the AI tends to fail.

When it has access to your documentation, it's going to have a lot higher success rate in figuring out whatever you tell it on the first try. The second one is some sort of version control MCP, probably GitHub for most of you, or Bitbucket. It has access to reading all your previous PRs, your commits, assessing what went wrong, what didn't. Number three is some sort of project management MCP, whether that's Linear, Asana, Jira. This is great because, one, if you have a ticket, it can directly go and read the ticket and start implementing it for you, but it can also help you write tickets if you need. Four is some sort of database MCP. It can do testing, it can query whatever tables it needs to, Snowflake, Supabase, Postgres, whatever it is. You probably want to set these tools to read only so you don't accidentally end up wiping out your entire database.

Number five is some observability tool, Datadog, Prometheus, whatever it is, so that whenever you have pages, it can go observe whatever is happening. If you add all these, there's a lot of productivity to be gained. Another tip, though, is Cursor actually has a limit on how many tools you can enable, at 85. I would recommend even going lower than this because the more you put in, the more context rot it begins to have, and it struggles to know which tool to call. Try to pick the particular ones that are most important and you can turn them on and off based on whatever you're working on at the time. Tip number nine, prompt engineering. I'm not a huge fan of going into specific rules for prompt engineering. What I think is a lot more important is just general context engineering, telling the AI everything that it needs to know about your project. I think that's going to be the most important part rather than typing rules in a certain way to the AI.

That being said, I want to share a couple of cool prompts. One, this is from Claude themselves, where each of these chats have context. We have a context limit on whatever we're typing. When you get close to your context limit, sometimes these AIs will give you a shorter answer than they're supposed to because they realize, "My context window is about to run out. I need to give this person an answer." If you get to that 90% context, put in some prompts similar to these, where it'll tell the AI specifically, don't worry about the context window, just give me the best answer. That helps a lot. Another one, this is this huge prompt given by Google themselves, which they put this prompt onto Gemini and they saw roughly around 5% improvements across benchmarks by using this one. I put that to say you can make improvements using these different prompts, but around 5%. Typically for your own coding productivity, that 5% really won't matter too much.

You can just put in a second prompt right after, but these become important when you're putting it into your systems and your users are actually running commands which touches these prompts, then that 5% becomes a lot more important, but not for your general productivity. For your general productivity, most important thing is going to be context. Just tell it everything that it needs to know. Tip number 10 is Cursor Checkpoints. We're all using version control systems already, which is great, but with chat specifically, sometimes you're talking into a chat and it's really understanding your problem well. Then you put in some follow-up and it completely goes in the wrong direction and all the context gets messed up. It's good to know you can restore a checkpoint within the chat and go back to that state where it was producing things pretty productively for you.
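To make the checkpoint idea concrete, here is a minimal sketch of restoring a chat to an earlier state. It only models the conversation history and is not how Cursor implements checkpoints. The `ChatSession` class and its methods are hypothetical names for illustration.

```python
import copy


class ChatSession:
    """Toy chat history that snapshots its state before every new message."""

    def __init__(self):
        self.messages = []
        self._checkpoints = []  # one snapshot of self.messages per send()

    def send(self, text):
        # Save the state as it was *before* this message, then append it.
        self._checkpoints.append(copy.deepcopy(self.messages))
        self.messages.append(text)
        return len(self._checkpoints) - 1  # id of the checkpoint just taken

    def restore(self, checkpoint_id):
        # Roll back to the saved state and drop it plus all later checkpoints.
        self.messages = copy.deepcopy(self._checkpoints[checkpoint_id])
        del self._checkpoints[checkpoint_id:]


session = ChatSession()
session.send("Explain the failing test")
session.send("Propose a fix")
bad = session.send("Now rewrite the whole module")  # goes in the wrong direction
session.restore(bad)  # back to the state right before the bad follow-up
print(session.messages)
```

The point of the sketch is that the rollback discards the follow-up that derailed the chat while keeping the earlier context that was working well.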

That's the 10 tips. I said top 10 tips because it just sounds better than top 14 tips, but I actually have four more tips, so we'll go with four more. Tip number 11, this is just awareness. Cursor does indexing. That's one thing that I love about it. Every time you load your codebase, it will index at least 80% of your files, some of the huge ones it may skip on. This makes semantic search a lot better. Every time you make a change, whether you add a new file, modify a file, delete a file, it will go ahead and update the index accordingly. Tip number 12 is adding some Cursor Slack integration. This can be great for small fixes. Whenever you're asking for that PR review, your teammate might see some problem, and instead of either him telling you or having to go do it, you can just tag Cursor within Slack and tell it, go put extra quotations around this. For simple fixes, it makes it super easy. It can just make a PR for you.
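The index behavior from tip 11 (updating when a file is added, modified, or deleted) can be sketched with content hashes. This is only an illustration of incremental indexing in general, not Cursor's actual implementation. The function names and the in-memory dictionaries are assumptions.

```python
import hashlib


def content_hash(text):
    """Stable fingerprint of a file's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_index(index, files):
    """Compare the previous index with the current files.

    index: dict of path -> hash from the last indexing run
    files: dict of path -> current file content
    Returns (new_index, added, modified, deleted).
    """
    current = {path: content_hash(text) for path, text in files.items()}
    added = sorted(p for p in current if p not in index)
    modified = sorted(p for p in current if p in index and index[p] != current[p])
    deleted = sorted(p for p in index if p not in current)
    return current, added, modified, deleted


old_index, _, _, _ = diff_index({}, {"app.py": "print(1)", "util.py": "x = 1"})
new_index, added, modified, deleted = diff_index(
    old_index, {"app.py": "print(2)", "new.py": "y = 2"}
)
print(added, modified, deleted)
```

Only the paths in `added` and `modified` would need re-embedding for semantic search, and the paths in `deleted` would be dropped, which is why this kind of update stays cheap as a codebase grows.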

Tip 13 is Cursor Browsers. This is great. You can actually see the UI of whatever you're working on in another tab in Cursor. What's really great about it is it allows you to test the application and gives Cursor access to your console logs and network traffic. Especially if you're working on frontend, this is really great. Tip number 14 is YOLO mode. Typically you have to accept edits from Cursor and tell it it's ok to run this terminal command. If you turn on YOLO mode, you can just give it free rein. I don't highly recommend this. You might run into problems, but just so that you know it's there. One use case that it's particularly useful for, you can tell it to write tests for my code, then code, then run the tests, make sure it works, and then generate more tests that you think will be useful, and fix anything that comes up that isn't going as expected. That's Cursor, 14 tips.

## Claude vs. Cursor: Real-World Example Comparison

If Cursor is so great, why do we even need Claude Code to begin with? I want to go a little bit over Claude Code versus Cursor in a real-world example of where Claude Code might be used over Cursor. This was a feature I actually had to implement at work and I gave it to both Cursor and Claude Code using the same base AI model. For Cursor, it just selected some solution and it acted on it. It went ahead and implemented some non-optimal solution for me. On the other hand, with Claude Code, it searched the web for three different repos, three different open-source options. It had a really high-quality level of analysis and it ended up saving me a lot of hours on this project that I wouldn't have saved if I was solely using Cursor. I think that's where Claude Code really stands out compared to Cursor. Claude Code does a lot more thinking than Cursor does.

For Cursor, if it's like something quick you need to fix, if you want to switch between different LLMs, that's when you want to use Cursor. It also gives you visual bonuses, whereas Claude Code is fully on the terminal. If you have really complex features that require research, that's when you want to use Claude Code. It does burn a lot more tokens as an FYI. Claude Code tends to over-engineer things. If you are working on something simple, it's actually to your disadvantage to use Claude Code because sometimes it'll just fully get it wrong because of how over-engineered it goes. I think a mix of these tools would be the best for your toolkit.

## Maximizing Claude Code

Going into Claude Code, a lot of these will be similar tips to Cursor. I'm just going to keep this section short and only give a couple items. You can think of Claude as these main four core components. One is skills. These are like rules except they're auto-invoked. How we said Cursor has rules and you can choose how to apply them, these are those auto-invoked rules that Cursor has just on Claude. Then we have subagents. These are like specific workflows that you want to run. These are great because they have access to their own specific MCPs. Then there's commands similar to what we talked about in Cursor. Then, finally, plugins. Plugins are a way of distributing your packages so you can bundle together skills, agents, and commands into one package and then share it with your team or even people outside of your team. Going into skills, you can write something like a blog to HTML converter. When you're writing a blog, instead of having this format, it can go ahead and write it in your specific company format. Just like a simple use case of it.

Going into commands, similar to what we talked about before, you can create a PR command on Claude Code as well. Number three, this is where it becomes interesting and where the differentiation happens is subagents. You can create different subagents on Claude Code. All these subagents will have their own context windows. On top of that, you can give them their own set of MCP tools. For example, you might set up a PagerDuty investigation tool and you'll give it access to Datadog and Slack. Whenever you get an alert about something that's going on, it can read that, then go into Datadog, figure out what the issue is, and then try to give you the solution or the root cause. Or you can have a documentation subagent where every time you make a PR, it can tell, I need to update documentation for this and it does it in your company format. Or like a Karen subagent where it will go through and see if everybody finished their tasks that week. If they didn't, they'll notify them or eliminate them.

Those are subagents. Biggest piece being, they have their own context windows, which can be a benefit over using Cursor. Then finally we have plugins. This is just bundling everything up together and sharing it with your team. This is a visual representation. You get commands. You get agents. You get MCP servers. You pack that all together and send it off.

## Battle of the AIs

I recommended Cursor and Claude Code, but this is definitely opinionated. There's a lot of people in the industry who will recommend different tools. I think a couple of the top matchups are probably Cline against Cursor, and then Codex versus Claude Code. I like Cursor more specifically because of the Composer model that it has. That's very quick. That is one of the main reasons for choosing it. Then for Claude Code versus other ones, I think Claude Code does a particularly good job. Not only do I feel it implements the best solutions, it's really good at explainability. It's almost like they had education in mind, so when it does something, it really does a good job at telling you exactly what it did. Another cool one to watch out for, I think, is Google Antigravity, because this is the first time that one of the companies with the foundational models built an IDE and they have their own CLI. With the combination of all of these things, I think Google might end up winning in the long run, whereas the other ones haven't launched something like this. Something to keep in mind.

## Bonus: Documentation, PR Review, Evals, and More

Bonus, I want to go over a couple of different applications of using AI and different tools. One is DeepWiki. This is from Devin. This is AI for documentation for any repo. You just give it your GitHub repo link and it generates full documentation for you. Of course, it's not perfect, but it is great. Recently, I was working on some open-source code where it had literally one page of documentation for the entire repo, but they had DeepWiki set up for it. I would have spent so much more time reading through this codebase if that wasn't there. Especially if you have repos where you just have no documentation on, I highly recommend adding this. It has its own AI built within it, so you can ask questions based on whatever documentation it generated. Two is the AI Code Reviewer tool. I don't have an opinion on which one of these is the best. I've heard CodeRabbit is really great.

Just making you aware, it's great to have one of these for just catching small bugs, syntax errors, styling, things like that, and just giving a quick review on your PRs. Another one is just these low-code tools. I think part of our responsibility as engineers is to help out our non-tech workers and enable them with these tools. They might not know about it, but you do. n8n is great for this. Lovable is great for this. Sometimes even Cursor if they're a little bit tech-savvy. My top recommendation is n8n. This is like a workflow tool that you can use, and it has connections with thousands of different tools. It's just nodes. You can drag and drop whatever nodes you need and plug them together. You have the ability to create nodes that have actual code in them. If you want to help them as well, you can generate code in there. This is great. It has AI agent implementation, so they can pretty easily set up some email node that has an AI agent go through the emails and write all the results to some sheet, or simple tasks like that. I've seen many people rewrite workflows that save thousands of hours on these non-tech teams, so highly recommend that tool.

Another one is CLAUDish. If you really love using the terminal rather than using an IDE, but want to access different AI models, not just Claude, there's some open-source solutions. One of them is CLAUDish where it looks exactly like Claude Code, but you can access Gemini, Grok, DeepSeek, and Claude. I think it won't be the exact same as Claude Code actually is, but it does give you a little bit more of that deep thinking that it offers along with having access to different models.

Another thing is evaluating impact. I think it's super important that you evaluate if these AI tools or these AI solutions are actually working. In terms of metrics, I don't think there's any perfect metric for this. Some things that might be helpful to track are PRs merged, support tickets created, time to merge a PR, number of revert PRs. You can read the rest that are up there. I think it is important to just set up tracking of these because a lot of times we'll come up with a story, like a qualitative story that we see, and then we can help tell that story through the quantitative metrics that we've recorded. It's not like you want to always look at PRs merged over time, but at specific times you'll reference it for telling a story that you need to tell. Then there's also costs that come with this for all these different tools. I specifically didn't go into it for this specific talk, mainly because I think most of you will just be using company money. Not maybe as important. If you're personally using it, definitely check into the cost evaluations and the tradeoffs.
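For the metrics mentioned above (PRs merged, time to merge a PR, number of revert PRs), here is a minimal sketch of how they could be computed once you have exported your PR history. The `PullRequest` record and its fields are hypothetical. In practice you would fill them from your version control system's API or an export.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class PullRequest:
    title: str
    opened_at: datetime
    merged_at: Optional[datetime]  # None if the PR was never merged
    is_revert: bool = False


def summarize(prs):
    """Compute a few of the tracking metrics named in the talk."""
    merged = [pr for pr in prs if pr.merged_at is not None]
    hours = [(pr.merged_at - pr.opened_at).total_seconds() / 3600 for pr in merged]
    return {
        "prs_merged": len(merged),
        "median_hours_to_merge": median(hours) if hours else None,
        "revert_prs": sum(1 for pr in merged if pr.is_revert),
    }


history = [
    PullRequest("Add login", datetime(2025, 1, 6, 9), datetime(2025, 1, 6, 11)),
    PullRequest("Revert login", datetime(2025, 1, 7, 9), datetime(2025, 1, 7, 13), True),
    PullRequest("Draft idea", datetime(2025, 1, 8, 9), None),
]
print(summarize(history))
```

Computing the same summary per month or per quarter, before and after a tool rollout, gives you the numbers to back up the qualitative story when you need to tell it.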

Couple of other final notes. One is that same research paper from Stanford that I referenced earlier, they went further into their study to see on which specific tasks does AI do well and on which ones does it not do the best. What they found out is specifically in greenfield projects where you're starting a codebase from scratch, that's when AI really excels, and they saw 35% to 40% increases in productivity there. As your project becomes more mature, that goes down. As well as when your task complexity becomes harder, that percentage also comes down to where there might be no productivity gain. Another interesting thing is language popularity is also super important. If you're using Python and Java, these AIs typically have a lot better chance of doing it right, whereas if you're using either newer coding languages or super old ones where there's not as much information out there on, it will not do the best. I think that's important to keep in mind. Even when you're starting a new repo and choosing what coding language you want to use, maybe you want to use one that's more AI-friendly.

## Beyond Writing Code - Lessons Learned

Then, finally, I want to go beyond just writing code. A couple weeks ago, I was hosting this Exec Ed talk at Berkeley, and we had the CEO of Databricks, Ali Ghodsi, come on. He shared a story that I felt like was really important that I wanted to share. They build these data connectors at Databricks. Ali said that typically it takes them four quarters to build one of these data connectors. Then some new AI tool came out, and Ali himself actually went and tested it to build one of these connectors. He was like, in a day, I was basically able to build the whole thing. Maybe not 100% of the way there, but 90%. He said, I took this tool, passed it on to my team, and told them, let's start using this and cut the four quarters down to just one day. They went ahead, they reviewed it, and they came back and told Ali, yes, this tool's great. We're going to cut it down from four quarters to three quarters.

Ali's like, what? I just did it in a day. He wasn't sure. He gave some pushback, but ultimately, they were like, because the X, Y, Z process is like, it's not right. We can't get it down to a day. He kind of gave up on it. Then he said there was this one German employee that came in and revamped everything. He was actually able to go from launching one of these connectors in four quarters to launching 21 in just a single quarter. Some crazy productivity boost. He had some learnings to take away from it. One is that people are just people. Humans are resistant to change, even the best ones, even the greatest engineers. What's really important, I think, when trying to make some of these changes is having an outside set of eyes come in. Because they will have a fresh set of eyes, maybe have new things that they think of. Also, when they're not on that specific team, they might be more likely to want to implement changes. Whereas when you're on your own team, you might be a little bit more conservative.

Two is yaysayers versus naysayers. He said like, the people who said you can't do it come up with their logical reasons of why you can't, and they're right. Same with the people who are yaysayers. They say you can do it, and they come up with their own reasons. They also typically end up being right. His takeaway was, put the yaysayers in the power positions to lead for innovation. Which I think is pretty important, because even if you are pretty anti-AI, if the execs at these companies are putting the yaysayers in the power positions, maybe just even selfishly for your own career, you want to try to adopt it a little bit more. Three is that software is just that, software. What they really figured out with these connectors was that 80% of the process was actually just like the other stuff, going through customer interviews and writing design documents and things like that. Only 20% of the process was actually coding.

What they did is they removed the other stuff. They took more risk of getting the project wrong. They built a project, knowing that this software might be wrong, and we might have to rebuild it. Even then, it's just so much easier to reproduce this software now that we will take that risk and cut out all the interviews and things ahead of time. A bonus that he shared is just reassess all previously made assumptions. A lot of things work, and we think, ok, this is the way it works, this is the way it's been, and we don't really question it. Now with AI, a lot of those assumptions are no longer correct, and that's what we really need to be doing, going back and revisiting all these assumptions and seeing if they still hold true today. Then his last tip was, every company is dying to hire that German guy. For a lot of people out there, he recommended push and be AI forward, because every company really is looking for that AI yaysayer to come in and revolutionize all of their different workflows.
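The 80/20 split described above also helps explain why the team's first estimate only moved from four quarters to three. Here is a back-of-the-envelope sketch using Amdahl's-law style arithmetic. The function name, the `other_kept` parameter, and the speedup values are illustrative assumptions, not figures from the talk; only the four quarters and the 20% coding share come from the story.

```python
def remaining_quarters(total_quarters, coding_share, coding_speedup, other_kept=1.0):
    """Estimate delivery time when only the coding part gets faster.

    coding_share   -- fraction of the process that is coding (0.2 in the story)
    coding_speedup -- hypothetical factor by which AI speeds up coding
    other_kept     -- hypothetical fraction of non-coding work that is kept
    """
    coding = total_quarters * coding_share
    other = total_quarters - coding
    return other * other_kept + coding / coding_speedup


# Speeding up coding alone: the non-coding 80% dominates the schedule.
for speedup in (2, 10, 100):
    estimate = remaining_quarters(4, 0.2, speedup)
    print(f"coding {speedup}x faster -> {estimate:.2f} quarters")

# Also cutting most of the non-coding work (assumed here: keep 10% of it).
print(f"{remaining_quarters(4, 0.2, 10, other_kept=0.1):.2f} quarters")
```

With 80% of the process left untouched, no coding speedup can bring the estimate below 3.2 quarters. The large gains in the story only appear once the non-coding steps are cut as well.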

## Downfalls to AI

Then another thing I want to share is that AI may be absolutely imperfect. There are a lot of downfalls to AI. Sometimes it makes unintended changes. Sometimes you end up building suboptimal designs. A lot of times it hallucinates super confidently. We also might have skill erosion where we start to forget and get worse at certain things as we rely on AI more. Additionally, there might be new security threats, and dependency risks where somebody builds an entire codebase using AI, has no idea how it works, and then a problem comes up and everybody is lost. Obviously, there are tradeoffs with using these tools, as there are with anything in life or in coding, but I think the gains are probably higher than the negatives where we're at today. Speaking of absolutely imperfect, I actually saw this headline where Google Antigravity accidentally deleted somebody's entire hard drive data on their laptop, and it didn't have access to it, and it was just apologizing that it's deeply sorry. Just another example that there are downfalls. We should use these tools with caution and guardrails.

## Key Takeaways

Overall, for the key takeaways from this, one is that AI doesn't just speed up your coding workflow. Try to look at the tasks outside of that: writing documentation, writing PRDs, design files, things like that. These types of tasks are where we maybe see even more productivity gain. Two is that I hope all of you here today will try at least one AI-powered IDE and one AI-powered CLI. I think what might have happened with those of you who are not using AI at all is that you maybe tried it out a year ago, when it maybe wasn't all that it was advertised to be, and you were like, no, I'm not going to use this, and you were gone.

If you come back and revisit it, you might be shocked at how far these tools have come in the past couple months. Especially with Claude Opus 4.5, I feel it's just really amazing, so just give it a shot for one task and see how it goes. Three, hopefully you guys start to add some rules and skills to help with your repetitive tasks, even if it's starting with something as minor as making a PR with one of these rules. Start to build that library of these different rules and skills and share it with your team, and they will appreciate it as well. Then, finally, just reassess all previously made assumptions that you've had. You don't know what productivity gains you might be able to make.

## Questions and Answers

**Participant 1:** You talked a little bit about commands and skills for Claude Code. Could you say a little bit more about how you decide whether some functionality is better represented as a skill or a command?

**Sepehr Khosravi:** For a command, it's more when you want it to be something where you have to explicitly tell the AI, do this. That's when you would do a command. "Hey, I want you to make a PR." That would be a command. Skills are more the ones that are intelligently applied where you just don't want to have to tell the AI, know this stuff, but you want it to go and find out by itself. You can drop a rule in a certain folder for a certain UI and it won't always apply and you don't want to tell it to always apply, but the AI on its own will figure out, I should pick up this rule and use it.

**Participant 2:** This is about the brainrot thing, but more on the go. When I'm at my desk at home and I'm only like half programming, and I let AI do its thing, I realize I don't really have to be at my desk. It's completely irrelevant. I don't need a full keyboard or mouse. I tried using things like the CLIs on a phone to just do a full development stack, and it always fails me. One of the connectors won't connect to GitHub properly or just can't access some of my resources from the internet, and the AI usually just tells me I'm unwilling to do this right now, or wait three months. Have you found any solution that's sort of holy grail of having a developer in your pocket?

**Sepehr Khosravi:** If I'm understanding correctly, you said you tried it a little bit and then you tried using it on your phone, and then that's when you ran into problems?

**Participant 2:** Even on the desktop. I pretended I was on my phone on my desktop and I couldn't get that experience, basically.

**Sepehr Khosravi:** What tools were you exactly using?

**Participant 2:** I tried Claude, ChatGPT, Workspaces, and Cursor.

**Sepehr Khosravi:** I think part of it comes down to what I shared previously. It really depends on the task you're doing as well. If it is a brownfield task, a more complex task, those are the things that AI can't fully do right now. If it could, maybe none of us would be here. It should be able to do those simpler tasks. If it's not, it might be in your prompting, or not giving it enough context or giving it too much context. If you're connecting a lot of MCPs and the AI is having trouble figuring out which one to use, I would recommend cutting down and just turning on the specific ones you need to use. Hopefully you take away some of the tips here, try them out, and have better results. Of course, not for every task.

**Participant 2:** I'm not talking about complex tasks either. Like, just make a repo, run CI/CD, it would struggle, things like that.

**Sepehr Khosravi:** I see. For things like that, I've had a lot of success and I know a lot of people are having success. I would just recommend maybe giving it another shot, trying a little bit better prompting and seeing where it takes you. There definitely are developer productivity gains out there. There are tons of people who I see using these successfully. Hopefully a couple of these tips help.

**Participant 3:** When you talked about context a couple of times you talked about when you give too much context it can actually degrade the performance. I think you just mentioned it with MCP and also too many tools. I'm curious with respect to when you add context as part of rules, so the Claude Skills or the Cursor Rules, have you experienced whereby you can provide too many rules or too many skills and it actually overloads in terms of just the sheer amount of knowledge you can give it and it starts degrading? Where would you say if you have seen that there's a good balance in terms of providing it the right amount?

**Sepehr Khosravi:** I think it's an art based on your codebase where as you add more rules you get a sense of now it's not performing as well, maybe I want to cut back, and you figure it out on your own when you're doing this. That's what I would recommend. Add them incrementally and at a point where you start to see maybe this is not doing as well as I want it to, then try to subtract.

**Participant 3:** When you add a sheer number of rules and skills, are they automatically added into the context or is that more based on how the agent goes to grab that?

**Sepehr Khosravi:** This goes back to the earlier part where we talked about the different types of rules and how you can choose to apply them. There are some rules that you can choose to always apply. I don't recommend putting in a lot of those, because they get loaded into every context window. Those are just the absolute requirements that you want in every product. Then there are the apply intelligently ones. With those, you can add as many as you like, and when you start to feel like it's degrading, maybe pull some away. Then there's apply to specific files, and the same goes for that. Then there's apply manually. With those, just add as many as you like, and then it's up to you to decide when you want to add the context back. To the person who was struggling earlier, I think that's another great point. What I've seen a lot of times with people who are failing trying to use AI is they try to do everything in the same chat window and the context becomes rotten. Everybody, for every new task, just start a new agent. Don't get stuck in that same window typing over and over again.
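To make the four modes concrete, here is a minimal sketch of how a tool might decide which rules enter the context for a given task. This is not how Cursor or Claude Code actually implement it: the `Rule` class, the `select_rules` function, the mode names, and the rule names are all hypothetical, and the keyword-overlap check is only a crude stand-in for the agent deciding on its own that a rule is relevant.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class Rule:
    name: str
    mode: str  # hypothetical labels: "always", "intelligent", "files", "manual"
    description: str = ""
    globs: list[str] = field(default_factory=list)


def select_rules(rules, task, open_files, mentioned):
    """Return the names of the rules that would be loaded into the context."""
    task_words = set(task.lower().split())
    selected = []
    for rule in rules:
        if rule.mode == "always":
            # Loaded every time, so these cost context on every task.
            selected.append(rule.name)
        elif rule.mode == "files":
            # Loaded only when a file in play matches one of the globs.
            if any(fnmatch(path, g) for path in open_files for g in rule.globs):
                selected.append(rule.name)
        elif rule.mode == "manual":
            # Loaded only when the user explicitly names the rule.
            if rule.name in mentioned:
                selected.append(rule.name)
        elif rule.mode == "intelligent":
            # Stand-in for the agent's own judgment: keyword overlap
            # between the task and the rule's description.
            if task_words & set(rule.description.lower().split()):
                selected.append(rule.name)
    return selected


rules = [
    Rule("security-baseline", "always"),
    Rule("pr-template", "manual"),
    Rule("react-style", "files", globs=["*.tsx"]),
    Rule("db-migrations", "intelligent", description="database schema migration steps"),
]

print(select_rules(rules, "add a migration for the users table",
                   open_files=["src/ui/Button.tsx"], mentioned=set()))
```

Only the "always" rules are paid for on every request, which is why the advice above is to keep that set small and let the other modes pull rules in on demand.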

**Participant 4:** One thing I've noticed from my teammates who use AI tools more than me is they'll ship things that technically work but oftentimes the solutions are not as simple or not as elegant as what a human would come up with. We're not at this point yet but it feels like if you had a team that was consistently using AI tools for a year, two years, you would end up building up a lot more like tech debt, because the code was just not as clean. I'm wondering if you've run into that problem. If you think it's real or if you think that AI tools will keep getting better and we won't really have to deal with that.

**Sepehr Khosravi:** I think it's definitely true. I think it's really important that we now do code reviews with an extra pair of eyes more than we were before to make sure this doesn't happen. I think it really depends on your scenario too. If you're working at a startup maybe you're ok incurring this tech debt, but you're moving so much faster and it's better for your business. Whereas if you're at a larger company, you want to fight against this as much as you can, and I think you do that through PR reviews as much as you can and just building that practice as a team, of like, let's not ship out AI slop.

**Participant 5:** I was experimenting with Claude Code and the agentic workflows remotely, so I have an agent which goes to Figma, Jira, grabs all those things, connects to different MCPs and creates a PR. I was wondering like what kind of operational metrics can I track? Like you touched on that a little bit but I haven't been tracking anything like that. There are multiple steps that can be tracked to figure out if AI is using the right tools to do its thing, how do you track things like that when you're doing things remotely, creating PRs on a day-to-day basis?

**Sepehr Khosravi:** I generally recommend that whatever you can track, you probably just track, because it doesn't really hurt. For that specifically, tickets created, tickets completed, and time from ticket opened to ticket completed would be good ones to add. Then in general, with Cursor or some of these tools, if you're on enterprise, they give you an option to log what percentage of your code is written by AI. That's another good one to track. Then other simple PR and commit metrics, and things like that.
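As a rough illustration of the ticket metrics mentioned here, the sketch below computes tickets created, tickets completed, and the median time from open to completion. The record layout (`id`, `opened`, `closed`) and the sample tickets are made up for the example. In practice the data would come from an export of your own tracker.

```python
from datetime import datetime
from statistics import median


def ticket_metrics(tickets):
    """Summarize ticket throughput and open-to-completed cycle time."""
    completed = [t for t in tickets if t["closed"] is not None]
    cycle_hours = [
        (t["closed"] - t["opened"]).total_seconds() / 3600 for t in completed
    ]
    return {
        "created": len(tickets),
        "completed": len(completed),
        # None when nothing has been completed yet.
        "median_cycle_hours": median(cycle_hours) if cycle_hours else None,
    }


# Hypothetical sample data: two completed tickets and one still open.
tickets = [
    {"id": "T-1", "opened": datetime(2025, 1, 6, 9), "closed": datetime(2025, 1, 6, 15)},
    {"id": "T-2", "opened": datetime(2025, 1, 7, 9), "closed": datetime(2025, 1, 8, 9)},
    {"id": "T-3", "opened": datetime(2025, 1, 8, 9), "closed": None},
]

print(ticket_metrics(tickets))
```

The median is used rather than the mean so that one long-running ticket does not dominate the number. That is a design choice for this sketch, not something the talk prescribes.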

**Participant 6:** Do you have any insight on UX and UI tools that may exist out there?

**Sepehr Khosravi:** I'm not a frontend engineer, so I don't do the most of that. On the side, I do a little bit. The main insight that I have is that Gemini, although I don't think it's the best coding model, it's the best for UI right now. That's what I recommend. I don't do frontend, so I don't have the best advice on that.

**Participant 7:** You mentioned earlier that you use that like multiple models feature in Cursor to compare the new models. Then, after you run that, what do you look for to say like, I like what this thing did more than this one. There's the raw output, but I feel like there's also how well did it follow the instructions, did it do refactors? What are you looking at?

**Sepehr Khosravi:** We have a SWE benchmark that most of these AI tools get run against when they first release. It measures how well a model completed a set of 500 tasks that we've defined as software engineers, which every AI should be tested against. I don't think it's a great benchmark, but it is the best one we currently have, because otherwise there's no way to really test a model other than to look at it with your own eyes and see how you like it. I just think it differs by everyone. Who likes Claude the most out of all the AI tools? Who likes Gemini the most? Who's using Gemini the most? Who's using ChatGPT the most? Everybody likes something different. I think it really just comes down to your personal taste: how much did you like the response that it gave, not just the response code-wise, but maybe how well it explained it to you as well, and how fast it was. Make those tradeoffs for yourself and decide.

**Participant 8:** Given all these tools and all these ecosystems, do you see any use case where actually typing in code is still the best solution? Or in other words, when should we not use these tools?

**Sepehr Khosravi:** I think it comes down to those two things. One is those brownfield tasks where it's really complicated, maybe using coding languages that aren't as common. You might just be wasting time using AI on those. Two, it depends on the project. If it's a project that's really important, really critical to the company, and you want to make sure you understand everything really well, maybe you don't want to use AI for that either, if the cost of something failing is high. Then, yes, I wouldn't use it in that scenario.

**Participant 3:** Based on that quick poll there where everyone does have their own favorite agent or LLM, for me it feels like we're going back to almost like everyone having their favorite IDE and you'll see a lot of companies that are probably only going to say we want everyone to use IntelliJ. Others will say, no, just choose what will make you most productive. In a multi-agent organization, what do you feel like we are most missing to allow teams to choose whichever agent they want? An example there being, if we go back to rules, you'll have Claude Rules, you'll have skills, and everything being different formats.

**Sepehr Khosravi:** I touched on this a little bit earlier: that AGENTS.md format that most of the companies are starting to adopt, I think that's going to be the best bet. I hope that everybody ends up adopting that. In terms of IDEs, I think it just comes down to the company. Having to pay for a bunch of different IDEs probably doesn't make sense, so each company will probably just end up selecting one, but in a perfect world, we'd have access to any of the IDEs that we want to use. I don't know what the solution to that is exactly. I have my recommendation. I would say use Cursor and Claude Code. I think those two are the best.

## Resources

I also have a couple of links up here, one is if you want to stay connected, there's a link with all my links up there. Two is this Claude Plugins Marketplace you could scan, if you just want to look at some different Claude Commands, and then one for Cursor Rules as well.
