Presentation: Platform Teams Enabling AI - MCP/Multi-Agentic Tools Across LinkedIn

wpnews.pro

Transcript #

Karthik Ramgopal: I am Karthik from LinkedIn.

Prince Valluri: I'm Prince from LinkedIn.

Karthik Ramgopal: We're going to be talking today about how we are using AI to accelerate how we do engineering at LinkedIn. At LinkedIn, our vision is to create economic opportunity for every member of the global workforce. Our mission is to connect the world's professionals to make them more productive and successful. What does this translate into? Pretty massive scale. We have 1.3 billion members. A large percentage of the world's population is on LinkedIn. We make 17,000 connections every minute on LinkedIn. We have close to 2 million feed updates being served every minute. We serve the site in 36 languages. When I say site, I mean the apps as well. How does this translate into engineering scale? Again, pretty massive. We have about 7,000 deployables. We serve about 3.2 million peak QPS. We exchange 45 trillion Kafka messages every day. We have 10,000-plus repositories. We submit over a million PRs every year. This is even before a bunch of this agentic AI stuff landed. Pretty massive. What this means is that if we boost engineering efficiency, we can unlock a lot of business value, because engineering is at the heart and core of what we do.

Historically, how have we tried to fix it? Pretty conventional ways. We wrote some CLI tools, ranging from simple scripts to pretty sophisticated automation tools that you can go run. We built self-serve UIs so that our people can go to mostly web-based UIs and do what they want. Last but not least, we invested a lot in documentation. This is wikis and docs and other things. No documentation is useful if you cannot search for it, so we indexed it all into search systems. The fundamental problem with all of this is that there is absolutely no cognition in any of this. The human is still doing the cognition. The human is still doing a bunch of repetitive hands-on work whenever there are cognition loops involved in between, because the moment you got a cognition loop, you simply could not automate it before. With AI, we have an opportunity to change how we do this.

AI is the New Execution Model for Engineering #

Prince Valluri: The hardest part of engineering right now is not writing code. It's writing the right code, which requires coordinating across different tools and systems and teams. We're moving from humans doing all of this work manually and stitching together everything, to humans expressing intent and the systems executing that intent more reliably. Intent is what we want to change, what we want to fix, what we want to migrate, what we want to understand. We'd like it to be pretty explicit and structured. A plan is a system-translated step-by-step flow. It's about what systems are supposed to be touched, in what order. What tools are needed? What can fail, what should not fail? Execution is where the orchestration, the environment, and the governance become very critically important. Validation is where trust is really built. It ensures that every step is validated, whether it is through builds, or tests, or static analysis, or other kinds of safety checks like evals. Finally, output is a response or a reviewable artifact, like a pull request with full traceability.

Let's unpack all of these. Before that, once we say AI is an execution model, the next question is, where do we apply this? We focused on areas where three things are true. The work is repeatable. The cost of coordination is very high. The outcome can be relatively easily validated. This led us to three primary surfaces. Development. Now here, with coding, the goal is to offload mechanical, repeatable, well-scoped coding tasks. Migrations are a great fit because it's effectively the same change repeated over and over again. Testing is already quite structured, which makes it easy for agents to reason about it. With operations, deployment does not mean pushing the button; it is more about preparing everything around it, like validating readiness, reviewing configs, surfacing risks. Reliability and resilience management is about pattern recognition. Agents can continuously monitor logs and alerts and spot issues.

During incidents, humans spend a lot of time gathering context, and agents can really help compress that by assembling timelines and correlating signals and suggesting next steps. Information retrieval is very interesting. We all have some level of code search, but I think code search is a bit more semantic. Engineers constantly ask, where is this used? Is there another implementation of what I'm trying to build here? That's been very helpful. For analytics, a lot of the insights already exist in our data systems. Agents make it easier for us to formulate those queries, validate results, and very importantly, explain them. Finally, troubleshooting often requires stitching together clues from multiple systems. Agents can really help maintain institutional knowledge for everybody. A common pattern here is that we are aiming to use agents for execution as much as possible and leave the control and judgment for humans.

Let's look at what's common across these use cases. Scaling AI in engineering is not about building one great agent; it is about avoiding 100 slightly different, fragile implementations. Without shared foundations, every team will solve the same problems again and again, and the results are going to be very inconsistent. It turns out there's a lot in common. Starting with orchestration, every agent-driven workflow needs a reliable execution backbone. This includes a standard runtime, scheduling, retries, and the ability to ask humans for help. Moving on, the agents are only as good as the context and tools we give them. They need consistent access to engineering knowledge and tools that are safe and auditable and designed for machine invocation. Finally, nothing really earns trust without evaluations. Continuous testing, golden tasks, regression detection, and full traceability are completely non-negotiable. This is how we make sure that the agents improve over time.

This is the trap we want to avoid. Every team building their own version of orchestration, tools, context, and safety. When this happens, we just get very expensive repetition. This is why enabling AI cannot be an individual team-by-team effort, it is a platform responsibility.

How do we use these agents? Agents are only as good as how well we specify our intent. Human intent is rarely precise. We all start with a goal in our head and we express it incrementally, with assumptions and shortcuts and missing context. We rely on back-and-forth clarification and intuition. When we say things like, you know what I mean, or, do the usual thing, agents don't have intuition. From their perspective, what's the usual thing? They don't know what to do unless we say it. Ambiguous intent often leads to hallucinations, unsafe changes, scope creep, and just inconsistent outcomes. For agents to be reliable, intent has to be very explicit. What's in scope? Very importantly, what's out of scope? What systems can be touched, cannot be touched? What does done mean? Structure brings clarity. This is how we get our agents to stay on point.

We have to turn our messy intent into structured specs that can be safely executed. This is not a design doc. This is not a 30-page RFC. It's a structured description of what needs to happen, not how to write code. Let the agents figure that out. For steps, we break the work into clear, ordered steps. This helps the agent plan and sequence the work, and work deterministically and help humans understand what will actually happen. We explicitly state what tools can be used. The agents will operate within this defined capability set. Acceptance criteria is really the definition of done. Additionally, there are obvious signals like test passing and green builds, no new warnings. Finally, guardrails are just as important as steps. This defines what is out of scope, what systems should not be touched, and this really prevents surprise behavior. This works because once the spec is defined, the machine handles the mechanics of planning, execution, retry, validations, and the humans stay focused on intent and judgment.
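To make the shape of such a spec concrete, here is a minimal sketch in Python. The talk does not show LinkedIn's actual spec format, so the `AgentSpec` class, its field names, and the example values are all hypothetical; the sketch only mirrors the four parts named above (steps, allowed tools, acceptance criteria, guardrails).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical structured spec: what should happen, not how to code it."""
    intent: str
    steps: tuple[str, ...]
    allowed_tools: frozenset[str]
    acceptance_criteria: tuple[str, ...]
    out_of_scope: tuple[str, ...] = ()

    def problems(self) -> list[str]:
        """Return what is missing; an empty list means the spec is usable."""
        found = []
        if not self.intent.strip():
            found.append("intent is empty")
        if not self.steps:
            found.append("no steps given")
        if not self.acceptance_criteria:
            found.append("no definition of done")
        if not self.out_of_scope:
            found.append("no guardrails stated")
        return found

    def tool_allowed(self, tool: str) -> bool:
        """The agent may only use tools the spec explicitly lists."""
        return tool in self.allowed_tools


# Illustrative values only; none of these names come from the talk.
spec = AgentSpec(
    intent="Upgrade library X to the next major version in this repo",
    steps=("update the dependency", "fix compile errors", "run tests"),
    allowed_tools=frozenset({"code_search", "dependency_graph", "run_build"}),
    acceptance_criteria=("tests pass", "build is green", "no new warnings"),
    out_of_scope=("deployment configs", "other repositories"),
)
```

Checking `spec.problems()` before execution is one way to reject a vague request early, before any agent starts guessing at scope.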

Orchestration Layer #

Let's talk about orchestration. This is what the orchestration layer looks like from a high level. Rather than describing these boxes abstractly, let's walk through what happens when a coding agent runs. The interaction starts in a familiar place, like a chat or an API. The developer submits a spec, and from there on out, everything happens async. The orchestrator spins up a sandbox with the agent, with identity and permissions and a consistent execution environment. This keeps the behavior repeatable. Then the agent analyzes the code, understands the task, updates multiple files, runs tests, fixes issues. Nothing's perfect the first time. All of the state is persisted, so if the job pauses or retries or resumes later, nothing really is lost. The agent is then able to reach out to various systems through the tools that we've given it, such as code search or dependency analysis, and every tool call is scoped and permission checked and always logged for audit.

In reality, systems fail all the time. Tests flake, builds fail, repos may be unavailable. All of this is very natural. The orchestrator is responsible for handling retries and backoffs, so the agent can just focus on the task at hand, and this is very critical for reliability at scale. Finally, like I said, every step, every tool call is always logged for the agent plan, what tools it called, what outputs it saw, what reasoning it did, and what code it changed. When it creates the PR, the entire trace comes with it, so whoever is reviewing the PR can always check what happened. This is where orchestration really matters. The whole system becomes a more reliable execution machine when we have that. The same orchestration model applies whether the agent is coding, or debugging, or responding to incidents, or anything else.
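The retry-and-trace behavior described above can be sketched in a few lines. This is an illustration under assumptions, not LinkedIn's orchestrator: `run_with_retries`, the trace record fields, and the exponential backoff schedule are all hypothetical choices.

```python
import time


def run_with_retries(step_name, action, trace, max_attempts=3,
                     base_delay=0.01, sleep=time.sleep):
    """Run one step, retrying with exponential backoff and logging each try."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            result = action()
        except Exception as exc:
            # Record the failure so a reviewer can see it in the trace later.
            trace.append({"step": step_name, "attempt": attempt,
                          "status": "error", "detail": str(exc)})
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
        else:
            trace.append({"step": step_name, "attempt": attempt,
                          "status": "ok", "detail": ""})
            return result


# A stand-in for a flaky test run: fails twice, then passes.
calls = {"n": 0}


def flaky_test_run():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("flaky test")
    return "tests passed"


trace = []
outcome = run_with_retries("run_tests", flaky_test_run, trace,
                           sleep=lambda seconds: None)
```

Because the retry loop lives in the orchestrator rather than in the agent, the agent's own logic stays focused on the task, and the trace still shows every failed attempt.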

Let's take a very specific use case and see what happens. The request is something familiar: upgrade a library version across all impacted repos. This is exactly the kind of task that is very painful for humans, I'm sure everybody here resonates with that, but it's perfect for agents. The first thing the system does is pull the dependency graph and figure out which repos are impacted. That's very deterministic. From that point onward, the orchestration takes over and provisions a sandbox for every repository. Within each sandbox, the agent follows the same structured flow: create a plan, make changes, update the dependency, run builds and tests, ensure nothing breaks. Once all validations are good, create a reviewable PR. At all points, the agent has access to certain tools, which we explicitly gave it for this task, and it can use them to understand the task better. As always, we log every tool call.
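The deterministic first step, finding impacted repos from the dependency graph, can be sketched as a breadth-first walk over reversed dependency edges. The graph shape, the repo names, and the choice to include transitive dependents are assumptions made for illustration; the talk does not say how LinkedIn computes this.

```python
from collections import deque


def impacted_repos(depends_on, library):
    """Return every repo that depends on `library`, directly or transitively."""
    # Invert the edges: for each dependency, who depends on it?
    dependents = {}
    for repo, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(repo)

    seen = set()
    queue = deque([library])
    while queue:
        node = queue.popleft()
        for repo in dependents.get(node, ()):
            if repo not in seen:
                seen.add(repo)
                queue.append(repo)
    seen.discard(library)  # guard against cycles that lead back to the library
    return seen


# Hypothetical graph: repo -> set of things it depends on.
graph = {
    "feed-api": {"lib-core"},
    "feed-web": {"feed-api"},
    "search": {"lib-other"},
}
targets = impacted_repos(graph, "lib-core")
```

Each repo in `targets` would then get its own sandbox, so the per-repo flow is the same whether the set has one entry or a hundred.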

This really is the difference between AI helping you once and AI helping you every time. The same workflow runs whether it is one repo or 100. Why does it matter? Without orchestration, the same task can produce different outcomes depending on timing or environment or order of execution. Having good orchestration ensures that, at least for the most part, the same intent produces the same outcome. We also don't want rogue or fickle agents, so we want to ensure that every tool is explicitly made available and then every tool call is scoped and permissioned. We're able to give power to the agents without giving them too much authority. Failure patterns. Like I said, failure always happens in the real world. What we want to do is make these failures completely non-events. Having consistent failure handling and retries just helps us achieve that.

Agents, if we have agents that are defined in the same framework and speak the same language, it makes them easier to interop with each other. This allows them to work together where appropriate. More importantly, it allows us to observe them in the same manner. Finally, development is where the platform pays off. Teams are no longer trying to build the same plumbing and instead get all of the platform benefits for free. They're free to focus on the core problem that they are trying to solve. With good orchestration, the agents are not only more usable, but it also makes them more reliable and helps us make them more governable.

Tooling Layer #

Orchestration is great, but the real work gets done when the agent makes tool calls. You can have the best orchestration engine in the world, but if you don't have a way to safely manage the tools exposed to the agent, the outcomes are going to be poor. Whatever the agents are doing, they're only doing that through the tools. Let's talk about the tooling layer here. What's important for tools is schemas and predictability. Agents are not great at guessing. We do not want them to guess. We focus on giving them structured inputs and outputs. These need to have the same shape and form for every invocation to keep things more consistent and more deterministic. This also means that we need to have them versioned so we can safely evolve them over time without changing too much. For safety, we don't really want agents to hit APIs directly. It creates room for ambiguity.

They always go through approved tools with clear permissions and boundaries. Read versus write is always explicit. Privilege escalation is always explicit. At the risk of sounding like a broken record, every tool call is always logged. Human tools are built around UIs and dashboards. Agent tools have to be built around machines. What does that mean? It means we need to have proper error codes and stable interfaces and rich context instead of just raw dumps. If there is noise, the agent will guess. We know when that happens, it just leads to horrible hallucinations.
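A small sketch of what "approved tools with explicit read versus write, error codes, and logging" might look like. Everything here is hypothetical: the `Tool` class, the `call_tool` wrapper, the error code strings, and the in-memory `AUDIT_LOG` are illustrative stand-ins, not the platform's real interfaces.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    name: str
    writes: bool                    # read versus write is explicit
    required_args: frozenset        # minimal stand-in for an input schema
    handler: Callable[..., Any]


AUDIT_LOG = []


def call_tool(tool, args, can_write):
    """Validate, permission-check, run, and log a single tool call."""
    def finish(code, data=None):
        # Every call is logged, including rejected ones.
        AUDIT_LOG.append({"tool": tool.name, "args": dict(args), "code": code})
        return {"code": code, "data": data}

    if set(args) != set(tool.required_args):
        return finish("INVALID_ARGS")
    if tool.writes and not can_write:
        return finish("PERMISSION_DENIED")
    try:
        data = tool.handler(**args)
    except Exception:
        return finish("TOOL_ERROR")
    return finish("OK", data)


search = Tool("code_search", False, frozenset({"query"}),
              lambda query: [f"match for {query}"])
open_pr = Tool("open_pr", True, frozenset({"title"}),
               lambda title: f"PR: {title}")

r_read = call_tool(search, {"query": "LlmClient"}, can_write=False)
r_denied = call_tool(open_pr, {"title": "Bump lib"}, can_write=False)
r_invalid = call_tool(search, {}, can_write=False)
```

The response always has the same two keys, so the agent sees a stable shape whether the call succeeded, was denied, or failed validation.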

A critical advancement in the world of tooling lately has been MCP. The Model Context Protocol by Anthropic makes it easy for us to predictably declare exposed tools and for agents to invoke those tools. The first reason it's so powerful is it gives us model independence. Different agents powered by different models can all use the same set of tools without any coupling to proprietary function calling formats or specific APIs. Every week, there's a new model at the top of the benchmarks, and we would do well to avoid getting locked into a specific model. The second idea is that tools are more than just functions. They're more like capabilities. Each capability comes with permissions, retry logic, schema validation, and observability hooks, all built right in, so when the agent invokes the tool, it's actually getting data and safety and structure and a contract. We can bake all of that right in.

Some of the examples of tools that we use pretty widely internally are code search tools, like semantic search and dependencies, which help the agent understand code better, especially in multi-repo environments like ours. Observability tools include access to logs and metrics, alerts, and incidents for operations. Knowledge tools like PR history, ownership, and system architecture just help ground the agent in reality. What's reality? This is one of the biggest lessons we've learned. When an agent does not do very well, our first instinct is to try a better model. This only takes you so far. An agent does not know what it does not know. Most failures do not come from a lack of reasoning ability. They come from missing, stale, or just incomplete facts. Even with great tools, the agent can only act on what it knows.

Context #

The next layer is about making agents smarter. It's about grounding them in engineering reality, what we commonly refer to as context. When we say agents need more context, this is what we mean. On one side, we have an agent operating in isolation. It could be using a very smart model, which also sounds very confident. We know that without the right facts, it's just going to start guessing or hallucinating and just produce shallower results. It doesn't know how your system is built and anything that isn't in the repo already. On the right, the model is surrounded by more reality: code, dependencies, tests, ownership, infra, operations. It actually has something to reason about and fewer chances of guessing. Not zero, but few. Context is not really just one thing. It's multiple signals coming from different sources that together ground the agent. The factual knowledge that we provide to our agents today to make them operationally better is this.

We're a heavily multi-repo engineering organization, so the code is split into thousands of smaller, and some not-so-small, repos. When an agent is changing something, it needs to be aware of the code, the dependencies, what can break if something changes. Information from past code changes always helps: how similar problems were solved in the past, what reviewers cared about, what broke before. An experiment we did here was we took a bunch of recent PRs and created a semantic understanding out of it. If we asked the agent to upgrade, let's say, Gradle to version 9, it can easily find other PRs that did that in the past in other repos and identify patterns that are generally followed for migrations like that, or dependency updates like that, which has been really helpful. All of these signals and information come from different systems, and together they create a comprehensive knowledge base for our agents to interact with. Given the complexity of this, this just reinforces the point that this should be part of the platform and not an individual team's responsibility.

Now that we have context, how do we serve it? The answer is not all at once. We start with everything that's available, let's say our information universe: code, dependencies, PRs, what have you. This is far too large and far too noisy to hand over to the agent all at once. The first step is scoping. Based on the task, we narrow down this universe to only the repos involved and only the services impacted. This immediately removes a lot of irrelevant information. Then we try to get more precise based on the step that the agent is currently on. Let's say during planning, the agent gets access to the dependency graph and ownership information. With specific context, the agent can take concrete actions. This is essentially how we avoid hallucinations and overloading the agent with context. Orchestration figures out when to fetch info, tools figure out how to fetch info, and all this context controls what the agent knows.
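The two-stage narrowing described above (scope by task, then by step) can be sketched as a simple filter. Only the planning row, dependency graph plus ownership, comes from the talk; the other step names, the `kind` labels, and the item layout are assumptions for illustration.

```python
# Which kinds of context each step may see. Only "planning" is from the talk;
# the "coding" and "validation" rows are hypothetical.
CONTEXT_BY_STEP = {
    "planning": {"dependency_graph", "ownership"},
    "coding": {"code", "past_prs"},
    "validation": {"tests"},
}


def select_context(universe, task_repos, step):
    """Keep only items for the task's repos and the current step's needs."""
    kinds = CONTEXT_BY_STEP.get(step, set())
    return [
        item for item in universe
        if item["repo"] in task_repos and item["kind"] in kinds
    ]


universe = [
    {"repo": "feed-api", "kind": "dependency_graph", "data": "feed-api deps"},
    {"repo": "feed-api", "kind": "code", "data": "feed-api sources"},
    {"repo": "feed-api", "kind": "ownership", "data": "feed team"},
    {"repo": "search", "kind": "dependency_graph", "data": "search deps"},
]

planning_context = select_context(universe, {"feed-api"}, "planning")
```

An unknown step gets no context at all in this sketch, which errs on the side of giving the agent too little rather than too much.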

Memory - Grow More Intelligent Over Time #

There are also things that happen over a period of time, this is where memory comes in. Memory is how agents get better with use. Not smarter in the general sense, but more reliable, more accurate, and more aligned with what we expect out of them. How does it work? Everything still starts with an input spec or prompt. Before execution, the agent pulls relevant memories, past successes, known failure patterns, common fixes, reviewer feedback, only what's relevant to the task. This memory is combined with the fresh task-specific context and helps with the planning. LLMs are stateless by nature. They need to be provided with all the right context in each call. When the model is triggered, it knows both what's true now and what it has learned before. This really reduces mistakes. After an execution, the system decides if it has seen anything new that's worth remembering.

If a fix worked, or a reviewer accepted the changes, or tests were broken because of a change that should not be made again, that is worth remembering. It's a critical feedback loop that helps agents get better over time. There are different kinds of memories. First, we have working memory, which is very short-lived and task-specific. Things like message history, intermediate reasoning, immediate task context. It only exists for the duration of the task, and then it's gone. Next is long-term memory. This is where learning happens. This is where validated lessons live. Procedural knowledge, like this migration pattern generally works, or episodic knowledge, where a certain class of change always causes a certain set of tests to break. Don't do that. At the bottom is collective memory. This is shared across all agents and all teams. These are patterns, conventions, best practices. This is essentially the institutional knowledge.
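The three tiers can be sketched as follows. This is a toy model under assumptions: the `AgentMemory` class, tag-based recall, and the example lessons are hypothetical, and a real system would likely use semantic retrieval rather than tag overlap.

```python
class AgentMemory:
    """Toy model of working, long-term, and collective memory."""

    def __init__(self, collective):
        self.working = []             # lives only for the current task
        self.long_term = []           # validated lessons for this agent
        self.collective = collective  # shared across agents and teams

    def note(self, message):
        self.working.append(message)

    def recall(self, task_tags):
        """Pull only the lessons relevant to the task at hand."""
        pool = self.long_term + self.collective
        return [lesson for lesson, tags in pool if tags & task_tags]

    def finish_task(self, lesson, tags, validated):
        """Keep a lesson only if it was validated, then drop working memory."""
        if validated:
            self.long_term.append((lesson, set(tags)))
        self.working.clear()


shared = [("Follow the team's Gradle conventions", {"gradle"})]
memory = AgentMemory(shared)
memory.note("plan: bump the Gradle wrapper")
memory.finish_task("Bumping Gradle breaks plugin Y tests",
                   {"gradle", "migration"}, validated=True)
recalled = memory.recall({"gradle"})
```

The `validated` flag reflects the point made above: a lesson enters long-term memory only after a signal such as an accepted review or a confirmed test failure.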

Autonomy and Human in the Loop #

When the agents are actually working in the execution environment, the first concern often is, what if they go too far? This is why sandboxes are so important. We want to give the agent freedom, but put guardrails in place to prevent escalation. Within the sandbox, they can read and write files, query dependencies, run builds, validations, push changes to PR branches for reviews, everything that gives them enough autonomy for meaningful engineering work. What they cannot do is deploy changes to any environment, whether it's staging or production, merge changes to the main branch, or make direct system calls for irreversible actions, or have unrestricted internet access. This separation is very intentional. The autonomy lets the agent move fast, but the lack of authority keeps them safe. What do we do about this gap? This is where humans come in. They are in the driver's seat to make the decisions and use their judgment to influence the direction of the agents.

It's not really micromanagement. It's a way for humans to intervene where appropriate and do what they are best at doing, which is decision-making. Human in the loop doesn't mean humans are doing the work of the agents, but they're doing higher leverage work. First, humans ensure accuracy. They review outputs, not each step. Things like, is the change correct? Did the change meet the intent? Second, humans provide control, approvals or rejections. Should something be merged? Should something be deployed? These are always reserved for humans. Finally, humans teach the system. This is where our institutional knowledge grows. Every feedback becomes training data and helps make the system better. Every approval, every rejection tells the system what good looks like or what bad looks like. The key idea here is very simple. Instead of bolting humans onto agent systems, we just design for them. Agents run continuously. They plan, execute, validate, and keep making progress without constant supervision.

There are explicit points, places in the workflow where judgment is required, not because the agent failed, maybe it did, but because authority is needed. At these points, humans step in, review artifacts, make decisions, resolve ambiguity. Once the decision is made, the agent continues. This just ensures that agents don't run completely unchecked.
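The split between autonomy and authority can be sketched as an authorization check. The action names and the three outcomes are hypothetical; the grouping follows the talk, where sandboxed work proceeds freely and merging or deploying waits on a human decision.

```python
# Actions the agent may take on its own inside the sandbox.
AUTONOMOUS = {"read_file", "write_file", "run_build", "run_tests",
              "push_pr_branch"}
# Actions that require human authority before anything proceeds.
NEEDS_HUMAN = {"merge_to_main", "deploy"}


def authorize(action, human_approved=False):
    """Return 'allow', 'wait_for_human', or 'deny' for a requested action."""
    if action in AUTONOMOUS:
        return "allow"
    if action in NEEDS_HUMAN:
        return "allow" if human_approved else "wait_for_human"
    # Anything not explicitly listed is denied by default.
    return "deny"


decisions = {
    "run_tests": authorize("run_tests"),
    "merge_to_main": authorize("merge_to_main"),
    "merge_after_review": authorize("merge_to_main", human_approved=True),
    "open_internet": authorize("unrestricted_internet"),
}
```

Denying unlisted actions by default keeps the gate explicit: a new capability has to be added to one of the two sets on purpose.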

Invocation Modes - Not All Invocations are the Same #

Now let's talk about invocation modes.

Karthik Ramgopal: There are actually different ways to invoke these agents. We're going to talk a little bit about it right now, and we will see examples later. The first invocation mode is the common, popular one, which is online invocation. This is similar to any other chatbot-like system, where you make a synchronous request and wait for responses at the other end. Latency is really important because you have a user waiting at the other end. A common technique used here is to use streaming progress updates to hide the latency. Either you show thinking states or you just stream the final response itself. Another pattern is nearline invocation, where the agent gets triggered by environmental changes of interest and starts executing. Now here, throughput is obviously way more important than latency. More importantly, you need a way to asynchronously notify a human at the other end without the human literally sitting in front of a UI.

Last but not least, you have batch invocation, which is triggered on a schedule via periodic runs. Typically, here, throughput matters even more, relative to latency, than it does in the nearline mode. Again, you still need the targeted asynchronous user updates. These may mirror traditional compute paradigms, which we've seen for deterministic software programs. That's for a reason, which we will see later.

Pick The Right Model #

How do you pick the right model? When I say model, a lot of people think I must be talking about AI models. Partially. It's very important to pick the right execution model even before getting into AI models. The first question you need to ask yourself is, do I need reasoning? If you don't need reasoning, do not use AI. Just write code or write rules. You can use AI to generate the code or generate the rules to ease your process, but the result is way cheaper, way more deterministic, way faster. I sometimes say, move AI to the left instead of to the right. Let's say you do need reasoning at inference time. You have to ask yourself then, what are the quality, scale, and latency characteristics you're looking for? If you're able to use a commercial cloud-hosted model, go for it. If you cannot, because it does not meet your quality, scale, and latency criteria, you need to do something custom, so start picking a custom model.

Even for the custom model, you have a customization pyramid. Everyone is like, I'm going to train my own model. No. Bad idea. Do not do that. Start at the bottom, which is to pick a commercial model which you're just accessing as inference over an API, ideally. Try to do whatever you can to make it work. Retrieval-augmented generation, cache-augmented generation, and there are very sophisticated ways of doing this with knowledge graphs and things like that. Try all of that. Let's say that doesn't work. Start doing post-training optimization. This can range from something as simple as supervised fine-tuning to more sophisticated forms of reinforcement learning with human feedback and preference optimization algorithms. Again, be really careful, because every time you do this, you are signing up for a lot of operational overhead, maintenance overhead, and more importantly, upkeep overhead because the models are evolving so fast. Let's say even this does not work, then go down the pre-training route.
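The decision flow from the last two paragraphs can be written down as two small functions. The function names and the string labels are hypothetical; the ordering (no reasoning means code or rules, then a hosted model, then the customization pyramid from cheapest to most expensive) follows the talk.

```python
# Escalation order within the customization pyramid, cheapest first.
CUSTOMIZATION_LADDER = (
    "retrieval_or_cache_augmentation",
    "post_training",
    "pre_training",
)


def pick_execution_model(needs_reasoning, hosted_model_fits):
    """Choose the execution model before thinking about any AI model."""
    if not needs_reasoning:
        return "code_or_rules"
    if hosted_model_fits:
        return "commercial_hosted_model"
    return "custom_model"


def next_customization(already_tried):
    """Return the cheapest option not yet tried, or None if all were tried."""
    for option in CUSTOMIZATION_LADDER:
        if option not in already_tried:
            return option
    return None


choice = pick_execution_model(needs_reasoning=True, hosted_model_fits=False)
first_step = next_customization(set())
```

Encoding the ladder as an ordered tuple makes the "do not jump straight to training your own model" advice mechanical: each rung is reached only after the ones before it have been tried.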

Trust Comes from Consistent, Repeatable Results #

Trust in any of these systems obviously comes from consistent, repeatable results. How do you establish trust? In a traditional deterministic software program, we say we go write tests. Here evals take the place of tests. How do you structure your evals? Do not believe in vibes. Vibes are not good enough. You need curated golden datasets which explore all possible permutations and combinations of inputs and outputs. You need objective signals as much as possible. If you cannot create objective signals in some areas, you need subjective signals, which is where LLM-as-a-Judge patterns are very useful. Most importantly, you need regression detection, because as your system evolves, it is very important to catch regressions when they happen and squash them. Again, humans like us are still useful. Humans should be used to define policy and scores for eval. Humans also offer a lot of nuance which these AI systems cannot yet capture.
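A minimal sketch of golden-dataset evals with regression detection. The exact-match scoring, the function names, and the toy "agents" (plain string functions) are assumptions for illustration; real evals would use richer objective signals and, where needed, an LLM-as-a-Judge step.

```python
def run_evals(agent, golden_cases):
    """Score an agent on each golden case with an exact-match check."""
    return {
        case_id: agent(prompt) == expected
        for case_id, (prompt, expected) in golden_cases.items()
    }


def find_regressions(baseline, current):
    """Cases that passed before and no longer pass."""
    return sorted(
        case_id for case_id, passed in baseline.items()
        if passed and not current.get(case_id, False)
    )


# Toy golden dataset: case id -> (input, expected output).
golden = {
    "upper-1": ("abc", "ABC"),
    "upper-2": ("Hi", "HI"),
}


def old_agent(text):
    return text.upper()


def new_agent(text):
    # A subtle behavior change: mixed-case input is no longer uppercased.
    return text.upper() if text.islower() else text


baseline_results = run_evals(old_agent, golden)
current_results = run_evals(new_agent, golden)
regressed = find_regressions(baseline_results, current_results)
```

Comparing per-case results against a stored baseline, rather than only an aggregate score, is what lets a regression be traced to the specific case that broke.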

It's important to automate for scale. Humans cannot do automation at scale. This is very important. You need to invest in a bunch of tooling to visualize results. Otherwise, if your evals go off track, it's very hard to know where they went off track. Auditability and transparency is really important because when the agent is running, it's not just prompts being fed to the model. It's a bunch of these things. You have a bunch of tool calls executing. You need to capture the state of the agent, which is, how did it decide to take the next step? You need the reasoning chain and traces of the agent. You need, of course, all the interactions with the model. You also need environmental awareness, because a bunch of these agents get triggered by changes in the environment. Even when you execute, the environment matters. You need a bunch of world-class visibility into all these things.

Divide and conquer, classic computer science paradigm. How do we apply it here? When we talk about agents, it's often not just a single agent running as a single box. We are talking about multi-agent systems, which work in close collaboration with each other. I'm taking an example here. We have an incident investigation agent, which is supposed to investigate incidents and try to fix them by itself as much as possible. This is what you see as a user of the system. This is your facade. Behind the scenes, the incident investigation agent is actually using the insights agent to try and understand what happened. Once it understands what happened, it uses the coding agent to actually make a fix. Then it gives it to the evaluator agent to make sure that things are ok, which may use an optimizer in the background, again, to optimize things, after which the code gets pushed and hopefully the incident is resolved.
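The facade pattern in this example can be sketched with plain functions standing in for agents. The function names, the stub behavior, and the result dictionary are hypothetical; the optimizer step is omitted, and only the insights, coding, and evaluator hand-offs from the talk are shown.

```python
def incident_investigation_agent(alert, insights, coder, evaluator):
    """Facade: the user sees one agent; three sub-agents do the work."""
    finding = insights(alert)    # understand what happened
    patch = coder(finding)       # propose a fix
    approved = evaluator(patch)  # check the fix before it is pushed
    return {"finding": finding, "patch": patch, "approved": approved}


# Stub sub-agents so the flow can be run end to end.
def insights_stub(alert):
    return f"root cause of {alert}"


def coder_stub(finding):
    return f"fix for {finding}"


def evaluator_stub(patch):
    return patch.startswith("fix for")


report = incident_investigation_agent(
    "latency spike", insights_stub, coder_stub, evaluator_stub
)
```

Passing the sub-agents in as arguments keeps the facade independent of how each one is implemented, so any of them can be swapped or tested in isolation.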

Agents We've Built, and What They Do #

We've talked a lot about theory. Let's talk a little bit about agents we've actually built and what they do. The first example is actually not an agent which we've built. It's an agent which the GitHub team has built, called GitHub Copilot. If you just take GitHub Copilot and introduce it into an engineering organization as large and as old as LinkedIn, we've been around for 20-plus years, it just won't work, because there is so much local, contextual, organizational knowledge which GitHub Copilot simply doesn't know about. How do we augment Copilot with this information? We built custom MCP servers. We have local and remote MCP servers which are able to inject LinkedIn-specific context. For example, here you can see an example of how we make LLM calls in the LinkedIn codebase. Again, we have a wrapper. This essentially uses the code search tool to give examples of how to use the wrapper.

More importantly, this results in predictable and repeatable results. The code which gets generated gets generated in a way in which we would want the human developers to write this code. This is very important. It's not AI slop. It ends up saving a lot of on-call and developer time because before all this, it would be a question to the on-call. How do I do this? Give me an example. Or I go to a search system and I search for it, but it's not in my IDE. You're trying to basically meet the developers where they are, which is pretty key to the system. More importantly, it results in a decentralized and scalable development model. Every team can spin up their own MCP. Right now, this is what we are telling a bunch of platform and tooling teams at LinkedIn. Earlier, you thought about API access and human access. Right now, also think about how agents access your system, plan, and build for it.

Background coding agent. This is what Prince was talking about earlier. This is perhaps one of the most sophisticated multi-agentic systems we have in the engineering space at LinkedIn for internal use cases. It starts with this template which says, what are you coding next? It's where you describe the spec, which Prince talked about earlier. Once you submit the spec, we have the task getting executed asynchronously in the background. All the code changes are aware of the LinkedIn engineering context as well as the local context. When issues happen, humans take action. It all runs on isolated sandboxes for safety. You have a full audit trail of what the coding agent ended up doing. The result of this is a PR or a series of PRs. Before that, how do you translate from spec to PR execution? People keep doing things again and again. As much as we would love it to be different, various aspects of our job are repeatable and redundant.

If you're able to capture this repeatability and redundancy in the form of templates, you don't need to do the rethinking again and again. You can also keep optimizing these templates, which is what we do here, where we offer reusable prompt templates, which essentially have battle-tested patterns so that you aren't solving the same problem again and again.

Observe agent. This is actually an example of a nearline agent, which gets triggered by alerts. As soon as an alert happens, it triggers this agent. It comes to life, and it starts working. This helps reduce a bunch of on-call toil for triaging issues as well as for mitigating them. More importantly, it offers a single-pane-of-glass view across different systems. The human developer or the on-call engineer does not need to go across n different systems to try and find out what went wrong. They can see it all in one place. More importantly, it has an elephantine memory. It can "remember" a bunch of stuff. I say remember within quotes because it can use tools and memory systems to pull in a bunch of historical insights and trends and similar incidents to understand and root cause what happened.
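To make the alert-triggered, single-view idea a bit more concrete, here is a minimal sketch under stated assumptions. The provider functions, the `PAST_INCIDENTS` list, and the alert fields are all hypothetical, and a real agent would add model reasoning on top of the gathered context. The sketch only shows an alert fanning out to several sources and coming back as one combined record.

```python
from typing import Callable, Dict, List

# Hypothetical incident history standing in for a memory system.
PAST_INCIDENTS: List[dict] = [
    {"service": "feed", "summary": "latency spike after cache eviction change"},
    {"service": "jobs", "summary": "queue backlog during index rebuild"},
]


def recent_deploys(service: str) -> str:
    # Placeholder: a real provider would query a deployment system.
    return f"no deploys recorded for {service} in this sketch"


def similar_incidents(service: str) -> str:
    matches = [i["summary"] for i in PAST_INCIDENTS if i["service"] == service]
    return "; ".join(matches) if matches else "none found"


# Each provider contributes one slice of the combined view (assumed names).
PROVIDERS: Dict[str, Callable[[str], str]] = {
    "recent_deploys": recent_deploys,
    "similar_incidents": similar_incidents,
}


def on_alert(alert: dict) -> dict:
    """Gather context from every provider into one record for the on-call."""
    service = alert["service"]
    context: Dict[str, str] = {}
    for name, provider in PROVIDERS.items():
        try:
            context[name] = provider(service)
        except Exception as exc:
            # One failing source should not hide the others.
            context[name] = f"unavailable: {exc}"
    return {"alert": alert, "context": context}
```

Calling `on_alert({"service": "feed", "name": "high_latency"})` returns the alert together with one entry per provider, which is the piece an on-call engineer would otherwise assemble by hand across several systems.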

UI QA agent. There's an interesting story behind this. The reason the story is important is because we do not want to throw AI at a problem just because it's cool. We are working on server-driven UI at LinkedIn. What this means is that we have a component library and an engine on all the three client platforms we support, which is iOS, Android, and web. The components are defined on the server and the server controls the UI. One of the problems with this setup and scaling this setup, in addition to the obvious advantages, of course, is, how do you validate functionality? Let's say I had a share box or a comment box, and I changed the UI treatment by rejigging the components on the server, how do I validate that the functionality is still working on the clients? I cannot write code because the UI is changing dynamically, drastically. I cannot keep up with the code even if I wrote it because it can change continuously, which is where we bring AI into the mix.

Because ultimately what you want to test is that the functionality works, not the internal mechanics of how it works. We let the AI figure it out, which is where the UI QA agent comes in, which executes periodic batch runs for us to identify regressions. We describe the test cases using natural language, which is how we describe what should work and things to look out for. For example, we may say things like, you should be able to write a comment, and after you write the comment, it should appear in the feed update, or you should see a toast saying posted. Specs like this describe the functional behavior we want and not the actual code mechanics of go verify this here. This helps us identify and root cause regressions, and is a huge replacement for both manual testing and integration tests that are expensive to write and maintain. It works across iOS, Android, and web.
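A natural-language test case like the comment example could be captured in a small structure such as the one below. This is a sketch under assumptions: the `UiTestSpec` fields and the `build_instructions` format are invented for illustration, and the talk does not describe the actual spec format the UI QA agent consumes.

```python
from dataclasses import dataclass
from typing import List, Tuple

# The three client platforms named in the talk.
PLATFORMS: Tuple[str, ...] = ("ios", "android", "web")


@dataclass
class UiTestSpec:
    """A functional test described in plain language (hypothetical shape)."""
    name: str
    steps: List[str]
    expectations: List[str]


def build_instructions(spec: UiTestSpec, platform: str) -> str:
    """Render a spec as text instructions for an agent on one platform."""
    if platform not in PLATFORMS:
        raise ValueError(f"unsupported platform: {platform}")
    lines = [f"Test: {spec.name}", f"Platform: {platform}", "Steps:"]
    lines += [f"  {i}. {step}" for i, step in enumerate(spec.steps, start=1)]
    lines.append("Expected behavior:")
    lines += [f"  - {item}" for item in spec.expectations]
    return "\n".join(lines)


comment_spec = UiTestSpec(
    name="post a comment",
    steps=["Open a feed update", "Write a comment and submit it"],
    expectations=[
        "The comment appears in the feed update, or a toast saying posted is shown",
    ],
)
```

The spec says nothing about component IDs or view hierarchies, so the same description stays valid when the server rearranges the UI, which is the property the talk is after.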

Finally, let's talk about the analytics agent, which is a traditional online chatbot. It is an agent which offers multimodal output. It can output reams of text. It can output charts and visualizations. It helps you understand and analyze the rich analytics data we have at LinkedIn. Surprise, it's not just used by engineers, it's also used by product managers and business operations because data is the ultimate truth. It saves a bunch of time for our data scientists and analysts. More importantly, this is also built in a way such that if you have a new data source, you can onboard it easily onto the system and it just becomes more intelligent. This again results in a decentralized and scalable development model.
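One way to picture easy onboarding of a new data source is a registry whose catalog is handed to the agent as context. The sketch below assumes that design; `DataSource`, `DataSourceRegistry`, and the example source names are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataSource:
    """Metadata a team supplies when onboarding a source (assumed fields)."""
    name: str
    description: str
    columns: List[str]


class DataSourceRegistry:
    def __init__(self) -> None:
        self._sources: Dict[str, DataSource] = {}

    def register(self, source: DataSource) -> None:
        """Onboard a source; duplicate names are rejected."""
        if source.name in self._sources:
            raise ValueError(f"data source already registered: {source.name}")
        self._sources[source.name] = source

    def catalog(self) -> str:
        """Describe all sources as text the agent can use to pick one."""
        lines = []
        for name in sorted(self._sources):
            src = self._sources[name]
            cols = ", ".join(src.columns)
            lines.append(f"{src.name}: {src.description} (columns: {cols})")
        return "\n".join(lines)
```

With this shape, a team that owns a dataset only calls `register` with a description, and the agent's view of what it can answer grows without changes to the agent itself.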

Best Practices: Derived from Experience #

What are some of the best practices which are derived from our experience, or maybe I should say like battle scars? The first best practice, as Prince alluded to before, is, please describe what you need clearly. You do not want the agents guessing and hallucinating. Which means, write clear specs, start with structure, and ensure you have the appropriate human-in-the-loop fallbacks. Agents aren't as autonomous yet as we'd like them to be. Invest in platform abstractions. Ensure that you do not solve the problem repeatedly in slightly different ways, again and again, which means, invest in abstractions for orchestration, context engineering and tool calling, as well as evaluations and safety. Trust me, it'll pay off. Build smart, which means, resist the temptation to build everything ground-up. Reuse what exists.

In a lot of the systems we've described, we use a bunch of open-source software. We use open protocols. We use existing systems, existing storage systems, RPC systems, queuing systems. Try to reuse as much as possible. Also try to buy or extend rather than build. By extend, I mean how we've extended GitHub Copilot with MCP to fit into our ecosystem. The reason is this space is moving very fast. You want to keep up with this space. As much as you can, try to buy or extend instead of building. Finally, adopt open standards so that you get easy interop and you can actually make your strategy of extending or buying work in your ecosystem. Most importantly, ensure that you hold a human accountable for any form of decision-making.
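The human-in-the-loop fallback recommended in these practices can be sketched as a wrapper that retries an agent step a bounded number of times and then hands the decision to a person. Everything here is an assumption made for illustration: `StepResult`, `run_with_fallback`, and the `ask_human` callback are hypothetical names, not part of any system described in the talk.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    """Outcome of one agent attempt (hypothetical shape)."""
    output: Optional[str]
    ok: bool
    reason: str = ""


def run_with_fallback(
    step: Callable[[], StepResult],
    ask_human: Callable[[str], str],
    max_attempts: int = 2,
) -> str:
    """Try the agent step a few times, then escalate to a human with the reason."""
    last_reason = "no attempts made"
    for _ in range(max_attempts):
        result = step()
        if result.ok and result.output is not None:
            return result.output
        last_reason = result.reason or "unknown failure"
    # The agent could not finish; a person takes over and owns the decision.
    return ask_human(last_reason)
```

Bounding the retries keeps the agent from looping on a task it cannot complete, and passing the failure reason along gives the person something to act on.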


source & further reading

infoq.com — original article