{"slug": "the-future-of-agentic-development-redefining-the-data-practitioner-lifecycle-kit", "title": "The future of agentic development: Redefining the data practitioner lifecycle with Data Agent Kit", "summary": "Launch of Data Agent Kit, an open-source collection of tools and plugins designed to unify the fragmented landscape of data engineering and agentic development. It integrates directly into existing developer environments like VS Code and Claude Code, providing pre-codified skills and secure connections to cloud data platforms to streamline workflows. The kit aims to shift data practitioners from manual coding to intent-driven engineering by offering a single view of the data estate and reducing the \"context window tax\" associated with building AI agents.", "body_md": "Modern software development isn’t happening on just one surface; it’s happening across an entire ecosystem of agentic tools. Agents are being developed at an unprecedented scale, and these agents require direct access to enterprise data for context and grounding.\nHowever, the current tooling for building agents and managing data is heavily fragmented. This fragmentation makes data difficult to access, increases security risks, and breaks developer experiences in ways that hinder innovation.\nTo address this challenge, we recently launched Data Agent Kit, a unified, open-source collection of data engineering and data science skills, tools and plugins that integrate directly into the environments practitioners already use, such as VS Code, Claude Code, Codex, Gemini CLI and the Antigravity CLI. By bringing together these core tools and skills with your enterprise data, Data Agent Kit serves as a harness for agentic context, memory, and personalization. 
It provides:\nAgentic skills: Pre-codified pathways for interacting with your data estate, covering query optimization, ML best practices, data validation, data drift checks, governance, and troubleshooting.\nModel Context Protocol (MCP) tools: Secure connections between agentic workflows and cloud data platforms like BigQuery, AlloyDB, and Google Cloud Storage. Developers can now configure connection parameters for their cloud datasets and data processing engines without having to manage complex, manual pipeline code.\nPlugins and extensions: Native IDE integrations that enable rich, context-aware developer interactions.\nTogether, these Data Agent Kit capabilities help data practitioners go from manually writing code to intent-driven data science and engineering: defining the desired business outcomes, constraints, and success criteria, and allowing the AI-augmented system to figure out how to execute them. This shift is critical because today, when building agentic applications that navigate complex data architectures, there’s often a 'context window tax': developers have to manually paste vast amounts of schema metadata into prompts, eating up token limits and increasing latency. Meanwhile, data practitioners often lack guidance about how to efficiently query, optimize, and troubleshoot cloud data, while specialized, fragmented development environments cannot see across your entire data estate. Data Agent Kit helps with these challenges and others, providing the foundational capabilities data practitioners need for a new agentic way of working.\nRead on for an overview of Data Agent Kit’s features and benefits, how to install it and connect your local environment to your data estate, and an intent-driven engineering example.\nData Agent Kit makes your entire data estate available in a single view. 
This goes beyond providing a simple catalog for databases such as BigQuery, AlloyDB and Spanner; rather, it integrates data engineering and science tasks, orchestration pipelines, and jobs into a single interface. This allows practitioners to manage their entire data workflow — from discovery to production — without context switching. Data Agent Kit’s intelligent routing automatically chooses the optimal compute engine for your task — whether that’s BigQuery for SQL-native analytics and ELT, or Spark for custom Python transformations and distributed ML training.\nData Agent Kit offers a library of predefined agentic skills (e.g., ML best practices, ELT, building data apps) based on Google Cloud’s data engineering and science expertise. Rather than relying on generic LLM prompts, it codifies prescriptive guidelines into your workflow. This allows you to inject enterprise-grade data intelligence directly into your IDE or CLI.\nGrounded in this unified data, Data Agent Kit delivers native conversational analytics directly within your workspace, making it easy to explore your data. Powered by the same Gemini natural language to SQL technology found in our first-party agents (e.g., Conversational BigQuery and Looker), Data Agent Kit lets you run natural language queries to profile, search, and visualize your datasets.\nTo see how Data Agent Kit’s skills and MCP tools work together, consider a financial services scenario: Your company is facing rising fraud claims. With your transaction data stored in Cloud Storage, you need to build a high-confidence fraud detection model and schedule orchestration pipelines. Traditionally, this involves hours of data wrangling across multiple consoles. With the Data Agent Kit, you can complete this in minutes, directly within your IDE or CLI. 
Let’s see how.\nYou can get started with the Data Agent Kit in under a minute through an integrated setup process.\nTo do so, search for \"Google Cloud Data Agent Kit\" in your IDE’s marketplace (VS Code), or install it in your CLI (Gemini, Antigravity, Claude, Codex) from the GitHub repo linked in the “Get started today” section below. Data Agent Kit automatically configures dependencies and checks your Google Cloud login status.\nClick the Google Cloud icon in your activity bar to authenticate via IAM. Once you are logged in, your Cloud Storage, databases, and catalog assets appear instantly in your workspace.\nUse the settings menu to set project IDs and regions, and to verify MCP status to ensure all backend services are authorized. Data Agent Kit also includes a quick-start guide on using the tools and skills.\nWith Data Agent Kit installed, you can skip the manual ETL boilerplate and describe your high-level goal directly to your coding assistant (e.g., Claude Code, GitHub Copilot) in natural language. The assistant leverages Data Agent Kit’s skills to plan and execute the workflow.\nPrompt:\nI have the raw transaction logs landing in the GCS bucket gs://fin-clearing-raw/.\nFirst, create a Spark notebook and (1) ingest these logs into an Iceberg table in BigQuery.\nSecond, create a dbt project to (2) deduplicate them, (3) remove the transactions with invalid transaction id and store them in a separate Iceberg table, (4) standardize the timestamps and perform any other necessary cleanup tasks (5) sync the output to another Iceberg table (6) join this output table with tables that have payer and payees identities and write the output to a final Iceberg table.\nThird, I would like you to train an ML model on Spark using a notebook to detect fraudulent transactions in the output table. I am thinking about a LightGBM model but I am open to any suggestions you might have. 
Use the relevant datasets in the project.\nFinally, create an inferencing step using Spark notebook to the above pipeline to perform batch inferencing and write flagged transactions to a Spanner table.\nCreate an orchestration pipeline that first runs the ingestion then the dbt and next the inference notebook.\nBehind the scenes, Data Agent Kit plans a robust multi-step orchestration of the entire data lifecycle, from exploration to inference.\nStep 1: Notebook creation, ingestion and initial storage\nFind your bronze data (raw, unfiltered data on financial transactions) and bring it into an Iceberg table before doing the transformations.\nAutomatically create a notebook to ingest the raw logs from Cloud Storage.\nWrite the necessary SQL and store the ingested data in an Iceberg table in BigQuery.\nStep 2: Transformation (dbt Project)\nNow, clean the bronze data into silver and gold tables:\nData preparation: Deduplicate the transaction logs.\nFilter invalid IDs: Identify transactions with invalid IDs and store them in a separate Iceberg table.\nClean and standardize: Standardize timestamps and perform other necessary cleanup tasks.\nSync: Output the cleaned data to another Iceberg table, leveraging the BigQuery MCP server.\nEnrichment: Join the cleaned table with payer and payee identity tables.\nFinal output: Write the joined dataset to a final Iceberg table.\nStep 3: Machine learning and inferencing\nWith your gold table minted, it’s time for some data science: model training and inferencing. 
Here, the agent hands the clean data from the previous step to the model to spot fraudulent patterns.\nTraining: Use a Spark notebook to train an ML model.\nInference: Create a Spark notebook inferencing step for batch processing.\nStorage: Write all flagged fraudulent transactions to a Spanner table by leveraging the Spanner MCP.\nStep 4: Orchestration and execution\nFinally, you’re ready to move to production and schedule the whole orchestration pipeline: Ingestion -> Transformation -> Inference.\nWhen things go sideways: Agentic incident management and intelligent recovery\nIf an orchestration pipeline fails, Data Agent Kit streamlines resolution with its intelligent incident management capabilities:\nIntelligent diagnosis: Automatically conducts root cause analysis to pinpoint failure sources\nAutonomous remediation: Drafts and tests fixes, bypassing manual debugging\nAutomated recovery: Validates and deploys fixes via automated Git workflows\nAnd there you have it: You’ve gone from raw discovery to a fully automated, fraud-catching machine in a matter of minutes, all from within the same UX. There is no need to hop between multiple browser tabs and IDE interfaces, or to learn data engineering and science best practices: Data Agent Kit orchestrates a clean end-to-end flow leveraging various MCP tools and codified skills. Ultimately, this approach helps you achieve what matters most: shipping innovative, high-performance data applications at scale.\nData Agent Kit is available today in preview. 
Start by installing it in your favorite IDE or CLI:\nThen visit the documentation to learn more and get started.", "url": "https://wpnews.pro/news/the-future-of-agentic-development-redefining-the-data-practitioner-lifecycle-kit", "canonical_source": "https://cloud.google.com/blog/products/data-analytics/data-agent-kit-brings-data-skills-and-tools-to-your-ide-or-cli/", "published_at": "2026-05-19 17:45:00+00:00", "updated_at": "2026-05-19 22:02:41.412521+00:00", "lang": "en", "topics": ["developer-tools", "data", "open-source", "artificial-intelligence", "cloud-computing"], "entities": ["Data Agent Kit", "VS Code", "Claude Code", "Codex", "Gemini CLI", "Antigravity CLI", "BigQuery", "AlloyDB"], "alternates": {"html": "https://wpnews.pro/news/the-future-of-agentic-development-redefining-the-data-practitioner-lifecycle-kit", "markdown": "https://wpnews.pro/news/the-future-of-agentic-development-redefining-the-data-practitioner-lifecycle-kit.md", "text": "https://wpnews.pro/news/the-future-of-agentic-development-redefining-the-data-practitioner-lifecycle-kit.txt", "jsonld": "https://wpnews.pro/news/the-future-of-agentic-development-redefining-the-data-practitioner-lifecycle-kit.jsonld"}}