{"slug": "stop-passing-files-between-agents-with-local-paths", "title": "Stop Passing Files Between Agents With Local Paths", "summary": "A developer describes the gap between local agent development and production deployment, where files written to a local filesystem become ephemeral artifacts in containerized or serverless environments. The developer proposes treating agent outputs as 'artifacts' with metadata, stable IDs, and lifecycle rules, rather than simple files at paths.", "body_md": "I got very used to running agents locally.\n\nThe workflow was simple: run the agent, let it write outputs into my filesystem, then inspect everything in an `./outputs` folder.\n\nMarkdown reports, JSON files, screenshots, charts: whatever the agent produced, it was right there.\n\nThen I deployed it.\n\nSame agent, same logic. But now the \"output\" lived in a container filesystem that vanished the second the task finished. A retry wrote `report_20260313_103042.pdf` next to `report_20260313_103041.pdf`. And when I wanted to share this with someone, I no longer had a clean link.\n\nNothing about the agent had changed.\n\nEverything about the environment had.\n\nIf you build agents that produce files (reports, datasets, images, JSON dumps), you've probably hit this gap.\n\nLocal development hides it.\n\nProduction hands it to you on day one.\n\nOn your machine, persisting agent output is trivial:\n\n``` python\nfrom pathlib import Path\n\ndef save_report(content: bytes, run_id: str) -> Path:\n    out_dir = Path(\"./outputs\") / run_id\n    out_dir.mkdir(parents=True, exist_ok=True)\n    path = out_dir / \"summary.pdf\"\n    path.write_bytes(content)\n    return path\n```\n\nThat's it. 
Write bytes, get a path, move on.\n\nYou can list the directory, cat the file, open the PDF, or hand the path to the next script in your pipeline.\n\nFor one person running one agent on one laptop, this is perfectly fine.\n\nThe problem is not local development.\n\nThe problem is mistaking “it works on my laptop” for “I have a storage layer.”\n\nProduction agents do not get a reliable `./outputs/` folder.\n\nThey run in environments where the filesystem is temporary, isolated, or both.\n\nServerless functions may give you `/tmp`, but it is scoped to the execution environment and often limited in size. Containers lose local state when they restart. Background workers, queues, and orchestrators can run each task on a different machine.\n\nAnd retries are not an edge case. They are part of the system.\n\nYour orchestrator will eventually rerun a failed step, and now you have the same logical output produced twice.\n\nThen there is the human in the loop.\n\nAgents produce things people actually need to read: compliance PDFs, analysis summaries, generated slides, CSV exports, charts, screenshots, debug bundles.\n\nThose people do not have SSH access to your worker node.\n\nThey need a link, not a file path on a machine they will never see.\n\nSo the production checklist starts looking very different from local dev:\n\n| Local | Production |\n|---|---|\n| `path.write_bytes()` | Upload to durable object storage |\n| `./outputs/run_42/` | Queryable grouping by run/session |\n| \"It's in the repo\" | Stable ID retrievable from any machine |\n| You remember the filename | Idempotent retries that don't duplicate |\n| Files live forever | TTL / lifecycle rules |\n| You Slack the file manually | Shareable download URL with expiry |\n\nI have talked to a few teams that hit the same wall.\n\nThe agent logic is done.\n\nNow the artifact plumbing begins.\n\nHere's the distinction that changed how I think about this:\n\nA file is bytes at a path.\n\nAn artifact is a file plus 
context.\n\nThat context is what makes the output usable after the agent is done.\n\nFor example:\n\nA PDF sitting on disk is a file.\n\nA PDF tagged with `session_id=pipeline_run_42`, `agent_id=report-writer`, `model=claude-sonnet-4`, retrievable as `art_2xk9f7v3m1p0`, and set to expire in 30 days?\n\nThat is an artifact.\n\nYour agent may still produce files.\n\nBut downstream agents, debug tools, production workflows, and the humans waiting in Slack all need artifacts.\n\nMost teams do not start by building an artifact store. They start with S3 (or R2, or GCS) and a slowly growing feeling that object keys aren't enough.\n\nThe pattern I keep seeing, including in our own user research, goes like this.\n\nFirst, put the bytes in object storage:\n\n``` python\nimport hashlib\nfrom pathlib import Path\n\nimport boto3\n\ns3 = boto3.client(\"s3\")\nBUCKET = \"my-agent-outputs\"\n\ndef upload_file(local_path: str, tenant_id: str) -> str:\n    data = Path(local_path).read_bytes()  # reads the file and closes the handle\n    content_hash = hashlib.sha256(data).hexdigest()\n    ext = Path(local_path).suffix  # e.g. \".pdf\", or \"\" if there is no extension\n    # Content-addressed key: identical bytes map to the same object\n    key = f\"{tenant_id}/{content_hash}{ext}\"\n    s3.put_object(Bucket=BUCKET, Key=key, Body=data)\n    return key\n```\n\nThen you realize the object key is not enough.\n\nYou need to know which run produced the file, which agent created it, what kind of output it is, when it should expire, and how to find it later.\n\nSo you add a metadata table:\n\n``` sql\nCREATE TABLE artifacts (\n  id            text PRIMARY KEY,\n  tenant_id     uuid NOT NULL,\n  filename      text NOT NULL,\n  content_type  text NOT NULL,\n  size_bytes    bigint NOT NULL,\n  content_hash  text NOT NULL,\n  session_id    text,\n  agent_id      text,\n  metadata      jsonb NOT NULL DEFAULT '{}',\n  expires_at    timestamptz NOT NULL,\n  created_at    timestamptz NOT NULL DEFAULT now(),\n  deleted_at    timestamptz\n);\n\nCREATE INDEX idx_artifacts_session\n  ON artifacts (tenant_id, session_id, created_at DESC)\n  WHERE deleted_at IS NULL;\n```\n\nThen wrap it 
in an API:\n\n``` python\nimport mimetypes\nimport os\n\nfrom psycopg2.extras import Json  # assumes a psycopg2-style `db` handle\n\n# current_tenant(), generate_id() and db are app-specific helpers\ndef create_artifact(file_path, session_id, agent_id, metadata=None):\n    tenant_id = current_tenant()\n    key = upload_file(file_path, tenant_id=tenant_id)\n    # upload_file puts the content hash right after the tenant prefix\n    content_hash = key.split(\"/\")[1].split(\".\")[0]\n    content_type = mimetypes.guess_type(file_path)[0] or \"application/octet-stream\"\n    artifact_id = f\"art_{generate_id()}\"\n    db.execute(\n        \"\"\"\n        INSERT INTO artifacts\n          (id, tenant_id, filename, content_type, size_bytes, content_hash,\n           session_id, agent_id, metadata, expires_at)\n        VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, now() + interval '30 days')\n        \"\"\",\n        (artifact_id, tenant_id, os.path.basename(file_path), content_type,\n         os.path.getsize(file_path), content_hash, session_id, agent_id,\n         Json(metadata or {})),\n    )\n    return artifact_id\n```\n\nCongratulations, you're on your way to building an artifact store.\n\nThen the other 80% shows up.\n\nI've watched engineers spend time building this type of wrapper and still not get dedup, TTL, or session semantics right.\n\nThis is not a knock on those teams. It is necessary plumbing. But necessary plumbing is still plumbing, and most teams should be spending that time on their product, not rebuilding agent infrastructure.\n\nIf you are deciding whether to build this yourself or use a purpose-built layer, this is the basic checklist I would use.\n\nYou need to answer one question quickly:\n\nWhat did this pipeline run produce?\n\nNot grep logs.\n\nNot list an S3 prefix and hope the naming convention held.\n\nOne query:\n\n``` bash\nartifacta ls --session pipeline_run_42\n```\n\nA session should be whatever your orchestrator already uses: `pipeline_run_42`, `daily_batch_20260313`, `customer_report_8841`.\n\nIt should not require a separate “create session” step just to group outputs.\n\nWhen a report looks wrong three weeks later, you need to know what produced it.\n\nWhich agent?\n\nWhich model?\n\nWhich stage of the workflow?\n\nThat means `agent_id` and `metadata` should be captured at upload time, not buried in logs you hope still exist.\n\n``` python\nclient.push(\n    \"analysis.json\",\n    session_id=\"pipeline_run_42\",\n    agent_id=\"summarizer\",\n    metadata={\"model\": \"claude-sonnet-4\", \"stage\": 
\"final\"},\n)\n```\n\nObject storage metadata is not enough.\n\nObject metadata travels as HTTP headers, so it is small (S3 caps user-defined metadata at 2 KB), awkward to query, and easy to make inconsistent across a pipeline.\n\nYou want structured metadata stored with the artifact record and filterable when listing artifacts.\n\nAgent systems usually need two forms of deduplication: content hashing and idempotency keys.\n\nThese solve different problems.\n\nContent hashing prevents duplicate storage. Idempotency prevents a retry from creating a second logical artifact.\n\nConflating the two is a common bug in homegrown wrappers.\n\nArtifacts should expire by default.\n\nAn experiment, batch run, or debug file should not live forever because nobody remembered to clean it up.\n\nStorage lifecycle rules help, but they usually operate at the bucket or prefix level. They do not understand your artifact metadata, which makes per-artifact expiration harder than it should be.\n\nHumans need a link, not a file path.\n\nA good artifact layer should make it easy to create a stable download URL with configurable expiry:\n\n```\nhttps://dl.example.com/lnk_...\n```\n\nThat link should be separate from your internal storage details and easy to share with a teammate, customer, or workflow step.\n\nDownstream agents should not coordinate through shared filesystem paths.\n\nAgent A pushes an artifact and gets an ID. Agent B pulls by ID, or lists the session and filters by metadata.\n\n``` bash\nexport ARTIFACTA_SESSION_ID=\"pipeline_run_42\"\n\npython extract.py    # pushes CSV\npython analyze.py    # lists session, pulls CSV, pushes report\npython notify.py     # creates download link for the human\n```\n\nSession sealing matters too.\n\nOnce a run is finalized, late uploads should fail clearly instead of silently corrupting the run:\n\n```\n409 Session 'pipeline_run_42' is sealed. No new artifacts can be added.\n```\n\nI'm building [Artifacta](https://artifacta.io), an artifact store purpose-built for AI agents.\n\nIt is not an orchestrator, search engine, or agent framework. 
It is the layer between your agent and object storage: session-aware, queryable artifact storage with a CLI, MCP, a Python SDK, and a REST API.\n\nFor example:\n\n``` bash\npip install artifacta-cli\nexport ARTIFACTA_API_KEY=\"ak_live_...\"\n\nartifacta push report.pdf --session earnings-q4-2025 --agent report-writer\nartifacta ls --session earnings-q4-2025\nartifacta link art_2xk9f7v3m1p0   # share with a human\n```\n\nOr from Python:\n\n``` python\nfrom artifacta import Client\n\nclient = Client()\nartifact = client.push(\"report.pdf\", session_id=\"earnings-q4-2025\")\nprint(artifact.id)  # art_2xk9f7v3m1p0\n```\n\nI’m sharing it because this is a problem I keep seeing in agent workflows, even if Artifacta is not the solution every team chooses.\n\nI’m curious how other teams handle this today.\n\nDrop your setup in the comments. I’m especially interested in approaches that are not just object storage plus glue code.", "url": "https://wpnews.pro/news/stop-passing-files-between-agents-with-local-paths", "canonical_source": "https://dev.to/jnakano/stop-passing-files-between-agents-with-local-paths-oop", "published_at": "2026-06-17 16:16:18+00:00", "updated_at": "2026-06-17 16:21:45.700843+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "ai-infrastructure"], "entities": ["S3", "R2", "GCS", "Claude"], "alternates": {"html": "https://wpnews.pro/news/stop-passing-files-between-agents-with-local-paths", "markdown": "https://wpnews.pro/news/stop-passing-files-between-agents-with-local-paths.md", "text": "https://wpnews.pro/news/stop-passing-files-between-agents-with-local-paths.txt", "jsonld": "https://wpnews.pro/news/stop-passing-files-between-agents-with-local-paths.jsonld"}}