@hazeljs/agent 1.0.1: Production Hardening for Real Deployments

The Hazel team released @hazeljs/agent 1.0.1, a patch focused on production hardening with Redis-backed state, durable tool approvals, circuit breakers, and observability. The update is backward compatible and addresses multi-instance deployment risks such as state loss and broken approvals across replicas.

We are shipping @hazeljs/agent 1.0.1, a patch release focused on operational durability, resilience consolidation, and production observability. If you run agents behind a load balancer, need human-in-the-loop tool approvals, or want circuit breakers and traces in production, this release is for you.

1.0.1 is backward compatible. There are no breaking API changes, only new optional configuration, factories, and exports.

A minimal production bootstrap with Redis-backed state, durable approvals, and strict event handling:

```typescript
import { HazelApp } from '@hazeljs/core';
import { Agent, Tool, AgentModule, AgentService } from '@hazeljs/agent';
import { createClient } from 'redis';

@Agent({ name: 'ops-agent', description: 'Operations assistant' })
class OpsAgent {
  // requiresApproval: true routes this tool through the approval flow
  @Tool({ description: 'Restart a service', requiresApproval: true })
  async restartService(input: { service: string }) {
    return { restarted: input.service, at: new Date().toISOString() };
  }
}

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

await AgentModule.forRootAsync({
  redis: { client: redis },
  useRedisApprovals: true,
  runtime: {
    strictEventHandlers: true,
    enableCircuitBreaker: true,
    observabilityProvider: myObservabilityProvider, // optional; defined elsewhere
  },
});

const app = new HazelApp({ modules: [AgentModule] });
const agentService = app.get(AgentService);

agentService.on('agent.tool.approval.requested', (event) => {
  // Approve from any replica: the request is stored in Redis
  agentService.approveToolExecution(event.data.requestId, 'admin');
});

await agentService.execute('ops-agent', 'Restart the payment worker');
```

Same agent code as 1.0.0; only the module wiring changes for production.

@hazeljs/agent 1.0.0 shipped a full agent runtime: execution loop, tools, memory/RAG, multi-agent graphs, A2A, streaming, and guardrails hooks.
What it did not optimize for was multi-instance production:

| Area | 1.0.0 default | Production risk |
|---|---|---|
| Execution state | In-memory | Lost on process restart |
| Tool approvals | In-memory `Map`s | Broken across replicas; lost on crash |
| Retry / rate limit | Local utilities | Drift from `@hazeljs/resilience` |
| Observability | Local metrics + events | No OTel spans or LLM cost bridge |
| RAG errors | Silently returned | Hard to debug in prod |

1.0.1 closes these gaps without changing how you define agents or tools.

New factory helpers pick the right persistence backend from config or environment:

```typescript
import {
  createStateManager,
  createStateManagerFromEnv,
  AgentModule,
} from '@hazeljs/agent';

// Sync: when you already have a connected Redis client (redisClient)
const stateManager = createStateManager({
  backend: 'redis',
  redisClient,
});

// Async: connects from REDIS_URL
const envStateManager = await createStateManagerFromEnv({
  redisUrl: process.env.REDIS_URL,
});
```

Environment variables:

| Variable | Values | Behavior |
|---|---|---|
| `AGENT_STATE_BACKEND` | `memory`, `redis`, `database` | Explicit backend selection |
| `REDIS_URL` | Redis connection URL | Auto-selects Redis when set |

`AgentModule.forRoot()` and `AgentModule.forRootAsync()` wire Redis state when a client or URL is provided:

```typescript
import { AgentModule } from '@hazeljs/agent';
import { createClient } from 'redis';

const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();

await AgentModule.forRootAsync({
  redis: { client: redisClient },
  useRedisApprovals: true,
  runtime: {
    // Only surface handler errors in production
    strictEventHandlers: process.env.NODE_ENV === 'production',
  },
});
```

- `forRoot`: wires state from `redis.client`
- `forRootAsync`: connects from `REDIS_URL` or `redis.url` before boot

See [PERSISTENCE.md](./PERSISTENCE.md) for Redis, Prisma, and hybrid setups.

Tool approvals no longer live only in process memory.
A new `IApprovalStore` interface supports:

- `InMemoryApprovalStore`
- `RedisApprovalStore`

When `AgentModule` is configured with Redis, approvals are stored in Redis automatically so human-in-the-loop flows work across replicas.

New exports:

```typescript
import {
  IApprovalStore,
  InMemoryApprovalStore,
  RedisApprovalStore,
  createApprovalStore,
} from '@hazeljs/agent';
```

Local retry and rate-limit utilities now delegate to `@hazeljs/resilience`:

- `RetryHandler` → `RetryPolicy`
- `RateLimiter` → `TokenBucketLimiter`

The public API of `RetryHandler` and `RateLimiter` is preserved; both are marked `@deprecated` in favor of direct resilience use in a future minor. The deprecated `circuit-breaker.js` shim was removed.

Circuit breaker behavior is now validated end-to-end: repeated LLM failures through `AgentRuntime.execute()` open the circuit, and subsequent calls fail fast with `CircuitBreakerError`.

Failed agent executions (`AgentState.FAILED`) now propagate as errors through the circuit breaker and retry layers instead of returning silently.

Optional peers were added for production tracing and cost tracking:

- `@hazeljs/observability` (optional)
- `@opentelemetry/api` (optional)

When you pass an `observabilityProvider` in runtime config, the agent emits OpenTelemetry spans:

| Span | When |
|---|---|
| `agent.execute` | Full agent run |
| `agent.tool.execute` | Tool invocation |
| `agent.llm` | LLM chat call |

Span attributes include `agent.name`, `agent.execution_id`, `agent.tool.name`, and session metadata. LLM usage is bridged to `trackCost()` when the provider is configured.

```typescript
import { AgentRuntime } from '@hazeljs/agent';

const runtime = new AgentRuntime({
  observabilityProvider: myObservabilityProvider,
  llmProvider,
});
```

There is no hard dependency on OTel: spans are no-ops unless a provider is injected.

RAG search failures no longer silently return an empty context. The runtime now:

- Emits `agent.rag.failed` (`AgentEventType.RAG_QUERY_FAILED`)
- `ragContext`: graceful degradation

`AgentEventEmitter` accepts `strictEventHandlers: true`.
When enabled, errors in event handlers propagate instead of being swallowed. This is recommended for production.

If `@hazeljs/ai` never registers `HAZELJS_AI_ENHANCED_SERVICE`, `AgentService` now logs a clear error after 500ms instead of failing silently. Set `runtime.llmProvider` explicitly or ensure the AI module loads first.

State managers use minimal interfaces instead of `any`:

- `RedisClientLike`
- `PrismaClientLike` (`DatabaseStateManager`)

This makes it safer to wire real clients without losing type checking at the boundary.

474 tests pass with coverage thresholds enforced. New integration coverage includes `RedisApprovalStore`.

Test locations:

- `tests/integration/production-hardening.test.ts`
- `tests/integration/hardening-coverage.test.ts`

Jest uses `tsconfig.jest.json` for monorepo-friendly typechecking; the optional `@hazeljs/eval` peer is stubbed during tests.

```bash
npm install @hazeljs/agent@1.0.1
```

No code changes are required for existing apps. To adopt production features incrementally:

1. Add `redis: { client }`, or use `await AgentModule.forRootAsync({ redis: { url } })`
2. Set `useRedisApprovals: true` with a Redis client
3. Install `@hazeljs/observability` and pass `observabilityProvider`
4. Set `runtime.strictEventHandlers: true` in production

Not in 1.0.1, planned for future minors:

- `@hazeljs/flow`
- `HAZELJS_AI_ENHANCED_SERVICE`

Questions or feedback? [Open an issue](https://github.com/hazel-js/hazeljs/issues) or join the discussion on [GitHub](https://github.com/hazel-js/hazeljs).
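
For readers new to the circuit-breaker behavior described earlier (repeated failures open the circuit, and later calls fail fast), here is a minimal standalone sketch of the general pattern. It is not the `@hazeljs/resilience` or `@hazeljs/agent` implementation: the class names `SketchCircuitBreaker` and `CircuitOpenError`, the threshold and timeout parameters, and the half-open trial step are all assumptions made for illustration.

```typescript
// Hypothetical sketch of the circuit-breaker pattern; names are illustrative only.
type BreakerState = 'closed' | 'open' | 'half-open';

class CircuitOpenError extends Error {
  constructor() {
    super('Circuit is open; failing fast');
    this.name = 'CircuitOpenError';
  }
}

class SketchCircuitBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold: number,
    resetTimeoutMs: number,
    now: () => number = () => Date.now(), // injectable clock for testing
  ) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  // Returns the current state, moving open -> half-open once the timeout elapses.
  getState(): BreakerState {
    if (this.state === 'open' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'half-open';
    }
    return this.state;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed trial call in half-open, or too many consecutive failures, opens the circuit.
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }

  // Wraps an async call: rejects immediately while open, otherwise records the outcome.
  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.getState() === 'open') throw new CircuitOpenError();
    try {
      const result = await fn();
      this.recordSuccess();
      return result;
    } catch (err) {
      this.recordFailure();
      throw err;
    }
  }
}
```

In this sketch, a caller would wrap each LLM request in `breaker.execute(...)`; once the failure threshold is reached, further calls reject with `CircuitOpenError` without invoking the wrapped function until the reset timeout has passed.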