Building a self-hosted, AI-native workflow engine in Rust (180 node types, no SDK bloat)

A developer built Trigix, an open-source (MIT), self-hostable workflow automation platform with a Rust execution engine and AI nodes that can run against local models. The platform features ~180 node types, each implemented as a variant of a NodeType enum with an async function, and avoids system dependencies at build time by using pure-Rust implementations or runtime shell-outs. The engine supports async DAG scheduling with parallel fan-out/fan-in, retries, timeouts, cancellation, and sub-workflows.

I've spent the last while building Trigix — an open-source MIT , self-hostable workflow automation platform. Think n8n, but the execution engine is in Rust and the AI nodes can run entirely against local models. This post is about a few engineering decisions I think are worth sharing, not a feature tour. A workflow is a DAG of typed nodes. Every node is one variant of a NodeType enum and one async function: js match node.node type { NodeType::Http = execute http node, ctx, &client .await, NodeType::Agent = execute agent node, ctx, &client, ai url .await, NodeType::Sqs = execute sqs node, ctx, &client .await, // … ~180 arms } Adding a node type touches a fixed set of places enum variant, executor impl, dispatch arm, a node type → str map, and the frontend palette/config panel . It's mechanical, which is the point: the cost of a new integration is bounded and reviewable, and the compiler tells you if you forgot a touch point. The engine itself is the interesting part — async DAG scheduling with parallel fan-out/fan-in, retries, timeouts, cancellation, and sub-workflows — but the node catalog is where most of the surface area lives, so I optimized hard for "cheap to add, hard to get subtly wrong." Most SaaS/DB/vector-store/cloud nodes are thin HTTP clients that return { status, body } . The value of a first-class node over "just use the generic HTTP node" is the curated config UI and auth handling, not raw capability. That framing kept ~150 of the nodes small and uniform. For cloud services I leaned on caller-supplied tokens where possible e.g. GCS, Vertex AI, BigQuery, Snowflake all take a bearer token instead of baking in each provider's OAuth dance. Honest trade-off: less magic, but no giant SDK per provider and no credential-exchange code to maintain. A self-hosted tool has to build everywhere with cargo build , with no "first install these system libraries" footnote. That constraint drove a few choices: AWS SQS/SNS/Bedrock . Instead of pulling in the AWS SDK, I implemented Signature V4 from scratch with the crypto crates already in the tree sha2 / hmac / hex . The reassuring part: AWS publishes a SigV4 test suite, so the signer is unit-tested against the canonical get-vanilla vector: // The signer must reproduce AWS's published signature exactly. assert auth.ends with "Signature=5fa00fa31553b73ebf1942676e86291e8372ff2a2260956d9b8aae1d763fbf31" ; That single test is worth more than a hundred round-trip mocks — it pins the canonical-request and signing-key derivation to a known-good answer. SSH/SFTP. The obvious crate ssh2 binds to libssh2 — a system dependency that breaks the build for anyone without the dev headers and cmake . So I used russh + russh-sftp , a pure-Rust SSH implementation. cargo build stays self-contained; password and private-key auth both work. SQL Server. Same reasoning — tiberius is a pure-Rust TDS driver, so the MSSQL node needs no native client. OCR. This one I couldn't keep pure-Rust without a heavy native dependency, so instead of linking libtesseract at build time, the node shells out to the tesseract CLI at runtime. The workspace still builds everywhere; OCR just needs the binary present when that node actually runs. Being explicit about that boundary beats a build that fails on a clean machine. The theme: a system dependency at build time is a tax on every contributor and CI run; a dependency at runtime is opt-in and only paid by the people who use that feature. The LLM/agent/RAG nodes speak the OpenAI-compatible wire format, so you can point them at a local Ollama or vLLM and run the whole AI side offline. RAG retrieval runs on your own Postgres/pgvector with hybrid vector + full-text search, optional reranking, and an HNSW index. Cloud providers are optional, not assumed — which matters if you're self-hosting precisely to avoid sending data out. docker compose -f docker-compose.poc.yml up -d --build That brings up the console/API, Postgres+pgvector, Redis, and an optional AI runtime. There's also a Helm chart on GHCR and a one-click GitHub Codespaces devcontainer if you want to poke at it before committing a VM. Honest status: it's young and I'm the primary author; 600+ tests and a v1.3.0 release, but treat it accordingly. And full transparency since it's relevant on this platform — I used AI coding assistants while building it; it's a real, tested codebase, not a one-shot generated repo. Repo: https://github.com/bj-qizhi/trigix https://github.com/bj-qizhi/trigix — feedback on the engine design and the node model especially welcome.