Workflow Caching in Self-Hosted Roboflow Inference

If you've deployed Roboflow Workflows to the edge with a local inference server, you might have seen this scenario before: You make an update to a workflow, perhaps incrementing a model version or simply adding an output field. You hit Publish, go to your edge device, run inference on your test image, and get the exact same result as before. No error in the output, just the server steadfastly serving the older workflow definition.

This is a common situation with self-hosted inference, and the root cause is easier to understand once you know how caching works in Roboflow Workflows. This guide gives a short overview of the caching process and some troubleshooting tips.

Where This Applies #

In this context, "self-hosted" applies to the Roboflow Inference Server, whether you run it via Docker (`inference server start`) or the native apps for your OS. Simply starting the inference server doesn't authenticate you with Roboflow. Instead, when you run a workflow using the inference SDK, your API key is passed with the call itself.

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="YOUR_ROBOFLOW_API_KEY",
)

result = client.run_workflow(
    workspace_name="your-workspace",
    workflow_id="your-workflow",
    images={"image": "board.jpg"},
)

On the first call, the server uses the key to download the published workflow definition from the Roboflow platform, determines the model references inside it, and then downloads any model weights it doesn't already have before running inference. Subsequent calls run on-device against the cached artifacts rather than re-downloading them on every request.

Ultimately, that single request involves three separate dependencies (server image, workflow definition, and model weights), and each has its own update mechanism. This independence, rather than a failure, is typically why a change leads to unexpected results.

Three Dependencies #

A running workflow is built on three dependencies that update independently.

Dependency	What it is	Where it lives	When it updates
Server image	The Inference Server software (Docker image or native app)	On the host machine (a Docker image or an installed app)	On an image pull or app update. `inference server start` pulls the latest by default, so restarting the server updates it unless you pass `--use-local-images`
Workflow definition	The JSON representation of your workflow	In-memory cache with a TTL; also a JSON file on disk for offline fallback	TTL refresh, restart, or cache bypass
Model weights	Each model's weight files and config	On disk, keyed by `model_id/version`	A new version downloads fresh on first use; the same version is always served from cache

When returned data looks out of date, the question to ask is which of the above didn't refresh. Typical "my change didn't take" scenarios involve a workflow dependency that is out of sync with the others.

Model Weights #

With a specific model version, the weights are saved to disk under `model_id/version`. A retrain creates a new model version, and a subsequent call will result in a cache miss, triggering a download. Because a new version is fetched once and cached locally to disk, the weights are typically not the cause of a stale result.
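The lookup described above can be pictured with a small sketch. This is a toy model, not the server's actual code: `get_weights`, `fake_download`, the `weights.bin` file name, and the `board-defects` model ID are all hypothetical. The only detail taken from the text is that the cache is keyed by `model_id/version`.

```python
import tempfile
from pathlib import Path


def get_weights(cache_dir: Path, model_id: str, version: str, download):
    """Return (path, was_downloaded) for one model version."""
    # Hypothetical layout: <cache_dir>/<model_id>/<version>/weights.bin
    target = cache_dir / model_id / version / "weights.bin"
    if target.exists():
        return target, False  # cache hit: the same version is served from disk
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(download(model_id, version))  # cache miss: fetch once
    return target, True


def fake_download(model_id: str, version: str) -> bytes:
    # Stand-in for the real download from the platform.
    return f"{model_id}/{version}".encode()


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    _, first = get_weights(cache, "board-defects", "3", fake_download)
    _, second = get_weights(cache, "board-defects", "3", fake_download)
    _, retrained = get_weights(cache, "board-defects", "4", fake_download)

print(first, second, retrained)  # True False True
```

Because the version is part of the key, a retrained model can never be shadowed by an older cached copy, which is why the weights are rarely the stale dependency.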

Workflow Definition #

The workflow definition is a JSON document that contains metadata, descriptions of the blocks used (including any custom code blocks), and how they're wired together. When you first run a workflow using self-hosted inference, the server retrieves it from the Roboflow platform and caches it with a time-to-live (TTL). If you see an unexpected result, a stale workflow cache is likely the cause, and here's why:

Saving and publishing a workflow on the Roboflow platform does not result in an automatic update to the cached copy on the local server. The server continues to serve its cached version until the cache expires, the server restarts, or your client explicitly bypasses the caching entirely.

The workflow caching happens in two places:

- There's an in-memory cache with a TTL based on an environment variable (`WORKFLOWS_DEFINITION_CACHE_EXPIRY`) which defaults to 15 minutes.
- The server also writes the definition to a JSON file on disk (under `MODEL_CACHE_DIR/workflow`) on every successful fetch. The JSON file, however, is only accessed as a fallback when the API is unreachable. This is so an offline device can continue serving the last known good definition. (Note that offline workflow operation is an Enterprise feature that requires an Enterprise plan. Also, the cache override (`use_cache=False`) only guarantees freshness when the Roboflow platform is accessible. If the device is offline it will fall back to the possibly stale JSON file on disk.)
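The two layers described above can be sketched as a toy model. This is an illustration under stated assumptions, not the server's implementation: `DefinitionCache`, `fetch`, and the injected `clock` are hypothetical names, the 900-second TTL simply mirrors the 15-minute default, and whether a bypassed request also refreshes the in-memory entry is an assumption of this sketch.

```python
import json
import tempfile
import time
from pathlib import Path


class DefinitionCache:
    """Toy model: in-memory TTL cache, plus a JSON file read only when the fetch fails."""

    def __init__(self, fetch, disk_path: Path, ttl_seconds: float, clock=time.monotonic):
        self.fetch = fetch            # callable that pulls the published definition
        self.disk_path = disk_path    # last known good copy on disk
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entry = None            # (expires_at, definition)

    def get(self, use_cache: bool = True) -> dict:
        now = self.clock()
        if use_cache and self._entry and now < self._entry[0]:
            return self._entry[1]     # still inside the TTL: no platform call
        try:
            definition = self.fetch()
        except ConnectionError:
            # Platform unreachable: fall back to the possibly stale file.
            return json.loads(self.disk_path.read_text())
        self.disk_path.write_text(json.dumps(definition))  # on every successful fetch
        self._entry = (now + self.ttl_seconds, definition)
        return definition


# Simulated platform state and a fake clock, so the demo needs no network or waiting.
state = {"online": True, "version": "v3.3"}
now = [0.0]


def fetch():
    if not state["online"]:
        raise ConnectionError("platform unreachable")
    return {"version": state["version"]}


with tempfile.TemporaryDirectory() as tmp:
    cache = DefinitionCache(fetch, Path(tmp) / "workflow.json", 900, clock=lambda: now[0])
    seen = [cache.get()["version"]]                     # first call fetches v3.3
    state["version"] = "v3.4"                           # publish a change
    seen.append(cache.get()["version"])                 # inside the TTL: still v3.3
    seen.append(cache.get(use_cache=False)["version"])  # bypass: v3.4
    now[0] = 901.0                                      # TTL has expired
    state.update(online=False, version="v3.5")          # device goes offline
    seen.append(cache.get()["version"])                 # disk fallback: v3.4

print(seen)  # ['v3.3', 'v3.3', 'v3.4', 'v3.4']
```

The last step shows the caveat from the note above: once the platform is unreachable, even an expired or bypassed cache can only return what was last written to disk.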

Forcing a Cache Refresh

When developing a workflow, it's not necessary to wait out the cache TTL between iterations. You can bypass the cached definition by passing `use_cache=False` when making a request.

result = client.run_workflow(
    workspace_name="your-workspace",
    workflow_id="your-workflow",
    images={"image": "board.jpg"},
    use_cache=False,   # bypass the cached definition; pull the published spec now
)

You can also restart the server itself (for the native app, quit it fully and then start it again).

Adding a Workflow Version Tag #

You can list loaded models with `client.list_loaded_models()`, but there's no clean way to read the cached workflow version back as text. One recommendation would be to add a version tag that's returned as part of the workflow output when executed. Incrementing the version tag before publishing the workflow makes it easy to reconcile what's cached versus current.

return {
    "board_type": board_type,
    "status": status,
    "defect_count": defect_count,
    "workflow_version": "v3.4",   # bump on EVERY publish
}

In this example I used a custom code block.

Add that field to the workflow's Outputs, and every response will report which definition is live. If you've published `v3.4` and still see `v3.3`, you can spot that the older version has been cached and know that the TTL (or a missed Publish) is the cause. A date or timestamp could work as an alternative to the version number.
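On the client side, the tag can be compared against the version you expect. The helper below is a minimal sketch: `cached_definition_is_current` is a hypothetical name, and it assumes the response is a list of dictionaries (one per input image) carrying the `workflow_version` output shown above. A single dictionary is accepted as well, in case your response has a different shape.

```python
def cached_definition_is_current(result, expected_version, field="workflow_version"):
    """True only if every output row reports the expected version tag."""
    rows = result if isinstance(result, list) else [result]
    return bool(rows) and all(
        isinstance(row, dict) and row.get(field) == expected_version for row in rows
    )


# Hand-written sample responses; real ones come from client.run_workflow(...).
stale = [{"status": "pass", "workflow_version": "v3.3"}]
fresh = [{"status": "pass", "workflow_version": "v3.4"}]

print(cached_definition_is_current(stale, "v3.4"))  # False
print(cached_definition_is_current(fresh, "v3.4"))  # True
```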

Troubleshooting Guide #

Here's a helpful list of items to check if a Workflow change is not taking effect on the local device:

Did you actually Publish (not just Save)? Saving a workflow edit alone does not make the latest update available to be pulled down by the server.

Did you click Deploy and expect an update on the device? The Deploy button does not push a workflow update to the server; it provides client code examples to run yourself. If the workflow is not already cached, it will be pulled from the Roboflow platform when the client makes a request.

Are you inside the 15-minute TTL? You can either wait it out or specify `use_cache=False` to bypass the cache and force a fresh pull.

Is your `workflow_version` tag stale in the response? This helps you understand if the workflow cache itself is stale versus a model issue.

If using Docker, was the container recreated, or just restarted? Model weights are cached to `MODEL_CACHE_DIR`, which resides in the container's writable layer by default (`/tmp/cache` or `/tmp/model-cache`). A simple `docker restart` keeps that path persistent, so the weights survive. Recreating the container clears it: a fresh `docker run`, a reboot that recreates the container, or `inference server start`. If the device is online, that results in a one-time download. If offline, there would be a failure unless a persistent volume is mounted to `MODEL_CACHE_DIR`.

Are two servers using the same default port? The Docker server and the native app both default to `localhost:9001`, so a stale instance may respond while you edit the one assumed live. Confirm only one is running.
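Part of the checklist above can be condensed into a rough triage helper. This is a heuristic sketch rather than an official diagnostic: `triage` and its parameters are hypothetical, and it assumes you use the `workflow_version` tag and know roughly when you last published.

```python
def triage(observed_tag, expected_tag, seconds_since_publish, ttl_seconds=15 * 60):
    """Suggest where to look next, based on the version tag and time since Publish."""
    if observed_tag == expected_tag:
        # The definition is current, so the stale dependency is something else.
        return "definition is current: check model weights and the server image"
    if seconds_since_publish < ttl_seconds:
        return "likely inside the TTL: wait, or retry with use_cache=False"
    return "TTL has passed: confirm Publish, device connectivity, and that one server owns the port"


print(triage("v3.3", "v3.4", seconds_since_publish=120))
print(triage("v3.3", "v3.4", seconds_since_publish=3600))
print(triage("v3.4", "v3.4", seconds_since_publish=3600))
```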

In practice, workflow caching on a self-hosted Inference Server is working as designed. A prediction that looks stale almost never means that something is broken; more likely, a local dependency is out of sync with the platform. The fix is almost always patience or a deliberate refresh.

Cite this Post

Use the following entry to cite this post in your research:

Workflow Caching in Self-Hosted Roboflow Inference. Roboflow Blog: https://blog.roboflow.com/workflow-caching-in-self-hosted-roboflow-inference/
