{"slug": "i-built-chanprobe-because-my-go-queues-were-invisible", "title": "I Built chanprobe Because My Go Queues Were Invisible", "summary": "The article describes the author's creation of **chanprobe**, a Go library designed to address the lack of observability in native Go channels. While standard channels are simple and effective, they become problematic in production by hiding latency, as they only reveal their current length and capacity, not critical debugging information like how long items have been waiting or whether producers are blocked. Chanprobe provides a named, instrumented queue that exposes metrics such as oldest item age, total sent/received/dropped counts, and backpressure status, allowing developers to pinpoint exactly where delays are occurring in their async pipelines.", "body_md": "I like Go channels.\n\nThey are one of those language features that feel simple in the best possible way.\n\nYou can write something like this:\n\n```\njobs := make(chan Job, 1024)\n\ngo func() {\n    for job := range jobs {\n        process(job)\n    }\n}()\n```\n\nAnd for a lot of cases, that is enough.\n\nClean, readable, idiomatic.\n\nBut after using channels in real services, I kept running into the same uncomfortable problem:\n\nonce a channel becomes part of your production pipeline, it also becomes a place where latency can hide.\n\nAnd native channels do not tell you much.\n\nYou can check:\n\n```\nlen(jobs)\ncap(jobs)\n```\n\nBut that is basically it.\n\nThat tells you how many items are buffered right now, but it does not answer the questions I usually care about when something is slow.\n\nFor example:\n\n```\nIs the producer blocked?\nIs the consumer too slow?\nHow long does an item wait before processing?\nDid we drop anything?\nWhen did backpressure start?\nWhich internal queue is causing the delay?\n```\n\nThat is why I started building 
`chanprobe`.\n\nRepository:\n\n[github.com/devflex-pro/chanprobe](https://github.com/devflex-pro/chanprobe)\n\n## The kind of problem I wanted to solve\n\nImagine a service that delivers webhooks.\n\nThe flow is simple:\n\n```\nHTTP request -> validate -> enrich -> queue -> deliver to customer\n```\n\nSomewhere in the middle, there is usually a channel:\n\n```\ndeliveryQueue := make(chan WebhookJob, 10_000)\n```\n\nThis works fine until customers start saying:\n\n“Sometimes webhooks arrive 30 seconds late.”\n\nAt that point, you start looking around.\n\nCPU looks fine.\n\nMemory looks fine.\n\nThe database is not obviously slow.\n\nLogs do not show errors.\n\nThe service is not crashing.\n\nBut something is still wrong.\n\nOne possible explanation is that the delivery workers are slower than the producers. The queue starts filling up. Jobs spend more and more time waiting before a worker picks them up. Eventually, your latency is not in the database or in the network.\n\nIt is sitting inside an in-memory channel.\n\nWith a normal channel, I can inspect the current length:\n\n```\nfmt.Println(len(deliveryQueue))\n```\n\nBut that does not tell me how long the oldest job has been waiting.\n\nAnd for production debugging, this difference matters a lot.\n\nThis is useful:\n\n```\nqueue length: 8241 / 10000\n```\n\nBut this is much more useful:\n\n```\noldest item age: 37s\n```\n\nBecause now I know that at least one job has already waited 37 seconds before processing.\n\nThat is not just a metric.\n\nThat is an explanation.\n\n## What I wanted the API to feel like\n\nI did not want to build a huge framework.\n\nI also did not want to replace every channel in a Go codebase.\n\nI wanted something explicit that I could use at important async boundaries.\n\nSomething like this:\n\n```\njobs := chanprobe.New[Job](\"webhook_delivery\", 10_000)\n\nif err := jobs.Send(ctx, job); err != nil {\n    return err\n}\n\njob, ok := jobs.Recv(ctx)\nif !ok {\n    
return\n}\n\nprocess(job)\n```\n\nThe queue has a name, because names matter in observability.\n\nI do not want to know that “some goroutine is blocked on channel send”.\n\nI want to know that `webhook_delivery` is full, or that `email_sender` is dropping work, or that `image_resize` has items waiting for 12 seconds.\n\n## Basic usage\n\nHere is a small example:\n\n```\npackage main\n\nimport (\n    \"context\"\n    \"fmt\"\n\n    \"github.com/devflex-pro/chanprobe\"\n)\n\nfunc main() {\n    ctx := context.Background()\n\n    jobs := chanprobe.New[string](\"jobs\", 1024)\n    defer jobs.Close()\n\n    if err := jobs.Send(ctx, \"hello\"); err != nil {\n        panic(err)\n    }\n\n    job, ok := jobs.Recv(ctx)\n    if !ok {\n        return\n    }\n\n    fmt.Println(\"processed:\", job)\n}\n```\n\nThis is intentionally boring.\n\nThe interesting part is not that it can send and receive values.\n\nChannels already do that.\n\nThe interesting part is that the queue can describe what is happening inside it.\n\n```\nsnapshot := jobs.Snapshot()\n\nfmt.Printf(\"name: %s\\n\", snapshot.Name)\nfmt.Printf(\"len: %d\\n\", snapshot.Len)\nfmt.Printf(\"cap: %d\\n\", snapshot.Cap)\nfmt.Printf(\"sent: %d\\n\", snapshot.SentTotal)\nfmt.Printf(\"received: %d\\n\", snapshot.ReceivedTotal)\nfmt.Printf(\"dropped: %d\\n\", snapshot.DroppedTotal)\nfmt.Printf(\"oldest item age: %s\\n\", snapshot.OldestItemAge)\n```\n\nIn a real service, this gives me a much better starting point during debugging.\n\nInstead of guessing where latency lives, I can ask the queue directly.\n\n## Context-aware send and receive\n\nOne thing I wanted from the beginning was context support.\n\nWith a native channel send, this can block forever:\n\n```\njobs <- job\n```\n\nOf course, you can write a `select` manually:\n\n```\nselect {\ncase jobs <- job:\n    return nil\ncase <-ctx.Done():\n    return ctx.Err()\n}\n```\n\nThat is fine, but if every important queue needs the same behavior, I prefer to 
make it part of the abstraction.\n\nWith `chanprobe`:\n\n```\nif err := jobs.Send(ctx, job); err != nil {\n    return err\n}\n```\n\nAnd receiving is similar:\n\n```\njob, ok := jobs.Recv(ctx)\nif !ok {\n    return\n}\n```\n\nFor me, this is less about saving a few lines of code and more about making queue behavior consistent across the project.\n\n## Drop policies\n\nNot every queue should block forever when it is full.\n\nSometimes blocking is correct.\n\nFor example, if every job must be processed, backpressure should probably propagate to the producer.\n\nSometimes dropping the newest item is correct.\n\nFor example, if the system is overloaded and new work can be rejected.\n\nSometimes dropping the oldest item is correct.\n\nFor example, if you only care about the latest state and old queued values are already stale.\n\nSo `chanprobe` supports different policies.\n\nThe default policy is blocking:\n\n```\njobs := chanprobe.New[Job](\"jobs\", 1024)\n```\n\nYou can also choose `DropNewest`:\n\n```\njobs := chanprobe.New[Job](\n    \"jobs\",\n    1024,\n    chanprobe.WithDropPolicy(chanprobe.DropNewest),\n)\n```\n\nOr `DropOldest`:\n\n```\njobs := chanprobe.New[Job](\n    \"latest_events\",\n    1024,\n    chanprobe.WithDropPolicy(chanprobe.DropOldest),\n)\n```\n\nThe point is not that one policy is better than another.\n\nThe point is that queue behavior should be intentional.\n\nIf work can be dropped, I want that to be visible.\n\nIf producers are blocked, I want that to be visible too.\n\n## What the queue can tell you\n\nA snapshot contains things like:\n\n```\ntype Snapshot struct {\n    Name              string\n    Len               int\n    Cap               int\n    Closed            bool\n\n    SentTotal         uint64\n    ReceivedTotal     uint64\n    DroppedTotal      uint64\n\n    SendBlockedTotal  uint64\n    RecvBlockedTotal  uint64\n\n    SendWaitTotal     time.Duration\n    RecvWaitTotal     time.Duration\n    ItemWaitTotal     
time.Duration\n\n    OldestItemAge     time.Duration\n}\n```\n\nThe fields I personally care about most are usually not `Len` and `Cap`.\n\nThey are useful, but they are not enough.\n\nThe more interesting fields are:\n\n```\nsnapshot.OldestItemAge\nsnapshot.DroppedTotal\nsnapshot.SendBlockedTotal\nsnapshot.SendWaitTotal\nsnapshot.ItemWaitTotal\n```\n\nBecause they explain behavior.\n\nIf `DroppedTotal` is growing, the system is losing work.\n\nIf `SendBlockedTotal` is growing, producers are being slowed down.\n\nIf `OldestItemAge` is high, queue latency is becoming part of user-visible latency.\n\nThat is the signal I wanted.\n\n## Debugging with expvar\n\nI wanted the core package to stay lightweight.\n\nI did not want to force Prometheus, OpenTelemetry, or any other dependency on users.\n\nSo the first built-in exporter is based on `expvar`.\n\nExample:\n\n```\npackage main\n\nimport (\n    \"net/http\"\n\n    \"github.com/devflex-pro/chanprobe\"\n)\n\nfunc main() {\n    chanprobe.PublishExpvar(\"chanprobe\", nil)\n\n    http.ListenAndServe(\":8080\", nil)\n}\n```\n\nThen you can inspect:\n\n```\ncurl http://localhost:8080/debug/vars\n```\n\nThis is not meant to be the final observability story for every production system.\n\nIt is just a simple way to expose what the queues know.\n\nPrometheus and OpenTelemetry exporters can live separately without making the core package heavier.\n\n## Why not just use pprof or runtime/trace?\n\nI use those tools too.\n\nThey are extremely useful.\n\nBut I see them as solving a slightly different problem.\n\n`pprof` and `runtime/trace` help me understand what the Go runtime is doing.\n\n`chanprobe` is more application-level.\n\nIt is not trying to tell me only that goroutines are blocked.\n\nIt is trying to tell me which named queue is responsible.\n\nThere is a big practical difference between these two statements:\n\n```\nsome goroutines are blocked on channel 
send\n```\n\nand:\n\n```\nwebhook_delivery is 98% full and the oldest item has been waiting for 37s\n```\n\nThe second one is much closer to the way I debug real services.\n\n## What I intentionally did not build\n\nI deliberately avoided the “clever” version of this project.\n\nThere is no `unsafe`.\n\nThere is no runtime monkey-patching.\n\nThere is no attempt to resize Go channels magically.\n\nThere is no global goroutine scanning.\n\nThere is no promise that this is faster than channels.\n\nActually, it should be obvious: this adds instrumentation, so it has overhead.\n\nThat is why I would not use it everywhere.\n\nI would use it only where queue visibility is worth the cost.\n\nFor small internal coordination channels, native Go channels are still perfect.\n\nFor important queues in production pipelines, I want more information.\n\n## What I learned while building it\n\nThe hardest part was not making a queue.\n\nIt was deciding what behavior should be explicit.\n\nWhat should happen when the queue is full?\n\nShould send block or fail?\n\nWhat should happen after close?\n\nShould existing items still be receivable?\n\nWhat exactly counts as a dropped item?\n\nWhich metrics are actually useful, and which ones are just noise?\n\nI also realized that a queue is not just an implementation detail.\n\nIn many services, it is part of the system’s behavior.\n\nIt can hide latency.\n\nIt can create backpressure.\n\nIt can drop work.\n\nIt can make producers slow.\n\nIt can make consumers look fine while users are waiting.\n\nIf a queue can affect production behavior, I think it deserves observability.\n\n## Current status\n\n`chanprobe` currently has:\n\n```\ngeneric bounded queues\ncontext-aware Send and Recv\nnon-blocking TrySend and TryRecv\ndrop policies\nsnapshots\nregistry\nexpvar support\nexamples\ntests\nbenchmarks\n```\n\nThe repository is here:\n\n[github.com/devflex-pro/chanprobe](https://github.com/devflex-pro/chanprobe)\n\nIt is still 
small, but already useful enough to try in real Go services.\n\nMy next ideas are a Prometheus exporter, better examples, and maybe more detailed latency metrics without making the core package too heavy.\n\nIf you have Go services with worker pools, event pipelines, background jobs, or internal queues, I would be curious to hear if this kind of visibility would help you debug production issues faster.", "url": "https://wpnews.pro/news/i-built-chanprobe-because-my-go-queues-were-invisible", "canonical_source": "https://dev.to/devflex-pro/i-built-chanprobe-because-my-go-queues-were-invisible-3ld9", "published_at": "2026-05-22 21:23:23+00:00", "updated_at": "2026-05-22 21:31:26.684563+00:00", "lang": "en", "topics": ["developer-tools", "open-source"], "entities": ["chanprobe", "Go"], "alternates": {"html": "https://wpnews.pro/news/i-built-chanprobe-because-my-go-queues-were-invisible", "markdown": "https://wpnews.pro/news/i-built-chanprobe-because-my-go-queues-were-invisible.md", "text": "https://wpnews.pro/news/i-built-chanprobe-because-my-go-queues-were-invisible.txt", "jsonld": "https://wpnews.pro/news/i-built-chanprobe-because-my-go-queues-were-invisible.jsonld"}}