In the world of web traffic, there's a simple rule: if it looks like a regular user, walks like a user, and even brings its favorite cookies along, that still doesn't mean there's a human on the other side. Sometimes it's just a very diligent bot that happened to read the User-Agent documentation yesterday.
In this article, we’ll share how our traffic analysis tool evolved from naive trust in headers to a paranoid level of verification, and how that led to a "spring cleaning" of our architecture.
(For more on the project's first deep refactoring, read our article: Refactoring Laravel Visit Analytics: The Path to Version 2.0.0)
Once upon a time, we were young and naive. We believed in the User-Agent string with all our hearts. We looked at it like a passport: "Oh, is that Chrome 128 on Windows 11? Welcome, honored user!" But the statistics from our VisitAnalytics package quickly knocked that romantic nonsense right out of us.
We began to see strange patterns: thousands of "different" devices visiting the site, all with perfectly calibrated, "squeaky-clean" UA strings. But upon closer inspection, it turned out that the behavior of these "people" was suspiciously uniform. They were like soldiers in identical uniforms, marching through a desert where there was no one else but them.
We didn't jump straight to active defense. At first, we just started collecting data. Our gut told us that not all users were who they seemed to be. Bots had evolved, learning to spoof their User-Agent strings so well that they were indistinguishable from real browsers. But they had an Achilles' heel: Client Hints (the Sec-CH-* headers).
Humans don’t "optimize" headers.
A real user's Chromium-based browser (Chrome, Edge, and their relatives) sends a whole set of Sec-CH-* headers automatically: the brand list, platform, and mobile flag by default, and, when the site asks for them, details ranging from the full engine version to the processor architecture. This is "living" information that changes with every update. Furthermore, a normal human being's Accept-Language header, such as "en-US,en;q=0.9,fr;q=0.8,es;q=0.7", often differs from the typical bot equivalent, "en-US,en;q=0.9".
Bots are lazy or overthink it.
Analyzing our package's statistics, we noticed that bot creators either forget about Sec-CH-* entirely, leaving a void where a whole stack of data should be, or "over-optimize" them. They try to generate them programmatically, which leads to logical inconsistencies. It's like a person in a tuxedo wearing rubber boots: each piece is fine on its own, but together they make you question the "tailor."
Here are two examples from the log. The first is a typical human visit:
```json
{
  "id": 1234,
  "ip_address": "2003:c1:d71c:fe1f::",
  "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36",
  "target_headers": "{\"sec-ch-ua\":\"\\\"Not)A;Brand\\\";v=\\\"8\\\", \\\"Chromium\\\";v=\\\"138\\\", \\\"Google Chrome\\\";v=\\\"138\\\"\",\"sec-ch-ua-platform\":\"\\\"Windows\\\"\",\"sec-ch-ua-mobile\":\"?0\",\"sec-fetch-site\":\"none\",\"sec-fetch-dest\":\"document\",\"sec-fetch-mode\":\"navigate\",\"accept-language\":\"en-US,en;q=0.9,fr;q=0.8,es;q=0.7\",\"accept-encoding\":\"gzip, br\"}",
  "url": "https://oleant.dev/blog/freelancer-vertrage-fur-webentwickler-in-deutschland-so-schutzt-du-dich-rechtlich",
  "referer": "www.google.com",
  "payload": null,
  "processed_at": "2026-05-25 10:10:03",
  "anonymized_at": "2026-05-25 11:10:02",
  "bot_score": 15,
  "is_bot": 0,
  "is_official_bot": 0,
  "bot_reasons": "[\"single_page_scan\"]",
  "bot_evidence": "{\"single_page_scan\":{\"visit_depth\":\"1_page_only\"},\"analyzed_at\":\"2026-05-25 10:10:03\"}",
  "created_at": "2026-05-25 10:01:26.133"
}
```
The only notable thing is that this visitor didn't browse the site: they read a single article, which earned the single_page_scan reason and 15 points, far from a bot verdict. Now, here is the second example, a clear bot:
```json
{
  "id": 1235,
  "ip_address": "14.165.179.0",
  "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36",
  "target_headers": "{\"accept-encoding\":\"gzip, br\"}",
  "url": "https://oleant.dev/en/blog/how-to-write-a-resume-for-german-companies",
  "referer": null,
  "payload": null,
  "processed_at": "2026-05-25 10:00:02",
  "anonymized_at": "2026-05-25 11:00:02",
  "bot_score": 85,
  "is_bot": 1,
  "is_official_bot": 0,
  "bot_reasons": "[\"suspicious_minimal_headers\",\"missing_mandatory_header_accept-language\",\"missing_mandatory_header_sec-fetch-dest\",\"missing_mandatory_header_sec-fetch-site\",\"missing_mandatory_header_sec-fetch-mode\",\"missing_mandatory_header_sec-ch-ua\",\"missing_mandatory_header_sec-ch-ua-platform\",\"missing_mandatory_header_sec-ch-ua-mobile\"]",
  "bot_evidence": "{\"suspicious_minimal_headers\":{\"found_count\":1,\"required_count\":5},\"analyzed_at\":\"2026-05-25 10:00:02\"}",
  "created_at": "2026-05-25 09:51:05.092"
}
```
This "client" gave itself away precisely through the missing headers. But we didn't arrive at this realization immediately, so let's look at how our understanding evolved.
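The bot_reasons and bot_evidence fields above suggest a simple counting rule. Below is a minimal sketch of such a check in plain PHP, assuming that the eight headers from the human example form the "target" set, that the seven named in the bot's reasons are mandatory, and that 5 (the required_count in the evidence) is the minimum number of target headers a browser must send. The constant and function names are hypothetical, not the package's API.

```php
<?php

// Hypothetical: the eight headers captured in the human log example.
const TARGET_HEADERS = [
    'sec-ch-ua', 'sec-ch-ua-platform', 'sec-ch-ua-mobile',
    'sec-fetch-site', 'sec-fetch-dest', 'sec-fetch-mode',
    'accept-language', 'accept-encoding',
];

// Hypothetical: the subset whose absence is reported, in the order of the bot log's bot_reasons.
const MANDATORY_HEADERS = [
    'accept-language', 'sec-fetch-dest', 'sec-fetch-site', 'sec-fetch-mode',
    'sec-ch-ua', 'sec-ch-ua-platform', 'sec-ch-ua-mobile',
];

/**
 * Returns [bot_reasons, bot_evidence] shaped like the log entries.
 *
 * @param array<string, string> $headers header name => value
 */
function checkMinimalHeaders(array $headers, int $requiredCount = 5): array
{
    $headers = array_change_key_case($headers, CASE_LOWER);
    $reasons = [];
    $evidence = [];

    // How many of the expected headers did the client send at all?
    $found = count(array_intersect_key($headers, array_flip(TARGET_HEADERS)));
    if ($found < $requiredCount) {
        $reasons[] = 'suspicious_minimal_headers';
        $evidence['suspicious_minimal_headers'] = [
            'found_count' => $found,
            'required_count' => $requiredCount,
        ];
    }

    // One reason per mandatory header that is missing.
    foreach (MANDATORY_HEADERS as $name) {
        if (!array_key_exists($name, $headers)) {
            $reasons[] = 'missing_mandatory_header_' . $name;
        }
    }

    return [$reasons, $evidence];
}
```

One caution: at the time of writing only Chromium-based browsers send the Sec-CH-UA headers, while Firefox and Safari do not, so a production rule would presumably demand them only when the User-Agent claims to be a Chromium browser.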
We began to notice that bots give themselves away through internal contradictions: for example, a User-Agent that claims to be Windows while the Sec-CH-UA-Platform header timidly points to Android.
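A contradiction check of this kind could look like the following sketch. It assumes a deliberately coarse UA-to-platform mapping of our own (both function names are hypothetical) and compares it with the quoted Sec-CH-UA-Platform value, such as "Windows". Missing or "Unknown" values are left to other rules rather than counted as contradictions.

```php
<?php

/**
 * Coarse platform guess from a User-Agent string.
 * Order matters: Android UAs also contain "Linux", and iOS UAs contain "like Mac OS X".
 */
function platformFromUserAgent(string $userAgent): ?string
{
    $patterns = [
        'Android'   => '/Android/',
        'iOS'       => '/iPhone|iPad|iPod/',
        'Windows'   => '/Windows NT/',
        'Chrome OS' => '/CrOS/',
        'macOS'     => '/Macintosh|Mac OS X/',
        'Linux'     => '/Linux/',
    ];
    foreach ($patterns as $platform => $regex) {
        if (preg_match($regex, $userAgent) === 1) {
            return $platform;
        }
    }
    return null;
}

/**
 * True when the User-Agent and Sec-CH-UA-Platform name different platforms.
 * The header value is a quoted string such as "Windows" (quotes included).
 */
function platformMismatch(string $userAgent, ?string $secChUaPlatform): bool
{
    if ($secChUaPlatform === null) {
        return false; // absent header: handled by the missing-header rule instead
    }
    $hinted = trim($secChUaPlatform, ' "');
    $claimed = platformFromUserAgent($userAgent);
    if ($claimed === null || $hinted === '' || $hinted === 'Unknown') {
        return false; // nothing reliable to compare
    }
    return strcasecmp($claimed, $hinted) !== 0;
}
```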
At that moment, we realized: stop trusting the facade. Statistics showed us that accurate identification requires more than reading the headers; you have to look for cognitive dissonance between them. We stopped simply "recording" visits and started analyzing their integrity, turning our log files from a simple table into a real dossier on every "digital chameleon." This realization was the first step toward a system that later allowed us to move from passive observation to effectively hunting botnets.
The User-Agent alone wasn't enough. We quickly understood that bots had become virtuoso mimics, swapping this string for whatever the task required. However, while watching the logs, we noticed a pattern: botnets often use proxies to rotate IP addresses, hoping to remain unnoticed. But they forget one detail: the "environment" of the request.
We saw that despite constant IP changes (likely through proxy farms), the combination of User-Agent and Client Hints stays suspiciously stable for bots. It's their signature. They can change their "face" (IP), but their "digital skeleton" (headers) remains unchanged across the entire network. To expose them, we created a fingerprint: a hash of these headers that became our main weapon. In Laravel 11, we implemented it directly in the TrackVisits middleware, turning a set of headers into a stable identifier:
```php
// Hash generation: linking UA with critical headers
$targetHeaders = $this->extractTargetHeaders($request);

// Even if the IP changes, the hash content remains a constant for the botnet
$fingerprintInput = $request->userAgent() . '|' . json_encode($targetHeaders);
$fingerprintHash = hash('sha256', $fingerprintInput);
```
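extractTargetHeaders() isn't shown, so here is a self-contained variant under one extra assumption: header names are lowercased and sorted before encoding, so the hash does not depend on the order in which headers happen to be collected. The function name is hypothetical; if extractTargetHeaders() already returns a fixed order, the sorting is redundant but harmless.

```php
<?php

/**
 * Order-insensitive fingerprint: the same User-Agent and header values always give the
 * same hash, whatever order or casing the header names arrive in. The IP is not an input.
 *
 * @param array<string, string> $targetHeaders header name => value
 */
function fingerprintHash(string $userAgent, array $targetHeaders): string
{
    $normalized = array_change_key_case($targetHeaders, CASE_LOWER);
    ksort($normalized, SORT_STRING);

    $input = $userAgent . '|' . json_encode($normalized, JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR);

    return hash('sha256', $input);
}
```

The IP address is deliberately left out of the input: including it would give every proxy exit its own hash and defeat the purpose.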
When the same hash started appearing from 50,000 different IP addresses within an hour, we knew: this is a botnet. Previously, we stored this data in the botnet_fingerprints table, but it quickly turned into a "graveyard" of useless records. We realized we don't need an archive; we need real-time reactions. We rewrote the BotnetAnalyzer to search for anomalies "on the fly," analyzing activity within the current window:
```php
// Looking for anomalies in the current window without querying archive tables
$window = now()->subMinutes($params['analysis_window_minutes'] ?? 10);

// Looking for hash matches from different IP addresses
$isCluster = VisitLog::where('fingerprint_hash', $log->fingerprint_hash)
    ->where('created_at', '>=', $window)
    ->where('ip_address', '!=', $log->ip_address)
    ->exists();

if ($isCluster) {
    // The entire "pack" of bots is marked automatically at the moment of appearance
    $this->markAsBotnet($log->fingerprint_hash);
}
```
When we removed botnet_fingerprints from the database, the system sped up instantly. We stopped hoarding the history of "dead" proxies and moved to identifying the botnet "conductor" by its handwriting. If hundreds of different IPs arrive with the same fingerprint, it doesn't matter how often they rotate proxies: we see that it's one and the same "army."
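As written, the query flags a cluster as soon as a single other IP shares the hash inside the window. Because many genuine visitors run the same browser build with the same language settings, their fingerprints can legitimately coincide, so a minimum number of distinct IPs keeps the check closer to the "hundreds of IPs" rule. The in-memory sketch below illustrates that idea; the class name and the threshold of 50 are assumptions, and the 10-minute default window mirrors the `?? 10` above. In Eloquent, the same idea would replace `exists()` with `distinct()->count('ip_address')` compared against the threshold.

```php
<?php

/**
 * In-memory sketch: flag a fingerprint once it shows up from at least $minDistinctIps
 * different IP addresses inside a sliding window. Both parameters are hypothetical.
 */
final class FingerprintWindow
{
    /** @var array<string, array<string, int>> fingerprint => [ip => last-seen unix time] */
    private array $seen = [];

    public function __construct(
        private int $windowSeconds = 600,
        private int $minDistinctIps = 50,
    ) {
    }

    /** Record one visit and report whether the fingerprint now looks like a botnet cluster. */
    public function record(string $fingerprint, string $ip, int $now): bool
    {
        $this->seen[$fingerprint][$ip] = $now;

        // Forget IPs whose latest visit has slid out of the window.
        $cutoff = $now - $this->windowSeconds;
        $this->seen[$fingerprint] = array_filter(
            $this->seen[$fingerprint],
            fn (int $lastSeen): bool => $lastSeen >= $cutoff,
        );

        return count($this->seen[$fingerprint]) >= $this->minDistinctIps;
    }
}
```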
The hunt for the botnet started successfully, but our "digital trophy room" began to suffocate us. We were storing every suspicious hash in the botnet_fingerprints table. Day after day it grew like yeast dough, turning from a security tool into a database bottleneck.
We realized we had fallen into the "collection trap." We were trying to store attack history when, in reality, we only needed to know what was happening right this second. So we took a radical step: we deleted botnet_fingerprints from our database schema.
The outcome of our efforts exceeded all expectations:
Database load dropped by 40%. Heavy JOINs and endless SELECTs on a table with millions of rows are gone.
Reaction speed. Suspicion checks now happen almost instantly. Thanks to our first line of header analyzers, 95% of bots are filtered out before they reach more expensive checks (such as network-based PTR record lookups). Only humans and as-yet-uncaught bots make it through all 11 analyzers, and they account for just a few percent. Everyone else gets "branded" by one of the analyzers along the check queue.
Clear conscience. We stopped being "archivists of evil" and became digital minimalists.
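The "cheap checks first" ordering can be sketched as a scoring pipeline that stops as soon as a verdict is reached. Everything here is an assumption for illustration: the Analyzer interface, the cost ordering, the threshold of 50, and the scores (85 and 15, chosen to mirror the two log entries above). The package's real analyzers and weights are not described in this article.

```php
<?php

/** Hypothetical analyzer contract; names, costs and scores are illustrative only. */
interface Analyzer
{
    /** Relative cost, used only to order analyzers (cheap header checks first). */
    public function cost(): int;

    /** @return array<string, int> bot reason => score points; empty when nothing was found */
    public function analyze(array $visit): array;
}

/** Adapter so a plain closure can act as an analyzer in examples. */
final class ClosureAnalyzer implements Analyzer
{
    public function __construct(private int $cost, private Closure $check)
    {
    }

    public function cost(): int
    {
        return $this->cost;
    }

    public function analyze(array $visit): array
    {
        return ($this->check)($visit);
    }
}

/**
 * Run analyzers from cheapest to most expensive and stop as soon as the score crosses
 * the bot threshold, so expensive checks (e.g. PTR lookups) only see what got through.
 */
function classifyVisit(array $visit, array $analyzers, int $botThreshold = 50): array
{
    usort($analyzers, fn (Analyzer $a, Analyzer $b): int => $a->cost() <=> $b->cost());

    $score = 0;
    $reasons = [];
    foreach ($analyzers as $analyzer) {
        foreach ($analyzer->analyze($visit) as $reason => $points) {
            $reasons[] = $reason;
            $score += $points;
        }
        if ($score >= $botThreshold) {
            break; // verdict reached: skip the remaining, more expensive analyzers
        }
    }

    return [
        'bot_score' => min($score, 100),
        'is_bot' => $score >= $botThreshold,
        'bot_reasons' => $reasons,
    ];
}
```

Because the loop breaks once the threshold is crossed, a visit that fails the header checks never reaches the expensive analyzer at the end of the list.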
Now, our system no longer suffers from accumulated "digital baggage." It lives in the moment: it analyzes the request, compares it against "hot" patterns, and, if necessary, instantly flags the threat. We’ve learned that for botnet protection, it's not the depth of history that matters—it's the speed of decision-making here and now.
When we started hunting bots via hashes, we faced an ethical dilemma. The same fingerprint_hash that helped us identify a botnet had essentially become a "digital footprint" of real people. A SHA-256 hash cannot be decrypted, but the set of realistic User-Agent and header combinations is small enough that a hash can be matched back to the original headers simply by hashing candidates. Keeping such a hash means we are effectively storing personal data. And we are all about privacy!
The goal became clear: we need to see botnet activity without seeing the identities of the users.
We implemented the FingerprintAnonymizerService. Its logic is simple: when a log is saved, the system keeps everything the analysis needs, which is far too much to count as anonymized. Once the analysis is done and we know who is in front of us, human or bot, we no longer need their fingerprints. We pack the bot, fingers and all, into solitary confinement, while the worthy citizen is welcomed to the site with full honors. All sensitive data is cleaned up in the process. The waiter (the web service) is, after all, not a policeman or a bouncer: once the guests are inside the disco, they are our guests, their personal data no longer matters to us, and we gladly serve them. But if it's a thief (read: bot), then the thief must sit in jail (a movie quote). He also gets a personal prison number, just without any special amenities: the bot John Johnson simply becomes inmate №245. I hope the analogy is clear.
```php
public function handle(VisitLog $log): array
{
    // Assumption: the anonymization switches come from the package config (key name is hypothetical)
    $config = config('visit-analytics.anonymization', []);
    $updates = [];

    // Transforming the complex User-Agent into a simple client "portrait"
    if ($config['anonymize_ua'] ?? true) {
        $updates['user_agent'] = $this->anonymizeUserAgent($log);
    }

    // Replacing detailed headers with a simple list of keys
    if ($config['anonymize_headers'] ?? true) {
        $updates['target_headers'] = $this->anonymizeHeaders($log->target_headers);
    }

    // Wiping the original hash if analytics are complete
    if ($config['anonymize_fingerprint_hash'] ?? true) {
        $updates['fingerprint_hash'] = 'anonym-sha256-ready';
    }

    return $updates;
}
```
The most interesting transformation happens inside anonymizeUserAgent. We no longer store the raw UA string. Instead, we use Client Hints (when available) to extract general parameters such as browser, OS, and platform, and discard unique identifiers.
Before anonymization:
We saw the specific engine version, processor architecture, and a full set of parameters that, combined, could "fingerprint" a unique user.
After anonymization:
We see only Chrome / Windows (Desktop).
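A minimal sketch of how such a portrait might be derived is shown below. It is not the package's anonymizeUserAgent(): the function name, the brand-name mapping, and the fallback UA patterns are assumptions. It reads the brand from Sec-CH-UA (skipping "GREASE" placeholder brands such as "Not)A;Brand"), the platform from Sec-CH-UA-Platform, and the device class from Sec-CH-UA-Mobile, and falls back to crude UA matching when the hints are missing.

```php
<?php

/**
 * Hypothetical "portrait" builder: Browser / OS (Device), no version numbers.
 *
 * @param array<string, string> $headers header name => value
 */
function userAgentPortrait(string $userAgent, array $headers): string
{
    $h = array_change_key_case($headers, CASE_LOWER);

    // Browser: first real brand from Sec-CH-UA, preferring a specific brand over "Chromium".
    $browser = null;
    if (isset($h['sec-ch-ua'])) {
        preg_match_all('/"([^"]+)";\s*v="\d+"/', $h['sec-ch-ua'], $m);
        $brands = array_values(array_filter(
            $m[1],
            fn (string $brand): bool => preg_match('/not.a.brand/i', $brand) !== 1,
        ));
        $specific = array_values(array_diff($brands, ['Chromium']));
        $browser = $specific[0] ?? $brands[0] ?? null;
    }
    $browser ??= match (true) {
        str_contains($userAgent, 'Edg/') => 'Microsoft Edge',
        str_contains($userAgent, 'Firefox/') => 'Firefox',
        str_contains($userAgent, 'Chrome/') => 'Google Chrome',
        str_contains($userAgent, 'Safari/') => 'Safari',
        default => 'Other',
    };
    $browser = ['Google Chrome' => 'Chrome', 'Microsoft Edge' => 'Edge'][$browser] ?? $browser;

    // OS: Sec-CH-UA-Platform is a quoted string such as "Windows".
    $os = trim($h['sec-ch-ua-platform'] ?? '', ' "');
    if ($os === '' || $os === 'Unknown') {
        $os = match (true) {
            str_contains($userAgent, 'Android') => 'Android',
            preg_match('/iPhone|iPad|iPod/', $userAgent) === 1 => 'iOS',
            str_contains($userAgent, 'Windows NT') => 'Windows',
            str_contains($userAgent, 'Mac OS X') => 'macOS',
            str_contains($userAgent, 'Linux') => 'Linux',
            default => 'Other',
        };
    }

    // Device class: Sec-CH-UA-Mobile is "?1" for mobile, "?0" otherwise.
    $mobile = isset($h['sec-ch-ua-mobile'])
        ? $h['sec-ch-ua-mobile'] === '?1'
        : str_contains($userAgent, 'Mobile');

    return sprintf('%s / %s (%s)', $browser, $os, $mobile ? 'Mobile' : 'Desktop');
}
```

Applied to the human log entry above, this yields Chrome / Windows (Desktop). The bot entry has no hints, falls back to its UA, and yields the same portrait, so at this level of detail the two become indistinguishable, which is exactly the point.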
We applied a similar approach to headers: the anonymizeHeaders method simply returns an array of keys (array_keys), stripping away any values that might contain cookies or specific session tokens. The result? Our logs now look like a set of "statistical generalizations." We still see that a botnet is attacking the site, and we still flag it in the system, but we are now far better protected against accusations of privacy violations. We transformed a detailed trace of every visitor's behavior into a safe stream of aggregated statistics.
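Assuming target_headers is stored as a JSON object string, as in the log examples, the key-only transformation could be as small as the sketch below. The function is a stand-in written for illustration, not the package's method.

```php
<?php

/**
 * Hypothetical stand-in for anonymizeHeaders(): keep which headers were sent, drop every value.
 */
function anonymizeHeaders(?string $targetHeadersJson): ?string
{
    if ($targetHeadersJson === null) {
        return null;
    }

    $headers = json_decode($targetHeadersJson, true);
    if (!is_array($headers)) {
        return null; // unparseable input: store nothing rather than risk keeping raw values
    }

    // Only the header names survive, e.g. ["sec-ch-ua","accept-language"].
    return json_encode(array_keys($headers), JSON_THROW_ON_ERROR);
}
```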
Now, even if the log database falls into the wrong hands, it would look like meaningless junk to anyone trying to de-anonymize our users. This is true engineering minimalism: protecting the system without harming people.
All these adventures—from the disappointment in the "honesty" of the User-Agent to the deletion of archive tables and the implementation of deep anonymization—culminated in release 2.4.0. This is not just a minor update. It is the transformation of our product from a "hobbyist detective" who simply keeps logs into a professional protection system that has learned the internet's most important lesson: you can't trust anyone, not even the headers.
Performance.
Ditching unnecessary tables and switching to real-time analysis allowed us to reduce database load by 40% and stop worrying about log scalability issues.
Privacy-First.
Thanks to the FingerprintAnonymizerService, we are no longer "secret keepers." We analyze threat patterns while keeping users' personal data off our analyzers' radar.
Smart Detection.
We now catch botnets not by IP, but by their "digital signature," making proxy rotation attempts meaningless.
We continue to evolve. We are already looking into implementing Bloom filters for even faster, on-the-fly verification of "suspicious" fingerprints. Regarding the interface, we have postponed plans for deep integration with Filament for now. Yes, we want to see beautiful dashboards and attack graphs right in the Laravel admin panel, but our current priority is maximum data purity and detection accuracy. Filament is the storefront, but we are still organizing the "warehouse." Rest assured, though: botnet visualization in the admin panel is only a matter of time, and this item sits in bold at the top of our backlog, waiting for its turn.
Version 2.4.0 is the foundation. We have learned to see the invisible and have cleared the system of excess "digital trash." Onward—faster, bigger, and more efficient.
Stay tuned, we’re still on the hunt.