The smart TV in your living room is a node in the AI scraping economy

Smart TV owners are unknowingly contributing to the AI data-scraping economy through a software development kit (SDK) from data-collection firm Bright Data, which is embedded in consumer apps and turns devices into residential proxy nodes. The company markets access to over 400 million home IP addresses, allowing AI companies to route web-scraping traffic through paying residential subscribers' connections to bypass blocks from services like Cloudflare and DataDome. Connected TVs serve as ideal proxies due to their constant power, high-speed WiFi, and 24/7 uptime, raising concerns about the legal supply side of residential proxy networks and their role in fueling AI training data harvesting.

The work at Include Security has us working with AI day in and day out: hacking it, using it, training it, etc. We’re all aware of the recent community-level opposition to datacenters being built to improve AI capabilities. What you might not be aware of are the distributed efforts to train AI that could be using the devices inside your home.

In this post, we’re going to explore how the company Bright Data facilitates modern AI models scraping training data from the Internet (https://brightdata.com/blog/web-data/web-scraping-for-machine-learning) using its residential proxy network. Bright Data is a data-collection company that sells access to what it markets as the world’s largest residential proxy network: 400M+ home IP addresses that its customers route web-scraping traffic through. The supply behind that network comes from an SDK: a piece of software embedded in consumer apps that, with the user’s consent, turns their phone or smart TV into one of those exit nodes. We’ll document what you, the average user, should know about what this company’s SDK does on your devices, such as your mobile phone and your smart TV.
We’re going to explore how the SDK works, which platforms have shipped it, and why your Internet-connected TV is the ultimate proxy for AI models looking to train on data scraped from the Internet.

Why This Matters Now

AI companies depend on web-scraped content: for pre-training, for retrieval, for agent grounding, for search. But the modern web isn’t scrapeable from a datacenter. Cloudflare, DataDome, HUMAN (https://datadome.co/bot-management-protection/how-proxy-providers-get-residential-proxies/), among others, throttle or block requests from known cloud IPs.

The workaround is residential proxies. A scraping job routed through a Comcast or T-Mobile subscriber’s connection arrives at the target site from an IP that belongs to a paying residential customer. Krebs reported in October 2025 (https://krebsonsecurity.com/2025/10/aisuru-botnet-shifts-from-ddos-to-residential-proxies/) that “a glut of proxies from Aisuru and other sources is fueling large-scale data harvesting efforts tied to various AI projects.” Academic measurement (https://ieeexplore.ieee.org/document/8835239) going back to 2019 shows these networks are overwhelmingly misused. The FBI issued a formal advisory (https://www.fbi.gov/investigate/cyber/alerts/2026/evading-residential-proxy-networks-protecting-your-devices-from-becoming-a-tool-for-criminals) earlier this year.

Most of the existing press has focused on the illegal residential-proxy supply: botnets such as Aisuru (https://krebsonsecurity.com/2025/10/aisuru-botnet-shifts-from-ddos-to-residential-proxies/) and Kimwolf (https://krebsonsecurity.com/2026/01/the-kimwolf-botnet-is-stalking-your-local-network/); trojanized apps, as in HUMAN Security’s PROXYLIB disclosure (https://www.humansecurity.com/learn/blog/satori-threat-intelligence-alert-proxylib-and-lumiapps-transform-mobile-devices-into-proxy-nodes/); and pre-infected IoT hardware, as in Google/Mandiant’s IPIDEA takedown (https://cloud.google.com/blog/topics/threat-intelligence/disrupting-largest-residential-proxy-network).
These are the bad actors. The legal supply side, on the other hand, has received far less scrutiny. Today Bright Data is the largest residential proxy network in the world by its own marketing, advertising “150M+ IPs” sourced via a consent SDK embedded in partner apps. This research documents how that SDK works, which platforms have shipped it, and why the connected TV is the ultimate residential proxy.

Why Connected TV (CTV) is the Ideal Proxy

Connected TV, a.k.a. Smart TV, is a near-perfect residential proxy. Compared to a mobile phone:

| Factor | Mobile phone | Smart TV / CTV |
| --- | --- | --- |
| Power | Battery most of the day | Always plugged in |
| Network | WiFi + cellular | Always WiFi, high-speed |
| Uptime | Intermittent | 24/7 in standby |
| Bandwidth ceiling | Low cellular caps | Effectively unlimited |
| User attention | Actively used | Often unattended |
| Consent UI | Text on a phone screen | Text navigated via TV remote arrow keys |
| Corporate/family oversight | Higher (MDM, mobile EDR) | Virtually none |

A TV never hits 1% battery, jumps between WiFi networks, or gets locked when the user is asleep.

Some partner publishers do disclose the Bright Data relationship in their privacy policies (PlayWorks is one example: https://play.works/privacy-policy). But privacy-policy disclosure is the wrong control surface for a TV. It is hard to scroll through a legal document with the arrow keys on a remote, and the in-app consent dialog doesn’t convey that a paying Bright Data customer is about to route their scraping traffic through the user’s home internet.

Petflix, a Roku app documented by The Verge (https://www.theverge.com/column/885244/smart-tv-web-crawler-ai), is a representative case. Its opt-in screen reads:

“To enjoy Petflix for free with fewer ads, you are allowing Bright Data to occasionally use your device’s free resources and IP address to download public web data from the internet. Bright Data will only use your IP address for approved business-related use cases.
None of your personal information is accessed or collected except your IP address. Period.”

The Petflix dialog says “occasionally.” The SDK’s publicly queryable config sets max bw monthly wifi: 200,000,000,000 bytes, a 200 GB default monthly WiFi budget.

Who Bright Data Names as Partners

Bright Data exposes a partner manifest endpoint. The endpoint is unauthenticated, so anyone can fetch it. Names in the manifest that I was able to identify with high confidence from public sources:

| Partner ID (from config) | Entity | Scale |
| --- | --- | --- |
| playworks digital | PlayWorks Digital Ltd | |
| | CloudTV | Integrated across 125+ TV brands and 15+ OEMs (https://www.cloudtvos.com/applications.php) |
| | Longvision Media HK (LongTV) | 5M OTT users across HK and Malaysia (https://hk.long.tv/) |
| | Viber Media S.à r.l. (Rakuten) | 250M–820M monthly users of the Viber messenger (https://earthweb.com/viber-statistics/) |
| | Supercent (Korea) | #1 Korean mobile publisher by downloads in 2023 (https://en.supercent.io/about) |
| | Moonfrog Labs (Stillfront subsidiary) | ~10M MAU on Teen Patti Gold alone; acquired for $90M (https://inc42.com/buzz/sweden-based-stillfront-acquires-teen-patti-creator-moonfrog-labs/) |
| | Hola Networks | |

Others (desoline, free time, ott studio, global microtrading, m m media, easystaff lp) are present but less identifiable from public sources. bright screensavers, bright videos, and brightdata are Bright Data’s own apps.

A note on what the partner list proves: being listed in Bright Data’s config means an integration might have existed at some point. It does not by itself prove that a specific publisher’s currently shipping app(s) include the SDK in production. For any named publisher, per-app verification is required.

What the partner list does directly prove:

- Bright Data ships this roster in an unauthenticated public endpoint.
- At least three CTV-focused entities (PlayWorks, CloudTV, Longvision) monetized their users’ devices as residential proxy exit nodes.
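
The per-app verification called for above can begin with a static check of an app bundle. What follows is a minimal sketch in Python, not the method used in this research. It assumes you already have an unpacked app archive on disk and that the SDK is bundled as a directory named brdsdk.framework, the framework name this post reports for the iOS build. The function name find_sdk_frameworks and the example path are hypothetical.

```python
from pathlib import Path


def find_sdk_frameworks(unpacked_root, framework_name="brdsdk.framework"):
    """Return every directory under unpacked_root named framework_name.

    Assumptions (not from the original research):
    - unpacked_root is an already extracted app archive; for iOS the
      embedded frameworks usually sit under Payload/<App>.app/Frameworks/.
    - The SDK keeps its framework directory name; matching is
      case-insensitive to tolerate minor naming differences.
    """
    root = Path(unpacked_root)
    wanted = framework_name.lower()
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_dir() and path.name.lower() == wanted
    )


# Example call (hypothetical path):
# for hit in find_sdk_frameworks("/tmp/unpacked_app"):
#     print(hit)
```

An empty result is not proof of absence: an SDK can be renamed or statically linked into the main binary, so a hit is a reason to look closer and a miss only rules out the simplest case.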
PlayWorks in particular reports CTV distribution across major TV platforms and ISPs, with reach figures in the hundreds of millions of households per its own marketing materials.

How does the Bright Data SDK turn a user’s device into a residential proxy exit node?

The Bright Data SDK is a publicly documented commercial product, offered to publishers via Bright Data’s SDK integration docs (https://docs.brightdata.com/api-reference/SDK), with a JavaScript variant (https://docs.brightdata.com/api-reference/SDK-JS) for web. What follows builds on that public surface with findings from reverse-engineering the shipping iOS framework and instrumenting 30 days of its runtime traffic.

The SDK ships as an iOS framework (brdsdk.framework) inside partner apps. I reverse-engineered the binary and captured 30 days of traffic from a research fleet running the SDK inside a consent-installed partner app.

The Unauthenticated Config

On every launch the SDK calls: GET