{"slug": "ai-now-writes-as-many-online-articles-as-humans", "title": "AI Now Writes as Many Online Articles as Humans", "summary": "AI-generated articles now account for 50% of all online articles, matching the share written by humans, according to a new analysis of 55,400 randomly sampled web pages from Common Crawl. The proportion of primarily AI-generated articles surged from near zero before ChatGPT’s November 2022 launch to 36% within 12 months and 48% within 24 months, but has plateaued at roughly 50% since early 2025. Researchers used three separate AI detectors (Pangram, GPTZero, and Copyleaks), each with false positive and false negative rates below 2%, to confirm the trend, though they note that AI-generated articles largely do not appear in Google or ChatGPT search results.", "body_md": "## Key Takeaways\n\n- Primarily AI-generated articles now make up 50% of the articles published on the internet, equal to the share written by humans (50%).\n- ChatGPT launched in November 2022. Within the first 12 months, the percentage of primarily AI-generated articles jumped to 36%, and it reached 48% within 24 months.\n- However, since Q1 2025 the percentage of primarily AI-generated articles has plateaued at roughly 50%. We previously published this finding with data up to May 2025, and new data confirms this trend.\n- We build on our [prior research](https://graphite.io/five-percent/more-articles-are-now-created-by-ai-than-humans) by using three different AI detectors (Pangram, GPTZero, Copyleaks). We independently evaluate each to show that the false positive rates and average false negative rates are consistently below 2%. Each AI detector shows a similar trend.\n- While the trend is the same, our previous study estimated the proportion of primarily AI-generated articles to be 3.3 percentage points higher. 
This relatively small difference is the result of averaging three AI detectors rather than relying on the accuracy of a single detector.\n- Despite the prevalence of AI-generated articles on the web, we show in a [separate study](https://graphite.io/five-percent/ai-content-in-search-and-llms) that these articles largely do not appear in Google and ChatGPT. We do not evaluate whether AI-generated articles get as much traffic as human-written articles, but we suspect that they do not.\n\n## Motivation\n\nSince ChatGPT launched in November 2022, many companies have explored publishing content generated by LLMs such as ChatGPT, Claude, and Gemini to grow their traffic across channels such as Google Search, social, and advertising. This is a cost-effective alternative to paying humans hundreds of dollars to write content.\n\nThe quality of AI content is rapidly improving. In many cases, AI-generated content is as good as or better than content written by humans ([MIT Study](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4453958)). It is often hard for people to distinguish whether content is created by AI ([Originality.ai Study](https://originality.ai/blog/can-humans-detect-ai-content)).\n\nWe seek to evaluate the prevalence of AI-generated articles.\n\n## Results\n\nWe observe significant growth in primarily AI-generated articles, coinciding with the launch of ChatGPT in November 2022. After only 12 months, primarily AI-generated articles accounted for 35.9% of articles published.\n\nIn Q1 2025, the quantity of primarily AI-generated articles being published on the web nearly equaled the quantity of human-written articles, 49.6% vs. 50.4%. In Q4 2025, primarily AI-generated articles surpassed human-written articles, at 50.9%, before returning to 49.9% in Q1 2026.\n\n### Primarily AI-Generated Article Growth Has Plateaued\n\nWhile primarily AI-generated articles grew dramatically after ChatGPT launched, we do not see that trend continuing. 
Instead, the proportion of primarily AI-generated articles has remained relatively stable, near 50%, over the last five quarters. We hypothesize that this is because practitioners found that primarily AI-generated articles do not perform well in search, as shown in a [separate study](https://graphite.io/five-percent/ai-content-in-search-and-llms).\n\n## Methodology\n\n### Common Crawl\n\n[Common Crawl](https://commoncrawl.org/) maintains one of the largest publicly available web archives. It contains billions of pages and is used by researchers and developers. It is a key data source for training large language models.\n\n### Selection of Articles\n\nWe need a representative sample of English-language articles on the web. While Common Crawl does not crawl every page, its archive is the best free and publicly available proxy for the web. We want to measure the proportion of all articles being published that are primarily AI-generated, so we do not filter by traffic or use a curated subset. We randomly select 55.4k URLs from Common Crawl, and confirm that each is in English, has article schema markup, is at least 100 words long, has a publish date between January 2020 and March 2026, and is an article or listicle as classified by the Graphite page type classifier.\n\n### AI Detection\n\nWe classify each article using three AI detectors: [Pangram](https://www.pangram.com/), [Copyleaks](https://copyleaks.com/), and [GPTZero](https://gptzero.me/). The AI detectors produce different outputs. 
Below, we describe the output of each detector and how we transform that output into a binary classification (primarily AI or primarily human).\n\nPangram and Copyleaks provide the proportion of the article’s content that is AI-generated.\n\nPangram\n\n- Output: Proportion of the article that is Human, AI-assisted, AI\n- Classify as primarily AI if: proportion AI + proportion AI-assisted > proportion Human\n\nCopyleaks\n\n- Output: Proportion of the article that is Human, AI\n- Classify as primarily AI if: proportion AI > proportion Human\n\nIn contrast, GPTZero provides an article-level prediction. (Its Advanced Sentence Scanning output includes sentences that most impact the classification, but it does not directly provide the proportion of AI-generated content. We prefer to use its article-level output rather than devising our own method for computing the proportions.)\n\nGPTZero\n\n- Output: Prediction (Human, Mixed, AI) and confidence score\n- Classify as primarily AI if: prediction is AI or Mixed\n\nNote that the labels indicating a mixture of AI and human writing are rarely predicted on our dataset: GPTZero tags 6.4% of articles as Mixed, and Pangram tags 1.9% of articles as having AI-assisted text.\n\nAccurate detection of AI-generated content is required to make claims about the prevalence of AI-generated articles on the web. There is considerable disagreement about the accuracy of AI detection algorithms, and many argue that detecting AI is impossible, or at best, highly inaccurate. Therefore, before classifying the articles in our dataset, we evaluate the accuracy of the AI detectors.\n\n### Evaluation of False Positive Rates\n\nTo evaluate the false positive rate (the percentage of human-written articles classified as primarily AI-generated), we need a dataset of human-written articles. Since the large-scale adoption of AI tools began with ChatGPT, we argue that, with high probability, articles published before its release were written by humans. 
Therefore, we run each detector on the 15.7k articles in our Common Crawl dataset that were published between January 2020 and November 2022. In the table below, we see that all the AI detectors have low false positive rates.\n\n### Evaluation of False Negative Rates\n\nTo evaluate the false negative rate (the percentage of primarily AI-generated articles classified as human-written), we use GPT-5, Gemini 3.1 Pro, and Claude Opus 4.6 to generate 2,000 articles each, covering the same topics as a set of reference articles published before November 2022. For each reference article, we first generate a 100-word summary of the article using GPT-5, then use the summary to generate an article with the system prompt:\n\n*You are an expert content writer. Your task is to generate clear, engaging, and informative content about the topic provided by the user.*\n\n*Write in a professional yet friendly tone. The target audience is people searching on the web for key terms related to the topic provided by the user. The user will provide a word count for the prompt. Ensure that the generated content adheres to the specified word count, allowing for a variance of plus or minus 10 percent. Avoid jargon unless explained. Do not include any disclaimers or meta-commentary.*\n\nand the prompt:\n\n*Write an online article based on the summary provided below with approximately {word_count} words. Use plain text only (no markdown). 
Add section headings if needed.*\n\n*SUMMARY: {summary}*\n\nwhere *word_count* is the word count of the reference article.\n\nAll detectors have low false negative rates, especially for GPT-5, the most popular LLM as of May 2026.\n\nThe raw data for this evaluation is available [here](https://docs.google.com/spreadsheets/d/1euEfWQXDD83ogMV1z3aAlwfs6W8_GWqZZC0GZESIdBc/edit?gid=614454270#gid=614454270).\n\n### Quantifying Primarily AI-Generated Articles on the Web\n\nFinally, we classify all 55.4k articles in our dataset using each detector to evaluate the percentage of articles that are primarily AI-generated. First, we compute the percentage of articles published in each quarter that are primarily AI-generated using each AI detector. Then, we simply take the average of those AI detector-level estimates.\n\nThe raw data with classifications is available [here](https://docs.google.com/spreadsheets/d/1EzkAbI_jvUi8UA9iYW9SRko3HmNSU2qbXu5FrQ5CjWI/edit?gid=1532631691#gid=1532631691). Note that we do not include the URLs to avoid identifying specific companies that may be publishing AI-generated articles.\n\n### Comparison with Our Prior Study\n\nWe previously published a [study](https://graphite.io/five-percent/more-articles-are-now-created-by-ai-than-humans) on the same topic in October 2025. The differences from our prior study are:\n\n- We extended our Common Crawl sample to include articles published through March 2026.\n- We used three AI detectors instead of one, and averaged their detections. This method is preferable because we do not rely on the accuracy of a single detector.\n- The methodology is similar in that for Pangram and Copyleaks, we consider an article primarily AI-generated when a majority of its content is detected as using AI. For GPTZero, we use its article-level predictions.\n\nThe overall story is the same: a steep rise in primarily AI-generated articles after ChatGPT’s release and a plateau near 50% more recently. 
However, the percentage of primarily AI-generated articles we find by using multiple detectors is slightly lower than before (3.3 percentage points, on average), due to the more robust averaging method.\n\n## Limitations\n\n### AI-Assisted Articles\n\nMany people incorporate AI into their content creation process. One strategy is to ask AI to create a first draft, then have a human in the loop to edit or rewrite it. We did not evaluate the accuracy of the AI detectors on articles produced with this strategy.\n\n### AI Models\n\nAI models continue to improve, and may become harder to detect. We only evaluate the false negative rate on articles generated by GPT-5, Gemini 3.1 Pro, and Claude Opus 4.6. The AI detectors may have lower accuracy when applied to articles generated by other models.\n\n## Acknowledgements\n\nWe are grateful to Pangram, Copyleaks, and GPTZero for allowing us to use their AI detectors for this study.\n\n- [Pangram](https://www.pangram.com): AI detection platform that identifies AI-generated and AI-assisted writing with detailed authenticity analysis.\n- [Copyleaks](https://copyleaks.com): Content integrity platform offering AI detection and plagiarism checking across text, code, and documents.\n- [GPTZero](https://gptzero.me): Detects AI content from ChatGPT, GPT-5, Claude, Gemini, and checks writing quality to make every word worth reading.\n\nWe are also grateful to Common Crawl for providing free web crawl data to researchers since 2008.\n\n## Appendix\n\nResults by AI detector:", "url": "https://wpnews.pro/news/ai-now-writes-as-many-online-articles-as-humans", "canonical_source": "https://graphite.io/five-percent/ai-now-writes-as-many-online-articles-as-humans-do", "published_at": "2026-05-29 16:42:27+00:00", "updated_at": "2026-05-29 17:18:51.777552+00:00", "lang": "en", "topics": ["artificial-intelligence", "generative-ai", "large-language-models", "natural-language-processing", "ai-research"], "entities": ["ChatGPT", "Pangram", "GPTZero", "Copyleaks", 
"Google"], "alternates": {"html": "https://wpnews.pro/news/ai-now-writes-as-many-online-articles-as-humans", "markdown": "https://wpnews.pro/news/ai-now-writes-as-many-online-articles-as-humans.md", "text": "https://wpnews.pro/news/ai-now-writes-as-many-online-articles-as-humans.txt", "jsonld": "https://wpnews.pro/news/ai-now-writes-as-many-online-articles-as-humans.jsonld"}}