{"slug": "github-availability-report-may-2026", "title": "GitHub availability report: May 2026", "summary": "GitHub experienced nine incidents causing degraded performance across its services in May 2026, including a major disruption on May 4 that lasted over an hour and affected pull requests, issues, actions, and Copilot. The outage was triggered by a routine database schema migration that saturated connection capacity during peak traffic, impacting approximately 1.3% of requests at its height. GitHub reported progress on infrastructure improvements, including moving 40% of monolith traffic to Azure and isolating key database services to prevent future cascading failures.", "body_md": "# GitHub availability report: May 2026\n\nIn May, we experienced nine incidents that resulted in degraded performance across GitHub services.\n\nIn [March](https://github.blog/news-insights/company-news/addressing-githubs-recent-availability-issues-2/) and [April](https://github.blog/news-insights/company-news/an-update-on-github-availability/) we shared updates on GitHub’s availability and infrastructure investments. As that work continues and we approach some major milestones, we wanted to start sharing more regular updates in our monthly availability reports. So before we dive into incidents from May, here’s how we’re tracking with our ongoing work to make GitHub more reliable.\n\n**Our progress in making GitHub more resilient**\n\n**The short version:** GitHub’s traffic is growing rapidly, driven in large part by AI-assisted and agentic development workflows, and we’ve been transforming our infrastructure to keep up with it. That means moving to Azure for elastic capacity, breaking our monolith apart into isolated services, and eliminating the shared failure points that have driven past incidents.\n\nHere’s where we stand. We’re now serving **40% of monolith traffic from Azure** (up from 8% in February), with Git traffic at 30% and repository replication at 99%. We’ve more than doubled our effective capacity in four months. At the same time, we’re completing the isolation of our primary database cluster: splitting users, authentication, and authorization into independent domains so that a problem in one can no longer cascade across the platform. Our new users service is fully cut over, and handling double the traffic at substantially lower database cost. Stateless authentication tokens are also rolling out, eliminating per-request database lookups that amplified pressure during traffic spikes.\n\nWe are making structural changes that permanently remove failure modes. We acknowledge that we have work to do, but we’re committed to getting it done and making GitHub reliable when and where you need it. The principle guiding our decision is simple: **availability, then capacity, then features.**\n\nThanks for your partnership as we keep building GitHub’s reliability and resilience.\n\nIn May, we experienced nine incidents that resulted in degraded performance across GitHub services.\n\n**May 04 15:45 UTC (lasting 55 minutes)**\n\nOn May 4, 2026, between 15:34 and 16:40 UTC, github.com experienced a service disruption that produced elevated latency and an increased rate of request failures across a broad set of customer-facing services. Total customer impact lasted approximately one hour and six minutes.\n\nThe most significantly impacted service was pull requests, which was statused `Red`\n\nfor the duration of peak impact. Issues, actions, webhooks, and Git operations experienced elevated latency and intermittent errors. A number of dependent services—including Codespaces, Pages, Packages, OAuth and GitHub Apps, Marketplace, and Copilot—also saw varying degrees of degraded performance due to shared data dependencies. At peak, approximately 1.3% of requests returned a 5xx response, averaging around 0.46% across the duration of the incident.\n\nThe disruption was triggered by a routine online schema migration running against a large, heavily-accessed database table. The migration had been progressing without issue for several hours, but as traffic ramped up toward the weekly peak, the combined load from the migration and normal production traffic saturated database connection capacity. This produced query contention on a primary database and cascading timeouts across services that depend on it.\n\nThe incident was detected within approximately three minutes of the first signs of impact through a combination of automated monitoring and on-call observation. Once the contributing migration was identified, it was paused, and dependent services recovered shortly thereafter. Time to mitigation was approximately 33 minutes, and full resolution followed approximately 30 minutes later\n\nAs follow-up, we are implementing several improvements to reduce the likelihood and blast radius of a similar event. Migrations against large, high-traffic tables will be more tightly aligned with low-traffic windows and will use dynamic throttling that adapts to live cluster load. We are adding automated circuit breakers that will pause in-flight migrations when latency or connection utilization on the underlying database crosses safe thresholds, and we are extending our monitoring so that migration-induced pressure (i.e., write rate, lock time, and connection saturation) triggers alerts before customer impact occurs. In parallel, we are reviewing connection-pool capacity to ensure adequate headroom is maintained while migrations are running.\n\n**May 05 13:37 UTC (lasting 3 hours and 49 minutes)**\n\n**May 06 07:19 UTC (lasting 2 hours and 25 minutes)**\n\nOn May 5 and May 6, GitHub Actions was degraded across two related incidents affecting hosted runners. The two events were connected: remediation work performed after the May 5 incident introduced the configuration issue that triggered the May 6 incident.\n\nOn May 5, 2026, from 13:22 to 17:05 UTC, GitHub Actions hosted runners in the East US region were degraded. Approximately 13.5% of jobs requesting a standard runner failed and ~16% of requested larger runners with private networking pinned to East US failed or were delayed by more than five minutes. Copilot code review requests were also impacted. Approximately 8,500 code review requests timed out during this window. Affected users saw an error comment on their pull requests and were able to retry by rerequesting a review. Most runner requests were picked up by other regions automatically, but a portion of requests still routing to East US were impacted.\n\nThis was triggered by a scale-up operation for hosted runner VMs in the East US region. This is a regular operation, but the VM create load hit an internal rate limit when VM creates pull images from storage. Existing backoff logic was not triggered because of the response code returned in this case. The rate limiting and VM creation failures were mitigated by reducing load to allow for recovery and allowing queued work to be processed. By 15:34 UTC, queued and failed job assignments were mostly mitigated, with less than 0.5% of runner assignments impacted between 15:34 and full recovery at 17:05.\n\nOn May 6, 2026, from 06:45 to 09:15 UTC, GitHub Actions Standard Ubuntu hosted runners were again degraded, and approximately 17.1% of jobs requesting a standard runner failed. The issue was caused by unexpected configuration data introduced during remediation work for the previous day’s incident, which blocked new allocations as daily load ramped up. We removed the problematic data at 08:51 UTC, allowing allocations to resume and runner pools to scale back up and recover.\n\nWe are improving our system’s throttling behavior when limits occur, improving our controls to more quickly mitigate similar situations in the future, and reviewing all limits end-to-end for similar operations. In addition, we are updating the filter logic for this allocation data to be resilient to abnormal data shapes and improving monitoring to alert when allocations are blocked, allowing the team to respond before customer impact starts.\n\n**May 06 11:21 UTC (lasting 38 minutes)**\n\nOn May 6, 2026, between 11:02 and 11:13 UTC, users were unable to start or view Copilot cloud agent or remote sessions. During this time, all requests to the session API returned errors, preventing users from creating new sessions or viewing existing ones. The issue was caused by a configuration change to the service’s network routing that inadvertently removed the ingress path for the service. The team reverted the change at 11:13 UTC which restored service. The incident remained open until 11:59 UTC while the team verified full recovery. We are taking steps to improve our deployment validation process to prevent similar configuration changes from impacting production traffic in the future.\n\n**May 06 15:25 UTC (lasting 3 hours and 39 minutes)**\n\nOn May 6, 2026, between 15:12 and 19:02 UTC, creation of new pull request review threads on github.com failed. This included new line comments and file comments on pull requests. Existing pull requests and previously created comments were unaffected.\n\nThis incident was caused by a 32-bit integer key reaching its maximum value in a Vitess lookup table used during pull request thread creation. The primary table was migrated to a 64-bit integer key, but the Vitess lookup table remained 32-bit. Once the values in the primary table passed the available 32-bit ID space attempts to create new review threads began failing, resulting in a near 100% failure rate for new thread creation requests. We mitigated the issue by updating the impacted lookup table definitions across all shards to use 64-bit integer column types, increasing the available ID range, and restoring normal operation. Service was fully restored once the schema changes competed globally.\n\nTo help prevent similar incidents, we are expanding existing monitoring of database columns to include Vitess lookup tables and enabling earlier detection of any tables approaching a column size limit.\n\n**May 07 05:02 UTC (lasting 1 hour and 54 minutes)**\n\nOn May 7, 2026, between 04:12 and 06:13 UTC, Copilot cloud agent and Copilot code review agent sessions for pull requests were delayed or failed to start.\n\nDuring the impact window, new Copilot coding agent sessions triggered by pull requests were not created, while review agent sessions dropped by ~50% from the baseline.\n\nThe issue was caused by follow-up recovery work from a separate [pull request incident](https://www.githubstatus.com/incidents/f5pb5d5mr9yh). As part of that recovery, we ran a large database migration, which caused replication delays on several replica hosts.\n\nAlthough those replicas were not serving user traffic, our safeguards correctly treated the elevated replication lag as a signal to slow down writes to the affected database cluster. As a result, some pull request background processing was temporarily delayed. That processing is responsible for sending the internal events that Copilot agents use to begin work, so affected agents did not start until the database replicas caught up.\n\nThe system recovered once replication lag returned to normal and pull request processing resumed. We are making critical background processing more resilient to replication lag so it degrades gracefully and keeps essential events flowing under strain, reducing the chance of similar secondary impact during future recovery work.\n\n**May 15 08:13 UTC (lasting 35 minutes)**\n\nOn May 15, 2026, from 07:43 to 08:48 UTC, GitHub Actions experienced a degradation that caused workflow runs to fail or experience delayed starts for a subset of customers. The incident was triggered by a planned failover of supporting infrastructure used by GitHub Actions. During that operation, an automated service discovery update did not propagate correctly, which caused traffic to be routed incorrectly and increased request timeouts in a core dependency for workflow orchestration.\n\nAt peak impact, 42% of Actions runs failed. Downstream services that depend on Actions workflow execution were also impacted, including GitHub Pages and Copilot cloud services. At 08:12 UTC, responders manually corrected the service discovery routing issue. Timeout and failure rates recovered shortly after, and we continued monitoring until full stabilization was confirmed across all affected services. The incident was marked resolved at 08:48 UTC.\n\nTo prevent recurrence, we are taking several steps. First, we are implementing failover guardrails that validate service discovery state before completing failover operations. Second, we are strengthening verification checks. Finally, we are improving dependency resilience to reduce timeout cascades during infrastructure events.\n\n**May 26 10:57 UTC (lasting 2 hours and 21 minutes)**\n\nOn May 26, 2026, between 10:40 and 12:56 UTC, GitHub Actions jobs were degraded. From 10:40 to 12:16 UTC, all newly queued runs failed to start. From 12:16 to 12:56 UTC, runs that required downloading actions for their workflows continued to fail. GitHub Pages, Copilot code review, Copilot coding agent, Octoshift, and GitHub Enterprise Importer were also impacted due to their dependency on GitHub Actions.\n\nThis was caused by our automated account review system incorrectly suspending the service account used by GitHub Actions to authenticate workflow runs and download actions.\n\nWe mitigated by restoring the account at 12:16 UTC, marking it exempt from further automated review at 12:20 UTC, and redeploying a related service at 12:48 UTC to flush cached account state. Full recovery was confirmed at 12:56 UTC.\n\nDuring this incident, a small number of issues, pull requests, comments, and discussions were marked as hidden when the service account was disabled. No data was lost. All content hidden because of this incident has been restored and full search index restoration was completed.\n\nTo prevent a recurrence, we have added an allowlist of all service accounts that cannot be suspended by automated systems, and ensuring these protections are enforced consistently across all account management tooling. We are also improving diagnostic tooling for accounts and reducing cache propagation delays to shorten time to mitigate similar incidents in the future.\n\n**May 28 19:01 UTC (lasting 1 hour and 40 minutes)**\n\nOn May 28, 2026, between 18:27 and 20:41 UTC, the GitHub Copilot service was degraded due to an issue with the Responses API of an upstream provider affecting the GPT-5.2, GPT-5.3-Codex, GPT-5.4, and GPT-5.5 models. Requests routed to these models via the Responses API returned elevated error rates, which also affected Copilot coding agent and Copilot code review. No other models were impacted.\n\nWe mitigated the incident by shifting traffic away from the affected models while the upstream provider deployed a fix.\n\nGitHub is working to improve automated failover for the affected models and strengthen monitoring to prevent similar incidents in the future.\n\nFollow our [status page](https://www.githubstatus.com/) for real-time updates on status changes and post-incident recaps. To learn more about what we’re working on, check out the engineering section on the [GitHub Blog](https://github.blog/category/engineering/).\n\n## Tags:\n\n## Written by\n\n## Related posts\n\n###\n[\nGitHub Universe is back: All together now, in the agentic era ](https://github.blog/news-insights/company-news/github-universe-is-back-all-together-now-in-the-agentic-era/)\n\nGitHub Universe is back: returning to the historic Fort Mason Center in San Francisco on October 28–29, 2026.\n\n###\n[\nGitHub Copilot app: The agent-native desktop experience ](https://github.blog/news-insights/product-news/github-copilot-app-the-agent-native-desktop-experience/)\n\nAt Microsoft Build 2026, GitHub introduced new tools, updates, and surfaces so agents can work the way you already work.\n\n###\n[\nStill a developer. Just outside. Our latest GitHub Shop collection is here. ](https://github.blog/news-insights/company-news/still-a-developer-just-outside-our-latest-github-shop-collection-is-here/)\n\nThe ESC collection lets you escape the confines of your desk and get out into the sun where good ideas are bound to happen.", "url": "https://wpnews.pro/news/github-availability-report-may-2026", "canonical_source": "https://github.blog/news-insights/company-news/github-availability-report-may-2026/", "published_at": "2026-06-11 21:30:15+00:00", "updated_at": "2026-06-11 21:43:10.486166+00:00", "lang": "en", "topics": ["ai-infrastructure"], "entities": ["GitHub", "Azure", "Microsoft"], "alternates": {"html": "https://wpnews.pro/news/github-availability-report-may-2026", "markdown": "https://wpnews.pro/news/github-availability-report-may-2026.md", "text": "https://wpnews.pro/news/github-availability-report-may-2026.txt", "jsonld": "https://wpnews.pro/news/github-availability-report-may-2026.jsonld"}}