This is the Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. Catch our Outage Deep Dive series for special coverage of Internet outages. We go under the hood to determine what happened, covering key lessons and ways IT teams can minimize downtime in similar situations. Also tune in every other week for the Pulse Update podcast series to hear from the Internet experts at ThousandEyes as they share the latest data on ISP outages, public cloud provid ...
 
Explore what happened during recent outages at google.com, X (formerly Twitter), and CDN service jsDelivr. The Internet Report team will also discuss why a detailed understanding of every component in your service delivery chain is vital to maintain the availability and resiliency of your service. If even one component encounters challenges, the en…
 
Go under the hood of a ChatGPT outage, H&R Block’s Tax Day disruption, and more incidents from the past few weeks. The Internet Report team will also discuss Microsoft’s update on recent subsea cable cuts and the latest global outage trends. ——— CHAPTERS: 00:00 Intro 00:57 ChatGPT Outage 03:35 Revisiting West Coast of Africa Cable Cuts 09:07 H&R Bl…
 
With tax season coming to a close in the United States, IT teams at tax preparation companies and other organizations in the industry will be taking extra care to make sure that their systems can handle a spike in traffic due to a potential last-minute rush of filings. Tune in to hear The Internet Report hosts discuss how IT teams can navigate majo…
 
The end-to-end delivery of modern digital services can introduce a complex web of dependencies and failure points, which can stem from direct relationships as well as third-party providers, introducing layers of abstraction for operations teams to keep track of. Managing this complex ecosystem can be challenging. Unexpected issues may arise from se…
 
Over a two-day period this past week, major social media platforms—Meta’s Facebook and Instagram, LinkedIn, and Discord—all experienced disruptions. In the same timeframe, Comcast was also impacted by an outage that affected access to specific services and applications. Meta experienced issues with its log-in process, Discord navigated unexpectedly…
 
Load is a fundamental but, at times, challenging variable for networks and operations teams to handle. In the past few weeks, ThousandEyes saw various load-related problems impact organizations including Google Cloud, Front, several Australian banks, and Minnesota State University Moorhead. Tune in to learn more about what happened during these inc…
 
When outages happen, it’s what you do next that matters. It’s important to have a backup plan in place that you can quickly activate to minimize the impact of an incident. Over the past two weeks, companies initiated a range of resiliency actions, including asking customers to use alternate authentication methods (or to avoid logging out of a servi…
 
The ThousandEyes Internet Intelligence team joins us from Cisco Live in Amsterdam, talking about a major theme from the event—security. Tune in to hear their thoughts on how visibility can help companies in their security efforts, the sovereignty of data in flight, and why you don’t have to choose between security and performance. ——— CHAPTERS 00:0…
 
What happened during the recent Microsoft Teams and Azure disruptions? Go under the hood of these incidents and also explore other recent disruptions in this week’s Pulse Update. CHAPTERS - 01:03 Network issue leads to Microsoft Teams service disruption - 04:09 Azure Resource Manager exhausts capacity, causing service issues - 06:20 Oracle Cloud ex…
 
What caused recent dips in performance for OpenAI’s ChatGPT? Tune in to hear The Internet Report team unpack this and other recent disruptions, including a hack that led to an outage at the Spanish branch of the Orange mobile network, and a blip for customers of the cloud services provider DigitalOcean. They’ll also cover the outage trends they’re …
 
As they launch into 2024, organizations are facing a different outage landscape than they had at the start of 2023. The past year saw increases in cloud service provider (CSP) outages, application outages, and the percentage of U.S.-centric outages—all of which point to an evolution in the way outages happen and the need for different strategies to…
 
As 2023 comes to a close, in the spirit of Dickens’ holiday classic “A Christmas Carol,” let’s reflect on the valuable insights left by the ghosts of network operations teams past, present, and yet to come. Tune in to hear host Mike Hicks (Principal Solutions Analyst at ThousandEyes) discuss lessons from the NetOps teams of the past, the current st…
 
Recent changes appeared to trigger a series of events for two peering points internationally—with very different impacts. Tune in to learn more about these incidents, why they differed, and the lessons they leave. Mike Hicks, Principal Solutions Analyst at ThousandEyes, will also cover the latest outage numbers and explore other recent incidents, i…
 
As companies gear up for Black Friday, The Internet Report team shares some best practices for delivering great customer experiences and minimizing downtime during one of the retail industry’s biggest days of the year. Mike Hicks, Principal Solutions Analyst at ThousandEyes, will cover some helpful case studies of Black Fridays that experienced som…
 
Backend-related incidents have been a recurring theme in outages across 2023, caused by everything from data center issues and hardware mishaps to failures at common (shared) services. Recently, we saw two examples of these backend issues when data center power problems led to outages at both Cloudflare and Workday. Tune in to hear more about what …
 
This Halloween, The Internet Report team is sharing some of their most thrilling (and chilling) networking tales. Pull up a chair (and a big bowl of your favorite Halloween candy) to hear what happened—and important lessons learned. ——— CHAPTERS 00:00 Intro 01:40 Haunting obstacles with a dynamic routing protocol that thwarted crew changes on an oi…
 
In recent weeks, backend infrastructure work and other backend-related issues impacted various online and consumer banking services, including DBS and Citibank in Singapore. Simple front-facing customer experiences that we’ve become accustomed to today can often mask considerable complexity on the backend. The service delivery chain of technologie…
 
Outages and degradations can happen when underlying data isn’t fresh enough. In recent weeks, stale data may have contributed to incidents at both Slack and Cloudflare. Slack began experiencing issues when, by our best guess, its app stopped trusting the freshness of the data in the cache; and, separately, Cloudflare’s 1.1.1.1 DNS resolver ran into…
 
Providing great digital experiences relies on a complex service delivery chain. The past few weeks brought multiple reminders that the root cause of cloud and app disruptions often comes down to one single link in this chain. While the component at issue may appear small, if it’s not functioning normally, the consequences can be significant. Additi…
 
In a world that operates at “hyperscale,” the potential for hyperscale-sized problems is also very real. The measure of a good provider—and a well-engineered system—is how well they handle these anomalous conditions and minimize disruption. During recent weeks, some of these hyperscale-sized outages hit, including data center-focused disruptions th…
 
An outage occurs, a change is rolled back, and everything stabilizes. But what happens when the change is attempted a second time? These second tries often go much more smoothly. While another outage might still occur during this “take two,” the impact is usually far less severe. The engineering team has learned from what went wrong the first time …
 
Context matters when working on a distributed web-based application or service where everything is linked and dependent on each part functioning correctly. It’s all too easy for one team to make a change that unexpectedly affects something another team is working on. Or the combined impact of both changes may also accidentally break something. To a…
 
In an end-to-end service delivery chain, isolated changes can have broad consequences. This played out recently when an erroneous SSL certificate change at Microsoft appeared to cause a SharePoint Online and OneDrive for Business outage. While this incident definitely underscores the importance of valid security certificates, it’s also a reminder o…
 
Let’s face it. Not every contingency can be planned for. Sometimes an outlier scenario pops up and causes an unexpected outage or disruption. Over the past few weeks, multiple companies appeared to be impacted by such edge cases: Azure; GitLab; and Meta’s WhatsApp, Facebook, Instagram, and Threads—its newest addition. Tune in to the latest Pulse Upd…
 
The application opens, but users encounter errors when they try to do anything—what gives? It’s the curious case of the disappearing backend. Discover why application issues often show up like this, with the service reachable but unresponsive beyond rendering a basic landing page, and sometimes an accompanying error message. In this episode, hosts …
 
Though network outages are still far more common, application outages seem to be increasing in 2023—and having bigger impacts. Tune in to learn more about this trend and dive into incidents at Okta and Instagram. Host Mike Hicks will also explore other outage trends from the first half of the year in this special episode reflecting on the state of …
 
For three consecutive years, there appears to have been a spike in outages and degradations in May. A potential “spring cleaning effect” may explain why. Tune in to learn more about this possible trend and explore what happened during recent incidents at Twitter; Microsoft 365; Slack; Instagram; Apple’s iMessage; and subscription-based streaming se…
 
Tune in to explore ways that outages can impact distributed software development teams and what companies can learn from recent incidents at GitHub, Google Cloud, and Apple. To learn more, check out these links: Internet Report: Pulse Update Blog: https://www.thousandeyes.com/blog/internet-report-pulse-update-outages-and-distributed-dev-teams?utm_s…
 
When it comes to your technology strategy, it's a good idea to have more than one way to access every resource—just in case. As IT environments have changed, so has the thinking around the right approaches to achieve this desired redundancy. Two recent incidents at Google Cloud and Microsoft 365 reinforce the importance of redundancy—and the need f…
 
Understanding the unique characteristics of different kinds of Internet outages can help you quickly recognize the type of incident you’re dealing with and take the right steps to mitigate its impact. This week’s episode discusses the anatomy of common outage categories and explores recent case studies: - Security-related incidents: Western Digital…
 
This week’s Pulse Update unpacks OpenAI’s ChatGPT outage and discusses why the outage actually represented a pragmatic move on the part of OpenAI. We’ll also discuss global outage trends; explore other recent incidents at Dish Network, Microsoft, and Virgin Media UK; and look at why responses to performance problems vary, based on application chara…
 
On April 4, 2023, Virgin Media UK (AS 5089) experienced two outages that impacted the reachability of its network and services to the global Internet. The two outages shared similar characteristics, including the withdrawal of routes to its network, traffic loss, and intermittent periods of service recovery. In this episode, we discuss how the outa…
 
HTTP 403, 503, and 504 status codes dominated the last few weeks as multiple companies experienced application degradations and outages. These incidents at companies like Okta, Twitch, Reddit, and GitHub leave important lessons on navigating similar issues and minimizing downtime for your own users. To learn more, check out the links below: - Inter…
 
It was an eventful fortnight on the Internet as Twitter, Dish Network, Akamai, and Ticketek Australia all experienced outages. Tune in to our latest episode for insights from our analysis of these events and practical tips for IT teams. To learn more, check out the links below: - Internet Report: Pulse Update Blog: https://www.thousandeyes.com/blog/i…
 
In the space of a week, we saw two data center-related incidents lead to long Microsoft and Oracle outages. Join us as we analyze these outages and ways IT teams can minimize downtime in similar situations. We’ll also discuss a series of application issues that impacted companies including Twitter and Tesla. To learn more, check out the links below…
 
We discuss insights from a recent trio of similar incidents at Microsoft, Cloudflare, and Slack, along with other outage news, including a Comcast outage that impacted some Philadelphia neighborhoods on Super Bowl Sunday. 00:00 Intro 00:58 Outage Trends: By the Numbers 4:33 Microsoft Outage (Jan. 25) 4:58 Cloudflare Outage (Jan. 24) 9:27 Slack Outa…
 
Live from #CiscoLiveEMEA, we discuss the Feb. 7 Microsoft Outlook outage to understand how the event unfolded, why it may have played out the way it did, and what you can learn from this outage event. To dive deeper, check out the links below: Explore the outage in the ThousandEyes platform (NO LOGIN REQUIRED) Microsoft Outlook Outage Analysis Blog…
 
In this episode, we cover the latest Internet trends and unpack important takeaways from the recent FAA, Fastly, and Microsoft outages. We also discuss how several early 2023 outages and disruptions reinforced the need for application monitoring and testing to counter, or at least anticipate the effect of, anomalous conditions on certain routes. 00…
 
At around 7:05 a.m. UTC on January 25, 2023, Microsoft started experiencing service-related issues. At the same time, ThousandEyes observed BGP withdrawals and a significant number of route changes that resulted in high packet loss, ultimately affecting various services like Outlook, Teams, SharePoint, and others. 00:00 Welcome: This is…
 
This episode covers the latest global network outage numbers and interesting end-of-year trends; how resilient application architectures, clouds, and networks are challenging old ways of thinking; and a deep dive into an outage that disrupted Spotify’s music streaming on December 14, 2022. To learn more, check out the links below: Internet Report P…
 
This is the Internet Report: Pulse Update, where we review and provide analysis of significant outages and trends across the Internet, from the previous two weeks. Every other week, we'll publish a new episode covering the latest tally of outage events, and highlighting a few interesting outages. This week, in addition to our usual look at global a…
 
Starting at ~12:12 UTC on Dec 12, 2022, an ISP in the Democratic Republic of Congo leaked a route belonging to the Quad9 DNS service, causing some traffic, including Verizon US customer traffic, to get routed to Africa for ~90 minutes. High traffic loss was observed throughout the incident which was resolved at ~13:40 UTC. 00:00 Welcome: This is Th…
 
This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. In this episode, we unpack four notable outages that impacted WhatsApp, Zscaler, Salesforce, and Facebook, which all appear to have a common theme. Join our co-hosts Mike Hicks, Principal Solutions Analyst at ThousandEyes, and Chris Villemez, T…
 
We're back! 00:00 Welcome: This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. On this episode, our newest host, Chris Villemez, is joined by Kemal Sanjta to discuss a BGP-related incident that took down Twitter for many users around the globe on March 28th. 00:36 Under the Hood: Chris Villemez …
 
This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. On today’s episode, our newest host and Technical Marketing Engineer, Chris Villemez, is joined by Kemal Sanjta, Principal Engineer, to dive into the details of the recent AWS outages from December 7th, 10th and 15th. They’ll walk through what …
 
00:00 Welcome: This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. 00:15 Headlines: Today we’re going to do a thorough analysis of the major Facebook outage that took place yesterday, Monday, October 4. I’m joined by Gustavo Ramos, ThousandEyes’ in-house expert on Network Engineering. Thousand…
 
00:00 Welcome: This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. 00:08 Headlines: Today, Mike Hicks (Principal Solutions Analyst, ThousandEyes) and I discuss a recent BGP routing incident that had intermittent impacts on Amazon’s services, including Amazon.com and AWS compute resources, during…
 
This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. I’m joined today by Mike Hicks, principal solutions analyst here at ThousandEyes, to cover the outage of Akamai’s DNS service. The outage, which occurred on July 22nd around 3:38 PM UTC (8:38 AM PT), struck during the course of business hours in…
 
00:00 Welcome: This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. 00:13 Headlines: Today, Kemal and I unpack an interesting BGP incident, in which a large-scale route leak briefly altered traffic patterns across the Internet. 00:58 Under the Hood: The incident began on Thursday, June 3rd at arou…
 
This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. I’m joined by ThousandEyes’ BGP expert, Kemal Sanjta, to review the June 16th outage of Prolexic Routed, a DDoS Mitigation Service operated by Akamai. According to a statement from Akamai, the outage was not due to a DDoS attack or system updat…