The “ITOps, DevOps, AIOps - All Things Ops” podcast is dedicated to operating and managing modern large-scale IT infrastructures. If you want to learn best practices from other leaders in IT operations, this show is for you. Each episode features an interview with a senior IT executive or thought leader, discussing topics like: 1. How to manage the increasing complexity of hybrid IT infrastructures 2. How to effectively leverage automation to “do more with less” 3. Getting the most out of mo ...
Covering everything from the future of organizational culture to accelerated cloud adoption, in this inaugural podcast series, Splunk thought leaders sit down with Principal Analyst Daniel Newman to discuss the unique intricacies organizations are navigating in an era of rapid digital transformation and how data remains the key solution to thriving in uncertain times.
This is The Internet Report, a podcast uncovering what’s working and what’s breaking on the Internet—and why. Tune in to hear ThousandEyes’ Internet experts dig into some of the most interesting outage events from the past couple weeks, discussing what went awry—was it the Internet, or an application issue? Plus, learn about the latest trends in ISP outages, cloud network outages, collaboration network outages, and more.

Troubleshooting Tips & Outages at Zoom, Spotify & More
16:41
Dive into recent service disruptions at Zoom, Spotify, SAP Concur, and Vanguard UK, and explore what they reveal about troubleshooting best practices for ITOps teams. Tune in now for insights from The Internet Report team or use the chapters below to jump to the sections that most interest you. CHAPTERS: 00:00 Intro 00:52 Zoom Outage 04:40 SAP Conc…

Ep. 52 - Keeping the Lights On at Visa: How to Engineer for Reliability at Scale - with Divya Veerapandian
37:24
How do you ensure reliability across hybrid infrastructure when the cloud half isn’t fully in your control? In this episode, host Elias Voelker talks with Divya Veerapandian, Senior Director for Infrastructure Reliability Engineering and Global Head of Infrastructure Network Services at Visa. From network automation and hybrid observability to AI-d…

Why Even 1% Packet Loss Can Impact User Experiences
23:17
Packet loss can be bad news for network flows and customer experience. However, in our experience, NetOps teams tend to focus on major spikes in packet loss, while overlooking smaller amounts like 1 or 2%. This might be a mistake. Tune in for a deep dive into research findings suggesting that even 1% packet loss can significantly impact user experi…

Ep. 51 - IT for Human Rights: Scaling Secure Infrastructure for a Global Nonprofit – with Lydia Nicola
45:36
For Lydia Nicola’s IT team, a security breach isn’t just a risk—it could mean life or death for the people they protect. In this episode, Elias Voelker speaks with Lydia Nicola, Head of IT Operations at Amnesty International, about the unique challenges of running IT for a global nonprofit. Lydia shares how her team secures infrastructure in politi…

Understanding Service Disruptions at X, Workday & Mastercard
16:57
Go under the hood of recent service disruptions at X, Workday, and Mastercard—and explore why it’s so important to quickly (and accurately) identify the root cause of an outage. ——— CHAPTERS 00:00 Intro 00:59 X Outage 07:08 Workday Outage 11:00 Mastercard Service Disruption 14:48 By the Numbers 16:05 Get in Touch ——— For additional insights, check …

Unpacking the Slack Outage & Other Backend Issues
18:39
Dive into the recent Slack outage and disruptions at Microsoft 365, Grafana Cloud, and Otter.ai—plus, explore key takeaways for ITOps teams. ——— CHAPTERS: 00:00 Intro 00:48 Slack Outage 06:55 Microsoft 365 Outage 11:44 A Pair of Otter.ai Outages 14:21 Grafana Cloud Disruption 15:55 By the Numbers 17:58 Get in Touch ——— To learn more about how to de…

Configuration Mishaps Strike Again: Asana Outages & More News
31:01
Outages connected to configuration mishaps were a common theme last year, and we’ve continued to see incidents like these in 2025. Configuration changes triggered two consecutive Asana outages in early February, and configuration or update-related issues may also have contributed to recent disruptions at Barclays, ChatGPT, Jira, and Discord. Tune i…

Ep. 50 - Data Center Efficiency: Monitoring, AI, and Decarbonization - with Martin Casaulta & Martin Hirschvogel
45:03
How can you drive data center efficiency without compromising performance? In this episode, host Elias Voelker sits down with Martin Casaulta, Chief Technologist at Hewlett Packard Enterprise Switzerland, and Martin Hirschvogel, Chief Product Officer at Checkmk. They discuss data center efficiency—from traditional metrics like PUE to the impact of …

The Show Must Go On: ITOps Lessons From the Events Industry
31:33
What does it take to deliver successful digital experiences at major events like concerts and conferences? With special guest Dominic Hampton—Managing Director at attend2IT—we’ll explore the dynamic world of event IT and key takeaways ITOps teams at enterprise companies can apply to their own events as well as in their day-to-day operations. We’ll …

Ep. 49 – 90% First-Contact Resolution: How to Build a Secure, Efficient IT Helpdesk – with Peg Kearney
41:45
How do you hit 90% first-contact resolution—and keep it there? In this episode, Elias speaks with Peg Kearney, Director of IT Operations at the University of Arizona College of Nursing, about how her helpdesk team maintains a 90% first-contact resolution rate by hiring top talent and providing them with the right tools and system access. Peg also h…

Configuration Change Trouble & Other 2024 Outage Trends
21:55
Configuration changes played an outsized role in 2024 outages. Tune in to hear more about this and other outage trends—and learn how ITOps teams should plan accordingly in the year ahead. We’ll also share insights from recent incidents at OpenAI and Google Cloud’s Pub/Sub, and dive deeper into a degradation incident that Netflix experienced at the end…

Ep. 48 - From Surviving to Thriving: How a Major Cyberattack Sparked a Full IT Transformation - with Thomas Klithav Hansen
34:11
A major cyberattack led Blue Water Shipping to completely transform its IT infrastructure—and now, they're stronger than ever. In this episode, host Elias Voelker sits down with Thomas Klithav Hansen, Head of IT Operations at Blue Water Shipping, to discuss how the attack became the catalyst for a transformative IT journey. You’ll learn: 1. The rol…

2024 Outage Trends Solidify; Plus OpenAI & Meta Outages
19:49
With nearly a year of data available, the topline outage trends for 2024 are coming into focus. Tune in to see what the numbers are showing. The Internet Report team will discuss how Internet service provider (ISP) outage numbers are continuing to increase, while cloud service provider (CSP) outages are also becoming more frequent, indicating a cha…

DigitalOcean, Reddit Outages & Worldline’s IT Perturbations
15:44
The past few weeks are somewhat of a representative sample of 2024 from an outage perspective, with connectivity issues and updates at the root of the four recent incidents. Both DigitalOcean and real-time payments provider Worldline experienced connectivity issues to data centers that made services unreachable. Meanwhile, Microsoft and Reddit enco…

Ep. 47 - 35,000 to 130,000 Students in 5 Years: Scaling IT at Internationale Hochschule - with Thomas Singbartl
31:27
Since 2019, Thomas Singbartl, Head of Global IT Operations at Internationale Hochschule (IU), has supported the university's astonishing growth journey from 35,000 to 130,000 students. Join host Elias Voelker in this episode as Thomas shares how IT fueled IU’s exponential growth, why the term “digital transformation” no longer applies, and how Synt…

Talking Proactive Optimization, ChatGPT Issues & More
19:19
Powerful things happen when ITOps teams move beyond a break-fix approach and lean into proactive optimization. Instead of just responding to issues as they occur, when teams have independent visibility into their end-to-end service delivery chain, they can proactively identify possible areas for optimization and improvement. For example, streamlini…

Ep. 46 - IT for the City of Atlanta: Building Scalable and Resilient Systems - with Tameka Neely-Dudley
31:10
What can we learn about resilience, scalability, and workforce development from the IT organization of the 6th-largest metro in the US? In this episode, Tameka Neely-Dudley, Director of IT Infrastructure Operations and Service Delivery for the City of Atlanta, shares insights from her nearly 25-year career, beginning as an intern and growing into a…

DORA & ITOps Best Practices; Plus BMO, Google Outages
30:13
The Digital Operational Resilience Act (DORA) goes into effect on January 17, 2025, and financial institutions serving the EU will need to meet an enhanced set of requirements related to risk management, network resilience, and incident reporting. While DORA is directly applicable to EU financial institutions, it prompts important discussions about…

Ep. 45 - The Metrics That Matter: Optimizing ITSM by Focusing on Customer Effort – with Huseyin Uysal
36:35
Which KPIs really matter in IT Service Management? In this episode, Elias sits down with Huseyin Uysal, Head of Global Service Desk at ISS, to uncover what separates successful IT service management from the rest. With a wealth of experience managing global teams and optimizing IT processes, Huseyin reveals the metrics that really matter, how custo…

Let’s Talk Status Pages & Salesforce, Microsoft Outages
18:08
A recent Salesforce outage highlighted the limitations of status pages and the importance of considering a variety of data points when identifying the source of an outage. Tune in to hear The Internet Report team discuss what happened and why. They’ll also share insights from a recent Microsoft Outlook outage and cover the latest Internet outage tr…

Ep. 44 - Mature ITSM: How to Drive Top-Down Change and Build Well-Oiled IT Operations - With Haroon Hasan
39:01
What separates a well-oiled IT operation from one constantly putting out fires? In this episode, we dive deep into the world of IT Service Management (ITSM) with Haroon Hasan, author of "Choose to Lead" and Director of IT Service Management and Governance at Computacenter. With 20+ years of experience, Haroon shares insights on optimizing ITSM for …

ServiceNow, Microsoft & Workday Outages, Explained
15:54
A recent certificate problem impacted ServiceNow, and other issues prevented users from accessing key cloud services including Microsoft 365, Azure Virtual Desktop, and Workday. Tune in to hear what happened during these incidents and a separate data center fire that caused a Reliance Jio outage for customers across multiple areas of India. Listen …

Ep. 43 - Scaling Without the Cloud: How Sofascore Manages Millions of Real-Time Requests with Clever Caching - with Josip Stuhli
54:41
What happens when your infrastructure faces a live peak of millions of users worldwide—without cloud scalability? In this episode, Sofascore’s CTO Josip Stuhli breaks down how his team navigates massive traffic surges, optimizes caching, and saves big by ditching the cloud while still delivering real-time updates to 25 million monthly users. You'll…

Managing Traffic During Peak Demand; Plus, Microsoft, Akamai Outages
19:17
During high-traffic seasons like Black Friday or a much-anticipated product launch, maintaining good digital experiences for customers is vital. We’ve all heard tales of floods of eager shoppers crashing a website during a major sale—leaving them unable to make their coveted purchases. To guard against a breakdown like this during high-traffic peri…

Ep. 42 - Integrating Cybersecurity with Operations: Ensuring Impact and Efficiency at UNICEF USA - with Andrew Nuxoll
41:12
Successful cybersecurity isn’t about heroics, it’s about preventing disasters you’ll never hear about. In this episode, Andrew Nuxoll, Managing Director of IT Operations and Cybersecurity at UNICEF USA, shares his journey from working at various managed service providers to leading cybersecurity efforts at a global NGO. Andrew offers insights into …

The Current Subsea Cable Ecosystem: Resiliency & What’s Next
22:45
Let’s dive into the fascinating world of subsea cables. With special guest Murray Burling—Executive Director of Oceans and Environment at RPS—we’ll explore the current subsea cable ecosystem and chat about what the future might hold. Tune in for insights on how important subsea cables are for today’s digital experiences, how decisions are made on w…

Ep. 41 - IT Leadership in Higher Education: Strategies for Service Management, Optimal Customer Experiences, and Employee Growth - with Mark Katsouros
1:08:42
How can universities navigate the complexities of service delivery while pursuing growth and innovation? Mark Katsouros, Senior Director for IT Engineering and Operations at Duquesne University, brings nearly 40 years of higher education IT experience. From the University of Maryland to pivotal roles at the University of Iowa and Penn State, Mark h…

Analyzing X’s Livestream & GitHub, Google Outages
16:35
Explore the recent Google Cloud and GitHub outages, plus get insights from a network perspective into the August 12 X livestream event featuring Elon Musk and Donald Trump. In the case of Google Cloud, a power issue in one of its European regions impacted connectivity and affected several services and networking equipment. The problems disrupted co…

Why NetOps Is the Real MVP of the Sports World
33:11
This week, The Internet Report team and special guest Dave Anderson—a tech industry veteran and co-host of "A Very Melbourne Podcast," which covers the Australian Football League and more—are chatting about how to assure great digital experiences at major sporting events. Large sporting events are always logistically complex, and today that’s even …

Unpacking the CrowdStrike Update, Azure Outage, & More
17:41
On July 19, many organizations around the globe—including airlines, banks, and hospitals—experienced outages as Windows machines reportedly got stuck in a boot loop that ultimately resulted in the Blue Screen of Death (BSOD). These disruptions had a common source: an update from CrowdStrike, a managed detection and response (MDR) service used to pr…

Twitter to X: Charting Performance and Outages
18:16
On May 17, X reached a major milestone when the social media platform completed its full migration from twitter.com to x.com. While the number and frequency of outages did increase after the company’s acquisition by Elon Musk, following the domain migration, there don’t appear to have been any significant disruptions to the X.com platform. In this …

Ep. 40 - Service Management in IT and Beyond - with Martijn Adams
51:40
Martijn Adams, General Manager at 4me, brings a lifetime of expertise in IT service management, having worked with leading companies such as Philips, Deloitte, and Danone. This episode delves into his journey and the unique approaches that 4me employs to streamline service management across IT, HR, and facilities. You'll discover how service manage…

Ep. 39 - Scaling Cyware: Lessons From Growing the Company Headcount Fivefold - With Joe Aurilia
41:31
How can you scale your tech company while maintaining rigorous operational standards? Senior VP of Operations at Cyware Joe Aurilia shares what he learned while 5x-ing the company. In this episode, Joe shares how he's building operations from the ground up, handling the complexities of international teams, and embedding a culture of security and co…

Insights From Outages at Starlink, Schwab & Internet Archive
16:54
Three recent outages at Starlink, Charles Schwab, and the Internet Archive highlight key reminders for NetOps teams around backup options, the role of intelligence, and understanding your end-to-end service delivery chain. A subset of Starlink users were unable to establish a connection; some users of Schwab.com and its apps may have found themselv…

Cloud Outages Rise & Other H1 2024 Internet Outage Trends
21:31
Believe it or not, we’re already about halfway through 2024. Looking at the outage data from this year so far, we see continued evolution, following patterns observed over the past few years. Notably, the percentage of cloud service provider (CSP) outages is still increasing—though at a more accelerated rate than seen in recent years. Tune in to le…

Ep. 38 - Transparency, Credibility, and Connection: Hard-Earned Lessons From 25 Years in IT - with Paul Teodorescu
56:05
Learn from Paul Teodorescu's 25 years of IT experience as he shares the importance of transparency, credibility, and connecting with people in the tech industry. In this episode, Paul shares his journey from crawling under desks at Merrill Lynch to advising top firms like Morgan & Morgan. Explore the nuances of interim management versus advisory ro…

Meta and Salesforce Tackle Intermittent Issues
17:51
When it comes to assuring great digital experiences for your users, intermittent issues can be incredibly difficult to discover and diagnose because the service is both working and not working simultaneously—or, it may simply be running slow. Some users may experience issues, while for others, everything will work just fine. In this week’s episode,…

Ep. 37 - How GenAI Is Reshaping the Way We Do ITOps - with Nathanial Smalley
48:14
The end of the traditional SRE? How do you see the future unfolding as AI's role in IT operations grows? In this episode, we welcome Nathanial Smalley, Principal Sales Engineer at Transposit. He brings his rich experience from over a decade at Splunk and his current role at Transposit to discuss the impact of AI on IT operations. He delves into pra…

Outages at X, google.com, and jsDelivr + Why Details Matter
18:04
Explore what happened during recent outages at google.com, X (formerly Twitter), and CDN service jsDelivr. The Internet Report team will also discuss why a detailed understanding of every component in your service delivery chain is vital to maintain the availability and resiliency of your service. If even one component encounters challenges, the en…

Inside the ChatGPT Outage & More News | Pulse Update
20:10
Go under the hood of a ChatGPT outage, H&R Block’s Tax Day disruption, and more incidents from the past few weeks. The Internet Report team will also discuss Microsoft’s update on recent subsea cable cuts and the latest global outage trends. ——— CHAPTERS: 00:00 Intro 00:57 ChatGPT Outage 03:35 Revisiting West Coast of Africa Cable Cuts 09:07 H&R Bl…

WhatsApp & Apple Outages; Plus ITOps Tax Day Survival Tips
27:55
With tax season coming to a close in the United States, IT teams at tax preparation companies and other organizations in the industry will be taking extra care to make sure that their systems can handle a spike in traffic due to a potential last-minute rush of filings. Tune in to hear The Internet Report hosts discuss how IT teams can navigate majo…

How Third-party Issues Led to McDonald’s, DMV Outages | Pulse Update
17:08
The end-to-end delivery of modern digital services can introduce a complex web of dependencies and failure points, which can stem from direct relationships as well as third-party providers, introducing layers of abstraction for operations teams to keep track of. Managing this complex ecosystem can be challenging. Unexpected issues may arise from se…

Ep 36 - Hyper-Converged Infrastructures: The Answer to the Complexity of IT Systems? - with Lee Caswell
43:30
Over the next 3 years, more than 750 million new applications will hit the market... and nobody can predict what those applications will look like. In this episode, Lee Caswell, SVP of Product and Solutions Marketing at Nutanix, introduces Hyper-Converged Infrastructures: a groundbreaking solution that integrates computing, storage, and networking …

Meta, LinkedIn, and Comcast Outages, Explained | Pulse Update
17:50
Over a two-day period this past week, major social media platforms—Meta’s Facebook and Instagram, LinkedIn, and Discord—all experienced disruptions. In the same timeframe, Comcast was also impacted by an outage that affected access to specific services and applications. Meta experienced issues with its log-in process, Discord navigated unexpectedly…

Ep 35 - Cybersecurity Masterclass: Compliance and Breach Prevention in the Era of Cloud and AI - with Jason Ford
45:13
Cybersecurity as we know it today is still in its infancy, which begs the question: how will it mature in the wake of rapid cloud and AI innovations? In this episode, Dalarie is joined by Jason Ford, CEO and CISO at Steel Patriot Partners, who shares his in-depth insights into the evolving world of IT operations. From the Wild West of early cyberse…

AT&T Outage and Disruptions at Google Cloud, Front, and More | Pulse Update
16:27
Load is a fundamental but, at times, challenging variable for networks and operations teams to handle. In the past few weeks, ThousandEyes saw various load-related problems impact organizations including Google Cloud, Front, several Australian banks, and Minnesota State University Moorhead. Tune in to learn more about what happened during these inc…

Square Outage, Data Center Issues & Planning for Resiliency | Pulse Update
17:10
When outages happen, it’s what you do next that matters. It’s important to have a backup plan in place that you can quickly activate to minimize the impact of an incident. Over the past two weeks, companies initiated a range of resiliency actions, including asking customers to use alternate authentication methods (or to avoid logging out of a servi…

Security, Great Digital Experiences & Why Visibility Matters
16:54
The ThousandEyes Internet Intelligence team joins us from Cisco Live in Amsterdam, talking about a major theme from the event—security. Tune in to hear their thoughts on how visibility can help companies in their security efforts, the sovereignty of data in flight, and why you don’t have to choose between security and performance. ——— CHAPTERS 00:0…

Understanding the Microsoft Teams & Azure Disruptions | Pulse Update
16:48
What happened during the recent Microsoft Teams and Azure disruptions? Go under the hood of these incidents and also explore other recent disruptions in this week’s Pulse Update. CHAPTERS - 01:03 Network issue leads to Microsoft Teams service disruption - 04:09 Azure Resource Manager exhausts capacity, causing service issues - 06:20 Oracle Cloud ex…

Unpacking Recent ChatGPT Issues & Other Outage News | Pulse Update
24:26
What caused recent dips in performance for OpenAI’s ChatGPT? Tune in to hear The Internet Report team unpack this and other recent disruptions, including a hack that led to an outage at the Spanish branch of the Orange mobile network, and a blip for customers of the cloud services provider DigitalOcean. They’ll also cover the outage trends they’re …