The “ITOps, DevOps, AIOps - All Things Ops” podcast is dedicated to operating and managing modern large-scale IT infrastructures. If you want to learn best practices from other Leaders in IT operations, this show is for you. Each episode features an interview with a senior IT executive or Thought Leader, discussing topics like: 1. How to manage the increasing complexity of hybrid IT infrastructures 2. How to effectively leverage automation to “do more with less” 3. Getting the most out of mo ...
 
Covering everything from the future of organizational culture to accelerated cloud adoption, in this inaugural podcast series, Splunk thought leaders sit down with Principal Analyst Daniel Newman to discuss the unique intricacies organizations are navigating in an era of rapid digital transformation and how data remains the key solution to thriving in uncertain times.
 
This is The Internet Report, a podcast uncovering what’s working and what’s breaking on the Internet—and why. Tune in to hear ThousandEyes’ Internet experts dig into some of the most interesting outage events from the past couple weeks, discussing what went awry—was it the Internet, or an application issue? Plus, learn about the latest trends in ISP outages, cloud network outages, collaboration network outages, and more.
 
Dive into recent service disruptions at Zoom, Spotify, SAP Concur, and Vanguard UK, and explore what they reveal about troubleshooting best practices for ITOps teams. Tune in now for insights from The Internet Report team or use the chapters below to jump to the sections that most interest you. CHAPTERS: 00:00 Intro 00:52 Zoom Outage 04:40 SAP Conc…
 
How do you ensure reliability across hybrid infrastructure when the cloud half isn’t fully in your control? In this episode, host Elias Voelker talks with Divya Veerapandian, Senior Director for Infrastructure Reliability Engineering and Global Head of Infrastructure Network Services at Visa. From network automation and hybrid observability to AI-d…
 
Packet loss can be bad news for network flows and customer experience. However, in our experience, NetOps teams tend to focus on major spikes in packet loss, while overlooking smaller amounts like 1 or 2%. This might be a mistake. Tune in for a deep dive into research findings suggesting that even 1% packet loss can significantly impact user experi…
 
For Lydia Nicola’s IT team, a security breach isn’t just a risk—it could mean life or death for the people they protect. In this episode, Elias Voelker speaks with Lydia Nicola, Head of IT Operations at Amnesty International, about the unique challenges of running IT for a global nonprofit. Lydia shares how her team secures infrastructure in politi…
 
Go under the hood of recent service disruptions at X, Workday, and Mastercard—and explore why it’s so important to quickly (and accurately) identify the root cause of an outage. ——— CHAPTERS 00:00 Intro 00:59 X Outage 07:08 Workday Outage 11:00 Mastercard Service Disruption 14:48 By the Numbers 16:05 Get in Touch ——— For additional insights, check …
 
Dive into the recent Slack outage and disruptions at Microsoft 365, Grafana Cloud, and Otter.ai—plus, explore key takeaways for ITOps teams. ——— CHAPTERS: 00:00 Intro 00:48 Slack Outage 06:55 Microsoft 365 Outage 11:44 A Pair of Otter.ai Outages 14:21 Grafana Cloud Disruption 15:55 By the Numbers 17:58 Get in Touch ——— To learn more about how to de…
 
Outages connected to configuration mishaps were a common theme last year, and we’ve continued to see incidents like these in 2025. Configuration changes triggered two consecutive Asana outages in early February, and configuration or update-related issues may also have contributed to recent disruptions at Barclays, ChatGPT, Jira, and Discord. Tune i…
 
How can you drive data center efficiency without compromising performance? In this episode, host Elias Voelker sits down with Martin Casaulta, Chief Technologist at Hewlett Packard Enterprise Switzerland, and Martin Hirschvogel, Chief Product Officer at Checkmk. They discuss data center efficiency—from traditional metrics like PUE to the impact of …
 
What does it take to deliver successful digital experiences at major events like concerts and conferences? With special guest Dominic Hampton—Managing Director at attend2IT—we’ll explore the dynamic world of event IT and key takeaways ITOps teams at enterprise companies can apply to their own events as well as in their day-to-day operations. We’ll …
 
How do you hit 90% first-contact resolution—and keep it there? In this episode, Elias speaks with Peg Kearney, Director of IT Operations at the University of Arizona College of Nursing, about how her helpdesk team maintains a 90% first-contact resolution rate by hiring top talent and providing them with the right tools and system access. Peg also h…
 
Configuration changes played an outsized role in 2024 outages. Tune in to hear more about this and other outage trends, and learn how ITOps teams should plan accordingly in the year ahead. We’ll also share insights from recent incidents at OpenAI and Google Cloud’s Pub/Sub, and dive deeper into a degradation incident that Netflix experienced at the end…
 
A major cyberattack led Blue Water Shipping to completely transform its IT infrastructure—and now, they're stronger than ever. In this episode, host Elias Voelker sits down with Thomas Klithav Hansen, Head of IT Operations at Blue Water Shipping, to discuss how the attack became the catalyst for a transformative IT journey. You’ll learn: 1. The rol…
 
With nearly a year of data available, the topline outage trends for 2024 are coming into focus. Tune in to see what the numbers are showing. The Internet Report team will discuss how Internet service provider (ISP) outage numbers are continuing to increase, while cloud service provider (CSP) outages are also becoming more frequent, indicating a cha…
 
The past few weeks are something of a representative sample of 2024 from an outage perspective, with connectivity issues and updates at the root of the four recent incidents. Both DigitalOcean and real-time payments provider Worldline experienced connectivity issues to data centers that made services unreachable. Meanwhile, Microsoft and Reddit enco…
 
Since 2019, Thomas Singbartl, Head of Global IT Operations at Internationale Hochschule (IU), has supported the university's astonishing growth journey from 35,000 to 130,000 students. Join host Elias Voelker in this episode as Thomas shares how IT fueled IU’s exponential growth, why the term “digital transformation” no longer applies, and how Synt…
 
Powerful things happen when ITOps teams move beyond a break-fix approach and lean into proactive optimization. Instead of just responding to issues as they occur, when teams have independent visibility into their end-to-end service delivery chain, they can proactively identify possible areas for optimization and improvement. For example, streamlini…
 
What can we learn about resilience, scalability, and workforce development from the IT organization of the 6th-largest metro in the US? In this episode, Tameka Neely-Dudley, Director of IT Infrastructure Operations and Service Delivery for the City of Atlanta, shares insights from her nearly 25-year career, beginning as an intern and growing into a…
 
The Digital Operational Resilience Act (DORA) goes into effect on January 17, 2025, and financial institutions serving the EU will need to meet an enhanced set of requirements related to risk management, network resilience, and incident reporting. While DORA is directly applicable to EU financial institutions, it prompts important discussions about…
 
Which KPIs really matter in IT Service Management? In this episode, Elias sits down with Huseyin Uysal, Head of Global Service Desk at ISS, to uncover what separates successful IT service management from the rest. With a wealth of experience managing global teams and optimizing IT processes, Huseyin reveals the metrics that really matter, how custo…
 
A recent Salesforce outage highlighted the limitations of status pages and the importance of considering a variety of data points when identifying the source of an outage. Tune in to hear The Internet Report team discuss what happened and why. They’ll also share insights from a recent Microsoft Outlook outage and cover the latest Internet outage tr…
 
What separates a well-oiled IT operation from one constantly putting out fires? In this episode, we dive deep into the world of IT Service Management (ITSM) with Haroon Hasan, author of "Choose to Lead" and Director of IT Service Management and Governance at Computacenter. With 20+ years of experience, Haroon shares insights on optimizing ITSM for …
 
A recent certificate problem impacted ServiceNow, and other issues prevented users from accessing key cloud services including Microsoft 365, Azure Virtual Desktop, and Workday. Tune in to hear what happened during these incidents and a separate data center fire that caused a Reliance Jio outage for customers across multiple areas of India. Listen …
 
What happens when your infrastructure faces a live peak of millions of users worldwide—without cloud scalability? In this episode, Sofascore’s CTO Josip Stuhli breaks down how his team navigates massive traffic surges, optimizes caching, and saves big by ditching the cloud while still delivering real-time updates to 25 million monthly users. You'll…
 
During high-traffic seasons like Black Friday or a much-anticipated product launch, maintaining good digital experiences for customers is vital. We’ve all heard tales of floods of eager shoppers crashing a website during a major sale—leaving them unable to make their coveted purchases. To guard against a breakdown like this during high-traffic peri…
 
Successful cybersecurity isn’t about heroics, it’s about preventing disasters you’ll never hear about. In this episode, Andrew Nuxoll, Managing Director of IT Operations and Cybersecurity at UNICEF USA, shares his journey from working at various managed service providers to leading cybersecurity efforts at a global NGO. Andrew offers insights into …
 
Let’s dive into the fascinating world of subsea cables. With special guest Murray Burling—Executive Director of Oceans and Environment at RPS—we’ll explore the current subsea cable ecosystem and chat about what the future might hold. Tune in for insights on how important subsea cables are for today’s digital experiences, how decisions are made on w…
 
How can universities navigate the complexities of service delivery while pursuing growth and innovation? Mark Katsouros, Senior Director for IT Engineering and Operations at Duquesne University, brings nearly 40 years of higher education IT experience. From the University of Maryland to pivotal roles at the University of Iowa and Penn State, Mark h…
 
Explore the recent Google Cloud and GitHub outages, plus get insights from a network perspective into the August 12 X livestream event featuring Elon Musk and Donald Trump. In the case of Google Cloud, a power issue in one of its European regions impacted connectivity and affected several services and networking equipment. The problems disrupted co…
 
This week, The Internet Report team and special guest Dave Anderson—a tech industry veteran and co-host of "A Very Melbourne Podcast," which covers the Australian Football League and more—are chatting about how to assure great digital experiences at major sporting events. Large sporting events are always logistically complex, and today that’s even …
 
On July 19, many organizations around the globe—including airlines, banks, and hospitals—experienced outages as Windows machines reportedly got stuck in a boot loop that ultimately resulted in the Blue Screen of Death (BSOD). These disruptions had a common source: an update from CrowdStrike, a managed detection and response (MDR) service used to pr…
 
On May 17, X reached a major milestone when the social media platform completed its full migration from twitter.com to x.com. While the number and frequency of outages did increase after the company’s acquisition by Elon Musk, following the domain migration, there don’t appear to have been any significant disruptions to the X.com platform. In this …
 
Martijn Adams, General Manager at 4me, brings a lifetime of expertise in IT service management, having worked with leading companies such as Philips, Deloitte, and Danone. This episode delves into his journey and the unique approaches that 4me employs to streamline service management across IT, HR, and facilities. You'll discover how service manage…
 
How can you scale your tech company while maintaining rigorous operational standards? Senior VP of Operations at Cyware Joe Aurilia shares what he learned while 5x-ing the company. In this episode, Joe shares how he's building operations from the ground up, handling the complexities of international teams, and embedding a culture of security and co…
 
Three recent outages at Starlink, Charles Schwab, and the Internet Archive highlight key reminders for NetOps teams around backup options, the role of intelligence, and understanding your end-to-end service delivery chain. A subset of Starlink users were unable to establish a connection; some users of Schwab.com and its apps may have found themselv…
 
Believe it or not, we’re already about halfway through 2024. Looking at the outage data from this year so far, we see continued evolution, following patterns observed over the past few years. Notably, the percentage of cloud service provider (CSP) outages is still increasing, and at a faster rate than seen in recent years. Tune in to le…
 
Learn from Paul Teodorescu's 25 years of IT experience as he shares the importance of transparency, credibility, and connecting with people in the tech industry. In this episode, Paul shares his journey from crawling under desks at Merrill Lynch to advising top firms like Morgan & Morgan. Explore the nuances of interim management versus advisory ro…
 
When it comes to assuring great digital experiences for your users, intermittent issues can be incredibly difficult to discover and diagnose because the service is both working and not working simultaneously—or, it may simply be running slow. Some users may experience issues, while for others, everything will work just fine. In this week’s episode,…
 
The end of the traditional SRE? How do you see the future unfolding as AI's role in IT operations grows? In this episode, we welcome Nathanial Smalley, Principal Sales Engineer at Transposit. He brings his rich experience from over a decade at Splunk and his current role at Transposit to discuss the impact of AI on IT operations. He delves into pra…
 
Explore what happened during recent outages at google.com, X (formerly Twitter), and CDN service jsDelivr. The Internet Report team will also discuss why a detailed understanding of every component in your service delivery chain is vital to maintain the availability and resiliency of your service. If even one component encounters challenges, the en…
 
Go under the hood of a ChatGPT outage, H&R Block’s Tax Day disruption, and more incidents from the past few weeks. The Internet Report team will also discuss Microsoft’s update on recent subsea cable cuts and the latest global outage trends. ——— CHAPTERS: 00:00 Intro 00:57 ChatGPT Outage 03:35 Revisiting West Coast of Africa Cable Cuts 09:07 H&R Bl…
 
With tax season coming to a close in the United States, IT teams at tax preparation companies and other organizations in the industry will be taking extra care to make sure that their systems can handle a spike in traffic due to a potential last-minute rush of filings. Tune in to hear The Internet Report hosts discuss how IT teams can navigate majo…
 
The end-to-end delivery of modern digital services can introduce a complex web of dependencies and failure points, which can stem from direct relationships as well as third-party providers, introducing layers of abstraction for operations teams to keep track of. Managing this complex ecosystem can be challenging. Unexpected issues may arise from se…
 
Over the next 3 years, more than 750 million new applications will hit the market... and nobody can predict what those applications will look like. In this episode, Lee Caswell, SVP of Product and Solutions Marketing at Nutanix, introduces Hyper-Converged Infrastructures: a groundbreaking solution that integrates computing, storage, and networking …
 
Over a two-day period this past week, major social media platforms—Meta’s Facebook and Instagram, LinkedIn, and Discord—all experienced disruptions. In the same timeframe, Comcast was also impacted by an outage that affected access to specific services and applications. Meta experienced issues with its log-in process, Discord navigated unexpectedly…
 
Cybersecurity as we know it today is still in its infancy, which begs the question: how will it mature in the wake of rapid cloud and AI innovations? In this episode, Dalarie is joined by Jason Ford, CEO and CISO at Steel Patriot Partners, who shares his in-depth insights into the evolving world of IT operations. From the Wild West of early cyberse…
 
Load is a fundamental but, at times, challenging variable for networks and operations teams to handle. In the past few weeks, ThousandEyes saw various load-related problems impact organizations including Google Cloud, Front, several Australian banks, and Minnesota State University Moorhead. Tune in to learn more about what happened during these inc…
 
When outages happen, it’s what you do next that matters. It’s important to have a backup plan in place that you can quickly activate to minimize the impact of an incident. Over the past two weeks, companies initiated a range of resiliency actions, including asking customers to use alternate authentication methods (or to avoid logging out of a servi…
 
The ThousandEyes Internet Intelligence team joins us from Cisco Live in Amsterdam, talking about a major theme from the event—security. Tune in to hear their thoughts on how visibility can help companies in their security efforts, the sovereignty of data in flight, and why you don’t have to choose between security and performance. ——— CHAPTERS 00:0…
 
What happened during the recent Microsoft Teams and Azure disruptions? Go under the hood of these incidents and also explore other recent disruptions in this week’s Pulse Update. CHAPTERS - 01:03 Network issue leads to Microsoft Teams service disruption - 04:09 Azure Resource Manager exhausts capacity, causing service issues - 06:20 Oracle Cloud ex…
 
What caused recent dips in performance for OpenAI’s ChatGPT? Tune in to hear The Internet Report team unpack this and other recent disruptions, including a hack that led to an outage at the Spanish branch of the Orange mobile network, and a blip for customers of the cloud services provider DigitalOcean. They’ll also cover the outage trends they’re …