Did you know that software engineers often "learn things the hard way" because they lack a standardized system to share knowledge about reliability issues? While security professionals have CVEs to catalog vulnerabilities, reliability engineers have been left to reinvent the wheel with each new bug or outage.
Tony Meehan, co-founder and CTO of Prequel, introduces us to Common Reliability Enumerations (CREs), an open-source approach that's doing for reliability what CVEs did for security. After spending a decade at the NSA hunting vulnerabilities, Tony recognized that the same community-driven approach could revolutionize how we handle reliability issues.
This conversation covers:
- How CREs help developers detect and mitigate reliability issues before they cause outages
- The open-source tools Preq and CRE that allow teams to leverage community knowledge
- Practical ways to implement these tools in your development workflow (locally, in CI/CD, and production)
- How this approach can reduce cloud costs by identifying the underlying issues rather than over-provisioning
- Tips for debugging mysterious production issues when no CRE exists yet
Guest: Tony Meehan, CTO at Prequel
Tony is an engineering leader obsessed with bugs. He dedicated a decade to vulnerability and exploit development at the National Security Agency (NSA) before leading Engineering at Endgame and Elastic. In 2023, Tony co-founded Prequel to change the way application failure is detected and resolved.
Links to interesting things from this episode:
- Blog post about the partial outage at Endgame
- Common Reliability Enumeration (CRE)
- Preq
- XKCD: Standards
- Episode on security with Danny Allan from Snyk
- Brendan Gregg's blog