cotalks.dev

State of Exception(s) - 2026

Speakers: Benno Rice

Summary

This talk explores the evolution of error handling across programming languages—from C's manual resource management to modern exceptions and Rust's `Result` type—while connecting these technical patterns to real-world, large-scale system failures. The discussion emphasizes that robust system design requires understanding not just single points of failure, but the complex, non-obvious interactions that lead to catastrophic outages.

Key Takeaways

  • Error handling is a fundamental design concern that varies drastically by language paradigm (e.g., manual return codes vs. exceptions vs. result types).
  • Modern systems failures often stem from complex interactions, not single points of failure, requiring deep system understanding for diagnosis and recovery.
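  • Rust's `Result<T, E>` type makes failure part of a function's signature, so callers must address the `Err` case before they can use the value, which promotes safer, more predictable code; shortcuts such as `unwrap()` satisfy the compiler but turn an `Err` into a runtime panic.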
  • System resilience requires human adaptability and pattern matching—skills crucial for diagnosing complex, unexpected outages that automated systems (including LLMs) may miss.
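The contrast in the third takeaway can be sketched in a few lines of Rust. This is a minimal illustration, assuming integer parsing with the standard `str::parse` as the fallible operation; `parse_or_default` and `parse_or_panic` are hypothetical names, not code from the talk.

```rust
/// Explicit handling: `str::parse` returns `Result<u32, ParseIntError>`, and
/// the `match` has to cover both variants before a `u32` is available.
fn parse_or_default(input: &str, default: u32) -> u32 {
    match input.trim().parse::<u32>() {
        Ok(n) => n,
        // The failure path is written out where the call happens.
        Err(_) => default,
    }
}

/// Opting out: `unwrap()` compiles without complaint, but an `Err` becomes a
/// panic at runtime instead of a value the caller can inspect.
fn parse_or_panic(input: &str) -> u32 {
    input.trim().parse::<u32>().unwrap()
}
```

Both functions compile, which is the point of the takeaway: the type system makes the error visible, but `unwrap()` is an explicit choice to crash on it rather than handle it.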

Sections

The Evolution of Error Handling Paradigms

The talk traces how different languages encourage developers to think about failure.

  • **C:** Error handling relies on return codes, the `errno` error indicator, and manual cleanup (often via `goto`-based cleanup blocks) to keep resources in a consistent state. Nothing forces a caller to check a return value, so errors are easy to ignore silently, and a missed cleanup path can leak resources or lead to undefined behavior.
  • **Exceptions:** Languages like Python and C++ use exceptions to separate error-handling logic from the main control flow. This separates concerns well, but it makes control flow harder to follow, since any call may transfer control to a distant handler, and it requires runtime support for stack unwinding.
  • **Rust's `Result` type:** For recoverable errors, Rust uses `Result<T, E>` instead of exceptions. It is an ordinary enum with two variants, `Ok(T)` and `Err(E)`, so the success value cannot be used until the `Err` case has been dealt with, whether by matching on it, propagating it with `?`, or explicitly opting out. The type system thus makes error paths visible at compile time, leading to safer, more predictable code.
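The `Result` description above can be made concrete with a small sketch. It assumes a hypothetical `parse_port` function and `PortError` type (neither comes from the talk), and treats port 0 as reserved purely for illustration. Where a C function would typically return a sentinel such as `-1` and leave the reason in `errno`, here each failure mode is a variant of the return type, and the `match` in `describe` must cover every variant before the port number is usable.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type: each failure mode is a named variant, playing the
/// role of a distinct C error code, but carried in the function's return type.
#[derive(Debug, PartialEq)]
enum PortError {
    NotANumber(ParseIntError),
    Reserved(u16),
}

impl fmt::Display for PortError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PortError::NotANumber(e) => write!(f, "not a valid port number: {e}"),
            PortError::Reserved(p) => write!(f, "port {p} is reserved"),
        }
    }
}

/// Parses a TCP port; treating 0 as reserved is an assumption for illustration.
fn parse_port(input: &str) -> Result<u16, PortError> {
    // `map_err` wraps the standard library's error in our own variant,
    // and `?` returns early with it if parsing failed.
    let port = input.trim().parse::<u16>().map_err(PortError::NotANumber)?;
    if port == 0 {
        return Err(PortError::Reserved(port));
    }
    Ok(port)
}

/// The `u16` is only reachable through the `Ok` arm; the compiler rejects
/// this `match` if any variant is left unhandled.
fn describe(input: &str) -> String {
    match parse_port(input) {
        Ok(port) => format!("listening on {port}"),
        Err(e @ PortError::NotANumber(_)) => format!("config error: {e}"),
        Err(PortError::Reserved(p)) => format!("refusing reserved port {p}"),
    }
}
```

If either `Err` arm is removed, the `match` becomes non-exhaustive and the program no longer compiles; that compile-time check is the guarantee the section describes.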

Systemic Failures and Complexity

The discussion uses real-world cloud outages (like the Cloudflare incident) to illustrate that large-scale failures are rarely due to a single bug. Instead, they are often the result of:

  1. **Complex Interactions:** Failures arise from the interaction of multiple, seemingly unrelated components.
  2. **Systemic Blind Spots:** Developers are good at modeling obvious failure modes, but the most dangerous failures are the non-obvious ones.
  3. **The Role of Humans:** Human practitioners remain the most adaptable element in complex systems, capable of pattern matching and reasoning through obscure scenarios that automated tools or AI may not predict.

Keywords: error handling, rust result, exceptions, c programming, system reliability, distributed systems, concurrency, failure modes, cloud infrastructure
