Kvasir
Automated performance testing framework with statistical regression analysis, built for Netflix
Overview
Automated performance testing framework that runs benchmarks, detects regressions, and reports results; integrated with the CI/CD pipeline
Problem
Performance regressions were often caught late in the release cycle or only in production; manual performance testing was slow, inconsistent, and hard to repeat
Constraints
- Must integrate with CI/CD pipeline
- Must detect regressions automatically
- Must handle noisy benchmark results
Approach
Built a framework that runs performance benchmarks on every commit, applies statistical methods to separate real regressions from noise, and reports results directly on pull requests
Key Decisions
Use statistical methods for regression detection
Benchmark results are inherently noisy; statistical methods (confidence intervals, t-tests) distinguish real regressions from run-to-run variance
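The specific test Kvasir uses isn't detailed here; as a minimal sketch, a Welch's t-test over baseline and candidate timing samples (stdlib only, with a hypothetical critical value of 2.0) could flag a regression like this:

```python
import math
from statistics import mean, variance

def welch_t(baseline, candidate):
    """Welch's t statistic and degrees of freedom for two timing samples."""
    n1, n2 = len(baseline), len(candidate)
    v1, v2 = variance(baseline), variance(candidate)
    se2 = v1 / n1 + v2 / n2  # squared standard error of the mean difference
    t = (mean(candidate) - mean(baseline)) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

def is_regression(baseline, candidate, t_crit=2.0):
    # Flag only when the candidate is slower AND the gap is significant;
    # t_crit is a hypothetical threshold, not Kvasir's actual setting.
    t, _ = welch_t(baseline, candidate)
    return t > t_crit

baseline = [101, 99, 100, 102, 98, 100]    # ms per op, main branch
candidate = [110, 112, 108, 111, 109, 113]  # ms per op, PR branch
print(is_regression(baseline, candidate))   # clearly slower -> True
```

A one-sided check like this avoids blocking PRs for improvements; only statistically significant slowdowns trip the gate.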
Block PRs with regressions
Making regressions block merges creates accountability and keeps gradual performance degradation from accumulating across releases
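How the blocking is wired isn't specified; one common pattern is a gate step whose nonzero exit status fails the CI job and turns the PR check red. A sketch (the report shape is assumed, not Kvasir's actual format):

```python
def gate(results):
    """Return a CI exit status: nonzero when any benchmark regressed.

    `results` maps benchmark name -> regressed flag; this shape is a
    hypothetical stand-in for the framework's real report.
    """
    regressed = sorted(name for name, bad in results.items() if bad)
    if regressed:
        print("Regressions detected:", ", ".join(regressed))
        return 1  # CI job fails -> merge is blocked
    print("All benchmarks within noise.")
    return 0

# A failing gate makes the CI job, and therefore the PR status check, red.
status = gate({"json_parse": False, "stream_sort": True})
print("exit status:", status)
```

In Jenkins, the same effect follows from letting the step's nonzero exit code fail the build stage.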
Tech Stack
- Java
- Python
- Jenkins
- AWS
Result & Impact
Prevented performance regressions from reaching production and shifted performance testing left in the development cycle
Learnings
- Statistical methods are essential for noisy benchmarks
- Integration with CI/CD is critical for adoption
- Blocking PRs creates accountability
Features
Kvasir provides:
- Automated benchmark execution
- Statistical regression detection
- CI/CD integration
- PR status reporting
- Historical trend tracking
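The mechanics of historical trend tracking aren't described; one simple approach is comparing each run against a rolling median of recent history, which catches slow drift that per-commit tests miss. A sketch under that assumption (window size and tolerance are illustrative, not Kvasir's settings):

```python
from statistics import median

def rolling_baseline(history, window=10):
    """Baseline = median of the last `window` runs; robust to outlier runs."""
    return median(history[-window:])

def trend_alert(history, latest, tolerance=0.05):
    """Flag a slow drift: latest run exceeds the baseline by > `tolerance`."""
    return latest > rolling_baseline(history) * (1 + tolerance)

history = [100, 101, 99, 100, 102, 100, 101, 99, 100, 100]  # ms, oldest first
print(trend_alert(history, 108))  # 8% above the median baseline -> True
print(trend_alert(history, 101))  # within tolerance -> False
```

A median baseline is preferable to a single previous run here, since any one benchmark run may itself be an outlier.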