FlameCloud

Performance Engineering · 2017

Built a continuous profiling solution collecting thousands of profiles and millions of stacks daily at Netflix

Overview

Cloud-based continuous profiling platform that collects, stores, and analyzes CPU, memory, and heap dump profiles from production systems

Production performance issues are difficult to reproduce in dev environments; engineers need visibility into production behavior at scale

Built a cloud-native platform with distributed profile collection, centralized storage with indexing, and integration with existing Netflix tools

Reasoning:

Commercial continuous profilers were expensive at Netflix scale and lacked integration with existing tooling

Reasoning:

Time-series indexing enables efficient querying of profiles by time range for trend analysis
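To illustrate why a time-ordered index makes range queries cheap, here is a minimal in-memory sketch. It is not FlameCloud's actual storage layer: the `ProfileIndex` class, its method names, and the folded-stack dictionaries are all hypothetical, and a production system would presumably use a persistent store. The idea is only that keeping profiles sorted by timestamp lets a time-range lookup use binary search instead of a full scan.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


class ProfileIndex:
    """Hypothetical in-memory index that keeps profiles sorted by timestamp."""

    def __init__(self):
        self._timestamps = []  # sorted collection times (e.g. epoch seconds)
        self._profiles = []    # parallel list: {stack: sample_count} per profile

    def add(self, timestamp, stacks):
        # Insert at the position that keeps both lists sorted by time.
        pos = bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(pos, timestamp)
        self._profiles.insert(pos, stacks)

    def query(self, start, end):
        # Binary search for the inclusive [start, end] window.
        lo = bisect_left(self._timestamps, start)
        hi = bisect_right(self._timestamps, end)
        return self._profiles[lo:hi]

    def top_stacks(self, start, end, n=3):
        # Sum sample counts per stack across the window for trend analysis.
        totals = Counter()
        for stacks in self.query(start, end):
            totals.update(stacks)
        return totals.most_common(n)


# Assumed sample data: timestamps and stack names are made up.
index = ProfileIndex()
index.add(100, {"main;parse": 5, "main;render": 2})
index.add(300, {"main;render": 4})
index.add(200, {"main;parse": 1, "main;gc": 7})
print(index.top_stacks(100, 200, n=2))
```

Comparing `top_stacks` over two adjacent windows is one simple way such an index could surface a regression between deployments.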

Enabled proactive performance optimization and rapid debugging of production issues at Netflix scale

FlameCloud consists of: