FlameScope

Performance Engineering · 2018 · 2 min read

Open-sourced a visualization tool for exploring flame graphs across time ranges; it has since been widely adopted in the industry for performance analysis.

Overview

FlameScope is a visualization tool that displays profiling data as an interactive subsecond-offset heat map. Engineers select a time range on the heat map and get a flame graph for just that range, which helps identify perturbations, variance, and single-threaded execution issues.

Problem

Traditional flame graphs aggregate samples over an entire profiling session, so issues that occur only in specific time windows are hard to identify.

Constraints

  • Must work with continuous profiling data
  • Must allow arbitrary time range selection
  • Must be usable by engineers without deep profiling expertise

Approach

Built a web-based tool with a time-selection UI that generates a flame graph on demand for each selected range, rendered with d3-flame-graph.
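The range-to-flame-graph step can be sketched roughly as follows. This is a minimal illustration, not FlameScope's actual code: the `Sample` type and `build_flame_tree` function are hypothetical names, and the nested `name` / `value` / `children` shape is assumed to match the hierarchical input that d3-flame-graph renders, with `value` holding inclusive sample counts.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Sample:
    """One profiler sample (hypothetical type for this sketch)."""
    timestamp: float         # seconds since the start of the profile
    stack: Tuple[str, ...]   # frames ordered from root to leaf


def build_flame_tree(samples, start, end):
    """Fold samples with start <= timestamp < end into a nested tree."""
    root = {"name": "root", "value": 0, "children": []}
    for sample in samples:
        if not (start <= sample.timestamp < end):
            continue  # outside the selected time range
        root["value"] += 1
        node = root
        for frame in sample.stack:
            # Reuse the child for this frame if it exists, else create it.
            child = next(
                (c for c in node["children"] if c["name"] == frame), None
            )
            if child is None:
                child = {"name": frame, "value": 0, "children": []}
                node["children"].append(child)
            child["value"] += 1  # inclusive count: this frame and below
            node = child
    return root
```

Calling `build_flame_tree(samples, 3.2, 4.5)` would return a tree covering only the samples taken between 3.2 s and 4.5 s, which a web front end could serialize to JSON and hand to the renderer.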

Key Decisions

Open source the tool

Reasoning:

The broader community could benefit from this capability, and open-sourcing the tool would drive adoption and contributions.

Alternatives considered:
  • Keep it internal as a proprietary Netflix tool

Use sub-second time selection

Reasoning:

Performance issues often manifest in sub-second windows, so the tool needs millisecond-level granularity.
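As a rough illustration of what sub-second selection involves, the sketch below converts two clicked heat-map cells into a time range. It assumes each cell is identified by a (column, row) pair, where the column is a whole second and the row is a slice of that second. The function name `selection_to_range` and the default of 50 rows (20 ms per row) are assumptions made for this example.

```python
def selection_to_range(start_cell, end_cell, rows=50):
    """Convert two (column, row) cells into a [start, end) range in seconds.

    column = whole seconds since the start of the profile
    row    = index of the 1/rows slice within that second
    """
    row_width = 1.0 / rows
    # Tuples sort by column first, then row, which is chronological order,
    # so the two clicks can arrive in either order.
    (c0, r0), (c1, r1) = sorted([start_cell, end_cell])
    start = c0 + r0 * row_width
    end = c1 + (r1 + 1) * row_width  # include the whole end cell
    return start, end
```

With 50 rows, each cell spans 20 ms, so selecting cells (3, 10) and (4, 24) covers roughly 3.2 s to 4.5 s of the profile.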

Tech Stack

  • Python
  • JavaScript
  • D3.js
  • Flask

Result & Impact

  • GitHub stars: 3,101
  • Industry adoption: widely adopted for production debugging

Became a go-to tool for time-based performance analysis and was featured on the Netflix Tech Blog.

Learnings

  • Time-based filtering reveals issues invisible in aggregated views
  • Open-sourcing drives adoption and community contributions
  • Sub-second granularity is essential for production debugging

How It Works

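FlameScope begins by displaying the input data as an interactive subsecond-offset heat map. Each column is one second of the profile, each row is an offset within that second, and the color depth of a cell reflects how many samples fell into it, so patterns in the data become visible at a glance. You can then select a time range that covers a pattern of interest, and a flame graph is generated just for that range.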

This is particularly useful for identifying:

  • Perturbations
  • Variance
  • Single-threaded execution issues
  • Sporadic performance problems
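The heat map layout can be sketched as a simple bucketing step: each sample timestamp is assigned to a column (its whole second) and a row (its offset within that second). This is a minimal sketch under assumptions, not FlameScope's implementation. The function name `subsecond_heatmap` and the default of 50 rows are illustrative choices.

```python
import math


def subsecond_heatmap(timestamps, rows=50):
    """Count samples per (subsecond-offset row, second column) cell.

    Returns grid[row][col], where col is the whole second relative to the
    first sample and row is the 1/rows slice of that second.
    """
    if not timestamps:
        return []
    t0 = math.floor(min(timestamps))
    cols = math.floor(max(timestamps)) - t0 + 1
    grid = [[0] * cols for _ in range(rows)]
    for t in timestamps:
        second = math.floor(t)
        col = second - t0
        # Clamp to guard against floating-point rounding at the row boundary.
        row = min(int((t - second) * rows), rows - 1)
        grid[row][col] += 1
    return grid
```

In a grid like this, a busy single thread tends to show up as a solid band of similar counts, while a sporadic stall appears as a gap or a hot cluster in a few cells, which is the kind of pattern a time-range selection would then target.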