This directory contains documentation for performance monitoring and profiling tools used in optimization work.
- Intel® Tools Reference
- Other Tools Reference
- Environment Considerations: Baremetal vs. Cloud
- Choosing the Right Tool
Intel® PerfSpect
Easy to install and use. Comprehensive performance engineering toolkit for system health reporting, configuration analysis, architectural metrics, flamegraph generation, telemetry collection, and tuning parameter modification. Provides quick insights across multiple dimensions without the learning curve or deep complexity of other tools.
🎯 Expertise Level: 🟢 Beginner - 🟡 Intermediate
📊 Best for: System assessment, configuration validation, quick troubleshooting, health checks, getting started with performance analysis
⚡ Key advantage: Accessibility and speed of use, though with less depth than specialized tools
Intel® VTune™ Profiler
In-depth application and system profiler with microarchitecture analysis, parallelism examination, multi-node analysis, and GPU/accelerator optimization capabilities.
🎯 Expertise Level: 🔴 Advanced — requires expertise in microarchitecture concepts and profiling methodology
📊 Best for: Deep application optimization, microarchitecture analysis, GPU optimization, HPC workloads, complex debugging
Intel® Performance Counter Monitor (PCM)
API and toolset for monitoring performance and energy metrics of Intel processors, including memory bandwidth, cache behavior, PCIe bandwidth, and energy states.
🎯 Expertise Level: 🟡 Intermediate - 🔴 Advanced — requires understanding of hardware performance counters
📊 Best for: Hardware-level metrics, memory analysis, power consumption, real-time dashboards
Intel® gProfiler
System-wide profiler combining multiple sampling profilers across native programs, Java, Python runtimes, and kernel routines. Includes optional gProfiler Performance Studio for cluster-wide aggregation.
🎯 Expertise Level: 🟡 Intermediate (single node) - 🔴 Advanced (multi-node cluster)
📊 Best for: Production monitoring, multi-language environments, cluster analysis, low-overhead continuous profiling
Linux perf
Powerful performance analysis tool for Linux systems that provides a wide range of profiling capabilities, including CPU performance counters, tracepoints, and dynamic probes.
🎯 Expertise Level: 🔴 Advanced — requires familiarity with Linux internals and performance events
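As a small illustration of working with perf's hardware counters, the sketch below parses the CSV output of `perf stat -x,` (for example `perf stat -x, -e cycles,instructions -- ./your_app 2> stats.csv`, where `./your_app` is a placeholder) and derives instructions per cycle (IPC). It assumes the CSV field order described in the perf-stat man page (counter value, unit, event name, ...) and plain event names such as `cycles`; hybrid CPUs may report names like `cpu_core/cycles/` instead. Counters printed as `<not supported>` or `<not counted>`, which is common on VMs without PMU access, are mapped to `None`.

```python
"""Sketch: compute IPC from `perf stat -x,` CSV output (parsing only)."""
from typing import Dict, Optional


def parse_perf_stat_csv(text: str) -> Dict[str, Optional[float]]:
    """Map event name -> count, assuming fields: value, unit, event, ..."""
    counts: Dict[str, Optional[float]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and perf's comment lines
        fields = line.split(",")
        if len(fields) < 3:
            continue
        value, event = fields[0], fields[2]
        try:
            counts[event] = float(value)
        except ValueError:
            counts[event] = None  # e.g. "<not supported>" without PMU access
    return counts


def ipc(counts: Dict[str, Optional[float]]) -> Optional[float]:
    """Instructions per cycle, or None if either counter is unavailable."""
    cycles = counts.get("cycles")
    instructions = counts.get("instructions")
    if not cycles or instructions is None:
        return None
    return instructions / cycles


# Hypothetical perf stat CSV lines, shaped like real `-x,` output.
SAMPLE = """\
2000000,,cycles,1000000,100.00,,
3000000,,instructions,1000000,100.00,1.50,insn per cycle
"""

if __name__ == "__main__":
    print(ipc(parse_perf_stat_csv(SAMPLE)))
```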
Linux eBPF
A powerful technology for tracing and monitoring kernel and user-space events with minimal overhead, allowing for custom performance analysis and observability.
🎯 Expertise Level: 🔴 Advanced — requires knowledge of kernel tracing, BPF programs, and Linux internals
Not all tools work equally well in every environment. The key factor is access to hardware configuration settings and hardware events, e.g., PMU (Performance Monitoring Unit) counter access, which varies significantly between baremetal and cloud deployments.
On baremetal, full PMU counter access is typically available, so all tools can operate at their full potential. This is the ideal environment for deep hardware-level analysis with PerfSpect, VTune, PCM, and perf.
In the cloud, PMU counter availability varies by vendor. Many instance types restrict or disable access to hardware performance counters, which limits the effectiveness of tools that depend on them.
- Recommended starting point: Use PerfSpect for quick performance insights; it does not depend on full PMU access and works reliably across cloud environments.
- VTune depends on PMU counters. Check your cloud vendor's documentation for PMU support:
  - Some vendors offer dedicated/metal instance types with full PMU access.
  - Standard VM instances may have limited or no PMU counter availability.
- gProfiler works well in cloud environments for software-level profiling and does not require PMU access.
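Before committing to a PMU-dependent tool, it can help to check what the current machine exposes. The following is a heuristic sketch for Linux on Intel/x86 hosts only: it looks for a registered core PMU under `/sys/bus/event_source/devices` (`cpu`, or `cpu_core`/`cpu_atom` on hybrid parts), the `arch_perfmon` CPU flag in `/proc/cpuinfo`, and the `perf_event_paranoid` setting that governs unprivileged access. A positive result does not guarantee that every event a given tool needs is virtualized by the hypervisor; the `root` parameter exists only so the check can be pointed at a test directory.

```python
"""Heuristic PMU availability check for Linux on Intel/x86 (sketch)."""
from pathlib import Path
from typing import Dict, Optional


def read_text(path: Path) -> Optional[str]:
    """Return stripped file contents, or None if the file cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def pmu_report(root: str = "/") -> Dict[str, object]:
    base = Path(root)
    devices = base / "sys/bus/event_source/devices"
    # The core PMU registers as "cpu" on most x86 systems; hybrid parts use cpu_core/cpu_atom.
    core_pmus = [n for n in ("cpu", "cpu_core", "cpu_atom") if (devices / n).is_dir()]
    cpuinfo = read_text(base / "proc/cpuinfo") or ""
    # arch_perfmon: the CPU, as seen by this OS, advertises Intel architectural perfmon.
    arch_perfmon = any(
        line.startswith("flags") and "arch_perfmon" in line.split()
        for line in cpuinfo.splitlines()
    )
    paranoid = read_text(base / "proc/sys/kernel/perf_event_paranoid")
    return {
        "core_pmus": core_pmus,
        "arch_perfmon": arch_perfmon,
        # Higher values restrict unprivileged use of perf events more tightly.
        "perf_event_paranoid": int(paranoid) if paranoid else None,
        "likely_usable": bool(core_pmus) and arch_perfmon,
    }


if __name__ == "__main__":
    for key, value in pmu_report().items():
        print(f"{key}: {value}")
```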
| Tool | Baremetal | Cloud (standard VM) | Cloud (metal/dedicated) |
|---|---|---|---|
| PerfSpect | ✅ Full support | ✅ Full support | ✅ Full support |
| gProfiler | ✅ Full support | ✅ Full support | ✅ Full support |
| VTune | ✅ Full support | ⚠️ Limited (depends on PMU access) | ✅ Full support |
| PCM | ✅ Full support | ⚠️ Limited (depends on PMU access) | ✅ Full support |
| Linux perf | ✅ Full support | ⚠️ Limited (hardware events depend on PMU access) | ✅ Full support |
| Linux eBPF | ✅ Full support | ✅ Supported (needs kernel support) | ✅ Full support |
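eBPF on standard VMs needs kernel support. One way to check is to inspect the running kernel's build configuration, which many distributions ship as `/boot/config-<release>` or expose as `/proc/config.gz`. The option list in this sketch is an assumption covering common needs (the BPF syscall, the JIT, and BTF type information used by CO-RE tools); the exact requirements depend on the eBPF tool you run.

```python
"""Sketch: check the running kernel's build config for common eBPF options."""
import gzip
import platform
from pathlib import Path
from typing import Dict, Iterable, Optional

# Assumed set of options that eBPF tooling commonly relies on; adjust per tool.
DEFAULT_OPTIONS = ("CONFIG_BPF", "CONFIG_BPF_SYSCALL", "CONFIG_BPF_JIT", "CONFIG_DEBUG_INFO_BTF")


def load_kernel_config(release: Optional[str] = None) -> Optional[str]:
    """Return the kernel config text from /boot or /proc/config.gz, if available."""
    release = release or platform.release()
    try:
        plain = Path(f"/boot/config-{release}")
        if plain.is_file():
            return plain.read_text()
        packed = Path("/proc/config.gz")  # present when CONFIG_IKCONFIG_PROC=y
        if packed.is_file():
            return gzip.decompress(packed.read_bytes()).decode()
    except (OSError, EOFError):
        pass  # unreadable or corrupt config: report as unavailable
    return None


def check_options(config_text: str, options: Iterable[str] = DEFAULT_OPTIONS) -> Dict[str, bool]:
    """Report whether each option is built in (=y) or a module (=m)."""
    enabled = set()
    for line in config_text.splitlines():
        if "=" in line and not line.startswith("#"):
            key, value = line.split("=", 1)
            if value.strip() in ("y", "m"):
                enabled.add(key.strip())
    return {opt: opt in enabled for opt in options}


if __name__ == "__main__":
    config = load_kernel_config()
    if config is None:
        print("kernel config not found")
    else:
        for opt, ok in check_options(config).items():
            print(f"{opt}: {'yes' if ok else 'no'}")
```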
Start with your primary goal or problem, then follow the decision path to find the best tool(s).
| Tool | Expertise Level | Baremetal | Cloud | Best Starting Point |
|---|---|---|---|---|
| PerfSpect | 🟢 Beginner - 🟡 Intermediate | ✅ Yes | ✅ Yes | ⭐ Yes |
| gProfiler | 🟡 Intermediate - 🔴 Advanced | ✅ Yes | ✅ Yes | ✅ Sometimes |
| VTune | 🔴 Advanced | ✅ Yes | ⚠️ Limited (PMU-dependent) | No |
| PCM | 🟡 Intermediate - 🔴 Advanced | ✅ Yes | ⚠️ Sometimes (PMU-dependent) | No |
| Linux perf | 🔴 Advanced | ✅ Yes | ⚠️ Limited (PMU-dependent) | No |
| Linux eBPF | 🔴 Advanced | ✅ Yes | ✅ Yes | No |
→ Use: PerfSpect ⭐ Easiest to install and use
- Validating system configuration before performance testing
- Getting a health check and performance baseline
- Quick automated system tuning recommendations
- Pre-flight checks before running benchmarks
- Understanding current system telemetry and state
- Start here if you're new to performance analysis – no steep learning curve
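Pre-flight checks of this kind often come down to reading system settings. As one illustrative check, not a description of how PerfSpect works, this sketch summarizes the cpufreq scaling governor across CPUs from sysfs; an unexpected or inconsistent governor can make benchmark results harder to reproduce. Which governor is appropriate depends on the frequency driver (with `intel_pstate` in active mode, for example, only `performance` and `powersave` are offered), and many cloud VMs do not expose cpufreq at all, in which case the summary is empty.

```python
"""Sketch of one pre-flight check: summarize cpufreq governors across CPUs."""
from collections import Counter
from pathlib import Path


def governor_summary(sys_cpu: str = "/sys/devices/system/cpu") -> Counter:
    """Count CPUs per scaling governor; empty if cpufreq is not exposed."""
    summary: Counter = Counter()
    for gov_file in sorted(Path(sys_cpu).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        try:
            summary[gov_file.read_text().strip()] += 1
        except OSError:
            summary["<unreadable>"] += 1
    return summary


if __name__ == "__main__":
    governors = governor_summary()
    if not governors:
        print("cpufreq not exposed (common on cloud VMs)")
    elif len(governors) > 1:
        print(f"mixed governors: {dict(governors)}")
    else:
        print(f"all CPUs use: {next(iter(governors))}")
```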
→ Do you need to analyze multiple languages or continuous production monitoring?
- YES (multi-language or continuous monitoring) → Use: gProfiler
  - Multi-language environments (native, Java, Python) requiring unified profiling
  - Finding performance bottlenecks in microservices architectures
  - Analyzing resource utilization across production systems with low overhead
  - Identifying hot functions and stack traces without code instrumentation
  - Comparing performance patterns across multiple machines over time
- NO (ad-hoc analysis) → Use: PerfSpect
  - Flamegraphs for quick visualization of call stacks and hot paths
  - Simple setup for immediate insights during development
  - Quick identification of performance bottlenecks without deep configuration
  - System Telemetry collection for understanding overall system behavior during testing
  - Architectural metrics for understanding how the application interacts with hardware resources
→ Do you have application source code?
- YES (have source code) → Use: VTune
  - Correlating application performance with microarchitecture metrics
  - Analyzing cache behavior and memory bandwidth in relation to code execution
  - Identifying specific code regions causing hardware bottlenecks
  - GPU/accelerator optimization and analysis
- NO (no source code) → Use: PerfSpect
  - System-wide performance analysis without needing source code
  - Architectural metrics to understand hardware interactions
  - Flamegraphs to visualize hot paths even without code instrumentation
  - System Telemetry for overall system health and performance insights
→ Do you need to aggregate data from multiple machines?
- YES → Use: gProfiler + gProfiler Performance Studio
  - Cluster-wide performance analysis
  - Comparing performance patterns across multiple machines or time periods
  - Holistic view of what is happening on your entire cluster
- NO (single machine analysis) → Use: gProfiler or VTune (based on depth needed)
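Conceptually, aggregating profiles from many machines means summing sample counts for identical call stacks. The sketch below does this for profiles in the collapsed-stack text format (`frame1;frame2;frame3 count`) used by Brendan Gregg's FlameGraph scripts. It illustrates the idea only; it is not how gProfiler Performance Studio stores or merges data, and the host names are hypothetical. Setting `prefix_host=True` keeps each machine as its own root frame, so hosts can be compared side by side in a single flamegraph.

```python
"""Sketch: merge collapsed-stack profiles from several machines."""
from collections import Counter
from typing import Dict


def parse_collapsed(text: str) -> Counter:
    """Parse 'frame1;frame2;frame3 count' lines into a Counter of stacks."""
    stacks: Counter = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        stack, _, count = line.rpartition(" ")  # last field is the sample count
        stacks[stack] += int(count)
    return stacks


def merge_profiles(per_host: Dict[str, str], prefix_host: bool = False) -> Counter:
    """Sum sample counts per stack across hosts, optionally rooted at the host name."""
    total: Counter = Counter()
    for host, text in per_host.items():
        for stack, count in parse_collapsed(text).items():
            key = f"{host};{stack}" if prefix_host else stack
            total[key] += count
    return total


if __name__ == "__main__":
    profiles = {"host-a": "main;work 3\nmain;idle 1\n", "host-b": "main;work 2\n"}
    for stack, count in sorted(merge_profiles(profiles).items()):
        print(stack, count)
```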
→ Are you investigating processor-level metrics?
- YES → Use: PCM
  - Analyzing memory bandwidth utilization and DRAM behavior
  - Identifying memory bandwidth bottlenecks in data-intensive workloads
  - Detecting inefficient cache usage patterns
  - Monitoring cache miss latencies and PCIe bandwidth
  - Detailed microarchitecture analysis (cache efficiency, memory stalls)
  - Real-time system performance dashboards
- NO (need application-level insights) → Use: VTune
  - Identifying which parts of the code cause memory issues
  - Detailed cache miss analysis at the instruction level
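Memory bandwidth figures of the kind PCM reports are typically derived from memory-controller counters. On many Intel server processors each DRAM CAS (column access) command transfers one 64-byte cache line, so bandwidth is the CAS count multiplied by 64 bytes and divided by the sampling interval. The sketch below applies that arithmetic; the counter values are hypothetical inputs, the uncore event names that supply them vary by processor generation, and results are in decimal GB/s.

```python
"""Sketch: convert memory-controller CAS counts into bandwidth."""
from typing import Dict

CACHE_LINE_BYTES = 64  # one DRAM CAS command transfers one 64-byte cache line


def memory_bandwidth_gbps(cas_reads: int, cas_writes: int, interval_s: float) -> Dict[str, float]:
    """Read, write, and total bandwidth in decimal GB/s over one sampling interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    read = cas_reads * CACHE_LINE_BYTES / interval_s / 1e9
    write = cas_writes * CACHE_LINE_BYTES / interval_s / 1e9
    return {"read": read, "write": write, "total": read + write}


if __name__ == "__main__":
    # Hypothetical counter deltas collected over a 2-second interval.
    print(memory_bandwidth_gbps(cas_reads=1_000_000_000, cas_writes=500_000_000, interval_s=2.0))
```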
→ Use: VTune
- Analyzing multi-threaded parallelism and scalability issues
- Debugging poor thread scaling in parallel applications
- Examining how effectively threads are utilized
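Measuring wall-clock time at several thread counts shows whether scaling is actually poor before (or alongside) a VTune threading analysis. This sketch computes speedup, parallel efficiency, and the Karp-Flatt metric, the experimentally determined serial fraction `e = (1/S - 1/p) / (1 - 1/p)` for speedup `S` on `p` threads. A serial fraction that grows with thread count usually points to parallel overhead such as synchronization or load imbalance rather than inherently serial code. The timings in the example are hypothetical.

```python
"""Sketch: quantify thread scaling from measured wall-clock times."""
from typing import Dict


def scaling_report(times_by_threads: Dict[int, float]) -> Dict[int, Dict[str, float]]:
    """Map thread count -> speedup, efficiency, and (for p > 1) Karp-Flatt serial fraction.

    times_by_threads maps thread count to wall time in seconds and must include
    a single-thread baseline.
    """
    if 1 not in times_by_threads:
        raise ValueError("a 1-thread baseline is required")
    baseline = times_by_threads[1]
    report: Dict[int, Dict[str, float]] = {}
    for p, elapsed in sorted(times_by_threads.items()):
        speedup = baseline / elapsed
        row = {"speedup": speedup, "efficiency": speedup / p}
        if p > 1:
            # Karp-Flatt metric: e = (1/S - 1/p) / (1 - 1/p)
            row["serial_fraction"] = (1 / speedup - 1 / p) / (1 - 1 / p)
        report[p] = row
    return report


if __name__ == "__main__":
    # Hypothetical timings (seconds) for the same workload at different thread counts.
    for p, row in scaling_report({1: 100.0, 2: 55.0, 4: 32.0, 8: 25.0}).items():
        print(p, {k: round(v, 3) for k, v in row.items()})
```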
→ Use: VTune
- GPU/accelerator optimization and analysis
- Analyzing GPU utilization and accelerator integration
- Multi-node cluster performance analysis for HPC applications
- AI/ML workload optimization and profiling
→ Use: PCM
- Tracking energy consumption and CPU sleep states
- Power consumption analysis for cloud deployments
- Integration with monitoring systems like Prometheus for continuous tracking
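The energy data PCM reports comes from the processor's RAPL energy counters, which Linux also exposes through the powercap sysfs interface. As a minimal sketch of the underlying arithmetic (average power is energy consumed divided by elapsed time), the code below samples `energy_uj` twice and handles counter wraparound using `max_energy_range_uj`. It assumes zone `intel-rapl:0` is package 0, which is common but not universal; on recent kernels `energy_uj` is readable only by root, and many cloud VMs do not expose RAPL at all.

```python
"""Sketch: estimate average package power from Linux powercap (RAPL) counters."""
import time
from pathlib import Path

RAPL_ZONE = Path("/sys/class/powercap/intel-rapl:0")  # assumed: package 0


def energy_delta_uj(before: int, after: int, max_range_uj: int) -> int:
    """Energy consumed between two readings, allowing for one counter wraparound."""
    if after >= before:
        return after - before
    return (max_range_uj - before) + after


def average_power_w(interval_s: float = 1.0, zone: Path = RAPL_ZONE) -> float:
    """Sample the energy counter twice and return average power in watts."""
    max_range = int((zone / "max_energy_range_uj").read_text())
    e0 = int((zone / "energy_uj").read_text())
    t0 = time.monotonic()
    time.sleep(interval_s)
    e1 = int((zone / "energy_uj").read_text())
    t1 = time.monotonic()
    return energy_delta_uj(e0, e1, max_range) / 1e6 / (t1 - t0)


if __name__ == "__main__":
    try:
        print(f"package 0: {average_power_w():.1f} W")
    except OSError as err:
        print(f"RAPL not readable here: {err}")
```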
→ Do you want quick, shallow analysis or deep investigation?
- Quick and easy → Use: PerfSpect
  - Generating flamegraphs for visualization of call stacks
  - Quick visualization of application hot paths
  - Simple setup and immediate insights
- Production-scale or deep analysis → Use: gProfiler
  - System-wide flamegraphs across all processes
  - Continuous profiling with minimal overhead
  - More sophisticated analysis capabilities
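To see what a flamegraph actually displays, it helps to look at the data behind it. Sampled call stacks are folded into one line per unique stack with a sample count (the collapsed format consumed by tools such as Brendan Gregg's `flamegraph.pl`), and each box's width is the number of samples whose stack passes through that frame path. This sketch shows both steps on hypothetical samples listed root-first; it does not describe the internal format of PerfSpect or gProfiler.

```python
"""Sketch: fold sampled call stacks and compute flamegraph box widths."""
from collections import Counter
from typing import Iterable, List, Sequence


def fold_stacks(samples: Iterable[Sequence[str]]) -> Counter:
    """Count identical stacks; each sample lists frames root-first."""
    return Counter(";".join(stack) for stack in samples)


def to_collapsed_lines(folded: Counter) -> List[str]:
    """Render 'root;child;leaf count' lines, sorted for stable output."""
    return [f"{stack} {count}" for stack, count in sorted(folded.items())]


def box_widths(folded: Counter) -> Counter:
    """Inclusive samples per frame path: the width of each flamegraph box."""
    widths: Counter = Counter()
    for stack, count in folded.items():
        frames = stack.split(";")
        for depth in range(1, len(frames) + 1):
            widths[";".join(frames[:depth])] += count
    return widths


if __name__ == "__main__":
    samples = [["main", "parse"], ["main", "parse"], ["main", "compute", "sqrt"]]
    folded = fold_stacks(samples)
    print("\n".join(to_collapsed_lines(folded)))
    print(box_widths(folded)["main"])  # every sample passes through main
```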