Tools

This directory contains documentation for performance monitoring and profiling tools used in optimization work.

Intel® Tools Reference
Other Tools Reference
- Linux perf
- Linux eBPF
Environment Considerations: Baremetal vs. Cloud
Choosing the Right Tool

Intel® Tools Reference

Intel® PerfSpect

Easy to install and use. Comprehensive performance engineering toolkit for system health reporting, configuration analysis, architectural metrics, flamegraph generation, telemetry collection, and tuning parameter modification. Provides quick insights across multiple dimensions without the learning curve or deep complexity of other tools.

🎯 Expertise Level: 🟢 Beginner - 🟡 Intermediate

📊 Best for: System assessment, configuration validation, quick troubleshooting, health checks, getting started with performance analysis

⚡ Key advantage: Accessibility and speed of use, though with less depth than specialized tools

Intel® VTune™ Profiler

In-depth application and system profiler with microarchitecture analysis, parallelism examination, multi-node analysis, and GPU/accelerator optimization capabilities.

🎯 Expertise Level: 🔴 Advanced — requires expertise in microarchitecture concepts and profiling methodology

📊 Best for: Deep application optimization, microarchitecture analysis, GPU optimization, HPC workloads, complex debugging

Intel® Performance Counter Monitor (PCM)

API and toolset for monitoring performance and energy metrics of Intel processors including memory bandwidth, cache behavior, PCIe bandwidth, and energy states.

🎯 Expertise Level: 🟡 Intermediate - 🔴 Advanced — requires understanding of hardware performance counters

📊 Best for: Hardware-level metrics, memory analysis, power consumption, real-time dashboards

Intel® gProfiler

System-wide profiler combining multiple sampling profilers across native programs, Java, Python runtimes, and kernel routines. Includes optional gProfiler Performance Studio for cluster-wide aggregation.

🎯 Expertise Level: 🟡 Intermediate (single node) - 🔴 Advanced (multi-node cluster)

📊 Best for: Production monitoring, multi-language environments, cluster analysis, low-overhead continuous profiling

Other Tools Reference

Linux `perf`

Powerful performance analysis tool for Linux systems, providing a wide range of profiling capabilities including CPU performance counters, tracepoints, and dynamic probes.

🎯 Expertise Level: 🔴 Advanced — requires familiarity with Linux internals and performance events

Linux eBPF (extended Berkeley Packet Filter)

A powerful technology for tracing and monitoring kernel and user-space events with minimal overhead, allowing for custom performance analysis and observability.

🎯 Expertise Level: 🔴 Advanced — requires knowledge of kernel tracing, BPF programs, and Linux internals

Environment Considerations: Baremetal vs. Cloud

Not all tools work equally well in every environment. The key factor is access to hardware configuration settings and hardware events, e.g., PMU (Performance Monitoring Unit) counter access, which varies significantly between baremetal and cloud deployments.

Baremetal

Full PMU counter access is typically available, so all tools can operate at their full potential. This is the ideal environment for deep hardware-level analysis with PerfSpect, VTune, PCM, and perf.

Cloud

Cloud vendors vary in PMU counter availability. Many instance types restrict or disable access to hardware performance counters, which limits the effectiveness of tools that depend on them.

Recommended starting point: Use PerfSpect for quick performance insights — it does not depend on full PMU access and works reliably across cloud environments.
VTune depends on PMU counters. Check your cloud vendor's documentation for PMU support:
- Some vendors offer dedicated/metal instance types with full PMU access.
- Standard VM instances may have limited or no PMU counter availability.
gProfiler works well in cloud environments for software-level profiling and does not require PMU access.

Quick Reference by Environment

Tool	Baremetal	Cloud (standard VM)	Cloud (metal/dedicated)
PerfSpect	✅ Full support	⚠️ Some Features Limited	✅ Full support
gProfiler	✅ Full support	✅ Full support	✅ Full support
VTune	✅ Full support	⚠️ Limited	✅ Full support
PCM	✅ Full support	⚠️ Some Features Limited	✅ Full support
Linux perf	✅ Full support	⚠️ Some Features Limited	✅ Full support
Linux eBPF	✅ Full support	✅ Supported (needs kernel support)	✅ Full support

Choosing the Right Tool

Start with your primary goal or problem, then follow the decision path to find the best tool(s).

Tool Summary

Tool	Expertise Level	Baremetal	Cloud	Best Starting Point
PerfSpect	🟢 Beginner - 🟡 Intermediate	✅ Yes	⚠️ Limited	✅ Yes
gProfiler	🟡 Intermediate - 🔴 Advanced	✅ Yes	✅ Yes	✅ Sometimes
VTune	🔴 Advanced	✅ Yes	⚠️ Limited	No
PCM	🟡 Intermediate - 🔴 Advanced	✅ Yes	⚠️ Limited	✅ Sometimes
Linux perf	🔴 Advanced	✅ Yes	⚠️ Limited	No
Linux eBPF	🔴 Advanced	✅ Yes	✅ Yes	No

START: What is your primary goal?

"I need a quick system assessment" (Easy start)

→ Use: PerfSpect ⭐ Easiest to install and use

Validating system configuration before performance testing
Getting a health check and performance baseline
Quick automated system tuning recommendations
Pre-flight checks before running benchmarks
Understanding current system telemetry and state
Start here if you're new to performance analysis – no steep learning curve

"My application/workload is slow - I need to find where time is spent"

→ Do you need to analyze multiple languages or continuous production monitoring?

YES (multi-language or continuous monitoring) → Use: gProfiler
- Multi-language environments (native, Java, Python) requiring unified profiling
- Finding performance bottlenecks in microservices architectures
- Analyzing resource utilization across production systems with low overhead
- Identifying hot functions and stack traces without code instrumentation
- Compare performance patterns across multiple machines over time
NO (ad-hoc analysis) → Use: PerfSpect
- Flamegraphs for quick visualization of call stacks and hot paths
- Simple setup for immediate insights during development
- Quick identification of performance bottlenecks without deep configuration
- System Telemetry collection for understanding overall system behavior during testing
- Architectural metrics for understanding how the application interacts with hardware resources

"I want to correlate application performance with hardware performance metrics"

→ Do you have application source code?

YES (have source code) → Use: VTune
- Correlating application performance with microarchitecture metrics
- Analyzing cache behavior and memory bandwidth in relation to code execution
- Identifying specific code regions causing hardware bottlenecks
- GPU/accelerator optimization and analysis
NO (no source code) → Use: PerfSpect
- System-wide performance analysis without needing source code
- Architectural metrics to understand hardware interactions
- Flamegraphs to visualize hot paths even without code instrumentation
- System Telemetry for overall system health and performance insights

"I'm analyzing/optimizing distributed systems at scale"

→ Do you need to aggregate data from multiple machines?

YES → Use: gProfiler + gProfiler Performance Studio
- Cluster-wide performance analysis
- Comparing performance patterns across multiple machines or time periods
- Holistic view of what is happening on your entire cluster
NO (single machine analysis) → Use: gProfiler or VTune (based on depth needed)

"I'm experiencing memory or bandwidth issues"

→ Are you investigating processor-level metrics?

YES → Use: PCM
- Analyzing memory bandwidth utilization and DRAM behavior
- Identifying memory bandwidth bottlenecks in data-intensive workloads
- Detecting inefficient cache usage patterns
- Monitoring cache miss latencies and PCIe bandwidth
- Detailed microarchitecture analysis (cache efficiency, memory stalls)
- Real-time system performance dashboards
NO (need application-level insights) → Use: VTune
- Identify which parts of code are causing memory issues
- Detailed cache miss analysis at the instruction level

"My parallel/multi-threaded application doesn't scale"

→ Use: VTune

Analyzing multi-threaded parallelism and scalability issues
Debugging poor thread scaling in parallel applications
Examining how effectively threads are utilized

"I need to optimize GPU or accelerators"

→ Use: VTune

GPU/accelerator optimization and analysis
Analyzing GPU utilization and accelerator integration
Multi-node cluster performance analysis for HPC applications
AI/ML workload optimization and profiling

"I need to monitor power consumption or energy efficiency"

→ Use: PCM

Tracking energy consumption and CPU sleep states
Power consumption analysis for cloud deployments
Integration with monitoring systems like Prometheus for continuous tracking

"I need to visualize call stacks and hot code paths"

→ Do you want quick, shallow analysis or deep investigation?

Quick and easy → Use: PerfSpect
- Generating flamegraphs for visualization of call stacks
- Quick visualization of application hot paths
- Simple setup and immediate insights
Production-scale or deep analysis → Use: gProfiler
- System-wide flamegraphs across all processes
- Continuous profiling with minimal overhead
- More sophisticated analysis capabilities

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tools

Contents

Intel® Tools Reference

Intel® PerfSpect

Intel® VTune™ Profiler

Intel® Performance Counter Monitor (PCM)

Intel® gProfiler

Other Tools Reference

Linux `perf`

Linux eBPF (extended Berkeley Packet Filter)

Environment Considerations: Baremetal vs. Cloud

Baremetal

Cloud

Quick Reference by Environment

Choosing the Right Tool

Tool Summary

START: What is your primary goal?

"I need a quick system assessment" (Easy start)

"My application/workload is slow - I need to find where time is spent"

"I want to correlate application performance with hardware performance metrics"

"I'm analyzing/optimizing distributed systems at scale"

"I'm experiencing memory or bandwidth issues"

"My parallel/multi-threaded application doesn't scale"

"I need to optimize GPU or accelerators"

"I need to monitor power consumption or energy efficiency"

"I need to visualize call stacks and hot code paths"

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Tools

Contents

Intel® Tools Reference

Intel® PerfSpect

Intel® VTune™ Profiler

Intel® Performance Counter Monitor (PCM)

Intel® gProfiler

Other Tools Reference

Linux perf

Linux eBPF (extended Berkeley Packet Filter)

Environment Considerations: Baremetal vs. Cloud

Baremetal

Cloud

Quick Reference by Environment

Choosing the Right Tool

Tool Summary

START: What is your primary goal?

"I need a quick system assessment" (Easy start)

"My application/workload is slow - I need to find where time is spent"

"I want to correlate application performance with hardware performance metrics"

"I'm analyzing/optimizing distributed systems at scale"

"I'm experiencing memory or bandwidth issues"

"My parallel/multi-threaded application doesn't scale"

"I need to optimize GPU or accelerators"

"I need to monitor power consumption or energy efficiency"

"I need to visualize call stacks and hot code paths"

Linux `perf`