Add Perfetto tracing and a test output directory option #2742
- Add MATERIALX_BUILD_TRACING CMake option (OFF by default)
- Add abstract MxTraceBackend interface for pluggable tracing backends
- Add MxTraceCollector singleton (similar to USD's TraceCollector)
- Add MxTraceScope RAII helper and MX_TRACE_* macros
- Add MxPerfettoBackend implementation using Perfetto SDK
- Initialize/shutdown Perfetto in MaterialXTest Main.cpp
- Add trace instrumentation to RenderGlsl test

The abstract interface allows USD/Hydra to inject their own tracing backend when calling MaterialX, enabling unified trace visualization.

Usage:
    cmake -DMATERIALX_BUILD_TRACING=ON ...
    # Run tests, then open materialx_test_trace.perfetto-trace at ui.perfetto.dev
- Add NOMINMAX and WIN32_LEAN_AND_MEAN to prevent min/max macro conflicts
- Add /bigobj flag for perfetto.cc (too many sections for MSVC default)
- Use BUILD_INTERFACE generator expression for include path
- Fix MxTracePerfetto.cpp: add missing <fstream> include and simplify Perfetto API usage (use fixed category with dynamic event names)
- Replace macro category definitions with constexpr namespace constants (MxTraceCategory::Render, ::ShaderGen, ::Optimize, ::Material)
- Keep legacy macros for backward compatibility
- Fix MX_TRACE_SCOPE macro to properly expand __LINE__ via helper macros
- Add MX_TRACE_CAT_MATERIAL category for material/shader identity markers
- Add material name trace scope in RenderGlsl for per-material tracking
- Add timestamp to trace output filename (format: YYYYMMDD_HHMMSS)
- Use MX_TRACE_FUNCTION macro for automatic function name extraction
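The `__LINE__` fix mentioned in this commit relies on a standard preprocessor idiom: token pasting with `##` does not expand its operands, so a second macro level is needed to turn `__LINE__` into its numeric value first. The sketch below illustrates the idiom only; `SketchScope`, `sketchEvents`, and the `SKETCH_*` macros are hypothetical names, not the PR's actual definitions.

```cpp
#include <string>
#include <vector>

// Hypothetical event log standing in for a real tracing backend.
inline std::vector<std::string>& sketchEvents()
{
    static std::vector<std::string> events;
    return events;
}

// Minimal RAII scope: records "begin" on construction, "end" on destruction.
class SketchScope
{
  public:
    explicit SketchScope(const char* name) : _name(name)
    {
        sketchEvents().push_back("begin:" + _name);
    }
    ~SketchScope()
    {
        sketchEvents().push_back("end:" + _name);
    }
    SketchScope(const SketchScope&) = delete;
    SketchScope& operator=(const SketchScope&) = delete;

  private:
    const std::string _name;
};

// Two-level concatenation: the extra indirection forces __LINE__ to be
// expanded to its numeric value before ## pastes the tokens together.
#define SKETCH_CONCAT_IMPL(a, b) a##b
#define SKETCH_CONCAT(a, b) SKETCH_CONCAT_IMPL(a, b)

// Each use declares a uniquely named local variable, e.g. sketchScope_42,
// so several scopes can coexist in the same block.
#define SKETCH_TRACE_SCOPE(name) \
    SketchScope SKETCH_CONCAT(sketchScope_, __LINE__)(name)

// Uses the standard __func__ identifier as the event name.
#define SKETCH_TRACE_FUNCTION() SKETCH_TRACE_SCOPE(__func__)

// Example function with three scopes declared in the same block.
void renderTwoPasses()
{
    SKETCH_TRACE_FUNCTION();
    SKETCH_TRACE_SCOPE("pass1");
    SKETCH_TRACE_SCOPE("pass2");
}
```

With a single-level `a##b`, every use would paste the literal token `__LINE__` and the second scope in `renderTwoPasses` would fail to compile with a redefinition error.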
The Catch2 Test Adapter for Visual Studio runs MaterialXTest.exe with --list-test-names-only to discover tests. Perfetto initialization was outputting to stderr, causing the adapter to fail with exit code 89. Now we detect --list-* arguments and skip tracing initialization, allowing VS Test Explorer to properly discover and debug tests.
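A minimal sketch of the argument check described here, assuming the detection is a simple prefix match on each command-line argument. The helper name `isListingRun` is hypothetical and not taken from the PR.

```cpp
#include <string>
#include <vector>

// Returns true if any argument starts with "--list-", which is how test
// adapters ask the test binary to enumerate tests rather than run them.
bool isListingRun(const std::vector<std::string>& args)
{
    const std::string prefix = "--list-";
    for (const std::string& arg : args)
    {
        // compare() on the leading characters acts as a starts-with check.
        if (arg.compare(0, prefix.size(), prefix) == 0)
        {
            return true;
        }
    }
    return false;
}
```

A caller would build the vector from `argv` and skip tracing initialization whenever this returns true, so that nothing is written to stderr during test discovery.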
Add optional outputDirectory setting to _options.mtlx that redirects all test artifacts (logs, shaders, images, traces) to a user-specified directory. This allows different test runs to be isolated without overwriting each other.

Changes:
- _options.mtlx: Add outputDirectory input (empty = default behavior)
- TestSuiteOptions: Add outputDirectory member and resolveOutputPath() helpers
- GenShaderUtil.cpp: Use outputDirectory for shader generation logs and dumps
- RenderUtil.cpp: Use outputDirectory for render logs, images, and traces
- Main.cpp: Remove tracing code (moved to RenderUtil.cpp)

Tracing improvements:
- Move Perfetto init/shutdown to ShaderRenderTester::validate()
- Each render test now produces its own trace file (e.g., genglsl_render_trace.perfetto-trace)
- Trace filenames now follow the same pattern as log files
- Removed timestamps from trace filenames for consistency

Usage:
    <input name="outputDirectory" type="string" value="C:/test_results/my_run" />
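To make the "empty = default behavior" rule concrete, here is a small sketch of what a path-resolving helper could look like. It assumes the helper keeps only the file name when redirecting; the actual `resolveOutputPath()` in the PR may behave differently, and `SketchOptions` is a hypothetical stand-in for `TestSuiteOptions`.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical stand-in for the test suite options.
struct SketchOptions
{
    // Empty means "write artifacts to their default locations".
    std::string outputDirectory;

    // Redirects a file into outputDirectory, keeping only its file name
    // (an assumption of this sketch).
    fs::path resolveOutputPath(const fs::path& defaultPath) const
    {
        if (outputDirectory.empty())
        {
            return defaultPath;
        }
        return fs::path(outputDirectory) / defaultPath.filename();
    }
};
```

Routing every log, shader dump, image, and trace path through one helper like this is what lets a single option isolate a whole test run.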
When outputDirectory is set, print the path at the end of test stdout. This makes it clickable in terminals like Cursor/VS Code for quick access.
- Suppress MSVC warnings from Perfetto SDK templates (C4127, C4146, C4369)
- Replace MX_TRACE_CAT_* macros with 'namespace cat = mx::MxTraceCategory' for cleaner code and better IDE support
- Update RenderGlsl.cpp to use the new category alias pattern
36ae774 to
fb056e3
Rename files and classes to better align with MaterialX naming conventions and improve API clarity:

Files:
- MxTrace.h/cpp -> Tracing.h/cpp
- MxTracePerfetto.h/cpp -> PerfettoSink.h/cpp

Classes (now in MaterialX::Tracing namespace):
- MxTraceBackend -> Tracing::Sink
- MxTraceCollector -> Tracing::Dispatcher
- MxTraceScope -> Tracing::Scope
- MxTraceCategory -> Tracing::Category
- MxPerfettoBackend -> Tracing::PerfettoSink

Rationale:
- 'Sink' is common terminology in logging/tracing for data destinations
- 'Dispatcher' better describes the routing behavior (vs 'Collector')
- Nested Tracing:: namespace groups related types cleanly
- File names match primary class names (PerfettoSink.h)
- Macros keep MX_ prefix for collision safety

Usage example:
    namespace trace = mx::Tracing;
    auto sink = trace::PerfettoSink::create();
    trace::Dispatcher::getInstance().setSink(sink);
    MX_TRACE_SCOPE(trace::Category::Render, "MyEvent");
- Explain amalgamated source compilation is Google's official recommended approach per https://perfetto.dev/docs/instrumentation/tracing-sdk
- Document ws2_32 dependency (Windows Sockets 2 for Perfetto IPC)
Dispatcher:
- Takes ownership of sink via unique_ptr (setSink)
- Explicit shutdownSink() destroys sink and writes output
- Asserts on double-set to catch programming errors

PerfettoSink:
- Constructor takes output path, starts session immediately
- Destructor writes trace file (true RAII)
- Uses std::call_once for thread-safe global Perfetto init
- Removed pimpl - simpler since only setup code includes this header

Scope:
- Removed _enabled caching - Dispatcher checks internally
- One less bool on the stack per trace scope

Usage:
- Added TracingGuard scope guard in RenderUtil.cpp for exception safety
- Added const to immutable members (_outputPath, _category)

The design now follows proper RAII patterns with clear ownership.
Move the scope guard pattern into the Dispatcher class for reusability. Callers can now use mx::Tracing::Dispatcher::ShutdownGuard instead of defining their own local struct.
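The ownership model described in the last two commits (a dispatcher that owns its sink through `unique_ptr`, an explicit shutdown that destroys the sink, and a nested guard that calls it on scope exit) can be sketched roughly as follows. This is a simplified illustration under assumed signatures, not the PR's code; `RecordingSink` is a hypothetical test sink that stands in for `PerfettoSink`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Abstract destination for trace events (simplified for this sketch).
class Sink
{
  public:
    virtual ~Sink() = default;
    virtual void beginEvent(const std::string& name) = 0;
    virtual void endEvent() = 0;
};

// Routes events to the single installed sink, which it owns.
class Dispatcher
{
  public:
    static Dispatcher& getInstance()
    {
        static Dispatcher instance;
        return instance;
    }

    // Takes ownership; installing a second sink is a programming error.
    void setSink(std::unique_ptr<Sink> sink)
    {
        assert(!_sink && "a sink is already installed");
        _sink = std::move(sink);
    }

    // Destroying the sink is what flushes its output.
    void shutdownSink() { _sink.reset(); }

    bool isEnabled() const { return _sink != nullptr; }
    void beginEvent(const std::string& name) { if (_sink) _sink->beginEvent(name); }
    void endEvent() { if (_sink) _sink->endEvent(); }

    // Scope guard: shuts the sink down on scope exit, including when an
    // exception unwinds the stack.
    class ShutdownGuard
    {
      public:
        ShutdownGuard() = default;
        ~ShutdownGuard() { Dispatcher::getInstance().shutdownSink(); }
        ShutdownGuard(const ShutdownGuard&) = delete;
        ShutdownGuard& operator=(const ShutdownGuard&) = delete;
    };

  private:
    Dispatcher() = default;
    std::unique_ptr<Sink> _sink;
};

// Hypothetical sink that appends to a caller-owned log and records a
// "flushed" entry in its destructor, mimicking write-on-destroy.
class RecordingSink : public Sink
{
  public:
    explicit RecordingSink(std::vector<std::string>& log) : _log(log) {}
    ~RecordingSink() override { _log.push_back("flushed"); }
    void beginEvent(const std::string& name) override { _log.push_back("begin:" + name); }
    void endEvent() override { _log.push_back("end"); }

  private:
    std::vector<std::string>& _log;
};
```

A caller installs the sink, declares a `Dispatcher::ShutdownGuard` on the next line, and the sink's destructor then runs when the enclosing scope ends, whether normally or through an exception.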
- New 'enableTracing' boolean option (default: false)
- Parsed in TestSuiteOptions::readOptions()
- RenderUtil.cpp conditionally initializes PerfettoSink based on this option
- Uses std::optional<ShutdownGuard> for conditional scope guard
- Allows profiling to be enabled/disabled without rebuilding
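The `std::optional` trick for a conditional scope guard works because an empty optional constructs nothing and therefore destroys nothing. A small sketch, with `SketchShutdownGuard` and `runWithOptionalTracing` as hypothetical names that only count shutdowns rather than touching a real sink:

```cpp
#include <optional>

// Hypothetical stand-in for the real guard: counts how often shutdown runs.
struct SketchShutdownGuard
{
    explicit SketchShutdownGuard(int& shutdownCount) : _count(shutdownCount) {}
    ~SketchShutdownGuard() { ++_count; }
    SketchShutdownGuard(const SketchShutdownGuard&) = delete;
    SketchShutdownGuard& operator=(const SketchShutdownGuard&) = delete;

  private:
    int& _count;
};

// Runs a "test" and returns whether tracing was active during it.
bool runWithOptionalTracing(bool enableTracing, int& shutdownCount)
{
    // Empty optional: no guard exists, so nothing runs on scope exit.
    std::optional<SketchShutdownGuard> guard;
    if (enableTracing)
    {
        // emplace constructs the non-copyable guard in place.
        guard.emplace(shutdownCount);
    }
    return guard.has_value();
}
```

This keeps the guard's lifetime tied to the enclosing function scope even though its construction sits inside an `if` block.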
- Replace namespace Category with constexpr strings -> enum class Category
- Template-based Scope<Category> avoids storing category on stack
- PerfettoSink uses switch statement (compiler optimizes to jump table)
- Removed categoryMatches() string comparison overhead
- Type-safe: can only pass valid Category enum values

The enum-based design is simpler, more efficient, and equally compatible with a future USD TraceCollector sink (which uses integer category IDs, not strings).
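As a rough illustration of this design, the sketch below uses an `enum class` for categories, a `switch` to map them to labels, and a scope template whose category is a compile-time parameter rather than a data member. The label strings, `categoryName`, and `sketchLog` are assumptions made for this example, not names from the PR.

```cpp
#include <string>
#include <vector>

enum class Category { Render, ShaderGen, Optimize, Material };

// Maps each enum value to a label with a switch instead of string compares.
// The label strings are hypothetical.
constexpr const char* categoryName(Category category)
{
    switch (category)
    {
        case Category::Render:    return "mx.render";
        case Category::ShaderGen: return "mx.shadergen";
        case Category::Optimize:  return "mx.optimize";
        case Category::Material:  return "mx.material";
    }
    return "mx.unknown";
}

// Hypothetical event log standing in for a real sink.
inline std::vector<std::string>& sketchLog()
{
    static std::vector<std::string> log;
    return log;
}

// The category is a template parameter, so it is fixed at compile time and
// the scope object has no category data member.
template <Category C>
class Scope
{
  public:
    explicit Scope(const char* name)
    {
        sketchLog().push_back(std::string("begin ") + categoryName(C) + ":" + name);
    }
    ~Scope()
    {
        sketchLog().push_back(std::string("end ") + categoryName(C));
    }
    Scope(const Scope&) = delete;
    Scope& operator=(const Scope&) = delete;
};
```

Because `Category` is a scoped enum, passing an arbitrary string or integer where a category is expected fails to compile, which is the type-safety point made in the commit message.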
|
@ppenenko I have a few additional questions, as I'd like to better understand the advantages of the approach that you're proposing. Historically, we've used sampling profilers for performance analysis of the MaterialX project, and here's an example profile of the MaterialX Viewer using the built-in sampling profiler in Visual Studio:
Can you give a sense of the advantages provided by the instrumented profiling in your PR, in order to better understand the benefits that might be associated with this additional infrastructure in MaterialX? And when profiling the results of shader generation in hardware shading languages (e.g. GLSL, MSL, Slang), I would think of GPU profilers such as RenderDoc as the natural approach for recording and comparing performance data. Is your proposed approach in this PR intended as a replacement for this form of traditional GPU profiling, or is it purely focused on CPU profiling? If it supports GPU profiling, what are its benefits over RenderDoc? |
|
The current Catch2-based benchmark test in MaterialX is very simple, and a more robust tracing system that integrates well with the libraries and applications that use MaterialX will be valuable. For example, when MaterialX is integrated into USD, we can enable both USDTrace and MaterialX tracing to get captures with a more detailed view. @jstone-lucasfilm would this PR be more acceptable if we created a new MaterialXTrace module that brings in Perfetto? It would be similar to USD tracing, but simpler. Here is some additional info about USD Trace: https://github.com/PixarAnimationStudios/OpenUSD/tree/dev/pxr/base/trace |
|
@ashwinbhat For adding an instrumented trace concept to MaterialX, I agree a I'd still like to learn more, though, about the advantages offered by instrumented profilers such as Perfetto over the existing sampled profilers we've used previously in MaterialX performance evaluation, which don't require additional code to be inserted into applications at all. So far, I haven't encountered cases where the precision of sampled profilers was insufficient to fully understand the impact of a proposed optimization or codebase change, and I'd love to learn more about the cases that other teams have run into where sampling profilers were not up to the task. Also, I'd like to better understand the relationship of Perfetto to GPU profiling, which is just as important (or perhaps more important) in evaluating the impact of a ShaderGen change on the real-time rendering performance of MaterialX content. |
|
@jstone-lucasfilm I agree that sampling profilers are of course useful tools in their own right, but instrumentation adds a different useful perspective. Some advantages:
|
|
Some benefits of Perfetto specifically:
|
|
Good point about CPU vs. GPU traces @jstone-lucasfilm . AFAIK Perfetto doesn't offer facilities for GPU profiling out of the box, but the application itself can perform GPU perf queries and report the results via Perfetto. In fact, this is what I'm working on in another branch. At the same time, CPU and GPU events can be analyzed in the same UI in Perfetto, which is convenient. Besides, MaterialX has no CPU-side tracing system anyway, so integrating Perfetto will help many more CPU performance experiments in the future. Of course, Renderdoc would offer a much more detailed picture of a GPU frame, but it's designed for profiling at a lower level. Its captures would contain GPU resources, which would have a much higher overhead (capture file size) than the per-material measurements I'm adding with Perfetto. So, I would suggest identifying the materials that are slow to render or benefit from a certain optimization by analyzing the existing render test across 180+ materials with Perfetto (I can already confirm that that is working well) and then, if necessary, profiling specific materials in Renderdoc, Nsight etc. By the way, if we wanted to see material and shader names in Renderdoc captures, we'd have to instrument our code either way. So the instrumentation macros I'm proposing could have a Renderdoc implementation in the future. |
|
Great idea about |
|
@ppenenko @ashwinbhat Pivoting for a moment, let's focus on the "steelman" arguments for why a Would That would be a really compelling argument, if true, and it's something that a sampling profiler couldn't hope to provide, as the distribution of time samples is profoundly dependent on the performance of the virtual machines on which our GitHub CI workflows are run. |
|
@jstone-lucasfilm Perfetto tracing is completely orthogonal to how a VM's configuration and load affect timings - it just records wall-clock times, so unfortunately it wouldn't improve on that. When testing on the same machine and avoiding heavy parallel loads (e.g. running Maya in parallel), the timings should be reliable. E.g. I can measure the expected speedups of @ld-kerley 's #2499 reliably. But of course, we could record traces in CI/CD. At the very least, these would give us a kind of a rich log, with meaningful events stacked and grouped by thread. We can be sure, for example, that this or that shader graph optimization pass has been applied to the given material. And optimization passes #2499 should show similar relative improvements across different configurations. A related use case would be if a user experiences performance issues with MaterialX - they can easily record a capture and share it with the maintainers. And, as I mentioned, it could one day be a USD trace including MaterialX events. |
|
It does sound interesting to integrate MaterialXTrace into our GitHub CI in the future, even if it doesn't produce perfectly consistent results between runs. It seems slightly surprising that you would want to use Perfetto to measure GPU timing improvements from shader generation optimizations, e.g. the hardware shader optimizations found in #2499. Am I wrong to think that this category of optimization would normally be evaluated with either RenderDoc or a real-time, smoothed frame timer like the one we use in MaterialXView? |
|
@jstone-lucasfilm could we please separate the concerns in this code review a bit? First of all, can we all agree that a tracing mechanism would be an essential addition to MaterialX? I stated the motivations above, and the most immediate and straightforward one is to measure how long a particular operation took for a particular MaterialX entity - material, node, shader - which is not identified uniquely by a C++ symbol, and therefore can't be captured by a sampling profiler. Let's set GPU timings aside for a moment, because there are a few types of CPU events that we care about in the context of shader graph optimizations: shader codegen timings and shader compilation timings. So even if we disagree on GPU profiling, I would argue that such a mechanism is a must-have. Chrome and USD corroborate this argument. If we do agree on the above, let's discuss if Perfetto is the right implementation choice. I would argue that it is, due to its exceptional maturity and widespread adoption. As for GPU instrumentation - my main argument is that if we have a tracing mechanism anyway, for all the other reasons, recording GPU timings there is simple and convenient. Taking the frame duration with OpenGL queries is about a dozen lines of C++, and then the result can be recorded via Perfetto. As for Renderdoc - it's not well-suited for recording the durations of hundreds of frames because:
Now, regarding MaterialXView.
|
Perfetto SDK generates warnings that fail builds when MATERIALX_WARNINGS_AS_ERRORS is ON. Add -Wno-error for perfetto.cc on GCC/Clang.
- Suppress warnings-as-errors for Perfetto SDK on Unix (GCC/Clang generate warnings that fail builds with MATERIALX_WARNINGS_AS_ERRORS)
- Skip pthread linking on iOS (threads are built-in to iOS SDK)
COMPILE_OPTIONS is a target property. For set_source_files_properties, use COMPILE_FLAGS instead. This fixes /bigobj on Windows and -Wno-error on Unix.
- Define PERFETTO_COMPONENT_EXPORT for shared libs (required by Perfetto)
- Add -fvisibility=default for perfetto.cc on Unix (fixes GCC 10 monolithic build)
- Add createPerfettoSink() factory to Tracing.h (exported)
- PerfettoSink class in PerfettoSink.h not exported (internal header)
- Consumers include only Tracing.h and use factory function
- Perfetto types never cross DLL boundaries - only abstract Sink does
- Remove unnecessary PERFETTO_COMPONENT_EXPORT from CMakeLists.txt
- Define MATERIALX_PERFETTO_COMPILE_DEFINITIONS and MATERIALX_PERFETTO_COMPILE_FLAGS as CACHE INTERNAL variables in root CMakeLists.txt
- Apply flags in both root (for monolithic builds) and MaterialXTrace (for non-monolithic)
- set_source_files_properties is directory-scoped, so must be called in both places
- Fixes -Werror=shadow on GCC (kDevNull shadowing)
- Fixes min/max macro conflicts on Windows (NOMINMAX)
set_source_files_properties is directory-scoped, so:
- Root CMakeLists.txt: defines MATERIALX_PERFETTO_* variables
- source/CMakeLists.txt: applies flags for monolithic builds
- MaterialXTrace/CMakeLists.txt: applies flags for non-monolithic builds
perfetto.cc is ~160k lines and generates gcov output that gcovr cannot parse. Exclude it from coverage since it's third-party code anyway.
…C warnings

- Add --exclude .*perfetto.* to gcovr (coverage)
  - perfetto.cc is ~160k lines and generates gcov output that gcovr cannot parse
- Add --suppress=*:*perfetto* to cppcheck (static analysis)
  - cppcheck doesn't define OS macros needed by Perfetto SDK's platform detection
- Use /W0 to disable all warnings for Perfetto SDK on MSVC (third-party code triggers C4146, C4369, C4996, C4459, C4065, C4244, C4267, C4293)
e20b517 to
9db6ef1
/W0 alone doesn't override project-level /WX. Need /WX- explicitly to prevent STL header warnings (from tuple, etc.) being promoted to errors when compiling perfetto.cc.
Use recursive glob build/**/*.perfetto-trace since traces are written to current working directory (build/) not build/bin/
- Set up Perfetto sink in ShaderGeneratorTester::validate() when enableTracing=true
- Add MX_TRACE_SCOPE around generateCode() calls with element names
- Generates <target>_gen_trace.perfetto-trace files (e.g. genglsl_gen_trace.perfetto-trace)
- These tests run on all CI platforms, providing trace artifacts for download
|
Refactoring done - |
|
FYI @jstone-lucasfilm https://github.com/autodesk-forks/MaterialX/actions/runs/21080475833/ is green except for JavaScript which always fails on forks, and the traces are downloadable as artifacts. |
When outputDirectory is specified and shaderInterfaces=REDUCED, the render tests were failing to create the parent material directory before attempting to create the /reduced subdirectory, causing shader dumps to fail.

The fix ensures createDirectory() is called on the parent outputPath first, then on outputPath/reduced if in REDUCED mode. Both calls are now within the same ScopedTimer scope for consistent profiling.

Fixed in all four render test implementations:
- RenderGlsl.cpp
- RenderOsl.cpp
- RenderMsl.mm
- RenderSlang.cpp
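The ordering issue can be illustrated with `std::filesystem::create_directory`, which, like a plain mkdir, creates only one level and needs the parent to exist already. This is a sketch of the parent-then-child pattern only; it does not use the MaterialX `createDirectory()` call from the fix, and `prepareDumpDirectory` is a hypothetical helper.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Creates outputPath first, then outputPath/reduced when requested, and
// returns the directory that shader dumps should be written to.
fs::path prepareDumpDirectory(const fs::path& outputPath, bool reducedInterfaces)
{
    // create_directory is non-recursive: the material directory must exist
    // before its "reduced" subdirectory can be created.
    fs::create_directory(outputPath);
    if (!reducedInterfaces)
    {
        return outputPath;
    }
    const fs::path reducedPath = outputPath / "reduced";
    fs::create_directory(reducedPath);
    return reducedPath;
}
```

Calling `fs::create_directory` on the `reduced` path alone, without the first call, would throw a `filesystem_error` when the material directory is missing, which mirrors the failure described above.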
jstone-lucasfilm
left a comment
Following up on our last MaterialX TSC meeting, it's clear that we should move forward with this new work, so I've started writing up more detailed recommendations for improving it.
Address review: remove `using Cat` alias and spell out the full mx::Tracing::Category names for clarity while the Tracing library is still new.
Address review: use a Perfetto-specific flag name to leave a clear path for future extensions to Tracy and other profiling backends.
Address review: instead of enabling tracing in all extended builds, add a matrix.extended_build_perfetto flag and set it on one non-Windows build to produce a single clear tracing artifact.
Only attempt to upload trace artifacts from the build that actually has Perfetto enabled, avoiding no-op upload steps in other builds.
jstone-lucasfilm
left a comment
This looks great to me, @ppenenko, and thanks for the contribution to MaterialX! Once we start seeing the new tracing artifacts in nightly builds, we can augment this with additional improvements.
0a1b924
into
AcademySoftwareFoundation:main

Summary
This PR adds a new MaterialXTrace module with optional Perfetto tracing infrastructure, plus a configurable test output directory, laying the groundwork for performance analysis and optimization work.

Background and motivation
MaterialX doesn't currently offer a tracing/instrumentation infrastructure sufficient for the following goals:
In comparison, OpenUSD supports tracing in the Chrome Tracing format, the original format used by Chrome DevTools, viewable via chrome://tracing. It's JSON-based, and OpenUSD implements serialization without external dependencies.

Perfetto is the successor to Chrome Tracing and relies on a more efficient binary Protobuf-based serialization format. https://ui.perfetto.dev/ is capable of opening both Perfetto captures and legacy JSON Chrome Tracing captures. It has better performance than chrome://tracing, can open larger captures and supports more advanced analysis workflows, e.g. SQL queries.

I've chosen Perfetto for the implementation in this PR mostly because of the format's superior performance. For context, using new scene index workflows with Hydra Storm, it's possible to generate multi-gigabyte Chrome Tracing captures that are too big to load in either profiler UI. The implementation relies on the Perfetto SDK.
At the same time, the design remains open for a potential integration of MaterialX tracing with USD tracing, by abstracting out the tracing implementation behind an interface. Such an integration would make it possible to analyze MaterialX shader gen performance in Hydra Storm holistically, in the same profiler UI session or script.
The Lobe Pruning optimization effort (#2680) is the immediate application for this infrastructure.
Features
1. Perfetto Tracing (MATERIALX_BUILD_TRACING)
- MaterialXTrace module – keeps tracing infrastructure separate from MaterialXCore
- Off by default (MATERIALX_BUILD_TRACING=OFF)
- Abstract sink interface (Tracing::Sink) – designed for future USD TraceCollector integration
- Enum-based categories (Tracing::Category) for type safety and efficient dispatch:
  - Render – rendering operations
  - ShaderGen – shader generation
  - Optimize – optimization passes
  - Material – material identity markers
- Template-based scope (Tracing::Scope<Category>) – zero stack overhead for category storage
- Macros (MX_TRACE_FUNCTION, MX_TRACE_SCOPE) and Dispatcher::ShutdownGuard
- Instrumentation in shader generation (GenShaderUtil.cpp) and render tests (RenderGlsl.cpp) demonstrating per-material trace markers
- Perfetto SDK via FetchContent (v43.0)
- CI uploads .perfetto-trace artifacts

2. Test Output Directory (outputDirectory in _options.mtlx)

3. Runtime Tracing Toggle (enableTracing in _options.mtlx)
- Requires MATERIALX_BUILD_TRACING=ON at build time
- Defaults to false to avoid overhead when not profiling

Usage
Enable Tracing (Build Time)
Enable Tracing (Runtime)
In resources/Materials/TestSuite/_options.mtlx:

Configure Output Directory
Analyze Traces
Open .perfetto-trace files at https://ui.perfetto.dev

For CI builds, download trace artifacts from the GitHub Actions run page (extended builds only).
Related