Mitigate catastrophic cancellation in cross products and other code#435
Draft
Conversation
Have to do some precision improvements, so a baseline is needed. The debug perf is beyond awful, actually.
And the Vector3 version is 5% slower in Release, on GCC at least. FFS, what was I thinking with the gather() things. Nice in user code, extremely bad in library code.
While this makes 32-bit float cross product precision basically equivalent to a 64-bit calculation cast back to 32-bit, its speed stays halfway between the straightforward 32- and 64-bit implementations. However, that holds only on platforms that actually have an FMA instruction. On Emscripten, for example, the code is TEN TIMES slower than the baseline implementation, which is not an acceptable tradeoff -- there, simply using doubles to calculate the result is faster. And enabling the more precise variant only on some platforms doesn't seem like a good idea for portability. Stashing this aside until I'm clearer on what to do with this.
Original article: https://pharr.org/matt/blog/2019/11/03/difference-of-floats.html
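The trick from the article can be sketched roughly like this (a minimal standalone sketch -- `differenceOfProducts`, `Vector3` and `cross` here are illustrative names, not the actual library code):

```cpp
#include <cmath>

// a*b - c*d evaluated with two FMAs. The rounding error of the c*d
// product is recovered exactly via FMA and added back at the end, so
// the result avoids catastrophic cancellation when a*b and c*d are
// nearly equal.
inline float differenceOfProducts(float a, float b, float c, float d) {
    float cd = c*d;
    float error = std::fma(-c, d, cd); /* exact rounding error of c*d */
    float dop = std::fma(a, b, -cd);
    return dop + error;
}

/* Illustrative minimal vector type, not the real library type */
struct Vector3 { float x, y, z; };

// Every cross product component is a difference of products, so the
// trick applies to each of them directly
inline Vector3 cross(const Vector3& a, const Vector3& b) {
    return {differenceOfProducts(a.y, b.z, a.z, b.y),
            differenceOfProducts(a.z, b.x, a.x, b.z),
            differenceOfProducts(a.x, b.y, a.y, b.x)};
}
```

For instance, `differenceOfProducts(1.0f + 0x1p-22f, 1.0f - 0x1p-22f, 1.0f, 1.0f)` produces the exact answer `-0x1p-44f`, while the naive `a*b - c*d` in 32-bit floats rounds `a*b` to `1.0f` and yields zero. The catch, as noted below, is that `std::fma()` is only fast where the hardware has an actual FMA instruction.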
While this makes 32-bit float cross product precision basically equivalent to a 64-bit calculation cast back to 32-bit, its speed stays halfway between the straightforward 32- and 64-bit implementations. Benchmark on Release:
However, this holds only on platforms that actually have an FMA instruction. On Emscripten, for example, the code is ten times slower than the baseline implementation, which is not an acceptable tradeoff -- there, simply using doubles to calculate the result is faster. And enabling the more precise variant only on some platforms doesn't seem like a good idea for portability. For the record, benchmark output on Chrome (node.js in the terminal gives similar results):
Stashing this aside until I'm clearer on what to do with this. Things to keep an eye on:
lerp() as described at https://fgiesen.wordpress.com/2012/08/15/linear-interpolation-past-present-and-future/ , probably with very similar perf characteristics (okay on desktop, terrible on the web)
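The FMA formulation from that post, for reference (a sketch, not the library's actual implementation; `lerpFma` is an illustrative name):

```cpp
#include <cmath>

// lerp via two FMAs: t*b + (a - t*a). Unlike the common a + t*(b - a)
// form, this is exact at both endpoints (t == 0 gives exactly a,
// t == 1 gives exactly b) -- but it shares the same platform caveat
// as the cross product change: it's only a win where FMA is a real
// instruction.
inline float lerpFma(float a, float b, float t) {
    return std::fma(t, b, std::fma(-t, a, a));
}
```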