Mitigate catastrophic cancellation in cross products and other code#435
Draft
Conversation
Have to do some precision improvements, so a baseline is needed. The debug perf is beyond awful, actually.
And the Vector3 version is 5% slower in Release, on GCC at least. FFS, what was I thinking with the gather() things. Nice in user code, extremely bad in library code.
While this makes 32-bit float cross product precision basically equivalent to a 64-bit calculation cast back to 32-bit, its speed stays halfway between the straightforward 32- and 64-bit implementations. However, that holds only on platforms that actually have an FMA instruction. On Emscripten, for example, the code is TEN TIMES slower than the baseline implementation, which is not an acceptable tradeoff -- there, simply using doubles to calculate the result is faster. And enabling the more precise variant only on some platforms doesn't seem like a good idea for portability. Stashing this aside until I'm clearer on what to do with this.
Original article: https://pharr.org/matt/blog/2019/11/03/difference-of-floats.html
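The trick from the article can be sketched roughly like this (a minimal standalone sketch -- `differenceOfProducts`, `Vector3` and `cross` here are illustrative names, not the actual library code):

```cpp
#include <cmath>

// a*b - c*d evaluated with two FMAs. The rounding error of the c*d
// product is recovered exactly via FMA and added back at the end, so
// the result avoids catastrophic cancellation when a*b and c*d are
// nearly equal.
inline float differenceOfProducts(float a, float b, float c, float d) {
    float cd = c*d;
    float error = std::fma(-c, d, cd); /* exact rounding error of c*d */
    float dop = std::fma(a, b, -cd);
    return dop + error;
}

/* Illustrative minimal vector type, not the real library type */
struct Vector3 { float x, y, z; };

// Every cross product component is a difference of products, so the
// trick applies to each of them directly
inline Vector3 cross(const Vector3& a, const Vector3& b) {
    return {differenceOfProducts(a.y, b.z, a.z, b.y),
            differenceOfProducts(a.z, b.x, a.x, b.z),
            differenceOfProducts(a.x, b.y, a.y, b.x)};
}
```

For instance, `differenceOfProducts(1.0f + 0x1p-22f, 1.0f - 0x1p-22f, 1.0f, 1.0f)` produces the exact answer `-0x1p-44f`, while the naive `a*b - c*d` in 32-bit floats rounds `a*b` to `1.0f` and yields zero. The catch, as noted below, is that `std::fma()` is only fast where the hardware has an actual FMA instruction.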
While this makes 32-bit float cross product precision basically equivalent to a 64-bit calculation cast back to 32-bit, its speed stays halfway between the straightforward 32- and 64-bit implementations. Benchmark on Release:
However, this holds only on platforms that actually have an FMA instruction. On Emscripten, for example, the code is ten times slower than the baseline implementation, which is not an acceptable tradeoff -- there, simply using doubles to calculate the result is faster. And enabling the more precise variant only on some platforms doesn't seem like a good idea for portability. For the record, benchmark output on Chrome (node.js in the terminal gives similar results):
Stashing this aside until I'm clearer on what to do with this. Things to keep an eye on:
lerp() as described at https://fgiesen.wordpress.com/2012/08/15/linear-interpolation-past-present-and-future/ , probably with very similar perf characteristics (okay on desktop, terrible on the web)
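The FMA formulation from that post, for reference (a sketch, not the library's actual implementation; `lerpFma` is an illustrative name):

```cpp
#include <cmath>

// lerp via two FMAs: t*b + (a - t*a). Unlike the common a + t*(b - a)
// form, this is exact at both endpoints (t == 0 gives exactly a,
// t == 1 gives exactly b) -- but it shares the same platform caveat
// as the cross product change: it's only a win where FMA is a real
// instruction.
inline float lerpFma(float a, float b, float t) {
    return std::fma(t, b, std::fma(-t, a, a));
}
```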