samhaswon/simd_blend_modes

SIMD Blend Modes

This project reimplements the blend modes from blend_modes with C kernels and SIMD (SSE4.2/AVX2) acceleration. It supports uint8 and float32 NumPy inputs in the range 0..255 and returns output dtype/channel count matching the background image. Missing alpha channels are treated as fully opaque (255). Opacity defaults to 1.0.

This is intended as a mostly drop-in replacement, but with a more permissive API that lets you go faster if you don't need FP32 arrays or an alpha channel for some layers.

Build and Install

General

pip install simd-blend-modes

Development

pip install -r requirements-dev.txt
pip install -e .

Usage

import numpy as np
import simd_blend_modes as sbm

background = np.zeros((512, 512, 4), dtype=np.uint8)
foreground = np.zeros((512, 512, 4), dtype=np.uint8)

out = sbm.screen(background, foreground, 0.5)

Inputs:

  • Dtypes: np.uint8 or np.float32 only.
  • Value range: 0..255 for both dtypes.
    • float32 inputs are expected to hold uint8 values cast to float32 (still 0..255), not values normalized to 0..1.
  • Shapes: H x W x C with C = 3 (RGB) or 4 (RGBA).
  • Output: dtype and channel count match the background image.
  • Alpha: if a source is RGB (3 channels), alpha is treated as 255 (fully opaque).
  • Opacity: the third argument is optional; defaults to 1.0.
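The input rules above can be checked before calling a blend function. The following is a minimal sketch using only NumPy; `to_float32_0_255` and `check_blend_input` are hypothetical helper names, not part of simd_blend_modes, and the library's own validation may differ.

```python
import numpy as np


def to_float32_0_255(img_u8: np.ndarray) -> np.ndarray:
    """Cast uint8 pixels to float32, keeping the 0..255 range (no division by 255)."""
    if img_u8.dtype != np.uint8:
        raise TypeError("expected a uint8 array")
    return img_u8.astype(np.float32)


def check_blend_input(img: np.ndarray) -> None:
    """Raise if `img` breaks the documented dtype, shape, or range rules."""
    if img.dtype not in (np.uint8, np.float32):
        raise TypeError(f"unsupported dtype: {img.dtype}")
    if img.ndim != 3 or img.shape[2] not in (3, 4):
        raise ValueError(f"expected H x W x 3 or H x W x 4, got {img.shape}")
    # uint8 is always inside 0..255, so only float32 needs a range check.
    if img.dtype == np.float32 and img.size:
        lo, hi = float(img.min()), float(img.max())
        if lo < 0.0 or hi > 255.0:
            raise ValueError("float32 values must lie in 0..255")


rgb_u8 = np.full((2, 2, 3), 128, dtype=np.uint8)
rgb_f32 = to_float32_0_255(rgb_u8)  # values stay at 128.0, not 0.5
check_blend_input(rgb_u8)
check_blend_input(rgb_f32)
```

Note that a range check like this cannot detect an image that was normalized to 0..1, since those values also fall inside 0..255.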

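Supported blend modes (the modes covered by the benchmark tables below):

  • normal
  • soft_light
  • lighten_only
  • screen
  • dodge
  • addition
  • darken_only
  • multiply
  • hard_light
  • difference
  • subtract
  • grain_extract
  • grain_merge
  • divide
  • overlay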

You can force a kernel by passing a string (or KernelKind value):

out = sbm.screen(background, foreground, 0.5, "avx2")
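One way to picture how a forced kernel interacts with the runtime order listed in the Notes (AVX2 → SSE4.2 → scalar) is the pure-Python sketch below. `resolve_kernel` and the `available` set are hypothetical; the real dispatch happens inside the C extension, and how it reacts to a forced kernel the CPU lacks is an assumption here (the sketch raises).

```python
from typing import Optional, Set

# Preference order taken from the Notes: AVX2, then SSE4.2, then scalar.
KERNEL_ORDER = ("avx2", "sse42", "scalar")


def resolve_kernel(available: Set[str], forced: Optional[str] = None) -> str:
    """Pick a kernel name from `available`, honoring `forced` if given."""
    if forced is not None:
        if forced not in KERNEL_ORDER:
            raise ValueError(f"unknown kernel: {forced!r}")
        if forced not in available:
            # Assumption: an unavailable forced kernel is treated as an error.
            raise ValueError(f"kernel not available on this CPU: {forced!r}")
        return forced
    # No kernel forced: take the first supported entry in preference order.
    for name in KERNEL_ORDER:
        if name in available:
            return name
    raise ValueError("no kernel available")


print(resolve_kernel({"scalar", "sse42", "avx2"}))            # avx2
print(resolve_kernel({"scalar", "sse42"}))                    # sse42
print(resolve_kernel({"scalar", "sse42", "avx2"}, "scalar"))  # scalar
```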

Tests

Correctness and performance:

python3 -m unittest discover tests/

Performance:

python3 -m unittest tests.test_performance

The performance test prints a markdown table of per-kernel speedups vs the NumPy reference for common square sizes and screen resolutions.
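The Speedup and Percent Change columns of such a table can be reproduced from the two timings. The sketch below assumes speedup is the reference time divided by the kernel time and percent change is the relative change in runtime; this matches the published figures, but the helper names are illustrative and not taken from the test suite.

```python
def speedup(ref_s: float, kernel_s: float) -> float:
    """How many times faster the kernel is than the reference."""
    return ref_s / kernel_s


def percent_change(ref_s: float, kernel_s: float) -> float:
    """Relative change in runtime vs the reference, in percent (negative = faster)."""
    return (kernel_s - ref_s) / ref_s * 100.0


# Timings of the "normal" / scalar row in the Performance table.
ref_s, kernel_s = 0.152080, 0.032742
print(f"{speedup(ref_s, kernel_s):.2f}x {percent_change(ref_s, kernel_s):.2f}%")
# prints: 4.64x -78.47%
```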

ARM

ARM isn't properly supported: I don't have a new enough ARM CPU to test on, and I don't wish to use a cloud VM for it. So, if you want ARM support, open a PR. It should build and still be faster than the NumPy reference, but there's no SIMD support there (yet).

ARM builds run in scalar-only mode (x86 SIMD is compile-time gated). To test ARM under Docker, enable emulation and then build with the ARM platform.

If you don't already have buildx/binfmt configured, run:

docker run --privileged --rm tonistiigi/binfmt --install arm64

Then build or run the ARM container:

docker compose up --build

This is incredibly slow. I wouldn't actually do this, but it's here.

Notes

  • SIMD kernels are selected at runtime: AVX2 → SSE4.2 → scalar.
  • ARM builds are supported in scalar-only mode; x86 SIMD is compile-time gated. CI does not emit ARM artifacts.
  • Reference tests adapted from the original project live in tests/reference_blend_modes_tests.py and are skipped unless the blend_modes package and test assets are available.
  • The SIMD paths currently assume contiguous arrays (the input validation enforces this).
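Because non-contiguous inputs are rejected, sliced or strided views need to be packed before blending. A minimal NumPy sketch, assuming "contiguous" means C-contiguous (row-major), which is not stated explicitly above:

```python
import numpy as np

image = np.zeros((8, 8, 4), dtype=np.uint8)

# Slicing with a step (or cropping columns) yields a non-contiguous view.
view = image[:, ::2, :]
print(view.flags["C_CONTIGUOUS"])    # False

# np.ascontiguousarray copies only when needed and returns a C-ordered array.
packed = np.ascontiguousarray(view)
print(packed.flags["C_CONTIGUOUS"])  # True
print(packed.shape)                  # (8, 4, 4)
```

Calling `np.ascontiguousarray` on an array that is already contiguous returns it without copying, so it is cheap to apply unconditionally.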

Performance

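Columns: blend mode, kernel, NumPy reference time (s), kernel time (s), speedup, and percent change in runtime.

Mode Kernel Ref (s) Kernel (s) Speedup Percent Change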
normal scalar 0.152080 0.032742 4.64x -78.47%
normal sse42 0.152080 0.010798 14.08x -92.90%
normal avx2 0.152080 0.010636 14.30x -93.01%
soft_light scalar 0.209721 0.038664 5.42x -81.56%
soft_light sse42 0.209721 0.013059 16.06x -93.77%
soft_light avx2 0.209721 0.011835 17.72x -94.36%
lighten_only scalar 0.153868 0.041726 3.69x -72.88%
lighten_only sse42 0.153868 0.011720 13.13x -92.38%
lighten_only avx2 0.153868 0.011296 13.62x -92.66%
screen scalar 0.162643 0.036807 4.42x -77.37%
screen sse42 0.162643 0.012259 13.27x -92.46%
screen avx2 0.162643 0.011528 14.11x -92.91%
dodge scalar 0.163841 0.039055 4.20x -76.16%
dodge sse42 0.163841 0.013628 12.02x -91.68%
dodge avx2 0.163841 0.011869 13.80x -92.76%
addition scalar 0.157343 0.059510 2.64x -62.18%
addition sse42 0.157343 0.012699 12.39x -91.93%
addition avx2 0.157343 0.011721 13.42x -92.55%
darken_only scalar 0.153869 0.041986 3.66x -72.71%
darken_only sse42 0.153869 0.011764 13.08x -92.35%
darken_only avx2 0.153869 0.011305 13.61x -92.65%
multiply scalar 0.157435 0.036593 4.30x -76.76%
multiply sse42 0.157435 0.011845 13.29x -92.48%
multiply avx2 0.157435 0.011343 13.88x -92.80%
hard_light scalar 0.231979 0.073631 3.15x -68.26%
hard_light sse42 0.231979 0.013737 16.89x -94.08%
hard_light avx2 0.231979 0.011871 19.54x -94.88%
difference scalar 0.213577 0.036500 5.85x -82.91%
difference sse42 0.213577 0.011911 17.93x -94.42%
difference avx2 0.213577 0.011371 18.78x -94.68%
subtract scalar 0.156726 0.037817 4.14x -75.87%
subtract sse42 0.156726 0.013245 11.83x -91.55%
subtract avx2 0.156726 0.011774 13.31x -92.49%
grain_extract scalar 0.161499 0.048936 3.30x -69.70%
grain_extract sse42 0.161499 0.012698 12.72x -92.14%
grain_extract avx2 0.161499 0.011656 13.86x -92.78%
grain_merge scalar 0.161065 0.048878 3.30x -69.65%
grain_merge sse42 0.161065 0.012660 12.72x -92.14%
grain_merge avx2 0.161065 0.011710 13.75x -92.73%
divide scalar 0.164504 0.037938 4.34x -76.94%
divide sse42 0.164504 0.013081 12.58x -92.05%
divide avx2 0.164504 0.011762 13.99x -92.85%
overlay scalar 0.215788 0.070267 3.07x -67.44%
overlay sse42 0.215788 0.013140 16.42x -93.91%
overlay avx2 0.215788 0.011764 18.34x -94.55%

Per-kernel, size, and type results

Case Input Channels Opacity Mode Kernel Ref (s) Kernel (s) Speedup Percent Change
256x256 uint8 3 0.50 normal scalar 0.006370 0.001590 4.01x -75.03%
256x256 uint8 3 0.50 normal sse42 0.006370 0.000699 9.11x -89.02%
256x256 uint8 3 0.50 normal avx2 0.006370 0.000705 9.03x -88.93%
256x256 uint8 3 0.50 soft_light scalar 0.008471 0.001787 4.74x -78.90%
256x256 uint8 3 0.50 soft_light sse42 0.008471 0.000875 9.69x -89.68%
256x256 uint8 3 0.50 soft_light avx2 0.008471 0.000801 10.58x -90.54%
256x256 uint8 3 0.50 lighten_only scalar 0.007139 0.001952 3.66x -72.66%
256x256 uint8 3 0.50 lighten_only sse42 0.007139 0.000798 8.94x -88.82%
256x256 uint8 3 0.50 lighten_only avx2 0.007139 0.000770 9.27x -89.21%
256x256 uint8 3 0.50 screen scalar 0.007168 0.001715 4.18x -76.07%
256x256 uint8 3 0.50 screen sse42 0.007168 0.000794 9.03x -88.92%
256x256 uint8 3 0.50 screen avx2 0.007168 0.000765 9.37x -89.33%
256x256 uint8 3 0.50 dodge scalar 0.007354 0.001816 4.05x -75.30%
256x256 uint8 3 0.50 dodge sse42 0.007354 0.000878 8.37x -88.06%
256x256 uint8 3 0.50 dodge avx2 0.007354 0.000796 9.24x -89.18%
256x256 uint8 3 0.50 addition scalar 0.007341 0.002490 2.95x -66.09%
256x256 uint8 3 0.50 addition sse42 0.007341 0.000791 9.28x -89.22%
256x256 uint8 3 0.50 addition avx2 0.007341 0.000747 9.82x -89.82%
256x256 uint8 3 0.50 darken_only scalar 0.007225 0.001920 3.76x -73.43%
256x256 uint8 3 0.50 darken_only sse42 0.007225 0.000796 9.08x -88.98%
256x256 uint8 3 0.50 darken_only avx2 0.007225 0.000775 9.32x -89.28%
256x256 uint8 3 0.50 multiply scalar 0.006988 0.001737 4.02x -75.15%
256x256 uint8 3 0.50 multiply sse42 0.006988 0.000808 8.65x -88.44%
256x256 uint8 3 0.50 multiply avx2 0.006988 0.000782 8.94x -88.81%
256x256 uint8 3 0.50 hard_light scalar 0.008761 0.002959 2.96x -66.23%
256x256 uint8 3 0.50 hard_light sse42 0.008761 0.000934 9.38x -89.34%
256x256 uint8 3 0.50 hard_light avx2 0.008761 0.000796 11.01x -90.92%
256x256 uint8 3 0.50 difference scalar 0.008801 0.001744 5.05x -80.18%
256x256 uint8 3 0.50 difference sse42 0.008801 0.000801 10.99x -90.90%
256x256 uint8 3 0.50 difference avx2 0.008801 0.000774 11.37x -91.20%
256x256 uint8 3 0.50 subtract scalar 0.007266 0.001603 4.53x -77.94%
256x256 uint8 3 0.50 subtract sse42 0.007266 0.000875 8.30x -87.95%
256x256 uint8 3 0.50 subtract avx2 0.007266 0.000815 8.92x -88.78%
256x256 uint8 3 0.50 grain_extract scalar 0.007047 0.002080 3.39x -70.48%
256x256 uint8 3 0.50 grain_extract sse42 0.007047 0.000840 8.39x -88.07%
256x256 uint8 3 0.50 grain_extract avx2 0.007047 0.000781 9.02x -88.91%
256x256 uint8 3 0.50 grain_merge scalar 0.007489 0.002105 3.56x -71.89%
256x256 uint8 3 0.50 grain_merge sse42 0.007489 0.000869 8.62x -88.40%
256x256 uint8 3 0.50 grain_merge avx2 0.007489 0.000859 8.72x -88.53%
256x256 uint8 3 0.50 divide scalar 0.007437 0.001764 4.22x -76.29%
256x256 uint8 3 0.50 divide sse42 0.007437 0.000866 8.58x -88.35%
256x256 uint8 3 0.50 divide avx2 0.007437 0.000810 9.19x -89.11%
256x256 uint8 3 0.50 overlay scalar 0.008993 0.002876 3.13x -68.02%
256x256 uint8 3 0.50 overlay sse42 0.008993 0.000865 10.40x -90.38%
256x256 uint8 3 0.50 overlay avx2 0.008993 0.000806 11.16x -91.04%
256x256 uint8 4 0.50 normal scalar 0.003095 0.001316 2.35x -57.46%
256x256 uint8 4 0.50 normal sse42 0.003095 0.000178 17.40x -94.25%
256x256 uint8 4 0.50 normal avx2 0.003095 0.000162 19.15x -94.78%
256x256 uint8 4 0.50 soft_light scalar 0.006785 0.001655 4.10x -75.61%
256x256 uint8 4 0.50 soft_light sse42 0.006785 0.000222 30.59x -96.73%
256x256 uint8 4 0.50 soft_light avx2 0.006785 0.000197 34.46x -97.10%
256x256 uint8 4 0.50 lighten_only scalar 0.005477 0.001777 3.08x -67.55%
256x256 uint8 4 0.50 lighten_only sse42 0.005477 0.000188 29.08x -96.56%
256x256 uint8 4 0.50 lighten_only avx2 0.005477 0.000194 28.28x -96.46%
256x256 uint8 4 0.50 screen scalar 0.005569 0.001538 3.62x -72.37%
256x256 uint8 4 0.50 screen sse42 0.005569 0.000208 26.73x -96.26%
256x256 uint8 4 0.50 screen avx2 0.005569 0.000194 28.64x -96.51%
256x256 uint8 4 0.50 dodge scalar 0.005804 0.001616 3.59x -72.16%
256x256 uint8 4 0.50 dodge sse42 0.005804 0.000233 24.87x -95.98%
256x256 uint8 4 0.50 dodge avx2 0.005804 0.000194 29.93x -96.66%
256x256 uint8 4 0.50 addition scalar 0.005651 0.001928 2.93x -65.89%
256x256 uint8 4 0.50 addition sse42 0.005651 0.000260 21.75x -95.40%
256x256 uint8 4 0.50 addition avx2 0.005651 0.000200 28.26x -96.46%
256x256 uint8 4 0.50 darken_only scalar 0.005410 0.001722 3.14x -68.18%
256x256 uint8 4 0.50 darken_only sse42 0.005410 0.000193 28.09x -96.44%
256x256 uint8 4 0.50 darken_only avx2 0.005410 0.000189 28.55x -96.50%
256x256 uint8 4 0.50 multiply scalar 0.005434 0.001546 3.51x -71.55%
256x256 uint8 4 0.50 multiply sse42 0.005434 0.000196 27.79x -96.40%
256x256 uint8 4 0.50 multiply avx2 0.005434 0.000186 29.16x -96.57%
256x256 uint8 4 0.50 hard_light scalar 0.007276 0.002551 2.85x -64.94%
256x256 uint8 4 0.50 hard_light sse42 0.007276 0.000236 30.83x -96.76%
256x256 uint8 4 0.50 hard_light avx2 0.007276 0.000196 37.19x -97.31%
256x256 uint8 4 0.50 difference scalar 0.007216 0.001544 4.67x -78.61%
256x256 uint8 4 0.50 difference sse42 0.007216 0.000193 37.42x -97.33%
256x256 uint8 4 0.50 difference avx2 0.007216 0.000192 37.66x -97.34%
256x256 uint8 4 0.50 subtract scalar 0.005564 0.001470 3.79x -73.59%
256x256 uint8 4 0.50 subtract sse42 0.005564 0.000269 20.70x -95.17%
256x256 uint8 4 0.50 subtract avx2 0.005564 0.000198 28.15x -96.45%
256x256 uint8 4 0.50 grain_extract scalar 0.005604 0.001861 3.01x -66.79%
256x256 uint8 4 0.50 grain_extract sse42 0.005604 0.000217 25.78x -96.12%
256x256 uint8 4 0.50 grain_extract avx2 0.005604 0.000195 28.75x -96.52%
256x256 uint8 4 0.50 grain_merge scalar 0.005473 0.001873 2.92x -65.78%
256x256 uint8 4 0.50 grain_merge sse42 0.005473 0.000213 25.68x -96.11%
256x256 uint8 4 0.50 grain_merge avx2 0.005473 0.000192 28.52x -96.49%
256x256 uint8 4 0.50 divide scalar 0.005750 0.001582 3.63x -72.48%
256x256 uint8 4 0.50 divide sse42 0.005750 0.000216 26.59x -96.24%
256x256 uint8 4 0.50 divide avx2 0.005750 0.000191 30.11x -96.68%
256x256 uint8 4 0.50 overlay scalar 0.006926 0.002479 2.79x -64.21%
256x256 uint8 4 0.50 overlay sse42 0.006926 0.000243 28.54x -96.50%
256x256 uint8 4 0.50 overlay avx2 0.006926 0.000194 35.76x -97.20%
256x256 float32 3 0.50 normal scalar 0.005580 0.000497 11.22x -91.09%
256x256 float32 3 0.50 normal sse42 0.005580 0.000219 25.49x -96.08%
256x256 float32 3 0.50 normal avx2 0.005580 0.000141 39.54x -97.47%
256x256 float32 3 0.50 soft_light scalar 0.008864 0.000618 14.36x -93.03%
256x256 float32 3 0.50 soft_light sse42 0.008864 0.000258 34.33x -97.09%
256x256 float32 3 0.50 soft_light avx2 0.008864 0.000194 45.67x -97.81%
256x256 float32 3 0.50 lighten_only scalar 0.007150 0.000737 9.70x -89.69%
256x256 float32 3 0.50 lighten_only sse42 0.007150 0.000217 32.98x -96.97%
256x256 float32 3 0.50 lighten_only avx2 0.007150 0.000178 40.21x -97.51%
256x256 float32 3 0.50 screen scalar 0.007325 0.000551 13.28x -92.47%
256x256 float32 3 0.50 screen sse42 0.007325 0.000240 30.48x -96.72%
256x256 float32 3 0.50 screen avx2 0.007325 0.000184 39.83x -97.49%
256x256 float32 3 0.50 dodge scalar 0.007305 0.000631 11.59x -91.37%
256x256 float32 3 0.50 dodge sse42 0.007305 0.000275 26.53x -96.23%
256x256 float32 3 0.50 dodge avx2 0.007305 0.000211 34.63x -97.11%
256x256 float32 3 0.50 addition scalar 0.007376 0.001557 4.74x -78.89%
256x256 float32 3 0.50 addition sse42 0.007376 0.000233 31.68x -96.84%
256x256 float32 3 0.50 addition avx2 0.007376 0.000184 40.09x -97.51%
256x256 float32 3 0.50 darken_only scalar 0.007243 0.000734 9.86x -89.86%
256x256 float32 3 0.50 darken_only sse42 0.007243 0.000215 33.69x -97.03%
256x256 float32 3 0.50 darken_only avx2 0.007243 0.000177 40.90x -97.56%
256x256 float32 3 0.50 multiply scalar 0.007388 0.000540 13.68x -92.69%
256x256 float32 3 0.50 multiply sse42 0.007388 0.000220 33.53x -97.02%
256x256 float32 3 0.50 multiply avx2 0.007388 0.000179 41.38x -97.58%
256x256 float32 3 0.50 hard_light scalar 0.009079 0.001760 5.16x -80.62%
256x256 float32 3 0.50 hard_light sse42 0.009079 0.000279 32.53x -96.93%
256x256 float32 3 0.50 hard_light avx2 0.009079 0.000185 49.19x -97.97%
256x256 float32 3 0.50 difference scalar 0.009052 0.000547 16.53x -93.95%
256x256 float32 3 0.50 difference sse42 0.009052 0.000227 39.82x -97.49%
256x256 float32 3 0.50 difference avx2 0.009052 0.000183 49.41x -97.98%
256x256 float32 3 0.50 subtract scalar 0.007449 0.000692 10.77x -90.71%
256x256 float32 3 0.50 subtract sse42 0.007449 0.000241 30.94x -96.77%
256x256 float32 3 0.50 subtract avx2 0.007449 0.000185 40.31x -97.52%
256x256 float32 3 0.50 grain_extract scalar 0.007552 0.001002 7.54x -86.73%
256x256 float32 3 0.50 grain_extract sse42 0.007552 0.000232 32.52x -96.92%
256x256 float32 3 0.50 grain_extract avx2 0.007552 0.000179 42.08x -97.62%
256x256 float32 3 0.50 grain_merge scalar 0.007136 0.001000 7.14x -85.99%
256x256 float32 3 0.50 grain_merge sse42 0.007136 0.000266 26.79x -96.27%
256x256 float32 3 0.50 grain_merge avx2 0.007136 0.000182 39.16x -97.45%
256x256 float32 3 0.50 divide scalar 0.007395 0.000607 12.18x -91.79%
256x256 float32 3 0.50 divide sse42 0.007395 0.000263 28.17x -96.45%
256x256 float32 3 0.50 divide avx2 0.007395 0.000181 40.89x -97.55%
256x256 float32 3 0.50 overlay scalar 0.008732 0.001631 5.35x -81.32%
256x256 float32 3 0.50 overlay sse42 0.008732 0.000254 34.38x -97.09%
256x256 float32 3 0.50 overlay avx2 0.008732 0.000183 47.81x -97.91%
256x256 float32 4 0.50 normal scalar 0.004371 0.000616 7.10x -85.91%
256x256 float32 4 0.50 normal sse42 0.004371 0.000143 30.52x -96.72%
256x256 float32 4 0.50 normal avx2 0.004371 0.000153 28.49x -96.49%
256x256 float32 4 0.50 soft_light scalar 0.006858 0.000720 9.52x -89.50%
256x256 float32 4 0.50 soft_light sse42 0.006858 0.000182 37.74x -97.35%
256x256 float32 4 0.50 soft_light avx2 0.006858 0.000182 37.69x -97.35%
256x256 float32 4 0.50 lighten_only scalar 0.005419 0.000769 7.05x -85.82%
256x256 float32 4 0.50 lighten_only sse42 0.005419 0.000165 32.90x -96.96%
256x256 float32 4 0.50 lighten_only avx2 0.005419 0.000181 29.90x -96.66%
256x256 float32 4 0.50 screen scalar 0.005604 0.000672 8.34x -88.01%
256x256 float32 4 0.50 screen sse42 0.005604 0.000174 32.25x -96.90%
256x256 float32 4 0.50 screen avx2 0.005604 0.000179 31.30x -96.80%
256x256 float32 4 0.50 dodge scalar 0.005572 0.000751 7.42x -86.52%
256x256 float32 4 0.50 dodge sse42 0.005572 0.000216 25.76x -96.12%
256x256 float32 4 0.50 dodge avx2 0.005572 0.000189 29.55x -96.62%
256x256 float32 4 0.50 addition scalar 0.005738 0.001335 4.30x -76.73%
256x256 float32 4 0.50 addition sse42 0.005738 0.000186 30.80x -96.75%
256x256 float32 4 0.50 addition avx2 0.005738 0.000185 31.02x -96.78%
256x256 float32 4 0.50 darken_only scalar 0.005302 0.000759 6.99x -85.69%
256x256 float32 4 0.50 darken_only sse42 0.005302 0.000159 33.44x -97.01%
256x256 float32 4 0.50 darken_only avx2 0.005302 0.000176 30.16x -96.68%
256x256 float32 4 0.50 multiply scalar 0.005762 0.000639 9.02x -88.91%
256x256 float32 4 0.50 multiply sse42 0.005762 0.000167 34.59x -97.11%
256x256 float32 4 0.50 multiply avx2 0.005762 0.000179 32.26x -96.90%
256x256 float32 4 0.50 hard_light scalar 0.007241 0.001865 3.88x -74.24%
256x256 float32 4 0.50 hard_light sse42 0.007241 0.000222 32.58x -96.93%
256x256 float32 4 0.50 hard_light avx2 0.007241 0.000183 39.61x -97.48%
256x256 float32 4 0.50 difference scalar 0.007348 0.000647 11.36x -91.20%
256x256 float32 4 0.50 difference sse42 0.007348 0.000173 42.36x -97.64%
256x256 float32 4 0.50 difference avx2 0.007348 0.000183 40.04x -97.50%
256x256 float32 4 0.50 subtract scalar 0.005691 0.000862 6.60x -84.85%
256x256 float32 4 0.50 subtract sse42 0.005691 0.000191 29.83x -96.65%
256x256 float32 4 0.50 subtract avx2 0.005691 0.000184 31.01x -96.77%
256x256 float32 4 0.50 grain_extract scalar 0.005481 0.001052 5.21x -80.81%
256x256 float32 4 0.50 grain_extract sse42 0.005481 0.000170 32.19x -96.89%
256x256 float32 4 0.50 grain_extract avx2 0.005481 0.000182 30.09x -96.68%
256x256 float32 4 0.50 grain_merge scalar 0.005445 0.001059 5.14x -80.56%
256x256 float32 4 0.50 grain_merge sse42 0.005445 0.000178 30.52x -96.72%
256x256 float32 4 0.50 grain_merge avx2 0.005445 0.000181 30.04x -96.67%
256x256 float32 4 0.50 divide scalar 0.005570 0.000705 7.90x -87.34%
256x256 float32 4 0.50 divide sse42 0.005570 0.000179 31.09x -96.78%
256x256 float32 4 0.50 divide avx2 0.005570 0.000180 30.95x -96.77%
256x256 float32 4 0.50 overlay scalar 0.007038 0.001723 4.08x -75.52%
256x256 float32 4 0.50 overlay sse42 0.007038 0.000183 38.42x -97.40%
256x256 float32 4 0.50 overlay avx2 0.007038 0.000177 39.77x -97.49%
512x512 uint8 3 0.50 normal scalar 0.032664 0.006359 5.14x -80.53%
512x512 uint8 3 0.50 normal sse42 0.032664 0.002753 11.86x -91.57%
512x512 uint8 3 0.50 normal avx2 0.032664 0.002755 11.86x -91.56%
512x512 uint8 3 0.00 normal scalar 0.032962 0.002511 13.12x -92.38%
512x512 uint8 3 0.00 normal sse42 0.032962 0.002717 12.13x -91.76%
512x512 uint8 3 0.00 normal avx2 0.032962 0.002575 12.80x -92.19%
512x512 uint8 3 1.00 normal scalar 0.032701 0.002452 13.34x -92.50%
512x512 uint8 3 1.00 normal sse42 0.032701 0.002444 13.38x -92.53%
512x512 uint8 3 1.00 normal avx2 0.032701 0.002456 13.32x -92.49%
512x512 uint8 3 0.50 soft_light scalar 0.043186 0.007168 6.03x -83.40%
512x512 uint8 3 0.50 soft_light sse42 0.043186 0.003435 12.57x -92.05%
512x512 uint8 3 0.50 soft_light avx2 0.043186 0.003369 12.82x -92.20%
512x512 uint8 3 0.00 soft_light scalar 0.042857 0.002442 17.55x -94.30%
512x512 uint8 3 0.00 soft_light sse42 0.042857 0.002439 17.57x -94.31%
512x512 uint8 3 0.00 soft_light avx2 0.042857 0.002448 17.51x -94.29%
512x512 uint8 3 1.00 soft_light scalar 0.042902 0.007092 6.05x -83.47%
512x512 uint8 3 1.00 soft_light sse42 0.042902 0.003376 12.71x -92.13%
512x512 uint8 3 1.00 soft_light avx2 0.042902 0.003155 13.60x -92.65%
512x512 uint8 3 0.50 lighten_only scalar 0.036846 0.007739 4.76x -79.00%
512x512 uint8 3 0.50 lighten_only sse42 0.036846 0.003063 12.03x -91.69%
512x512 uint8 3 0.50 lighten_only avx2 0.036846 0.003028 12.17x -91.78%
512x512 uint8 3 0.00 lighten_only scalar 0.036623 0.002479 14.77x -93.23%
512x512 uint8 3 0.00 lighten_only sse42 0.036623 0.002485 14.74x -93.21%
512x512 uint8 3 0.00 lighten_only avx2 0.036623 0.002544 14.40x -93.05%
512x512 uint8 3 1.00 lighten_only scalar 0.036539 0.007693 4.75x -78.95%
512x512 uint8 3 1.00 lighten_only sse42 0.036539 0.003188 11.46x -91.27%
512x512 uint8 3 1.00 lighten_only avx2 0.036539 0.002953 12.38x -91.92%
512x512 uint8 3 0.50 screen scalar 0.037287 0.007076 5.27x -81.02%
512x512 uint8 3 0.50 screen sse42 0.037287 0.003285 11.35x -91.19%
512x512 uint8 3 0.50 screen avx2 0.037287 0.003154 11.82x -91.54%
512x512 uint8 3 0.00 screen scalar 0.038598 0.002465 15.66x -93.61%
512x512 uint8 3 0.00 screen sse42 0.038598 0.002453 15.73x -93.64%
512x512 uint8 3 0.00 screen avx2 0.038598 0.002452 15.74x -93.65%
512x512 uint8 3 1.00 screen scalar 0.038291 0.006831 5.61x -82.16%
512x512 uint8 3 1.00 screen sse42 0.038291 0.003159 12.12x -91.75%
512x512 uint8 3 1.00 screen avx2 0.038291 0.003032 12.63x -92.08%
512x512 uint8 3 0.50 dodge scalar 0.037752 0.007065 5.34x -81.29%
512x512 uint8 3 0.50 dodge sse42 0.037752 0.003489 10.82x -90.76%
512x512 uint8 3 0.50 dodge avx2 0.037752 0.003187 11.85x -91.56%
512x512 uint8 3 0.00 dodge scalar 0.037179 0.002459 15.12x -93.39%
512x512 uint8 3 0.00 dodge sse42 0.037179 0.002452 15.16x -93.40%
512x512 uint8 3 0.00 dodge avx2 0.037179 0.002465 15.08x -93.37%
512x512 uint8 3 1.00 dodge scalar 0.037046 0.007060 5.25x -80.94%
512x512 uint8 3 1.00 dodge sse42 0.037046 0.003464 10.69x -90.65%
512x512 uint8 3 1.00 dodge avx2 0.037046 0.003211 11.54x -91.33%
512x512 uint8 3 0.50 addition scalar 0.037243 0.010142 3.67x -72.77%
512x512 uint8 3 0.50 addition sse42 0.037243 0.003262 11.42x -91.24%
512x512 uint8 3 0.50 addition avx2 0.037243 0.003050 12.21x -91.81%
512x512 uint8 3 0.00 addition scalar 0.036870 0.002529 14.58x -93.14%
512x512 uint8 3 0.00 addition sse42 0.036870 0.002482 14.86x -93.27%
512x512 uint8 3 0.00 addition avx2 0.036870 0.002518 14.64x -93.17%
512x512 uint8 3 1.00 addition scalar 0.036626 0.013275 2.76x -63.76%
512x512 uint8 3 1.00 addition sse42 0.036626 0.003178 11.52x -91.32%
512x512 uint8 3 1.00 addition avx2 0.036626 0.003021 12.12x -91.75%
512x512 uint8 3 0.50 darken_only scalar 0.036627 0.007679 4.77x -79.04%
512x512 uint8 3 0.50 darken_only sse42 0.036627 0.003123 11.73x -91.47%
512x512 uint8 3 0.50 darken_only avx2 0.036627 0.003037 12.06x -91.71%
512x512 uint8 3 0.00 darken_only scalar 0.037025 0.002500 14.81x -93.25%
512x512 uint8 3 0.00 darken_only sse42 0.037025 0.002501 14.81x -93.25%
512x512 uint8 3 0.00 darken_only avx2 0.037025 0.002535 14.60x -93.15%
512x512 uint8 3 1.00 darken_only scalar 0.037378 0.007924 4.72x -78.80%
512x512 uint8 3 1.00 darken_only sse42 0.037378 0.003105 12.04x -91.69%
512x512 uint8 3 1.00 darken_only avx2 0.037378 0.003077 12.15x -91.77%
512x512 uint8 3 0.50 multiply scalar 0.037533 0.006937 5.41x -81.52%
512x512 uint8 3 0.50 multiply sse42 0.037533 0.003240 11.58x -91.37%
512x512 uint8 3 0.50 multiply avx2 0.037533 0.003026 12.40x -91.94%
512x512 uint8 3 0.00 multiply scalar 0.036780 0.002619 14.04x -92.88%
512x512 uint8 3 0.00 multiply sse42 0.036780 0.002651 13.87x -92.79%
512x512 uint8 3 0.00 multiply avx2 0.036780 0.002491 14.76x -93.23%
512x512 uint8 3 1.00 multiply scalar 0.036960 0.007474 4.95x -79.78%
512x512 uint8 3 1.00 multiply sse42 0.036960 0.003158 11.70x -91.46%
512x512 uint8 3 1.00 multiply avx2 0.036960 0.003044 12.14x -91.76%
512x512 uint8 3 0.50 hard_light scalar 0.046398 0.011862 3.91x -74.44%
512x512 uint8 3 0.50 hard_light sse42 0.046398 0.003543 13.10x -92.36%
512x512 uint8 3 0.50 hard_light avx2 0.046398 0.003261 14.23x -92.97%
512x512 uint8 3 0.00 hard_light scalar 0.046417 0.002519 18.43x -94.57%
512x512 uint8 3 0.00 hard_light sse42 0.046417 0.002527 18.37x -94.55%
512x512 uint8 3 0.00 hard_light avx2 0.046417 0.002500 18.57x -94.62%
512x512 uint8 3 1.00 hard_light scalar 0.045839 0.011953 3.84x -73.92%
512x512 uint8 3 1.00 hard_light sse42 0.045839 0.003688 12.43x -91.96%
512x512 uint8 3 1.00 hard_light avx2 0.045839 0.003292 13.92x -92.82%
512x512 uint8 3 0.50 difference scalar 0.044237 0.006964 6.35x -84.26%
512x512 uint8 3 0.50 difference sse42 0.044237 0.003116 14.20x -92.96%
512x512 uint8 3 0.50 difference avx2 0.044237 0.003000 14.74x -93.22%
512x512 uint8 3 0.00 difference scalar 0.044217 0.002493 17.74x -94.36%
512x512 uint8 3 0.00 difference sse42 0.044217 0.002477 17.85x -94.40%
512x512 uint8 3 0.00 difference avx2 0.044217 0.002508 17.63x -94.33%
512x512 uint8 3 1.00 difference scalar 0.044061 0.007299 6.04x -83.44%
512x512 uint8 3 1.00 difference sse42 0.044061 0.003133 14.06x -92.89%
512x512 uint8 3 1.00 difference avx2 0.044061 0.003031 14.54x -93.12%
512x512 uint8 3 0.50 subtract scalar 0.036994 0.006517 5.68x -82.38%
512x512 uint8 3 0.50 subtract sse42 0.036994 0.003437 10.76x -90.71%
512x512 uint8 3 0.50 subtract avx2 0.036994 0.003179 11.64x -91.41%
512x512 uint8 3 0.00 subtract scalar 0.036976 0.002518 14.68x -93.19%
512x512 uint8 3 0.00 subtract sse42 0.036976 0.002503 14.78x -93.23%
512x512 uint8 3 0.00 subtract avx2 0.036976 0.002529 14.62x -93.16%
512x512 uint8 3 1.00 subtract scalar 0.037033 0.006383 5.80x -82.76%
512x512 uint8 3 1.00 subtract sse42 0.037033 0.003389 10.93x -90.85%
512x512 uint8 3 1.00 subtract avx2 0.037033 0.003159 11.72x -91.47%
512x512 uint8 3 0.50 grain_extract scalar 0.037885 0.008370 4.53x -77.91%
512x512 uint8 3 0.50 grain_extract sse42 0.037885 0.003371 11.24x -91.10%
512x512 uint8 3 0.50 grain_extract avx2 0.037885 0.003178 11.92x -91.61%
512x512 uint8 3 0.00 grain_extract scalar 0.037451 0.002503 14.96x -93.32%
512x512 uint8 3 0.00 grain_extract sse42 0.037451 0.002484 15.08x -93.37%
512x512 uint8 3 0.00 grain_extract avx2 0.037451 0.002504 14.95x -93.31%
512x512 uint8 3 1.00 grain_extract scalar 0.037683 0.008524 4.42x -77.38%
512x512 uint8 3 1.00 grain_extract sse42 0.037683 0.003434 10.97x -90.89%
512x512 uint8 3 1.00 grain_extract avx2 0.037683 0.003117 12.09x -91.73%
512x512 uint8 3 0.50 grain_merge scalar 0.037053 0.008445 4.39x -77.21%
512x512 uint8 3 0.50 grain_merge sse42 0.037053 0.003344 11.08x -90.98%
512x512 uint8 3 0.50 grain_merge avx2 0.037053 0.003106 11.93x -91.62%
512x512 uint8 3 0.00 grain_merge scalar 0.036786 0.002546 14.45x -93.08%
512x512 uint8 3 0.00 grain_merge sse42 0.036786 0.002445 15.05x -93.35%
512x512 uint8 3 0.00 grain_merge avx2 0.036786 0.002449 15.02x -93.34%
512x512 uint8 3 1.00 grain_merge scalar 0.037150 0.008365 4.44x -77.48%
512x512 uint8 3 1.00 grain_merge sse42 0.037150 0.003336 11.14x -91.02%
512x512 uint8 3 1.00 grain_merge avx2 0.037150 0.003114 11.93x -91.62%
512x512 uint8 3 0.50 divide scalar 0.037859 0.006928 5.46x -81.70%
512x512 uint8 3 0.50 divide sse42 0.037859 0.003395 11.15x -91.03%
512x512 uint8 3 0.50 divide avx2 0.037859 0.003177 11.92x -91.61%
512x512 uint8 3 0.00 divide scalar 0.037761 0.002573 14.68x -93.19%
512x512 uint8 3 0.00 divide sse42 0.037761 0.002566 14.71x -93.20%
512x512 uint8 3 0.00 divide avx2 0.037761 0.002564 14.73x -93.21%
512x512 uint8 3 1.00 divide scalar 0.037679 0.006972 5.40x -81.50%
512x512 uint8 3 1.00 divide sse42 0.037679 0.003401 11.08x -90.97%
512x512 uint8 3 1.00 divide avx2 0.037679 0.003197 11.79x -91.51%
512x512 uint8 3 0.50 overlay scalar 0.043801 0.011473 3.82x -73.81%
512x512 uint8 3 0.50 overlay sse42 0.043801 0.003422 12.80x -92.19%
512x512 uint8 3 0.50 overlay avx2 0.043801 0.003172 13.81x -92.76%
512x512 uint8 3 0.00 overlay scalar 0.043639 0.002450 17.81x -94.39%
512x512 uint8 3 0.00 overlay sse42 0.043639 0.002456 17.77x -94.37%
512x512 uint8 3 0.00 overlay avx2 0.043639 0.002460 17.74x -94.36%
512x512 uint8 3 1.00 overlay scalar 0.043776 0.011610 3.77x -73.48%
512x512 uint8 3 1.00 overlay sse42 0.043776 0.003438 12.73x -92.15%
512x512 uint8 3 1.00 overlay avx2 0.043776 0.003149 13.90x -92.81%
512x512 uint8 4 0.50 normal scalar 0.023805 0.005147 4.62x -78.38%
512x512 uint8 4 0.50 normal sse42 0.023805 0.000693 34.37x -97.09%
512x512 uint8 4 0.50 normal avx2 0.023805 0.000625 38.07x -97.37%
512x512 uint8 4 0.00 normal scalar 0.024992 0.000048 515.98x -99.81%
512x512 uint8 4 0.00 normal sse42 0.024992 0.000058 427.71x -99.77%
512x512 uint8 4 0.00 normal avx2 0.024992 0.000048 515.82x -99.81%
512x512 uint8 4 1.00 normal scalar 0.024000 0.005155 4.66x -78.52%
512x512 uint8 4 1.00 normal sse42 0.024000 0.000693 34.63x -97.11%
512x512 uint8 4 1.00 normal avx2 0.024000 0.000626 38.36x -97.39%
512x512 uint8 4 0.50 soft_light scalar 0.034637 0.006560 5.28x -81.06%
512x512 uint8 4 0.50 soft_light sse42 0.034637 0.000877 39.51x -97.47%
512x512 uint8 4 0.50 soft_light avx2 0.034637 0.000780 44.42x -97.75%
512x512 uint8 4 0.00 soft_light scalar 0.034354 0.000049 701.69x -99.86%
512x512 uint8 4 0.00 soft_light sse42 0.034354 0.000049 697.51x -99.86%
512x512 uint8 4 0.00 soft_light avx2 0.034354 0.000050 688.44x -99.85%
512x512 uint8 4 1.00 soft_light scalar 0.034581 0.006454 5.36x -81.34%
512x512 uint8 4 1.00 soft_light sse42 0.034581 0.000876 39.47x -97.47%
512x512 uint8 4 1.00 soft_light avx2 0.034581 0.000780 44.36x -97.75%
512x512 uint8 4 0.50 lighten_only scalar 0.027773 0.006963 3.99x -74.93%
512x512 uint8 4 0.50 lighten_only sse42 0.027773 0.000757 36.71x -97.28%
512x512 uint8 4 0.50 lighten_only avx2 0.027773 0.000754 36.85x -97.29%
512x512 uint8 4 0.00 lighten_only scalar 0.027653 0.000046 603.84x -99.83%
512x512 uint8 4 0.00 lighten_only sse42 0.027653 0.000045 612.45x -99.84%
512x512 uint8 4 0.00 lighten_only avx2 0.027653 0.000046 602.23x -99.83%
512x512 uint8 4 1.00 lighten_only scalar 0.027743 0.006936 4.00x -75.00%
512x512 uint8 4 1.00 lighten_only sse42 0.027743 0.000758 36.62x -97.27%
512x512 uint8 4 1.00 lighten_only avx2 0.027743 0.000761 36.45x -97.26%
512x512 uint8 4 0.50 screen scalar 0.028744 0.006211 4.63x -78.39%
512x512 uint8 4 0.50 screen sse42 0.028744 0.000824 34.87x -97.13%
512x512 uint8 4 0.50 screen avx2 0.028744 0.000785 36.62x -97.27%
512x512 uint8 4 0.00 screen scalar 0.028670 0.000047 611.63x -99.84%
512x512 uint8 4 0.00 screen sse42 0.028670 0.000046 624.48x -99.84%
512x512 uint8 4 0.00 screen avx2 0.028670 0.000046 626.63x -99.84%
512x512 uint8 4 1.00 screen scalar 0.028771 0.006188 4.65x -78.49%
512x512 uint8 4 1.00 screen sse42 0.028771 0.000826 34.82x -97.13%
512x512 uint8 4 1.00 screen avx2 0.028771 0.000780 36.88x -97.29%
512x512 uint8 4 0.50 dodge scalar 0.028966 0.006473 4.48x -77.65%
512x512 uint8 4 0.50 dodge sse42 0.028966 0.000939 30.86x -96.76%
512x512 uint8 4 0.50 dodge avx2 0.028966 0.000767 37.75x -97.35%
512x512 uint8 4 0.00 dodge scalar 0.028616 0.000046 627.82x -99.84%
512x512 uint8 4 0.00 dodge sse42 0.028616 0.000046 625.48x -99.84%
512x512 uint8 4 0.00 dodge avx2 0.028616 0.000045 629.20x -99.84%
512x512 uint8 4 1.00 dodge scalar 0.028442 0.006487 4.38x -77.19%
512x512 uint8 4 1.00 dodge sse42 0.028442 0.000928 30.66x -96.74%
512x512 uint8 4 1.00 dodge avx2 0.028442 0.000775 36.70x -97.27%
512x512 uint8 4 0.50 addition scalar 0.028275 0.007755 3.65x -72.57%
512x512 uint8 4 0.50 addition sse42 0.028275 0.001016 27.84x -96.41%
512x512 uint8 4 0.50 addition avx2 0.028275 0.000802 35.26x -97.16%
512x512 uint8 4 0.00 addition scalar 0.028062 0.000048 582.61x -99.83%
512x512 uint8 4 0.00 addition sse42 0.028062 0.000047 594.20x -99.83%
512x512 uint8 4 0.00 addition avx2 0.028062 0.000048 579.87x -99.83%
512x512 uint8 4 1.00 addition scalar 0.027896 0.009102 3.06x -67.37%
512x512 uint8 4 1.00 addition sse42 0.027896 0.001013 27.53x -96.37%
512x512 uint8 4 1.00 addition avx2 0.027896 0.000796 35.04x -97.15%
512x512 uint8 4 0.50 darken_only scalar 0.028096 0.006905 4.07x -75.42%
512x512 uint8 4 0.50 darken_only sse42 0.028096 0.000757 37.11x -97.31%
512x512 uint8 4 0.50 darken_only avx2 0.028096 0.000758 37.07x -97.30%
512x512 uint8 4 0.00 darken_only scalar 0.027982 0.000052 535.53x -99.81%
512x512 uint8 4 0.00 darken_only sse42 0.027982 0.000047 594.02x -99.83%
512x512 uint8 4 0.00 darken_only avx2 0.027982 0.000048 585.64x -99.83%
512x512 uint8 4 1.00 darken_only scalar 0.027881 0.006917 4.03x -75.19%
512x512 uint8 4 1.00 darken_only sse42 0.027881 0.000753 37.02x -97.30%
512x512 uint8 4 1.00 darken_only avx2 0.027881 0.000756 36.87x -97.29%
512x512 uint8 4 0.50 multiply scalar 0.028184 0.006278 4.49x -77.73%
512x512 uint8 4 0.50 multiply sse42 0.028184 0.000779 36.16x -97.23%
512x512 uint8 4 0.50 multiply avx2 0.028184 0.000748 37.65x -97.34%
512x512 uint8 4 0.00 multiply scalar 0.028292 0.000053 533.84x -99.81%
512x512 uint8 4 0.00 multiply sse42 0.028292 0.000054 525.15x -99.81%
512x512 uint8 4 0.00 multiply avx2 0.028292 0.000045 631.53x -99.84%
512x512 uint8 4 1.00 multiply scalar 0.028126 0.006241 4.51x -77.81%
512x512 uint8 4 1.00 multiply sse42 0.028126 0.000818 34.38x -97.09%
512x512 uint8 4 1.00 multiply avx2 0.028126 0.000749 37.55x -97.34%
512x512 uint8 4 0.50 hard_light scalar 0.036658 0.010267 3.57x -71.99%
512x512 uint8 4 0.50 hard_light sse42 0.036658 0.000957 38.32x -97.39%
512x512 uint8 4 0.50 hard_light avx2 0.036658 0.000779 47.04x -97.87%
512x512 uint8 4 0.00 hard_light scalar 0.039645 0.000049 802.07x -99.88%
512x512 uint8 4 0.00 hard_light sse42 0.039645 0.000048 833.68x -99.88%
512x512 uint8 4 0.00 hard_light avx2 0.039645 0.000047 840.31x -99.88%
512x512 uint8 4 1.00 hard_light scalar 0.036601 0.010226 3.58x -72.06%
512x512 uint8 4 1.00 hard_light sse42 0.036601 0.000999 36.63x -97.27%
512x512 uint8 4 1.00 hard_light avx2 0.036601 0.000785 46.64x -97.86%
512x512 uint8 4 0.50 difference scalar 0.035155 0.006095 5.77x -82.66%
512x512 uint8 4 0.50 difference sse42 0.035155 0.000769 45.73x -97.81%
512x512 uint8 4 0.50 difference avx2 0.035155 0.000760 46.24x -97.84%
512x512 uint8 4 0.00 difference scalar 0.035323 0.000048 739.06x -99.86%
512x512 uint8 4 0.00 difference sse42 0.035323 0.000048 741.70x -99.87%
512x512 uint8 4 0.00 difference avx2 0.035323 0.000057 623.11x -99.84%
512x512 uint8 4 1.00 difference scalar 0.035549 0.006143 5.79x -82.72%
512x512 uint8 4 1.00 difference sse42 0.035549 0.000771 46.13x -97.83%
512x512 uint8 4 1.00 difference avx2 0.035549 0.000757 46.93x -97.87%
512x512 uint8 4 0.50 subtract scalar 0.028077 0.006016 4.67x -78.57%
512x512 uint8 4 0.50 subtract sse42 0.028077 0.001066 26.34x -96.20%
512x512 uint8 4 0.50 subtract avx2 0.028077 0.000798 35.16x -97.16%
512x512 uint8 4 0.00 subtract scalar 0.027791 0.000048 578.83x -99.83%
512x512 uint8 4 0.00 subtract sse42 0.027791 0.000060 461.17x -99.78%
512x512 uint8 4 0.00 subtract avx2 0.027791 0.000051 548.31x -99.82%
512x512 uint8 4 1.00 subtract scalar 0.028086 0.005809 4.83x -79.32%
512x512 uint8 4 1.00 subtract sse42 0.028086 0.001060 26.50x -96.23%
512x512 uint8 4 1.00 subtract avx2 0.028086 0.000790 35.54x -97.19%
512x512 uint8 4 0.50 grain_extract scalar 0.028407 0.007457 3.81x -73.75%
512x512 uint8 4 0.50 grain_extract sse42 0.028407 0.000842 33.72x -97.03%
512x512 uint8 4 0.50 grain_extract avx2 0.028407 0.000769 36.94x -97.29%
512x512 uint8 4 0.00 grain_extract scalar 0.028795 0.000047 612.72x -99.84%
512x512 uint8 4 0.00 grain_extract sse42 0.028795 0.000045 636.01x -99.84%
512x512 uint8 4 0.00 grain_extract avx2 0.028795 0.000045 634.32x -99.84%
512x512 uint8 4 1.00 grain_extract scalar 0.028413 0.007481 3.80x -73.67%
512x512 uint8 4 1.00 grain_extract sse42 0.028413 0.000844 33.68x -97.03%
512x512 uint8 4 1.00 grain_extract avx2 0.028413 0.000767 37.06x -97.30%
512x512 uint8 4 0.50 grain_merge scalar 0.028391 0.007483 3.79x -73.64%
512x512 uint8 4 0.50 grain_merge sse42 0.028391 0.000845 33.60x -97.02%
512x512 uint8 4 0.50 grain_merge avx2 0.028391 0.000764 37.14x -97.31%
512x512 uint8 4 0.00 grain_merge scalar 0.028487 0.000057 501.63x -99.80%
512x512 uint8 4 0.00 grain_merge sse42 0.028487 0.000048 588.05x -99.83%
512x512 uint8 4 0.00 grain_merge avx2 0.028487 0.000047 602.06x -99.83%
512x512 uint8 4 1.00 grain_merge scalar 0.028637 0.007507 3.81x -73.79%
512x512 uint8 4 1.00 grain_merge sse42 0.028637 0.000834 34.34x -97.09%
512x512 uint8 4 1.00 grain_merge avx2 0.028637 0.000766 37.39x -97.33%
512x512 uint8 4 0.50 divide scalar 0.029055 0.006269 4.63x -78.42%
512x512 uint8 4 0.50 divide sse42 0.029055 0.000879 33.05x -96.97%
512x512 uint8 4 0.50 divide avx2 0.029055 0.000772 37.64x -97.34%
512x512 uint8 4 0.00 divide scalar 0.028954 0.000047 620.31x -99.84%
512x512 uint8 4 0.00 divide sse42 0.028954 0.000045 641.05x -99.84%
512x512 uint8 4 0.00 divide avx2 0.028954 0.000045 640.05x -99.84%
512x512 uint8 4 1.00 divide scalar 0.028891 0.006386 4.52x -77.90%
512x512 uint8 4 1.00 divide sse42 0.028891 0.000877 32.93x -96.96%
512x512 uint8 4 1.00 divide avx2 0.028891 0.000760 38.02x -97.37%
512x512 uint8 4 0.50 overlay scalar 0.034941 0.009920 3.52x -71.61%
512x512 uint8 4 0.50 overlay sse42 0.034941 0.000921 37.96x -97.37%
512x512 uint8 4 0.50 overlay avx2 0.034941 0.000773 45.21x -97.79%
512x512 uint8 4 0.00 overlay scalar 0.035762 0.000048 737.65x -99.86%
512x512 uint8 4 0.00 overlay sse42 0.035762 0.000064 561.37x -99.82%
512x512 uint8 4 0.00 overlay avx2 0.035762 0.000048 748.36x -99.87%
512x512 uint8 4 1.00 overlay scalar 0.035436 0.009979 3.55x -71.84%
512x512 uint8 4 1.00 overlay sse42 0.035436 0.000909 38.96x -97.43%
512x512 uint8 4 1.00 overlay avx2 0.035436 0.000775 45.73x -97.81%
512x512 float32 3 0.50 normal scalar 0.028967 0.002337 12.39x -91.93%
512x512 float32 3 0.50 normal sse42 0.028967 0.000900 32.18x -96.89%
512x512 float32 3 0.50 normal avx2 0.028967 0.000590 49.10x -97.96%
512x512 float32 3 0.00 normal scalar 0.028648 0.000663 43.22x -97.69%
512x512 float32 3 0.00 normal sse42 0.028648 0.000369 77.59x -98.71%
512x512 float32 3 0.00 normal avx2 0.028648 0.000361 79.33x -98.74%
512x512 float32 3 1.00 normal scalar 0.028377 0.000635 44.70x -97.76%
512x512 float32 3 1.00 normal sse42 0.028377 0.000359 79.05x -98.74%
512x512 float32 3 1.00 normal avx2 0.028377 0.000352 80.55x -98.76%
512x512 float32 3 0.50 soft_light scalar 0.041450 0.002810 14.75x -93.22%
512x512 float32 3 0.50 soft_light sse42 0.041450 0.001039 39.91x -97.49%
512x512 float32 3 0.50 soft_light avx2 0.041450 0.000736 56.29x -98.22%
512x512 float32 3 0.00 soft_light scalar 0.039327 0.000651 60.38x -98.34%
512x512 float32 3 0.00 soft_light sse42 0.039327 0.000346 113.60x -99.12%
512x512 float32 3 0.00 soft_light avx2 0.039327 0.000354 110.99x -99.10%
512x512 float32 3 1.00 soft_light scalar 0.039581 0.002796 14.16x -92.94%
512x512 float32 3 1.00 soft_light sse42 0.039581 0.001058 37.41x -97.33%
512x512 float32 3 1.00 soft_light avx2 0.039581 0.000767 51.60x -98.06%
512x512 float32 3 0.50 lighten_only scalar 0.034206 0.003333 10.26x -90.26%
512x512 float32 3 0.50 lighten_only sse42 0.034206 0.000886 38.63x -97.41%
512x512 float32 3 0.50 lighten_only avx2 0.034206 0.000740 46.22x -97.84%
512x512 float32 3 0.00 lighten_only scalar 0.032857 0.000675 48.66x -97.94%
512x512 float32 3 0.00 lighten_only sse42 0.032857 0.000367 89.61x -98.88%
512x512 float32 3 0.00 lighten_only avx2 0.032857 0.000371 88.54x -98.87%
512x512 float32 3 1.00 lighten_only scalar 0.034250 0.003287 10.42x -90.40%
512x512 float32 3 1.00 lighten_only sse42 0.034250 0.000873 39.22x -97.45%
512x512 float32 3 1.00 lighten_only avx2 0.034250 0.000729 47.01x -97.87%
512x512 float32 3 0.50 screen scalar 0.034032 0.002649 12.85x -92.22%
512x512 float32 3 0.50 screen sse42 0.034032 0.000945 36.02x -97.22%
512x512 float32 3 0.50 screen avx2 0.034032 0.000760 44.79x -97.77%
512x512 float32 3 0.00 screen scalar 0.033928 0.000676 50.15x -98.01%
512x512 float32 3 0.00 screen sse42 0.033928 0.000370 91.76x -98.91%
512x512 float32 3 0.00 screen avx2 0.033928 0.000368 92.32x -98.92%
512x512 float32 3 1.00 screen scalar 0.033989 0.002579 13.18x -92.41%
512x512 float32 3 1.00 screen sse42 0.033989 0.000966 35.18x -97.16%
512x512 float32 3 1.00 screen avx2 0.033989 0.000768 44.27x -97.74%
512x512 float32 3 0.50 dodge scalar 0.034160 0.003991 8.56x -88.32%
512x512 float32 3 0.50 dodge sse42 0.034160 0.001113 30.69x -96.74%
512x512 float32 3 0.50 dodge avx2 0.034160 0.000773 44.20x -97.74%
512x512 float32 3 0.00 dodge scalar 0.035695 0.000658 54.23x -98.16%
512x512 float32 3 0.00 dodge sse42 0.035695 0.000367 97.25x -98.97%
512x512 float32 3 0.00 dodge avx2 0.035695 0.000374 95.53x -98.95%
512x512 float32 3 1.00 dodge scalar 0.033754 0.002847 11.86x -91.57%
512x512 float32 3 1.00 dodge sse42 0.033754 0.001086 31.07x -96.78%
512x512 float32 3 1.00 dodge avx2 0.033754 0.000751 44.92x -97.77%
512x512 float32 3 0.50 addition scalar 0.032977 0.006577 5.01x -80.06%
512x512 float32 3 0.50 addition sse42 0.032977 0.000914 36.09x -97.23%
512x512 float32 3 0.50 addition avx2 0.032977 0.000750 43.97x -97.73%
512x512 float32 3 0.00 addition scalar 0.032895 0.000661 49.74x -97.99%
512x512 float32 3 0.00 addition sse42 0.032895 0.000367 89.60x -98.88%
512x512 float32 3 0.00 addition avx2 0.032895 0.000368 89.41x -98.88%
512x512 float32 3 1.00 addition scalar 0.033105 0.009584 3.45x -71.05%
512x512 float32 3 1.00 addition sse42 0.033105 0.000915 36.19x -97.24%
512x512 float32 3 1.00 addition avx2 0.033105 0.000758 43.65x -97.71%
512x512 float32 3 0.50 darken_only scalar 0.033081 0.003259 10.15x -90.15%
512x512 float32 3 0.50 darken_only sse42 0.033081 0.000869 38.06x -97.37%
512x512 float32 3 0.50 darken_only avx2 0.033081 0.000720 45.94x -97.82%
512x512 float32 3 0.00 darken_only scalar 0.033263 0.000676 49.23x -97.97%
512x512 float32 3 0.00 darken_only sse42 0.033263 0.000364 91.30x -98.90%
512x512 float32 3 0.00 darken_only avx2 0.033263 0.000363 91.62x -98.91%
512x512 float32 3 1.00 darken_only scalar 0.033051 0.003278 10.08x -90.08%
512x512 float32 3 1.00 darken_only sse42 0.033051 0.000889 37.19x -97.31%
512x512 float32 3 1.00 darken_only avx2 0.033051 0.000768 43.01x -97.68%
512x512 float32 3 0.50 multiply scalar 0.033713 0.002520 13.38x -92.52%
512x512 float32 3 0.50 multiply sse42 0.033713 0.000889 37.91x -97.36%
512x512 float32 3 0.50 multiply avx2 0.033713 0.000723 46.60x -97.85%
512x512 float32 3 0.00 multiply scalar 0.033236 0.000652 50.97x -98.04%
512x512 float32 3 0.00 multiply sse42 0.033236 0.000350 95.03x -98.95%
512x512 float32 3 0.00 multiply avx2 0.033236 0.000361 92.01x -98.91%
512x512 float32 3 1.00 multiply scalar 0.033668 0.002554 13.18x -92.41%
512x512 float32 3 1.00 multiply sse42 0.033668 0.000915 36.78x -97.28%
512x512 float32 3 1.00 multiply avx2 0.033668 0.000745 45.20x -97.79%
512x512 float32 3 0.50 hard_light scalar 0.042600 0.007445 5.72x -82.52%
512x512 float32 3 0.50 hard_light sse42 0.042600 0.001188 35.85x -97.21%
512x512 float32 3 0.50 hard_light avx2 0.042600 0.000752 56.65x -98.23%
512x512 float32 3 0.00 hard_light scalar 0.041654 0.000671 62.10x -98.39%
512x512 float32 3 0.00 hard_light sse42 0.041654 0.000375 111.18x -99.10%
512x512 float32 3 0.00 hard_light avx2 0.041654 0.000371 112.33x -99.11%
512x512 float32 3 1.00 hard_light scalar 0.041561 0.007458 5.57x -82.06%
512x512 float32 3 1.00 hard_light sse42 0.041561 0.001099 37.82x -97.36%
512x512 float32 3 1.00 hard_light avx2 0.041561 0.000758 54.84x -98.18%
512x512 float32 3 0.50 difference scalar 0.040430 0.002542 15.91x -93.71%
512x512 float32 3 0.50 difference sse42 0.040430 0.000916 44.14x -97.73%
512x512 float32 3 0.50 difference avx2 0.040430 0.000757 53.41x -98.13%
512x512 float32 3 0.00 difference scalar 0.040600 0.000668 60.78x -98.35%
512x512 float32 3 0.00 difference sse42 0.040600 0.000380 106.77x -99.06%
512x512 float32 3 0.00 difference avx2 0.040600 0.000383 106.07x -99.06%
512x512 float32 3 1.00 difference scalar 0.040088 0.002555 15.69x -93.63%
512x512 float32 3 1.00 difference sse42 0.040088 0.000927 43.25x -97.69%
512x512 float32 3 1.00 difference avx2 0.040088 0.000768 52.19x -98.08%
512x512 float32 3 0.50 subtract scalar 0.033341 0.003122 10.68x -90.64%
512x512 float32 3 0.50 subtract sse42 0.033341 0.000951 35.06x -97.15%
512x512 float32 3 0.50 subtract avx2 0.033341 0.000752 44.31x -97.74%
512x512 float32 3 0.00 subtract scalar 0.032981 0.000673 49.03x -97.96%
512x512 float32 3 0.00 subtract sse42 0.032981 0.000369 89.46x -98.88%
512x512 float32 3 0.00 subtract avx2 0.032981 0.000360 91.63x -98.91%
512x512 float32 3 1.00 subtract scalar 0.033186 0.003057 10.86x -90.79%
512x512 float32 3 1.00 subtract sse42 0.033186 0.000969 34.25x -97.08%
512x512 float32 3 1.00 subtract avx2 0.033186 0.000771 43.05x -97.68%
512x512 float32 3 0.50 grain_extract scalar 0.033812 0.004354 7.77x -87.12%
512x512 float32 3 0.50 grain_extract sse42 0.033812 0.000959 35.27x -97.16%
512x512 float32 3 0.50 grain_extract avx2 0.033812 0.000747 45.27x -97.79%
512x512 float32 3 0.00 grain_extract scalar 0.033881 0.000651 52.08x -98.08%
512x512 float32 3 0.00 grain_extract sse42 0.033881 0.000347 97.62x -98.98%
512x512 float32 3 0.00 grain_extract avx2 0.033881 0.000357 94.96x -98.95%
512x512 float32 3 1.00 grain_extract scalar 0.033576 0.004427 7.58x -86.82%
512x512 float32 3 1.00 grain_extract sse42 0.033576 0.000960 34.97x -97.14%
512x512 float32 3 1.00 grain_extract avx2 0.033576 0.000749 44.80x -97.77%
512x512 float32 3 0.50 grain_merge scalar 0.033513 0.004338 7.73x -87.06%
512x512 float32 3 0.50 grain_merge sse42 0.033513 0.000955 35.10x -97.15%
512x512 float32 3 0.50 grain_merge avx2 0.033513 0.000757 44.26x -97.74%
512x512 float32 3 0.00 grain_merge scalar 0.033553 0.000656 51.13x -98.04%
512x512 float32 3 0.00 grain_merge sse42 0.033553 0.000357 94.09x -98.94%
512x512 float32 3 0.00 grain_merge avx2 0.033553 0.000358 93.82x -98.93%
512x512 float32 3 1.00 grain_merge scalar 0.033600 0.004409 7.62x -86.88%
512x512 float32 3 1.00 grain_merge sse42 0.033600 0.000976 34.44x -97.10%
512x512 float32 3 1.00 grain_merge avx2 0.033600 0.000827 40.63x -97.54%
512x512 float32 3 0.50 divide scalar 0.036122 0.002845 12.70x -92.12%
512x512 float32 3 0.50 divide sse42 0.036122 0.001063 33.98x -97.06%
512x512 float32 3 0.50 divide avx2 0.036122 0.000770 46.92x -97.87%
512x512 float32 3 0.00 divide scalar 0.036017 0.000714 50.44x -98.02%
512x512 float32 3 0.00 divide sse42 0.036017 0.000436 82.52x -98.79%
512x512 float32 3 0.00 divide avx2 0.036017 0.000416 86.48x -98.84%
512x512 float32 3 1.00 divide scalar 0.035087 0.002792 12.57x -92.04%
512x512 float32 3 1.00 divide sse42 0.035087 0.001074 32.67x -96.94%
512x512 float32 3 1.00 divide avx2 0.035087 0.000763 46.01x -97.83%
512x512 float32 3 0.50 overlay scalar 0.040747 0.006828 5.97x -83.24%
512x512 float32 3 0.50 overlay sse42 0.040747 0.001013 40.24x -97.52%
512x512 float32 3 0.50 overlay avx2 0.040747 0.000737 55.32x -98.19%
512x512 float32 3 0.00 overlay scalar 0.040638 0.000680 59.80x -98.33%
512x512 float32 3 0.00 overlay sse42 0.040638 0.000347 117.27x -99.15%
512x512 float32 3 0.00 overlay avx2 0.040638 0.000356 114.01x -99.12%
512x512 float32 3 1.00 overlay scalar 0.039881 0.006863 5.81x -82.79%
512x512 float32 3 1.00 overlay sse42 0.039881 0.001019 39.12x -97.44%
512x512 float32 3 1.00 overlay avx2 0.039881 0.000737 54.08x -98.15%
512x512 float32 4 0.50 normal scalar 0.021040 0.002725 7.72x -87.05%
512x512 float32 4 0.50 normal sse42 0.021040 0.000701 30.00x -96.67%
512x512 float32 4 0.50 normal avx2 0.021040 0.000701 30.00x -96.67%
512x512 float32 4 0.00 normal scalar 0.020639 0.000547 37.71x -97.35%
512x512 float32 4 0.00 normal sse42 0.020639 0.000290 71.17x -98.59%
512x512 float32 4 0.00 normal avx2 0.020639 0.000338 61.03x -98.36%
512x512 float32 4 1.00 normal scalar 0.020831 0.002729 7.63x -86.90%
512x512 float32 4 1.00 normal sse42 0.020831 0.000705 29.57x -96.62%
512x512 float32 4 1.00 normal avx2 0.020831 0.000709 29.38x -96.60%
512x512 float32 4 0.50 soft_light scalar 0.032395 0.003195 10.14x -90.14%
512x512 float32 4 0.50 soft_light sse42 0.032395 0.000815 39.77x -97.49%
512x512 float32 4 0.50 soft_light avx2 0.032395 0.000842 38.49x -97.40%
512x512 float32 4 0.00 soft_light scalar 0.031700 0.000581 54.58x -98.17%
512x512 float32 4 0.00 soft_light sse42 0.031700 0.000322 98.35x -98.98%
512x512 float32 4 0.00 soft_light avx2 0.031700 0.000288 110.16x -99.09%
512x512 float32 4 1.00 soft_light scalar 0.031587 0.003166 9.98x -89.98%
512x512 float32 4 1.00 soft_light sse42 0.031587 0.000752 41.99x -97.62%
512x512 float32 4 1.00 soft_light avx2 0.031587 0.000782 40.39x -97.52%
512x512 float32 4 0.50 lighten_only scalar 0.024621 0.003406 7.23x -86.16%
512x512 float32 4 0.50 lighten_only sse42 0.024621 0.000733 33.58x -97.02%
512x512 float32 4 0.50 lighten_only avx2 0.024621 0.000792 31.10x -96.78%
512x512 float32 4 0.00 lighten_only scalar 0.024517 0.000548 44.71x -97.76%
512x512 float32 4 0.00 lighten_only sse42 0.024517 0.000290 84.61x -98.82%
512x512 float32 4 0.00 lighten_only avx2 0.024517 0.000291 84.28x -98.81%
512x512 float32 4 1.00 lighten_only scalar 0.025115 0.003398 7.39x -86.47%
512x512 float32 4 1.00 lighten_only sse42 0.025115 0.000724 34.67x -97.12%
512x512 float32 4 1.00 lighten_only avx2 0.025115 0.000779 32.25x -96.90%
512x512 float32 4 0.50 screen scalar 0.025578 0.003039 8.42x -88.12%
512x512 float32 4 0.50 screen sse42 0.025578 0.000762 33.57x -97.02%
512x512 float32 4 0.50 screen avx2 0.025578 0.000790 32.40x -96.91%
512x512 float32 4 0.00 screen scalar 0.025610 0.000556 46.08x -97.83%
512x512 float32 4 0.00 screen sse42 0.025610 0.000292 87.75x -98.86%
512x512 float32 4 0.00 screen avx2 0.025610 0.000298 85.96x -98.84%
512x512 float32 4 1.00 screen scalar 0.025825 0.002980 8.67x -88.46%
512x512 float32 4 1.00 screen sse42 0.025825 0.000732 35.26x -97.16%
512x512 float32 4 1.00 screen avx2 0.025825 0.000749 34.47x -97.10%
512x512 float32 4 0.50 dodge scalar 0.025621 0.003324 7.71x -87.03%
512x512 float32 4 0.50 dodge sse42 0.025621 0.000891 28.76x -96.52%
512x512 float32 4 0.50 dodge avx2 0.025621 0.000802 31.93x -96.87%
512x512 float32 4 0.00 dodge scalar 0.025462 0.000555 45.84x -97.82%
512x512 float32 4 0.00 dodge sse42 0.025462 0.000282 90.29x -98.89%
512x512 float32 4 0.00 dodge avx2 0.025462 0.000288 88.29x -98.87%
512x512 float32 4 1.00 dodge scalar 0.025647 0.003301 7.77x -87.13%
512x512 float32 4 1.00 dodge sse42 0.025647 0.000867 29.59x -96.62%
512x512 float32 4 1.00 dodge avx2 0.025647 0.000768 33.40x -97.01%
512x512 float32 4 0.50 addition scalar 0.024919 0.005641 4.42x -77.36%
512x512 float32 4 0.50 addition sse42 0.024919 0.000795 31.34x -96.81%
512x512 float32 4 0.50 addition avx2 0.024919 0.000822 30.31x -96.70%
512x512 float32 4 0.00 addition scalar 0.024599 0.000611 40.25x -97.52%
512x512 float32 4 0.00 addition sse42 0.024599 0.000282 87.22x -98.85%
512x512 float32 4 0.00 addition avx2 0.024599 0.000435 56.51x -98.23%
512x512 float32 4 1.00 addition scalar 0.025330 0.007265 3.49x -71.32%
512x512 float32 4 1.00 addition sse42 0.025330 0.000805 31.45x -96.82%
512x512 float32 4 1.00 addition avx2 0.025330 0.000819 30.94x -96.77%
512x512 float32 4 0.50 darken_only scalar 0.025205 0.003415 7.38x -86.45%
512x512 float32 4 0.50 darken_only sse42 0.025205 0.000722 34.89x -97.13%
512x512 float32 4 0.50 darken_only avx2 0.025205 0.000766 32.91x -96.96%
512x512 float32 4 0.00 darken_only scalar 0.024833 0.000559 44.42x -97.75%
512x512 float32 4 0.00 darken_only sse42 0.024833 0.000285 87.14x -98.85%
512x512 float32 4 0.00 darken_only avx2 0.024833 0.000286 86.86x -98.85%
512x512 float32 4 1.00 darken_only scalar 0.025021 0.003435 7.28x -86.27%
512x512 float32 4 1.00 darken_only sse42 0.025021 0.000763 32.81x -96.95%
512x512 float32 4 1.00 darken_only avx2 0.025021 0.000788 31.74x -96.85%
512x512 float32 4 0.50 multiply scalar 0.025131 0.002882 8.72x -88.53%
512x512 float32 4 0.50 multiply sse42 0.025131 0.000730 34.43x -97.10%
512x512 float32 4 0.50 multiply avx2 0.025131 0.000772 32.57x -96.93%
512x512 float32 4 0.00 multiply scalar 0.025260 0.000548 46.12x -97.83%
512x512 float32 4 0.00 multiply sse42 0.025260 0.000295 85.62x -98.83%
512x512 float32 4 0.00 multiply avx2 0.025260 0.000293 86.15x -98.84%
512x512 float32 4 1.00 multiply scalar 0.025151 0.002891 8.70x -88.51%
512x512 float32 4 1.00 multiply sse42 0.025151 0.000771 32.61x -96.93%
512x512 float32 4 1.00 multiply avx2 0.025151 0.000802 31.37x -96.81%
512x512 float32 4 0.50 hard_light scalar 0.033231 0.007697 4.32x -76.84%
512x512 float32 4 0.50 hard_light sse42 0.033231 0.000904 36.75x -97.28%
512x512 float32 4 0.50 hard_light avx2 0.033231 0.000790 42.05x -97.62%
512x512 float32 4 0.00 hard_light scalar 0.033613 0.000578 58.16x -98.28%
512x512 float32 4 0.00 hard_light sse42 0.033613 0.000285 117.78x -99.15%
512x512 float32 4 0.00 hard_light avx2 0.033613 0.000341 98.65x -98.99%
512x512 float32 4 1.00 hard_light scalar 0.033492 0.007699 4.35x -77.01%
512x512 float32 4 1.00 hard_light sse42 0.033492 0.000908 36.87x -97.29%
512x512 float32 4 1.00 hard_light avx2 0.033492 0.000766 43.71x -97.71%
512x512 float32 4 0.50 difference scalar 0.032128 0.002905 11.06x -90.96%
512x512 float32 4 0.50 difference sse42 0.032128 0.000724 44.39x -97.75%
512x512 float32 4 0.50 difference avx2 0.032128 0.000759 42.34x -97.64%
512x512 float32 4 0.00 difference scalar 0.031976 0.000551 58.03x -98.28%
512x512 float32 4 0.00 difference sse42 0.031976 0.000280 114.02x -99.12%
512x512 float32 4 0.00 difference avx2 0.031976 0.000373 85.81x -98.83%
512x512 float32 4 1.00 difference scalar 0.032505 0.002888 11.25x -91.11%
512x512 float32 4 1.00 difference sse42 0.032505 0.000731 44.49x -97.75%
512x512 float32 4 1.00 difference avx2 0.032505 0.000762 42.64x -97.65%
512x512 float32 4 0.50 subtract scalar 0.024697 0.003737 6.61x -84.87%
512x512 float32 4 0.50 subtract sse42 0.024697 0.000821 30.08x -96.68%
512x512 float32 4 0.50 subtract avx2 0.024697 0.000898 27.49x -96.36%
512x512 float32 4 0.00 subtract scalar 0.024910 0.000584 42.64x -97.65%
512x512 float32 4 0.00 subtract sse42 0.024910 0.000325 76.62x -98.69%
512x512 float32 4 0.00 subtract avx2 0.024910 0.000289 86.18x -98.84%
512x512 float32 4 1.00 subtract scalar 0.025036 0.003557 7.04x -85.79%
512x512 float32 4 1.00 subtract sse42 0.025036 0.000811 30.89x -96.76%
512x512 float32 4 1.00 subtract avx2 0.025036 0.000782 32.03x -96.88%
512x512 float32 4 0.50 grain_extract scalar 0.025654 0.004514 5.68x -82.40%
512x512 float32 4 0.50 grain_extract sse42 0.025654 0.000779 32.92x -96.96%
512x512 float32 4 0.50 grain_extract avx2 0.025654 0.000831 30.89x -96.76%
512x512 float32 4 0.00 grain_extract scalar 0.025531 0.000549 46.49x -97.85%
512x512 float32 4 0.00 grain_extract sse42 0.025531 0.000289 88.44x -98.87%
512x512 float32 4 0.00 grain_extract avx2 0.025531 0.000281 90.92x -98.90%
512x512 float32 4 1.00 grain_extract scalar 0.025511 0.004504 5.66x -82.35%
512x512 float32 4 1.00 grain_extract sse42 0.025511 0.000737 34.60x -97.11%
512x512 float32 4 1.00 grain_extract avx2 0.025511 0.000782 32.64x -96.94%
512x512 float32 4 0.50 grain_merge scalar 0.025282 0.004514 5.60x -82.15%
512x512 float32 4 0.50 grain_merge sse42 0.025282 0.000783 32.30x -96.90%
512x512 float32 4 0.50 grain_merge avx2 0.025282 0.000801 31.57x -96.83%
512x512 float32 4 0.00 grain_merge scalar 0.025842 0.000551 46.87x -97.87%
512x512 float32 4 0.00 grain_merge sse42 0.025842 0.000282 91.52x -98.91%
512x512 float32 4 0.00 grain_merge avx2 0.025842 0.000289 89.36x -98.88%
512x512 float32 4 1.00 grain_merge scalar 0.025590 0.004519 5.66x -82.34%
512x512 float32 4 1.00 grain_merge sse42 0.025590 0.000764 33.50x -97.01%
512x512 float32 4 1.00 grain_merge avx2 0.025590 0.000792 32.31x -96.91%
512x512 float32 4 0.50 divide scalar 0.026216 0.004051 6.47x -84.55%
512x512 float32 4 0.50 divide sse42 0.026216 0.000758 34.60x -97.11%
512x512 float32 4 0.50 divide avx2 0.026216 0.000771 34.02x -97.06%
512x512 float32 4 0.00 divide scalar 0.026078 0.000518 50.32x -98.01%
512x512 float32 4 0.00 divide sse42 0.026078 0.000248 105.13x -99.05%
512x512 float32 4 0.00 divide avx2 0.026078 0.000270 96.60x -98.96%
512x512 float32 4 1.00 divide scalar 0.025651 0.003166 8.10x -87.66%
512x512 float32 4 1.00 divide sse42 0.025651 0.000773 33.17x -96.99%
512x512 float32 4 1.00 divide avx2 0.025651 0.000764 33.60x -97.02%
512x512 float32 4 0.50 overlay scalar 0.032134 0.007142 4.50x -77.77%
512x512 float32 4 0.50 overlay sse42 0.032134 0.000778 41.28x -97.58%
512x512 float32 4 0.50 overlay avx2 0.032134 0.000748 42.95x -97.67%
512x512 float32 4 0.00 overlay scalar 0.032136 0.000549 58.53x -98.29%
512x512 float32 4 0.00 overlay sse42 0.032136 0.000246 130.76x -99.24%
512x512 float32 4 0.00 overlay avx2 0.032136 0.000258 124.41x -99.20%
512x512 float32 4 1.00 overlay scalar 0.032218 0.007168 4.49x -77.75%
512x512 float32 4 1.00 overlay sse42 0.032218 0.000816 39.47x -97.47%
512x512 float32 4 1.00 overlay avx2 0.032218 0.000855 37.66x -97.34%
1024x1024 uint8 3 0.50 normal scalar 0.094839 0.025280 3.75x -73.34%
1024x1024 uint8 3 0.50 normal sse42 0.094839 0.010864 8.73x -88.54%
1024x1024 uint8 3 0.50 normal avx2 0.094839 0.011009 8.61x -88.39%
1024x1024 uint8 3 0.50 soft_light scalar 0.126995 0.028117 4.52x -77.86%
1024x1024 uint8 3 0.50 soft_light sse42 0.126995 0.013559 9.37x -89.32%
1024x1024 uint8 3 0.50 soft_light avx2 0.126995 0.012633 10.05x -90.05%
1024x1024 uint8 3 0.50 lighten_only scalar 0.100069 0.030163 3.32x -69.86%
1024x1024 uint8 3 0.50 lighten_only sse42 0.100069 0.012367 8.09x -87.64%
1024x1024 uint8 3 0.50 lighten_only avx2 0.100069 0.011895 8.41x -88.11%
1024x1024 uint8 3 0.50 screen scalar 0.104290 0.027393 3.81x -73.73%
1024x1024 uint8 3 0.50 screen sse42 0.104290 0.012699 8.21x -87.82%
1024x1024 uint8 3 0.50 screen avx2 0.104290 0.012159 8.58x -88.34%
1024x1024 uint8 3 0.50 dodge scalar 0.104213 0.028293 3.68x -72.85%
1024x1024 uint8 3 0.50 dodge sse42 0.104213 0.013897 7.50x -86.66%
1024x1024 uint8 3 0.50 dodge avx2 0.104213 0.012778 8.16x -87.74%
1024x1024 uint8 3 0.50 addition scalar 0.100048 0.039617 2.53x -60.40%
1024x1024 uint8 3 0.50 addition sse42 0.100048 0.012574 7.96x -87.43%
1024x1024 uint8 3 0.50 addition avx2 0.100048 0.012084 8.28x -87.92%
1024x1024 uint8 3 0.50 darken_only scalar 0.100404 0.030402 3.30x -69.72%
1024x1024 uint8 3 0.50 darken_only sse42 0.100404 0.012296 8.17x -87.75%
1024x1024 uint8 3 0.50 darken_only avx2 0.100404 0.011932 8.41x -88.12%
1024x1024 uint8 3 0.50 multiply scalar 0.102176 0.027350 3.74x -73.23%
1024x1024 uint8 3 0.50 multiply sse42 0.102176 0.012445 8.21x -87.82%
1024x1024 uint8 3 0.50 multiply avx2 0.102176 0.011982 8.53x -88.27%
1024x1024 uint8 3 0.50 hard_light scalar 0.134382 0.046849 2.87x -65.14%
1024x1024 uint8 3 0.50 hard_light sse42 0.134382 0.013894 9.67x -89.66%
1024x1024 uint8 3 0.50 hard_light avx2 0.134382 0.012752 10.54x -90.51%
1024x1024 uint8 3 0.50 difference scalar 0.129480 0.027437 4.72x -78.81%
1024x1024 uint8 3 0.50 difference sse42 0.129480 0.012287 10.54x -90.51%
1024x1024 uint8 3 0.50 difference avx2 0.129480 0.011906 10.88x -90.81%
1024x1024 uint8 3 0.50 subtract scalar 0.100542 0.025217 3.99x -74.92%
1024x1024 uint8 3 0.50 subtract sse42 0.100542 0.013588 7.40x -86.49%
1024x1024 uint8 3 0.50 subtract avx2 0.100542 0.012492 8.05x -87.57%
1024x1024 uint8 3 0.50 grain_extract scalar 0.102238 0.034312 2.98x -66.44%
1024x1024 uint8 3 0.50 grain_extract sse42 0.102238 0.013706 7.46x -86.59%
1024x1024 uint8 3 0.50 grain_extract avx2 0.102238 0.012787 8.00x -87.49%
1024x1024 uint8 3 0.50 grain_merge scalar 0.104394 0.033550 3.11x -67.86%
1024x1024 uint8 3 0.50 grain_merge sse42 0.104394 0.013519 7.72x -87.05%
1024x1024 uint8 3 0.50 grain_merge avx2 0.104394 0.012724 8.20x -87.81%
1024x1024 uint8 3 0.50 divide scalar 0.106128 0.028238 3.76x -73.39%
1024x1024 uint8 3 0.50 divide sse42 0.106128 0.013875 7.65x -86.93%
1024x1024 uint8 3 0.50 divide avx2 0.106128 0.013703 7.74x -87.09%
1024x1024 uint8 3 0.50 overlay scalar 0.130377 0.045998 2.83x -64.72%
1024x1024 uint8 3 0.50 overlay sse42 0.130377 0.014143 9.22x -89.15%
1024x1024 uint8 3 0.50 overlay avx2 0.130377 0.012946 10.07x -90.07%
1024x1024 uint8 4 0.50 normal scalar 0.070143 0.020905 3.36x -70.20%
1024x1024 uint8 4 0.50 normal sse42 0.070143 0.002835 24.74x -95.96%
1024x1024 uint8 4 0.50 normal avx2 0.070143 0.002521 27.83x -96.41%
1024x1024 uint8 4 0.50 soft_light scalar 0.101605 0.026332 3.86x -74.08%
1024x1024 uint8 4 0.50 soft_light sse42 0.101605 0.003833 26.51x -96.23%
1024x1024 uint8 4 0.50 soft_light avx2 0.101605 0.003143 32.32x -96.91%
1024x1024 uint8 4 0.50 lighten_only scalar 0.075982 0.027670 2.75x -63.58%
1024x1024 uint8 4 0.50 lighten_only sse42 0.075982 0.003082 24.66x -95.94%
1024x1024 uint8 4 0.50 lighten_only avx2 0.075982 0.003042 24.98x -96.00%
1024x1024 uint8 4 0.50 screen scalar 0.079442 0.025210 3.15x -68.27%
1024x1024 uint8 4 0.50 screen sse42 0.079442 0.003338 23.80x -95.80%
1024x1024 uint8 4 0.50 screen avx2 0.079442 0.003180 24.98x -96.00%
1024x1024 uint8 4 0.50 dodge scalar 0.079289 0.026128 3.03x -67.05%
1024x1024 uint8 4 0.50 dodge sse42 0.079289 0.003801 20.86x -95.21%
1024x1024 uint8 4 0.50 dodge avx2 0.079289 0.003144 25.22x -96.03%
1024x1024 uint8 4 0.50 addition scalar 0.075894 0.031374 2.42x -58.66%
1024x1024 uint8 4 0.50 addition sse42 0.075894 0.004133 18.37x -94.55%
1024x1024 uint8 4 0.50 addition avx2 0.075894 0.003228 23.51x -95.75%
1024x1024 uint8 4 0.50 darken_only scalar 0.076462 0.027792 2.75x -63.65%
1024x1024 uint8 4 0.50 darken_only sse42 0.076462 0.003027 25.26x -96.04%
1024x1024 uint8 4 0.50 darken_only avx2 0.076462 0.003049 25.08x -96.01%
1024x1024 uint8 4 0.50 multiply scalar 0.076996 0.026624 2.89x -65.42%
1024x1024 uint8 4 0.50 multiply sse42 0.076996 0.003191 24.13x -95.86%
1024x1024 uint8 4 0.50 multiply avx2 0.076996 0.003045 25.29x -96.05%
1024x1024 uint8 4 0.50 hard_light scalar 0.111533 0.041143 2.71x -63.11%
1024x1024 uint8 4 0.50 hard_light sse42 0.111533 0.003836 29.08x -96.56%
1024x1024 uint8 4 0.50 hard_light avx2 0.111533 0.003130 35.63x -97.19%
1024x1024 uint8 4 0.50 difference scalar 0.107496 0.024799 4.33x -76.93%
1024x1024 uint8 4 0.50 difference sse42 0.107496 0.003149 34.14x -97.07%
1024x1024 uint8 4 0.50 difference avx2 0.107496 0.003047 35.28x -97.17%
1024x1024 uint8 4 0.50 subtract scalar 0.076173 0.023894 3.19x -68.63%
1024x1024 uint8 4 0.50 subtract sse42 0.076173 0.004270 17.84x -94.39%
1024x1024 uint8 4 0.50 subtract avx2 0.076173 0.003235 23.55x -95.75%
1024x1024 uint8 4 0.50 grain_extract scalar 0.078209 0.030427 2.57x -61.09%
1024x1024 uint8 4 0.50 grain_extract sse42 0.078209 0.003401 22.99x -95.65%
1024x1024 uint8 4 0.50 grain_extract avx2 0.078209 0.003094 25.28x -96.04%
1024x1024 uint8 4 0.50 grain_merge scalar 0.078319 0.030088 2.60x -61.58%
1024x1024 uint8 4 0.50 grain_merge sse42 0.078319 0.003358 23.33x -95.71%
1024x1024 uint8 4 0.50 grain_merge avx2 0.078319 0.003071 25.50x -96.08%
1024x1024 uint8 4 0.50 divide scalar 0.080021 0.025897 3.09x -67.64%
1024x1024 uint8 4 0.50 divide sse42 0.080021 0.003639 21.99x -95.45%
1024x1024 uint8 4 0.50 divide avx2 0.080021 0.003499 22.87x -95.63%
1024x1024 uint8 4 0.50 overlay scalar 0.106051 0.040344 2.63x -61.96%
1024x1024 uint8 4 0.50 overlay sse42 0.106051 0.003624 29.27x -96.58%
1024x1024 uint8 4 0.50 overlay avx2 0.106051 0.003162 33.54x -97.02%
1024x1024 float32 3 0.50 normal scalar 0.083505 0.007947 10.51x -90.48%
1024x1024 float32 3 0.50 normal sse42 0.083505 0.003595 23.23x -95.69%
1024x1024 float32 3 0.50 normal avx2 0.083505 0.002396 34.85x -97.13%
1024x1024 float32 3 0.50 soft_light scalar 0.116328 0.010063 11.56x -91.35%
1024x1024 float32 3 0.50 soft_light sse42 0.116328 0.004263 27.29x -96.34%
1024x1024 float32 3 0.50 soft_light avx2 0.116328 0.003077 37.80x -97.35%
1024x1024 float32 3 0.50 lighten_only scalar 0.098583 0.012342 7.99x -87.48%
1024x1024 float32 3 0.50 lighten_only sse42 0.098583 0.003588 27.48x -96.36%
1024x1024 float32 3 0.50 lighten_only avx2 0.098583 0.003156 31.24x -96.80%
1024x1024 float32 3 0.50 screen scalar 0.099742 0.009306 10.72x -90.67%
1024x1024 float32 3 0.50 screen sse42 0.099742 0.003905 25.54x -96.08%
1024x1024 float32 3 0.50 screen avx2 0.099742 0.003066 32.53x -96.93%
1024x1024 float32 3 0.50 dodge scalar 0.095159 0.010322 9.22x -89.15%
1024x1024 float32 3 0.50 dodge sse42 0.095159 0.004425 21.50x -95.35%
1024x1024 float32 3 0.50 dodge avx2 0.095159 0.003153 30.18x -96.69%
1024x1024 float32 3 0.50 addition scalar 0.090771 0.025361 3.58x -72.06%
1024x1024 float32 3 0.50 addition sse42 0.090771 0.003725 24.37x -95.90%
1024x1024 float32 3 0.50 addition avx2 0.090771 0.003074 29.53x -96.61%
1024x1024 float32 3 0.50 darken_only scalar 0.089637 0.011961 7.49x -86.66%
1024x1024 float32 3 0.50 darken_only sse42 0.089637 0.003513 25.51x -96.08%
1024x1024 float32 3 0.50 darken_only avx2 0.089637 0.002934 30.55x -96.73%
1024x1024 float32 3 0.50 multiply scalar 0.091740 0.008749 10.49x -90.46%
1024x1024 float32 3 0.50 multiply sse42 0.091740 0.003538 25.93x -96.14%
1024x1024 float32 3 0.50 multiply avx2 0.091740 0.002969 30.90x -96.76%
1024x1024 float32 3 0.50 hard_light scalar 0.124994 0.028500 4.39x -77.20%
1024x1024 float32 3 0.50 hard_light sse42 0.124994 0.004353 28.71x -96.52%
1024x1024 float32 3 0.50 hard_light avx2 0.124994 0.003023 41.35x -97.58%
1024x1024 float32 3 0.50 difference scalar 0.120875 0.008875 13.62x -92.66%
1024x1024 float32 3 0.50 difference sse42 0.120875 0.003768 32.08x -96.88%
1024x1024 float32 3 0.50 difference avx2 0.120875 0.002989 40.44x -97.53%
1024x1024 float32 3 0.50 subtract scalar 0.089327 0.011248 7.94x -87.41%
1024x1024 float32 3 0.50 subtract sse42 0.089327 0.003877 23.04x -95.66%
1024x1024 float32 3 0.50 subtract avx2 0.089327 0.003067 29.12x -96.57%
1024x1024 float32 3 0.50 grain_extract scalar 0.092664 0.016290 5.69x -82.42%
1024x1024 float32 3 0.50 grain_extract sse42 0.092664 0.003849 24.07x -95.85%
1024x1024 float32 3 0.50 grain_extract avx2 0.092664 0.003027 30.61x -96.73%
1024x1024 float32 3 0.50 grain_merge scalar 0.092773 0.016445 5.64x -82.27%
1024x1024 float32 3 0.50 grain_merge sse42 0.092773 0.004002 23.18x -95.69%
1024x1024 float32 3 0.50 grain_merge avx2 0.092773 0.003070 30.22x -96.69%
1024x1024 float32 3 0.50 divide scalar 0.093690 0.009880 9.48x -89.46%
1024x1024 float32 3 0.50 divide sse42 0.093690 0.004248 22.06x -95.47%
1024x1024 float32 3 0.50 divide avx2 0.093690 0.003007 31.15x -96.79%
1024x1024 float32 3 0.50 overlay scalar 0.118617 0.026093 4.55x -78.00%
1024x1024 float32 3 0.50 overlay sse42 0.118617 0.004260 27.85x -96.41%
1024x1024 float32 3 0.50 overlay avx2 0.118617 0.003074 38.58x -97.41%
1024x1024 float32 4 0.50 normal scalar 0.063601 0.009967 6.38x -84.33%
1024x1024 float32 4 0.50 normal sse42 0.063601 0.002920 21.78x -95.41%
1024x1024 float32 4 0.50 normal avx2 0.063601 0.004480 14.20x -92.96%
1024x1024 float32 4 0.50 soft_light scalar 0.097016 0.011639 8.34x -88.00%
1024x1024 float32 4 0.50 soft_light sse42 0.097016 0.003163 30.67x -96.74%
1024x1024 float32 4 0.50 soft_light avx2 0.097016 0.003113 31.16x -96.79%
1024x1024 float32 4 0.50 lighten_only scalar 0.070163 0.012308 5.70x -82.46%
1024x1024 float32 4 0.50 lighten_only sse42 0.070163 0.002959 23.71x -95.78%
1024x1024 float32 4 0.50 lighten_only avx2 0.070163 0.003136 22.37x -95.53%
1024x1024 float32 4 0.50 screen scalar 0.073247 0.010763 6.81x -85.31%
1024x1024 float32 4 0.50 screen sse42 0.073247 0.002961 24.74x -95.96%
1024x1024 float32 4 0.50 screen avx2 0.073247 0.003084 23.75x -95.79%
1024x1024 float32 4 0.50 dodge scalar 0.073465 0.012168 6.04x -83.44%
1024x1024 float32 4 0.50 dodge sse42 0.073465 0.003503 20.97x -95.23%
1024x1024 float32 4 0.50 dodge avx2 0.073465 0.003065 23.97x -95.83%
1024x1024 float32 4 0.50 addition scalar 0.070711 0.021524 3.29x -69.56%
1024x1024 float32 4 0.50 addition sse42 0.070711 0.003230 21.89x -95.43%
1024x1024 float32 4 0.50 addition avx2 0.070711 0.003196 22.13x -95.48%
1024x1024 float32 4 0.50 darken_only scalar 0.070094 0.012382 5.66x -82.34%
1024x1024 float32 4 0.50 darken_only sse42 0.070094 0.003118 22.48x -95.55%
1024x1024 float32 4 0.50 darken_only avx2 0.070094 0.003098 22.63x -95.58%
1024x1024 float32 4 0.50 multiply scalar 0.071957 0.010373 6.94x -85.59%
1024x1024 float32 4 0.50 multiply sse42 0.071957 0.003409 21.11x -95.26%
1024x1024 float32 4 0.50 multiply avx2 0.071957 0.003166 22.73x -95.60%
1024x1024 float32 4 0.50 hard_light scalar 0.105849 0.029988 3.53x -71.67%
1024x1024 float32 4 0.50 hard_light sse42 0.105849 0.003628 29.17x -96.57%
1024x1024 float32 4 0.50 hard_light avx2 0.105849 0.003139 33.72x -97.03%
1024x1024 float32 4 0.50 difference scalar 0.101612 0.010435 9.74x -89.73%
1024x1024 float32 4 0.50 difference sse42 0.101612 0.002947 34.48x -97.10%
1024x1024 float32 4 0.50 difference avx2 0.101612 0.003239 31.38x -96.81%
1024x1024 float32 4 0.50 subtract scalar 0.070172 0.013966 5.02x -80.10%
1024x1024 float32 4 0.50 subtract sse42 0.070172 0.003253 21.57x -95.36%
1024x1024 float32 4 0.50 subtract avx2 0.070172 0.003146 22.30x -95.52%
1024x1024 float32 4 0.50 grain_extract scalar 0.073062 0.017063 4.28x -76.65%
1024x1024 float32 4 0.50 grain_extract sse42 0.073062 0.003076 23.75x -95.79%
1024x1024 float32 4 0.50 grain_extract avx2 0.073062 0.003149 23.21x -95.69%
1024x1024 float32 4 0.50 grain_merge scalar 0.072788 0.017026 4.28x -76.61%
1024x1024 float32 4 0.50 grain_merge sse42 0.072788 0.003053 23.85x -95.81%
1024x1024 float32 4 0.50 grain_merge avx2 0.072788 0.003087 23.58x -95.76%
1024x1024 float32 4 0.50 divide scalar 0.074411 0.011527 6.46x -84.51%
1024x1024 float32 4 0.50 divide sse42 0.074411 0.003182 23.39x -95.72%
1024x1024 float32 4 0.50 divide avx2 0.074411 0.003106 23.96x -95.83%
1024x1024 float32 4 0.50 overlay scalar 0.099707 0.027888 3.58x -72.03%
1024x1024 float32 4 0.50 overlay sse42 0.099707 0.003340 29.85x -96.65%
1024x1024 float32 4 0.50 overlay avx2 0.099707 0.003180 31.36x -96.81%
2048x2048 uint8 3 0.50 normal scalar 0.374464 0.103645 3.61x -72.32%
2048x2048 uint8 3 0.50 normal sse42 0.374464 0.044552 8.41x -88.10%
2048x2048 uint8 3 0.50 normal avx2 0.374464 0.045008 8.32x -87.98%
2048x2048 uint8 3 0.50 soft_light scalar 0.483461 0.115179 4.20x -76.18%
2048x2048 uint8 3 0.50 soft_light sse42 0.483461 0.055359 8.73x -88.55%
2048x2048 uint8 3 0.50 soft_light avx2 0.483461 0.051581 9.37x -89.33%
2048x2048 uint8 3 0.50 lighten_only scalar 0.369076 0.123138 3.00x -66.64%
2048x2048 uint8 3 0.50 lighten_only sse42 0.369076 0.050094 7.37x -86.43%
2048x2048 uint8 3 0.50 lighten_only avx2 0.369076 0.048164 7.66x -86.95%
2048x2048 uint8 3 0.50 screen scalar 0.387794 0.111794 3.47x -71.17%
2048x2048 uint8 3 0.50 screen sse42 0.387794 0.051754 7.49x -86.65%
2048x2048 uint8 3 0.50 screen avx2 0.387794 0.049450 7.84x -87.25%
2048x2048 uint8 3 0.50 dodge scalar 0.392020 0.115456 3.40x -70.55%
2048x2048 uint8 3 0.50 dodge sse42 0.392020 0.056414 6.95x -85.61%
2048x2048 uint8 3 0.50 dodge avx2 0.392020 0.052008 7.54x -86.73%
2048x2048 uint8 3 0.50 addition scalar 0.378307 0.160187 2.36x -57.66%
2048x2048 uint8 3 0.50 addition sse42 0.378307 0.051064 7.41x -86.50%
2048x2048 uint8 3 0.50 addition avx2 0.378307 0.048815 7.75x -87.10%
2048x2048 uint8 3 0.50 darken_only scalar 0.366131 0.123695 2.96x -66.22%
2048x2048 uint8 3 0.50 darken_only sse42 0.366131 0.050151 7.30x -86.30%
2048x2048 uint8 3 0.50 darken_only avx2 0.366131 0.048195 7.60x -86.84%
2048x2048 uint8 3 0.50 multiply scalar 0.379532 0.112395 3.38x -70.39%
2048x2048 uint8 3 0.50 multiply sse42 0.379532 0.050151 7.57x -86.79%
2048x2048 uint8 3 0.50 multiply avx2 0.379532 0.048644 7.80x -87.18%
2048x2048 uint8 3 0.50 hard_light scalar 0.532761 0.189780 2.81x -64.38%
2048x2048 uint8 3 0.50 hard_light sse42 0.532761 0.056670 9.40x -89.36%
2048x2048 uint8 3 0.50 hard_light avx2 0.532761 0.052141 10.22x -90.21%
2048x2048 uint8 3 0.50 difference scalar 0.486546 0.112062 4.34x -76.97%
2048x2048 uint8 3 0.50 difference sse42 0.486546 0.050259 9.68x -89.67%
2048x2048 uint8 3 0.50 difference avx2 0.486546 0.048119 10.11x -90.11%
2048x2048 uint8 3 0.50 subtract scalar 0.375578 0.103654 3.62x -72.40%
2048x2048 uint8 3 0.50 subtract sse42 0.375578 0.054861 6.85x -85.39%
2048x2048 uint8 3 0.50 subtract avx2 0.375578 0.050844 7.39x -86.46%
2048x2048 uint8 3 0.50 grain_extract scalar 0.386779 0.135699 2.85x -64.92%
2048x2048 uint8 3 0.50 grain_extract sse42 0.386779 0.055061 7.02x -85.76%
2048x2048 uint8 3 0.50 grain_extract avx2 0.386779 0.050881 7.60x -86.84%
2048x2048 uint8 3 0.50 grain_merge scalar 0.384837 0.135041 2.85x -64.91%
2048x2048 uint8 3 0.50 grain_merge sse42 0.384837 0.054198 7.10x -85.92%
2048x2048 uint8 3 0.50 grain_merge avx2 0.384837 0.050985 7.55x -86.75%
2048x2048 uint8 3 0.50 divide scalar 0.393058 0.112230 3.50x -71.45%
2048x2048 uint8 3 0.50 divide sse42 0.393058 0.054405 7.22x -86.16%
2048x2048 uint8 3 0.50 divide avx2 0.393058 0.050570 7.77x -87.13%
2048x2048 uint8 3 0.50 overlay scalar 0.490372 0.183278 2.68x -62.62%
2048x2048 uint8 3 0.50 overlay sse42 0.490372 0.054802 8.95x -88.82%
2048x2048 uint8 3 0.50 overlay avx2 0.490372 0.051032 9.61x -89.59%
2048x2048 uint8 4 0.50 normal scalar 0.270092 0.083275 3.24x -69.17%
2048x2048 uint8 4 0.50 normal sse42 0.270092 0.011070 24.40x -95.90%
2048x2048 uint8 4 0.50 normal avx2 0.270092 0.010054 26.86x -96.28%
2048x2048 uint8 4 0.50 soft_light scalar 0.379618 0.103523 3.67x -72.73%
2048x2048 uint8 4 0.50 soft_light sse42 0.379618 0.014022 27.07x -96.31%
2048x2048 uint8 4 0.50 soft_light avx2 0.379618 0.012421 30.56x -96.73%
2048x2048 uint8 4 0.50 lighten_only scalar 0.264956 0.109360 2.42x -58.73%
2048x2048 uint8 4 0.50 lighten_only sse42 0.264956 0.011998 22.08x -95.47%
2048x2048 uint8 4 0.50 lighten_only avx2 0.264956 0.011967 22.14x -95.48%
2048x2048 uint8 4 0.50 screen scalar 0.284002 0.098384 2.89x -65.36%
2048x2048 uint8 4 0.50 screen sse42 0.284002 0.013233 21.46x -95.34%
2048x2048 uint8 4 0.50 screen avx2 0.284002 0.012438 22.83x -95.62%
2048x2048 uint8 4 0.50 dodge scalar 0.286230 0.102536 2.79x -64.18%
2048x2048 uint8 4 0.50 dodge sse42 0.286230 0.014821 19.31x -94.82%
2048x2048 uint8 4 0.50 dodge avx2 0.286230 0.012204 23.45x -95.74%
2048x2048 uint8 4 0.50 addition scalar 0.273076 0.123245 2.22x -54.87%
2048x2048 uint8 4 0.50 addition sse42 0.273076 0.016178 16.88x -94.08%
2048x2048 uint8 4 0.50 addition avx2 0.273076 0.012549 21.76x -95.40%
2048x2048 uint8 4 0.50 darken_only scalar 0.261902 0.109916 2.38x -58.03%
2048x2048 uint8 4 0.50 darken_only sse42 0.261902 0.012068 21.70x -95.39%
2048x2048 uint8 4 0.50 darken_only avx2 0.261902 0.012016 21.80x -95.41%
2048x2048 uint8 4 0.50 multiply scalar 0.272909 0.099279 2.75x -63.62%
2048x2048 uint8 4 0.50 multiply sse42 0.272909 0.012472 21.88x -95.43%
2048x2048 uint8 4 0.50 multiply avx2 0.272909 0.011845 23.04x -95.66%
2048x2048 uint8 4 0.50 hard_light scalar 0.428528 0.164495 2.61x -61.61%
2048x2048 uint8 4 0.50 hard_light sse42 0.428528 0.015101 28.38x -96.48%
2048x2048 uint8 4 0.50 hard_light avx2 0.428528 0.012414 34.52x -97.10%
2048x2048 uint8 4 0.50 difference scalar 0.384338 0.098096 3.92x -74.48%
2048x2048 uint8 4 0.50 difference sse42 0.384338 0.012252 31.37x -96.81%
2048x2048 uint8 4 0.50 difference avx2 0.384338 0.012060 31.87x -96.86%
2048x2048 uint8 4 0.50 subtract scalar 0.272340 0.094237 2.89x -65.40%
2048x2048 uint8 4 0.50 subtract sse42 0.272340 0.016731 16.28x -93.86%
2048x2048 uint8 4 0.50 subtract avx2 0.272340 0.012587 21.64x -95.38%
2048x2048 uint8 4 0.50 grain_extract scalar 0.283471 0.120733 2.35x -57.41%
2048x2048 uint8 4 0.50 grain_extract sse42 0.283471 0.014587 19.43x -94.85%
2048x2048 uint8 4 0.50 grain_extract avx2 0.283471 0.012207 23.22x -95.69%
2048x2048 uint8 4 0.50 grain_merge scalar 0.281448 0.119447 2.36x -57.56%
2048x2048 uint8 4 0.50 grain_merge sse42 0.281448 0.013379 21.04x -95.25%
2048x2048 uint8 4 0.50 grain_merge avx2 0.281448 0.012191 23.09x -95.67%
2048x2048 uint8 4 0.50 divide scalar 0.289081 0.100308 2.88x -65.30%
2048x2048 uint8 4 0.50 divide sse42 0.289081 0.013670 21.15x -95.27%
2048x2048 uint8 4 0.50 divide avx2 0.289081 0.012018 24.05x -95.84%
2048x2048 uint8 4 0.50 overlay scalar 0.391074 0.158915 2.46x -59.36%
2048x2048 uint8 4 0.50 overlay sse42 0.391074 0.014163 27.61x -96.38%
2048x2048 uint8 4 0.50 overlay avx2 0.391074 0.012246 31.93x -96.87%
2048x2048 float32 3 0.50 normal scalar 0.321316 0.036645 8.77x -88.60%
2048x2048 float32 3 0.50 normal sse42 0.321316 0.019276 16.67x -94.00%
2048x2048 float32 3 0.50 normal avx2 0.321316 0.014435 22.26x -95.51%
2048x2048 float32 3 0.50 soft_light scalar 0.429202 0.044338 9.68x -89.67%
2048x2048 float32 3 0.50 soft_light sse42 0.429202 0.021857 19.64x -94.91%
2048x2048 float32 3 0.50 soft_light avx2 0.429202 0.017080 25.13x -96.02%
2048x2048 float32 3 0.50 lighten_only scalar 0.314519 0.052067 6.04x -83.45%
2048x2048 float32 3 0.50 lighten_only sse42 0.314519 0.018553 16.95x -94.10%
2048x2048 float32 3 0.50 lighten_only avx2 0.314519 0.016470 19.10x -94.76%
2048x2048 float32 3 0.50 screen scalar 0.336441 0.040346 8.34x -88.01%
2048x2048 float32 3 0.50 screen sse42 0.336441 0.020338 16.54x -93.95%
2048x2048 float32 3 0.50 screen avx2 0.336441 0.016899 19.91x -94.98%
2048x2048 float32 3 0.50 dodge scalar 0.336910 0.045444 7.41x -86.51%
2048x2048 float32 3 0.50 dodge sse42 0.336910 0.022459 15.00x -93.33%
2048x2048 float32 3 0.50 dodge avx2 0.336910 0.016955 19.87x -94.97%
2048x2048 float32 3 0.50 addition scalar 0.325902 0.104359 3.12x -67.98%
2048x2048 float32 3 0.50 addition sse42 0.325902 0.020338 16.02x -93.76%
2048x2048 float32 3 0.50 addition avx2 0.325902 0.017048 19.12x -94.77%
2048x2048 float32 3 0.50 darken_only scalar 0.315598 0.051633 6.11x -83.64%
2048x2048 float32 3 0.50 darken_only sse42 0.315598 0.018989 16.62x -93.98%
2048x2048 float32 3 0.50 darken_only avx2 0.315598 0.016517 19.11x -94.77%
2048x2048 float32 3 0.50 multiply scalar 0.326323 0.039715 8.22x -87.83%
2048x2048 float32 3 0.50 multiply sse42 0.326323 0.018845 17.32x -94.23%
2048x2048 float32 3 0.50 multiply avx2 0.326323 0.016419 19.88x -94.97%
2048x2048 float32 3 0.50 hard_light scalar 0.479711 0.118394 4.05x -75.32%
2048x2048 float32 3 0.50 hard_light sse42 0.479711 0.022660 21.17x -95.28%
2048x2048 float32 3 0.50 hard_light avx2 0.479711 0.017210 27.87x -96.41%
2048x2048 float32 3 0.50 difference scalar 0.448123 0.040332 11.11x -91.00%
2048x2048 float32 3 0.50 difference sse42 0.448123 0.020253 22.13x -95.48%
2048x2048 float32 3 0.50 difference avx2 0.448123 0.017314 25.88x -96.14%
2048x2048 float32 3 0.50 subtract scalar 0.330480 0.049264 6.71x -85.09%
2048x2048 float32 3 0.50 subtract sse42 0.330480 0.020582 16.06x -93.77%
2048x2048 float32 3 0.50 subtract avx2 0.330480 0.017168 19.25x -94.81%
2048x2048 float32 3 0.50 grain_extract scalar 0.337144 0.069314 4.86x -79.44%
2048x2048 float32 3 0.50 grain_extract sse42 0.337144 0.020634 16.34x -93.88%
2048x2048 float32 3 0.50 grain_extract avx2 0.337144 0.017011 19.82x -94.95%
2048x2048 float32 3 0.50 grain_merge scalar 0.334497 0.069459 4.82x -79.23%
2048x2048 float32 3 0.50 grain_merge sse42 0.334497 0.020719 16.14x -93.81%
2048x2048 float32 3 0.50 grain_merge avx2 0.334497 0.016864 19.84x -94.96%
2048x2048 float32 3 0.50 divide scalar 0.340278 0.044082 7.72x -87.05%
2048x2048 float32 3 0.50 divide sse42 0.340278 0.021898 15.54x -93.56%
2048x2048 float32 3 0.50 divide avx2 0.340278 0.016871 20.17x -95.04%
2048x2048 float32 3 0.50 overlay scalar 0.444791 0.111071 4.00x -75.03%
2048x2048 float32 3 0.50 overlay sse42 0.444791 0.021504 20.68x -95.17%
2048x2048 float32 3 0.50 overlay avx2 0.444791 0.016844 26.41x -96.21%
2048x2048 float32 4 0.50 normal scalar 0.251054 0.045493 5.52x -81.88%
2048x2048 float32 4 0.50 normal sse42 0.251054 0.015821 15.87x -93.70%
2048x2048 float32 4 0.50 normal avx2 0.251054 0.019886 12.62x -92.08%
2048x2048 float32 4 0.50 soft_light scalar 0.359495 0.052759 6.81x -85.32%
2048x2048 float32 4 0.50 soft_light sse42 0.359495 0.018929 18.99x -94.73%
2048x2048 float32 4 0.50 soft_light avx2 0.359495 0.018609 19.32x -94.82%
2048x2048 float32 4 0.50 lighten_only scalar 0.244256 0.056936 4.29x -76.69%
2048x2048 float32 4 0.50 lighten_only sse42 0.244256 0.017613 13.87x -92.79%
2048x2048 float32 4 0.50 lighten_only avx2 0.244256 0.017948 13.61x -92.65%
2048x2048 float32 4 0.50 screen scalar 0.265482 0.049522 5.36x -81.35%
2048x2048 float32 4 0.50 screen sse42 0.265482 0.017429 15.23x -93.44%
2048x2048 float32 4 0.50 screen avx2 0.265482 0.017973 14.77x -93.23%
2048x2048 float32 4 0.50 dodge scalar 0.266317 0.054908 4.85x -79.38%
2048x2048 float32 4 0.50 dodge sse42 0.266317 0.020365 13.08x -92.35%
2048x2048 float32 4 0.50 dodge avx2 0.266317 0.018294 14.56x -93.13%
2048x2048 float32 4 0.50 addition scalar 0.253254 0.093490 2.71x -63.08%
2048x2048 float32 4 0.50 addition sse42 0.253254 0.018933 13.38x -92.52%
2048x2048 float32 4 0.50 addition avx2 0.253254 0.018850 13.43x -92.56%
2048x2048 float32 4 0.50 darken_only scalar 0.243009 0.056450 4.30x -76.77%
2048x2048 float32 4 0.50 darken_only sse42 0.243009 0.017792 13.66x -92.68%
2048x2048 float32 4 0.50 darken_only avx2 0.243009 0.018091 13.43x -92.56%
2048x2048 float32 4 0.50 multiply scalar 0.252708 0.047275 5.35x -81.29%
2048x2048 float32 4 0.50 multiply sse42 0.252708 0.018295 13.81x -92.76%
2048x2048 float32 4 0.50 multiply avx2 0.252708 0.017982 14.05x -92.88%
2048x2048 float32 4 0.50 hard_light scalar 0.406807 0.127620 3.19x -68.63%
2048x2048 float32 4 0.50 hard_light sse42 0.406807 0.020327 20.01x -95.00%
2048x2048 float32 4 0.50 hard_light avx2 0.406807 0.018844 21.59x -95.37%
2048x2048 float32 4 0.50 difference scalar 0.360045 0.048024 7.50x -86.66%
2048x2048 float32 4 0.50 difference sse42 0.360045 0.018832 19.12x -94.77%
2048x2048 float32 4 0.50 difference avx2 0.360045 0.018371 19.60x -94.90%
2048x2048 float32 4 0.50 subtract scalar 0.253724 0.063271 4.01x -75.06%
2048x2048 float32 4 0.50 subtract sse42 0.253724 0.019295 13.15x -92.40%
2048x2048 float32 4 0.50 subtract avx2 0.253724 0.018646 13.61x -92.65%
2048x2048 float32 4 0.50 grain_extract scalar 0.261850 0.074524 3.51x -71.54%
2048x2048 float32 4 0.50 grain_extract sse42 0.261850 0.018604 14.07x -92.90%
2048x2048 float32 4 0.50 grain_extract avx2 0.261850 0.018078 14.48x -93.10%
2048x2048 float32 4 0.50 grain_merge scalar 0.260619 0.074579 3.49x -71.38%
2048x2048 float32 4 0.50 grain_merge sse42 0.260619 0.018503 14.09x -92.90%
2048x2048 float32 4 0.50 grain_merge avx2 0.260619 0.018099 14.40x -93.06%
2048x2048 float32 4 0.50 divide scalar 0.267110 0.051379 5.20x -80.76%
2048x2048 float32 4 0.50 divide sse42 0.267110 0.018851 14.17x -92.94%
2048x2048 float32 4 0.50 divide avx2 0.267110 0.018110 14.75x -93.22%
2048x2048 float32 4 0.50 overlay scalar 0.371593 0.118348 3.14x -68.15%
2048x2048 float32 4 0.50 overlay sse42 0.371593 0.019552 19.01x -94.74%
2048x2048 float32 4 0.50 overlay avx2 0.371593 0.018667 19.91x -94.98%
1280x720 uint8 3 0.50 normal scalar 0.080071 0.022203 3.61x -72.27%
1280x720 uint8 3 0.50 normal sse42 0.080071 0.009594 8.35x -88.02%
1280x720 uint8 3 0.50 normal avx2 0.080071 0.009693 8.26x -87.89%
1280x720 uint8 3 0.50 soft_light scalar 0.107784 0.024711 4.36x -77.07%
1280x720 uint8 3 0.50 soft_light sse42 0.107784 0.011889 9.07x -88.97%
1280x720 uint8 3 0.50 soft_light avx2 0.107784 0.011171 9.65x -89.64%
1280x720 uint8 3 0.50 lighten_only scalar 0.084644 0.026467 3.20x -68.73%
1280x720 uint8 3 0.50 lighten_only sse42 0.084644 0.010815 7.83x -87.22%
1280x720 uint8 3 0.50 lighten_only avx2 0.084644 0.010427 8.12x -87.68%
1280x720 uint8 3 0.50 screen scalar 0.087560 0.024071 3.64x -72.51%
1280x720 uint8 3 0.50 screen sse42 0.087560 0.011115 7.88x -87.31%
1280x720 uint8 3 0.50 screen avx2 0.087560 0.010817 8.09x -87.65%
1280x720 uint8 3 0.50 dodge scalar 0.087554 0.025050 3.50x -71.39%
1280x720 uint8 3 0.50 dodge sse42 0.087554 0.012218 7.17x -86.05%
1280x720 uint8 3 0.50 dodge avx2 0.087554 0.011217 7.81x -87.19%
1280x720 uint8 3 0.50 addition scalar 0.082146 0.035215 2.33x -57.13%
1280x720 uint8 3 0.50 addition sse42 0.082146 0.011030 7.45x -86.57%
1280x720 uint8 3 0.50 addition avx2 0.082146 0.010510 7.82x -87.21%
1280x720 uint8 3 0.50 darken_only scalar 0.085138 0.026807 3.18x -68.51%
1280x720 uint8 3 0.50 darken_only sse42 0.085138 0.010804 7.88x -87.31%
1280x720 uint8 3 0.50 darken_only avx2 0.085138 0.010434 8.16x -87.74%
1280x720 uint8 3 0.50 multiply scalar 0.085644 0.023979 3.57x -72.00%
1280x720 uint8 3 0.50 multiply sse42 0.085644 0.010806 7.93x -87.38%
1280x720 uint8 3 0.50 multiply avx2 0.085644 0.010675 8.02x -87.54%
1280x720 uint8 3 0.50 hard_light scalar 0.114641 0.041178 2.78x -64.08%
1280x720 uint8 3 0.50 hard_light sse42 0.114641 0.012265 9.35x -89.30%
1280x720 uint8 3 0.50 hard_light avx2 0.114641 0.011194 10.24x -90.24%
1280x720 uint8 3 0.50 difference scalar 0.112069 0.024398 4.59x -78.23%
1280x720 uint8 3 0.50 difference sse42 0.112069 0.010812 10.36x -90.35%
1280x720 uint8 3 0.50 difference avx2 0.112069 0.010580 10.59x -90.56%
1280x720 uint8 3 0.50 subtract scalar 0.081920 0.022187 3.69x -72.92%
1280x720 uint8 3 0.50 subtract sse42 0.081920 0.011849 6.91x -85.54%
1280x720 uint8 3 0.50 subtract avx2 0.081920 0.011008 7.44x -86.56%
1280x720 uint8 3 0.50 grain_extract scalar 0.086326 0.029630 2.91x -65.68%
1280x720 uint8 3 0.50 grain_extract sse42 0.086326 0.011737 7.35x -86.40%
1280x720 uint8 3 0.50 grain_extract avx2 0.086326 0.010989 7.86x -87.27%
1280x720 uint8 3 0.50 grain_merge scalar 0.087316 0.029334 2.98x -66.40%
1280x720 uint8 3 0.50 grain_merge sse42 0.087316 0.011820 7.39x -86.46%
1280x720 uint8 3 0.50 grain_merge avx2 0.087316 0.010981 7.95x -87.42%
1280x720 uint8 3 0.50 divide scalar 0.088661 0.024478 3.62x -72.39%
1280x720 uint8 3 0.50 divide sse42 0.088661 0.011979 7.40x -86.49%
1280x720 uint8 3 0.50 divide avx2 0.088661 0.011273 7.86x -87.28%
1280x720 uint8 3 0.50 overlay scalar 0.108731 0.040331 2.70x -62.91%
1280x720 uint8 3 0.50 overlay sse42 0.108731 0.012291 8.85x -88.70%
1280x720 uint8 3 0.50 overlay avx2 0.108731 0.011166 9.74x -89.73%
1280x720 uint8 4 0.50 normal scalar 0.060620 0.018112 3.35x -70.12%
1280x720 uint8 4 0.50 normal sse42 0.060620 0.002405 25.20x -96.03%
1280x720 uint8 4 0.50 normal avx2 0.060620 0.002196 27.60x -96.38%
1280x720 uint8 4 0.50 soft_light scalar 0.094735 0.022692 4.17x -76.05%
1280x720 uint8 4 0.50 soft_light sse42 0.094735 0.003046 31.10x -96.78%
1280x720 uint8 4 0.50 soft_light avx2 0.094735 0.002724 34.78x -97.12%
1280x720 uint8 4 0.50 lighten_only scalar 0.071775 0.024017 2.99x -66.54%
1280x720 uint8 4 0.50 lighten_only sse42 0.071775 0.002705 26.53x -96.23%
1280x720 uint8 4 0.50 lighten_only avx2 0.071775 0.002658 27.00x -96.30%
1280x720 uint8 4 0.50 screen scalar 0.074851 0.021669 3.45x -71.05%
1280x720 uint8 4 0.50 screen sse42 0.074851 0.002902 25.79x -96.12%
1280x720 uint8 4 0.50 screen avx2 0.074851 0.002718 27.54x -96.37%
1280x720 uint8 4 0.50 dodge scalar 0.074690 0.022718 3.29x -69.58%
1280x720 uint8 4 0.50 dodge sse42 0.074690 0.003259 22.92x -95.64%
1280x720 uint8 4 0.50 dodge avx2 0.074690 0.002682 27.85x -96.41%
1280x720 uint8 4 0.50 addition scalar 0.070497 0.027039 2.61x -61.64%
1280x720 uint8 4 0.50 addition sse42 0.070497 0.003533 19.95x -94.99%
1280x720 uint8 4 0.50 addition avx2 0.070497 0.002804 25.15x -96.02%
1280x720 uint8 4 0.50 darken_only scalar 0.071545 0.024190 2.96x -66.19%
1280x720 uint8 4 0.50 darken_only sse42 0.071545 0.002647 27.03x -96.30%
1280x720 uint8 4 0.50 darken_only avx2 0.071545 0.002648 27.02x -96.30%
1280x720 uint8 4 0.50 multiply scalar 0.072742 0.021803 3.34x -70.03%
1280x720 uint8 4 0.50 multiply sse42 0.072742 0.002736 26.58x -96.24%
1280x720 uint8 4 0.50 multiply avx2 0.072742 0.002619 27.77x -96.40%
1280x720 uint8 4 0.50 hard_light scalar 0.102061 0.036052 2.83x -64.68%
1280x720 uint8 4 0.50 hard_light sse42 0.102061 0.003300 30.93x -96.77%
1280x720 uint8 4 0.50 hard_light avx2 0.102061 0.002735 37.32x -97.32%
1280x720 uint8 4 0.50 difference scalar 0.099504 0.021466 4.64x -78.43%
1280x720 uint8 4 0.50 difference sse42 0.099504 0.002698 36.88x -97.29%
1280x720 uint8 4 0.50 difference avx2 0.099504 0.002642 37.66x -97.34%
1280x720 uint8 4 0.50 subtract scalar 0.070561 0.020939 3.37x -70.32%
1280x720 uint8 4 0.50 subtract sse42 0.070561 0.003773 18.70x -94.65%
1280x720 uint8 4 0.50 subtract avx2 0.070561 0.002766 25.51x -96.08%
1280x720 uint8 4 0.50 grain_extract scalar 0.077527 0.026171 2.96x -66.24%
1280x720 uint8 4 0.50 grain_extract sse42 0.077527 0.002997 25.87x -96.13%
1280x720 uint8 4 0.50 grain_extract avx2 0.077527 0.002672 29.02x -96.55%
1280x720 uint8 4 0.50 grain_merge scalar 0.073560 0.026189 2.81x -64.40%
1280x720 uint8 4 0.50 grain_merge sse42 0.073560 0.002929 25.11x -96.02%
1280x720 uint8 4 0.50 grain_merge avx2 0.073560 0.002697 27.27x -96.33%
1280x720 uint8 4 0.50 divide scalar 0.075317 0.022110 3.41x -70.64%
1280x720 uint8 4 0.50 divide sse42 0.075317 0.003000 25.10x -96.02%
1280x720 uint8 4 0.50 divide avx2 0.075317 0.002643 28.50x -96.49%
1280x720 uint8 4 0.50 overlay scalar 0.096431 0.034995 2.76x -63.71%
1280x720 uint8 4 0.50 overlay sse42 0.096431 0.003136 30.75x -96.75%
1280x720 uint8 4 0.50 overlay avx2 0.096431 0.002704 35.66x -97.20%
1280x720 float32 3 0.50 normal scalar 0.069425 0.006986 9.94x -89.94%
1280x720 float32 3 0.50 normal sse42 0.069425 0.003117 22.28x -95.51%
1280x720 float32 3 0.50 normal avx2 0.069425 0.002109 32.92x -96.96%
1280x720 float32 3 0.50 soft_light scalar 0.099034 0.008611 11.50x -91.30%
1280x720 float32 3 0.50 soft_light sse42 0.099034 0.003609 27.44x -96.36%
1280x720 float32 3 0.50 soft_light avx2 0.099034 0.002588 38.27x -97.39%
1280x720 float32 3 0.50 lighten_only scalar 0.076219 0.010349 7.37x -86.42%
1280x720 float32 3 0.50 lighten_only sse42 0.076219 0.002992 25.48x -96.07%
1280x720 float32 3 0.50 lighten_only avx2 0.076219 0.002483 30.69x -96.74%
1280x720 float32 3 0.50 screen scalar 0.079757 0.007915 10.08x -90.08%
1280x720 float32 3 0.50 screen sse42 0.079757 0.003314 24.07x -95.85%
1280x720 float32 3 0.50 screen avx2 0.079757 0.002587 30.84x -96.76%
1280x720 float32 3 0.50 dodge scalar 0.079434 0.008935 8.89x -88.75%
1280x720 float32 3 0.50 dodge sse42 0.079434 0.003829 20.74x -95.18%
1280x720 float32 3 0.50 dodge avx2 0.079434 0.002695 29.47x -96.61%
1280x720 float32 3 0.50 addition scalar 0.075138 0.021973 3.42x -70.76%
1280x720 float32 3 0.50 addition sse42 0.075138 0.003200 23.48x -95.74%
1280x720 float32 3 0.50 addition avx2 0.075138 0.002603 28.87x -96.54%
1280x720 float32 3 0.50 darken_only scalar 0.075713 0.010376 7.30x -86.30%
1280x720 float32 3 0.50 darken_only sse42 0.075713 0.002974 25.46x -96.07%
1280x720 float32 3 0.50 darken_only avx2 0.075713 0.002512 30.14x -96.68%
1280x720 float32 3 0.50 multiply scalar 0.077318 0.007687 10.06x -90.06%
1280x720 float32 3 0.50 multiply sse42 0.077318 0.003087 25.05x -96.01%
1280x720 float32 3 0.50 multiply avx2 0.077318 0.002474 31.26x -96.80%
1280x720 float32 3 0.50 hard_light scalar 0.106360 0.025090 4.24x -76.41%
1280x720 float32 3 0.50 hard_light sse42 0.106360 0.003859 27.56x -96.37%
1280x720 float32 3 0.50 hard_light avx2 0.106360 0.002638 40.31x -97.52%
1280x720 float32 3 0.50 difference scalar 0.107376 0.007636 14.06x -92.89%
1280x720 float32 3 0.50 difference sse42 0.107376 0.003177 33.80x -97.04%
1280x720 float32 3 0.50 difference avx2 0.107376 0.002629 40.84x -97.55%
1280x720 float32 3 0.50 subtract scalar 0.075215 0.009711 7.75x -87.09%
1280x720 float32 3 0.50 subtract sse42 0.075215 0.003354 22.43x -95.54%
1280x720 float32 3 0.50 subtract avx2 0.075215 0.002634 28.56x -96.50%
1280x720 float32 3 0.50 grain_extract scalar 0.078281 0.014250 5.49x -81.80%
1280x720 float32 3 0.50 grain_extract sse42 0.078281 0.003380 23.16x -95.68%
1280x720 float32 3 0.50 grain_extract avx2 0.078281 0.002587 30.26x -96.70%
1280x720 float32 3 0.50 grain_merge scalar 0.079160 0.014203 5.57x -82.06%
1280x720 float32 3 0.50 grain_merge sse42 0.079160 0.003341 23.69x -95.78%
1280x720 float32 3 0.50 grain_merge avx2 0.079160 0.002648 29.89x -96.65%
1280x720 float32 3 0.50 divide scalar 0.080059 0.008487 9.43x -89.40%
1280x720 float32 3 0.50 divide sse42 0.080059 0.003652 21.92x -95.44%
1280x720 float32 3 0.50 divide avx2 0.080059 0.002572 31.12x -96.79%
1280x720 float32 3 0.50 overlay scalar 0.102808 0.023120 4.45x -77.51%
1280x720 float32 3 0.50 overlay sse42 0.102808 0.003546 28.99x -96.55%
1280x720 float32 3 0.50 overlay avx2 0.102808 0.002545 40.39x -97.52%
1280x720 float32 4 0.50 normal scalar 0.053552 0.008554 6.26x -84.03%
1280x720 float32 4 0.50 normal sse42 0.053552 0.002480 21.60x -95.37%
1280x720 float32 4 0.50 normal avx2 0.053552 0.002320 23.08x -95.67%
1280x720 float32 4 0.50 soft_light scalar 0.086830 0.010181 8.53x -88.27%
1280x720 float32 4 0.50 soft_light sse42 0.086830 0.002731 31.79x -96.85%
1280x720 float32 4 0.50 soft_light avx2 0.086830 0.002604 33.35x -97.00%
1280x720 float32 4 0.50 lighten_only scalar 0.063381 0.010671 5.94x -83.16%
1280x720 float32 4 0.50 lighten_only sse42 0.063381 0.002535 25.01x -96.00%
1280x720 float32 4 0.50 lighten_only avx2 0.063381 0.002641 24.00x -95.83%
1280x720 float32 4 0.50 screen scalar 0.065418 0.009376 6.98x -85.67%
1280x720 float32 4 0.50 screen sse42 0.065418 0.002774 23.58x -95.76%
1280x720 float32 4 0.50 screen avx2 0.065418 0.002590 25.26x -96.04%
1280x720 float32 4 0.50 dodge scalar 0.066256 0.010685 6.20x -83.87%
1280x720 float32 4 0.50 dodge sse42 0.066256 0.003119 21.24x -95.29%
1280x720 float32 4 0.50 dodge avx2 0.066256 0.002652 24.98x -96.00%
1280x720 float32 4 0.50 addition scalar 0.062488 0.018887 3.31x -69.78%
1280x720 float32 4 0.50 addition sse42 0.062488 0.002760 22.64x -95.58%
1280x720 float32 4 0.50 addition avx2 0.062488 0.002752 22.71x -95.60%
1280x720 float32 4 0.50 darken_only scalar 0.063886 0.010755 5.94x -83.16%
1280x720 float32 4 0.50 darken_only sse42 0.063886 0.002540 25.15x -96.02%
1280x720 float32 4 0.50 darken_only avx2 0.063886 0.002654 24.07x -95.85%
1280x720 float32 4 0.50 multiply scalar 0.064628 0.008938 7.23x -86.17%
1280x720 float32 4 0.50 multiply sse42 0.064628 0.002480 26.06x -96.16%
1280x720 float32 4 0.50 multiply avx2 0.064628 0.002697 23.96x -95.83%
1280x720 float32 4 0.50 hard_light scalar 0.094296 0.025976 3.63x -72.45%
1280x720 float32 4 0.50 hard_light sse42 0.094296 0.003240 29.10x -96.56%
1280x720 float32 4 0.50 hard_light avx2 0.094296 0.002662 35.42x -97.18%
1280x720 float32 4 0.50 difference scalar 0.089460 0.009272 9.65x -89.64%
1280x720 float32 4 0.50 difference sse42 0.089460 0.002693 33.22x -96.99%
1280x720 float32 4 0.50 difference avx2 0.089460 0.002657 33.67x -97.03%
1280x720 float32 4 0.50 subtract scalar 0.062571 0.012158 5.15x -80.57%
1280x720 float32 4 0.50 subtract sse42 0.062571 0.002819 22.19x -95.49%
1280x720 float32 4 0.50 subtract avx2 0.062571 0.002690 23.26x -95.70%
1280x720 float32 4 0.50 grain_extract scalar 0.065961 0.014832 4.45x -77.51%
1280x720 float32 4 0.50 grain_extract sse42 0.065961 0.002743 24.05x -95.84%
1280x720 float32 4 0.50 grain_extract avx2 0.065961 0.002657 24.82x -95.97%
1280x720 float32 4 0.50 grain_merge scalar 0.065400 0.014782 4.42x -77.40%
1280x720 float32 4 0.50 grain_merge sse42 0.065400 0.002769 23.62x -95.77%
1280x720 float32 4 0.50 grain_merge avx2 0.065400 0.002657 24.61x -95.94%
1280x720 float32 4 0.50 divide scalar 0.067018 0.010000 6.70x -85.08%
1280x720 float32 4 0.50 divide sse42 0.067018 0.002592 25.85x -96.13%
1280x720 float32 4 0.50 divide avx2 0.067018 0.002624 25.54x -96.09%
1280x720 float32 4 0.50 overlay scalar 0.088559 0.024248 3.65x -72.62%
1280x720 float32 4 0.50 overlay sse42 0.088559 0.002877 30.78x -96.75%
1280x720 float32 4 0.50 overlay avx2 0.088559 0.002663 33.26x -96.99%
1920x1080 uint8 3 0.50 normal scalar 0.178329 0.051089 3.49x -71.35%
1920x1080 uint8 3 0.50 normal sse42 0.178329 0.021518 8.29x -87.93%
1920x1080 uint8 3 0.50 normal avx2 0.178329 0.021954 8.12x -87.69%
1920x1080 uint8 3 0.50 soft_light scalar 0.231888 0.055632 4.17x -76.01%
1920x1080 uint8 3 0.50 soft_light sse42 0.231888 0.026752 8.67x -88.46%
1920x1080 uint8 3 0.50 soft_light avx2 0.231888 0.025119 9.23x -89.17%
1920x1080 uint8 3 0.50 lighten_only scalar 0.183124 0.059979 3.05x -67.25%
1920x1080 uint8 3 0.50 lighten_only sse42 0.183124 0.024337 7.52x -86.71%
1920x1080 uint8 3 0.50 lighten_only avx2 0.183124 0.023727 7.72x -87.04%
1920x1080 uint8 3 0.50 screen scalar 0.188155 0.054202 3.47x -71.19%
1920x1080 uint8 3 0.50 screen sse42 0.188155 0.025134 7.49x -86.64%
1920x1080 uint8 3 0.50 screen avx2 0.188155 0.024074 7.82x -87.21%
1920x1080 uint8 3 0.50 dodge scalar 0.187504 0.055930 3.35x -70.17%
1920x1080 uint8 3 0.50 dodge sse42 0.187504 0.027428 6.84x -85.37%
1920x1080 uint8 3 0.50 dodge avx2 0.187504 0.025177 7.45x -86.57%
1920x1080 uint8 3 0.50 addition scalar 0.183592 0.078733 2.33x -57.11%
1920x1080 uint8 3 0.50 addition sse42 0.183592 0.024795 7.40x -86.49%
1920x1080 uint8 3 0.50 addition avx2 0.183592 0.023970 7.66x -86.94%
1920x1080 uint8 3 0.50 darken_only scalar 0.182550 0.060096 3.04x -67.08%
1920x1080 uint8 3 0.50 darken_only sse42 0.182550 0.024335 7.50x -86.67%
1920x1080 uint8 3 0.50 darken_only avx2 0.182550 0.023663 7.71x -87.04%
1920x1080 uint8 3 0.50 multiply scalar 0.184255 0.054078 3.41x -70.65%
1920x1080 uint8 3 0.50 multiply sse42 0.184255 0.024528 7.51x -86.69%
1920x1080 uint8 3 0.50 multiply avx2 0.184255 0.023789 7.75x -87.09%
1920x1080 uint8 3 0.50 hard_light scalar 0.248438 0.092767 2.68x -62.66%
1920x1080 uint8 3 0.50 hard_light sse42 0.248438 0.027670 8.98x -88.86%
1920x1080 uint8 3 0.50 hard_light avx2 0.248438 0.025266 9.83x -89.83%
1920x1080 uint8 3 0.50 difference scalar 0.239445 0.054576 4.39x -77.21%
1920x1080 uint8 3 0.50 difference sse42 0.239445 0.024341 9.84x -89.83%
1920x1080 uint8 3 0.50 difference avx2 0.239445 0.023582 10.15x -90.15%
1920x1080 uint8 3 0.50 subtract scalar 0.183817 0.049855 3.69x -72.88%
1920x1080 uint8 3 0.50 subtract sse42 0.183817 0.026967 6.82x -85.33%
1920x1080 uint8 3 0.50 subtract avx2 0.183817 0.024730 7.43x -86.55%
1920x1080 uint8 3 0.50 grain_extract scalar 0.186623 0.065803 2.84x -64.74%
1920x1080 uint8 3 0.50 grain_extract sse42 0.186623 0.026441 7.06x -85.83%
1920x1080 uint8 3 0.50 grain_extract avx2 0.186623 0.024719 7.55x -86.75%
1920x1080 uint8 3 0.50 grain_merge scalar 0.186932 0.065918 2.84x -64.74%
1920x1080 uint8 3 0.50 grain_merge sse42 0.186932 0.026523 7.05x -85.81%
1920x1080 uint8 3 0.50 grain_merge avx2 0.186932 0.024638 7.59x -86.82%
1920x1080 uint8 3 0.50 divide scalar 0.189024 0.054808 3.45x -71.00%
1920x1080 uint8 3 0.50 divide sse42 0.189024 0.026883 7.03x -85.78%
1920x1080 uint8 3 0.50 divide avx2 0.189024 0.024999 7.56x -86.77%
1920x1080 uint8 3 0.50 overlay scalar 0.236708 0.090381 2.62x -61.82%
1920x1080 uint8 3 0.50 overlay sse42 0.236708 0.027276 8.68x -88.48%
1920x1080 uint8 3 0.50 overlay avx2 0.236708 0.024988 9.47x -89.44%
1920x1080 uint8 4 0.50 normal scalar 0.128106 0.040784 3.14x -68.16%
1920x1080 uint8 4 0.50 normal sse42 0.128106 0.005405 23.70x -95.78%
1920x1080 uint8 4 0.50 normal avx2 0.128106 0.004929 25.99x -96.15%
1920x1080 uint8 4 0.50 soft_light scalar 0.186097 0.051095 3.64x -72.54%
1920x1080 uint8 4 0.50 soft_light sse42 0.186097 0.006841 27.20x -96.32%
1920x1080 uint8 4 0.50 soft_light avx2 0.186097 0.006095 30.53x -96.72%
1920x1080 uint8 4 0.50 lighten_only scalar 0.135359 0.054362 2.49x -59.84%
1920x1080 uint8 4 0.50 lighten_only sse42 0.135359 0.005917 22.88x -95.63%
1920x1080 uint8 4 0.50 lighten_only avx2 0.135359 0.005913 22.89x -95.63%
1920x1080 uint8 4 0.50 screen scalar 0.143102 0.048798 2.93x -65.90%
1920x1080 uint8 4 0.50 screen sse42 0.143102 0.006545 21.87x -95.43%
1920x1080 uint8 4 0.50 screen avx2 0.143102 0.006082 23.53x -95.75%
1920x1080 uint8 4 0.50 dodge scalar 0.141218 0.050756 2.78x -64.06%
1920x1080 uint8 4 0.50 dodge sse42 0.141218 0.007383 19.13x -94.77%
1920x1080 uint8 4 0.50 dodge avx2 0.141218 0.006040 23.38x -95.72%
1920x1080 uint8 4 0.50 addition scalar 0.137256 0.062144 2.21x -54.72%
1920x1080 uint8 4 0.50 addition sse42 0.137256 0.007937 17.29x -94.22%
1920x1080 uint8 4 0.50 addition avx2 0.137256 0.006272 21.89x -95.43%
1920x1080 uint8 4 0.50 darken_only scalar 0.137162 0.054749 2.51x -60.08%
1920x1080 uint8 4 0.50 darken_only sse42 0.137162 0.005932 23.12x -95.68%
1920x1080 uint8 4 0.50 darken_only avx2 0.137162 0.005977 22.95x -95.64%
1920x1080 uint8 4 0.50 multiply scalar 0.137705 0.049048 2.81x -64.38%
1920x1080 uint8 4 0.50 multiply sse42 0.137705 0.006189 22.25x -95.51%
1920x1080 uint8 4 0.50 multiply avx2 0.137705 0.005921 23.26x -95.70%
1920x1080 uint8 4 0.50 hard_light scalar 0.201249 0.081502 2.47x -59.50%
1920x1080 uint8 4 0.50 hard_light sse42 0.201249 0.007465 26.96x -96.29%
1920x1080 uint8 4 0.50 hard_light avx2 0.201249 0.006134 32.81x -96.95%
1920x1080 uint8 4 0.50 difference scalar 0.193495 0.048440 3.99x -74.97%
1920x1080 uint8 4 0.50 difference sse42 0.193495 0.006059 31.93x -96.87%
1920x1080 uint8 4 0.50 difference avx2 0.193495 0.005944 32.55x -96.93%
1920x1080 uint8 4 0.50 subtract scalar 0.137287 0.046880 2.93x -65.85%
1920x1080 uint8 4 0.50 subtract sse42 0.137287 0.008283 16.58x -93.97%
1920x1080 uint8 4 0.50 subtract avx2 0.137287 0.006258 21.94x -95.44%
1920x1080 uint8 4 0.50 grain_extract scalar 0.140270 0.059258 2.37x -57.75%
1920x1080 uint8 4 0.50 grain_extract sse42 0.140270 0.006624 21.18x -95.28%
1920x1080 uint8 4 0.50 grain_extract avx2 0.140270 0.006020 23.30x -95.71%
1920x1080 uint8 4 0.50 grain_merge scalar 0.140673 0.059376 2.37x -57.79%
1920x1080 uint8 4 0.50 grain_merge sse42 0.140673 0.006577 21.39x -95.32%
1920x1080 uint8 4 0.50 grain_merge avx2 0.140673 0.006018 23.37x -95.72%
1920x1080 uint8 4 0.50 divide scalar 0.142035 0.049674 2.86x -65.03%
1920x1080 uint8 4 0.50 divide sse42 0.142035 0.006778 20.96x -95.23%
1920x1080 uint8 4 0.50 divide avx2 0.142035 0.005916 24.01x -95.83%
1920x1080 uint8 4 0.50 overlay scalar 0.190905 0.078591 2.43x -58.83%
1920x1080 uint8 4 0.50 overlay sse42 0.190905 0.007046 27.09x -96.31%
1920x1080 uint8 4 0.50 overlay avx2 0.190905 0.006110 31.24x -96.80%
1920x1080 float32 3 0.50 normal scalar 0.152390 0.016562 9.20x -89.13%
1920x1080 float32 3 0.50 normal sse42 0.152390 0.007055 21.60x -95.37%
1920x1080 float32 3 0.50 normal avx2 0.152390 0.004865 31.32x -96.81%
1920x1080 float32 3 0.50 soft_light scalar 0.211907 0.020213 10.48x -90.46%
1920x1080 float32 3 0.50 soft_light sse42 0.211907 0.008103 26.15x -96.18%
1920x1080 float32 3 0.50 soft_light avx2 0.211907 0.005889 35.98x -97.22%
1920x1080 float32 3 0.50 lighten_only scalar 0.161018 0.023887 6.74x -85.17%
1920x1080 float32 3 0.50 lighten_only sse42 0.161018 0.006812 23.64x -95.77%
1920x1080 float32 3 0.50 lighten_only avx2 0.161018 0.005651 28.50x -96.49%
1920x1080 float32 3 0.50 screen scalar 0.167185 0.018361 9.11x -89.02%
1920x1080 float32 3 0.50 screen sse42 0.167185 0.007532 22.20x -95.49%
1920x1080 float32 3 0.50 screen avx2 0.167185 0.005948 28.11x -96.44%
1920x1080 float32 3 0.50 dodge scalar 0.170216 0.020697 8.22x -87.84%
1920x1080 float32 3 0.50 dodge sse42 0.170216 0.008523 19.97x -94.99%
1920x1080 float32 3 0.50 dodge avx2 0.170216 0.005903 28.83x -96.53%
1920x1080 float32 3 0.50 addition scalar 0.159729 0.050342 3.17x -68.48%
1920x1080 float32 3 0.50 addition sse42 0.159729 0.007194 22.20x -95.50%
1920x1080 float32 3 0.50 addition avx2 0.159729 0.006030 26.49x -96.22%
1920x1080 float32 3 0.50 darken_only scalar 0.161209 0.023882 6.75x -85.19%
1920x1080 float32 3 0.50 darken_only sse42 0.161209 0.006803 23.70x -95.78%
1920x1080 float32 3 0.50 darken_only avx2 0.161209 0.005755 28.01x -96.43%
1920x1080 float32 3 0.50 multiply scalar 0.162961 0.017920 9.09x -89.00%
1920x1080 float32 3 0.50 multiply sse42 0.162961 0.006941 23.48x -95.74%
1920x1080 float32 3 0.50 multiply avx2 0.162961 0.005721 28.48x -96.49%
1920x1080 float32 3 0.50 hard_light scalar 0.227699 0.057550 3.96x -74.73%
1920x1080 float32 3 0.50 hard_light sse42 0.227699 0.008580 26.54x -96.23%
1920x1080 float32 3 0.50 hard_light avx2 0.227699 0.005955 38.24x -97.38%
1920x1080 float32 3 0.50 difference scalar 0.219765 0.018061 12.17x -91.78%
1920x1080 float32 3 0.50 difference sse42 0.219765 0.007136 30.80x -96.75%
1920x1080 float32 3 0.50 difference avx2 0.219765 0.005799 37.90x -97.36%
1920x1080 float32 3 0.50 subtract scalar 0.160355 0.024139 6.64x -84.95%
1920x1080 float32 3 0.50 subtract sse42 0.160355 0.007465 21.48x -95.34%
1920x1080 float32 3 0.50 subtract avx2 0.160355 0.005891 27.22x -96.33%
1920x1080 float32 3 0.50 grain_extract scalar 0.164935 0.033032 4.99x -79.97%
1920x1080 float32 3 0.50 grain_extract sse42 0.164935 0.007503 21.98x -95.45%
1920x1080 float32 3 0.50 grain_extract avx2 0.164935 0.005854 28.18x -96.45%
1920x1080 float32 3 0.50 grain_merge scalar 0.164264 0.032829 5.00x -80.01%
1920x1080 float32 3 0.50 grain_merge sse42 0.164264 0.007499 21.91x -95.43%
1920x1080 float32 3 0.50 grain_merge avx2 0.164264 0.005818 28.23x -96.46%
1920x1080 float32 3 0.50 divide scalar 0.166134 0.020039 8.29x -87.94%
1920x1080 float32 3 0.50 divide sse42 0.166134 0.008222 20.20x -95.05%
1920x1080 float32 3 0.50 divide avx2 0.166134 0.005873 28.29x -96.47%
1920x1080 float32 3 0.50 overlay scalar 0.219111 0.052779 4.15x -75.91%
1920x1080 float32 3 0.50 overlay sse42 0.219111 0.008042 27.24x -96.33%
1920x1080 float32 3 0.50 overlay avx2 0.219111 0.005851 37.45x -97.33%
1920x1080 float32 4 0.50 normal scalar 0.118436 0.019197 6.17x -83.79%
1920x1080 float32 4 0.50 normal sse42 0.118436 0.005119 23.14x -95.68%
1920x1080 float32 4 0.50 normal avx2 0.118436 0.007531 15.73x -93.64%
1920x1080 float32 4 0.50 soft_light scalar 0.176563 0.022513 7.84x -87.25%
1920x1080 float32 4 0.50 soft_light sse42 0.176563 0.005950 29.67x -96.63%
1920x1080 float32 4 0.50 soft_light avx2 0.176563 0.005934 29.76x -96.64%
1920x1080 float32 4 0.50 lighten_only scalar 0.125354 0.024120 5.20x -80.76%
1920x1080 float32 4 0.50 lighten_only sse42 0.125354 0.005427 23.10x -95.67%
1920x1080 float32 4 0.50 lighten_only avx2 0.125354 0.005834 21.49x -95.35%
1920x1080 float32 4 0.50 screen scalar 0.132034 0.021015 6.28x -84.08%
1920x1080 float32 4 0.50 screen sse42 0.132034 0.005674 23.27x -95.70%
1920x1080 float32 4 0.50 screen avx2 0.132034 0.006032 21.89x -95.43%
1920x1080 float32 4 0.50 dodge scalar 0.132411 0.023661 5.60x -82.13%
1920x1080 float32 4 0.50 dodge sse42 0.132411 0.006839 19.36x -94.83%
1920x1080 float32 4 0.50 dodge avx2 0.132411 0.006003 22.06x -95.47%
1920x1080 float32 4 0.50 addition scalar 0.128232 0.042175 3.04x -67.11%
1920x1080 float32 4 0.50 addition sse42 0.128232 0.006078 21.10x -95.26%
1920x1080 float32 4 0.50 addition avx2 0.128232 0.006165 20.80x -95.19%
1920x1080 float32 4 0.50 darken_only scalar 0.126858 0.024016 5.28x -81.07%
1920x1080 float32 4 0.50 darken_only sse42 0.126858 0.005461 23.23x -95.70%
1920x1080 float32 4 0.50 darken_only avx2 0.126858 0.005987 21.19x -95.28%
1920x1080 float32 4 0.50 multiply scalar 0.127328 0.020449 6.23x -83.94%
1920x1080 float32 4 0.50 multiply sse42 0.127328 0.005670 22.46x -95.55%
1920x1080 float32 4 0.50 multiply avx2 0.127328 0.005976 21.31x -95.31%
1920x1080 float32 4 0.50 hard_light scalar 0.191745 0.058264 3.29x -69.61%
1920x1080 float32 4 0.50 hard_light sse42 0.191745 0.006904 27.77x -96.40%
1920x1080 float32 4 0.50 hard_light avx2 0.191745 0.005993 31.99x -96.87%
1920x1080 float32 4 0.50 difference scalar 0.185137 0.020425 9.06x -88.97%
1920x1080 float32 4 0.50 difference sse42 0.185137 0.005655 32.74x -96.95%
1920x1080 float32 4 0.50 difference avx2 0.185137 0.006018 30.76x -96.75%
1920x1080 float32 4 0.50 subtract scalar 0.127840 0.027229 4.69x -78.70%
1920x1080 float32 4 0.50 subtract sse42 0.127840 0.006281 20.36x -95.09%
1920x1080 float32 4 0.50 subtract avx2 0.127840 0.006074 21.05x -95.25%
1920x1080 float32 4 0.50 grain_extract scalar 0.133941 0.033724 3.97x -74.82%
1920x1080 float32 4 0.50 grain_extract sse42 0.133941 0.005660 23.67x -95.77%
1920x1080 float32 4 0.50 grain_extract avx2 0.133941 0.005889 22.74x -95.60%
1920x1080 float32 4 0.50 grain_merge scalar 0.130575 0.033254 3.93x -74.53%
1920x1080 float32 4 0.50 grain_merge sse42 0.130575 0.005567 23.46x -95.74%
1920x1080 float32 4 0.50 grain_merge avx2 0.130575 0.005942 21.97x -95.45%
1920x1080 float32 4 0.50 divide scalar 0.133021 0.022286 5.97x -83.25%
1920x1080 float32 4 0.50 divide sse42 0.133021 0.005650 23.54x -95.75%
1920x1080 float32 4 0.50 divide avx2 0.133021 0.005934 22.42x -95.54%
1920x1080 float32 4 0.50 overlay scalar 0.182389 0.054219 3.36x -70.27%
1920x1080 float32 4 0.50 overlay sse42 0.182389 0.006018 30.31x -96.70%
1920x1080 float32 4 0.50 overlay avx2 0.182389 0.005850 31.18x -96.79%
2560x1440 uint8 3 0.50 normal scalar 0.313608 0.089129 3.52x -71.58%
2560x1440 uint8 3 0.50 normal sse42 0.313608 0.038308 8.19x -87.78%
2560x1440 uint8 3 0.50 normal avx2 0.313608 0.038868 8.07x -87.61%
2560x1440 uint8 3 0.50 soft_light scalar 0.416662 0.098922 4.21x -76.26%
2560x1440 uint8 3 0.50 soft_light sse42 0.416662 0.047836 8.71x -88.52%
2560x1440 uint8 3 0.50 soft_light avx2 0.416662 0.044841 9.29x -89.24%
2560x1440 uint8 3 0.50 lighten_only scalar 0.316356 0.105798 2.99x -66.56%
2560x1440 uint8 3 0.50 lighten_only sse42 0.316356 0.043586 7.26x -86.22%
2560x1440 uint8 3 0.50 lighten_only avx2 0.316356 0.042074 7.52x -86.70%
2560x1440 uint8 3 0.50 screen scalar 0.330699 0.096528 3.43x -70.81%
2560x1440 uint8 3 0.50 screen sse42 0.330699 0.044501 7.43x -86.54%
2560x1440 uint8 3 0.50 screen avx2 0.330699 0.042667 7.75x -87.10%
2560x1440 uint8 3 0.50 dodge scalar 0.332803 0.099483 3.35x -70.11%
2560x1440 uint8 3 0.50 dodge sse42 0.332803 0.048854 6.81x -85.32%
2560x1440 uint8 3 0.50 dodge avx2 0.332803 0.044766 7.43x -86.55%
2560x1440 uint8 3 0.50 addition scalar 0.323630 0.139858 2.31x -56.78%
2560x1440 uint8 3 0.50 addition sse42 0.323630 0.044392 7.29x -86.28%
2560x1440 uint8 3 0.50 addition avx2 0.323630 0.042118 7.68x -86.99%
2560x1440 uint8 3 0.50 darken_only scalar 0.314227 0.106816 2.94x -66.01%
2560x1440 uint8 3 0.50 darken_only sse42 0.314227 0.043456 7.23x -86.17%
2560x1440 uint8 3 0.50 darken_only avx2 0.314227 0.041678 7.54x -86.74%
2560x1440 uint8 3 0.50 multiply scalar 0.321665 0.095942 3.35x -70.17%
2560x1440 uint8 3 0.50 multiply sse42 0.321665 0.043439 7.40x -86.50%
2560x1440 uint8 3 0.50 multiply avx2 0.321665 0.042692 7.53x -86.73%
2560x1440 uint8 3 0.50 hard_light scalar 0.456902 0.164503 2.78x -64.00%
2560x1440 uint8 3 0.50 hard_light sse42 0.456902 0.049085 9.31x -89.26%
2560x1440 uint8 3 0.50 hard_light avx2 0.456902 0.044839 10.19x -90.19%
2560x1440 uint8 3 0.50 difference scalar 0.415910 0.096615 4.30x -76.77%
2560x1440 uint8 3 0.50 difference sse42 0.415910 0.043322 9.60x -89.58%
2560x1440 uint8 3 0.50 difference avx2 0.415910 0.041992 9.90x -89.90%
2560x1440 uint8 3 0.50 subtract scalar 0.320067 0.088393 3.62x -72.38%
2560x1440 uint8 3 0.50 subtract sse42 0.320067 0.047635 6.72x -85.12%
2560x1440 uint8 3 0.50 subtract avx2 0.320067 0.043891 7.29x -86.29%
2560x1440 uint8 3 0.50 grain_extract scalar 0.330550 0.116991 2.83x -64.61%
2560x1440 uint8 3 0.50 grain_extract sse42 0.330550 0.047054 7.02x -85.76%
2560x1440 uint8 3 0.50 grain_extract avx2 0.330550 0.043705 7.56x -86.78%
2560x1440 uint8 3 0.50 grain_merge scalar 0.328237 0.116566 2.82x -64.49%
2560x1440 uint8 3 0.50 grain_merge sse42 0.328237 0.047157 6.96x -85.63%
2560x1440 uint8 3 0.50 grain_merge avx2 0.328237 0.044248 7.42x -86.52%
2560x1440 uint8 3 0.50 divide scalar 0.333987 0.098319 3.40x -70.56%
2560x1440 uint8 3 0.50 divide sse42 0.333987 0.047626 7.01x -85.74%
2560x1440 uint8 3 0.50 divide avx2 0.333987 0.045206 7.39x -86.46%
2560x1440 uint8 3 0.50 overlay scalar 0.427780 0.160792 2.66x -62.41%
2560x1440 uint8 3 0.50 overlay sse42 0.427780 0.048677 8.79x -88.62%
2560x1440 uint8 3 0.50 overlay avx2 0.427780 0.044500 9.61x -89.60%
2560x1440 uint8 4 0.50 normal scalar 0.235295 0.074157 3.17x -68.48%
2560x1440 uint8 4 0.50 normal sse42 0.235295 0.009799 24.01x -95.84%
2560x1440 uint8 4 0.50 normal avx2 0.235295 0.008777 26.81x -96.27%
2560x1440 uint8 4 0.50 soft_light scalar 0.333531 0.090960 3.67x -72.73%
2560x1440 uint8 4 0.50 soft_light sse42 0.333531 0.012266 27.19x -96.32%
2560x1440 uint8 4 0.50 soft_light avx2 0.333531 0.010974 30.39x -96.71%
2560x1440 uint8 4 0.50 lighten_only scalar 0.232726 0.096565 2.41x -58.51%
2560x1440 uint8 4 0.50 lighten_only sse42 0.232726 0.010550 22.06x -95.47%
2560x1440 uint8 4 0.50 lighten_only avx2 0.232726 0.010559 22.04x -95.46%
2560x1440 uint8 4 0.50 screen scalar 0.247209 0.086611 2.85x -64.96%
2560x1440 uint8 4 0.50 screen sse42 0.247209 0.011694 21.14x -95.27%
2560x1440 uint8 4 0.50 screen avx2 0.247209 0.010911 22.66x -95.59%
2560x1440 uint8 4 0.50 dodge scalar 0.250397 0.090592 2.76x -63.82%
2560x1440 uint8 4 0.50 dodge sse42 0.250397 0.013206 18.96x -94.73%
2560x1440 uint8 4 0.50 dodge avx2 0.250397 0.010763 23.27x -95.70%
2560x1440 uint8 4 0.50 addition scalar 0.239138 0.108756 2.20x -54.52%
2560x1440 uint8 4 0.50 addition sse42 0.239138 0.014134 16.92x -94.09%
2560x1440 uint8 4 0.50 addition avx2 0.239138 0.011240 21.27x -95.30%
2560x1440 uint8 4 0.50 darken_only scalar 0.232203 0.098629 2.35x -57.52%
2560x1440 uint8 4 0.50 darken_only sse42 0.232203 0.010633 21.84x -95.42%
2560x1440 uint8 4 0.50 darken_only avx2 0.232203 0.010616 21.87x -95.43%
2560x1440 uint8 4 0.50 multiply scalar 0.237409 0.087414 2.72x -63.18%
2560x1440 uint8 4 0.50 multiply sse42 0.237409 0.010938 21.70x -95.39%
2560x1440 uint8 4 0.50 multiply avx2 0.237409 0.010468 22.68x -95.59%
2560x1440 uint8 4 0.50 hard_light scalar 0.376283 0.143979 2.61x -61.74%
2560x1440 uint8 4 0.50 hard_light sse42 0.376283 0.013224 28.46x -96.49%
2560x1440 uint8 4 0.50 hard_light avx2 0.376283 0.010923 34.45x -97.10%
2560x1440 uint8 4 0.50 difference scalar 0.336579 0.086170 3.91x -74.40%
2560x1440 uint8 4 0.50 difference sse42 0.336579 0.011267 29.87x -96.65%
2560x1440 uint8 4 0.50 difference avx2 0.336579 0.010675 31.53x -96.83%
2560x1440 uint8 4 0.50 subtract scalar 0.238894 0.083403 2.86x -65.09%
2560x1440 uint8 4 0.50 subtract sse42 0.238894 0.014654 16.30x -93.87%
2560x1440 uint8 4 0.50 subtract avx2 0.238894 0.011126 21.47x -95.34%
2560x1440 uint8 4 0.50 grain_extract scalar 0.246146 0.105096 2.34x -57.30%
2560x1440 uint8 4 0.50 grain_extract sse42 0.246146 0.011754 20.94x -95.22%
2560x1440 uint8 4 0.50 grain_extract avx2 0.246146 0.010791 22.81x -95.62%
2560x1440 uint8 4 0.50 grain_merge scalar 0.245970 0.105346 2.33x -57.17%
2560x1440 uint8 4 0.50 grain_merge sse42 0.245970 0.011782 20.88x -95.21%
2560x1440 uint8 4 0.50 grain_merge avx2 0.245970 0.010655 23.09x -95.67%
2560x1440 uint8 4 0.50 divide scalar 0.252140 0.088711 2.84x -64.82%
2560x1440 uint8 4 0.50 divide sse42 0.252140 0.012109 20.82x -95.20%
2560x1440 uint8 4 0.50 divide avx2 0.252140 0.010599 23.79x -95.80%
2560x1440 uint8 4 0.50 overlay scalar 0.348320 0.140391 2.48x -59.69%
2560x1440 uint8 4 0.50 overlay sse42 0.348320 0.012579 27.69x -96.39%
2560x1440 uint8 4 0.50 overlay avx2 0.348320 0.010836 32.15x -96.89%
2560x1440 float32 3 0.50 normal scalar 0.280736 0.027865 10.07x -90.07%
2560x1440 float32 3 0.50 normal sse42 0.280736 0.012485 22.49x -95.55%
2560x1440 float32 3 0.50 normal avx2 0.280736 0.008320 33.74x -97.04%
2560x1440 float32 3 0.50 soft_light scalar 0.375551 0.034381 10.92x -90.85%
2560x1440 float32 3 0.50 soft_light sse42 0.375551 0.014349 26.17x -96.18%
2560x1440 float32 3 0.50 soft_light avx2 0.375551 0.010380 36.18x -97.24%
2560x1440 float32 3 0.50 lighten_only scalar 0.276306 0.040972 6.74x -85.17%
2560x1440 float32 3 0.50 lighten_only sse42 0.276306 0.011946 23.13x -95.68%
2560x1440 float32 3 0.50 lighten_only avx2 0.276306 0.010100 27.36x -96.34%
2560x1440 float32 3 0.50 screen scalar 0.296159 0.031076 9.53x -89.51%
2560x1440 float32 3 0.50 screen sse42 0.296159 0.013230 22.39x -95.53%
2560x1440 float32 3 0.50 screen avx2 0.296159 0.010403 28.47x -96.49%
2560x1440 float32 3 0.50 dodge scalar 0.294769 0.035928 8.20x -87.81%
2560x1440 float32 3 0.50 dodge sse42 0.294769 0.015222 19.36x -94.84%
2560x1440 float32 3 0.50 dodge avx2 0.294769 0.010749 27.42x -96.35%
2560x1440 float32 3 0.50 addition scalar 0.284695 0.087974 3.24x -69.10%
2560x1440 float32 3 0.50 addition sse42 0.284695 0.012782 22.27x -95.51%
2560x1440 float32 3 0.50 addition avx2 0.284695 0.010569 26.94x -96.29%
2560x1440 float32 3 0.50 darken_only scalar 0.276454 0.041037 6.74x -85.16%
2560x1440 float32 3 0.50 darken_only sse42 0.276454 0.011985 23.07x -95.66%
2560x1440 float32 3 0.50 darken_only avx2 0.276454 0.010124 27.31x -96.34%
2560x1440 float32 3 0.50 multiply scalar 0.284074 0.030556 9.30x -89.24%
2560x1440 float32 3 0.50 multiply sse42 0.284074 0.012331 23.04x -95.66%
2560x1440 float32 3 0.50 multiply avx2 0.284074 0.010088 28.16x -96.45%
2560x1440 float32 3 0.50 hard_light scalar 0.423613 0.100167 4.23x -76.35%
2560x1440 float32 3 0.50 hard_light sse42 0.423613 0.015197 27.88x -96.41%
2560x1440 float32 3 0.50 hard_light avx2 0.423613 0.010559 40.12x -97.51%
2560x1440 float32 3 0.50 difference scalar 0.379177 0.030822 12.30x -91.87%
2560x1440 float32 3 0.50 difference sse42 0.379177 0.012534 30.25x -96.69%
2560x1440 float32 3 0.50 difference avx2 0.379177 0.010298 36.82x -97.28%
2560x1440 float32 3 0.50 subtract scalar 0.284956 0.038763 7.35x -86.40%
2560x1440 float32 3 0.50 subtract sse42 0.284956 0.013250 21.51x -95.35%
2560x1440 float32 3 0.50 subtract avx2 0.284956 0.010532 27.06x -96.30%
2560x1440 float32 3 0.50 grain_extract scalar 0.292747 0.056742 5.16x -80.62%
2560x1440 float32 3 0.50 grain_extract sse42 0.292747 0.013300 22.01x -95.46%
2560x1440 float32 3 0.50 grain_extract avx2 0.292747 0.010460 27.99x -96.43%
2560x1440 float32 3 0.50 grain_merge scalar 0.290353 0.056622 5.13x -80.50%
2560x1440 float32 3 0.50 grain_merge sse42 0.290353 0.013257 21.90x -95.43%
2560x1440 float32 3 0.50 grain_merge avx2 0.290353 0.010385 27.96x -96.42%
2560x1440 float32 3 0.50 divide scalar 0.299406 0.034359 8.71x -88.52%
2560x1440 float32 3 0.50 divide sse42 0.299406 0.014718 20.34x -95.08%
2560x1440 float32 3 0.50 divide avx2 0.299406 0.010469 28.60x -96.50%
2560x1440 float32 3 0.50 overlay scalar 0.392206 0.091805 4.27x -76.59%
2560x1440 float32 3 0.50 overlay sse42 0.392206 0.014359 27.31x -96.34%
2560x1440 float32 3 0.50 overlay avx2 0.392206 0.010407 37.69x -97.35%
2560x1440 float32 4 0.50 normal scalar 0.224276 0.040151 5.59x -82.10%
2560x1440 float32 4 0.50 normal sse42 0.224276 0.014580 15.38x -93.50%
2560x1440 float32 4 0.50 normal avx2 0.224276 0.015730 14.26x -92.99%
2560x1440 float32 4 0.50 soft_light scalar 0.322657 0.048230 6.69x -85.05%
2560x1440 float32 4 0.50 soft_light sse42 0.322657 0.016963 19.02x -94.74%
2560x1440 float32 4 0.50 soft_light avx2 0.322657 0.016261 19.84x -94.96%
2560x1440 float32 4 0.50 lighten_only scalar 0.221026 0.049422 4.47x -77.64%
2560x1440 float32 4 0.50 lighten_only sse42 0.221026 0.015336 14.41x -93.06%
2560x1440 float32 4 0.50 lighten_only avx2 0.221026 0.016106 13.72x -92.71%
2560x1440 float32 4 0.50 screen scalar 0.237798 0.044213 5.38x -81.41%
2560x1440 float32 4 0.50 screen sse42 0.237798 0.015688 15.16x -93.40%
2560x1440 float32 4 0.50 screen avx2 0.237798 0.015928 14.93x -93.30%
2560x1440 float32 4 0.50 dodge scalar 0.268932 0.049702 5.41x -81.52%
2560x1440 float32 4 0.50 dodge sse42 0.268932 0.018350 14.66x -93.18%
2560x1440 float32 4 0.50 dodge avx2 0.268932 0.017033 15.79x -93.67%
2560x1440 float32 4 0.50 addition scalar 0.237844 0.081400 2.92x -65.78%
2560x1440 float32 4 0.50 addition sse42 0.237844 0.019054 12.48x -91.99%
2560x1440 float32 4 0.50 addition avx2 0.237844 0.021809 10.91x -90.83%
2560x1440 float32 4 0.50 darken_only scalar 0.226804 0.051275 4.42x -77.39%
2560x1440 float32 4 0.50 darken_only sse42 0.226804 0.016168 14.03x -92.87%
2560x1440 float32 4 0.50 darken_only avx2 0.226804 0.016084 14.10x -92.91%
2560x1440 float32 4 0.50 multiply scalar 0.229089 0.042009 5.45x -81.66%
2560x1440 float32 4 0.50 multiply sse42 0.229089 0.015354 14.92x -93.30%
2560x1440 float32 4 0.50 multiply avx2 0.229089 0.015995 14.32x -93.02%
2560x1440 float32 4 0.50 hard_light scalar 0.363251 0.110490 3.29x -69.58%
2560x1440 float32 4 0.50 hard_light sse42 0.363251 0.018170 19.99x -95.00%
2560x1440 float32 4 0.50 hard_light avx2 0.363251 0.015943 22.78x -95.61%
2560x1440 float32 4 0.50 difference scalar 0.325529 0.042336 7.69x -86.99%
2560x1440 float32 4 0.50 difference sse42 0.325529 0.015541 20.95x -95.23%
2560x1440 float32 4 0.50 difference avx2 0.325529 0.016062 20.27x -95.07%
2560x1440 float32 4 0.50 subtract scalar 0.224633 0.054275 4.14x -75.84%
2560x1440 float32 4 0.50 subtract sse42 0.224633 0.016516 13.60x -92.65%
2560x1440 float32 4 0.50 subtract avx2 0.224633 0.016502 13.61x -92.65%
2560x1440 float32 4 0.50 grain_extract scalar 0.236799 0.065266 3.63x -72.44%
2560x1440 float32 4 0.50 grain_extract sse42 0.236799 0.016232 14.59x -93.15%
2560x1440 float32 4 0.50 grain_extract avx2 0.236799 0.016093 14.71x -93.20%
2560x1440 float32 4 0.50 grain_merge scalar 0.235364 0.065202 3.61x -72.30%
2560x1440 float32 4 0.50 grain_merge sse42 0.235364 0.015711 14.98x -93.32%
2560x1440 float32 4 0.50 grain_merge avx2 0.235364 0.015914 14.79x -93.24%
2560x1440 float32 4 0.50 divide scalar 0.240367 0.045754 5.25x -80.96%
2560x1440 float32 4 0.50 divide sse42 0.240367 0.015866 15.15x -93.40%
2560x1440 float32 4 0.50 divide avx2 0.240367 0.016496 14.57x -93.14%
2560x1440 float32 4 0.50 overlay scalar 0.335596 0.102591 3.27x -69.43%
2560x1440 float32 4 0.50 overlay sse42 0.335596 0.016241 20.66x -95.16%
2560x1440 float32 4 0.50 overlay avx2 0.335596 0.016202 20.71x -95.17%
3840x2160 uint8 3 0.50 normal scalar 0.709083 0.203875 3.48x -71.25%
3840x2160 uint8 3 0.50 normal sse42 0.709083 0.086418 8.21x -87.81%
3840x2160 uint8 3 0.50 normal avx2 0.709083 0.087814 8.07x -87.62%
3840x2160 uint8 3 0.50 soft_light scalar 0.941015 0.225982 4.16x -75.99%
3840x2160 uint8 3 0.50 soft_light sse42 0.941015 0.107721 8.74x -88.55%
3840x2160 uint8 3 0.50 soft_light avx2 0.941015 0.100447 9.37x -89.33%
3840x2160 uint8 3 0.50 lighten_only scalar 0.697293 0.241476 2.89x -65.37%
3840x2160 uint8 3 0.50 lighten_only sse42 0.697293 0.097573 7.15x -86.01%
3840x2160 uint8 3 0.50 lighten_only avx2 0.697293 0.094437 7.38x -86.46%
3840x2160 uint8 3 0.50 screen scalar 0.732962 0.219430 3.34x -70.06%
3840x2160 uint8 3 0.50 screen sse42 0.732962 0.100952 7.26x -86.23%
3840x2160 uint8 3 0.50 screen avx2 0.732962 0.096547 7.59x -86.83%
3840x2160 uint8 3 0.50 dodge scalar 0.734328 0.226346 3.24x -69.18%
3840x2160 uint8 3 0.50 dodge sse42 0.734328 0.110054 6.67x -85.01%
3840x2160 uint8 3 0.50 dodge avx2 0.734328 0.101827 7.21x -86.13%
3840x2160 uint8 3 0.50 addition scalar 0.713763 0.316895 2.25x -55.60%
3840x2160 uint8 3 0.50 addition sse42 0.713763 0.099800 7.15x -86.02%
3840x2160 uint8 3 0.50 addition avx2 0.713763 0.095190 7.50x -86.66%
3840x2160 uint8 3 0.50 darken_only scalar 0.698627 0.243302 2.87x -65.17%
3840x2160 uint8 3 0.50 darken_only sse42 0.698627 0.097579 7.16x -86.03%
3840x2160 uint8 3 0.50 darken_only avx2 0.698627 0.094407 7.40x -86.49%
3840x2160 uint8 3 0.50 multiply scalar 0.714200 0.219599 3.25x -69.25%
3840x2160 uint8 3 0.50 multiply sse42 0.714200 0.098472 7.25x -86.21%
3840x2160 uint8 3 0.50 multiply avx2 0.714200 0.095518 7.48x -86.63%
3840x2160 uint8 3 0.50 hard_light scalar 1.010458 0.372801 2.71x -63.11%
3840x2160 uint8 3 0.50 hard_light sse42 1.010458 0.110697 9.13x -89.04%
3840x2160 uint8 3 0.50 hard_light avx2 1.010458 0.101248 9.98x -89.98%
3840x2160 uint8 3 0.50 difference scalar 0.943463 0.219043 4.31x -76.78%
3840x2160 uint8 3 0.50 difference sse42 0.943463 0.097685 9.66x -89.65%
3840x2160 uint8 3 0.50 difference avx2 0.943463 0.094348 10.00x -90.00%
3840x2160 uint8 3 0.50 subtract scalar 0.713700 0.201307 3.55x -71.79%
3840x2160 uint8 3 0.50 subtract sse42 0.713700 0.106519 6.70x -85.08%
3840x2160 uint8 3 0.50 subtract avx2 0.713700 0.098488 7.25x -86.20%
3840x2160 uint8 3 0.50 grain_extract scalar 0.731294 0.264525 2.76x -63.83%
3840x2160 uint8 3 0.50 grain_extract sse42 0.731294 0.105547 6.93x -85.57%
3840x2160 uint8 3 0.50 grain_extract avx2 0.731294 0.098679 7.41x -86.51%
3840x2160 uint8 3 0.50 grain_merge scalar 0.731032 0.265399 2.75x -63.70%
3840x2160 uint8 3 0.50 grain_merge sse42 0.731032 0.106474 6.87x -85.44%
3840x2160 uint8 3 0.50 grain_merge avx2 0.731032 0.100975 7.24x -86.19%
3840x2160 uint8 3 0.50 divide scalar 0.745601 0.222013 3.36x -70.22%
3840x2160 uint8 3 0.50 divide sse42 0.745601 0.111496 6.69x -85.05%
3840x2160 uint8 3 0.50 divide avx2 0.745601 0.100567 7.41x -86.51%
3840x2160 uint8 3 0.50 overlay scalar 0.947464 0.365498 2.59x -61.42%
3840x2160 uint8 3 0.50 overlay sse42 0.947464 0.108108 8.76x -88.59%
3840x2160 uint8 3 0.50 overlay avx2 0.947464 0.100328 9.44x -89.41%
3840x2160 uint8 4 0.50 normal scalar 0.519559 0.164467 3.16x -68.34%
3840x2160 uint8 4 0.50 normal sse42 0.519559 0.021531 24.13x -95.86%
3840x2160 uint8 4 0.50 normal avx2 0.519559 0.019689 26.39x -96.21%
3840x2160 uint8 4 0.50 soft_light scalar 0.729378 0.206068 3.54x -71.75%
3840x2160 uint8 4 0.50 soft_light sse42 0.729378 0.027568 26.46x -96.22%
3840x2160 uint8 4 0.50 soft_light avx2 0.729378 0.024460 29.82x -96.65%
3840x2160 uint8 4 0.50 lighten_only scalar 0.513208 0.217009 2.36x -57.72%
3840x2160 uint8 4 0.50 lighten_only sse42 0.513208 0.023744 21.61x -95.37%
3840x2160 uint8 4 0.50 lighten_only avx2 0.513208 0.023745 21.61x -95.37%
3840x2160 uint8 4 0.50 screen scalar 0.550269 0.196555 2.80x -64.28%
3840x2160 uint8 4 0.50 screen sse42 0.550269 0.025998 21.17x -95.28%
3840x2160 uint8 4 0.50 screen avx2 0.550269 0.024480 22.48x -95.55%
3840x2160 uint8 4 0.50 dodge scalar 0.550685 0.203741 2.70x -63.00%
3840x2160 uint8 4 0.50 dodge sse42 0.550685 0.029214 18.85x -94.69%
3840x2160 uint8 4 0.50 dodge avx2 0.550685 0.024014 22.93x -95.64%
3840x2160 uint8 4 0.50 addition scalar 0.526750 0.245244 2.15x -53.44%
3840x2160 uint8 4 0.50 addition sse42 0.526750 0.031871 16.53x -93.95%
3840x2160 uint8 4 0.50 addition avx2 0.526750 0.024888 21.16x -95.28%
3840x2160 uint8 4 0.50 darken_only scalar 0.516854 0.219431 2.36x -57.54%
3840x2160 uint8 4 0.50 darken_only sse42 0.516854 0.023679 21.83x -95.42%
3840x2160 uint8 4 0.50 darken_only avx2 0.516854 0.023747 21.77x -95.41%
3840x2160 uint8 4 0.50 multiply scalar 0.527983 0.197600 2.67x -62.57%
3840x2160 uint8 4 0.50 multiply sse42 0.527983 0.024473 21.57x -95.36%
3840x2160 uint8 4 0.50 multiply avx2 0.527983 0.023500 22.47x -95.55%
3840x2160 uint8 4 0.50 hard_light scalar 0.824746 0.325212 2.54x -60.57%
3840x2160 uint8 4 0.50 hard_light sse42 0.824746 0.029619 27.84x -96.41%
3840x2160 uint8 4 0.50 hard_light avx2 0.824746 0.024442 33.74x -97.04%
3840x2160 uint8 4 0.50 difference scalar 0.743171 0.196072 3.79x -73.62%
3840x2160 uint8 4 0.50 difference sse42 0.743171 0.024231 30.67x -96.74%
3840x2160 uint8 4 0.50 difference avx2 0.743171 0.023756 31.28x -96.80%
3840x2160 uint8 4 0.50 subtract scalar 0.525804 0.188805 2.78x -64.09%
3840x2160 uint8 4 0.50 subtract sse42 0.525804 0.033139 15.87x -93.70%
3840x2160 uint8 4 0.50 subtract avx2 0.525804 0.024922 21.10x -95.26%
3840x2160 uint8 4 0.50 grain_extract scalar 0.543327 0.237283 2.29x -56.33%
3840x2160 uint8 4 0.50 grain_extract sse42 0.543327 0.026458 20.54x -95.13%
3840x2160 uint8 4 0.50 grain_extract avx2 0.543327 0.024059 22.58x -95.57%
3840x2160 uint8 4 0.50 grain_merge scalar 0.543895 0.237303 2.29x -56.37%
3840x2160 uint8 4 0.50 grain_merge sse42 0.543895 0.026359 20.63x -95.15%
3840x2160 uint8 4 0.50 grain_merge avx2 0.543895 0.024036 22.63x -95.58%
3840x2160 uint8 4 0.50 divide scalar 0.555767 0.199311 2.79x -64.14%
3840x2160 uint8 4 0.50 divide sse42 0.555767 0.027009 20.58x -95.14%
3840x2160 uint8 4 0.50 divide avx2 0.555767 0.023702 23.45x -95.74%
3840x2160 uint8 4 0.50 overlay scalar 0.761599 0.316015 2.41x -58.51%
3840x2160 uint8 4 0.50 overlay sse42 0.761599 0.028343 26.87x -96.28%
3840x2160 uint8 4 0.50 overlay avx2 0.761599 0.024316 31.32x -96.81%
3840x2160 float32 3 0.50 normal scalar 0.612312 0.070997 8.62x -88.41%
3840x2160 float32 3 0.50 normal sse42 0.612312 0.036957 16.57x -93.96%
3840x2160 float32 3 0.50 normal avx2 0.612312 0.027122 22.58x -95.57%
3840x2160 float32 3 0.50 soft_light scalar 0.817177 0.086589 9.44x -89.40%
3840x2160 float32 3 0.50 soft_light sse42 0.817177 0.041461 19.71x -94.93%
3840x2160 float32 3 0.50 soft_light avx2 0.817177 0.032176 25.40x -96.06%
3840x2160 float32 3 0.50 lighten_only scalar 0.599570 0.101667 5.90x -83.04%
3840x2160 float32 3 0.50 lighten_only sse42 0.599570 0.036166 16.58x -93.97%
3840x2160 float32 3 0.50 lighten_only avx2 0.599570 0.031126 19.26x -94.81%
3840x2160 float32 3 0.50 screen scalar 0.636353 0.079052 8.05x -87.58%
3840x2160 float32 3 0.50 screen sse42 0.636353 0.038370 16.58x -93.97%
3840x2160 float32 3 0.50 screen avx2 0.636353 0.032137 19.80x -94.95%
3840x2160 float32 3 0.50 dodge scalar 0.639829 0.088934 7.19x -86.10%
3840x2160 float32 3 0.50 dodge sse42 0.639829 0.042996 14.88x -93.28%
3840x2160 float32 3 0.50 dodge avx2 0.639829 0.032196 19.87x -94.97%
3840x2160 float32 3 0.50 addition scalar 0.618517 0.205120 3.02x -66.84%
3840x2160 float32 3 0.50 addition sse42 0.618517 0.037745 16.39x -93.90%
3840x2160 float32 3 0.50 addition avx2 0.618517 0.032546 19.00x -94.74%
3840x2160 float32 3 0.50 darken_only scalar 0.599404 0.100409 5.97x -83.25%
3840x2160 float32 3 0.50 darken_only sse42 0.599404 0.035412 16.93x -94.09%
3840x2160 float32 3 0.50 darken_only avx2 0.599404 0.030989 19.34x -94.83%
3840x2160 float32 3 0.50 multiply scalar 0.615415 0.077085 7.98x -87.47%
3840x2160 float32 3 0.50 multiply sse42 0.615415 0.036077 17.06x -94.14%
3840x2160 float32 3 0.50 multiply avx2 0.615415 0.030643 20.08x -95.02%
3840x2160 float32 3 0.50 hard_light scalar 0.916496 0.232678 3.94x -74.61%
3840x2160 float32 3 0.50 hard_light sse42 0.916496 0.042771 21.43x -95.33%
3840x2160 float32 3 0.50 hard_light avx2 0.916496 0.032003 28.64x -96.51%
3840x2160 float32 3 0.50 difference scalar 0.829396 0.076793 10.80x -90.74%
3840x2160 float32 3 0.50 difference sse42 0.829396 0.036806 22.53x -95.56%
3840x2160 float32 3 0.50 difference avx2 0.829396 0.031384 26.43x -96.22%
3840x2160 float32 3 0.50 subtract scalar 0.616042 0.094606 6.51x -84.64%
3840x2160 float32 3 0.50 subtract sse42 0.616042 0.038332 16.07x -93.78%
3840x2160 float32 3 0.50 subtract avx2 0.616042 0.031889 19.32x -94.82%
3840x2160 float32 3 0.50 grain_extract scalar 0.629526 0.134487 4.68x -78.64%
3840x2160 float32 3 0.50 grain_extract sse42 0.629526 0.038556 16.33x -93.88%
3840x2160 float32 3 0.50 grain_extract avx2 0.629526 0.031934 19.71x -94.93%
3840x2160 float32 3 0.50 grain_merge scalar 0.633060 0.135176 4.68x -78.65%
3840x2160 float32 3 0.50 grain_merge sse42 0.633060 0.038696 16.36x -93.89%
3840x2160 float32 3 0.50 grain_merge avx2 0.633060 0.031747 19.94x -94.99%
3840x2160 float32 3 0.50 divide scalar 0.644497 0.085483 7.54x -86.74%
3840x2160 float32 3 0.50 divide sse42 0.644497 0.041092 15.68x -93.62%
3840x2160 float32 3 0.50 divide avx2 0.644497 0.031351 20.56x -95.14%
3840x2160 float32 3 0.50 overlay scalar 0.840030 0.212885 3.95x -74.66%
3840x2160 float32 3 0.50 overlay sse42 0.840030 0.040582 20.70x -95.17%
3840x2160 float32 3 0.50 overlay avx2 0.840030 0.031497 26.67x -96.25%
3840x2160 float32 4 0.50 normal scalar 0.479357 0.087061 5.51x -81.84%
3840x2160 float32 4 0.50 normal sse42 0.479357 0.030277 15.83x -93.68%
3840x2160 float32 4 0.50 normal avx2 0.479357 0.040416 11.86x -91.57%
3840x2160 float32 4 0.50 soft_light scalar 0.684756 0.100070 6.84x -85.39%
3840x2160 float32 4 0.50 soft_light sse42 0.684756 0.033373 20.52x -95.13%
3840x2160 float32 4 0.50 soft_light avx2 0.684756 0.034363 19.93x -94.98%
3840x2160 float32 4 0.50 lighten_only scalar 0.465821 0.106568 4.37x -77.12%
3840x2160 float32 4 0.50 lighten_only sse42 0.465821 0.032561 14.31x -93.01%
3840x2160 float32 4 0.50 lighten_only avx2 0.465821 0.033476 13.92x -92.81%
3840x2160 float32 4 0.50 screen scalar 0.500780 0.094903 5.28x -81.05%
3840x2160 float32 4 0.50 screen sse42 0.500780 0.033208 15.08x -93.37%
3840x2160 float32 4 0.50 screen avx2 0.500780 0.033644 14.88x -93.28%
3840x2160 float32 4 0.50 dodge scalar 0.504651 0.104703 4.82x -79.25%
3840x2160 float32 4 0.50 dodge sse42 0.504651 0.037417 13.49x -92.59%
3840x2160 float32 4 0.50 dodge avx2 0.504651 0.033843 14.91x -93.29%
3840x2160 float32 4 0.50 addition scalar 0.487424 0.178975 2.72x -63.28%
3840x2160 float32 4 0.50 addition sse42 0.487424 0.034920 13.96x -92.84%
3840x2160 float32 4 0.50 addition avx2 0.487424 0.035007 13.92x -92.82%
3840x2160 float32 4 0.50 darken_only scalar 0.468246 0.107703 4.35x -77.00%
3840x2160 float32 4 0.50 darken_only sse42 0.468246 0.033651 13.91x -92.81%
3840x2160 float32 4 0.50 darken_only avx2 0.468246 0.033861 13.83x -92.77%
3840x2160 float32 4 0.50 multiply scalar 0.483024 0.091757 5.26x -81.00%
3840x2160 float32 4 0.50 multiply sse42 0.483024 0.031892 15.15x -93.40%
3840x2160 float32 4 0.50 multiply avx2 0.483024 0.033760 14.31x -93.01%
3840x2160 float32 4 0.50 hard_light scalar 0.786308 0.242696 3.24x -69.13%
3840x2160 float32 4 0.50 hard_light sse42 0.786308 0.038761 20.29x -95.07%
3840x2160 float32 4 0.50 hard_light avx2 0.786308 0.033834 23.24x -95.70%
3840x2160 float32 4 0.50 difference scalar 0.696396 0.092195 7.55x -86.76%
3840x2160 float32 4 0.50 difference sse42 0.696396 0.033089 21.05x -95.25%
3840x2160 float32 4 0.50 difference avx2 0.696396 0.034164 20.38x -95.09%
3840x2160 float32 4 0.50 subtract scalar 0.481969 0.118633 4.06x -75.39%
3840x2160 float32 4 0.50 subtract sse42 0.481969 0.035183 13.70x -92.70%
3840x2160 float32 4 0.50 subtract avx2 0.481969 0.034637 13.91x -92.81%
3840x2160 float32 4 0.50 grain_extract scalar 0.498023 0.142622 3.49x -71.36%
3840x2160 float32 4 0.50 grain_extract sse42 0.498023 0.032448 15.35x -93.48%
3840x2160 float32 4 0.50 grain_extract avx2 0.498023 0.033433 14.90x -93.29%
3840x2160 float32 4 0.50 grain_merge scalar 0.497692 0.142563 3.49x -71.36%
3840x2160 float32 4 0.50 grain_merge sse42 0.497692 0.032725 15.21x -93.42%
3840x2160 float32 4 0.50 grain_merge avx2 0.497692 0.033476 14.87x -93.27%
3840x2160 float32 4 0.50 divide scalar 0.511828 0.100229 5.11x -80.42%
3840x2160 float32 4 0.50 divide sse42 0.511828 0.033761 15.16x -93.40%
3840x2160 float32 4 0.50 divide avx2 0.511828 0.033745 15.17x -93.41%
3840x2160 float32 4 0.50 overlay scalar 0.713519 0.226674 3.15x -68.23%
3840x2160 float32 4 0.50 overlay sse42 0.713519 0.034133 20.90x -95.22%
3840x2160 float32 4 0.50 overlay avx2 0.713519 0.034005 20.98x -95.23%
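
The last two columns of each row can be reproduced from the two timing columns. This is a minimal sketch, assuming the first timing is the NumPy reference in seconds and the second is the kernel under test; the helper name `speedup_and_change` is hypothetical and not part of `simd_blend_modes`.

```python
def speedup_and_change(reference_s: float, kernel_s: float) -> tuple[float, float]:
    """Return (speedup factor, percent change in runtime) for one table row.

    Assumes reference_s is the NumPy reference time and kernel_s is the
    time of the kernel being compared, both in seconds.
    """
    speedup = reference_s / kernel_s
    change_pct = (kernel_s / reference_s - 1.0) * 100.0
    return speedup, change_pct


# Timings taken from the final row above:
# 3840x2160, float32, 4 channels, opacity 0.50, overlay, avx2.
speedup, change_pct = speedup_and_change(0.713519, 0.034005)
print(f"{speedup:.2f}x {change_pct:.2f}%")  # prints "20.98x -95.23%"
```

A speedup of about 21x and a runtime change of about -95% describe the same measurement, so either column is enough to compare kernels.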
