Skip to content

Improve performance of first/last aggregates #3022

@andygrove

Description

@andygrove

What is the problem the feature request solves?

Comet is slower than Spark for first and last aggregates. Also, the behavior is not consistent with Spark (see #1646 and #1630). We should probably implement a custom version in Comet to match Spark behavior and try and optimize it.

OpenJDK 64-Bit Server VM 17.0.17+10-Ubuntu-122.04 on Linux 6.8.0-90-generic
AMD Ryzen 9 7950X3D 16-Core Processor
first:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                48             56          13         21.8          45.8       1.0X
Comet (Scan)                                         58             60           2         18.1          55.2       0.8X
Comet (Scan + Exec)                                  61             65           3         17.1          58.6       0.8X

OpenJDK 64-Bit Server VM 17.0.17+10-Ubuntu-122.04 on Linux 6.8.0-90-generic
AMD Ryzen 9 7950X3D 16-Core Processor
first_ignore_nulls:                       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                47             50           3         22.4          44.7       1.0X
Comet (Scan)                                         50             52           1         21.0          47.6       0.9X
Comet (Scan + Exec)                                  60             64           2         17.5          57.2       0.8X

OpenJDK 64-Bit Server VM 17.0.17+10-Ubuntu-122.04 on Linux 6.8.0-90-generic
AMD Ryzen 9 7950X3D 16-Core Processor
last:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                45             48           3         23.1          43.2       1.0X
Comet (Scan)                                         47             54           9         22.4          44.7       1.0X
Comet (Scan + Exec)                                  76             79           4         13.7          72.7       0.6X

OpenJDK 64-Bit Server VM 17.0.17+10-Ubuntu-122.04 on Linux 6.8.0-90-generic
AMD Ryzen 9 7950X3D 16-Core Processor
last_ignore_nulls:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                46             48           1         22.8          43.9       1.0X
Comet (Scan)                                         58             64           5         18.1          55.1       0.8X
Comet (Scan + Exec)                                  76             84          10         13.7          72.9       0.6X

Describe the potential solution

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions