Automatic sorting type is slow #21

@51-code

Describe the bug

Using the automatic sorting type in the sort command significantly increases query time. The culprit seems to be the numericalStringCheck() function. The function should be reimplemented with performance in mind.

Expected behavior

The automatic sorting shouldn't increase the query time too much.

How to reproduce

Run sort first with default sorting:

%dpl
index=crud earliest=-3y | spath | sort elapsed

The query took 4 min 22 sec for me.

Then run sort with the auto sorting:

%dpl
index=crud earliest=-3y | spath | sort auto(elapsed)

The query took 7 min 39 sec for me, roughly 75% longer than with default sorting.

sort can also take multiple columns. With two auto-sorted columns, the query time increased further, to close to 11 minutes.

Software version

DPF-02 version 3.0.0
PTH-10 version 5.3.0-7-ge44d00e9

Additional context

Auto sorting is very useful in many cases because in PTH-10 some commands change the datatype of columns to String, as they use Spark's User Defined Functions (UDFs), which can only return a single datatype. The downside is that this breaks sorting of numerical values, which auto sorting is meant to handle.
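To illustrate why a String-typed column breaks numerical sorting, here is a minimal plain-Java sketch without Spark. The class StringSortDemo and its methods are hypothetical and only demonstrate the ordering problem; they are not taken from PTH-10.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class, not part of PTH-10.
class StringSortDemo {

    // Orders the values the way a String-typed column is ordered:
    // character by character.
    static List<String> sortAsStrings(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // Orders the same values by their numeric value, which is what a user
    // expects from a column such as elapsed.
    static List<String> sortAsNumbers(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparingDouble(Double::parseDouble));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> elapsed = List.of("10", "9", "100", "2.5");
        System.out.println(sortAsStrings(elapsed)); // [10, 100, 2.5, 9]
        System.out.println(sortAsNumbers(elapsed)); // [2.5, 9, 10, 100]
    }
}
```

Auto sorting has to decide, per column, which of these two orderings applies, and that decision is where the extra cost comes in.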

For example, default sorting for chart and stats is being implemented in PTH-10 issue #256, but it suffers from the same performance issue if auto sorting is used to fix the problem of running e.g. spath before those commands. (spath uses UDFs and changes everything in the dataset to String.)

Matching numbers with a regex in numericalStringCheck() was already tried, but it didn't improve performance.
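One alternative that might be worth measuring is a hand-written single-pass character scan, which avoids both regex matching and exception-based parsing. This is only a sketch: the current implementation of numericalStringCheck() is not shown in this issue, the class NumericScan and the method isDecimalNumber are hypothetical names, and the accepted number format (optional sign, digits, at most one decimal point) is an assumption. Whether it actually reduces query time is untested; if the cost comes from running the check on every row rather than from the check itself, it may not help.

```java
// Hypothetical sketch; the real numericalStringCheck() in PTH-10 may differ.
class NumericScan {

    // Returns true if s looks like an optionally signed decimal number such
    // as "42", "-3.14" or "+.5". Exponents, hex, NaN and Infinity are
    // rejected (an assumption about which inputs count as numeric).
    static boolean isDecimalNumber(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int i = 0;
        char first = s.charAt(0);
        if (first == '-' || first == '+') {
            i = 1; // skip the sign
        }
        boolean seenDigit = false;
        boolean seenDot = false;
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                seenDigit = true;
            } else if (c == '.' && !seenDot) {
                seenDot = true;
            } else {
                return false; // first non-numeric character ends the scan
            }
        }
        // At least one digit is required, so "-", "+" and "." are rejected.
        return seenDigit;
    }
}
```

The scan stops at the first character that cannot be part of a number, so clearly non-numeric strings are rejected after inspecting only a few characters.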

Labels: bug, feature
