New multi-tenant data management #64986

@xiangguangyxg

Feature request

Although StarRocks provides strong capabilities for large-scale analytical workloads, its current design introduces usability challenges, lacks adaptive mechanisms for handling data skew, and may deliver suboptimal performance for small tenants in multi-tenant scenarios.

StarRocks needs a new multi-tenant data management design with the following goals:

Fewer concepts and simpler usage

  • In most cases, table creation should require only column definitions and an ORDER BY clause.
  • Users should not need to define DISTRIBUTED BY clauses.

Ability to handle data skew

  • Tablets support automatic splitting and merging, enabling the system to rebalance data dynamically and mitigate skew.
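
To make the split/merge idea concrete, here is a minimal sketch of one size-based rebalancing pass over tablets ordered by sort key. It is not StarRocks code: the `Tablet` class, the `rebalance` function, and both thresholds are hypothetical, and it assumes data is spread evenly within a tablet's key range, which real skewed data would not satisfy.

```python
from dataclasses import dataclass

GiB = 1024**3
MiB = 1024**2

# Hypothetical thresholds; the issue does not specify any values.
SPLIT_BYTES = 10 * GiB  # split a tablet larger than this
MERGE_BYTES = 1 * GiB   # merge neighbors whose combined size is below this


@dataclass
class Tablet:
    start_key: int   # inclusive lower bound of the sort-key range
    end_key: int     # exclusive upper bound of the sort-key range
    size_bytes: int


def rebalance(tablets: list[Tablet]) -> list[Tablet]:
    """One pass: halve oversized tablets, then merge small adjacent ones."""
    after_split: list[Tablet] = []
    for t in tablets:
        if t.size_bytes > SPLIT_BYTES and t.end_key - t.start_key > 1:
            # Assumption: data is uniform in the range, so split at the midpoint.
            mid = (t.start_key + t.end_key) // 2
            half = t.size_bytes // 2
            after_split.append(Tablet(t.start_key, mid, half))
            after_split.append(Tablet(mid, t.end_key, t.size_bytes - half))
        else:
            after_split.append(t)

    merged: list[Tablet] = []
    for t in after_split:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.end_key == t.start_key
            and prev.size_bytes + t.size_bytes < MERGE_BYTES
        ):
            # Fold this tablet into its left neighbor.
            merged[-1] = Tablet(prev.start_key, t.end_key, prev.size_bytes + t.size_bytes)
        else:
            merged.append(t)
    return merged
```

For example, calling `rebalance` on one 16 GiB tablet followed by two tablets of 200 MiB and 300 MiB yields two 8 GiB halves and a single merged 500 MiB tablet. A real implementation would pick split points from actual key statistics rather than the range midpoint.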

Balanced data locality and distribution

  • Data from a small tenant can be located on a single compute node for optimal locality.
  • Data from a large tenant can be distributed across multiple compute nodes to maintain scalability and performance.
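
The two placement goals above can be illustrated with a small sketch that scales the number of compute nodes with a tenant's data volume. Everything here is an assumption made for illustration: the `nodes_for_tenant` function, the `BYTES_PER_NODE` target, and the hash-based choice of starting node are hypothetical and do not describe how StarRocks places data.

```python
import hashlib
import math

GiB = 1024**3

# Hypothetical target for how much of one tenant's data a single node holds.
BYTES_PER_NODE = 50 * GiB


def nodes_for_tenant(tenant_id: str, tenant_bytes: int, nodes: list[str]) -> list[str]:
    """Pick a run of compute nodes sized to the tenant's data volume."""
    if not nodes:
        raise ValueError("at least one compute node is required")
    # Small tenants get exactly one node; large tenants get more, up to all nodes.
    wanted = max(1, math.ceil(tenant_bytes / BYTES_PER_NODE))
    count = min(wanted, len(nodes))
    # Stable hash so the same tenant always starts at the same node.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(count)]
```

With four nodes, a 1 GiB tenant maps to a single node, a 120 GiB tenant to three nodes, and a 500 GiB tenant to all four. The mapping is deterministic per tenant, which keeps a small tenant's data on one node across calls.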

No dependency on time-based partitioning

  • The system should no longer rely on time-based partitioning for data management.
  • A single partition should be able to store large data volumes, and data volume may vary greatly across partitions, so users can choose partitioning strategies based on their specific needs.
