Reindexing rule providers with cascading interval based reindexing by capistrant · Pull Request #18939 · apache/druid

capistrant · 2026-01-21T22:33:13Z

follow up to #18844 ... at least in terms of the quest to begin the transition from the term compaction to reindexing. More info can be found in that PR description about the naming change and new centralized indexing state storage that the supervisor uses to determine if segments need to be reindexed (replacement for lastCompactionState stored per segment). In this PR I will use the term reindexing whenever possible. When the term compaction is used it will only be to refer to an actual Java class that is yet to be refactored.

Description

Extend reindexing supervisors (AKA compaction supervisors) to allow for a single Druid data source to apply different reindexing configurations to different segments depending on what "reindexing rules" are defined for the data source that apply to the time interval of the segment being reindexed.

Value Proposition

Timeseries data often provides value in different ways over time. Data for the last 7 days is often interacted with differently than data for the last 30 days and once again for data from some number of years ago. In Druid, we should give data owners the ability to use reindexing supervisors (AKA compaction supervisors) to change the state of a single datasource as the data ages. Operators should have the ability to define a data lifecycle of sorts for their datasources and allow reindexing supervisors to dynamically apply that definition to the underlying segments as they age. A great, and simple, example is query granularity.

realtime use cases often call for fine grained query granularity as data is flowing into druid. I may want to analyze metrics by minute or even second. As that data ages to be days or months old, I may still want to be able to check trends for that data over time, but it is often highly unlikely that I care about second to second analysis for data that is past a certain age. And beyond that, there may be tiers of value. For instance, where I want second granularity for the last 24 hours, minute granularity for everything in the past week, and hour granularity for everything older than that. The new functionality in this PR intends to make achieving this as simple as defining a few rules for a datasource using reindexing supervisors, and allowing Druid to reindex the data as segments age out of one time interval and into another that calls for less finely grained query granularity.

Design

CompactionConfigBasedJobTemplate underpins all of this

The existing CompactionConfigBasedJobTemplate underpins all of the new functionality. At the end of the day, the cascading reindexing template uses per-datasource ReindexingRules that are supplied by a ReindexingRuleProvider to create multiple non-overlapping search intervals that each apply a distinct InlineSchemaDataSourceCompactionConfig.

`ReindexingRule`

A good place to start review is to take a quick look at the ReindexingRule concept and the different implementations. The basic idea is that the components of the existing CompactionState concept have been broken out into multiple rule types:

`ReindexingDataSchemaRule`

Query Granularity and rollup bool
DimensionsSpec
MetricsSpec
Projections

`ReindexingSegmentGranularityRule`

`ReindexingTuningConfigRule`

`ReindexingIOConfigRule`

`ReindexingDeletionRule`

CompactionState#TransformSpec maps to what we call ReindexingDeletionRule which is a DimFilter under the hood, and all rows matching are deleted during reindexing.

Rule Anatomy

private final String id;
private final String description;
private final Period olderThan;
//...
//The special bits of an individual rule. Like a `Granularity` for a ReindexingSegmentGranularityRule

The key concept to take away is that olderThan reads as: "Apply this rule to Druid segments who have intervals ending at or before now - olderThan"

Rule selection when `> 1` rules of the same type exist for a datasource

The whole point of this PR is so a datasource can have different reindexing configuration at different points in its timeline. That means there will inevitably multiple rules of the same type that exist for a dataSource. For all but one rule type, only a single rule can be selected per search interval and its associated InlineSchemaDataSourceCompactionConfig. When this happens, it is the ReindexingRuleProvider implementation that makes the selection of which rule to pick for an interval. In this PR, the only concrete implementation is InlineReindexingRuleProvider and the logic for selection is simple:

If N rules of type T apply to the search interval [2025-01-01/2025-01-02) select the one most specific to that interval. "most specific" being the rule with an olderThan period that when subtracted from the current time UTC when the supervisor is running, is closest to the interval end (but never before it, we don't apply rules to intervals they partially overlap with the inline provider)

####### Special Case: ReindexingDeletionRule

The ReindexingDeletionRule is the only rule type that is additive. All matching rules for a search interval become part of the InlineSchemaDataSourceCompactionConfig for that interval. The take the form of:

NOT(A OR B OR C OR D)

where A, B, C, ... are individual ReindexingDeletionRules

######## ReindexingDeletionRule optimization

We avoid applying ReindexingDeletionRules to CompactionCandidates whose segments have all applied the rule already. This rule pruning is done using an optimizer before the underling CompactionConfigBasedJobTemplate creates the CompactionTasks. This is done to avoid wasting work during re-indexing.

`ReindexingRuleProvider`

The next logical place to go from here in review is the rule provider itself. This is meant to be extensible, with this PR providing one concrete implementation, InlineReindexingRuleProvider, that defines all of the rules within the supervisor spec itself.

Classes to check out are: ReindexingRuleProvider and InlineReindexingRuleProvider

A note on ComposingReindexingRuleProvider: This guy is a little funny in this PR as it is meant to give you power once there are multiple ReindexingRuleProvider implementations in Druid. You can chain them together, with their ordering mattering in determining what provider provides what rules.

`InlineReindexingRuleProvider` rule application

A rule applies to an interval IF the reference time - olderThan period for the rule is explicitly AFTER the end time of that interval. This means that rules only apply to intervals that are fully before their threshold.

`CascadingReindexingTemplate`

This is the brains of the operation when it comes to how the supervisor takes a collection of rules that exist for a datasource and turns them into CompactionTasks that can be run.

Non-Overlapping Timeline Generation

The most interesting thing about this is generating the non-overlapping timeline of CompactionConfigBasedJobTemplate objects where the non-overlapping terminology pertains to the search intervals that each template will cover.

Target SegmentGranularity Requirement

It is mandatory that every CompactionConfigBasedJobTemplate has a target SegmentGranularity defined in its config. This target granularity comes from either an explicit ReindexingSegmentGranularityRule that applies to the search interval or the default segment granularity that the cascadeReindex must provide in the spec to ensure that even if no rules exist, the default will stand in to fulfill this requirement.

Search interval boundary adjustment to match target segment granularity

I will use an example to show this in action as I think it is easier than words.

Supervisor Run Time: 2024-02-04T22:12:04.873Z
Supervisor Segment Granularity Rules:
- olderThan=P7D segmentGranularity=DAY
- olderThan=P1M segmentGranularity=MONTH
- olderThan=P1Y segmentGranularity=YEAR

Timeline with no adjustments to align search intervals with target granularity:

[-INF/2023-02-04T22:12:04.873)
[2023-02-04T22:12:04.873/2024-01-04T22:12:04.873)
[2024-01-04T22:12:04.873/2024-01-28T22:12:04.873)

Timeline with our adjustments to align search intervals with target granularity:

[-INF/2023-01-01T00:00:00.000Z)
[2023-01-01T00:00:00.000Z/2024-01-01T00:00:00.000Z)
[2024-01-01T00:00:00.000Z/2024-01-28T00:00:00.00)

A side effect of this is that rules will technically be delayed in their being applied, but it will ensure our search intervals align nicely with the segment granularity that they are targeting.

Two unfortunate constraints:

The source segments being re-indexed need to be at the same or finer grain than the target granularity of the search interval covering them. If not, there will be segments that overlap the interval boundaries in our timeline and get ignored.
The target granularities in the search intervals generated need to the same or getting more coarse as you get further in the past. Not doing this will cause issues with granularity aligning the search intervals while generating the timeline.
- I think it is important to strictly enforce this. I have added a change that during the creation of the initial granularity aligned timeline for creating reindexing search intervals, an IAE is thrown if the granularity is not stable or coarsening over time into the past.

Splitting the timeline built over the Granularity rules to accommodate the non-segment gran rules

We still have tons of rule types beyond these segment granularity rules that we fussed over to create our initial timeline. The reference_time - olderThan thresholds for each of these additional rules will fall within an existing interval in our timeline (except for one case we will cover at the end of this section). For each of these rules, we first use the segment granularity of the interval they fall in to granularity align their threshold time Granularity.bucketStart(threshold). If this adjusted threshold still falls within the interval, we need to split the interval at this time to allow precise application of the rule. We repeat this with all of the distinct olderThan periods that exist for our non segment granularity rules.

Edge cases

No Segment Granularity Rule exists
- This is what the default granularity is for. We create a synthetic granularity aligned timeline using the smallest period from our non segment granularity rules and then split this synthetic interval just as we would for the normal case.
Segment granularity rules exist, but the smallest olderThan period for a segment granularity rule is larger than the smallest period for non segment granularity rules
- we again use the default segment granularity and the smallest olderThan period to create a synthetic interval to make sure all of our search intervals have a segment granularity.

Example Inline Rule Provider Spec

{
  "type": "autocompact",
  "spec": {
    "type": "reindexCascade",
    "dataSource": "wikipedia",
    "defaultSegmentGranularity": "HOUR",
    "taskPriority": 25,
    "inputSegmentSizeBytes": 500000000,
    "skipOffsetFromLatest": "P1D",
    "engine": "msq",
    "taskContext": {
      "maxNumTasks": 10
    },
    "ruleProvider": {
      "type": "inline",
      "segmentGranularityRules": [
        {
          "id": "recent-fine-grain",
          "olderThan": "P1D",
          "segmentGranularity": "HOUR"
        },
        {
          "id": "week-old-medium-grain",
          "olderThan": "P7D",
          "segmentGranularity": "DAY"
        },
        {
          "id": "old-coarse-grain",
          "olderThan": "P30D",
          "segmentGranularity": "MONTH"
        }
      ],
      "deletionRules": [
        {
          "id": "remove-bots",
          "description": "Remove robot traffic from old data",
          "olderThan": "P30D",
          "deleteWhere": {
            "type": "equals",
            "column": "is_bot",
            "matchValueType": "STRING",
            "matchValue": "true"
          }
        },
        {
          "id": "remove-test-users",
          "description": "Remove test users from very old data",
          "olderThan": "P1Y",
          "deleteWhere": {
            "type": "equals",
            "column": "userType",
            "matchValueType": "STRING",
            "matchValue": "test"
          }
        }
      ],
      "tuningConfigRules": [
        {
          "id": "partition-by-country",
          "description": "Partition data by country for better query performance for common use cases",
          "olderThan": "P1D",
          "tuningConfig": {
            "partitionsSpec": {
              "type": "range",
              "maxRowsPerSegment": 10000000,
              "partitionDimensions": [
                "countryName"
              ]
            }
          }
        }
      ],
      "dataSchemaRules": [
        {
          "id": "base-data-schema-rule",
          "olderThan": "P1D",
          "dimensionsSpec": {
            "dimensions": [
              "channel",
              "cityName",
              "comment",
              "countryIsoCode",
              "countryName",
              "namespace",
              "page",
              "regionIsoCode",
              "regionName",
              "user",
              "isAnonymous",
              "isMinor",
              "isNew",
              "isRobot",
              "isUnpatrolled",
              "metroCode"
            ]
          },
          "metricsSpec": [
            {
              "type": "longSum",
              "name": "added",
              "fieldName": "added"
            },
            {
              "type": "longSum",
              "name": "deleted",
              "fieldName": "deleted"
            },
            {
              "type": "longSum",
              "name": "delta",
              "fieldName": "delta"
            }
          ],
          "queryGranularity": "MINUTE",
          "rollup": true
        }
      ]
    }
  },
  "suspended": true
}

This spec will:

Apply the following for all data older than 1 day
- Use range partitioning on countryName
- pass through base metrics and dimensions specs for all time
Apply the following for data 1 to 7 days old
- Reindex to HOUR segment gran
Apply the following for data 1 to 30 days old
- Reindex to MINUTE query granularity
Apply the following for data 7 to 30 days old
- Reindex to DAY segment gran
Apply the following for data older than 30 days old
Reindex to MONTH segment gran
Reindex to HOUR query granularity
Remove robot traffic when data
Apply the following for data older than 1 year
- Remove test users when data is older than 1 year

Future Work

documentation

I think this should come with the console addition PR since that will be when this really becomes realistic to operate without getting too confused with what is going on under the hood for a config.

Console (Future Addition)

Edit: After discussions we decided this is better fit for a follow up PR. I will leave this section as preview for now. but the console changes will actually come in 2nd PR

Based of review with others, adding visualizations of the reindexing timeline in the console seemed like a nice to have for operators. And after experimenting with adding it, I think it is actually a critical piece for sane operations. Using an AI coding agent (I'm know nothing about front end dev), I iterated on console changes and came up with a proposal for a console view of the reindexing timeline so an operator can see how the underlying rules break up into search intervals and what configuration will be applied to each search interval. Below are some pictures using the example inline supervisor above

catalog provider?

Would be cool to tie this into the catalog definitions of tables

Release note

Key changed/added classes in this PR

CascadingReindexingTemplate
ReindexingConfigOptimizer
- ReindexingDeletionRuleOptimizer
ReindexingRuleProvider
- InlineReindexingRuleProvider
- ComposingReindexingRuleProvider
ReindexingRule + AbstractReindexingRule
- ReindexingDimensionsRule
- ReindexingMetricsRule
- ReindexingSegmentGranularityRule
- ReindexingQueryGranularityRule
- ReindexingIOConfigRule
- ReindexingProjectionRule
- ReindexingDeletionRule

This PR has:

Improvements and bugfixes Fix sompaction status after rebasing Fix missing import after rebase fix checkstyle issues fill out javadocs address claude code review comments Add isReady concept to compaction rule provider and gate task creation on provider being ready Fix an issue in AbstractRuleProvider when it comes to variable length periods like month and year Implement a composing rule provider for chaining multiple rule providers

…ding reindex

Using 1 row and creating 0 row segments makes the test fail for native compaction runner. I cannot reproduce in docker to figure out how the test is misconfigured

… issue with range dim and all rows filtered out

...rc/main/java/org/apache/druid/server/coordinator/InlineSchemaDataSourceCompactionConfig.java

+    return new Builder()
+        .forDataSource(this.dataSource)
+        .withTaskPriority(this.taskPriority)
+        .withInputSegmentSizeBytes(this.inputSegmentSizeBytes)
+        .withMaxRowsPerSegment(this.maxRowsPerSegment)


server/src/test/java/org/apache/druid/server/compaction/CompactionStatusTest.java

… is going to be a bad time

…eindexing

…exing

...service/src/test/java/org/apache/druid/indexing/compact/CascadingReindexingTemplateTest.java

capistrant · 2026-02-16T22:25:39Z

we discussed this a bit offline, but to add here for more visibility - I think it would be quite useful to have an API available that could resolve a timeline of all the effective compaction configs for a datasource, to help operators visualize what is going to happen the the table over time (and could also serve as the basis of a UI to do the same in some future). It would probably be fine if this was a follow-up PR too if you prefer. I think the logic would basically be like creating jobs except just returning the config and interval instead of an actual compaction jobs

After toying around with a coding agent I feel like I was able to drum up a surprisingly nice UI in the supervisor view (for cascading reindex type only). The code is currently in this PR, but I'm down to move it to a follow in terms of sanity for review.

…at are deleted showing up

capistrant · 2026-02-17T15:27:18Z

@clintropolis TY for the review. I made some changes based off our conversation and off some findings during further testing. rough summary is:

consolidated those rule types we talked about into a single schema rule type
did some refactoring of the "finalizer", now called "optimizer". No more utility class full of static helpers, but rather just a single implementation of the deletion rule optimizer which is all the cascading template needs (for now).
moved and changed visibility of the builder of the builder that you questioned the need for. Still haven't thought of a better way to achieve what it is doing, but hoping this way is easier to refactor out of in the future if we don't replace it in this PR specifically
~~added a supervisor api for getting the timeline of search intervals and applied rules~~
- ~~added a UI for this api in the console~~

not directly related to your review:

Fixed a bug in the building of the configs where the wrong segment granularity rule was getting applied to intervals
Changed the logic for skipOffsets, where if a skip offset calculation lands in the middle of a search interval, it just skips compaction of that whole interval instead of trying to truncate the interval and process part of it. I think this simplification of that bit takes some complexity out of this code creating the search intervals, making it a little easier to reason about.
I also tried to break down some of the dense methods with lots of logic in CascadingReindexingTemplate by splitting out some private methods. Some of this was driven by the need for some re-use now that the template services the timeline creation request from the supervisor API

…eindexing

clintropolis · 2026-02-18T21:26:08Z

...-tests/src/test/java/org/apache/druid/testing/embedded/compact/CompactionSupervisorTest.java

nit:

Suggested change

// Virtual Collumns on nested data is only supported with MSQ compaction engine right now.

// Virtual Columns on nested data is only supported with MSQ compaction engine right now.

clintropolis · 2026-02-18T21:29:25Z

...-tests/src/test/java/org/apache/druid/testing/embedded/compact/CompactionSupervisorTest.java

nit: i think can leave out the list part and use the other 'create'

clintropolis · 2026-02-18T21:33:17Z

...ing-service/src/main/java/org/apache/druid/indexing/compact/CascadingReindexingTemplate.java

should these use InvalidInput.conditionalException to throw a proper DruidException to signal user error? (same comments for other validations)

clintropolis · 2026-02-18T21:37:13Z

...ervice/src/main/java/org/apache/druid/indexing/compact/CompactionConfigBasedJobTemplate.java

nit: variable name no longer matches

clintropolis · 2026-02-18T21:41:07Z

processing/src/main/java/org/apache/druid/segment/transform/CompactionTransformSpec.java

i think if we modified this to

@JsonInclude(JsonInclude.Include.NON_DEFAULT)

we could modify the constructor to initialize to VirtualColumns.EMPTY and save ourselves some null checks and not risk messing up fingerprinting stuff for things without virtual columns defined

thanks for the tip. I was definitely confused about what to do here to make it so everything that is already fingerprinted didn't end up being re-processed

I went down a little bit of a rabbit hole and found that @JsonInclude(JsonInclude.Include.NON_EMPTY) maybe covers this more robustly?

oh, since it just serializes the array as the value 👍

clintropolis · 2026-02-18T23:17:40Z

server/src/main/java/org/apache/druid/server/compaction/ReindexingDataSchemaRule.java

+import javax.annotation.Nullable;
+import java.util.List;
+
+public class ReindexingDataSchemaRule extends AbstractReindexingRule


may not matter, but this is missing equals/hashcode/tostring (provider uses them in its implementation of these methods, other rules missing these methods too)

also worth a small javadoc

clintropolis · 2026-02-19T00:58:32Z

server/src/main/java/org/apache/druid/server/compaction/AbstractReindexingRule.java

is there a case for having a rule that should always apply? (e.g. i would think period of 0?)

ya. I think I accidentally pulled it from the description that period of 0 and handling of future data need to be addressed. Perhaps this PR warrants supporting P0D. but future data should be left for follow on if needed

clintropolis · 2026-02-19T01:06:22Z

server/src/main/java/org/apache/druid/server/compaction/ReindexingSegmentGranularityRule.java

+      @JsonProperty("id") @Nonnull String id,
+      @JsonProperty("description") @Nullable String description,
+      @JsonProperty("olderThan") @Nonnull Period olderThan,
+      @JsonProperty("segmentGranularity") @Nonnull Granularity segmentGranularity


granularity allows a lot of weird stuff, should we really allow all kinds of granularities here?

I guess I am not sure? I was basing off of what the old compaction granularity allowed and that was just Granularity there

per your comment below I am trending towards just laying the hammer and having a very narrow set of allowed values here to prevent people from causing themselves harm

clintropolis · 2026-02-19T01:09:39Z

server/src/main/java/org/apache/druid/server/compaction/ReindexingRule.java

+   */
+  enum AppliesToMode
+  {
+    PARTIAL,


partial seems underutilized since afaict we only pick rules for FULL, what is the plan for it?

good question. I was under the impression when I initially was writing this that it may be useful, but from where I am now after dealing with trying to make all of this possible to reason about as an operator, I think a rule provider trying to manage partial overlaps somehow instead of forcing full coverage, would be a nightmare. so maybe I should delete?

i'm not completely sure, it seems semi useful to track, if for nothing else from a debugging standpoint in case it is useful later

clintropolis · 2026-02-19T01:55:49Z

...ing-service/src/main/java/org/apache/druid/indexing/compact/CascadingReindexingTemplate.java

+      DateTime rawEnd = referenceTime.minus(rule.getOlderThan());
+      DateTime alignedEnd = rule.getSegmentGranularity().bucketStart(rawEnd);
+      DateTime alignedStart = (previousAlignedEnd != null) ? previousAlignedEnd : DateTimes.MIN;


you can get some pretty weird stuff going on here, perhaps we just encourage people not to create unhinged combinations that don't make sense?

LIke, I guess I wonder if we should take an opinionated stance and add some additional validation to make sure that the granularity is itself also like aligned with the interval?

Like I was curious what happens with a total freedom here, so for example imagine i have gone mad and define 3 segment granularity reindexing rules:

'hour-rule' - olderThan p1d, hourly granularity

'3day-rule' - olderThan p2d, period granularity of p3d

'5day-rule' - olderThan p3d, period granularity of p5d

Using the 'reference time' of the tests, this resolves into something like

IntervalGranularityInfo{interval=-146136543-09-08T08:23:32.096Z/2025-01-22T00:00:00.000Z, granularity={type=period, period=P5D, timeZone=UTC, origin=null}}, IntervalGranularityInfo{interval=2025-01-22T00:00:00.000Z/2025-01-27T00:00:00.000Z, granularity={type=period, period=P3D, timeZone=UTC, origin=null}}, IntervalGranularityInfo{interval=2025-01-27T00:00:00.000Z/2025-01-28T16:00:00.000Z, granularity={type=period, period=PT1H, timeZone=UTC, origin=null}}

seems strange because there is a 5 day interval of 3 day segments, what does that mean? 🤷

I didn't even try adding timezones on period granularities or using like, duration granularity, but I suspect they can make stuff even weirder.

Is this ok? (i think the answer is maybe not, since i tried running a weird configuration like this on CompactionSupervisorTest.test_cascadingCompactionTemplate_multiplePeriodsApplyDifferentCompactionRules and MSQ got stuck in some weird state if i specified a timezone.. i didn't dig very deep into it yet)

I almost want to say we take a hard stance and limit segment granularity rules to something like FIFTEEN_MINUTE, THIRTY_MINUTE, HOUR, DAY, MONTH, YEAR. at this point. The management of this timeline is proving to a real pain

i think it would be fine to start out super restrictive, we can always revisit this part later, probably easier to do that than start out super permissive and try to rein it in later

capistrant · 2026-02-19T05:37:53Z

thanks again for review round @clintropolis I think a lot makes sense and I've got most the changes locally. The segment granularity thing I'm at the point where it is so ripe for edge cases and errors where an operator may get themselves in trouble and regret it that I think the static set of standard granularities I called out in the comment is the way. my take is that it still gives people a lot more flexibility than operators have today, while trying to make our lives easier coding it, and prevent them (operators) from doing something crazy that they hate. what do you think?

…der eligible for compaction

capistrant added 11 commits January 21, 2026 16:13

Renaming refactor

ae21c82

add support for more config knobs in the cascading reindex spec

71830fc

add some testing

1940267

stop using forbidden apis

bdb514e

working on some test coverage

0f71da2

simplify search interval creation and enhance embedded test for casca…

acad478

…ding reindex

fixup checkstyle

d49fbf4

temporary fixup to test

5efb4bf

Using 1 row and creating 0 row segments makes the test fail for native compaction runner. I cannot reproduce in docker to figure out how the test is misconfigured

fix checkstyle

5a87759

remove native runner for one compaction supervisor test due to native…

0850a8f

… issue with range dim and all rows filtered out

capistrant added the WIP label Jan 21, 2026

github-actions bot added the Area - Ingestion label Jan 21, 2026

github-advanced-security bot found potential problems Jan 21, 2026

View reviewed changes

refactorings from self review

3667456

capistrant added the Area - Compaction label Jan 22, 2026

capistrant added 2 commits January 22, 2026 12:06

Fixup naming to prefer reindexing over compaction

a280690

fix up a javadoc with up to date design spec

3108b61

capistrant changed the title ~~[WIP] Reindexing rule providers with cascading reindexing~~ [WIP] Reindexing rule providers with cascading interval based reindexing Jan 22, 2026

capistrant added 11 commits January 22, 2026 14:31

Fill in UT gaps for the composing provider

1f671e1

refactor test class for inline rule provider

3be0da1

Self review refactorings

79ff44b

Trying to transform cascadingreindexingtemplate to a compaction state…

420f3b2

… is going to be a bad time

refactor the location of the reindexing filter rule optimizer

bf2e02d

Refactor this idea of additivity and how it works for building configs

5b4f3d2

Add a missing test class

6f5ead7

fix checkstyle

6d1fc6e

Merge branch 'master' into reindexing-rule-providers-with-cascading-r…

799db27

…eindexing

clean up a javadoc

6853b02

trivial fixes

2200467

capistrant added 9 commits February 16, 2026 12:30

web console testing and fixes found while writing tests

66d6815

clean refactor of reindexing config builder

bbb05b9

clean up refactor of cascading reindexing template

a39f222

self review cleanup

6c842a7

add missing tests for supervisor resource api

33fc482

Refactor exception class for timeline to be more descriptive

ccd6c71

remove redundant naming patterns

c7ed4e1

Add some java docs

f6ef9a9

refactor how the final config customization works for cascading reind…

83766fa

…exing

github-actions bot added the Area - Web Console label Feb 16, 2026

github-advanced-security bot found potential problems Feb 16, 2026

View reviewed changes

capistrant added 2 commits February 16, 2026 16:19

fix serde issues I was having

f07b97a

fixup random bits after CI ran

707904f

waiting for segments to be available should avoid issues with segs th…

a24cd32

…at are deleted showing up

capistrant requested a review from clintropolis February 17, 2026 15:27

capistrant added 5 commits February 17, 2026 10:35

Move ReindexingConfigBuilder to a hopefully better place

bfff8de

Merge branch 'master' into reindexing-rule-providers-with-cascading-r…

5fa4146

…eindexing

Remove web-console changes from this PR

3f2747c

Remove support for the reindexing timeline creation from app code

0d0d0c5

missed a few spots

4b211a0

capistrant removed the Area - Web Console label Feb 17, 2026

fix silly jackson serde error

387eb5a

clintropolis reviewed Feb 19, 2026

View reviewed changes

capistrant added 4 commits February 19, 2026 00:10

addressing review comments

34d31ab

Create strict allow list for segment granularity options

30d2d77

add some compaction transform serde tests out of paranoia

a381179

All P0D older than to essentially make all data from referance and ol…

e560a97

…der eligible for compaction

	// Virtual Collumns on nested data is only supported with MSQ compaction engine right now.
	// Virtual Columns on nested data is only supported with MSQ compaction engine right now.

Conversation

capistrant commented Jan 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Value Proposition

Design

CompactionConfigBasedJobTemplate underpins all of this

ReindexingRule

ReindexingDataSchemaRule

ReindexingSegmentGranularityRule

ReindexingTuningConfigRule

ReindexingIOConfigRule

ReindexingDeletionRule

Rule Anatomy

Rule selection when > 1 rules of the same type exist for a datasource

ReindexingRuleProvider

InlineReindexingRuleProvider rule application

CascadingReindexingTemplate

Non-Overlapping Timeline Generation

Target SegmentGranularity Requirement

Search interval boundary adjustment to match target segment granularity

Splitting the timeline built over the Granularity rules to accommodate the non-segment gran rules

Example Inline Rule Provider Spec

Future Work

documentation

Console (Future Addition)

catalog provider?

Release note

Key changed/added classes in this PR

Uh oh!

Check notice

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

capistrant commented Feb 16, 2026

Uh oh!

capistrant commented Feb 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

clintropolis Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

capistrant Feb 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

clintropolis Feb 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

capistrant commented Jan 21, 2026 •

edited

Loading

`ReindexingRule`

`ReindexingDataSchemaRule`

`ReindexingSegmentGranularityRule`

`ReindexingTuningConfigRule`

`ReindexingIOConfigRule`

`ReindexingDeletionRule`

Rule selection when `> 1` rules of the same type exist for a datasource

`ReindexingRuleProvider`

`InlineReindexingRuleProvider` rule application

`CascadingReindexingTemplate`

capistrant commented Feb 17, 2026 •

edited

Loading

clintropolis Feb 18, 2026 •

edited

Loading

capistrant Feb 19, 2026 •

edited

Loading

clintropolis Feb 19, 2026 •

edited

Loading