
Commit 2adf9df

Merge branch 'master' into fix_meta_tool
2 parents 9ec572e + 3cdc482 commit 2adf9df

2,826 files changed: +113,317 −112,390 lines
Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@

---
name: doris-docker-regression
description: Run Doris docker-based regression tests from a clean package
compatibility: opencode
---
## Purpose

Run Doris docker-based regression tests from a clean package, avoiding contamination from local `conf/` or startup-script modifications.
## Prerequisites

### Python Environment

Requires Python 3. Install dependencies from `docker/runtime/doris-compose/requirements.txt`:

```bash
python -m pip install --user -r docker/runtime/doris-compose/requirements.txt
```

If installation fails (especially on PyYAML conflicts), pin specific versions first:

```bash
python -m pip install --user pyyaml==5.3.1 docker==6.1.3
python -m pip install --user -r docker/runtime/doris-compose/requirements.txt
```

Alternatively, use a virtual environment:

```bash
python -m venv doris-compose-env
source doris-compose-env/bin/activate
pip install -r docker/runtime/doris-compose/requirements.txt
```
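Before moving on, a quick import check can confirm the key modules are present. The helper below is illustrative (the function name and module list are assumptions, not part of the repo):

```bash
# Illustrative helper: report which Python modules are importable, so a
# missing doris-compose dependency is caught before running the tool.
check_py_mods() {
    local mod rc=0
    for mod in "$@"; do
        if python3 -c "import ${mod}" >/dev/null 2>&1; then
            echo "ok: ${mod}"
        else
            echo "missing: ${mod}"
            rc=1
        fi
    done
    return "${rc}"
}

# doris-compose needs at least PyYAML ("yaml") and the docker SDK ("docker"):
#   check_py_mods yaml docker
```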
### Docker Environment

```bash
docker run hello-world      # Docker works
docker compose version      # Compose v2
docker-compose version      # should resolve to Compose v2
```

If `docker-compose` fails with `TypeError: kwargs_from_env()`, it is likely Compose v1; install Docker Compose v2 and make sure `docker-compose` resolves to it.

Check port availability:

```bash
lsof -nP -iTCP:8030,8040,8050,8060,8070,9010,9020,9030,9050,9060 -sTCP:LISTEN
```

If a port is already in use, ask the user how to handle the conflict.
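When `lsof` is unavailable, the same check can be approximated with bash's `/dev/tcp` pseudo-device: a successful connect means something is already listening. This is a sketch, not part of the repo tooling:

```bash
# Illustrative port probe: print the subset of the given ports that have a
# listener on localhost. A failed connect (connection refused) means free.
ports_in_use() {
    local port busy=()
    for port in "$@"; do
        # The subshell closes fd 3 automatically on exit.
        if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
            busy+=("${port}")
        fi
    done
    echo "${busy[*]-}"
}

# Usage sketch:
#   busy="$(ports_in_use 8030 8040 9030)"
#   [ -z "$busy" ] || echo "conflict on: $busy"
```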
## Build

```bash
./build.sh --fe --be -j<parallel_jobs> --output <output_directory>
```
## Prepare Clean Package

Sanitize configs from git HEAD:

```bash
git show HEAD:conf/fe.conf > <output_directory>/fe/conf/fe.conf
git show HEAD:conf/be.conf > <output_directory>/be/conf/be.conf
git show HEAD:bin/start_fe.sh > <output_directory>/fe/bin/start_fe.sh
chmod 755 <output_directory>/fe/bin/start_fe.sh
```

Verify the clean configs use **default ports**:

- `fe.conf`: 8030, 9020, 9030, 9010, 8070
- `be.conf`: 9060, 8040, 9050, 8060, 8050
- `start_fe.sh`: no `-agentlib:jdwp` debug agent
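A small script can confirm the sanitization took effect. The function below is a sketch (the checked strings are the default ports and the JDWP marker listed above; the function name is an assumption):

```bash
# Illustrative check: confirm the sanitized package mentions the default FE
# ports and carries no JDWP debug agent in start_fe.sh.
verify_clean_pkg() {
    local out_dir="$1" port bad=0
    for port in 8030 9030; do
        if ! grep -q "${port}" "${out_dir}/fe/conf/fe.conf"; then
            echo "fe.conf: expected default port ${port} not found"
            bad=1
        fi
    done
    if grep -q "agentlib:jdwp" "${out_dir}/fe/bin/start_fe.sh"; then
        echo "start_fe.sh: debug agent still present"
        bad=1
    fi
    if [ "${bad}" -eq 0 ]; then
        echo "package looks clean"
    fi
    return "${bad}"
}
```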
## Build Docker Image

```bash
docker build \
    --build-arg OUTPUT_PATH=<output_directory> \
    -f docker/runtime/doris-compose/Dockerfile \
    -t <image_name>:latest \
    .
```
## Configure Regression

Create `regression-test/conf/regression-conf-custom.groovy`:

```groovy
image = "<image_name>:latest"
excludeDockerTest = false
testGroups = "docker"
```
## Run Regression

```bash
./run-regression-test.sh --run -d <directory> -s <suite_name>
```
## Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| 0 suites run | `excludeDockerTest=true` or missing `testGroups="docker"` | Check `regression-conf-custom.groovy` |
| JDWP error | `start_fe.sh` has debug agent | Re-sanitize from `git show HEAD:bin/start_fe.sh` |
| Wrong ports | Configs have local edits | Re-sanitize from `git show HEAD:conf/fe.conf` / `be.conf` |
| Port conflict | Processes using default ports | `lsof` to find them, then kill |
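The first row of the table can be triaged mechanically. This sketch (function name assumed, not part of the repo) scans `regression-conf-custom.groovy` for the two settings that most often silence docker suites:

```bash
# Illustrative triage for "0 suites run": check the groovy conf for the
# docker-suite toggles. Returns nonzero if a problem is found.
diagnose_conf() {
    local conf="$1"
    if grep -q 'excludeDockerTest *= *true' "${conf}"; then
        echo "excludeDockerTest is true"
        return 1
    fi
    if ! grep -q 'testGroups *= *"docker"' "${conf}"; then
        echo 'testGroups is not "docker"'
        return 1
    fi
    echo "conf ok"
}

# Usage sketch:
#   diagnose_conf regression-test/conf/regression-conf-custom.groovy
```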
## Debug Logs

Runtime cluster files live under `/tmp/doris/<suite_name>/`:

- `fe-1/log/fe.log`, `fe.warn.log`, `fe.out`, `health.out`
- `be-1/log/be.INFO`, `be.WARNING`, `be.out`, `health.out`
- `doris-compose.log`
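To pull these logs together quickly, a helper along these lines can tail each file that exists (function name and defaults are assumptions):

```bash
# Illustrative helper: print the tail of each known log under the suite's
# runtime directory, skipping files that do not exist.
show_doris_logs() {
    local root="$1" n="${2:-20}" f
    for f in fe-1/log/fe.log fe-1/log/fe.warn.log fe-1/log/fe.out \
             be-1/log/be.INFO be-1/log/be.WARNING be-1/log/be.out \
             doris-compose.log; do
        if [ -f "${root}/${f}" ]; then
            echo "==== ${f} ===="
            tail -n "${n}" "${root}/${f}"
        fi
    done
}

# Usage sketch:
#   show_doris_logs /tmp/doris/<suite_name> 50
```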
## Full Command Sequence

```bash
# 1. Check environment
python --version
docker run hello-world
docker compose version
lsof -nP -iTCP:8030,8040,8050,8060,8070,9010,9020,9030,9050,9060 -sTCP:LISTEN

# 2. Build
./build.sh --fe --be -j60 --output <output_directory>

# 3. Sanitize package configs
git show HEAD:conf/fe.conf > <output_directory>/fe/conf/fe.conf
git show HEAD:conf/be.conf > <output_directory>/be/conf/be.conf
git show HEAD:bin/start_fe.sh > <output_directory>/fe/bin/start_fe.sh
chmod 755 <output_directory>/fe/bin/start_fe.sh

# 4. Build image
docker build --build-arg OUTPUT_PATH=<output_directory> -f docker/runtime/doris-compose/Dockerfile -t <image_name>:latest .

# 5. Configure (create regression-conf-custom.groovy)

# 6. Run
./run-regression-test.sh --run -d <directory> -s <suite_name>
```

.github/workflows/build-extension.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -114,7 +114,7 @@ jobs:
        run: |
          pushd thirdparty
          branch="${{ github.base_ref }}"
-         if [[ -z "${branch}" ]] || [[ "${branch}" == 'master' || "${branch}" == 'branch-4.0' || "${branch}" == 'branch-3.0' || "${branch}" == 'branch-2.1' ]]; then
+         if [[ -z "${branch}" ]] || [[ "${branch}" == 'master' || "${branch}" == 'branch-4.1'|| "${branch}" == 'branch-4.0' || "${branch}" == 'branch-3.0' || "${branch}" == 'branch-2.1' ]]; then
            curl -L https://github.com/apache/doris-thirdparty/releases/download/automation/doris-thirdparty-prebuilt-linux-x86_64.tar.xz \
              -o doris-thirdparty-prebuilt-linux-x86_64.tar.xz
          else
```

.github/workflows/opencode-review.yml

Lines changed: 13 additions & 5 deletions

```diff
@@ -92,6 +92,9 @@ jobs:
 - After completing the review, you MUST provide a final summary opinion based on the rules defined in AGENTS.md and the code-review skill. The summary must include conclusions for each applicable critical checkpoint.
 - If no issues to report, submit a short summary comment saying no issues found using: gh pr comment PLACEHOLDER_PR_NUMBER --body "<summary>"
 - If issues found, submit a review with inline comments plus a comprehensive summary body. Use GitHub Reviews API to ensure comments are inline:
+  - Inline comment bodies may include GitHub suggested changes blocks when you can propose a precise patch.
+  - Prefer suggested changes for small, self-contained fixes (for example typos, trivial refactors, or narrowly scoped code corrections).
+  - Do not force suggested changes for broad, architectural, or multi-file issues; explain those normally.
 - Build a JSON array of comments like: [{ "path": "<file>", "position": <diff_position>, "body": "..." }]
 - Submit via: gh api repos/PLACEHOLDER_REPO/pulls/PLACEHOLDER_PR_NUMBER/reviews --input <json_file>
 - The JSON file should contain: {"event":"COMMENT","body":"<summary>","comments":[...]}
@@ -109,14 +112,15 @@ jobs:
 
      - name: Run automated code review
        id: review
+       timeout-minutes: 55
        continue-on-error: true
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          PROMPT=$(cat /tmp/review_prompt.txt)
 
          set +e
-         opencode run "$PROMPT" -m "github-copilot/claude-opus-4.6" 2>&1 | tee /tmp/opencode-review.log
+         opencode run "$PROMPT" -m "github-copilot/gpt-5.4" 2>&1 | tee /tmp/opencode-review.log
          status=${PIPESTATUS[0]}
          set -e
 
@@ -139,26 +143,30 @@ jobs:
          fi
 
      - name: Comment PR on review failure
-       if: ${{ steps.review.outcome == 'failure' }}
+       if: ${{ always() && steps.review.outcome != 'success' }}
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          FAILURE_REASON: ${{ steps.review.outputs.failure_reason }}
+         REVIEW_OUTCOME: ${{ steps.review.outcome }}
          RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
        run: |
+         error_msg="${FAILURE_REASON:-Review step was $REVIEW_OUTCOME (possibly timeout or cancelled)}"
          gh pr comment "${{ github.event.issue.number }}" --body "$(cat <<EOF
          OpenCode automated review failed and did not complete.
 
-         Error: ${FAILURE_REASON}
+         Error: ${error_msg}
          Workflow run: ${RUN_URL}
 
          Please inspect the workflow logs and rerun the review after the underlying issue is resolved.
          EOF
          )"
 
      - name: Fail workflow if review failed
-       if: ${{ steps.review.outcome == 'failure' }}
+       if: ${{ always() && steps.review.outcome != 'success' }}
        env:
          FAILURE_REASON: ${{ steps.review.outputs.failure_reason }}
+         REVIEW_OUTCOME: ${{ steps.review.outcome }}
        run: |
-         echo "OpenCode automated review failed: ${FAILURE_REASON}"
+         error_msg="${FAILURE_REASON:-Review step was $REVIEW_OUTCOME (possibly timeout or cancelled)}"
+         echo "OpenCode automated review failed: ${error_msg}"
          exit 1
```

be/CMakeLists.txt

Lines changed: 9 additions & 1 deletion

```diff
@@ -428,7 +428,10 @@ endif()
 # -DNDEBUG: Turn off dchecks/asserts/debug only code.
 set(CXX_FLAGS_RELEASE "-O3 -DNDEBUG")
 message(STATUS "UBSAN_IGNORELIST is ${UBSAN_IGNORELIST}")
-set(CXX_FLAGS_ASAN "-O0 -fsanitize=address -fsanitize=undefined -fno-sanitize=float-cast-overflow -fsanitize-ignorelist=${UBSAN_IGNORELIST} -DUNDEFINED_BEHAVIOR_SANITIZER -DADDRESS_SANITIZER")
+set(CXX_FLAGS_ASAN "-O0 -fsanitize=address -fsanitize=undefined -fno-sanitize=float-cast-overflow -DUNDEFINED_BEHAVIOR_SANITIZER -DADDRESS_SANITIZER")
+if (COMPILER_CLANG)
+    set(CXX_FLAGS_ASAN "${CXX_FLAGS_ASAN} -fsanitize-ignorelist=${UBSAN_IGNORELIST}")
+endif()
 set(CXX_FLAGS_LSAN "-O0 -fsanitize=leak -DLEAK_SANITIZER")
 ## Use for BE-UT
 set(CXX_FLAGS_ASAN_UT "-O0 -fsanitize=address -DADDRESS_SANITIZER")
@@ -706,6 +709,11 @@ else()
         -lresolv
         -liconv
     )
+    # On macOS arm64, large binaries (>2GB) can cause the virtual address space
+    # to overlap with the dyld shared cache region, resulting in "dyld cache not
+    # loaded: syscall to map cache into shared region failed". Use -dead_strip to
+    # remove unreachable code/data and reduce binary size.
+    add_link_options(-Wl,-dead_strip)
 endif()
 
```

be/src/agent/agent_server.cpp

Lines changed: 2 additions & 1 deletion

```diff
@@ -211,7 +211,8 @@ void AgentServer::cloud_start_workers(CloudStorageEngine& engine, ExecEnv* exec_
 
     _workers[TTaskType::ALTER] = std::make_unique<TaskWorkerPool>(
             "ALTER_TABLE", config::alter_tablet_worker_count,
-            [&engine](auto&& task) { return alter_cloud_tablet_callback(engine, task); });
+            [&engine](auto&& task) { return alter_cloud_tablet_callback(engine, task); },
+            [&engine](auto&& task) { set_alter_version_before_enqueue(engine, task); });
 
     _workers[TTaskType::CALCULATE_DELETE_BITMAP] = std::make_unique<TaskWorkerPool>(
             "CALC_DBM_TASK", config::calc_delete_bitmap_worker_count,
```

be/src/agent/task_worker_pool.cpp

Lines changed: 47 additions & 3 deletions

```diff
@@ -525,9 +525,11 @@ bvar::Adder<uint64_t> report_index_policy_failed("report", "index_policy_failed"
 
 } // namespace
 
-TaskWorkerPool::TaskWorkerPool(std::string_view name, int worker_count,
-                               std::function<void(const TAgentTaskRequest& task)> callback)
-        : _callback(std::move(callback)) {
+TaskWorkerPool::TaskWorkerPool(
+        std::string_view name, int worker_count,
+        std::function<void(const TAgentTaskRequest& task)> callback,
+        std::function<void(const TAgentTaskRequest& task)> pre_submit_callback)
+        : _callback(std::move(callback)), _pre_submit_callback(std::move(pre_submit_callback)) {
     auto st = ThreadPoolBuilder(fmt::format("TaskWP_{}", name))
                       .set_min_threads(worker_count)
                       .set_max_threads(worker_count)
@@ -551,6 +553,9 @@ void TaskWorkerPool::stop() {
 
 Status TaskWorkerPool::submit_task(const TAgentTaskRequest& task) {
     return _submit_task(task, [this](auto&& task) {
+        if (_pre_submit_callback) {
+            _pre_submit_callback(task);
+        }
         add_task_count(task, 1);
         return _thread_pool->submit_func([this, task]() {
             _callback(task);
@@ -2244,9 +2249,48 @@ void alter_cloud_tablet_callback(CloudStorageEngine& engine, const TAgentTaskReq
                       std::chrono::system_clock::now().time_since_epoch())
                       .count();
     g_fragment_last_active_time.set_value(now);
+
+    // Clean up alter_version before remove_task_info to avoid race:
+    // remove_task_info allows same-signature re-submit, whose pre_submit_callback
+    // would set alter_version, then this cleanup would wipe it.
+    if (req.__isset.alter_tablet_req_v2) {
+        const auto& alter_req = req.alter_tablet_req_v2;
+        auto new_tablet = engine.tablet_mgr().get_tablet(alter_req.new_tablet_id);
+        auto base_tablet = engine.tablet_mgr().get_tablet(alter_req.base_tablet_id);
+        if (new_tablet.has_value()) {
+            new_tablet.value()->set_alter_version(-1);
+        }
+        if (base_tablet.has_value()) {
+            base_tablet.value()->set_alter_version(-1);
+        }
+    }
+
     remove_task_info(req.task_type, req.signature);
 }
 
+void set_alter_version_before_enqueue(CloudStorageEngine& engine, const TAgentTaskRequest& req) {
+    if (!req.__isset.alter_tablet_req_v2) {
+        return;
+    }
+    const auto& alter_req = req.alter_tablet_req_v2;
+    if (alter_req.alter_version <= 1) {
+        return;
+    }
+    auto new_tablet = engine.tablet_mgr().get_tablet(alter_req.new_tablet_id);
+    if (!new_tablet.has_value() || new_tablet.value()->tablet_state() == TABLET_RUNNING) {
+        return;
+    }
+    auto base_tablet = engine.tablet_mgr().get_tablet(alter_req.base_tablet_id);
+    if (!base_tablet.has_value()) {
+        return;
+    }
+    new_tablet.value()->set_alter_version(alter_req.alter_version);
+    base_tablet.value()->set_alter_version(alter_req.alter_version);
+    LOG(INFO) << "set alter_version=" << alter_req.alter_version
+              << " before enqueue, base_tablet=" << alter_req.base_tablet_id
+              << ", new_tablet=" << alter_req.new_tablet_id;
+}
+
 void gc_binlog_callback(StorageEngine& engine, const TAgentTaskRequest& req) {
     std::unordered_map<int64_t, int64_t> gc_tablet_infos;
     if (!req.__isset.gc_binlog_req) {
```

be/src/agent/task_worker_pool.h

Lines changed: 5 additions & 1 deletion

```diff
@@ -50,7 +50,8 @@ class TaskWorkerPoolIf {
 class TaskWorkerPool : public TaskWorkerPoolIf {
 public:
     TaskWorkerPool(std::string_view name, int worker_count,
-                   std::function<void(const TAgentTaskRequest&)> callback);
+                   std::function<void(const TAgentTaskRequest&)> callback,
+                   std::function<void(const TAgentTaskRequest&)> pre_submit_callback = nullptr);
 
     ~TaskWorkerPool() override;
 
@@ -62,6 +63,7 @@ class TaskWorkerPool : public TaskWorkerPoolIf {
     std::atomic_bool _stopped {false};
     std::unique_ptr<ThreadPool> _thread_pool;
     std::function<void(const TAgentTaskRequest&)> _callback;
+    std::function<void(const TAgentTaskRequest&)> _pre_submit_callback;
 };
 
 class PublishVersionWorkerPool final : public TaskWorkerPool {
@@ -180,6 +182,8 @@ void alter_tablet_callback(StorageEngine& engine, const TAgentTaskRequest& req);
 
 void alter_cloud_tablet_callback(CloudStorageEngine& engine, const TAgentTaskRequest& req);
 
+void set_alter_version_before_enqueue(CloudStorageEngine& engine, const TAgentTaskRequest& req);
+
 void clone_callback(StorageEngine& engine, const ClusterInfo* cluster_info,
                     const TAgentTaskRequest& req);
```

be/src/cloud/cloud_cumulative_compaction.cpp

Lines changed: 16 additions & 0 deletions

```diff
@@ -292,6 +292,22 @@ Status CloudCumulativeCompaction::modify_rowsets() {
         LOG(INFO) << "CloudCumulativeCompaction::modify_rowsets.enable_spin_wait, exit";
     });
 
+    // Block only NOTREADY tablets (SC new tablets) before compaction commit.
+    // RUNNING tablets (system tables, base tablets) are not affected.
+    DBUG_EXECUTE_IF("CloudCumulativeCompaction::modify_rowsets.block_notready", {
+        if (_tablet->tablet_state() == TABLET_NOTREADY) {
+            LOG(INFO) << "block NOTREADY tablet compaction before commit"
+                      << ", tablet_id=" << _tablet->tablet_id() << ", output=["
+                      << _input_rowsets.front()->start_version() << "-"
+                      << _input_rowsets.back()->end_version() << "]";
+            while (DebugPoints::instance()->is_enable(
+                    "CloudCumulativeCompaction::modify_rowsets.block_notready")) {
+                std::this_thread::sleep_for(std::chrono::milliseconds(50));
+            }
+            LOG(INFO) << "release NOTREADY tablet compaction, tablet_id=" << _tablet->tablet_id();
+        }
+    });
+
     DeleteBitmapPtr output_rowset_delete_bitmap = nullptr;
     int64_t initiator = this->initiator();
     int64_t get_delete_bitmap_lock_start_time = 0;
```
