List similar tasks from previous pushes to identify regressions #1151

Closed
vrigal wants to merge 12 commits into mozilla:master from vrigal:regression-similar-jobs

Conversation

@vrigal
Contributor

@vrigal vrigal commented Apr 14, 2025

References #1145
Based on #1148

@vrigal
Contributor Author

vrigal commented Apr 14, 2025

Uses HGMO to detect similar tasks in previous pushes. It is slower than with Treeherder, so it is limited to 20 pushes (< 5 minutes of runtime). This slightly changes the behavior specified in #1145 (comment), as we will never look at more than 20 pushes.

The changes in the regression command are only intended to simplify development and will be removed:

mozci regression 828a657b93e430f78716c4aa5e6ac9db2ed5372e --max-results=20 -v
2025-04-14 17:43:45.752 | WARNING  | mozci.configuration:__init__:149 - Configuration path /home/valentin/.config/mozci/config.toml is not a file.
17:43:45.803 | INFO     | mozci.data.base:50  - Sources selected, in order of priority: ('hgmo', 'taskcluster', 'treeherder_client', 'errorsummary', 'bugbug').
17:43:50.661 | INFO     | mozci.console.commands.regression:33  - Fetched 393 tasks
17:43:50.661 | INFO     | mozci.console.commands.regression:42  - Kept 20 tasks that are potential regressions
17:44:11.864 | ERROR    |                        mozci.task:88  - Bad group name in task WeASGGDiTzmJADl-9i8_4Q: ''
17:44:33.750 | WARNING  |       mozci.data.sources.artifact:99  - Some groups in FNtCr6sRTZSpfMARdVRAvQ are missing results: {'devtools/shared/commands/target/tests/browser.toml', 'devtools/client/framework/test/browser.toml', 'devtools/client/netmonitor/src/har/test/browser-harautomation.toml'}
+------------------------+---------------------------------------------+-----------------------+
| ID                     | Label                                       | Similar count (mozci) |
+------------------------+---------------------------------------------+-----------------------+
| BdmsZzfbR6aUAAmniaA4GQ | build-win64-shippable/opt                   | 1                     |
| cO59VVvdTTSLojmt2L-SrA | generate-profile-win64-shippable/opt        | 1                     |
| RfTgwACkQzyJ-uRDfv-a3Q | generate-profile-win64-shippable/opt        | 1                     |
| Ekv5_Rx5RHWLQgjGk3Yh0g | build-win64-shippable/opt                   | 1                     |
| G0_1FUeoSRanYxa1j4CNOw | build-signing-win32/opt                     | 5                     |
| YCWCtfV4TWW2GkyMLNfjhg | generate-profile-win64-nightlyasrelease/opt | 5                     |
| TWoTrB4TQ_WYb3Lad4R-cw | build-signing-linux64-aarch64/opt           | 5                     |
| Ht4KhlCmQ9i9ebunDkoM6g | build-linux64-nightlyasrelease/opt          | 5                     |
| Gv3SKBZoTuC2sX0SjIRbfw | build-macosx64-nightlyasrelease/opt         | 5                     |
| IZqavllXT-WcumQm06r9UQ | l10n-macosx64/opt                           | 13                    |
| DjmiQsqOREWWfZQ5O1HPxA | l10n-linux64-aarch64/opt                    | 13                    |
| a7mTuzgjTIWTLXGmxPdFUA | l10n-linux64/opt                            | 13                    |
| NPmP2j6bS0upxCVHQa9Mxw | build-win64-nightlyasrelease/opt            | 5                     |
| M2vBKwIxRG-eDY0pmPBEIA | l10n-win64/opt                              | 13                    |
| G1piUZbgTaWtenoA_t21cA | l10n-win32/opt                              | 13                    |
| UonV55tpRD2LZ_I26nXXEg | instrumented-build-apk-fenix                | 1                     |
| ZUDYK5FDQjidn6aZ8bawPQ | repackage-signing-msix-win64/opt            | 10                    |
| dgM9oA1uT2ilXrVGNnP6ZQ | repackage-macosx64/opt                      | 5                     |
| VGlKJUppRQuW-XLt8Dg7gQ | repackage-msix-win64/opt                    | 10                    |
| EFjnPOySTDWQRTWEPe8QLg | repackage-msix-win64/debug                  | 1                     |
+------------------------+---------------------------------------------+-----------------------+
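
A minimal sketch of how a "similar count" like the one in the table above could be computed. It assumes that "similar" means "same task label, also failing", that the current push counts as one occurrence, and that the failing labels of each previous push are already available as sets. The function name and the toy data are hypothetical and are not part of mozci.

```python
from collections import Counter
from typing import Iterable

MAX_PUSHES = 20  # limit mentioned in the comment above


def count_similar_failures(
    current_labels: Iterable[str],
    previous_pushes: list[set[str]],
    max_pushes: int = MAX_PUSHES,
) -> Counter:
    """Count, per label, how many pushes (current one included) failed on it.

    `previous_pushes` is ordered from most recent to oldest; each entry is
    the set of labels that failed on that push.
    """
    # The current push always counts as one occurrence.
    counts = Counter({label: 1 for label in current_labels})
    for failed_labels in previous_pushes[:max_pushes]:
        for label in counts:
            if label in failed_labels:
                counts[label] += 1
    return counts


# Toy history, not real Taskcluster data.
history = [{"build-signing-win32/opt"}, {"build-signing-win32/opt"}, set()]
similar = count_similar_failures(
    ["build-signing-win32/opt", "build-win64-shippable/opt"], history
)
print(similar)
```

A label with a count of 1 only fails on the current push, which is what makes it a candidate for a new regression.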

Comment thread mozci/task.py Outdated
Comment on lines +279 to +293
    @property
    def is_potential_regression(self):
        # Check if the job is either tier 1 or 2
        if self.tier not in (1, 2):
            return False

        # Check if job result is busted or exception
        if self.result not in ("busted", "exception"):
            return False

        # Check if extra.treeherder.jobKind is build
        if self.job_kind != "build":
            return False

        return True
Collaborator

Same as #1148 (comment): we should move this to mozci/push.py.

Comment thread mozci/task.py Outdated
Comment thread mozci/task.py Outdated
@La0
Collaborator

La0 commented Apr 15, 2025

@vrigal will rebase his work on master instead of my own branch so he can iterate on your comments

@vrigal vrigal force-pushed the regression-similar-jobs branch from 38facea to 57627b7 Compare April 15, 2025 09:51
@vrigal
Contributor Author

vrigal commented Apr 15, 2025

I removed most of the code.

@marco-c Using push.get_regressions("label"), I get close results:

| ID | Label | Regression count (mozci.tasks implem) | Regression count (mozci.push implem) |
|----|-------|---------------------------------------|--------------------------------------|
| BdmsZzfbR6aUAAmniaA4GQ | build-win64-shippable/opt | 1 | 0 |
| cO59VVvdTTSLojmt2L-SrA | generate-profile-win64-shippable/opt | 1 | 0 |
| RfTgwACkQzyJ-uRDfv-a3Q | generate-profile-win64-shippable/opt | 1 | 0 |
| Ekv5_Rx5RHWLQgjGk3Yh0g | build-win64-shippable/opt | 1 | 0 |
| G0_1FUeoSRanYxa1j4CNOw | build-signing-win32/opt | 5 | 4 |
| YCWCtfV4TWW2GkyMLNfjhg | generate-profile-win64-nightlyasrelease/opt | 5 | 4 |
| TWoTrB4TQ_WYb3Lad4R-cw | build-signing-linux64-aarch64/opt | 5 | 4 |
| Ht4KhlCmQ9i9ebunDkoM6g | build-linux64-nightlyasrelease/opt | 5 | 4 |
| Gv3SKBZoTuC2sX0SjIRbfw | build-macosx64-nightlyasrelease/opt | 5 | 4 |
| IZqavllXT-WcumQm06r9UQ | l10n-macosx64/opt | 13 | None |
| DjmiQsqOREWWfZQ5O1HPxA | l10n-linux64-aarch64/opt | 13 | None |
| a7mTuzgjTIWTLXGmxPdFUA | l10n-linux64/opt | 13 | None |
| NPmP2j6bS0upxCVHQa9Mxw | build-win64-nightlyasrelease/opt | 5 | 4 |
| M2vBKwIxRG-eDY0pmPBEIA | l10n-win64/opt | 13 | None |
| G1piUZbgTaWtenoA_t21cA | l10n-win32/opt | 13 | None |
| UonV55tpRD2LZ_I26nXXEg | instrumented-build-apk-fenix | 1 | 0 |
| ZUDYK5FDQjidn6aZ8bawPQ | repackage-signing-msix-win64/opt | 10 | 9 |
| dgM9oA1uT2ilXrVGNnP6ZQ | repackage-macosx64/opt | 5 | 4 |
| VGlKJUppRQuW-XLt8Dg7gQ | repackage-msix-win64/opt | 10 | 9 |
| EFjnPOySTDWQRTWEPe8QLg | repackage-msix-win64/debug | 1 | 0 |

The mozci.push implementation does not count the current push (hence the difference of 1), but it also excludes some tasks. I got 393 tasks in total: 29 are returned by the current push.classify_builds filter and 81 by `push.get_regressions("label")`.
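
The off-by-one relation between the two columns can be checked mechanically. The sketch below copies the counts from the comparison table in this comment; the variable names are only illustrative.

```python
# (task ID, count from the mozci.tasks implementation, count from mozci.push)
# copied from the comparison table in this comment.
rows = [
    ("BdmsZzfbR6aUAAmniaA4GQ", 1, 0),
    ("cO59VVvdTTSLojmt2L-SrA", 1, 0),
    ("RfTgwACkQzyJ-uRDfv-a3Q", 1, 0),
    ("Ekv5_Rx5RHWLQgjGk3Yh0g", 1, 0),
    ("G0_1FUeoSRanYxa1j4CNOw", 5, 4),
    ("YCWCtfV4TWW2GkyMLNfjhg", 5, 4),
    ("TWoTrB4TQ_WYb3Lad4R-cw", 5, 4),
    ("Ht4KhlCmQ9i9ebunDkoM6g", 5, 4),
    ("Gv3SKBZoTuC2sX0SjIRbfw", 5, 4),
    ("IZqavllXT-WcumQm06r9UQ", 13, None),
    ("DjmiQsqOREWWfZQ5O1HPxA", 13, None),
    ("a7mTuzgjTIWTLXGmxPdFUA", 13, None),
    ("NPmP2j6bS0upxCVHQa9Mxw", 5, 4),
    ("M2vBKwIxRG-eDY0pmPBEIA", 13, None),
    ("G1piUZbgTaWtenoA_t21cA", 13, None),
    ("UonV55tpRD2LZ_I26nXXEg", 1, 0),
    ("ZUDYK5FDQjidn6aZ8bawPQ", 10, 9),
    ("dgM9oA1uT2ilXrVGNnP6ZQ", 5, 4),
    ("VGlKJUppRQuW-XLt8Dg7gQ", 10, 9),
    ("EFjnPOySTDWQRTWEPe8QLg", 1, 0),
]

# Tasks that the push implementation does not report at all.
missing = [task_id for task_id, _, by_push in rows if by_push is None]

# Tasks whose counts differ by something other than the current push.
mismatches = [
    task_id
    for task_id, by_task, by_push in rows
    if by_push is not None and by_task - 1 != by_push
]

print(f"{len(missing)} missing, {len(mismatches)} unexpected differences")
```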

I still have to merge build specific things with existing mozci.push functions. This is what I plan to do:

  • Only detect build tasks in classify_builds; the filter for potential regressions is applied in push.get_regressions.
  • Exclude build tasks from classify_regressions.
  • Add a parameter to push.get_regressions, like build=False. When set to True, initially look at tasks from classify_builds and apply the specific filters.
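
The plan above could look roughly like the following sketch. Everything here is a hypothetical stand-in: the `Task` dataclass, the free functions, and the placeholder logic of the non-build path are assumptions for illustration, not the mozci API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical, simplified task record."""

    label: str
    tier: int
    result: str
    job_kind: str


def is_build(task: Task) -> bool:
    """Step 1: only detect build tasks (the role planned for classify_builds)."""
    return task.job_kind == "build"


def is_potential_regression(task: Task) -> bool:
    """Step 2: build-specific filters, applied by the regression lookup."""
    return task.tier in (1, 2) and task.result in ("busted", "exception")


def get_regressions(tasks: list[Task], build: bool = False) -> list[str]:
    """Hypothetical stand-in for push.get_regressions with a `build` switch."""
    if build:
        candidates = [task for task in tasks if is_build(task)]
        return [task.label for task in candidates if is_potential_regression(task)]
    # Placeholder for the existing logic: build tasks are excluded here.
    return [
        task.label
        for task in tasks
        if not is_build(task) and task.result == "failed"
    ]


# Toy tasks, not real data.
tasks = [
    Task("build-win64-shippable/opt", 1, "busted", "build"),
    Task("build-linux64/opt", 3, "busted", "build"),
    Task("test-linux64/opt-xpcshell", 1, "failed", "test"),
]
print(get_regressions(tasks, build=True))
print(get_regressions(tasks))
```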

@marco-c
Collaborator

marco-c commented Apr 15, 2025

The l10n- tasks never actually run, so it's OK that they are missing (the builds they depend on failed, so they were never scheduled to run).

@vrigal vrigal force-pushed the regression-similar-jobs branch from 57627b7 to 5046214 Compare April 15, 2025 16:03
@vrigal
Contributor Author

vrigal commented Apr 17, 2025

I tried to implement a custom fetching of possible regression tasks (by label only) in push.get_regressions, and kept the command from #1148 to compare the results:

$ mozci regression 828a657b93e430f78716c4aa5e6ac9db2ed5372e -v
+---------------------------------------------+----------------------+
| Label                                       | Previous occurrences |
+---------------------------------------------+----------------------+
| build-win64-shippable/opt                   | 0                    |
| generate-profile-win64-shippable/opt        | 0                    |
| build-signing-win32/opt                     | 4                    |
| generate-profile-win64-nightlyasrelease/opt | 4                    |
| build-signing-linux64-aarch64/opt           | 4                    |
| build-linux64-nightlyasrelease/opt          | 4                    |
| build-macosx64-nightlyasrelease/opt         | 4                    |
| build-win64-nightlyasrelease/opt            | 4                    |
| instrumented-build-apk-fenix                | 0                    |
| repackage-signing-msix-win64/opt            | 9                    |
| repackage-macosx64/opt                      | 4                    |
| repackage-msix-win64/opt                    | 9                    |
| repackage-msix-win64/debug                  | 0                    |
| repackage-signing-msix-win64/debug          | 0                    |
| generate-profile-linux64-shippable/opt      | 4                    |
| build-signing-win64/debug                   | 0                    |
| build-mac-signing-macosx64/debug            | 4                    |
| build-mac-signing-macosx64/opt              | 4                    |
| build-signing-linux64/opt                   | 4                    |
| build-fat-aar-android-geckoview-fat-aar/opt | 0                    |
| build-signing-win64/opt                     | 4                    |
| generate-profile-macosx64-shippable/opt     | 4                    |
+---------------------------------------------+----------------------+

*** If there are fewer than 20 pushes (default interval of pushes which ran every job type on autoland):
**** Add build task for same job type between G and B if there is not already a build scheduled or running.
**** If requested build job succeeds, narrow down regression range.
**** If requested build fails: check if the error message is similar: build regressions can overlap, and a later code change can cause a compilation failure earlier in the compilation process, altering the failure message.
***** Start a second bisection for the build failure on the earlier push, narrow regression window for later, initially investigated build failure.
***** Rerun the failed build once to reduce the risk of an intermittent failure.

The above rules are from #1145. Maybe we could apply a simpler rule for now, like "Retry the build if previous_occurrences == 0", and then look at the state of the requested build in another PR?
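
The simpler rule can be sketched as a one-line filter. The counts below are copied from the "Previous occurrences" table in this comment; the variable names are only illustrative.

```python
# Label -> previous occurrences, copied from the table in this comment.
previous_occurrences = {
    "build-win64-shippable/opt": 0,
    "generate-profile-win64-shippable/opt": 0,
    "build-signing-win32/opt": 4,
    "generate-profile-win64-nightlyasrelease/opt": 4,
    "build-signing-linux64-aarch64/opt": 4,
    "build-linux64-nightlyasrelease/opt": 4,
    "build-macosx64-nightlyasrelease/opt": 4,
    "build-win64-nightlyasrelease/opt": 4,
    "instrumented-build-apk-fenix": 0,
    "repackage-signing-msix-win64/opt": 9,
    "repackage-macosx64/opt": 4,
    "repackage-msix-win64/opt": 9,
    "repackage-msix-win64/debug": 0,
    "repackage-signing-msix-win64/debug": 0,
    "generate-profile-linux64-shippable/opt": 4,
    "build-signing-win64/debug": 0,
    "build-mac-signing-macosx64/debug": 4,
    "build-mac-signing-macosx64/opt": 4,
    "build-signing-linux64/opt": 4,
    "build-fat-aar-android-geckoview-fat-aar/opt": 0,
    "build-signing-win64/opt": 4,
    "generate-profile-macosx64-shippable/opt": 4,
}

# "Retry the build if previous_occurrences == 0"
to_retry = [label for label, count in previous_occurrences.items() if count == 0]

for label in to_retry:
    print(f" * {label}")
```

With the data from the table, this keeps 7 of the 22 labels.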

Comment thread mozci/push.py Outdated
@vrigal vrigal force-pushed the regression-similar-jobs branch from 5046214 to 41a2a6e Compare April 17, 2025 10:10
Comment thread mozci/push.py Outdated
Comment thread mozci/push.py Outdated
Comment thread mozci/push.py Outdated
Comment on lines +898 to +903
return {
task.label: (0.0, summary)
for task in self.tasks
if self._check_build_task_regression(task)
and (summary := self.label_summaries.get(task.label))
}
Collaborator

Question: if we don't add this, doesn't get_candidate_regressions already return builds?

Contributor Author

Yes, by default get_candidate_regressions will return 61 results that include the 7 returned by _should_retrigger_build.
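
For readers unfamiliar with the pattern in the snippet under review, here is a standalone sketch of a dict comprehension that filters and binds a value in the same condition using an assignment expression. The data and the `is_build_failure` helper are hypothetical; only the shape of the comprehension mirrors the reviewed code.

```python
# Hypothetical data: summaries exist for only some labels.
label_summaries = {
    "build-linux64/opt": "summary-a",
    "build-win64/opt": "summary-b",
}
task_labels = ["build-linux64/opt", "build-macosx64/opt", "build-win64/opt"]
failing = {"build-linux64/opt", "build-macosx64/opt"}


def is_build_failure(label: str) -> bool:
    """Hypothetical predicate standing in for the build-failure check."""
    return label in failing


# The condition runs before the key/value expression, so `summary` is already
# bound when the value tuple is built. Labels without a summary are skipped
# because `.get()` returns None, which is falsy.
regressions = {
    label: (0.0, summary)
    for label in task_labels
    if is_build_failure(label) and (summary := label_summaries.get(label))
}
print(regressions)
```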

Comment thread mozci/push.py Outdated

        return find("label") or find("config_group")

    def _check_build_task_regression(self, task: Task):
Collaborator

Suggested change
-    def _check_build_task_regression(self, task: Task):
+    def _should_retrigger_build(self, task: Task):

Contributor Author

I tried to separate the logic for "task is a build failure" (so I renamed this to _is_build_failure(self, task)) and rules for retriggering (in the regression command).

@vrigal
Contributor Author

vrigal commented Apr 17, 2025

Here is the output of the actual command:

mozci regression --rev 828a657b93e430f78716c4aa5e6ac9db2ed5372e -vvv
[…]
15:13:30.747 | INFO     | mozci.console.commands.regression:47  - Detected 22 build tasks that may contain a regression. (7 potentially introduced by this push).
15:13:30.748 | INFO     | mozci.console.commands.regression:63  - 7 tasks should be retrigerred:
15:13:30.748 | INFO     | mozci.console.commands.regression:65  -  * build-win64-shippable/opt [BdmsZzfbR6aUAAmniaA4GQ]
15:13:30.748 | INFO     | mozci.console.commands.regression:65  -  * generate-profile-win64-shippable/opt [cO59VVvdTTSLojmt2L-SrA]
15:13:30.748 | INFO     | mozci.console.commands.regression:65  -  * instrumented-build-apk-fenix [UonV55tpRD2LZ_I26nXXEg]
15:13:30.748 | INFO     | mozci.console.commands.regression:65  -  * repackage-msix-win64/debug [EFjnPOySTDWQRTWEPe8QLg]
15:13:30.748 | INFO     | mozci.console.commands.regression:65  -  * repackage-signing-msix-win64/debug [OFNg9IU3T5K3bFS558R4RA]
15:13:30.748 | INFO     | mozci.console.commands.regression:65  -  * build-signing-win64/debug [Uj8bwqRZRzaI4X6i0nkMRA]
15:13:30.748 | INFO     | mozci.console.commands.regression:65  -  * build-fat-aar-android-geckoview-fat-aar/opt [XvvGmKleTvqM1Mo-0s3cmg]

Currently it only looks at pushes that have no child (so I have to cheat a bit to test with the above revision). I'm closing #1148, as most things happen here now.

@vrigal
Contributor Author

vrigal commented Apr 17, 2025

@marco-c I implemented most of the points we discussed this afternoon. I tried to play with MAX_DEPTH, but setting it to 0 also prevents going backwards (to count previous occurrences), as the same setting is used for both directions. I'm not sure it is worth changing this anyway. I currently get the same output as above (in ~2 minutes).

@marco-c
Collaborator

marco-c commented Apr 17, 2025

OK, makes sense. We would have to split that into two values (backward_max_depth and forward_max_depth), but maybe it is not worth it; we can just filter by count == 0.
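
For reference, a minimal sketch of what splitting the depth into two values could mean, assuming a simple linear list of pushes ordered from oldest to newest. The function and its parameters are hypothetical and only borrow the two names suggested above; mozci's actual traversal is not shown here.

```python
def pushes_in_window(
    pushes: list[str],
    index: int,
    backward_max_depth: int,
    forward_max_depth: int,
) -> tuple[list[str], list[str]]:
    """Return (parents, children) of pushes[index], each capped independently."""
    start = max(0, index - backward_max_depth)
    end = min(len(pushes), index + forward_max_depth + 1)
    return pushes[start:index], pushes[index + 1 : end]


# Toy push history, oldest first.
history = ["A", "B", "C", "D", "E", "F", "G"]

# Look two pushes back to count previous occurrences, but never look forward.
parents, children = pushes_in_window(
    history, history.index("D"), backward_max_depth=2, forward_max_depth=0
)
print(parents, children)
```

With a single shared depth, setting it to 0 would empty both lists, which is the limitation described in the previous comment.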

@La0
Collaborator

La0 commented Apr 18, 2025

@marco-c @Archaeopteryx Could you run the latest version of this command, mozci regression --rev <REV>, against bad builds and check the output (the list of tasks you should retrigger)?

If that looks 🆗 @vrigal can integrate the build filter in mozci push classify

@Archaeopteryx
Contributor

78368f6e8144dff55f60ec291d25b0dd6d3c5985 backed out by cf16ff032ba42816626237cf3a1f6bb5ab63208e for Bp-nu failures - mozci detects no regression
3efa51ad1d24fa7882d426589d595a9db82d8631 backed out by h2ba81390e44f1a9653721a7c156f810335510052 for Bp-nu failures - mozci detects no regression
34432d794310d1d218ac4d35b88295be681db0e1 backed out by 0f90bec7a181744ba70124a81f0f0bd890536346 for macOS build failures - mozci detects no regression
5b709337bd5b7da2b8a024216e713d06cdb52bcb fixed by ad48d7da261df9eaa3a42fe8e16c12593f74b532 - macOS build failures not detected by mozci
40c099b9cf6d290bdde51f7331b31c7a42d174d3 backed out by 6b026a085f33aff0b0e6ccc9e214ff985524a328 for mass build bustage - mozci correctly suggests 6 build tasks to retrigger

@vrigal
Contributor Author

vrigal commented Apr 25, 2025

> 78368f6e8144dff55f60ec291d25b0dd6d3c5985 backed out by cf16ff032ba42816626237cf3a1f6bb5ab63208e for Bp-nu failures - mozci detects no regression
> 3efa51ad1d24fa7882d426589d595a9db82d8631 backed out by h2ba81390e44f1a9653721a7c156f810335510052 for Bp-nu failures - mozci detects no regression
> 34432d794310d1d218ac4d35b88295be681db0e1 backed out by 0f90bec7a181744ba70124a81f0f0bd890536346 for macOS build failures - mozci detects no regression
> 5b709337bd5b7da2b8a024216e713d06cdb52bcb fixed by ad48d7da261df9eaa3a42fe8e16c12593f74b532 - macOS build failures not detected by mozci
> 40c099b9cf6d290bdde51f7331b31c7a42d174d3 backed out by 6b026a085f33aff0b0e6ccc9e214ff985524a328 for mass build bustage - mozci correctly suggests 6 build tasks to retrigger

I did some investigating. It does not seem to be related to filtering by new regressions (previous occurrences == 0).
The command returned 2 tasks to be retriggered for revision 34432d794310d1d218ac4d35b88295be681db0e1, but nothing for the others. I would say the culprit is RegressionCommand.is_build_failure: some tasks such as build-macosx64-non-unified/plain and build-win64-non-unified/plain are excluded, and I think it is because their result is failed (so far we only handled busted or exception). I pushed a patch that gives me the following result:

$ mozci regression --rev 78368f6e8144dff55f60ec291d25b0dd6d3c5985
[…]
3 tasks should be retrigerred:
  * build-macosx64-non-unified/plain [LV6AhyDpQnWGgRdD7_BbQw]
  * build-win64-non-unified/plain [ZzkZJeomT_KTcHRuxfmEKw]
  * build-linux64-non-unified/plain [JIYFz-mqTmiKlr4JMed4vQ]
$ mozci regression --rev 3efa51ad1d24fa7882d426589d595a9db82d8631
[…]
1 tasks should be retrigerred:
  * build-linux64-non-unified/plain [Sa4pxflvRa-ovOhmuv3yYQ]
$ mozci regression --rev 34432d794310d1d218ac4d35b88295be681db0e1
[…]
3 tasks should be retrigerred:
  * build-macosx64/debug [QbjjSw0aQ9KvEwXZutxRYg]
  * build-macosx64/opt [PlyPxIvVS-WtnVbp5XmIHw]
  * build-ios-non-unified/plain [Hu7YH2xMSIyQzpW80VAGrQ]
$ mozci regression --rev 5b709337bd5b7da2b8a024216e713d06cdb52bcb
[…]
No regression detected.
$ mozci regression --rev 40c099b9cf6d290bdde51f7331b31c7a42d174d3
[…]
119 tasks should be retrigerred:
    * build-apk-fenix-nightly-simulation [T7b0vl05Sxeoxi20gSYlug]
    * build-fat-aar-android-geckoview-fat-aar-shippable/opt [dfjwdTEkQyKBSgtxxF7J8w]
    * build-macosx64-x64-shippable/opt [Gcqon11uSsmbGcrARIaBRQ]
    * build-android-x86-shippable/opt [Ap4Eeq2nSHaW-V0j4hCvTg]
[…]

We still have to figure out why nothing is returned for revision 5b709337bd5b7da2b8a024216e713d06cdb52bcb.
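
The result filter discussed in this comment can be sketched as follows. This is not the patch itself: the `Task` dataclass and the free function are hypothetical, and keeping the tier and job kind checks from the earlier review snippet is an assumption. The only point illustrated is that "failed" is accepted alongside "busted" and "exception".

```python
from dataclasses import dataclass

# "failed" is the addition discussed above; the other two were already handled.
BUILD_FAILURE_RESULTS = frozenset({"busted", "exception", "failed"})


@dataclass(frozen=True)
class Task:
    """Hypothetical, simplified task record."""

    label: str
    tier: int
    result: str
    job_kind: str


def is_build_failure(task: Task) -> bool:
    """Return True for tier 1 or 2 build tasks with a failing result."""
    return (
        task.tier in (1, 2)
        and task.job_kind == "build"
        and task.result in BUILD_FAILURE_RESULTS
    )


# Toy values: the tier and job kind of the real task are assumptions.
non_unified = Task("build-macosx64-non-unified/plain", 1, "failed", "build")
print(is_build_failure(non_unified))
```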

@vrigal
Contributor Author

vrigal commented Apr 25, 2025

If the push was not backed-out and was not "bustage fixed", it can't have caused regressions.

https://github.com/mozilla/mozci/blob/master/mozci/push.py#L1002

It seems that revision 5b709337bd5b7da2b8a024216e713d06cdb52bcb matches this specific case.

@vrigal vrigal marked this pull request as ready for review April 25, 2025 15:25
@vrigal vrigal force-pushed the regression-similar-jobs branch from 1fa78c9 to 0f34f76 Compare April 29, 2025 12:49
@vrigal
Contributor Author

vrigal commented Apr 29, 2025

As suggested by @marco-c, setting historical_analysis to False makes it possible to detect failures for revision 5b709337:

$ mozci regression --rev 5b709337bd5b7da2b8a024216e713d06cdb52bcb -vvv
Detected 5 build tasks that may contain a regression (2 potentially introduced by this push).
 3 tasks should be retrigerred:
  * build-macosx64/opt [dRCLrqxNTYGL2R5FxgTY5g]
  * instrumented-build-macosx64-shippable/opt [MVFSzwwoTBSs_ugP0RYywg]
  * build-macosx64/debug [euVhWKuoSwK-viMIaP5aBQ]

However, it will not work if the push is older than 1 day, because of the Push.is_finalized() check: https://github.com/mozilla/mozci/blob/master/mozci/push.py#L1005. But I suppose that will never be the case during the regression analysis.
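
The interaction described in the last two comments can be sketched as a small predicate. This is a reconstruction from the discussion only, not a copy of mozci/push.py: the function names, the way the conditions are combined, and the exact comparison against the 1-day threshold are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the comment above ("older than 1 day").
FINALIZED_AFTER = timedelta(days=1)


def is_finalized(push_date: datetime, now: datetime) -> bool:
    """Assumed definition: a push is finalized once it is older than 1 day."""
    return now - push_date > FINALIZED_AFTER


def skips_regression_detection(
    backedout: bool,
    bustage_fixed: bool,
    push_date: datetime,
    now: datetime,
    historical_analysis: bool = True,
) -> bool:
    """Return True when the early exit applies and no regression is reported."""
    if backedout or bustage_fixed:
        # The push may have caused regressions, so detection goes ahead.
        return False
    return historical_analysis or is_finalized(push_date, now)


now = datetime(2025, 4, 29, 12, 0, tzinfo=timezone.utc)
recent = now - timedelta(hours=2)
old = now - timedelta(days=3)

print(skips_regression_detection(False, False, recent, now))
print(skips_regression_detection(False, False, recent, now, historical_analysis=False))
print(skips_regression_detection(False, False, old, now, historical_analysis=False))
```

Under these assumptions, disabling historical_analysis only helps for recent pushes, which matches the caveat above.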

@La0
Collaborator

La0 commented May 20, 2025

Closing as this has been merged with #1157

@La0 La0 closed this May 20, 2025
4 participants