add exit code dependent retry policy #9276
aspiringmind-code wants to merge 12 commits into dmwm:master from
Jenkins results:
delay = policy.get("delay", 900)
self.logger.info(f"Sleeping {delay} seconds before retry (exit code {exitCode})")
time.sleep(delay)
if exitCode in [8020, 8021, 8022, 8028, 84, 85, 86, 92, 134, 8001, 65]:
Why is this line there? Isn't what to do fully defined by the table?
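A minimal sketch of what a purely table-driven decision could look like, so that no hard-coded list of exit codes is needed next to the table. Only the name `EXIT_RETRY_POLICY`, the `delay` key, and the 900-second default come from the diff; the entries, the delay values, and the helper `retry_delay` are hypothetical.

```python
# Hypothetical entries: the delay values are placeholders, not taken from the PR.
EXIT_RETRY_POLICY = {
    8021: {"delay": 1800},
    8028: {"delay": 600},
}

DEFAULT_DELAY = 900  # same default as policy.get("delay", 900) in the diff


def retry_delay(exit_code, table=EXIT_RETRY_POLICY):
    """Return the retry delay in seconds, or None if the table has no entry."""
    policy = table.get(exit_code)
    if policy is None:
        # Exit code not in the table: the caller keeps its existing behavior.
        return None
    return policy.get("delay", DEFAULT_DELAY)
```

With this shape, membership in the table is the only test: `retry_delay(exitCode) is None` means "no special policy".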
# Exit-code dependent retry policy
# ----------------------------------------------------------------------

EXIT_RETRY_POLICY = {
This dictionary, which will grow as we add other exit codes, could be better organized:
sort it by increasing exit code value and keep only the "long" codes, e.g. 8021.
The short exit codes can be added later using short_code = long_code % 128 or, if we want to keep it easy for the reader to find a short exit code here, by adding it as a key in the sub-dictionary.
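One way to read this suggestion, as a sketch only: keep a table keyed by long codes and derive the short keys from it. `LONG_CODE_POLICY` and `with_short_codes` are hypothetical names, and the delays are placeholders. The modulus 128 is the one proposed in the comment; for the long codes listed in the diff, `% 128` and `% 256` (the 8-bit truncation applied to process exit statuses) give the same short codes.

```python
# Illustrative entries only; the delays are placeholders, not values from the PR.
LONG_CODE_POLICY = {
    8001: {"delay": 900},
    8020: {"delay": 900},
    8021: {"delay": 900},
    8022: {"delay": 900},
    8028: {"delay": 900},
}


def with_short_codes(long_policy, modulus=128):
    """Return a new table, sorted by exit code, with short-code aliases added."""
    table = dict(long_policy)
    for long_code, policy in long_policy.items():
        short_code = long_code % modulus
        # Do not overwrite a short code that already has an explicit entry.
        table.setdefault(short_code, policy)
    return dict(sorted(table.items()))


EXIT_RETRY_POLICY = with_short_codes(LONG_CODE_POLICY)
```

A code such as 134 from the list in the diff has no long counterpart among the codes shown there, so it would remain an explicit key in the table.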
belforte left a comment
See a couple of inline comments.
More on the "substance": it is not good to use [...]. Notice that delaying the PostJob also delays the status reporting: the DAG node is still not completed. Rather, once we introduce re-submission delays of several hours (days?), we should worry about properly reporting this to the user.
if os.path.exists(retry_info_file):
    try:
        with open(retry_info_file, "r", encoding="utf-8") as fd:
            retry_info = literal_eval(fd.read())
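The fragment above stops before the `except` clause. A self-contained sketch of how such a read could be completed, assuming the file holds a Python dict literal and that an empty dict is an acceptable fallback; the helper name `load_retry_info` and the fallback are assumptions, not taken from the PR.

```python
import os
from ast import literal_eval


def load_retry_info(retry_info_file):
    """Read a dict literal from retry_info_file; return {} if missing or unparsable."""
    if not os.path.exists(retry_info_file):
        return {}
    try:
        with open(retry_info_file, "r", encoding="utf-8") as fd:
            retry_info = literal_eval(fd.read())
    except (OSError, ValueError, SyntaxError, TypeError):
        # Malformed content typically raises ValueError or SyntaxError;
        # OSError covers a file that vanished or cannot be read.
        return {}
    # Guard against a valid literal that is not a dict (e.g. a list or a number).
    return retry_info if isinstance(retry_info, dict) else {}
```

`literal_eval` only evaluates Python literals, so file content cannot execute arbitrary code, but a corrupted file still needs the fallback path.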
Fix #9264