
Conversation

@binary-husky
Collaborator

No description provided.

binary-husky merged commit 2eb7db9 into main Jan 16, 2026
0 of 2 checks passed
@gemini-code-assist

Summary of Changes

Hello @binary-husky, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on enhancing the project's code quality and maintainability. It refines the pre-commit setup by replacing multiple code style enforcement tools with a more focused utility for removing unused imports. Concurrently, it systematically cleans up redundant import statements and resolves minor formatting inconsistencies across a broad spectrum of files, resulting in a leaner and more consistent codebase.

Highlights

  • Pre-commit Hook Refactoring: The pre-commit configuration has been updated to remove the 'black', 'isort', and 'flake8' hooks, replacing them with 'autoflake'. This change streamlines the code quality checks by focusing on automatic unused import removal.
  • Extensive Unused Import Cleanup: Numerous Python files across the codebase have been cleaned by removing unused import statements, including various modules under the ajet/, scripts/, and tutorial/ directories, improving code clarity and reducing unnecessary dependencies (a short sketch of this cleanup follows this list).
  • Minor Formatting and Whitespace Fixes: A significant number of minor formatting issues, such as missing end-of-file newlines (the '\ No newline at end of file' marker) and inconsistent whitespace, have been addressed in both code and documentation files, contributing to overall code consistency.
  • Removal of Networking Test File: The tests/test_networking.py file, along with its associated networking utility tests, has been entirely removed from the project.
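
For readers unfamiliar with autoflake, the indented sketch below illustrates what the new hook does. The module and its imports are hypothetical examples, not files from this PR; the command line simply reuses the arguments shown in the configuration further down.

    # example.py (hypothetical) -- before running:
    #   autoflake --in-place --remove-all-unused-imports --ignore-init-module-imports example.py
    import os                     # unused, removed by the hook
    import sys                    # unused, removed by the hook
    from typing import List       # used below, kept


    def list_names(names: List[str]) -> List[str]:
        """Return the names unchanged; only typing.List is actually used."""
        return names

    # After the hook runs, only the `from typing import List` line remains.
    # autoflake does not reformat code or sort imports, which is why the
    # review below recommends restoring a formatter and linter as well.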

Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/doc.yaml
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check its output and use code with caution.


gemini-code-assist bot left a comment


Code Review

This pull request applies pre-commit fixes, primarily removing unused imports and correcting formatting across many files. While these cleanups are beneficial, this PR introduces two major regressions. First, the pre-commit configuration in .pre-commit-config.yaml has been significantly weakened by replacing black, isort, and flake8 with only autoflake. This removes crucial code formatting and linting capabilities, which can lead to inconsistent code style and missed bugs. I strongly recommend restoring a robust linting and formatting setup. Second, the test file tests/test_networking.py has been deleted, removing test coverage for the networking utilities in ajet/utils/networking.py. Removing tests is highly discouraged and reduces confidence in the codebase's correctness. These tests should be restored.

Comment on lines +16 to +24

+   - repo: https://github.com/myint/autoflake
+     rev: v2.2.0
+     hooks:
-       - id: flake8
-         additional_dependencies: [flake8-docstrings]
-         args: [
-           "--max-line-length=100",
-           "--max-complexity=20",
-           "--select=C,E,F,W,B,B950",
-           "--ignore=E203,E266,E501,W503",
-         ]
+       - id: autoflake
+         args: [
+           --in-place,
+           --remove-all-unused-imports,
+           --ignore-init-module-imports
+         ]


critical

Replacing black, isort, and flake8 with only autoflake is a significant step back for code quality. autoflake only handles unused imports, while the previous tools enforced code style and caught a wide range of potential bugs.

I strongly recommend using a more comprehensive tool. ruff is a modern, high-performance tool that can replace flake8, isort, autoflake, and even black's formatting. Here is a suggested configuration using ruff for both linting and formatting.

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.2.1
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]
      - id: ruff-format

    except Exception:
-       return "127.0.0.1"
\ No newline at end of file
+       return "127.0.0.1"


critical

The test file for this utility, tests/test_networking.py, was removed in this pull request. Removing tests is highly discouraged as it reduces confidence in the code's correctness and can lead to regressions. Please restore the test file.
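
If the tests are restored, something along the lines of the indented sketch below would cover the fallback branch shown in the diff above. This is only an illustration: get_local_ip is an assumed function name, since the actual API of ajet/utils/networking.py is not visible in this pull request.

    # Hypothetical pytest sketch, not a reconstruction of the deleted tests/test_networking.py.
    # `get_local_ip` is an assumed name for the utility whose except-branch returns "127.0.0.1".
    import ipaddress

    from ajet.utils import networking


    def test_local_ip_is_a_valid_ipv4_address():
        ip = networking.get_local_ip()       # assumed API
        ipaddress.IPv4Address(ip)            # raises ValueError if the string is malformed


    def test_loopback_fallback_is_valid():
        # The except branch in the diff above falls back to the loopback address.
        assert ipaddress.IPv4Address("127.0.0.1").is_loopback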

Comment on lines 1 to +2

  from typing import List, Tuple, Union
- from typing import List, Union, Tuple, Dict, Optional, Any
+ from typing import List, Union, Tuple, Dict, Optional


medium

These two import statements from typing are partially redundant. They can be merged into a single, sorted import statement for better readability and maintainability.

Suggested change
- from typing import List, Tuple, Union
- from typing import List, Union, Tuple, Dict, Optional
+ from typing import Dict, List, Optional, Tuple, Union

TaoShuchang pushed a commit that referenced this pull request Jan 20, 2026
* fix end of files

* autoflake import fix

* add mypy check
binary-husky added a commit that referenced this pull request Jan 21, 2026
* feat(finworld): Added AgentScope learning protocol and OpenJudge evaluation functionality to the FinWorld task.

- Added the ExampleAgentScopeLearnProtocol class to implement the AgentScope execution flow for multi-turn interactions.

- Integrated semaphore control to manage the parallelism of environment calls, improving environment stepping performance.

- Implemented a mechanism for detecting context overflows and quickly terminating during environment interactions to prevent blocking.

- Added a finworld.yaml configuration file to define project training and rollout parameters.

- Added the FinWorldJudgeByOpenJudge class, integrating multiple evaluators including RM Gallery and OpenJudge (@Haoran).

- Implemented task output conversion, asynchronous calls, and retries to ensure evaluation stability.

- Used weight normalization to manage each evaluator's contribution, merging them to compute the final reward and success determination.

* Precommit fix (#4)

* fix end of files

* autoflake import fix

* add mypy check

* fix test bench import

* refactor(finworld): Replace agent protocol and unify configuration updates

- Renamed ExampleAgentScopeLearnProtocol to ExampleDeepResearchProtocol and modified the execute method signature.
- Unified the parameter name of the model tuner to `tuner` and its related attribute references.
- Optimized the multi-turn interaction step configuration, changing it to use `tuner.config.ajet.rollout.multi_turn.max_steps`.
- Modified the context overflow judgment logic to prevent tool call blocking.
- Updated the finworld.yaml configuration, replacing astune with ajet-related configurations, and adjusted the workflow protocol and environment parameters.
- Modified the default environment variable values and log saving paths in finworld_judge.py.
- Added and improved multi-machine and single-machine startup scripts, supporting dynamic generation of MCP configuration and environment variable loading.
- Added the finworld_single.yaml template to adapt to single-machine training configurations.
- Adjusted the key reference for multi-turn step configuration in ma_deepresearch.py, using the ajet configuration path.

* feat(finworld): Added FinWorld training environment configuration scripts and templates

- Added bash startup scripts for multi-machine, multi-GPU training, supporting dynamic configuration generation and environment variable import.
- Implemented training configuration file templates, supporting automatic injection of various weight parameters and model paths.
- Adjusted the default request timeout of EnvClient from 30 seconds to 300 seconds to accommodate long training requests.
- Added a new finworld example directory and related documentation, improving the example project structure.

* refactor(utils): Remove unused extract and compute functions `extract_tool_stats_from_cmts`

* refactor(finworld): Replace the old model with OpenJudge, update evaluation configuration and scripts

- Replaced model initialization in FinWorldJudgeByOpenJudge with the `_init_openjudge_model` method
- Read Judge model parameters from the configuration file first, using environment variables as a fallback
- Optimized RM Gallery initialization, using configuration-first logic, and improved exception stack trace printing
- Cleaned up and removed the old `_init_model` singleton method and related code
- Updated the example startup script `ajet_finworld.sh`, adding OPENJUDGE_LLM and RM_LLM configurations
- Modified YAML templates and configuration files to unify the structure and field naming of Judge configuration items
- Deleted the outdated `cc_rm4_res2cit2fai2_30b.sh` script
- Adjusted the `env_service` startup path to improve environment activation compatibility
- Adjusted script log output format and content to enhance the clarity of configuration parameter printing

* feat(task_reader): Support data reading of type jsonl_with_env_service

- Added the jsonl_with_env_service type, which allows loading data from jsonl files while calling tools via env_service.
- Extended ResourceKeeper to handle the creation and release logic of environment instances for jsonl_with_env_service.
- Maintained the env_service type logic, calling create_instance to register instances and initializing them using init_messages from the jsonl file.
- Added an example protocol, ExampleDeepResearchProtocol, to implement multi-turn interaction and environment call coordination.
- Provided training scripts and YAML configuration templates for finworld, supporting the jsonl_with_env_service mode training environment.
- Optimized scripts to support multi-node multi-GPU training, including environment variables and Ray cluster configuration.

* feat(core): add finworld task reader support to framework

* feat(finworld): implement specialized data reader and openjudge-based grading logic

* refactor(finworld): optimize configuration templates and prompt engineering

* chore(finworld): update launch scripts and add variant experiment scripts

* feat(finworld): Added support for multi-machine, multi-GPU training scripts and configuration templates:

* chore(git): ignore finworld/yaml/*

* fix(metrics): Fix and enhance the compatibility and debugging output of the metrics update logic

- Modified the `update_metrics` function, adding a `prefix` parameter to distinguish between training and validation metrics.
- Adjusted the data source for extracting `reward_stats` and `tool_stats`, migrating from `workflow_metadata` to `log_metrics`.
- Added debug printing to output the `log_metrics` content and metric key names at key steps for easier troubleshooting.
- Used the appropriate prefix when calling `update_metrics` in `trainer_verl.py`, and added multiple debug prints.
- Modified `WorkflowOutput` to place `tool_stats` and `reward_stats` into the `log_metrics` field.
- Removed redundant and deprecated code for extracting `reward_stats` and calculation functions.
- Added debug information output to the `finworld` and `finworld_judge` modules to track log metrics and scoring data.

* fix(metrics): Remove debug prints and synchronize reward statistics

- Removed debug print statements before and after the `update_metrics` call in `trainer_verl.py`
- Removed debug print statements related to the `log_metrics` key in `finworld.py`
- Removed debug print statements before updating `metadata_stats` in `finworld_judge.py`
- Added logic in `general_runner.py` to synchronize `reward_stats` from `metadata` to `log_metrics` after the judge calculation
- Cleaned up debug print statements within `update_metrics` in `metric_helper`, improving code readability.

* chore: "Stop tracking existing yaml files in tutorial directory"

* fix(task_runner): Synchronize reward_stats to log_metrics

feat(tutorial): Added FinWorld multi-machine multi-GPU training startup script

* refactor(script): Refactored the finworld training script, integrating configuration and startup processes.

* Refactor(deep_finance): Replace and remove finworld-related implementations

- Switched the example directory from example_finworld to example_deep_finance
- Modified startup parameters and logic to support deep_finance, replacing the finworld option
- Replaced finworld_reader with deep_finance_reader in the task reader
- Adjusted environment client configuration in resource management, using deep_finance instead of finworld-related checks
- Updated reward metric tool documentation to support deep_finance
- Deleted finworld-related configuration files, scripts, code, and evaluation modules, cleaning up leftover files and scripts
- Replaced the keyword "finworld" with "deep_finance" in comments and logs

* refactor(deepfinance): Rename and unify DeepFinance module and config references

- Replace all "finworld" and "deep_finance" names with the unified "deepfinance" format.
- Modify command-line arguments to `--with-deepfinance` for consistency.
- Adjust the class name in `task_reader` from `deep_financeReader` to `DeepFinanceReader`.
- Update the documentation description and file name of the `metric_helper` module to DeepFinance.
- Modify environment variables and configuration paths in the example script `deep_finance.sh` to use the `DEEPFINANCE` prefix.
- Update `judge_protocol` to `DeepFinanceJudgeByOpenJudge` in the `deep_finance.yaml` configuration.
- Refactor the `FinWorldJudgeByOpenJudge` class in `deep_finance_judge.py` to `DeepFinanceJudgeByOpenJudge`.
- Rename the `FinworldReader` class in `deep_finance_reader.py` to `DeepFinanceReader`.
- Modify the debug log identifier and corresponding environment variable name to `DEEPFINANCE_DEBUG`.
- Update the evaluation protocol in the `deep_finance_template.yaml` template to `DeepFinanceJudgeByOpenJudge`.
- Ensure that internal references and comments in all modules are updated to use DeepFinance and deepfinance-related names.

* refactor(tutorial): Optimize dynamic generation logic for configuration file paths

* fix(deep_finance): argparse: with-deepfinance

* fix(tutorial): Fixed issues with multi-machine training environment variable settings

* fix(env): Corrected the assignment logic for reward and info when returning environment state

- Corrected the `env_output` return value structure in `BaseGymEnv` to ensure correct assignment of `reward` and `info` fields.
- Removed `RefJudge` and `StructureJudge` related metric calculations and statistics from `reward_metric_helper`.
- Cleaned up redundant code in `reward_metric_helper`, removing invalid comments and statistical items.
- Modified `save_trajectory_as_json` to always print trajectory saving confirmation information.
- Corrected log comments in `example_deep_finance` to avoid meaningless log output.
- Added the `save_trajectory_as_json_file` configuration item to `deep_finance_template.yaml` to support trajectory saving functionality.

* chore(config): Update example_deep_finance configuration and clean up files

- Added a new ignore rule for config file paths in .gitignore
- Deleted the automatically generated mcp_finance_tool_generated.json file in example_deep_finance
- Refactored the deep_finance.yaml configuration file, adjusting project and experiment names
- Reorganized Judge configuration, clarifying openjudge_llm and rm_llm models
- Optimized model paths and training parameter configurations, adding parallel and batch processing settings
- Adjusted data reading methods and training/validation set path placeholders
- Reduced GPU memory usage ratio for rollout to 0.8
- Updated the default save directory path for the trainer to a placeholder variable
- Cleaned up unused and commented-out code to improve configuration file conciseness

* Refactor(metric): Optimize tool metric calculation and data saving logic

- Corrected the data source field for timeline data used during trajectory saving.
- Removed redundant fields in tool execution time, cache hit rate, and error rate statistics.
- Updated .gitignore to add ignore rules for the example script directory.
- Removed unnecessary debugging information from logs to reduce log noise.
- Adjusted log printing in the multi-round interaction execution process to simplify output content.
- Streamlined log code for environment observation and termination checks to improve code readability.

* fix(metric_helper): fix tool cache metric

* fix little bug

* fix(utils): Suppress httpx AsyncClient.aclose() exception warnings

* comments to english

---------

Co-authored-by: binary-husky <[email protected]>
Co-authored-by: Qingxu Fu <[email protected]>
Co-authored-by: qingxu.fu <[email protected]>
binary-husky added a commit that referenced this pull request Jan 23, 2026
* feat(finworld): Added AgentScope learning protocol and OpenJudge evaluation functionality to the FinWorld task.

- Added the ExampleAgentScopeLearnProtocol class to implement the AgentScope execution flow for multi-turn interactions.

- Integrated semaphore control to manage the parallelism of environment calls, improving environment stepping performance.

- Implemented a mechanism for detecting context overflows and quickly terminating during environment interactions to prevent blocking.

- Added a finworld.yaml configuration file to define project training and rollout parameters.

- Added the FinWorldJudgeByOpenJudge class, integrating multiple evaluators including RM Gallery and OpenJudge (@Haoran).

- Implemented task output conversion, asynchronous calls, and retries to ensure evaluation stability.

- Used weight normalization to manage each evaluator's contribution, merging them to compute the final reward and success determination.

* Precommit fix (#4)

* fix end of files

* autoflake import fix

* add mypy check

* fix test bench import

* refactor(finworld): Replace agent protocol and unify configuration updates

- Renamed ExampleAgentScopeLearnProtocol to ExampleDeepResearchProtocol and modified the execute method signature.
- Unified the parameter name of the model tuner to `tuner` and its related attribute references.
- Optimized the multi-turn interaction step configuration, changing it to use `tuner.config.ajet.rollout.multi_turn.max_steps`.
- Modified the context overflow judgment logic to prevent tool call blocking.
- Updated the finworld.yaml configuration, replacing astune with ajet-related configurations, and adjusted the workflow protocol and environment parameters.
- Modified the default environment variable values and log saving paths in finworld_judge.py.
- Added and improved multi-machine and single-machine startup scripts, supporting dynamic generation of MCP configuration and environment variable loading.
- Added the finworld_single.yaml template to adapt to single-machine training configurations.
- Adjusted the key reference for multi-turn step configuration in ma_deepresearch.py, using the ajet configuration path.

* feat(finworld): Added FinWorld training environment configuration scripts and templates

- Added bash startup scripts for multi-machine, multi-GPU training, supporting dynamic configuration generation and environment variable import.
- Implemented training configuration file templates, supporting automatic injection of various weight parameters and model paths.
- Adjusted the default request timeout of EnvClient from 30 seconds to 300 seconds to accommodate long training requests.
- Added a new finworld example directory and related documentation, improving the example project structure.

* refactor(utils): Remove unused extract and compute functions `extract_tool_stats_from_cmts`

* refactor(finworld): Replace the old model with OpenJudge, update evaluation configuration and scripts

- Replaced model initialization in FinWorldJudgeByOpenJudge with the `_init_openjudge_model` method
- Read Judge model parameters from the configuration file first, using environment variables as a fallback
- Optimized RM Gallery initialization, using configuration-first logic, and improved exception stack trace printing
- Cleaned up and removed the old `_init_model` singleton method and related code
- Updated the example startup script `ajet_finworld.sh`, adding OPENJUDGE_LLM and RM_LLM configurations
- Modified YAML templates and configuration files to unify the structure and field naming of Judge configuration items
- Deleted the outdated `cc_rm4_res2cit2fai2_30b.sh` script
- Adjusted the `env_service` startup path to improve environment activation compatibility
- Adjusted script log output format and content to enhance the clarity of configuration parameter printing

* feat(task_reader): Support data reading of type jsonl_with_env_service

- Added the jsonl_with_env_service type, which allows loading data from jsonl files while calling tools via env_service.
- Extended ResourceKeeper to handle the creation and release logic of environment instances for jsonl_with_env_service.
- Maintained the env_service type logic, calling create_instance to register instances and initializing them using init_messages from the jsonl file.
- Added an example protocol, ExampleDeepResearchProtocol, to implement multi-turn interaction and environment call coordination.
- Provided training scripts and YAML configuration templates for finworld, supporting the jsonl_with_env_service mode training environment.
- Optimized scripts to support multi-node multi-GPU training, including environment variables and Ray cluster configuration.

* feat(core): add finworld task reader support to framework

* feat(finworld): implement specialized data reader and openjudge-based grading logic

* refactor(finworld): optimize configuration templates and prompt engineering

* chore(finworld): update launch scripts and add variant experiment scripts

* feat(finworld): Added support for multi-machine, multi-GPU training scripts and configuration templates:

* chore(git): ignore finworld/yaml/*

* fix(metrics): Fix and enhance the compatibility and debugging output of the metrics update logic

- Modified the `update_metrics` function, adding a `prefix` parameter to distinguish between training and validation metrics.
- Adjusted the data source for extracting `reward_stats` and `tool_stats`, migrating from `workflow_metadata` to `log_metrics`.
- Added debug printing to output the `log_metrics` content and metric key names at key steps for easier troubleshooting.
- Used the appropriate prefix when calling `update_metrics` in `trainer_verl.py`, and added multiple debug prints.
- Modified `WorkflowOutput` to place `tool_stats` and `reward_stats` into the `log_metrics` field.
- Removed redundant and deprecated code for extracting `reward_stats` and calculation functions.
- Added debug information output to the `finworld` and `finworld_judge` modules to track log metrics and scoring data.

* fix(metrics): Remove debug prints and synchronize reward statistics

- Removed debug print statements before and after the `update_metrics` call in `trainer_verl.py`
- Removed debug print statements related to the `log_metrics` key in `finworld.py`
- Removed debug print statements before updating `metadata_stats` in `finworld_judge.py`
- Added logic in `general_runner.py` to synchronize `reward_stats` from `metadata` to `log_metrics` after the judge calculation
- Cleaned up debug print statements within `update_metrics` in `metric_helper`, improving code readability.

* chore: "Stop tracking existing yaml files in tutorial directory"

* fix(task_runner): Synchronize reward_stats to log_metrics

feat(tutorial): Added FinWorld multi-machine multi-GPU training startup script

* refactor(script): Refactored the finworld training script, integrating configuration and startup processes.

* Refactor(deep_finance): Replace and remove finworld-related implementations

- Switched the example directory from example_finworld to example_deep_finance
- Modified startup parameters and logic to support deep_finance, replacing the finworld option
- Replaced finworld_reader with deep_finance_reader in the task reader
- Adjusted environment client configuration in resource management, using deep_finance instead of finworld-related checks
- Updated reward metric tool documentation to support deep_finance
- Deleted finworld-related configuration files, scripts, code, and evaluation modules, cleaning up leftover files and scripts
- Replaced the keyword "finworld" with "deep_finance" in comments and logs

* refactor(deepfinance): Rename and unify DeepFinance module and config references

- Replace all "finworld" and "deep_finance" names with the unified "deepfinance" format.
- Modify command-line arguments to `--with-deepfinance` for consistency.
- Adjust the class name in `task_reader` from `deep_financeReader` to `DeepFinanceReader`.
- Update the documentation description and file name of the `metric_helper` module to DeepFinance.
- Modify environment variables and configuration paths in the example script `deep_finance.sh` to use the `DEEPFINANCE` prefix.
- Update `judge_protocol` to `DeepFinanceJudgeByOpenJudge` in the `deep_finance.yaml` configuration.
- Refactor the `FinWorldJudgeByOpenJudge` class in `deep_finance_judge.py` to `DeepFinanceJudgeByOpenJudge`.
- Rename the `FinworldReader` class in `deep_finance_reader.py` to `DeepFinanceReader`.
- Modify the debug log identifier and corresponding environment variable name to `DEEPFINANCE_DEBUG`.
- Update the evaluation protocol in the `deep_finance_template.yaml` template to `DeepFinanceJudgeByOpenJudge`.
- Ensure that internal references and comments in all modules are updated to use DeepFinance and deepfinance-related names.

* refactor(tutorial): Optimize dynamic generation logic for configuration file paths

* fix(deep_finance): argparse: with-deepfinance

* fix(tutorial): Fixed issues with multi-machine training environment variable settings

* fix(env): Corrected the assignment logic for reward and info when returning environment state

- Corrected the `env_output` return value structure in `BaseGymEnv` to ensure correct assignment of `reward` and `info` fields.
- Removed `RefJudge` and `StructureJudge` related metric calculations and statistics from `reward_metric_helper`.
- Cleaned up redundant code in `reward_metric_helper`, removing invalid comments and statistical items.
- Modified `save_trajectory_as_json` to always print trajectory saving confirmation information.
- Corrected log comments in `example_deep_finance` to avoid meaningless log output.
- Added the `save_trajectory_as_json_file` configuration item to `deep_finance_template.yaml` to support trajectory saving functionality.

* chore(config): Update example_deep_finance configuration and clean up files

- Added a new ignore rule for config file paths in .gitignore
- Deleted the automatically generated mcp_finance_tool_generated.json file in example_deep_finance
- Refactored the deep_finance.yaml configuration file, adjusting project and experiment names
- Reorganized Judge configuration, clarifying openjudge_llm and rm_llm models
- Optimized model paths and training parameter configurations, adding parallel and batch processing settings
- Adjusted data reading methods and training/validation set path placeholders
- Reduced GPU memory usage ratio for rollout to 0.8
- Updated the default save directory path for the trainer to a placeholder variable
- Cleaned up unused and commented-out code to improve configuration file conciseness

* Refactor(metric): Optimize tool metric calculation and data saving logic

- Corrected the data source field for timeline data used during trajectory saving.
- Removed redundant fields in tool execution time, cache hit rate, and error rate statistics.
- Updated .gitignore to add ignore rules for the example script directory.
- Removed unnecessary debugging information from logs to reduce log noise.
- Adjusted log printing in the multi-round interaction execution process to simplify output content.
- Streamlined log code for environment observation and termination checks to improve code readability.

* fix(metric_helper): fix tool cache metric

* fix little bug

* fix(utils): Suppress httpx AsyncClient.aclose() exception warnings

* comments to english

* feat: support service name prefixes

- Added --prefix argument support in the launcher
- Implemented the prefix logic in the pty_launch function
- Updated the deep_finance.sh script to use the prefix feature
- Allowed multiple service instances to run in the same environment

* fix: improve MultiAgent message content parsing logic

- Support message content blocks in the tool_result format
- Improved handling of non-text content, continuing with other items instead of skipping the whole message
- Added handling for the tool_use type (skipped, since it is already handled via the tool_calls field)
- Optimized code structure and comments to improve readability

* fix: optimize DeepFinance judge logic and configuration

- Fixed the tool_stats extraction logic to correctly read data from log_metrics
- Added debug output for penalty terms
- Enabled tool calls (force_disable_toolcalls: False)
- Ensured the accuracy of reward calculations

* chore(deps): bump agentscope from 1.0.7 to 1.0.8

* fix(metric_helper): correct trajectory save path and add tool call metric

- Change trajectory save directory from "ctx_trackers" to "trajectory" to organize files better
- Add recording of tool call counts alongside error rates in tool metrics
- Update experiment suffix in deep finance example script for clearer naming convention

* revise message parsing

---------

Co-authored-by: binary-husky <[email protected]>
Co-authored-by: Qingxu Fu <[email protected]>
Co-authored-by: qingxu.fu <[email protected]>
binary-husky added a commit that referenced this pull request Jan 29, 2026
* feat(finworld): Added AgentScope learning protocol and OpenJudge evaluation functionality to the FinWorld task.

- Added the ExampleAgentScopeLearnProtocol class to implement the AgentScope execution flow for multi-turn interactions.

- Integrated semaphore control to manage the parallelism of environment calls, improving environment stepping performance.

- Implemented a mechanism for detecting context overflows and quickly terminating during environment interactions to prevent blocking.

- Added a finworld.yaml configuration file to define project training and rollout parameters.

- Added the FinWorldJudgeByOpenJudge class, integrating multiple evaluators including RM Gallery and OpenJudge (@Haoran).

- Implemented task output conversion, asynchronous calls, and retries to ensure evaluation stability.

- Used weight normalization to manage each evaluator's contribution, merging them to compute the final reward and success determination.

* Precommit fix (#4)

* fix end of files

* autoflake import fix

* add mypy check

* fix test bench import

* refactor(finworld): Replace agent protocol and unify configuration updates

- Renamed ExampleAgentScopeLearnProtocol to ExampleDeepResearchProtocol and modified the execute method signature.
- Unified the parameter name of the model tuner to `tuner` and its related attribute references.
- Optimized the multi-turn interaction step configuration, changing it to use `tuner.config.ajet.rollout.multi_turn.max_steps`.
- Modified the context overflow judgment logic to prevent tool call blocking.
- Updated the finworld.yaml configuration, replacing astune with ajet-related configurations, and adjusted the workflow protocol and environment parameters.
- Modified the default environment variable values and log saving paths in finworld_judge.py.
- Added and improved multi-machine and single-machine startup scripts, supporting dynamic generation of MCP configuration and environment variable loading.
- Added the finworld_single.yaml template to adapt to single-machine training configurations.
- Adjusted the key reference for multi-turn step configuration in ma_deepresearch.py, using the ajet configuration path.

* feat(finworld): Added FinWorld training environment configuration scripts and templates

- Added bash startup scripts for multi-machine, multi-GPU training, supporting dynamic configuration generation and environment variable import.
- Implemented training configuration file templates, supporting automatic injection of various weight parameters and model paths.
- Adjusted the default request timeout of EnvClient from 30 seconds to 300 seconds to accommodate long training requests.
- Added a new finworld example directory and related documentation, improving the example project structure.

* refactor(utils): Remove unused extract and compute functions `extract_tool_stats_from_cmts`

* refactor(finworld): Replace the old model with OpenJudge, update evaluation configuration and scripts

- Replaced model initialization in FinWorldJudgeByOpenJudge with the `_init_openjudge_model` method
- Read Judge model parameters from the configuration file first, using environment variables as a fallback
- Optimized RM Gallery initialization, using configuration-first logic, and improved exception stack trace printing
- Cleaned up and removed the old `_init_model` singleton method and related code
- Updated the example startup script `ajet_finworld.sh`, adding OPENJUDGE_LLM and RM_LLM configurations
- Modified YAML templates and configuration files to unify the structure and field naming of Judge configuration items
- Deleted the outdated `cc_rm4_res2cit2fai2_30b.sh` script
- Adjusted the `env_service` startup path to improve environment activation compatibility
- Adjusted script log output format and content to enhance the clarity of configuration parameter printing

* feat(task_reader): Support data reading of type jsonl_with_env_service

- Added the jsonl_with_env_service type, which allows loading data from jsonl files while calling tools via env_service.
- Extended ResourceKeeper to handle the creation and release logic of environment instances for jsonl_with_env_service.
- Maintained the env_service type logic, calling create_instance to register instances and initializing them using init_messages from the jsonl file.
- Added an example protocol, ExampleDeepResearchProtocol, to implement multi-turn interaction and environment call coordination.
- Provided training scripts and YAML configuration templates for finworld, supporting the jsonl_with_env_service mode training environment.
- Optimized scripts to support multi-node multi-GPU training, including environment variables and Ray cluster configuration.

* feat(core): add finworld task reader support to framework

* feat(finworld): implement specialized data reader and openjudge-based grading logic

* refactor(finworld): optimize configuration templates and prompt engineering

* chore(finworld): update launch scripts and add variant experiment scripts

* feat(finworld): Added support for multi-machine, multi-GPU training scripts and configuration templates:

* chore(git): ignore finworld/yaml/*

* fix(metrics): Fix and enhance the compatibility and debugging output of the metrics update logic

- Modified the `update_metrics` function, adding a `prefix` parameter to distinguish between training and validation metrics.
- Adjusted the data source for extracting `reward_stats` and `tool_stats`, migrating from `workflow_metadata` to `log_metrics`.
- Added debug printing to output the `log_metrics` content and metric key names at key steps for easier troubleshooting.
- Used the appropriate prefix when calling `update_metrics` in `trainer_verl.py`, and added multiple debug prints.
- Modified `WorkflowOutput` to place `tool_stats` and `reward_stats` into the `log_metrics` field.
- Removed redundant and deprecated code for extracting `reward_stats` and calculation functions.
- Added debug information output to the `finworld` and `finworld_judge` modules to track log metrics and scoring data.

* fix(metrics): Remove debug prints and synchronize reward statistics

- Removed debug print statements before and after the `update_metrics` call in `trainer_verl.py`
- Removed debug print statements related to the `log_metrics` key in `finworld.py`
- Removed debug print statements before updating `metadata_stats` in `finworld_judge.py`
- Added logic in `general_runner.py` to synchronize `reward_stats` from `metadata` to `log_metrics` after the judge calculation
- Cleaned up debug print statements within `update_metrics` in `metric_helper`, improving code readability.

* chore: "Stop tracking existing yaml files in tutorial directory"

* fix(task_runner): Synchronize reward_stats to log_metrics

feat(tutorial): Added FinWorld multi-machine multi-GPU training startup script

* refactor(script): Refactored the finworld training script, integrating configuration and startup processes.

* Refactor(deep_finance): Replace and remove finworld-related implementations

- Switched the example directory from example_finworld to example_deep_finance
- Modified startup parameters and logic to support deep_finance, replacing the finworld option
- Replaced finworld_reader with deep_finance_reader in the task reader
- Adjusted environment client configuration in resource management, using deep_finance instead of finworld-related checks
- Updated reward metric tool documentation to support deep_finance
- Deleted finworld-related configuration files, scripts, code, and evaluation modules, cleaning up leftover files and scripts
- Replaced the keyword "finworld" with "deep_finance" in comments and logs

* refactor(deepfinance): Rename and unify DeepFinance module and config references

- Replace all "finworld" and "deep_finance" names with the unified "deepfinance" format.
- Modify command-line arguments to `--with-deepfinance` for consistency.
- Adjust the class name in `task_reader` from `deep_financeReader` to `DeepFinanceReader`.
- Update the documentation description and file name of the `metric_helper` module to DeepFinance.
- Modify environment variables and configuration paths in the example script `deep_finance.sh` to use the `DEEPFINANCE` prefix.
- Update `judge_protocol` to `DeepFinanceJudgeByOpenJudge` in the `deep_finance.yaml` configuration.
- Refactor the `FinWorldJudgeByOpenJudge` class in `deep_finance_judge.py` to `DeepFinanceJudgeByOpenJudge`.
- Rename the `FinworldReader` class in `deep_finance_reader.py` to `DeepFinanceReader`.
- Modify the debug log identifier and corresponding environment variable name to `DEEPFINANCE_DEBUG`.
- Update the evaluation protocol in the `deep_finance_template.yaml` template to `DeepFinanceJudgeByOpenJudge`.
- Ensure that internal references and comments in all modules are updated to use DeepFinance and deepfinance-related names.

* refactor(tutorial): Optimize dynamic generation logic for configuration file paths

* fix(deep_finance): argparse: with-deepfinance

* fix(tutorial): Fixed issues with multi-machine training environment variable settings

* fix(env): Corrected the assignment logic for reward and info when returning environment state

- Corrected the `env_output` return value structure in `BaseGymEnv` to ensure correct assignment of `reward` and `info` fields.
- Removed `RefJudge` and `StructureJudge` related metric calculations and statistics from `reward_metric_helper`.
- Cleaned up redundant code in `reward_metric_helper`, removing invalid comments and statistical items.
- Modified `save_trajectory_as_json` to always print trajectory saving confirmation information.
- Corrected log comments in `example_deep_finance` to avoid meaningless log output.
- Added the `save_trajectory_as_json_file` configuration item to `deep_finance_template.yaml` to support trajectory saving functionality.

* chore(config): Update example_deep_finance configuration and clean up files

- Added a new ignore rule for config file paths in .gitignore
- Deleted the automatically generated mcp_finance_tool_generated.json file in example_deep_finance
- Refactored the deep_finance.yaml configuration file, adjusting project and experiment names
- Reorganized Judge configuration, clarifying openjudge_llm and rm_llm models
- Optimized model paths and training parameter configurations, adding parallel and batch processing settings
- Adjusted data reading methods and training/validation set path placeholders
- Reduced GPU memory usage ratio for rollout to 0.8
- Updated the default save directory path for the trainer to a placeholder variable
- Cleaned up unused and commented-out code to improve configuration file conciseness

* Refactor(metric): Optimize tool metric calculation and data saving logic

- Corrected the data source field for timeline data used during trajectory saving.
- Removed redundant fields in tool execution time, cache hit rate, and error rate statistics.
- Updated .gitignore to add ignore rules for the example script directory.
- Removed unnecessary debugging information from logs to reduce log noise.
- Adjusted log printing in the multi-round interaction execution process to simplify output content.
- Streamlined log code for environment observation and termination checks to improve code readability.

* fix(metric_helper): fix tool cache metric

* fix little bug

* fix(utils): Suppress httpx AsyncClient.aclose() exception warnings

* comments to english

* feat: support service name prefixes

- Added --prefix argument support in the launcher
- Implemented the prefix logic in the pty_launch function
- Updated the deep_finance.sh script to use the prefix feature
- Allowed multiple service instances to run in the same environment

* fix: improve MultiAgent message content parsing logic

- Support message content blocks in the tool_result format
- Improved handling of non-text content, continuing with other items instead of skipping the whole message
- Added handling for the tool_use type (skipped, since it is already handled via the tool_calls field)
- Optimized code structure and comments to improve readability

* fix: optimize DeepFinance judge logic and configuration

- Fixed the tool_stats extraction logic to correctly read data from log_metrics
- Added debug output for penalty terms
- Enabled tool calls (force_disable_toolcalls: False)
- Ensured the accuracy of reward calculations

* chore(deps): bump agentscope from 1.0.7 to 1.0.8

* fix(metric_helper): correct trajectory save path and add tool call metric

- Change trajectory save directory from "ctx_trackers" to "trajectory" to organize files better
- Add recording of tool call counts alongside error rates in tool metrics
- Update experiment suffix in deep finance example script for clearer naming convention

* revise message parsing

* fix(metric_helper): update openjudge graders list in reward metric helper

* feat(deep_finance): replace OpenJudge graders with PresentationQualityGrader

- Remove legacy graders and integrate PresentationQualityGrader and GroundingGrader
- Update grader weights and disable unused graders in config and code
- Simplify grader configuration creation with new mappers for report content and traj
- Refactor DeepFinanceJudgeByOpenJudge to support new grading scheme
- Add PresentationQualityGrader implementation with strict JSON output format
- Include utilities for JSON parsing and validation in presentation quality grader
- Add prompt templates for presentation quality grading criteria and instructions
- Provide example script to run PresentationQualityGrader with OpenAIChatModel
- Add traj_adapter utilities to normalize and extract user query and final report
- Update YAML template to replace old grader weights with presentation quality weight
- Create init files to expose PresentationQualityGrader in judge package

* feat(grounding): implement grounding grader for citation compliance evaluation

- add GroundingGrader class to evaluate citation coverage and truthfulness based on dialogue traj
- provide default OpenAIChatModel creation with deterministic options
- implement prompt construction and JSON parsing utilities for model interaction
- calculate scores including coverage, grounding, and invalid citation penalties
- add detailed json_utils module for strict JSON extraction and validation
- introduce prompt templates defining citation auditing rules and user prompts
- supply reference.py with related grounding evaluation logic and RefJudgeEvaluator class
- create __init__.py to expose GroundingGrader module
- add presentation_quality module __init__.py with PresentationQualityGrader export

* fix(deep_finance_judge): add debug logging for OpenJudge evaluation process

* feat(deep_finance): enhance reward metadata and zero score debugging

- Add populate_reward_metadata_from_stats to copy reward stats into reward metadata
- Populate reward metadata in GeneralRunner if reward_stats present in workflow output
- Refine compute_reward_metrics with updated OpenJudge graders: presentation_quality, grounding, planning
- Add _save_zero_score_debug method in DeepFinanceJudgeByOpenJudge to save debug info for zero grader scores
- Remove deprecated RewardStats usage in deep_finance_judge
- Update judge __init__ to export GroundingGrader alongside PresentationQualityGrader
- Clean up debug print statements and logging in deep_finance_judge.py
- Update .gitignore to exclude prepare_data and judge/analytical_sufficiency folders in example_deep_finance tutorial

* feat(presentation_quality): upgrade grading to 1/3/5 scoring system with markdown cleanup

- Add function to strip markdown code block fences in grounding and presentation_quality modules
- Change presentation quality grader to score each of 8 criteria on a 1/3/5 scale instead of pass/fail
- Normalize total score by dividing sum of item scores by max (40), improving granularity
- Update reasoning output to list lowest scoring items with notes for focused feedback
- Revise presentation quality prompt to reflect new 1/3/5 scoring rubric with detailed instructions
- Adjust JSON output schema accordingly, replacing boolean pass with numeric score fields
- Add get_score utility in JSON utils to extract and validate scores from graded items
- Clean report input by removing markdown fences before grading to avoid markup noise
- Add grounding weight configuration in YAML template for improved modular judge weighting

* chore(config): update experiment suffix, prefix and reward weights in deep_finance.sh

* fix(deep_finance): update environment variables and training launch options

* chore(config): parameterize deep finance training configuration

* chore(config): update experiment suffix, prefix, and weight parameters

* fix(example_deep_finance): update dynamic config file generation path

* refactor(judge): remove deprecated presentation quality script

---------

Co-authored-by: binary-husky <[email protected]>
Co-authored-by: Qingxu Fu <[email protected]>
Co-authored-by: qingxu.fu <[email protected]>