Conversation
There was a problem hiding this comment.
Code Review
This pull request upgrades the project to Go 1.26.1 and Python 3.14, while also updating several core dependencies. The primary functional change is the implementation of more robust secret masking across various data sources (Git, HTTP, HuggingFace, ModelScope, and S3) to prevent sensitive credentials from leaking into logs and error messages. The feedback focuses on extending this masking to URIs that might contain embedded credentials and optimizing the secret obscuration logic to handle large keys more efficiently and avoid redundant processing.
| if err != nil { | ||
| logger.Errorf("rclone copy command error: %s", errBuffer) | ||
| return fmt.Errorf("failed to copy data from %s to %s with rclone command %s, err: %s", fromURI, toPath, cmd.String(), err) | ||
| return fmt.Errorf("failed to copy data from %s to %s with rclone command %s, err: %s", fromURI, toPath, utils.ObscureString(cmd.String(), []string{basicAuthBase64}), err) |
There was a problem hiding this comment.
The fromURI is included in the error message without being obscured. If the user provides a URI that contains embedded credentials (e.g., http://user:pass@host), these will be leaked in the error logs. Please ensure fromURI is also passed through utils.ObscureString before being included in the error message.
| return fmt.Errorf("failed to copy data from %s to %s with rclone command %s, err: %s", fromURI, toPath, utils.ObscureString(cmd.String(), []string{basicAuthBase64}), err) | |
| return fmt.Errorf("failed to copy data from %s to %s with rclone command %s, err: %s", utils.ObscureString(fromURI, []string{basicAuthBase64}), toPath, utils.ObscureString(cmd.String(), []string{basicAuthBase64}), err) |
| func obscureSecrets(secrets ...string) []string { | ||
| var result []string | ||
| for _, s := range secrets { | ||
| if s == "" { | ||
| continue | ||
| } | ||
| result = append(result, s) | ||
| result = append(result, url.QueryEscape(s)) | ||
| // In certain scenarios (such as shell script processing, cmd.String() formatting, etc.), | ||
| // URL-encoded strings may be encoded again (e.g., % becomes %25). | ||
| // Add double-encoded version to defensively cover this case. | ||
| result = append(result, url.QueryEscape(url.QueryEscape(s))) | ||
| } | ||
| return result | ||
| } |
There was a problem hiding this comment.
The obscureSecrets function can be improved to avoid redundant string replacements and handle large secrets more efficiently. Currently, it adds multiple versions of every secret, including large blobs like PEM-encoded SSH private keys, which can lead to performance overhead and potential false positives in logs. Additionally, using a map to ensure the returned list contains unique strings will improve the efficiency of the subsequent strings.ReplaceAll calls.
func obscureSecrets(secrets ...string) []string {
var result []string
seen := make(map[string]struct{})
add := func(s string) {
if s != "" {
if _, ok := seen[s]; !ok {
result = append(result, s)
seen[s] = struct{}{}
}
}
}
for _, s := range secrets {
add(s)
// Only URL-encode secrets that are short enough to likely appear in a URL.
// Large blobs like PEM-encoded private keys are not used this way and
// encoding them is expensive and increases false positives.
if len(s) < 1024 {
escaped := url.QueryEscape(s)
add(escaped)
add(url.QueryEscape(escaped))
}
}
return result
}| if err != nil { | ||
| logger.Errorf("rclone copy command error: %s", errBuffer) | ||
| return fmt.Errorf("failed to copy data from %s to %s with rclone command %s, err: %s", fromURI, toPath, cmd.String(), err) | ||
| return fmt.Errorf("failed to copy data from %s to %s with rclone command %s, err: %s", fromURI, toPath, utils.ObscureString(cmd.String(), []string{accessKeyID, secretAccessKey}), err) |
There was a problem hiding this comment.
The fromURI is included in the error message without being obscured. Although S3 URIs typically do not contain credentials in the same way HTTP URIs might, it is best practice to obscure it consistently to prevent accidental leaks if a non-standard or proxied URI is used.
| return fmt.Errorf("failed to copy data from %s to %s with rclone command %s, err: %s", fromURI, toPath, utils.ObscureString(cmd.String(), []string{accessKeyID, secretAccessKey}), err) | |
| return fmt.Errorf("failed to copy data from %s to %s with rclone command %s, err: %s", utils.ObscureString(fromURI, []string{accessKeyID, secretAccessKey}), toPath, utils.ObscureString(cmd.String(), []string{accessKeyID, secretAccessKey}), err) |
Sanitize sensitive credentials across all data-source loaders and the shared command utility to prevent passwords, tokens, and keys from appearing in logs and error messages.
when use fakeuser/fake/password in git, it outputs the
https://fakeuser:fakepass%2F123@gitlab.daocloud.cn/ndx/mcamel/mcamel-mysql.git