# Foundry File Upload

This repository contains a Python script for uploading large files to Foundry datasets, especially when other upload methods are not available. The script tracks uploaded files and skips redundant uploads, providing a seamless experience for managing large data transfers.
## Features

- Upload files from a specified directory to a Foundry dataset.
- Avoid redundant uploads by tracking previously uploaded files.
- Use environment variables for configuration.
- Display progress bars for file uploads using `tqdm`.
## Prerequisites

Before using the script, ensure the following:

- Python environment: Install Python 3.8 or higher.
- Required libraries: Install the following Python packages:

  ```
  pip install foundry-dev-tools urllib3 tqdm boto3
  ```

- Environment variables: Set up the required environment variables, described in the next section.
## Environment Variables

The script requires the following environment variables:

- `FOUNDRY_TOKEN`: Your Foundry access token for authentication.
- `FOUNDRY_HOST`: The Foundry instance URL.
- `INPUT_PATH`: Directory containing the files to be uploaded.
- `TARGET_DATASET_RID`: The resource ID of the target dataset in Foundry.

Use a `.env` file or export the variables in your shell session:

```
export FOUNDRY_TOKEN="your_token"
export FOUNDRY_HOST="your_host"
export INPUT_PATH="/path/to/your/files"
export TARGET_DATASET_RID="your_dataset_rid"
```
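These variables can be read and validated up front, so the script fails fast instead of partway through an upload. A minimal sketch (the helper name `load_config` is illustrative, not part of the script):

```python
import os

# Hypothetical helper: validate the required variables before doing any work.
REQUIRED_VARS = ["FOUNDRY_TOKEN", "FOUNDRY_HOST", "INPUT_PATH", "TARGET_DATASET_RID"]

def load_config() -> dict:
    """Return a dict of the required settings, raising if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_VARS}
```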
## Usage

1. Clone the repository:

   ```
   git clone https://github.com/arukavina/foundry_upload.git
   cd foundry-file-upload
   ```

2. Set up the required environment variables as described above.
3. Place your target files in the directory specified by `INPUT_PATH`.
4. Run the script:

   ```
   python upload_files.py
   ```

5. Monitor the progress bars for each file being uploaded.
6. Review the `uploaded_files.json` file to track uploaded files.
## How It Works

- File Filtering: The script scans the directory specified by `INPUT_PATH` and filters files by extension (default: `.rpt`). Modify the `FILE_EXTENSION` variable to target a different file type.
- Upload Tracking: The script tracks uploaded files using a JSON file (`uploaded_files.json`) stored in the input directory. The functions `load_uploaded_files` and `save_uploaded_files` manage this tracking.
- File Uploads: Files are uploaded to the specified Foundry dataset using an S3 client provided by the `foundry_dev_tools` library.
- Error Handling: The script handles upload errors gracefully and continues processing the remaining files.
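The tracking functions can be sketched roughly like this, assuming `uploaded_files.json` holds a flat list of file names (an illustrative reimplementation; the actual code in `upload_files.py` may differ):

```python
import json
from pathlib import Path

def load_uploaded_files(tracking_path: Path) -> set:
    """Return the set of file names already uploaded, or an empty set."""
    if tracking_path.exists():
        return set(json.loads(tracking_path.read_text()))
    return set()

def save_uploaded_files(tracking_path: Path, uploaded: set) -> None:
    """Persist the uploaded-file names back to the tracking file."""
    tracking_path.write_text(json.dumps(sorted(uploaded)))
```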
## Code Highlights

The `upload_file_to_foundry` function manages the file upload process (the original carried a `@contextlib.contextmanager` decorator without a `yield`, which would fail when used as a context manager; it is a plain function here):

```python
def upload_file_to_foundry(ctx, file_path):
    boto3_client = ctx.s3.get_boto3_client(verify=False)
    file_size = file_path.stat().st_size
    path_in_dataset = file_path.name
    with tqdm(total=file_size, desc=path_in_dataset, unit="B", unit_scale=True) as pbar:
        boto3_client.upload_file(
            str(file_path), TARGET_DATASET_RID, path_in_dataset, Callback=pbar.update
        )
```

At the end of the script, a list of successfully uploaded files is displayed:
```python
print("Successfully uploaded files:")
for uploaded_file in uploaded_files:
    print(uploaded_file)
```

## Notes

- Ensure that the `FOUNDRY_TOKEN` and `FOUNDRY_HOST` values are correct to avoid authentication issues.
- Files already listed in `uploaded_files.json` are skipped.
- Modify `FILE_EXTENSION` to target a different file type if needed.
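The filtering and skip behavior described above can be sketched as follows (illustrative only; the helper name `select_files_to_upload` is not from the script):

```python
from pathlib import Path

FILE_EXTENSION = ".rpt"  # default extension targeted by the script

def select_files_to_upload(input_path: Path, already_uploaded: set) -> list:
    """Return files matching FILE_EXTENSION that are not yet uploaded.

    Sketch of the described behavior; the real script may differ.
    """
    return [
        p for p in sorted(input_path.iterdir())
        if p.suffix == FILE_EXTENSION and p.name not in already_uploaded
    ]
```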
## License

See the LICENSE file for details.
By using this script, you can efficiently upload large volumes of data to Foundry, bypassing other upload constraints.