This repository contains a collection of one or more tsdat pipelines (as found under the pipelines folder) for
ingesting and analyzing data collected by three versions of Sofar Spotter buoys (Spotter, Spotter2, Spotter3).
Pipelines relevant to each buoy are tagged with the corresponding version number. The basic workflow is to first run
the raw data through the initial ingest pipeline (spotter_v1, spotter_v2, or spotter_v3), followed by the respective
VAP pipeline, which calculates wave parameters.
The v2 pipeline series is the most robust of the three and will likely process data from all Spotter
buoy versions successfully. The latest v3 pipeline works only with Spotter3 zip files.
Zip folders that are too large can stall out an ingest pipeline; the split_zip_folders.py script located in this
directory can break a large zip up into smaller zip folders.
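The split_zip_folders.py script itself is not reproduced here. As a rough illustration of the idea, the sketch below copies the members of one large archive into several smaller archives using Python's standard zipfile module. The function name split_zip, the max_members parameter, and the output naming scheme are assumptions made for this example; they are not taken from the script in this repository.

```python
import zipfile
from pathlib import Path


def split_zip(source: Path, out_dir: Path, max_members: int = 100) -> list[Path]:
    """Copy the files in `source` into smaller zips holding at most `max_members` files each.

    Hypothetical helper for illustration; not the interface of split_zip_folders.py.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: list[Path] = []
    with zipfile.ZipFile(source) as big:
        # Directory entries carry no data, so only real files are copied.
        members = [m for m in big.infolist() if not m.is_dir()]
        for start in range(0, len(members), max_members):
            part_path = out_dir / f"{source.stem}_part{len(parts) + 1:03d}.zip"
            with zipfile.ZipFile(part_path, "w", zipfile.ZIP_DEFLATED) as small:
                for member in members[start:start + max_members]:
                    # Passing the ZipInfo keeps the original member name and timestamp.
                    small.writestr(member, big.read(member))
            parts.append(part_path)
    return parts
```

Splitting by member count is only one option; splitting by cumulative file size would follow the same pattern with a running total instead of a fixed slice.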
The repository is made up of the following core pieces:
- runner.py: Main entry point for running a pipeline.
- pipelines/*: Collection of custom data pipelines using tsdat.
- pipelines/example_ingest: An out-of-the-box example tsdat pipeline.
- templates/*: Template(s) used to generate new pipelines.
- shared/*: Shared configuration files that may be used across multiple pipelines.
- utils/*: Utility scripts.
The following are required to develop a tsdat pipeline:
- A GitHub account. Create one if you don't have one already.
- An Anaconda environment. We strongly recommend developing in an Anaconda Python environment to ensure that there are no library dependency issues. See the Anaconda documentation for information on installing Anaconda on your computer.
  Windows users: You can install Anaconda directly on Windows, OR you can run it in a Linux environment using the Windows Subsystem for Linux (WSL). A WSL tutorial explains how to set up a WSL environment and attach VS Code to it.
You can create a new repository based upon the tsdat pipeline-template repository in GitHub:
- Click the 'Use this template' link on the template repository and follow the steps to copy the template repository into your account.
  NOTE: If you are looking to get an older version of the template, you will need to select the box next to 'Include all branches' and set the branch you are interested in as your new default branch.
- On GitHub, click the 'Code' button to get a link to your code, then run
      git clone <the link you copied>
  from the terminal on your computer where you would like to work on the code.
- Open a terminal shell from your computer:
  - Linux or Mac: open a regular terminal.
  - Windows: open an Anaconda prompt if you installed Anaconda directly to Windows, OR open a WSL terminal if you installed Anaconda via WSL.
- Run the following commands to create and activate your conda environment:
      conda env create
      conda activate spotter
- Verify your environment is set up correctly by running the tests for this repository:
      pytest
  If you get the following warning message when running the tests:
      UserWarning: pyproj unable to set database path.
  then run the following additional commands to permanently remove this warning message:
      conda remove --force pyproj
      pip install pyproj
  If everything is set up correctly, then all the tests should pass.
- Open the cloned repository in VS Code. (This repository contains default settings for VS Code that will make it much easier to get started quickly.)
- Install the recommended extensions (there should be a pop-up in VS Code with recommendations).
  Windows users: In order to run Python scripts in VS Code, follow steps A-C below:
  A. Install the extension Code Runner (authored by Jun Han).
  B. Press F1, type 'Preferences: Open User Settings (JSON)', and select it.
  C. Add the following lines to the list of user settings, and update <path to anaconda> for your machine:
      {
        "terminal.integrated.defaultProfile.windows": "Command Prompt",
        "python.condaPath": "C:/<path to anaconda>/Anaconda3/python.exe",
        "python.terminal.activateEnvironment": true,
        "code-runner.executorMap": {
          "python": "C:/<path to anaconda>/Anaconda3/Scripts/activate.bat && $pythonPath $fullFileName"
        },
      }
- Tell VS Code to use your new conda environment:
  - Press F1 to bring up the command palette in VS Code.
  - Type 'Python: Select Interpreter' and select it.
  - Select the newly created conda environment (spotter, the environment activated with conda activate above) from the drop-down list. You may need to refresh the list (cycle icon in the top right) to see it.
  - Bring up the command palette and type 'Developer: Reload Window' to reload VS Code and ensure the settings changes propagate correctly.
- Verify your VS Code environment is set up correctly by running the tests for this repository:
  - Press F1 to bring up the command palette in VS Code.
  - Type 'Test: Run All Tests' and select it.
  - A new window pane will show up on the left of VS Code showing test status.
  - Verify that all tests have passed (green check marks).
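If the tests fail to start at all, a quick first check is whether the selected interpreter can see the expected packages. The snippet below is a minimal sketch of such a check using only the standard library. The package list is an assumption based on the tools named in this README (tsdat, xarray, pytest); the authoritative list is whatever the repository's conda environment file declares.

```python
from importlib.util import find_spec

# Assumed package list for illustration; adjust it to match the conda environment file.
REQUIRED = ("tsdat", "xarray", "pytest")


def missing_packages(names=REQUIRED):
    """Return the top-level package names that the current interpreter cannot import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All expected packages are importable.")
```

An empty result only shows that the packages are installed in the active interpreter; running pytest remains the actual verification step.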
- The runner.py script can be run from the command line to process input data files:
      python runner.py <ingest, vap> <path(s) to file(s) to process>
  The pipeline(s) used to process the data will depend on the specific patterns declared by the pipeline.yaml files in each pipeline module in this repository.
- The runner.py script can optionally take a glob pattern in addition to a filepath. E.g., to process all 'csv' files in some input folder data/to/process/ you would run:
      python runner.py ingest data/to/process/*.csv
- The --help option can be used to show additional usage information:
      python runner.py --help
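To make the pattern-based dispatch described above more concrete, here is a small sketch of how a glob pattern expands to file paths and how each path could then be matched against per-pipeline patterns. It assumes the patterns are regular expressions. The two patterns in EXAMPLE_PATTERNS and the helper names expand_inputs and select_pipelines are invented for this illustration; the real patterns are the ones declared in each pipeline's pipeline.yaml, and this is not how runner.py is necessarily implemented.

```python
import glob
import re

# Hypothetical patterns for illustration only; the real ones live in each pipeline.yaml.
EXAMPLE_PATTERNS = {
    "spotter_v2": r".*spotter.*\.zip$",
    "example_ingest": r".*\.csv$",
}


def expand_inputs(pattern: str) -> list[str]:
    """Expand a glob pattern such as 'data/to/process/*.csv' into sorted file paths."""
    return sorted(glob.glob(pattern))


def select_pipelines(path: str, patterns: dict[str, str] = EXAMPLE_PATTERNS) -> list[str]:
    """Return the names of the pipelines whose pattern matches the given file path."""
    return [name for name, regex in patterns.items() if re.match(regex, path)]


if __name__ == "__main__":
    for path in expand_inputs("data/to/process/*.csv"):
        print(path, "->", select_pipelines(path))
```

A file that matches no pattern yields an empty list, which corresponds to an input that no pipeline in the repository is configured to handle.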
- Value Added Product (VAP) pipelines operate on the output of ingest pipelines.
- The command to run these pipelines has a slightly different structure: you pass the pipeline.yaml configuration file to use, as well as a start and end date:
      python runner.py vap <pipeline/<pipeline-name>/config/pipeline.yaml> --begin yyyymmdd.HHMMSS --end yyyymmdd.HHMMSS
- The --help option can also be used here if you get stuck:
      python runner.py vap --help
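The yyyymmdd.HHMMSS timestamps are easy to mistype by hand. The sketch below builds the VAP command shown above from Python datetime objects, so the format is produced by strftime rather than typed. The helper name vap_command and the ordering check are assumptions added for this example; only the command structure itself comes from this README.

```python
from datetime import datetime

# strftime equivalent of the yyyymmdd.HHMMSS format expected by --begin and --end.
TIME_FORMAT = "%Y%m%d.%H%M%S"


def vap_command(config_path: str, begin: datetime, end: datetime) -> list[str]:
    """Build the argument list for a VAP run (hypothetical helper, not part of runner.py)."""
    if end <= begin:
        raise ValueError("end must be later than begin")
    return [
        "python", "runner.py", "vap", config_path,
        "--begin", begin.strftime(TIME_FORMAT),
        "--end", end.strftime(TIME_FORMAT),
    ]


if __name__ == "__main__":
    # Placeholder path: substitute the pipeline.yaml of the VAP pipeline you want to run.
    cmd = vap_command("path/to/pipeline.yaml", datetime(2023, 1, 2, 3, 4, 5), datetime(2023, 2, 1))
    print(" ".join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root, which avoids shell quoting issues.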
- Learn more about tsdat:
  - GitHub: https://github.com/tsdat/tsdat
  - Documentation: https://tsdat.readthedocs.io
  - Data standards: https://github.com/tsdat/data_standards
- Learn more about xarray:
  - GitHub: https://github.com/pydata/xarray
  - Documentation: https://xarray.pydata.org
- Learn more about pydantic:
  - GitHub: https://github.com/samuelcolvin/pydantic/
  - Documentation: https://pydantic-docs.helpmanual.io
- Other useful tools:
  - VS Code: https://code.visualstudio.com/docs
  - Docker: https://docs.docker.com/get-started/
  - pytest: https://github.com/pytest-dev/pytest
  - black: https://github.com/psf/black
  - matplotlib guide: https://realpython.com/python-matplotlib-guide/