A couple of scripts for extracting point timeseries data from the CHESS-SCAPE dataset stored in JASMIN's S3 object store. The CHESS-SCAPE dataset consists of 4 ensemble members and 4 RCP warming scenarios at a daily, 1km resolution. The extract_point.py script extracts out the nearest gridpoint for lon/lat coordinates specified, of the specified ensemble member for a specified year for RCP8.5 only. The extract_grid.py file is similar, but extracts out all the gridpoints within a specified bounding box and time-period into separate csv files per gridpoint. (Note: They only works for ensmem 01 for now). Data is linearly interpolated from a 360day calendar to a gregorian calendar using Xarray's convert_calendar function. The extract_point.py script outputs a single csv file YYYY_ENSMEM_LON_LAT.csv. The extract_grid.py file outputs a single csv file per gridpoint chess-scape_YYYY1-YYYY2_ENSMEM_X_Y.csv. All csv files are structured with rows representing days and columns for:
- YEAR: The calendar year (extract_grid.py only)
- DOY: Day of year
- RAD: Total shortwave radiation in MJ/m^2/day
- MINTMP: Minimum temperature in degC
- MAXTMP: Maximum temperature in degC
- VP: Vapour pressure in kPa
- WIND: Surface wind speed in m/s
- RAIN: Total precipitation in mm/day
- CO2: CO2 concentration in ppmv according to the RCP8.5 pathway (note this only varies by calendar year, so will be the same for every day of a given calendar year).
The vapour pressure
These scripts were designed with the R-version of the LINGRA-N Grass Yield model in mind, but could easily be adapted for other use-cases.
These scripts require a python environment with the following packages installed:
These can be installed using your python package manager of choice. E.g.:
conda create --name chess-scape-extract-env -c conda-forge numpy scipy xarray pandas>=2 zarr>=3 dask cftime s3fs pyproj
to create a new environment in which to run the scripts, or:
conda install -c conda-forge numpy scipy xarray pandas>=2 zarr>=3 dask cftime s3fs pyproj
to install into an existing environment, or:
conda create --name chess-scape-extract-env -f conda-envfile.txt
to install into a new environment using the conda environment file provided here.
python -m venv ~/chess-scape-extract-env
source ~/chess-scape-extract-env/bin/activate
to create a new virtual python environment, then
pip install numpy scipy xarray pandas>=2 zarr>=3 dask cftime s3fs pyproj
or
pip install -r requirements.txt
to install.
Now that you have an environment suitable for running the scripts, obtain a copy of them by downloading them from their respective file pages: extract_point.py, extract_grid.py, or cloning the repository with
git clone git@github.com:ukceh-rse/chess-scape-extract.git
Once installed, the scripts can be run by navigating to the folder containing them and executing them as e.g.:
python extract_point.py --lon -1 --lat 52 --year 2020 --ensmem 01
python extract_grid.py --ensmem 01 --outpath "path/to/output/folder" --xllcorner 0 --yllcorner 0 --xurcorner 10000 --yurcorner 10000 --startdate "1981-01-01"--enddate "2079-12-31"
or
ipython extract_point.py -- --lon -1 --lat 52 --year 2020 --ensmem 01
ipython extract_grid.py -- --ensmem 01 --outpath "path/to/output/folder" --xllcorner 0 --yllcorner 0 --xurcorner 10000 --yurcorner 10000 --startdate "1981-01-01"--enddate "2079-12-31"
if using ipython.
The four arguments required to be passed to extract_point.py are:
--lonlongitude coordinate of location to extract nearest gridpoint from--latlatitude coordinate of location to extract nearest gridpoint from--yearyear of data to extract (from 1980 to 2080)--ensmemwhich ensemble member of the CHESS-SCAPE dataset to use. Possible options are '01', '04', '06', '15'.
The six arguments required to be passed to extract_grid.py are:
--ensmemwhich ensemble member of the CHESS-SCAPE dataset to use. Possible options are '01', '04', '06', '15'--outpathfolder in which to put the output csv files--xllcornerx coordinate of the "lower left" corner of the bounding box within which all grid points will be extracted--yllcornery coordinate of the "lower left" corner of the bounding box within which all grid points will be extracted--xurcornerx coordinate of the "upper right" corner of the bounding box within which all grid points will be extracted--yurcornery coordinate of the "upper right" corner of the bounding box within which all grid points will be extracted--startdatestart date (inclusive) of the period of data to extract in "YYYY-MM-DD"" format--enddateend date (inclusive) of the period of data to extract in "YYYY-MM-DD"" format
Further instructions for extract_grid.py when extracting large numbers of gridpoints
This script works by pulling out the requested block of data from the remote object store into memory before reformatting into csv files. The loading into memory means a single run of this script can consume a lot of memory resources if a large area and long time-span is requested. Therefore in these scenarios it is recommended to split the area into smaller regions and/or run each individual region in parallel on a HPC.
A example batch-job submission script for doing this on a SLURM-based HPC environment is provided as template.sbatch. It is set up for the JASMIN LOTUS2 cluster, which requires certain options or 'SLURM Flags' at the top of the file that might not be necessary or might need to be different for other systems. Hopefully the documentation for your HPC environment will provide information on what is required.
The template.sbatch file submits several jobs that each cover a small area of the total spatial extent of the dataset, using a 'Task ID' unique to each job to select out a row of the joblist.txt file. Each row of the joblist.txt file contains a set of arguments for the extract_grid.py script, essentially adjusting the xllcorner, yllcorner, xurcorner, yurcorner arguments to cover a different area for each job.