README

A couple of scripts for extracting point timeseries data from the CHESS-SCAPE dataset stored in JASMIN's S3 object store. The CHESS-SCAPE dataset consists of 4 ensemble members and 4 RCP warming scenarios at daily, 1km resolution.

The extract_point.py script extracts the gridpoint nearest to the specified lon/lat coordinates, for a specified ensemble member and year, for RCP8.5 only. The extract_grid.py script is similar, but extracts all the gridpoints within a specified bounding box and time period, writing a separate csv file per gridpoint. (Note: they only work for ensmem 01 for now.) Data is linearly interpolated from a 360-day calendar to a Gregorian calendar using Xarray's convert_calendar function.

The extract_point.py script outputs a single csv file, YYYY_ENSMEM_LON_LAT.csv. The extract_grid.py script outputs one csv file per gridpoint, chess-scape_YYYY1-YYYY2_ENSMEM_X_Y.csv. All csv files are structured with rows representing days and columns for:

YEAR: The calendar year (extract_grid.py only)
DOY: Day of year
RAD: Total shortwave radiation in MJ/m^2/day
MINTMP: Minimum temperature in degC
MAXTMP: Maximum temperature in degC
VP: Vapour pressure in kPa
WIND: Surface wind speed in m/s
RAIN: Total precipitation in mm/day
CO2: CO2 concentration in ppmv according to the RCP8.5 pathway (note this only varies by calendar year, so will be the same for every day of a given calendar year).
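
As a rough illustration of how the output might be consumed, the sketch below parses a csv with the columns listed above using only the Python standard library. It assumes the files are comma-delimited with a header row that uses exactly these column names, which is not confirmed here; the sample values are made up purely for illustration and the function name read_rows is hypothetical.

```python
import csv
import io


def read_rows(text):
    """Parse csv text into a list of dicts with float values.

    Assumes a comma-delimited file whose header row holds the column
    names (e.g. DOY, RAD, MINTMP, ...); this layout is an assumption.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: float(value) for name, value in record.items()}
        for record in reader
    ]


# Made-up sample data in the assumed layout (not real CHESS-SCAPE values).
sample = """DOY,RAD,MINTMP,MAXTMP,VP,WIND,RAIN,CO2
1,2.1,1.5,6.0,0.70,4.2,0.0,410.0
2,1.8,0.9,5.4,0.66,3.9,3.2,410.0
"""

rows = read_rows(sample)

# Example aggregate: total precipitation over the parsed days, in mm.
total_rain = sum(row["RAIN"] for row in rows)
```

For a real file, the text could be read with open(path).read() and passed to the same function; files from extract_grid.py would simply carry the additional YEAR column.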

The vapour pressure $e$ is derived from the specific humidity $q$ and surface air pressure $p$ using the following equation:

$$e\approx \frac{qp}{0.622 + 0.378q}$$
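
The equation above can be written as a small Python function. This is a minimal sketch rather than the code used in the scripts: it assumes $q$ is in kg/kg and $p$ is in Pa (the input units are not stated here), and divides by 1000 to give the result in kPa as used for the VP column. The function name is hypothetical.

```python
def vapour_pressure_kpa(q, p_pa):
    """Approximate vapour pressure in kPa.

    q    : specific humidity in kg/kg (assumed unit)
    p_pa : surface air pressure in Pa (assumed unit)
    """
    # e ~= q * p / (0.622 + 0.378 * q), in the same unit as p
    e_pa = q * p_pa / (0.622 + 0.378 * q)
    # Convert Pa to kPa
    return e_pa / 1000.0


# Example: q = 0.01 kg/kg at standard sea-level pressure
example_vp = vapour_pressure_kpa(0.01, 101325.0)
```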

These scripts were designed with the R-version of the LINGRA-N Grass Yield model in mind, but could easily be adapted for other use-cases.

Installation

Requirements

These scripts require a Python environment with the following packages installed: numpy, scipy, xarray, pandas (>=2), zarr (>=3), dask, cftime, s3fs and pyproj.

These can be installed using your python package manager of choice. E.g.:

Anaconda

conda create --name chess-scape-extract-env -c conda-forge numpy scipy xarray "pandas>=2" "zarr>=3" dask cftime s3fs pyproj

to create a new environment in which to run the scripts, or:

conda install -c conda-forge numpy scipy xarray "pandas>=2" "zarr>=3" dask cftime s3fs pyproj

to install into an existing environment, or:

conda create --name chess-scape-extract-env --file conda-envfile.txt

to install into a new environment using the conda environment file provided here.

Pip

python -m venv ~/chess-scape-extract-env
source ~/chess-scape-extract-env/bin/activate

to create and activate a new virtual python environment, then

pip install numpy scipy xarray "pandas>=2" "zarr>=3" dask cftime s3fs pyproj

or

pip install -r requirements.txt

to install.

Now that you have an environment suitable for running the scripts, obtain a copy of them either by downloading them from their respective file pages (extract_point.py, extract_grid.py) or by cloning the repository with:

git clone git@github.com:ukceh-rse/chess-scape-extract.git

Running instructions

Once installed, the scripts can be run by navigating to the folder containing them and executing them as e.g.:

python extract_point.py --lon -1 --lat 52 --year 2020 --ensmem 01

python extract_grid.py --ensmem 01 --outpath "path/to/output/folder" --xllcorner 0 --yllcorner 0 --xurcorner 10000 --yurcorner 10000 --startdate "1981-01-01" --enddate "2079-12-31"

or

ipython extract_point.py -- --lon -1 --lat 52 --year 2020 --ensmem 01

ipython extract_grid.py -- --ensmem 01 --outpath "path/to/output/folder" --xllcorner 0 --yllcorner 0 --xurcorner 10000 --yurcorner 10000 --startdate "1981-01-01" --enddate "2079-12-31"

if using ipython.

The four arguments required to be passed to extract_point.py are:

--lon longitude coordinate of location to extract nearest gridpoint from
--lat latitude coordinate of location to extract nearest gridpoint from
--year year of data to extract (from 1980 to 2080)
--ensmem which ensemble member of the CHESS-SCAPE dataset to use. Possible options are '01', '04', '06', '15'.

The eight arguments required to be passed to extract_grid.py are:

--ensmem which ensemble member of the CHESS-SCAPE dataset to use. Possible options are '01', '04', '06', '15'
--outpath folder in which to put the output csv files
--xllcorner x coordinate of the "lower left" corner of the bounding box within which all grid points will be extracted
--yllcorner y coordinate of the "lower left" corner of the bounding box within which all grid points will be extracted
--xurcorner x coordinate of the "upper right" corner of the bounding box within which all grid points will be extracted
--yurcorner y coordinate of the "upper right" corner of the bounding box within which all grid points will be extracted
--startdate start date (inclusive) of the period of data to extract in "YYYY-MM-DD" format
--enddate end date (inclusive) of the period of data to extract in "YYYY-MM-DD" format

Further instructions for extract_grid.py when extracting large numbers of gridpoints

This script works by pulling the requested block of data from the remote object store into memory before reformatting it into csv files. Because the whole block is loaded into memory, a single run can consume a lot of memory if a large area and long time span are requested. In these scenarios it is recommended to split the area into smaller regions and/or run the individual regions in parallel on an HPC.

An example batch-job submission script for doing this on a SLURM-based HPC environment is provided as template.sbatch. It is set up for the JASMIN LOTUS2 cluster, which requires certain options or 'SLURM Flags' at the top of the file that might not be necessary, or might need to be different, on other systems. Hopefully the documentation for your HPC environment will provide information on what is required.

The template.sbatch file submits several jobs that each cover a small area of the total spatial extent of the dataset, using a 'Task ID' unique to each job to select out a row of the joblist.txt file. Each row of the joblist.txt file contains a set of arguments for the extract_grid.py script, essentially adjusting the xllcorner, yllcorner, xurcorner, yurcorner arguments to cover a different area for each job.
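
The tiling idea described above can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes each row of the job list is simply the argument string passed to extract_grid.py, which may not match the actual layout of the provided joblist.txt, and the function names and the tile size are hypothetical.

```python
def tile_bounding_box(xll, yll, xur, yur, tile_size):
    """Split a bounding box into square tiles of at most tile_size.

    Returns a list of (xll, yll, xur, yur) tuples, row by row from the
    lower-left corner. Tiles at the far edges are clipped to the box.
    """
    tiles = []
    y = yll
    while y < yur:
        x = xll
        while x < xur:
            tiles.append((x, y, min(x + tile_size, xur), min(y + tile_size, yur)))
            x += tile_size
        y += tile_size
    return tiles


def joblist_lines(tiles, ensmem, outpath, startdate, enddate):
    """Build one extract_grid.py argument string per tile (assumed row format)."""
    return [
        f'--ensmem {ensmem} --outpath "{outpath}" '
        f"--xllcorner {x0} --yllcorner {y0} --xurcorner {x1} --yurcorner {y1} "
        f'--startdate "{startdate}" --enddate "{enddate}"'
        for (x0, y0, x1, y1) in tiles
    ]


# Example: a 20 km x 10 km box split into 10 km tiles gives two jobs.
tiles = tile_bounding_box(0, 0, 20000, 10000, 10000)
lines = joblist_lines(tiles, "01", "path/to/output/folder", "1981-01-01", "2079-12-31")
```

Writing "\n".join(lines) to a text file would give one job per row. Whether gridpoints lying exactly on a shared tile edge are extracted by both neighbouring jobs depends on how extract_grid.py treats the box edges, so it is worth checking for duplicate output files on a small test area first.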
