This archive is distributed in association with the journal Operations Research under the MIT License.
The software and data in this repository are a snapshot of the software and data that were used in the research reported in the paper Pricing Shared Rides by Chiwei Yan, Julia Yan, and Yifan Shen.
To cite the contents of this repository, please cite both the paper and this repo, using their respective DOIs.
https://doi.org/10.1287/opre.2023.0513
https://doi.org/10.1287/opre.2023.0513.cd
Below is the BibTeX entry for citing this snapshot of the repository.

```bibtex
@misc{yan2024pricing,
  author = {Yan, Chiwei and Yan, Julia and Shen, Yifan},
  publisher = {Operations Research},
  title = {{Pricing Shared Rides}},
  year = {2025},
  doi = {10.1287/opre.2023.0513.cd},
  note = {Available for download at https://github.com/ORJournal/2023.0513},
}
```
The goal of this repository is to replicate the computational experiments described in Section 4 of the paper Pricing Shared Rides by Chiwei Yan, Julia Yan, and Yifan Shen.
This project was developed and tested using Python 3.9.16. The following Python packages are required:
- numpy==1.24.3
- pandas==1.5.3
- gurobipy==11.0.2
- networkx==3.1
- osmnx==1.3.1
- matplotlib==3.7.1
- graph-tool==2.55
- tqdm
- jupyter
- ipython
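Before creating the environment, it can help to see which installed packages differ from the pins above. The sketch below uses only the standard library's importlib.metadata. The PINNED dictionary simply restates the list above; graph-tool is left out because it is installed through Conda rather than pip, so checking it by importing it is more reliable. The helper name find_mismatches is illustrative and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins restated from the requirements list above (graph-tool omitted, see lead-in;
# tqdm, jupyter and ipython are unpinned and therefore not compared).
PINNED = {
    "numpy": "1.24.3",
    "pandas": "1.5.3",
    "gurobipy": "11.0.2",
    "networkx": "3.1",
    "osmnx": "1.3.1",
    "matplotlib": "3.7.1",
}


def find_mismatches(pinned, get_version=version):
    """Return {package: (expected, installed)} for every pin that is not met.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every pinned package matches exactly.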
It is recommended to use Conda to manage the Python environment and install the dependencies. Run the following commands in your terminal to create and activate the Conda environment:
```bash
conda env create -f environment.yml
conda activate shared_pricing_env
```

Important:
- This project relies on the graph-tool package. For Linux and macOS users, graph-tool is installed via Conda by the commands above and does not need to be installed separately. For Windows users, please refer to the graph-tool installation instructions for guidance on using Docker.
- This project uses Gurobi as the optimization solver. You must activate a valid license (see Gurobi).
- The experiments were conducted on a Linux-based virtual machine with 8 CPU cores and 128 GB of RAM.
- It is recommended to replicate the results on a machine with at least 4 CPU cores and at least 25 GB of RAM.
- On the above machine, running the script for a single representative parameter setting (the complete training and evaluation process) typically took 7-10 hours, with peak memory usage of around 20 GB of RAM.
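A quick pre-flight check against the notes above can save a multi-hour run that fails late. This is a sketch under assumptions. RAM is read through POSIX os.sysconf, so it reports None on Windows. The 4-core and 25 GB thresholds are the recommendations stated above. The Gurobi check only confirms that a Gurobi environment can be started; a size-limited license (such as the one bundled with a pip-installed gurobipy) also passes it, even though it may not be enough for the full instances. All function names are hypothetical helpers, not part of src.

```python
import importlib.util
import os

MIN_CORES = 4     # recommended minimum stated in this README
MIN_RAM_GB = 25   # recommended minimum stated in this README


def total_ram_gb():
    """Physical memory in GB via POSIX sysconf, or None where unsupported (e.g. Windows)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        return None


def hardware_ok(cores, ram_gb, min_cores=MIN_CORES, min_ram_gb=MIN_RAM_GB):
    """True only if both values are known and meet the recommended minimums."""
    if cores is None or ram_gb is None:
        return False
    return cores >= min_cores and ram_gb >= min_ram_gb


def gurobi_license_ok():
    """Start (and immediately dispose of) a Gurobi environment; False if that fails."""
    if importlib.util.find_spec("gurobipy") is None:
        return False
    import gurobipy as gp

    try:
        env = gp.Env()
    except gp.GurobiError:
        return False
    env.dispose()
    return True


if __name__ == "__main__":
    cores, ram = os.cpu_count(), total_ram_gb()
    print(f"CPU cores: {cores}, RAM (GB): {ram}, meets recommendation: {hardware_ok(cores, ram)}")
    print("graph-tool importable:", importlib.util.find_spec("graph_tool") is not None)
    print("Gurobi environment starts:", gurobi_license_ok())
```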
Scripts for running experiments and visualizing results are provided in the scripts folder. The source code for the algorithms is located in the src folder, and the input data in the data folder. The plots (corresponding to Figures 9, 10, and 11 in the paper) are saved in the results folder.
- To run a quick example interactively, use the notebook scripts/computational_experiments.ipynb. It runs a quick demo on a small instance with only 10 rider types (compared to 244 in the full instance), which takes about 10 minutes and requires approximately 7 GB of RAM. To run a larger or the full instance, adjust the parameter n_quick in the notebook.
- To run all experiments under all parameter settings and generate the full results, run:
  ```bash
  python scripts/computational_experiments.py
  ```
  The results include the data in Tables 2 (test set performance metrics) and EC2 (training set performance metrics) of the paper, and the plots for Figures 9–11 of the paper (map visualizations for one representative setting, c = 0.9 USD/mile and sojourn_time = 300 seconds). To generate results for only a subset of parameter settings, change the parameters c_values and sojourn_time_values at the end of the script.
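If you trim c_values and sojourn_time_values, it can be worth estimating the total run time first. The sketch below enumerates every (c, sojourn_time) combination, assuming the script runs each pairing of the two lists, and scales the 7-10 hour per-setting figure reported above, assuming settings run one after another on comparable hardware. The example values are only the settings mentioned in this README, not the script's full grid, and log_name merely mirrors the sample log file name in the results folder; both helpers are illustrative.

```python
from itertools import product

# Example values: only the settings mentioned in this README
# (c in USD/mile, sojourn_time in seconds); the script's real grid may differ.
c_values = [0.7, 0.9]
sojourn_time_values = [30, 300]


def log_name(c, sojourn_time):
    """Mirror the sample log file name, e.g. log_c=0.7_sojourn_time=30.txt."""
    return f"log_c={c}_sojourn_time={sojourn_time}.txt"


def plan(c_values, sojourn_time_values, hours_per_setting=(7, 10)):
    """Return all (c, sojourn_time) pairs and a (low, high) total-hours estimate."""
    settings = list(product(c_values, sojourn_time_values))
    low, high = hours_per_setting
    return settings, (low * len(settings), high * len(settings))


if __name__ == "__main__":
    settings, (low, high) = plan(c_values, sojourn_time_values)
    for c, s in settings:
        print(log_name(c, s))
    print(f"{len(settings)} settings, roughly {low}-{high} hours if run sequentially")
```

For a single setting, plan([0.9], [300]) returns ([(0.9, 300)], (7, 10)).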
The scripts folder contains scripts for running the computational experiments and generating the results reported in the paper.
- computational_experiments.py runs the full set of experiments across all parameter settings and saves the results to the results folder.
- computational_experiments.ipynb is an interactive notebook version for a specific parameter setting. You can run a quick demo with fewer rider types or run a full instance by changing the parameters in the notebook.
The src folder contains the source files implementing the algorithms:
- instance_data.py contains the class that stores the input data and parameters for each instance.
- network.py contains the class for modeling the network and computing distances.
- policies.py contains the class that optimizes the pricing and matching policies.
- simulation.py contains the class that simulates shared-ride operations under given policies.
- evaluator.py contains the class that evaluates the performance metrics of the policies based on simulation results.
- utils.py contains utility functions.
The data folder contains the input data used in the experiments.
- Chicago_network.pickle contains the Chicago road network data, processed from OpenStreetMap.
- Chicago_zone.pickle contains the zone data for Chicago, which includes:
  - shapes_gdf: the original 76 community areas of Chicago (excluding O’Hare International Airport), from CARTO.
  - clusters_gdf: the 42 zones aggregated via k-means clustering.
- Chicago_rider_types.csv contains the aggregated rider-type data for Chicago. Each rider type corresponds to a pick-up node, a drop-off node, and an arrival rate (arrivals per second).
  - The rider types are aggregated from real Chicago shared-ride request data over an eight-week horizon in October and November 2019, during Monday morning peak hours (7:30-8:30 a.m., from 2019-10-07 to 2019-11-25). The original ride-sharing data are available at the Chicago Data Portal.
  - A two-step clustering method is used to construct the 244 rider types:
    - The 76 original community areas of Chicago are grouped into 42 aggregated zones using k-means clustering (as in Chicago_zone.pickle).
    - Within each pick-up zone, riders are further grouped by their drop-off locations using another round of k-means clustering.
  - Each resulting rider type has at least 5 trip records in the training dataset.
- Chicago_demand.pickle contains the arrival data for the training set (weeks 1-7) and the test set (week 8), which includes:
  - arrival_types: maps the dataset (training or test) and date to a list of rider types.
  - arrival_times: maps the dataset (training or test) and date to a list of arrival times (0 to 3600 seconds).
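The two pickled mappings relate to the per-second arrival rates in Chicago_rider_types.csv through simple arithmetic: over a one-hour window observed on D dates, a rider type that arrives n times has an empirical rate of n / (D * 3600) arrivals per second. The sketch below assumes, without having checked the file, that Chicago_demand.pickle unpickles to a dict with keys "arrival_types" and "arrival_times", and that each maps a dataset name to a {date: list} dictionary. It illustrates the arithmetic only; it is not necessarily how the rates in the CSV were computed, and load_demand and empirical_rates are hypothetical helpers.

```python
import pickle
from collections import Counter

HORIZON_SECONDS = 3600  # one peak hour (7:30-8:30 a.m.) per date


def load_demand(path="data/Chicago_demand.pickle"):
    """Unpickle the demand data (assumed: dict with 'arrival_types' and 'arrival_times')."""
    with open(path, "rb") as f:
        return pickle.load(f)


def empirical_rates(arrival_types, dataset="training"):
    """Arrivals per second for each rider type, averaged over all dates in `dataset`.

    Assumes arrival_types[dataset][date] is a list of rider-type ids.
    """
    by_date = arrival_types[dataset]
    counts = Counter(t for types in by_date.values() for t in types)
    exposure = len(by_date) * HORIZON_SECONDS  # total observed seconds
    return {t: n / exposure for t, n in counts.items()}
```

For instance, 3 arrivals of one type across two training Mondays give 3 / 7200 ≈ 0.00042 arrivals per second.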
The results folder stores the results of the computational experiments, including:
- tables/ contains the data for Tables 2 and EC2 in the paper. Each CSV file corresponds to a specific parameter setting (c and sojourn_time) under the training or test dataset.
- figures/ contains the map visualizations for Figures 9, 10, and 11 in the paper. Note that Figure 11(a) only shows the demand density distribution and is not part of the computational experiment results, so it is not included.
- log_c=0.7_sojourn_time=30.txt is a sample log file generated during the execution of computational_experiments.py under the parameter setting c = 0.7 USD/mile and sojourn_time = 30 seconds.
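To compare parameter settings side by side, the per-setting CSV files can be stacked into one table. This sketch makes no assumption about the columns, but it does assume, hypothetically, that the file names embed the parameters in the same c=..._sojourn_time=... form as the sample log file. Files that do not match get NaN for c and sojourn_time, so adjust PARAM_RE to the actual names. collect_tables is an illustrative helper, not part of the repository.

```python
import re
from pathlib import Path

import pandas as pd

# Hypothetical naming pattern borrowed from the sample log file
# (log_c=0.7_sojourn_time=30.txt); check it against the real CSV names.
PARAM_RE = re.compile(r"c=(?P<c>[0-9.]+)_sojourn_time=(?P<sojourn_time>[0-9]+)")


def collect_tables(folder="results/tables"):
    """Concatenate every CSV in `folder`, tagging rows with file name and parsed parameters."""
    frames = []
    for path in sorted(Path(folder).glob("*.csv")):
        df = pd.read_csv(path)
        df["source_file"] = path.name
        match = PARAM_RE.search(path.stem)
        # Parameters become NaN when the file name does not follow the assumed pattern.
        df["c"] = float(match["c"]) if match else float("nan")
        df["sojourn_time"] = float(match["sojourn_time"]) if match else float("nan")
        frames.append(df)
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```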
