Restarting with overset #700
Open
Description
Quinoa uses Charm's collide library for mesh-to-mesh solution transfer on overset meshes. That library does not implement checkpointing, so any checkpoint-restarted quinoa run that involves mesh-to-mesh transfer (i.e. overset meshes) fails. Early attempts to fix this are on the migratable-collidev701 branch (https://github.com/adityakpandare/charm/tree/migratable-collidev701). With the charm changes on that branch, the restarted run deadlocks.
Here are the steps to reproduce the deadlock and obtain a stack trace:
- Build charm's 'migratable-collidev701' branch using the following command:
  ./buildold LIBS mpi-linux-x86_64 --enable-randomized-msgq --with-prio-type=int --enable-error-checking
- Build quinoa-tpl using the following cmake config:
  cmake -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpif90 -DCMAKE_BUILD_TYPE=Debug -DCHARM_ROOT=<path-where-charm-migratable-collidev701-installed> ../
- Build quinoa's 'overset_migratablecollide' branch (https://github.com/quinoacomputing/quinoa/tree/overset_migratablecollide) with the following cmake config:
  cmake -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_BUILD_TYPE=Debug -DCHARM_ROOT=<path-where-charm-migratable-collidev701-installed> -DTPL_DIR=<tpl-installdir-path> ../src/
- Run quinoa (regular, non-restarted run):
  ../quinoa/build/debug-overset-restart/Main/inciter -c control_file.lua -v
- Next, attempt to restart the above run:
  ../quinoa/build/debug-overset-restart/Main/inciter -c restart_control_file.lua +restart ./restart/ -v
The restarted run deadlocks at the first point where the collide library is invoked.
Fixing this issue will require changes to Charm's collide library.
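Because the restart step hangs rather than exiting, it can help to run it under a wall-clock limit while reproducing. The following is a minimal sketch, not part of quinoa or charm: `run_with_deadline` is a hypothetical helper name, and it assumes `timeout` from GNU coreutils is available, which exits with status 124 when the limit is reached.

```shell
#!/bin/sh
# Hypothetical helper (not part of quinoa or charm): run a command under a
# wall-clock limit so that a deadlocked run is reported instead of hanging.
# Assumes `timeout` from GNU coreutils, which exits with status 124 when
# the time limit is reached.
run_with_deadline() {
    limit="$1"   # time limit in seconds
    shift        # the remaining arguments are the command to run
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "command exceeded ${limit}s (possible deadlock): $*" >&2
    fi
    return "$status"
}
```

For example, `run_with_deadline 600 ../quinoa/build/debug-overset-restart/Main/inciter -c restart_control_file.lua +restart ./restart/ -v` would return 124 if the restart has not finished after ten minutes. The 600-second limit is an arbitrary choice for illustration and should exceed the runtime of a healthy restart.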