Restarting with overset #700

@adityakpandare

Quinoa uses Charm++'s collide library for mesh-to-mesh solution transfer on overset meshes. That library does not implement checkpointing, so any checkpoint-restarted Quinoa run that involves mesh-to-mesh transfer (i.e. overset meshes) fails. Early attempts at a fix are on the migratable-collidev701 branch (https://github.com/adityakpandare/charm/tree/migratable-collidev701). With the Charm++ changes on that branch, the restarted run deadlocks.

Here are the steps to reproduce the deadlock, and a stack trace:

  1. Build Charm++'s 'migratable-collidev701' branch using the following command: ./buildold LIBS mpi-linux-x86_64 --enable-randomized-msgq --with-prio-type=int --enable-error-checking
  2. Build quinoa-tpl using the following cmake configuration: cmake -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpif90 -DCMAKE_BUILD_TYPE=Debug -DCHARM_ROOT=<path-where-charm-migratable-collidev701-installed> ../
  3. Build Quinoa's 'overset_migratablecollide' branch (https://github.com/quinoacomputing/quinoa/tree/overset_migratablecollide) using the following cmake configuration: cmake -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_BUILD_TYPE=Debug -DCHARM_ROOT=<path-where-charm-migratable-collidev701-installed> -DTPL_DIR=<tpl-installdir-path> ../src/
  4. Run Quinoa (regular, non-restarted run): ../quinoa/build/debug-overset-restart/Main/inciter -c control_file.lua -v
  5. Attempt to restart the above run: ../quinoa/build/debug-overset-restart/Main/inciter -c restart_control_file.lua +restart ./restart/ -v

The restarted run deadlocks at the first point where the collide library is invoked.
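
Because a deadlocked run never exits on its own, it can help to wrap the restart command in a time limit so the hang shows up as a distinct exit code. The following is a minimal sketch, not part of Quinoa or Charm++: the helper name run_with_limit and the 600-second limit are illustrative assumptions, and it relies on the `timeout` utility from GNU coreutils, which exits with status 124 when the limit expires.

```shell
#!/usr/bin/env bash
# Sketch: run a command under a time limit so a hang is reported instead of
# blocking forever. run_with_limit is a hypothetical helper name.

run_with_limit() {
    local limit="$1" status
    shift
    # GNU coreutils `timeout` sends SIGTERM once the limit (in seconds)
    # expires and then exits with status 124.
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "TIMEOUT after ${limit}s (possible deadlock): $*" >&2
    fi
    return "$status"
}

# Example with the restart command from step 5 (the 600 s limit is assumed):
# run_with_limit 600 ../quinoa/build/debug-overset-restart/Main/inciter \
#     -c restart_control_file.lua +restart ./restart/ -v
```

An exit status of 124 then indicates that the run was still going when the limit expired. To capture a stack trace, one option is to attach a debugger to the hung process before the limit expires, for example with gdb -p <pid> -batch -ex "thread apply all bt".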

Fixing this issue will require changes to Charm++'s collide library.
