
Segmentation fault when running with multi-threads #128

@WernherwW

Hello, all. When I run my code (based on the STRUMPACK solver) on CPUs with more than one thread, it crashes once the matrix to solve gets moderately large (memory consumption on the order of GB). The error log only shows
Segmentation fault
or
Segmentation error - invalid memory reference
The output log of my code ends with

# multifrontal factorization:
#   - estimated memory usage (exact solver) = 6483.55 MB
#   - minimum pivot, sqrt(eps)*|A|_1 = 9.91058e-07
#   - replacing of small pivots is not enabled

I tried recompiling STRUMPACK with both CMake and EasyBuild, but neither fixed the problem. If I set OMP_NUM_THREADS=1, the code works well until the problem size exceeds the memory of one node (at which point it ends with an out-of-memory error, as expected).

This error is machine dependent: it only started after I moved to a new HPC system, LUMI-C (128 cores per node, 2 GB of memory per core). The memory available in my multi-threaded runs should be more than enough compared to what I used on previous machines. I suspect a problem with STRUMPACK's multithreading on LUMI-C, but I could not resolve the issue with LUMI-C's helpdesk. I would greatly appreciate any clues.

For more information, a similar segmentation fault also happens when I run test_BLR_mpi from the STRUMPACK examples. Once the matrix size N reaches 100000, the test crashes with a segmentation fault. More specifically, I tested OMP_NUM_THREADS=128 (2 nodes * 1 task per node * 128 cores per task) and OMP_NUM_THREADS=1 (2 nodes * 128 tasks per node * 1 core per task). Both work for N=60000, while for N=100000 the OMP_NUM_THREADS=128 case crashes with a segmentation fault and the OMP_NUM_THREADS=1 case crashes with out of memory.
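For readers comparing the two layouts, a rough back-of-envelope sketch of the per-rank footprint follows. It assumes (an assumption, not taken from the STRUMPACK source) that test_BLR_mpi holds at least one dense N x N matrix of 8-byte doubles split evenly over the MPI ranks, and it uses the LUMI-C figures quoted above (128 cores and 2 GB per core, so 256 GB per node). Since the 128-rank run at N=100000 still ran out of memory even though one dense copy is only about 0.3 GB per rank there, the real footprint is clearly much larger, so treat these numbers as a lower bound. The column comparing elements per rank with the largest signed 32-bit integer is only a hint worth checking, not a diagnosis; the helper `per_rank_footprint` is hypothetical.

```python
# Back-of-envelope sketch (hypothetical helper). Assumes one dense N x N matrix
# of 8-byte doubles, split evenly over all MPI ranks; the real test may hold
# extra copies and workspace, so these figures are a lower bound.

INT32_MAX = 2**31 - 1      # largest signed 32-bit integer
BYTES_PER_DOUBLE = 8
GB = 1e9

# LUMI-C figures quoted in the issue: 128 cores per node, 2 GB per core.
NODE_MEM_GB = 128 * 2


def per_rank_footprint(n, nodes, ranks_per_node):
    """Return (elements per rank, GB per rank, GB available per rank)."""
    ranks = nodes * ranks_per_node
    elems = n * n // ranks                # even split, ignoring block remainders
    gb = elems * BYTES_PER_DOUBLE / GB    # memory for one dense copy
    avail = NODE_MEM_GB / ranks_per_node  # node memory shared by its ranks
    return elems, gb, avail


if __name__ == "__main__":
    for n in (60_000, 100_000):
        for label, rpn in (("1 rank x 128 threads", 1), ("128 ranks x 1 thread", 128)):
            elems, gb, avail = per_rank_footprint(n, nodes=2, ranks_per_node=rpn)
            print(f"N={n:>7}, {label}: {elems:.3e} elements/rank "
                  f"({gb:.1f} GB of {avail:.0f} GB), "
                  f"exceeds INT32_MAX: {elems > INT32_MAX}")
```

Under these assumptions, with 2 ranks the local element count is about 1.8e9 for N=60000 (below INT32_MAX) and 5e9 for N=100000 (above it), which happens to match where the threaded run starts failing.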

Many thanks in advance.

Best regards,

Chizhou