
Inconsistent results with irlba even with set.seed #76

@dnzdmrcgl


We are experiencing issues with Seurat's RunPCA function, which we traced to its irlba call. As we request more PCs (which translates to larger nv and work values), the PCs become inconsistent. We found that even with the same nv value, larger work values lead to this behavior. Below are sample code to reproduce the problem and a link to the input matrix.
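A = as.matrix(read.csv("/path/to/input_matrix.csv"))
# Keep nv fixed and vary only the working subspace dimension (work)
log = sapply(seq(15, 25, by = 1), function(x) {
  set.seed(42)  # reset the RNG so every run uses the same random start vector
  tmp = irlba::irlba(A = A, nv = 10, work = x)
  cat(paste0("Current work value: ", x, " - d values: [", paste0(tmp$d, collapse = ", "), "]\n"))
})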
which gives the following output:

Current work value: 15 - d values: [364.515240273598, 230.851108439113, 198.782360784733, 192.484410005171, 162.868907525397, 130.710484581722, 106.207022225521, 99.8805706613859, 94.8473720662265, 94.2177890885707]
Current work value: 16 - d values: [364.515240273598, 230.851108439113, 198.782360784733, 192.48441000517, 162.868907525397, 130.710484581722, 106.207022225521, 99.8805706613858, 94.8473720655719, 94.2177890013402]
Current work value: 17 - d values: [364.515240273598, 230.851108439113, 198.782360784733, 192.48441000517, 162.868907525396, 130.710484581722, 106.207022225521, 99.8805706613859, 94.8473720663312, 94.2177891453379]
Current work value: 18 - d values: [364.515240273597, 230.851108439113, 198.782360784733, 192.48441000517, 162.868907525397, 130.710484581721, 106.207022225521, 99.8805706613861, 94.8473720663872, 94.2177891570991]
Current work value: 19 - d values: [364.515240273598, 230.851108439113, 198.782360784733, 192.48441000517, 162.868907525396, 130.710484581722, 106.20702222552, 99.8805706613862, 94.8473720662433, 94.2177891383719]
Current work value: 20 - d values: [364.515240273597, 230.851108439113, 198.782360784733, 192.48441000517, 162.868907525397, 130.710484581721, 106.20702222552, 99.8805706613856, 94.8473720661785, 94.2177891316728]
Current work value: 21 - d values: [364.515240273598, 230.851108439112, 198.782360784733, 192.48441000517, 162.868907525397, 130.710484581722, 106.20702222552, 99.8805706613861, 94.847372066395, 94.2177891611964]
Current work value: 22 - d values: [416.015144115659, 414.710365588711, 389.658535862618, 364.515240273598, 319.68042641293, 260.294111746797, 230.851108439114, 205.688142252177, 198.782360784734, 192.484410005169]
Current work value: 23 - d values: [364.515240273598, 230.851108439113, 198.782360784733, 192.769425656996, 192.48441000517, 162.868907525397, 162.1301500304, 153.485449556037, 130.710484581723, 129.960818263765]
Current work value: 24 - d values: [364.515240273597, 268.676822519523, 230.851108439113, 209.914356761244, 198.782360784733, 197.549569951056, 194.709944290627, 192.48441000517, 189.414109298803, 162.868907525397]
Current work value: 25 - d values: [364.515240273598, 261.850940373684, 259.812079766646, 230.851108439112, 222.914990119514, 198.782360784733, 192.48441000517, 188.158295229676, 186.037081366921, 162.868907525396]

As you can see, once the work value reaches 22 the results become inconsistent: singular values that are absent from the earlier runs appear (e.g. 416.015... ahead of 364.515...), and some of the previously found values are pushed further down the list. What could be the problem here? We are testing this in a Singularity container with irlba installed. We only see this behaviour on one of our compute nodes and not on the others, even though they use the same container and the nodes are built from the same image/kernel. The sessionInfo() output is the following and is the same on both nodes:

sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.4.0 Matrix_1.7-0 grid_4.4.0 irlba_2.3.5.1 lattice_0.22-6
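The BLAS reported above is the pthreads build of OpenBLAS. One possibility worth ruling out (a hypothesis, not something the output above confirms) is that the node-specific difference comes from the BLAS layer, since multithreaded BLAS can change the floating-point summation order and OpenBLAS may select different kernels on different CPU models. A minimal sketch for rerunning one case with BLAS limited to a single thread, assuming the RhpcBLASctl package is installed (it is not part of the setup described here); `with_blas_threads` is a hypothetical helper name:

```r
# Hypothetical helper. Assumption: the RhpcBLASctl package is installed;
# it is not part of the container described in this report.
with_blas_threads <- function(threads, fun) {
  if (!requireNamespace("RhpcBLASctl", quietly = TRUE)) {
    stop("RhpcBLASctl is required for this check")
  }
  old <- RhpcBLASctl::blas_get_num_procs()
  RhpcBLASctl::blas_set_num_threads(threads)
  # Restore the previous BLAS thread count even if fun() fails.
  on.exit(RhpcBLASctl::blas_set_num_threads(old), add = TRUE)
  fun()
}

# Usage: repeat the work = 22 case with single-threaded BLAS.
# with_blas_threads(1, function() {
#   set.seed(42)
#   irlba::irlba(A = A, nv = 10, work = 22)$d
# })
```

If the single-threaded run on the affected node matches the other nodes, that would point towards the BLAS configuration rather than irlba itself.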

The input CSV file is available here: input_matrix.csv. Please let me know if you need any other information.
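To tell which of the runs above are correct on the affected node, one could compare each irlba result against a reference computed with base R's dense `svd()`. The sketch below does this; the helper names and the `1e-6` tolerance are our own assumptions, and a full dense SVD is only practical if the matrix is small enough. Passing `fastpath` through to `irlba::irlba()` also lets the C and pure-R code paths be compared.

```r
# Hypothetical diagnostic helpers (not part of irlba or Seurat).

# Leading k singular values from base R's dense SVD, used as a reference.
reference_sv <- function(A, k) {
  svd(A, nu = 0, nv = 0)$d[seq_len(k)]
}

# Relative error of each approximate singular value against the reference.
relative_sv_error <- function(d_approx, d_exact) {
  k <- length(d_approx)
  abs(d_approx - d_exact[seq_len(k)]) / d_exact[seq_len(k)]
}

# Re-run the reproduction loop and flag work values whose result disagrees
# with the reference by more than tol (an arbitrary threshold).
check_work_values <- function(A, nv = 10, works = 15:25, tol = 1e-6,
                              fastpath = TRUE) {
  d_exact <- reference_sv(A, nv)
  for (w in works) {
    set.seed(42)
    res <- irlba::irlba(A = A, nv = nv, work = w, fastpath = fastpath)
    err <- max(relative_sv_error(res$d, d_exact))
    flag <- if (err > tol) "  <-- mismatch" else ""
    cat(sprintf("work = %d: max relative error = %.3g%s\n", w, err, flag))
  }
  invisible(NULL)
}

# Usage, with A loaded as in the reproduction code:
# check_work_values(A)
# check_work_values(A, fastpath = FALSE)
```

Running the same check on a node that behaves correctly gives a baseline for comparison.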
