
Huge processing times for CollapseSeq #89

@ssnn-airr

Original report by Santiago Revale (Bitbucket: [Santiago Revale](https://bitbucket.org/Santiago Revale), ).


Hi there!

I have been running Presto successfully on samples from MiSeq runs. Recently I got a few NextSeq runs to put through the pipeline. Most of the samples were processed without any issues, but for a few of them the CollapseSeq step alone took a very long time (33 to 62 hours).

What bothered me most was that three samples took a similar amount of time (33 to 37 hours) while one took over 62 hours. I looked at the numbers to see whether anything predicted which samples would take longer (e.g., more raw reads meaning a longer running time), but I could not find a pattern. Here are the numbers I collected:

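| Sample  | Running time (h:mm:ss) | Raw reads  | Contributing reads | Unique sequences | Unique CDR3 |
|---------|-----------------------:|-----------:|-------------------:|-----------------:|------------:|
| Sample1 | 33:41:44               |  6,256,670 |          4,779,720 |          737,838 |     581,965 |
| Sample2 | 34:29:56               |  3,418,984 |          2,797,692 |          638,508 |     452,911 |
| Sample3 | 37:34:06               | 10,758,170 |          8,810,811 |          715,579 |     497,400 |
| Sample4 | 62:36:16               |  3,501,513 |          2,783,129 |          885,801 |     691,839 |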

The only thing that sets Sample4 apart is that it has more unique sequences than the others, although the difference is not proportional to the extra time it took to process them.
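To put a number on "not proportional", the small Python sketch below converts the running times to hours and compares each sample's slowdown relative to Sample1 with its ratio of unique sequences. It also includes that ratio squared, as a rough yardstick for a step that compares sequences pairwise. The helper names `to_hours` and `scaling_vs` are hypothetical, and the values are copied from the table above.

```python
# Running time (h:mm:ss) and unique sequence counts, copied from the table above.
samples = {
    "Sample1": ("33:41:44", 737_838),
    "Sample2": ("34:29:56", 638_508),
    "Sample3": ("37:34:06", 715_579),
    "Sample4": ("62:36:16", 885_801),
}


def to_hours(hms: str) -> float:
    """Convert an 'h:mm:ss' string to fractional hours."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h + m / 60 + s / 3600


def scaling_vs(reference: str) -> dict[str, tuple[float, float, float]]:
    """Return (time ratio, unique-sequence ratio, squared unique-sequence ratio)
    for every sample, relative to the reference sample."""
    ref_time = to_hours(samples[reference][0])
    ref_unique = samples[reference][1]
    out = {}
    for name, (hms, unique) in samples.items():
        u = unique / ref_unique
        out[name] = (to_hours(hms) / ref_time, u, u * u)
    return out


if __name__ == "__main__":
    for name, (t, u, u2) in scaling_vs("Sample1").items():
        print(f"{name}: time x{t:.2f}, unique x{u:.2f}, unique^2 x{u2:.2f}")
```

Relative to Sample1, Sample4 took about 1.86 times as long while having about 1.20 times as many unique sequences (about 1.44 when squared). By this arithmetic, even a purely quadratic dependence on the number of unique sequences would not account for the whole difference.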

I would really appreciate any tips on what could be going on here. That way I could anticipate when this will happen in the future, or at least explain why it happened.

Here is some additional info:

```sh
# Presto Version: 0.6.0 (from the DockerHub immcantation/suite:4.0.0)

# Command used
CollapseSeq.py \
  -s "Sample4_consensus-pass.fasta" \
  -n 5 \
  --uf BARCODE C_CALL \
  --cf CONSCOUNT \
  --act sum \
  --inner \
  --outname "Sample4"
```
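One hypothesis that might be worth checking has not been confirmed against the Presto source. With `-n 5`, sequences that contain missing characters cannot be collapsed by exact hashing alone, so a sequence without an exact match may have to be compared against every unique sequence kept so far. The Python sketch below is a hypothetical model of that idea, not CollapseSeq's actual code. The helpers `seq_equal`, `count_missing`, `collapse` and `distinct_seqs`, the wildcard set `N-.`, and the fewest-missing-first processing order are all assumptions. The barcode and C_CALL values stand in for `--uf BARCODE C_CALL`, and the sketch counts how many pairwise comparisons the linear scan performs.

```python
# Characters treated as "missing" wildcards (assumption for this sketch).
MISSING = "N-."


def seq_equal(a: str, b: str) -> bool:
    """Equal-length sequences match if they agree wherever neither has a missing character."""
    if len(a) != len(b):
        return False
    return all(x == y or x in MISSING or y in MISSING for x, y in zip(a, b))


def count_missing(seq: str, inner: bool) -> int:
    """Count missing characters; if inner is True, ignore missing runs at either end."""
    core = seq.strip(MISSING) if inner else seq
    return sum(c in MISSING for c in core)


def collapse(records, max_missing=5, inner=True):
    """Naively collapse (seq, barcode, c_call) tuples (hypothetical model).

    Returns (dict of unique key -> records collapsed into it, number of scan comparisons).
    """
    unique = {}
    comparisons = 0
    # Fewest missing characters first, so cleaner sequences become the representatives.
    for key in sorted(records, key=lambda r: count_missing(r[0], inner)):
        if count_missing(key[0], inner) > max_missing:
            continue  # too many missing characters: treated as undetermined
        if key in unique:  # cheap exact match via hashing
            unique[key] += 1
            continue
        match = None
        for other in unique:  # linear scan for an ambiguity-aware match
            comparisons += 1
            if key[1:] == other[1:] and seq_equal(key[0], other[0]):
                match = other
                break
        if match is None:
            unique[key] = 1  # no match: new unique sequence
        else:
            unique[match] += 1
    return unique, comparisons


def distinct_seqs(n: int, length: int = 12) -> list[str]:
    """Generate n distinct DNA strings without missing characters (base-4 encoding of 0..n-1)."""
    return ["".join("ACGT"[(i >> (2 * k)) & 3] for k in range(length)) for i in range(n)]


if __name__ == "__main__":
    for n in (500, 1_000, 2_000):
        records = [(s, "BC1", "IGHM") for s in distinct_seqs(n)]
        _, comps = collapse(records)
        print(f"{n:,} unique sequences -> {comps:,} comparisons")
```

In this model, the scan cost grows with the square of the number of unique sequences. For 500, 1,000 and 2,000 distinct sequences, the demo reports 124,750, 499,500 and 1,999,000 comparisons. Profiling or input from the maintainers would be needed to confirm whether CollapseSeq actually scales this way.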

Thank you very much in advance.

Cheers!

Labels: bug (Something isn't working), major
