Dear creators,
Thank you for creating SIRVsuite. It looks like a great tool, and I have tried using it on my own data.
For our experiment, we used the SIRV4 spike-in set. Now I would like to use your suite to recover the quantification and coverage of the spike-ins.
However, while doing this, I ran into a few uncertainties.
Your documentation says that for the suite to work, I need an already aligned BAM file for my data, as well as a path to the quantification. I have some questions about both:
- For the aligned BAM file, I need to align my reads to a combined reference genome that includes the SIRV annotation. I have done this already, but now I am unsure whether I need to use the whole BAM file, including all reads aligned to the human reference genome, or only the reads that align uniquely to the SIRVome.
- For the quantification path:
Is it possible to run the suite with featureCounts quantification? As I have used featureCounts for all of my quantification so far, I would love to continue using it. So far, I have run featureCounts at the transcript level to obtain raw counts, and then tried to calculate FPKM manually from the raw counts of my SIRV spike-ins.
For FPKM, the raw counts also need to be divided by the library size, but I am unsure what exactly should be used as the library size. Do you mean all reads together, including the reads that map to the human reference genome? Or does the library size consist only of the sum of the counts that map uniquely to the SIRVs?
So far, I have calculated FPKMs based only on the sum of the counts that map uniquely to the SIRVs.
- When I run the complete SIRVsuite on my makeshift FPKM file derived from featureCounts, I get the following error:
/home/agrinko/miniconda3/envs/SIRV_analysis/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3373: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
/home/agrinko/miniconda3/envs/SIRV_analysis/lib/python3.6/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
2025-04-17 15:16:25 INFO SIRV_concentration: creating SIRV E0 concentration boxplot
Traceback (most recent call last):
File "/home/agrinko/miniconda3/envs/SIRV_analysis/bin/SIRVsuite", line 8, in <module>
sys.exit(main())
File "/home/agrinko/miniconda3/envs/SIRV_analysis/lib/python3.6/site-packages/SIRVsuite/SIRVsuite.py", line 70, in main
module_concentration.create_sirvsuite_boxplot(module_concentration.data)
File "/home/agrinko/miniconda3/envs/SIRV_analysis/lib/python3.6/site-packages/SIRVsuite/Pipeline/Concentration/SIRV_concentration.py", line 224, in create_sirvsuite_boxplot
limit_x = heatmap_matrix.max()
File "/home/agrinko/miniconda3/envs/SIRV_analysis/lib/python3.6/site-packages/numpy/core/_methods.py", line 39, in _amax
return umr_maximum(a, axis, None, out, keepdims, initial, where)
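To make the library-size question from the quantification point above concrete, here is a minimal sketch of the manual FPKM calculation. The transcript IDs, counts, lengths, and totals are hypothetical placeholder values, not real data, and the function name `fpkm` is only for illustration. The sketch does not settle which denominator SIRVsuite expects; it only shows how strongly the choice changes the values.

```python
def fpkm(count, length_bp, library_size):
    """FPKM = fragments * 1e9 / (transcript length in bp * library size)."""
    if length_bp <= 0 or library_size <= 0:
        raise ValueError("length_bp and library_size must be positive")
    return count * 1e9 / (length_bp * library_size)


# Hypothetical raw counts and transcript lengths (placeholders, not real SIRV values).
counts = {"tx_a": 120, "tx_b": 380}
lengths_bp = {"tx_a": 1500, "tx_b": 2000}

# Option 1: library size = sum of counts assigned to the spike-ins only.
spike_in_total = sum(counts.values())

# Option 2: library size = all assigned fragments, human reads included
# (hypothetical total).
all_assigned_total = 20_000_000

fpkm_spike_in_only = {
    tx: fpkm(c, lengths_bp[tx], spike_in_total) for tx, c in counts.items()
}
fpkm_all_reads = {
    tx: fpkm(c, lengths_bp[tx], all_assigned_total) for tx, c in counts.items()
}

for tx in counts:
    print(tx, fpkm_spike_in_only[tx], fpkm_all_reads[tx])
```

The two variants differ by a constant factor (the ratio of the two library sizes), so relative abundances among the spike-ins are unchanged; only the absolute scale shifts.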
This is what part of my quantification file looks like:
gene_id tracking_id FPKM_CHN
SIRV101 SIRV1 0.000000
SIRV102 SIRV1 0.000000
SIRV103 SIRV1 0.000000
SIRV105 SIRV1 0.000000
SIRV106 SIRV1 0.000000
SIRV107 SIRV1 0.000000
SIRV108 SIRV1 0.000000
SIRV109 SIRV1 0.000000
SIRV201 SIRV2 0.000000
SIRV202 SIRV2 0.000000
SIRV203 SIRV2 0.000000
SIRV204 SIRV2 0.000000
SIRV205 SIRV2 0.000000
SIRV206 SIRV2 0.000000
SIRV301 SIRV3 0.000000
SIRV302 SIRV3 0.000000
SIRV303 SIRV3 0.000000
SIRV304 SIRV3 0.000000
SIRV305 SIRV3 0.000000
SIRV306 SIRV3 160.325887
SIRV307 SIRV3 0.000000
SIRV308 SIRV3 0.000000
SIRV309 SIRV3 508.151539
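As a quick check of a table like the one above, the following minimal sketch counts how many rows carry a non-zero value. It assumes whitespace-separated columns with one header line and the value in the last column; the function name `summarize_quant` is hypothetical, and the excerpt is copied from four rows of the table. Whether the many zero rows are related to the error is an open question, not something this check establishes.

```python
def summarize_quant(text):
    """Count rows and non-zero values in a whitespace-separated table."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    # The value is assumed to sit in the last column of each row.
    nonzero_ids = [row[0] for row in rows if float(row[-1]) > 0.0]
    return {
        "columns": header,
        "n_rows": len(rows),
        "n_nonzero": len(nonzero_ids),
        "nonzero_ids": nonzero_ids,
    }


# Four rows copied from the table above, plus its header line.
QUANT_EXCERPT = """
gene_id tracking_id FPKM_CHN
SIRV305 SIRV3 0.000000
SIRV306 SIRV3 160.325887
SIRV307 SIRV3 0.000000
SIRV309 SIRV3 508.151539
"""

summary = summarize_quant(QUANT_EXCERPT)
print(summary)
```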
I apologize for the many questions and any inconvenience. I would greatly appreciate any help, as I would love to use your tool in the future.
Best wishes,
Anastasiya