Commit d7183dd

Typo fixes
1 parent ab4eb32 commit d7183dd

12 files changed: +55 -55 lines changed

RNAseq/Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md

Lines changed: 4 additions & 4 deletions
@@ -202,7 +202,7 @@ fastqc -o /path/to/raw_fastqc/output/directory *.fastq.gz
 **Parameter Definitions:**

 - `-o` – the output directory to store results
-- `*.fastq.gz` – the input reads are specified as a positional argument, and can be given all at once with wildcards like this, or as individual arguments with spaces inbetween them
+- `*.fastq.gz` – the input reads are specified as a positional argument, and can be given all at once with wildcards like this, or as individual arguments with spaces in between them

 **Input Data:**

@@ -291,7 +291,7 @@ fastqc -o /path/to/trimmed_fastqc/output/directory *.fastq.gz
 **Parameter Definitions:**

 - `-o` – the output directory to store results
-- `*.fastq.gz` – the input reads are specified as a positional argument, and can be given all at once with wildcards like this, or as individual arguments with spaces inbetween them
+- `*.fastq.gz` – the input reads are specified as a positional argument, and can be given all at once with wildcards like this, or as individual arguments with spaces in between them

 **Input Data:**

@@ -2218,7 +2218,7 @@ ERCCcounts.to_csv('ERCC_analysis/ERCCcounts_GLbulkRNAseq.csv')
 - ERCC_analysis/ERCC_stats_GLDS-*_GLbulkRNAseq.csv (Samplewise counts statistics table containing 'Min', 'Max', 'Dynamic range', 'R')
 - ERCC_analysis/ERCC_dynrange_GLDS-*_GLbulkRNAseq.csv (Samplewise counts statistics subset table containing 'Dynamic range')
 - ERCC_analysis/ERCC_rsq_GLDS-*_GLbulkRNAseq.csv (Samplewise counts statistics subset table containing 'R')
-- ERCC_analysis/ERCCmetadata_GLbulkRNAseq.csv (Samplewise metadata table inlcuding ERCC mix number)
+- ERCC_analysis/ERCCmetadata_GLbulkRNAseq.csv (Samplewise metadata table including ERCC mix number)
 - ERCC_analysis/ERCCcounts_GLbulkRNAseq.csv (Samplewise ERCC counts table)

 <br>
@@ -2283,7 +2283,7 @@ write.csv(normcounts, 'ERCC_analysis/ERCC_normcounts_GLbulkRNAseq.csv') #OUTPUT

 **Input Data:**

-- ERCC_analysis/ERCCmetadata_GLbulkRNAseq.csv (samplewise metadata table inlcuding ERCC mix number, output from [Step 10a](#10a-evaluate-ercc-count-data-in-python))
+- ERCC_analysis/ERCCmetadata_GLbulkRNAseq.csv (samplewise metadata table including ERCC mix number, output from [Step 10a](#10a-evaluate-ercc-count-data-in-python))
 - ERCC_analysis/ERCCcounts_GLbulkRNAseq.csv (samplewise ERCC counts table, output from [Step 10a](#10a-evaluate-ercc-count-data-in-python))

 **Output Data:**

RNAseq/Pipeline_GL-DPPD-7115_Versions/GL-DPPD-7115.md

Lines changed: 4 additions & 4 deletions
@@ -141,7 +141,7 @@ fastqc -o /path/to/raw_fastqc/output/directory *.fastq.gz
 **Parameter Definitions:**

 - `-o` – the output directory to store results
-- `*.fastq.gz` – the input reads are specified as a positional argument, and can be given all at once with wildcards like this, or as individual arguments with spaces inbetween them
+- `*.fastq.gz` – the input reads are specified as a positional argument, and can be given all at once with wildcards like this, or as individual arguments with spaces in between them

 **Input Data:**

@@ -230,7 +230,7 @@ fastqc -o /path/to/trimmed_fastqc/output/directory *.fastq.gz
 **Parameter Definitions:**

 - `-o` – the output directory to store results
-- `*.fastq.gz` – the input reads are specified as a positional argument, and can be given all at once with wildcards like this, or as individual arguments with spaces inbetween them
+- `*.fastq.gz` – the input reads are specified as a positional argument, and can be given all at once with wildcards like this, or as individual arguments with spaces in between them

 **Input Data:**

@@ -2039,7 +2039,7 @@ ERCCcounts.to_csv('ERCC_analysis/ERCCcounts_GLbulkRNAseq.csv')
 - ERCC_analysis/ERCC_stats_GLDS-*_GLbulkRNAseq.csv (Samplewise counts statistics table containing 'Min', 'Max', 'Dynamic range', 'R')
 - ERCC_analysis/ERCC_dynrange_GLDS-*_GLbulkRNAseq.csv (Samplewise counts statistics subset table containing 'Dynamic range')
 - ERCC_analysis/ERCC_rsq_GLDS-*_GLbulkRNAseq.csv (Samplewise counts statistics subset table containing 'R')
-- ERCC_analysis/ERCCmetadata_GLbulkRNAseq.csv (Samplewise metadata table inlcuding ERCC mix number)
+- ERCC_analysis/ERCCmetadata_GLbulkRNAseq.csv (Samplewise metadata table including ERCC mix number)
 - ERCC_analysis/ERCCcounts_GLbulkRNAseq.csv (Samplewise ERCC counts table)

 <br>
@@ -2104,7 +2104,7 @@ write.csv(normcounts, 'ERCC_analysis/ERCC_normcounts_GLbulkRNAseq.csv') #OUTPUT

 **Input Data:**

-- ERCC_analysis/ERCCmetadata_GLbulkRNAseq.csv (samplewise metadata table inlcuding ERCC mix number, output from [Step 9a](#9a-evaluate-ercc-count-data-in-python))
+- ERCC_analysis/ERCCmetadata_GLbulkRNAseq.csv (samplewise metadata table including ERCC mix number, output from [Step 9a](#9a-evaluate-ercc-count-data-in-python))
 - ERCC_analysis/ERCCcounts_GLbulkRNAseq.csv (samplewise ERCC counts table, output from [Step 9a](#9a-evaluate-ercc-count-data-in-python))

 **Output Data:**

RNAseq/Workflow_Documentation/NF_RCP/README.md

Lines changed: 2 additions & 2 deletions
@@ -256,9 +256,9 @@ nextflow run NF_RCP_2.0.0/main.nf \

 * `--reference_source` - specifies the source of the reference files used (the source indicated in the Approach 2 example is `ensembl`)

-* `--reference_fasta` - specifices the URL or path to a fasta file
+* `--reference_fasta` - specifies the URL or path to a fasta file

-* `--reference_gtf` - specifices the URL or path to a gtf file
+* `--reference_gtf` - specifies the URL or path to a gtf file

 <br>

RNAseq/Workflow_Documentation/NF_RCP/workflow_code/bin/dp_tools__NF_RCP/checks.py

Lines changed: 17 additions & 17 deletions
@@ -188,7 +188,7 @@ def check_fastqgz_file_contents(file: Path, count_lines_to_check: int) -> FlagEn
 )
 else:
 code = FlagCode.GREEN
-message = f"First {count_lines_to_check} lines checked found no issues. This means headers lines were identifiable and no decompression errors occured."
+message = f"First {count_lines_to_check} lines checked found no issues. This means headers lines were identifiable and no decompression errors occurred."
 except (EOFError, gzip.BadGzipFile):
 code = FlagCode.HALT
 message = (
@@ -562,9 +562,9 @@ def check_aggregate_star_unnormalized_counts_table_values_against_samplewise_tab
 # check if the values match for any of the count modes
 # unstranded, sense, antisense
 # for remaining samples, only check the match for the first count mode
-# TODO: Fix rare false postive related to zero counts, in those cases the strand_assessment can be prematurely determined which causes other samples to be compared with an inappropriate assessment
+# TODO: Fix rare false positive related to zero counts, in those cases the strand_assessment can be prematurely determined which causes other samples to be compared with an inappropriate assessment
 for count_mode in STAR_COUNT_MODES:
-# make sure to sort indicies
+# make sure to sort indices
 if df_agg[sample].sort_index().equals(df_samp[count_mode].sort_index()):
 # assign strand assessment if first sample
 if strand_assessment is None:
@@ -789,7 +789,7 @@ def check_contrasts_table_rows(contrasts_table: Path, **_) -> FlagEntry:
 # data specific preprocess
 df_contrasts = pd.read_csv(contrasts_table, index_col=0)

-def _get_groups_from_comparisions(s: str) -> set[str]:
+def _get_groups_from_comparisons(s: str) -> set[str]:
 """Converts '(G1)v(G2)'
 into G1...G2 where G1 and G2 are renamed as per the r make names function

@@ -807,7 +807,7 @@ def _get_groups_from_comparisions(s: str) -> set[str]:

 bad_columns: dict[str, dict[str, set]] = dict()
 for (col_name, col_series) in df_contrasts.items():
-expected_values = _get_groups_from_comparisions(col_name)
+expected_values = _get_groups_from_comparisons(col_name)
 if not expected_values == set(col_series):
 bad_columns[col_name] = {
 "expected": expected_values,
@@ -974,15 +974,15 @@ def check_dge_table_group_columns_constraints(
 ].append(group)

 # check logic
-contraint_description = f"Group mean and standard deviations are correctly computed from samplewise normalized counts within a tolerance of {FLOAT_TOLERANCE} percent (to accomodate minor float related differences )"
+constraint_description = f"Group mean and standard deviations are correctly computed from samplewise normalized counts within a tolerance of {FLOAT_TOLERANCE} percent (to accommodate minor float related differences )"
 if not any([issue_type for issue_type in issues.values()]):
 code = FlagCode.GREEN
-message = f"All values in columns: {query_columns} met constraint: {contraint_description}"
+message = f"All values in columns: {query_columns} met constraint: {constraint_description}"
 else:
 code = FlagCode.HALT
 message = (
 f"Issues found {issues} that"
-f"fail the contraint: {contraint_description}."
+f"fail the contraint: {constraint_description}."
 )
 return {"code": code, "message": message}

@@ -1007,10 +1007,10 @@ def check_dge_table_comparison_statistical_columns_exist(
 # check logic
 if not missing_cols:
 code = FlagCode.GREEN
-message = f"All comparision summary statistic columns (Prefixes: {COMPARISON_PREFIXES}) present. {sorted(list(expected_columns))}"
+message = f"All comparison summary statistic columns (Prefixes: {COMPARISON_PREFIXES}) present. {sorted(list(expected_columns))}"
 else:
 code = FlagCode.HALT
-message = f"Missing these comparision summary statistic columns (Prefixes: {COMPARISON_PREFIXES}): {sorted(list(missing_cols))}"
+message = f"Missing these comparison summary statistic columns (Prefixes: {COMPARISON_PREFIXES}): {sorted(list(missing_cols))}"
 return {"code": code, "message": message}


@@ -1169,12 +1169,12 @@ def check_dge_table_log2fc_within_reason(
 # Track error messages
 err_msg_yellow = ""
 all_suspect_signs: dict[int, dict[str, float]] = dict()
-for comparision in expected_comparisons:
-query_column = f"Log2fc_{comparision}"
+for comparison in expected_comparisons:
+query_column = f"Log2fc_{comparison}"
 group1_mean_col = (
-"Group.Mean_" + comparision.split(")v(")[0] + ")"
+"Group.Mean_" + comparison.split(")v(")[0] + ")"
 ) # Uses parens and adds them back to prevent slicing on 'v' within factor names
-group2_mean_col = "Group.Mean_" + "(" + comparision.split(")v(")[1]
+group2_mean_col = "Group.Mean_" + "(" + comparison.split(")v(")[1]
 computed_log2fc = (df_dge[group1_mean_col] / df_dge[group2_mean_col]).apply(
 math.log, args=[2]
 )
@@ -1191,7 +1191,7 @@ def check_dge_table_log2fc_within_reason(
 # flag if not enough within tolerance
 if percent_within_tolerance < LOG2FC_CROSS_METHOD_TOLERANCE_PERCENT:
 err_msg_yellow += (
-f"For comparison: '{comparision}' {percent_within_tolerance:.2f} % of genes have absolute percent differences "
+f"For comparison: '{comparison}' {percent_within_tolerance:.2f} % of genes have absolute percent differences "
 f"(between log2fc direct computation and DESeq2's approach) "
 f"less than {LOG2FC_CROSS_METHOD_PERCENT_DIFFERENCE_THRESHOLD} % which does not met the minimum percentage "
 f"({LOG2FC_CROSS_METHOD_TOLERANCE_PERCENT} %) of genes required. "
@@ -1344,7 +1344,7 @@ def check_viz_table_columns_constraints(
 code = FlagCode.HALT
 message = (
 f"Issues found {issues} that"
-f"fail the contraint: {viz_pairwise_columns_constraints}."
+f"fail the constraint: {viz_pairwise_columns_constraints}."
 )
 return {"code": code, "message": message}

@@ -1506,7 +1506,7 @@ def check_sample_in_multiqc_report(
 An optional name_reformat_function can be supplied to address sample name changes that occur in the multiqc report.
 An example being the renaming of Sample '-' characters to '_' for certain RSeQC modules.

-:param sample: Query sample names to check for presense
+:param sample: Query sample names to check for presence
 :type sample: list[str]
 :param multiqc_report_path: MultiQC report directory
 :type multiqc_report_path: Path
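
Note on the `check_dge_table_log2fc_within_reason` hunk above: the renamed `comparison` variable is split on `")v("` rather than on a bare `v`, so a `v` inside a group name is left alone. The snippet below is a minimal, standalone sketch of that idea and is not the code from `checks.py`. The helper name `group_mean_columns`, the group names, and the mean values are all hypothetical, and plain lists stand in for the DataFrame columns.

```python
import math


def group_mean_columns(comparison: str) -> tuple[str, str]:
    """Derive the two 'Group.Mean_' column names from a '(G1)v(G2)' label."""
    # Splitting on ')v(' instead of 'v' keeps any 'v' inside a group name intact.
    parts = comparison.split(")v(")
    # The split consumed one parenthesis on each side, so add them back.
    return "Group.Mean_" + parts[0] + ")", "Group.Mean_" + "(" + parts[1]


# Hypothetical group means for two genes (values are made up for illustration).
group_means = {
    "Group.Mean_(Flight & vivarium)": [8.0, 4.0],
    "Group.Mean_(Ground)": [2.0, 4.0],
}

col1, col2 = group_mean_columns("(Flight & vivarium)v(Ground)")

# Direct log2 fold change: log2(group 1 mean / group 2 mean) per gene.
computed_log2fc = [
    math.log2(mean1 / mean2)
    for mean1, mean2 in zip(group_means[col1], group_means[col2])
]
```

The group name `Flight & vivarium` contains a `v`, which a split on a bare `v` would cut in the wrong place.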

RNAseq/Workflow_Documentation/NF_RCP/workflow_code/bin/dp_tools__NF_RCP/config.yaml

Lines changed: 2 additions & 2 deletions
@@ -92,7 +92,7 @@ Staging:
 Example: 'TRUE'

 # this entry denotes the following:
-# retrive from that ISA field name
+# retrieve from that ISA field name
 # multiple values (separated by ",")
 # index those to certain runsheet columns
 # if the index doesn't exist, optional prevents raising an exception
@@ -1069,7 +1069,7 @@ data assets:
 table order: 15

 # NOTE: this is while the ERCC analysis sits outside the full pipeline and
-# once incoporated, it should be validated for existence!
+# once incorporated, it should be validated for existence!
 validate exists: false

 # Assets that are no longer generated by the latest pipeline
