This directory contains all scripts necessary to analyze TCR sequencing from T1D patients and healthy controls.
- Identify identical sequences within alpha chains (V, J and CDR3) after removing clones from individuals
- Try with Alpha and Beta separate and combined
- Find common sequences within T1D, within control, and both
- Find common sequences within CD4, within CD8, and both
- Visualize with an upset plot
- Use `tcrDist` or other clustering algorithms to cluster
- Determine if any HLAs are shared for any T cells identified using the two ideas above.
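
The "identical sequences" goals above can be sketched in a few lines. This is only an illustration, not the project's R script: the record layout, the field names (`donor`, `status`, `v`, `j`, `cdr3`) and the example sequences are all made up. Each (V, J, CDR3) combination is counted once per donor, so expanded clones within one individual do not count as sharing, and the result is tallied by group combination, which is the kind of input an upset plot needs.

```python
from collections import Counter, defaultdict

# Hypothetical input: one record per sequenced cell (field names are assumptions).
cells = [
    {"donor": "D1", "status": "T1D", "v": "TRAV12-1", "j": "TRAJ20", "cdr3": "CAVNDYKLSF"},
    {"donor": "D1", "status": "T1D", "v": "TRAV12-1", "j": "TRAJ20", "cdr3": "CAVNDYKLSF"},  # expanded clone
    {"donor": "D2", "status": "T1D", "v": "TRAV12-1", "j": "TRAJ20", "cdr3": "CAVNDYKLSF"},
    {"donor": "D1", "status": "T1D", "v": "TRAV8-2/TRAV8-4", "j": "TRAJ9", "cdr3": "CAVSGGFKTIF"},
    {"donor": "D3", "status": "control", "v": "TRAV8-2/TRAV8-4", "j": "TRAJ9", "cdr3": "CAVSGGFKTIF"},
    {"donor": "D4", "status": "control", "v": "TRAV21", "j": "TRAJ6", "cdr3": "CAVRPGGSYIPTF"},
]


def shared_clonotypes(records):
    """Return {(V, J, CDR3): set of statuses} for sequences seen in more than one donor."""
    donors = defaultdict(set)
    groups = defaultdict(set)
    for rec in records:
        key = (rec["v"], rec["j"], rec["cdr3"])
        donors[key].add(rec["donor"])    # a set, so clones within a donor collapse
        groups[key].add(rec["status"])
    return {key: groups[key] for key in donors if len(donors[key]) > 1}


def intersection_sizes(shared):
    """Count shared sequences per exact combination of groups (upset-style)."""
    return Counter(frozenset(statuses) for statuses in shared.values())


shared = shared_clonotypes(cells)
sizes = intersection_sizes(shared)
for combo, n in sizes.items():
    print(sorted(combo), n)
```

The same two functions would work for the CD4/CD8 comparison by swapping the `status` field for a cell-type field.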
- Clean up the files
  - Remove the alpha sequences that are out of frame
  - Exclude MAIT and NKT cells
    - MAIT
      - TRAV1-2 & TRAJ33
      - TRAV1-2 & TRAJ12
      - TRAV1-2 & TRAJ20
    - NKT
      - TRAV10 & TRAJ18 & TRBV25
  - Combine V regions that can't be distinguished
    - Alpha: TRAV8-2 and TRAV8-4 should both be called TRAV8-2/TRAV8-4
    - Beta: TRBV10-3 and TRBV10-6 should both be called TRBV10-3/TRBV10-6
    - Beta: TRBV6-2 and TRBV6-3 should both be called TRBV6-2/TRBV6-3
    - Don't combine TRBV5-4 and TRBV5-8
- Identify identical sequences in the alpha chains. Allow for either alpha chain --> maybe add the existing alpha information to one column
- Repeat with Beta (the two columns aren't needed because T cells tend to express only one beta chain)
- Repeat with combination --> Add column for the second alpha beta pair if there is one?
- Separate by each category to make upset plot
- Run clustering algorithms
- Look into HLA information
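
The cleaning rules in the plan above (drop out-of-frame alpha sequences, exclude MAIT and NKT cells, merge V genes that can't be distinguished) can be sketched as follows. This is a minimal illustration rather than the project's cleaning script: the field names (`trav`, `traj`, `trbv`, `alpha_in_frame`) are hypothetical, and matching TRBV25 by prefix is an assumption made here to catch allele or subgroup suffixes.

```python
# Gene combinations taken from the list above.
MAIT_PAIRS = {("TRAV1-2", "TRAJ33"), ("TRAV1-2", "TRAJ12"), ("TRAV1-2", "TRAJ20")}

# V genes that cannot be told apart are given one combined name.
# TRBV5-4 and TRBV5-8 are deliberately absent: they stay separate.
AMBIGUOUS_V = {
    "TRAV8-2": "TRAV8-2/TRAV8-4",
    "TRAV8-4": "TRAV8-2/TRAV8-4",
    "TRBV10-3": "TRBV10-3/TRBV10-6",
    "TRBV10-6": "TRBV10-3/TRBV10-6",
    "TRBV6-2": "TRBV6-2/TRBV6-3",
    "TRBV6-3": "TRBV6-2/TRBV6-3",
}


def is_mait(cell):
    return (cell["trav"], cell["traj"]) in MAIT_PAIRS


def is_nkt(cell):
    # Assumption: prefix match so that names such as "TRBV25-1" are also caught.
    return (
        cell["trav"] == "TRAV10"
        and cell["traj"] == "TRAJ18"
        and cell["trbv"].startswith("TRBV25")
    )


def normalize_v(gene):
    return AMBIGUOUS_V.get(gene, gene)


def clean(cells):
    """Apply the three cleaning rules and return new records."""
    kept = []
    for cell in cells:
        if not cell["alpha_in_frame"]:   # hypothetical boolean field
            continue
        if is_mait(cell) or is_nkt(cell):
            continue
        kept.append({**cell, "trav": normalize_v(cell["trav"]), "trbv": normalize_v(cell["trbv"])})
    return kept


# Made-up example records.
example = [
    {"trav": "TRAV1-2", "traj": "TRAJ33", "trbv": "TRBV20-1", "alpha_in_frame": True},   # MAIT
    {"trav": "TRAV10", "traj": "TRAJ18", "trbv": "TRBV25-1", "alpha_in_frame": True},    # NKT
    {"trav": "TRAV8-4", "traj": "TRAJ9", "trbv": "TRBV6-3", "alpha_in_frame": True},     # kept, renamed
    {"trav": "TRAV21", "traj": "TRAJ6", "trbv": "TRBV5-4", "alpha_in_frame": False},     # out of frame
    {"trav": "TRAV21", "traj": "TRAJ6", "trbv": "TRBV5-8", "alpha_in_frame": True},      # kept as-is
]
cleaned = clean(example)
print(cleaned)
```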
- Starting files:
  - Output from IMGT
  - Published data file
- Clean the files
  - Run `src/scripts/{date}_tcr_duplicates.R`
  - [TODO] Current issues
    - The current format has chain 1 and chain 2 on the same line; previously they were on separate lines. I'll have to figure out how to rewrite both scripts to accommodate this.
- Run the python script
  - Run `src/scripts/python/find_junction_overlap.py`
  - In the `tcr_env` environment, run:
python src/scripts/python/find_junction_overlap.py \
-f1 data/cleaned_{date}_scPCR_Illumina.tsv \
-f2 data/cleaned_{date}_IEDB_TCR.tsv
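
For orientation, the kind of comparison a junction-overlap step performs can be sketched as below. This is not the contents of `find_junction_overlap.py`: the column name `junction_aa` and the example rows are assumptions, and exact string matching is used only as the simplest possible definition of "overlap".

```python
import csv
import io


def junctions(handle, column="junction_aa"):
    """Collect the non-empty junction sequences from a tab-separated file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[column] for row in reader if row[column]}


# In-memory stand-ins for the two cleaned TSV files (made-up rows).
f1 = io.StringIO("cell_id\tjunction_aa\nc1\tCAVNDYKLSF\nc2\tCASSLGQAYEQYF\n")
f2 = io.StringIO("receptor_id\tjunction_aa\nr1\tCASSLGQAYEQYF\nr2\tCASSIRSSYEQYF\n")

# Junctions present in both files.
overlap = junctions(f1) & junctions(f2)
print(sorted(overlap))
```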