CUAnschutzBDC/maki_tcr_sequence_analysis

TCR sequence clustering

This directory contains all the scripts needed to analyze TCR sequencing data from type 1 diabetes (T1D) patients and healthy controls.

Goals

  1. Identify identical sequences (V gene, J gene and CDR3) within alpha chains after removing duplicate clones within each individual
  • Try alpha and beta chains separately and combined
  • Find common sequences within T1D, within controls, and in both
  • Find common sequences within CD4, within CD8, and in both
  • Visualize with an UpSet plot
  2. Cluster sequences using tcrDist or other clustering algorithms
  3. Determine whether any HLAs are shared by the T cells identified with the two approaches above
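A minimal Python sketch of the first goal, assuming each cleaned row is a dict with the hypothetical keys donor, group, cell_type, v_gene, j_gene and cdr3 (the real files may use different column names). A chain's identity is its (V, J, CDR3) triple; storing occurrences in sets collapses repeated clones within one donor, and a sequence counts as shared only when it appears in at least two donors. upset_counts then tallies shared sequences by the exact combination of categories they occur in, which is what the intersection bars of an UpSet plot show.

```python
from collections import Counter, defaultdict

# Hypothetical layout: position of each category inside an occurrence tuple.
CATEGORY_INDEX = {"group": 1, "cell_type": 2}


def clonotype(row):
    """Identity key for one chain: V gene, J gene and CDR3 sequence."""
    return (row["v_gene"], row["j_gene"], row["cdr3"])


def shared_clonotypes(rows, min_donors=2):
    """Return {clonotype: set of (donor, group, cell_type)} for clonotypes
    found in at least `min_donors` individuals.

    Occurrences are stored in a set, so a clone seen many times in the same
    donor (with the same group and cell type) is counted only once."""
    occurrences = defaultdict(set)
    for row in rows:
        occurrences[clonotype(row)].add(
            (row["donor"], row["group"], row["cell_type"])
        )
    return {
        key: occ
        for key, occ in occurrences.items()
        if len({donor for donor, _, _ in occ}) >= min_donors
    }


def upset_counts(shared, category):
    """Count shared clonotypes by the exact set of categories they occur in,
    e.g. {'T1D'}, {'control'} or {'T1D', 'control'} for category='group'."""
    idx = CATEGORY_INDEX[category]
    return Counter(frozenset(o[idx] for o in occ) for occ in shared.values())
```

The same functions could cover the beta-only and combined comparisons by swapping in a clonotype key built from the beta chain, or from both chains.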

Steps

  1. Clean up the files
  • Remove out-of-frame alpha sequences
  • Exclude MAIT and NKT cells
    • MAIT
      • TRAV1-2 & TRAJ33
      • TRAV1-2 & TRAJ12
      • TRAV1-2 & TRAJ20
    • NKT
      • TRAV10 & TRAJ18 & TRBV25
  • Combine V genes that can't be distinguished
    • Alpha: call TRAV8-2 and TRAV8-4 as TRAV8-2/TRAV8-4
    • Beta: call TRBV10-3 and TRBV10-6 as TRBV10-3/TRBV10-6
    • Beta: call TRBV6-2 and TRBV6-3 as TRBV6-2/TRBV6-3
    • Don't combine TRBV5-4 and TRBV5-8
  2. Identify identical sequences in the alpha chains, allowing a match on either alpha chain (a cell can express two); possibly merge the existing alpha information into one column
  3. Repeat with beta (no second column needed, since cells tend to express only one beta chain)
  4. Repeat with alpha and beta combined; possibly add a column for the second alpha-beta pair, if there is one
  5. Split the results by category to make the UpSet plot
  6. Run clustering algorithms
  7. Look into HLA information
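The cleaning rules in step 1 could be sketched as follows. This is an illustration of the rules listed above, not the repository's R code, and it assumes one alpha and one beta chain per row stored under the hypothetical keys trav, traj, trbv and alpha_in_frame (a boolean). Gene calls may carry IMGT allele suffixes such as *01, which are stripped before matching; matching TRBV25 as a prefix is also an assumption.

```python
# MAIT definitions from the cleaning rules above.
MAIT_PAIRS = {("TRAV1-2", "TRAJ33"), ("TRAV1-2", "TRAJ12"), ("TRAV1-2", "TRAJ20")}

# V genes that can't be distinguished are reported under a combined name.
# TRBV5-4 and TRBV5-8 are deliberately NOT merged.
MERGED_V = {
    "TRAV8-2": "TRAV8-2/TRAV8-4",
    "TRAV8-4": "TRAV8-2/TRAV8-4",
    "TRBV10-3": "TRBV10-3/TRBV10-6",
    "TRBV10-6": "TRBV10-3/TRBV10-6",
    "TRBV6-2": "TRBV6-2/TRBV6-3",
    "TRBV6-3": "TRBV6-2/TRBV6-3",
}


def gene(call):
    """Drop an IMGT allele suffix: 'TRAV1-2*01' -> 'TRAV1-2'."""
    return (call or "").split("*")[0]


def is_mait(row):
    return (gene(row["trav"]), gene(row["traj"])) in MAIT_PAIRS


def is_nkt(row):
    # The rule says TRBV25; a prefix match (assumption) also accepts TRBV25-1.
    return (
        gene(row["trav"]) == "TRAV10"
        and gene(row["traj"]) == "TRAJ18"
        and gene(row["trbv"]).startswith("TRBV25")
    )


def clean(rows):
    """Drop out-of-frame alphas, MAIT and NKT cells; merge ambiguous V genes.
    `rows` are dicts with the hypothetical keys trav, traj, trbv, alpha_in_frame."""
    cleaned = []
    for row in rows:
        if not row["alpha_in_frame"] or is_mait(row) or is_nkt(row):
            continue
        row = dict(row)  # copy so the caller's data is left untouched
        for col in ("trav", "trbv"):
            v = gene(row[col])  # note: the allele suffix is dropped here too
            row[col] = MERGED_V.get(v, v)
        cleaned.append(row)
    return cleaned
```

Cells with two alpha chains (the TODO below) would need either one row per alpha chain or an extra column checked by the same rules.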

Steps to make overlapping_tcrs.tsv

  1. Starting files:
  • Output from IMGT
  • Published data file
  2. Clean the files
  • Run src/scripts/{date}_tcr_duplicates.R
  • [TODO] Current issues
    • The current input has chain 1 and chain 2 on the same line; previously they were on separate lines. I'll have to figure out how to rewrite both scripts to accommodate this.
  3. Run the Python script
  • Run src/scripts/python/find_junction_overlap.py in the tcr_env environment:
python src/scripts/python/find_junction_overlap.py \
  -f1 data/cleaned_{date}_scPCR_Illumina.tsv \
  -f2 data/cleaned_{date}_IEDB_TCR.tsv
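The internals of find_junction_overlap.py aren't described here. As a hypothetical sketch only, one way to find junctions shared by the two cleaned files is to index each file by a junction column (assumed to be named junction) and intersect the keys. This matches on the junction sequence alone; whether the real script also compares V and J genes is not stated.

```python
import csv
from collections import defaultdict


def index_by_junction(handle, column="junction"):
    """Group the rows of a tab-separated file by their junction sequence.
    `handle` is any iterable of lines (e.g. an open file); `column` is a
    hypothetical name and the real files may differ."""
    index = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        if row.get(column):  # skip rows with an empty or missing junction
            index[row[column]].append(row)
    return index


def junction_overlap(handle1, handle2, column="junction"):
    """Return {junction: (rows_in_file1, rows_in_file2)} for junctions in both files."""
    first = index_by_junction(handle1, column)
    second = index_by_junction(handle2, column)
    shared = sorted(first.keys() & second.keys())
    return {j: (first[j], second[j]) for j in shared}
```

With real data, each handle would be `open(path, newline="")` for the -f1 and -f2 files.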

About

analysis of tcr sequences
