removed observations section from TCR_motifs.rst
mchernigovskaya authored Oct 17, 2024
1 parent d6c6cbf commit a1b7887
Showing 1 changed file with 0 additions and 158 deletions.
158 changes: 0 additions & 158 deletions docs_source/usecases/TCR_motifs.rst
@@ -115,164 +115,6 @@ In this tutorial, a maximum hamming distance of two was selected so that the div
#. Start the simulation with the selected seeds and Hamming distances. Check for the presence of the predefined motif in the simulated TCRs by clustering or by locating the seed within the TCR sequences.


**Observations:**

1. Allowing a large Hamming distance may hinder identification of the seed sequences when simulating with short seeds

Tables 2 and 3 present examples of simulated TCRs for the long and short seed simulations, respectively. As expected, most amino acids of the original long seed are retained, with only a few positions changed. The opposite holds for the short seeds: because we allowed a Hamming distance of up to two for a seed of three amino acids, as few as one of the original residues may remain, which makes it difficult to identify the original seed within the simulated TCRs.
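
To make this contrast concrete, one can count how many sequences lie within a given Hamming distance of a seed. The sketch below assumes a 20-letter amino acid alphabet and treats every substitution as equally possible, which is a simplification and not a description of how the simulation samples sequences. The function names ``neighborhood_size`` and ``neighborhood_fraction`` are hypothetical.

.. code-block:: python

   from math import comb

   ALPHABET_SIZE = 20  # assumption: the 20 standard amino acids


   def neighborhood_size(seed_length: int, max_distance: int) -> int:
       """Count sequences within `max_distance` substitutions of a seed."""
       return sum(
           comb(seed_length, k) * (ALPHABET_SIZE - 1) ** k
           for k in range(max_distance + 1)
       )


   def neighborhood_fraction(seed_length: int, max_distance: int) -> float:
       """Fraction of all sequences of that length that count as a seed match."""
       return neighborhood_size(seed_length, max_distance) / ALPHABET_SIZE ** seed_length


   short = neighborhood_fraction(3, 2)  # three-residue seed, distance <= 2
   long = neighborhood_fraction(8, 2)   # eight-residue seed, distance <= 2

Under these assumptions, roughly 14% of all possible three-residue sequences fall within a Hamming distance of two of a three-residue seed, whereas fewer than one in a million eight-residue sequences fall within the same distance of an eight-residue seed.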

.. list-table:: Table 2: Examples of simulated TCRs with long seeds
   :header-rows: 1

   * - junction_aa
     - seed_match
     - hamming_distance
     - seed
   * - CAAGDRSGINQPQHF
     - DRSGINQP
     - 2
     - ELSGINQP
   * - CACTELAGGNQPQHF
     - ELAGGNQP
     - 2
     - ELSGINQP
   * - CAIASGGRVREQFF
     - SGGRVREQ
     - 1
     - SGGDVREE
   * - CAIQGTSGGAIREETQYF
     - SGGAIREE
     - 2
     - SGGDVREE
   * - CAIICPGGGTYEQYF
     - CPGGGTYE
     - 2
     - SPAGGTYE
   * - CAIPSPCGGCYEQYF
     - SPCGGCYE
     - 2
     - SPAGGTYE

.. list-table:: Table 3: Examples of simulated TCRs with short seeds
   :header-rows: 1

   * - junction_aa
     - seed_match
     - hamming_distance
     - seed
   * - CAAELLEQYF
     - AAE
     - 2
     - PAG
   * - CACCNCQPQHF
     - PQH
     - 2
     - PAG
   * - CACDTLNEQYF
     - DTL
     - 2
     - DVR
   * - CACEWRYNEQFF
     - EWR
     - 2
     - DVR
   * - CACILEKLFF
     - ACI
     - 2
     - SGI
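
Locating a seed within a simulated junction, as in the ``seed_match`` and ``hamming_distance`` columns of Tables 2 and 3, can be sketched with a sliding window. This is a minimal illustration and not the procedure used to produce the tables; the helper names ``hamming_distance``, ``all_matches`` and ``best_seed_match`` are hypothetical.

.. code-block:: python

   def hamming_distance(a: str, b: str) -> int:
       """Number of positions at which two equal-length sequences differ."""
       if len(a) != len(b):
           raise ValueError("sequences must have equal length")
       return sum(x != y for x, y in zip(a, b))


   def all_matches(junction: str, seed: str, max_distance: int):
       """Return (window, distance, start) for every window close enough to the seed."""
       k = len(seed)
       hits = []
       for start in range(len(junction) - k + 1):
           window = junction[start:start + k]
           distance = hamming_distance(window, seed)
           if distance <= max_distance:
               hits.append((window, distance, start))
       return hits


   def best_seed_match(junction: str, seed: str):
       """Closest window to the seed; ties go to the leftmost window."""
       hits = all_matches(junction, seed, max_distance=len(seed))
       return min(hits, key=lambda hit: (hit[1], hit[2])) if hits else None


   long_hit = best_seed_match("CAAGDRSGINQPQHF", "ELSGINQP")
   short_hits = all_matches("CAAELLEQYF", "PAG", max_distance=2)

For the long seed ``ELSGINQP``, a single window of ``CAAGDRSGINQPQHF`` stands out at distance two. For the short seed ``PAG``, two overlapping windows of ``CAAELLEQYF`` (``CAA`` and ``AAE``) tie at distance two, so the position of the seed cannot be determined unambiguously from the sequence alone.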



2. Hamming distance and seed length may affect the similarity between the simulation seeds and the consensus sequences of simulated TCR clusters

We used clusTCR (Valkiers et al., 2021) to investigate the architecture of the simulated TCRs by clustering their CDR3s with up to one allowed amino acid difference. The TCRs simulated with the two sets of seeds were clustered separately, and for each cluster, a motif summarizing the consensus sequence was calculated. The motifs of the largest clusters are provided in Tables 4 and 5 for the long and short seed simulations, respectively.

When simulating the TCRs with long seeds, we observe a clear overlap between the clusTCR motifs and the original seeds. This indicates that the original motif was successfully simulated within the receptors. However, when simulating epitope-specific TCRs with short seeds, the seeds are not well represented within the clusTCR motifs. The clustering may be driven by other common similarities outside of the predefined motif, so that the original motif can no longer be traced after simulation.

.. note::

   Seed length and allowed Hamming distance both affect the final results. Even with long seeds, unwanted results can occur if the Hamming distance is set too high. As a general recommendation, we advise clustering the simulated receptors if your study requires the exact presence of motifs.

.. list-table:: Table 4: Motifs of the 10 largest clusters in the simulated TCRs with long seeds
   :header-rows: 1

   * - CDR3 Motif
     - Seed
     - Cluster size
   * - CASSp.GGtYEQYF
     - SPAGGTYE
     - 59
   * - CASSLSG.NQPQHF
     - ELSGINQP
     - 16
   * - CASSL.GINQPQHF
     - ELSGINQP
     - 9
   * - CASSAGG.YEQYF
     - SPAGGTYE
     - 7
   * - CASSGG.VRYEQYF
     - SGGDVREE
     - 6
   * - CASP[GP]GG.YEQYF
     - SPAGGTYE
     - 6
   * - CASSE.SGSNQPQHF
     - ELSGINQP
     - 5
   * - CASSPGtGTYEQYF
     - SPAGGTYE
     - 4
   * - CASSvAGGTGELFF
     - SPAGGTYE
     - 4

.. list-table:: Table 5: Motifs of the 10 largest clusters in the simulated TCRs with short seeds
   :header-rows: 1

   * - CDR3 Motif
     - Seed
     - Cluster size
   * - CA..YEQYF
     - PAG,DVR,SGI
     - 15
   * - CAs.yEQYF
     - PAG,DVR,SGI
     - 14
   * - CA.T[AP]YEQYF
     - PAG,DVR
     - 9
   * - CArDEQYF
     - DVR
     - 8
   * - CAS..ETQYF
     - SGI
     - 8
   * - CAS.tYEQYF
     - SGI
     - 6
   * - C[RT]DYEQYF
     - DVR
     - 5
   * - CA[KT][SR]ETQYF
     - DVR,SGI
     - 5
   * - C[GV]G[QL]YEQYF
     - SGI
     - 5
   * - CAR.TDTQYF
     - DVR
     - 5

3. TCRs simulated with a short seed have shorter CDR3s compared to TCRs simulated with long seeds

We also compared the distribution of CDR3 length (in amino acids) between the TCRs generated with short and long seeds (shown in blue and red, respectively). We observe that TCRs generated with long seeds tend to be longer than those generated with short seeds.

.. image:: ../_static/figures/usecase_VDJdb_motifs_length_distribution.png
   :width: 500
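
A length comparison of this kind can be sketched as follows. The example uses only the handful of junction sequences listed in Tables 2 and 3, so it illustrates the calculation and does not reproduce the figure; the helper ``length_distribution`` is a hypothetical name.

.. code-block:: python

   from collections import Counter
   from statistics import mean

   # junction_aa values copied from Table 2 (long seeds) and Table 3 (short seeds)
   long_seed_cdr3s = [
       "CAAGDRSGINQPQHF", "CACTELAGGNQPQHF", "CAIASGGRVREQFF",
       "CAIQGTSGGAIREETQYF", "CAIICPGGGTYEQYF", "CAIPSPCGGCYEQYF",
   ]
   short_seed_cdr3s = [
       "CAAELLEQYF", "CACCNCQPQHF", "CACDTLNEQYF", "CACEWRYNEQFF", "CACILEKLFF",
   ]


   def length_distribution(cdr3s):
       """Map CDR3 length (in amino acids) to the number of sequences with that length."""
       return dict(sorted(Counter(len(s) for s in cdr3s).items()))


   long_lengths = length_distribution(long_seed_cdr3s)
   short_lengths = length_distribution(short_seed_cdr3s)
   mean_long = mean(len(s) for s in long_seed_cdr3s)
   mean_short = mean(len(s) for s in short_seed_cdr3s)

In these excerpts, the long-seed junctions are 14 to 18 amino acids long and the short-seed junctions 10 to 12, which is consistent with the trend shown in the figure.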



Enhanced approach: defining motifs based on PWM
-----------------------------------------------
