
4.Results and Discussions

Karthik Nair edited this page Sep 6, 2019 · 18 revisions

Results

Quality Assessment

FastQC was used to assess the quality of the different kinds of reads.

The per-base quality results for the various read sets are given below:

Figure 1: Per Base Quality of sel3_SRR5819794, a WGS read

Figure 1 shows the Per Base Quality of sel3_SRR5819794, where the blue line shows the median quality score at each position.

Figure 2: Per Base Quality of sel3_SRR1724093.trim, a ChIP-seq read

Figure 2 shows the Per Base Quality of sel3_SRR1724093, where the blue line shows the median quality score at each position.

Figure 3: Per Base Quality of sel3_SRR1719266.1, a raw read

Figure 3 shows the Per Base Quality of sel3_SRR1719266.1, where the blue line shows the median quality score at each position.

Figure 4: Per Base Quality of sel3_SRR1719013.trim_1U, a trimmed read

Figure 4 shows the Per Base Quality of sel3_SRR1719013.trim_1U, where the blue line shows the median quality score at each position.

DNA Assembly and Evaluation

The SOAPdenovo assembly consisted of 149577 scaffolds, with an average length of 55599189 including gaps and an average gap length of 371. It had 208521 contigs in total, with an average gap-inclusive length of 38229185.

The SPAdes assembly, on the other hand, consists of 1164864 scaffolds and 1166261 contigs.

The MUMmer plots (Figure 5 and Figure 6) show that the SOAPdenovo assembly had fewer alignment issues than the SPAdes assembly. However, the line for SOAPdenovo is less straight than that for SPAdes, which could be because SPAdes used more scaffolds than SOAPdenovo. The overall quality of the SOAPdenovo build was judged to be better on the basis of these alignments. Note, however, that QUAST calculated an NG50 of 9452 for SOAPdenovo against 14612 for SPAdes, and a higher NG50 normally indicates a more contiguous assembly.

Figure 5: MUMmerplot of SOAPdenovo along with Nucmer compared to the reference genome
Figure 6: MUMmerplot of SPAdes along with Nucmer compared to the reference genome
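
For reference, NG50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the reference (or estimated) genome size; N50 uses the assembly size instead. The following is a minimal sketch of that calculation, not QUAST's own code. The function name `nx50` and the contig lengths are hypothetical and chosen only for illustration.

```python
def nx50(lengths, total=None):
    """Return the N50 of `lengths`, or the NG50 when `total` is a genome size.

    `nx50` is an illustrative helper, not part of QUAST.
    """
    if total is None:
        total = sum(lengths)  # N50: measure against the assembly size
    running = 0
    # Walk through the contigs from longest to shortest.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # at least half of `total` is covered
            return length
    return 0  # the assembly covers less than half of `total`


contigs = [80, 70, 50, 40, 30, 20]   # hypothetical contig lengths
n50 = nx50(contigs)                  # assembly size is 290
ng50 = nx50(contigs, total=400)      # assumed genome size of 400
print(n50, ng50)
```

Because NG50 is measured against the genome size rather than the assembly size, it allows assemblies of different total length to be compared directly.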

Genome Annotation

Genome annotation using the Maker2 pipeline was fairly straightforward. However, because the control files had to be edited manually at different stages, this step was somewhat tedious.

The only problem in the process arose while using AUGUSTUS. AUGUSTUS was being loaded after Maker, and the two used different Perl versions. This was solved by loading only Maker, as it contains AUGUSTUS as well.

35 genes were predicted at the end of the Maker run.

Differential Expression

Following alignment to the genome using TopHat, HTSeq was used to count reads for every gene in each combination of developmental stage and limb.

Log fold change (LFC) was computed and LFC shrinkage was applied to reduce noise, after which the shrunken LFC values were plotted against normalised counts (Figure 7).

Figure 7: Shrunken LFC of Expressed Genes. Significant genes are marked in red and non-significant genes in black.
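
To make the quantity on the y-axis concrete, the sketch below computes a plain (unshrunken) log2 fold change between two groups of normalised counts. It is an illustration only: the function name, the counts, the choice of hindlimb as the reference group, and the pseudocount are all assumptions, and the pseudocount is just a crude guard against division by zero, not the shrinkage estimator used for Figure 7.

```python
import math


def log2_fold_change(reference, treatment, pseudocount=1.0):
    """Unshrunken log2 fold change of `treatment` relative to `reference`.

    Hypothetical helper: the pseudocount only avoids log(0) and is not
    a substitute for proper LFC shrinkage.
    """
    mean_ref = sum(reference) / len(reference)
    mean_trt = sum(treatment) / len(treatment)
    return math.log2((mean_trt + pseudocount) / (mean_ref + pseudocount))


# Hypothetical normalised counts for one gene (three replicates each).
hindlimb = [7.0, 7.0, 7.0]
forelimb = [31.0, 31.0, 31.0]

lfc = log2_fold_change(hindlimb, forelimb)
print(lfc)  # 2.0, i.e. a four-fold increase after adding the pseudocount
```

A positive value means higher expression in the second group; genes with low counts give noisy estimates, which is the motivation for shrinking them.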

A total of 8 genes with p < 0.01 were found, namely PITX1, LOC107525399, RPS27, TGFBI, SEC24A, PPP2CA, SKP1, and CATSPER3.

The gene with the lowest p-value was PITX1. Across the various stages, there is a marked difference between the counts for forelimb and hindlimb. In addition, while expression levels in the hindlimbs do not show any significant change, those in the forelimbs appear to increase markedly (Figure 8).

Figure 8: Normalised counts for PITX1, marked for different developmental stages

TGFBI, the gene with the fourth-lowest p-value, shows a steeper increase in expression in forelimbs than in hindlimbs as development progresses (Figure 9).

Figure 9: Normalised counts for TGFBI, marked for different developmental stages

Following this, principal component analysis (PCA) was performed and visualised (Figure 10). The first principal component (PC1) accounts for 83% of the variance and PC2 for 15%. The clear separation between forelimb and hindlimb is visible here as well.

Figure 10: PCA Plot: PC1 on x-axis, PC2 on y-axis

Finally, the expression levels of the genes predicted by Maker2 were plotted on a hierarchically clustered heatmap (Figure 11) to clearly visualise the up-regulation and down-regulation of the genes at different developmental stages in different appendages.

Figure 11: HeatMap of genes predicted by Maker2 across the various samples

Discussions

As mentioned before, differential expression analysis provided 8 genes with p < 0.01.

  • PITX1 - The only gene mentioned in the paper that was also found in the list of significant genes. According to the paper, PITX1 has previously been found to be differentially expressed in bats. It was also the most significant of all the genes. PITX1 is expressed primarily in developing lower limbs, and the PITX1 protein acts as a transcription factor regulating genes involved in lower limb development.

  • LOC107525399 - An unknown open reading frame. BLAST could not find similar sequences either.

  • RPS27 - A ribosomal protein belonging to the S27e family and a component of the 40S subunit. Other ribosomal proteins are mentioned in the paper (RPL11, RPL35A, RPS7, RPS10 and RPS19), and these genes have been found to be highly heterogeneously expressed across various embryonic tissues, including limbs. Mutations in these genes are known to cause limb malformation in Diamond-Blackfan anaemia. Elevated expression of RPS27 has also been associated with melanoma. It is possible that this gene is expressed at higher levels because of the rapid cell division in the embryonic stages, much as in cancer (but more controlled).

  • TGFBI - This gene codes for transforming growth factor beta induced (TGFBI). This protein forms part of the extracellular matrix and provides structural support. It is thought to play a role in cell adhesion and migration.

  • SEC24A - This gene codes for a component of coat protein II (COPII)-coated vesicles that is involved in protein transport from the endoplasmic reticulum.

  • PPP2CA - This gene codes for protein phosphatase 2 catalytic subunit alpha, which is involved in the negative control of cell growth and division. It is highly expressed in the CS16 forelimb, while it is down-regulated or neutrally expressed at the other stages. This gene could be involved in controlling cell division in the developing forelimbs so that the limbs take the right shape.

  • SKP1 - This gene codes for a subunit of SCF complexes, which are composed of this protein, cullin 1, a ring-box protein, and one member of the F-box family of proteins. SCF substrates have been identified in the regulation of cell cycle progression and development. SKP1 is highly up-regulated in CS16 forelimbs, which could be because the cells in the forelimbs are undergoing rapid division at this stage.

  • CATSPER3 - This gene codes for the cation channel sperm-associated 3 protein. It is usually differentially expressed in the testis and kidneys. How it is involved in the development process remains a mystery to me.

Although PITX1 is the only gene common to the paper and this replication study, it should be kept in mind that, while individual genes may be of interest, it is usually a network of interacting genes (a gene interaction network) that is responsible for a given phenotype or for the development of an anatomical feature in the organism. A mutation in any of these genes can cause the network to malfunction.

In my opinion, the finding of so many significant genes is evidence for this notion.

Improvements

Given enough time, I would take the differential expression analysis further and use the expression data to identify clusters with clustering algorithms such as K-means and fuzzy C-means. Unlike hierarchical clustering, which is extremely common in biological analysis, K-means and fuzzy C-means improve iteratively, so early mistakes in the clustering are not propagated to the end result. Hierarchical clustering, by contrast, is susceptible to early-stage errors being carried through to the final result.
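
The iterative correction described above can be seen in a minimal K-means sketch. This is an illustration under stated assumptions, not an analysis that was run: the points stand in for hypothetical two-dimensional expression profiles, and the starting centroids are deliberately poor so that the first assignment is wrong.

```python
def kmeans(points, centroids, iterations=10):
    """Plain K-means on tuples of floats; returns (centroids, labels)."""
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                updated.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid
        centroids = updated
    return centroids, labels


# Hypothetical expression profiles forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
# Poor starting centroids: the first pass puts every point in cluster 1.
start = [(0.0, 0.0), (1.0, 1.0)]

centroids, labels = kmeans(points, start)
print(centroids, labels)
```

With these inputs the first pass assigns all four points to one cluster, and the second pass already separates the two groups. An agglomerative hierarchical method cannot revisit a merge in this way once it has been made.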

Similarly, instead of relying on LFC alone, I would use other statistical tests such as the t-test for better validation. In my opinion, LFC is too strict for data as varied as expression data.
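
As a sketch of what such a test could look like, the block below computes Welch's t statistic and its approximate degrees of freedom for one gene, using only the standard library. The choice of Welch's variant, the function name, and the log-scale expression values are all assumptions made for illustration; converting the statistic to a p-value needs the t-distribution CDF, which is left out here.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each group mean (sample variance / n).
    se_a = variance(a) / len(a)
    se_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical log-scale expression values for one gene.
hindlimb = [10.0, 12.0, 14.0]
forelimb = [20.0, 22.0, 24.0]

t_stat, dof = welch_t(hindlimb, forelimb)
print(round(t_stat, 3), round(dof, 3))
```

With only a few replicates per group, such a per-gene test has little power, so it would complement the LFC-based ranking rather than replace it.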
