SpikyClip/README.md

Hi there, I'm Vikesh Ajith (SpikyClip)

I am a full-stack bioinformatics data engineer with 3+ years of experience working at the intersection of data engineering and biology. I bring value to scientists by managing and processing their large streams of data to deliver tidy, accessible datasets that allow them to spend less time wrangling data and more time discovering biological insights. I currently work for the Next-Generation Precision Medicine Program (NGPMP) at the Hudson Institute's Centre for Cancer Research (CCR).

Fewer than 1 in 5 children with cancer are found to have actionable mutations that are targetable with existing drug therapies. To improve these odds, our program has developed an extensive collection of paediatric cancer cell line models, which undergo a comprehensive set of functional genomic screens to identify novel drivers of low-survival paediatric cancers. High-throughput drug screens are then used to identify potential treatments that precisely target these novel mutations. Published data is made available through the Childhood Cancer Model Atlas (CCMA).

Our program produces terabytes of data that has to be stored, processed, cleaned, and annotated before being disseminated to researchers for downstream analysis. My role as a data engineer is to manage this data lifecycle effectively so that researchers can spend more time on analysis and less time on data wrangling.

I am well versed in bioinformatics, which is essential for processing a broad variety of genomic data accurately. Apart from my love of data engineering, I am also passionate about using statistics and effective data visualisation to make data-driven decisions. My other hobbies include:

  • 🪓 Woodworking (i.e. collecting tools that I may someday use)
  • 🕹️ Gaming (the more byzantine, the better, e.g. Dwarf Fortress, Factorio)
  • 📷 Photography (I was particularly prolific when I studied agriculture, and there were plenty of canola fields...)

Connect

I would love to hear from you! I'm always happy to discuss my experiences and to hear more about any opportunities.

LinkedIn

Languages

GNU Bash PostgreSQL R Python Nextflow

Software

dbt Tidyverse Slurm Docker

Pinned

  1. llrnaseq Public

    SpikyClip/llrnaseq is a simple RNA-seq pipeline adapted to the La Trobe Institute for Molecular Science (LIMS) High Performance Computing Cluster (HPCC).

    Nextflow 2

  2. llrnaseq-rna-features-pipeline Public

    This README explains how to use the Nextflow llrnaseq pipeline in conjunction with the rna-features Python package to generate transfer-learning expression features.

    R 1

  3. rna-features Public

    `rna-features` is a package used to generate machine-learning features from RNA-seq data.

    Python 1 1

  4. rosalind-solutions Public

    Repository for my solutions to Rosalind problems.

    Python 1

  5. advent-of-code Public

    Solutions to problems on Advent of Code.

    Python