
DamLabResources/outerspace


OuterSpace

OuterSpace is a collection of tools for analyzing pooled CRISPR screens, viral barcode population studies, and any other application that requires extracting and counting variable regions in pooled amplicons. It contains tools to extract regions of interest, correct sequencing errors, assess diversity, and compare samples.

Quick Start

Install

#pip install outerspace
pip install git+https://github.com/DamLabResources/outerspace.git

Design your extraction strategy

outerspace uses the regex library to extract relevant features from a DNA sequence. This allows a simple, expressive, and modular strategy for extracting regions of interest while tolerating mismatches. It supports both short paired-end reads and long reads. See docs/regex_explainer.md for a detailed discussion of how to design your extraction strategy.
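
To illustrate what "tolerating mismatches" means in practice, here is a minimal, dependency-free sketch. It is not outerspace's implementation: it swaps the fuzzy regex for a plain Hamming-distance scan so that it runs with the standard library alone. In the regex library, the same tolerance would be written with a fuzzy quantifier such as `(?:CACCG){s<=1}`. The flank `CACCG`, the read, and the function names are hypothetical and chosen only for this example.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def extract_after_flank(read: str, flank: str, length: int,
                        max_mismatches: int = 1) -> Optional[str]:
    """Return the `length` bases following the first window that matches
    `flank` with at most `max_mismatches` substitutions, or None."""
    last_start = len(read) - len(flank) - length
    for start in range(last_start + 1):
        window = read[start:start + len(flank)]
        if hamming(window, flank) <= max_mismatches:
            begin = start + len(flank)
            return read[begin:begin + length]
    return None


# Hypothetical read: the flank CACCG was sequenced as CACGG (one substitution).
read = "AAATCACGGGATTACAGTTTT"
tolerant = extract_after_flank(read, "CACCG", length=8, max_mismatches=1)
exact = extract_after_flank(read, "CACCG", length=8, max_mismatches=0)
print(tolerant)  # GATTACAG
print(exact)     # None
```

Allowing one substitution recovers the region of interest from a read that an exact match would discard.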

Create your config file

If you will be repeating similar experiments often, outerspace lets you encapsulate that information in a TOML file accepted by all commands. This ensures repeatability between analyses and can drastically simplify command-line execution. It also facilitates reproducible science, as the config can be stored, shared, and tracked.

See the walkthrough for a more detailed discussion on creating your config file.

Process Your Data

For most analyses, you can use the pipeline command to process all your data in one step:

# Create output directory
mkdir -p results

# Run the pipeline
outerspace pipeline config.toml \
    --input-dir fastq_files \
    --output-dir results \
    --barcode-columns UMI_5prime,UMI_3prime \
    --key-column protospacer \
    --mismatches 2 \
    --method directional \
    --metrics

This will:

  1. Process all FASTQ files in your input directory
  2. Extract sequences using your config patterns
  3. Correct barcodes using UMI-tools clustering
  4. Count unique barcodes per protospacer
  5. Generate metrics files for quality control
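
Steps 3 and 4 can be sketched in a few lines of plain Python. This is a simplified, greedy illustration of the idea behind the directional method in UMI-tools (a barcode absorbs a neighbour within the mismatch limit when its count is at least twice the neighbour's count minus one). It is not the code outerspace or UMI-tools actually runs. Clustering separately within each protospacer, the function names, and the toy data are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def hamming(a: str, b: str) -> int:
    """Number of differing positions (barcodes assumed equal length)."""
    return sum(x != y for x, y in zip(a, b))


def directional_clusters(counts, max_mismatches=1):
    """Map each barcode to the seed barcode of its cluster.

    Seeds are visited from most to least abundant. A parent absorbs an
    unassigned child if they are within `max_mismatches` and
    count(parent) >= 2 * count(child) - 1.
    """
    order = sorted(counts, key=lambda b: (-counts[b], b))
    assigned = {}
    for seed in order:
        if seed in assigned:
            continue
        assigned[seed] = seed
        stack = [seed]
        while stack:
            parent = stack.pop()
            for child in order:
                if child in assigned:
                    continue
                close = hamming(parent, child) <= max_mismatches
                if close and counts[parent] >= 2 * counts[child] - 1:
                    assigned[child] = seed
                    stack.append(child)
    return assigned


def unique_barcodes_per_key(rows, max_mismatches=1):
    """Count corrected (clustered) barcodes for each key, e.g. protospacer."""
    by_key = defaultdict(Counter)
    for key, barcode in rows:
        by_key[key][barcode] += 1
    return {
        key: len(set(directional_clusters(counts, max_mismatches).values()))
        for key, counts in by_key.items()
    }


# Toy (protospacer, UMI) pairs; AAAT is treated as an error copy of AAAA.
rows = (
    [("GATTACAG", "AAAA")] * 5
    + [("GATTACAG", "AAAT")]
    + [("GATTACAG", "CCCC")] * 3
    + [("TTGACCAA", "GGGG")] * 2
)
summary = unique_barcodes_per_key(rows)
print(summary)  # {'GATTACAG': 2, 'TTGACCAA': 1}
```

Without correction, the first protospacer would be credited with three barcodes instead of two, which is the kind of inflation the clustering step is meant to prevent.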

For more detailed instructions, including how to run individual commands and perform additional analyses, see the walkthrough.

To run your tasks in parallel or on a cluster, consider using our Snakemake wrappers.

Copyright (C) 2025, SCB, DVK PhD, RB, WND PhD. All rights reserved.

About

A python tool for extracting short sequences from NGS data with fuzzy regular expressions.
