Should I use -a or -g when demultiplexing ONT reads with dual barcodes? #799

Open
ashleyp1 opened this issue Aug 9, 2024 · 3 comments
@ashleyp1
ashleyp1 commented Aug 9, 2024

cutadapt 4.9

I have 16S amplicon reads, sequenced with ONT, that I am trying to demultiplex. Each sample was PCR-barcoded with a 13-base barcode on both ends, so I expect a read to start with a barcode and end with its reverse complement. I put together a FASTA file of all my pairs; some are listed below.

>HL001_FW
ATCCGGTCGGAGA...TCTCCGACCGGAT
>HL002_FW
CTGAGGTGATCAG...CTGATCACCTCAG
>HL003_FW
AGTGTCCTGCTAG...CTAGCAGGACACT
>HL004_FW
ATAAGCAATTCGA...TCGAATTGCTTAT
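
A file like this can be generated rather than typed by hand. The following is only a sketch: it assumes a tab-separated list of sample names and forward barcodes on standard input (a hypothetical input format, not something cutadapt provides), and the helper names revcomp and make_linked_fasta are made up for illustration.

```shell
# Reverse-complement a DNA sequence (uppercase A, C, G, T only).
revcomp() {
    printf '%s\n' "$1" | rev | tr ACGT TGCA
}

# Read "name<TAB>barcode" lines from stdin and print linked-adapter
# FASTA entries of the form BARCODE...REVCOMP(BARCODE).
make_linked_fasta() {
    while IFS="$(printf '\t')" read -r name barcode; do
        printf '>%s\n%s...%s\n' "$name" "$barcode" "$(revcomp "$barcode")"
    done
}

# Example with the first two barcodes from this issue.
printf 'HL001_FW\tATCCGGTCGGAGA\nHL002_FW\tCTGAGGTGATCAG\n' | make_linked_fasta
```

Redirecting the output to barcodes_for_cutadapt.fasta would give a file in the same layout as the entries above.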

The problem I run into is whether to use the -a or the -g flag. In the documentation, the two are used almost interchangeably for linked adapters, but I get different outputs depending on which one I use, and I'm not sure which is correct. For reference, I used the commands below:

cutadapt -e 1 -a file:barcodes_for_cutadapt.fasta -o trimmed-{name}.fastq.gz reads.fastq.gz

cutadapt -e 1 -g file:barcodes_for_cutadapt.fasta -o trimmed-{name}.fastq.gz reads.fastq.gz

@marcelm
Owner

marcelm commented Aug 11, 2024

The difference between -a and -g for linked adapters lies in which adapters are required to be in the read; see https://cutadapt.readthedocs.io/en/stable/guide.html#linked-override.

For -g, both adapters are required. For -a, only anchored adapters are required; non-anchored adapters are optional.

The distinction between required and optional matters only for linked adapters (the ones with the ... in the middle) and determines what happens when one of the constituent adapters is not found.

The rules are like this:

  • If an adapter is required, but not found in the read, the read is not trimmed, even if the other adapter was found.
  • If an adapter is optional and not found in the read, the other adapter may still be trimmed from the read (if found).
  • Anchored adapters are always considered required. (Irrelevant here because you don’t use anchored adapters.)

So if you know your reads are long enough that you should see both barcodes, or if you want to ensure that only full-length sequences end up in your demultiplexed output, use -g. If you want to be less strict, use -a.
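
To compare the two modes side by side, the commands from the original post can be run into separate output directories. This is a sketch only: the demux helper, the DRY_RUN switch, and the directory names strict and lenient are invented for illustration; the cutadapt options are the ones already used in this thread.

```shell
# Hypothetical helper: run the same demultiplexing with either -g or -a.
# With DRY_RUN set, the command is printed instead of executed.
demux() {
    flag="$1"      # -g: both barcodes required; -a: non-anchored barcodes optional
    outdir="$2"    # output directory (a name chosen for this sketch)
    set -- cutadapt -e 1 "$flag" file:barcodes_for_cutadapt.fasta \
        -o "$outdir/trimmed-{name}.fastq.gz" reads.fastq.gz
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        mkdir -p "$outdir" && "$@"
    fi
}

# Print the strict and the lenient command without running them.
DRY_RUN=1 demux -g strict
DRY_RUN=1 demux -a lenient
```

Dropping DRY_RUN=1 runs cutadapt for real. Reads that match no barcode pair should then land in trimmed-unknown.fastq.gz, so comparing the size of that file between the two directories gives a rough idea of how many reads the stricter -g mode rejects.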

(You could also make the first adapter required and leave the second one optional by writing this in the FASTA file: ATAAGCAATTCGA;required...TCGAATTGCTTAT.)
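
If that middle ground is wanted for every barcode pair, the ;required parameter can be added to an existing FASTA file with sed. A minimal sketch, assuming each sequence line has the form FIRST...SECOND with no parameters yet; mark_first_required is a made-up helper name.

```shell
# Append ;required to the first adapter of every linked entry.
# Header lines (starting with >) are left untouched, as are
# sequence lines whose first adapter already carries a parameter.
mark_first_required() {
    sed -E '/^>/! s/^([^.;]+)\.\.\./\1;required.../'
}

# Example with one entry from this issue.
printf '>HL004_FW\nATAAGCAATTCGA...TCGAATTGCTTAT\n' | mark_first_required
```

The rewritten file would then be passed with -a, so that the second barcode stays optional.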

@ashleyp1
Author

Thanks for the quick answer! That definitely clears things up for me.

I have a follow-up question, though, after reading through the documentation more. When demultiplexing, does cutadapt require the complete barcode to be present for it to count? For example, with the barcode BARCODE, would it identify and trim BARCODEsequence but not CODEsequence? Basically, I want to make sure that I only keep reads with a complete barcode.

@marcelm
Owner

marcelm commented Aug 18, 2024

To require the full barcode to be present, use an anchored adapter. You can either add the ^ to each sequence in the FASTA file:

>HL001_FW
^ATCCGGTCGGAGA...TCTCCGACCGGAT

or, as a shortcut, add the ^ before file:, like so: cutadapt -a ^file:barcodes_for_cutadapt.fasta.
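
For the first option, the ^ does not have to be added by hand. The following sketch prefixes every sequence line of the FASTA file with ^ unless it already has one; anchor_first_adapter is a made-up helper name. Keep in mind that an anchored barcode has to begin at the very first base of the read.

```shell
# Prefix each sequence line with ^ to anchor the first adapter.
# Header lines (starting with >) and already anchored lines are unchanged.
anchor_first_adapter() {
    sed -E '/^>/! s/^([^^])/^\1/'
}

# Example with one entry from this issue.
printf '>HL001_FW\nATCCGGTCGGAGA...TCTCCGACCGGAT\n' | anchor_first_adapter
```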
