Heterogeneity Spacers and Primers #792

Open
luigallucci opened this issue Jul 2, 2024 · 5 comments

@luigallucci

Hi @marcelm,

thank you for developing Cutadapt.
I'm currently using the latest stable version with Python 3.10.

I'm dealing with these heterogeneity spacers+primers:

Bacterial region V3-V4:
341F (5´-CCTACGGGNGGCWGCAG-3´)
341Fb (5´-TCCTACGGGNGGCWGCAG-3´)
341Fc (5´-ATCCTACGGGNGGCWGCAG-3´)
341Fd (5´-TGTCCTACGGGNGGCWGCAG-3´)
785R (5´-GACTACHVGGGTATCTAATCC-3´)

Is there a recommended way to handle this?
I was thinking of putting these primers in a file (as for demultiplexing). The problem is that they all represent the same primer and are used equally on the same samples, so I don't need four separate outputs, one per variant. If I set up the file like this:

341F
...
341Fb
...
...I expect this would produce a separate output for each primer variant, so I'm fairly sure it is not the right approach.

Do you have suggestions?

@marcelm
Owner

marcelm commented Aug 10, 2024

Hi, in case it is still relevant: You can provide multiple primers/adapters with the same name. So something like this:

>341F
(sequence of 341F)
>341F
(sequence of 341Fb)

Then demultiplexing will send them to the same file.
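
For anyone who wants to generate such a file rather than type it by hand, here is a minimal sketch that builds FASTA text in which all four 341F variants from this thread share one record name. The helper name `shared_name_fasta` is made up for illustration and is not part of Cutadapt.

```python
# Primer variants listed in this thread (spacer + primer).
VARIANTS = [
    "CCTACGGGNGGCWGCAG",     # 341F
    "TCCTACGGGNGGCWGCAG",    # 341Fb
    "ATCCTACGGGNGGCWGCAG",   # 341Fc
    "TGTCCTACGGGNGGCWGCAG",  # 341Fd
]


def shared_name_fasta(name, sequences):
    """Return FASTA text in which every sequence gets the same header.

    Hypothetical helper for illustration only.
    """
    return "".join(f">{name}\n{seq}\n" for seq in sequences)


fasta_text = shared_name_fasta("341F", VARIANTS)
print(fasta_text, end="")
```

The printed text can be saved to a file and given to Cutadapt as the adapter file; because every record is named 341F, reads matching any variant should end up in the same output, as described above.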

Please leave the issue open as I would like to document this.

@luigallucci
Author

Hi @marcelm, still useful, thank you!

Just a quick question, which may also help others in the future: if I provide only the first sequence (341F) and not the others, can we assume the spacers are removed anyway? The structure is 'spacer+primer', where 341F is the primer alone and the others (b, c, d) are 'spacer+primer'. I was assuming that everything before the primer is removed.

@marcelm
Owner

marcelm commented Nov 13, 2024

Getting back to this: I don’t understand the last question. If still relevant, can you re-phrase?

@luigallucci
Author

Hi, @marcelm

Kind of, but I will rephrase anyway to clarify.

Suppose I have heterogeneity spacers+primers in my paired-end data, like:

Bacterial region V3-V4:
341F (5´-CCTACGGGNGGCWGCAG-3´)
341Fb (5´-TCCTACGGGNGGCWGCAG-3´)
341Fc (5´-ATCCTACGGGNGGCWGCAG-3´)
341Fd (5´-TGTCCTACGGGNGGCWGCAG-3´)
785R (5´-GACTACHVGGGTATCTAATCC-3´)

If I provide only 341F (the original primer, without any spacers) and 785R as input, does Cutadapt trim everything found before the primer, including the spacers, even though it was given only the original primers? Or do the spacers remain, with only the primer removed?

So, can we consider this a valid removal approach, or do I need a separate file listing all variants, as you suggested in your previous response, to be sure that the spacers are also removed?

>341F
(sequence of 341F)
>341F
(sequence of 341Fb)
>341F
(sequence of 341Fc)
>341F
(sequence of 341Fd)

@marcelm
Owner

marcelm commented Nov 13, 2024

> If in a situation in which I have heterogeneity spacers+primers in my paired-end data, like:
>
> Bacterial region V3-V4:
> 341F (5´-CCTACGGGNGGCWGCAG-3´)
> 341Fb (5´-TCCTACGGGNGGCWGCAG-3´)
> 341Fc (5´-ATCCTACGGGNGGCWGCAG-3´)
> 341Fd (5´-TGTCCTACGGGNGGCWGCAG-3´)
> 785R (5´-GACTACHVGGGTATCTAATCC-3´)

Let me reformat this to make it clearer what is going on:

341F       CCTACGGGNGGCWGCAG
341Fb    T-CCTACGGGNGGCWGCAG
341Fc   AT-CCTACGGGNGGCWGCAG
341Fd  TGT-CCTACGGGNGGCWGCAG

785R  GACTACHVGGGTATCTAATCC

> If I provide only 341F (the original primer, without any spacers) and 785R as input, does Cutadapt trim everything found before the primer, including the spacers, even though it was given only the original primers? Or do the spacers remain, with only the primer removed?

If you provide a sequence with -g SEQUENCE, Cutadapt considers this to be a 5' adapter (actually a primer in this case), and it uses these rules:

  • The sequence may appear anywhere within the read. That is, there can be any number of bases before it.
  • When trimming, the sequence itself and anything preceding it is removed from the read.
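
To make the two rules concrete, here is a small Python sketch that imitates them. It is a simplification and not Cutadapt's actual algorithm: it only accepts exact matches (IUPAC codes in the primer are expanded, but there is no error tolerance and no partial matching), and the example reads are made up.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a primer containing IUPAC codes into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer))


def trim_5prime(read, primer):
    """Remove the first exact primer match and everything before it.

    The read is returned unchanged when the primer is not found.
    """
    match = primer_regex(primer).search(read)
    return read[match.end():] if match else read


PRIMER_341F = "CCTACGGGNGGCWGCAG"
PRIMER_IN_READ = "CCTACGGGAGGCAGCAG"  # one concrete instance of the primer
INSERT = "TGGGGAATATTG"               # made-up amplicon sequence

# Spacers of 341F, 341Fb, 341Fc and 341Fd, respectively.
for spacer in ("", "T", "AT", "TGT"):
    read = spacer + PRIMER_IN_READ + INSERT
    print(trim_5prime(read, PRIMER_341F))
```

In this simplified model, all four reads give the same trimmed result, because the spacer is simply part of "anything preceding" the primer.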

> So, can we consider this a valid removal approach, or do I need a separate file listing all variants, as you suggested in your previous response, to be sure that the spacers are also removed?

Yes, this is fine. Just provide the 341F sequence without the heterogeneity spacers in order to automatically remove the primer and heterogeneity spacers in one go.
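
For paired-end data, a call along these lines is one way this could look. This is only a sketch: the file names are placeholders, and it assumes the forward primer is at the start of R1 and the reverse primer at the start of R2. The script only builds and prints the command line; it does not run Cutadapt.

```python
import shlex

FWD_PRIMER = "CCTACGGGNGGCWGCAG"      # 341F, without heterogeneity spacer
REV_PRIMER = "GACTACHVGGGTATCTAATCC"  # 785R


def cutadapt_command(r1_in, r2_in, r1_out, r2_out):
    """Build a paired-end Cutadapt argument list (file names are placeholders)."""
    return [
        "cutadapt",
        "-g", FWD_PRIMER,  # 5' primer searched for in R1
        "-G", REV_PRIMER,  # 5' primer searched for in R2
        "-o", r1_out,      # trimmed R1 output
        "-p", r2_out,      # trimmed R2 output
        r1_in, r2_in,
    ]


cmd = cutadapt_command(
    "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz",
)
print(shlex.join(cmd))
```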

One consideration that could be relevant is that, as I mentioned, the primer can appear anywhere within the read. If you want to be a bit more specific, you could require that at most a certain number of bases appear before the primer. You can use a non-internal 5' adapter for this. It would look like this: -g XN{3}CCTACGGGNGGCWGCAG;o=17 where the 3 is the maximum number of nucleotides you allow before the primer, and o=17 is used to ensure that you see the full primer sequence. (But it probably does not make a big difference – if any.)
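
The effect of limiting the number of bases before the primer can be imitated with the following sketch. Again, this is a simplified model and not Cutadapt's implementation: it requires the full primer to match exactly (similar in spirit to requiring an overlap of 17) and allows at most `max_prefix` bases in front of it. The function name and the example reads are made up.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def trim_near_start(read, primer, max_prefix=3):
    """Trim the primer only if at most max_prefix bases precede it.

    Exact matching only; the read is returned unchanged otherwise.
    """
    primer_pattern = "".join(IUPAC[base] for base in primer)
    # re.match anchors at the start of the read; the prefix is kept short
    # by the lazy quantifier.
    pattern = re.compile("[ACGTN]{0,%d}?%s" % (max_prefix, primer_pattern))
    match = pattern.match(read)
    return read[match.end():] if match else read


PRIMER_341F = "CCTACGGGNGGCWGCAG"
PRIMER_IN_READ = "CCTACGGGAGGCAGCAG"  # one concrete instance of the primer
INSERT = "ACGTACGT"                   # made-up amplicon sequence

# Three bases before the primer: trimmed.
print(trim_near_start("TGT" + PRIMER_IN_READ + INSERT, PRIMER_341F))
# Four bases before the primer: left untouched in this model.
print(trim_near_start("TGTA" + PRIMER_IN_READ + INSERT, PRIMER_341F))
```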
