Session 7: Practice 1

Time: 2h
Date: Wednesday, Feb-19th-2020
Goals:
- Creating our first Class Module for working with DNA sequences

Introduction

The goal of this practice is to develop the Seq Class for working with DNA sequences. This class will live in a module called Seq1 (the python file should be called Seq1.py). It is quite similar to Seq0, developed in the previous practice, but uses an Object Oriented approach. We will also include several improvements

These are all the normal methods that should be implemented in the Seq Class

Method	Parameters	Return value	Description
len	None	Integer	Calculate the total number of bases in the sequence
count_base(base)	base: character	Integer	Calculate the number of times the given base appears in the sequence
count	None	Dictionary	Calculate the number of times each base appears in the sequence. A dictionary with the results is returned. The keys are the bases and the values are their counts
reverse	None	String	Return the reverse sequence
complement	None	String	Return the complement sequence
read_fasta(filename)	filename: string	String	Open a DNA file in FASTA format and return the sequence as a string (it should only contain the characters 'A', 'T', 'G' or 'C')
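
As a rough illustration of what read_fasta has to do, here is a minimal sketch. It is written as a standalone function for brevity (in the Seq Class it would be a method), and it assumes that the file holds a single sequence whose first line is the header starting with '>'. The demo file name, header and bases are made up for the example

```python
from pathlib import Path
import tempfile


def read_fasta(filename):
    """Return the bases stored in a FASTA file as a single string."""
    lines = Path(filename).read_text().splitlines()
    # Assumption: the first line is the '>' header and the rest is sequence data
    body = lines[1:]
    # Join the remaining lines, dropping line breaks and surrounding spaces
    return "".join(line.strip() for line in body)


# Demo with a throwaway file (made-up content, not one of the course genes)
with tempfile.TemporaryDirectory() as folder:
    demo_file = Path(folder) / "demo.fa"
    demo_file.write_text(">demo made-up header\nACTG\nTTGA\n")
    demo_seq = read_fasta(demo_file)

print(demo_seq)
```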

In addition, the Class Seq will have the special methods that we already know: __init__() for initializing the object and __str__() for printing the object as a sequence

Finish your previous practices

Before starting Practice 1, spend time finishing the previous practices. This practice is based on this previous work

Exercises

We will develop the Seq class incrementally, starting from your work in Session 6

Exercise 1: Creating the Seq1 module

Create a new python file called Seq1.py in the P1 folder. This file is our module, which we will import from our exercises. Remember that, to do so, you first have to mark the P1 folder as Sources Root

Copy the Seq Class that you have already developed in the exercises of Session 6 into the Seq1.py file

Filename: P1/Seq1.py
Description: This is the file where the Seq Class for working with DNA sequences is stored. It is our Seq1 module

The goal of this first exercise is to make sure that you can access the Seq class from external files

Filename: P1/Ex1.py
Description: Write a python program that creates an object with the sequence "ACTGA" and prints its length and the sequence itself. The output should be like this:

-----| Exercise 1 |------
New sequence created!
Sequence 1: (Length: 5) ACTGA

Process finished with exit code 0

Considerations: The first thing you have to do is to import the Seq Class from the Seq1 module

from Seq1 import Seq
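
A possible shape for Ex1.py is sketched below. To keep the example runnable on its own, a minimal stand-in for the Seq Class is defined inline. In your solution that class lives in Seq1.py and is brought in with the import shown above. The body of the stand-in is an assumption based on the expected output, not the required implementation

```python
class Seq:
    """Minimal stand-in for the Seq Class that would live in Seq1.py."""

    def __init__(self, strbases):
        self.strbases = strbases
        print("New sequence created!")

    def __str__(self):
        # Printing the object shows its bases
        return self.strbases

    def len(self):
        # Inside the method, len() still refers to the built-in function
        return len(self.strbases)


print("-----| Exercise 1 |------")
s1 = Seq("ACTGA")
print(f"Sequence 1: (Length: {s1.len()}) {s1}")
```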

Exercise 2: Null sequences

We will manage three types of sequences: Valid, Invalid and Null:

Null: Empty sequence "". It has no bases
Valid: A sequence composed only of the four valid bases: 'A', 'T', 'C', 'G'. Example: "ATTACG"
Invalid: A sequence that has one or more characters that are not valid bases. Example: "ATTXXG"

In this exercise we will implement the Null sequences

The null sequences are created by calling the Seq() class with no arguments:

# -- Creating a Null sequence
s = Seq()
# -- Creating a valid sequence
s = Seq("TATAC")

The difference between the creation of the previous two objects is that the first one has no arguments when calling Seq, and the second one has one. This means that the argument passed to the __init__() method is optional

For creating Null sequences the definition of the __init__() method should be like this:

def __init__(self, strbases="NULL"):

This is how python declares optional arguments. If no argument is given, python automatically uses the default value "NULL". This is the value we will use to identify the null sequences

When a Null sequence is created, the __init__() method will print the message: "NULL Seq created"

Filename: P1/Ex2.py
Description: Write a python program that first creates a null sequence and then a valid sequence. It should print both objects. The output of the program should be:

-----| Practice 1, Exercise 2 |------
NULL Seq created
New sequence created!
Sequence 1: NULL
Sequence 2: ACTGA

Considerations: The first thing you should do in the __init__() method is to check if it is a null sequence. If so, print the message on the console, assign the value to the self.strbases attribute and return. If it is not null, continue with the other checks
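
The null check described above could look roughly like this. The sketch only covers the Null case: the validity checks are left as a placeholder comment, and the class body is an assumption derived from the expected output

```python
class Seq:
    """Sketch of the Null-sequence handling only (validity checks left out)."""

    def __init__(self, strbases="NULL"):
        self.strbases = strbases
        # Check the null case first and return early
        if strbases == "NULL":
            print("NULL Seq created")
            return
        # ... the other checks would go here ...
        print("New sequence created!")

    def __str__(self):
        return self.strbases


print("-----| Practice 1, Exercise 2 |------")
s1 = Seq()          # No argument: the default value "NULL" is used
s2 = Seq("ACTGA")   # One argument: a regular sequence
print(f"Sequence 1: {s1}")
print(f"Sequence 2: {s2}")
```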

Exercise 3: Null, valid and invalid sequences

In this exercise we will make sure that our Seq class works correctly with the three types of sequences. We will create these three sequences:

# -- Create a Null sequence
s1 = Seq()

# -- Create a valid sequence
s2 = Seq("ACTGA")

# -- Create an invalid sequence
s3 = Seq("Invalid sequence")

Filename: P1/Ex3.py
Description: Write a python program that creates three sequences: null, valid and invalid. Then it prints the objects in the console. This is what we should see on the console:

-----| Practice 1, Exercise 3 |------
NULL Seq created
New sequence created!
INVALID Seq!
Sequence 1: NULL
Sequence 2: ACTGA
Sequence 3: ERROR
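
One way to handle the three cases is sketched below. Storing the string "ERROR" in self.strbases for invalid sequences is an assumption inferred from the expected output, and VALID_BASES is a hypothetical name chosen for this example

```python
class Seq:
    """Sketch of a Seq Class that tells Null, valid and invalid sequences apart."""

    VALID_BASES = "ACGT"  # hypothetical class attribute, used only in this sketch

    def __init__(self, strbases="NULL"):
        # 1. Null sequence: nothing else to check
        if strbases == "NULL":
            self.strbases = "NULL"
            print("NULL Seq created")
            return
        # 2. Invalid sequence: at least one character is not a valid base
        if not all(base in self.VALID_BASES for base in strbases):
            self.strbases = "ERROR"
            print("INVALID Seq!")
            return
        # 3. Valid sequence
        self.strbases = strbases
        print("New sequence created!")

    def __str__(self):
        return self.strbases


print("-----| Practice 1, Exercise 3 |------")
s1 = Seq()
s2 = Seq("ACTGA")
s3 = Seq("Invalid sequence")
for number, seq in enumerate([s1, s2, s3], start=1):
    print(f"Sequence {number}: {seq}")
```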

Exercise 4: seq_count_base()

Implement the seq_count_base(seq, base) function, which calculates the number of times the given base appears in the sequence. It should be written in the Seq0.py file

Filename: P0/Ex4.py
Description: Write a python program for calculating the number of times each base appears in each of the five genes
Output: This is what should be seen on the console after the execution:

-----| Exercise 4 |------

Gene U5:
  A: 360
  C: 229
  T: 491
  G: 234

Gene ADA:
  A: 7446
  C: 9011
  T: 8394
  G: 9061

Gene FRAT1:
  A: 746
  C: 1138
  T: 823
  G: 1138

Gene FXN:
  A: 6246
  C: 6422
  T: 6652
  G: 6295

Gene U5:
  A: 360
  C: 229
  T: 491
  G: 234

Process finished with exit code 0

Considerations:
- Create a list with the four bases
- For every gene, iterate over the bases, printing its number
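
The two considerations above can be sketched as follows. To stay self-contained, the example does not read the gene files: it uses the 20-base U5 fragment that appears in the expected output of Exercise 6, and relies on the built-in str.count() method, which is one possible way to count a base

```python
def seq_count_base(seq, base):
    """Return how many times the given base appears in the sequence."""
    return seq.count(base)


# List with the four bases, in the order used by the expected output
bases = ["A", "C", "T", "G"]

# Short fragment used instead of a full gene, to keep the sketch self-contained
fragment = "ATAGACCAAACATGAGAGGC"

print("Fragment:")
for base in bases:
    print(f"  {base}: {seq_count_base(fragment, base)}")
```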

Exercise 5: seq_count()

Implement the seq_count(seq) function, which calculates the number of times each base appears in the sequence. It returns a dictionary with all the information. The keys of the dictionary are the bases: 'A', 'T', 'C' and 'G'. It should be written in the Seq0.py file

Filename: P0/Ex5.py
Description: Write a python program for calculating the number of times each base appears in each of the five genes. It is similar to exercise 4, but what is printed on the console is the dictionary returned by the seq_count() function
Output: This is what should be seen on the console after the execution:

-----| Exercise 5 |------
Gene U5: {'A': 360, 'T': 491, 'C': 229, 'G': 234}
Gene ADA: {'A': 7446, 'T': 8394, 'C': 9011, 'G': 9061}
Gene FRAT1: {'A': 746, 'T': 823, 'C': 1138, 'G': 1138}
Gene FXN: {'A': 6246, 'T': 6652, 'C': 6422, 'G': 6295}
Gene U5: {'A': 360, 'T': 491, 'C': 229, 'G': 234}

Process finished with exit code 0

Considerations:
- A dictionary in python is created using the curly brackets {} and placing inside the pairs key:value
```
d = {'A':0, 'T':0, 'C':0, 'G':0}
```
- In this example the d dictionary is created. It has the four bases as keys, and all of them are initialized to 0. You can access any value by writing the key inside the brackets:
```
d['A']  # -- Accessing the value of the 'A' base
```
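
Building on the dictionary above, a minimal sketch of seq_count() could look like this. Ignoring characters that are not one of the four bases is an assumption made for this example

```python
def seq_count(seq):
    """Return a dictionary with the number of times each base appears."""
    d = {'A': 0, 'T': 0, 'C': 0, 'G': 0}
    for base in seq:
        # Assumption: characters that are not valid bases are skipped
        if base in d:
            d[base] += 1
    return d


print(seq_count("ATTCG"))
```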

Exercise 6: seq_reverse()

Implement the seq_reverse(seq) function, which calculates the reverse of the given sequence. Imagine we have this sequence: "ATTCG". Its reverse is: "GCTTA". It should be written in the Seq0.py file

Filename: P0/Ex6.py
Description: Write a python program for creating a new fragment composed of the first 20 bases of the U5 gene. This fragment should be printed on the console. Then calculate the reverse of this fragment by calling the seq_reverse() function. Finally print it on the console
Output: This is what should be seen on the console after the execution:

------| Exercise 6 |------
Gene U5:
Frag: ATAGACCAAACATGAGAGGC
Rev : CGGAGAGTACAAACCAGATA

Process finished with exit code 0

Considerations:
- The reverse of any string can be easily calculated by using the brackets [] with the correct values inside
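
The bracket notation mentioned above is python slicing. A minimal sketch, using a step of -1 to walk the string backwards and the fragment from the expected output as test data:

```python
def seq_reverse(seq):
    """Return the reverse of the given sequence."""
    # [::-1] means: whole string, stepping backwards one character at a time
    return seq[::-1]


frag = "ATAGACCAAACATGAGAGGC"
print(f"Frag: {frag}")
print(f"Rev : {seq_reverse(frag)}")
```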

Exercise 7: seq_complement()

Implement the seq_complement(seq) function, which calculates a new sequence composed of the complement of each of the original bases. The bases work in pairs: A and T are complementary, as are C and G. Therefore, the complement sequence of "ATTCG" is "TAAGC". It should be written in the Seq0.py file

Filename: P0/Ex7.py
Description: Write a python program for creating a new fragment composed of the first 20 bases of the U5 gene. This fragment should be printed on the console. Then calculate the complement of this fragment by calling the seq_complement() function. Finally print it on the console
Output: This is what should be seen on the console after the execution:

-----| Exercise 7 |------
Gene U5:
Frag: ATAGACCAAACATGAGAGGC
Comp: TATCTGGTTTGTACTCTCCG

Process finished with exit code 0

Considerations:
- It is very easy to calculate the complement of the bases by defining a dictionary that stores the complement of each base
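
The dictionary approach can be sketched as follows. The name COMPLEMENTS is hypothetical, and the sketch assumes that the sequence only contains valid bases (any other character would raise a KeyError)

```python
# Hypothetical lookup table: each base maps to its complement
COMPLEMENTS = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}


def seq_complement(seq):
    """Return the complement of the given sequence."""
    # Assumption: seq contains only the four valid bases
    return "".join(COMPLEMENTS[base] for base in seq)


frag = "ATAGACCAAACATGAGAGGC"
print(f"Frag: {frag}")
print(f"Comp: {seq_complement(frag)}")
```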

Exercise 8: processing the genes

Write a python program that automatically calculates the answer to this question:

Which is the most frequent base in each gene?

Filename: P0/Ex8.py
Output: This is what should be seen on the console after the execution. The X letter is the answer for that gene (The value is not shown in this output, you should calculate it)

-----| Exercise 8 |------
Gene U5: Most frequent Base: X
Gene ADA: Most frequent Base: X
Gene FRAT1: Most frequent Base: X
Gene FXN: Most frequent Base: X
Gene U5: Most frequent Base: X

Process finished with exit code 0
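
One possible way to find the most frequent base is to count every base and ask max() for the key with the largest count. The function name most_frequent_base is hypothetical, and the short fragment is used only so that the sketch runs without the gene files. In case of a tie, this version returns the first base found in the order "ACTG"

```python
def most_frequent_base(seq):
    """Return the base that appears most often in the sequence."""
    # Count each of the four bases
    counts = {base: seq.count(base) for base in "ACTG"}
    # max() with key=counts.get picks the key whose value is the largest
    return max(counts, key=counts.get)


fragment = "ATAGACCAAACATGAGAGGC"
print(f"Fragment: Most frequent Base: {most_frequent_base(fragment)}")
```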

END of the session

The session is finished. Make sure, during this week, that everything in this list is checked!

Author

Juan González-Gómez (Obijuan)

Credits

Alvaro del Castillo. He designed and created the original content of this subject. Thanks a lot :-)

License

Links

