Session 6: Object Oriented Programming

- Time: 2h
- Date: Tuesday, Feb-18th-2020
- Goals: *
- Introduction
- Working with sequences and functions
- Modelling the sequences with object oriented programming
- TODO
- Exercises
- End of the session
- Author
- Credits
- License
All the Python programs that you have developed so far follow the classical procedural programming paradigm. The main idea is to use functions that can be called at any point during the program's execution
The problem you have to solve is divided into smaller parts, which are implemented using functions, data structures and variables
In contrast, the object oriented paradigm models problems by defining objects that interact with each other. Each object has attributes and methods. The main advantages are the encapsulation and the reusability of the code
In the previous session we developed our own library of functions (Seq0) for working with sequences. As an example let's use that library for performing some simple calculations on the sequence "ATTCCCGGGG". Save this example in the folder Session-06 and call it test-01.py
from Seq0 import *
seq1 = "ATTCCCGGGG"
print(f"Seq: {seq1}")
print(f" Rev : {seq_reverse(seq1)}")
print(f" Comp: {seq_complement(seq1)}")
print(f" Length: {seq_len(seq1)}")
print(f" A: {seq_count_base(seq1, 'A')}")
print(f" T: {seq_count_base(seq1, 'T')}")
print(f" C: {seq_count_base(seq1, 'C')}")
print(f" G: {seq_count_base(seq1, 'G')}")
This is what you get in the console when the program is executed:
Seq: ATTCCCGGGG
Rev : GGGGCCCTTA
Comp: TAAGGGCCCC
Length: 10
A: 1
T: 2
C: 3
G: 4
This paradigm is based on defining the data (variables) on one hand, and creating separate functions for working with that data on the other. When calling a function you must pass the data as parameters. The data and the functions are separate things
Imagine that now we define a new sequence, but we make a mistake:
from Seq0 import *
# -- This sequence is invalid, as it has characters
# -- different from the 4 bases: A, T, C, G
seq1 = "ATTXMMNXCCCGGGG"
print(f"Seq: {seq1}")
print(f" Length: {seq_len(seq1)}")
print(f" A: {seq_count_base(seq1, 'A')}")
print(f" T: {seq_count_base(seq1, 'T')}")
print(f" C: {seq_count_base(seq1, 'C')}")
print(f" G: {seq_count_base(seq1, 'G')}")
# -- But this program works normally. No error is detected
How could you solve that problem? How could you guarantee that the sequence introduced is valid?
One solution is adding a new function for checking that a given sequence is ok. Something like this:
...
seq1 = "ATTXMMNXCCCGGGG"
# -- Check that the sequence is valid
seq_check(seq1)
...
That works, but what happens if some programmers do not call this checking function? You cannot ensure that it is going to work in all cases. It depends on the people using it. Some may call the seq_check() function, but others will not.
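A minimal sketch of what such a checking function might look like. The name seq_check comes from the snippet above, but its body and its True/False return value are assumptions made for illustration; the Seq0 library does not necessarily define it this way.

```python
# Hypothetical implementation: seq_check is not necessarily part of Seq0
VALID_BASES = "ACGT"


def seq_check(seq):
    """Return True if every character of seq is one of the 4 bases"""
    return all(base in VALID_BASES for base in seq)


# -- The caller is still responsible for calling the function
print(seq_check("ATTCCCGGGG"))       # valid sequence
print(seq_check("ATTXMMNXCCCGGGG"))  # contains invalid characters
```

Notice that nothing forces the programmer to call seq_check() before using the other functions, which is exactly the weakness described above.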
Is it possible to have a better model for organizing the data and the functions?
Yes! There are better models. One of them is Object Oriented Programming
In this model, the data and the functions are grouped together into what is called an object. They are no longer separated. You work with objects. Every object has a well-defined set of actions that you can perform on it. These actions are called methods
In order to learn about this new paradigm, let's model the DNA sequences with it
We will think about the sequences as objects. These objects can have some properties, like their name, the chromosome to which they belong or any other information. We also refer to these properties as the object's attributes
These objects also have some methods: different actions that can be performed on them, such as calculating their length, the number of a certain base, their complement, and so on
We will learn this model by defining sequence objects from scratch
A class is the template we use for creating objects. Inside the class we define all the methods of the objects of that class, and we program their behaviour. Let's create a minimum class for working with sequences. We start by defining an empty class:
class Seq:
    """A class for representing sequences"""
    pass
When the class is defined, we can create objects of this class as shown here:
# Main program
# Create an object of the class Seq
s1 = Seq()
# Create another object of the Class Seq
s2 = Seq()
Let's place a breakpoint on line 7

Press the step over option twice. On the variable panel we will see the two new objects created: s1 and s2. Notice that they are of type Seq

Congrats! You have created your first two empty objects!
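What the debugger shows can also be checked from the program itself. This is only a small sketch that uses the built-in type() function and the is operator; it is not part of the original example.

```python
class Seq:
    """A class for representing sequences"""
    pass


s1 = Seq()
s2 = Seq()

# -- Both objects are of type Seq...
print(type(s1).__name__)
print(type(s2).__name__)

# -- ...but they are two different objects
print(s1 is s2)
```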
The methods are the actions that the objects can perform. The first method we are implementing is the initialization method. It is a special method that is called every time a new object is created. All the methods have the special parameter self as the first parameter
class Seq:
    """A class for representing sequences"""
    def __init__(self):
        print("New sequence created!")
# Main program
# Create an object of the class Seq
s1 = Seq()
s2 = Seq()
print("Testing...")
Run the program. When the s1 object is created, the string "New sequence created!" is printed. The same happens when the s2 object is created. The output of the program is:
New sequence created!
New sequence created!
Testing...
- Execute it step by step, using the step over command
- Execute it step by step, using the step into command every time. Check that the debugger enters the class and executes the __init__ method
For representing a sequence we will use a string that is stored in every object. We will call this string strbases. The data stored in the objects is referred to as attributes
We modify the __init__ method to include a new parameter: the string for creating the object. That parameter will be stored in the object attribute: self.strbases
class Seq:
    """A class for representing sequences"""
    def __init__(self, strbases):
        # Initialize the sequence with the value
        # passed as argument when creating the object
        self.strbases = strbases
        print("New sequence created!")
# Main program
# Create objects of the class Seq
s1 = Seq("AGTACACTGGT")
s2 = Seq("CGTAAC")
If you execute it, you will see the same output as before. But now something has changed. The two objects created have their own sequence string stored in the strbases attribute. Let's debug it to see it

Each object has its own sequence! If you debug it using the step into option, you will see that the sequence is stored in the object when the self.strbases = strbases line is executed
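Besides using the debugger, the attribute of each object can be read directly with the dot notation. The following sketch is a trimmed version of the class above (the print inside __init__ has been left out for brevity):

```python
class Seq:
    """A class for representing sequences"""

    def __init__(self, strbases):
        # Store the string in the object's own attribute
        self.strbases = strbases


s1 = Seq("AGTACACTGGT")
s2 = Seq("CGTAAC")

# -- Each object keeps its own value of the attribute
print(s1.strbases)
print(s2.strbases)
```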
There is another special method, called __str__ that is invoked whenever the object is printed. Printing our objects means that we want to see the sequence on the console
class Seq:
    """A class for representing sequences"""
    def __init__(self, strbases):
        # Initialize the sequence with the value
        # passed as argument when creating the object
        self.strbases = strbases
        print("New sequence created!")
    def __str__(self):
        """Method called when the object is being printed"""
        # -- We just return the string with the sequence
        return self.strbases
# --- Main program
s1 = Seq("AGTACACTGGT")
s2 = Seq("CGTAAC")
# -- Printing the objects
print(f"Sequence 1: {s1}")
print(f"Sequence 2: {s2}")
print("Testing....")
After running it, we will see the following messages on the console:
New sequence created!
New sequence created!
Sequence 1: AGTACACTGGT
Sequence 2: CGTAAC
Testing....
In debug mode, if the step into option is pressed when the next instruction to be executed is the first print, you will see how the execution pointer moves into the __str__ method
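The __str__ method is not only used by print. As a small sketch (using a trimmed version of the class), the built-in str() function and the placeholders of an f-string also end up calling it:

```python
class Seq:
    """A class for representing sequences"""

    def __init__(self, strbases):
        self.strbases = strbases

    def __str__(self):
        """Method called when the object is converted into a string"""
        return self.strbases


s1 = Seq("AGTACACTGGT")

text = str(s1)                # str() calls s1.__str__()
message = f"Sequence: {s1}"   # the f-string placeholder does the same
print(text)
print(message)
```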
Let's add a new method: len() for calculating the length of the sequence. As it is a method, the first parameter must be self. For calculating the length, we will use the len() function (because in this example the type we are using for storing the bases is a string)
class Seq:
    """A class for representing sequences"""
    def __init__(self, strbases):
        # Initialize the sequence with the value
        # passed as argument when creating the object
        self.strbases = strbases
        print("New sequence created!")
    def __str__(self):
        """Method called when the object is being printed"""
        # -- We just return the string with the sequence
        return self.strbases
    def len(self):
        """Calculate the length of the sequence"""
        return len(self.strbases)
# --- Main program
s1 = Seq("AGTACACTGGT")
s2 = Seq("CGTAAC")
# -- Printing the objects
print(f"Sequence 1: {s1}")
print(f" Length: {s1.len()}")
print(f"Sequence 2: {s2}")
print(f" Length: {s2.len()}")
Once the objects have been created, their lengths can be calculated just by calling the len() method. To do so, we write a dot and then the name of the method: s1.len(). This means: "Execute the action for calculating the length on the s1 object"
Run the program. This is what you should see on the console:
New sequence created!
New sequence created!
Sequence 1: AGTACACTGGT
Length: 11
Sequence 2: CGTAAC
Length: 6
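To see what the self parameter is doing, it may help to know that s1.len() is equivalent to calling the method through the class and passing the object explicitly. This is just an illustrative sketch with a trimmed version of the class; in practice the first form is the one normally used.

```python
class Seq:
    """A class for representing sequences"""

    def __init__(self, strbases):
        self.strbases = strbases

    def len(self):
        """Calculate the length of the sequence"""
        # Inside the method, len() refers to the built-in function
        return len(self.strbases)


s1 = Seq("AGTACACTGGT")

# -- Usual way: the object s1 is passed automatically as self
print(s1.len())

# -- Equivalent call: the object is passed explicitly as self
print(Seq.len(s1))
```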
New classes can be derived from others, reusing their methods and adding new ones. This is called inheritance. Just to present the concept, let's create the Gene class derived from the Seq class. It will not add anything
class Gene(Seq):
    """This class is derived from the Seq class
       All the objects of class Gene will inherit
       the methods from the Seq class
    """
    pass
Now we can create objects from the Gene class. These objects will have the same methods as the Seq objects. We say that these methods have been inherited from the Seq class
# --- Main program
s1 = Seq("AGTACACTGGT")
g = Gene("CGTAAC")
# -- Printing the objects
print(f"Sequence 1: {s1}")
print(f" Length: {s1.len()}")
print(f"Gene: {g}")
print(f" Length: {g.len()}")
If we run the program this is what is shown in the console:
New sequence created!
New sequence created!
Sequence 1: AGTACACTGGT
Length: 11
Gene: CGTAAC
Length: 6
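The relationship between the two classes can be checked with the built-in isinstance() function. The sketch below uses a trimmed version of the Seq class and is only meant to illustrate that a Gene object is also a Seq object:

```python
class Seq:
    """A class for representing sequences"""

    def __init__(self, strbases):
        self.strbases = strbases

    def len(self):
        """Calculate the length of the sequence"""
        return len(self.strbases)


class Gene(Seq):
    """Class derived from Seq. It adds nothing of its own"""
    pass


s1 = Seq("AGTACACTGGT")
g = Gene("CGTAAC")

print(isinstance(g, Gene))   # g is a Gene...
print(isinstance(g, Seq))    # ...and therefore also a Seq
print(isinstance(s1, Gene))  # but a plain Seq is not a Gene
print(g.len())               # method inherited from Seq
```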
Let's practice with the Ensembl database. Create the Session-04 folder in your working repository. Store in it all the files created during this session (exercises and sequences)
Get the sequences of the following genes from the Ensembl database:
- RNU6_269P gene, from Chromosome 16 of the human genome
  - Filename: Session-04/RNU6_269P.txt
- FRAT1. Chromosome 10. Human genome
  - Filename: Session-04/FRAT1.txt
- U5. Clown anemonefish
  - Filename: Session-04/U5.txt
- ADA. Human genome. In which chromosome is it found?
  - Filename: Session-04/ADA.txt
- FXN. Cat genome. In which chromosome is it found?
  - Filename: Session-04/FXN.txt
From now on, we will use the Path class from the pathlib module for accessing files. This is a very simple and modern way of working with files and paths
- Filename: Session-04/print_file.py
- Description: This program just opens a text file (for example a DNA file) and prints its contents on the console. The goal is to get familiar with the Path class. This exercise is already solved for you. Just try it and make sure you understand it. No error control is implemented yet
from pathlib import Path
# -- Constant with the name of the file to open
FILENAME = "RNU6_269P.txt"
# -- Open and read the file
file_contents = Path(FILENAME).read_text()
# -- Print the contents on the console
print(file_contents)
The session is finished. Make sure, during this week, that everything in this list is checked!
- You have all the items of session 3 checked!
- Your working repo contains the Session-04 folder with the following files:
  - RNU6_269P.txt
  - FRAT1.txt
  - U5.txt
  - ADA.txt
  - FXN.txt
  - print_file.py
  - head.py
  - body.py
  - sequence.py
- All the previous files have been pushed to your remote GitHub repo
- Juan González-Gómez (Obijuan)

- Alvaro del Castillo. He designed and created the original content of this subject. Thanks a lot :-)
