Programming Python for Bioinformatics
This optional course is offered in the summer term, primarily for FBBB students in the major programmes. The goals of the course are as follows: (1) Introducing students to Python 3, a programming language widely used in scientific applications. (2) Introducing students to methods for biological data analysis (nucleotide and amino acid sequences, protein structures, bibliographic data) that use existing tools and self-developed Python 3 scripts, particularly for sequence homology searches on data sets of variable size. (3) Acquiring the skills needed to automate existing biological data analysis tools, adapted to a given problem and its specifics, by writing simple Python 3 scripts of one's own. (4) Acquiring skills in using Python 3 to process and visualise the results of biological data analysis.
The course (4 ECTS) comprises 15 h of lectures and 30 h of practicals. Classes are taught in 4h modules on Mondays 8-11am. The course is co-supervised by Michał Bukowski and Krzysztof Murzyn. Practicals are taught by Adrian Kania and Michał Bukowski.
Course participants' pages
Course instructors' pages
Literature and supporting information
Python for data analysis: McKinney (2018) Python for data analysis: Data wrangling with Pandas, NumPy, and IPython. 2nd Edition. O'Reilly Media, Inc.
Clustal Omega: Sievers et al. (2011) Fast, scalable generation of high‐quality protein multiple sequence alignments using Clustal Omega. Molecular systems biology. 7(1):539
The final evaluation of the course is based on the student's performance in solving problem tasks presented during classes or assigned as homework (max. 60 pts) and on a 90 min practical test scheduled at the end of the course (max. 40 pts). The test consists of a few short practical tasks covering the use of tools for biological data analysis and the visualisation of the results obtained. A student is expected to participate actively in the majority of classes and to collect at least 50 pts out of 100 pts.
Students can collect up to 100 points during the course. The following grading policy is used:
- F (failing, 2.0) 0-49 pts
- D (passing, 3.0) 50-59 pts
- C (average, 3.5) 60-69 pts
- B (good, 4.0) 70-79 pts
- B+ (very good, 4.5) 80-89 pts
- A (excellent, 5.0) 90+ pts.
The deliverables of the course are as follows:
- Knowledge – Student knows and understands
  - methods for searching biological databases and constructing multiple sequence alignments
  - command-line options and functionalities of the BLAST+ and Clustal Omega tools
  - the basics of the Python 3 programming language and its applications in simple biological data analysis
  - how to use NCBI E-Utilities to acquire biological data from publicly accessible databases
  - Python 3 modules and packages such as matplotlib and Pandas and their applications in tabular data processing and data visualisation
- Skills – Student can
  - select and use appropriate command-line BLAST+ tools locally to create their own databases and run sequence homology searches
  - use the publicly available Clustal Omega software to construct multiple sequence alignments
  - write simple Python 3 scripts that acquire biological data from publicly accessible databases by using NCBI E-Utilities
  - use Jupyter Notebook for teamwork in order to deliver simple Python 3 scripts for biological data processing and visualisation
- Social competences – Student is ready to
  - work in a team and take part in dividing the workload of a data analysis
  - recognise the limits of their knowledge and the necessity of lifelong learning
  - appreciate the practical value of the acquired knowledge in the field of molecular biology
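As an illustration of the kind of simple script the skills above refer to, the following minimal sketch builds an NCBI E-Utilities EFetch request and parses FASTA text using only the Python 3 standard library. It is an example under stated assumptions, not course material: the function names (build_efetch_url, parse_fasta, fetch_fasta) and the record identifiers are hypothetical placeholders, and fetch_fasta needs network access, so it is defined but not called here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public base address of the NCBI E-Utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_efetch_url(db, ids, rettype="fasta", retmode="text"):
    """Return an EFetch URL for the given database and record identifiers."""
    params = {
        "db": db,
        "id": ",".join(ids),  # several identifiers are sent comma-separated
        "rettype": rettype,
        "retmode": retmode,
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def parse_fasta(text):
    """Split FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines belonging to one record are joined together.
            records[header] += line
    return records


def fetch_fasta(db, ids):
    """Download records as FASTA and parse them (requires network access)."""
    with urlopen(build_efetch_url(db, ids)) as response:
        return parse_fasta(response.read().decode("utf-8"))


if __name__ == "__main__":
    # "ID1" and "ID2" are placeholders, not real accession numbers.
    print(build_efetch_url("protein", ["ID1", "ID2"]))
    print(parse_fasta(">seq1 example\nMKT\nLLV\n>seq2\nGG\n"))
```

Keeping URL construction and FASTA parsing separate from the download step means both can be checked without contacting the NCBI servers.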