Introduction to Python
Python is one of the most popular programming languages. Here are some of the reasons why:
- It is very easy to use
- A huge number of tools for different tasks have already been written and tested by others
- The scientific Python suite includes many interconnected modules:
- numpy - the numeric Python library that excels at matrix manipulations and common mathematical functions
- scipy - the scientific Python library that includes statistics, modeling, and much more
- matplotlib - the graphical library for making plots and graphs
- seaborn - an extension to matplotlib that makes beautiful graphs easy
- pandas - rich data structures (especially tables) for manipulating data
- networkx - for network analysis, interrogation, and manipulation
- scikit-learn - for machine learning
- There are many domain-specific add-ons that you can leverage when you are writing your code and analyzing your data
- It is one of the most popular languages on Stack Overflow with over 1.75 million questions and answers!
- The PyCharm development environment is free!
The pros and cons of Python
Python is nowhere near the fastest programming language. In fact, it is even slower than some similar languages like Perl, but it is much easier to develop in Python than in many other languages (e.g. Perl). In many applications (e.g. bioinformatics, data science, geoscience), development time is much more important than run time. You should still think about what affects run time and how to measure it, and sometimes you will run into intractable problems, but often the cheapest solution is to throw more computers at a problem rather than spend more time on development. This is not always true: there are many occasions (e.g. running things on cell phones) where you want to be very careful to optimize your applications!
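One simple way to measure run time is the standard library's timeit module. The sketch below is only an illustration: the two function names and the test sequence are made up for this example, and the timings you see will depend on your computer. It computes the GC content of a DNA sequence in two ways and times each.

```python
import timeit


def gc_content_loop(seq):
    """Count G and C one base at a time in an explicit loop."""
    count = 0
    for base in seq:
        if base == "G" or base == "C":
            count += 1
    return count / len(seq)


def gc_content_builtin(seq):
    """Let the built-in str.count method do the counting."""
    return (seq.count("G") + seq.count("C")) / len(seq)


# A made-up sequence, repeated so that it is long enough to time
sequence = "ATGCGCGATATTAGCGC" * 10_000

# Run each version 20 times and report the total elapsed seconds
loop_seconds = timeit.timeit(lambda: gc_content_loop(sequence), number=20)
builtin_seconds = timeit.timeit(lambda: gc_content_builtin(sequence), number=20)

print(f"loop:    {loop_seconds:.4f} s")
print(f"builtin: {builtin_seconds:.4f} s")
```

Both functions return the same answer, so the only difference is how long they take. Measuring before optimizing tells you whether the extra development time is likely to pay off.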
Learning Python for Bioinformatics
We have adapted Marc Cohen’s Google Colab notebooks that teach Python to be more aligned with bioinformatics. This series of Python notebooks will walk you through the Python basics, and introduce you to more advanced concepts as you progress.
You can access the first Google Colab notebook here
Or you can jump to a specific lesson. Each of these links opens in Google Colab. You should make a copy of the file and run the code for yourself.
- Lesson 0 - Introduction and Index
- Lesson 1 - Variables and Types
- Variables
- Naming variables
- Types of data
- Numeric Types
- String Types
- Using Variables in Python
- Built in Python functions
- Lesson 2 - Expressions
- Constants vs. Variables
- Data Types
- The Boolean (bool) Type
- The None Type
- Comparison Operators
- Boolean Operators - and, or, and not
- Order of Evaluation
- Python Precedence Rules
- f-strings
- Lesson 3 - Lists
- Lists
- Creating Lists
- Sets
- List Operations
- Lesson 4 - Dictionaries
- Dictionary Operations
- Rule of thumb for truth value of lists and dictionaries
- Lesson 5 - Conditionals
- Controlling Program Flow
- if Statements
- if Block Structure
- Python’s use of whitespace
- else Statements
- elif Statements
- For loops
- Lesson 6 - Functions
- Defining Functions
- Docstrings
- Return Values
- Lesson 7 - Reading and Writing Files
- Reading a fasta file
- Writing to files
- Reading and writing gzip files
- Lesson 8 - Modules
- The from Statement
- When to use import vs. from
- Lesson 9 - Finding and installing other people's code
- Lesson 10 - Translating a DNA sequence
- Challenges to try!
- Lesson 11 - BioPython
- Translating a DNA sequence
- Reading a fasta file
- Reading a fastq file
- Reading GenBank Files
- Lesson 12 - Plotting data with Pandas and Seaborn
- Pandas
- Seaborn
- Plotting the abundance of phage in samples
- Violin plots with jitters
- Plotting bacterial phyla
- Stacked bar charts
- Heatmaps
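As a taste of the kind of problem Lesson 10 works towards, here is a minimal sketch of translating a DNA sequence in plain Python. It is not taken from the notebooks: the function name translate, the constant names, and the way the codon table is built are assumptions made for illustration. It handles only the standard genetic code and uppercase A, C, G, and T, and it writes X for any codon it does not recognize.

```python
from itertools import product

# The standard genetic code, with codons listed in T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Pair each of the 64 codons with its amino acid
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string, three bases at a time, from the first base."""
    protein = []
    # Stop before any incomplete codon at the end of the sequence
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        # Unknown codons (for example, those containing N) become "X"
        protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


print(translate("ATGGCCTAA"))  # prints MA*
```

For real analyses, a tested library such as BioPython (Lesson 11) is usually a better choice than a hand-written table.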