Practical 2: Common Python Data Structures and Functions
Start ScrumPy as described in the installation instructions. Read the tutorial material below, and then replicate the Python examples in the ScrumPy command window (don't copy and paste!). Make sure you understand why you get the output you see, and test this by modifying some of the examples; you should be able to correctly predict the effects of your modifications.
Data Types
A given data type can be as simple as a single digit, or as complex as a suite of genome databases. Python comes with a number of relatively simple (but still extremely useful) data types, and ScrumPy extends these by providing additional data types that are particularly useful for metabolic modelling and related activities. We will start here by examining some of the common built-in types.
Strings
Strings are sequences of characters. Individual characters in a string can be accessed by indexing, and the in operator can be used to test whether a substring occurs in a string.
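As a minimal sketch of string indexing and membership testing (the string and variable names are illustrative, not taken from the original practical), written in plain script form rather than as a >>> session so that it should run under both Python 2 and Python 3:

```python
# Hypothetical example string; any string would do.
word = 'metabolism'

first_char = word[0]      # indices start at 0, so this is 'm'
last_char = word[-1]      # negative indices count from the end: 'm'
has_bol = 'bol' in word   # True: 'bol' occurs inside the string
has_xyz = 'xyz' in word   # False: 'xyz' does not occur

print(first_char)
print(has_bol)
```

Typing the same expressions one at a time in the command window shows each result immediately.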
Numerical types
The numerical types we will be dealing with are integers, int, and floating-point numbers, float. Integers are written as a sequence of digits. Floats are written as digits with a decimal point in the sequence, and an optional exponent (e or E).
The type of a given data object can be checked using the built-in function type().
Floats and integers can be interconverted using the constructors int() or float().
The common mathematical operators (+, -, /, *) work as expected; note that x**y means x raised to the power y.
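The points above about numerical types can be tried out with a short sketch like the following. The variable names are illustrative assumptions, and the code is written so that it should behave the same under Python 2 and Python 3 (division of two integers is deliberately left out, since its result differs between the two).

```python
n = 3          # an integer (int)
x = 2.5e-1     # a float written with an exponent: 0.25

print(type(n))   # reports the int type
print(type(x))   # reports the float type

as_float = float(n)   # 3.0
as_int = int(2.9)     # 2: int() discards the fractional part, it does not round

total = n + x         # 3.25: mixing an int and a float gives a float
power = 2 ** 10       # 1024: ** is exponentiation
```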
Boolean
Booleans are a subtype of integers. A boolean value is either True or False, and is used for writing conditional statements, i.e. if something is True, do something.
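A small sketch of booleans in use, assuming nothing beyond the built-in types (the names flag and message are made up for illustration):

```python
flag = 3 > 2                            # a comparison evaluates to a boolean, here True
is_bool = type(flag) is bool            # True
is_int_subtype = isinstance(flag, int)  # True: bool is a subtype of int
true_plus_true = True + True            # 2: True behaves like 1 in arithmetic

message = 'not set'
if flag:                                # the block runs only when flag is True
    message = 'three is greater than two'
print(message)
```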
Lists (and tuples)
Lists and tuples are collections of items that are stored in a specific order, with each item associated with (indexed by) an integer. The main difference between the two is that tuples are immutable: once a tuple is created it cannot be changed, whereas a list can. For these exercises we will mainly use lists. An empty list can be created by assigning a pair of empty square brackets to a variable.
>>> empty_list = []
Items can be appended to a list by using the append() method.
A list can also be created and populated in one go.
Items can be removed from a list using the remove() method.
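The three list operations just described (appending, creating a populated list, and removing) might look as follows. This is only a sketch with made-up list contents, written as a script that should run under both Python 2 and Python 3.

```python
a_list = []             # start with an empty list
a_list.append('a')      # a_list is now ['a']
a_list.append('b')      # a_list is now ['a', 'b']

b_list = ['a', 'b', 'c', 'b']   # create and populate a list in one go
b_list.remove('b')              # removes only the first 'b': ['a', 'c', 'b']

print(a_list)
print(b_list)
```

Note that remove() takes the item itself, not its index, and removes only the first matching item.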
As with strings, indexing can be used to access an item in a list; keep in mind that the indices of items in lists (like those of characters in strings) are numbered from 0. Membership of an item in a list can be evaluated as described for strings.
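A brief sketch of list indexing and membership testing, using an illustrative list:

```python
a_list = ['a', 'b', 'c', 'd']

first = a_list[0]        # 'a': the first item has index 0
third = a_list[2]        # 'c'
last = a_list[-1]        # 'd': negative indices count from the end
has_c = 'c' in a_list    # True
has_z = 'z' in a_list    # False
```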
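Subsets of lists (and strings) can be accessed using slicing, written as a_list[start:end], where the item at index start is included and the item at index end is excluded.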
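Slicing can be sketched as follows (the list and string are illustrative only):

```python
a_list = ['a', 'b', 'c', 'd', 'e']

middle = a_list[1:3]       # ['b', 'c']: index 1 included, index 3 excluded
head = a_list[:2]          # ['a', 'b']: an omitted start means "from the beginning"
tail = a_list[3:]          # ['d', 'e']: an omitted end means "to the end"
copy_all = a_list[:]       # a new list containing the same items
prefix = 'metabolism'[0:4] # 'meta': slicing works the same way on strings
```

A slice always produces a new object, so changing middle or copy_all leaves a_list untouched.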
The index of a known item can be retrieved using the index() method.
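A short sketch of index(), including what happens when the item is absent (the list contents and the name found_z are illustrative):

```python
a_list = ['a', 'b', 'c', 'b']

pos_c = a_list.index('c')   # 2
pos_b = a_list.index('b')   # 1: only the first occurrence is reported

# index() raises a ValueError if the item is not in the list.
try:
    a_list.index('z')
    found_z = True
except ValueError:
    found_z = False
```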
Lists can contain any objects, including other lists; such lists are referred to as nested.
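A nested list can be sketched like this, with items of the inner lists reached by chaining indices (the contents are made up for illustration):

```python
nested = [['a', 'b'], ['c', 'd', 'e'], 1]

inner = nested[1]        # ['c', 'd', 'e']: the second item is itself a list
item = nested[1][2]      # 'e': index the outer list first, then the inner one
n_outer = len(nested)    # 3: an inner list counts as a single item
```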
Dictionaries
In other programming languages dictionaries are sometimes called "associative arrays". Unlike lists, dictionaries store collections of items that are indexed by keys, not by integer positions, and the items have no specific order. The keys of a dictionary must be unique (within a given dictionary) and hashable; for now, this means that immutable objects such as strings and numbers can be used as keys, whereas lists cannot. Here are some examples of dictionaries in action:
>>> dict_1 = {'alfa':1,'beta':2} #create a dictionary
>>> keys = ['alfa','beta']
>>> vals = [1,2]
>>> dict_1 = dict(zip(keys,vals)) #create a dictionary from two lists
>>> dict_1
{'alfa': 1, 'beta': 2}
>>> dict_1['alfa'] #access value '1' by key 'alfa'
1
>>> dict_1.has_key('beta') #check that dict_1 has key 'beta'
True
>>> dict_1.keys() #print keys of dict_1
['alfa', 'beta']
>>> dict_1.values() #print values
[1, 2]
>>> dict_1['gamma'] = 1 #add new key:value pair
>>> dict_1['alfa'] = 'a' #overwrite key:value pair
As with lists, dictionaries can contain nested dictionaries:
>>> dict_1 = {'alfa':1,'beta':2} #define first dictionary
>>> dict_2 = {'gamma':1} #define second dictionary
>>> dict_1['nested'] = dict_2 #add dict_2 as value to key 'nested' in dict_1
>>> dict_1
{'beta': 2, 'alfa': 1, 'nested': {'gamma': 1}}
>>> dict_1['nested']['gamma'] #access key 'gamma' in nested dictionary
1
Built-in functions, loops, conditionals, assignment, and evaluation
We have already seen some of the conventions of Python syntax. Useful built-in functions include len(), which returns the length of an object, and dir(), which returns a list of the methods and attributes of an object.
>>> dir(dict_1) # list attributes of dict_1
['__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'has_key', 'items', 'iteritems', 'iterkeys', 'itervalues', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values', 'viewitems', 'viewkeys', 'viewvalues']
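For len(), a minimal sketch applied to the three collection types met so far (the example values are illustrative):

```python
n_chars = len('metabolism')              # 10: number of characters in the string
n_items = len(['a', 'b', 'c'])           # 3: number of items in the list
n_pairs = len({'alfa': 1, 'beta': 2})    # 2: number of key:value pairs

print(n_chars)
print(n_items)
print(n_pairs)
```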
You can read more about the built-in functions in the Python documentation. range(integer_1, integer_2) is a built-in function that returns a list of integers running from integer_1 up to, but excluding, integer_2. If only one argument is given, it is taken as integer_2 and integer_1 is assumed to be 0.
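A sketch of range() with two arguments and with one. The list() wrapper is an addition made here for portability: in Python 2 range() already returns a list, as described above, whereas in Python 3 it returns a range object that list() converts to a list.

```python
two_args = list(range(2, 6))   # [2, 3, 4, 5]: 6 itself is excluded
one_arg = list(range(4))       # [0, 1, 2, 3]: the start defaults to 0

print(two_args)
print(one_arg)
```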
The step size is 1 by default, but can be specified as a third argument.
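The effect of the third argument can be sketched as follows. As an assumption for portability, list() is wrapped around range() so that the result is a list under Python 3 as well as Python 2; the negative step is an extra illustration not mentioned in the text.

```python
every_third = list(range(0, 10, 3))    # [0, 3, 6, 9]
countdown = list(range(10, 0, -2))     # [10, 8, 6, 4, 2]: a negative step counts down

print(every_third)
print(countdown)
```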
Note that the step size must be an integer. If floating-point steps are needed, the arange() function from the numpy package can be used. It is very similar to range(), but accepts floating-point arguments and returns array objects, which can be converted to lists using their tolist() method.
The for loop is used to iterate over an iterable object, e.g. a list. Depending on how the loop is formulated the loop variable will either be an item in the iterable object or an index.
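The case in which the loop variable is the item itself can be sketched like this (the list and the name items_seen are illustrative; print() is written with parentheses so that the code should run under both Python 2 and Python 3):

```python
a_list = ['a', 'b', 'c']

items_seen = []
for item in a_list:          # item takes the values 'a', 'b', 'c' in turn
    items_seen.append(item)
    print(item)
```

Iterating over items directly is usually the simpler choice; iterating over indices is mainly useful when the position of each item is needed as well.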
while loops repeat a block of code for as long as a condition remains true.
>>> a_list = ['a','b','c']
>>> i = 0 #assign value 0 to variable i
>>> while i<len(a_list): #as long as i is less than 3
        print a_list[i] #print item at index i in a_list
        i += 1 #increment i by 1

a
b
c
>>> for i in range(len(a_list)): #iterating over indices
        print a_list[i]

a
b
c
For the while loop, this implies that if the condition i<len(a_list) never becomes false (for example, because the line that increments i is left out), the loop continues indefinitely. Loops can be combined with conditional statements, where a block of code is executed if a statement is true; otherwise another block, introduced by else, is executed. The else block is optional, but it must come last and, unlike if, it takes no condition of its own.
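A loop combined with an if/else statement might look like the following sketch (the numbers and the name labels are made up for illustration):

```python
numbers = [1, 2, 3, 4]
labels = []

for n in numbers:
    if n % 2 == 0:               # the remainder after division by 2 is 0
        labels.append('even')
    else:                        # every other case
        labels.append('odd')

print(labels)
```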
If more than two alternatives are possible, one or more elif (short for "else if") clauses can be placed between if and else.
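As a sketch of elif with three alternatives (the values and the name signs are illustrative):

```python
values = [-2, 0, 5]
signs = []

for v in values:
    if v < 0:
        signs.append('negative')
    elif v == 0:                 # tested only if the first condition was False
        signs.append('zero')
    else:                        # reached only if no condition above was True
        signs.append('positive')

print(signs)
```

The conditions are tested from top to bottom, and only the block belonging to the first true condition is executed.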
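You may have noticed it already, but it is necessary to point out the distinction between assignment, written with a single equals sign (=), which binds a value to a name, and evaluation, written with a double equals sign (==), which tests whether two values are equal and gives True or False.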
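The difference between = and == can be sketched as follows (the variable names are illustrative):

```python
x = 5               # assignment: the name x now refers to the value 5
is_five = x == 5    # evaluation: compares x with 5 and gives True
is_six = x == 6     # gives False; x itself is unchanged by the comparison

x = 6               # assigning again replaces the old value
now_six = x == 6    # True

print(is_five)
print(is_six)
print(now_six)
```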