ScrumPy Model Description Language

Overview

The ScrumPy model description language is intended to be a simple, human-readable and human-writable description of a metabolic network, using a syntax that can be readily understood by anyone with a rudimentary knowledge of biochemistry or computing.

A model is defined as a list of statements, each of which is one of four distinct types: reactions, initialisations, comments and directives.

The most important of these is, of course, the reaction, and it is in fact possible to construct rather simple models using only reaction statements. Initialisations set the numerical value of kinetic parameters and initial values of internal concentrations. Comments are ignored by ScrumPy, but exist to inform the human reader. Directives do not form part of the model description per se; rather, they serve as instructions to ScrumPy as to how the model should be interpreted.

Reactions

The reaction is the most fundamental part of a description of metabolism, and is divided into three distinct components: the reaction name, the stoichiometry and a kinetic statement (strictly in that order):

   1 Reac1:
   2    a -> b
   3    V1 * (a - b/keq1)

Here, line 1 defines the reaction name as Reac1. The colon (:) is not part of the name, but indicates that the text to its left is the name of the reaction.

Line 2 is the stoichiometry; in this case one mol of metabolite a is converted into one mol of b. The separator, ->, indicates that the reaction is irreversible in the left-right direction. Stoichiometries can be constructed with two other separators: <>, indicating a reversible reaction, and <-, indicating an irreversible reaction in the right-left direction.

In the example above a and b were interconverted in unit ratio. Other stoichiometric coefficients can be specified in the obvious way, e.g.:

   1 Reac2:
   2    2 a -> 3 b
   3    V1 * (a - b/keq1)

This specifies that two molecules of a are converted into three of b. The coefficients can be represented as integers (as above), floating point values (e.g. 3.14), or rational numbers (e.g. 22/7). Such exotic coefficients are needed only in relatively special circumstances. See also the ElType() directive below.
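To make the structure of a stoichiometry line concrete, the following Python sketch splits one into its two sides and a direction. It is an illustration only, not ScrumPy's own parser: the function names and direction labels are hypothetical, it handles unquoted identifiers only, and it uses Python's fractions module so that integer, floating point and rational coefficients are all read exactly.

```python
from fractions import Fraction

# Separators named in the text; the direction labels are our own.
SEPARATORS = {"<>": "reversible", "->": "left-to-right", "<-": "right-to-left"}

def parse_side(side):
    """Turn '2 a + 22/7 b' into {'a': Fraction(2), 'b': Fraction(22, 7)}."""
    coeffs = {}
    for term in side.split("+"):
        parts = term.split()
        if len(parts) == 1:      # bare metabolite: coefficient of one
            coeff, name = Fraction(1), parts[0]
        else:                    # explicit int, float or rational coefficient
            coeff, name = Fraction(parts[0]), parts[1]
        coeffs[name] = coeffs.get(name, Fraction(0)) + coeff
    return coeffs

def parse_stoichiometry(text):
    """Return (left side, right side, direction) for one stoichiometry line."""
    for sep, direction in SEPARATORS.items():
        if sep in text:
            left, right = text.split(sep, 1)
            return parse_side(left), parse_side(right), direction
    raise ValueError("no separator found in %r" % text)

left, right, direction = parse_stoichiometry("2 a -> 3 b")
print(left, right, direction)
```

For the Reac2 stoichiometry this yields a coefficient of 2 for a on the left, 3 for b on the right, and the left-to-right direction.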

Line 3 is the kinetic (or rate) law, here specifying simple mass action kinetics with a rate constant, V1, and an equilibrium constant, keq1. Sharp-eyed readers will notice that these two reactions contain an implicit contradiction: the stoichiometry specifies that the reaction is irreversible, but the rate law is reversible. Perhaps surprisingly, such a state of affairs does not cause any problems for ScrumPy (because kinetic and structural functionality are separate from one another). However, the user must keep this fact in mind when interpreting results generated from kinetic and structural analysis on the same model.
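The reversibility of the rate law can be checked by evaluating it directly. The Python sketch below simply computes V1 * (a - b/keq1) for two sets of values; the function name and the numbers are hypothetical and chosen only for illustration.

```python
def mass_action_rate(a, b, V1, keq1):
    """Rate law from the examples above: V1 * (a - b/keq1)."""
    return V1 * (a - b / keq1)

# Hypothetical values, chosen only for illustration.
forward = mass_action_rate(a=2.0, b=1.0, V1=1.0, keq1=4.0)   # a exceeds b/keq1
backward = mass_action_rate(a=0.5, b=4.0, V1=1.0, keq1=4.0)  # b/keq1 exceeds a
print(forward, backward)  # 1.75 -0.5
```

The negative second value shows the rate law running right to left, even though the stoichiometry written with -> declares the reaction irreversible.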

More detail on kinetics to follow.

In instances where the modeller wishes to carry out only structural analysis on the model, the kinetic statement can be replaced with a tilde (~):

   1 Reac3:
   2    2 a -> 3 b
   3    ~

This in fact specifies a default rate law with mass action kinetics, with rate and equilibrium constants of one. Thus it is permissible (if not very useful) to use this for kinetic models as well. See also the Structural() directive below.

Identifiers

The named components of a reaction (reaction name, metabolites and parameters) are collectively referred to as identifiers. Identifiers can be of two types: unquoted and quoted. Unquoted identifiers (used in the examples above) must start with a letter, followed by zero or more letters, digits and underscores (_).

If a model is only going to be subject to structural analysis, then it may be more convenient, or indeed essential, to use quoted identifiers. A quoted identifier may contain any sequence of characters, except new line, delimited by double quotes (i.e. the single " character), e.g.:

   "P-PANTOCYSLIG-RXN":
        "CYS" + "CTP" + "4-P-PANTOTHENATE" -> "PPI" + "CMP" + "PHOSPHOPANTOTHENOYL-L-CYSTEINE"
        ~

See also the Structural() and DeQuote() directives below.
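The two identifier forms can be expressed as regular expressions. The Python sketch below is one reading of the rules above, not ScrumPy's own lexer: it assumes that "letter" means an ASCII letter and that a quoted identifier cannot itself contain a double quote, and the names UNQUOTED, QUOTED and identifier_kind are hypothetical.

```python
import re

# A letter followed by zero or more letters, digits and underscores.
UNQUOTED = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
# Anything except a newline between double quotes
# (assumption: no embedded double quotes).
QUOTED = re.compile(r'"[^"\n]*"')

def identifier_kind(text):
    """Return 'unquoted', 'quoted', or None if text is neither."""
    if UNQUOTED.fullmatch(text):
        return "unquoted"
    if QUOTED.fullmatch(text):
        return "quoted"
    return None

for candidate in ["Reac1", "4-P-PANTOTHENATE", '"4-P-PANTOTHENATE"']:
    print(candidate, identifier_kind(candidate))
```

Names such as 4-P-PANTOTHENATE, which start with a digit and contain hyphens, are only valid in their quoted form.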

Initialisations

Parameters and concentrations can be assigned a value either as a literal number, or in terms of some other defined values:

   V1 = 42

   V2 = V1 * 5

Values will be treated as a concentration if, and only if, they have appeared in the stoichiometry of at least one reaction. Any other value will be assumed to be a parameter, even if it has not been used elsewhere.

It is permissible to leave both concentrations and parameters uninitialised, but ScrumPy will generate separate warnings for uninitialised parameters and concentrations. A parameter left uninitialised will be set to the internal value of 'NaN' (not a number), and no valid kinetic analysis can be performed until it is set via the ScrumPy interface.

Concentrations left uninitialised will be set to a default value of zero. However, this is generally not recommended, as it can lead to rather severe problems in networks containing cycles.
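The effect of these defaults can be sketched in a few lines of Python. This mirrors only the behaviour described above (zero for concentrations, NaN for parameters); the function apply_defaults and its arguments are hypothetical and are not part of the ScrumPy interface.

```python
import math

def apply_defaults(metabolites, parameters, initialised):
    """metabolites: names seen in a stoichiometry; parameters: all other names."""
    values = dict(initialised)
    for name in metabolites:
        values.setdefault(name, 0.0)        # uninitialised concentration
    for name in parameters:
        values.setdefault(name, math.nan)   # uninitialised parameter
    return values

# Only a and V1 are initialised; b and keq1 fall back to the defaults.
values = apply_defaults({"a", "b"}, {"V1", "keq1"}, {"a": 1.0, "V1": 42})
rate = values["V1"] * (values["a"] - values["b"] / values["keq1"])
print(values["b"], values["keq1"], rate)
```

Because any arithmetic involving NaN yields NaN, a single uninitialised parameter is enough to make the computed rate meaningless, which is why such parameters must be set before kinetic analysis.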

Comments

Comments start with a hash (#) character and continue to the end of the line, as in Python. They are completely disregarded by ScrumPy.

   Reac3:    # A comment about reaction 3
      2 a -> 3 b
      ~

   # This comments out the entire line, useful for temporarily removing a reaction

   #Reac3:    # A comment about reaction 3
   #   2 a -> 3 b
   #   ~
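The comment rule can be illustrated with a small Python function that removes everything from the first hash to the end of the line. This is a sketch rather than ScrumPy's implementation; in particular, treating a # inside a quoted identifier as part of the identifier is an assumption that the description above does not confirm, and the name strip_comment is hypothetical.

```python
def strip_comment(line):
    """Drop everything from the first '#' that lies outside double quotes."""
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes      # entering or leaving a quoted identifier
        elif ch == "#" and not in_quotes:
            return line[:i].rstrip()       # the rest of the line is a comment
    return line.rstrip()

print(strip_comment("Reac3:    # A comment about reaction 3"))  # Reac3:
```

A line that begins with # is reduced to an empty string, which is how commenting out a whole reaction removes it from the model.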

Directives

Directives are optional instructions to ScrumPy as to how the model description is to be processed, the commonest of which are as follows:

DeQuote()

This directive causes the quotation marks to be removed from identifiers in the ScrumPy model object. This saves the user from having to continually double-quote identifiers when working interactively. Its use is strongly recommended when using models with quoted identifiers.

ElType( type_id)

Defines the representation used internally for stoichiometric coefficients, as specified by type_id. The values discussed below are rat (the default), float and int.

The default, rat, is an arbitrary precision rational (based on the gmpy library), which has the advantage of eliminating rounding errors in some numerical calculations, notably elementary mode calculations. However, there is an overhead in terms of memory and processing time incurred by using this representation, and in very large (genome-scale) models, where elementary mode calculation is arguably less useful, there is a small but useful performance gain to be had by using float or int. Unless it is possible to be certain that all stoichiometric coefficients are integers, the floating point representation is to be preferred. Using ElType(int) in a model with non-integer coefficients will lead to serious numerical errors.

String and Boolean representations are much less commonly used, and are only of potential use to developers.
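The rounding issue that motivates the rational default can be seen in plain Python. The sketch below uses the standard library's fractions.Fraction as a stand-in for the gmpy rationals mentioned above; it illustrates the general point and does not reproduce ScrumPy's internals.

```python
from fractions import Fraction

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so their sum is not exactly 0.3.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)              # False

# Rational arithmetic keeps the result exact.
exact_sum = Fraction(1, 10) + Fraction(2, 10)
print(exact_sum == Fraction(3, 10))  # True
```

Exact comparisons of this kind, such as deciding whether a matrix element is zero, are where rational arithmetic avoids the problems of floating point.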

External(metabolite_ids)

The External directive is used to specify which metabolites are external (present at fixed concentration in the environment and produced or consumed by the network at steady state). It takes a comma-delimited list of metabolite identifiers as its argument, and can be used multiple times. Declaring a non-existent metabolite as external will generate a warning, but will not otherwise cause problems. Declaring a metabolite as external more than once has no additional effect; it simply remains external.

External("CU+2", "PROTON", "MG+2", "CA+2", "WATER")

Note that a metabolite can also be specified as external by prepending the name with "x_" or "X_".
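The two ways of marking a metabolite as external can be combined into a single test, sketched here in Python. The function is_external and its arguments are hypothetical and serve only to restate the rule; identifiers are assumed to have had any quotes removed.

```python
def is_external(name, declared=()):
    """True if name was listed in an External() directive or carries the prefix."""
    return name in declared or name.startswith(("x_", "X_"))

declared = {"CU+2", "PROTON", "MG+2", "CA+2", "WATER"}
print(is_external("WATER", declared))   # True: declared explicitly
print(is_external("x_Glc", declared))   # True: external by prefix
print(is_external("CYS", declared))     # False: internal
```

Here x_Glc is an invented metabolite name used only to show the prefix rule.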

Structural()

This specifies that a model will only be subject to structural analysis, never kinetic. Using it saves a great deal of time and memory, and it should always be used for genome-scale models. It is mandatory for models using quoted identifiers.

Include(FileNames)

A particularly useful feature of ScrumPy is the ability to modularise a model (split it into two or more separate files). A common example of this is in the construction of genome-scale models, where one set of reactions originates in a database, whereas others are explicitly added by the user. The whole model can then be given a tree-like structure, where the root (top-level) module includes a number of others. For example, the top-level module in the genome-scale model described in Plant Physiology by Poolman et al. (2009) contained:

   DeQuote()                    # remove quotes from identifiers
   Structural()                 # skip kinetic processing
   ElType(float)                # use floating point for matrix elements

   External("WATER", "PROTON")  # Always treat water and protons as external

   Include(
       FromBuildModel.spy,      # generated from the AraCyc database
       Mito.spy,                # A "hand built" representation of mitochondrial metabolism
       Transporters.spy,        # Defines the exchange with the environment.
       Extras.spy               # Reactions not fitting in with the above.
   )

None: SpyMDL (last edited 2012-10-23 16:46:24 by mark)