Gapfilling

Gap filling is more of an art form than a science!

The goal of gap filling is to add reactions to your metabolic network that complete the network, but not to add so many reactions that the model will grow under any circumstance at any time. You just want the model to grow in the right conditions!

PyFBA includes several different gap filling approaches, and the API is designed so that it is easy to add and test your own gap filling designs.
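To make the idea concrete, here is a minimal sketch of a gap-filling loop that tries one suggestion strategy at a time and stops as soon as the model grows, so that no more reactions are added than necessary. It is not PyFBA code: `gapfill`, `toy_grows`, `toy_suggesters`, and the reaction IDs are all hypothetical, and `toy_grows` merely stands in for a real FBA run.

```python
from typing import Callable, Iterable, Set


def gapfill(reactions2run: Set[str],
            suggesters: Iterable[Callable[[Set[str]], Set[str]]],
            grows: Callable[[Set[str]], bool]) -> Set[str]:
    """Apply suggesters one at a time, stopping as soon as the model grows."""
    current = set(reactions2run)
    for suggest in suggesters:
        if grows(current):
            # The model already grows, so do not add anything else.
            break
        current |= suggest(current)
    return current


def toy_grows(rxns: Set[str]) -> bool:
    # Hypothetical growth test: this toy "model" grows once rxn3 is present.
    return "rxn3" in rxns


# Hypothetical strategies; each returns a set of proposed reaction IDs.
toy_suggesters = [
    lambda rxns: {"rxn2"},
    lambda rxns: {"rxn3"},
    lambda rxns: {"rxn4"},
]

filled = gapfill({"rxn1"}, toy_suggesters, toy_grows)
print(sorted(filled))  # ['rxn1', 'rxn2', 'rxn3']
```

The third strategy is never applied because the model grows after the second, which is the behavior you want from a gap filler: grow in the right conditions with as few extra reactions as possible.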

First, let's take a look at the built-in approaches:

PyFBA.gapfill.essentials.suggest_essential_reactions()

Identify a set of reactions that you should add to your model for growth because they are essential reactions.

There are 110 reactions (one of which is biomass) that are in _every_ model produced thus far and we include them in all models. This is a set of those reactions just to make sure that we have added them!

Returns:

A set of proposed reactions that should be added to your model to see if it grows

Return type:

set

PyFBA.gapfill.maps_to_proteins.suggest_reactions_with_proteins(reactions, verbose=False)

Identify a set of reactions that you should add to your model for growth based on the reactions that have proteins associated with them.

Parameters:
  • reactions (dict) – our reactions dictionary from parsing the Model SEED
  • verbose (bool) – add additional output
Returns:

a set of reactions that could be added to test for growth

Return type:

set

PyFBA.gapfill.maps_to_proteins.suggest_reactions_without_proteins(reactions, verbose=False)

Identify a set of reactions that don’t have any proteins associated with them.

It is generally a bad idea to add all of these to your model: there are about 30,000 of them, and trying to solve a model that large with FBA will probably overwhelm your computer.

Parameters:
  • reactions (dict) – our reactions dictionary from parsing the Model SEED
  • verbose (bool) – add additional output
Returns:

A set of proposed reactions that should be added to your model to see if it grows

Return type:

set

PyFBA.gapfill.media.suggest_from_media(compounds, reactions, reactions2run, media, verbose=False)

Identify a set of reactions that you should add to your model for growth based on the media compounds

Parameters:
  • compounds (dict) – Our compounds dictionary
  • reactions (dict) – Our reactions dictionary
  • reactions2run (set) – The reactions we are running
  • media (set) – A set of the compounds in the media
  • verbose (bool) – Print more output
Returns:

A set of proposed reactions that should be added to your model to see if it grows

Return type:

set

PyFBA.gapfill.orphan_compound.suggest_by_compound(compounds, reactions, reactions2run, max_reactions, verbose=False)

Identify a set of reactions that you should add to your model for growth because they contain orphan compounds

This is a slightly different approach to suggesting by compound. We look for “orphan” compounds that only have a few connections to other reactions, and then add those to our network to see what happens.

Note that our notion of two compounds being equal normally includes their location (and str(c) includes the location). However, we probably have several instances of:

cpd [e] -> cpd [c] and cpd [c] -> products

These should be considered to be the same, and we probably don’t want to consider external compounds anyway.

Parameters:
  • reactions (dict) – The reactions dictionary
  • compounds (dict) – The compounds dictionary
  • reactions2run (set) – The set of reactions that we will already run
  • max_reactions (int) – The maximum number of reactions that a compound can be associated with. This avoids ubiquitous compounds, e.g. H2O
  • verbose (bool) – Print more output
Returns:

A set of proposed reactions that should be added to your model to see if it grows

Return type:

set
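The orphan-compound idea can be sketched roughly as follows. This is an illustration under stated assumptions, not PyFBA's implementation: compounds are modeled as hypothetical `(name, location)` tuples, `reaction_compounds` is a toy mapping from reaction ID to its compounds, external (`"e"`) compounds are ignored, and a compound counts as an orphan when at most `max_reactions` of the reactions already in the model use it.

```python
from collections import defaultdict


def suggest_for_orphans(reaction_compounds, reactions2run, max_reactions):
    """Propose reactions that touch poorly connected ("orphan") compounds."""
    # Count how many reactions in the model use each compound, comparing
    # compounds by name only so cpd [e] and cpd [c] are treated as the same.
    usage = defaultdict(set)
    for rxn in reactions2run:
        for name, location in reaction_compounds[rxn]:
            if location == "e":
                continue  # skip external compounds
            usage[name].add(rxn)

    # Compounds with few connections are orphans; heavily used compounds
    # such as H2O exceed max_reactions and are left out.
    orphans = {name for name, rxns in usage.items() if len(rxns) <= max_reactions}

    return {
        rxn
        for rxn, cpds in reaction_compounds.items()
        if rxn not in reactions2run
        and any(loc != "e" and name in orphans for name, loc in cpds)
    }


# Hypothetical toy network.
toy_reaction_compounds = {
    "rxn1": {("A", "e"), ("A", "c")},            # transport
    "rxn2": {("A", "c"), ("B", "c"), ("H2O", "c")},
    "rxn3": {("C", "c"), ("H2O", "c")},
    "rxn4": {("B", "c"), ("D", "c")},            # candidate, uses orphan B
    "rxn5": {("H2O", "c"), ("E", "c")},          # candidate, shares only H2O
}
orphan_suggestions = suggest_for_orphans(
    toy_reaction_compounds, {"rxn1", "rxn2", "rxn3"}, max_reactions=1
)
print(orphan_suggestions)  # {'rxn4'}
```

Here B and C are each used by only one reaction in the model, so they are orphans; rxn4 is proposed because it uses B, while rxn5 is not because its only shared compound is the well-connected H2O.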

PyFBA.gapfill.probability.compound_probability(reactions, reactions2run, cutoff=0, rxn_with_proteins=True, verbose=False)

Identify a set of reactions that you should add to your model for growth based on the probability for the reaction to run left to right and right to left.

The probability is basically the fraction of compounds that are present.

If you set cutoff to zero, we calculate the minimum coverage of the compounds in the reactions already in the model and use that value to decide which other reactions should be added: a reaction is proposed when a similar proportion of its compounds is already in the network.

Parameters:
  • reactions (dict) – our reactions dict
  • reactions2run (set) – The current set of reactions that we will run
  • cutoff (float) – the minimum probability that must be exceeded for the reaction to be proposed
  • rxn_with_proteins (bool) – limits to just those reactions that have proteins
  • verbose (bool) – print more output
Returns:

A set of proposed reactions that should be added to your model to see if it grows

Return type:

set
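As a rough sketch of "the fraction of compounds that are present", the example below scores each candidate reaction by how many of its compounds already occur in the model and proposes those whose score exceeds the cutoff. It is a simplification, not PyFBA's code: it ignores reaction direction, does not reproduce the special handling of a zero cutoff, and `suggest_by_coverage` and the toy data are hypothetical.

```python
def suggest_by_coverage(reaction_compounds, reactions2run, cutoff):
    """Propose reactions whose fraction of already-present compounds exceeds cutoff."""
    # Every compound that appears in a reaction already in the model.
    present = set().union(*(reaction_compounds[r] for r in reactions2run))

    suggested = set()
    for rxn, cpds in reaction_compounds.items():
        if rxn in reactions2run or not cpds:
            continue
        fraction = len(cpds & present) / len(cpds)
        if fraction > cutoff:
            suggested.add(rxn)
    return suggested


# Hypothetical toy data: reaction ID -> set of compound names.
toy_compounds = {
    "rxn1": {"A", "B"},
    "rxn2": {"B", "C"},
    "rxn3": {"A", "C", "D"},   # 2 of 3 compounds present
    "rxn4": {"C", "D"},        # 1 of 2 compounds present
    "rxn5": {"X", "Y"},        # no compounds present
}
toy_model = {"rxn1", "rxn2"}

strict = suggest_by_coverage(toy_compounds, toy_model, cutoff=0.6)
relaxed = suggest_by_coverage(toy_compounds, toy_model, cutoff=0.4)
print(strict)           # {'rxn3'}
print(sorted(relaxed))  # ['rxn3', 'rxn4']
```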

PyFBA.gapfill.roles.suggest_from_roles(roles_file, reactions, threshold=0, verbose=False)

Identify a set of reactions that we should add based on a roles file.

We assume that the roles file has the format [role, probability], separated by tabs. We make no assumption about where you got that file, but you might, for example, derive it from closely related organisms.

Parameters:
  • roles_file (str) – a file with a list of roles and their probabilities
  • reactions (dict) – The reactions dictionary from parsing the Model SEED
  • threshold (float) – the threshold for inclusion of the role based on the probability in the file (default = All roles)
  • verbose (bool) – add additional output
Returns:

A set of proposed reactions that should be added to your model to see if it grows

Return type:

set
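The roles file format described above can be read with a few lines of Python. The sketch below is illustrative only: `roles_above_threshold`, `reactions_for_roles`, the role names, and the role-to-reaction mapping are hypothetical, and treating a probability equal to the threshold as included is an assumption (chosen so that a threshold of 0 keeps all roles).

```python
import io


def roles_above_threshold(handle, threshold=0.0):
    """Read tab-separated [role, probability] lines and keep the likely roles."""
    roles = set()
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        role, probability = line.split("\t")
        if float(probability) >= threshold:
            roles.add(role)
    return roles


def reactions_for_roles(roles, role_to_reactions):
    """Collect the reactions linked to the given roles (hypothetical mapping)."""
    suggested = set()
    for role in roles:
        suggested |= role_to_reactions.get(role, set())
    return suggested


# Toy roles file held in memory; a real one would be opened from disk.
toy_roles_file = io.StringIO(
    "Hexokinase (EC 2.7.1.1)\t0.95\n"
    "Pyruvate kinase (EC 2.7.1.40)\t0.40\n"
)
toy_role_to_reactions = {
    "Hexokinase (EC 2.7.1.1)": {"rxn_a"},
    "Pyruvate kinase (EC 2.7.1.40)": {"rxn_b"},
}

likely_roles = roles_above_threshold(toy_roles_file, threshold=0.5)
role_suggestions = reactions_for_roles(likely_roles, toy_role_to_reactions)
print(role_suggestions)  # {'rxn_a'}
```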

PyFBA.gapfill.subsystem.suggest_reactions_from_subsystems(reactions, reactions2run, ssfile='/data/PyFBA/PyFBA/Biochemistry/SEED/Subsystems/SS_functions.txt', threshold=0, verbose=False)

Identify a set of reactions that you should add to your model for growth based on the subsystems that are present in your model and their coverage.

Read roles and subsystems from the subsystems file (which has role, subsystem, classification 1, classification 2) and make suggestions for missing reactions based on the subsystems that only have partial reaction coverage.

Parameters:
  • reactions (dict) – our reactions dictionary from parsing the Model SEED
  • reactions2run (set) – set of reactions that we are going to run
  • ssfile (str) – a subsystem file (really the output of dump_functions.pl on the SEED machines)
  • threshold (float) – The minimum fraction of the genes that are already in the subsystem for it to be added (default=0)
  • verbose (bool) – add additional output
Returns:

A set of proposed reactions that should be added to your model to see if it grows

Return type:

set
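Subsystem coverage can be illustrated with a small sketch. It is not PyFBA's implementation: it works on roles rather than reactions, takes already-parsed rows of (role, subsystem, classification 1, classification 2), and assumes that subsystems with no roles present are skipped. The function name and toy rows are hypothetical.

```python
from collections import defaultdict


def missing_roles_by_subsystem(ss_rows, roles_present, threshold=0.0):
    """Suggest the missing roles of subsystems that are partially covered."""
    # Group roles by the subsystem they belong to.
    subsystems = defaultdict(set)
    for role, subsystem, _class1, _class2 in ss_rows:
        subsystems[subsystem].add(role)

    missing = set()
    for roles in subsystems.values():
        found = roles & roles_present
        coverage = len(found) / len(roles)
        # Assumption: ignore subsystems that are entirely absent from the model.
        if found and coverage >= threshold:
            missing |= roles - found
    return missing


# Hypothetical rows in the (role, subsystem, class 1, class 2) layout.
toy_rows = [
    ("role A", "Glycolysis", "Carbohydrates", "Central metabolism"),
    ("role B", "Glycolysis", "Carbohydrates", "Central metabolism"),
    ("role C", "Glycolysis", "Carbohydrates", "Central metabolism"),
    ("role D", "TCA cycle", "Carbohydrates", "Central metabolism"),
    ("role E", "TCA cycle", "Carbohydrates", "Central metabolism"),
]
toy_roles_present = {"role A", "role B"}

ss_missing = missing_roles_by_subsystem(toy_rows, toy_roles_present, threshold=0.5)
print(ss_missing)  # {'role C'}
```

Glycolysis has two of its three roles present (coverage of about 0.67), so its missing role is suggested; the TCA cycle has none and is left alone. Raising the threshold to 0.8 would suggest nothing.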