Extended Info
Introduction
This page provides detailed technical guidance on the steps required to build a complete and functional model with Thermo-Flux. Going beyond the quickstart and the two tutorials (e.coli core model and iMM904 model), here we present examples of code implementation that are relevant to augment your own stoichiometric model with thermodynamic constraints. For the background information and the modelling assumptions, we encourage you to refer to the protocol paper.
Step 1: Definition of physical and biochemical parameters
Thermo-Flux models rely on a well-defined thermodynamic environment. This includes pH, ionic strength, temperature, and compartment-specific membrane potentials.
Membrane Potentials
Membrane potential difference between compartment c and e is defined as
:math: Phi_{ce} = Phi_c - Phi_e
with Phi_c representing the potential in compartment c.
Table 1 : Physical and biochemical parameters, their units and examples of how to define them in ‘Thermo-Flux’.
Parameter |
Unit |
Code example |
|---|---|---|
Potential hydrogen (pH) |
Dimensionless |
model.pH = {"c": Q_(7.6), "e": Q_(7)}
|
Ionic strength (I) |
mol L-1 (M) |
model.I = {"c": Q_(0.25, 'M'),
"e": Q_(0.25, 'M')}
|
Membrane potential differences (phi) |
Volt (V) |
model.phi = {'ce': Q_(0.15, 'V')}
|
Potential magnesium (pMg) |
Dimensionless |
model.pMg = {'c': Q_(3), 'e': Q_(3)}
|
Temperature (T) |
Kelvin (K) |
model.T = Q_(298, 'Kelvin')
|
Step 2: Definition of metabolites and chemical species
Connection with eQuilibrator
Thermo-Flux automatically recognises the metabolite identifiers and links to the eQuilibrator compound retrieved with metabolite.compound.
Annotations can be user-defined or updated in the attribute annotation of the metabolite class e.g.:
metabolite.annotation = {'CHEBI':'11111', 'kegg':'C00000'}
Metabolite species
In Thermo-Flux, the average charge, average number of protons, and magnesium ions are returned by the function metabolite.average_charge_protons(), which first interrogates the eQuilibrator compound and then uses the physical parameters defined in Step 1 to return the condition-specific metabolite information.
To facilitate the understanding of these average calculations, this function also returns the information on each species’ abundance and their charge, number of protons and magnesium ion (Figure 2c).
Definition of metabolites with non-decomposable or unknown structures
The formula and charge for these metabolites should be defined using the COBRApy attributes with metabolite.formula and metabolite.charge.
model.metabolite.formula = 'C6H12O6'
model.metabolite.charge = -1
Local cache to access eQuilibrator compounds
When Thermo-Flux queries an eQuilibrator compound for the first time, eQuilibrator will require downloading the latest up-to-date database of eQuilibrator compounds. This local cache is named compound.sqlite and integrates native functions to retrieve compounds or manually add compounds (see eQuilibrator local cache).
Balancing reactions when metabolites don’t have an eQuilibrator formula
When an equilibrator compound is not found for one of the metabolites in a reaction, the initial model formula is used for all metabolites. This avoids reactions being considered as unbalanced because of a lack of coherence.
Step 3: Calculation of Gibbs formation energies
The function model.update_thermo_info() will automatically calculate the required parameters based on the defined physiochemical conditions (Step 1) and the metabolites of the model will now have a defined transformed Gibbs formation energy (\Delta_f G^\prime) and an average charge and number of protons.
Box 1: Additional considerations for Gibbs energy of formation calculation
Uncertainty
Different default uncertainty can be specified with model.rmse_inf = Q_(3000, 'kJ/mol').
We can also estimate a non-zero Gibbs formation energy for metabolites with non-decomposable or unknown structures (see supplementary section “metabolites with unknown formation energy”). This is implemented by the fit_unknown_dfG0=True argument when estimating Gibbs formation energies.
Redox
In Thermo-Flux a formation energy and a standard error can be explicitly defined, and the redox attribute set to true to ensure the formation energy is not automatically recalculated.
For example, the midpoint potential of cytochrome C is 250 mV (Lennarz & Lane, 2013). Applying the Nernst equation at equilibrium yields ∆𝑟𝐺° ′=− 1 × 96.5 × 0.250 = −24.125 𝑘𝐽𝑚𝑜𝑙−1. If the oxidised and reduced cytochrome C always appear as a pair in reactions, the formation energy of the reduced form can be defined as −24.125/2 = −12.05 𝑘𝐽𝑚𝑜𝑙−1 and the oxidised form as 12.05 𝑘𝐽.𝑚𝑜𝑙−1. e.g.:
cyt_c_red_c.dfG0prime() = Q_(-12.05, 'kJ/mol')
cyt_c_red_c.redox = True
cyt_c_red_c.dfG_SE = Q_(0, 'kJ/mol')
Biomass formation energy
In Thermo-Flux, the function thermo_flux.tools.drg_tools.dfGbm() returns the biomass formation energy given a specified empirical formula of biomass and can be used to explicitly define the biomass formation energy, e.g.:
dfGbm = thermo_flux.tools.drg_tools.dfGbm(H=1.613, O=0.557, N=0.158, P=0.012,
S=0.003, K=0.022, Mg=0.003, Ca=0.001,
units='kJ/g')
model.metabolites.biomass.dfGprime() = dfGbm
model.metabolites.biomass.biomass = True
model.metabolites.biomass.dfG_SE = 0
This function uses the following approach:
The standard enthalpy of combustion (\(H_c^0\)) is estimated using the Patel-Erickson equation, which assumes it is proportional to the number of electrons transferred to oxygen during combustion. Here \(n_C\), \(n_H\), \(n_O\), \(n_N\), \(n_P\) and \(n_S\) are the number of C, H, O, N, P and S atoms in the biomass elemental formula:
The standard enthalpy of formation of biomass is then derived from combustion products:
The standard entropy of formation is estimated as:
From which the Gibbs free energy of formation follows:
Finally, the Gibbs formation energy can be converted to a mass-specific value using either the cell dry weight density or the molecular mass from the biomass elemental formula:
Biomass formation energy is made dependent on the pH of the biomass metabolite’s compartment when transformed based on the number of hydrogen atoms of which it is formed. It is done automatically when building a Thermo-Flux model if model.update_biomass_dfG0 is set to True.
Metabolites with unknown formation energy
By default, metabolites with unknown or non-decomposable structures are assigned a mean formation energy of 0 kJ·mol-1, meaning they are excluded from reaction energy calculations. This can be problematic — large absolute errors may be needed to estimate feasible reaction energies. To get better estimates, we can exploit the reaction stoichiometry already present in the model.
When an unknown metabolite appears in a reaction alongside known ones, the reaction energy is shifted by roughly the magnitude of the unknown formation energy. Since metabolic systems tend to operate near equilibrium, reaction energies should be close to 0. We can therefore assume that a large computed reaction energy mostly reflects the gap between the true and assumed (0) formation energy of the unknown compound. Pushing the estimated formation energy toward cancelling the reaction energy gives a better approximation — and because reactions share metabolites, this can be solved as a least-squares problem across multiple reactions simultaneously.
In practice, for reactions containing unknown compounds, the uncertainty in reaction energy is regressed against the negative of the mean standard reaction energies. The least-squares solution yields an error vector m, which is multiplied by the relevant part of the square-root covariance matrix to convert back into formation energies, giving updated mean formation energies for the unknown compounds.
Example — ATP hydrolysis:
Say the formation energy of ATP is unknown and assumed to be 0 kJ·mol-1. The computed \(\Delta_r G\) of ATP hydrolysis then becomes −2802 kJ·mol-1:
Metabolite |
Stoichiometry |
Formation energy × Stoichiometry |
|---|---|---|
ATP |
−1 |
0 |
H2O |
−1 |
238 |
ADP |
+1 |
−1945 |
Pᵢ |
+1 |
−1095 |
Total (reaction energy) |
−2802 |
Shifting ATP’s formation energy to cancel this reaction energy moves it toward −2802 kJ·mol-1 — much closer to the true value of −2811 kJ·mol-1, and far better than 0. The more reactions a metabolite participates in, and the more known metabolites those reactions contain, the better the estimate. Conversely, if many unknowns appear together in the same reaction, individual estimates become less reliable.
The underlying logic in equation form:
Step 4: Delineation of transporter characteristics
For each transport reaction, ‘Thermo-Flux’ will automatically determine the transported metabolite, the transported charge, and the transported protons, depending on the defined physiological parameters of the compartments and the reaction stoichiometry. Additional transported protons can be achieved by altering the reaction stoichiometry (Figure 3b). Alternatively, additional transported protons can be defined using reaction.transported_h(), which represents additional protons transported by a reaction, e.g.:
reaction.transported_h = {'e': -1, 'c': 1}
to define an additional proton moving from the extracellular (e) compartment to the cytosol (c).
Adding transporter variants Additionally, in case of transport processes, for which at the given pH value no charge-neutral transport variant exists, we suggest introducing an additional transport reaction, in which protons balancing the charge are co-translocated together with the respective species, i.e., adding a proton symporter or antiporter. This additional transport variant ensures that for every metabolite, a transport variant exists that does not translocate net charge.
Addition of transporter variants can automatically be achieved with the function reaction.add_transporter_variants(), which identifies the species transported in the original reaction and adds variants to represent the transport of all alternative species.
For example, a model may contain a reaction for phosphate transport, pi_e -> pi_c. At pH 5, this ion exists entirely in the H_2PO_4^- form with a charge of -1 (Figure 3a). Therefore, all the major species of the latter ion are already represented but a charge-neutral transporter does not exist. A proton coupled reaction of pi_e + H_e -> pi_c + H_c is automatically added to the model (Figure 3b).
Transporters with simultaneous chemical transformation of the transported metabolite Some transport reactions involve chemical transformation of the transported metabolite, e.g., phosphotransferase system (PTS) sugar transporters which phosphorylate sugars during transport (McCoy et al., 2015). In this case it is not possible to automatically determine the specific metabolite that is transported, as it does not appear as both a substrate and product of the reaction. Therefore, it is necessary to manually specify the transported metabolite using e.g.:
reaction.transported_mets = {Glc_e: -1}
to represent extracellular glucose as the metabolite that is transported across the membrane.
Reporting
By setting the argument report to True, the function model.update_thermo_info() can provide a reporting table as a pandas DataFrame, with information on the stoichiometry, balancing status, and transported metabolites/charge/protons of each reaction. In this table, reactions that require inspection by the user will appear in the top rows.
Ambiguous proton or ion transporters
It is important to distinguish between free protons that are transported as part of the transport mechanism (e.g. in proton symporters) and protons which are bound/released from metabolites as part of a chemical reaction.
In general, this is automatically determined but in some cases is ambiguous. Ambiguous reactions are highlighted to the user for manual curation. Curation consists of specifying manually the number of transported free protons or ions, e.g., reaction.transported_h = {'e': -1, 'c': 1} to represent the transport of one proton from the extracellular to cytosolic compartment.
As an example, the reaction of mitochondrial Complex II
Ubiquinone-8_c + succinate_m <=> fumarate_m + Ubiquinol-8_c
would need the user to specify:
tmodel.reactions.ComplexII.transported_h = {'m': -2.0, 'c': 2.0}
as two protons are moved from the mitochondria to the cytosol and are subsequently taken up by the protonation of Ubiquinone-8 into Ubiquinol-8.
Step 5: pH-dependent charge and proton balancing
Non-transport reactions
The function reaction_balance() can be used to automatically balance the protons in a reaction based on the compartment conditions with the option to also balance magnesium ions if desired.
In the example of ATP hydrolysis, 0.7 protons will be added to have an equal number of protons and charge on both sides of the reaction (protons are positively charged and therefore charge balance is also maintained).
Transport reactions
To balance transport reactions, Thermo-Flux first identifies the most abundant species (using metabolite.major_microspecies automatically), then considers it as being transported. The balancing then occurs by comparing what is in the inner compartment, what is being transported and what will be in the outer compartment.
Magnesium ions
Analogously to protons, Mg2+ ions can also be balanced, and this option is available to the user by setting balance_mg=True.
Step 6: Calculation of Gibbs energy of reactions
To calculate the standard reaction energy of all reactions in the model, the function model.update_thermo_info() can be used. Once it has been run, the standard reaction energy and the standard transformed reaction energy (calculated using standard transformed formation energies) can be retrieved for each reaction with reaction.drG0 and reaction.drG0prime, respectively.
Note : reactions occurring within the lipid bilayer (such as those catalysed by membrane-embedded enzymes like the respiratory chain complexes) represent a limitation of the current framework. This limit manifests in two ways: ΔfG° estimates for membrane-phase species are unreliable, as group contribution methods are trained on aqueous-phase data and perform poorly for compounds with long hydrophobic tails; and the concentration-dependent term of the Gibbs energy is ill-defined for species lacking a true aqueous concentration. To mitigate over-constraining reaction directionality with inaccurate thermodynamic data, we recommend assigning wide concentration bounds to such species, a practice suggested by Elad Noor to the eQuilibrator community (https://groups.google.com/g/equilibrator-users/c/ARvwhQSo5rU/m/XOwFTdbIAQAJ).
Step 7: Establishment of the thermodynamic-stoichiometric solution space
Metabolite concentration bounds
In practice metabolite concentration bounds are defined by setting the lower_bound and upper_bound attributes and a user defined unit e.g.:
metabolite.lower_bound = Q_(10, 'µM')
The concentration values will then be automatically converted to mol/L before applying thermodynamic constraints.
The function model.add_TFBA_variables() sets up a thermodynamic FBA optimisation problem using the Gurobi optimiser that can be optimised using model.m.optimize(). Implementation of the constraints in the linear program is detailed in the methods (see: implementing conditional constraints in a linear program).
Step 8: Regression: fitting models to experimental data
The function model.regression() can be used to add regression constraints and objectives to the previously constructed thermodynamic FBA problem. Data can be provided for any flux or metabolite concentration, in the pandas DataFrame format.
The Dataframe for the fluxes and the metabolite data needs to be in the following format :
condition |
rxn/met |
mean |
sd |
|---|---|---|---|
condition 1 |
rxn/met A |
XXX |
YYY |
rxn/met B |
ZZZ |
WWW |
Note the pandas.MultiIndex (condition,rxn/met).
Objective function of a regression optimization
The objective function of the optimization problem that is set up using the function gurobi.solver.regression() and minimized is:
Here, \(v_{j}^{obs,mean}\) and \(v_{j}^{obs,std}\) denote the experimentally measured mean and standard deviation of flux \(v_{j}\), respectively, while \(\widehat{v_{j}}\) represents the model-predicted flux. Similarly, \(c_{i}^{obs,mean}\) and \(c_{i}^{obs,std}\) are the measured mean and standard deviation of metabolite \(c_{i}\), and \(\widehat{c_{i}}\) is the model-predicted metabolite concentration. \(V_{c}\) and \(V_{m}\) correspond to the relative volumes of the cytosolic and mitochondrial compartments, respectively. The sets \(RXN\) and \(MET\) represent all reactions and metabolites considered in the model.
Box 3: additional considerations for regressions
Model starting points
The function thermo_flux.solver.gurobi.model_start has been built to allow MIP start from only non-computed values and reduce the probability of multiplying numerical issues between them. This function can even enable the start from a set of specific variables which are known to not cause numerical issues (for example, starting from only metabolite concentrations). The user can provide starting points in either .sol or .mst format:
thermo_flux.solver.gurobi.model_start(tmodel, 'filename.sol',
ignore_vars=['all'],
fix_vars=['qm','ln_conc'],
fix='start')
Multiple starts with different random seeds
As Gurobi is using a branch-and-cut approach to solve the MILP problem, it can sometimes face performance variability issues. An effective way of tackling this problem is to run several optimizations with different values of the seed parameter GRBmodel.params.Seed.