Extended Info
=============

Introduction
************
This page provides detailed technical guidance on the steps required to build a complete and functional model with Thermo-Flux. Going beyond the quickstart and the two tutorials (*e.coli* core model and iMM904 model), here we present examples of code implementation that are relevant to augment your own stoichiometric model with thermodynamic constraints. For the background information and the modelling assumptions, we encourage you to refer to the protocol paper.

Step 1: Definition of physical and biochemical parameters
*********************************************************
Thermo-Flux models rely on a well-defined thermodynamic environment. This includes pH, ionic strength, temperature, and compartment-specific membrane potentials.

Membrane Potentials
-------------------
Membrane potential difference between compartment c and e is defined as
:math: `\Phi_{ce} = \Phi_c - \Phi_e`
with ``Phi_c`` representing the potential in compartment ``c``.

Table 1 : Physical and biochemical parameters, their units and examples of how to define them in ‘Thermo-Flux’. 

+-------------------------------+-----------------------+-----------------------------------------+
| Parameter                     | Unit                  | Code example                            |
+===============================+=======================+=========================================+
| Potential hydrogen (pH)       | Dimensionless         | .. code-block:: python                  |
|                               |                       |                                         |
|                               |                       |    model.pH = {"c": Q_(7.6), "e": Q_(7)}|
+-------------------------------+-----------------------+-----------------------------------------+
| Ionic strength (I)            | mol L-1 (M)           | .. code-block:: python                  |
|                               |                       |                                         |
|                               |                       |    model.I = {"c": Q_(0.25, 'M'),       |
|                               |                       |               "e": Q_(0.25, 'M')}       |
+-------------------------------+-----------------------+-----------------------------------------+
| Membrane potential differences| Volt (V)              | .. code-block:: python                  |
| (phi)                         |                       |                                         |
|                               |                       |    model.phi = {'ce': Q_(0.15, 'V')}    |
+-------------------------------+-----------------------+-----------------------------------------+
| Potential magnesium (pMg)     | Dimensionless         | .. code-block:: python                  |
|                               |                       |                                         |
|                               |                       |    model.pMg = {'c': Q_(3), 'e': Q_(3)} |
+-------------------------------+-----------------------+-----------------------------------------+
| Temperature (T)               | Kelvin (K)            | .. code-block:: python                  |
|                               |                       |                                         |
|                               |                       |    model.T = Q_(298, 'Kelvin')          |
+-------------------------------+-----------------------+-----------------------------------------+


Step 2: Definition of metabolites and chemical species
******************************************************
Connection with eQuilibrator
----------------------------
``Thermo-Flux`` automatically recognises the metabolite identifiers and links to the eQuilibrator compound retrieved with ``metabolite.compound``.

Annotations can be user-defined or updated in the attribute ``annotation`` of the metabolite class e.g.:

::

    metabolite.annotation = {'CHEBI':'11111', 'kegg':'C00000'}

Metabolite species
------------------

In ``Thermo-Flux``, the average charge, average number of protons, and magnesium ions are returned by the function ``metabolite.average_charge_protons()``, which first interrogates the eQuilibrator compound and then uses the physical parameters defined in Step 1 to return the condition-specific metabolite information.

To facilitate the understanding of these average calculations, this function also returns the information on each species’ abundance and their charge, number of protons and magnesium ion (Figure 2c).


Definition of metabolites with non-decomposable or unknown structures
---------------------------------------------------------------------
The formula and charge for these metabolites should be defined using the COBRApy attributes with ``metabolite.formula`` and ``metabolite.charge``.
::
     model.metabolite.formula = 'C6H12O6'
     model.metabolite.charge = -1

Local cache to access eQuilibrator compounds
--------------------------------------------
When ``Thermo-Flux`` queries an eQuilibrator compound for the first time, eQuilibrator will require downloading the latest up-to-date database of eQuilibrator compounds. This local cache is named ``compound.sqlite`` and integrates native functions to retrieve compounds or manually add compounds (see `eQuilibrator local cache <https://equilibrator.readthedocs.io/en/latest/local_cache.html>`_).

Balancing reactions when metabolites don’t have an eQuilibrator formula
-----------------------------------------------------------------------
When an equilibrator compound is not found for one of the metabolites in a reaction, the initial model formula is used for all metabolites. This avoids reactions being considered as unbalanced because of a lack of coherence. 


Step 3: Calculation of Gibbs formation energies
***********************************************

The function ``model.update_thermo_info()`` will automatically calculate the required parameters based on the defined physiochemical conditions (Step 1) and the metabolites of the model will now have a defined transformed Gibbs formation energy (``\Delta_f G^\prime``) and an average charge and number of protons.

.. rubric:: Box 1: Additional considerations for Gibbs energy of formation calculation
--------------------------------------------------------------------------------------

**Uncertainty**

Different default uncertainty can be specified with ``model.rmse_inf = Q_(3000, 'kJ/mol')``.

We can also estimate a non-zero Gibbs formation energy for metabolites with non-decomposable or unknown structures (see supplementary section “metabolites with unknown formation energy”). This is implemented by the ``fit_unknown_dfG0=True`` argument when estimating Gibbs formation energies.

**Redox**

In ``Thermo-Flux`` a formation energy and a standard error can be explicitly defined, and the ``redox`` attribute set to true to ensure the formation energy is not automatically recalculated.
For example, the midpoint potential of cytochrome C is 250 mV (Lennarz & Lane, 2013). Applying the Nernst equation at equilibrium yields ∆𝑟𝐺° ′=− 1 × 96.5 × 0.250 = −24.125 𝑘𝐽𝑚𝑜𝑙−1. If the oxidised and reduced cytochrome C always appear as a pair in reactions, the formation energy of the reduced form can be defined as −24.125/2 = −12.05 𝑘𝐽𝑚𝑜𝑙−1 and the oxidised form as 12.05 𝑘𝐽.𝑚𝑜𝑙−1. e.g.:

::

    cyt_c_red_c.dfG0prime() = Q_(-12.05, 'kJ/mol')
    cyt_c_red_c.redox = True
    cyt_c_red_c.dfG_SE = Q_(0, 'kJ/mol')


**Biomass formation energy**

In ``Thermo-Flux``, the function ``thermo_flux.tools.drg_tools.dfGbm()`` returns the biomass formation energy given a specified empirical formula of biomass and can be used to explicitly define the biomass formation energy, e.g.:

::

    dfGbm = thermo_flux.tools.drg_tools.dfGbm(H=1.613, O=0.557, N=0.158, P=0.012,
                                              S=0.003, K=0.022, Mg=0.003, Ca=0.001,
                                              units='kJ/g')
    model.metabolites.biomass.dfGprime() = dfGbm
    model.metabolites.biomass.biomass = True
    model.metabolites.biomass.dfG_SE = 0

This function uses the following approach:

The standard enthalpy of combustion (:math:`H_c^0`) is estimated using the Patel-Erickson equation, which assumes it is proportional to the number of electrons transferred to oxygen during combustion. Here :math:`n_C`, :math:`n_H`, :math:`n_O`, :math:`n_N`, :math:`n_P` and :math:`n_S` are the number of C, H, O, N, P and S atoms in the biomass elemental formula:

.. math::

   H_c^0 = -111.14 \, \text{kJ mol}^{-1} \cdot (4n_C + n_H - 2n_O - 0n_N + 5n_P + 6n_S)

The standard enthalpy of formation of biomass is then derived from combustion products:

.. math::

   \Delta_f H_{bm}^0 = n_C \Delta_f H_{CO_2}^0 + 0.5 \, n_H \Delta_f H_{H_2O}^0

The standard entropy of formation is estimated as:

.. math::

   \Delta_f S_{bm}^0 = -0.813 \sum_j \frac{S^0_m(j)}{a_j} n_j

From which the Gibbs free energy of formation follows:

.. math::

   \Delta_f G_{bm}^0 = \Delta_f H_{bm}^0 - T \Delta_f S_{bm}^0

Finally, the Gibbs formation energy can be converted to a mass-specific value using either the cell dry weight density or the molecular mass from the biomass elemental formula:

.. math::

   \Delta_f G^0_{bm} \, (\text{kJ} \, g_{CDW}^{-1}) = \frac{\Delta_f G^0_{bm} \, (\text{kJ} \, \text{cmol}^{-1})}{\rho \, (\text{cmol} \, g_{CDW}^{-1})}
Biomass formation energy is made dependent on the pH of the biomass metabolite’s compartment when transformed based on the number of hydrogen atoms of which it is formed. It is done automatically when building a ``Thermo-Flux`` model if ``model.update_biomass_dfG0`` is set to True.

**Metabolites with unknown formation energy**

By default, metabolites with unknown or non-decomposable structures are assigned a mean formation energy of 0 kJ·mol\ :sup:`-1`, meaning they are excluded from reaction energy calculations. This can be problematic — large absolute errors may be needed to estimate feasible reaction energies. To get better estimates, we can exploit the reaction stoichiometry already present in the model.

When an unknown metabolite appears in a reaction alongside known ones, the reaction energy is shifted by roughly the magnitude of the unknown formation energy. Since metabolic systems tend to operate near equilibrium, reaction energies should be close to 0. We can therefore assume that a large computed reaction energy mostly reflects the gap between the true and assumed (0) formation energy of the unknown compound. Pushing the estimated formation energy toward cancelling the reaction energy gives a better approximation — and because reactions share metabolites, this can be solved as a least-squares problem across multiple reactions simultaneously.

In practice, for reactions containing unknown compounds, the uncertainty in reaction energy is regressed against the negative of the mean standard reaction energies. The least-squares solution yields an error vector **m**, which is multiplied by the relevant part of the square-root covariance matrix to convert back into formation energies, giving updated mean formation energies for the unknown compounds.

Example — ATP hydrolysis:

Say the formation energy of ATP is unknown and assumed to be 0 kJ·mol\ :sup:`-1`. The computed :math:`\Delta_r G` of ATP hydrolysis then becomes −2802 kJ·mol\ :sup:`-1`:

.. list-table::
   :header-rows: 1
   :widths: 30 20 35
   :align: center

   * - **Metabolite**
     - **Stoichiometry**
     - **Formation energy × Stoichiometry**
   * - ATP
     - −1
     - 0
   * - H\ :sub:`2`\ O
     - −1
     - 238
   * - ADP
     - +1
     - −1945
   * - Pᵢ
     - +1
     - −1095
   * - **Total (reaction energy)**
     -
     - **−2802**

Shifting ATP's formation energy to cancel this reaction energy moves it toward −2802 kJ·mol\ :sup:`-1` — much closer to the true value of −2811 kJ·mol\ :sup:`-1`, and far better than 0. The more reactions a metabolite participates in, and the more known metabolites those reactions contain, the better the estimate. Conversely, if many unknowns appear together in the same reaction, individual estimates become less reliable.

The underlying logic in equation form:

.. math::

   \Delta_r G^{\circ} = \sum_{i \in \text{known}} S_{ij} \Delta_f G_i^{\circ} + \sum_{i \in \text{unknown}} S_{ij} \Delta_f G_i^{\circ}

.. math::

   \Delta_r G^{\circ} \ll \sum_{i \in \text{unknown}} S_{ij} \Delta_f G_i^{\circ}

.. math::

   \sum_{i \in \text{unknown}} S_{ij} \Delta_f G_i^{\circ} \approx -\sum_{i \in \text{known}} S_{ij} \Delta_f G_i^{\circ}


Step 4: Delineation of transporter characteristics
**************************************************

For each transport reaction, ‘Thermo-Flux’ will automatically determine the transported metabolite, the transported charge, and the transported protons, depending on the defined physiological parameters of the compartments and the reaction stoichiometry. Additional transported protons can be achieved by altering the reaction stoichiometry (Figure 3b). Alternatively, additional transported protons can be defined using ``reaction.transported_h()``, which represents additional protons transported by a reaction, e.g.:

::

    reaction.transported_h = {'e': -1, 'c': 1}

to define an additional proton moving from the extracellular (``e``) compartment to the cytosol (``c``).


**Adding transporter variants**
Additionally, in case of transport processes, for which at the given pH value no charge-neutral transport variant exists, we suggest introducing an additional transport reaction, in which protons balancing the charge are co-translocated together with the respective species, i.e., adding a proton symporter or antiporter. This additional transport variant ensures that for every metabolite, a transport variant exists that does not translocate net charge.

Addition of transporter variants can automatically be achieved with the function ``reaction.add_transporter_variants()``, which identifies the species transported in the original reaction and adds variants to represent the transport of all alternative species.

For example, a model may contain a reaction for phosphate transport, ``pi_e -> pi_c``. At pH 5, this ion exists entirely in the ``H_2PO_4^-`` form with a charge of -1 (Figure 3a). Therefore, all the major species of the latter ion are already represented but a charge-neutral transporter does not exist. A proton coupled reaction of ``pi_e + H_e -> pi_c + H_c`` is automatically added to the model (Figure 3b).

**Transporters with simultaneous chemical transformation of the transported metabolite**
Some transport reactions involve chemical transformation of the transported metabolite, e.g., phosphotransferase system (PTS) sugar transporters which phosphorylate sugars during transport (McCoy et al., 2015). In this case it is not possible to automatically determine the specific metabolite that is transported, as it does not appear as both a substrate and product of the reaction. Therefore, it is necessary to manually specify the transported metabolite using e.g.:

::

    reaction.transported_mets = {Glc_e: -1}

to represent extracellular glucose as the metabolite that is transported across the membrane.

**Reporting**

By setting the argument ``report`` to True, the function ``model.update_thermo_info()`` can provide a reporting table as a pandas DataFrame, with information on the stoichiometry, balancing status, and transported metabolites/charge/protons of each reaction. In this table, reactions that require inspection by the user will appear in the top rows.

**Ambiguous proton or ion transporters**

It is important to distinguish between free protons that are transported as part of the transport mechanism (e.g. in proton symporters) and protons which are bound/released from metabolites as part of a chemical reaction.
In general, this is automatically determined but in some cases is ambiguous. Ambiguous reactions are highlighted to the user for manual curation. Curation consists of specifying manually the number of transported free protons or ions, e.g., ``reaction.transported_h = {'e': -1, 'c': 1}`` to represent the transport of one proton from the extracellular to cytosolic compartment.

As an example, the reaction of mitochondrial Complex II

::

    Ubiquinone-8_c + succinate_m <=> fumarate_m + Ubiquinol-8_c

would need the user to specify:

::

    tmodel.reactions.ComplexII.transported_h = {'m': -2.0, 'c': 2.0}

as two protons are moved from the mitochondria to the cytosol and are subsequently taken up by the protonation of Ubiquinone-8 into Ubiquinol-8.

Step 5: pH-dependent charge and proton balancing
************************************************

Non-transport reactions
-----------------------

The function ``reaction_balance()`` can be used to automatically balance the protons in a reaction based on the compartment conditions with the option to also balance magnesium ions if desired.

In the example of ATP hydrolysis, 0.7 protons will be added to have an equal number of protons and charge on both sides of the reaction (protons are positively charged and therefore charge balance is also maintained).

Transport reactions
-------------------

To balance transport reactions, ``Thermo-Flux`` first identifies the most abundant species (using ``metabolite.major_microspecies`` automatically), then considers it as being transported. The balancing then occurs by comparing what is in the inner compartment, what is being transported and what will be in the outer compartment.

Magnesium ions
--------------
Analogously to protons, Mg2+ ions can also be balanced, and this option is available to the user by setting ``balance_mg=True``.

Step 6: Calculation of Gibbs energy of reactions
************************************************

To calculate the standard reaction energy of all reactions in the model, the function ``model.update_thermo_info()`` can be used. Once it has been run, the standard reaction energy and the standard transformed reaction energy (calculated using standard transformed formation energies) can be retrieved for each reaction with ``reaction.drG0`` and ``reaction.drG0prime``, respectively.

Note : reactions occurring within the lipid bilayer (such as those catalysed by membrane-embedded enzymes like the respiratory chain complexes) represent a limitation of the current framework. This limit manifests in two ways: ΔfG° estimates for membrane-phase species are unreliable, as group contribution methods are trained on aqueous-phase data and perform poorly for compounds with long hydrophobic tails; and the concentration-dependent term of the Gibbs energy is ill-defined for species lacking a true aqueous concentration. To mitigate over-constraining reaction directionality with inaccurate thermodynamic data, we recommend assigning wide concentration bounds to such species, a practice suggested by Elad Noor to the eQuilibrator community (https://groups.google.com/g/equilibrator-users/c/ARvwhQSo5rU/m/XOwFTdbIAQAJ).

Step 7: Establishment of the thermodynamic-stoichiometric solution space
************************************************************************

Metabolite concentration bounds
-------------------------------

In practice metabolite concentration bounds are defined by setting the ``lower_bound`` and ``upper_bound`` attributes and a user defined unit e.g.:

::

    metabolite.lower_bound = Q_(10, 'µM')

The concentration values will then be automatically converted to mol/L before applying thermodynamic constraints.

The function ``model.add_TFBA_variables()`` sets up a thermodynamic FBA optimisation problem using the Gurobi optimiser that can be optimised using ``model.m.optimize()``. Implementation of the constraints in the linear program is detailed in the methods (see: implementing conditional constraints in a linear program).

Units of fluxes and reaction energy vs. biomass-related values
--------------------------------------------------------------

When specifying the formation energy of the biomass metabolite, values must be given in J·gDW\ :sup:`-1` — unlike all other reactions in the model, which use kJ·mol\ :sup:`-1`. This discrepancy comes from how fluxes are defined: the biomass reaction is typically built from cellular composition measurements expressed in grams per gram of dry cell weight (g·gDW\ :sup:`-1`). Since 'Thermo-Flux' handles all reactions the same when determining the gibbs energy dissipation rate (multiplying :math:`v` by :math:`\Delta_r G`), the biomass formation energy has to be passed in as if it were in kJ·mol\ :sup:`-1`, even though it is physically in J·gDW\ :sup:`-1`.

For a normal reaction, the units work out as:

.. math::

   g_{diss}^{rxn} = \Delta_r G \times v
   = (\text{kJ·mol}^{-1}) \times (\text{mmol·gDW}^{-1}\text{·h}^{-1})
   = (10^3 \, \text{J·mol}^{-1}) \times (10^{-3} \, \text{mol·gDW}^{-1}\text{·h}^{-1})
   = \text{J·gDW}^{-1}\text{·h}^{-1}

For the biomass reaction:

.. math::

   g_{diss}^{bio} = \Delta_f G_{bio} \times \mu
   = (\text{J·gDW}^{-1}) \times (\text{gDW·gDW}^{-1}\text{·h}^{-1})
   = \text{J·gDW}^{-1}\text{·h}^{-1}
 

.. rubric:: Box 2: additional considerations for the formulation of the thermodynamic/stoichiometric solution space
-------------------------------------------------------------------------------------------------------------------

**Compartmented metabolite concentrations and whole cell concentrations**

The function ``model.total_cell_conc()`` will add whole cell metabolite concentration constraints on the compartmented metabolic concentrations, based on whole cell metabolite data and the relative compartment volumes which must be provided as an input to the function, respectively as a pandas DataFrame and a Python dictionary.

**Relaxing the second law constraint**

The user can relax the second law constraint for any specific reaction by setting ``reaction.ignore_snd = True``.

**Ignoring metabolite concentrations**

The concentration of pseudo metabolites that are often added to stoichiometric models as a convenient way to add constraints should also be ignored by setting

::

    metabolite.ignore_conc = True

**Variability analysis**

In ``Thermo-Flux`` variability analysis is implemented with the function ``solver.gurobi.variability_analysis()``, which sets the optimization problem for any variables provided as an argument to the function. Specifically, the function uses the Gurobi multi-scenario optimization feature, with two scenarios for each variable (one minimizes the variable and the other maximizes it). The results are retrieved with ``solver.gurobi.variability_results()`` and both functions can still be used if the optimization is solved using a high-performance computing (HPC) cluster.

Step 8: Regression: fitting models to experimental data
*******************************************************

The function ``model.regression()`` can be used to add regression constraints and objectives to the previously constructed thermodynamic FBA problem. Data can be provided for any flux or metabolite concentration, in the pandas DataFrame format.

The Dataframe for the fluxes and the metabolite data needs to be in the following format :  

+--------------+-------------+-------+-----+
| condition    |   rxn/met   | mean  | sd  |
+==============+=============+=======+=====+
| condition 1  | rxn/met  A  |  XXX  | YYY |
+              +-------------+-------+-----+
|              | rxn/met  B  |  ZZZ  | WWW |
+--------------+-------------+-------+-----+

Note the pandas.MultiIndex (condition,rxn/met).

**Objective function of a regression optimization**

The objective function of the optimization problem that is set up using the function ``gurobi.solver.regression()`` and minimized is:

.. math::

   \ v_{j} \in RXN,\ c_{i} \in MET \quad | \quad
   f:\left( c,v \right) \mapsto
   \sum_{j}\left( \frac{v_{j}^{obs,mean} - v_{j}}{v_{j}^{obs,std}} \right)^{2}
   + \sum_{i}\left( \frac{V_{c} \cdot \exp\left( \ln\left( c_{i}^{obs,mean} \right) \right)
   + V_{m} \cdot \exp\left( \ln\left( c_{i}^{obs,mean} \right) \right)
   - c_{i}}{c_{i}^{obs,std}} \right)^{2}

Here, :math:`v_{j}^{obs,mean}` and :math:`v_{j}^{obs,std}` denote the experimentally measured mean and standard deviation of flux :math:`v_{j}`, respectively, while :math:`\widehat{v_{j}}` represents the model-predicted flux. Similarly, :math:`c_{i}^{obs,mean}` and :math:`c_{i}^{obs,std}` are the measured mean and standard deviation of metabolite :math:`c_{i}`, and :math:`\widehat{c_{i}}` is the model-predicted metabolite concentration. :math:`V_{c}` and :math:`V_{m}` correspond to the relative volumes of the cytosolic and mitochondrial compartments, respectively. The sets :math:`RXN` and :math:`MET` represent all reactions and metabolites considered in the model.


.. rubric:: Box 3: additional considerations for regressions
------------------------------------------------------------

**Model starting points**

The function ``thermo_flux.solver.gurobi.model_start`` has been built to allow MIP start from only non-computed values and reduce the probability of multiplying numerical issues between them. This function can even enable the start from a set of specific variables which are known to not cause numerical issues (for example, starting from only metabolite concentrations). The user can provide starting points in either ``.sol`` or ``.mst`` format:

::

    thermo_flux.solver.gurobi.model_start(tmodel, 'filename.sol',
                                         ignore_vars=['all'],
                                         fix_vars=['qm','ln_conc'],
                                         fix='start')

**Multiple starts with different random seeds**

As Gurobi is using a branch-and-cut approach to solve the MILP problem, it can sometimes face performance variability issues. An effective way of tackling this problem is to run several optimizations with different values of the seed parameter ``GRBmodel.params.Seed``.