What are good data on the Human Metabolic Model and where can I get them?

What are good data on the Human Metabolic Model and where can I get them?

We are searching data for your request:

Forums and discussions:
Manuals and reference books:
Data from registers:
Wait the end of the search in all databases.
Upon completion, a link will appear to access the found materials.

Trying to get a good SBML representation of the Human Metabolic Model for use in Flux Balance Analysis and drug targetting (i.e. gene knockout) simulations.

What are good sources for these data?

Thanks very much!

As far as I know, there is no one model officially known as "the Human Metabolic Model", but the Recon 2 model, described as a "consensus" model and the most comprehensive to date in Thiele et al. 2013, A community-driven global reconstruction of human metabolism ( is available at

Tissue-specific models are also available. See for example Wang et al.: Reconstruction of genome-scale metabolic models for 126 human tissues using mCADRE. BMC Systems Biology 2012, 6:153:

A C. elegans model to study human metabolic regulation

Lipid metabolic disorder is a critical risk factor for metabolic syndrome, triggering debilitating diseases like obesity and diabetes. Both obesity and diabetes are the epicenter of important medical issues, representing a major international public health threat. Accumulation of fat in adipose tissue, muscles and liver and/or the defects in their ability to metabolize fatty acids, results in insulin resistance. This triggers an early pathogenesis of type 2 diabetes (T2D). In mammals, lipid metabolism involves several organs, including the brain, adipose tissue, muscles, liver, and gut. These organs are part of complex homeostatic system and communicate through hormones, neurons and metabolites. Our study dissects the importance of mammalian Krüppel-like factors in over all energy homeostasis. Factors controlling energy metabolism are conserved between mammals and Caenorhabditis elegans providing a new and powerful strategy to delineate the molecular pathways that lead to metabolic disorder. The C. elegans intestine is our model system where genetics, molecular biology, and cell biology are used to identify and understand genes required in fat metabolism. Thus far, we have found an important role of C. elegans KLF in FA biosynthesis, mitochondrial proliferation, lipid secretion, and β-oxidation. The mechanism by which KLF controls these events in lipid metabolism is unknown. We have recently observed that C. elegans KLF-3 selectively acts on insulin components to regulate insulin pathway activity. There are many factors that control energy homeostasis and defects in this control system are implicated in the pathogenesis of human obesity and diabetes. In this review we are discussing a role of KLF in human metabolic regulation.


  • 1 State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, China
  • 2 Shanghai Collaborative Innovation Center for Biomanufacturing Technology, Shanghai, China

Genome-scale metabolic models (GEMs) have become a popular tool for systems biology, and they have been used in many fields such as industrial biotechnology and systems medicine. Since more and more studies are being conducted using GEMs, they have recently received considerable attention. In this review, we introduce the basic concept of GEMs and provide an overview of their applications in biotechnology, systems medicine, and some other fields. In addition, we describe the general principle of the applications and analyses built on GEMs. The purpose of this review is to introduce the application of GEMs in biological analysis and to promote its wider use by biologists.

Results and discussion

Description of the human liver metabolic network

We used a human liver metabolic model represented by a set of metabolic reactions[8], which contains 1360 metabolites and 1826 reactions. The human metabolic model was generated based on MBA algorithm[8], which is a model-building algorithm used to derive tissue-specific metabolic models from a generic model[6] by integrating a variety of tissue-specific molecular data sources, including literature-based knowledge, transcriptomic, proteomic, metabolomic and phenotypic data. In the human liver metabolic model, each metabolite is represented in the form of A[x], where A is the name of a metabolite and x in the bracket [ ] is the abbreviation of the cell compartment where the metabolite A appears (see Additional file1). Metabolite A may appear in different cell compartments x,…,y, so there are A[x],…,A[y] for the same metabolite but different cell compartments, which are counted as different metabolites.

Based on the principle that a set of metabolic reactions can be translated into a network representation[25], we reformulated the liver model in the following way: denoting each metabolite by a node labeled with A[x], and connecting two nodes by A[x] → B[y] if there is a chemical reaction where A[x] is a substrate and B[y] is a product. The derived HLMN contains 1360 nodes and 6501 links (see Additional file1). In order to illustrate the process of reformulating the HLMN, an example with three metabolic reactions is given in Figure1.

An example to show how the HLMN is reformatted from the liver model. A) Three metabolic reactions in the liver model are shown on the left, where hgentis, o2, 4mlacac, h, hco3 and h2co3 are metabolites, [c] and [m] are the abbreviations of cell compartments "cytoplasm" and "mitochondrion" denoting where the corresponding metabolites appear. The first metabolic reaction represents that homogentisate in cytoplasm (hgentis[c]) is oxidated into 4-Maleylacetoacetate (4mlacac[c]) and hydrogen ion (h[c]) the second means that the hydrogen ion in cytoplasm (h[c]) is transported into mitochondrion (h[m]) the third represents that the hydrogen ion in mitochondrion (h[m]) reacts with bicarbonate (hco3[m]) to form carbonic acid (h2co3[m]). The network reformatted from these three metabolic reactions is shown on the right, where each node denotes a metabolite with the information of cell compartment where it appears, two nodes have a link if there is a chemical reaction such that one metabolite is substrate and another one is a product. B) The abbreviations of cell compartment and their corresponding full names.

For convenience and without ambiguity, we will not distinguish nodes from metabolites hereinafter when refer to the properties of the HLMN. For example, when we say a driver node in the HLMN, we may mean a driver metabolite in the HLMN.

Classification and analysis of driver metabolites

Driver metabolites in the HLMN are metabolites where inputs are injected. If the driver metabolites in a minimum driver metabolites set (MDMS, for short) are all controlled by different inputs, the HLMN can be steered from any given state to a desired state in finite time. "Minimum" means that if signals are only input on a proper subset of S, then the HLMN cannot be guided to some final desired states in finite time. MDMSs are determined by detecting maximum matchings in the HLMN (see Methods).

A maximum matching is a maximum set of links that do not share start or end nodes[16]. There are different maximum matchings in a network[26], which could result in different MDMSs in the HLMN. Counting the number of all maximum matchings in an arbitrary network has been proven to belong to the ♯P-complete (sharp P-complete) class of problems[27]. There is no currently known polynomial-time algorithm for solving a ♯P-complete problem. The number of maximum matchings can grow exponentially with networks size, hence a network with only hundreds of nodes often leads to millions of maximum matchings. Enumeration of maximum matchings is computationally prohibitive for large networks[28]. Thus, the enumeration of maximum matchings in the HLMN (containing 1360 nodes) is hard to achieve.

Classification of driver metabolites

We randomly identified 5000 different maximum matchings (see Additional file2) and their corresponding MDMSs (see Methods). In the HLMN, a node may appear in different MDMSs. For each node v, we counted the number of MDMSs that the node v appears in and then normalized the number (that is, the number is divided by 5000). The normalized values characterize the frequency f d of each node appearing in the 5000 MDMSs. According to the frequency of each node, we classified the metabolites into three groups: critical driver metabolites with f d = 1, high-frequency driver metabolites with 0.6 ≤ f d < 1, low-frequency driver metabolites with 0 ≤ f d < 0.6.

A node with f d = 1 means that the node appears in all the MDMSs. Such nodes may possess some specific properties or functions, which could provide valuable information on the HLMN. So we classified the nodes with f d = 1 being critical driver nodes. The reason why we chose the threshold 0.6 to separate high-frequency driver metabolites from low-frequency driver metabolites, is that we would like to make the difference between the roles of metabolites in these two groups as big as possible (for detailed analysis, see the subsection "The roles of the high-frequency driver metabolites").

In order to test whether the classification of metabolites based on 5000 MDMSs is reliable, we computed the frequencies of metabolites in 51 different families of MDMSs with sizes of 5000,5100,5200,…,10000. The frequency of each metabolite computed based on different families of MDMSs stays in a same region, where the regions are f d = 1, 0.6 ≤ f d < 1 and 0 ≤ f d < 0.6 (see Additional file3). In other words, the classifications of each metabolite are the same based on these different families. Hence the classification of metabolites based on 5000 MDMSs is reliable. Furthermore, we have employed an unbiased random sampling method[28] to validate the results based on the 5000 MDMSs (for detailed analysis, see the subsection "Validation for the classification and the properties of driver metabolites").

Topological analysis of driver metabolites in the HLMN

We computed different centralities of each metabolite i in the HLMN, which include out-degree OutD, in-degree InD, degree D, betweenness BC, closeness CC, in-closeness CCI and out-closeness CCO (for definitions, see Methods). The frequency f d was found to decrease quickly with the in-degree (see Additional file4) while this pattern does not hold for other centralities, which is consistent with the result in[28]. For each centrality, all metabolites in the HLMN are divided into three sets of similar sizes, based on their centrality scores (low, medium, and high), In this way, seven families of sets were obtained: F D = < D l , D m , D h >, F OutD = < OutD l , OutD m , OutD h >, F InD = < InD l , InD m , InD h >, F BC = < BC l , BC m , BC h >, F CC = < CC l , CC m , CC h >, F CCI = < CCI l , CCI m , CCI h >, F CCO = < CCO l , CCO m , CCO h >, where each family contains three sets, and the subscripts l,m,h respectively represent low, medium and high.

We used set A to denote the union of metabolites from the 5000 MDMSs, and set B to denote the union set of the critical and high-frequency driver metabolites. For each of the families F D , F OutD , F InD , F BC , F CC , F CCI , F CCO , the fractions of metabolites from set A that belong to the three sets in the family were computed (see Figure2(A)), and the fractions of metabolites from set B that belong to the three sets in the family were also computed (see Figure2(B)). For example, for the family F D , we computed |AD l|/|A|, |AD m|/|A|, |AD h|/|A|, and |BD l|/|B|, |BD m|/|B|, |BD h|/|B|, where | ∗ | denotes the size of set ∗ .

Topological analysis of the driver metabolites. A) The metabolites in set A (the union of metabolites from the 5000 MDMSs) B) The metabolites in set B (the set of critical and the high-frequency driver metabolites). For the labels in the horizontal axis, "D", "InD", "OutD", "BC", "CC", "CCI", "CCO" respectively represent degree, in-degree, out-degree, betweenness, closeness, in-closeness, out-closeness. The hight of blue, green and brown bars respectively represent the fractions of the driver metabolites with high, medium and low centrality scores. The difference between the fractions for each centrality in B) is greater than that in A). Except for the out-closeness in B), the brown bars are all lower than the blue and green ones for the same centrality, which indicates that the driver metabolites tend to avoid nodes with the high degree (resp., out-degree, in-degrees, betweenness, closeness and in-closeness), while the critical and high-frequency driver metabolites tend to have high out-closeness.

Comparing the results shown in Figure2(A) and Figure2(B), we find that for each centrality, the difference between the fractions computed in set B is greater than that in set A, which means that the topological characteristic differences are bigger in the set of critical and high-frequency driver metabolites. In-degree and in-closeness measure the susceptibility of a metabolite to be influenced by other metabolites. Higher in-degree and higher in-closeness imply that the metabolite could be more easily influenced by others. Out-closeness measures the ability of a metabolite to influence other metabolites. Higher out-closeness implies that the metabolite could influence others more easily. The metabolites in set B tend to have low in-degree, low in-closeness, and high out-closeness. Therefore, the driver metabolites, especially the critical and high-frequency driver metabolites, tend to have strong ability to influence the states of other metabolites and weak susceptibility to be influenced by the states of other metabolites. Moreover, injecting control inputs (drugs, signals from environment or inside the organism, etc.) to critical and high-frequency driver metabolites could regulate the whole state of the HLMN, which indicates that the critical and high-frequency driver metabolites may be potential drug-targets.

For each centrality, we used chi-square test (see Methods) to establish whether or not the fraction distribution in set A and set B differs from that in the whole network (the reason why we chose chi-square test is given in Methods). The chi-square statistic values for each centrality in set A and set B are shown in Table1. While the table value for chi-square statistic is 5.99, based on the freedom being 2 and the level of significance being 0.05. Except for the CCO, other chi-square statistic values are greater than the table value in set A, and the chi-square statistic values for all the centralities are greater than the table value in set B. It means that except for the CCO in set A, for other centralities in set A and all the centralities in set B, the fraction distributions differ from that in the whole network. Thus, the result of the topological features of driver metabolites is of statistical significance.

Properties of the critical driver metabolites

In the HLMN, we detected 36 critical driver metabolites (see Table2). Their in-degrees are all zero, which is consistent with the result in[29] and means that the 36 critical driver metabolites are all the start metabolites of paths (paths in the HLMN are sequential reactions between metabolites). By Lin’s structural controllability theorem[18, 30], if a system is controllable, there is no inaccessible nodes (i.e., nodes that cannot be accessed or "influenced" by the external inputs). Since these start metabolites cannot be influenced by the external inputs via other metabolites, they need to be directly controlled by external inputs.

The 36 critical driver metabolites are all found to be extracellular (each of the 36 critical driver metabolites is associated with the abbreviation of compartmental information "[e]", which means extracellular). By checking the biochemistry activities of the 36 critical driver metabolites, we find that they all participate in the transport reactions from the extracellular into the cell, which suggests that the intakes of these extracellular metabolites play important roles in the biological activities of the liver cells. For example, appropriately increasing the intake of the critical driver metabolite gamma-tocopherol could help lower the cholesterol level, and increasing the intake of the critical driver metabolite alpha-tocopherol could decrease lipid peroxidation and hepatic stellate cells activation, which could protect liver cells and prevent liver fibrosis[51].

We investigated the biological essentiality of the 36 critical driver metabolites. The essentiality of a metabolite measures how important the metabolite is in the whole metabolic systems or some metabolic processes. Although a metabolite could exist in different compartments, the metabolite is recognized to be essential as long as it is found to be essential in any one of the compartments[33]. Based on the different essentiality of metabolites, the metabolites were classified into three groups:

Universal Metabolites (UM): Some inorganic or cofactor metabolites, such as CMP and ATP, which have been found to exist universally in more than 90% organisms. The universal metabolites are usually treated as essential metabolites because most living matter cannot survive without them[33, 52].

Functional Essential Metabolites (FEM): The metabolites which are not UM and have essential roles in some biological functions. For example, folate is essential to numerous bodily functions, and required by the human body to synthesize, repair and methylate DNA as well as to act as a cofactor in certain biological reactions[31] Hyaluronan is essential for embryogenesis[37] Human body requires pantothenic acid to synthesize coenzyme-A (CoA), as well as to synthesize and metabolize proteins, carbohydrates, and fats[44].

Essentiality Undiscovered Metabolites (EUM): The metabolites whose essentiality have not been discovered. These metabolites may be the potential essential metabolites, which demands further experimental verification.

Among the 36 critical driver metabolites, we find that 10 metabolites are UM 17 metabolites are FEM 9 metabolites are EUM. Therefore, among the 36 critical driver metabolites, 27 metabolites are essential, which suggests that the critical driver metabolites play important roles in human liver metabolism.

The roles of the high-frequency driver metabolites

We used simulated annealing (SA) algorithm[53] to detect modules in the HLMN. The reason why we chose the SA algorithm is that it is a commonly used technique to detect modules, and a benchmark to validate the effectiveness of the newly developed module-detecting algorithms[54, 55]. Compared with other module-detecting algorithms, such as the markov clustering method, the SA algorithm performs better in detecting modules in large scale metabolic networks and the detected modules are more biologically meaningful[56], since the SA algorithm is less sensitive to noise such as experimental error or incomplete data.

According to the two parameters within-degree and the partition coefficient of each node in the modularized HLMN, the nodes were divided into seven classes: R1, R2, R3, R4, R5, R6, R7 (for details, see Methods).

Since the SA algorithm is stochastic, different results of modularization could be obtained in different runs. We have run the SA algorithm for 100 times. Based on the result of each run, the nodes of the HLMN were classified into the seven classes R1, R2, R3, R4, R5, R6, R7. Among the 100 classification results, the probability of each node being classified into each class is counted. As shown in Figure3, most nodes are always classified into a same class, which indicates that the role classification for the nodes in the HLMN based on the SA algorithm is reliable.

The probability of each metabolite for each role among 100 partitions. The color is mainly dark red (the corresponding probability equals to 1) or dark blue (the corresponding probability equals to 0). It means that most metabolites in the HLMN are always classified into a same role class among 100 partitions, which indicates that the role classification for metabolites is reliable.

It has been found that the non-hubs connecting different modules are responsible for inter-module fluxes which influence the state of metabolic networks[57], while the nodes with high frequency f d have strong ability to influence the states of other metabolites, which prompts us to think whether the nodes with high frequency f d tend to be non-hubs connecting different modules. In the HLMN, more than 92% nodes are of roles R1 and R2, which are both non-hubs and R1 nodes have no connection with other modules while R2 nodes have connections with different modules. As shown in Figure4(A), with the frequency threshold f dt increasing, the fraction of R1 nodes among the set of nodes with f dtf d < 1 decreases while the fraction of R2 nodes increases. The fractions of nodes with different roles fluctuate when f dt ≥ 0.7 due to the small size of the set of nodes with f dtf d < 1. When f dt < 0.7, the difference between the fractions of R1 nodes and R2 nodes is the biggest at around f dt = 0.6. Therefore, we chose the threshold f dt = 0.6 to differentiate the high-frequency driver metabolites from the low-frequency driver metabolites. The fact that the roles of high-frequency driver metabolites tend to be R2, indicates that the high-frequency driver metabolites tend to be non-hubs connecting different modules. Different modules could be mapped to different pathways[56], which means that the high-frequency driver metabolites tend to participate in different metabolic pathways. For example, the high-frequency driver metabolite cyclic adenosine monophosphate plays regulatory roles in glucose, protein and fatty metabolism pathways at the same time[58]. It suggests that the high-frequency driver metabolites play important roles in human liver metabolic network.

The fractions of the metabolites with different roles based on different frequency threshold. A) and B) respectively show the results which are based on the modules detected by the simulated annealing algorithm and the fast greedy algorithm. Each point connected by dotted lines is the fraction of the metabolites with a specific role among the set of driver metabolites whose frequency f dtf d < 1, while each solid line means the fraction of metabolites with each role among the HLMN. In the HLMN, most metabolites are of roles R1 and R2. With the frequency threshold f dt increasing, the fraction of R1 metabolites among the set of metabolites with f dtf d < 1 decreases while the fraction of R2 metabolites increases. The fractions of metabolites with different roles fluctuate when f dt ≥ 0.7 due to the small size of the set of metabolites with f dtf d < 1. When f dt < 0.7, the difference between the fractions of R1 metabolites and R2 metabolites is the biggest at about f dt = 0.6. Thus, we choose the threshold f dt = 0.6 to differentiate the high-frequency driver metabolites from the low-frequency driver metabolites, and the high-frequency driver metabolites tend to be of role R2.

To validate that the result of the high-frequency driver metabolites does not depend on the module detecting method SA algorithm, we used another module detecting method fast greedy[59] to detect modules in the HLMN, and classify the nodes into 7 classes: R1, R2, R3, R4, R5, R6, R7. With the frequency threshold f dt increasing, the fractions of R1 nodes and R2 nodes among the set of nodes with f dtf d < 1 show the similar pattern as that based on the SA algorithm, which is shown in Figure4(B). We arrived at the same conclusion that the high-frequency driver metabolites tend to be the non-hub connecting different modules.

Validation for the classification and the properties of driver metabolites

The results of the properties on the driver metabolites, critical driver metabolites and high-frequency driver metabolites are all based on the 5000 MDMSs. To validate that these results do not depend on the 5000 MDMSs, we applied an unbiased sampling method proposed by Jia et al.[28] to compute the frequency f d that each node acts as a driver node (see Additional file5).

Comparing with the results which based on the 5000 MDMSs, the set of critical driver nodes determined by this method is the same, while the set of high-frequency driver nodes determined by this method is not exactly the same, which may be caused by the randomness of sampling. However, the following result holds for both two methods: the high-frequency driver nodes tend to be the non-hubs connecting different modules. (see Additional file4). Moreover, the topological analysis has been applied to the set A (the set of the metabolites with f d > 0) and set B (the set of the metabolites with 1 ≤ f d > 0.6) detected by the method in[28]. The conclusion still holds that the driver metabolites, especially the critical and high-frequency driver metabolites, tend to have strong ability to influence the states of other metabolites and weak susceptibility to be influenced by the states of other metabolites (see Additional file4).

In conclusion, although the classification and analysis of driver metabolites are based on the 5000 MDMSs, the results on the properties of different driver metabolites do not rely on the 5000 MDMSs.

Alternative classification of driver nodes and the control mode of the HLMN

A recently published paper[29] has given an alternative classification of nodes based on their participation in control. A node is critical, itermittent or redundant if it acts as a driver node in all, some or none of the minimum sets of driver nodes. By measuring the fraction n r of the redundant nodes for a network with varying average degree, two distinct control modes were discovered in[29]. Based on the difference value of the fraction n r and n r T for its transpose network (whose wiring diagram is identical to the original network but the direction of each link is reversed), the control mode of a network can be identified: if Δ n r = n r - n r T > 0 the network is centralized and if Δn r < 0 it is distributed.

We have applied the tools in[29] to the HLMN, and find that the control mode of the HLMN is distributed. While in[29], the control modes of the three involved metabolic networks cannot be identified, which is caused by the incompleteness of the metabolic networks, whose average degrees are in the ‘pre-bifurcation’ region (where no distinct control modes exist). With more information on these metabolic networks being uncovered, the average degrees increase and result in identifiable control modes. For example, the E. coli metabolic network[11] studied in[29] was assembled in 2000, and its control mode cannot be identified however, when we applied the tools to the E. coli metabolic network iJO1366[60], which was assemble in 2011, we can find that the control mode of network iJO1366 is centralized. It is not easy to figure out the reason why the control mode of the human liver metabolic network is distributed and the E. coli metabolic network iJO1366 is centralized, due to the incompleteness of these two networks, whose control mode may alter with the increase of the network scale.

The role of reactions in the robustness of the controllability in the HLMN

Reaction failures could happen in metabolic systems, and different reaction failures have different impacts on the robustness of the metabolic function. Robustness characters the ability of metabolic systems to behavior normally under reaction failures. Some reaction failures would break the cellular homeostasis, resulting in an anti-proliferative effect[61] or apoptosis[62], while some almostly have no influence on the cellular functions[63]. In what follows, we focus on the impacts of different reaction failures on the robustness (whether the network is controllable with the same MDMS under reaction failures) of the controllability in the HLMN.

Based on different impacts on the robustness of controllability caused by links absence, the links have been classified into three categories[16]: "critical" if its absence causes the minimum number of driver nodes increased so as to maintain full control "redundant" if it can be removed without affecting the current set of driver nodes "ordinary" if it is neither critical nor redundant. From the fractions of critical, ordinary and redundant links in the HLMN, which are shown in Figure5, we can find that few links are critical and most links are ordinary, whose absence may change the current set of driver nodes, but the network could still be controlled with the same number of driver nodes. In the human liver metabolism, there are only a few reactions represented by critical links, which provides an explanation to why human liver metabolism could function well under many different circumstances.

The fractions of the links of different types. The fractions of critical, ordinary and redundant links among the whole link set in the HLMN are shown in A). Few links are critical and most links are ordinary, which implies that there are only a few reactions whose failure could cause the increase of the minimum number of driver nodes. The fractions of the core high (CH), core moderate (CM) and non-core (NC) reaction links among the set of critical, ordinary, redundant links or the set of all the links (Ensemble) are shown in B). The fraction of the core high reaction links is the smallest and the fraction of non-core reaction links is the biggest in the set of critical links, which means that the reactions represented by critical links tend to be the non-core reactions in the human liver metabolic network.

In the human liver metabolic model[8], the reactions have been classified into three classes: core high reactions for these reactions included in human-curated tissue-specific pathways, which are essential in the human liver metabolism core moderate reactions for these reactions testified by molecule data non-core reactions for the other, most of which are not associated with genes in the model and 50% are transport reactions. We computed the fractions of links representing the core high, core moderate and non-core reactions among the set of critical, ordinary and redundant links and the set of all links in the HLMN, which are shown in Figure5. Comparing with the fractions among the sets of the ordinary, redundant links and the whole link set in the HLMN, the fraction of links representing core high reactions are the lowest and the fraction of non-core reaction links are the highest in the set of critical links, which indicates that the reactions represented by critical links tend to be the non-core reactions.

Transport reactions transfer metabolites across compartments, many of them transfer metabolites from the environment into the cell. The fraction of transport reaction links among the set of critical links is 47.5%, while that among the whole link set in the HLMN is 20.8%. Moreover, we computed the fraction of links representing transport reactions which transfer metabolites from the environment into the cell among the set of critical links and that among the whole link set in the HLMN, which are 33.6% and 12.2%, respectively. These comparisons indicate that transport reactions and the environment are important in influencing the robustness of controllability of the HLMN. The metabolites carried in by transport reactions could activate a series metabolic reactions in human liver cells, which could change the state of the liver metabolism and influence the controllability of the HLMN.

Validation for the result that the reactions represented by critical links tend to be the non-core reactions

We used chi-square test (see Methods) to test whether or not the differences between the fractions of the core high, core moderate and non-core reaction links among the whole network and those among the set of critical links are out of chance. The observed data are the number of core high, core moderate and non-core reaction links among each of the sets of the critical links, which are 132, 59 and 226 respectively. The expected percentages are the fractions of core-high, core moderate and non-core reaction links among the whole network, which are 60.14% and 14.55% and 25.3% respectively. The chi-square statistic value was computed based on the chi-square formula (see Methods), which is 193.9. With the freedom degree being 2 and the significance level being 0.05, the table value for chi-square statistic is 5.99. The chi-square statistic value is bigger than the table value, so there is a significant difference between the fractions among the set of critical links and those among the whole network, which means that the reactions represented by critical links tend to be the non-core reactions.


One element of classical systems analysis treats a system as a black or grey box, the inner structure and behaviour of which can be analysed and modelled by varying an internal or external condition, probing it from outside and studying the effect of the variation on the external observables. The result is an understanding of the inner make-up and workings of the system. The equivalent of this in biology is to observe what a cell or system excretes under controlled conditions — the 'metabolic footprint' or exometabolome — as this is readily and accurately measurable. Here, we review the principles, experimental approaches and scientific outcomes that have been obtained with this useful and convenient strategy.

Models of obesity with hyperlipidemia

Obese mouse models such as A y /a, Lep ob/ob and LepR db/db have increased total plasma cholesterol levels however, this is the result of increased HDL rather than increased VLDL and LDL levels (Nishina et al., 1994a Silver et al., 1999 Silver et al., 2000 Gruen et al., 2005 Gruen et al., 2006 Coenen and Hasty, 2007). The increase in HDL probably accounts for the resistance of these obese mice to atherosclerotic lesion formation (Nishina et al., 1994b). By contrast, human MetS is associated with obesity, increased levels of TG-rich VLDL and reduced levels of HDL. To generate obese mouse models with hyperlipidemia, our laboratory and others have crossed the Lep ob/ob , LepR db/db and A y /a mice onto LDLR −/− and apoE −/− backgrounds. The models described below are all on the C57BL/6J strain.

Lep ob/ob LDLR −/− and LepR db/db LDLR −/− mice

To develop a model that better reflects MetS-related hyperlipidemia, we crossed Lep ob/ob mice onto an LDLR −/− background (Lep ob/ob LDLR −/− ) (Hasty et al., 2001). These mice are obese and develop dramatic hypercholesterolemia characterized by elevated VLDL and LDL (Fig. 1D please note that the figure represents a LepR db/db LDLR −/− mouse, however, the Lep ob/ob LDLR −/− mice have a similar profile), as well as hypertriglyceridemia. Furthermore, these mice spontaneously develop atherosclerotic lesions, and are thus very useful for studying the role of obesity in cardiovascular disease. Our group and others have used this model to study therapeutic treatments for MetS, as well as to study mechanisms of obesity-related hyperlipidemia (Mertens et al., 2003 Verreth et al., 2004 Hasty et al., 2006 Verreth et al., 2006 Coenen et al., 2007a). LepR db/db LDLR −/− mice have a phenotype identical to that of the Lep ob/ob LDLR −/− mice with the exception that they have very high circulating leptin levels (Gruen et al., 2006). We have also crossed the Lep ob/ob and LepR db/db mice onto an apoE −/− background, and these mice are also obese, insulin resistant and hyperlipidemic (Gruen et al., 2006 Atkinson et al., 2008). The Lep ob/ob apoE −/− and LepR db/db apoE −/− mice have extreme elevations in VLDL and almost no HDL (Fig. 1D). Although these mice provide a better model for MetS than do Lep ob/ob and LepR db/db mice, because of their hyperlipidemia and susceptibility to atherosclerosis, the hyperlipidemia is quite extreme and the caveat remains that they are completely deficient in leptin signaling.

LDLR 3KO and apoE 3KO mice

Recently, Lloyd et al. crossed Lep ob/ob LDLR −/− and Lep ob/ob apoE −/− mice onto an apoB100 only background (named LDLR 3KO and apoE 3KO, respectively) (Lloyd et al., 2008). These mice are obese (>40 grams), hyperinsulinemic (>30 ng/ml), hyperlipidemic (total cholesterol >750 mg/dl and TGs >250 mg/dl) and hypertensive (systolic pressure >150 mmHg), as measured by the tail cuff method. Interestingly, the apoE 3KO mice are diabetic by 9–10 weeks of age, whereas the LDLR 3KO mice are not. This may be because of their apoE versus LDLR genotype, or the fact that the apoE 3KO mice were only 74.6% C57BL/6J, whereas the LDLR 3KO mice were 94.7% C57BL/6J in the original report. One unique aspect to these mice is that they only express apoB100 (and not apoB48) from their livers. In humans, apoB48 is expressed solely from the intestines and apoB100 is expressed solely from the liver, whereas, in rodents, both forms of apoB are expressed in the liver. Thus, expression of only apoB100 from the livers of these mice provides a lipoprotein metabolism setting which is similar to that seen in humans.

A y /aLDLR −/− and A y /aapoE −/− mice

To develop a model of MetS with delayed onset obesity and hyperlipidemia, we crossed the A y /a mice onto an LDLR −/− background (Coenen et al., 2007b Coenen and Hasty, 2007). In contrast to LDLR −/− mice, the A y /aLDLR −/− mice are obese, have an increased fat mass, and have slightly elevated plasma cholesterol and TG levels. Furthermore, when these mice are placed on a Western diet they become even more obese and hyperlipidemic, and also develop IR. The hyperlipidemia appears to be the result of both increased hepatic TG production and decreased VLDL clearance. The A y /aLDLR −/− mice do not develop overt hypertension however, on the Western-type diet they develop a fatty liver. Thus, these mice seem to have many different aspects of MetS and, to date, appear to be one of the best models for use.

The A y /a and apoE −/− mice have also been crossed to generate A y /aapoE −/− mice (Gao et al., 2007). Interestingly, the deficiency of apoE protects A y /a mice from obesity and hepatic steatosis, and improved their insulin sensitivity. Thus, the phenotype of A y /a mice differs in the presence of LDLR versus apoE deficiency.

Summary of obese hyperlipidemic mouse models of the MetS

By crossing models of hyperlipidemia onto obesity-prone backgrounds, it is possible to study several aspects of MetS simultaneously. The mouse models listed above are the most commonly used in this regard, and their use has shed light on the mechanisms by which obesity influences lipoprotein metabolism. In addition, these models have been useful in testing various agents for their therapeutic potential. One disappointment in the studies of these models is that the presence of IR does not appear to impact atherosclerotic lesion formation in most models unless there are concomitant changes in plasma lipid levels (Merat et al., 1999 Wu et al., 2006 Coenen and Hasty, 2007 and reviewed in Goldberg and Dansky, 2006).

Materials and methods

Cell culture

Primed hiPSC IMR90-4 (RRID: CVCL_C437) were purchased from WiCell and WTC-11 (RRID: CVCL_Y803) were obtained from The J. David Gladstone Institutes, designated throughout the text by hiPSC 1 and hiPSC 2 respectively. Primed hiPSC were maintained under feeder-free conditions with Matrigel (Corning Matrigel hESC-Qualified Matrix and Corning Matrigel Growth Factor Reduced (GFR) Basement Membrane Matrix, BD Biosciences) and fed daily with mTeSR1 medium (STEMCELL Technologies). Versene (Gibco Life Technologies) and Accutase (STEMCELL Technologies) were used to enzymatically dissociate hiPSCs into single cells for hiPSC 1 and hiPSC 2, respectively. At cell passage, mTeSR1 was also supplemented with 5 μM of ROCK inhibitor Y-27632 (Calbiochem). Complete medium exchange was performed every day. Cells were maintained under humidified atmosphere with 5% CO2, at 37°C.

hNSC 1 was derived from hiPSC 1 IMR90-4 using a dual SMAD inhibition protocol [40]. Briefly, hiPSC differentiation was induced by supplementing the culture media with 10 μM SB431542 and 1 μM LDN193189 (both from STEMCELL Technologies) for 10 days. hNSC 2 was originally derived from hiPSC line RIi001-A (RRID: CVCL_C888), as previously described [41] and designated throughout the text as hNSC 2. hNSC were expanded in DMEM/F12 (Invitrogen) with Glutamax, N2 and B27 supplements (Invitrogen), 20 μg/mL insulin and 20 ng/mL of bFGF (Peprotech) and of EGF (Sigma). Half of culture media volume was exchanged every other day [41]. hNSC were maintained under humidified atmosphere with 5% CO2 and 3% O2, at 37°C.

Stirred-tank bioreactor cultures

hiPSC and hNSC were inoculated in 200 mL of media as single cell suspensions of 0.25x10 6 cell/mL and 0.4x10 6 cells/mL, respectively, into software-controlled stirred-tank DASGIP Bioblock bioreactor system (Eppendorf). hiPSC 1, hiPSC 2, hNSC 1 and hNSC 2 were used for these experiments at cell passage number P40, P36, P12 and P34, respectively (four bioreactor runs in total). Bioreactor temperature was set to 37°C, dissolved oxygen to 15%, pH to 7.4, aeration rate to 0.1 vvm and the agitation rate to a range from 70 to 100 rpm [12,42]. Perfusion was initiated after inoculation and interrupted just before the perturbation experiment. Perfusion rates of hiPSC and hNSC were 1.3 day -1 and 0.33 day -1 , respectively. Cells were allowed to aggregate for 2 to 3 days before performing the perturbation experiment.

Perturbation experiments and sampling for metabolomics

Before initiating the perturbation experiment, perfusion was interrupted. Glutamine concentration in the culture medium was determined using an YSI 7100 MBS analyser (YSI Life Sciences, Yellow Springs, Ohio USA) offline. The glutamine pulse was induced by adding the required volume of glutamine concentrated solution (L-Glutamine, 200 mM, Gibco) to attain a concentration of 15 mM in the culture media. Changes in osmolarity of the culture medium of bioreactor cultures were later determined using a K-7400S Semi-Micro Osmometer (KNAUER Wissenschaftliche Geräte GmbH, Germany). Sampling was performed before the glutamine step and at several time-points after the step: immediately (0 min), 5 min, 10 min, 15 min, 30 min, 1 h and 2 h after. A sample of 15 mL of culture was collected per time-point sample and distributed equitably in three 50 mL tubes, containing ice-cold PBS to quench cell metabolism, generating three technical replicates that were processed independently. After centrifugation at 300xg for 3 min at 4°C, a sample of supernatant was stored for later quantification of extracellular glutamine, glucose, lactate and ammonia. The remaining supernatant was discarded and the cell pellet was washed with ice-cold PBS and centrifuged again. The supernatant was removed and a solution of 40:40:20 acetonitrile:methanol:water was added to extract intracellular metabolites from the cell pellet. Sonication was performed to guarantee a complete cell lysis. The extracts were transferred to microcentrifuge tubes and centrifuged at 20000 x g at 0°C for 15 minutes. Supernatant was collected, snap-frozen in liquid nitrogen and stored at -80°C until metabolomic analysis. The pellet was also snap-frozen in liquid nitrogen and stored at -80°C until protein quantification. Protein was dissolved in lysis buffer containing 2% SDS (v/v) and quantified using a Microplate BCA Protein Assay Kit (Thermo Scientific). Ammonia was quantified using the Ammonia Assay Kit (Megazyme).

Cell viability

Cell viability in spheroids was analysed before the glutamine perturbation experiment by staining the spheroids with fluorescein diacetate (FDA) in PBS (0.02 mg/mL) and propidium iodide (PI) in PBS (0.002 mg/mL), followed by visualization by fluorescence microscopy using an inverted phase contrast microscope (Leica Microsystems GmbH).

Flow cytometry

Single-cell suspensions of hiPSC were prepared by Versene/Accutase treatment of cell spheroids. Cell density was determined and 0.5x10 6 cells were transferred to a microcentrifuge tube, centrifuged at 300xg for 5 min and washed with 2% FBS in PBS. The cells were resuspended in 50 μl of a solution containing the primary antibody: TRA-1-60 (Santa Cruz Biotechnology, sc-21705, dilution 6:100) or SSEA4 (Santa Cruz Biotechnology, sc-21704, dilution 1:10). Cells were incubated with the primary antibody solution for 1 hour at 4°C, washed with 2% FBS in PBS and centrifuged twice, followed by 30 minutes at 4°C incubation with AlexaFluor 488 secondary antibodies (Invitrogen, A21042 for TRA-1-60 and A11001 for SSEA4, dilution 1:1000). Cells were washed and centrifuged twice with 2% FBS in PBS, and finally resuspended in 500 uL of 2% FBS in PBS for flow cytometry analysis. Data was collected on a CyFlow Space flow cytometer from Partec. Cells were gated on forward and side scatter dot plots. 10,000 events per sample were acquired and the data were analyzed with FloMax software (version 3.0).

Immunofluorescence microscopy

hNSC spheroids were plated on sterile glass coverslips inserted on 24-well plates and left for adherence at 37°C and 5% CO2. Each coverslip containing spheroids were washed once with cold PBS +/+ and then fixed in 500 μL of 4% paraformaldehyde + 4% sucrose in phosphate-buffered saline (PBS) for 20 min at room temperature. Before storage, fixed cells were washed twice with 500 μL PBS. Cells were blocked and permeabilized with 0.2% FSG (Gelatin from cold water fish skin, Sigma, G7765) + 0.1% TritonX-100 in PBS for 20 minutes at room temperature. Primary antibodies were diluted in 0.125% FSG in PBS + 0.1% TritonX-100 and added to fixed spheroids for an incubation of 2 hours at room temperature. Afterwards, cells were washed twice with PBS and incubated with secondary antibodies diluted in 0.125% FSG in PBS for 1 hour and protected from light. Primary and secondary antibodies were used as follows: anti-nestin (Merck Millipore, AB5922), anti-Sox2 (Merck Millipore, AB5603), anti-βIII-tubulin (Merck Millipore, 1:200, MAB1637), AlexaFluor 488 goat anti-rabbit IgG (Invitrogen, A11008), AlexaFluor 594 goat anti-mouse IgG (Invitrogen, A11005). Coverslips were mounted in ProLong Gold antifade reagent with DAPI (Invitrogen, P36935) for staining of cell nuclei. Preparations were visualized on an inverted microscope Leica DMI6000 B (Leica Microsystems). The obtained images were processed using FIJI software [43] and relying solely on linear adjustments.

Metabolomic analysis of intracellular extracts

Targeted and quantitative metabolomic analysis was performed using the AbsoluteIDQ p180 kit and the Energy Metabolism Assay (Biocrates Life Sciences AG, Innsbruck, Austria). The two assays quantify a total of 201 metabolites from different biological classes, including amino acids, biogenic amines, acylcarnitines, lysophosphatidylcholines, phosphatidylcholines, sphingomyelins and several metabolites of the energy metabolism. For the first assay, analyses were carried out after phenylisothiocyanate (PITC)-derivatization in the presence of internal standards by flow-injection tandem mass spectrometry (FIA-MS/MS, for quantification of acylcarnitines, (lyso-) phosphatidylcholines, sphingomyelins, hexoses) and liquid chromatography-tandem mass spectrometry (LC-MS/MS, for amino acids, biogenic amines) using a SCIEX 4000 QTRAP (SCIEX, Darmstadt, Germany) and a Xevo TQ-S Micro (Waters, Vienna, Austria) instrument with an electrospray ionization (ESI) source. The experimental metabolomics measurement technique is described in detail by patent US 2007/0004044 (accessible online at For the second assay, after derivatization to their corresponding methoxime-trimethylsilyl (MeOx-TMS) derivatives, energy metabolites were determined by gas chromatography-mass spectrometry (GC-MS) using an Agilent 7890 GC/5975 MSD (Agilent, Santa Clara, USA) system. Pretreated samples were evaporated to complete dryness and subjected to a two-step methoximation-silylation derivatization. N-methyl-N-(trimethylsilyl) trifluoroacetamide (MSTFA) was used as silylation reagent. Split injection was performed and chromatograms were recorded in selected ion monitoring (SIM) mode. External standard calibration curves and ten internal standards were used to calculate concentrations of individual energy metabolites. Data were quantified using the appropriate MS software (Agilent, Masshunter) and imported into Biocrates MetIDQ software for further analysis.

Data pre-processing and statistical analyses

Absolute metabolic values were normalized by the protein content of the cell pellet for each replicated sample (S2 Table). Metabolites with more than 62.5% of missing values or with coefficients of variation greater than 15% were excluded.

For unsupervised analyses, normalized and averaged metabolic values per time-point were z-scored by subtracting to each value the mean for each metabolite-cell and then dividing by the respective standard-deviation. Principal component analysis and hierarchical clustering was performed in Matlab R2015b (MathWorks, Natick MA) and in Perseus software [44], respectively.

Steady-state fold changes were statistically tested by performing a two-sample t-test, two-sided, assuming the two samples comes from independent random samples from normal distributions with equal means and equal but unknown variances. The Benjamini-Hochberg method was used to correct for multiple testing errors using a false discovery rate of 5% [45]. These fold-changes were log2-transformed for depiction in volcano plots and in the Pearson Correlation matrices. These statistical tests and correlations were performed in Matlab R2015b (MathWorks, Natick MA).

Dynamic modelling and characterization of parameters

A classical model from process dynamics and control based on two liquid surge tanks placed in series [23] was used for modelling the dynamical metabolic profiles. The specific model, named second order with numerator dynamics, has different equations for two scenarios: one for an underdamped process and another for an overdamped process, displayed below with a complex variable s.

Metabolic profiles were fit to these two equations by minimization of the sum of squared residuals. The scenario that presented the lowest residual was chosen. The parameter M was calculated based on the concentration of extracellular concentration of glutamine (S1 Table). Fitting of the four other parameters, the steady-state gain K, the numerator coefficient τA, the response time τ and the damping coefficient ζ, was performed in MATLAB using lsqnonlin and nlinfit functions. The former function was used to get a first estimation of model parameters which would serve as initial parameters to the latter, as the latter function accepts standard-deviations as weights for fitting. Fits with residual norm above 4% were not considered. The fitting of intracellular glutamine for the four cell lines, instantly subjected to the sudden extracellular glutamine step, using the same model parameters except one, display considerable resemblance and very low average fitting error (S6 Fig). So, to tackle randomness variable affecting the quantification of moles of metabolites per protein quantity between time-points, experimental values of hNSC (especially affected by the mentioned random variable) were normalized to the shared glutamine profile by multiplying the ratio of simulated value per experimental value of glutamine at that time-point and for that cell line (for all i, g and y, Met normalized i,cell line g, time-point y = Met i,cell line g, time-point y x (Gln simulated cell line g, time-point y / Gln experimental cell line g, time-point y).

The settling time of each fitting curve was determined by finding the time-point after which the metabolic pool value would remain inside a band whose width is equal to ±5% of the final metabolic pool concentration.

The damping coefficient in the overdamped case was calculated directly from the model parameters obtained for each metabolite:

Dynamic stability against short-term variation

Short-term fluctuations in input are inevitable in metabolic systems because inputs of metabolites can change dramatically after each meal. The role of the homeostatic mechanisms in damping the effect of fluctuating inputs can be illustrated by varying the amino acid input terms in the model to reflect the changes in their blood values during and after meals over a 24-hour period [29]. In Fig. 3 we see the effect of three meals on a few concentrations and reaction velocities in an enlarged model of the folate-mediated one carbon metabolism (FOCM) system that also includes the mitochondria and the synthesis of glutathione (GSH) [4]. The fluxes in different parts of the system vary enormously with meals, except for three critical reactions (e.g., thymidylate synthase, the rate-limiting step for DNA synthesis, and DNA methyl transferase), and the concentration of GSH (Fig. 3c), which vary little if at all. This system has the property that many reactions and substrate levels change dramatically in order to maintain stability of a few critical ones. This is typical of physiological homeostasis. By removing the feedbacks one-by-one, or in different combinations, one can study the degree to which each contributes to the stabilization of each of the four critical reactions. For instance, we showed that the inhibition of MTHFR by SAM has the biggest effect on stabilizing the DNMT reaction, whereas the stimulation of cystathionine β-synthase (CBS) by SAM has a smaller effect [2]. The well-known mechanism of product inhibition also has important stabilization effects. S-adenosylhomocysteine (SAH) inhibits all the methyl transferases. Figure 3d shows how this product inhibition helps stabilize the DNMT reaction.

Effect of short-term variation in amino acid input on metabolite levels and reaction velocities in the folate and methionine cycles. Three pulses of amino acids are shown by gray bars below the figures, corresponding to three meals over a 24-hour period. Variation in response is shown as percentage deviation from the mean. a Two metabolites (SAM and 5mTHF) and reaction velocities [cystathionine β-synthase (CBS) and MTHFR] that show complementary responses. b Reaction velocities of mitochondrial and cytosolic serinehydroxymethyltransferase (SHMT values below −100 % are reversals of direction of the reaction). c Dynamic variation of fluxes throughout the pathway stabilize the velocities of DNMT, TS, and AICART and the concentration of glutathione. d Eliminating product inhibition by S-adenosylhomocysteine (SAH) increases the sensitivity of the DNMT reaction to variation in input is indicated by the greater amplitude of the response. After [28, 35]


Recombinant protein production is a multibillion-dollar business, mainly comprised by therapeutic agents (i.e. recombinant biologic drugs) and industrial enzymes [1–3]. These compounds are commonly synthesized in Escherichia coli, Saccharomyces cerevisiae and Chinese Hamster Ovary cells (CHO) [1, 4–6] however, there is strong pressure to find cost-effective alternatives to overcome technical and economic disadvantages of the aforementioned cell factories, especially in downstream processing [7].

Among the unconventional cell factories used for recombinant protein production, the methylotrophic yeast Pichia pastoris (syn. Komagataella phaffii) has received special attention thanks to its convenient physiology and easy handling [8]. There are strong promoters for this cell factory which are commercially available and that allow for the controlled expression of heterologous proteins [8]. Unlike E. coli, P. pastoris naturally performs post-translational modifications [6, 9], which are essential for most eukaryotic protein functionality [7, 10, 11]. In contrast to S. cerevisiae, P. pastoris exhibits a Crabtree-negative phenotype, showing a reduced synthesis of undesirable products, like ethanol, in glucose-limited conditions [12, 13]. It also shows a lower basal secretion of proteins when compared to other yeasts, which makes downstream processing easier [13, 14]. Finally, P. pastoris can be efficiently cultivated up to high cell densities using fed-batch technology [8], achieving high titers and productivities. For these desirable features, P. pastoris has been widely used for the expression of recombinant proteins, reaching grams per liter concentrations in several cases [9, 15–18]. Most remarkably, and as proof of its technical feasibility and adequacy, two recombinant proteins produced in this cell factory have already been approved by the FDA for medical purposes [10, 19].

Despite its growing acceptance and actual successful applications, recombinant protein production in P. pastoris can be undermined by several cellular processes, where protein folding and secretion are the most recurrent bottlenecks [14, 20, 21]. In addition, limitations may also be caused by the codon usage of the recombinant protein [22], promoter selection [23], carbon and oxygen availability in the culture [24, 25] and fed-batch operational parameters [26], seriously hampering protein yield, productivity and the economic feasibility of the process.

Industrially, P. pastoris is commonly grown in fed-batch cultures in order to maximize the titer and volumetric productivity of a desired compound, often a recombinant protein [27, 28]. This is achieved by adding a culture medium in such a way that the microorganism grows at a desired specific growth rate, which is chosen to maximize the synthesis of the target product and to limit the formation of inhibitory compounds [29]. During this and other cultivation systems, the cells adapt constantly to the changing extracellular environment and to the limited mass transfer conditions observed at high densities [30, 31]. Therefore, it is critical to understand how the cell metabolism interacts with the nutritional and environmental stresses exerted by process conditions to improve bioreactor performance [32]. This is a complex task, however, since the strain’s characteristics and process variables often require significant amounts of time and money for characterization and fine-tuning [12]. Therefore, it is desirable to have a platform to integrate different levels of information from dynamic cultivations of P. pastoris that can be used to elaborate rational hypotheses to increase process productivity.

Systems biology offers a quantitative and comprehensive approach to address this task [33]. In particular, Genome-Scale dynamic Flux Balance Analysis (GS-dFBA) [34–36] is a modeling framework that allows the simulation of metabolism during non-stationary (batch or fed-batch) cultures. GS-dFBA models couple the dynamic mass balances of the extracellular environment of the bioreactor with comprehensive mathematical representations of cellular metabolism called Genome Scale Metabolic Models (GSMs). These structures represent the cell’s entire metabolism as a set of underdetermined constrained mass-balances [30, 37, 38]. GSMs have been employed to understand cellular behavior under different environmental conditions, to map over omics data, and to define a metabolic engineering targets [39, 40]. There are currently five published GSMs of P. pastoris [41–45] which have been developed to help the strain optimization process with a special emphasis on recombinant protein production. Moreover, one of these models has been successfully employed to improve recombinant protein production in P. pastoris [46], validating these frameworks as strain engineering tools for this particular yeast.

GS-dFBA models usually contain several parameters, whose values can be obtained by regression of experimental data. These parameters are used as inputs to obtain flux distributions throughout cultivations, so their values need to be reliable. To ensure this, pre- and post-regression diagnostics have been employed to determine if a certain parameter is supported by the observed data or not [47, 48]. These analyses consist in verifying the model’s capacity to explain the behavior of a system (goodness-of-fit) and the presence of the following parametric limitations: (i) low or no impact on the state variables (sensitivity), (ii) strong correlations with other parameters of the model (identifiability) and (iii) lack of statistical significance (significance). A model is considered robust if it has the capacity to explain different conditions, while containing only sensitive, identifiable and significant parameters.

Here, we present a robust dynamic genome-scale metabolic model of P. pastoris in glucose-limited, aerobic batch and fed-batch cultivations. To assemble the dynamic modeling framework, we started by selecting one of the available genome-scale metabolic models [43] and manually curated it to yield realistic flux distributions. Then, we included it in a set of mass balances representing the main compounds present in culture supernatant. Once assembled, the model was calibrated using experimental data from eight batch and three fed-batch cultivations. Next, we employed pre/post regression diagnostics to determine sensitivity, significance and identifiability problems in the model. In order to avoid the aforementioned statistical limitations, problematic parameters were fixed (i.e. removed from the adjustable parameter set) based on the pre/post regression diagnostics, yielding reduced and potentially robust model structures. Potentially robust model structures consisted in the original model formulation with less adjustable parameters. After evaluating these reduced models for each type of cultivation, we chose the one that presented fewer parametric limitations after being re-calibrated with the available data. These reduced models yielded no (or just a few) significance, sensitivity or identifiability problems when calibrating new data and they could predict bioreactor dynamics in conditions like the ones used for their determination. Finally, we carried out simulations to assess the potential of the model to study P. pastoris metabolism under industrially relevant conditions, and to select molecular and process engineering strategies to improve recombinant protein production.


Martin Foltz (Unilever) is thanked for providing useful comments on the manuscript. We acknowledge the financial support of the European Community under the Framework 6 Marie-Curie Host Fellowships for the Transfer of Knowledge Industry-Academia Strategic Partnership scheme, specifically GUTSYSTEM Project MTKI-CT-2006-042786. Sam Possemiers and Tom Van de Wiele are postdoctoral research fellows of the Flemish Science Foundation (FWO-Vlaanderen). Part of this project was carried out within the research program of the Netherlands Metabolomics Centre, which is part of the Netherlands Genomics Initiative/Netherlands Organization for Scientific Research.