Merge pull request #140 from FH96/patch-9

The Jupyter notebook file for TLOF tutorial
opencobra · Jan 4, 2023 · 4439ad4 · 4439ad4
2 parents ce7035f + b37e94f
commit 4439ad4
Showing 1 changed file with 315 additions and 0 deletions.
diff --git a/tutorials/tutorial-TLOF.ipynb b/tutorials/tutorial-TLOF.ipynb
@@ -0,0 +1,315 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "f1166da9",
+   "metadata": {},
+   "source": [
+    "# tutorial-TLOF\n",
+    "\n",
+    "This tutorial serves as a reference to get started with `TLOF`. Download the live notebook from [here](https://github.com/opencobra/COBRA.jl/tree/master/tutorials).\n",
+    "\n",
+    "If you are not familiar with `COBRA.jl`, or how `COBRA.jl` should be installed, please refer to the tutorial on `COBRA.jl`.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "52de2a99",
+   "metadata": {},
+   "source": [
+    "# TLOF\n",
+    "Transcription-based Lasso Objective Finder(TLOF) is an optimization based method to obtain a context-specific objective function for a given condition.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e1ba294c",
+   "metadata": {},
+   "source": [
+    "## Formulation\n",
+    "`TLOF` solves the following optimization problem to find a context-specific objective function:\n",
+    "\n",
+    "*Minimize:*\n",
+    "\n",
+    "\n",
+    "\n",
+    "$$\\parallel v - v_\\text{est} \\parallel_2 + 𝑹∗\\parallel c\\parallel_1$$\n",
+    "\n",
+    "\n",
+    "\n",
+    "*Subject to:*\n",
+    "\n",
+    "$$\\sum_{j \\in P}c_j v_j=\\text{carbon uptake rate} \\times  \\text{g}$$\n",
+    "\n",
+    "$$ a_j \\geq c_j \\quad \\forall j \\in P$$\n",
+    "\n",
+    "$$ a_j \\geq -c_j \\quad \\forall j \\in P$$\n",
+    "\n",
+    "$$\\sum_{j \\in P}S_ij v_j=0 \\quad \\forall i \\in N$$\n",
+    "\n",
+    "$$v_\\text{carbon uptake rxn}=\\text{carbon uptake rate}$$\n",
+    "\n",
+    "$$\\sum_{i=1}^N u_i S_ij \\geq c_j \\quad \\forall j \\in P$$\n",
+    "\n",
+    "$$\\sum_{i=1}^N u_i S_ij \\geq 0 \\quad \\forall j \\notin P , \\text{carbon uptake rxn}$$\n",
+    "\n",
+    "$$\\sum_{i=1}^N u_i S_ij + \\text{g} \\geq 0 \\quad \\forall j \\in \\text{carbon uptake rxn}$$\n",
+    "\n",
+    "$$v_j \\geq 0 \\quad \\forall j \\in I$$\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "21856af6",
+   "metadata": {},
+   "source": [
+    "Where  v<sub>est</sub> is the flux estimation, v is flux vector, R is regularization coefficient, S is stoichiometric matrix ,  u and g are dual variables , P is the set of reactions considered as “Potential cellular objectives” and *I* is the set of irreversible reactions."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7f7b1eca",
+   "metadata": {},
+   "source": [
+    "## Prerequisites\n",
+    "`TLOF` reads SBML models by [SBML.jl](https://github.com/LCSB-BioCore/SBML.jl), models the optimization problem by [JuMP.jl](https://github.com/jump-dev/JuMP.jl) and uses [Ipopt.jl](https://github.com/jump-dev/Ipopt.jl) as the solver. \n",
+    "[LinearAlgebra.jl](https://github.com/JuliaLang/julia/blob/master/stdlib/LinearAlgebra/src/LinearAlgebra.jl) is also required in the computations inside the function.\n",
+    "\n",
+    "So these four packages are necessary to run `TLOF`, in addition, [DataFrames.jl](https://github.com/JuliaData/DataFrames.jl), [CSV.jl](https://github.com/JuliaData/CSV.jl), [HTTP.jl](https://github.com/JuliaWeb/HTTP.jl) and [Test.jl](https://github.com/JuliaLang/julia/blob/master/stdlib/Test/src/Test.jl) are needed to run the test script for this function. \n",
+    "They can be installed as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f32c2f94",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "using Pkg\n",
+    "Pkg.add(\"COBRA\")\n",
+    "\n",
+    "Pkg.add(\"SBML\")\n",
+    "Pkg.add(\"JuMP\")\n",
+    "Pkg.add(\"Ipopt\")\n",
+    "\n",
+    "Pkg.add(\"DataFrames\")\n",
+    "Pkg.add(\"CSV\")\n",
+    "Pkg.add(\"HTTP\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "868d21fd",
+   "metadata": {},
+   "source": [
+    "This function can be called simply, by a single line of code:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ebac48a6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "using COBRA\n",
+    "TLOF(metabolic_model,lambda,flux_estimation,module_flux,selected_rxns,carbon_uptake_rxn,carbon_uptake_rate,sd)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4087a28c",
+   "metadata": {},
+   "source": [
+    "#### Input:\n",
+    "  **metabolic model**: Metabolic models contain sotoichiometric matrix  and also other informations such as flux boundaries and Gene-Protein-Reaction rules. They can be found in different formats including .xml. Metabolic models can be downloaded from [BiGG Models](http://bigg.ucsd.edu/) or elsewhere.\n",
+    "\n",
+    "  **lambda**: Regularization coefficient for the L1 norm term in the objective function of the optimization problem. The larger lambda, the sparser the objective function.\n",
+    "  \n",
+    "  **flux_estimation**: It is a dataframe that has two columns, the first one contains the name of the reactions and the second one flux values.\n",
+    "\n",
+    "*The next two arguments can either be given by the user or assessed by `TLOF_Preprocess` function, provided in this repo.\n",
+    "\n",
+    "**module_flux**: Sometimes measuring the flux of a single reaction is not possible, thus we have measured (or estimated) flux, for example, associated with A-B or A+B, where A and B are reactions in metabolic network. On the other hand, the optimization problem finds flux for single reactions (in that example, A and B separately). But in the objective function (see formulation section above), the difference between measured flux and the corresponding predicted value should be calculated so this `module_flux`,  whose dot product with the predicted flux vector returns the appropriate value for `v`, is required to solve the problem.\n",
+    "\n",
+    "**rxn_names**: This argument is a vector containing the name of the reactions and can be different from the first column of `flux_estimation` according to the explanations for the previous argument.\n",
+    "\n",
+    "**selected_rxns**: A user can define which reactions should be included in potential cellular objective set. This can be either all reactions of the network or any subset of the reactions, defined by their index in the stoichiometric matrix. \n",
+    "\n",
+    "**carbon_uptake_rxn**: The name of the reaction through which carbon is uptaken by a cell, for example,\"R_GLCptspp\". It should match with the reaction names of metabolic network. \n",
+    "\n",
+    "**carbon_uptake_rate**: The exchange flux associated with the carbon source, measured experimentally.\n",
+    "\n",
+    "#### OPTIONAL INPUTS\n",
+    "\n",
+    "**sd**: Measurements are usually performed as replicates and the average value is reported, so there is also a standard deviation value. Since problems with inequality constraints converge better, if any value is given to this argument the capacity constraint will be applied as an inequality constraint, otherwise it will be an equality constraint.  \n",
+    "  \n",
+    "  \n",
+    " #### Output:\n",
+    "\n",
+    "  **c**: It is the objective function found by`TLOF` and is of type Vector{Float64} (a vector whose elements are Float64), which has the same length as the `selected_reaction`.\n",
+    " \n",
+    "  \n",
+    "  **obj**: The optimal value for objective function\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "332fbcfd",
+   "metadata": {},
+   "source": [
+    "## TLOF_Preprocess usage\n",
+    "As it was explained thoroughly for `module_flux` arguement above, this function computes two input data needed to run `TLOF`: \n",
+    "\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "abb7046b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "rxn_names,module_flux=TLOF_Preprocess(flux_estimation)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "caa45fa9",
+   "metadata": {},
+   "source": [
+    "#### Input:\n",
+    "\n",
+    "**flux_estimation**: Just the same as what was mentioned above.\n",
+    "\n",
+    "#### Output:\n",
+    "\n",
+    "  **rxn_names** and **module_flux**: As explained earlier, what are needed for `TLOF`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f1a3894a",
+   "metadata": {},
+   "source": [
+    "The following is an example in which `TLOF` is used to find the objective function for *E. coli* by using the provided flux estimation data:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5d8e69ae",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#Downloading the metabolic model\n",
+    "ecoli_model=HTTP.get(\"http://bigg.ucsd.edu/static/models/iJO1366.xml\")\n",
+    "write(\"iJO1366.xml\",ecoli_model.body)\n",
+    "metabolic_model=readSBML(\"iJO1366.xml\")\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "67be70de",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#An example of flux data for *E. coli* to run TLOF\n",
+    "flux_estimation= CSV.read(\"flux estimation.csv\",DataFrame)\n",
+    "\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6e10eeea",
+   "metadata": {},
+   "source": [
+    "For this example all reactions of the metabolic network are allowed to be in potential cellular objective set:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4226b41f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "s_matrix=stoichiometry_matrix(metabolic_model)\n",
+    "selected_rxns=1:length(metabolic_model.reactions)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6b03e04a",
+   "metadata": {},
+   "source": [
+    "Suitable values are given as input:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f430186d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "lambda=0.001\n",
+    "carbon_uptake_rxn=\"R_SUCCt2_2pp\"\n",
+    "carbon_uptake_rate=15.902\n",
+    "sd=0.305667764744552\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4b925ee6",
+   "metadata": {},
+   "source": [
+    "`TLOF_Preprocess` is called to obtain the required input for `TLOF`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7991ede4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "rxn_names,module_flux=TLOF_Preprocess(flux_estimation)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "87b72ec1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#running TLOF\n",
+    "c,obj=TLOF(metabolic_model,lambda,flux_estimation,module_flux,rxn_names,selected_rxns,carbon_uptake_rxn,carbon_uptake_rate,sd)\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Julia 1.6.3",
+   "language": "julia",
+   "name": "julia-1.6"
+  },
+  "language_info": {
+   "file_extension": ".jl",
+   "mimetype": "application/julia",
+   "name": "julia",
+   "version": "1.6.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}