grenadine.Inference package
Submodules
grenadine.Inference.classification_predictors module
This module infers Gene Regulatory Networks from gene expression data (RNA-seq or microarray). It implements several classification-based inference algorithms using scikit-learn.
-
grenadine.Inference.classification_predictors.AdaBoost_classifier_score(X, y, **adab_parameters)
AdaBoost Classifier, score predictor function based on scikit-learn AdaBoostClassifier.
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions (discretized or not) where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector (discretized) where rows are experimental conditions
- **adab_parameters – Named parameters for the sklearn AdaBoostClassifier
Returns: co-regulation scores.
The i-th element of the score array represents the score assigned by the AdaBoostClassifier to the regulatory relationship between the target gene and transcription factor i.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.classification_predictors import AdaBoost_classifier_score
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3), index=["c1","c2","c3","c4","c5"], columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randint(0,3,size=5), index=["c1","c2","c3","c4","c5"])
>>> scores = AdaBoost_classifier_score(tfs,tg)
>>> scores
array([0.24, 0.44, 0.32])
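The page does not say how the classifier's "score" is derived. As a minimal sketch, assuming the score of each transcription factor is the fitted classifier's `feature_importances_` (an assumption, not confirmed here), a predictor with the same `(X, y, **parameters)` interface could look like this. The name `importance_score` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier


def importance_score(X, y, **adab_parameters):
    """Hypothetical predictor: one importance score per transcription factor.

    Assumption: the score is the classifier's feature importance.
    """
    clf = AdaBoostClassifier(**adab_parameters)
    clf.fit(X, y)
    return clf.feature_importances_


rng = np.random.RandomState(0)
conditions = ["c1", "c2", "c3", "c4", "c5", "c6"]
tfs = pd.DataFrame(rng.randn(6, 3), index=conditions, columns=["tf1", "tf2", "tf3"])
tg = pd.Series([0, 1, 0, 1, 0, 1], index=conditions)  # discretized target gene
scores = importance_score(tfs, tg, n_estimators=10, random_state=0)
```

The returned array has one entry per column of `tfs`, in column order.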
-
grenadine.Inference.classification_predictors.ComplementNB_classifier_score(X, y, **nb_parameters)
Complement Naive Bayes Classifier, score predictor function based on scikit-learn ComplementNB.
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions (discretized or not) where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector (discretized) where rows are experimental conditions
- **nb_parameters – Named parameters for the sklearn ComplementNB
Returns: co-regulation scores.
The i-th element of the score array represents the score assigned by the ComplementNB to the regulatory relationship between the target gene and transcription factor i.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.classification_predictors import ComplementNB_classifier_score
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3), index=["c1","c2","c3","c4","c5"], columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randint(0,3,size=5), index=["c1","c2","c3","c4","c5"])
>>> scores = ComplementNB_classifier_score(tfs,tg)
>>> scores
array([0.28113447, 0.39096368, 0.45629413])
-
grenadine.Inference.classification_predictors.GB_classifier_score(X, y, **gb_parameters)
Gradient Boosting Classifier, score predictor function based on scikit-learn GradientBoostingClassifier.
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions (discretized or not) where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector (discretized) where rows are experimental conditions
- **gb_parameters – Named parameters for the sklearn GradientBoostingClassifier
Returns: co-regulation scores.
The i-th element of the score array represents the score assigned by the GradientBoostingClassifier to the regulatory relationship between the target gene and transcription factor i.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.classification_predictors import GB_classifier_score
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3), index=["c1","c2","c3","c4","c5"], columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randint(0,3,size=5), index=["c1","c2","c3","c4","c5"])
>>> scores = GB_classifier_score(tfs,tg)
>>> scores
array([0.33959125, 0.21147015, 0.4489386 ])
-
grenadine.Inference.classification_predictors.MultinomialNB_classifier_score(X, y, **nb_parameters)
Multinomial Naive Bayes Classifier, score predictor function based on scikit-learn MultinomialNB.
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions (discretized or not) where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector (discretized) where rows are experimental conditions
- **nb_parameters – Named parameters for the sklearn MultinomialNB
Returns: co-regulation scores.
The i-th element of the score array represents the score assigned by the MultinomialNB to the regulatory relationship between the target gene and transcription factor i.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.classification_predictors import MultinomialNB_classifier_score
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3), index=["c1","c2","c3","c4","c5"], columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randint(0,3,size=5), index=["c1","c2","c3","c4","c5"])
>>> scores = MultinomialNB_classifier_score(tfs,tg)
>>> scores
array([0.3010284 , 0.41871716, 0.4272386 ])
-
grenadine.Inference.classification_predictors.RF_classifier_score(X, y, **rf_parameters)
Random Forest Classifier, score predictor function based on scikit-learn RandomForestClassifier.
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions (discretized or not) where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector (discretized) where rows are experimental conditions
- **rf_parameters – Named parameters for the sklearn RandomForestClassifier
Returns: co-regulation scores.
The i-th element of the score array represents the score assigned by the RandomForestClassifier to the regulatory relationship between the target gene and transcription factor i.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.classification_predictors import RF_classifier_score
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3), index=["c1","c2","c3","c4","c5"], columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randint(0,3,size=5), index=["c1","c2","c3","c4","c5"])
>>> scores = RF_classifier_score(tfs,tg)
>>> scores
array([0.21071429, 0.4       , 0.28928571])
-
grenadine.Inference.classification_predictors.SVM_classifier_score(X, y, **svm_parameters)
SVM Classifier, score predictor function based on scikit-learn SVC (Support Vector Classifier).
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions (discretized or not) where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector (discretized) where rows are experimental conditions
- **svm_parameters – Named parameters for the sklearn SVC
Returns: co-regulation scores.
The i-th element of the score array represents the score assigned by the SVC to the regulatory relationship between the target gene and transcription factor i.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.classification_predictors import SVM_classifier_score
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3), index=["c1","c2","c3","c4","c5"], columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randint(0,3,size=5), index=["c1","c2","c3","c4","c5"])
>>> scores = SVM_classifier_score(tfs,tg)
>>> scores
array([0.58413783, 0.5448345 , 0.31764191])
-
grenadine.Inference.classification_predictors.XRF_classifier_score(X, y, **xrf_parameters)
Randomized decision trees Classifier, score predictor function based on scikit-learn ExtraTreesClassifier.
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions (discretized or not) where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector (discretized) where rows are experimental conditions
- **xrf_parameters – Named parameters for the sklearn ExtraTreesClassifier
Returns: co-regulation scores.
The i-th element of the score array represents the score assigned by the ExtraTreesClassifier to the regulatory relationship between the target gene and transcription factor i.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.classification_predictors import XRF_classifier_score
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3), index=["c1","c2","c3","c4","c5"], columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randint(0,3,size=5), index=["c1","c2","c3","c4","c5"])
>>> scores = XRF_classifier_score(tfs,tg)
>>> scores
array([0.31354167, 0.35520833, 0.33125   ])
-
grenadine.Inference.classification_predictors.bagging_classifier_score(X, y, **bagging_parameters)
Apply the bagging technique to a classification algorithm, based on scikit-learn BaggingClassifier.
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
- **bagging_parameters – Named parameters for the sklearn BaggingClassifier
Returns: co-regulation scores.
The i-th element of the score array represents the average score assigned by the base classifier to the regulatory relationship between the target gene and transcription factor i.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from sklearn.svm import SVC
>>> from grenadine.Inference.classification_predictors import bagging_classifier_score
>>> np.random.seed(0)
>>> svc = SVC(kernel="linear", decision_function_shape='ovr')
>>> nb_conditions = 10
>>> tfs = pd.DataFrame(np.random.randn(nb_conditions,3), index=["c"+str(i) for i in range(nb_conditions)], columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randint(0,2,size=nb_conditions), index=["c"+str(i) for i in range(nb_conditions)])
>>> bagging_parameters = {"base_estimator":svc, "n_estimators":5, "max_samples":0.9}
>>> scores = bagging_classifier_score(tfs,tg,**bagging_parameters)
>>> scores
array([0.269231,0.412219,0.299806])
grenadine.Inference.inference module
This module infers co-expression Gene Regulatory Networks from gene expression data (RNA-seq or microarray).
-
grenadine.Inference.inference.clean_nan_inf_scores(scores)
Replaces nan and -inf scores by (minimum_score - 1), and inf scores by (maximum_score + 1).
Parameters: scores (pandas.DataFrame) – co-regulation score matrix. Rows are target genes and columns are transcription factors. The value at row i and column j represents the score assigned by the score_predictor to the regulatory relationship between target gene i and transcription factor j.
Returns: co-regulation score matrix.
Rows are target genes and columns are transcription factors. The value at row i and column j represents the score assigned by the score_predictor to the regulatory relationship between target gene i and transcription factor j.
Return type: pandas.DataFrame
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.inference import score_links, clean_nan_inf_scores
>>> np.random.seed(0)
>>> data = pd.DataFrame(np.random.randn(5, 5), index=["gene1", "gene2", "gene3", "gene4", "gene5"], columns=["c1", "c2", "c3", "c4", "c5"])
>>> tf_list = ["gene1", "gene2", "gene5"]
>>> # Example with a regression method
>>> from grenadine.Inference.regression_predictors import GENIE3
>>> scores1 = score_links(gene_expression_matrix=data, score_predictor=GENIE3, tf_list=tf_list)
>>> scores1
          gene2     gene5     gene1
gene1  0.484081  0.515919       NaN
gene2       NaN  0.653471  0.346529
gene3  0.245136  0.301229  0.453634
gene4  0.309982  0.306964  0.383054
gene5  0.529839       NaN  0.470161
>>> clean_nan_inf_scores(scores1)
          gene2     gene5     gene1
gene1  0.484081  0.515919  0.245126
gene2  0.245126  0.653471  0.346529
gene3  0.245136  0.301229  0.453634
gene4  0.309982  0.306964  0.383054
gene5  0.529839  0.245126  0.470161
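The replacement rule described for clean_nan_inf_scores can be sketched with NumPy and pandas as follows. This is an illustration of the stated rule, not the library's implementation; `clean_scores_sketch` and its `offset` parameter are hypothetical, and the margin the library actually uses is not confirmed here.

```python
import numpy as np
import pandas as pd


def clean_scores_sketch(scores, offset=1.0):
    """Replace NaN and -inf by (min - offset) and +inf by (max + offset).

    `offset` is a hypothetical parameter; the documented rule uses 1.
    """
    values = scores.to_numpy(dtype=float)
    finite = values[np.isfinite(values)]
    low = finite.min() - offset
    high = finite.max() + offset
    # +inf goes above the largest finite score.
    cleaned = np.where(np.isposinf(values), high, values)
    # NaN and -inf go below the smallest finite score.
    cleaned = np.where(np.isnan(cleaned) | np.isneginf(cleaned), low, cleaned)
    return pd.DataFrame(cleaned, index=scores.index, columns=scores.columns)


scores = pd.DataFrame(
    [[0.5, np.nan], [np.inf, 0.2], [-np.inf, 0.9]],
    index=["gene1", "gene2", "gene3"],
    columns=["tf1", "tf2"],
)
cleaned = clean_scores_sketch(scores)
```

Placing undefined scores just outside the finite range keeps every link rankable while leaving the order of the finite scores unchanged.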
-
grenadine.Inference.inference.ensemble_score_links(score_links_matrices, score_links_weights=None)
Builds an ensemble co-regulation score matrix from a list of co-regulation score matrices obtained with different methods and, optionally, a list of weights, one per method.
Parameters: - score_links_matrices (list) – list of co-regulation score matrices (pandas DataFrames)
- score_links_weights (list) – list of weights, one per method (the higher the weight, the more confidence in the method). If no value is provided, each method has a unit weight
Returns: co-regulation score matrix.
Rows are target genes and columns are transcription factors. The value at row i and column j represents the score assigned by the score_predictor to the regulatory relationship between target gene i and transcription factor j.
Return type: pandas.DataFrame
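The page does not state how the matrices are combined. One plausible reading, shown here purely as an assumption, is a weighted average of the aligned score matrices; `weighted_ensemble_sketch` is a hypothetical name and may differ from what ensemble_score_links does internally.

```python
import pandas as pd


def weighted_ensemble_sketch(score_links_matrices, score_links_weights=None):
    """Hypothetical ensemble: weighted mean of aligned score matrices."""
    if score_links_weights is None:
        # Unit weight for every method, as described for the default case.
        score_links_weights = [1.0] * len(score_links_matrices)
    total = sum(score_links_weights)
    # pandas aligns the matrices on row and column labels when adding them.
    combined = sum(w * m for w, m in zip(score_links_weights, score_links_matrices))
    return combined / total


m1 = pd.DataFrame([[0.2, 0.8], [0.6, 0.4]], index=["gene1", "gene2"], columns=["tf1", "tf2"])
m2 = pd.DataFrame([[0.4, 0.6], [0.2, 0.0]], index=["gene1", "gene2"], columns=["tf1", "tf2"])
unweighted = weighted_ensemble_sketch([m1, m2])
weighted = weighted_ensemble_sketch([m1, m2], [1.0, 3.0])
```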
-
grenadine.Inference.inference.join_rankings_scores_df(**rank_scores)
Join rankings and scores data frames generated by different methods.
Parameters: **rank_scores – Named parameters, where argument names are the method names and argument values are the pandas.DataFrame outputs of rank_GRN
Returns: joined ranks and joined scores.
Two data frames where rows represent possible regulatory links and columns represent each method. Values at row i and column j represent, respectively, the rank or the score of edge i computed by method j.
Return type: (pandas.DataFrame, pandas.DataFrame)
Examples
>>> import pandas as pd
>>> from grenadine.Inference.inference import join_rankings_scores_df
>>> method1_rank = pd.DataFrame([[1,1.3, "gene1", "gene2"], [2,1.1, "gene1", "gene3"], [3,0.9, "gene3", "gene2"]], columns=['rank', 'score', 'TF', 'TG'])
>>> method1_rank.index = method1_rank['TF']+'_'+method1_rank['TG']
>>> method2_rank = pd.DataFrame([[1,1.4, "gene1", "gene3"], [2,1.0, "gene1", "gene2"], [3,0.9, "gene3", "gene2"]], columns=['rank', 'score', 'TF', 'TG'])
>>> method2_rank.index = method2_rank['TF']+'_'+method2_rank['TG']
>>> ranks, scores = join_rankings_scores_df(method1=method1_rank, method2=method2_rank)
>>> ranks
             method1  method2
gene1_gene2        1        2
gene1_gene3        2        1
gene3_gene2        3        3
>>> scores
             method1  method2
gene1_gene2      1.3      1.0
gene1_gene3      1.1      1.4
gene3_gene2      0.9      0.9
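The join can be pictured as collecting one column per method and letting pandas align rows on the link identifier. A minimal sketch under that assumption follows; `join_sketch` is a hypothetical stand-in, not the library function.

```python
import pandas as pd


def join_sketch(**rank_scores):
    """Hypothetical join: one column per method, rows aligned on link ids."""
    ranks = pd.DataFrame({name: df["rank"] for name, df in rank_scores.items()})
    scores = pd.DataFrame({name: df["score"] for name, df in rank_scores.items()})
    return ranks, scores


columns = ["rank", "score", "TF", "TG"]
method1_rank = pd.DataFrame(
    [[1, 1.3, "gene1", "gene2"], [2, 1.1, "gene1", "gene3"], [3, 0.9, "gene3", "gene2"]],
    columns=columns,
)
method1_rank.index = method1_rank["TF"] + "_" + method1_rank["TG"]
method2_rank = pd.DataFrame(
    [[1, 1.4, "gene1", "gene3"], [2, 1.0, "gene1", "gene2"], [3, 0.9, "gene3", "gene2"]],
    columns=columns,
)
method2_rank.index = method2_rank["TF"] + "_" + method2_rank["TG"]

ranks, scores = join_sketch(method1=method1_rank, method2=method2_rank)
```

Because alignment is by index label, the two methods may list their links in different orders.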
-
grenadine.Inference.inference.rank_GRN(coexpression_scores_matrix, take_abs_score=False, clean_scores=True, pyscenic_format=False)
Ranks the co-regulation scores between transcription factors and target genes.
Parameters: - coexpression_scores_matrix (pandas.DataFrame) – co-expression score matrix where rows are target genes and columns are transcription factors. The value at row i and column j represents the score assigned by a score_predictor to the regulatory relationship between target gene i and transcription factor j.
- take_abs_score (bool) – take the absolute value of the score instead of taking scores themselves
Returns: ranking matrix.
A ranking matrix contains one row for each possible regulatory link and 4 columns: the rank, the score, the transcription factor id, and the target gene id.
Return type: pandas.DataFrame
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.inference import rank_GRN
>>> np.random.seed(0)
>>> data = pd.DataFrame(np.random.randn(3, 2), index=["gene1", "gene2", "gene3"], columns=["gene1", "gene3"])
>>> # scores associated to self loops are set to nan
>>> data.iloc[0,0]=np.nan
>>> data.iloc[2,1]=np.nan
>>> ranking_matrix = rank_GRN(data)
>>> ranking_matrix
             rank     score     TF     TG
gene3_gene2   1.0  2.240893  gene3  gene2
gene1_gene3   2.0  1.867558  gene1  gene3
gene1_gene2   3.0  0.978738  gene1  gene2
gene3_gene1   4.0  0.400157  gene3  gene1
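The ranking step can be sketched as flattening the target-gene by transcription-factor matrix into one row per link and sorting by score. The sketch below assumes NaN links are skipped and that higher scores rank first; `rank_links_sketch` is a hypothetical name, and the `clean_scores` and `pyscenic_format` options are not modelled.

```python
import numpy as np
import pandas as pd


def rank_links_sketch(scores_matrix, take_abs_score=False):
    """Hypothetical ranking: flatten a TG x TF score matrix into ranked links."""
    rows = []
    for tf in scores_matrix.columns:
        for tg, score in scores_matrix[tf].items():
            if pd.isna(score):
                continue  # skip undefined links such as self loops
            rows.append((abs(score) if take_abs_score else score, tf, tg))
    ranking = pd.DataFrame(rows, columns=["score", "TF", "TG"])
    ranking = ranking.sort_values("score", ascending=False).reset_index(drop=True)
    ranking.insert(0, "rank", np.arange(1, len(ranking) + 1))
    ranking.index = ranking["TF"] + "_" + ranking["TG"]
    return ranking


scores = pd.DataFrame(
    [[np.nan, 0.400157], [0.978738, 2.240893], [1.867558, np.nan]],
    index=["gene1", "gene2", "gene3"],
    columns=["gene1", "gene3"],
)
ranking = rank_links_sketch(scores)
```

With these input values the sketch yields the same link order as the rank_GRN example above.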
-
grenadine.Inference.inference.score_links(gene_expression_matrix, score_predictor, tf_list=None, tg_list=None, normalize=False, discr_method=None, progress_bar=False, **predictor_parameters)
Scores transcription factor-target gene co-expressions using a predictor.
Parameters: - gene_expression_matrix (pandas.DataFrame) – gene expression matrix where rows are genes and columns are samples (conditions). The value at row i and column j represents the expression of gene i in condition j.
- score_predictor (function) – function that receives a pandas.DataFrame X containing the transcription factor expressions and a pandas.Series y containing the expression of a target gene, and scores the co-expression level between each transcription factor and the target gene.
- tf_list (list or numpy.array) – list of transcription factor ids.
- tg_list (list or numpy.array) – list of target gene ids.
- normalize (bool) – If True, the expression of each gene is z-scored
- discr_method – discretization method to use, if discretization of target gene expression is desired
- progress_bar (bool) – If True, display a progress bar
- **predictor_parameters – Named parameters for the score predictor
Returns: co-regulation score matrix.
Rows are target genes and columns are transcription factors. The value at row i and column j represents the score assigned by the score_predictor to the regulatory relationship between target gene i and transcription factor j.
Return type: pandas.DataFrame
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.inference import score_links
>>> np.random.seed(0)
>>> data = pd.DataFrame(np.random.randn(5, 5), index=["gene1", "gene2", "gene3", "gene4", "gene5"], columns=["c1", "c2", "c3", "c4", "c5"])
>>> tf_list = ["gene1", "gene2", "gene5"]
>>> # Example with a regression method
>>> from grenadine.Inference.regression_predictors import GENIE3
>>> scores1 = score_links(gene_expression_matrix=data, score_predictor=GENIE3, tf_list=tf_list)
>>> scores1
          gene2     gene5     gene1
gene1  0.484081  0.515919       NaN
gene2       NaN  0.653471  0.346529
gene3  0.245136  0.301229  0.453634
gene4  0.309982  0.306964  0.383054
gene5  0.529839       NaN  0.470161
>>> # Example with a classification method
>>> from grenadine.Inference.classification_predictors import RF_classifier_score
>>> from grenadine.Preprocessing.discretization import discretize_genexp
>>> discr_method = lambda X: discretize_genexp(X, "efd", 5, axis=1)
>>> scores2 = score_links(gene_expression_matrix=data, score_predictor=RF_classifier_score, tf_list=tf_list, discr_method=discr_method)
>>> scores2
          gene2     gene5     gene1
gene1  0.512659  0.487341       NaN
gene2       NaN  0.463122  0.536878
gene3  0.368175  0.317341  0.314484
gene4  0.302738  0.346799  0.350463
gene5  0.524815       NaN  0.475185
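The score_predictor contract described above (a function of X and y returning one score per transcription factor) can be illustrated with a toy predictor and a simplified scoring loop. Both `abs_corr_predictor` and `score_links_sketch` are hypothetical; the loop assumes a target gene is excluded from its own regulators, which matches the NaN self-links in the outputs above, and it ignores normalization, discretization, and tg_list.

```python
import numpy as np
import pandas as pd


def abs_corr_predictor(X, y):
    """Toy score predictor: absolute Pearson correlation of each TF with y."""
    y_values = y.to_numpy(dtype=float)
    return np.array(
        [abs(np.corrcoef(X[tf].to_numpy(dtype=float), y_values)[0, 1]) for tf in X.columns]
    )


def score_links_sketch(gene_expression_matrix, score_predictor, tf_list):
    """Hypothetical loop: score every TF against every target gene."""
    scores = pd.DataFrame(
        np.nan, index=gene_expression_matrix.index, columns=tf_list, dtype=float
    )
    for tg in gene_expression_matrix.index:
        # Exclude the target itself from its candidate regulators.
        tfs = [tf for tf in tf_list if tf != tg]
        X = gene_expression_matrix.loc[tfs].T  # conditions x TFs
        y = gene_expression_matrix.loc[tg]     # one value per condition
        scores.loc[tg, tfs] = score_predictor(X, y)
    return scores


data = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0],
     [2.0, 4.0, 6.0, 8.5],
     [4.0, 3.0, 2.0, 1.0],
     [1.0, 3.0, 2.0, 5.0]],
    index=["gene1", "gene2", "gene3", "gene4"],
    columns=["c1", "c2", "c3", "c4"],
)
scores = score_links_sketch(data, abs_corr_predictor, tf_list=["gene1", "gene3"])
```

Any function with the same `(X, y)` signature could be passed in place of the toy predictor.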
grenadine.Inference.regression_predictors module
This module infers co-expression Gene Regulatory Networks from gene expression data (RNA-seq or microarray). It implements several regression-based inference algorithms using scikit-learn.
-
grenadine.Inference.regression_predictors.AdaBoost_regressor(X, y, **adab_parameters)
AdaBoost regressor, score predictor function based on scikit-learn AdaBoostRegressor.
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
- **adab_parameters – Named parameters for the sklearn AdaBoostRegressor
Returns: co-regulation scores.
The i-th element of the score array represents the score assigned by the AdaBoostRegressor to the regulatory relationship between the target gene and transcription factor i.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.regression_predictors import AdaBoost_regressor
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3), index=["c1","c2","c3","c4","c5"], columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = AdaBoost_regressor(tfs,tg)
>>> scores
array([0.32978247, 0.3617295 , 0.28896647])
-
grenadine.Inference.regression_predictors.BayesianRidgeScore(X, y, **brr_parameters)
Score predictor based on scikit-learn BayesianRidge regression.
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
- **brr_parameters – Named parameters for sklearn BayesianRidge regression
Returns: co-regulation scores.
The i-th element of the score array represents the score assigned by the sklearn BayesianRidge regressor to the regulatory relationship between the target gene and transcription factor i.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.regression_predictors import BayesianRidgeScore
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3), index=["c1","c2","c3","c4","c5"], columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = BayesianRidgeScore(tfs,tg)
>>> scores
array([1.32082000e-03, 6.24177371e-05, 3.32319918e-04])
-
grenadine.Inference.regression_predictors.Elastica(X, y, **elastica_parameters)
ElasticNetCV regressor, score predictor function based on scikit-learn ElasticNetCV.
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
- **elastica_parameters – Named parameters for the sklearn ElasticNetCV
Returns: co-regulation scores.
The i-th element of the score array represents the score assigned by the ElasticNetCV regressor to the regulatory relationship between the target gene and transcription factor i.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.regression_predictors import Elastica
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3), index=["c1","c2","c3","c4","c5"], columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = Elastica(tfs,tg)
>>> scores
array([0.05512459, 0.34453337, 0.        ])
-
grenadine.Inference.regression_predictors.GENIE3(X, y, **rf_parameters)
GENIE3, score predictor function based on scikit-learn RandomForestRegressor.
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
- **rf_parameters – Named parameters for the sklearn RandomForestRegressor
Returns: co-regulation scores.
The i-th element of the score array represents the score assigned by the RandomForestRegressor to the regulatory relationship between the target gene and transcription factor i.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.regression_predictors import GENIE3
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3), index=["c1","c2","c3","c4","c5"], columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = GENIE3(tfs,tg)
>>> scores
array([0.11983888, 0.28071399, 0.59944713])
-
grenadine.Inference.regression_predictors.GRNBoost2(X, y, **boost_parameters)
GRNBoost2 score predictor based on scikit-learn GradientBoostingRegressor.
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
- **boost_parameters – Named parameters for GradientBoostingRegressor
Returns: co-regulation scores.
The i-th element of the score array represents the score assigned by the GradientBoostingRegressor to the regulatory relationship between the target gene and transcription factor i.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.regression_predictors import GRNBoost2
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3), index=["c1","c2","c3","c4","c5"], columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = GRNBoost2(tfs,tg)
>>> scores
array([0.83904506, 0.01783977, 0.14311517])
-
grenadine.Inference.regression_predictors.LassoLars_score(X, y, **l1_parameters)
Score predictor based on scikit-learn LassoLars regression.
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
- **l1_parameters – Named parameters for sklearn LassoLars regression
Returns: co-regulation scores.
The i-th element of the score array represents the score assigned by the sklearn LassoLars regressor to the regulatory relationship between the target gene and transcription factor i.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.regression_predictors import LassoLars_score
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3), index=["c1","c2","c3","c4","c5"], columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = LassoLars_score(tfs,tg, alpha=0.01)
>>> scores
array([0.12179406, 0.92205553, 0.15503451])
-
grenadine.Inference.regression_predictors.Lasso_score(X, y, **l1_parameters)
Score predictor based on scikit-learn Lasso regression.
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
- **l1_parameters – Named parameters for sklearn Lasso regression
Returns: co-regulation scores.
The i-th element of the score array represents the score assigned by the sklearn Lasso regressor to the regulatory relationship between the target gene and transcription factor i.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.regression_predictors import Lasso_score
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3), index=["c1","c2","c3","c4","c5"], columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = Lasso_score(tfs,tg, alpha=0.01)
>>> scores
array([0.13825495, 0.94939204, 0.19118214])
-
grenadine.Inference.regression_predictors.SVR_score(X, y, **svr_parameters)
Score predictor based on scikit-learn SVR (Support Vector Regression).
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
- **svr_parameters – Named parameters for sklearn SVR regression
Returns: co-regulation scores.
The i-th element of the score array represents the score assigned by the sklearn SVR regressor to the regulatory relationship between the target gene and transcription factor i.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.regression_predictors import SVR_score
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3), index=["c1","c2","c3","c4","c5"], columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = SVR_score(tfs,tg)
>>> scores
array([[-0.38156814,  0.28128811, -1.0230867 ]])
-
grenadine.Inference.regression_predictors.TIGRESS(X, y, nsplit=100, nstepsLARS=5, alpha=0.4, scoring='area')
TIGRESS score predictor based on stability selection.
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
- nsplit (int) – number of splits applied, i.e., randomization tests; the higher the better
- nstepsLARS (int) – number of steps of the LARS algorithm, i.e., number of non-zero coefficients to keep (Lars parameter)
- alpha – Noise multiplier coefficient. Each transcription factor expression is multiplied by a random variable $\in [\alpha, 1]$
- scoring (str) – option used to score each possible link; only “area” and “max” are available
Returns: co-regulation scores.
The i-th element of the score array represents the score assigned by the sklearn randomizedlasso stability selection to the regulatory relationship between the target gene and transcription factor i.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.regression_predictors import TIGRESS
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3), index=["c1","c2","c3","c4","c5"], columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = TIGRESS(tfs,tg)
>>> scores
array([349.   , 312.875, 588.125])
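The role of alpha can be illustrated in isolation. The sketch below covers only the perturbation step, in which each transcription factor column is scaled by a random weight in [alpha, 1]; it assumes the weights are drawn uniformly, which the page does not state, and it leaves out the LARS fits and the "area"/"max" scoring. `perturb_tfs_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def perturb_tfs_sketch(X, alpha=0.4, rng=None):
    """Scale each TF column by a random weight in [alpha, 1].

    Assumption: weights are drawn uniformly; one weight per TF.
    """
    rng = np.random.default_rng() if rng is None else rng
    weights = rng.uniform(alpha, 1.0, size=X.shape[1])
    # A 1-D array of length n_TFs broadcasts across the columns of X.
    return X * weights, weights


rng = np.random.default_rng(0)
conditions = ["c1", "c2", "c3", "c4", "c5"]
tfs = pd.DataFrame(rng.standard_normal((5, 3)), index=conditions, columns=["tf1", "tf2", "tf3"])
perturbed, weights = perturb_tfs_sketch(tfs, alpha=0.4, rng=rng)
```

In a stability-selection procedure this perturbation would be redrawn for every one of the nsplit randomization rounds.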
-
grenadine.Inference.regression_predictors.XGENIE3(X, y, **rf_parameters)
XGENIE3, score predictor function based on scikit-learn ExtraTreesRegressor.
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
- **rf_parameters – Named parameters for the sklearn ExtraTreesRegressor
Returns: co-regulation scores.
The i-th element of the score array represents the score assigned by the ExtraTreesRegressor to the regulatory relationship between the target gene and transcription factor i.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.regression_predictors import XGENIE3
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3), index=["c1","c2","c3","c4","c5"], columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = XGENIE3(tfs,tg)
>>> scores
array([0.24905241, 0.43503283, 0.31591477])
-
grenadine.Inference.regression_predictors.bagging_regressor(X, y, **bagging_parameters)
Apply the bagging technique to a regression algorithm, based on scikit-learn BaggingRegressor.
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
- **bagging_parameters – Named parameters for the sklearn BaggingRegressor
Returns: co-regulation scores.
The i-th element of the score array represents the average score assigned by the Base Regressor to the regulatory relationship between the target gene and transcription factor i.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from sklearn.svm import SVR
>>> from grenadine.Inference.regression_predictors import bagging_regressor
>>> np.random.seed(0)
>>> svr = SVR(kernel="linear")
>>> tfs = pd.DataFrame(np.random.randn(5,3), index=["c1","c2","c3","c4","c5"], columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> bagging_parameters = {"base_estimator":svr, "n_estimators":100, "max_samples":0.7}
>>> scores = bagging_regressor(tfs,tg,**bagging_parameters)
>>> scores
array([0.32978247, 0.3617295 , 0.28896647])
-
grenadine.Inference.regression_predictors.stability_randomizedlasso(X, y, **rl_parameters)
Score predictor based on scikit-learn randomizedlasso stability selection.
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
- **rl_parameters – Named parameters for sklearn randomizedlasso
Returns: co-regulation scores.
The i-th element of the score array represents the score assigned by the sklearn randomizedlasso stability selection to the regulatory relationship between the target gene and transcription factor i.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from grenadine.Inference.regression_predictors import stability_randomizedlasso
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3), index=["c1","c2","c3","c4","c5"], columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = stability_randomizedlasso(tfs,tg)
>>> scores
array([0.11 , 0.17 , 0.085])
grenadine.Inference.statistical_predictors module
This module infers co-expression Gene Regulatory Networks from gene expression data (RNA-seq or microarray). It implements several inference algorithms based on statistical predictors, using scipy-stats and scikit-learn.
-
grenadine.Inference.statistical_predictors.CLR(X, y, **mi_parameters)
Score predictor function based on scikit-learn mutual_info_regression score.
Parameters: - X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
- **mi_parameters – Named parameters for sklearn mutual_info_regression
Returns: co-regulation scores.
The i-th element of the score array represents the score of the sklearn mutual_info_regression computation between target gene expression and the i-th transcription factor gene expression.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
...                    index=["c1","c2","c3","c4","c5"],
...                    columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = CLR(tfs,tg)
>>> scores
array([6.66666667e-02, 1.16666667e-01, 2.22044605e-16])
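The quantity this predictor builds on can be computed directly with scikit-learn. The sketch below only shows the raw mutual_info_regression call with forwarded named parameters; any further normalisation that CLR applies to these values is not documented here and is not reproduced. The function name and the toy data are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression


def mutual_info_scores_sketch(X, y, **mi_parameters):
    """Hypothetical sketch: raw mutual information between each TF and the target."""
    return mutual_info_regression(X, y, **mi_parameters)


# Toy data (assumption): the target depends on tf1 in a non-linear way.
rng = np.random.RandomState(0)
conditions = [f"c{i}" for i in range(1, 31)]
tfs = pd.DataFrame(rng.randn(30, 3), index=conditions,
                   columns=["tf1", "tf2", "tf3"])
tg = tfs["tf1"] ** 2 + 0.1 * pd.Series(rng.randn(30), index=conditions)

# n_neighbors and random_state are real mutual_info_regression parameters.
scores = mutual_info_scores_sketch(tfs, tg, n_neighbors=3, random_state=0)
```

The estimator is based on nearest neighbours, so n_neighbors must stay below the number of experimental conditions, and fixing random_state makes the result repeatable.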
-
grenadine.Inference.statistical_predictors.abs_pearsonr_coef(X, y)
Score predictor function based on the scipy-stats absolute Pearson correlation.
Parameters:
- X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
Returns: co-regulation scores.
The i-th element of the score array represents the absolute value of the correlation between target gene expression and the i-th transcription factor gene expression.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
...                    index=["c1","c2","c3","c4","c5"],
...                    columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = abs_pearsonr_coef(tfs,tg)
>>> scores
array([0.41724166, 0.02212467, 0.23708491])
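A column-wise absolute Pearson correlation can be written in a few lines with scipy.stats.pearsonr. The function name below is hypothetical and the code is a sketch of the described computation, not the library's source.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def abs_pearson_sketch(X, y):
    """Hypothetical sketch: |Pearson r| between each column of X and y."""
    scores = []
    for tf in X.columns:
        r, _p_value = pearsonr(X[tf], y)
        scores.append(abs(r))
    return np.array(scores)


rng = np.random.RandomState(0)
conditions = ["c1", "c2", "c3", "c4", "c5"]
tfs = pd.DataFrame(rng.randn(5, 3), index=conditions,
                   columns=["tf1", "tf2", "tf3"])
tg = pd.Series(rng.randn(5), index=conditions)
scores = abs_pearson_sketch(tfs, tg)
```

Taking the absolute value means activators and repressors are scored alike: only the strength of the linear association is kept, not its sign.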
-
grenadine.Inference.statistical_predictors.abs_spearmanr_coef(X, y)
Score predictor function based on the scipy-stats absolute Spearman correlation.
Parameters:
- X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
Returns: co-regulation scores.
The i-th element of the score array represents the absolute value of the correlation between target gene expression and the i-th transcription factor gene expression.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
...                    index=["c1","c2","c3","c4","c5"],
...                    columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = abs_spearmanr_coef(tfs,tg)
>>> scores
array([0.5, 0.3, 0.3])
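Because the Spearman correlation works on ranks, it is unchanged by any strictly increasing transformation of the expression values. The sketch below illustrates this with scipy.stats.spearmanr; the function name is hypothetical and the code is an illustration, not the library's source.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr


def abs_spearman_sketch(X, y):
    """Hypothetical sketch: |Spearman rho| between each column of X and y."""
    scores = []
    for tf in X.columns:
        rho, _p_value = spearmanr(X[tf], y)
        scores.append(abs(rho))
    return np.array(scores)


rng = np.random.RandomState(0)
conditions = ["c1", "c2", "c3", "c4", "c5"]
tfs = pd.DataFrame(rng.randn(5, 3), index=conditions,
                   columns=["tf1", "tf2", "tf3"])
tg = pd.Series(rng.randn(5), index=conditions)

scores = abs_spearman_sketch(tfs, tg)
# exp() is strictly increasing, so the ranks and the scores stay the same.
scores_after_exp = abs_spearman_sketch(np.exp(tfs), tg)
```

This makes the rank-based score less sensitive than the Pearson score to the choice of expression scale, for example raw versus log-transformed values.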
-
grenadine.Inference.statistical_predictors.energy_distance_score(X, y, **energy_distance_parameters)
Score predictor function based on the scipy-stats energy distance between 1D distributions.
Parameters:
- X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
- **energy_distance_parameters – Named parameters for the scipy-stats energy distance
Returns: co-regulation scores.
The i-th element of the score array represents the score between target gene expression and the i-th transcription factor gene expression.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
...                    index=["c1","c2","c3","c4","c5"],
...                    columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = energy_distance_score(tfs,tg)
>>> scores
array([0.40613705, 0.6881455 , 0.72786711])
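The underlying scipy call compares the distribution of each transcription factor's expression values with that of the target, ignoring which condition each value came from. The sketch below returns the raw distances; whether and how energy_distance_score transforms them into a score is not stated here, so the function name and the direct use of the distance are assumptions.

```python
import numpy as np
import pandas as pd
from scipy.stats import energy_distance


def energy_distance_sketch(X, y, **energy_distance_parameters):
    """Hypothetical sketch: raw energy distance between each TF column and y."""
    return np.array([
        energy_distance(X[tf], y, **energy_distance_parameters)
        for tf in X.columns
    ])


rng = np.random.RandomState(0)
conditions = ["c1", "c2", "c3", "c4", "c5"]
tfs = pd.DataFrame(rng.randn(5, 3), index=conditions,
                   columns=["tf1", "tf2", "tf3"])
tg = pd.Series(rng.randn(5), index=conditions)
distances = energy_distance_sketch(tfs, tg)

# The distance of a sample to itself is zero.
self_distance = energy_distance(tg, tg)
```

The named parameters accepted by scipy.stats.energy_distance are u_weights and v_weights, which weight individual observations of the first and second sample.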
-
grenadine.Inference.statistical_predictors.f_regression_score(X, y)
Score predictor function based on the scikit-learn f_regression score.
Parameters:
- X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
Returns: co-regulation scores.
The i-th element of the score array represents the score of the f_regression linear test between target gene expression and the i-th transcription factor gene expression.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
...                    index=["c1","c2","c3","c4","c5"],
...                    columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = f_regression_score(tfs,tg)
>>> scores
array([0.63235967, 0.00146922, 0.17867071])
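The F statistic of a univariate linear test is a monotone function of the Pearson correlation r: with n experimental conditions, F = (n - 2) r² / (1 - r²). The values in the examples on this page agree with this: for n = 5 and the first abs_pearsonr_coef value, 3 × 0.41724166² / (1 - 0.41724166²) ≈ 0.63236, the first f_regression_score value. The sketch below checks the identity numerically with scikit-learn and scipy; it is an illustration, not the library's source.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr
from sklearn.feature_selection import f_regression

rng = np.random.RandomState(0)
conditions = ["c1", "c2", "c3", "c4", "c5"]
tfs = pd.DataFrame(rng.randn(5, 3), index=conditions,
                   columns=["tf1", "tf2", "tf3"])
tg = pd.Series(rng.randn(5), index=conditions)

# F statistic and p-value of one univariate linear regression per TF.
f_statistic, p_values = f_regression(tfs, tg)

# The same F statistic rebuilt from the Pearson correlation.
r = np.array([pearsonr(tfs[tf], tg)[0] for tf in tfs.columns])
n_conditions = len(tg)
f_from_r = (n_conditions - 2) * r**2 / (1 - r**2)
```

As a consequence, ranking transcription factors by the F statistic gives the same order as ranking them by absolute Pearson correlation.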
-
grenadine.Inference.statistical_predictors.kendalltau_score(X, y, **kendalltau_parameters)
Score predictor function based on the scipy-stats Kendall’s tau correlation measure.
Parameters:
- X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
- **kendalltau_parameters – Named parameters for the scipy-stats Kendall’s tau correlation measure
Returns: co-regulation scores.
The i-th element of the score array represents the score between target gene expression and the i-th transcription factor gene expression.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
...                    index=["c1","c2","c3","c4","c5"],
...                    columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = kendalltau_score(tfs,tg)
>>> scores
array([0.8487997 , 1.30065214, 0.20467198])
-
grenadine.Inference.statistical_predictors.mannwhitneyu_score(X, y, **mannwhitneyu_parameters)
Score predictor function based on the scipy-stats Mann-Whitney rank test.
Parameters:
- X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
- **mannwhitneyu_parameters – Named parameters for the scipy-stats Mann-Whitney rank test
Returns: co-regulation scores.
The i-th element of the score array represents the score between target gene expression and the i-th transcription factor gene expression.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
...                    index=["c1","c2","c3","c4","c5"],
...                    columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = mannwhitneyu_score(tfs,tg)
>>> scores
array([1.52213525, 0.47101693, 0.3795872 ])
-
grenadine.Inference.statistical_predictors.theilslopes_score(X, y, **theilslopes_parameters)
Score predictor function based on the scipy-stats Theil-Sen robust slope estimator.
Parameters:
- X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
- **theilslopes_parameters – Named parameters for the scipy-stats Theil-Sen robust slope estimator
Returns: co-regulation scores.
The i-th element of the score array represents the score between target gene expression and the i-th transcription factor gene expression.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
...                    index=["c1","c2","c3","c4","c5"],
...                    columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = theilslopes_score(tfs,tg)
>>> scores
array([0.92309299, 0.90933202, 0.26451817])
-
grenadine.Inference.statistical_predictors.wasserstein_distance_score(X, y, **wasserstein_distance_parameters)
Score predictor function based on the scipy-stats Wasserstein distance between 1D distributions.
Parameters:
- X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
- **wasserstein_distance_parameters – Named parameters for the scipy-stats Wasserstein distance
Returns: co-regulation scores.
The i-th element of the score array represents the score between target gene expression and the i-th transcription factor gene expression.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
...                    index=["c1","c2","c3","c4","c5"],
...                    columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = wasserstein_distance_score(tfs,tg)
>>> scores
array([0.36457586, 0.72057084, 0.81207932])
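For two unweighted 1D samples of equal size, the Wasserstein distance reduces to the mean absolute difference between their sorted values. The sketch below computes the raw distances with scipy.stats.wasserstein_distance and rebuilds them from the sorted samples; the function name is hypothetical, and any transformation that wasserstein_distance_score applies to the raw distance is not reproduced.

```python
import numpy as np
import pandas as pd
from scipy.stats import wasserstein_distance


def wasserstein_sketch(X, y, **wasserstein_distance_parameters):
    """Hypothetical sketch: raw 1D Wasserstein distance between each TF and y."""
    return np.array([
        wasserstein_distance(X[tf], y, **wasserstein_distance_parameters)
        for tf in X.columns
    ])


rng = np.random.RandomState(0)
conditions = ["c1", "c2", "c3", "c4", "c5"]
tfs = pd.DataFrame(rng.randn(5, 3), index=conditions,
                   columns=["tf1", "tf2", "tf3"])
tg = pd.Series(rng.randn(5), index=conditions)
distances = wasserstein_sketch(tfs, tg)

# Equal-size, unweighted case: compare the sorted samples element by element.
sorted_target = np.sort(tg.to_numpy())
distances_from_sorting = np.array([
    np.mean(np.abs(np.sort(tfs[tf].to_numpy()) - sorted_target))
    for tf in tfs.columns
])
```

Sorting discards the pairing of values by experimental condition, so this measure compares the two expression distributions rather than their co-variation across conditions.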
-
grenadine.Inference.statistical_predictors.wilcoxon_score(X, y, **wilcoxon_parameters)
Score predictor function based on the scipy-stats Wilcoxon signed-rank test.
Parameters:
- X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
- y (pandas.Series) – Target gene expression vector where rows are experimental conditions
- **wilcoxon_parameters – Named parameters for the scipy-stats Wilcoxon signed-rank test
Returns: co-regulation scores.
The i-th element of the score array represents the score between target gene expression and the i-th transcription factor gene expression.
Return type: numpy.array
Examples
>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
...                    index=["c1","c2","c3","c4","c5"],
...                    columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5), index=["c1","c2","c3","c4","c5"])
>>> scores = wilcoxon_score(tfs,tg)
>>> scores
array([1.36537718, 0.64797987, 0.30086998])
Module contents
This submodule contains data-driven scoring functions for inferring GRNs from gene expression datasets.
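All scoring functions documented on this page share the same shape: they take a transcription factor expression table X and one target gene vector y, and return one score per transcription factor. One way such a function could be applied to every target gene to obtain a full score matrix is sketched below. The driver score_all_targets and the toy scoring function abs_corr_score are hypothetical names introduced for illustration and are not part of the package.

```python
import numpy as np
import pandas as pd


def abs_corr_score(X, y):
    """Toy scoring function with the (X, y) -> array-of-scores interface."""
    return X.corrwith(y).abs().to_numpy()


def score_all_targets(expression, tf_names, score_fn):
    """Hypothetical driver: one column of TF scores per target gene."""
    scores = pd.DataFrame(np.nan, index=tf_names, columns=expression.columns)
    for target in expression.columns:
        # A TF is not used to explain its own expression.
        regulators = [tf for tf in tf_names if tf != target]
        scores.loc[regulators, target] = score_fn(
            expression[regulators], expression[target]
        )
    return scores


# Toy expression table (assumption): rows are conditions, columns are genes.
rng = np.random.RandomState(0)
expression = pd.DataFrame(rng.randn(6, 4),
                          columns=["tf1", "tf2", "g1", "g2"])
grn_scores = score_all_targets(expression, ["tf1", "tf2"], abs_corr_score)
```

In the resulting table, rows are transcription factors, columns are target genes, and the self-regulation entries are left as NaN.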