grenadine.Inference package

Submodules

grenadine.Inference.classification_predictors module

This module infers Gene Regulatory Networks from gene expression data (RNAseq or Microarray). It implements several inference algorithms based on classification, using scikit-learn.

grenadine.Inference.classification_predictors.AdaBoost_classifier_score(X, y, **adab_parameters)[source]

AdaBoost Classifier, score predictor function based on scikit-learn AdaBoostClassifier.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions (discretized or not) where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector (discretized) where rows are experimental conditions
  • **adab_parameters – Named parameters for the sklearn AdaBoostClassifier
Returns:

co-regulation scores.

The i-th element of the score array represents the score assigned by the AdaBoostClassifier to the regulatory relationship between the target gene and transcription factor i.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randint(0,3,size=5), index=["c1","c2","c3","c4","c5"])
>>> scores = AdaBoost_classifier_score(tfs,tg)
>>> scores
array([0.24, 0.44, 0.32])
grenadine.Inference.classification_predictors.ComplementNB_classifier_score(X, y, **nb_parameters)[source]

Complement Naive Bayes Classifier, score predictor function based on scikit-learn ComplementNB.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions (discretized or not) where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector (discretized) where rows are experimental conditions
  • **nb_parameters – Named parameters for the sklearn ComplementNB
Returns:

co-regulation scores.

The i-th element of the score array represents the score assigned by the ComplementNB to the regulatory relationship between the target gene and transcription factor i.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randint(0,3,size=5), index=["c1","c2","c3","c4","c5"])
>>> scores = ComplementNB_classifier_score(tfs,tg)
>>> scores
array([0.28113447, 0.39096368, 0.45629413])
grenadine.Inference.classification_predictors.GB_classifier_score(X, y, **gb_parameters)[source]

Gradient Boosting Classifier, score predictor function based on scikit-learn GradientBoostingClassifier.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions (discretized or not) where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector (discretized) where rows are experimental conditions
  • **gb_parameters – Named parameters for the sklearn GradientBoostingClassifier
Returns:

co-regulation scores.

The i-th element of the score array represents the score assigned by the GradientBoostingClassifier to the regulatory relationship between the target gene and transcription factor i.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randint(0,3,size=5), index=["c1","c2","c3","c4","c5"])
>>> scores = GB_classifier_score(tfs,tg)
>>> scores
array([0.33959125, 0.21147015, 0.4489386 ])
grenadine.Inference.classification_predictors.MultinomialNB_classifier_score(X, y, **nb_parameters)[source]

Multinomial Naive Bayes Classifier, score predictor function based on scikit-learn MultinomialNB.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions (discretized or not) where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector (discretized) where rows are experimental conditions
  • **nb_parameters – Named parameters for the sklearn MultinomialNB
Returns:

co-regulation scores.

The i-th element of the score array represents the score assigned by the MultinomialNB to the regulatory relationship between the target gene and transcription factor i.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randint(0,3,size=5), index=["c1","c2","c3","c4","c5"])
>>> scores = MultinomialNB_classifier_score(tfs,tg)
>>> scores
array([0.3010284 , 0.41871716, 0.4272386 ])
grenadine.Inference.classification_predictors.RF_classifier_score(X, y, **rf_parameters)[source]

Random Forest Classifier, score predictor function based on scikit-learn RandomForestClassifier.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions (discretized or not) where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector (discretized) where rows are experimental conditions
  • **rf_parameters – Named parameters for the sklearn RandomForestClassifier
Returns:

co-regulation scores.

The i-th element of the score array represents the score assigned by the RandomForestClassifier to the regulatory relationship between the target gene and transcription factor i.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randint(0,3,size=5), index=["c1","c2","c3","c4","c5"])
>>> scores = RF_classifier_score(tfs,tg)
>>> scores
array([0.21071429, 0.4       , 0.28928571])
grenadine.Inference.classification_predictors.SVM_classifier_score(X, y, **svm_parameters)[source]

SVM Classifier, score predictor function based on scikit-learn SVC (Support Vector Classifier).

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions (discretized or not) where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector (discretized) where rows are experimental conditions
  • **svm_parameters – Named parameters for the sklearn SVC
Returns:

co-regulation scores.

The i-th element of the score array represents the score assigned by the SVC to the regulatory relationship between the target gene and transcription factor i.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randint(0,3,size=5), index=["c1","c2","c3","c4","c5"])
>>> scores = SVM_classifier_score(tfs,tg)
>>> scores
array([0.58413783, 0.5448345 , 0.31764191])
grenadine.Inference.classification_predictors.XRF_classifier_score(X, y, **xrf_parameters)[source]

Randomized decision trees Classifier, score predictor function based on scikit-learn ExtraTreesClassifier.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions (discretized or not) where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector (discretized) where rows are experimental conditions
  • **xrf_parameters – Named parameters for the sklearn ExtraTreesClassifier
Returns:

co-regulation scores.

The i-th element of the score array represents the score assigned by the ExtraTreesClassifier to the regulatory relationship between the target gene and transcription factor i.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randint(0,3,size=5), index=["c1","c2","c3","c4","c5"])
>>> scores = XRF_classifier_score(tfs,tg)
>>> scores
array([0.31354167, 0.35520833, 0.33125   ])
grenadine.Inference.classification_predictors.bagging_classifier_score(X, y, **bagging_parameters)[source]

Apply the bagging technique to a classification algorithm, based on scikit-learn BaggingClassifier.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
  • **bagging_parameters – Named parameters for the sklearn BaggingClassifier
Returns:

co-regulation scores.

The i-th element of the score array represents the average score assigned by the base classifier to the regulatory relationship between the target gene and transcription factor i.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from sklearn.svm import SVC
>>> np.random.seed(0)
>>> svc = SVC(kernel="linear",decision_function_shape='ovr')
>>> nb_conditions = 10
>>> tfs = pd.DataFrame(np.random.randn(nb_conditions,3),
               index =["c"+str(i) for i in range(nb_conditions)],
               columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randint(0,2,size=nb_conditions),
                   index =["c"+str(i) for i in range(nb_conditions)])
>>> bagging_parameters = {"base_estimator":svc,
                          "n_estimators":5,
                          "max_samples":0.9}
>>> scores = bagging_classifier_score(tfs,tg,**bagging_parameters)
>>> scores
array([0.269231,0.412219,0.299806])

grenadine.Inference.inference module

This module infers co-expression Gene Regulatory Networks from gene expression data (RNAseq or Microarray).

grenadine.Inference.inference.clean_nan_inf_scores(scores)[source]

Replaces NaN and -inf scores with (minimum_score - 1), and inf scores with (maximum_score + 1).

Parameters:
  • scores (pandas.DataFrame) – co-regulation score matrix. Rows are target genes and columns are transcription factors. The value at row i and column j represents the score assigned by the score_predictor to the regulatory relationship between target gene i and transcription factor j.
Returns:

co-regulation score matrix.

Rows are target genes and columns are transcription factors. The value at row i and column j represents the score assigned by the score_predictor to the regulatory relationship between target gene i and transcription factor j.

Return type:

pandas.DataFrame

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> data = pd.DataFrame(np.random.randn(5, 5),
                    index=["gene1", "gene2", "gene3", "gene4", "gene5"],
                    columns=["c1", "c2", "c3", "c4", "c5"])
>>> tf_list = ["gene1", "gene2", "gene5"]
>>> # Example with a regression method
>>> from grenadine.Inference.regression_predictors import GENIE3
>>> scores1 = score_links(gene_expression_matrix=data,
                          score_predictor=GENIE3,
                          tf_list=tf_list)
>>> scores1
          gene2     gene5     gene1
gene1  0.484081  0.515919       NaN
gene2       NaN  0.653471  0.346529
gene3  0.245136  0.301229  0.453634
gene4  0.309982  0.306964  0.383054
gene5  0.529839       NaN  0.470161
>>> clean_nan_inf_scores(scores1)
          gene2     gene5     gene1
gene1  0.484081  0.515919  0.245126
gene2  0.245126  0.653471  0.346529
gene3  0.245136  0.301229  0.453634
gene4  0.309982  0.306964  0.383054
gene5  0.529839  0.245126  0.470161
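
The replacement rule stated in the description can be sketched without any dependency. This is a minimal illustration, not grenadine's implementation: the helper name `clean_scores` and the dict-of-dicts layout are hypothetical. The offset of 1 follows the description; it is exposed as a parameter because the printed example above appears to use a much smaller offset.

```python
import math

def clean_scores(matrix, offset=1.0):
    """Replace NaN and -inf with (min - offset) and +inf with (max + offset).

    `matrix` is a dict of dicts {target_gene: {tf: score}} (hypothetical layout).
    """
    # Only finite values take part in the min/max computation.
    finite = [v for row in matrix.values() for v in row.values() if math.isfinite(v)]
    low, high = min(finite) - offset, max(finite) + offset
    cleaned = {}
    for tg, row in matrix.items():
        cleaned[tg] = {}
        for tf, value in row.items():
            if math.isnan(value) or value == -math.inf:
                cleaned[tg][tf] = low
            elif value == math.inf:
                cleaned[tg][tf] = high
            else:
                cleaned[tg][tf] = value
    return cleaned

scores = {
    "gene1": {"gene2": 0.5, "gene5": math.inf},
    "gene2": {"gene2": math.nan, "gene5": 0.25},
    "gene3": {"gene2": -math.inf, "gene5": 0.75},
}
cleaned = clean_scores(scores)
```

After cleaning, every entry is finite, so the matrix can be sorted or ranked without special cases.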

Builds an ensemble co-regulation score matrix from a list of co-regulation score matrices obtained with different methods, optionally using one weight per method.

Parameters:
  • score_links_matrices (list) – list of co-regulation score matrices (pandas DataFrames)
  • score_links_weights (list) – list of weights for each method (the higher the weight, the more confidence in the method). If no value is provided, each method has a unit weight
Returns:

co-regulation score matrix.

Rows are target genes and columns are transcription factors. The value at row i and column j represents the score assigned by the score_predictor to the regulatory relationship between target gene i and transcription factor j.

Return type:

pandas.DataFrame
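
The text above does not say how the matrices are combined. As a rough sketch only, the following assumes a weighted arithmetic mean; the actual function may rescale or rank the scores first. The name `weighted_ensemble` and the dict-of-dicts layout are hypothetical.

```python
def weighted_ensemble(matrices, weights=None):
    """Combine score matrices ({target: {tf: score}}) by a weighted mean.

    ASSUMPTION: a weighted arithmetic mean; the combination rule is not
    specified in the documentation above.
    """
    if weights is None:
        weights = [1.0] * len(matrices)  # unit weight per method
    if len(weights) != len(matrices):
        raise ValueError("one weight per score matrix is required")
    total = sum(weights)
    combined = {}
    for tg in matrices[0]:
        combined[tg] = {
            tf: sum(w * m[tg][tf] for m, w in zip(matrices, weights)) / total
            for tf in matrices[0][tg]
        }
    return combined

method_a = {"gene3": {"gene1": 0.5, "gene2": 0.25}}
method_b = {"gene3": {"gene1": 1.0, "gene2": 0.75}}
equal = weighted_ensemble([method_a, method_b])
favour_b = weighted_ensemble([method_a, method_b], weights=[1, 3])
```

With weights `[1, 3]` the second method contributes three quarters of each combined score.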

grenadine.Inference.inference.join_rankings_scores_df(**rank_scores)[source]

Join rankings and scores data frames generated by different methods.

Parameters:
  • **rank_scores – Named parameters, where argument names are the method names and argument values are the pandas.DataFrame outputs of rank_GRN
Returns:

joined ranks and joined scores.

Rows represent possible regulatory links and columns represent each method. Values at row i and column j represent, respectively, the rank or the score of edge i computed by method j.

Return type:

(pandas.DataFrame, pandas.DataFrame)

Examples

>>> import pandas as pd
>>> method1_rank = pd.DataFrame([[1,1.3, "gene1", "gene2"],
                                 [2,1.1, "gene1", "gene3"],
                                 [3,0.9, "gene3", "gene2"]],
                                 columns=['rank', 'score', 'TF', 'TG'])
>>> method1_rank.index = method1_rank['TF']+'_'+method1_rank['TG']
>>> method2_rank = pd.DataFrame([[1,1.4, "gene1", "gene3"],
                                 [2,1.0, "gene1", "gene2"],
                                 [3,0.9, "gene3", "gene2"]],
                                 columns=['rank', 'score', 'TF', 'TG'])
>>> method2_rank.index = method2_rank['TF']+'_'+method2_rank['TG']
>>> ranks, scores = join_rankings_scores_df(method1=method1_rank, method2=method2_rank)
>>> ranks
             method1  method2
gene1_gene2        1        2
gene1_gene3        2        1
gene3_gene2        3        3
>>> scores
             method1  method2
gene1_gene2      1.3      1.0
gene1_gene3      1.1      1.4
gene3_gene2      0.9      0.9
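
The way keyword names become column labels can be sketched in plain Python. This is an illustration of the idea under assumptions, not the library code: `join_by_method` and the `{link: (rank, score)}` layout are hypothetical stand-ins for the data frames used above.

```python
def join_by_method(**rankings):
    """Build {link: {method: rank}} and {link: {method: score}} tables.

    Each keyword name is used as the method label, mirroring the
    `method1=..., method2=...` call shown above.
    """
    ranks, scores = {}, {}
    for method, table in rankings.items():
        for link, (rank, score) in table.items():
            ranks.setdefault(link, {})[method] = rank
            scores.setdefault(link, {})[method] = score
    return ranks, scores

method1 = {"gene1_gene2": (1, 1.3), "gene1_gene3": (2, 1.1), "gene3_gene2": (3, 0.9)}
method2 = {"gene1_gene3": (1, 1.4), "gene1_gene2": (2, 1.0), "gene3_gene2": (3, 0.9)}
ranks, scores = join_by_method(method1=method1, method2=method2)
```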
grenadine.Inference.inference.rank_GRN(coexpression_scores_matrix, take_abs_score=False, clean_scores=True, pyscenic_format=False)[source]

Ranks the co-regulation scores between transcription factors and target genes.

Parameters:
  • coexpression_scores_matrix (pandas.DataFrame) – co-expression score matrix where rows are target genes and columns are transcription factors. The value at row i and column j represents the score assigned by a score_predictor to the regulatory relationship between target gene i and transcription factor j.
  • take_abs_score (bool) – take the absolute value of the score instead of taking scores themselves
Returns:

ranking matrix.

A ranking matrix contains one row per possible regulatory link and four columns: the rank, the score, the transcription factor id, and the target gene id.

Return type:

pandas.DataFrame

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> data = pd.DataFrame(np.random.randn(3, 2),
                    index=["gene1", "gene2", "gene3"],
                    columns=["gene1", "gene3"])
>>> # scores associated to self loops are set to nan
>>> data.iloc[0,0]=np.nan
>>> data.iloc[2,1]=np.nan
>>> ranking_matrix = rank_GRN(data)
>>> ranking_matrix
             rank     score     TF     TG
gene3_gene2   1.0  2.240893  gene3  gene2
gene1_gene3   2.0  1.867558  gene1  gene3
gene1_gene2   3.0  0.978738  gene1  gene2
gene3_gene1   4.0  0.400157  gene3  gene1
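
The ranking step can be sketched as follows, using the same six values as the example above. This is a simplified illustration and not the library code: `rank_links` is a hypothetical name, the matrix is a plain dict of dicts, and NaN entries are simply skipped, which matches the four rows printed above.

```python
import math

def rank_links(matrix, take_abs_score=False):
    """Return (rank, score, TF, TG) tuples sorted by decreasing score.

    `matrix` is {target_gene: {tf: score}}; NaN entries are skipped.
    """
    links = []
    for tg, row in matrix.items():
        for tf, score in row.items():
            if math.isnan(score):
                continue
            links.append((abs(score) if take_abs_score else score, tf, tg))
    # Highest score first; rank 1 is the most confident link.
    links.sort(key=lambda link: link[0], reverse=True)
    return [(rank, score, tf, tg) for rank, (score, tf, tg) in enumerate(links, start=1)]

scores = {
    "gene1": {"gene1": math.nan, "gene3": 0.400157},
    "gene2": {"gene1": 0.978738, "gene3": 2.240893},
    "gene3": {"gene1": 1.867558, "gene3": math.nan},
}
ranking = rank_links(scores)
```

Setting `take_abs_score=True` ranks a strongly negative score ahead of a weaker positive one.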

Scores transcription factors-target gene co-expressions using a predictor.

Parameters:
  • gene_expression_matrix (pandas.DataFrame) – gene expression matrix where rows are genes and columns are samples (conditions). The value at row i and column j represents the expression of gene i in condition j.
  • score_predictor (function) – function that receives a pandas.DataFrame X containing the transcription factor expressions and a pandas.Series y containing the expression of a target gene, and scores the co-expression level between each transcription factor and the target gene.
  • tf_list (list or numpy.array) – list of transcription factors ids.
  • tg_list (list or numpy.array) – list of target genes ids.
  • normalize (boolean) – If True, the expression of each gene is z-scored
  • discr_method – discretization method to use, if discretization of target gene expression is desired
  • progress_bar (bool) – If True, display a progress bar
  • **predictor_parameters – Named parameters for the score predictor
Returns:

co-regulation score matrix.

Rows are target genes and columns are transcription factors. The value at row i and column j represents the score assigned by the score_predictor to the regulatory relationship between target gene i and transcription factor j.

Return type:

pandas.DataFrame

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> data = pd.DataFrame(np.random.randn(5, 5),
                    index=["gene1", "gene2", "gene3", "gene4", "gene5"],
                    columns=["c1", "c2", "c3", "c4", "c5"])
>>> tf_list = ["gene1", "gene2", "gene5"]
>>> # Example with a regression method
>>> from grenadine.Inference.regression_predictors import GENIE3
>>> scores1 = score_links(gene_expression_matrix=data,
                          score_predictor=GENIE3,
                          tf_list=tf_list)
>>> scores1
          gene2     gene5     gene1
gene1  0.484081  0.515919       NaN
gene2       NaN  0.653471  0.346529
gene3  0.245136  0.301229  0.453634
gene4  0.309982  0.306964  0.383054
gene5  0.529839       NaN  0.470161
>>> # Example with a classification method
>>> from grenadine.Inference.classification_predictors import RF_classifier_score
>>> from grenadine.Preprocessing.discretization import discretize_genexp
>>> discr_method = lambda X: discretize_genexp(X, "efd", 5, axis=1)
>>> scores2 = score_links(gene_expression_matrix=data,
                                score_predictor=RF_classifier_score,
                                tf_list=tf_list,
                                discr_method=discr_method)
>>> scores2
          gene2     gene5     gene1
gene1  0.512659  0.487341       NaN
gene2       NaN  0.463122  0.536878
gene3  0.368175  0.317341  0.314484
gene4  0.302738  0.346799  0.350463
gene5  0.524815       NaN  0.475185
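
The predictor contract described above (a function of X and y that returns one score per transcription factor) can be illustrated with a dependency-free sketch. Everything here is hypothetical: `score_all_links` only mimics the per-target loop, `abs_pearson` is a toy predictor, and excluding the target gene from its own predictors is an assumption suggested by the NaN self-links in the outputs above.

```python
import math
from statistics import mean, pstdev

def zscore(values):
    """Centre and scale one expression profile (fails on constant profiles)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def abs_pearson(X, y):
    """Toy score predictor: |Pearson correlation| between each TF and the target.

    X maps TF id -> expression profile; y is the target profile.
    """
    zy = zscore(y)
    scores = []
    for profile in X.values():
        zx = zscore(profile)
        scores.append(abs(mean(a * b for a, b in zip(zx, zy))))
    return scores

def score_all_links(expression, predictor, tf_list, tg_list=None):
    """Score each (target, TF) pair, leaving self-regulation as NaN."""
    tg_list = list(expression) if tg_list is None else tg_list
    result = {}
    for tg in tg_list:
        tfs = [tf for tf in tf_list if tf != tg]  # ASSUMPTION: a gene never predicts itself
        X = {tf: expression[tf] for tf in tfs}
        row = dict(zip(tfs, predictor(X, expression[tg])))
        result[tg] = {tf: row.get(tf, math.nan) for tf in tf_list}
    return result

expression = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [4.0, 3.0, 2.0, 1.0],
    "gene3": [1.0, 3.0, 2.0, 4.0],
}
scores = score_all_links(expression, abs_pearson, tf_list=["gene1", "gene2"])
```

Any function with the same two-argument shape could be swapped in for `abs_pearson`, which is what makes the scoring loop independent of the inference method.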

grenadine.Inference.regression_predictors module

This module infers co-expression Gene Regulatory Networks from gene expression data (RNAseq or Microarray). It implements several inference algorithms based on regression, using scikit-learn.

grenadine.Inference.regression_predictors.AdaBoost_regressor(X, y, **adab_parameters)[source]

AdaBoost regressor, score predictor function based on scikit-learn AdaBoostRegressor.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
  • **adab_parameters – Named parameters for the sklearn AdaBoostRegressor
Returns:

co-regulation scores.

The i-th element of the score array represents the score assigned by the AdaBoostRegressor to the regulatory relationship between the target gene and transcription factor i.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
               index =["c1","c2","c3","c4","c5"],
               columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = AdaBoost_regressor(tfs,tg)
>>> scores
array([0.32978247, 0.3617295 , 0.28896647])
grenadine.Inference.regression_predictors.BayesianRidgeScore(X, y, **brr_parameters)[source]

Score predictor based on scikit-learn BayesianRidge regression.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
  • **brr_parameters – Named parameters for sklearn BayesianRidge regression
Returns:

co-regulation scores.

The i-th element of the score array represents the score assigned by the sklearn BayesianRidge regressor to the regulatory relationship between the target gene and transcription factor i.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = BayesianRidgeScore(tfs,tg)
>>> scores
array([1.32082000e-03, 6.24177371e-05, 3.32319918e-04])
grenadine.Inference.regression_predictors.Elastica(X, y, **elastica_parameters)[source]

ElasticNetCV regressor, score predictor function based on scikit-learn ElasticNetCV.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
  • **elastica_parameters – Named parameters for the sklearn ElasticNetCV
Returns:

co-regulation scores.

The i-th element of the score array represents the score assigned by the ElasticNetCV to the regulatory relationship between the target gene and transcription factor i.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
               index =["c1","c2","c3","c4","c5"],
               columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = Elastica(tfs,tg)
>>> scores
array([0.05512459, 0.34453337, 0.        ])
grenadine.Inference.regression_predictors.GENIE3(X, y, **rf_parameters)[source]

GENIE3, score predictor function based on scikit-learn RandomForestRegressor.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
  • **rf_parameters – Named parameters for the sklearn RandomForestRegressor
Returns:

co-regulation scores.

The i-th element of the score array represents the score assigned by the RandomForestRegressor to the regulatory relationship between the target gene and transcription factor i.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = GENIE3(tfs,tg)
>>> scores
array([0.11983888, 0.28071399, 0.59944713])
grenadine.Inference.regression_predictors.GRNBoost2(X, y, **boost_parameters)[source]

GRNBoost2 score predictor based on scikit-learn GradientBoostingRegressor.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
  • **boost_parameters – Named parameters for GradientBoostingRegressor
Returns:

co-regulation scores.

The i-th element of the score array represents the score assigned by the GradientBoostingRegressor to the regulatory relationship between the target gene and transcription factor i.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = GRNBoost2(tfs,tg)
>>> scores
array([0.83904506, 0.01783977, 0.14311517])
grenadine.Inference.regression_predictors.LassoLars_score(X, y, **l1_parameters)[source]

Score predictor based on scikit-learn LassoLars regression.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
  • **l1_parameters – Named parameters for sklearn LassoLars regression
Returns:

co-regulation scores.

The i-th element of the score array represents the score assigned by the sklearn LassoLars regressor to the regulatory relationship between the target gene and transcription factor i.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = LassoLars_score(tfs,tg, alpha=0.01)
>>> scores
array([0.12179406, 0.92205553, 0.15503451])
grenadine.Inference.regression_predictors.Lasso_score(X, y, **l1_parameters)[source]

Score predictor based on scikit-learn Lasso regression.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
  • **l1_parameters – Named parameters for sklearn Lasso regression
Returns:

co-regulation scores.

The i-th element of the score array represents the score assigned by the sklearn Lasso regressor to the regulatory relationship between the target gene and transcription factor i.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = Lasso_score(tfs,tg, alpha=0.01)
>>> scores
array([0.13825495, 0.94939204, 0.19118214])
grenadine.Inference.regression_predictors.SVR_score(X, y, **svr_parameters)[source]

Score predictor based on scikit-learn SVR (Support Vector Regression).

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
  • **svr_parameters – Named parameters for sklearn SVR regression
Returns:

co-regulation scores.

The i-th element of the score array represents the score assigned by the sklearn SVR regressor to the regulatory relationship between the target gene and transcription factor i.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = SVR_score(tfs,tg)
>>> scores
array([[-0.38156814,  0.28128811, -1.0230867 ]])
grenadine.Inference.regression_predictors.TIGRESS(X, y, nsplit=100, nstepsLARS=5, alpha=0.4, scoring='area')[source]

TIGRESS score predictor based on stability selection.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
  • nsplit (int) – number of splits applied, i.e., randomization tests; the higher the better
  • nstepsLARS (int) – number of steps of the LARS algorithm, i.e., number of non-zero coefficients to keep (Lars parameter)
  • alpha (float) – Noise multiplier coefficient. Each transcription factor expression is multiplied by a random variable $\in [\alpha, 1]$
  • scoring (str) – option used to score each possible link; only “area” and “max” are available
Returns:

co-regulation scores

The i-th element of the score array represents the score assigned by the TIGRESS stability selection procedure to the regulatory relationship between the target gene and transcription factor i.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = TIGRESS(tfs,tg)
>>> scores
array([349.   , 312.875, 588.125])
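
The role of the alpha parameter can be illustrated in isolation. The sketch below covers only the perturbation step and is not the TIGRESS implementation. Drawing the factor uniformly, and drawing one factor per transcription factor, are assumptions: the description only gives the interval. The name `perturb_tfs` is hypothetical.

```python
import random

def perturb_tfs(X, alpha=0.4, rng=random):
    """Scale each TF profile by its own factor drawn from [alpha, 1].

    ASSUMPTION: factors are uniform and drawn once per transcription factor.
    X maps TF id -> expression profile.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    factors = {tf: rng.uniform(alpha, 1.0) for tf in X}
    perturbed = {tf: [factors[tf] * v for v in profile] for tf, profile in X.items()}
    return perturbed, factors

X = {"tf1": [1.0, 2.0], "tf2": [3.0, 4.0]}
perturbed, factors = perturb_tfs(X, alpha=0.4, rng=random.Random(0))
```

In a stability selection scheme, a perturbation of this kind would be repeated for each of the nsplit resamplings before the regression is refitted; with alpha equal to 1 the data are left unchanged.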
grenadine.Inference.regression_predictors.XGENIE3(X, y, **rf_parameters)[source]

XGENIE3, score predictor function based on scikit-learn ExtraTreesRegressor.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
  • **rf_parameters – Named parameters for the sklearn ExtraTreesRegressor
Returns:

co-regulation scores.

The i-th element of the score array represents the score assigned by the ExtraTreesRegressor to the regulatory relationship between the target gene and transcription factor i.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = XGENIE3(tfs,tg)
>>> scores
array([0.24905241, 0.43503283, 0.31591477])
grenadine.Inference.regression_predictors.bagging_regressor(X, y, **bagging_parameters)[source]

Apply the bagging technique to a regression algorithm, based on scikit-learn BaggingRegressor.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
  • **bagging_parameters – Named parameters for the sklearn BaggingRegressor
Returns:

co-regulation scores.

The i-th element of the score array represents the average score assigned by the Base Regressor to the regulatory relationship between the target gene and transcription factor i.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from sklearn.svm import SVR
>>> np.random.seed(0)
>>> svr = SVR(kernel="linear")
>>> tfs = pd.DataFrame(np.random.randn(5,3),
               index =["c1","c2","c3","c4","c5"],
               columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> bagging_parameters = {"base_estimator":svr,
                          "n_estimators":100,
                          "max_samples":0.7}
>>> scores = bagging_regressor(tfs,tg,**bagging_parameters)
>>> scores
array([0.32978247, 0.3617295 , 0.28896647])
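
The averaging over bagged estimators described above can be sketched as follows. This is a minimal illustration, not grenadine's implementation: it assumes the per-estimator score is the absolute coefficient of a linear SVR, and the data, the variable names and the averaging rule are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.svm import SVR

# Synthetic data (assumption): the target depends mostly on the first TF
rng = np.random.RandomState(0)
X = rng.randn(30, 3)
y = 2.0 * X[:, 0] + 0.1 * rng.randn(30)

# The base regressor is passed positionally because its keyword name
# differs between scikit-learn versions (base_estimator vs estimator)
bag = BaggingRegressor(SVR(kernel="linear"), n_estimators=25,
                       max_samples=0.7, random_state=0)
bag.fit(X, y)

# Assumed scoring rule: mean absolute linear coefficient per TF
# over all fitted base regressors
bag_scores = np.mean(
    [np.abs(est.coef_).ravel() for est in bag.estimators_], axis=0
)
print(bag_scores)
```

With this construction the first transcription factor should receive the largest score, since it is the only one that drives the synthetic target.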
grenadine.Inference.regression_predictors.stability_randomizedlasso(X, y, **rl_parameters)[source]

Score predictor based on scikit-learn randomizedlasso stability selection.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
  • **rl_parameters – Named parameters for sklearn randomizedlasso
Returns:

co-regulation scores.

The i-th element of the score array represents the score assigned by the sklearn randomizedlasso stability selection to the regulatory relationship between the target gene and transcription factor i.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = stability_randomizedlasso(tfs,tg)
>>> scores
array([0.11 , 0.17 , 0.085])

grenadine.Inference.statistical_predictors module

This module infers co-expression Gene Regulatory Networks from gene expression data (RNAseq or Microarray). It implements several inference algorithms based on statistical predictors, using scipy-stats and scikit-learn.

grenadine.Inference.statistical_predictors.CLR(X, y, **mi_parameters)[source]

Score predictor function based on scikit-learn mutual_info_regression score.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
  • **mi_parameters – Named parameters for sklearn mutual_info_regression
Returns:

co-regulation scores.

The i-th element of the score array represents the score of the sklearn mutual_info_regression computation between target gene expression and the i-th transcription factor gene expression.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = CLR(tfs,tg)
>>> scores
array([6.66666667e-02, 1.16666667e-01, 2.22044605e-16])
grenadine.Inference.statistical_predictors.abs_pearsonr_coef(X, y)[source]

Score predictor function based on the scipy-stats absolute Pearson correlation.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
Returns:

co-regulation scores.

The i-th element of the score array represents the absolute value of the Pearson correlation between target gene expression and the i-th transcription factor gene expression.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = abs_pearsonr_coef(tfs,tg)
>>> scores
array([0.41724166, 0.02212467, 0.23708491])
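
A minimal sketch of how such a score could be computed with scipy, one transcription factor at a time. `abs_pearson_sketch` is a hypothetical helper written for illustration and is not grenadine's own code.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def abs_pearson_sketch(X, y):
    """Absolute Pearson correlation between each column of X and y.

    Hypothetical helper for illustration only.
    """
    # pearsonr returns (correlation, p-value); only the correlation is kept
    return np.array([abs(pearsonr(X[tf], y)[0]) for tf in X.columns])


rng = np.random.RandomState(0)
conditions = ["c1", "c2", "c3", "c4", "c5"]
tfs = pd.DataFrame(rng.randn(5, 3), index=conditions,
                   columns=["tf1", "tf2", "tf3"])
tg = pd.Series(rng.randn(5), index=conditions)

pearson_scores = abs_pearson_sketch(tfs, tg)
print(pearson_scores)
```

Taking the absolute value means that activators (positive correlation) and repressors (negative correlation) are ranked on the same scale, between 0 and 1.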
grenadine.Inference.statistical_predictors.abs_spearmanr_coef(X, y)[source]

Score predictor function based on the scipy-stats absolute Spearman correlation.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
Returns:

co-regulation scores.

The i-th element of the score array represents the absolute value of the Spearman rank correlation between target gene expression and the i-th transcription factor gene expression.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = abs_spearmanr_coef(tfs,tg)
>>> scores
array([0.5, 0.3, 0.3])
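
The Spearman coefficient is the Pearson correlation of the rank-transformed data. The sketch below checks that equivalence on random data; `abs_spearman_sketch` is a hypothetical helper and makes no claim about how grenadine computes its score internally.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr


def abs_spearman_sketch(X, y):
    """Absolute Spearman correlation between each column of X and y.

    Hypothetical helper for illustration only.
    """
    scores = []
    for tf in X.columns:
        rho, _ = spearmanr(X[tf], y)  # (correlation, p-value)
        scores.append(abs(rho))
    return np.array(scores)


rng = np.random.RandomState(0)
conditions = ["c1", "c2", "c3", "c4", "c5"]
tfs = pd.DataFrame(rng.randn(5, 3), index=conditions,
                   columns=["tf1", "tf2", "tf3"])
tg = pd.Series(rng.randn(5), index=conditions)

spearman_scores = abs_spearman_sketch(tfs, tg)

# Same quantity obtained as the Pearson correlation of the ranks
rank_scores = np.array(
    [abs(np.corrcoef(tfs[tf].rank(), tg.rank())[0, 1]) for tf in tfs.columns]
)
print(spearman_scores, rank_scores)
```

Because only ranks matter, the score is unchanged by any monotonic transformation of the expression values, such as a log transform.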
grenadine.Inference.statistical_predictors.energy_distance_score(X, y, **energy_distance_parameters)[source]

Score predictor function based on the scipy-stats energy distance between 1D distributions.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
  • **energy_distance_parameters – Named parameters for the scipy-stats energy distance
Returns:

co-regulation scores.

The i-th element of the score array represents the score between target gene expression and the i-th transcription factor gene expression.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = energy_distance_score(tfs,tg)
>>> scores
array([0.40613705, 0.6881455 , 0.72786711])
grenadine.Inference.statistical_predictors.f_regression_score(X, y)[source]

Score predictor function based on the scikit-learn f_regression score.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
Returns:

co-regulation scores.

The i-th element of the score array represents the score of the f_regression linear test between target gene expression and the i-th transcription factor gene expression.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = f_regression_score(tfs,tg)
>>> scores
array([0.63235967, 0.00146922, 0.17867071])
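
For a univariate linear test with centering, the F statistic returned by scikit-learn's f_regression is a function of the Pearson correlation r and the number of conditions n: F = r² / (1 − r²) · (n − 2). The sketch below verifies this on random data. It illustrates the scikit-learn function only; variable names are illustrative.

```python
import numpy as np
from sklearn.feature_selection import f_regression

rng = np.random.RandomState(0)
X = rng.randn(5, 3)   # 5 conditions, 3 transcription factors
y = rng.randn(5)      # target gene expression

# f_regression returns the F statistics and their p-values
f_stat, p_values = f_regression(X, y)

# Recompute the F statistics from the Pearson correlations
n = X.shape[0]
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
f_from_r = r**2 / (1 - r**2) * (n - 2)

print(f_stat)
print(f_from_r)
```

The F statistic therefore ranks transcription factors in the same order as the absolute Pearson correlation. The example values in this section appear consistent with the relation for n = 5: a correlation of 0.41724166 gives an F statistic of about 0.632.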
grenadine.Inference.statistical_predictors.kendalltau_score(X, y, **kendalltau_parameters)[source]

Score predictor function based on the scipy-stats Kendall’s tau correlation measure.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
  • **kendalltau_parameters – Named parameters for the scipy-stats Kendall’s tau correlation measure
Returns:

co-regulation scores.

The i-th element of the score array represents the score between target gene expression and the i-th transcription factor gene expression.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = kendalltau_score(tfs,tg)
>>> scores
array([0.8487997 , 1.30065214, 0.20467198])
grenadine.Inference.statistical_predictors.mannwhitneyu_score(X, y, **mannwhitneyu_parameters)[source]

Score predictor function based on the scipy-stats Mann-Whitney rank test.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
  • **mannwhitneyu_parameters – Named parameters for the scipy-stats Mann-Whitney rank test
Returns:

co-regulation scores.

The i-th element of the score array represents the score between target gene expression and the i-th transcription factor gene expression.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = mannwhitneyu_score(tfs,tg)
>>> scores
array([1.52213525, 0.47101693, 0.3795872 ])
grenadine.Inference.statistical_predictors.theilslopes_score(X, y, **theilslopes_parameters)[source]

Score predictor function based on the scipy-stats Theil-Sen robust slope estimator.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
  • **theilslopes_parameters – Named parameters for the scipy-stats Theil-Sen robust slope estimator
Returns:

co-regulation scores.

The i-th element of the score array represents the score between target gene expression and the i-th transcription factor gene expression.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = theilslopes_score(tfs,tg)
>>> scores
array([0.92309299, 0.90933202, 0.26451817])
grenadine.Inference.statistical_predictors.wasserstein_distance_score(X, y, **wasserstein_distance_parameters)[source]

Score predictor function based on the scipy-stats Wasserstein distance between 1D distributions.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
  • **wasserstein_distance_parameters – Named parameters for the scipy-stats Wasserstein distance
Returns:

co-regulation scores.

The i-th element of the score array represents the score between target gene expression and the i-th transcription factor gene expression.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = wasserstein_distance_score(tfs,tg)
>>> scores
array([0.36457586, 0.72057084, 0.81207932])
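
The scipy Wasserstein distance compares two 1D empirical distributions, so it ignores which condition each value came from. The sketch below shows the underlying scipy call on random data and two of its properties. It is an illustration of the scipy function, not of grenadine's scoring code, and the variable names are hypothetical.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.RandomState(0)
tf_expr = rng.randn(5)   # expression of one transcription factor
tg_expr = rng.randn(5)   # expression of the target gene

d = wasserstein_distance(tf_expr, tg_expr)

# For two samples of equal size, the distance equals the mean
# absolute difference between the sorted values
d_sorted = np.mean(np.abs(np.sort(tf_expr) - np.sort(tg_expr)))

# Shuffling the conditions of one vector leaves the distance unchanged
d_shuffled = wasserstein_distance(rng.permutation(tf_expr), tg_expr)

print(d, d_sorted, d_shuffled)
```

Unlike a correlation, a small distance here means that the two expression distributions are similar, regardless of whether the genes vary together across conditions.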
grenadine.Inference.statistical_predictors.wilcoxon_score(X, y, **wilcoxon_parameters)[source]

Score predictor function based on the scipy-stats Wilcoxon signed-rank test.

Parameters:
  • X (pandas.DataFrame) – Transcription factor gene expressions where rows are experimental conditions and columns are transcription factors
  • y (pandas.Series) – Target gene expression vector where rows are experimental conditions
  • **wilcoxon_parameters – Named parameters for the scipy-stats Wilcoxon signed-rank test
Returns:

co-regulation scores.

The i-th element of the score array represents the score between target gene expression and the i-th transcription factor gene expression.

Return type:

numpy.array

Examples

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> tfs = pd.DataFrame(np.random.randn(5,3),
                       index =["c1","c2","c3","c4","c5"],
                       columns=["tf1","tf2","tf3"])
>>> tg = pd.Series(np.random.randn(5),index=["c1","c2","c3","c4","c5"])
>>> scores = wilcoxon_score(tfs,tg)
>>> scores
array([1.36537718, 0.64797987, 0.30086998])

Module contents

This submodule contains various data-driven scoring functions to infer GRNs from gene expression datasets.