Detailed list of models¶
This page lists all models included in the Epistasis Package.
- EpistasisLinearRegression: estimate epistatic coefficents in a linear genotype-phenotype map.
- EpistasisRidge: estimate epistatic coefficients using L2-regularization in a linear genotype-phenotype map
- EpistasisLasso: estimate sparse epistatic coefficients using L1-regularization in a linear genotype-phenotype map
- EpistasisElasticNet: estimate sparse epistatic coefficients, mixing L1- and L2-regularization, in a linear genotype-phenotype map
- EpistasisNonlinearRegression: estimates nonlinear scale in genotype-phenotype map using an arbitrary defined nonlinear function.
- EpistasisSpline: estimates nonlinear scale in genotype-phenotype map using a spline.
- EpistasisPowerTransform: estimates nonlinear scale in genotype-phenotype map using a power transform.
- EpistasisLogisticRegression: use logistic regression to classify phenotypes as dead/alive.
- EpistasisEnsembleRegression: use a statistical ensemble of “states” to decompose variation in a genotype-phenotype map.
EpistasisLinearRegression¶
A linear, high-order epistasis model. This uses an ordinary least-squares regression to estimate high-order, epistatic coefficients in an arbitrary genotype-phenotype map. Simple define the order of the model.
from gpmap import GenotypePhenotypeMap
from epistasis.models import EpistasisLinearRegression
wildtype = 'AA'
genotypes = ['AA', 'AT', 'TA', 'TT']
phenotypes = [0.1, 0.2, 0.7, 1.2]
# Read genotype-phenotype map.
gpm = GenotypePhenotypeMap(wildtype, genotypes, phenotypes)
# Initialize the data.
model = EpistasisLinearRegression(order=2)
# Add Genotype-phenotype map data.
model.add_gpm(gpm)
# Fit the model.
model.fit()
EpistasisRidge¶
A L2-norm epistasis model for estimating sparse epistatic coefficients. The optimization function imposes a penalty on the number of coefficients and finds the model that maximally explains the data while using the fewest coefficients.
from gpmap import GenotypePhenotypeMap
from epistasis.models import EpistasisRidge
wildtype = 'AA'
genotypes = ['AA', 'AT', 'TA', 'TT']
phenotypes = [0.1, 0.2, 0.7, 1.2]
# Read genotype-phenotype map.
gpm = GenotypePhenotypeMap(wildtype, genotypes, phenotypes)
# Initialize the data.
model = EpistasisRidge(order=2, alpha=0.1)
# Add Genotype-phenotype map data.
model.add_gpm(gpm)
# Fit the model.
model.fit()
EpistasisLasso¶
A L1-norm epistasis model for estimating sparse epistatic coefficients. The optimization function imposes a penalty on the number of coefficients and finds the model that maximally explains the data while using the fewest coefficients.
from gpmap import GenotypePhenotypeMap
from epistasis.models import EpistasisLasso
wildtype = 'AA'
genotypes = ['AA', 'AT', 'TA', 'TT']
phenotypes = [0.1, 0.2, 0.7, 1.2]
# Read genotype-phenotype map.
gpm = GenotypePhenotypeMap(wildtype, genotypes, phenotypes)
# Initialize the data.
model = EpistasisLasso(order=2, alpha=0.1)
# Add Genotype-phenotype map data.
model.add_gpm(gpm)
# Fit the model.
model.fit()
EpistasisElasticNet¶
A L1-norm+L2-norm epistasis model for estimating sparse epistatic coefficients. The optimization function imposes a penalty on the number of coefficients and finds the model that maximally explains the data while using the fewest coefficients.
from gpmap import GenotypePhenotypeMap
from epistasis.models import EpistasisElasticNet
wildtype = 'AA'
genotypes = ['AA', 'AT', 'TA', 'TT']
phenotypes = [0.1, 0.2, 0.7, 1.2]
# Read genotype-phenotype map.
gpm = GenotypePhenotypeMap(wildtype, genotypes, phenotypes)
# Initialize the data.
model = EpistasisElasticNet(order=2, alpha=0.1)
# Add Genotype-phenotype map data.
model.add_gpm(gpm)
# Fit the model.
model.fit()
EpistasisNonlinearRegression¶
A nonlinear, high-order epistasis model. This uses nonlinear, least-squares
regression (provided by lmfit
) to estimate high-order, epistatic
coefficients in an arbitrary genotype-phenotype map.
- This models has three steps:
- Fit an additive, linear regression to approximate the average effect of individual mutations.
- Fit the nonlinear function to the observed phenotypes vs. the additive phenotypes estimated in step 1. This function is defined by the user as a callable python function
from gpmap import GenotypePhenotypeMap
from epistasis.models import EpistasisLinearRegression
wildtype = 'AA'
genotypes = ['AA', 'AT', 'TA', 'TT']
phenotypes = [0.1, 0.2, 0.7, 1.2]
# Read genotype-phenotype map.
gpm = GenotypePhenotypeMap(wildtype, genotypes, phenotypes)
def func(x, A):
return np.exp(A * x)
def reverse(y, A):
return np.log(x) / A
# Initialize the data.
model = EpistasisNonlinearRegression(function=func, A=1)
# Add Genotype-phenotype map data.
model.add_gpm(gpm)
# Fit the model.
model.fit()
EpistasisSpline¶
Use Spline function, via nonlinear least-squares regression, to estimate epistatic coefficients and the nonlinear scale in a nonlinear genotype-phenotype map.
- Like the nonlinear model, this model has three steps:
- Fit an additive, linear regression to approximate the average effect of individual mutations.
- Fit the nonlinear function to the observed phenotypes vs. the additive phenotypes estimated in step 1.
from gpmap import GenotypePhenotypeMap
from epistasis.models import EpistasisSpline
wildtype = 'AA'
genotypes = ['AA', 'AT', 'TA', 'TT']
phenotypes = [0.1, 0.2, 0.7, 1.2]
# Read genotype-phenotype map.
gpm = GenotypePhenotypeMap(wildtype, genotypes, phenotypes)
# Initialize the data.
model = EpistasisSpline(k=3)
# Add Genotype-phenotype map data.
model.add_gpm(gpm)
# Fit the model.
model.fit()
EpistasisPowerTransform¶
Use power-transform function, via nonlinear least-squares regression, to estimate epistatic coefficients and the nonlinear scale in a nonlinear genotype-phenotype map.
- Like the nonlinear model, this model has three steps:
- Fit an additive, linear regression to approximate the average effect of individual mutations.
- Fit the nonlinear function to the observed phenotypes vs. the additive phenotypes estimated in step 1.
Methods are described in the following publication:
Sailer, Z. R. & Harms, M. J. ‘Detecting High-Order Epistasis in Nonlinear Genotype-Phenotype Maps’. Genetics 205, 1079-1088 (2017).
from gpmap import GenotypePhenotypeMap
from epistasis.models import EpistasisPowerTransform
wildtype = 'AA'
genotypes = ['AA', 'AT', 'TA', 'TT']
phenotypes = [0.1, 0.2, 0.7, 1.2]
# Read genotype-phenotype map.
gpm = GenotypePhenotypeMap(wildtype, genotypes, phenotypes)
# Initialize the data.
model = EpistasisPowerTransform(lmbda=1, A=1, B=1)
# Add Genotype-phenotype map data.
model.add_gpm(gpm)
# Fit the model.
model.fit()
EpistasisLogisticRegression¶
A high-order epistasis regression that classifies genotypes as viable/nonviable (given some threshold).
from epistasis.models import EpistasisLogisticRegression
wildtype = 'AA'
genotypes = ['AA', 'AT', 'TA', 'TT']
phenotypes = [0, .2, .1, 1]
# Read genotype-phenotype map.
gpm = GenotypePhenotypeMap(wildtype, genotypes, phenotypes)
# Initialize the data.
model = EpistasisLogisticRegression(threshold=.1)
# Add Genotype-phenotype map data.
model.add_gpm(gpm)
# Fit the model.
model.fit()
EpistasisEnsembleRegression¶
A regression object that models phenotypes as a statistical (Boltmann-weighted) average of “states”. Mutations are modeled as having different effects in each state.
from gpmap import GenotypePhenotypeMap
from epistasis.models import EpistasisEnsembleRegression
wildtype = 'AA'
genotypes = ['AA', 'AT', 'TA', 'TT']
phenotypes = [0.1, 0.2, 0.7, 1.2]
# Read genotype-phenotype map.
gpm = GenotypePhenotypeMap(wildtype, genotypes, phenotypes)
# Initialize the data.
model = EpistasisEnsembleRegression(order=1, nstates=1)
# Add Genotype-phenotype map data.
model.add_gpm(gpm)
# Fit the model.
model.fit()
# Print effects in state A.
print(model.state_A.epistasis.values)