Transmission Based Conditional Logistic Model for Testing Main and Interaction Effects
1. Introduction
To avoid false positive results due to confounding, several pedigree-based genetic association methods have been proposed. The transmission disequilibrium test (TDT) introduced by Spielman et al. [1] is a family-based test that requires only trios, each consisting of two parents and one affected offspring. The TDT was later generalized to multi-allelic markers [2] [3].
For many diseases, especially those with a late age of onset, parental information is not available, and the classical TDT for triad data cannot be applied. Several methods for incomplete families have been proposed, e.g. S-TDT [4] and Sibass [5] for siblings, and the PDT (pedigree disequilibrium test) for general pedigrees [6].
Single nucleotide polymorphisms (SNPs) are highly abundant, stable genetic markers in humans. TDT methods for haplotype transmission using multiple tightly linked loci were proposed [7] [8]. In these approaches, the individuals’ underlying haplotypes must be reconstructed using observed genotypes even if there are no missing genotype data.
Genotype relative risk may vary across levels of environmental exposure; that is, there may be interactions between genes and covariates. Taub et al. [9] derived an extension of the genotypic TDT to assess gene-environment interactions for binary environmental variables. However, the joint effects of genotype and exposure or environmental covariates have not been considered in the classical allelic TDT. In the present paper, an allele or haplotype transmission based model is built within the conditional logistic regression framework to detect and assess main effects and gene-environment interactions.
2. Method
2.1. Transmission Based Model
Let H denote the number of alleles or haplotypes for one or multiple tightly linked loci. The collection of all possible
genotypes is
.
For an affected offspring with genotype g and covariate vector
, the joint effects of gene and covariates are considered in the genotype risk relative to reference genotype
, i.e.
(1)
where A denotes the event of being affected. Let
denote the genotypes of the father, the mother and the affected offspring, respectively. Under some conditional independence assumptions,
and
, then
(2)
where
is the collection of all possible genotypes of a child given both parents’ genotypes. Let
. Then
(3)
Suppose that the genotype relative risk satisfies the robust multiplicative model
(4)
and
with
, and then
. (5)
Therefore, the joint transmission probability for both parents is
(6)
where
(7)
Equation (6) means that paternal and maternal transmissions are independent, and that the transmission probability for a parent (father or mother) with genotype
is
. (8)
For a homozygous parent, transmission probability (8) is 1. For a heterozygous parent with genotype
, we introduce dummy variables
(9)
Then the transmission probability for a heterozygous parent with genotype
is just
(10)
Equation (10) can be regarded as a conditional logistic model for
matched pairs, where
is the number of heterozygous parents. Homozygous parents are excluded because they make no contribution to the likelihood. In such matched data, the affected offspring with predictors (
) is taken as the case, and the pseudo-offspring with the non-transmitted genotype, with predictors (
), is taken as the matched control. The parameters
and
measure the main effects of the alleles and the gene-covariate interaction effects, respectively. However, the main effects of the covariates X cannot be estimated, because X takes the same value for the case and its matched control.
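As a minimal illustration of the matched-pair construction, the Python sketch below assumes that the allele relative risk is log-linear in the covariates, so that the transmission probability for a heterozygous parent takes a logistic form. This functional form, the function names `transmission_prob` and `matched_pair`, and the convention that the first listed allele is the reference are assumptions made for illustration; they are not taken from Equations (4)-(10).

```python
import math


def transmission_prob(beta_t, gamma_t, beta_n, gamma_n, x):
    """Probability that a heterozygous parent transmits allele t rather than n.

    Assumes (hypothetically) an allele relative risk exp(beta + gamma'x),
    where x is the covariate vector of the affected offspring.
    """
    eta_t = beta_t + sum(g * xi for g, xi in zip(gamma_t, x))
    eta_n = beta_n + sum(g * xi for g, xi in zip(gamma_n, x))
    # exp(eta_t) / (exp(eta_t) + exp(eta_n)), written in a stable form
    return 1.0 / (1.0 + math.exp(eta_n - eta_t))


def matched_pair(transmitted, nontransmitted, x, alleles):
    """Predictor rows for the case and the pseudo-control of one heterozygous parent.

    alleles[0] is treated as the reference allele and receives no dummy variable.
    """
    def row(allele):
        z = [1.0 if allele == a else 0.0 for a in alleles[1:]]
        # allele dummies followed by dummy-by-covariate interaction terms
        return z + [zi * xi for zi in z for xi in x]

    return row(transmitted), row(nontransmitted)


# One heterozygous M/T parent who transmitted T to an offspring with x = [1.0]
case, control = matched_pair("T", "M", [1.0], ["M", "T"])
```

The covariate enters only through the interaction columns, which is why its own main effect cancels between the case and the pseudo-control.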
The maximum likelihood estimates (MLEs) of the parameters can be obtained with standard software for the conditional logistic model or the stratified Cox proportional hazards model, such as the PHREG (proportional hazards regression) procedure in SAS, or coxph in the R package “survival”.
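For readers without access to SAS or R, the fitting step can be sketched in plain Python. For 1:1 matched pairs the conditional likelihood depends only on the difference between the case and control predictor rows, so the sketch below maximizes the (optionally weighted) sum of log-logistic terms by Newton-Raphson. The function names `solve` and `fit_pairs` are hypothetical, and the code is an illustration under these assumptions rather than the procedure used in this paper.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_pairs(diffs, weights=None, n_iter=50, tol=1e-10):
    """Maximize sum_r w_r * log(sigmoid(d_r . theta)) by Newton-Raphson.

    diffs[r] is the case row minus the control row of matched pair r.
    """
    p = len(diffs[0])
    w = weights or [1.0] * len(diffs)
    theta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]  # negative Hessian
        for d, wr in zip(diffs, w):
            eta = sum(di * ti for di, ti in zip(d, theta))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for j in range(p):
                grad[j] += wr * (1.0 - mu) * d[j]
                for k in range(p):
                    hess[j][k] += wr * mu * (1.0 - mu) * d[j] * d[k]
        step = solve(hess, grad)
        theta = [t + s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return theta


# 30 heterozygous parents transmit the non-reference allele, 10 transmit the reference
theta_hat = fit_pairs([[1.0]] * 30 + [[-1.0]] * 10)
```

With a single allele dummy and no covariates, the estimate is the log of the ratio of transmitted to non-transmitted counts, which links the model back to the classical TDT.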
2.2. EM Algorithm for Dealing with Ambiguities in Allele Transmission
Haplotype phase is often uncertain for linked multi-locus genotypes. There may be several haplotype pairs compatible with an observed genotype. In addition, even when only one locus is considered, parental genotypes might be missing, especially for late-onset diseases. Therefore, it can be ambiguous which allele or haplotype was transmitted from a parent.
Suppose there are N parents-case trios, and hence 2N parents in all. The genotypes of the r-th parent and his/her offspring are denoted by
, and the covariate vector for the offspring is
. The log-likelihood is
(11)
where
is the set of haplotype groups
which haplotype pair
is compatible with parent genotype
.
It is difficult to find the MLEs of parameter
directly. However, if we take the underlying haplotype pairs as “missing data” in the expectation-maximization (EM) algorithm, an iterative procedure can be used to find the MLE. Given the current estimate, the expected complete-data log-likelihood in the E (expectation) step is given by
, (12)
where
(13)
can be regarded as the log-likelihood of a weighted conditional logistic model for matched cases and controls. However, the haplotype frequencies
are often unknown and must be estimated too. Therefore, starting with initial values
and
, the (t + 1)-th iteration of the EM algorithm consists of two steps.
Step 1: Calculate the weights
, and obtain MLE of
via weighted conditional logistic model for matched case-pseudocontrols.
Step 2: Update haplotype frequencies
(14)
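The two steps can be sketched in Python as follows. The sketch makes simplifying assumptions that are not taken from Equations (13) and (14): each compatible configuration of a parent is written as a (transmitted, non-transmitted) haplotype pair, its weight is taken to be proportional to a Hardy-Weinberg prior times the transmission probability, and the frequencies are updated by weighted haplotype counting. The names `e_step_weights` and `update_freqs`, and the dictionary `log_rr` of log relative risks, are hypothetical.

```python
import math


def e_step_weights(candidates, freqs, log_rr):
    """Posterior weights of the configurations compatible with one parent.

    candidates: list of (transmitted, non_transmitted) haplotype pairs
    freqs:      current haplotype frequencies (dict)
    log_rr:     current log relative risk of each haplotype for this offspring
    """
    raw = []
    for t, n in candidates:
        # assumed Hardy-Weinberg prior for the parental haplotype pair
        prior = freqs[t] * freqs[n] * (1.0 if t == n else 2.0)
        # a homozygous parent transmits its haplotype with probability 1
        if t == n:
            p_trans = 1.0
        else:
            p_trans = 1.0 / (1.0 + math.exp(log_rr[n] - log_rr[t]))
        raw.append(prior * p_trans)
    total = sum(raw)
    return [v / total for v in raw]


def update_freqs(all_candidates, all_weights, haplotypes):
    """Weighted haplotype counting over all parents (assumed form of the update)."""
    counts = {h: 0.0 for h in haplotypes}
    for cands, ws in zip(all_candidates, all_weights):
        for (t, n), w in zip(cands, ws):
            counts[t] += w
            counts[n] += w
    total = sum(counts.values())
    return {h: c / total for h, c in counts.items()}


# One M/T parent whose transmitted allele is ambiguous
cands = [("T", "M"), ("M", "T")]
w = e_step_weights(cands, {"M": 0.5, "T": 0.5}, {"M": 0.0, "T": math.log(3.0)})
f = update_freqs([cands], [w], ["M", "T"])
```

In a full iteration, the weights from all parents would be passed to a weighted conditional logistic fit before the frequencies are updated, and the two steps would be repeated until the estimates change by less than the chosen precision.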
Likelihood ratio tests (LRTs) can be used for model selection and to test for gene effects and gene-covariate interactions. For example, if we consider only one SNP and one covariate, we can construct three models: the null model, in which
, the model without interaction in which
, and the full model with interaction. Then we can use
to test for the gene effect and
to test for the gene-covariate interaction.
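The likelihood ratio comparison of two nested models can be sketched as follows. The function name `lrt` is hypothetical, the log-likelihood values in the usage line are placeholders rather than results from this paper, and the chi-square reference distribution is implemented only for 1 and 2 degrees of freedom, which covers the one-SNP, one-covariate example.

```python
import math


def lrt(loglik_reduced, loglik_full, df):
    """Likelihood ratio statistic and its chi-square p-value (df = 1 or 2 only)."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    s = max(stat, 0.0)  # guard against tiny negative values from rounding
    if df == 1:
        # survival function of chi-square with 1 df
        p_value = math.erfc(math.sqrt(s / 2.0))
    elif df == 2:
        # survival function of chi-square with 2 df
        p_value = math.exp(-s / 2.0)
    else:
        raise ValueError("this sketch supports df = 1 or df = 2 only")
    return stat, p_value


# Placeholder log-likelihoods for a reduced and a full model
stat, p_value = lrt(-100.0, -98.0, df=1)
```

The degrees of freedom equal the number of parameters set to zero in the reduced model, so comparing the no-interaction model with the full model for one SNP and one covariate uses df = 1.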
3. Application
Essential hypertension is a multifactorial disorder that is influenced by genetic and environmental factors. The angiotensinogen (AGT) gene of the renin-angiotensin system (RAS) is considered an important element in blood pressure regulation. Some studies have related the M/T polymorphism at position 235 in exon 2 of the AGT gene (M235T) to essential hypertension in white Europeans, although the findings are controversial [10] [11].
Our analysis includes 126 families with at least one hypertensive sibling, comprising a total of 434 siblings from the Hong Kong Chinese population. As shown in Table 1, 59.5% of the families had two or three siblings and a further 33.4% had four or five siblings; parents were not available in most of the families (86.5%). The sibling information is very useful for reducing the uncertainty about transmission from a parent with unknown genotype.
Table 1. Nuclear families in the analysis.
The AGT gene polymorphism M235T and the covariate gender are introduced into the proposed model. The EM iterative procedure of Section 2.2 is started from the initial values
and
with precision ε = 10⁻⁵. After it stops, the MLEs of the parameters are
and
, where allele M is the reference allele. To detect the effect of M235T and the M235T-by-sex interaction effect, we perform likelihood ratio tests (LRTs) with the statistics
and
, respectively. The log-likelihoods with
,
, and
yield
(p < 0.001) and
(p = 0.046).
The results show that M235T is associated with hypertension and that there is an interaction between M235T and gender. The relative risk for allele T is
for males and
for females. This finding is consistent with several other reports of gene-by-sex interactions for insulin-related traits and demonstrates the importance of considering interactions in the search for related genes [12] [13].
4. Discussions
Gene-covariate interactions are incorporated into the allele/haplotype relative risk and hence into the transmission probability of this transmission model. Missing parental genotypes and multiple tightly linked loci are allowed. For missing genotypes or multi-locus genotype data, the underlying haplotypes or alleles are treated as missing data, and the weighted conditional logistic models are fitted via the EM algorithm.
As an application, data from 126 nuclear families from the Hong Kong Chinese population are analysed with the haplotype-based model to detect the association between M235T in the angiotensinogen gene and essential hypertension (ESH). The results suggest that 235T is a risk allele for ESH in Hong Kong Chinese people and confers a higher risk in men than in women: the 235T allele was transmitted from heterozygous parents to male ESH patients more preferentially than to female patients.
Acknowledgements
This work was supported by the Guangdong Basic and Applied Basic Research Foundation (2020B1515310007) and the Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University (2020B1212060032).