Prominent Strain-Condition Interactions in Yeast Transcripts

Gene-environment interactions occur when the effect of a genetic variant differs across environments. Representing an intermediate between the “nature and nurture” sides of phenotypic variation, these interactions are important contributors to the development of complex phenotypes. Many studies in humans and model organisms have attempted to detect these interactions, most of them using genome-wide association studies (GWAS) or candidate gene association studies (CGAS). These methods, however, have done little to elucidate the molecular mechanisms of gene-environment interactions.

Expression quantitative trait locus (eQTL) mapping is a powerful extension of standard quantitative trait mapping that has shown promise for studying these interactions. As its name suggests, eQTL mapping associates variation in gene expression with genetic polymorphisms. With modern technology, thousands of transcripts can be measured simultaneously across different genetic backgrounds and environmental conditions, which makes eQTL mapping well suited to difficult-to-study phenomena such as gene-environment interactions.

Building on existing findings about gene-environment interactions in other organisms, Smith and Kruglyak set out to characterize the overall genetic architecture of gene-environment interactions in yeast. They focused on growth on two carbon sources, glucose and ethanol. These conditions provide a strong environmental contrast, because yeast metabolizes the two compounds through largely opposing pathways (fermentation for glucose and aerobic respiration for ethanol). The researchers also narrowed the genetic component by focusing on two strains, BY and RM. Given the molecular differences between the conditions and between the strains, they hypothesized that most of the measured variance would be due to strain and condition effects. However, based on the literature on gene-environment interactions, they also expected the strain-by-condition interaction to make a significant contribution to phenotypic variance.

The two parental strains, BY and RM, and 109 segregant strains derived from a cross between them were grown in glucose and in ethanol. Expression in each strain was profiled by hybridizing extracted mRNA to DNA microarrays. A hybridization standard was created by mixing equal amounts of mRNA from both parents grown in both conditions.

Smith and Kruglyak’s experiments produced two major conclusions. First, their analyses of variance (ANOVA) showed that yeast transcript levels are influenced by genetic (strain) effects, environmental (condition) effects, and interactions between the two (Figure 1). Although strain and condition effects dominate the variance, the strain-by-condition interactions are still significant. Second, the study characterized local and distant linkages. Local linkages were more stable and less dependent on the environment, and they usually affected expression in both conditions in the same way. Distant linkages, by contrast, were more variable and environment-dependent, and they usually affected expression in only one condition. Overall, most gene-environment interactions mapped to distant linkages.

Figure 1. The relative proportion of strain-condition, strain, and condition variance for all transcripts where these three components accounted for more than 50% of total variance. Insets illustrate two-factor plots for representative transcripts. The averages of the BY and RM strains are in orange and purple, respectively, with error bars indicating standard deviation.
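The variance partitioning summarized in Figure 1 can be illustrated with a two-way ANOVA. The sketch below is a minimal, hypothetical example and not Smith and Kruglyak’s actual pipeline, which also used segregant genotypes. It assumes a balanced design with replicate measurements of one transcript in every strain-condition combination. For each source (strain, condition, their interaction, and residual error), it reports that source’s share of the total sum of squares. The function name and the example values are invented for illustration.

```python
import numpy as np

def partition_variance(y):
    """Two-way ANOVA sums of squares for a balanced strain x condition design.

    y has shape (n_strains, n_conditions, n_replicates). Returns each
    component's fraction of the total sum of squares (they sum to 1).
    """
    y = np.asarray(y, dtype=float)
    a, b, r = y.shape
    grand = y.mean()
    strain_means = y.mean(axis=(1, 2))   # one mean per strain
    cond_means = y.mean(axis=(0, 2))     # one mean per condition
    cell_means = y.mean(axis=2)          # one mean per strain-condition cell

    ss_strain = b * r * np.sum((strain_means - grand) ** 2)
    ss_cond = a * r * np.sum((cond_means - grand) ** 2)
    # Interaction: what is left of each cell mean after additive effects
    interaction = cell_means - strain_means[:, None] - cond_means[None, :] + grand
    ss_inter = r * np.sum(interaction ** 2)
    ss_error = np.sum((y - cell_means[:, :, None]) ** 2)
    ss_total = np.sum((y - grand) ** 2)
    return {
        "strain": ss_strain / ss_total,
        "condition": ss_cond / ss_total,
        "strain x condition": ss_inter / ss_total,
        "residual": ss_error / ss_total,
    }

# Hypothetical expression values for one transcript:
# axis 0 = strain (BY, RM), axis 1 = condition (glucose, ethanol), axis 2 = replicate
example = [
    [[1.0, 1.2], [1.1, 0.9]],   # BY: glucose, ethanol
    [[1.0, 1.1], [3.0, 3.2]],   # RM: glucose, ethanol (higher only in ethanol)
]
for component, share in partition_variance(example).items():
    print(f"{component}: {share:.2f}")
```

In this toy transcript, RM expression rises only in ethanol, so a large share of the variance lands on the interaction term. That is the pattern the Figure 1 insets depict.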

These results highlight the importance of considering gene-environment interactions in linkage studies, because ignoring the interaction term can bias mapping toward certain loci and compromise the validity of the results. Even though gene-environment interactions “play a dominant role in a minority of traits” (Smith & Kruglyak, 2008), those traits can still contribute substantially to overall phenotypic variance. This is particularly relevant to lifestyle recommendations for human diseases in which such interactions are known to contribute, such as heart disease, depression, and cancer.


Smith, E. N., & Kruglyak, L. (2008). Gene-Environment Interaction in Yeast Gene Expression. PLoS Biology, 6(4), 0812-0824.

The difficulty with mapping QTL traits

Genes are a complicated area of study. A person’s genotype (the sequence of nucleotides at the loci that influence a trait) may suggest one outcome, while their phenotype (the observable manifestation of that genotype) may show something different. One reason is that a single trait can be influenced by many loci. Quantitative genetics describes phenotypes that are shaped by many genetic factors as well as by environmental factors. Each genomic region that contributes to variation in such a phenotype is called a quantitative trait locus (QTL), and a single quantitative trait is usually controlled by several QTL.

Identifying the genes behind a QTL is notoriously difficult. Many different genes can contribute to one phenotype, combinations of genes can influence expression, and environmental factors also affect characterization. In “Dissecting the architecture of a quantitative trait locus in yeast,” researchers sought to determine the specific genes behind the high-temperature-growth (Htg) phenotype in Saccharomyces cerevisiae. Htg varies by degree among strains, making it a good trait for testing new approaches to studying quantitative traits. The researchers used genome-wide mapping and reciprocal-hemizygosity analysis, and their results highlight key problems with QTL mapping.

The Htg phenotype was characterized with a colony-size assay and a quantitative competition assay. These showed that strain S288c grows poorly at high temperature (Htg-) compared with strain YJM145 (Htg+) (Steinmetz 2002). The hybrid of the two strains grew even better than either homozygous parent, a case of heterosis, which suggests that the Htg phenotype is not simply additive.

The researchers first sought to understand how the trait is inherited. They crossed isogenic strains of YJM145 and S288c and, in another colony assay, found that 104 of 960 progeny segregants (about 1 in 9) were Htg+. From this ratio they estimated that about three genes were involved (1/9 ≈ (1/2)^3.2), while recognizing that the inheritance of quantitative traits is difficult to predict. They then relied on QTL mapping to detect which genes are the major contributors.
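The “about three genes” estimate comes from solving (1/2)^n = p for n, where p is the observed fraction of Htg+ segregants. The minimal sketch below assumes the simple model implied by that arithmetic: Htg+ requires the favourable allele at every one of n unlinked loci, each inherited with probability 1/2. The function name is hypothetical.

```python
import math

def estimate_loci(n_positive, n_total):
    """Solve (1/2)**n = n_positive / n_total for n.

    Assumes Htg+ requires the favourable allele at each of n unlinked,
    independently segregating loci (probability 1/2 each).
    """
    p = n_positive / n_total
    return math.log(p) / math.log(0.5)

print(round(104 / 960, 3), round(estimate_loci(104, 960), 2))  # 0.108 3.21
```

Real QTL rarely fit this all-or-nothing model, which is why the estimate serves only as a starting point for mapping.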

QTL mapping was conducted by hybridizing total genomic DNA from the isogenic YJM and S288c strains to high density oligonucleotide arrays.


They found 3,444 biallelic markers with decreased hybridization signal (i.e., positions where the YJM145 allele differs from the S288c sequence). They then genotyped 19 Htg+ segregants from the YJM145 × S288c cross on a second set of arrays to determine which parent each allele came from. Among these 19 segregants, two regions showed evidence of non-random segregation: an 8.1 kb region on chromosome 16 and a 51.6 kb region on chromosome 14.

(Figure: probability that alleles segregated randomly.)
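“Non-random segregation” can be made concrete with a binomial calculation. Under random segregation, each segregant inherits the YJM145 allele at a marker with probability 1/2. The chance that many of the 19 Htg+ segregants share that allele by luck alone is therefore a binomial tail. The count used below is hypothetical and is not taken from the paper.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p).

    With p = 0.5 this is the one-sided chance that k or more of n
    segregants inherit the same parental allele under random segregation.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical: YJM145 allele seen in 16 of the 19 Htg+ segregants
print(f"{prob_at_least(16, 19):.4f}")  # 0.0022
```

A genome-wide scan tests thousands of markers, so a real analysis would also need a multiple-testing correction before calling a region significant.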

They then calculated the relative risk for each region (2.1 and 30.6, respectively). They concluded that the chromosome 14 region was the better candidate for mapping the QTL, given its 87.5% association with the Htg+ phenotype.
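Relative risk compares how often segregants carrying a given allele are Htg+ with how often segregants lacking it are Htg+. The sketch below uses the standard risk-ratio definition on a 2×2 table. It assumes this is the definition the study used, and the counts are hypothetical.

```python
def relative_risk(pos_with, neg_with, pos_without, neg_without):
    """Risk ratio from a 2x2 table of segregants.

    pos_with / neg_with:       Htg+ / Htg- segregants with the YJM145 allele in the region
    pos_without / neg_without: Htg+ / Htg- segregants with the S288c allele instead
    """
    risk_with = pos_with / (pos_with + neg_with)
    risk_without = pos_without / (pos_without + neg_without)
    return risk_with / risk_without

# Hypothetical counts, chosen only to show the arithmetic
print(relative_risk(30, 70, 5, 95))  # roughly 6: carriers are ~6x as likely to be Htg+
```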


The researchers then sought to determine which alleles of the 15 genes in the chromosome 14 region were most relevant to the phenotype. They first sequenced the region in both the isogenic YJM145 and S288c strains and found 10 nonsynonymous mutations that were shared by all Htg+ strains. However, none of these SNPs showed a statistically significant association with the phenotype, so the researchers turned to reciprocal-hemizygosity analysis.

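To perform a reciprocal-hemizygosity analysis, the researchers started from the diploid YJM145 × S288c hybrid, which carries one copy of each gene from each parent. For each of the 15 genes, they built two hemizygous strains: one with the S288c copy deleted and one with the YJM145 copy deleted. The two strains are otherwise genetically identical, so any difference in their phenotype (growth rate at 30, 40, and 41 degrees Celsius) can be attributed to the allele that remains.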
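Because the two reciprocal hemizygotes share the same background, the analysis reduces to a two-sample comparison of growth measurements. The minimal sketch below assumes replicate growth measurements for each hemizygote and uses Welch’s t statistic. That test is an assumed choice, not necessarily the one used in the study, and the numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for the difference in mean growth of two strains."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    return (mean(a) - mean(b)) / sqrt(va + vb)

# Hypothetical growth measurements (arbitrary units) at 41 °C for one gene
yjm_allele_only = [0.82, 0.79, 0.85, 0.80]    # S288c copy deleted
s288c_allele_only = [0.61, 0.65, 0.58, 0.63]  # YJM145 copy deleted

print(round(welch_t(yjm_allele_only, s288c_allele_only), 1))
```

A large positive statistic means the hemizygote keeping the YJM145 allele grows better, pointing to that allele as a contributor to Htg+.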


They narrowed the search down to three genes: MKT1, END3, and RHO2. Hybrids lacking one of these alleles did not grow as well as the intact Htg+ hybrid, but they still grew better than the homozygous parental strains. The Htg+ alleles of MKT1 and RHO2 came from the YJM145 parent. Most interestingly, the beneficial END3 allele came from S288c, the Htg- parent: deleting the S288c copy of END3 reduced growth. This shows that interactions among linked genes contribute to the phenotype, and that QTL architecture is harder to predict than previously thought.

The results demonstrate the problems with QTL studies. Quantitative traits are influenced by many genetic and environmental factors, so attempting to pinpoint the exact cause of a trait can uncover more genes than originally expected. Linked loci violate the standard assumption of a single causal gene per locus, and narrowing the search interval can cause researchers to miss neighboring genes that also affect the trait. Still other genes can be missed because they have smaller effects on the final phenotype. The study therefore suggests designing tests in which multiple genes are assessed for their effects on a quantitative trait both individually and in combination.

Other work has identified additional alleles with smaller effects on the Htg phenotype (Sinha 2008). A common problem in QTL studies is finding the additional genes that contribute to a QTL. Many studies have proposed alternative methods, from using drug-resistance markers (Steinmetz 2007) to proof-of-concept studies that better synthesize the rapidly growing body of knowledge about quantitative genes (Glazier 2002).


Glazier, A. M., Nadeau, J. H., Aitman, T. J.  2002.  Finding genes that underlie complex traits.  Science.  298:  2345-2349.

Sinha, H., David, L., Pascon, R. C. et al.  2008.  Sequential elimination of major-effect contributors identifies additional quantitative trait loci conditioning high-temperature growth in yeast.  Genetics.  180:  1661-1670.

Steinmetz, L. M., Sinha, H., et al.  2002.  Dissecting the architecture of a quantitative trait locus in yeast.  Nature.  416:  326-330.

Evidence for gene flow between Homo neanderthalensis and Homo sapiens

Homo neanderthalensis was a species of Pleistocene hominids that lived alongside modern humans some 50,000 years ago. Their coexistence with humans has led to the controversial hypothesis that the two populations interbred. Past research on mitochondrial DNA (mtDNA) had found no evidence that Neanderthals and humans interbred. Pääbo’s team therefore used next-generation sequencing to determine what genetic contribution, if any, such interbreeding made to present-day humans. In addition to asking whether Neanderthal DNA persists in the human genome today, the study screened for evidence of positive selection in modern humans since their divergence from Neanderthals.

Of 21 Neanderthal bones initially analyzed, three were chosen as the basis of the project because they contained Neanderthal mtDNA. Two of the three bones carried identical mtDNA. However, analysis of nuclear and Y-chromosomal DNA indicated that the three bones came from three different female individuals, despite earlier claims that one of the bones came from a male.

Pääbo’s team took many precautions against contamination with modern human DNA, to ensure that the sequences they analyzed were bona fide Neanderthal DNA. Previous attempts had estimated 11-40% contamination with modern human DNA, and Pääbo had spent twenty years minimizing contamination in his laboratory; in this study, contamination was estimated at less than 1%. Another challenge in analyzing ancient genomes is the deamination of cytosine bases. Deamination appears as C-to-T and G-to-A misincorporations when the DNA is amplified and sequenced. To avoid these artifacts, the team relied on transversions rather than these transitions when comparing sequences, giving a more accurate representation of the Neanderthal sequence.
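Restricting comparisons to transversions means discarding every purine-to-purine or pyrimidine-to-pyrimidine change, because that class includes the C-to-T and G-to-A changes that deamination can mimic. The minimal sketch below applies this filter. The tuple layout and example differences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(ref, alt):
    """Purine<->purine or pyrimidine<->pyrimidine change (includes C->T, G->A)."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)

def transversions_only(differences):
    """Keep only (position, human_base, neanderthal_base) differences that
    are transversions; every transition, including deamination-like
    C->T and G->A changes, is discarded."""
    return [(pos, ref, alt) for pos, ref, alt in differences
            if ref != alt and not is_transition(ref, alt)]

# Hypothetical differences between a human reference and Neanderthal reads
diffs = [(10, "C", "T"), (25, "A", "C"), (40, "G", "A"), (55, "G", "T")]
print(transversions_only(diffs))  # [(25, 'A', 'C'), (55, 'G', 'T')]
```

The filter trades away real transition differences for robustness against damage, so it reduces the number of usable sites.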

Three-way alignments of chimpanzee, human, and Neanderthal DNA were generated to estimate divergence and to look for selection. The chimpanzee genome served as an outgroup: because chimpanzees, humans, and Neanderthals share a common ancestor, the chimpanzee base was used to infer the ancestral state at each position. Alleles that differ from the chimpanzee base were classified as derived. Because Neanderthals and humans share many derived alleles, Pääbo’s team needed a specific way to screen for positive selection. They applied the concept of a selective sweep, looking for regions where present-day humans carry derived alleles but Neanderthals mostly carry the ancestral ones. Such a pattern suggests that those alleles spread in modern humans after the two lineages split.
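The sweep screen rests on polarizing alleles with the chimpanzee outgroup. The minimal sketch below scores one genomic window by asking, among sites where the human allele is derived, how often the Neanderthal still carries the ancestral base. It is a simplification: real scans also account for allele frequencies in present-day humans and for local mutation rates. The function name and data are hypothetical.

```python
def neanderthal_ancestral_fraction(sites):
    """sites: (chimp, human, neanderthal) bases at positions in one window.

    The chimpanzee base stands in for the ancestral state. Among positions
    where the human allele is derived (differs from chimp), return the
    fraction at which the Neanderthal carries the ancestral allele.
    An unusually high value marks a candidate sweep in modern humans.
    """
    human_derived = [(c, h, n) for c, h, n in sites if h != c]
    if not human_derived:
        return 0.0
    ancestral = sum(1 for c, _, n in human_derived if n == c)
    return ancestral / len(human_derived)

window = [("A", "G", "A"), ("C", "T", "T"), ("G", "A", "G"), ("T", "T", "T")]
print(round(neanderthal_ancestral_fraction(window), 3))  # 0.667
```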


The results indicate that Neanderthals share more derived alleles with non-African genomes than with West African genomes. Neanderthal DNA is also as closely related to East Asian genomes as to European genomes. This indicates that interbreeding occurred before the ancestors of present-day East Asians and Europeans diverged. The detected gene flow was unidirectional: Neanderthal DNA was incorporated into the human genome, with no evidence of gene flow in the other direction. Pääbo’s team concluded that about 1-4% of the genomes of present-day non-Africans is derived from Neanderthals.
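The comparison of shared derived alleles can be expressed as a D statistic (the “ABBA-BABA” test). In the minimal sketch below, H1 is an African genome, H2 a non-African genome, H3 the Neanderthal, and the chimpanzee is the outgroup. The sketch assumes one haploid base per genome at biallelic sites. D > 0 means the Neanderthal shares more derived alleles with H2 than with H1. Published analyses also attach block-jackknife standard errors, which this omits. The data are hypothetical.

```python
def d_statistic(sites):
    """sites: iterable of (h1, h2, h3, outgroup) bases at biallelic positions."""
    abba = baba = 0
    for h1, h2, h3, out in sites:
        if h3 == out:
            continue  # Neanderthal carries the ancestral allele: uninformative
        if h1 == out and h2 == h3:
            abba += 1  # derived allele shared by H2 and Neanderthal
        elif h2 == out and h1 == h3:
            baba += 1  # derived allele shared by H1 and Neanderthal
    return (abba - baba) / (abba + baba) if abba + baba else 0.0

# Hypothetical sites: three ABBA, one BABA, one uninformative
sites = [("A", "G", "G", "A")] * 3 + [("G", "A", "G", "A"), ("A", "A", "G", "A")]
print(d_statistic(sites))  # 0.5
```

Without gene flow, ABBA and BABA patterns arise equally often from incomplete lineage sorting, so D should be near zero. A consistently positive D points to admixture between H3 and H2.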


Mutation and the long game of genomic evolution

Genomic evolution results from the accumulation of mutations within a population over time. Although most mutations are neutral or deleterious, beneficial mutations occasionally arise and become fixed in the population. These mutations produce phenotypes with greater fitness than those of earlier generations, better adapted to survive in their environment.

The relationship between genome evolution and adaptation is complex, because not all mutations are beneficial. Neutralists argue that genetic drift is the main force, causing neutral mutations to accumulate at a roughly constant rate. Selectionists argue that the rates at which beneficial and deleterious mutations accumulate depend on the environment and on population size and structure. These rates may not follow discernible patterns, at least over long periods of time, and have been found to undergo “events” of sudden, rapid growth or decline.2 Complex organismal features can appear suddenly within a population through random mutation and selection.3 In some cases, the fastest-replicating genotype can even be overtaken by a slower one that is more robust to mutations.4

Efficient genome sequencing has made it possible to observe the relationship between genomic evolution and adaptation directly in E. coli. A 2009 study examined how, and at what rates, genomic evolution occurs. It also assessed whether genetic drift or selection was the main cause of genomic evolution, testing the hypothesis that genetic drift is the main driver.

The study sequenced the genomes of E. coli clones sampled at generations 2k, 5k, 10k, 15k, 20k, and 40k. The bacteria were evolved by propagating 12 populations at 37 °C for 6,000 days, transferring 0.1 mL of culture into 9.9 mL of fresh medium each day. Mutations were identified by comparative genome sequencing on NimbleGen microarrays.
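The daily transfer determines how many generations elapse. If each culture regrows to the same density after being diluted, the number of doublings per day is log2 of the dilution factor. The standard transfer in this long-term experiment is a 1:100 dilution (0.1 mL into 9.9 mL). The sketch below, which assumes full regrowth each day, shows that this yields roughly 40,000 generations over 6,000 days. The function name is hypothetical.

```python
import math

def generations_per_day(transferred_ml, fresh_ml):
    """Doublings needed to regrow after one daily dilution, assuming the
    culture returns to the same density every day."""
    dilution = (transferred_ml + fresh_ml) / transferred_ml
    return math.log2(dilution)

g = generations_per_day(0.1, 9.9)    # 100-fold dilution
print(round(g, 2), round(g * 6000))  # 6.64 39863
```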

The study found that the E. coli clones carried more nonsynonymous than synonymous mutations. Mutations often occurred in the same genes across independent populations, and mutations present in the 2k and 15k clones persisted in later generations. Nearly all of the mutations also provided a fitness advantage. All four findings contradict the predictions of the genetic drift hypothesis; in each case, drift would predict the opposite.


Figure 1: The mutations found in E. coli in various generations. (Ref. 1)

The study also considers how ecological pressures might shape the rates of mutation and genomic adaptation. An experiment with yeast showed that adaptation measured within each episode of selection was greater than adaptation measured from start to finish. This was not supported in the E. coli populations in the study. The study also proposed clonal interference as an alternative, selection-based explanation for patterns that might otherwise be read as support for the neutral drift hypothesis. Under clonal interference, clones carrying beneficial mutations are outcompeted by clones carrying mutations of even greater benefit. Some mutations might have negative side effects, but these would also create additional opportunities for beneficial mutations, albeit on a smaller scale.1


Figure 2: The relative fitness and number of mutations graphed from generation 0 to generation 20k. Inset: the number of mutations up to generation 40k. (Ref. 1)


Finally, the paper looks at synonymous changes. A hypermutable (mutator) phenotype emerged in this population before the 40K sample. It produces a higher proportion of transversion mutations, which are more likely than other mutation types to cause nonsynonymous changes. This was reflected in the data: a lower fraction of mutations in the 40K genome were synonymous than would be expected by chance. The number of synonymous mutations was used to estimate the mutation rate after the hypermutable phenotype emerged, which was about 70 times the earlier rate.
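Synonymous changes work as a mutation-rate yardstick because, if they are neutral, they accumulate at a rate set by mutation alone rather than by selection. The sketch below compares synonymous changes per generation before and after a mutator appears. It assumes neutrality and hypothetical counts; it does not use the study’s data.

```python
def synonymous_rate(n_synonymous, generations):
    """Synonymous substitutions per generation. If synonymous changes are
    neutral, this tracks the mutation rate at synonymous sites."""
    return n_synonymous / generations

def mutation_rate_fold_change(syn_before, gens_before, syn_after, gens_after):
    return synonymous_rate(syn_after, gens_after) / synonymous_rate(syn_before, gens_before)

# Hypothetical counts: 2 synonymous changes in 20,000 generations before the
# mutator, 30 in the 15,000 generations after it appears
print(mutation_rate_fold_change(2, 20_000, 30, 15_000))
```

With these made-up counts the rate rises about 20-fold. The study applied the same logic to its real counts.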

The study provides a glimpse into the complex world of quantitative genetics. While the results of this study contradicted the hypothesis of genetic drift as the primary driver of genomic adaptation, this is hardly the only interpretation of or cause for adaptation.


  1. Barrick, Jeffrey E., et al. “Genome evolution and adaptation in a long-term experiment with Escherichia coli.” Nature 461 (2009): 1243-1247. doi:10.1038/nature08480.
  2. Lenski, RE, and M. Travisano. “Dynamics of adaptation and diversification: a 10,000-generation experiment with bacterial populations.” Proc. Natl. Acad. Sci. USA 91 (1994): 6808-6814. Accessed March 24, 2015.
  3. Lenski, RE, Charles Ofria, Robert T. Pennock, and Christoph Adami. “The evolutionary origin of complex features.” Nature 423 (2003): 139-144. Accessed March 24, 2015.
  4. Wilke, Claus O., Jia Lan Wang, Charles Ofria, Richard E. Lenski, and Christoph Adami. “Evolution of digital organisms at high mutation rates leads to survival of the flattest.” Nature 412 (2001): 331-333. Accessed March 24, 2015. doi:10.1038/35085569

Prevalence of weak negative selection as a major evolutionary force in humans

Since Charles Darwin published “On the Origin of Species” in 1859, natural selection has been studied as an evolutionary force. However, the importance of selection has been strongly questioned, particularly by participants in the neutralist-selectionist debate. This debate concerns the roles of advantageous, neutral, and deleterious mutations in evolution, and whether natural selection or genetic drift is the more significant evolutionary force.

To help resolve this debate, a group of researchers from Cornell University and colleagues studied the extent of natural selection on protein-coding genes. They examined synonymous and nonsynonymous variants within the human genome and between humans and chimpanzees. Synonymous changes were treated as neutral (silent). An excess of nonsynonymous polymorphism within a species was taken as evidence of negative selection, and an excess of nonsynonymous divergence between species as evidence of positive selection.

Based on the number of variable sites, the data were split into two sets. Genes with at least four variable nonsynonymous sites were potentially informative about positive selection (IPS), and genes with at least two variable nonsynonymous sites were potentially informative only about negative selection (INS). Variable sites were then divided into four categories, shown in Figure 1a: synonymous divergence (dS = 1.02%), nonsynonymous divergence (dN = 0.242%), synonymous polymorphism (pS = 0.470%), and nonsynonymous polymorphism (pN = 0.169%). The researchers checked the assumptions behind these calculations by comparing their data with three simulated data sets (Figure 1b). With these numbers, they applied a modified McDonald-Kreitman test (MKT) to compare polymorphism with divergence, reporting significance as credibility intervals (CIs). Figure 1c shows the distribution for the IPS data set after these analyses: relatively few genes show evidence of positive selection (“95% CI above 0”).
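The logic of the McDonald-Kreitman comparison can be seen in the classic, pooled version of the test. It compares the nonsynonymous-to-synonymous ratio within species (polymorphism) with the same ratio between species (divergence). The study used a modified, per-gene Bayesian version, so the sketch below only illustrates the idea, using the genome-wide rates quoted above. Per-site units cancel in the ratios because polymorphism and divergence are measured over the same sites.

```python
def mk_summary(dn, ds, pn, ps):
    """Classic McDonald-Kreitman summary.

    NI = (pN/pS) / (dN/dS). NI > 1 (alpha < 0) means an excess of
    nonsynonymous polymorphism, the signature of weak negative selection;
    NI < 1 (alpha > 0) means an excess of nonsynonymous divergence,
    the signature of positive selection.
    """
    ni = (pn / ps) / (dn / ds)
    alpha = 1.0 - ni
    return ni, alpha

# Genome-wide values from the text (percent per site)
ni, alpha = mk_summary(dn=0.242, ds=1.02, pn=0.169, ps=0.470)
print(round(ni, 2), round(alpha, 2))  # 1.52 -0.52
```

The pooled neutrality index above 1 agrees with the study’s conclusion that weak negative selection is widespread. Pooling across genes can bias this summary, however, which is one reason to model genes individually.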


The tests identified 304 loci (9%) with evidence of positive selection and 813 loci (13.5%) with evidence of weak negative selection. Transcription factors, nuclear hormone receptors, and genes involved in nucleoside, nucleotide, and nucleic acid metabolism appear, as groups, to contain an excess of rapidly evolving genes. Loci involved in actin binding, cytoskeletal formation, ectoderm development, and general vesicle transport appear to contain an excess of amino acid polymorphisms. Figure 1d summarizes some of these trends. Overall, the study provided evidence that positive selection is a minor force and that weak negative selection is much more prevalent.

Weak negative selection sheds light on the circumstances in which genetic drift overwhelms natural selection, allowing slightly deleterious mutations to increase in frequency within a population. This knowledge can inform studies of disease susceptibility in subpopulations. The small percentage of genes showing positive selection can be studied further to deepen our understanding of human evolutionary history.


Severe Inbreeding Depression in a Grey Wolf Population

Inbreeding is a term that carries a negative connotation in our culture today, and there is biological evidence for the risks it poses to populations. Inbred populations are highly homozygous. This increases offspring’s chance of inheriting two copies of a recessive allele and generally decreases the biological fitness of the whole population. Small populations are at risk of inbreeding depression (reduced fitness caused by inbreeding) because of their small gene pool. In the past, inbreeding depression was hard to demonstrate in nature because it is difficult to construct pedigrees for wild populations. Advances in population genetics now allow molecular techniques to model and measure inbreeding in natural populations. In “Severe inbreeding depression in a wild wolf (Canis lupus) population,” researchers combined ecological field data and DNA techniques to construct a full pedigree that demonstrates inbreeding depression in a wild population of wolves.

Canis lupus, commonly known as the grey wolf, was considered extinct in Sweden and Norway by the end of the 1960s. Around 1980, however, a few wolves immigrated back to Scandinavia and founded a new population. Because these wolves had limited mate choice, the population exemplified the founder effect: increased sensitivity to genetic drift, increased chances of inbreeding, and low genetic variation. The population grew from about two wolves to about 100 wolves over 19 years (1983-2002).

The researchers relied on snow tracking and radio telemetry to follow the wolves throughout the study. Fitness (the ability to pass on genetic information) was quantified as the number of pups per litter that survived their first winter. The researchers obtained DNA from blood, feces, and muscle tissue from dead wolves, then constructed a pedigree based on 16-32 autosomal loci and determined heterozygosity in the population. They used PEDIGREE VIEWER 5.0 to calculate inbreeding coefficients, F = 1 - (observed heterozygosity / heterozygosity expected under Hardy-Weinberg equilibrium). They also estimated the genetic load, a measure of the reduction in fitness within the population.
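The heterozygosity-based formula for F can be computed directly from genotype data. The minimal sketch below averages observed and Hardy-Weinberg expected heterozygosity across loci before taking the ratio; that pooling is an assumed choice. Note that pedigree software instead derives F from the pedigree itself, so the two estimates need not agree exactly. The loci and frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Hardy-Weinberg expected heterozygosity at one locus: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)

def marker_inbreeding(observed_het, allele_freqs):
    """F = 1 - Hobs/Hexp, pooling loci by averaging heterozygosities.

    observed_het: observed proportion of heterozygotes at each locus
    allele_freqs: one list of allele frequencies per locus
    """
    h_obs = sum(observed_het) / len(observed_het)
    h_exp = sum(expected_heterozygosity(f) for f in allele_freqs) / len(allele_freqs)
    return 1.0 - h_obs / h_exp

# Two hypothetical microsatellite loci
print(round(marker_inbreeding([0.35, 0.30], [[0.5, 0.5], [0.7, 0.3]]), 3))  # 0.293
```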

The results show a decrease in fitness, measured as pup survival through the first winter, which was attributed to the negative effects of inbreeding. By constructing the first complete pedigree back to the founders of the population, Liberg et al. were able to determine the inbreeding coefficients of breeding pairs over time. In 1991, a male immigrated into what could then be described not as a population but as “just a strongly inbred family.” He mated with a daughter of the first breeding pair, introducing new genetic variation. Over the course of the study, the inbreeding coefficient (F) varied between 0.00 and 0.41.

As shown in Figure 1, wolves born before the immigrant male arrived were the products of incestuous matings among the original three wolves. From 1983 to 1990, inbreeding coefficients rose from 0.25 to 0.375, values indicative of full-sibling mating. This increase would have continued were it not for the genetic variation introduced by immigrant B6. Inbreeding coefficients in subsequent matings fell to around 0.125. After this brief input of genetic diversity, however, inbreeding coefficients rose steadily back to around 0.25, and in later years reached even higher levels of 0.359-0.402.
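The values 0.25 and 0.375 follow from standard pedigree arithmetic: an individual’s F equals the kinship (coancestry) of its parents. The minimal sketch below uses the recursive kinship method on a hypothetical three-generation pedigree, not the actual Scandinavian wolf pedigree, and assumes the founders are unrelated and not inbred. One full-sib mating gives F = 0.25; a second consecutive full-sib mating gives F = 0.375.

```python
from functools import lru_cache

# Hypothetical pedigree: individual -> (sire, dam); founders map to None.
# Listed in birth order so parents always precede their offspring.
PEDIGREE = {
    "A": None, "B": None,                # unrelated, non-inbred founders
    "C": ("A", "B"), "D": ("A", "B"),    # full sibs
    "E": ("C", "D"), "F": ("C", "D"),    # full sibs from a full-sib mating
    "G": ("E", "F"),                     # second generation of full-sib mating
}
ORDER = {name: i for i, name in enumerate(PEDIGREE)}

@lru_cache(maxsize=None)
def kinship(x, y):
    """Probability that random alleles drawn from x and y are identical by descent."""
    if x == y:
        parents = PEDIGREE[x]
        f_x = kinship(*parents) if parents else 0.0
        return 0.5 * (1.0 + f_x)
    # Always expand the later-born individual, so we never recurse into a descendant.
    if ORDER[x] < ORDER[y]:
        x, y = y, x
    parents = PEDIGREE[x]
    if parents is None:  # a founder is unrelated to everyone listed before it
        return 0.0
    sire, dam = parents
    return 0.5 * (kinship(sire, y) + kinship(dam, y))

def inbreeding(individual):
    """F of an individual = kinship of its parents (0 for founders)."""
    parents = PEDIGREE[individual]
    return kinship(*parents) if parents else 0.0

print(inbreeding("E"), inbreeding("G"))  # 0.25 0.375
```

An unrelated immigrant enters as a new founder. Offspring of an immigrant and a resident therefore start at F = 0, which is why a single immigrant can pull the population’s inbreeding down so sharply.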

To ensure that these trends were not due to environmental factors such as weather or prey availability, the researchers also took moose density (moose per square kilometre) into account. Moose density was found to be outside the range expected to limit wolf populations (Messier, 1994).

Overall, litter size fell by about one pup for every 0.1 increase in F, and the population’s growth dropped from 1.29 to 1.21. Before the immigrant male arrived, the wolves were essentially one strongly inbred family, and his arrival temporarily relieved the population’s genetic load. The researchers suggest that the only way to further relieve the genetic load of this population is to introduce new genes.

In Aldo Leopold’s essay “Thinking Like a Mountain,” he tells readers about the guilt he felt after shooting a wolf. He goes on to explain that wolves are important to ecosystems because they prey on deer, which, left unchecked, can destroy forest habitats. The importance of wolves in maintaining a stable ecosystem is beyond question. Yet because of habitat loss and poaching, apex predators such as wolves face extinction, and this study demonstrates a further, compounding threat. Liberg et al. show how inbreeding drives a wild wolf population away from Hardy-Weinberg equilibrium (HWE): high inbreeding coefficients (F) reflect increased homozygosity. The resulting inbreeding depression is manifested in lower survival of offspring with high F values. Seeing this decline in a real-world population of an iconic and important species such as the grey wolf should be further justification for greater conservation efforts.

Variants in three genes associated with the majority of coat phenotypes in purebred dogs

Humans and dogs have a long history with one another.  Dogs were one of the first domesticated species, with evidence suggesting domestication dates between 11,000 and 14,000 years ago.  This long history has been peppered with both natural and artificial selection, leading to incredible amounts of phenotypic diversity.

Image from Boyko Genome Biology 2011 12:216 doi:10.1186/gb-2011-12-2-216

All this phenotypic diversity provides more than just cute furry friends. It has given researchers a powerful system for attacking a basic problem in quantitative genetics: identifying the genetic variants that underlie phenotypic variation. Dogs are an especially attractive system for these types of studies. Because of their unique breed structure, genetic variants can be fixed within some breeds and variable within others, providing a diverse source of variation for genome-wide association studies (GWAS).

In 2009, one of the first canine studies to use this approach examined coat type, including length, curl, and the presence or absence of furnishings (bushy eyebrows and beards). Using genetic data from over 900 dogs representing 80 breeds, Edouard Cadieu and Elaine A. Ostrander of the National Human Genome Research Institute and colleagues identified variants in three genes specifically linked to these three phenotypes.

The researchers went even further and validated their results in another set of 662 dogs representing 108 of the ~160 American Kennel Club recognized breeds. Importantly, they found that specific combinations of alleles (or variants) in these three genes accounted for the vast majority of coat variation in almost all of the dogs they examined! Only a few breeds, including Afghan hounds, have coats that can’t be explained by these variants, suggesting that additional regions of the genome also contribute to coat style.

Overall, the results present a surprising picture, elegant in its simplicity, of the underlying genetics of coat variation in the domestic dog.



Gene Expression Measurement and Its Real World Applications

Before delving into this week’s article, it is helpful to ask: what exactly are expression quantitative trait loci (eQTL)? Like standard quantitative trait loci, eQTL are regions of DNA that contain, or are linked to, genes that influence quantitative traits. Quantitative traits vary along a continuum and are not determined by a single gene (e.g., human height). What sets eQTL apart is the trait they influence: the expression level of an mRNA. The figure below illustrates why eQTL matter.

eQTL Figure Source

The term eQTL also refers to the type of analysis used to locate expression quantitative trait loci and link them to mRNA expression levels (confusing, we know). eQTL studies are sometimes considered preferable to standard QTL studies because they treat mRNA expression levels as a kind of phenotype and thus analyze direct effects on gene transcripts. eQTL studies answer questions such as: how many loci are responsible for the variation in a gene’s expression? This allows researchers to get “closer to the genome” than would otherwise be possible.

Again akin to QTL analysis, eQTL analysis produces logarithm of odds (LOD) scores: the base-10 logarithm of the ratio between the likelihood of the observed data if an eQTL affects mRNA expression and the likelihood of the data if it does not. The higher the LOD score, the stronger the evidence that an eQTL plays a role in the mRNA expression level. By mapping regions with high LOD scores onto the genome, researchers can locate specific genes that may be regulating quantitative trait genes.
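To make the LOD idea concrete, here is a minimal sketch of a single-marker LOD score for one transcript. It assumes a simple marker-regression model with normally distributed residuals (one mean for everyone under the null, one mean per genotype class under the alternative); this is a textbook illustration, not the specific mapping method Schadt et al. used, and the expression values and genotype labels are made-up toy data.

```python
import math

def marker_lod(expression, genotypes):
    """LOD score for linkage between one marker and one transcript's expression.

    Null model: a single mean for all individuals.
    Alternative model: a separate mean for each genotype class.
    Assumes normally distributed residuals.
    """
    n = len(expression)
    overall_mean = sum(expression) / n
    rss_null = sum((y - overall_mean) ** 2 for y in expression)

    # Group expression values by genotype class (e.g. "AA", "AB", "BB").
    groups = {}
    for y, g in zip(expression, genotypes):
        groups.setdefault(g, []).append(y)

    # Residual sum of squares when each genotype class has its own mean.
    rss_alt = 0.0
    for values in groups.values():
        m = sum(values) / len(values)
        rss_alt += sum((y - m) ** 2 for y in values)

    # LOD = log10(L_alt / L_null), which for normal models is
    # (n / 2) * log10(RSS_null / RSS_alt).
    return (n / 2) * math.log10(rss_null / rss_alt)


# Hypothetical toy data: expression of one transcript in nine F2 animals.
expr = [5.1, 4.9, 5.3, 7.8, 8.1, 7.9, 10.2, 9.8, 10.1]
linked = ["AA"] * 3 + ["AB"] * 3 + ["BB"] * 3   # genotypes track expression
unlinked = ["AA", "AB", "BB"] * 3               # genotypes unrelated to expression

print(round(marker_lod(expr, linked), 2))
print(round(marker_lod(expr, unlinked), 2))
```

Repeating this calculation at every marker and for every transcript is what turns a single expression measurement into a genome-wide eQTL scan; a LOD above about 3 is a common rule-of-thumb threshold for linkage.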

The article this week, by Schadt et al., studies gene expression in mice, humans, and corn and associates gene expression with quantitative traits. In this case, the expression of any one gene is considered one quantitative trait. Using the correlations among genotypes, expression levels, and certain traits (phenotypes), researchers tried to identify loci associated with complex diseases, mainly obesity in mice and humans.

Mouse Figure Source  Baby Figure Source

The first analysis within this study looks at the gene expression in liver tissues from 111 F2 mice using a mouse gene oligonucleotide microarray. Expression values within this test were treated as quantitative traits and tested using linkage analysis. Overall, researchers found that looking at the behavior of a gene based on genotype may provide more information regarding its biological activity than simply looking at the regulation of a single gene. Pause for a moment: Does this conclusion make sense in the light of gene expression and function?

After plotting the percentage of eQTL exceeding different LOD score thresholds in evenly spaced bins (each 2 cM wide) across the genome, the researchers found eQTL hotspots, regions where more eQTL map than would be expected by chance, on mouse chromosomes 2, 6, 7, 9, 10, 16, and 17. Based on how eQTL cluster at particular loci and on the relationships between genes and their expression, researchers can identify patterns that may be associated with the phenotypes of certain diseases.
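The binning step described above can be sketched as follows. This is a hypothetical illustration: it assumes each eQTL has already been reduced to a (chromosome, peak position in cM) pair after passing some LOD threshold, and the `min_fraction` cutoff for calling a bin a "hotspot" is an arbitrary choice of ours, not a value from the paper. The positions are toy data.

```python
from collections import Counter

def eqtl_hotspots(eqtl_positions, bin_width_cm=2.0, min_fraction=0.01):
    """Bin eQTL peak positions and flag bins holding a large share of all eQTL.

    eqtl_positions: list of (chromosome, position_cM) tuples, one per eQTL
        that passed a chosen LOD threshold.
    Returns {(chromosome, bin_start_cM): fraction_of_all_eQTL} for bins
    whose share of eQTL is at least min_fraction (a hypothetical cutoff).
    """
    total = len(eqtl_positions)
    # Assign each eQTL to the bin [k * width, (k + 1) * width) on its chromosome.
    counts = Counter(
        (chrom, int(pos // bin_width_cm) * bin_width_cm)
        for chrom, pos in eqtl_positions
    )
    return {
        key: n / total
        for key, n in counts.items()
        if n / total >= min_fraction
    }


# Hypothetical eQTL peak positions (chromosome, cM).
positions = [("2", 10.5), ("2", 11.2), ("2", 11.9), ("2", 40.0), ("7", 3.3), ("7", 25.0)]
hotspots = eqtl_hotspots(positions, min_fraction=0.3)
print(hotspots)  # prints {('2', 10.0): 0.5}
```

A real analysis would compare each bin's count with what a uniform scatter of eQTL would produce (for example, by permutation) rather than using a fixed fraction.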

Source: Schadt et al., 2003

One trend observed in the mouse genome analysis is that eQTL with high LOD scores are mostly cis-acting, meaning they map close to the gene whose expression they affect. The figure above indicates a few gene-based polymorphisms that differ between the DBA and B6 mouse strains. The four genes indicated are C5, Alad, St7, and Nnmt. Each of the five hotspot regions is significantly enriched for eQTL of genes related to fat pad mass (FPM). Results suggest that QTL on chromosomes 2 and 19 affect only a portion of the F2 mouse population, demonstrating the complexity of traits such as obesity.
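In eQTL studies the cis/trans distinction is usually made by position: an eQTL whose peak falls near the gene it affects is called cis (local), and one that maps elsewhere is called trans (distant). The sketch below assumes positions in megabases and a hypothetical 10 Mb window; published studies use a variety of window sizes, and this is not necessarily the rule Schadt et al. applied.

```python
def classify_eqtl(gene_chrom, gene_pos_mb, peak_chrom, peak_pos_mb, window_mb=10.0):
    """Label an eQTL 'cis' if its peak lies within window_mb of the gene itself.

    window_mb is a hypothetical cutoff chosen for illustration.
    """
    same_chromosome = gene_chrom == peak_chrom
    if same_chromosome and abs(gene_pos_mb - peak_pos_mb) <= window_mb:
        return "cis"
    return "trans"


# Hypothetical gene on chromosome 2 at 150 Mb.
print(classify_eqtl("2", 150.0, "2", 155.0))  # prints cis
print(classify_eqtl("2", 150.0, "7", 30.0))   # prints trans
```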

To further investigate the relationship between gene expression, complex traits, and common diseases, researchers looked at obesity-related pathways in humans. They found that the mouse chromosome 2 locus is homologous to a region of human chromosome 20 that has previously been linked to obesity-like phenotypes and that may be responsible for the QTL associated with this phenotype. To look further into the heritability of gene expression, researchers examined maize in terms of complex traits and found a positive relationship between eQTL of different genes.

What do you think: What advantages does the use of both traditional QTL and eQTL provide in this study? Why is this so significant? For class on Thursday, please think about how eQTL can help us better understand complex traits in general. Also consider why researchers used gene expression as a quantitative trait.

Dachshunds, “Water Dogs,” and…GWAS?

In this week’s Journal Club, we will be focusing on the variation of coat growth pattern, length, and curl in domesticated dogs.  While previous articles have focused more on ancient hominins, such as Neandertals and Denisovans, we hope that this article, because it concerns domesticated canines, will allow everyone to use their own experiences with these animals to contextualize the findings presented.  It also offers an opportunity to examine artificial selection!

Cadieu et al. performed a genome-wide association study on over 1000 dogs from many different breeds to analyze coat phenotypes.  Particular attention, however, is devoted to Dachshunds, in which the researchers identified RSPO2 as responsible for the furnishings growth pattern, and Portuguese Water Dogs, in which the KRT71 gene was identified as influential in coat curl.

The researchers followed the general practice of performing a marker-based intrabreed analysis before expanding the scope to an interbreed study using fine-mapping.  This is because coat phenotype variation across breeds is remarkable, yet variation within any one breed is relatively simple.  Thus, the researchers developed a relatively straightforward method for identifying genes from what would otherwise be a daunting combinatorial mess!

Genome-wide single nucleotide polymorphism (SNP) data sets were created for Dachshunds, for Portuguese Water Dogs, and for a panel of 903 dogs from 80 breeds, termed CanMap.  From this information, the researchers performed a GWAS within a breed presenting the phenotype under study (e.g., wire-haired Dachshunds), treating dogs with the phenotype as cases and those lacking it as controls; false positives due to population structure were then eliminated by interbreed comparison with a GWAS of the CanMap data set.  Fine-mapping allowed for further parsing of the data by identifying the smallest shared haplotype, and sequencing enabled the researchers to identify the mutations responsible for the phenotype.
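The case/control logic of a GWAS can be illustrated with a basic allelic association test for a single SNP: count each allele in cases and in controls, then ask whether the allele frequencies differ more than chance allows. This is a generic sketch (a 2x2 chi-square test with 1 degree of freedom), not necessarily the statistic Cadieu et al. used, and it does not model the population-structure correction described above. The allele counts are hypothetical.

```python
import math

def allelic_chi_square(case_counts, control_counts):
    """Allelic 2x2 chi-square test for one SNP.

    case_counts / control_counts: (count of allele 1, count of allele 2),
    counted over chromosomes (two per diploid dog).
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele frequency were the same in both groups.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 24 affected dogs (48 chromosomes) vs 24 unaffected dogs.
chi2, p = allelic_chi_square((40, 8), (10, 38))
print(round(chi2, 2), p)
```

In a full scan this test is repeated at every SNP, and only markers that stay significant after correcting for the number of tests (and for population structure) are carried forward to fine-mapping.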

As mentioned, our relative familiarity with domesticated dogs places this article in a distinctly different context, which we found easier to digest.  Did you feel the same?  Do you think that the presence of artificial selection in this population has any implications for the data obtained?

Gene Duplicates: Their Fate and Role in Evolution

Gene duplication is regarded as a key producer of genetic diversity and even speciation. Gene duplication can occur in a variety of ways, such as unequal crossover events, DNA polymerase errors, reverse transcription (retrotransposition), replication of a specific gene sequence, or polyploidy (duplication of the entire genome, which creates extra copies of every gene).

While the study reviewed, The Evolutionary Fate and Consequences of Duplicate Genes (Lynch and Conery, Science 2000), states that it is unclear how often duplication events occur or how duplicate genes “navigate an evolutionary trajectory” from an initial state of redundancy to that of distinct genes, three theories exist regarding the outcomes of duplicate gene evolution: nonfunctionalization, neo-functionalization, and subfunctionalization. In nonfunctionalization, a gene duplicate is silenced through the accumulation of deleterious mutations. In neo-functionalization, a gene duplicate may mutate in such a way that a new, beneficial protein is created while the original protein is maintained by the other duplicate. Finally, in subfunctionalization, both copies of the duplicated gene become partially compromised by mutation, so that together they provide the function (or protein production level) of the original, non-duplicated gene.

The relationship between the number of substitutions per replacement site (R) and the number of substitutions per silent site (S) in duplicate genes provides insight into the selection pressures acting upon duplicate genes and whether these duplicates are selected for or against within a population over evolutionary time. For example, an R/S ratio less than 1 suggests selection against replacement substitutions (purifying selection). Conversely, an R/S ratio greater than 1 suggests selection favoring replacement substitutions, so that protein evolution is accelerated. If there is no selection either way, the R/S ratio should be approximately 1.
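The R/S reasoning above can be sketched numerically. This example assumes the numbers of replacement and silent sites (and the differences observed at each) have already been counted for a pair of duplicates, and it applies a Jukes-Cantor correction for multiple hits, a common choice that we assume here rather than the paper's documented procedure. The `tolerance` used to call a ratio "approximately 1" is hypothetical; real analyses use statistical tests instead.

```python
import math

def jukes_cantor(p):
    """Correct an observed proportion of differing sites for multiple hits.

    Valid only for p < 0.75.
    """
    return -0.75 * math.log(1 - 4 * p / 3)

def r_over_s(replacement_diffs, replacement_sites, silent_diffs, silent_sites):
    """Return (R, S, R/S) for one pair of duplicate genes."""
    r = jukes_cantor(replacement_diffs / replacement_sites)
    s = jukes_cantor(silent_diffs / silent_sites)
    return r, s, r / s

def interpret(ratio, tolerance=0.1):
    """Rough reading of an R/S ratio; tolerance is a hypothetical cutoff."""
    if ratio < 1 - tolerance:
        return "replacement substitutions selected against"
    if ratio > 1 + tolerance:
        return "replacement substitutions favored"
    return "approximately neutral"


# Hypothetical pair: 12 differences at 600 replacement sites,
# 30 differences at 200 silent sites.
r, s, ratio = r_over_s(12, 600, 30, 200)
print(round(ratio, 3), interpret(ratio))
```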

In this paper, nine taxa (H. sapiens, M. musculus, G. gallus, D. rerio, D. melanogaster, C. elegans, A. thaliana, O. sativa, and S. cerevisiae) were studied via a genome-wide analysis of protein-coding regions to compare the number of replacement-site substitutions with the number of silent-site substitutions and produce R/S ratios.  It was found that initially following gene duplication, replacement-site substitutions were under positive selection. As time passed, replacement-site substitutions were selected against, resulting in an R/S ratio ≪ 1.  This change in the R/S ratio reflects the increasing magnitude of selective constraint on the functional gene duplicates analyzed in this study.
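The "as time passed" pattern can be examined by grouping duplicate pairs by age and averaging R/S within each group. The sketch below treats S as a rough proxy for time since duplication (silent changes accumulate roughly clock-like); that choice, the bin width, and the (R, S) values are all assumptions for illustration, and the numbers are invented toy data, not results from the paper.

```python
from statistics import mean

def rs_by_age(pairs, s_bin_width=0.25):
    """Average R/S for duplicate pairs grouped by S (a rough age proxy).

    pairs: list of (R, S) tuples, one per duplicate-gene pair.
    Returns {bin_start: mean R/S}, ordered from youngest to oldest bin.
    """
    bins = {}
    for r, s in pairs:
        if s == 0:
            continue  # R/S is undefined for pairs with no silent changes
        start = int(s // s_bin_width) * s_bin_width
        bins.setdefault(start, []).append(r / s)
    return {start: mean(ratios) for start, ratios in sorted(bins.items())}


# Invented (R, S) values for five duplicate pairs.
pairs = [(0.02, 0.04), (0.03, 0.05), (0.05, 0.40), (0.04, 0.45), (0.06, 1.20)]
result = rs_by_age(pairs)
for start, avg in result.items():
    print(f"S in [{start:.2f}, {start + 0.25:.2f}): mean R/S = {avg:.3f}")
```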

Sanetra et al. Frontiers in Zoology 2005 2:15

The article also discusses the role gene duplication plays in speciation: mutations build up in duplicate genes, eventually resulting in reproductive isolation between populations and hence speciation.  Following gene duplication, chromosomal repatterning can affect how chromosomes pair with one another during meiosis and thereby affect gamete fitness.
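One mechanism commonly discussed in this literature, which we assume here purely for illustration, is reciprocal silencing: two isolated populations each silence a different copy of the same duplicated gene. The sketch enumerates the gametes of a hybrid, assuming the two duplicate loci (hypothetically named A and B) are unlinked, to show how often a gamete ends up with no functional copy at all.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: an ancestral gene was duplicated into loci A and B.
# Population 1 silenced its copy at locus B; population 2 silenced its copy at A.
# "+" marks a functional allele, "-" a silenced one.
parent1 = {"A": ("+", "+"), "B": ("-", "-")}
parent2 = {"A": ("-", "-"), "B": ("+", "+")}

def gametes(genotype):
    """All equally likely gametes, assuming loci A and B are unlinked."""
    return list(product(genotype["A"], genotype["B"]))

# F1 hybrid: one allele at each locus from each parent.
f1 = {locus: (parent1[locus][0], parent2[locus][0]) for locus in ("A", "B")}
f1_gametes = gametes(f1)

# A "null" gamete carries no functional copy at either locus.
null_gametes = sum(1 for g in f1_gametes if "+" not in g)
p_null_gamete = Fraction(null_gametes, len(f1_gametes))

# An F2 zygote lacks every functional copy only if both of its gametes are null.
p_null_f2 = p_null_gamete ** 2

print(p_null_gamete)  # prints 1/4
print(p_null_f2)      # prints 1/16
```

If the gene is essential, those null gametes or zygotes are lost, reducing hybrid fitness even though neither parental population carries any harmful genotype.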

What do you think? Why would replacement-site substitutions initially be selected for, only to be selected against later in time? Do gene duplication and subsequent modification of the duplicates seem like a feasible cause of speciation? Let us know your thoughts on Thursday!