Wednesday 19 May 2010

an easy guide to remembering what a SNP is

We read about SNPs (pronounced "snips") in the papers and online; we hear about them on TV, on the radio and in people's conversations. But what are they?
SNPs are used to understand someone's Story (which population they belong to and who their ancestors were) and what makes them the Person they are (what they carry in their DNA).

Thus, SNPs are used to find someone's S and P i.e. SNP.

----

SNP stands for Single Nucleotide Polymorphism.

Think of our DNA as a very very very long road with billions of houses, which we call nucleotides. Each house has an address, which we call its "position" or "co-ordinate". So when we talk about a single nucleotide, we mean one of the houses that make up our road: in humans there are about 3 billion of them in each copy of our DNA, or about 6 billion counting the copy we get from each parent.


There are 4 types of nucleotides/houses (the different types that can be found at the same address are what we call alleles): blue (Cytosine or C), green (Adenine or A), yellow (Guanine or G) and red (Thymine or T). If the houses in the photo above were a nucleotide sequence, it would read GCTATG.

But a SNP is not just a single nucleotide; it is a single nucleotide polymorphism. Polymorphism means that the house in the middle of the photo above (say, the house at address 17854853) is a different colour in my road than the house at the same address in your road. In this case, mine is green whereas yours could be blue. In biology talk, this is equivalent to saying "there is a SNP at position 17854853, where I have allele A and you have allele C".
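For readers who like to see things spelled out, here is a minimal Python sketch of what "comparing two roads" means: it walks along two equally long sequences and reports every address where the houses differ. The function name `find_differences` and the second ("your") sequence are made up for illustration; my road is the GCTATG sequence from the photo, placed so that its green house in the middle sits at address 17854853. Real genomes are compared with specialised tools, not letter by letter like this.

```python
def find_differences(my_road, your_road, start_address=1):
    """Return (address, my_allele, your_allele) for every position where two
    equally long sequences carry a different nucleotide."""
    if len(my_road) != len(your_road):
        raise ValueError("both roads must have the same length")
    differences = []
    # enumerate gives each house's offset along the road; adding
    # start_address turns that offset into an address
    for offset, (mine, yours) in enumerate(zip(my_road, your_road)):
        if mine != yours:
            differences.append((start_address + offset, mine, yours))
    return differences


# My road is GCTATG; in the made-up "your road" the green A in the middle is a blue C.
print(find_differences("GCTATG", "GCTCTG", start_address=17854850))
# → [(17854853, 'A', 'C')]
```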

For a house to be called a SNP, at least 1% of the human population must have a different type of house (allele) at that address from the rest. Even so, two people's roads are remarkably similar: on average, only about 1 in 1000 houses differs between two individuals.
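The 1% rule can be written as a tiny check. This is a sketch under simplifying assumptions: we are handed one allele letter per sampled road at a single address, and we treat the second most common allele (the "minor allele", in biology talk) as the different type of house. The name `is_snp` and the sample counts are hypothetical.

```python
from collections import Counter


def is_snp(alleles_at_address, threshold=0.01):
    """Return True if the second most common allele at this address is carried
    by at least `threshold` (1% by default) of the sampled roads."""
    counts = Counter(alleles_at_address)
    if len(counts) < 2:
        return False  # everybody has the same type of house: no polymorphism
    total = sum(counts.values())
    # most_common() lists (allele, count) pairs from most to least frequent
    minor_allele_count = counts.most_common()[1][1]
    return minor_allele_count / total >= threshold


# Hypothetical sample of 1000 roads: 2% have a blue house (C) at this address.
print(is_snp(["A"] * 980 + ["C"] * 20))  # True
# Only 0.5% differ, so under the 1% rule this address is not called a SNP.
print(is_snp(["A"] * 995 + ["C"] * 5))   # False
```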

"So far so good" you could say, "but why is your house green and mine blue?".

The answer is mutation. Mistakes happen when DNA gets copied before a cell divides (mitosis), or when it is split into two sets of chromosomes to produce the cells that can give rise to a new organism (meiosis, which produces the gametes, i.e. the sperm and the egg). By mistake I mean that sometimes a house changes colour, but it can also mean that a house is destroyed (i.e. deleted) or that a new house is inserted.

But this does not fully answer your question: it only tells you why our houses at this specific address are different, not why my house at this address is green rather than red, and yours is blue rather than yellow.
This is where the S and P come in. My house is green and yours is blue either because our ancestors were different (the S part) or because a mistake was made when one of us was created (the P part).
----

We'll consider the S part first. Let's say that the tree below is a tiny part of the "tree of life". I am the green dot and you are the blue dot. You will be happy to know that we are closely related, since we had a common ancestor just near the green line.

Our common ancestor had a green house at address 17854853 of his/her road. This ancestor had two kids, one of whom is my ancestor and the other your ancestor. A mistake was made when your ancestor was created, and his/her house turned from green to blue. And just so that you do not get offended, mistakes are not necessarily bad. They can be good (advantageous mutations), bad (disadvantageous mutations) or make no difference whatsoever (neutral mutations). We now know that most mistakes make no difference.

So the reason why my house is green and yours is blue is explained by who our ancestors were! Differences are explained by our history, but this also means that by looking at these differences we can understand our history and find out what our story is. And this is one of the two reasons why SNPs are important: they allow us to understand where we come from and why we are the way we are!!!

Let me give you an example. I chose lactase persistence because it is a very clear and very famous example, and because one of the people I worked with during my PhD worked on it.

Lactase is the enzyme that digests lactose, the sugar in milk. In some humans, lactase activity decreases after weaning (we call them lactose intolerant). In others, lactase activity persists at a high level throughout adult life (we call them lactose tolerant). Biologists wanted to find out how this difference arose and what makes these individuals different. To do this, they had to find the SNPs associated with lactase persistence.

Two SNPs have been identified as the best able to explain why one individual is able to digest lactose and another one isn't. The first is "rs4988235 (−13910C→T)".

Don't be scared! All this means is that:
  1. the name of the SNP/house is rs4988235,
  2. the SNP/house is found 13910 houses before the start of the LCT gene, the gene that makes the lactase enzyme (for the record, the house's actual address is 136325115),
  3. the house can be either of type blue (C) or red (T), and
  4. the ancestral house was of type blue (C), so people with that type of house are lactose intolerant, whereas people whose house is red (T) are lactose tolerant.
The other SNP is "rs182549 (−22018G→A)". Using the explanation above, you can now work out what this name means.
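If you would rather let a computer do the decoding, the reading given in the list above can be captured with a regular expression. This is only a sketch: it assumes names always look exactly like "rs4988235 (−13910C→T)" (an rs name, a space, then in brackets a signed distance, the ancestral allele, an arrow and the new allele), and the function `parse_snp_name` and its field names are my own invention, not part of any standard tool.

```python
import re

# rs name, then in brackets: signed distance from the gene start, ancestral
# allele, an arrow, new allele. Accepts the typographic minus (−) or a hyphen (-).
SNP_NAME = re.compile(r"(rs\d+) \(([−-]?\d+)([ACGT])→([ACGT])\)")


def parse_snp_name(text):
    match = SNP_NAME.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a SNP name I understand: {text!r}")
    rs_id, distance, ancestral, new = match.groups()
    return {
        "name": rs_id,
        # a negative number means the house lies before the start of the gene
        "distance_from_gene_start": int(distance.replace("−", "-")),
        "ancestral_allele": ancestral,
        "new_allele": new,
    }


print(parse_snp_name("rs182549 (−22018G→A)"))
# → {'name': 'rs182549', 'distance_from_gene_start': -22018, 'ancestral_allele': 'G', 'new_allele': 'A'}
```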

What Bersaglieri et al (2004) did was to look at these two SNP addresses in a number of individuals and count how many of them had one type of house or the other. In other words, they wanted to determine the frequencies of the persistence-associated alleles (T at SNP rs4988235 and A at SNP rs182549). The individuals came from three populations (European Americans, African Americans and East Asians), and for each individual they knew whether he or she was lactose tolerant or intolerant. They found a correlation between how common the persistence-associated alleles were and the level of lactase persistence in a population. European Americans, the population with the most lactose tolerant individuals, had the highest frequency of persistence-associated alleles at these SNPs (77%). In contrast, the other two populations show low lactose tolerance and have much lower frequencies of persistence-associated alleles at these SNPs (13-14% in African Americans and 0% in East Asians).
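The counting itself is simple arithmetic. One detail the road analogy hides: each of us actually carries two copies of every address, one from our mother and one from our father, so allele frequencies are counted over copies rather than over people. The genotypes below are made up (they are not Bersaglieri et al's data), the helper `allele_frequency` is hypothetical, and the study's real analysis involved much more than this.

```python
def allele_frequency(genotypes, allele):
    """Fraction of DNA copies that carry `allele`.

    Each genotype is a two-letter string such as "CT": one letter for the copy
    inherited from the mother and one for the copy inherited from the father.
    """
    copies = "".join(genotypes)  # line up every copy from every person
    return copies.count(allele) / len(copies)


# Four made-up people genotyped at rs4988235 (T is the persistence-associated allele).
sample = ["TT", "CT", "CC", "CT"]
print(allele_frequency(sample, "T"))  # 0.5, i.e. 4 of the 8 copies carry T
```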

So what did we learn about human history from these SNPs? Based on these data, Bersaglieri et al (2004) estimated that these mistakes (C to T at SNP rs4988235 and G to A at SNP rs182549) rapidly became more common at a time near the estimated origin of dairy farming in northern Europe, i.e. ∼9,000 years ago. They therefore concluded that the added nutrition from dairy appears to have provided an advantage in northern Europe. When dairy farming appeared, adult humans could not digest the milk. Mistakes happened in the DNA of some of these people, and since being able to digest lactose gave an advantage, these mistakes spread through the population (i.e. the frequency of the persistence-associated alleles increased). In fact, these mistakes are among the top 3 most advantageous mistakes estimated to date.

------

Now let's talk about the P.

Our DNA tells us not only about our past, but also about our present and our future. In other words, it tells us what we are and what this means for our future. We get this information (in other words, we understand what makes us the Person we are) when we compare our DNA to that of others. One of the main reasons we want to make these comparisons is our health: to predict what will happen to our state of health in the future. I will use cancer as an example.

Almost a decade after the sequencing of the human genome, new technologies have been developed that, given this sequence, make the creation of individual "SNP maps" or "SNP profiles" easier and easier. They also, and maybe more importantly, make it a lot cheaper.

But what are SNP maps? (I will use SNP maps from now on but SNP maps and SNP profiles are the same thing).

As I mentioned before, most of our DNA does not differ from person to person. Thus, to understand what causes our phenotypic differences (e.g. why I get breast cancer and you do not), we do not need to compare the whole of our genomes. We only need to look at those addresses where there are known differences in the types of house found there. In other words, we just need to look at the parts of our DNA which are polymorphic: our SNPs, that is.

A SNP map is like a registry that says what type of house (allele) we have at those addresses (houses) which are known to differ between individuals. It looks a bit like this: individual X at SNP rs4278313 (whose address is 105123) has allele C, at SNP rs9708285 (whose address is 105195) has allele T, at SNP rs9751025 (whose address is 105213) has allele A, and so on.
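In code, a SNP map is little more than a lookup table. Here is a minimal Python sketch using the three example entries from the paragraph above, stored as a dictionary from SNP name to (address, allele). The variable and function names are hypothetical, and real SNP maps hold far more entries in dedicated file formats.

```python
# A hypothetical SNP map for "individual X": SNP name -> (address, allele).
snp_map_x = {
    "rs4278313": (105123, "C"),
    "rs9708285": (105195, "T"),
    "rs9751025": (105213, "A"),
}


def allele_at(snp_map, snp_name):
    """Look up which type of house (allele) this person has at a given SNP."""
    _address, allele = snp_map[snp_name]
    return allele


# Print the map in address order, like walking down the road.
for name, (address, allele) in sorted(snp_map_x.items(), key=lambda item: item[1][0]):
    print(f"{name} at {address}: {allele}")

print(allele_at(snp_map_x, "rs9708285"))  # T
```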

But this is not the only source of information that doctors have for each cancer patient: they also know about the patient's phenotype, i.e. what cancer they have, for how long, what treatment worked for them and what did not, their sex, their age, their lifestyle choices and so on.

You may now ask me "since doctors have all this other information, why are SNP maps helpful?".

The reason why SNP maps are important is that they can lead to faster and personalised medicine, since they can be used
(a) to better understand the cancer (red part of the following figure), but also
(b) for diagnosis and personalised treatment (blue part of the following figure).
I would say that currently we are mainly using them for the former, but I will explain both of these below.

(taken from http://nci.nih.gov/images/Documents/f6e06278-e717-4465-b5b4-fda72f95584b/cancer41.jpg)

(a) understanding the biology of cancer: when scientists compare the SNP maps of individuals with the same cancer, they can find the SNPs at which these patients share the same allele, and therefore which SNPs are candidates for the cancer-causing mistakes (mutations). In other words, if individual A and individual B both have prostate cancer and both have allele T at SNP rs4430796, allele G at SNP rs7501939 and allele C at SNP rs3760511, whereas men without prostate cancer have C, A and A respectively, then scientists infer that these SNPs are likely to be associated with this cancer. Of course, in practice these comparisons are made across a large number of individuals from many populations.
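The comparison described above can be sketched as a deliberately naive filter: keep every SNP at which all the cancer patients share the same allele and none of the men without cancer carries it. This is an illustration only, with hypothetical names (`shared_case_alleles`, and a made-up SNP called "made_up_snp" that the filter should reject). The SNP maps here store just the allele for each SNP name, and real association studies use statistical tests over thousands of people rather than an all-or-nothing rule.

```python
def shared_case_alleles(case_maps, control_maps):
    """Return {snp_name: allele} for SNPs where every case has the same allele
    and no control has it. A naive all-or-nothing filter, not a statistical test."""
    if not case_maps:
        return {}
    candidates = {}
    for snp, allele in case_maps[0].items():
        all_cases_share_it = all(m.get(snp) == allele for m in case_maps)
        # a control missing this SNP (get returns None) counts as not having the allele
        no_control_has_it = all(m.get(snp) != allele for m in control_maps)
        if all_cases_share_it and no_control_has_it:
            candidates[snp] = allele
    return candidates


# Individuals A and B (prostate cancer) and one man without it, as in the example above.
cases = [
    {"rs4430796": "T", "rs7501939": "G", "rs3760511": "C", "made_up_snp": "A"},
    {"rs4430796": "T", "rs7501939": "G", "rs3760511": "C", "made_up_snp": "G"},
]
controls = [
    {"rs4430796": "C", "rs7501939": "A", "rs3760511": "A", "made_up_snp": "A"},
]
print(shared_case_alleles(cases, controls))
# → {'rs4430796': 'T', 'rs7501939': 'G', 'rs3760511': 'C'}
```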

Similarly, SNP profiles can be compared to better understand the response to cancer treatments. If individual A and individual B both have prostate cancer, both got a lot better when prescribed a specific drug and both have the same alleles at a number of their SNPs, then scientists infer that these SNPs are likely to be associated not only with this cancer but also with the response to this treatment.

(b) diagnosis and treatment planning: the stage above is aiming at personalised medicine. In the previous scenario, this means that once the SNPs most associated with this cancer have been identified, a test is created that checks each man for this set of SNPs. If he is found to have the cancer-associated alleles at these addresses, then he should have his prostate checked a lot more often than men who do not. In this way the cancer can be diagnosed at the earliest stage, increasing his chances of living a long life. By the way, the first such test already exists: two years ago, a company called Proactive Genomics created the Focus5™ Prostate Cancer Risk Test.

Ultimately, what stage (a) is aiming at is the following scenario: a patient enters the room and the doctor compares the patient's SNP map to the SNP maps of other cancer patients. From this comparison the doctor is immediately able to identify not only the specific nature of the patient's cancer but also the best treatment for this particular patient. The cancer is diagnosed and treated at a very early stage, and the patient is less likely to die of it.

However, we should not forget the ethics of this "mapping", and we should also take the patient's psychology into account. Patients react differently to news about their health. They make a number of decisions, some of which could prolong their lives while others may have the opposite effect. The questions you need to ask yourself are: Would you like to know that there is a probability that you will get cancer? Would you have liked to know this at the age of 18? Or 15? Or 5? Would you like other people to know about this? Will it be possible for you to restrict who knows and who doesn't? How is this knowledge going to affect your life?

-----

So from now on, when you see or hear something about SNPs remember your S and your P: remember that the importance of SNPs is that they can tell us things about our hiStory and they can lead to Personalised medicine.
