Big Data Comes To Genetics

There is a powerful new tool in the geneticist’s toolbox, helping to push back the boundaries of our knowledge. The Exome Aggregation Consortium (ExAC) has amassed data on the exomes of more than 60,000 individuals. An exome is the part of the genome that actually codes for proteins, and it is considered to be the functionally important area.

The ExAC database is an order of magnitude larger than the previous largest data set, the Exome Sequencing Project produced and curated by the NHLBI in the US. Not only is this new database very large but, importantly, it has data on people from a diverse set of genetic backgrounds.

This matters because different populations can have a slightly different genetic makeup, and what’s common in one population might not be so common in another. That has practical implications when you are trying to decide on the pathogenicity of a variant you have found while conducting a diagnostic genetic test, which is what we do in the lab I work in.

If I’m looking for mutations in a Caucasian individual and I find a novel variant, then one of the ways we try to decide whether it is disease-causing or just part of natural variation is by checking the online databases. These mostly contain data from Caucasian people. If the variant has never been seen before, or has been seen only very rarely, then that would push the analysis towards classing the variant as pathogenic. If it was found in 10% of the population, then that would be strong evidence that it was benign, because nowhere near 10% of the population is affected by whatever rare genetic disorder you’re trying to diagnose.

It could easily be the case, however, that a variant is rare in Caucasians but perfectly common and normal in another population, such as Latinos or the Yoruba of West Africa. ExAC gives us access to these larger, more diverse data sets, which should let us classify the variants we find more accurately. Indeed, the ExAC team have already reclassified as benign some variants that were previously thought to be pathogenic.
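To make that logic concrete, here is a minimal Python sketch of a frequency check that looks at each population separately rather than at one pooled number. Everything in it is an assumption for illustration: the population labels, the frequencies and the 1% cut-off are invented, none of it comes from ExAC itself, and real variant classification weighs far more evidence than frequency alone.

```python
# Illustrative only: the 1% cut-off is a hypothetical threshold, not clinical guidance.
COMMON_THRESHOLD = 0.01


def most_common_population(freqs):
    """Return (population, frequency) where the variant is most frequent.

    `freqs` maps a population label to the variant's allele frequency there.
    Returns (None, 0.0) if the variant has never been observed anywhere.
    """
    if not freqs:
        return None, 0.0
    population = max(freqs, key=freqs.get)
    return population, freqs[population]


def triage_variant(freqs, threshold=COMMON_THRESHOLD):
    """Label a variant 'absent', 'rare' or 'common' from per-population data.

    A variant counts as 'common' if it reaches the threshold in ANY population,
    because a variant that is common somewhere is unlikely to cause a rare disorder.
    """
    population, freq = most_common_population(freqs)
    if population is None:
        return "absent", None
    if freq >= threshold:
        return "common", population
    return "rare", population


# Made-up frequencies: a database covering only one population sees a rare variant...
single_population = {"Caucasian": 0.0001}
# ...while a more diverse database shows it is common elsewhere.
diverse = {"Caucasian": 0.0001, "Latino": 0.08, "Yoruba": 0.12}

print(triage_variant(single_population))
print(triage_variant(diverse))
```

With the single-population data the variant looks rare and might be pushed towards a pathogenic classification; with the diverse data the same variant is flagged as common, which is the kind of reclassification a larger, more varied data set makes possible.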

Although the database is, technically, only just coming online, it has actually been available to search for a year or so, and genetics labs across the world, mine included, have been making good use of it. As impressively large as ExAC is, it won’t be the largest for long. Work has already begun on the UK’s 100,000 Genomes Project, which will spend the next 2-3 years collecting samples and generating whole-genome data; a project my little lab is doing its part for.

The DNA double helix. Image courtesy of the National Genetics and Genomics Education Centre
