“Big data” is involved with most aspects of research these days, according to Francis S. Collins, director of the National Institutes of Health, who spoke Wednesday at the University of Virginia’s Old Cabell Hall.
Collins, who graduated from UVA with a degree in chemistry in 1970, is the former director of the Human Genome Project. He joined Phil Bourne, director of UVA’s Data Science Institute and professor of biomedical engineering, in a wide-ranging discussion of data science and life science before an audience that filled most of the Old Cabell Hall Auditorium.
Collins said big data work – which involves deploying large amounts of computing power to seek insights from huge data sets – originally was concentrated on physics and astronomy, but is increasingly being applied to biomedical and life sciences. He said that biomedical applications are among the fastest-growing sectors of big data.
“It crosses through all the areas, from basic science to clinical trials,” Collins said.
There are a lot of employment opportunities for people who can work in data science, Collins said. No matter what field of science students are going into today, they will need to understand data science, he added.
“If you want to benefit humanity, there are massive projects out there for people who want to take on something hard,” Collins said. “The workforce will need well-trained data science people 10 years from now.”
Among the next big projects are an effort to map the circuits of the human brain and study how it functions and how things in it go wrong, and a project that will study electronic medical records to determine how people stay healthy, he said.
Collins said data science has been a boon to life sciences research.
“When I was at the University of Michigan trying to find the causes of cystic fibrosis – this was before the Human Genome Project – it took five years to find the gene that causes cystic fibrosis,” Collins said. “And it burned out a lot of graduate students.”
Collins replaced James Watson, one of the co-discoverers of the structure of DNA, as the director of the Human Genome Project, an effort he said changed how scientific information was disseminated.
Up to that point, scientific data was released when research was published, but given the scale of the genome project, it would take years for the entire mapping to be complete. Meanwhile, researchers could use the data that had already been determined to research specific diseases and conditions. He said that after a vigorous debate, the researchers agreed to post their data daily.
“The data could be useful to someone who was researching a specific disease,” Collins said. “There is no excuse to keep the data hidden away.”
He said this has become the standard procedure in the industry. While researchers had been concerned about getting “scooped,” Collins said they soon realized that posting the information on the internet time-stamps their work and therefore protects their authorship. There was also concern about patient confidentiality, he said, but protocols for protecting the anonymity of research subjects were put into place.
While some researchers agreed with this standard, he said others objected. The NIH settled the issue when it made the posting of the data a condition of its support.
“This could give researchers insight into pathways for little-understood diseases,” he said. “It gave them a chance to build upon the data.”
Data science can also benefit patients by attracting researchers to less common maladies, he noted. Patients, who can now transfer their own medical records to researchers, are more involved in research today than they were in the past, and they are eager to see the data on their conditions, he said. Patients are also not shy about telling researchers what they want, he added.
Collins advised life science researchers working with big data to be cautious in forming initial hypotheses, because early assumptions may lead them astray. He suggested researchers let the data work out the patterns, and then seek to form and model their theories.
Collins received a standing ovation for his talk.