UPDATED, Sept. 18, 10:30 a.m., with new photo of Don Brown.
During the last year, the University of Virginia has been working with faculty and administrators across Grounds to organize a Big Data Institute. Don Brown, William Stansfield Calcott Professor of Engineering and Applied Science, has been named the institute’s founding director. The Big Data Institute will reside in Olsson Hall and report to the Executive Vice President and Provost.
An expert on data fusion, statistical learning and predictive modeling with applications to security and safety, Brown is the principal investigator or co-principal investigator on more than 70 research contracts with federal, state and private organizations and has published more than 100 papers. He also is co-editor of the books “Operations Research and Artificial Intelligence: The Integration of Problem Solving Strategies” and “Intelligent Scheduling Systems.”
Brown has served on the National Research Council Committee on Transportation Security and is a recipient of the Norbert Wiener Award for Outstanding Research in the areas of systems engineering, data fusion, and information analysis.
UVA Today science writer Fariss Samarrai recently met with Brown to learn more about the new Big Data Institute, which is an initiative of the University’s strategic plan.
Q: What exactly is Big Data, and why does the University need an institute?
A: “Big data” is something that has multiple definitions, but it’s characterized by one or more of three attributes: amount, the rapidity with which it comes in, and its variety. Currently, many domains, including science, engineering, health care, environmental science, e-commerce and, increasingly, the humanities, generate massive amounts of data. These data are accumulating to the point where making sense of them is a huge challenge. And it’s not just about size, speed and variety, but also the complexity of the data sets, the enormous numbers of variables and the uncertainty in measurements from global environmental monitoring systems, studies of gene expression and others. Because data can come from multiple sources, they also must be integrated, creating additional complications.
So we need a Big Data Institute at the University because many of the problems we face in science, engineering, health care and the humanities require very powerful tools for answering questions across the domains. And because U.Va. is a complete university, we can combine our efforts to solve some of the most challenging problems.
Q: What are your early plans and priorities?
A: The big priority right now is to get a master’s of data science degree program in place by next fall. We also are going to push for an undergraduate minor in data science and hope to eventually implement a Ph.D. program.
We also are going to sponsor colloquia to link up researchers and research groups from across Grounds. The colloquia will start soon and continue throughout the academic year. We’ve already had two big data summits, so now we’re looking at smaller meetings focused on specific areas of interest that different groups have identified: ethics is one area; data management, data security, analytics and data integration are other key areas.
Q: What is the long-term view for the institute? What do you expect to ultimately accomplish?
A: We are going to produce students who already are in demand by employers. We are going to see faculty become engaged in research that would not be possible without an institute like this. People are going to be able to do things that are creative and insightful, and important to the country as a whole because of the kind of integrated approach to data science that the institute will allow going forward.
Q: What services will the institute provide?
A: We’re going to build up a service component that will provide advice on data management – the extraction and transport of data. We’re also going to provide startup funds for new faculty so the schools can leverage this into new programs. And we will provide what is, effectively, a stay-at-home sabbatical where people can work at the institute for a period of time on intense, interdisciplinary data science research.
Also, the classes within the master’s program will be usable by the departments, providing the departments with students who can do complex research that they otherwise might not have been able to do without this program. And there will be spinoffs from this, such as certificates for new skills.
Q: So can people come to the institute to seek help for their problems, in the way they might go to ITC with a computer issue?
A: One of our goals is to create a capability like that, where there would be support for those kinds of questions. But it is not going to happen in the first year; that’s longer term. But we are going to do something better than that – it’s not just about giving a guy a fish, but rather teaching him to fish, so we will not just give you an answer, but we will teach graduate students so they can do it on their own and bring the training and knowledge back to the schools.
Q: How will the institute be staffed?
A: It is going to be a director, an advisory board of faculty members from the schools and an administrator. All of the people who are on sabbaticals at the institute will effectively serve as full-time to part-time participants in teaching or research. Eventually, we will staff up the service support capability. The way we will do that probably will be through the schools, to staff existing capabilities there, giving the schools the capability to do much of the data science themselves.
Q: Earlier this year, the U.Va. Alumni Association’s Jefferson Trust awarded $100,000 to three multidisciplinary, student-led research projects in data science. Is there potential for more funding?
A: I’m actually the chair of the review committee for the Jeffress Trust. Last year they gave out 10 grants across the sciences in Virginia at approximately $100,000 apiece, specifically in the area of data science. The trust is doing that again this year, so that’s one that’s likely available to help start things.
The idea of the institute definitely is to make the University more competitive for winning major grants. We will be able to solve important problems for the country, so absolutely there is the potential for funding. Our goal is to become very competitive in grant areas where we have not been so competitive, and even more competitive where we already are highly competitive.
Q: What are some of those areas where you foresee greater competitiveness?
A: Clearly in the health sciences, in engineering, in the environmental sciences, social sciences and humanities there is a massive amount of data that people struggle with. There are opportunities for funding from agencies of the Department of Defense, from the National Institutes of Health and the National Science Foundation for projects that require the capability to handle big data. We will facilitate that capability.
And I think foundations will be very interested in our data ethics focus. Given the ongoing publicity about the National Security Agency’s possible use of private data, there is a great deal of interest in what that means. Because of U.Va.’s strong emphasis on ethics, we may have some creative and insightful approaches to these problems that may interest foundations with similar interests.
It’s not just the technologies and the mechanics of data science; it’s also these kinds of important policy questions we can help answer.
Q: How many other universities have institutes like this?
A: A rough count is between 10 and 20, closer to 10. But what makes U.Va. distinct is that this is a program that is not a discipline or a department or even a school. It is above that – it instead is multidisciplinary, functioning across multiple schools. Most of the programs at other universities are in a single discipline; so, for instance, you’ll see them in business, in statistics, in computer science, in engineering, but you won’t see the combination of disciplines that we have here. That’s what makes our effort special. We are planning an integrated program that exposes students to integrated projects.
Students will come to see that there are a lot of similarities between projects in how the analytics work, how the data management works, and they will gain insights that they can bring to the various disciplines. And they will each have capstone projects of their choosing – but first they will be exposed to multiple varieties of perspectives on the management and use and analysis of data.
They’re going to get that from faculty across the disciplines, not just from statistics, not just engineering, but also in the humanities, in health, in environmental science. In all these different areas, they are going to see the potential to do really good things in data science.
There’s nothing else out there like it.