Ankur Sarker is fascinated by the power of data.
“Data collected for a while can reveal much information about the environment,” Sarker said. “It is like solving puzzles. Raw data may look very messy and meaningless, but we can extract much information after a thorough analysis.”
But you may not want some of that information extracted.
Sarker, a computer science Ph.D. student in the University of Virginia’s School of Engineering, is working on algorithms to refine raw data for specific purposes without revealing too much information about the people from whom that data is being gathered.
People are monitored in many ways – through health data, location data, and data related to the “Internet of Things” and smart cities – to help keep them healthy, determine insurance rates and generally make their lives more convenient.
To Sarker, data collection is a double-edged sword. While applications may ensure more comfort, higher reliability and efficiency, they also may expose a lot of a user’s everyday life.
Consider the activity data gathered on a smart watch.
“In collecting activity data of this kind, it doesn’t provide any location data, but over time I can analyze your data, and attract some other data, and I can determine your daily activities – when you are going to bed, when you are going to the office,” Sarker said. “In the era of artificial intelligence and big data analytics, a user is always vulnerable in sharing any data.
“We live in a society where there are many internet-connected things that will be hovering around us,” he said. “Different types of smart devices, such as wristwatches, cellular telephones, refrigerators, televisions, electric switches, vehicles and traffic signals, will be collecting different levels of data to provide comfort, safety and efficiency. We cannot ignore these ‘smart things’ ultimately, and data collection should not be a hidden practice.”
He thinks it is important to raise awareness among users, policymakers, industrialists and inventors about the potential outcomes of data collecting and sharing practices.
Sarker’s current research, in collaboration with Chenxi Qiu, an assistant professor in the Department of Computer Science at Rowan University in Glassboro, New Jersey, focuses on insurance company data about brake usage. He said automobile insurance companies offer customers applications, either for their mobile telephones or to be attached to their automobiles, to record whenever the vehicle’s brakes are applied.
“These datasets can be used for safety applications and used to determine how risky you are as a driver,” Sarker said. “It can show how often you use the brakes and how often you drive more than the speed limit. So this is trying to analyze your driving habits and use the data for your insurance policy rates. If I see that you are a very good driver, I don’t need to charge you so much for your insurance.
“But the problem is that while I am analyzing your data sets, I can also indirectly infer your locations, and then make assumptions about why you are going from one place to another place. Even though you think the brake data is not very malicious, and it is not violating any privacy, it can be used to monitor your daily activities.”
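To make that kind of inference concrete, here is a minimal sketch of how stop-and-go speed data alone could hint at a route. It is an illustration only, not Sarker’s algorithm: the function name `segment_distances`, the sample format and the 0.5 meters-per-second stop threshold are all assumptions made for this example. The idea is that adding up speed over time gives the distance driven between stops, and a sequence of such distances could in principle be compared against a road map.

```python
from typing import List, Tuple


def segment_distances(
    samples: List[Tuple[float, float]], stop_speed: float = 0.5
) -> List[float]:
    """Return the distance (meters) driven between consecutive stops.

    samples: (time in seconds, speed in meters per second), sorted by time.
    stop_speed: hypothetical threshold below which the car counts as stopped.
    """
    distances: List[float] = []
    current = 0.0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        # Trapezoidal rule: average speed over the interval times its length.
        current += 0.5 * (v0 + v1) * (t1 - t0)
        # The car was moving and has now braked to a stop: close the segment.
        if v0 > stop_speed and v1 <= stop_speed:
            distances.append(current)
            current = 0.0
    if current > 0:
        # The trip ended while the car was still moving.
        distances.append(current)
    return distances


# A made-up trip: accelerate, cruise, stop, wait, then a shorter hop.
trip = [(0, 0), (10, 10), (20, 10), (30, 0), (40, 0), (50, 10), (60, 0)]
print(segment_distances(trip))
```

In this made-up trip the car covers 200 meters before its first stop and 100 meters before its second. No GPS reading is involved, yet a pattern of distances between stops is exactly the sort of side information that could narrow down where a driver has been.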
He said some insurance companies gather GPS data on their customers, while others limit themselves to just the brake application data, and all seek to safeguard their customers’ data.
“The insurance companies are interested in this algorithm so that they can take necessary prevention techniques and ensure the location privacy of the policyholders/customers,” while still collecting information that is applicable to the rates they charge, he said.
Sarker’s algorithm filters out ancillary data, such as speed profiles, the continuous speed values from a route and the clusters of possible destinations, and it can be applied to other forms of data collection.
“This algorithm can be directly applicable to other smart city-related applications, such as speed advisories for manual or autonomous vehicles, driving style prediction and identification, fuel consumption prediction, smart traffic signal controllers, and cooperative adaptive cruise control for autonomous vehicles,” he said. “With minor modifications of the current algorithm, it can be applied to other related applications.
“In a broader view, the proposed system can be applied to health insurance providers, vehicle vendors, law enforcement agencies, departments of transportation, public relations agencies and commercial advertisement agencies.”
A native of Bangladesh, Sarker graduated from the University of Dhaka with a bachelor’s degree in computer science and engineering in 2011, and then with a master’s degree in 2014. He came to the United States to work on his Ph.D., starting at Clemson University before coming to UVA. Here he is conducting his research under adviser Haiying Shen, an associate professor of computer science.
“I have always liked solving mathematical problems,” Sarker said. “In computer science, almost everything is about solving a real-world problem using mathematics. It is a bridge between theory and practice.”
Sarker, who has published some of this research, has worked on projects other than his algorithm.
“Ankur is a very diligent student,” Brian L. Smith, chair of the Department of Engineering Systems and Environment, said. “He has played a leading role in the project that I am working on and has demonstrated his technical leadership capabilities.”
Data collection has become a commonplace element of daily life.
“We have many sensors around us, in your computer, your home, your office and now we are using mini-sensors in our own body and in your cellphone,” Sarker said. “So we are collecting a lot of data. My long-term goal is how we can harness the benefits of this and how we can design applications which are user-friendly and at the same time are not violating any privacy or security.”