“It encompasses various disciplines such as systems engineering and human factors where we try to understand the human behaviors, then from there to artificial intelligence and machine learning,” he said. “We model the human behaviors, their interaction with the environment, with others in groups. We work on various perceptions, decision-making and control algorithms on the robot side. We do a lot of manipulation and navigation on the robot side so that the robot can help the human to perform the task better.”
Iqbal, leader of the Collaborative Robotics Lab, uses multiple cameras along with physiological sensors such as smartwatches to track and record both gross and subtle human movements, gestures and expressions. He cited a factory setting where humans work alongside “cobots,” or collaborative robots: the humans perform fine motor tasks while the robots handle gross motor tasks, such as fetching tools for a human worker. Iqbal wants artificial intelligence and machine learning to inform the robot of what is expected of it.
“We want the robotic manipulator to try to understand what activity humans are performing, which phases of that activity the human is currently in, and then go and bring the object that the human will need in the near future so that the human doesn’t need to go back and forth,” Iqbal said.
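In broad strokes, that kind of anticipatory assistance can be thought of as two steps: estimate which phase of the task the human is in, then look up what the next phase will require. The sketch below illustrates the idea in Python; the phase names, the phase-to-tool mapping and the classifier output are purely hypothetical placeholders, not the lab's actual models.

```python
# Minimal sketch of anticipatory tool fetching, assuming a known task plan.
TASK_PHASES = ["unpack", "assemble_frame", "wire_harness", "inspect"]

# Hypothetical mapping from an upcoming phase to the tool the human will need.
NEXT_TOOL = {
    "assemble_frame": "hex_wrench",
    "wire_harness": "wire_stripper",
    "inspect": "torque_gauge",
}

def estimate_phase(phase_probs):
    """Pick the most likely current phase from a classifier's output."""
    return max(phase_probs, key=phase_probs.get)

def tool_to_prefetch(phase_probs):
    """Return the tool needed in the next phase, so the robot can fetch it early."""
    current = estimate_phase(phase_probs)
    idx = TASK_PHASES.index(current)
    if idx + 1 < len(TASK_PHASES):
        return NEXT_TOOL.get(TASK_PHASES[idx + 1])
    return None

# Example: perception believes the human is assembling the frame,
# so the robot should already go get the wire stripper.
probs = {"unpack": 0.05, "assemble_frame": 0.8, "wire_harness": 0.1, "inspect": 0.05}
print(tool_to_prefetch(probs))  # wire_stripper
```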
He acknowledged that it is difficult to translate the plethora of human social cues into something a machine can understand.
“Whenever we try to build something to understand human behavior, it always changes,” Iqbal said. “Understanding the human intent is so hard of a problem itself, because we are expressing ourselves in so many immense ways, and capturing all those is very hard. In many cases, every time we learn something new, it’s hard to teach the machine how to interpret the human intent.”
Part of the difficulty is the variety of human expression.
“Whatever I’m saying is not just the message that I’m passing,” Iqbal said. “I’m passing a lot of my messages with my gestures, so just understanding the verbal message is not sufficient. If I am saying, ‘Give me that thing,’ from just the audio, there is no way for you to know which thing I’m referring to because I’m referring to some objects with my hand gesture.”
To overcome this, Iqbal works with “multimodal representation learning,” which combines verbal messages; nonverbal cues such as pointing, eye gaze and head motion; and even physiological signals, such as heart rate and skin temperature dynamics.
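One common way to realize this idea is to give each modality its own encoder and fuse the resulting embeddings into a single joint representation that an intent model can use. The PyTorch sketch below is a generic illustration of that late-fusion pattern; the layer sizes and feature dimensions are assumptions, not a description of Iqbal's system.

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Toy late-fusion model: each modality gets its own encoder, and the
    per-modality embeddings are concatenated into a joint representation."""
    def __init__(self, speech_dim=64, gesture_dim=32, physio_dim=8, joint_dim=128):
        super().__init__()
        self.speech_enc = nn.Sequential(nn.Linear(speech_dim, 64), nn.ReLU())
        self.gesture_enc = nn.Sequential(nn.Linear(gesture_dim, 64), nn.ReLU())
        self.physio_enc = nn.Sequential(nn.Linear(physio_dim, 16), nn.ReLU())
        self.joint = nn.Linear(64 + 64 + 16, joint_dim)

    def forward(self, speech, gesture, physio):
        # Encode each modality separately, then concatenate and project.
        z = torch.cat([self.speech_enc(speech),
                       self.gesture_enc(gesture),
                       self.physio_enc(physio)], dim=-1)
        return self.joint(z)

# Example: one time step of features for a single person.
model = MultimodalFusion()
speech = torch.randn(1, 64)   # e.g., an embedding of "Give me that thing"
gesture = torch.randn(1, 32)  # e.g., pointing direction, gaze, head pose
physio = torch.randn(1, 8)    # e.g., heart rate and skin temperature dynamics
embedding = model(speech, gesture, physio)
print(embedding.shape)  # torch.Size([1, 128])
```

The point of fusing modalities this way is exactly the problem Iqbal describes: the spoken request “Give me that thing” is ambiguous on its own, but combined with a pointing gesture or gaze direction, the joint representation can resolve which object is meant.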