Humans outdo AI when it comes to understanding or describing complex social interactions from moving images, a quintet of researchers at Johns Hopkins University found after testing about 350 artificial intelligence language, video and image models.
That could have repercussions for companies looking to deploy AI in real-world settings that require constant processing of moving images, such as robots that support or interact with humans in manufacturing and healthcare, or self-driving cars.
“If you think about AI that needs to operate in any real-world setting, and what we do as humans, we are constantly processing moving images,” said Leyla Isik, a professor of cognitive science at Johns Hopkins University, and a co-author of the paper being presented Thursday at an AI conference in Singapore. “And as humans, we’re very good at integrating these dynamic social cues.”
Isik said it was striking how badly the AI models did when it came to these cues.
AI models are becoming more sophisticated, with a recent Stanford University report finding that they are mastering new benchmarks quickly. Still, complex reasoning remains an issue.
The research indicates a major deficiency in AI models and could guide future model development, the authors of the paper said.
“Together, these results identify a major gap in AI’s ability to match human social vision and provide insights to guide future model development for dynamic, natural contexts,” they concluded.