AI technology has become increasingly sophisticated in recent years. So many products and services now rely on it for automation and intelligence that it is tightly woven into our everyday world. Whether in the devices that make our homes more convenient or in the way everyday products are manufactured, AI's impact is everywhere, driving innovation in just about every aspect of our lives. But missing pieces remain that still frustrate end users and present significant challenges for researchers trying to improve how AI technology performs.
Before his passing in 2018, Microsoft co-founder Paul Allen dedicated considerable time and resources to an essential challenge that comes up again and again: the fundamental lack of common sense in AI technologies. Mr. Allen, whose Allen Institute for Artificial Intelligence (AI2) launched Mosaic to continue addressing this problem, framed it like this:
“Early in AI research, there was a great deal of focus on common sense, but that work stalled. AI still lacks what most 10-year-olds possess: ordinary common sense. We want to jump start that research to achieve major breakthroughs in the field.”
Allen’s comparison highlights a big problem with the current state of deep learning technologies. As intelligent as our AI products often are, they still can’t answer extremely simple questions that we might ask a co-worker or our spouse, such as “If I paint this wall red, will it still be red tomorrow?” To illustrate how far we have to go to solve this problem, Oren Etzioni, CEO of AI2, cites the example that “…when AlphaGo beat the number one Go player in the world in 2016, the program did not know that Go is a board game.” Until this problem is solved, AI’s potential for success will be limited to narrow applications.
Complex solutions to common sense problems
It has become clear that a multi-pronged strategy for common sense AI will be necessary to break the technology out of its limitations. To this end, Allen’s Mosaic project “integrates machine-reading and reasoning, natural language understanding, computer vision, and crowdsourcing techniques to create a new extensive, foundational common sense knowledge base for future AI systems to build upon.” What does this look like at the research level for an organization like AI2?
- Visual Common Sense Reasoning (VCR) is a new task and large-scale dataset for cognition-level visual understanding. The research focuses on higher-order cognition and commonsense reasoning for AI-based vision systems. VCR is a joint effort between researchers at the University of Washington and AI2, and its data was annotated by a group of crowd workers.
- Commonsense Knowledge Graphs provide a semi-structured way of representing commonsense concepts. This structure gives a different viewpoint than other knowledge sources. However, what kinds of knowledge to represent and how best to incorporate them into modern neural networks remain important questions for research in this area. To tackle them, the team is currently building and releasing resources that explore different aspects of commonsense, such as information about social situations, mental states, and causal relationships.
- SWAG is a large-scale dataset for the task of grounded commonsense inference, unifying natural language inference and physically grounded reasoning. The dataset consists of 113k multiple-choice questions about grounded situations. Each question is a video caption from LSMDC or ActivityNet Captions, with four answer choices about what might happen next in the scene. The correct answer is the (real) video caption for the next event in the video; the three incorrect answers are adversarially generated and human-verified so as to fool machines but not humans. The team’s goal is for SWAG to be a benchmark for evaluating grounded commonsense NLI and for learning representations.
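To make the SWAG format described above more concrete, here is a minimal Python sketch of how a four-choice question could be represented and how a system might be scored on it. This is only an illustration: the class name, field names, and helper functions are assumptions made for this example, not the dataset's actual schema or tooling, and the two toy questions are invented rather than taken from SWAG.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MultipleChoiceItem:
    """One SWAG-style question (field names are illustrative assumptions)."""
    context: str        # caption describing the current scene
    endings: List[str]  # four candidate captions for what happens next
    label: int          # index of the real next caption

    def __post_init__(self) -> None:
        if len(self.endings) != 4:
            raise ValueError("expected exactly four answer choices")
        if not 0 <= self.label < 4:
            raise ValueError("label must index one of the four endings")


def accuracy(items: List[MultipleChoiceItem],
             choose: Callable[[MultipleChoiceItem], int]) -> float:
    """Fraction of items for which `choose` picks the labeled ending."""
    if not items:
        raise ValueError("need at least one item to score")
    correct = sum(1 for item in items if choose(item) == item.label)
    return correct / len(items)


def always_first(item: MultipleChoiceItem) -> int:
    """Trivial baseline with no common sense: always pick the first ending."""
    return 0


# Invented toy questions, NOT drawn from the real dataset.
toy_items = [
    MultipleChoiceItem(
        context="She opens the hood of the car.",
        endings=[
            "She looks at the engine.",
            "She swims across the lake.",
            "She eats the steering wheel.",
            "She folds the car into her pocket.",
        ],
        label=0,
    ),
    MultipleChoiceItem(
        context="He picks up a guitar.",
        endings=[
            "He drinks the guitar.",
            "He starts to play.",
            "He plants the guitar in the garden.",
            "He flies away.",
        ],
        label=1,
    ),
]

if __name__ == "__main__":
    print(f"always-first baseline accuracy: {accuracy(toy_items, always_first):.2f}")
```

With four choices per question, a system that guesses uniformly at random would be expected to score about 25%, so a benchmark of this shape is informative only to the extent that a model clearly beats that floor while the wrong answers stay obvious to people.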
How can Appen help you with your AI initiatives?
Like most machine learning projects, common sense AI often requires huge volumes of thoughtfully annotated training data. Appen’s global network of over 1 million skilled contractors operating in 130+ countries and 180+ languages and dialects means we can collect and label high volumes of image, text, speech, audio, and video data used to build and improve artificial intelligence systems.
Learn more about machine learning on Appen’s Machine Learning FAQ.