Press Release
NSF PR 97-14 - February 18, 1997

This material is available primarily for archival purposes. Telephone numbers or other contact information may be out of date; please see current contact information at media contacts.

Can Computers Communicate Like People Do?

Imagine two people at a table in a restaurant. Are they leaning intimately toward each other or sitting stiffly? Are they gazing dreamily at each other or avoiding eye contact? How are they gesturing? What are they saying, and with what tone of voice? A mere glance and a snippet of conversation make it easy for a person to guess the situation quite accurately: is it lovers, friends having an argument, or a business meeting?

Humans far exceed computers in their ability to process many different types of information -- images, words and intonation, posture and gestures, and written language -- and to draw a conclusion from them all. More "natural" interactions with "smarter" computers will make them accessible to a broader range of people (including people with disabilities) in a wider range of settings, while making them more useful in helping people sort through and synthesize the glut of available information.

A set of 15 awards in a new $10 million program led by the National Science Foundation -- Speech, Text, Image and Multimedia Advanced Technology Effort (STIMULATE) -- will fund university researchers investigating human communication and seeking to improve our interaction with computers. Four agencies are participating: NSF, the National Security Agency Office of Research and SIGINT Technology, the Central Intelligence Agency Office of Research and Development, and the Defense Advanced Research Projects Agency Information Technology Office.

"This program goes well beyond the era of graphical interfaces with our computers," said Gary Strong, NSF program manager. "Perhaps some day we can interact with our computers like we interact with each other, even having 'intelligent' computer assistants. STIMULATE has the potential for enormous impact on anyone who must process large amounts of data, as well as for people with disabilities, the illiterate and others who might not be able to use a computer keyboard."

Funded projects include: a filter for TV, radio and newspaper accounts that will quickly provide a user with a synopsis; a computerized translation program; and a "humanoid" computer that will understand human communication, including facial expressions, gestures and speech intonation. Other projects include speech recognition, understanding handwriting, and indexing and retrieving video.

Attachment: List of STIMULATE awardees

Attachment
STIMULATE Awards

Contact: Beth Gaston, NSF (703) 305-1070
Midge Holmes, CIA (703) 482-6686
Judith Emmel, NSA Public Affairs (301) 688-6524

- Alfred Aho, Shih Fu Chang and Kathleen McKeown
Columbia University, (212) 939-7004, [email protected]
An Environment for Illustrated Briefing and Follow-up Search Over Live Multimedia Information
Researchers seek to provide up-to-the-minute briefings on topics of interest, linking the user into a collection of related multimedia documents. On the basis of a user profile or query, the system will sort multimedia information to match the user's interests, retrieving video, images and text. The system will automatically generate a briefing on information extracted from the documents and determined to be of interest to the user.
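
To make the filtering idea concrete, here is a minimal sketch, in Python and on invented data, of ranking items against a user's interest profile by simple keyword overlap; the documents, scoring scheme and function names are illustrative assumptions, not the Columbia system's design.

```python
# Hypothetical illustration of profile-based filtering: rank documents by
# overlap between a user's interest profile and each document's words.
# This is not the Columbia system, just a sketch of the general idea.

def tokenize(text):
    """Lowercase a text and split it into a set of word tokens."""
    return set(text.lower().split())

def rank_by_profile(profile, documents):
    """Return documents sorted by how many profile terms they contain."""
    profile_terms = tokenize(profile)
    scored = []
    for doc_id, text in documents.items():
        overlap = len(profile_terms & tokenize(text))
        scored.append((overlap, doc_id))
    return sorted(scored, reverse=True)

if __name__ == "__main__":
    docs = {
        "clip-01": "video of flooding along the river after heavy rain",
        "wire-02": "stock markets close higher on technology earnings",
        "photo-03": "aerial image of flood damage to downtown bridges",
    }
    for score, doc_id in rank_by_profile("flood damage river bridges", docs):
        print(score, doc_id)
```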

- James Allan and Allen Hanson
University of Massachusetts, Amherst, (413) 545-3240, [email protected]
Multi-Modal Indexing, Retrieval, and Browsing: Combining Content-Based Image Retrieval with Text Retrieval
In the rapidly emerging area of multimedia information systems, effective indexing and retrieval techniques are critically important. In this project, the Center for Intelligent Information Retrieval will develop a system to index and retrieve collections including combinations of images, video and text.
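
As an illustration of combining image and text evidence in retrieval, the following hedged sketch scores each item by a weighted mix of caption-word overlap and color-histogram similarity; the features, weights and data are invented for illustration and do not reflect the UMass system.

```python
# Hypothetical sketch of multi-modal retrieval: each item gets a text score
# (keyword overlap) and an image score (similarity of a simple color
# histogram to the query image), and the two are combined with a weight.

def text_score(query_words, caption):
    caption_words = set(caption.lower().split())
    return len(set(query_words) & caption_words) / max(len(query_words), 1)

def histogram_similarity(h1, h2):
    """Histogram intersection of two normalized color histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def combined_score(query_words, query_hist, item, alpha=0.5):
    """Weighted combination of text and image evidence (alpha is a free choice)."""
    return (alpha * text_score(query_words, item["caption"])
            + (1 - alpha) * histogram_similarity(query_hist, item["histogram"]))

if __name__ == "__main__":
    collection = [
        {"id": "img-1", "caption": "sunset over the bay", "histogram": [0.7, 0.2, 0.1]},
        {"id": "img-2", "caption": "city traffic at night", "histogram": [0.1, 0.3, 0.6]},
    ]
    query_words, query_hist = ["sunset", "bay"], [0.6, 0.3, 0.1]
    ranked = sorted(collection,
                    key=lambda item: combined_score(query_words, query_hist, item),
                    reverse=True)
    print([item["id"] for item in ranked])
```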

- Jaime Carbonell
Carnegie Mellon University, (412) 268-3064, [email protected]
Generalized Example-based Machine Translation
With example-based machine translation, computers search pre-translated texts for the closest match to each new sentence being translated. The goal of this project is to develop generalizations that will increase the accuracy of translations and reduce the size of the necessary database.
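
The closest-match lookup at the heart of example-based translation can be sketched in a few lines; the toy translation memory and word-overlap similarity below are illustrative assumptions, and the sketch shows only the retrieval step, not the generalizations the CMU project aims to develop.

```python
# Hypothetical sketch of example-based machine translation: given a new
# source sentence, find the most similar sentence in a store of
# previously translated pairs and reuse its translation as a starting point.

translation_memory = [
    ("where is the train station", "ou est la gare"),
    ("where is the bus stop", "ou est l'arret de bus"),
    ("how much does this cost", "combien ca coute"),
]

def similarity(a, b):
    """Word-overlap similarity between two sentences (Jaccard index)."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

def closest_example(sentence):
    """Return the (source, translation) pair most similar to the input."""
    return max(translation_memory, key=lambda pair: similarity(sentence, pair[0]))

if __name__ == "__main__":
    src, tgt = closest_example("where is the station")
    print("closest example:", src)
    print("reused translation:", tgt)
```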

- Justine Cassell
MIT, (617) 253-4899, [email protected]
A Unified Framework for Multimodal Conversational Behaviors in Interactive Humanoid Agents
Humans communicate using speech with intonation and modulation, gestures, gaze and facial expression. Researchers will study how humans interact and will develop a humanoid computer that can produce human-like communicative behaviors and comprehend complex communication on the part of humans.

- Charles Fillmore
International Computer Science Institute, UC Berkeley, [email protected]
Tools for Lexicon Building
This project has two parts: computational tools for language research, and a thesaurus-like database of English words giving definitions, how each word relates to other similar words, and the range of each word's use. The tools and the database will be useful for researchers studying language processing and speech recognition.

- James Flanagan, Casimir Kulikowski, Joseph Wilder, Grigore Burdea and Ivan Marsic
Rutgers University, (908) 445-3443, [email protected]
Synergistic Multimodal Communication in Collaborative Multiuser Environments
Digital networking and distributed computing open opportunities for collaborative work by geographically separated participants. But participants must communicate with one another and with the machines they are using. The sensory dimensions of sight, sound and touch, used in combination, are natural modes of communication for humans. This research will establish computer interfaces that simultaneously use the modalities of sight, sound and touch for human-machine communication. Emerging technologies for image processing, automatic speech recognition and force-feedback tactile gloves support these multimodal interfaces.

- James Glass, Stephanie Seneff and Victor Zue
MIT, (617) 253-1640, [email protected]
A Hierarchical Framework for Speech Recognition and Understanding
Most current speech recognizers use very simple representations of words and sentences. In this project, researchers aim to incorporate additional sources of linguistic information, such as the syllable, phrase and intonation, into a system which can be used for understanding conversational speech. They plan to develop a model that can be applied to many languages.

- Barbara Grosz and Stuart Shieber
Harvard University, (617) 495-3673, [email protected]
Human-Computer Communication and Collaboration
This project will develop methods for designing and building software that operates in collaboration with a human user, rather than as a passive servant. The aim is to apply theories of how people collaborate to the design of software, keeping in mind the differing capabilities of the human and computer collaborators.

- Jerry Hobbs and Andrew Kehler
SRI International, (415) 859-2229, [email protected]
Multimodal Access to Spatial Data
This project will focus on enabling computers to understand what people are referring to as they use language and gesture while interacting with computer systems that provide access to geographical information. The results will enhance the capabilities and ease of use of future interactive systems, such as systems for travel planning and crisis management.
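
One simple way language and gesture can jointly resolve a spatial reference is to let the spoken phrase narrow the category while the pointing location picks the nearest matching feature; the sketch below does exactly that on invented map data and is only an illustration, not the SRI approach.

```python
# Hypothetical sketch of multimodal reference resolution: combine a spoken
# category ("show me that airport") with a pointing gesture (x, y) to pick
# the nearest map feature of the requested type.
import math

MAP_FEATURES = [
    {"name": "Riverside Airport", "type": "airport", "x": 2.0, "y": 8.0},
    {"name": "Central Station", "type": "station", "x": 5.0, "y": 5.0},
    {"name": "Hillside Airport", "type": "airport", "x": 9.0, "y": 1.0},
]

def resolve_reference(spoken_type, gesture_x, gesture_y):
    """Return the feature of the spoken type closest to the gesture point."""
    candidates = [f for f in MAP_FEATURES if f["type"] == spoken_type]
    if not candidates:
        return None
    return min(candidates,
               key=lambda f: math.hypot(f["x"] - gesture_x, f["y"] - gesture_y))

if __name__ == "__main__":
    # User says "that airport" while pointing near map position (8, 2).
    feature = resolve_reference("airport", 8.0, 2.0)
    print(feature["name"])  # expected: Hillside Airport
```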

- Fred Jelinek, Eric Brill, Sanjeev Khudanpur and David Yarowsky
Johns Hopkins University, (410) 516-7730, [email protected]
Exploiting Nonlocal and Syntactic Word Relationships in Language Models for Conversational Speech Recognition
Interacting with computers by speech or handwriting will make computers more accessible to people with disabilities and will allow users to carry on other tasks, like querying an on-line maintenance manual while performing mechanical repairs. To recognize speech or handwriting, most mechanical systems look only at nearby words to identify unknowns, while people doing the same tasks use the entire context. This project will focus on improving the recognition accuracy for spoken and handwritten language and will provide techniques applicable to all types of language modeling.
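
To make the local-versus-nonlocal contrast concrete, the sketch below compares a bigram model, which sees only the previous word, with a simple cache estimate computed over the entire preceding history; the toy corpus and interpolation weights are assumptions for illustration, not the Johns Hopkins models.

```python
# Hypothetical sketch contrasting a local (bigram) language model with a
# simple "cache" estimate that uses the whole preceding context.
from collections import Counter, defaultdict

class BigramModel:
    """Scores a word using only the immediately preceding word."""
    def __init__(self, corpus):
        self.following = defaultdict(Counter)
        for sentence in corpus:
            words = sentence.split()
            for prev, word in zip(words, words[1:]):
                self.following[prev][word] += 1

    def probability(self, prev, word):
        counts = self.following[prev]
        total = sum(counts.values())
        return counts[word] / total if total else 0.0

def cache_probability(history, word):
    """Nonlocal evidence: how often the word appeared anywhere in the history."""
    words = history.split()
    return words.count(word) / len(words) if words else 0.0

if __name__ == "__main__":
    model = BigramModel(["the manual describes the pump",
                         "check the pump before the test"])
    history = "open the maintenance manual and find the section on the pump"
    # Interpolate the local and nonlocal estimates (weights are arbitrary here).
    p = 0.7 * model.probability("the", "pump") + 0.3 * cache_probability(history, "pump")
    print(round(p, 3))
```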

- Kathleen McKeown and Judith Klavans
Columbia University, (212) 939-7118, [email protected]
Generating Coherent Summaries of On-Line Documents: Combining Statistical and Symbolic Techniques
This project will allow computers to analyze the text from a set of related documents across many subject areas and summarize the documents. Within the summary, similarities and differences between documents will be highlighted, indicating what each document is about. The research will be part of a digital library project emphasizing aids for reducing information overload.
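
A hedged sketch of the statistical side of such summarization appears below: sentences are scored by the frequency of their content words across the related documents, and the top-scoring sentences are kept. The stopword list, data and scoring are invented for illustration, and the symbolic comparison of documents is not shown; none of this reflects the Columbia system's actual design.

```python
# Hypothetical sketch of extractive summarization: score each sentence by
# how many frequent content words (shared across the document set) it
# contains, then keep the top-scoring sentences as a crude summary.
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "is", "are"}

def content_words(text):
    return [w for w in text.lower().replace(".", "").split() if w not in STOPWORDS]

def summarize(documents, num_sentences=2):
    # Count content words over the whole document set.
    frequencies = Counter(w for doc in documents for w in content_words(doc))
    # Split the documents into sentences and score each one.
    sentences = [s.strip() for doc in documents for s in doc.split(".") if s.strip()]
    scored = sorted(sentences,
                    key=lambda s: sum(frequencies[w] for w in content_words(s)),
                    reverse=True)
    return scored[:num_sentences]

if __name__ == "__main__":
    docs = [
        "The council approved the new budget. The budget funds road repairs.",
        "Road repairs will begin in the spring. Critics say the budget is too small.",
    ]
    for sentence in summarize(docs):
        print("-", sentence)
```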

- Mari Ostendorf
Boston University, (617) 353-5430, [email protected]
Modeling Structure in Speech above the Segment for Spontaneous Speech Recognition
Current speech recognition technology leads to unacceptably high error rates of 30-50 percent on natural conversational or broadcast speech, in large part because current models were developed on read speech and do not account for variability in speaking style. This project aims to improve recognition performance by representing structure in speech at the level of the syllable and the phrase, and across different speakers.

- Francis Quek and Rashid Ansari
University of Illinois at Chicago, (312) 996-5494, [email protected]
Gesture, Speech and Gaze in Discourse Management
This project involves experiments to discover and quantify the cues to human communication, including the role of gestures, speech intonation and gaze, and the development of computer programs capable of recognizing such cues in videos.

- Elizabeth Shriberg and Andreas Stolcke
SRI International, (415) 859-3798, [email protected]
Modeling and Automatic Labeling of Hidden Word-Level Events in Speech
Most computer systems that process natural language require input that resembles written text, such as one would read in a newspaper. Spoken discourse, however, differs from text in ways that present challenges to computers. One challenge is that speech does not contain explicit punctuation such as periods to separate sentences. Another challenge is that when people speak naturally, they say things like "um," "uh," "you know" and other word-level events which interrupt the formal structure of sentences. This project will use word patterns as well as the timing and melody of speech to identify sentence boundaries and nongrammatical events to help computers better understand natural speech.
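
As a rough illustration of combining lexical and prosodic cues, the sketch below flags a likely sentence boundary where a long pause coincides with a common sentence-starting word; the pause threshold, cue-word list and weights are invented for illustration and are not the SRI models.

```python
# Hypothetical sketch: flag a word boundary as a likely sentence boundary
# when a long pause (prosodic cue) coincides with a common sentence-initial
# word following it (lexical cue). Thresholds and weights are arbitrary.

SENTENCE_STARTERS = {"so", "well", "then", "okay", "i", "we"}

def boundary_score(pause_seconds, next_word):
    """Combine a prosodic cue (pause length) with a lexical cue (next word)."""
    prosodic = min(pause_seconds / 0.5, 1.0)        # saturate at a 0.5 s pause
    lexical = 1.0 if next_word.lower() in SENTENCE_STARTERS else 0.0
    return 0.6 * prosodic + 0.4 * lexical

def find_boundaries(words, pauses, threshold=0.6):
    """words[i] is followed by a pause of pauses[i] seconds."""
    boundaries = []
    for i in range(len(words) - 1):
        if boundary_score(pauses[i], words[i + 1]) >= threshold:
            boundaries.append(i)
    return boundaries

if __name__ == "__main__":
    words = ["we", "fixed", "the", "pump", "so", "that's", "done"]
    pauses = [0.05, 0.05, 0.02, 0.60, 0.05, 0.10, 0.0]
    print(find_boundaries(words, pauses))  # boundary expected after "pump"
```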

- Yao Wang and Edward Wong
Brooklyn Polytechnic University, (718) 260-3469, [email protected]
Video Scene Segmentation and Classification Using Motion and Audio Information
A video sequence includes many different types of information: speech, text, audio, color patterns and shapes in individual frames, and the movement of objects as shown by changes between frames. Humans can quickly interpret this information, but computer understanding of video is still primitive. The aim of this project is to develop new theory and techniques for scene segmentation and classification in a video sequence, which will have direct applications in information indexing and retrieval in multimedia databases, spotting and tracking of special events in surveillance video, and video editing.
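
A minimal sketch of motion-based segmentation, on invented data: a cut is reported wherever the color histograms of consecutive frames differ by more than a threshold. The histograms and threshold are placeholders and audio cues are omitted; this is only an illustration, not the Brooklyn Polytechnic techniques.

```python
# Hypothetical sketch of shot-boundary detection: compare simple color
# histograms of consecutive frames and report a cut where the change is large.

def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def detect_cuts(frame_histograms, threshold=0.5):
    """Return indices i where a cut occurs between frame i and frame i + 1."""
    cuts = []
    for i in range(len(frame_histograms) - 1):
        if histogram_distance(frame_histograms[i], frame_histograms[i + 1]) > threshold:
            cuts.append(i)
    return cuts

if __name__ == "__main__":
    # Three similar "outdoor" frames followed by two similar "indoor" frames.
    frames = [
        [0.70, 0.20, 0.10], [0.68, 0.22, 0.10], [0.69, 0.21, 0.10],
        [0.10, 0.30, 0.60], [0.12, 0.28, 0.60],
    ]
    print(detect_cuts(frames))  # expect a cut between frames 2 and 3
```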

National Science Foundation
Office of Legislative and Public Affairs
4201 Wilson Boulevard
Arlington, Virginia 22230, USA
Tel: 703-292-8070
FIRS: 800-877-8339 | TDD: 703-292-5090