ISCA Medal for Scientific Achievements 2017

Fumitada Itakura

Professor, Nagoya University, Japan

Biography

Fumitada Itakura was born in Toyokawa, Japan, in August 1940. He studied electronic engineering at Nagoya University from 1958 to 1963, then advanced to its graduate school, where he studied information engineering topics such as statistical optical character recognition and time-series analysis of cardiac rhythmicity. After finishing his master’s degree in 1965, he began working on speech signal processing using statistical approaches. He received the Doctor of Engineering degree from Nagoya University in 1971 for his work on a statistical method for speech analysis and synthesis.
Itakura’s early work on speech spectral envelope and formant estimation using maximum likelihood methods (1967) laid the groundwork for much of the research in speech signal processing over the three subsequent decades, ranging from vocoder designs for low bit-rate transmission to distance measures (the Itakura-Saito distance) for speech pattern recognition. He introduced the concepts of the autoregressive model and the partial autocorrelation to the speech field and developed one of the first mathematically tractable formulations of the speech recognition problem based on the minimum prediction residual principle, providing a solid framework for integrating speech analysis, representation, and pattern matching into a complete engineering system. His work on the autoregressive modeling of speech is used in almost every low- to medium-bit-rate speech transmission system. The Line Spectral Pair (LSP) representation, which he developed in 1975, is now used in nearly every cellular phone system and handset. Itakura and Hong Wang’s more recent work on sub-band dereverberation algorithms has also become the foundation of many new breakthroughs. His singular and yet broad contributions to speech signal processing earned him the IEEE Morris Liebmann Award in 1986, the most prestigious Society Award from the IEEE Signal Processing Society in 1996, IEEE Fellow status in 2003, the Purple Ribbon Medal from the Japanese government in 2003, and the Distinguished Achievement and Contributions Award from IEICE in 2003. These technical achievements were carried out mainly at Nagoya University (1965-68), the 4th research section of the Musashino Electrical Communication Laboratory of NTT (1963-73, 1975-1983), the Acoustics Research Laboratory (1973-75) of Bell Telephone Laboratories, Murray Hill, Nagoya University again (1983-2003), and Meijo University (2003-2011).
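As a brief aside for readers unfamiliar with it (the expression below is the standard textbook form, not a quotation from the work cited above), the Itakura-Saito distance between a measured power spectrum P(ω) and a model spectrum P̂(ω) is commonly written as

\[
d_{IS}(P, \hat{P}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left[ \frac{P(\omega)}{\hat{P}(\omega)} - \log \frac{P(\omega)}{\hat{P}(\omega)} - 1 \right] d\omega ,
\]

a quantity that is non-negative and vanishes only when the two spectra coincide.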

Keynote Speakers

James Allen

Professor of Computer Science, University of Rochester
Associate Director of the Institute for Human and Machine Cognition, Pensacola, Florida

Dialogue as Collaborative Problem Solving

I will describe the current status of a long-term effort at developing dialogue systems that go beyond simple task-execution models to systems that involve collaborative problem solving. Such systems involve open-ended discussion, and the tasks cannot be accomplished without extensive interaction (e.g., 10 turns or more). The key idea is that dialogue itself arises from an agent’s ability for collaborative problem solving (CPS). In such dialogues, agents may introduce, modify, and negotiate goals; propose and discuss the merits of possible paths to solutions; explicitly discuss progress as the two agents work towards the goals; and evaluate how well a goal was accomplished. To complicate matters, user utterances in such settings are much more complex than those seen in simple task-execution dialogues and require full semantic parsing. A key question we have been exploring in the past few years is how much of dialogue can be accounted for by domain-independent mechanisms. I will discuss these issues and draw examples from a dialogue system we have built that, except for the specialized domain reasoning required in each case, uses the same architecture to perform three different tasks: collaborative blocks-world planning, in which the system and user build structures and may have differing goals; biocuration, in which a biologist and the system interact in order to build executable causal models of biological pathways; and collaborative composition, in which the user and system collaborate to compose simple pieces of music.

Biography

James Allen is the John H. Dessauer Professor of Computer Science at the University of Rochester and Associate Director of the Institute for Human and Machine Cognition in Pensacola, Florida. He is a Founding Fellow of the American Association for Artificial Intelligence (AAAI) and a Fellow of the Cognitive Science Society. He was editor-in-chief of the journal Computational Linguistics from 1983 to 1993 and authored the well-known textbook “Natural Language Understanding”. His research concerns defining computational models of intelligent collaborative and conversational agents, with a strong focus on the connection between knowledge, reasoning, and language comprehension and dialogue.

Catherine Pelachaud

Director of Research at CNRS, ISIR, University of Pierre and Marie Curie

Conversing with social agents that smile and laugh

Our aim is to create virtual conversational partners. To that end we have developed computational models that enrich virtual characters with socio-emotional capabilities communicated through multimodal behaviors. The approach we follow to build interactive and expressive interactants relies on theories from the human and social sciences as well as on data analysis and user-perception-based design. We have explored specific social signals such as smiles and laughter, capturing their variation in production but also their different communicative functions and their impact on human-agent interaction. Lately we have been interested in modeling agents with social attitudes; the goal is to model how social attitudes color the multimodal behaviors of the agents. We have gathered a corpus of dyadic interactions that was annotated along two layers: social attitudes and nonverbal behaviors. By applying sequence-mining methods we have extracted behavior patterns involved in a change in the perceived attitude, the behaviors we are particularly interested in capturing. In this talk I will present the GRETA/VIB platform in which our research is implemented.

Biography

Catherine Pelachaud is a Director of Research at CNRS in the laboratory ISIR, University of Pierre and Marie Curie. Her research interests include embodied conversational agents, nonverbal communication (face, gaze, and gesture), expressive behaviors, and socio-emotional agents. With her research team, she has been developing GRETA, an interactive virtual agent platform that can display emotional and communicative behaviors. She has been, and is still, involved in several European projects related to believable embodied conversational agents, emotion, and social behaviors. She is an associate editor of several journals, among them IEEE Transactions on Affective Computing, ACM Transactions on Interactive Intelligent Systems, and the Journal on Multimodal User Interfaces. She has co-edited several books on virtual agents and emotion-oriented systems. She has participated in the organization of international conferences such as IVA, ACII, and the virtual agent track of AAMAS. She is the recipient of the ACM SIGAI Autonomous Agents Research Award 2015.

Björn Lindblom

Professor emeritus, University of Stockholm, Sweden
Professor emeritus, University of Texas at Austin, USA

Re-inventing speech – the biological way

The mapping of the Speech Chain has so far focused on the experimentally more accessible links (e.g., acoustics), whereas the brain’s activity during speaking and listening has understandably received less attention. That state of affairs is about to change, thanks to the sophisticated new tools offered by brain-imaging technology.
At present, many key questions concerning human speech processes remain incompletely understood despite the significant research efforts of the past half-century. As speech research goes neuro, we could do with some better answers.
In this paper I will attempt to shed light on some of these issues. I will do so by heeding the advice that Tinbergen once gave his fellow biologists on explaining behavior. I paraphrase: nothing in biology makes sense unless you simultaneously look at it with the following questions at the back of your mind: How did it evolve? How is it acquired? How does it work here and now?
Applying the Tinbergen strategy to speech, I will, in broad strokes, trace a path from the small and fixed innate repertoires of non-human primates to the open-ended vocal systems that humans learn today.
Such an agenda will admittedly identify serious gaps in our present knowledge but, importantly, it will also bring an overarching possibility into view:

It will strongly suggest the feasibility of bypassing the traditional linguistic, operational approach to speech units and replacing it with a first-principles account anchored in biology.

I will argue that this is the road map we need for a more profound understanding of the fundamental nature of spoken language and for educational, medical, and technological applications.

Biography

I began by studying for a medical degree, but gradually my focus shifted to music and languages. Planning to make a living as a foreign-language teacher, I attended classes that happened to include two lectures on acoustic phonetics by Gunnar Fant at KTH in Stockholm. ‘Anyone interested in a summer job? We could use people with a linguistics background.’ He then went on to describe the project. Although I cannot honestly say that I had understood much of the lectures, I volunteered and got lucky. I was completely blown away by the dynamics of the KTH lab and its research activities. This was the early sixties, the post-World War II era with lavish funding for communications and computer technology.

Later in life, I came across an anecdote about Richard Feynman, the famous physicist, who is said to have left the following formulation permanently on the blackboard of his office: ‘What I cannot create I do not understand!’

Bingo! Was he referring to the acoustic theory of speech production and copy speech synthesis? In a way, he could have been. More importantly, I believe, in this short phrase he managed to capture the ultimate essence of good science: general knowledge based on first principles. It has been at the back of my mind for over fifty years as I have studied how spoken language works on-line, how it is learned, and how it came to be.

Applying the Feynman criterion to our own broad field shows that we still have a long way to go. There would be nothing wrong with embarking on that voyage equipped with the tools of Big Data and modern high-tech neuroscience; on the contrary. But ultimately the quality of our applications (e.g., clinical, educational) will be a function of how well we really understand how humans do it.

End of sermon. Chop, chop.