Special session descriptions
These are short descriptions of the special sessions.
Acoustic Manifestations of Social Characteristics (AMSCh)
The objective of the workshop is to bring together an interdisciplinary group of researchers and engineers (phoneticians, psychologists and others) working at the interface of speech production, perception, attitude and social identity to explore questions around the human-human and human-machine interface. We aim to explore and discuss the methodologies and results of multimodal experiments investigating speech variability and speech interpretation, and their impact on modelling potential human-machine interaction. We hope to increase our knowledge of how the two complex systems of human sociality and language interact, in order to understand the workings of these attribution processes, how and in what way stereotypes are exploited in speech perception, and just what kinds of cues trigger or undo social clichés that are based on speech.
Organizers:
- Melanie Weirich (Friedrich-Schiller Univ. Jena)
- Stefanie Jannedy (ZAS Berlin)
Contact:
Melanie Weirich ()
Computational models in child language acquisition
Modeling child language acquisition no longer needs to restrict itself to single-component predictors: with the advent of large data sets, the integration of multiple components becomes possible. For the weighted modeling of multiple contributors to come closer to the complexity of child language acquisition, the field needs a synthesis at this very point. Computational scientists need to get closer to messy real-world data, and the child language researchers collecting these data need a better understanding of what is required to feed models with optimal input data. Together, they will be able to approach a more realistic description of the contribution of different factors to the complex process of child language acquisition and derive testable hypotheses from language acquisition models. This special session is a suitable forum to bring these two lines of research together and elicit fruitful discussions across disciplinary boundaries.
Organizers:
- Christina Bergmann (Ecole Normale Supérieure)
- Emmanuel Dupoux (Ecole Normale Supérieure)
- Mats Wirén (Stockholm University)
- Lisa Gustavsson (Stockholm University)
- Ellen Marklund (Stockholm University)
- Gintare Grigonyté (Stockholm University)
Contact:
Iris-Corinna Schwarz ()
Data collection, transcription and annotation issues in child language acquisition settings
With the advent of more exhaustive recording possibilities, such as daylong recordings of a child's language environment, new avenues and new challenges arise. Researchers within child language acquisition are suddenly faced with Big Data and need to adapt their analysis methods to meet the opportunities that come with this wealth of information. Wordbank <http://wordbank.stanford.edu/>, Databrary <https://nyu.databrary.org/>, the WEIRCLE workshop <https://sites.google.com/site/weircle/> and the DARCLE research group <http://darcle.org/> are all recent international initiatives to build databases and collaboration across languages that can answer even future research questions within child language acquisition.
Important aspects of the current discussion across research groups are the optimal breadth and depth of annotation in submitted materials, given limited resources, and the recurring call for automation of at least parts of the tedious and costly transcription process. This special session also incorporates methods of automated data analysis and lays the foundation for the other special session on child language acquisition at Interspeech 2017, which discusses computational modeling of child language acquisition.
Organizers:
- Elika Bergelson (Duke University)
- Alejandrina Cristia (École Normale Supérieure)
- Tove Gerholm (Stockholm University)
- Kristina Nilsson Björkenstam (Stockholm University)
- Iris-Corinna Schwarz (Stockholm University)
Contact:
Iris-Corinna Schwarz ()
Digital Revolution for Under-resourced Languages (DigRev-URL)
This special session aims to accelerate research activities for under-resourced languages, and to provide a forum for linguistic and speech technology researchers, as well as their academic and industrial counterparts, to share achievements and challenges in all areas related to natural language processing and spoken language processing of under-resourced languages, mainly those used in South, Southeast and West Asia; North and Sub-Saharan Africa; and Northern and Eastern Europe. In particular, as Interspeech 2017 is held in Sweden, we highly encourage submissions on under-resourced languages from the Nordic, Uralic, and Slavic regions. The theme of this special session is the digital revolution for under-resourced languages, including but not limited to:
- Linguistic and cognitive studies
- Resources acquisition of text and speech corpora
- Zero resource speech technologies
- Cross-lingual/multi-lingual acoustic and lexical modeling
- Code-switched lexical modeling
- Speech-to-text and speech-to-speech translation
- Speech recognition, text-to-speech synthesis, and dialog systems
- Applications of spoken language technologies for under-resourced languages
Organizers:
- Sakriani Sakti (NAIST)
- Laurent Besacier (University of Grenoble Alpes)
- Oddur Kjartansson (Google Research)
- Kristiina Jokinen (University of Helsinki)
- Alexey Karpov (SPIIRAS)
- Charl van Heerden (TensorAI)
- Shyam Agrawal (Kamrah Institute of Information Technology)
Contact:
Sakriani Sakti ()
Incremental Processing and Responsive Behaviour
This special session sets out to unite a spread-out field and provide a venue for joint discussion of incrementality itself, rather than having to focus on the individual problems tackled. We invite all researchers interested in incrementality in all its aspects, such as:
- solving specific problems in an incremental fashion
- methodology and theory of incremental processing, including its evaluation
- psycholinguistic findings and the interaction of modalities
- predictive/reactive frameworks and the integration of incrementality into systems
- researching the influence on interaction in end-to-end systems (as well as human-human interaction)
- reactive/human-centered machine learning of reactive behaviour based on partial understanding
- and more
Organizers:
- Timo Baumann (Universität Hamburg)
- Tomas Hueber (CNRS)
- David Schlangen (Bielefeld University)
Contact:
Timo Baumann ()
Speech & Human-Robot Interaction
The topic ‘Speech & Human-Robot Interaction’ encompasses many research fields; e.g., those which investigate speech in interaction, notably the characteristics of situated dialogs, turn taking and accommodation; those interested in the relationship between speech and gesture; and those working to develop platforms for human-robot communication and interaction (e.g., a key topic for sociable humanoid robots), just to name a few. The session brings together researchers from many disciplines to share techniques and investigative methods as well as research findings. It provides a forum for researchers to explore the extent to which results concerning human communication are important for enabling social machines. Conversely, it provides an opportunity for researchers working with machines (e.g., computer vision, machine learning, robot design) to showcase developments in their fields. The feedback between the two communities will be stimulating and rewarding.
Organizers:
- Gérard Bailly (Univ. Grenoble-Alpes)
- Gabriel Skantze (KTH)
- Samer Al Moubayed (Furhat Robotics)
Contact:
Gérard Bailly ()
Speech Technologies for Code-Switching in Multilingual Communities
The special session will have oral presentations and a poster session. Topics of interest for this special session include but are not limited to:
- Speech Recognition of code-switched speech
- Language Modeling for code-switched speech
- Speech Synthesis of code-switched text
- Speech Translation of code-switched languages
- Spoken Dialogue Systems that can handle code-switching
- Speech data and resources for code-switching
- Language Identification from speech
Organizers:
- Kalika Bali (Microsoft Research India)
- Alan W Black (Carnegie Mellon)
Contact:
Kalika Bali ()
State of the art in physics-based voice simulation
The physics of voice is very intricate, as it involves turbulent flows interacting with elastic solids that vibrate, deform and collide, generating acoustic waves which propagate through complex, time-varying, contorted ducts. It is very easy to pronounce a simple sound like /a/, but in doing so we are unaware of the tremendous range of physical phenomena that occur in the voice organ. On the one hand, in this special session we welcome papers on numerical approaches to voice production. These include finite element and finite difference methods, as well as multimodal and waveguide approaches, among others. On the other hand, papers on experimental mechanical replicas that can elucidate aspects of voice generation will also be appreciated. The scope of the session is wide, ranging from the flow-driven oscillation of the vocal folds to the generation of static sounds such as vowels, nasals and fricatives, the production of dynamic sounds such as plosives, diphthongs and syllables, and expressivity effects that may be simulated on physical grounds.
Organizers:
- Sten Ternström (KTH)
- Oriol Guasch (Universitat Ramon Llull)
Contact:
Sten Ternström ()
The 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof)
With four years having passed since the first edition, two years since the last ASVspoof evaluation, and a growing research community working to develop spoofing countermeasures, it is time for the next step. The INTERSPEECH 2017 special session on Automatic Speaker Verification Spoofing and Countermeasures (ASVspoof) will break the mould whereby countermeasures are developed using high-quality TTS and VC data, and include attacks of varied technical quality as well as, importantly, more challenging ‘unknown’ spoofing conditions that might not be seen in countermeasure training data. Attention will also be paid to detector output consistency across unseen and seen replay attack configurations, possibly dealing with a mixture of replay, TTS and VC attacks. With this approach to ASVspoof 2017, the organizers hope to promote the development of generalised spoofing countermeasures with practical potential ‘in the wild’. In contrast to a recent competing challenge (https://www.idiap.ch/dataset/avspoof), which included a mixture of different spoofing attacks collected by the same research team under homogeneous acoustic conditions, ASVspoof 2017 includes replay data collected through a crowd-sourcing exercise across multiple locations and replay-rerecording device combinations, leading to more diverse attacks.
Organizers:
- Tomi Kinnunen (University of Eastern Finland)
- Nicholas Evans (EURECOM)
- Junichi Yamagishi (University of Edinburgh)
- Kong Aik Lee (Institute for Infocomm Research)
- Md Sahidullah (University of Eastern Finland)
- Massimiliano Todisco (EURECOM)
- Héctor Delgado (EURECOM)
Contact:
Tomi Kinnunen ()
The INTERSPEECH 2017 Computational Paralinguistics ChallengE (ComParE)
The Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) is an open Challenge dealing with states of speakers as manifested in the acoustic properties of their speech signal. There have so far been eight consecutive Challenges at INTERSPEECH since 2009 (cf. http://compare.openaudio.eu/), but a multiplicity of highly relevant paralinguistic phenomena remains uncovered. We therefore introduce three new, so far less touched-upon tasks: the first-of-its-kind Addressee Sub-Challenge, directly contributing to Interspeech 2017’s theme Situated Interaction, and the novel health- and well-being-related Cold Sub-Challenge and Snoring Sub-Challenge. We further revisit sleepiness in the Drowsiness Sub-Challenge. Situated interaction benefits from knowing who is addressed in communication; likewise, interaction becomes more efficient if an interface is aware of a user’s drowsiness or of a user suffering from a cold. In addition, the value of the health-related tasks speaks for itself. The Snoring Sub-Challenge also introduces, for the first time, a purely non-speech, yet vocal, inspiratory sound. A challenge is usually a great occasion to increase attention to the tasks and unite expertise from different areas to advance the field.
Organization:
- Björn W. Schuller (University of Passau)
- Stefan Steidl (Friedrich-Alexander-University)
- Anton Batliner (University of Passau)
- Elika Bergelson (Duke University)
- Jarek Krajewski (University of Wuppertal)
- Christoph Janott (Technische Universität München)
Contact:
Björn Schuller ()
Voice Attractiveness
This special session on ‘voice attractiveness’ is the perfect setting for presenting research dealing with:
- perceived vocal preferences of men, women, and synthesized voices in well-defined social situations
- acoustic correlates of voice attractiveness/pleasantness/charisma
- interrelations between vocal features and individual physical and physiological characteristics
- consequences for sexual selection
- the predictive value of voice for personality and other psychological traits
- experimental definition of aesthetic standards for the vocal signal
- cultural variation of voice attractiveness/pleasantness and its standards
- the link between vocal pathology and vocal characteristics
Organizers:
- Melissa Barkat-Defradas (University of Montpellier)
- Benjamin Weiss (TU Berlin)
- Jürgen Trouvain (Saarland University)
- John Ohala (ICSI)
Contact:
Melissa Barkat-Defradas ()