Technical Program
For the most up-to-date version of the program, see the conference app.
Searchable version available here
(late changes not incorporated)
Updated: 10 August.
Date | Time | Room | Session name | Type | Papercode | Paper ID | Title | Authors |
2017-08-21 | 11:00-11:20 | A2 | Multimodal Paralinguistics | O | Mon-O-1-2-1 | 98 | Multimodal markers of persuasive speech: designing a Virtual Debate Coach | Volha Petukhova, Manoj Raju, Harry Bunt |
2017-08-21 | 11:20-11:40 | A2 | Multimodal Paralinguistics | O | Mon-O-1-2-2 | 179 | Acoustic-Prosodic and Physiological Response to Stressful Interactions in Children with Autism Spectrum Disorder | Daniel Bone, Julia Mertens, Emily Zane, Sungbok Lee, Shrikanth Narayanan, Ruth Grossman |
2017-08-21 | 11:40-12:00 | A2 | Multimodal Paralinguistics | O | Mon-O-1-2-3 | 1278 | A Stepwise Analysis of Aggregated Crowdsourced Labels Describing Multimodal Emotional Behaviors | Alec Burmania, Carlos Busso |
2017-08-21 | 12:00-12:20 | A2 | Multimodal Paralinguistics | O | Mon-O-1-2-4 | 999 | An information theoretic analysis of the temporal synchrony between head gestures and prosodic patterns in spontaneous speech | Gaurav Fotedar, Prasanta Ghosh |
2017-08-21 | 12:20-12:40 | A2 | Multimodal Paralinguistics | O | Mon-O-1-2-5 | 1088 | Multimodal Prediction of Affect Dimensions Fusing Multiple Regression Techniques | Dongyan Huang, Wan Ding, Mingyu Xu, Huaiping Ming, Xinguo Yu, Minghui Dong, Haizhou Li |
2017-08-21 | 12:40-13:00 | A2 | Multimodal Paralinguistics | O | Mon-O-1-2-6 | 1329 | Co-production of speech and pointing gestures in clear and perturbed interactive tasks: multimodal designation strategies | Marion Dohen, Benjamin Roustan |
2017-08-21 | 14:30-14:50 | A2 | Pathological Speech and Language | O | Mon-O-2-2-1 | 378 | Dominant Distortion Classification for Pre-Processing of Vowels in Remote Biomedical Voice Analysis | Amir Hossein Poorjam, Jesper Rindom Jensen, Max A. Little, Mads Græsbøll Christensen |
2017-08-21 | 14:50-15:10 | A2 | Pathological Speech and Language | O | Mon-O-2-2-2 | 626 | Automatic Paraphasia Detection from Aphasic Speech: A Preliminary Study | Duc Le, Keli Licata, Emily Mower Provost |
2017-08-21 | 15:10-15:30 | A2 | Pathological Speech and Language | O | Mon-O-2-2-3 | 819 | Evaluation of the neurological state of people with Parkinson’s disease using i-vectors | Nicanor Garcia, Juan Rafael Orozco-Arroyave, Luis Fernando D’Haro, Najim Dehak, Elmar Noeth |
2017-08-21 | 15:30-15:50 | A2 | Pathological Speech and Language | O | Mon-O-2-2-4 | 138 | Objective Severity Assessment From Disordered Voice Using Estimated Glottal Airflow | Yu-Ren Chien, Michal Borsky, Jon Gudnason |
2017-08-21 | 15:50-16:10 | A2 | Pathological Speech and Language | O | Mon-O-2-2-5 | 1007 | Earlier Identification of Children with Autism Spectrum Disorder: An Automatic Vocalisation-based Approach | Florian Pokorny, Björn Schuller, Peter Marschik, Raymond Brueckner, Pär Nyström, Nicholas Cummins, Sven Bölte, Christa Einspieler, Terje Falck-Ytter |
2017-08-21 | 16:10-16:30 | A2 | Pathological Speech and Language | O | Mon-O-2-2-6 | 1078 | Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson’s Disease | Juan Camilo Vásquez Correa, Juan Rafael Orozco-Arroyave, Elmar Noeth |
2017-08-21 | 11:00-11:20 | Aula Magna | Conversational Telephone Speech Recognition | O | Mon-O-1-1-1 | 1513 | Improved Single System Conversational Telephone Speech Recognition with VGG Bottleneck Features | William Hartmann, Roger Hsiao, Tim Ng, Jeff Ma, Francis Keith, Man-hung Siu |
2017-08-21 | 11:20-11:40 | Aula Magna | Conversational Telephone Speech Recognition | O | Mon-O-1-1-2 | 145 | Student-teacher training with diverse decision tree ensembles | Jeremy H. M. Wong, Mark Gales |
2017-08-21 | 11:40-12:00 | Aula Magna | Conversational Telephone Speech Recognition | O | Mon-O-1-1-3 | 460 | Embedding-Based Speaker Adaptive Training of Deep Neural Networks | Xiaodong Cui, Vaibhava Goel, George Saon |
2017-08-21 | 12:00-12:20 | Aula Magna | Conversational Telephone Speech Recognition | O | Mon-O-1-1-4 | 1058 | Improving Deliverable Speech-to-text Systems with Multilingual Knowledge Transfer | Jeff Ma, Francis Keith, Owen Kimball, Man-hung Siu, Tim Ng |
2017-08-21 | 12:20-12:40 | Aula Magna | Conversational Telephone Speech Recognition | O | Mon-O-1-1-5 | 405 | English Conversational Telephone Speech Recognition by Humans and Machines | George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall |
2017-08-21 | 12:40-13:00 | Aula Magna | Conversational Telephone Speech Recognition | O | Mon-O-1-1-6 | 1544 | Comparing Human and Machine Errors in Conversational Speech Transcription | Andreas Stolcke, Jasha Droppo |
2017-08-21 | 14:30-14:50 | Aula Magna | Neural Networks for Language Modeling | O | Mon-O-2-1-1 | 1310 | Approaches for Neural-Network Language Model Adaptation | Min Ma, Michael Nirschl, Fadi Biadsy, Shankar Kumar |
2017-08-21 | 14:50-15:10 | Aula Magna | Neural Networks for Language Modeling | O | Mon-O-2-1-2 | 818 | A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models | Youssef Oualil, Dietrich Klakow |
2017-08-21 | 15:10-15:30 | Aula Magna | Neural Networks for Language Modeling | O | Mon-O-2-1-3 | 513 | Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition | Xie Chen, Anton Ragni, Xunying Liu, Mark Gales |
2017-08-21 | 15:30-15:50 | Aula Magna | Neural Networks for Language Modeling | O | Mon-O-2-1-4 | 564 | Fast Neural Network Language Model Lookups at N-gram Speeds | Yinghui Huang, Abhinav Sethy, Bhuvana Ramabhadran |
2017-08-21 | 15:50-16:10 | Aula Magna | Neural Networks for Language Modeling | O | Mon-O-2-1-5 | 723 | Empirical Exploration of Novel Architectures and Objectives for Language Models | Gakuto Kurata, Abhinav Sethy, Bhuvana Ramabhadran, George Saon |
2017-08-21 | 16:10-16:30 | Aula Magna | Neural Networks for Language Modeling | O | Mon-O-2-1-6 | 1442 | Residual Memory Networks in Language Modeling: Improving the Reputation of Feed-Forward Networks | Karel Beneš, Murali Baskar, Lukáš Burget |
2017-08-21 | 11:00-11:20 | B4 | Dereverberation, Echo Cancellation and Speech | O | Mon-O-1-4-1 | 461 | Improving Speaker Verification for Reverberant Conditions with Deep Neural Network Dereverberation Processing | Peter Guzewich, Stephen Zahorian |
2017-08-21 | 11:20-11:40 | B4 | Dereverberation, Echo Cancellation and Speech | O | Mon-O-1-4-2 | 46 | Stepsize Control for Acoustic Feedback Cancellation Based on the Detection of Reverberant Signal Periods and the Estimated System Distance | Philipp Bulling, Klaus Linhard, Arthur Wolf, Gerhard Schmidt |
2017-08-21 | 11:40-12:00 | B4 | Dereverberation, Echo Cancellation and Speech | O | Mon-O-1-4-3 | 1084 | A Delay-Flexible Stereo Acoustic Echo Cancellation for DFT-Based In-Car Communication (ICC) Systems | Jan Franzen, Tim Fingscheidt |
2017-08-21 | 12:00-12:20 | B4 | Dereverberation, Echo Cancellation and Speech | O | Mon-O-1-4-4 | 78 | Speech Enhancement Based on Harmonic Estimation combined with MMSE to Improve Speech Intelligibility for Cochlear Implant Recipients | Dongmei Wang, John H.L. Hansen |
2017-08-21 | 12:20-12:40 | B4 | Dereverberation, Echo Cancellation and Speech | O | Mon-O-1-4-5 | 771 | Improving speech intelligibility in binaural hearing aids by estimating a time-frequency mask with a weighted least squares classifier | David Ayllon, Roberto Gil-Pita, Manuel Rosa-Zurera |
2017-08-21 | 12:40-13:00 | B4 | Dereverberation, Echo Cancellation and Speech | O | Mon-O-1-4-6 | 858 | Simulations of high-frequency vocoder on Mandarin speech recognition for acoustic hearing preserved cochlear implant | Tsung-Chen Wu, Tai-Shih Chi, Chia-Fone Lee |
2017-08-21 | 14:30-14:50 | B4 | Speech Analysis and Representation 1 | O | Mon-O-2-4-1 | 1179 | Phone Classification using a Non-Linear Manifold with Broad Phone Class Dependent DNNs | Linxue Bai, Peter Jancovic, Martin Russell, Philip Weber, Steve Houghton |
2017-08-21 | 14:50-15:10 | B4 | Speech Analysis and Representation 1 | O | Mon-O-2-4-2 | 70 | An Investigation of Crowd Speech for Room Occupancy Estimation | Siyuan Chen, Julien Epps, Eliathamby Ambikairajah, Phu Le |
2017-08-21 | 15:10-15:30 | B4 | Speech Analysis and Representation 1 | O | Mon-O-2-4-3 | 726 | Time-Frequency Coherence for Periodic-Aperiodic Decomposition of Speech Signals | Karthika Vijayan, Jitendra Dhiman, Chandra Sekhar Seelamantula |
2017-08-21 | 15:30-15:50 | B4 | Speech Analysis and Representation 1 | O | Mon-O-2-4-4 | 316 | Musical Speech: a New Methodology for Transcribing Speech Prosody | Alexsandro Meireles, Antônio Simões, Antonio Celso Ribeiro, Beatriz Raposo de Medeiros |
2017-08-21 | 15:50-16:10 | B4 | Speech Analysis and Representation 1 | O | Mon-O-2-4-5 | 1074 | Estimation of Place of Articulation of Fricatives from Spectral Characteristics for Speech Training | K S Nataraj, Prem C. Pandey, Hirak Dasgupta |
2017-08-21 | 16:10-16:30 | B4 | Speech Analysis and Representation 1 | O | Mon-O-2-4-6 | 389 | Estimation of the Probability Distribution of Spectral Fine Structure in the Speech Source | Tom Bäckström |
2017-08-21 | 11:00-11:20 | C6 | Acoustic and Articulatory Phonetics | O | Mon-O-1-6-1 | 1601 | Phonetic Correlates of Pharyngeal and Pharyngealized Consonants in Saudi, Lebanese, and Jordanian Arabic: an rt-MRI Study | Zainab Hermes, Marissa Barlaz, Ryan Shosted, Zhi-Pei Liang, Brad Sutton |
2017-08-21 | 11:20-11:40 | C6 | Acoustic and Articulatory Phonetics | O | Mon-O-1-6-2 | 1039 | Glottal opening and strategies of production of fricatives | Benjamin Elie, Yves Laprie |
2017-08-21 | 11:40-12:00 | C6 | Acoustic and Articulatory Phonetics | O | Mon-O-1-6-3 | 1292 | Acoustics and articulation of medial versus final coronal stop gemination contrasts in Moroccan Arabic | Mohamed Yassine Frej, Christopher Carignan, Catherine T. Best |
2017-08-21 | 12:00-12:20 | C6 | Acoustic and Articulatory Phonetics | O | Mon-O-1-6-4 | 1553 | How are four-level length distinctions produced? Evidence from Moroccan Arabic | Giuseppina Turco, Karim Shoul, Rachid Ridouane |
2017-08-21 | 12:20-12:40 | C6 | Acoustic and Articulatory Phonetics | O | Mon-O-1-6-5 | 1304 | Nature of contrast and coarticulation: Evidence from Mizo tones and Assamese vowel harmony | Indranil Dutta, Irfan S., Pamir Gogoi, Priyankoo Sarmah |
2017-08-21 | 12:40-13:00 | C6 | Acoustic and Articulatory Phonetics | O | Mon-O-1-6-6 | 1552 | Cancelled: Vowels in the Barunga variety of North Australian Kriol | Caroline Jones, Katherine Demuth, Weicong Li, Andre Almeida |
2017-08-21 | 14:30-14:50 | C6 | Perception of Dialects and L2 | O | Mon-O-2-6-1 | 1031 | End-to-End Acoustic Feedback in Language Learning for Correcting Devoiced French Final-Fricatives | Sucheta Ghosh, Camille Fauth, Yves Laprie, Aghilas Sini |
2017-08-21 | 14:50-15:10 | C6 | Perception of Dialects and L2 | O | Mon-O-2-6-2 | 18 | Dialect perception by older children | Ewa Jacewicz, Robert A. Fox |
2017-08-21 | 15:10-15:30 | C6 | Perception of Dialects and L2 | O | Mon-O-2-6-3 | 207 | Perception of non-contrastive variations in American English by Japanese learners: Flaps are less favored than stops | Kiyoko Yoneyama, Mafuyu Kitahara, Keiichi Tajima |
2017-08-21 | 15:30-15:50 | C6 | Perception of Dialects and L2 | O | Mon-O-2-6-4 | 1150 | How L1 speakers perceive L2 prosody: The cumulative effect of intonation, rhythm, and speech rate on accentedness and comprehensibility ratings | Lieke van Maastricht, Tim Zee, Emiel Krahmer, Marc Swerts |
2017-08-21 | 15:50-16:10 | C6 | Perception of Dialects and L2 | O | Mon-O-2-6-5 | 763 | Effects of Pitch Fall and L1 on Vowel Length Identification in L2 Japanese | Izumi Takiguchi |
2017-08-21 | 16:10-16:30 | C6 | Perception of Dialects and L2 | O | Mon-O-2-6-6 | 1210 | A Preliminary Study of Prosodic Disambiguation by Chinese EFL Learners | Yuanyuan Zhang, Hongwei Ding |
2017-08-21 | 11:00-11:30 | D8 | Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 1 | SS | Mon-SS-1-8-1 | 1111 | The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection | Tomi Kinnunen, Md Sahidullah, Héctor Delgado, Massimiliano Todisco, Nicholas Evans, Junichi Yamagishi, Kong Aik Lee |
2017-08-21 | 11:30-11:45 | D8 | Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 1 | SS | Mon-SS-1-8-2 | 450 | Experimental analysis of features for replay attack detection-Results on the ASVspoof 2017 Challenge | Roberto Javier Font Ruiz, María José Cano Vicente, Juan Manuel Espín López |
2017-08-21 | 11:45-12:00 | D8 | Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 1 | SS | Mon-SS-1-8-3 | 1362 | Novel Variable Length Teager Energy Separation Based Instantaneous Frequency Features for Replay Detection | Hemant Patil, Madhu Kamble, Tanvina Patel, Meet Soni |
2017-08-21 | 12:00-12:15 | D8 | Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 1 | SS | Mon-SS-1-8-4 | 906 | Countermeasures for Automatic Speaker Verification Replay Spoofing Attack: On Data Augmentation, Feature Representation, Classification and Fusion | Weicheng Cai, Danwei Cai, Wenbo Liu, Ming Li |
2017-08-21 | 12:15-12:30 | D8 | Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 1 | SS | Mon-SS-1-8-5 | 930 | Spoof Detection Using Source, Instantaneous Frequency and Cepstral Features | Sarfaraz Jelil, Rohan Kumar Das, S R Mahadeva Prasanna, Rohit Sinha |
2017-08-21 | 12:30-12:45 | D8 | Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 1 | SS | Mon-SS-1-8-6 | 776 | Audio Replay Attack Detection with High-Frequency Features | Marcin Witkowski, Stanisław Kacprzak, Piotr Żelasko, Konrad Kowalczyk, Jakub Gałka |
2017-08-21 | 12:45-13:00 | D8 | Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 1 | SS | Mon-SS-1-8-7 | 304 | Feature selection based on CQCCs for Automatic Speaker Verification spoofing | Wang Xianliang, Xiao Yanhong, Zhu Xuan |
2017-08-21 | 14:30-14:45 | D8 | Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 2 | SS | Mon-SS-2-8-1 | 360 | Audio replay attack detection with deep learning frameworks | Galina Lavrentyeva, Sergey Novoselov, Egor Malykh, Alexandr Kozlov, Oleg Kudashev, Vadim Shchemelinin |
2017-08-21 | 14:45-15:00 | D8 | Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 2 | SS | Mon-SS-2-8-2 | 1246 | Ensemble learning for countermeasure of audio replay spoofing attack in ASVspoof2017 | Zhe Ji, Zhi-Yi Li, Peng Li, Maobo An, Shengxiang Gao, Dan Wu, Faru Zhao |
2017-08-21 | 15:00-15:15 | D8 | Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 2 | SS | Mon-SS-2-8-3 | 456 | A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification | Lantian Li, Yixiang Chen, Dong Wang, Thomas Fang Zheng |
2017-08-21 | 15:15-15:30 | D8 | Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 2 | SS | Mon-SS-2-8-4 | 1377 | Replay Attack Detection using DNN for Channel Discrimination | Parav Nagarsheth, Elie Khoury, Kailash Patil, Matt Garland |
2017-08-21 | 15:30-15:45 | D8 | Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 2 | SS | Mon-SS-2-8-5 | 1085 | ResNet and Model Fusion for Automatic Spoofing Detection | Zhuxin Chen, Zhifeng Xie, Weibin Zhang, Xiangmin Xu |
2017-08-21 | 15:45-16:00 | D8 | Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 2 | SS | Mon-SS-2-8-6 | 676 | SFF Anti-Spoofer: IIIT-H Submission for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2017 | K N R K Raju Alluri, Sivanand Achanta, Sudarsana Reddy Kadiri, Suryakanth V Gangashetty, Anil Kumar Vuppala |
2017-08-21 | 16:00-16:30 | D8 | Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 2 | SS | Mon-SS-2-8-7 | | Discussion | Nicholas Evans, Kong Aik Lee |
2017-08-21 | 11:00-11:20 | E10 | Multimodal and Articulatory Synthesis | O | Mon-O-1-10-1 | 325 | The Influence of Synthetic Voice on the Evaluation of a Virtual Character | Joao Cabral, Benjamin Cowan, Katja Zibrek, Rachel McDonnell |
2017-08-21 | 11:20-11:40 | E10 | Multimodal and Articulatory Synthesis | O | Mon-O-1-10-2 | 900 | Articulatory Text-to-Speech Synthesis using the Digital Waveguide Mesh driven by a Deep Neural Network | Amelia Gully, Takenori Yoshimura, Damian Murphy, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda |
2017-08-21 | 11:40-12:00 | E10 | Multimodal and Articulatory Synthesis | O | Mon-O-1-10-3 | 936 | An HMM/DNN comparison for synchronized text-to-speech and tongue motion synthesis | Sébastien Le Maguer, Ingmar Steiner, Alexander Hewer |
2017-08-21 | 12:00-12:20 | E10 | Multimodal and Articulatory Synthesis | O | Mon-O-1-10-4 | 1410 | VCV Synthesis using Task Dynamics to Animate a Factor-based Articulatory Model | Rachel Alexander, Tanner Sorensen, Asterios Toutios, Shrikanth Narayanan |
2017-08-21 | 12:20-12:40 | E10 | Multimodal and Articulatory Synthesis | O | Mon-O-1-10-5 | 1438 | Beyond the Listening Test: An interactive approach to TTS Evaluation | Joseph Mendelson, Matthew Aylett |
2017-08-21 | 12:40-13:00 | E10 | Multimodal and Articulatory Synthesis | O | Mon-O-1-10-6 | 1762 | Integrating Articulatory Information into Deep Learning-Based Text-to-Speech Synthesis | Beiming Cao, Myungjong Kim, Jan van Santen, Ted Mau, Jun Wang |
2017-08-21 | 14:30-14:50 | E10 | Far-field Speech Recognition | O | Mon-O-2-10-1 | 1510 | Generation of simulated utterances in virtual rooms to train deep-neural networks for far-field speech recognition in Google Home | Chanwoo Kim, Ananya Misra, K.K. Chin, Thad Hughes, Arun Narayanan, Tara Sainath, Michiel Bacchiani |
2017-08-21 | 14:50-15:10 | E10 | Far-field Speech Recognition | O | Mon-O-2-10-2 | 733 | Neural network-based spectrum estimation for online WPE dereverberation | Keisuke Kinoshita, Marc Delcroix, Haeyong Kwon, Takuma Mori, Tomohiro Nakatani |
2017-08-21 | 15:10-15:30 | E10 | Far-field Speech Recognition | O | Mon-O-2-10-3 | 852 | Factorial Modeling for Effective Suppression of Directional Noise | Osamu Ichikawa, Takashi Fukuda, Gakuto Kurata, Steven Rennie |
2017-08-21 | 15:30-15:50 | E10 | Far-field Speech Recognition | O | Mon-O-2-10-4 | 853 | On Design of Robust Deep Models for CHiME-4 Multi-Channel Speech Recognition with Multiple Configurations of Array Microphones | Yan-Hui Tu, Jun Du, Lei Sun, Feng Ma, Chin-Hui Lee |
2017-08-21 | 15:50-16:10 | E10 | Far-field Speech Recognition | O | Mon-O-2-10-5 | 234 | Acoustic Modeling for Google Home | Bo Li, Tara Sainath, Joe Caroselli, Arun Narayanan, Michiel Bacchiani, Ananya Misra, Izhak Shafran, Hasim Sak, Golan Pundak, K.K. Chin, Khe Chai Sim, Ron Weiss, Kevin Wilson, Ehsan Variani, Chanwoo Kim, Olivier Siohan, Mitchell Weintraub, Erik McDermott, Richard Rose, Matt Shannon |
2017-08-21 | 16:10-16:30 | E10 | Far-field Speech Recognition | O | Mon-O-2-10-6 | 398 | On multi-domain training and adaptation of end-to-end RNN acoustic models for distant speech recognition | Seyedmahdad Mirsamadi, John H.L. Hansen |
2017-08-21 | 11:00-13:00 | E306 | Show & Tell 1 | S&T | Mon-S&T-1-A-1 | 10034 | Prosograph: A Tool for Prosody Visualisation of Large Speech Corpora | Alp Oktem, Mireia Farrús, Leo Wanner |
2017-08-21 | 11:00-13:00 | E306 | Show & Tell 1 | S&T | Mon-S&T-1-A-2 | 10048 | ChunkitApp: Investigating the relevant units of online speech processing | Svetlana Vetchinnikova, Anna Mauranen, Nina Mikusova |
2017-08-21 | 11:00-13:00 | E306 | Show & Tell 1 | S&T | Mon-S&T-1-A-3 | 10049 | Extending the EMU Speech Database Management System: Cloud Hosting, Team Collaboration, Automatic Revision Control | Markus Jochim |
2017-08-21 | 11:00-13:00 | E306 | Show & Tell 1 | S&T | Mon-S&T-1-A-4 | 10051 | HomeBank: A repository for long-form real-world audio recordings of children | Anne Warlaumont, Mark vanDam, Elika Bergelson, Alejandrina Cristia |
2017-08-21 | 11:00-13:00 | E306 | Show & Tell 1 | S&T | Mon-S&T-1-A-5 | 10052 | A system for real-time collaborative transcription correction | Peter Bell, Joachim Fainberg, Catherine Lai, Mark Sinclair |
2017-08-21 | 11:00-13:00 | E306 | Show & Tell 1 | S&T | Mon-S&T-1-A-6 | 10058 | MoPAReST – Mobile Phone Assisted Remote Speech Therapy Platform | Chitralekha Bhat, Anjali Kant, Bhavik Vachhani, Sarita Rautara, Ashok Kumar Sinha, Sunil Kumar Kopparapu |
2017-08-21 | 14:30-16:30 | E306 | Show & Tell 1 | S&T | Mon-S&T-2-A-1 | 10034 | Prosograph: A Tool for Prosody Visualisation of Large Speech Corpora | Alp Oktem, Mireia Farrús, Leo Wanner |
2017-08-21 | 14:30-16:30 | E306 | Show & Tell 1 | S&T | Mon-S&T-2-A-2 | 10048 | ChunkitApp: Investigating the relevant units of online speech processing | Svetlana Vetchinnikova, Anna Mauranen, Nina Mikusova |
2017-08-21 | 14:30-16:30 | E306 | Show & Tell 1 | S&T | Mon-S&T-2-A-3 | 10049 | Extending the EMU Speech Database Management System: Cloud Hosting, Team Collaboration, Automatic Revision Control | Markus Jochim |
2017-08-21 | 14:30-16:30 | E306 | Show & Tell 1 | S&T | Mon-S&T-2-A-4 | 10051 | HomeBank: A repository for long-form real-world audio recordings of children | Anne Warlaumont, Mark vanDam, Elika Bergelson, Alejandrina Cristia |
2017-08-21 | 14:30-16:30 | E306 | Show & Tell 1 | S&T | Mon-S&T-2-A-5 | 10052 | A system for real-time collaborative transcription correction | Peter Bell, Joachim Fainberg, Catherine Lai, Mark Sinclair |
2017-08-21 | 14:30-16:30 | E306 | Show & Tell 1 | S&T | Mon-S&T-2-A-6 | 10058 | MoPAReST – Mobile Phone Assisted Remote Speech Therapy Platform | Chitralekha Bhat, Anjali Kant, Bhavik Vachhani, Sarita Rautara, Ashok Kumar Sinha, Sunil Kumar Kopparapu |
2017-08-21 | 11:00-13:00 | E397 | Show & Tell 2 | S&T | Mon-S&T-1-B-1 | 10035 | An apparatus to investigate western opera singing skill learning using performance and result biofeedback, and measuring its neural correlates | Aurore Jaumard-Hakoun, Samy Chikhi, Takfarinas Medani, Angelika Nair, Gérard Dreyfus, François-Benoît Vialatte |
2017-08-21 | 11:00-13:00 | E397 | Show & Tell 2 | S&T | Mon-S&T-1-B-2 | 10044 | PercyConfigurator — Perception Experiments as a Service | Christoph Draxler |
2017-08-21 | 11:00-13:00 | E397 | Show & Tell 2 | S&T | Mon-S&T-1-B-3 | 10045 | System for speech transcription and post-editing in Microsoft Word | Askars Salimbajevs, Indra Ikauniece |
2017-08-21 | 11:00-13:00 | E397 | Show & Tell 2 | S&T | Mon-S&T-1-B-4 | 10047 | Emojive! Collecting Emotion Data from Speech and Facial Expression using Mobile Game App | Ji Ho Park, Nayeon Lee, Dario Bertero, Anik Dey, Pascale Fung |
2017-08-21 | 11:00-13:00 | E397 | Show & Tell 2 | S&T | Mon-S&T-1-B-5 | 10059 | Mylly – The Mill: A new platform for processing speech and text corpora easily and efficiently | Mietta Lennes, Jussi Piitulainen, Martin Matthiesen |
2017-08-21 | 11:00-13:00 | E397 | Show & Tell 2 | S&T | Mon-S&T-1-B-6 | 10040 | Visual Learning 2: Pronunciation app using ultrasound, video, and MRI | Kyori Suzuki, Ian Wilson, Hayato Watanabe |
2017-08-21 | 14:30-16:30 | E397 | Show & Tell 2 | S&T | Mon-S&T-2-B-1 | 10035 | An apparatus to investigate western opera singing skill learning using performance and result biofeedback, and measuring its neural correlates | Aurore Jaumard-Hakoun, Samy Chikhi, Takfarinas Medani, Angelika Nair, Gérard Dreyfus, François-Benoît Vialatte |
2017-08-21 | 14:30-16:30 | E397 | Show & Tell 2 | S&T | Mon-S&T-2-B-2 | 10044 | PercyConfigurator — Perception Experiments as a Service | Christoph Draxler |
2017-08-21 | 14:30-16:30 | E397 | Show & Tell 2 | S&T | Mon-S&T-2-B-3 | 10045 | System for speech transcription and post-editing in Microsoft Word | Askars Salimbajevs, Indra Ikauniece |
2017-08-21 | 14:30-16:30 | E397 | Show & Tell 2 | S&T | Mon-S&T-2-B-4 | 10047 | Emojive! Collecting Emotion Data from Speech and Facial Expression using Mobile Game App | Ji Ho Park, Nayeon Lee, Dario Bertero, Anik Dey, Pascale Fung |
2017-08-21 | 14:30-16:30 | E397 | Show & Tell 2 | S&T | Mon-S&T-2-B-5 | 10059 | Mylly – The Mill: A new platform for processing speech and text corpora easily and efficiently | Mietta Lennes, Jussi Piitulainen, Martin Matthiesen |
2017-08-21 | 14:30-16:30 | E397 | Show & Tell 2 | S&T | Mon-S&T-2-B-6 | 10040 | Visual Learning 2: Pronunciation app using ultrasound, video, and MRI | Kyori Suzuki, Ian Wilson, Hayato Watanabe |
2017-08-21 | 11:00-11:20 | F11 | Special Session: Speech Technology for Code-Switching in Multilingual Communities | SS | Mon-SS-1-11-10 | | Introduction | Kalika Bali, Alan W Black |
2017-08-21 | 11:20-11:40 | F11 | Special Session: Speech Technology for Code-Switching in Multilingual Communities | SS | Mon-SS-1-11-1 | 301 | Longitudinal Speaker Clustering and Verification Corpus with Code-Switching Frisian-Dutch Speech | Emre Yilmaz, Jelske Dijkstra, Hans Van de Velde, Frederik Kampstra, Jouke Algra, Henk Van den Heuvel, David Van Leeuwen |
2017-08-21 | 11:40-12:00 | F11 | Special Session: Speech Technology for Code-Switching in Multilingual Communities | SS | Mon-SS-1-11-2 | 391 | Exploiting Untranscribed Broadcast Data for Improved Code-Switching Detection | Emre Yilmaz, Henk van den Heuvel, David van Leeuwen |
2017-08-21 | 12:00-12:20 | F11 | Special Session: Speech Technology for Code-Switching in Multilingual Communities | SS | Mon-SS-1-11-3 | 1198 | Jee haan, I’d like both, por favor: Elicitation of a Code-Switched Corpus of Hindi-English and Spanish-English Human-Machine Dialog | Vikram Ramanarayanan, David Suendermann-Oeft |
2017-08-21 | 12:20-12:40 | F11 | Special Session: Speech Technology for Code-Switching in Multilingual Communities | SS | Mon-SS-1-11-4 | 1244 | On building mixed lingual speech synthesis systems | SaiKrishna Rallabandi, Alan W Black |
2017-08-21 | 12:40-13:00 | F11 | Special Session: Speech Technology for Code-Switching in Multilingual Communities | SS | Mon-SS-1-11-5 | 1259 | Speech Synthesis for Mixed-Language Navigation Instructions | Khyathi Chandu, Sai Krishna Rallabandi, Sunayana Sitaram, Alan W Black |
2017-08-21 | 14:30-14:50 | F11 | Special Session: Speech Technology for Code-Switching in Multilingual Communities | SS | Mon-SS-1-11-6 | 1373 | Addressing Code-Switching in French/Algerian Arabic Speech | Amazouz Djegdjiga, Martine Adda-Decker, Lori Lamel |
2017-08-21 | 14:50-15:10 | F11 | Special Session: Speech Technology for Code-Switching in Multilingual Communities | SS | Mon-SS-1-11-7 | 1429 | Metrics for modeling code-switching across corpora | Wally Guzman, Joseph Ricard, Jacqueline Serigos, Barbara Bullock, Almeida Jacqueline Toribio |
2017-08-21 | 15:10-15:30 | F11 | Special Session: Speech Technology for Code-Switching in Multilingual Communities | SS | Mon-SS-1-11-8 | 1437 | Synthesising isiZulu-English code-switch bigrams using word embeddings | Ewald Van der westhuizen, Thomas Niesler |
2017-08-21 | 15:30-15:50 | F11 | Special Session: Speech Technology for Code-Switching in Multilingual Communities | SS | Mon-SS-1-11-9 | 1663 | Crowdsourcing Universal Part-Of-Speech Tags for Code-Switching | Victor Soto, Julia Hirschberg |
2017-08-21 | 15:50-16:30 | F11 | Special Session: Speech Technology for Code-Switching in Multilingual Communities | SS | Mon-SS-1-11-11 | | Discussion | Kalika Bali, Alan W Black |
2017-08-21 | 11:00-13:00 | Poster 1 | Speech Analysis and Representation 2 | P | Mon-P-1-1-1 | 67 | Low-dimensional representation of spectral envelope without deterioration for full-band speech analysis/synthesis system | Masanori Morise, Kenji Ozawa, Genta Miyashita |
2017-08-21 | 11:00-13:00 | Poster 1 | Speech Analysis and Representation 2 | P | Mon-P-1-1-2 | 210 | Robust Source-Filter Separation of Speech Signal in the Phase Domain | Erfan Loweimi, Jon Barker, Oscar Saz Torralba, Thomas Hain |
2017-08-21 | 11:00-13:00 | Poster 1 | Speech Analysis and Representation 2 | P | Mon-P-1-1-3 | 382 | A Time-Warping Pitch Tracking Algorithm considering fast f0 changes | Simon Stone, Peter Steiner, Peter Birkholz |
2017-08-21 | 11:00-13:00 | Poster 1 | Speech Analysis and Representation 2 | P | Mon-P-1-1-4 | 436 | A modulation property of time-frequency derivatives of filtered phase and its application to aperiodicity and FO estimation | Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda |
2017-08-21 | 11:00-13:00 | Poster 1 | Speech Analysis and Representation 2 | P | Mon-P-1-1-5 | 624 | Non-Local Estimation of Speech Signal for Vowel Onset Point Detection in Varied Environments | Avinash Kumar, Syed Shahnawazuddin, Gayadhar Pradhan |
2017-08-21 | 11:00-13:00 | Poster 1 | Speech Analysis and Representation 2 | P | Mon-P-1-1-6 | 678 | Time-domain envelope modulating the noise component of excitation in a continuous residual-based vocoder for statistical parametric speech synthesis | Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh |
2017-08-21 | 11:00-13:00 | Poster 1 | Speech Analysis and Representation 2 | P | Mon-P-1-1-7 | 781 | Wavelet Speech Enhancement Based on Robust Principal Component Analysis | Chia-Lung Wu, Hsiang-Ping Hsu, Syu-Siang Wang, Jeih-weih Hung, Ying-Hui Lai, Hsin-Min Wang, Yu Tsao |
2017-08-21 | 11:00-13:00 | Poster 1 | Speech Analysis and Representation 2 | P | Mon-P-1-1-8 | 790 | Vowel Onset Point Detection using Sonority Information | Bidisha Sharma, S R Mahadeva Prasanna |
2017-08-21 | 11:00-13:00 | Poster 1 | Speech Analysis and Representation 2 | P | Mon-P-1-1-9 | 1232 | Analytic Filter Bank for Speech Analysis, Feature Extraction and Perceptual Studies | Unto K. Laine |
2017-08-21 | 11:00-13:00 | Poster 1 | Speech Analysis and Representation 2 | P | Mon-P-1-1-10 | 1681 | Learning the mapping function from voltage amplitudes to sensor positions in 3D-EMA using deep neural networks | Christian Kroos, Mark D. Plumbley |
2017-08-21 | 14:30-16:30 | Poster 1 | Speech Perception | P | Mon-P-2-1-1 | 2 | Factors Affecting the Intelligibility of Low-pass Filtered Speech | Lei Wang, Fei Chen |
2017-08-21 | 14:30-16:30 | Poster 1 | Speech Perception | P | Mon-P-2-1-2 | 4 | Phonetic Restoration of Temporally Reversed Speech | Shi-yu Wang, Fei Chen |
2017-08-21 | 14:30-16:30 | Poster 1 | Speech Perception | P | Mon-P-2-1-3 | 83 | Simultaneous articulatory and acoustic distortion in L1 and L2 Listening: Locally time-reversed “fast” speech | Mako Ishida |
2017-08-21 | 14:30-16:30 | Poster 1 | Speech Perception | P | Mon-P-2-1-4 | 618 | Lexically Guided Perceptual Learning in Mandarin Chinese | L. Ann Burchfield, San-hei Kenny Luk, Mark Antoniou, Anne Cutler |
2017-08-21 | 14:30-16:30 | Poster 1 | Speech Perception | P | Mon-P-2-1-5 | 948 | The effect of spectral profile on the intelligibility of emotional speech in noise | Chris Davis, Chee Seng Chong, Jeesun Kim |
2017-08-21 | 14:30-16:30 | Poster 1 | Speech Perception | P | Mon-P-2-1-6 | 1517 | Whether long-term tracking of speech rate affects perception depends on who is talking | Merel Maslowski, Antje S. Meyer, Hans Rutger Bosker |
2017-08-21 | 14:30-16:30 | Poster 1 | Speech Perception | P | Mon-P-2-1-7 | 1719 | Emotional thin-slicing: a proposal for a short- and long-term division of emotional speech | Daniel Oliveira Peres, Dominic Watt, Waldemar Ferreira Netto |
2017-08-21 | 14:30-16:30 | Poster 1 | Speech Perception | P | Mon-P-2-1-8 | 1735 | Predicting epenthetic vowel quality from acoustics | Adriana Guevara-Rukoz, Erika Parlato-Oliveira, Shi Yu, Yuki Hirose, Sharon Peperkamp, Emmanuel Dupoux |
2017-08-21 | 14:30-16:30 | Poster 1 | Speech Perception | P | Mon-P-2-1-9 | 282 | The effect of spectral tilt on size discrimination of voiced speech sounds | Toshie Matsui, Toshio Irino, Kodai Yamamoto, Hideki Kawahara, Roy Patterson |
2017-08-21 | 14:30-16:30 | Poster 1 | Speech Perception | P | Mon-P-2-1-10 | 532 | Misperceptions of the emotional content of natural and vocoded speech in a car | Jaime Lorenzo-Trueba, Cassia Valentini-Botinhao, Gustav Eje Henter, Junichi Yamagishi |
2017-08-21 | 14:30-16:30 | Poster 1 | Speech Perception | P | Mon-P-2-1-11 | 375 | The relative cueing power of F0 and duration in German prominence perception | Oliver Niebuhr, Jana Winkler |
2017-08-21 | 14:30-16:30 | Poster 1 | Speech Perception | P | Mon-P-2-1-12 | 570 | Perception and acoustics of vowel nasality in Brazilian Portuguese | Luciana Marques, Rebecca Scarborough |
2017-08-21 | 14:30-16:30 | Poster 1 | Speech Perception | P | Mon-P-2-1-13 | 1742 | Sociophonetic realizations guide subsequent lexical access | Jonny Kim, Katie Drager |
2017-08-21 | 11:00-13:00 | Poster 2 | Speech and Audio Segmentation and Classification 2 | P | Mon-P-1-2-1 | 74 | Multilingual I-Vector based Statistical Modeling for Music Genre Classification | Jia Dai, Wei Xue, Wenju Liu |
2017-08-21 | 11:00-13:00 | Poster 2 | Speech and Audio Segmentation and Classification 2 | P | Mon-P-1-2-2 | 309 | Indoor/Outdoor Audio Classification using Foreground Speech Segmentation | Banriskhem K. Khonglah, Deepak K T, S R Mahadeva Prasanna |
2017-08-21 | 11:00-13:00 | Poster 2 | Speech and Audio Segmentation and Classification 2 | P | Mon-P-1-2-3 | 440 | Attention based CLDNNs for short-duration acoustic scene classification | Jinxi Guo, Ning Xu, Li-Jia Li, Abeer Alwan |
2017-08-21 | 11:00-13:00 | Poster 2 | Speech and Audio Segmentation and Classification 2 | P | Mon-P-1-2-4 | 746 | Frame-wise dynamic threshold based polyphonic acoustic event detection | Xianjun Xia, Roberto Togneri, Ferdous Sohel, David Huang |
2017-08-21 | 11:00-13:00 | Poster 2 | Speech and Audio Segmentation and Classification 2 | P | Mon-P-1-2-5 | 792 | Enhanced Feature Extraction for Speech Detection in Media Audio | Inseon Jang, ChungHyun Ahn, Jeongil Seo, Younseon Jang |
2017-08-21 | 11:00-13:00 | Poster 2 | Speech and Audio Segmentation and Classification 2 | P | Mon-P-1-2-6 | 982 | Audio Classification Using Class-Specific Learned Descriptors | Sukanya Sonowal, Tushar Sandhan, Inkyu Choi, Nam Soo Kim |
2017-08-21 | 11:00-13:00 | Poster 2 | Speech and Audio Segmentation and Classification 2 | P | Mon-P-1-2-7 | 1160 | Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery | Janek Ebbers, Jahn Heymann, Lukas Drude, Thomas Glarner, Reinhold Haeb-Umbach, Bhiksha Raj |
2017-08-21 | 11:00-13:00 | Poster 2 | Speech and Audio Segmentation and Classification 2 | P | Mon-P-1-2-8 | 1238 | Virtual Adversarial Training and Data Augmentation for Acoustic Event Detection with Gated Recurrent Neural Networks | Matthias Zöhrer, Franz Pernkopf |
2017-08-21 | 11:00-13:00 | Poster 2 | Speech and Audio Segmentation and Classification 2 | P | Mon-P-1-2-9 | 1386 | Montreal Forced Aligner: trainable text-speech alignment using Kaldi | Michael McAuliffe, Michaela Socolof, Sarah Mihuc, Michael Wagner, Morgan Sonderegger |
2017-08-21 | 11:00-13:00 | Poster 2 | Speech and Audio Segmentation and Classification 2 | P | Mon-P-1-2-10 | 1388 | A robust Voiced/Unvoiced phoneme classification from whispered speech using the “color” of whispered phonemes and Deep Neural Network | Nisha Meenakshi, Prasanta Ghosh |
2017-08-21 | 14:30-16:30 | Poster 2 | Speech Production and Perception | P | Mon-P-2-2-1 | 742 | Critical articulators identification from RT-MRI of the vocal tract | Samuel Silva, António Teixeira |
2017-08-21 | 14:30-16:30 | Poster 2 | Speech Production and Perception | P | Mon-P-2-2-2 | 1580 | Semantic Edge Detection for Tracking Vocal Tract Air-tissue Boundaries in Real-time Magnetic Resonance Images | Krishna Somandepalli, Asterios Toutios, Shrikanth Narayanan |
2017-08-21 | 14:30-16:30 | Poster 2 | Speech Production and Perception | P | Mon-P-2-2-3 | 1016 | Vocal Tract Airway Tissue Boundary Tracking for rtMRI using Shape and Appearance Priors | Sasan Asadiabadi, Engin Erzin |
2017-08-21 | 14:30-16:30 | Poster 2 | Speech Production and Perception | P | Mon-P-2-2-4 | 201 | An objective critical distance measure based on the relative level of spectral valley | Ananthapadmanabha T V, Ramakrishnan Angarai Ganesan, Shubham Sharma |
2017-08-21 | 14:30-16:30 | Poster 2 | Speech Production and Perception | P | Mon-P-2-2-5 | 608 | Database of volumetric and real-time vocal tract MRI for speech science | Tanner Sorensen, Zisis Iason Skordilis, Asterios Toutios, Yoon-Chul Kim, Yinghua Zhu, Jangwon Kim, Adam Lammert, Vikram Ramanarayanan, Louis Goldstein, Dani Byrd, Krishna Nayak, Shrikanth Narayanan |
2017-08-21 | 14:30-16:30 | Poster 2 | Speech Production and Perception | P | Mon-P-2-2-6 | 1267 | The Influence on Realization and Perception of Lexical Tones from Affricate’s Aspiration | Chong Cao, Yanlu Xie, Qi Zhang, Jinsong Zhang |
2017-08-21 | 14:30-16:30 | Poster 2 | Speech Production and Perception | P | Mon-P-2-2-7 | 122 | Audiovisual recalibration of vowel categories | Matthias Franken, Frank Eisner, Jan-Mathijs Schoffelen, Dan Acheson, Peter Hagoort, James McQueen |
2017-08-21 | 14:30-16:30 | Poster 2 | Speech Production and Perception | P | Mon-P-2-2-8 | 194 | The effect of gesture on persuasive speech | Judith Peters, Marieke Hoetjes |
2017-08-21 | 14:30-16:30 | Poster 2 | Speech Production and Perception | P | Mon-P-2-2-9 | 1069 | Auditory-visual integration of talker gender in Cantonese tone perception | Wei Lai |
2017-08-21 | 14:30-16:30 | Poster 2 | Speech Production and Perception | P | Mon-P-2-2-10 | 139 | Event-related potentials associated with somatosensory effect in audio-visual speech perception | Takayuki Ito, Hiroki Ohashi, Eva Montas, Vincent Gracco |
2017-08-21 | 14:30-16:30 | Poster 2 | Speech Production and Perception | P | Mon-P-2-2-11 | 353 | When a dog is a cat and how it changes your pupil size: Pupil dilation in response to information mismatch | Lena F. Renner, Marcin Wlodarczak |
2017-08-21 | 14:30-16:30 | Poster 2 | Speech Production and Perception | P | Mon-P-2-2-12 | 1236 | Cross-modal Analysis between Phonation Differences and Texture Images based on Sentiment Correlations | Win Thuzar Kyaw, Yoshinori Sagisaka |
2017-08-21 | 14:30-16:30 | Poster 2 | Speech Production and Perception | P | Mon-P-2-2-13 | 48 | Wireless neck-surface accelerometer and microphone on flex circuit with application to noise-robust monitoring of Lombard speech | Daryush Mehta, Patrick Chwalek, Thomas Quatieri, Laura Brattain |
2017-08-21 | 14:30-16:30 | Poster 2 | Speech Production and Perception | P | Mon-P-2-2-14 | 1371 | Video-based tracking of jaw movements during speech: Preliminary results and future directions | Andrea Bandini, Aravind Namasivayam, Yana Yunusova |
2017-08-21 | 14:30-16:30 | Poster 2 | Speech Production and Perception | P | Mon-P-2-2-15 | 1374 | Accurate Synchronization of Speech and EGG signal using Phase Information | Sunil Kumar S B, K Sreenivasa Rao, Tanumay Mandal |
2017-08-21 | 14:30-16:30 | Poster 2 | Speech Production and Perception | P | Mon-P-2-2-16 | 1065 | The acquisition of focal lengthening in Stockholm Swedish | Anna Sara Hexeberg Romøren, Aoju Chen |
2017-08-21 | 14:30-16:30 | Poster 3 | Multi-lingual Models and Adaptation for ASR | P | Mon-P-2-3-1 | 111 | Multilingual Recurrent Neural Networks with Residual Learning for Low-Resource Speech Recognition | Shiyu Zhou, Yuanyuan Zhao, Shuang Xu, Bo Xu |
2017-08-21 | 14:30-16:30 | Poster 3 | Multi-lingual Models and Adaptation for ASR | P | Mon-P-2-3-2 | 505 | CTC Training of Multi-Phone Acoustic Models for Speech Recognition | Olivier Siohan |
2017-08-21 | 14:30-16:30 | Poster 3 | Multi-lingual Models and Adaptation for ASR | P | Mon-P-2-3-3 | 1242 | An Investigation of Deep Neural Networks for Multilingual Speech Recognition Training and Adaptation | Sibo Tong, Philip N. Garner, Herve Bourlard |
2017-08-21 | 14:30-16:30 | Poster 3 | Multi-lingual Models and Adaptation for ASR | P | Mon-P-2-3-4 | 1775 | 2016 BUT Babel system: Multilingual BLSTM acoustic model with i-vector based adaptation | Martin Karafiat, Murali Karthick Baskar, Pavel Matejka, Karel Vesely, Frantisek Grezl, Lukas Burget, Jan Černocký |
2017-08-21 | 14:30-16:30 | Poster 3 | Multi-lingual Models and Adaptation for ASR | P | Mon-P-2-3-5 | 755 | Optimizing DNN Adaptation for Recognition of Enhanced Speech | Marco Matassoni, Alessio Brutti, Daniele Falavigna |
2017-08-21 | 14:30-16:30 | Poster 3 | Multi-lingual Models and Adaptation for ASR | P | Mon-P-2-3-6 | 783 | Deep Least Squares Regression for Speaker Adaptation | Younggwan Kim, Hyungjun Lim, Jahyun Goo, Hoirin Kim |
2017-08-21 | 14:30-16:30 | Poster 3 | Multi-lingual Models and Adaptation for ASR | P | Mon-P-2-3-7 | 788 | Multi-task Learning using Mismatched Transcription for Under-resourced Speech Recognition | Van Hai Do, Nancy F. Chen, Boon Pang Lim, Mark Hasegawa-Johnson |
2017-08-21 | 14:30-16:30 | Poster 3 | Multi-lingual Models and Adaptation for ASR | P | Mon-P-2-3-8 | 874 | Generalized Distillation Framework For Speaker Normalization | Neethu Mariam Joy, Sandeep Reddy Kothinti, Srinivasan Umesh, Basil Abraham |
2017-08-21 | 14:30-16:30 | Poster 3 | Multi-lingual Models and Adaptation for ASR | P | Mon-P-2-3-9 | 1136 | Learning Factorized Transforms for Unsupervised Adaptation of LSTM-RNN Acoustic Models | Lahiru Samarakoon, Brian Mak, Khe Chai Sim |
2017-08-21 | 14:30-16:30 | Poster 3 | Multi-lingual Models and Adaptation for ASR | P | Mon-P-2-3-10 | 1365 | Factorised representations for neural network adaptation to diverse acoustic environments | Joachim Fainberg, Steve Renals, Peter Bell |
2017-08-21 | 11:00-13:00 | Poster 4 | Search, Computational Strategies and Language Modeling | P | Mon-P-1-4-1 | 1671 | Rescoring-aware Beam Search for Reduced Search Errors in Contextual Automatic Speech Recognition | Ian Williams, Petar Aleksic |
2017-08-21 | 11:00-13:00 | Poster 4 | Search, Computational Strategies and Language Modeling | P | Mon-P-1-4-2 | 1683 | Comparison of Different Decoding Strategies for CTC Acoustic Models | Thomas Zenkel, Ramon Sanabria, Florian Metze, Jan Niehues, Matthias Sperber, Sebastian Stüker, Alex Waibel |
2017-08-21 | 11:00-13:00 | Poster 4 | Search, Computational Strategies and Language Modeling | P | Mon-P-1-4-3 | 1680 | Phone duration modeling for LVCSR using neural networks | Hossein Hadian, Daniel Povey, Hossein Sameti, Sanjeev Khudanpur |
2017-08-21 | 11:00-13:00 | Poster 4 | Search, Computational Strategies and Language Modeling | P | Mon-P-1-4-4 | 343 | Towards better decoding and language model integration in sequence to sequence models | Jan Chorowski, Navdeep Jaitly |
2017-08-21 | 11:00-13:00 | Poster 4 | Search, Computational Strategies and Language Modeling | P | Mon-P-1-4-5 | 547 | Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling | Wenpeng Li, Binbin Zhang, Lei Xie, Dong Yu |
2017-08-21 | 11:00-13:00 | Poster 4 | Search, Computational Strategies and Language Modeling | P | Mon-P-1-4-6 | 1343 | Binary Deep Neural Networks for Speech Recognition | Xu Xiang, Yanmin Qian, Kai Yu |
2017-08-21 | 11:00-13:00 | Poster 4 | Search, Computational Strategies and Language Modeling | P | Mon-P-1-4-7 | 1583 | Hierarchical Constrained Bayesian Optimization for Joint Feature, Acoustic Model and Decoder Parameter Optimization | Akshay Chandrashekaran, Ian Lane |
2017-08-21 | 11:00-13:00 | Poster 4 | Search, Computational Strategies and Language Modeling | P | Mon-P-1-4-8 | 717 | Use of Global and Acoustic Features Associated with Contextual Factors to Adapt Language Models for Spontaneous Speech Recognition | Shohei Toyama, Daisuke Saito, Nobuaki Minematsu |
2017-08-21 | 11:00-13:00 | Poster 4 | Search, Computational Strategies and Language Modeling | P | Mon-P-1-4-9 | 1247 | Joint Learning of Correlated Sequence Labeling Tasks Using Bidirectional Recurrent Neural Networks | Vardaan Pahuja, Anirban Laha, Shachar Mirkin, Vikas Raykar, Lili Kotlerman, Guy Lev |
2017-08-21 | 11:00-13:00 | Poster 4 | Search, Computational Strategies and Language Modeling | P | Mon-P-1-4-10 | 729 | Estimation of Gap Between Current Language Models and Human Performance | Xiaoyu Shen, Youssef Oualil, Clayton Greenberg, Mittul Singh, Dietrich Klakow |
2017-08-21 | 11:00-13:00 | Poster 4 | Search, Computational Strategies and Language Modeling | P | Mon-P-1-4-11 | 204 | A phonological phrase sequence modelling approach for resource efficient and robust real-time punctuation recovery | Anna Moró, György Szaszák |
2017-08-21 | 14:30-16:30 | Poster 4 | Prosody and Text Processing | P | Mon-P-2-4-1 | 35 | An RNN Model of Text Normalization | Richard Sproat, Navdeep Jaitly |
2017-08-21 | 14:30-16:30 | Poster 4 | Prosody and Text Processing | P | Mon-P-2-4-2 | 487 | Weakly-Supervised Phrase Assignment from Text in a Speech-Synthesis System Using Noisy Labels | Asaf Rendel, Raul Fernandez, Zvi Kons, Andrew Rosenberg, Ron Hoory, Bhuvana Ramabhadran |
2017-08-21 | 14:30-16:30 | Poster 4 | Prosody and Text Processing | P | Mon-P-2-4-3 | 521 | Prosody Aware Word-level Encoder Based on BLSTM-RNNs for DNN-based Speech Synthesis | Yusuke Ijima, Nobukatsu Hojo, Ryo Masumura, Taichi Asami |
2017-08-21 | 14:30-16:30 | Poster 4 | Prosody and Text Processing | P | Mon-P-2-4-4 | 669 | Global Syllable Vectors for Building TTS Front-End with Deep Learning | Jinfu Ni, Yoshinori Shiga, Hisashi Kawai |
2017-08-21 | 14:30-16:30 | Poster 4 | Prosody and Text Processing | P | Mon-P-2-4-5 | 708 | Prosody Control of Utterance Sequence for Information Delivering | Ishin Fukuoka, Kazuhiko Iwata, Tetsunori Kobayashi |
2017-08-21 | 14:30-16:30 | Poster 4 | Prosody and Text Processing | P | Mon-P-2-4-6 | 949 | Multi-Task Learning for Prosodic Structure Generation using BLSTM RNN with Structured Output Layer | Yuchen Huang, Zhiyong Wu, Runnan Li, Helen Meng, Lianhong Cai |
2017-08-21 | 14:30-16:30 | Poster 4 | Prosody and Text Processing | P | Mon-P-2-4-7 | 1086 | Investigating Efficient Feature Representation Methods and Training Objective for BLSTM-Based Phone Duration Prediction | Yibin Zheng, Jianhua Tao, Zhengqi Wen, Ya Li, Bin Liu |
2017-08-21 | 14:30-16:30 | Poster 4 | Prosody and Text Processing | P | Mon-P-2-4-8 | 1144 | Discrete Duration Model For Speech Synthesis | Bo Chen, Tianling Bian, Kai Yu |
2017-08-21 | 14:30-16:30 | Poster 4 | Prosody and Text Processing | P | Mon-P-2-4-9 | 1152 | Comparison of Modeling Target in LSTM-RNN Duration Model | Bo Chen, Jiahao Lai, Kai Yu |
2017-08-21 | 14:30-16:30 | Poster 4 | Prosody and Text Processing | P | Mon-P-2-4-10 | 1340 | Learning word vector representations based on acoustic counts | Manuel Sam Ribeiro, Oliver Watts, Junichi Yamagishi |
2017-08-21 | 14:30-16:30 | Poster 4 | Prosody and Text Processing | P | Mon-P-2-4-11 | 1507 | Synthesising uncertainty: the interplay of vocal effort and hesitation disfluencies | Eva Szekely, Joseph Mendelson, Joakim Gustafson |
2017-08-22 | 10:00-10:20 | A2 | Models of Speech Production | O | Tue-O-3-2-1 | 181 | Functional principal component analysis of vocal tract area functions | Jorge Lucero |
2017-08-22 | 10:20-10:40 | A2 | Models of Speech Production | O | Tue-O-3-2-2 | 260 | Analysis of acoustic-to-articulatory speech inversion across different accents and languages | Ganesh Sivaraman, Carol Espy-Wilson, Martijn Wieling |
2017-08-22 | 10:40-11:00 | A2 | Models of Speech Production | O | Tue-O-3-2-3 | 617 | Integrated mechanical model of [r]-[l] and [b]-[m]-[w] producing consonant cluster [br] | Takayuki Arai |
2017-08-22 | 11:00-11:20 | A2 | Models of Speech Production | O | Tue-O-3-2-4 | 804 | A Speaker Adaptive DNN Training Approach for Speaker-independent Acoustic Inversion | Leonardo Badino, Luca Franceschi, Raman Arora, Michele Donini, Massimiliano Pontil |
2017-08-22 | 11:20-11:40 | A2 | Models of Speech Production | O | Tue-O-3-2-5 | 1010 | Acoustic-to-articulatory mapping based on mixture of probabilistic canonical correlation analysis | Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu |
2017-08-22 | 11:40-12:00 | A2 | Models of Speech Production | O | Tue-O-3-2-6 | 1488 | Test-retest repeatability of articulatory strategies using real-time magnetic resonance imaging | Tanner Sorensen, Asterios Toutios, Johannes Toger, Louis Goldstein, Shrikanth Narayanan |
2017-08-22 | 13:30-13:50 | A2 | Models of Speech Perception | O | Tue-O-4-2-1 | 567 | A Comparison of Sentence-level Speech Intelligibility Metrics | Alexander Kain, Max Del Giudice, Kris Tjaden |
2017-08-22 | 13:50-14:10 | A2 | Models of Speech Perception | O | Tue-O-4-2-2 | 196 | An auditory model of speaker size perception for voiced speech sounds | Toshio Irino, Eri Takimoto, Toshie Matsui, Roy Patterson |
2017-08-22 | 14:10-14:30 | A2 | Models of Speech Perception | O | Tue-O-4-2-3 | 1048 | The recognition of compounds: a computational account | Louis ten Bosch, Lou Boves, Mirjam Ernestus |
2017-08-22 | 14:30-14:50 | A2 | Models of Speech Perception | O | Tue-O-4-2-4 | 1158 | Humans do not maximize the probability of correct decision when recognizing DANTALE words in noise | Mohsen Zareian Jahromi, Jan Østergaard, Jesper Jensen |
2017-08-22 | 14:50-15:10 | A2 | Models of Speech Perception | O | Tue-O-4-2-5 | 1360 | Single-ended prediction of listening effort based on automatic speech recognition | Rainer Huber, Constantin Spille, Bernd T. Meyer |
2017-08-22 | 15:10-15:30 | A2 | Models of Speech Perception | O | Tue-O-4-2-6 | 1611 | Modeling categorical perception with the receptive fields of auditory neurons | Chris Neufeld |
2017-08-22 | 16:00-16:20 | A2 | Speaker Recognition Evaluation | O | Tue-O-5-2-1 | 203 | The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016 | Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Anthony Larcher, Chunlei Zhang, Andreas Nautsch, Themos Stafylakis, Gang Liu, Mickael Rouvier, Wei Rao, Federico Alegre, Jianbo Ma, Manwai Mak, Achintya Sarkar, Héctor Delgado, Rahim Saeidi, Hagai Aronowitz, Aleksandr Sizov, Hanwu Sun, Guangsen Wang, Trung Hieu Nguyen, Bin Ma, Ville Vestman, Md Sahidullah, Miikka Halonen, Anssi Kanervisto, Gael Le Lan, Fahimeh Bahmaninezhad, Sergey Isadskiy, Christian Rathgeb, Christoph Busch, Georgios Tzimiropoulos, Qi Qian, Zhibin Wang, Qingen Zhao, Tianzhou Wang, Hao Li, Jian Xue, Shenghuo Zhu, Rong Jin, Tuo Zhao, Pierre-Michel Bousquet, Moez Ajili, Waad Ben Kheder, Driss Matrouf, Zhi Hao Lim, Chenglin Xu, Haihua Xu, Xiong Xiao, Eng Siong Chng, Benoit Fauve, Vidhyasaharan Sethu, Kaavya Sriskandaraja, W. W. Lin, Zheng-Hua Tan, Dennis Alexander Lehmann Thomsen, Massimiliano Todisco, Nicholas Evans, Haizhou Li, John H.L. Hansen, Jean-Francois Bonastre |
2017-08-22 | 16:20-16:40 | A2 | Speaker Recognition Evaluation | O | Tue-O-5-2-2 | 537 | The MIT-LL, JHU and LRDE NIST 2016 Speaker Recognition Evaluation System | Pedro Torres-Carrasquillo, Fred Richardson, Shahan Nercessian, Douglas Sturim, William Campbell, Youngjune Gwon, Swaroop Vattam, Najim Dehak, Harish Mallidi, Phani Sankar Nidadavolu, Ruizhi Li, Reda Dehak |
2017-08-22 | 16:40-17:00 | A2 | Speaker Recognition Evaluation | O | Tue-O-5-2-3 | 797 | Nuance – Politecnico di Torino’s 2016 NIST Speaker Recognition Evaluation System | Daniele Colibro, Claudio Vair, Emanuele Dalmasso, Kevin Farrell, Gennady Karvitsky, Sandro Cumani, Pietro Laface |
2017-08-22 | 17:00-17:20 | A2 | Speaker Recognition Evaluation | O | Tue-O-5-2-4 | 555 | UTD-CRSS Systems for 2016 NIST Speaker Recognition Evaluation | Chunlei Zhang, Fahimeh Bahmaninezhad, Shivesh Ranjan, Chengzhu Yu, Navid Shokouhi, John H.L. Hansen |
2017-08-22 | 17:20-17:40 | A2 | Speaker Recognition Evaluation | O | Tue-O-5-2-5 | 1498 | Analysis and Description of ABC Submission to NIST SRE 2016 | Oldrich Plchot, Pavel Matejka, Anna Silnova, Ondřej Novotný, Mireia Diez, Johan Rohdin, Ondrej Glembek, Niko Brummer, Albert Swart, Jesús Jorrín, Leibny Paola Garcia Perera, Luis Buera, Patrick Kenny, Md Jahangir Alam, Gautam Bhattacharya |
2017-08-22 | 17:40-18:00 | A2 | Speaker Recognition Evaluation | O | Tue-O-5-2-6 | 458 | The 2016 NIST Speaker Recognition Evaluation | Seyed Omid Sadjadi, Timothee Kheyrkhah, Audrey Tong, Craig Greenberg, Douglas Reynolds, Elliot Singer, Lisa Mason, Jaime Hernandez-Cordero |
2017-08-22 | 10:00-10:20 | Aula Magna | Neural Network Acoustic Models for ASR 1 | O | Tue-O-3-1-1 | 233 | A Comparison of Sequence-to-Sequence Models for Speech Recognition | Rohit Prabhavalkar, Kanishka Rao, Tara Sainath, Bo Li, Leif Johnson, Navdeep Jaitly |
2017-08-22 | 10:20-10:40 | Aula Magna | Neural Network Acoustic Models for ASR 1 | O | Tue-O-3-1-2 | 1073 | CTC in the Context of Generalized Full-Sum HMM Training | Albert Zeyer, Eugen Beck, Ralf Schlüter, Hermann Ney |
2017-08-22 | 10:40-11:00 | Aula Magna | Neural Network Acoustic Models for ASR 1 | O | Tue-O-3-1-3 | 1296 | Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM | Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan |
2017-08-22 | 11:00-11:20 | Aula Magna | Neural Network Acoustic Models for ASR 1 | O | Tue-O-3-1-4 | 71 | Multitask Learning with CTC and Segmental CRF for Speech Recognition | Liang Lu, Lingpeng Kong, Chris Dyer, Noah Smith |
2017-08-22 | 11:20-11:40 | Aula Magna | Neural Network Acoustic Models for ASR 1 | O | Tue-O-3-1-5 | 546 | Direct Acoustics-to-Word Models for English Conversational Speech Recognition | Kartik Audhkhasi, Bhuvana Ramabhadran, George Saon, Michael Picheny, David Nahamoo |
2017-08-22 | 11:40-12:00 | Aula Magna | Neural Network Acoustic Models for ASR 1 | O | Tue-O-3-1-6 | 1164 | Reducing the Computational Complexity of Two-Dimensional LSTMs | Bo Li, Tara Sainath |
2017-08-22 | 13:30-13:50 | Aula Magna | WaveNet and Novel Paradigms | O | Tue-O-4-1-1 | 314 | Speaker-dependent WaveNet vocoder | Akira Tamamori, Tomoki Hayashi, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda |
2017-08-22 | 13:50-14:10 | Aula Magna | WaveNet and Novel Paradigms | O | Tue-O-4-1-2 | 336 | Waveform Modeling Using Stacked Dilated Convolutional Neural Networks for Speech Bandwidth Extension | Yu Gu, Zhen-Hua Ling |
2017-08-22 | 14:10-14:30 | Aula Magna | WaveNet and Novel Paradigms | O | Tue-O-4-1-3 | 488 | Direct modeling of frequency spectra and waveform generation based on phase recovery for DNN-based speech synthesis | Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi |
2017-08-22 | 14:30-14:50 | Aula Magna | WaveNet and Novel Paradigms | O | Tue-O-4-1-4 | 628 | A Hierarchical Encoder-Decoder Model for Statistical parametric speech synthesis | Srikanth Ronanki, Oliver Watts, Simon King |
2017-08-22 | 14:50-15:10 | Aula Magna | WaveNet and Novel Paradigms | O | Tue-O-4-1-5 | 986 | Statistical voice conversion with WaveNet-based waveform generation | Kazuhiro Kobayashi, Tomoki Hayashi, Akira Tamamori, Tomoki Toda |
2017-08-22 | 15:10-15:30 | Aula Magna | WaveNet and Novel Paradigms | O | Tue-O-4-1-6 | 1107 | Google’s Next-Generation Real-Time Unit-Selection Synthesizer using Sequence-To-Sequence LSTM-based Autoencoders | Vincent Wan, Yannis Agiomyrgiannakis, Hanna Silen, Jakub Vit |
2017-08-22 | 16:00-16:20 | Aula Magna | Neural Network Acoustic Models for ASR 2 | O | Tue-O-5-1-1 | 1705 | Recurrent Neural Aligner: An Encoder-Decoder Neural Network Model for Sequence to Sequence Mapping | Hasim Sak, Matt Shannon, Kanishka Rao, Francoise Beaufays |
2017-08-22 | 16:20-16:40 | Aula Magna | Neural Network Acoustic Models for ASR 2 | O | Tue-O-5-1-2 | 429 | Highway-LSTM and Recurrent Highway Networks for Speech Recognition | Golan Pundak, Tara Sainath |
2017-08-22 | 16:40-17:00 | Aula Magna | Neural Network Acoustic Models for ASR 2 | O | Tue-O-5-1-3 | 775 | Improving speech recognition by revising gated recurrent units | Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio |
2017-08-22 | 17:00-17:20 | Aula Magna | Neural Network Acoustic Models for ASR 2 | O | Tue-O-5-1-4 | 856 | Stochastic Recurrent Neural Network for Speech Recognition | Jen-Tzung Chien, Chen Shen |
2017-08-22 | 17:20-17:40 | Aula Magna | Neural Network Acoustic Models for ASR 2 | O | Tue-O-5-1-5 | 1064 | Frame and Segment Level Recurrent Neural Networks for Phone Classification | Martin Ratajczak, Sebastian Tschiatschek, Franz Pernkopf |
2017-08-22 | 17:40-18:00 | Aula Magna | Neural Network Acoustic Models for ASR 2 | O | Tue-O-5-1-6 | 1695 | Deep Learning-based Telephony Speech Recognition in the Wild | Kyu Han, Seongjun Hahm, Byung-Hak Kim, Jungsuk Kim, Ian Lane |
2017-08-22 | 10:00-10:20 | B4 | Speaker Recognition | O | Tue-O-3-4-1 | 620 | Deep Neural Network Embeddings for Text-Independent Speaker Verification | David Snyder, Daniel Garcia-Romero, Dan Povey, Sanjeev Khudanpur |
2017-08-22 | 10:20-10:40 | B4 | Speaker Recognition | O | Tue-O-3-4-2 | 1018 | Tied Variational Autoencoder Backends for i-Vector Speaker Recognition | Jesus Villalba, Niko Brummer, Najim Dehak |
2017-08-22 | 10:40-11:00 | B4 | Speaker Recognition | O | Tue-O-3-4-3 | 1182 | Improved Gender Independent Speaker Recognition Using Convolutional Neural Network Based Bottleneck Features | Shivesh Ranjan, John H.L. Hansen |
2017-08-22 | 11:00-11:20 | B4 | Speaker Recognition | O | Tue-O-3-4-4 | 49 | Autoencoder based Domain Adaptation for Speaker Recognition under Insufficient Channel Information | Suwon Shon, Seongkyu Mun, Wooil Kim, Hanseok Ko |
2017-08-22 | 11:20-11:40 | B4 | Speaker Recognition | O | Tue-O-3-4-5 | 829 | Nonparametrically Trained Probabilistic Linear Discriminant Analysis for i-Vector Speaker Verification | Abbas Khosravani, Mohammad Mehdi Homayounpour |
2017-08-22 | 11:40-12:00 | B4 | Speaker Recognition | O | Tue-O-3-4-6 | 144 | DNN bottleneck features for speaker clustering | Jesús Jorrín, Leibny Paola Garcia Perera, Luis Buera |
2017-08-22 | 13:30-13:50 | B4 | Source Separation and Auditory Scene Analysis | O | Tue-O-4-4-1 | 830 | A Maximum Likelihood Approach to Deep Neural Network Based Nonlinear Spectral Mapping for Single-Channel Speech Separation | Yannan Wang, Jun Du, Lirong Dai, Chin-Hui Lee |
2017-08-22 | 13:50-14:10 | B4 | Source Separation and Auditory Scene Analysis | O | Tue-O-4-4-2 | 721 | Deep clustering-based beamforming for separation with unknown number of sources | Takuya Higuchi, Keisuke Kinoshita, Marc Delcroix, Katerina Zmolikova, Tomohiro Nakatani |
2017-08-22 | 14:10-14:30 | B4 | Source Separation and Auditory Scene Analysis | O | Tue-O-4-4-3 | 66 | Time-frequency masking for blind source separation with preserved spatial cues | Shadi Pirhosseinloo, Kostas Kokkinakis |
2017-08-22 | 14:30-14:50 | B4 | Source Separation and Auditory Scene Analysis | O | Tue-O-4-4-4 | 832 | Variational Recurrent Neural Networks for Speech Separation | Jen-Tzung Chien, Kuan-Ting Kuo |
2017-08-22 | 14:50-15:10 | B4 | Source Separation and Auditory Scene Analysis | O | Tue-O-4-4-5 | 188 | Detecting overlapped speech on short timeframes using deep learning | Valentin Andrei, Horia Cucu, Corneliu Burileanu |
2017-08-22 | 15:10-15:30 | B4 | Source Separation and Auditory Scene Analysis | O | Tue-O-4-4-6 | 549 | Ideal ratio mask estimation using deep neural networks for monaural speech segregation in noisy reverberant conditions | Xu Li, Junfeng Li, Yonghong Yan |
2017-08-22 | 16:00-16:20 | B4 | Glottal Source Modeling | O | Tue-O-5-4-1 | 15 | A new cosine series antialiasing function and its application to aliasing-free glottal source models for speech and singing synthesis | Hideki Kawahara, Ken-Ichi Sakakibara, Hideki Banno, Masanori Morise, Tomoki Toda, Toshio Irino |
2017-08-22 | 16:20-16:40 | B4 | Glottal Source Modeling | O | Tue-O-5-4-2 | 400 | Speaking style conversion from normal to Lombard speech using a glottal vocoder and Bayesian GMMs | Ana Ramírez López, Shreyas Seshadri, Lauri Juvela, Okko Räsänen, Paavo Alku |
2017-08-22 | 16:40-17:00 | B4 | Glottal Source Modeling | O | Tue-O-5-4-3 | 848 | Reducing mismatch in training of DNN-based glottal excitation models in a statistical parametric text-to-speech system | Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku |
2017-08-22 | 17:00-17:20 | B4 | Glottal Source Modeling | O | Tue-O-5-4-4 | 1202 | Semi Parametric Concatenative TTS with Instant Voice Modification Capabilities | Alexander Sorin, Slava Shechtman, Asaf Rendel |
2017-08-22 | 17:20-17:40 | B4 | Glottal Source Modeling | O | Tue-O-5-4-5 | 1722 | Modeling laryngeal muscle activation noise for low-order physiological based speech synthesis | Rodrigo Manriquez, Sean Peterson, Pavel Prado, Patricio Orio, Matias Zañartu |
2017-08-22 | 17:40-18:00 | B4 | Glottal Source Modeling | O | Tue-O-5-4-6 | 1647 | Direct Modelling of Magnitude and Phase Spectra for Statistical Parametric Speech Synthesis | Felipe Espic, Cassia Valentini-Botinhao, Simon King |
2017-08-22 | 10:00-10:20 | C6 | Phonation and Voice Quality | O | Tue-O-3-6-1 | 1155 | Creak as a feature of lexical stress in Estonian | Kätlin Aare, Pärtel Lippus, Juraj Šimko |
2017-08-22 | 10:20-10:40 | C6 | Phonation and Voice Quality | O | Tue-O-3-6-2 | 1535 | Cross-speaker Variation in Voice Source Correlates of Focus and Deaccentuation | Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl |
2017-08-22 | 10:40-11:00 | C6 | Phonation and Voice Quality | O | Tue-O-3-6-3 | 604 | Acoustic Characterization of Word-final Glottal Stops in Mizo and Assam Sora | Sishir Kalita, Wendy Lalhminghlui, Luke Horo, Priyankoo Sarmah, S R Mahadeva Prasanna, Samarendra Dandapat |
2017-08-22 | 11:00-11:20 | C6 | Phonation and Voice Quality | O | Tue-O-3-6-4 | 79 | Iterative Optimal Preemphasis for Improved Glottal-Flow Estimation by Iterative Adaptive Inverse Filtering | Parham Mokhtari, Hiroshi Ando |
2017-08-22 | 11:20-11:40 | C6 | Phonation and Voice Quality | O | Tue-O-3-6-5 | 870 | Automatic Measurement of Pre-aspiration | Yaniv Sheena, Michaela Hejna, Yossi Adi, Joseph Keshet |
2017-08-22 | 11:40-12:00 | C6 | Phonation and Voice Quality | O | Tue-O-3-6-6 | 1774 | Acoustic and electroglottographic study of breathy and modal vowels as produced by heritage and native Gujarati speakers | Kiranpreet Nara |
2017-08-22 | 13:30-13:50 | C6 | Prosody: Tone and Intonation | O | Tue-O-4-6-1 | 1635 | The Vocative Chant and Beyond: German Calling Melodies under Routine and Urgent Contexts | Sergio Quiroz, Marzena Zygis |
2017-08-22 | 13:50-14:10 | C6 | Prosody: Tone and Intonation | O | Tue-O-4-6-2 | 1044 | Comparing languages using hierarchical prosodic analysis | Juraj Šimko, Antti Suni, Katri Hiovain, Martti Vainio |
2017-08-22 | 14:10-14:30 | C6 | Prosody: Tone and Intonation | O | Tue-O-4-6-3 | 264 | Intonation Facilitates Prediction of Focus even in the Presence of Lexical Tones | Martin Ho Kwan Ip, Anne Cutler |
2017-08-22 | 14:30-14:50 | C6 | Prosody: Tone and Intonation | O | Tue-O-4-6-4 | 839 | Mind the peak: When museum is temporarily understood as musical in Australian English | Katharina Zahner, Heather Kember, Bettina Braun |
2017-08-22 | 14:50-15:10 | C6 | Prosody: Tone and Intonation | O | Tue-O-4-6-5 | 1353 | Pashto intonation patterns | Luca Rognoni, Judith Bishop, Miriam Corris |
2017-08-22 | 15:10-15:30 | C6 | Prosody: Tone and Intonation | O | Tue-O-4-6-6 | 175 | A new model of final lowering in spontaneous monologue | Kikuo Maekawa |
2017-08-22 | 16:00-16:20 | C6 | Prosody: Rhythm, Stress, Quantity and Phrasing | O | Tue-O-5-6-1 | 544 | Similar prosodic structure perceived differently in German and English | Heather Kember, Ann-Kathrin Grohe, Katharina Zahner, Bettina Braun, Andrea Weber, Anne Cutler |
2017-08-22 | 16:20-16:40 | C6 | Prosody: Rhythm, Stress, Quantity and Phrasing | O | Tue-O-5-6-2 | 1214 | Disambiguate or not? – The role of prosody in unambiguous and potentially ambiguous anaphora production in strictly Mandarin parallel structures | Luying Hou, Bert Le Bruyn, René Kager |
2017-08-22 | 16:40-17:00 | C6 | Prosody: Rhythm, Stress, Quantity and Phrasing | O | Tue-O-5-6-3 | 1514 | Acoustic Properties of Canonical and Non-Canonical Stress in French, Turkish, Armenian and Brazilian Portuguese | Angeliki Athanasopoulou, Irene Vogel, Hossep Dolatian |
2017-08-22 | 17:00-17:20 | C6 | Prosody: Rhythm, Stress, Quantity and Phrasing | O | Tue-O-5-6-4 | 987 | Phonological complexity, segment rate and speech tempo perception | Leendert Plug, Rachel Smith |
2017-08-22 | 17:20-17:40 | C6 | Prosody: Rhythm, Stress, Quantity and Phrasing | O | Tue-O-5-6-5 | 29 | On the Duration of Mandarin Tones | Jing Yang, Yu Zhang, Aijun Li, Li Xu |
2017-08-22 | 17:40-18:00 | C6 | Prosody: Rhythm, Stress, Quantity and Phrasing | O | Tue-O-5-6-6 | 1134 | The formant dynamics of long close vowels in three varieties of Swedish | Otto Ewald, Eva Liina Asu, Susanne Schötz |
2017-08-22 | 10:00-10:20 | D8 | Speech Synthesis Prosody | O | Tue-O-3-8-1 | 246 | An RNN-based Quantized F0 Model with Multi-tier Feedback Links for Text-to-Speech Synthesis | Xin Wang, Shinji Takaki, Junichi Yamagishi |
2017-08-22 | 10:20-10:40 | D8 | Speech Synthesis Prosody | O | Tue-O-3-8-2 | 419 | Phrase break prediction for long-form reading TTS: exploiting the text structure information | Viacheslav Klimkov, Adam Nadolski, Alexis Moinet, Bartosz Putrycz, Roberto Barra-Chicote, Thomas Merritt, Thomas Drugman |
2017-08-22 | 10:40-11:00 | D8 | Speech Synthesis Prosody | O | Tue-O-3-8-3 | 688 | Physically constrained statistical F0 prediction for electrolaryngeal speech enhancement | Kou Tanaka, Hirokazu Kameoka, Tomoki Toda, Satoshi Nakamura |
2017-08-22 | 11:00-11:20 | D8 | Speech Synthesis Prosody | O | Tue-O-3-8-4 | 719 | DNN-SPACE: DNN-HMM-based Generative Model of Voice F0 Contours for Statistical Phrase/Accent Command Estimation | Nobukatsu Hojo, Ohsugi Yasuhito, Yusuke Ijima, Hirokazu Kameoka |
2017-08-22 | 11:20-11:40 | D8 | Speech Synthesis Prosody | O | Tue-O-3-8-5 | 1355 | Controlling prominence realisation in parametric DNN-based speech synthesis | Zofia Malisz, Harald Berthelsen, Jonas Beskow, Joakim Gustafson |
2017-08-22 | 11:40-12:00 | D8 | Speech Synthesis Prosody | O | Tue-O-3-8-6 | 1528 | Increasing Recall of Lengthening Detection via Semi-Automatic Classification | Simon Betz, Jana Voße, Sina Zarrieß, Petra Wagner |
2017-08-22 | 13:30-13:50 | D8 | Emotion Modeling | O | Tue-O-4-8-1 | 619 | Speech Emotion Recognition with Emotion-Pair based Framework Considering Emotion Distribution Information in Dimensional Emotion Space | Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai |
2017-08-22 | 13:50-14:10 | D8 | Emotion Modeling | O | Tue-O-4-8-2 | 1421 | Adversarial Auto-encoders for Speech Based Emotion Recognition | Saurabh Sahu, Rahul Gupta, Ganesh Sivaraman, Wael Abdalmageed, Carol Espy-Wilson |
2017-08-22 | 14:10-14:30 | D8 | Emotion Modeling | O | Tue-O-4-8-3 | 512 | An Investigation of Emotion Prediction Uncertainty Using Gaussian Mixture Regression | Ting Dang, Vidhyasaharan Sethu, Julien Epps, Eliathamby Ambikairajah |
2017-08-22 | 14:30-14:50 | D8 | Emotion Modeling | O | Tue-O-4-8-4 | 548 | Capturing Long-term Temporal Dependencies with Convolutional Networks for Continuous Emotion Recognition | Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Melvin McInnis, Emily Mower Provost |
2017-08-22 | 14:50-15:10 | D8 | Emotion Modeling | O | Tue-O-4-8-5 | 1181 | Voice-to-affect mapping: inferences on language voice baseline settings | Ailbhe Ní Chasaide, Irena Yanushevskaya, Christer Gobl |
2017-08-22 | 15:10-15:30 | D8 | Emotion Modeling | O | Tue-O-4-8-6 | 917 | Attentive Convolutional Neural Network based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech | Michael Neumann, Ngoc Thang Vu |
2017-08-22 | 16:00-16:20 | D8 | Speech Recognition for Language Learning | O | Tue-O-5-8-1 | 250 | Bidirectional LSTM-RNN for Improving Automated Assessment of Non-native Children’s Speech | Yao Qian, Keelan Evanini, Xinhao Wang, Chong Min Lee, Matthew Mulholland |
2017-08-22 | 16:20-16:40 | D8 | Speech Recognition for Language Learning | O | Tue-O-5-8-2 | 728 | Automatic Scoring of Shadowing Speech based on DNN Posteriors and their DTW | Junwei Yue, Fumiya Shiozawa, Shohei Toyama, Yutaka Yamauchi, Kayoko Ito, Daisuke Saito, Nobuaki Minematsu |
2017-08-22 | 16:40-17:00 | D8 | Speech Recognition for Language Learning | O | Tue-O-5-8-3 | 1174 | Off-Topic Spoken Response Detection Using Siamese Convolutional Neural Networks | Chong Min Lee, Su-Youn Yoon, Xinhao Wang, Matthew Mulholland, Ikkyu Choi, Keelan Evanini |
2017-08-22 | 17:00-17:20 | D8 | Speech Recognition for Language Learning | O | Tue-O-5-8-4 | 1350 | Phonological Feature Based Mispronunciation Detection and Diagnosis using Multi-Task DNNs and Active Learning | Vipul Arora, Aditi Lahiri, Henning Reetz |
2017-08-22 | 17:20-17:40 | D8 | Speech Recognition for Language Learning | O | Tue-O-5-8-5 | 1522 | Detection of Mispronunciations and Disfluencies in Children Reading Aloud | Jorge Proença, Carla Lopes, Michael Tjalve, Andreas Stolcke, Sara Candeias, Fernando Perdigão |
2017-08-22 | 17:40-18:00 | D8 | Speech Recognition for Language Learning | O | Tue-O-5-8-6 | 366 | Automatic assessment of non-native prosody by measuring distances on prosodic label sequences | David Escudero-Mancebo, César González-Ferreras, Eva Estebas-Vilaplana, Lourdes Aguilar |
2017-08-22 | 10:00-10:20 | E10 | Emotion Recognition | O | Tue-O-3-10-1 | 200 | Efficient Emotion Recognition from Speech Using Deep Learning on Spectrograms | Aharon Satt, Shai Rozenberg, Ron Hoory |
2017-08-22 | 10:20-10:40 | E10 | Emotion Recognition | O | Tue-O-3-10-2 | 713 | Interaction and Transition Model for Speech Emotion Recognition in Dialogue | Ruo Zhang, Atsushi Ando, Satoshi Kobashikawa, Yushi Aono |
2017-08-22 | 10:40-11:00 | E10 | Emotion Recognition | O | Tue-O-3-10-3 | 1637 | Progressive Neural Networks for Transfer Learning in Emotion Recognition | John Gideon, Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Emily Mower Provost |
2017-08-22 | 11:00-11:20 | E10 | Emotion Recognition | O | Tue-O-3-10-4 | 1494 | Jointly Predicting Arousal, Valence and Dominance with Multi-Task Learning | Srinivas Parthasarathy, Carlos Busso |
2017-08-22 | 11:20-11:40 | E10 | Emotion Recognition | O | Tue-O-3-10-5 | 94 | Discretized Continuous Speech Emotion Recognition with Multi-Task Deep Recurrent Neural Network | Duc Le, Zakaria Aldeneh, Emily Mower Provost |
2017-08-22 | 11:40-12:00 | E10 | Emotion Recognition | O | Tue-O-3-10-6 | 736 | Towards Speech Emotion Recognition “in the wild” using Aggregated Corpora and Deep Multi-Task Learning | Jaebok Kim, Gwenn Englebienne, Khiet Truong, Vanessa Evers |
2017-08-22 | 13:30-13:50 | E10 | Voice Conversion 1 | O | Tue-O-4-10-1 | 247 | Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities | Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari |
2017-08-22 | 13:50-14:10 | E10 | Voice Conversion 1 | O | Tue-O-4-10-2 | 349 | Learning Latent Representations for Speech Generation and Transformation | Wei-Ning Hsu, Yu Zhang, James Glass |
2017-08-22 | 14:10-14:30 | E10 | Voice Conversion 1 | O | Tue-O-4-10-3 | 961 | Parallel-data-free Many-to-many Voice Conversion based on DNN Integrated with Eigenspace Using a Non-parallel Speech Corpus | Tetsuya Hashimoto, Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu |
2017-08-22 | 14:30-14:50 | E10 | Voice Conversion 1 | O | Tue-O-4-10-4 | 970 | Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks | Takuhiro Kaneko, Hirokazu Kameoka, Kaoru Hiramatsu, Kunio Kashino |
2017-08-22 | 14:50-15:10 | E10 | Voice Conversion 1 | O | Tue-O-4-10-5 | 1453 | A mouth opening effect based on pole modification for expressive singing voice transformation | Luc Ardaillon, Axel Roebel |
2017-08-22 | 15:10-15:30 | E10 | Voice Conversion 1 | O | Tue-O-4-10-6 | 1434 | Siamese Autoencoders for Speech Style Extraction and Switching Applied to Voice Identification and Conversion | Seyed Hamidreza Mohammadi, Alexander Kain |
2017-08-22 | 16:00-16:20 | E10 | Stance, Credibility, and Deception | O | Tue-O-5-10-1 | 159 | Inferring Stance from Prosody | Nigel Ward, Jason Carlson, Olac Fuentes, Diego Castan, Elizabeth Shriberg, Andreas Tsiartas |
2017-08-22 | 16:20-16:40 | E10 | Stance, Credibility, and Deception | O | Tue-O-5-10-2 | 1706 | Exploring Dynamic Measures of Stance in Spoken Interaction | Gina-Anne Levow, Richard A. Wright |
2017-08-22 | 16:40-17:00 | E10 | Stance, Credibility, and Deception | O | Tue-O-5-10-3 | 1035 | Opinion Dynamics Modeling for Movie Review Transcripts Classification with Hidden Conditional Random Fields | Valentin Barriere, Chloé Clavel, Slim Essid |
2017-08-22 | 17:00-17:20 | E10 | Stance, Credibility, and Deception | O | Tue-O-5-10-4 | 121 | Transfer Learning Between Concepts for Human Behavior Modeling: An Application to Sincerity and Deception Prediction | Qinyi Luo, Rahul Gupta, Shrikanth Narayanan |
2017-08-22 | 17:20-17:40 | E10 | Stance, Credibility, and Deception | O | Tue-O-5-10-5 | 384 | The Sound of Deception – What Makes a Speaker Credible? | Anne Schröder, Simon Stone, Peter Birkholz |
2017-08-22 | 17:40-18:00 | E10 | Stance, Credibility, and Deception | O | Tue-O-5-10-6 | 1723 | Hybrid Acoustic-Lexical Deep Learning Approach for Deception Detection | Gideon Mendels, Sarah Ita Levitan, Kai-Zhan Lee, Julia Hirschberg |
2017-08-22 | 10:00-12:00 | E306 | Show & Tell 3 | S&T | Tue-S&T-3-A-1 | 10017 | Applications of the BBN Sage Speech Processing Platform | Ralf Meermeier, Sean Colbath |
2017-08-22 | 10:00-12:00 | E306 | Show & Tell 3 | S&T | Tue-S&T-3-A-2 | 10025 | Bob Speaks Kaldi | Milos Cernak, Alain Komaty, Amir Mohammadi, Andre Anjos, Sebastien Marcel |
2017-08-22 | 10:00-12:00 | E306 | Show & Tell 3 | S&T | Tue-S&T-3-A-3 | 10028 | Real time pitch shifting with formant structure preservation using the phase vocoder | Michał Lenarczyk |
2017-08-22 | 10:00-12:00 | E306 | Show & Tell 3 | S&T | Tue-S&T-3-A-4 | 10043 | A Signal Processing Approach for Speaker Separation using SFF Analysis | Nivedita Chennupati, Narayana Murthy BHVS, Bayya Yegnanarayana |
2017-08-22 | 10:00-12:00 | E306 | Show & Tell 3 | S&T | Tue-S&T-3-A-5 | 10056 | Speech Recognition and Understanding on Hardware-Accelerated DSP | Georg Stemmer, Munir Georges, Joachim Hofer, Piotr Rozen, Josef Bauer, Jakub Nowicki, Tobias Bocklet, Hannah Colett, Ohad Falik, Michael Deisher, Sylvia Downing |
2017-08-22 | 10:00-12:00 | E306 | Show & Tell 3 | S&T | Tue-S&T-3-A-6 | 10053 | MetaLab: A repository for meta-analyses on language development, and more | Sho Tsuji, Christina Bergmann, Molly Lewis, Mika Braginsky, Page Piccinini, Michael C. Frank, Alejandrina Cristia |
2017-08-22 | 13:30-15:30 | E306 | Show & Tell 3 | S&T | Tue-S&T-4-A-1 | 10017 | Applications of the BBN Sage Speech Processing Platform | Ralf Meermeier, Sean Colbath |
2017-08-22 | 13:30-15:30 | E306 | Show & Tell 3 | S&T | Tue-S&T-4-A-2 | 10025 | Bob Speaks Kaldi | Milos Cernak, Alain Komaty, Amir Mohammadi, Andre Anjos, Sebastien Marcel |
2017-08-22 | 13:30-15:30 | E306 | Show & Tell 3 | S&T | Tue-S&T-4-A-3 | 10028 | Real time pitch shifting with formant structure preservation using the phase vocoder | Michał Lenarczyk |
2017-08-22 | 13:30-15:30 | E306 | Show & Tell 3 | S&T | Tue-S&T-4-A-4 | 10043 | A Signal Processing Approach for Speaker Separation using SFF Analysis | Nivedita Chennupati, Narayana Murthy BHVS, Bayya Yegnanarayana |
2017-08-22 | 13:30-15:30 | E306 | Show & Tell 3 | S&T | Tue-S&T-4-A-5 | 10056 | Speech Recognition and Understanding on Hardware-Accelerated DSP | Georg Stemmer, Munir Georges, Joachim Hofer, Piotr Rozen, Josef Bauer, Jakub Nowicki, Tobias Bocklet, Hannah Colett, Ohad Falik, Michael Deisher, Sylvia Downing |
2017-08-22 | 13:30-15:30 | E306 | Show & Tell 3 | S&T | Tue-S&T-4-A-6 | 10053 | MetaLab: A repository for meta-analyses on language development, and more | Sho Tsuji, Christina Bergmann, Molly Lewis, Mika Braginsky, Page Piccinini, Michael C. Frank, Alejandrina Cristia |
2017-08-22 | 10:00-12:00 | E397 | Show & Tell 4 | S&T | Tue-S&T-3-B-1 | 10030 | Evolving recurrent neural networks that process and classify raw audio in a streaming fashion | Adrien Daniel |
2017-08-22 | 10:00-12:00 | E397 | Show & Tell 4 | S&T | Tue-S&T-3-B-2 | 10032 | Combining Gaussian mixture models and segmental feature models for speaker recognition | Milana Milošević, Ulrike Glavitsch |
2017-08-22 | 10:00-12:00 | E397 | Show & Tell 4 | S&T | Tue-S&T-3-B-3 | 10036 | Did you laugh enough today? – Deep Neural Networks for Mobile and Wearable Laughter Trackers | Gerhard Hagerer, Nicholas Cummins, Florian Eyben, Björn Schuller |
2017-08-22 | 10:00-12:00 | E397 | Show & Tell 4 | S&T | Tue-S&T-3-B-4 | 10037 | Low-Frequency Ultrasonic Communication for Speech Broadcasting in Public Transportation | Kwang Myung Jeon, Nam Kyun Kim, Chan Woong Kwak, Jung Min Moon, Hong Kook Kim |
2017-08-22 | 10:00-12:00 | E397 | Show & Tell 4 | S&T | Tue-S&T-3-B-5 | 10039 | Real-time Speech Enhancement with GCC-NMF: Demonstration on the Raspberry Pi and NVIDIA Jetson | Sean Wood, Jean Rouat |
2017-08-22 | 10:00-12:00 | E397 | Show & Tell 4 | S&T | Tue-S&T-3-B-6 | 10033 | Reading validation for pronunciation evaluation in the Digitala project | Aku Rouhe, Reima Karhila, Peter Smit, Mikko Kurimo |
2017-08-22 | 13:30-15:30 | E397 | Show & Tell 4 | S&T | Tue-S&T-4-B-1 | 10030 | Evolving recurrent neural networks that process and classify raw audio in a streaming fashion | Adrien Daniel |
2017-08-22 | 13:30-15:30 | E397 | Show & Tell 4 | S&T | Tue-S&T-4-B-2 | 10032 | Combining Gaussian mixture models and segmental feature models for speaker recognition | Milana Milošević, Ulrike Glavitsch |
2017-08-22 | 13:30-15:30 | E397 | Show & Tell 4 | S&T | Tue-S&T-4-B-3 | 10036 | Did you laugh enough today? – Deep Neural Networks for Mobile and Wearable Laughter Trackers | Gerhard Hagerer, Nicholas Cummins, Florian Eyben, Björn Schuller |
2017-08-22 | 13:30-15:30 | E397 | Show & Tell 4 | S&T | Tue-S&T-4-B-4 | 10037 | Low-Frequency Ultrasonic Communication for Speech Broadcasting in Public Transportation | Kwang Myung Jeon, Nam Kyun Kim, Chan Woong Kwak, Jung Min Moon, Hong Kook Kim |
2017-08-22 | 13:30-15:30 | E397 | Show & Tell 4 | S&T | Tue-S&T-4-B-5 | 10039 | Real-time Speech Enhancement with GCC-NMF: Demonstration on the Raspberry Pi and NVIDIA Jetson | Sean Wood, Jean Rouat |
2017-08-22 | 13:30-15:30 | E397 | Show & Tell 4 | S&T | Tue-S&T-4-B-6 | 10033 | Reading validation for pronunciation evaluation in the Digitala project | Aku Rouhe, Reima Karhila, Peter Smit, Mikko Kurimo |
2017-08-22 | 10:00-10:15 | F11 | Special Session: Speech and Human-Robot Interaction | SS | Tue-SS-3-11-8 | | Introduction | Timo Baumann, Thomas Hueber, David Schlangen |
2017-08-22 | 10:15-10:30 | F11 | Special Session: Speech and Human-Robot Interaction | SS | Tue-SS-3-11-1 | 1223 | Elicitation Design for Acoustic Depression Classification: An Investigation of Articulation Effort, Linguistic Complexity, and Word Affect | Brian Stasak, Julien Epps, Roland Goecke |
2017-08-22 | 10:30-10:45 | F11 | Special Session: Speech and Human-Robot Interaction | SS | Tue-SS-3-11-2 | 1308 | Robustness over time-varying channels in DNN-HMM ASR based human-robot interaction | Jose Novoa, Jorge Wuth, Juan Pablo Escudero, Josue Fredes, Rodrigo Mahu, Richard Stern, Nestor Becerra Yoma |
2017-08-22 | 10:45-11:00 | F11 | Special Session: Speech and Human-Robot Interaction | SS | Tue-SS-3-11-3 | 1395 | Analysis of Engagement and User Experience with a Laughter Responsive Social Robot | Bekir Berker Türker, Zana Buçinca, Engin Erzin, Yücel Yemez, Metin Sezgin |
2017-08-22 | 11:00-11:15 | F11 | Special Session: Speech and Human-Robot Interaction | SS | Tue-SS-3-11-4 | 730 | Automatic Classification of Autistic Child Vocalisations: A Novel Database and Results | Alice Baird, Shahin Amiriparian, Nicholas Cummins, Alyssa M. Alcorn, Anton Batliner, Sergey Pugachevskiy, Michael Freitag, Maurice Gerczuk, Björn Schuller |
2017-08-22 | 11:15-11:30 | F11 | Special Session: Speech and Human-Robot Interaction | SS | Tue-SS-3-11-5 | 926 | Crowd-Sourced Design of Artificial Attentive Listeners | Catharine Oertel, Patrik Jonell, Dimosthenis Kontogiorgos, Joseph Mendelson, Jonas Beskow, Joakim Gustafson |
2017-08-22 | 11:30-11:45 | F11 | Special Session: Speech and Human-Robot Interaction | SS | Tue-SS-3-11-6 | 1431 | Studying the link between inter-speaker coordination and speech imitation through human-machine interactions | Leonardo Lancia, Thierry Chaminade, Noël Nguyen, Laurent Prévot |
2017-08-22 | 13:30-13:45 | F11 | Special Session: Incremental Processing and Responsive Behaviour | SS | Tue-SS-4-11-7 | | Introduction | Timo Baumann, Thomas Hueber, David Schlangen |
2017-08-22 | 13:45-14:00 | F11 | Special Session: Incremental Processing and Responsive Behaviour | SS | Tue-SS-4-11-1 | 396 | Adjusting the Frame: Biphasic Performative Control of Speech Rhythm | Samuel Delalez, Christophe d’Alessandro |
2017-08-22 | 14:00-14:15 | F11 | Special Session: Incremental Processing and Responsive Behaviour | SS | Tue-SS-4-11-2 | 1676 | Attentional factors in listeners’ uptake of gesture cues during speech processing | Raheleh Saryazdi, Craig Chambers |
2017-08-22 | 14:15-14:30 | F11 | Special Session: Incremental Processing and Responsive Behaviour | SS | Tue-SS-4-11-3 | 631 | Motion analysis in vocalized surprise expressions | Carlos Ishi, Takashi Minato, Hiroshi Ishiguro |
2017-08-22 | 14:30-14:45 | F11 | Special Session: Incremental Processing and Responsive Behaviour | SS | Tue-SS-4-11-4 | 1606 | Enhancing Backchannel Prediction Using Word Embeddings | Robin Rüde, Markus Müller, Sebastian Stüker, Alex Waibel |
2017-08-22 | 14:45-15:00 | F11 | Special Session: Incremental Processing and Responsive Behaviour | SS | Tue-SS-4-11-5 | 1042 | A Computational Model for Phonetically Responsive Spoken Dialogue Systems | Eran Raveh, Ingmar Steiner, Bernd Möbius |
2017-08-22 | 15:00-15:15 | F11 | Special Session: Incremental Processing and Responsive Behaviour | SS | Tue-SS-4-11-6 | 738 | Incremental Dialogue Act Recognition: token- vs chunk-based classification | Eustace Ebhotemhen, Volha Petukhova, Dietrich Klakow |
2017-08-22 | 15:15-15:30 | F11 | Special Session: Incremental Processing and Responsive Behaviour | SS | Tue-SS-4-11-8 | | Discussion | Timo Baumann, Thomas Hueber, David Schlangen |
2017-08-22 | 16:00-16:05 | F11 | Special Session: Acoustic Manifestations of Social Characteristics | SS | Tue-SS-5-11-10 | | Introduction | Stefanie Jannedy, Melanie Weirich |
2017-08-22 | 16:00-16:30 | F11 | Special Session: Speech and Human-Robot Interaction | SS | Tue-SS-3-11-7 | | Discussion | Nicholas Evans, Kong Aik Lee |
2017-08-22 | 16:05-16:25 | F11 | Special Session: Acoustic Manifestations of Social Characteristics | SS | Tue-SS-5-11-1 | 28 | Clear Speech – Mere Speech? How segmental and prosodic speech reduction shape the impression that speakers create on listeners | Oliver Niebuhr |
2017-08-22 | 16:25-16:45 | F11 | Special Session: Acoustic Manifestations of Social Characteristics | SS | Tue-SS-5-11-6 | 1248 | To see or not to see: Interlocutor visibility and likeability influence convergence in intonation | Katrin Schweitzer, Michael Walsh, Antje Schweitzer |
2017-08-22 | 16:45-17:05 | F11 | Special Session: Acoustic Manifestations of Social Characteristics | SS | Tue-SS-5-11-7 | 1394 | Acoustic correlates of parental role and gender identity in the speech of expecting parents | Melanie Weirich, Adrian Simpson |
2017-08-22 | 17:05-17:25 | F11 | Special Session: Acoustic Manifestations of Social Characteristics | SS | Tue-SS-5-11-9 | 1746 | Effects of Talker Dialect, Gender & Race on Accuracy of Bing Speech and YouTube Automatic Captions | Rachael Tatman, Conner Kasten |
2017-08-22 | 17:25-18:00 | F11 | Special Session: Acoustic Manifestations of Social Characteristics | SS | Tue-SS-5-11-2 | 293 | Relationships between speech timing and perceived hostility in a French corpus of political debates | Charlotte Kouklia, Nicolas Audibert |
2017-08-22 | 17:25-18:00 | F11 | Special Session: Acoustic Manifestations of Social Characteristics | SS | Tue-SS-5-11-3 | 328 | Towards Speaker Characterization: Identifying and Predicting Dimensions of Person Attribution | Laura Fernández Gallardo, Benjamin Weiss |
2017-08-22 | 17:25-18:00 | F11 | Special Session: Acoustic Manifestations of Social Characteristics | SS | Tue-SS-5-11-4 | 623 | Prosodic analysis of attention-drawing speech | Carlos Ishi, Jun Arai, Norihiro Hagita |
2017-08-22 | 17:25-18:00 | F11 | Special Session: Acoustic Manifestations of Social Characteristics | SS | Tue-SS-5-11-5 | 1055 | Perceptual and acoustic correlates of gender in the prepubertal voice | Adrian Simpson, Riccarda Funk, Frederik Palmer |
2017-08-22 | 17:25-18:00 | F11 | Special Session: Acoustic Manifestations of Social Characteristics | SS | Tue-SS-5-11-8 | 1732 | A Semi-Supervised Learning Approach for Acoustic-Prosodic Personality Perception in Under-Resourced Domains | Rubén Solera-Ureña, Helena Moniz, Fernando Batista, Vera Cabarrao, Anna Pompili, Ramón Fernández-Astudillo, Joana Campos, Ana Paiva, Isabel Trancoso |
2017-08-22 | 10:00-12:00 | Poster 1 | Short Utterances Speaker Recognition | P | Tue-P-3-1-1 | 1419 | Content Normalization for Text-dependent Speaker Verification | Subhadeep Dey, Srikanth Madikeri, Petr Motlicek, Marc Ferras |
2017-08-22 | 10:00-12:00 | Poster 1 | Short Utterances Speaker Recognition | P | Tue-P-3-1-2 | 1608 | End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances | Chunlei Zhang, Kazuhito Koishida |
2017-08-22 | 10:00-12:00 | Poster 1 | Short Utterances Speaker Recognition | P | Tue-P-3-1-3 | 883 | Adversarial Network Bottleneck Features for Noise Robust Speaker Verification | Hong Yu, Zheng-Hua Tan, Zhanyu Ma, Jun Guo |
2017-08-22 | 10:00-12:00 | Poster 1 | Short Utterances Speaker Recognition | P | Tue-P-3-1-4 | 1125 | What Does the Speaker Embedding Encode? | Shuai Wang, Yanmin Qian, Kai Yu |
2017-08-22 | 10:00-12:00 | Poster 1 | Short Utterances Speaker Recognition | P | Tue-P-3-1-5 | 266 | Incorporating Local Acoustic Variability Information into Short Duration Speaker Verification | Jianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong Aik Lee |
2017-08-22 | 10:00-12:00 | Poster 1 | Short Utterances Speaker Recognition | P | Tue-P-3-1-6 | 1036 | DNN i-vector Speaker Verification with Short, Text-constrained Test Utterances | Jinghua Zhong, Wenping Hu, Frank Soong, Helen Meng |
2017-08-22 | 10:00-12:00 | Poster 1 | Short Utterances Speaker Recognition | P | Tue-P-3-1-7 | 734 | Time-Varying Autoregressions for Speaker Verification in Reverberant Conditions | Ville Vestman, Dhananjaya Gowda, Md Sahidullah, Paavo Alku, Tomi Kinnunen |
2017-08-22 | 10:00-12:00 | Poster 1 | Short Utterances Speaker Recognition | P | Tue-P-3-1-8 | 1575 | Deep Speaker Embeddings for Short-Duration Speaker Verification | Gautam Bhattacharya, Md Jahangir Alam, Patrick Kenny |
2017-08-22 | 10:00-12:00 | Poster 1 | Short Utterances Speaker Recognition | P | Tue-P-3-1-9 | 157 | Using Voice Quality Features to Improve Short-Utterance, Text-Independent Speaker Verification Systems | Soo Jin Park, Gary Yeung, Jody Kreiman, Patricia Keating, Abeer Alwan |
2017-08-22 | 10:00-12:00 | Poster 1 | Short Utterances Speaker Recognition | P | Tue-P-3-1-10 | 108 | Gain Compensation for Fast I-Vector Extraction over Short Duration | Kong Aik Lee, Haizhou Li |
2017-08-22 | 10:00-12:00 | Poster 1 | Short Utterances Speaker Recognition | P | Tue-P-3-1-11 | 1050 | Joint Training of Expanded End-to-end DNN for Text-dependent Speaker Verification | Hee-Soo Heo, Jee-Weon Jung, Il-Ho Yang, Sung-Hyun Yoon, Ha-Jin Yu |
2017-08-22 | 13:30-15:30 | Poster 1 | Acoustic Models for ASR 1 | P | Tue-P-4-1-1 | 129 | An exploration of dropout with LSTMs | Gaofeng Cheng, Vijayaditya Peddinti, Dan Povey, Vimal Manohar, Sanjeev Khudanpur, Yonghong Yan |
2017-08-22 | 13:30-15:30 | Poster 1 | Acoustic Models for ASR 1 | P | Tue-P-4-1-2 | 477 | Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition | Jaeyoung Kim, Mostafa El-Khamy, Jungwon Lee |
2017-08-22 | 13:30-15:30 | Poster 1 | Acoustic Models for ASR 1 | P | Tue-P-4-1-3 | 873 | Unfolded Deep Recurrent Convolutional Neural Network with Jump Ahead Connections for Acoustic Modeling | Tien Dung Tran, Marc Delcroix, Shigeki Karita, Michael Hentschel, Atsunori Ogawa, Tomohiro Nakatani |
2017-08-22 | 13:30-15:30 | Poster 1 | Acoustic Models for ASR 1 | P | Tue-P-4-1-4 | 554 | Forward-backward Convolutional LSTM for Acoustic Modeling | Shigeki Karita, Atsunori Ogawa, Marc Delcroix, Tomohiro Nakatani |
2017-08-22 | 13:30-15:30 | Poster 1 | Acoustic Models for ASR 1 | P | Tue-P-4-1-5 | 1737 | Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting | Sercan Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Chris Fougner, Ryan Prenger, Adam Coates |
2017-08-22 | 13:30-15:30 | Poster 1 | Acoustic Models for ASR 1 | P | Tue-P-4-1-6 | 1233 | Deep Activation Mixture Model for Speech Recognition | Chunyang Wu, Mark Gales |
2017-08-22 | 13:30-15:30 | Poster 1 | Acoustic Models for ASR 1 | P | Tue-P-4-1-7 | 920 | Ensembles of Multi-scale VGG Acoustic Models | Michael Heck, Masayuki Suzuki, Takashi Fukuda, Gakuto Kurata, Satoshi Nakamura |
2017-08-22 | 13:30-15:30 | Poster 1 | Acoustic Models for ASR 1 | P | Tue-P-4-1-8 | 338 | Training Context-Dependent DNN Acoustic Models using Probabilistic Sampling | Tamás Grósz, Gábor Gosztolya, László Tóth |
2017-08-22 | 13:30-15:30 | Poster 1 | Acoustic Models for ASR 1 | P | Tue-P-4-1-9 | 899 | A Comparative Evaluation of GMM-Free State Tying Methods for ASR | Tamás Grósz, Gábor Gosztolya, László Tóth |
2017-08-22 | 16:00-18:00 | Poster 1 | L1 and L2 Acquisition | P | Tue-P-5-1-1 | 379 | An Automatically Aligned Corpus of Child-directed Speech | Micha Elsner, Kiwako Ito |
2017-08-22 | 16:00-18:00 | Poster 1 | L1 and L2 Acquisition | P | Tue-P-5-1-2 | 9 | A comparison of Danish listeners’ processing cost in judging the truth value of Norwegian, Swedish, and English sentences | Ocke-Schwen Bohn, Trine Askjær-Jørgensen |
2017-08-22 | 16:00-18:00 | Poster 1 | L1 and L2 Acquisition | P | Tue-P-5-1-3 | 1282 | On the role of temporal variability in the acquisition of the German vowel length contrast | Felicitas Kleber |
2017-08-22 | 16:00-18:00 | Poster 1 | L1 and L2 Acquisition | P | Tue-P-5-1-4 | 1607 | A data-driven approach for perceptually validated acoustic features for children’s sibilant fricative productions | Patrick Reidy, Mary Beckman, Jan Edwards, Benjamin Munson |
2017-08-22 | 16:00-18:00 | Poster 1 | L1 and L2 Acquisition | P | Tue-P-5-1-5 | 64 | Quality Assessment of ESL Learner’s Sentence Prosody with TTS Synthesized Voice as Reference | Yujia Xiao, Frank Soong |
2017-08-22 | 16:00-18:00 | Poster 1 | L1 and L2 Acquisition | P | Tue-P-5-1-6 | 143 | Mechanisms of Tone Sandhi Rule Application by Non-native Speakers | Si Chen, Yunjuan He, Chun Wah Yuen, Bei Li, Yike Yang |
2017-08-22 | 16:00-18:00 | Poster 1 | L1 and L2 Acquisition | P | Tue-P-5-1-7 | 289 | Changes in early L2 cue-weighting of non-native speech: Evidence from learners of Mandarin Chinese | Seth Wiener |
2017-08-22 | 16:00-18:00 | Poster 1 | L1 and L2 Acquisition | P | Tue-P-5-1-8 | 1600 | Directing Attention during Perceptual Training: A Preliminary Study of Phonetic Learning in Southern Min by Mandarin Speakers | Ying Chen, Eric Pederson |
2017-08-22 | 16:00-18:00 | Poster 1 | L1 and L2 Acquisition | P | Tue-P-5-1-9 | 332 | Prosody analysis of L2 English for naturalness evaluation through speech modification | Dean Luo, Ruxin Luo, Lixin Wang |
2017-08-22 | 16:00-18:00 | Poster 1 | L1 and L2 Acquisition | P | Tue-P-5-1-10 | 337 | Measuring Encoding Efficiency in Swedish and English Language Learner Speech Production | Gintare Grigonyte, Gerold Schneider |
2017-08-22 | 16:00-18:00 | Poster 1 | L1 and L2 Acquisition | P | Tue-P-5-1-11 | 369 | Lexical adaptation to a novel accent in German: A comparison between German, Swedish, and Finnish listeners | Adriana Hanulikova, Jenny Ekström |
2017-08-22 | 16:00-18:00 | Poster 1 | L1 and L2 Acquisition | P | Tue-P-5-1-12 | 743 | Qualitative differences in L3 learners’ neurophysiological response to L1 versus L2 transfer | Alejandra Keidel Fernández, Thomas Hörberg |
2017-08-22 | 16:00-18:00 | Poster 1 | L1 and L2 Acquisition | P | Tue-P-5-1-13 | 1052 | Articulation rate in Swedish child-directed speech increases as a function of the age of the child even when surprisal is controlled for | Johan Sjons, Thomas Hörberg, Robert Östling, Johannes Bjerva |
2017-08-22 | 16:00-18:00 | Poster 1 | L1 and L2 Acquisition | P | Tue-P-5-1-14 | 714 | The relationship between the perception and production of non-native tones | Kaile Zhang, Gang Peng |
2017-08-22 | 16:00-18:00 | Poster 1 | L1 and L2 Acquisition | P | Tue-P-5-1-15 | 1110 | MMN responses in adults after exposure to bimodal and unimodal frequency distributions of rotated speech | Ellen Marklund, Elísabet Eir Cortes, Johan Sjons |
2017-08-22 | 10:00-12:00 | Poster 2 | Speaker Characterization and Recognition | P | Tue-P-3-2-1 | 633 | Speaker Verification via Estimating Total Variability Space Using Probabilistic Partial Least Squares | Chen Chen, Jiqing Han, Yilin Pan |
2017-08-22 | 10:00-12:00 | Poster 2 | Speaker Characterization and Recognition | P | Tue-P-3-2-2 | 452 | Deep Speaker Feature Learning for Text-independent Speaker Verification | Lantian Li, Yixiang Chen, Ying Shi, Zhiyuan Tang, Dong Wang |
2017-08-22 | 10:00-12:00 | Poster 2 | Speaker Characterization and Recognition | P | Tue-P-3-2-3 | 93 | Duration mismatch compensation using four-covariance model and deep neural network for speaker verification | Pierre-Michel Bousquet, Mickael Rouvier |
2017-08-22 | 10:00-12:00 | Poster 2 | Speaker Characterization and Recognition | P | Tue-P-3-2-4 | 1586 | Extended Variability Modeling and Unsupervised Adaptation for PLDA Speaker Recognition | Alan McCree, Greg Sell, Daniel Garcia-Romero |
2017-08-22 | 10:00-12:00 | Poster 2 | Speaker Characterization and Recognition | P | Tue-P-3-2-5 | 438 | Improving the Effectiveness of Speaker Verification Domain Adaptation With Inadequate In-Domain Data | Jonas Borgstrom, Elliot Singer, Douglas Reynolds, Seyed Omid Sadjadi |
2017-08-22 | 10:00-12:00 | Poster 2 | Speaker Characterization and Recognition | P | Tue-P-3-2-6 | 656 | I-Vector DNN Scoring and Calibration for Noise Robust Speaker Verification | Zhili Tan, Manwai Mak |
2017-08-22 | 10:00-12:00 | Poster 2 | Speaker Characterization and Recognition | P | Tue-P-3-2-7 | 803 | Analysis of Score Normalization in Multilingual Speaker Recognition | Pavel Matejka, Oldrich Plchot, Ondřej Novotný, Lukas Burget, Mireia Diez Sánchez, Jan Černocký |
2017-08-22 | 10:00-12:00 | Poster 2 | Speaker Characterization and Recognition | P | Tue-P-3-2-8 | 1062 | Alternative Approaches to Neural Network based Speaker Verification | Anna Silnova, Lukas Burget, Jan Černocký |
2017-08-22 | 10:00-12:00 | Poster 2 | Speaker Characterization and Recognition | P | Tue-P-3-2-9 | 219 | A Distribution Free Formulation of the Total Variability Model | Ruchir Travadi, Shrikanth Narayanan |
2017-08-22 | 10:00-12:00 | Poster 2 | Speaker Characterization and Recognition | P | Tue-P-3-2-10 | 668 | Domain mismatch modeling of out-domain i-vectors for PLDA speaker verification | Md Hafizur Rahman, Ivan Himawan, David Dean, Sridha Sridharan |
2017-08-22 | 13:30-15:30 | Poster 2 | Acoustic Models for ASR 2 | P | Tue-P-4-2-1 | 1323 | Backstitch: Counteracting Finite-sample Bias via Negative Steps | Yiming Wang, Vijayaditya Peddinti, Hainan Xu, Xiaohui Zhang, Dan Povey, Sanjeev Khudanpur |
2017-08-22 | 13:30-15:30 | Poster 2 | Acoustic Models for ASR 2 | P | Tue-P-4-2-2 | 779 | Node pruning based on Entropy of Weights and Node Activity for Small-footprint Acoustic Model based on Deep Neural Networks | Ryu Takeda, Kazuhiro Nakadai, Kazunori Komatani |
2017-08-22 | 13:30-15:30 | Poster 2 | Acoustic Models for ASR 2 | P | Tue-P-4-2-3 | 1284 | End-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlow | Ehsan Variani, Tom Bagby, Erik McDermott, Michiel Bacchiani |
2017-08-22 | 13:30-15:30 | Poster 2 | Acoustic Models for ASR 2 | P | Tue-P-4-2-4 | 1557 | An Efficient Phone N-gram Forward-backward Computation Using Dense Matrix Multiplication | Khe Chai Sim, Arun Narayanan |
2017-08-22 | 13:30-15:30 | Poster 2 | Acoustic Models for ASR 2 | P | Tue-P-4-2-5 | 1747 | Parallel Neural Network Features for Improved Tandem Acoustic Modeling | Zoltán Tüske, Wilfried Michel, Ralf Schlüter, Hermann Ney |
2017-08-22 | 13:30-15:30 | Poster 2 | Acoustic Models for ASR 2 | P | Tue-P-4-2-6 | 1581 | Acoustic feature learning with deep variational canonical correlation analysis | Qingming Tang, Weiran Wang, Karen Livescu |
2017-08-22 | 16:00-18:00 | Poster 2 | Voice, Speech and Hearing Disorders | P | Tue-P-5-2-1 | 335 | Cepstral and entropy analyses in vowels excerpted from continuous speech of dysphonic and control speakers | Antonella Castellana, Andreas Selamtzis, Giampiero Salvi, Alessio Carullo, Arianna Astolfi |
2017-08-22 | 16:00-18:00 | Poster 2 | Voice, Speech and Hearing Disorders | P | Tue-P-5-2-2 | 478 | Classification of bulbar ALS from kinematic features of the jaw and lips: Towards computer-mediated assessment | Andrea Bandini, Jordan Green, Lorne Zinman, Yana Yunusova |
2017-08-22 | 16:00-18:00 | Poster 2 | Voice, Speech and Hearing Disorders | P | Tue-P-5-2-3 | 589 | Zero Frequency Filter Based Analysis of Voice Disorders | Nagaraj Adiga, Vikram C M, Keerthi Pullela, S R Mahadeva Prasanna |
2017-08-22 | 16:00-18:00 | Poster 2 | Voice, Speech and Hearing Disorders | P | Tue-P-5-2-4 | 1245 | Hypernasality Severity Analysis in Cleft Lip and Palate Speech Using Vowel Space Area | Nikitha K, Sishir Kalita, CM Vikram, M. Pushpavathi, S R Mahadeva Prasanna |
2017-08-22 | 16:00-18:00 | Poster 2 | Voice, Speech and Hearing Disorders | P | Tue-P-5-2-5 | 1363 | Automatic Prediction of Speech Evaluation Metrics for Dysarthric Speech | Imed Laaridh, Waad Ben Kheder, Corinne Fredouille, Christine Meunier |
2017-08-22 | 16:00-18:00 | Poster 2 | Voice, Speech and Hearing Disorders | P | Tue-P-5-2-6 | 416 | Apkinson – A mobile monitoring solution for Parkinson’s disease | Philipp Klumpp, Thomas Janu, Tomás Arias-Vergara, Juan Camilo Vásquez Correa, Juan Rafael Orozco-Arroyave, Elmar Noeth |
2017-08-22 | 16:00-18:00 | Poster 2 | Voice, Speech and Hearing Disorders | P | Tue-P-5-2-7 | 762 | Dysprosody differentiate between Parkinson’s disease, progressive supranuclear palsy, and multiple system atrophy | Jan Hlavnička, Tereza Tykalová, Roman Čmejla, Jiří Klempíř, Evžen Růžička, Jan Rusz |
2017-08-22 | 16:00-18:00 | Poster 2 | Voice, Speech and Hearing Disorders | P | Tue-P-5-2-8 | 1222 | Interpretable Objective Assessment of Dysarthric Speech based on Deep Neural Networks | Ming Tu, Visar Berisha, Julie Liss |
2017-08-22 | 16:00-18:00 | Poster 2 | Voice, Speech and Hearing Disorders | P | Tue-P-5-2-9 | 1318 | Deep Autoencoder based Speech Features for Improved Dysarthric Speech Recognition | Bhavik Vachhani, Chitralekha Bhat, Biswajit Das, Sunil Kumar Kopparapu |
2017-08-22 | 16:00-18:00 | Poster 2 | Voice, Speech and Hearing Disorders | P | Tue-P-5-2-10 | 1740 | Prediction of Speech Delay from Acoustic Measurements | Jason Lilley, Madhavi Ratnagiri, H Timothy Bunnell |
2017-08-22 | 16:00-18:00 | Poster 2 | Voice, Speech and Hearing Disorders | P | Tue-P-5-2-11 | 329 | The Frequency Range of “The Ling Six Sounds” in Standard Chinese | Aijun Li, Hua Zhang, Wen Sun |
2017-08-22 | 16:00-18:00 | Poster 2 | Voice, Speech and Hearing Disorders | P | Tue-P-5-2-12 | 1698 | Production of sustained vowels and categorical perception of tones in Mandarin among cochlear-implanted children | Wentao Gu, Jiao Yin, James Mahshie |
2017-08-22 | 13:30-15:30 | Poster 3 | Dialog Modeling | P | Tue-P-4-3-1 | 651 | Online End-of-Turn Detection from Speech based on Stacked Time-Asynchronous Sequential Networks | Ryo Masumura, Taichi Asami, Hirokazu Masataki, Ryo Ishii, Ryuichiro Higashinaka |
2017-08-22 | 13:30-15:30 | Poster 3 | Dialog Modeling | P | Tue-P-4-3-2 | 1176 | Improving prediction of speech activity using multi-participant respiratory state | Marcin Wlodarczak, Kornel Laskowski, Mattias Heldner, Kätlin Aare |
2017-08-22 | 13:30-15:30 | Poster 3 | Dialog Modeling | P | Tue-P-4-3-3 | 1495 | Turn-Taking Offsets and Dialogue Context | Peter Heeman, Rebecca Lunsford |
2017-08-22 | 13:30-15:30 | Poster 3 | Dialog Modeling | P | Tue-P-4-3-4 | 1593 | Towards Deep End-of-Turn Prediction for Situated Spoken Dialogue Systems | Angelika Maier, Julian Hough, David Schlangen |
2017-08-22 | 13:30-15:30 | Poster 3 | Dialog Modeling | P | Tue-P-4-3-5 | 837 | End-of-Utterance Prediction by Prosodic Features and Phrase-Dependency Structure in Spontaneous Japanese Speech | Yuichi Ishimoto, Takehiro Teraoka, Mika Enomoto |
2017-08-22 | 13:30-15:30 | Poster 3 | Dialog Modeling | P | Tue-P-4-3-6 | 965 | A Turn-taking Estimation Model based on Joint Embedding of Lexical and Prosodic Contents | Chaoran Liu, Carlos Ishi, Hiroshi Ishiguro |
2017-08-22 | 13:30-15:30 | Poster 3 | Dialog Modeling | P | Tue-P-4-3-7 | 457 | Social Signal Detection in Spontaneous Dialogue Using Bidirectional LSTM-CTC | Hirofumi Inaguma, Koji Inoue, Masato Mimura, Tatsuya Kawahara |
2017-08-22 | 13:30-15:30 | Poster 3 | Dialog Modeling | P | Tue-P-4-3-8 | 1568 | Entrainment in Multi-Party Spoken Dialogues at Multiple Linguistic Levels | Zahra Rahimi, Anish Kumar, Diane Litman, Susannah Paletz, Mingzhi Yu |
2017-08-22 | 13:30-15:30 | Poster 3 | Dialog Modeling | P | Tue-P-4-3-9 | 1604 | Measuring Synchrony in Task-based Dialogues | Justine Reverdy, Carl Vogel |
2017-08-22 | 13:30-15:30 | Poster 3 | Dialog Modeling | P | Tue-P-4-3-10 | 161 | Sequence to Sequence Modeling for User Simulation in Dialog Systems | Paul Crook, Alex Marin |
2017-08-22 | 13:30-15:30 | Poster 3 | Dialog Modeling | P | Tue-P-4-3-11 | 1213 | Issues in Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human–Machine Spoken Dialog Interactions | Vikram Ramanarayanan, Patrick Lange, Keelan Evanini, Hillary Molloy, David Suendermann-Oeft |
2017-08-22 | 13:30-15:30 | Poster 3 | Dialog Modeling | P | Tue-P-4-3-12 | 725 | Hierarchical LSTMs with Joint Learning for Estimating Customer Satisfaction from Contact Center Calls | Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono |
2017-08-22 | 13:30-15:30 | Poster 3 | Dialog Modeling | P | Tue-P-4-3-13 | 1032 | Domain-independent User Satisfaction Reward Estimation for Dialogue Policy Learning | Stefan Ultes, Paweł Budzianowski, Iñigo Casanueva, Nikola Mrkšić, Lina M. Rojas Barahona, Pei-Hao Su, Tsung-Hsien Wen, Milica Gasic, Steve Young |
2017-08-22 | 13:30-15:30 | Poster 3 | Dialog Modeling | P | Tue-P-4-3-14 | 1006 | Analysis of the Relationship between Prosodic Features of Fillers and Its Forms or Occurrence Positions | Shizuka Nakamura, Ryosuke Nakanishi, Katsuya Takanashi, Tatsuya Kawahara |
2017-08-22 | 13:30-15:30 | Poster 3 | Dialog Modeling | P | Tue-P-4-3-15 | 1413 | Cross-Subject Continuous Emotion Recognition using Speech and Body Motion in Dyadic Interactions | Syeda Narjis Fatima, Engin Erzin |
2017-08-22 | 16:00-18:00 | Poster 3 | Source Separation and Voice Activity Detection | P | Tue-P-5-3-1 | 40 | Audio Content based Geotagging in Multimedia | Anurag Kumar, Benjamin Elizalde, Bhiksha Raj |
2017-08-22 | 16:00-18:00 | Poster 3 | Source Separation and Voice Activity Detection | P | Tue-P-5-3-2 | 55 | Time Delay Histogram Based Speech Source Separation Using a Planar Array | Zhaoqiong Huang, Zhanzhong Cao, Dongwen Ying, Jielin Pan, Yonghong Yan |
2017-08-22 | 16:00-18:00 | Poster 3 | Source Separation and Voice Activity Detection | P | Tue-P-5-3-3 | 135 | Excitation Source Features for Improving the Detection of Vowel Onset and Offset Points in a Speech Sequence | Gayadhar Pradhan, Avinash Kumar, Syed Shahnawazuddin |
2017-08-22 | 16:00-18:00 | Poster 3 | Source Separation and Voice Activity Detection | P | Tue-P-5-3-4 | 189 | A Contrast Function and Algorithm for Blind Separation of Audio Signals | Wei Gao, Roberto Togneri, Victor Sreeram |
2017-08-22 | 16:00-18:00 | Poster 3 | Source Separation and Voice Activity Detection | P | Tue-P-5-3-5 | 199 | Weighted Spatial Covariance Matrix Estimation for MUSIC based TDOA Estimation of Speech Source | Chenglin Xu, Xiong Xiao, Sining Sun, Wei Rao, Eng Siong Chng, Haizhou Li |
2017-08-22 | 16:00-18:00 | Poster 3 | Source Separation and Voice Activity Detection | P | Tue-P-5-3-6 | 229 | Speaker Direction-of-Arrival Estimation Based On Frequency-Independent Beampattern | Feng Guo, Yuhang Cao, Zheng Liu, Jiaen Liang, Baoqing Li, Xiaobing Yuan |
2017-08-22 | 16:00-18:00 | Poster 3 | Source Separation and Voice Activity Detection | P | Tue-P-5-3-7 | 271 | A Mask Estimation Method Integrating Data Field Model for Speech Enhancement | Xianyun Wang, Changchun Bao, Feng Bao |
2017-08-22 | 16:00-18:00 | Poster 3 | Source Separation and Voice Activity Detection | P | Tue-P-5-3-8 | 496 | Improved end-of-query detection for streaming speech recognition | Matt Shannon, Gabor Simko, Shuo-Yiin Chang, Carolina Parada |
2017-08-22 | 16:00-18:00 | Poster 3 | Source Separation and Voice Activity Detection | P | Tue-P-5-3-9 | 593 | Using Approximated Auditory Roughness as a Pre-filtering Feature for Human Screaming and Affective Speech AED | Di He, Zuofu Cheng, Mark Hasegawa-Johnson, Deming Chen |
2017-08-22 | 16:00-18:00 | Poster 3 | Source Separation and Voice Activity Detection | P | Tue-P-5-3-10 | 754 | Improving Source Separation via Multi-Speaker Representations | Jeroen Zegers, Hugo Van hamme |
2017-08-22 | 16:00-18:00 | Poster 3 | Source Separation and Voice Activity Detection | P | Tue-P-5-3-11 | 940 | Multiple Sound Source Counting and Localization Based on Spatial Principal Eigenvector | Bing Yang, Hong Liu, Cheng Pang |
2017-08-22 | 16:00-18:00 | Poster 3 | Source Separation and Voice Activity Detection | P | Tue-P-5-3-12 | 954 | Subband selection for binaural speech source localization | Karthik Girija Ramesan, Prasanta Ghosh |
2017-08-22 | 16:00-18:00 | Poster 3 | Source Separation and Voice Activity Detection | P | Tue-P-5-3-13 | 1227 | Unmixing Convolutive Mixtures by Exploiting Amplitude Co-modulation: Methods and Evaluation on Mandarin Speech Recordings | Bo-Rui Chen, Huang-Yi Lee, Yi-Wen Liu |
2017-08-22 | 16:00-18:00 | Poster 3 | Source Separation and Voice Activity Detection | P | Tue-P-5-3-14 | 1573 | Bimodal Recurrent Neural Network for Audiovisual Voice Activity Detection | Fei Tao, Carlos Busso |
2017-08-22 | 16:00-18:00 | Poster 3 | Source Separation and Voice Activity Detection | P | Tue-P-5-3-15 | 1673 | Domain-Specific Utterance End-Point Detection for Speech Recognition | Roland Maas, Ariya Rastrow, Kyle Goehner, Gautam Tiwari, Shaun Joseph, Bjorn Hoffmeister |
2017-08-22 | 16:00-18:00 | Poster 3 | Source Separation and Voice Activity Detection | P | Tue-P-5-3-16 | 1760 | Speech detection and enhancement using single microphone for distant speech applications in reverberant environments | Vinay Kothapally, John H.L. Hansen |
2017-08-22 | 16:00-18:00 | Poster 4 | Speech-enhancement | P | Tue-P-5-4-1 | 62 | A Post-filtering Approach Based on Locally Linear Embedding Difference Compensation for Speech Enhancement | Yichiao Wu, Hsin-Te Hwang, Syu-Siang Wang, Chin-Cheng Hsu, Yu Tsao, Hsin-Min Wang |
2017-08-22 | 16:00-18:00 | Poster 4 | Speech-enhancement | P | Tue-P-5-4-2 | 240 | Multi-target Ensemble Learning for Monaural Speech Separation | Hui Zhang, Xueliang Zhang, Guanglai Gao |
2017-08-22 | 16:00-18:00 | Poster 4 | Speech-enhancement | P | Tue-P-5-4-3 | 543 | Improved Example-based Speech Enhancement by Using Deep Neural Network Acoustic Model for Noise Robust Example Search | Atsunori Ogawa, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani |
2017-08-22 | 16:00-18:00 | Poster 4 | Speech-enhancement | P | Tue-P-5-4-4 | 1041 | Subjective intelligibility of deep neural network-based speech enhancement | Femke B. Gelderblom, Tron V. Tronstad, Erlend M. Viggen |
2017-08-22 | 16:00-18:00 | Poster 4 | Speech-enhancement | P | Tue-P-5-4-5 | 1157 | Real-Time Modulation Enhancement of Temporal Envelopes for Increasing Speech Intelligibility | Maria Koutsogiannaki, Holly Francois, Kihyun Choo, Eunmi Oh |
2017-08-22 | 16:00-18:00 | Poster 4 | Speech-enhancement | P | Tue-P-5-4-6 | 1173 | On the influence of modifying magnitude and phase spectrum to enhance noisy speech signals | Hans-Guenter Hirsch, Michael Gref |
2017-08-22 | 16:00-18:00 | Poster 4 | Speech-enhancement | P | Tue-P-5-4-7 | 1243 | MixMax Approximation as a Super-Gaussian Log-Spectral Amplitude Estimator for Speech Enhancement | Robert Rehr, Timo Gerkmann |
2017-08-22 | 16:00-18:00 | Poster 4 | Speech-enhancement | P | Tue-P-5-4-8 | 1257 | Binary mask estimation strategies for constrained imputation-based speech enhancement | Ricard Marxer, Jon Barker |
2017-08-22 | 16:00-18:00 | Poster 4 | Speech-enhancement | P | Tue-P-5-4-9 | 1465 | A Fully Convolutional Network for Speech Enhancement | Serim Park, Jinwon Lee |
2017-08-22 | 16:00-18:00 | Poster 4 | Speech-enhancement | P | Tue-P-5-4-10 | 1492 | Speech enhancement using non-negative spectrogram models with mel-generalized cepstral regularization | Li Li, Hirokazu Kameoka, Tomoki Toda, Shoji Makino |
2017-08-22 | 16:00-18:00 | Poster 4 | Speech-enhancement | P | Tue-P-5-4-11 | 1504 | A comparison of perceptually motivated loss functions for binary mask estimation in speech separation | Danny Websdale, Ben Milner |
2017-08-22 | 16:00-18:00 | Poster 4 | Speech-enhancement | P | Tue-P-5-4-12 | 1620 | Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification | Daniel Michelsanti, Zheng-Hua Tan |
2017-08-22 | 16:00-18:00 | Poster 4 | Speech-enhancement | P | Tue-P-5-4-13 | 1672 | Speech Enhancement Using Bayesian Wavenet | Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florencio, Mark Hasegawa-Johnson |
2017-08-22 | 16:00-18:00 | Poster 4 | Speech-enhancement | P | Tue-P-5-4-14 | 297 | Binaural Reverberant Speech Separation Based on Deep Neural Networks | Xueliang Zhang, DeLiang Wang |
2017-08-22 | 16:00-18:00 | Poster 4 | Speech-enhancement | P | Tue-P-5-4-15 | 1225 | On the quality and intelligibility of noisy speech processed for near-end listening enhancement | Catalin Zorila, Yannis Stylianou |
2017-08-23 | 10:00-10:20 | A2 | Special Session: Digital Revolution for Under-resourced Languages 1 | SS | Wed-SS-6-2-1 | 180 | Team ELISA System for DARPA LORELEI Speech Evaluation 2016 | Pavlos Papadopoulos, Ruchir Travadi, Colin Vaz, Nikolaos Malandrakis, Ulf Hermjakob, Nima Pourdamghani, Michael Pust, Boliang Zhang, Xiaoman Pan, Di Lu, Ying Lin, Ondrej Glembek, Murali Karthick B, Martin Karafiat, Lukas Burget, Mark Hasegawa-Johnson, Heng Ji, Jonathan May, Kevin Knight, Shrikanth Narayanan |
2017-08-23 | 10:20-10:40 | A2 | Special Session: Digital Revolution for Under-resourced Languages 1 | SS | Wed-SS-6-2-2 | 1558 | First Results in Developing a Medieval Latin Language Charter Dictation System for the East-Central Europe Region | Peter Mihajlik, Lili Szabo, Balazs Tarjan, Andras Balog, Krisztina Rabai |
2017-08-23 | 10:40-11:00 | A2 | Special Session: Digital Revolution for Under-resourced Languages 1 | SS | Wed-SS-6-2-3 | 215 | The motivation and development of MPAi, a Māori Pronunciation Aid | Catherine Watson, Peter Keegan, Margaret Maclagan, Ray Harlow, Jeanette King |
2017-08-23 | 11:00-11:20 | A2 | Special Session: Digital Revolution for Under-resourced Languages 1 | SS | Wed-SS-6-2-4 | 300 | On the Linguistic Relevance of Speech Units Learned by Unsupervised Acoustic Modeling | Siyuan Feng, Tan Lee |
2017-08-23 | 11:20-11:40 | A2 | Special Session: Digital Revolution for Under-resourced Languages 1 | SS | Wed-SS-6-2-5 | 582 | Deep Autoencoder Based Multi-task Learning Using Probabilistic Transcriptions | Amit Das, Mark Hasegawa-Johnson, Karel Vesely |
2017-08-23 | 11:40-12:00 | A2 | Special Session: Digital Revolution for Under-resourced Languages 1 | SS | Wed-SS-6-2-6 | 160 | Areal and Phylogenetic Features for Multilingual Speech Synthesis | Alexander Gutkin, Richard Sproat |
2017-08-23 | 13:30-13:50 | A2 | Noise Robust Speech Recognition | O | Wed-O-7-2-1 | 901 | Speech Representation Learning Using Unsupervised Data-Driven Modulation Filtering for Robust ASR | Purvi Agrawal, Sriram Ganapathy |
2017-08-23 | 13:50-14:10 | A2 | Noise Robust Speech Recognition | O | Wed-O-7-2-2 | 642 | Combined Multi-channel NMF-based Robust Beamforming for Noisy Speech Recognition | Masato Mimura, Yoshiaki Bando, Kazuki Shimada, Shinsuke Sakai, Kazuyoshi Yoshii, Tatsuya Kawahara |
2017-08-23 | 14:10-14:30 | A2 | Noise Robust Speech Recognition | O | Wed-O-7-2-3 | 305 | Recognizing Multi-talker Speech with Permutation Invariant Training | Dong Yu, Xuankai Chang, Yanmin Qian |
2017-08-23 | 14:30-14:50 | A2 | Noise Robust Speech Recognition | O | Wed-O-7-2-4 | 61 | Coupled initialization of multi-channel non-negative matrix factorization based on spatial and spectral information | Yuuki Tachioka, Tomohiro Narita, Iori Miura, Takanobu Uramoto, Natsuki Monta, Shingo Uenohara, Ken’ichi Furuya, Shinji Watanabe, Jonathan Le Roux |
2017-08-23 | 14:50-15:10 | A2 | Noise Robust Speech Recognition | O | Wed-O-7-2-5 | 211 | Channel Compensation in the Generalised Vector Taylor Series Approach to Robust ASR | Erfan Loweimi, Jon Barker, Thomas Hain |
2017-08-23 | 15:10-15:30 | A2 | Noise Robust Speech Recognition | O | Wed-O-7-2-6 | 1570 | Robust Speech Recognition Via Anchor Word Representations | Brian King, I-Fan Chen, Yonatan Vaizman, Yuzong Liu, Roland Maas, SHK (Hari) Parthasarathi, Bjorn Hoffmeister |
2017-08-23 | 10:00-10:20 | Aula Magna | Speech Production and Physiology | O | Wed-O-6-1-1 | 285 | Aerodynamic features of French fricatives | Rosario Signorello, Sergio Hassid, Didier Demolin |
2017-08-23 | 10:20-10:40 | Aula Magna | Speech Production and Physiology | O | Wed-O-6-1-2 | 1126 | Inter-speaker variability: speaker normalisation and quantitative estimation of articulatory invariants in speech production for French | Antoine Serrurier, Pierre Badin, Louis-Jean Boe, Laurent Lamalle, Christiane Neuschaefer-Rube |
2017-08-23 | 10:40-11:00 | Aula Magna | Speech Production and Physiology | O | Wed-O-6-1-3 | 1190 | Comparison of Basic Beatboxing Articulations between Expert and Novice Artists using Real-Time Magnetic Resonance Imaging | Nimisha Patil, Timothy Greer, Reed Blaylock, Shrikanth Narayanan |
2017-08-23 | 11:00-11:20 | Aula Magna | Speech Production and Physiology | O | Wed-O-6-1-4 | 1576 | Speaker-specific Biomechanical Model-based Investigation of a Simple Speech Task based on Tagged-MRI | Keyi Tang, Negar Mohaghegh Harandi, Jonghye Woo, Georges El Fakhri, Maureen Stone, Sidney Fels |
2017-08-23 | 11:20-11:40 | Aula Magna | Speech Production and Physiology | O | Wed-O-6-1-5 | 1631 | Sounds of the Human Vocal Tract | Reed Blaylock, Nimisha Patil, Timothy Greer, Shrikanth Narayanan |
2017-08-23 | 11:40-12:00 | Aula Magna | Speech Production and Physiology | O | Wed-O-6-1-6 | 1675 | A simulation study on the effect of glottal boundary conditions on vocal tract formants | Yasufumi Uezu, Tokihiko Kaburagi |
2017-08-23 | 13:30-13:50 | Aula Magna | Cognition and Brain Studies | O | Wed-O-7-1-1 | 73 | An entrained rhythm’s frequency, not phase, influences temporal sampling of speech | Hans Rutger Bosker, Anne Kösem |
2017-08-23 | 13:50-14:10 | Aula Magna | Cognition and Brain Studies | O | Wed-O-7-1-2 | 658 | Context regularity indexed by auditory N1 and P2 event-related potentials | Xiao Wang, Yanhui Zhang, Gang Peng |
2017-08-23 | 14:10-14:30 | Aula Magna | Cognition and Brain Studies | O | Wed-O-7-1-3 | 842 | Discovering Language in Marmoset Vocalization | Sakshi Verma, Lok Prateek Kotha, Karthik Pandia D S, Nauman Dawalatabad, Rogier Landman, Jitendra Sharma, Mriganka Sur, Hema Murthy |
2017-08-23 | 14:30-14:50 | Aula Magna | Cognition and Brain Studies | O | Wed-O-7-1-4 | 854 | Subject-independent Classification of Japanese Spoken Sentences by Multiple Frequency Bands Phase Pattern of EEG Response during Speech Perception | Hiroki Watanabe, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura |
2017-08-23 | 14:50-15:10 | Aula Magna | Cognition and Brain Studies | O | Wed-O-7-1-5 | 934 | The phonological status of the French Initial Accent and its role in semantic processing: an Event-Related Potentials study | Noemie te Rietmolen, Radouane El Yagoubi, Alain Ghio, Corine Astésano |
2017-08-23 | 15:10-15:30 | Aula Magna | Cognition and Brain Studies | O | Wed-O-7-1-6 | 1741 | A Neuro-Experimental Evidence for the Motor Theory of Speech Perception | Bin Zhao, Jianwu Dang, Gaoyan Zhang |
2017-08-23 | 16:00-16:20 | Aula Magna | Speaker Database and Anti-spoofing | O | Wed-O-8-1-1 | 256 | Detection of Replay Attacks using Single Frequency Filtering Cepstral Coefficients | K N R K Raju Alluri, Sivanand Achanta, Sudarsana Reddy Kadiri, Suryakanth V Gangashetty, Anil Kumar Vuppala |
2017-08-23 | 16:20-16:40 | Aula Magna | Speaker Database and Anti-spoofing | O | Wed-O-8-1-2 | 1393 | Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech Detection | Hardik Sailor, Madhu Kamble, Hemant Patil |
2017-08-23 | 16:40-17:00 | Aula Magna | Speaker Database and Anti-spoofing | O | Wed-O-8-1-3 | 836 | Independent Modelling of High and Low Energy Speech Frames for Spoofing Detection | Gajan Suthokumar, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Chamith Wijenayake, Eliathamby Ambikairajah |
2017-08-23 | 17:00-17:20 | Aula Magna | Speaker Database and Anti-spoofing | O | Wed-O-8-1-4 | 1758 | Improving Speaker Verification Performance in Presence of Spoofing Attacks Using Out-of-Domain Spoofed Data | Achintya Sarkar, Md Sahidullah, Zheng-Hua Tan, Tomi Kinnunen |
2017-08-23 | 17:20-17:40 | Aula Magna | Speaker Database and Anti-spoofing | O | Wed-O-8-1-5 | 950 | VoxCeleb: A large-scale speaker identification dataset | Arsha Nagrani, Joon Son Chung, Andrew Zisserman |
2017-08-23 | 17:40-18:00 | Aula Magna | Speaker Database and Anti-spoofing | O | Wed-O-8-1-6 | 1521 | Call My Net Corpus: A Multilingual Corpus for Evaluation of Speaker Recognition Technology | Karen Jones, Stephanie Strassel, Kevin Walker, David Graff, Jonathan Wright |
2017-08-23 | 10:00-10:20 | B4 | Speech and Harmonic Analysis | O | Wed-O-6-4-1 | 1172 | A robust and alternative approach to zero frequency filtering method for epoch extraction | Gangamohan Paidi, Bayya Yegnanarayana |
2017-08-23 | 10:20-10:40 | B4 | Speech and Harmonic Analysis | O | Wed-O-6-4-2 | 21 | Improving YANGsaf F0 Estimator with Adaptive Kalman Filter | Kanru Hua |
2017-08-23 | 10:40-11:00 | B4 | Speech and Harmonic Analysis | O | Wed-O-6-4-3 | 1138 | A Spectro-Temporal Demodulation Technique for Pitch Estimation | Jitendra Dhiman, Nagaraj Adiga, Chandra Sekhar Seelamantula |
2017-08-23 | 11:00-11:20 | B4 | Speech and Harmonic Analysis | O | Wed-O-6-4-4 | 1061 | Robust method for estimating F0 of complex tone based on pitch perception of amplitude modulated signal | Kenichiro Miwa, Masashi Unoki |
2017-08-23 | 11:20-11:40 | B4 | Speech and Harmonic Analysis | O | Wed-O-6-4-5 | 1254 | Low-Complexity Pitch Estimation Based on Phase Differences Between Low-Resolution Spectra | Simon Graf, Tobias Herbig, Markus Buck, Gerhard Schmidt |
2017-08-23 | 11:40-12:00 | B4 | Speech and Harmonic Analysis | O | Wed-O-6-4-6 | 68 | Harvest: A high-performance fundamental frequency estimator from speech signals | Masanori Morise |
2017-08-23 | 13:30-13:50 | B4 | Topic Spotting, Entity Extraction and Semantic Analysis | O | Wed-O-7-4-1 | 518 | Towards Zero-Shot Frame Semantic Parsing for Domain Scaling | Ankur Bapna, Gokhan Tur, Dilek Hakkani-Tur, Larry Heck |
2017-08-23 | 13:50-14:10 | B4 | Topic Spotting, Entity Extraction and Semantic Analysis | O | Wed-O-7-4-2 | 1075 | ClockWork-RNN based architectures for Slot Filling | Despoina Georgiadou, Vassilios Diakoloukas, Vassilios Tsiaras, Vassilios Digalakis |
2017-08-23 | 14:10-14:30 | B4 | Topic Spotting, Entity Extraction and Semantic Analysis | O | Wed-O-7-4-3 | 1482 | Investigating the Effect of ASR tuning on Named Entity Recognition | Mohamed Ben Jannet, Olivier Galibert, Martine Adda-Decker, Sophie Rosset |
2017-08-23 | 14:30-14:50 | B4 | Topic Spotting, Entity Extraction and Semantic Analysis | O | Wed-O-7-4-4 | 1480 | Label-dependency coding in Simple Recurrent Networks for Spoken Language Understanding | Marco Dinarelli, Vedran Vukotic, Christian Raymond |
2017-08-23 | 14:50-15:10 | B4 | Topic Spotting, Entity Extraction and Semantic Analysis | O | Wed-O-7-4-5 | 590 | Minimum Semantic Error Cost Training of Deep Long Short-Term Memory Networks for Topic Spotting on Conversational Speech | Zhong Meng, Biing-Hwang (Fred) Juang |
2017-08-23 | 15:10-15:30 | B4 | Topic Spotting, Entity Extraction and Semantic Analysis | O | Wed-O-7-4-6 | 1093 | Topic Identification for Speech without ASR | Chunxi Liu, Jan Trmal, Matthew Wiesner, Craig Harman, Sanjeev Khudanpur |
2017-08-23 | 16:00-16:20 | B4 | Speech Translation | O | Wed-O-8-4-1 | 503 | Sequence-to-Sequence Models Can Directly Translate Foreign Speech | Ron Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen |
2017-08-23 | 16:20-16:40 | B4 | Speech Translation | O | Wed-O-8-4-2 | 944 | Structured-based Curriculum Learning for End-to-end English-Japanese Speech Translation | Takatomo Kano, Sakriani Sakti, Satoshi Nakamura |
2017-08-23 | 16:40-17:00 | B4 | Speech Translation | O | Wed-O-8-4-3 | 1690 | Assessing the tolerance of Neural Machine Translation systems against Speech Recognition Errors | Nicholas Ruiz, Mattia Antonino Di Gangi, Nicola Bertoldi, Marcello Federico |
2017-08-23 | 17:00-17:20 | B4 | Speech Translation | O | Wed-O-8-4-4 | 896 | Toward Expressive Speech Translation: A Unified Sequence-to-Sequence LSTMs Approach for Translating Words and Emphasis | Quoc Truong Do, Sakriani Sakti, Satoshi Nakamura |
2017-08-23 | 17:20-17:40 | B4 | Speech Translation | O | Wed-O-8-4-5 | 1320 | NMT-based Segmentation and Punctuation Insertion for Real-time Spoken Language Translation | Eunah Cho, Jan Niehues, Alex Waibel |
2017-08-23 | 10:00-10:20 | C6 | Dialog and Prosody | O | Wed-O-6-6-1 | 1159 | Prosodic Event Recognition using Convolutional Neural Networks with Context Information | Sabrina Stehwien, Ngoc Thang Vu |
2017-08-23 | 10:20-10:40 | C6 | Dialog and Prosody | O | Wed-O-6-6-2 | 453 | Prosodic Facilitation and Interference while Judging on the Veracity of Synthesized Statements | Ramiro H. Galvez, Štefan Beňuš, Agustín Gravano, Marian Trnka |
2017-08-23 | 10:40-11:00 | C6 | Dialog and Prosody | O | Wed-O-6-6-3 | 811 | An investigation of pitch matching across adjacent turns in a corpus of spontaneous German | Margaret Zellers, Antje Schweitzer |
2017-08-23 | 11:00-11:20 | C6 | Dialog and Prosody | O | Wed-O-6-6-4 | 795 | The Relationship between F0 Synchrony and Speech Convergence in Dyadic Interaction | Sankar Mukherjee, Alessandro D’Ausilio, Noël Nguyen, Luciano Fadiga, Leonardo Badino |
2017-08-23 | 11:20-11:40 | C6 | Dialog and Prosody | O | Wed-O-6-6-5 | 424 | The role of linguistic and prosodic cues on the prediction of self-reported satisfaction in contact centre phone calls | Jordi Luque, Ariadna Sánchez, Carlos Segura, Martí Umbert, Luis Ángel Galindo |
2017-08-23 | 11:40-12:00 | C6 | Dialog and Prosody | O | Wed-O-6-6-6 | 124 | Cross-linguistic study of the production of turn-taking cues in American English and Argentine Spanish | Pablo Brusco, Agustin Gravano, Juan Manuel Pérez |
2017-08-23 | 13:30-13:50 | C6 | Dialog Systems | O | Wed-O-7-6-1 | 1326 | An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented Dialog | Bing Liu, Ian Lane |
2017-08-23 | 13:50-14:10 | C6 | Dialog Systems | O | Wed-O-7-6-2 | 1060 | Deep Reinforcement Learning of Dialogue Policies with Less Weight Updates | Heriberto Cuayahuitl, Seunghak Yu |
2017-08-23 | 14:10-14:30 | C6 | Dialog Systems | O | Wed-O-7-6-3 | 1574 | Towards End-to-End Spoken Dialogue Systems with Turn Embeddings | Ali Orkan Bayer, Evgeny Stepanov, Giuseppe Riccardi |
2017-08-23 | 14:30-14:50 | C6 | Dialog Systems | O | Wed-O-7-6-4 | 501 | Speech and Text Analysis for Multimodal Addressee Detection in Human-Human-Computer Interaction | Oleg Akhtiamov, Maxim Sidorov, Alexey Karpov, Wolfgang Minker |
2017-08-23 | 14:50-15:10 | C6 | Dialog Systems | O | Wed-O-7-6-5 | 1205 | Rushing to Judgement: How Do Laypeople Rate Caller Engagement in Thin-Slice Videos of Human–Machine Dialog? | Vikram Ramanarayanan, Chee Wee (Ben) Leong, David Suendermann-Oeft |
2017-08-23 | 15:10-15:30 | C6 | Dialog Systems | O | Wed-O-7-6-6 | 753 | Hyperarticulation of Corrections in Multilingual Dialogue Systems | Ivan Kraljevski, Diane Hirschfeld |
2017-08-23 | 16:00-16:20 | C6 | Multi-channel Speech Enhancement | O | Wed-O-8-6-1 | 187 | Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings | Lukas Drude, Reinhold Haeb-Umbach |
2017-08-23 | 16:20-16:40 | C6 | Multi-channel Speech Enhancement | O | Wed-O-8-6-2 | 667 | Speaker-aware neural network based beamformer for speaker extraction in speech mixtures | Katerina Zmolikova, Marc Delcroix, Keisuke Kinoshita, Takuya Higuchi, Atsunori Ogawa, Tomohiro Nakatani |
2017-08-23 | 16:40-17:00 | C6 | Multi-channel Speech Enhancement | O | Wed-O-8-6-3 | 1186 | Eigenvector-based Speech Mask Estimation using Logistic Regression | Lukas Pfeifenberger, Matthias Zöhrer, Franz Pernkopf |
2017-08-23 | 17:00-17:20 | C6 | Multi-channel Speech Enhancement | O | Wed-O-8-6-4 | 1458 | Real-time Speech Enhancement with GCC-NMF | Sean Wood, Jean Rouat |
2017-08-23 | 17:20-17:40 | C6 | Multi-channel Speech Enhancement | O | Wed-O-8-6-5 | 1464 | Coherence-based dual-channel noise reduction algorithm in a complex noisy environment | Youna Ji, Jun Byun, Young-cheol Park |
2017-08-23 | 17:40-18:00 | C6 | Multi-channel Speech Enhancement | O | Wed-O-8-6-6 | 1659 | Glottal Model Based Speech Beamforming for Ad-Hoc Microphone Arrays | Yang Zhang, Dinei Florencio, Mark Hasegawa-Johnson |
2017-08-23 | 10:00-10:20 | D8 | Social Signals, Styles, and Interaction | O | Wed-O-6-8-1 | 87 | Emotional Features for Speech Overlaps Classification | Olga Egorow, Andreas Wendemuth |
2017-08-23 | 10:20-10:40 | D8 | Social Signals, Styles, and Interaction | O | Wed-O-6-8-2 | 563 | Computing Multimodal Dyadic Behaviors during Spontaneous Diagnosis Interviews toward Automatic Categorization of Autism Spectrum Disorder | Chin-Po Chen, Xian-Hong Tseng, Susan Shur-Fen Gau, Chi-Chun Lee |
2017-08-23 | 10:40-11:00 | D8 | Social Signals, Styles, and Interaction | O | Wed-O-6-8-3 | 569 | Deriving Dyad-Level Interaction Representation using Interlocutors Structural and Expressive Multimodal Behavior Features | Yun-Shao Lin, Chi-Chun Lee |
2017-08-23 | 11:00-11:20 | D8 | Social Signals, Styles, and Interaction | O | Wed-O-6-8-4 | 635 | Spotting Social Signals in Conversational Speech over IP: A Deep Learning Perspective | Raymond Brueckner, Maximilian Schmitt, Maja Pantic, Björn Schuller |
2017-08-23 | 11:20-11:40 | D8 | Social Signals, Styles, and Interaction | O | Wed-O-6-8-5 | 932 | Optimized Time Series Filters for Detecting Laughter and Filler Events | Gábor Gosztolya |
2017-08-23 | 11:40-12:00 | D8 | Social Signals, Styles, and Interaction | O | Wed-O-6-8-6 | 1633 | Visual, Laughter, Applause and Spoken Expression Features for Predicting Engagement within TED Talks | Fasih Haider, Fahim A. Salim, Saturnino Luz, Carl Vogel, Owen Conlan, Nick Campbell |
2017-08-23 | 13:30-13:50 | D8 | Lexical and Pronunciation Modeling | O | Wed-O-7-8-1 | 1436 | Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion | Benjamin Milde, Christoph Schmidt, Joachim Köhler |
2017-08-23 | 13:50-14:10 | D8 | Lexical and Pronunciation Modeling | O | Wed-O-7-8-2 | 588 | Acoustic data-driven lexicon learning based on a greedy pronunciation selection framework | Xiaohui Zhang, Vimal Manohar, Dan Povey, Sanjeev Khudanpur |
2017-08-23 | 14:10-14:30 | D8 | Lexical and Pronunciation Modeling | O | Wed-O-7-8-3 | 1081 | Semi-Supervised Learning of a Pronunciation Dictionary from Disjoint Phonemic Transcripts and Text | Takahiro Shinozaki, Shinji Watanabe, Daichi Mochihashi, Graham Neubig |
2017-08-23 | 14:30-14:50 | D8 | Lexical and Pronunciation Modeling | O | Wed-O-7-8-4 | 103 | Improved subword modeling for WFST-based speech recognition | Peter Smit, Sami Virpioja, Mikko Kurimo |
2017-08-23 | 14:50-15:10 | D8 | Lexical and Pronunciation Modeling | O | Wed-O-7-8-5 | 47 | Pronunciation learning with RNN-transducers | Antoine Bruguier, Danushen Gnanapragasam, Leif Johnson, Kanishka Rao, Francoise Beaufays |
2017-08-23 | 15:10-15:30 | D8 | Lexical and Pronunciation Modeling | O | Wed-O-7-8-6 | 1117 | Learning Similarity Functions for Pronunciation Variations | Einat Naaman, Yossi Adi, Joseph Keshet |
2017-08-23 | 16:00-16:20 | D8 | Speech Recognition: Applications in Medical Practice | O | Wed-O-8-8-1 | 280 | Acoustic Assessment of Disordered Voice with Continuous Speech Based on Utterance-level ASR Posterior Features | Yuanyuan Liu, Tan Lee, P.C. Ching, Thomas K.T. Law, Kathy Y.S. Lee |
2017-08-23 | 16:20-16:40 | D8 | Speech Recognition: Applications in Medical Practice | O | Wed-O-8-8-2 | 303 | Multi-Stage DNN Training for Automatic Recognition of Dysarthric Speech | Emre Yilmaz, Mario Ganzeboom, Catia Cucchiarini, Helmer Strik |
2017-08-23 | 16:40-17:00 | D8 | Speech Recognition: Applications in Medical Practice | O | Wed-O-8-8-3 | 455 | Improving child speech disorder assessment by incorporating out-of-domain adult speech | Daniel Smith, Alex Sneddon, Lauren Ward, Andreas Duenser, Jill Freyne, David Silvera-Tawil, Angela Morgan |
2017-08-23 | 17:00-17:20 | D8 | Speech Recognition: Applications in Medical Practice | O | Wed-O-8-8-4 | 878 | On Improving Acoustic Models For TORGO Dysarthric Speech Database | Neethu Mariam Joy, Srinivasan Umesh, Basil Abraham |
2017-08-23 | 17:20-17:40 | D8 | Speech Recognition: Applications in Medical Practice | O | Wed-O-8-8-5 | 1251 | Glottal Source Features for Automatic Speech-based Depression Assessment | Olympia Simantiraki, Paulos Charonyktakis, Anastasia Pampouchidou, Manolis Tsiknakis, Martin Cooke |
2017-08-23 | 17:40-18:00 | D8 | Speech Recognition: Applications in Medical Practice | O | Wed-O-8-8-6 | 1712 | Speech Processing Approach for Diagnosing Dementia in an Early Stage | Roozbeh Sadeghian, J. David Schaffer, Stephen Zahorian |
2017-08-23 | 10:00-10:20 | E10 | Acoustic Model Adaptation | O | Wed-O-6-10-1 | 519 | Large-Scale Domain Adaptation via Teacher-Student Learning | Jinyu Li, Michael Seltzer, Xi Wang, Rui Zhao, Yifan Gong |
2017-08-23 | 10:20-10:40 | E10 | Acoustic Model Adaptation | O | Wed-O-6-10-2 | 302 | Improving Children’s Speech Recognition through Explicit Pitch Scaling based on Iterative Spectrogram Inversion | Waquar Ahmad, Syed Shahnawazuddin, Hemant Kumar Kathania, Gayadhar Pradhan, A. B. Samaddar |
2017-08-23 | 10:40-11:00 | E10 | Acoustic Model Adaptation | O | Wed-O-6-10-3 | 368 | RNN-LDA Clustering for Feature Based DNN Adaptation | Xurong Xie, Xunying Liu, Tan Lee, Lan Wang |
2017-08-23 | 11:00-11:20 | E10 | Acoustic Model Adaptation | O | Wed-O-6-10-4 | 1342 | Robust online i-vectors for unsupervised adaptation of DNN acoustic models: A study in the context of digital voice assistants | Harish Arsikere, Sri Garimella |
2017-08-23 | 11:20-11:40 | E10 | Acoustic Model Adaptation | O | Wed-O-6-10-5 | 1446 | Semi-supervised Learning with Semantic Knowledge Extraction for Improved Speech Recognition in Air Traffic Control | Ajay Srinivasamurthy, Petr Motlicek, Ivan Himawan, Gyorgy Szaszak, Youssef Oualil, Hartmut Helmke |
2017-08-23 | 11:40-12:00 | E10 | Acoustic Model Adaptation | O | Wed-O-6-10-6 | 556 | Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition | Taesup Kim, Inchul Song, Yoshua Bengio |
2017-08-23 | 13:30-13:50 | E10 | Language Recognition | O | Wed-O-7-10-1 | 1334 | Spoken Language Identification using LSTM-based Angular Proximity | Gregory Gelly, Jean-Luc Gauvain |
2017-08-23 | 13:50-14:10 | E10 | Language Recognition | O | Wed-O-7-10-2 | 44 | End-to-End Language Identification Using High-Order Utterance Representation with Bilinear Pooling | Ma Jin, Yan Song, Ian McLoughlin, Wu Guo, Lirong Dai |
2017-08-23 | 14:10-14:30 | E10 | Language Recognition | O | Wed-O-7-10-3 | 576 | Dialect Recognition Based on Unsupervised Bottleneck Features | Qian Zhang, John H.L. Hansen |
2017-08-23 | 14:30-14:50 | E10 | Language Recognition | O | Wed-O-7-10-4 | 596 | Investigating Scalability in Hierarchical Language Identification System | Saad Irtza, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Haizhou Li |
2017-08-23 | 14:50-15:10 | E10 | Language Recognition | O | Wed-O-7-10-5 | 245 | Improving Sub-phone Modeling for Better Native Language Identification with Non-native English Speech | Yao Qian, Keelan Evanini, Xinhao Wang, David Suendermann-Oeft, Robert A Pugh, Patrick L Lange, Hillary R Molloy, Frank K Soong |
2017-08-23 | 15:10-15:30 | E10 | Language Recognition | O | Wed-O-7-10-6 | 1391 | QMDIS: QCRI-MIT Advanced Dialect Identification System | Sameer Khurana, Maryam Najafian, Ahmed Ali, Tuka Al Hanai, Yonatan Belinkov, James Glass |
2017-08-23 | 16:00-16:20 | E10 | Language models for ASR | O | Wed-O-8-10-1 | 1203 | Effectively Building Tera Scale MaxEnt Language Models Incorporating Non-Linguistic Signals | Fadi Biadsy, Mohammadreza Ghodsi, Diamantino Caseiro |
2017-08-23 | 16:20-16:40 | E10 | Language models for ASR | O | Wed-O-8-10-2 | 1598 | Semi-supervised Adaptation of RNNLMs by Fine-tuning with Domain-specific Auxiliary Features | Salil Deena, Raymond W. M. Ng, Pranava Madhyastha, Lucia Specia, Thomas Hain |
2017-08-23 | 16:40-17:00 | E10 | Language models for ASR | O | Wed-O-8-10-3 | 147 | Approximated and domain-adapted LSTM language models for first-pass decoding in speech recognition | Mittul Singh, Youssef Oualil, Dietrich Klakow |
2017-08-23 | 17:00-17:20 | E10 | Language models for ASR | O | Wed-O-8-10-4 | 493 | Sparse Non-negative Matrix Language Modeling: Maximum Entropy Flexibility on the Cheap | Ciprian Chelba, Diamantino Caseiro, Fadi Biadsy |
2017-08-23 | 17:20-17:40 | E10 | Language models for ASR | O | Wed-O-8-10-5 | 426 | Multi-scale Context Adaptation for Improving Child Automatic Speech Recognition in Child-Adult Spoken Interactions | Manoj Kumar, Daniel Bone, Kelly McWilliams, Shanna Williams, Thomas Lyon, Shrikanth Narayanan |
2017-08-23 | 17:40-18:00 | E10 | Language models for ASR | O | Wed-O-8-10-6 | 1790 | Using Knowledge Graph And Search Query Click Logs in Statistical Language Model For Speech Recognition | Weiwu Zhu |
2017-08-23 | 10:00-12:00 | E306 | Show & Tell 5 | S&T | Wed-S&T-6-A-1 | 10022 | Creating a Voice for MiRo, the World’s First Commercial Biomimetic Robot | Roger Moore, Ben Mitchinson |
2017-08-23 | 10:00-12:00 | E306 | Show & Tell 5 | S&T | Wed-S&T-6-A-2 | 10023 | A Thematicity-based Prosody Enrichment Tool for CTS | Monica Dominguez, Mireia Farrús, Leo Wanner |
2017-08-23 | 10:00-12:00 | E306 | Show & Tell 5 | S&T | Wed-S&T-6-A-3 | 10024 | WebSubDub – Experimental system for creating high-quality alternative audio track for TV broadcasting | Martin Grůber, Jindrich Matousek, Zdeněk Hanzlíček, Jakub Vít, Daniel Tihelka |
2017-08-23 | 10:00-12:00 | E306 | Show & Tell 5 | S&T | Wed-S&T-6-A-4 | 10026 | Voice Conservation and TTS System for People Facing Total Laryngectomy | Markéta Jůzová, Daniel Tihelka, Jindrich Matousek, Zdenek Hanzlicek |
2017-08-23 | 10:00-12:00 | E306 | Show & Tell 5 | S&T | Wed-S&T-6-A-5 | 10042 | TBT(Toolkit to Build TTS): A High Performance Framework to build Multiple Language HTS Voice | Atish Ghone, Rachana Nerpagar, Pranaw Kumar, Arun Baby, Aswin Shanmugam, Sasikumar Mukundan, Hema Murthy |
2017-08-23 | 10:00-12:00 | E306 | Show & Tell 5 | S&T | Wed-S&T-6-A-6 | 10046 | SIAK – A Game for Foreign Language Pronunciation Learning | Reima Karhila, Sari Ylinen, Seppo Enarvi, Kalle Palomäki, Aleksander Nikulin, Olli Rantula, Vertti Viitanen, Krupakar Dhinakaran, Anna-Riikka Smolander, Heini Kallio, Maria Uther, Katja Junttila, Perttu Hämäläinen, Mikko Kurimo |
2017-08-23 | 13:30-15:30 | E306 | Show & Tell 5 | S&T | Wed-S&T-7-A-1 | 10022 | Creating a Voice for MiRo, the World’s First Commercial Biomimetic Robot | Roger Moore, Ben Mitchinson |
2017-08-23 | 13:30-15:30 | E306 | Show & Tell 5 | S&T | Wed-S&T-7-A-2 | 10023 | A Thematicity-based Prosody Enrichment Tool for CTS | Monica Dominguez, Mireia Farrús, Leo Wanner |
2017-08-23 | 13:30-15:30 | E306 | Show & Tell 5 | S&T | Wed-S&T-7-A-3 | 10024 | WebSubDub – Experimental system for creating high-quality alternative audio track for TV broadcasting | Martin Grůber, Jindrich Matousek, Zdeněk Hanzlíček, Jakub Vít, Daniel Tihelka |
2017-08-23 | 13:30-15:30 | E306 | Show & Tell 5 | S&T | Wed-S&T-7-A-4 | 10026 | Voice Conservation and TTS System for People Facing Total Laryngectomy | Markéta Jůzová, Daniel Tihelka, Jindrich Matousek, Zdenek Hanzlicek |
2017-08-23 | 13:30-15:30 | E306 | Show & Tell 5 | S&T | Wed-S&T-7-A-5 | 10042 | TBT(Toolkit to Build TTS): A High Performance Framework to build Multiple Language HTS Voice | Atish Ghone, Rachana Nerpagar, Pranaw Kumar, Arun Baby, Aswin Shanmugam, Sasikumar Mukundan, Hema Murthy |
2017-08-23 | 13:30-15:30 | E306 | Show & Tell 5 | S&T | Wed-S&T-7-A-6 | 10046 | SIAK – A Game for Foreign Language Pronunciation Learning | Reima Karhila, Sari Ylinen, Seppo Enarvi, Kalle Palomäki, Aleksander Nikulin, Olli Rantula, Vertti Viitanen, Krupakar Dhinakaran, Anna-Riikka Smolander, Heini Kallio, Maria Uther, Katja Junttila, Perttu Hämäläinen, Mikko Kurimo |
2017-08-23 | 10:00-12:00 | E397 | Show & Tell 6 | S&T | Wed-S&T-6-B-1 | 10029 | Integrating the Talkamatic Dialogue Manager with Alexa | Staffan Larsson, Fredrik Kronlid, Andreas Krona, Alex Berman |
2017-08-23 | 10:00-12:00 | E397 | Show & Tell 6 | S&T | Wed-S&T-6-B-2 | 10031 | A Robust Medical Speech-to-Speech/Speech-to-Sign Phraselator | Farhia Ahmed, Pierrette Bouillon, Chelle Destefano, Johanna Gerlach, Sonia Halimi, Angela Hooper, Manny Rayner, Hervé Spechbach, Irene Strasly, Nikos Tsourakis |
2017-08-23 | 10:00-12:00 | E397 | Show & Tell 6 | S&T | Wed-S&T-6-B-3 | 10041 | Towards an Autarkic Embedded Cognitive User Interface | Frank Duckhorn, Markus Huber, Werner Meyer, Oliver Jokisch, Constanze Tschöpe, Matthias Wolff |
2017-08-23 | 10:00-12:00 | E397 | Show & Tell 6 | S&T | Wed-S&T-6-B-4 | 10050 | Nora the Empathetic Psychologist | Genta Indra Winata, Onno Kampman, Yang Yang, Anik Dey, Pascale Fung |
2017-08-23 | 10:00-12:00 | E397 | Show & Tell 6 | S&T | Wed-S&T-6-B-5 | 10057 | Modifying Amazon’s Alexa ASR Grammar and Lexicon – A Case Study | Aman Kumar, Hassan Alam, Manan Vyas, Tina Werner, Rachmat Hartono |
2017-08-23 | 13:30-15:30 | E397 | Show & Tell 6 | S&T | Wed-S&T-7-B-1 | 10029 | Integrating the Talkamatic Dialogue Manager with Alexa | Staffan Larsson, Fredrik Kronlid, Andreas Krona, Alex Berman |
2017-08-23 | 13:30-15:30 | E397 | Show & Tell 6 | S&T | Wed-S&T-7-B-2 | 10031 | A Robust Medical Speech-to-Speech/Speech-to-Sign Phraselator | Farhia Ahmed, Pierrette Bouillon, Chelle Destefano, Johanna Gerlach, Sonia Halimi, Angela Hooper, Manny Rayner, Hervé Spechbach, Irene Strasly, Nikos Tsourakis |
2017-08-23 | 13:30-15:30 | E397 | Show & Tell 6 | S&T | Wed-S&T-7-B-3 | 10041 | Towards an Autarkic Embedded Cognitive User Interface | Frank Duckhorn, Markus Huber, Werner Meyer, Oliver Jokisch, Constanze Tschöpe, Matthias Wolff |
2017-08-23 | 13:30-15:30 | E397 | Show & Tell 6 | S&T | Wed-S&T-7-B-4 | 10050 | Nora the Empathetic Psychologist | Genta Indra Winata, Onno Kampman, Yang Yang, Anik Dey, Pascale Fung |
2017-08-23 | 13:30-15:30 | E397 | Show & Tell 6 | S&T | Wed-S&T-7-B-5 | 10057 | Modifying Amazon’s Alexa ASR Grammar and Lexicon – A Case Study | Aman Kumar, Hassan Alam, Manan Vyas, Tina Werner, Rachmat Hartono |
2017-08-23 | 10:00-10:20 | F11 | Special Session: Data Collection, Transcription and Annotation Issues in Child Language Acquisition | SS | Wed-SS-6-11-5 | 1443 | Top-down versus bottom-up theories of phonological acquisition: A big data approach | Christina Bergmann, Sho Tsuji, Alejandrina Cristia |
2017-08-23 | 10:20-10:40 | F11 | Special Session: Data Collection, Transcription and Annotation Issues in Child Language Acquisition | SS | Wed-SS-6-11-3 | 1409 | What do babies hear? Analyses of child- and adult-directed speech | Marisa Casillas, Andrei Amatuni, Amanda Seidl, Melanie Soderstrom, Anne Warlaumont, Elika Bergelson |
2017-08-23 | 10:40-11:00 | F11 | Special Session: Data Collection, Transcription and Annotation Issues in Child Language Acquisition | SS | Wed-SS-6-11-2 | 1287 | The LENA system applied to Swedish: Reliability of the Adult Word Count estimate | Iris-Corinna Schwarz, Noor Botros, Alekzandra Lord, Amelie Marcusson, Henrik Tidelius, Ellen Marklund |
2017-08-23 | 11:00-11:20 | F11 | Special Session: Data Collection, Transcription and Annotation Issues in Child Language Acquisition | SS | Wed-SS-6-11-6 | 1468 | Which acoustic and phonological factors shape infants’ vowel discrimination? Exploiting natural variation in InPhonDB | Sho Tsuji, Alejandrina Cristia |
2017-08-23 | 11:20-11:40 | F11 | Special Session: Data Collection, Transcription and Annotation Issues in Child Language Acquisition | SS | Wed-SS-6-11-4 | 1418 | A New Workflow for Semi-automatized Annotations: Tests with Long-Form Naturalistic Recordings of Children’s Language Environments | Marisa Casillas, Elika Bergelson, Anne S. Warlaumont, Alejandrina Cristia, Melanie Soderstrom, Mark VanDam, Han Sloetjes |
2017-08-23 | 11:40-12:00 | F11 | Special Session: Data Collection, Transcription and Annotation Issues in Child Language Acquisition | SS | Wed-SS-6-11-1 | 636 | SLPAnnotator: Tools for implementing Sign Language Phonetic Annotation | Kathleen Currie Hall, Scott Mackie, Michael Fry, Oksana Tkachman |
2017-08-23 | 13:30-13:50 | F11 | Special Session: Computational Models in Child Language Acquisition | SS | Wed-SS-7-11-2 | 937 | Relating unsupervised word segmentation to reported vocabulary acquisition | Elin Larsen, Alejandrina Cristia, Emmanuel Dupoux |
2017-08-23 | 13:50-14:10 | F11 | Special Session: Computational Models in Child Language Acquisition | SS | Wed-SS-7-11-5 | 1634 | Approximating phonotactic input in children’s linguistic environments from orthographic transcripts | Sofia Strömbergsson, Jens Edlund, Jana Götze, Kristina Nilsson Björkenstam |
2017-08-23 | 14:10-14:30 | F11 | Special Session: Computational Models in Child Language Acquisition | SS | Wed-SS-7-11-4 | 1289 | Computational simulations of temporal vocalization behavior in adult-child interaction | Ellen Marklund, David Pagmar, Tove Gerholm, Lisa Gustavsson |
2017-08-23 | 14:30-14:50 | F11 | Special Session: Computational Models in Child Language Acquisition | SS | Wed-SS-7-11-3 | 1143 | Modelling the Informativeness of Non-Verbal Cues in Parent–Child Interaction | Mats Wirén, Kristina N. Björkenstam, Robert Östling |
2017-08-23 | 14:50-15:10 | F11 | Special Session: Computational Models in Child Language Acquisition | SS | Wed-SS-7-11-6 | 1689 | Learning weakly-supervised multimodal phoneme embeddings | Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour, Emmanuel Dupoux |
2017-08-23 | 15:10-15:30 | F11 | Special Session: Computational Models in Child Language Acquisition | SS | Wed-SS-7-11-1 | 520 | Multi-Task Learning for Mispronunciation Detection on Singapore Children’s Mandarin Speech | Rong Tong, Nancy Chen, Bin Ma |
2017-08-23 | 16:00-16:10 | F11 | Special Session: Voice Attractiveness | SS | Wed-SS-8-11-10 | | Introduction | Melissa Barkat-Defradas, John Ohala |
2017-08-23 | 16:10-17:40 | F11 | Special Session: Voice Attractiveness | SS | Wed-SS-8-11-1 | 130 | Personalized Quantification of Voice Attractiveness in Multidimensional Merit Space | Yasunari Obuchi |
2017-08-23 | 16:10-17:40 | F11 | Special Session: Voice Attractiveness | SS | Wed-SS-8-11-2 | 142 | The role of temporal amplitude modulations in the political arena: Hillary Clinton vs. Donald Trump | Hans Rutger Bosker |
2017-08-23 | 16:10-17:40 | F11 | Special Session: Voice Attractiveness | SS | Wed-SS-8-11-3 | 326 | Perceptual Ratings of Voice Likability Collected through In-Lab Listening Tests vs. Mobile-Based Crowdsourcing | Laura Fernández Gallardo, Rafael Zequeira Jiménez, Sebastian Möller |
2017-08-23 | 16:10-17:40 | F11 | Special Session: Voice Attractiveness | SS | Wed-SS-8-11-4 | 367 | Attractiveness of French voices for German listeners – results from native and non-native read speech | Juergen Trouvain, Frank Zimmerer |
2017-08-23 | 16:10-17:40 | F11 | Special Session: Voice Attractiveness | SS | Wed-SS-8-11-5 | 833 | Social Attractiveness in Dialogs | Antje Schweitzer, Natalie Lewandowski, Daniel Duran |
2017-08-23 | 16:10-17:40 | F11 | Special Session: Voice Attractiveness | SS | Wed-SS-8-11-6 | 1349 | A gender bias in the acoustic-melodic features of charismatic speech? | Eszter Novak-Tot, Oliver Niebuhr, Aoju Chen |
2017-08-23 | 16:10-17:40 | F11 | Special Session: Voice Attractiveness | SS | Wed-SS-8-11-7 | 1520 | Pitch convergence as an effect of perceived attractiveness and likability | Jan Michalsky, Heike Schoormann |
2017-08-23 | 16:10-17:40 | F11 | Special Session: Voice Attractiveness | SS | Wed-SS-8-11-8 | 1691 | Does Posh English Sound Attractive? | Li Jiao, Chengxia Wang, Cristiane Hsu, Peter Birkholz, Yi Xu |
2017-08-23 | 16:10-17:40 | F11 | Special Session: Voice Attractiveness | SS | Wed-SS-8-11-9 | 1697 | Large-scale Speaker Ranking from Crowdsourced Pairwise Listener Ratings | Timo Baumann |
2017-08-23 | 17:40-18:00 | F11 | Special Session: Voice Attractiveness | SS | Wed-SS-8-11-11 | | Discussion | Melissa Barkat-Defradas, John Ohala |
2017-08-23 | 10:00-12:00 | Poster 1 | Speech Recognition: Technologies for New Applications and Paradigms | P | Wed-P-6-1-1 | 166 | Developing On-Line Speaker Diarization System | Dimitrios Dimitriadis, Petr Fousek |
2017-08-23 | 10:00-12:00 | Poster 1 | Speech Recognition: Technologies for New Applications and Paradigms | P | Wed-P-6-1-2 | 339 | Comparison of Non-parametric Bayesian Mixture Models for Syllable Clustering and Zero-Resource Speech Processing | Shreyas Seshadri, Ulpu Remes, Okko Räsänen |
2017-08-23 | 10:00-12:00 | Poster 1 | Speech Recognition: Technologies for New Applications and Paradigms | P | Wed-P-6-1-3 | 1541 | Automatic Evaluation of Children Reading Aloud on Sentences and Pseudowords | Jorge Proença, Carla Lopes, Michael Tjalve, Andreas Stolcke, Sara Candeias, Fernando Perdigão |
2017-08-23 | 10:00-12:00 | Poster 1 | Speech Recognition: Technologies for New Applications and Paradigms | P | Wed-P-6-1-4 | 388 | Off-topic Spoken Response Detection with Word Embeddings | Su-Youn Yoon, Chong Min Lee, Ikkyu Choi, Xinhao Wang, Matthew Mulholland, Keelan Evanini |
2017-08-23 | 10:00-12:00 | Poster 1 | Speech Recognition: Technologies for New Applications and Paradigms | P | Wed-P-6-1-5 | 464 | Improving Mispronunciation Detection for Non-Native Learners with Multisource Information and LSTM-Based Deep Models | Wei Li, Nancy F Chen, Sabato Marco Siniscalchi, Chin-Hui Lee |
2017-08-23 | 10:00-12:00 | Poster 1 | Speech Recognition: Technologies for New Applications and Paradigms | P | Wed-P-6-1-6 | 750 | Automatic Explanation Spot Estimation Method Targeted at Text and Figures in Lecture Slides | Shoko Tsujimura, Kazumasa Yamamoto, Seiichi Nakagawa |
2017-08-23 | 10:00-12:00 | Poster 1 | Speech Recognition: Technologies for New Applications and Paradigms | P | Wed-P-6-1-7 | 952 | Multiview Representation Learning via Deep CCA for Silent Speech Recognition | Myungjong Kim, Beiming Cao, Ted Mau, Jun Wang |
2017-08-23 | 10:00-12:00 | Poster 1 | Speech Recognition: Technologies for New Applications and Paradigms | P | Wed-P-6-1-8 | 978 | Use of Graphemic Lexicons for Spoken Language Assessment | Kate Knill, Mark Gales, Kostas Kyriakopoulos, Anton Ragni, Yu Wang |
2017-08-23 | 10:00-12:00 | Poster 1 | Speech Recognition: Technologies for New Applications and Paradigms | P | Wed-P-6-1-9 | 1079 | Distilling Knowledge from an Ensemble of Models for Punctuation Prediction | Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Ya Li |
2017-08-23 | 10:00-12:00 | Poster 1 | Speech Recognition: Technologies for New Applications and Paradigms | P | Wed-P-6-1-10 | 1274 | A Mostly Data-driven Approach to Inverse Text Normalization | Ernest Pusateri, Bharat Ambati, Elizabeth Brooks, Ondrej Platek, Donald McAllaster, Venki Nagesha |
2017-08-23 | 10:00-12:00 | Poster 1 | Speech Recognition: Technologies for New Applications and Paradigms | P | Wed-P-6-1-11 | 1567 | Mismatched Crowdsourcing From Multiple Annotator Languages For Recognizing Zero-resourced Languages: A Nullspace Clustering Approach | Wenda Chen, Mark Hasegawa-Johnson, Nancy Chen, Boon Pang Lim |
2017-08-23 | 10:00-12:00 | Poster 1 | Speech Recognition: Technologies for New Applications and Paradigms | P | Wed-P-6-1-12 | 1710 | Experiments in Character-level Neural Network Models for Punctuation | William Gale, Sarangarajan Parthasarathy |
2017-08-23 | 10:00-12:00 | Poster 1 | Speech Recognition: Technologies for New Applications and Paradigms | P | Wed-P-6-1-13 | 1778 | Multi-Channel Apollo Mission Speech Transcript Calibration | Lakshmish Kaushik, Abhijeet Sangwan, John H.L. Hansen |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-1 | 1407 | The ABAIR initiative: Bringing Spoken Irish into the Digital Space | Ailbhe Ní Chasaide, Neasa Ní Chiaráin, Christoph Wendler, Harald Berthelsen, Andy Murphy, Christer Gobl |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-2 | 880 | Very low resource radio browsing for agile developmental and humanitarian monitoring | Armin Saeb, Raghav Menon, Hugh Cameron, William Kibira, John Quinn, Thomas Niesler |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-3 | 226 | Extracting Situation Frames from non-English Speech: Evaluation Framework and Pilot Results | Nikolaos Malandrakis, Ondrej Glembek, Shrikanth Narayanan |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-4 | 855 | Eliciting meaningful units from speech | Daniil Kocharov, Tatiana Kachkovskaia, Pavel Skrelin |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-5 | 1476 | Unsupervised Speech Signal to Symbol Transformation for Zero Resource Speech Applications | Saurabhchand Bhati, Shekhar Nayak, Sri Rama Murty Kodukula |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-6 | 268 | Machine Assisted Analysis of Vowel Length Contrasts in Wolof | Elodie Gauthier, Laurent Besacier, Sylvie Voisin |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-7 | 1262 | Leveraging Text Data for Word Segmentation for Underresourced Languages | Thomas Glarner, Benedikt Boenninghoff, Oliver Walter, Reinhold Haeb-Umbach |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-8 | 1129 | Improving DNN Bluetooth Narrowband Acoustic Models by Cross-bandwidth and Cross-lingual Initialization | Xiaodan Zhuang, Arnab Ghoshal, Antti-Veikko Rosti, Matthias Paulik, Daben Liu |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-9 | 1028 | Joint Estimation of Articulatory Features and Acoustic models for Low-Resource Languages | Basil Abraham, Srinivasan Umesh, Neethu Mariam Joy |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-10 | 1009 | Transfer Learning and Distillation Techniques to Improve the Acoustic Modeling of Low Resource Languages | Basil Abraham, Tejaswi Seeram, Srinivasan Umesh |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-11 | 903 | Building an ASR corpus using Althingi’s Parliamentary Speeches | Inga Rún Helgadóttir, Róbert Kjaran, Anna Björk Nikulásdóttir, Jon Gudnason |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-12 | 928 | Implementation of a Radiology Speech Recognition System for Estonian using Open Source Software | Tanel Alumäe, Andrus Paats, Ivo Fridolin, Einar Meister |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-13 | 1352 | Building ASR corpora using Eyra | Jon Gudnason, Matthías Pétursson, Róbert Kjaran, Simon Kluepfel, Anna Nikulásdóttir |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-14 | 1139 | Rapid development of TTS corpora for four South African languages | Daniel Van Niekerk, Charl Van Heerden, Marelie Davel, Neil Kleynhans, Oddur Kjartansson, Martin Jansche, Linne Ha |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-15 | 37 | Uniform Multilingual Multi-Speaker Acoustic Model for Statistical Parametric Speech Synthesis of Low-Resourced Languages | Alexander Gutkin |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-16 | 1398 | Nativization of foreign names in TTS for automatic reading of world news in Swahili | Joseph Mendelson, Pilar Oplustil, Oliver Watts, Simon King |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-17 | | Panelist poster 1 | Claudia Soria |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-18 | | Panelist poster 2 | Alexey Karpov |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-19 | | Panelist poster 3 | Emmanuel Dupoux |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-20 | | Panelist poster 4 | Mary Harper |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-21 | | Panelist poster 5 | Sebastian Stueker |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-22 | | Panelist poster 6 | Sanjeev Khudanpur |
2017-08-23 | 13:30-15:30 | Poster 1 | Special Session: Digital Revolution for Under-resourced Languages 2 | SS | Wed-SS-7-1-23 | | Panelist poster 7 | Linne Ha |
2017-08-23 | 16:00-18:00 | Poster 1 | Prosody | P | Wed-P-8-1-1 | 291 | Trisyllabic tone 3 sandhi patterns in Mandarin produced by Cantonese speakers | Jung-Yueh Tu, Janice Wing-Sze Wong, Jih-Ho Cha |
2017-08-23 | 16:00-18:00 | Poster 1 | Prosody | P | Wed-P-8-1-2 | 840 | Intonation of contrastive topic in Estonian | Heete Sahkai, Meelis Mihkla |
2017-08-23 | 16:00-18:00 | Poster 1 | Prosody | P | Wed-P-8-1-3 | 1235 | Reanalyze Fundamental Frequency Peak Delay in Mandarin | Lixia Hao, Wei Zhang, Yanlu Xie, Jinsong Zhang |
2017-08-23 | 16:00-18:00 | Poster 1 | Prosody | P | Wed-P-8-1-4 | 1430 | How does the absence of shared knowledge between interlocutors affect the production of French prosodic forms? | Amandine Michelas, Cécile Cau, Maud Champagne-Lavau |
2017-08-23 | 16:00-18:00 | Poster 1 | Prosody | P | Wed-P-8-1-5 | 1500 | Three Dimensions of Sentence Prosody and their (Non-)Interactions | Michael Wagner, Michael McAuliffe |
2017-08-23 | 16:00-18:00 | Poster 1 | Prosody | P | Wed-P-8-1-6 | 710 | Using Prosody to Classify Discourse Relations | Janine Kleinhans, Mireia Farrús, Agustin Gravano, Juan Manuel Pérez, Catherine Lai, Leo Wanner |
2017-08-23 | 16:00-18:00 | Poster 1 | Prosody | P | Wed-P-8-1-7 | 1585 | Canonical Correlation Analysis and Prediction of Perceived Rhythmic Prominences and Pitch Tones in Speech | Elizabeth Godoy, James Williamson, Thomas Quatieri |
2017-08-23 | 16:00-18:00 | Poster 1 | Prosody | P | Wed-P-8-1-8 | 1237 | Evaluation of Spectral Tilt Measures for Sentence Prominence Under Different Noise Conditions | Sofoklis Kakouros, Okko Räsänen, Paavo Alku |
2017-08-23 | 16:00-18:00 | Poster 1 | Prosody | P | Wed-P-8-1-9 | 1578 | Creaky voice as a function of tonal categories and prosodic boundaries | Jianjing Kuang |
2017-08-23 | 16:00-18:00 | Poster 1 | Prosody | P | Wed-P-8-1-10 | 417 | The Acoustics of Word Stress in Czech as a Function of Speaking Style | Radek Skarnitzl, Anders Eriksson |
2017-08-23 | 16:00-18:00 | Poster 1 | Prosody | P | Wed-P-8-1-11 | 177 | What You See Is What You Get Prosodically Less – Visibility Shapes Prosodic Prominence Production in Spontaneous Interaction | Petra Wagner, Nataliya Bryhadyr |
2017-08-23 | 16:00-18:00 | Poster 1 | Prosody | P | Wed-P-8-1-12 | 1167 | Focus Acoustics in Mandarin Nominals | Yu-Yin Hsu, Anqi Xu |
2017-08-23 | 16:00-18:00 | Poster 1 | Prosody | P | Wed-P-8-1-13 | 1502 | Exploring multidimensionality: Acoustic and articulatory correlates of Swedish word accents | Malin Svensson Lundmark, Gilbert Ambrazaitis, Otto Ewald |
2017-08-23 | 16:00-18:00 | Poster 1 | Prosody | P | Wed-P-8-1-14 | 1279 | The Perception of English Intonation Patterns by German L2 speakers of English | Karin Puga, Robert Fuchs, Jane Setter, Peggy Mok |
2017-08-23 | 10:00-12:00 | Poster 2 | Speaker and Language Recognition Applications | P | Wed-P-6-2-1 | 530 | Calibration Approaches for Language Detection | Mitchell McLaren, Luciana Ferrer, Diego Castan, Aaron Lawson |
2017-08-23 | 10:00-12:00 | Poster 2 | Speaker and Language Recognition Applications | P | Wed-P-6-2-2 | 286 | Bidirectional Modelling for Short Duration Language Identification | Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Julien Epps |
2017-08-23 | 10:00-12:00 | Poster 2 | Speaker and Language Recognition Applications | P | Wed-P-6-2-3 | 553 | Conditional Generative Adversarial Nets Classifier for Spoken Language Identification | Peng Shen, Xugang Lu, Sheng Li, Hisashi Kawai |
2017-08-23 | 10:00-12:00 | Poster 2 | Speaker and Language Recognition Applications | P | Wed-P-6-2-4 | 1314 | Tied Hidden Factors in Neural Networks for End-to-End Speaker Recognition | Antonio Miguel, Jorge Llombart, Alfonso Ortega, Eduardo Lleida Solano |
2017-08-23 | 10:00-12:00 | Poster 2 | Speaker and Language Recognition Applications | P | Wed-P-6-2-5 | 923 | Speaker Clustering by Iteratively Finding Discriminative Feature Space and Cluster Labels | Sungrack Yun, Hye Jin Jang, Taesu Kim |
2017-08-23 | 10:00-12:00 | Poster 2 | Speaker and Language Recognition Applications | P | Wed-P-6-2-6 | 84 | Domain Adaptation of PLDA models in Broadcast Diarization by means of Unsupervised Speaker Clustering | Ignacio Viñals, Alfonso Ortega, Jesus Villalba, Antonio Miguel, Eduardo Lleida Solano |
2017-08-23 | 10:00-12:00 | Poster 2 | Speaker and Language Recognition Applications | P | Wed-P-6-2-7 | 407 | LSTM Neural Network-based Speaker Segmentation using Acoustic and Language Modelling | Miquel Angel India Massana, José A. R. Fonollosa, Javier Hernando |
2017-08-23 | 10:00-12:00 | Poster 2 | Speaker and Language Recognition Applications | P | Wed-P-6-2-8 | 1311 | Acoustic Pairing of Original and Dubbed Voices in the Context of Video Game Localization | Adrien Gresse, Mickael Rouvier, Richard Dufour, Vincent Labatut, Jean-Francois Bonastre |
2017-08-23 | 10:00-12:00 | Poster 2 | Speaker and Language Recognition Applications | P | Wed-P-6-2-9 | 152 | Homogeneity Measure Impact on Target and Non-target Trials in Forensic Voice Comparison | Moez Ajili, Jean-Francois Bonastre, Waad Ben Kheder, Solange Rossato, Juliette Kahn |
2017-08-23 | 10:00-12:00 | Poster 2 | Speaker and Language Recognition Applications | P | Wed-P-6-2-10 | 1023 | Null-Hypothesis LLR: A proposal for Forensic Automatic Speaker Recognition | Yosef A. Solewicz, Michael Jessen, David van der Vloed |
2017-08-23 | 10:00-12:00 | Poster 2 | Speaker and Language Recognition Applications | P | Wed-P-6-2-11 | 997 | The Opensesame NIST 2016 Speaker Recognition Evaluation System | Gang Liu, Qi Qian, Zhibin Wang, Qingen Zhao, Tianzhou Wang, Hao Li, Jian Xue, Shenghuo Zhu, Rong Jin, Tuo Zhao |
2017-08-23 | 10:00-12:00 | Poster 2 | Speaker and Language Recognition Applications | P | Wed-P-6-2-12 | 1307 | IITG-Indigo System for NIST 2016 SRE Challenge | Nagendra Kumar, Rohan Kumar Das, Sarfaraz Jelil, Dhanush B K, Harish Kashyap, Sri Rama Murty Kodukula, Sriram Ganapathy, Rohit Sinha, S R Mahadeva Prasanna |
2017-08-23 | 10:00-12:00 | Poster 2 | Speaker and Language Recognition Applications | P | Wed-P-6-2-13 | 581 | Locally Weighted Linear Discriminant Analysis for Robust Speaker Verification | Abhinav Misra, Shivesh Ranjan, John H.L. Hansen |
2017-08-23 | 10:00-12:00 | Poster 2 | Speaker and Language Recognition Applications | P | Wed-P-6-2-14 | 545 | Recursive Whitening Transformation for Speaker Recognition on Language Mismatched Condition | Suwon Shon, Seongkyu Mun, Hanseok Ko |
2017-08-23 | 10:00-12:00 | Poster 2 | Speaker and Language Recognition Applications | P | Wed-P-6-2-15 | 137 | A Generative Model for Score Normalization in Speaker Recognition | Albert Swart, Niko Brummer |
2017-08-23 | 13:30-15:30 | Poster 2 | Articulatory and Acoustic Phonetics | P | Wed-P-7-2-1 | 1720 | Mental Representation of Japanese Mora: focusing on intrinsic duration | Kosuke Sugai |
2017-08-23 | 13:30-15:30 | Poster 2 | Articulatory and Acoustic Phonetics | P | Wed-P-7-2-2 | 765 | Temporal Dynamics of Lateral Channel Formation in /l/: 3D EMA Data from Australian English | Jia Ying, Christopher Carignan, Jason Shaw, Michael Proctor, Donald Derrick, Catherine Best |
2017-08-23 | 13:30-15:30 | Poster 2 | Articulatory and Acoustic Phonetics | P | Wed-P-7-2-3 | 1154 | Vowel and Consonant Sequences in three Bavarian varieties in Austria | Nicola Klingler, Sylvia Moosmüller, Hannes Scheutz |
2017-08-23 | 13:30-15:30 | Poster 2 | Articulatory and Acoustic Phonetics | P | Wed-P-7-2-4 | 1609 | Acoustic cues to the singleton-geminate contrast: the case of Libyan Arabic sonorants | Amel Issa |
2017-08-23 | 13:30-15:30 | Poster 2 | Articulatory and Acoustic Phonetics | P | Wed-P-7-2-5 | 838 | Mel-cepstral distortion of German vowels in different information density contexts | Erika Brandt, Frank Zimmerer, Bistra Andreeva, Bernd Möbius |
2017-08-23 | 13:30-15:30 | Poster 2 | Articulatory and Acoustic Phonetics | P | Wed-P-7-2-6 | 1161 | Effect of formant and F0 discontinuity on perceived vowel duration: Impacts for concatenative speech synthesis | Tomáš Bořil, Pavel Šturm, Radek Skarnitzl, Jan Volín |
2017-08-23 | 13:30-15:30 | Poster 2 | Articulatory and Acoustic Phonetics | P | Wed-P-7-2-7 | 578 | An ultrasound study of alveolar and retroflex consonants in Arrernte: stressed and unstressed syllables | Marija Tabain, Richard Beare |
2017-08-23 | 13:30-15:30 | Poster 2 | Articulatory and Acoustic Phonetics | P | Wed-P-7-2-8 | 1140 | Reshaping the transformed LF model: generating the glottal source from the waveshape parameter Rd | Christer Gobl |
2017-08-23 | 13:30-15:30 | Poster 2 | Articulatory and Acoustic Phonetics | P | Wed-P-7-2-9 | 722 | Kinematic signatures of prosody in Lombard speech | Štefan Beňuš, Juraj Šimko, Mona Lehtinen |
2017-08-23 | 13:30-15:30 | Poster 2 | Articulatory and Acoustic Phonetics | P | Wed-P-7-2-10 | 1285 | What do Finnish and Central Bavarian have in common? Towards an acoustically based quantity typology | Markus Jochim, Felicitas Kleber |
2017-08-23 | 13:30-15:30 | Poster 2 | Articulatory and Acoustic Phonetics | P | Wed-P-7-2-11 | 1027 | Locating burst onsets using SFF envelope and phase information | Bhanu Teja Nellore, RaviShankar Prasad, Sudarsana Reddy Kadiri, Suryakanth V Gangashetty, Bayya Yegnanarayana |
2017-08-23 | 13:30-15:30 | Poster 2 | Articulatory and Acoustic Phonetics | P | Wed-P-7-2-12 | 876 | A Preliminary Phonetic Investigation of Alphabetic Words in Mandarin Chinese | Hongwei Ding, Yuanyuan Zhang, Hongchao Liu, Chu-Ren Huang |
2017-08-23 | 13:30-15:30 | Poster 2 | Articulatory and Acoustic Phonetics | P | Wed-P-7-2-13 | 1306 | A Quantitative Measure of the Impact of Coarticulation on Phone Discriminability | Thomas Schatz, Rory Turnbull, Francis Bach, Emmanuel Dupoux |
2017-08-23 | 16:00-18:00 | Poster 2 | Speaker States and Traits | P | Wed-P-8-2-1 | 104 | The Perception of Emotions in Noisified Nonsense Speech | Emilia Parada-Cabaleiro, Alice Baird, Anton Batliner, Nicholas Cummins, Simone Hantke, Björn Schuller |
2017-08-23 | 16:00-18:00 | Poster 2 | Speaker States and Traits | P | Wed-P-8-2-2 | 218 | Attention Networks for Modeling Behavior in Addiction Counseling | James Gibson, Dogan Can, Panayiotis Georgiou, David Atkins, Shrikanth Narayanan |
2017-08-23 | 16:00-18:00 | Poster 2 | Speaker States and Traits | P | Wed-P-8-2-3 | 466 | Computational Analysis of Acoustic Descriptors in Psychotic Patients | Torsten Wörtwein, Tadas Baltrušaitis, Eugene Laksana, Luciana Pennant, Elizabeth Liebson, Dost Öngür, Justin Baker, Louis-Philippe Morency |
2017-08-23 | 16:00-18:00 | Poster 2 | Speaker States and Traits | P | Wed-P-8-2-4 | 562 | Modeling Perceivers Neural-Responses using Lobe-dependent Convolutional Neural Network to Improve Speech Emotion Recognition | Ya-Tse Wu, Hsuan-Yu Chen, Yu-Hsien Liao, Li-Wei Kuo, Chi-Chun Lee |
2017-08-23 | 16:00-18:00 | Poster 2 | Speaker States and Traits | P | Wed-P-8-2-5 | 887 | Implementing gender-dependent vowel-level analysis for boosting speech-based depression recognition | Bogdan Vlasenko, Hesam Sagha, Nicholas Cummins, Björn Schuller |
2017-08-23 | 16:00-18:00 | Poster 2 | Speaker States and Traits | P | Wed-P-8-2-6 | 1379 | Bilingual Word Embeddings for Cross-Lingual Personality Recognition Using Convolutional Neural Nets | Farhad Bin Siddique, Pascale Fung |
2017-08-23 | 16:00-18:00 | Poster 2 | Speaker States and Traits | P | Wed-P-8-2-7 | 994 | Emotion category mapping to emotional space by cross-corpus emotion labeling | Yoshiko Arimoto, Hiroki Mori |
2017-08-23 | 16:00-18:00 | Poster 2 | Speaker States and Traits | P | Wed-P-8-2-8 | 1194 | Big Five vs. Prosodic Features as Cues to Detect Abnormality in SSPNET-Personality Corpus | Cédric Fayet, Arnaud Delhay, Damien Lolive, Pierre-Francois Marteau |
2017-08-23 | 16:00-18:00 | Poster 2 | Speaker States and Traits | P | Wed-P-8-2-9 | 1584 | Speech Rate Comparison when Talking to a System and Talking to a Human: A study from a Speech-to-Speech, Machine Translation mediated Map Task | Akira Hayakawa, Carl Vogel, Saturnino Luz, Nick Campbell |
2017-08-23 | 16:00-18:00 | Poster 2 | Speaker States and Traits | P | Wed-P-8-2-10 | 1621 | Approaching Human Performance in Behavior Estimation in Couples Therapy Using Deep Sentence Embeddings | Shao-Yen Tseng, Brian Baucom, Panayiotis Georgiou |
2017-08-23 | 16:00-18:00 | Poster 2 | Speaker States and Traits | P | Wed-P-8-2-11 | 1641 | Complexity in speech and its relation to emotional bond in therapist-patient interactions during suicide risk assessment interviews | Md Nasir, Brian Baucom, Craig J. Bryan, Shrikanth Narayanan, Panayiotis Georgiou |
2017-08-23 | 16:00-18:00 | Poster 2 | Speaker States and Traits | P | Wed-P-8-2-12 | 1707 | An Investigation of Emotion Dynamics and Kalman Filtering for Speech-based Emotion Prediction | Zhaocheng Huang, Julien Epps |
2017-08-23 | 10:00-12:00 | Poster 3 | Spoken Document Processing | P | Wed-P-6-3-1 | 1592 | Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings | Shane Settle, Keith Levin, Herman Kamper, Karen Livescu |
2017-08-23 | 10:00-12:00 | Poster 3 | Spoken Document Processing | P | Wed-P-6-3-2 | 634 | Constructing Acoustic Distances between Subwords and States Obtained from a Deep Neural Network for Spoken Term Detection | Daisuke Kaneko, Kazunori Kojima, Kazuyo Tanaka, Shi-wook Lee, Yoshiaki Itoh |
2017-08-23 | 10:00-12:00 | Poster 3 | Spoken Document Processing | P | Wed-P-6-3-3 | 1367 | Fast and Accurate OOV Decoder on High-Level Features | Yuri Khokhlov, Natalia Tomashenko, Ivan Medennikov, Aleksei Romanenko |
2017-08-23 | 10:00-12:00 | Poster 3 | Spoken Document Processing | P | Wed-P-6-3-4 | 612 | Exploring the Use of Significant Words Language Modeling for Spoken Document Retrieval | Ying-Wen Chen, Kuan-Yu Chen, Hsin-Min Wang, Berlin Chen |
2017-08-23 | 10:00-12:00 | Poster 3 | Spoken Document Processing | P | Wed-P-6-3-5 | 893 | Incorporating Acoustic Features for Spontaneous Speech driven Content Retrieval | Hiroto Tasaki, Tomoyosi Akiba |
2017-08-23 | 10:00-12:00 | Poster 3 | Spoken Document Processing | P | Wed-P-6-3-6 | 862 | Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification | Bo Ru Lu, Frank Shyu, Yun-Nung Chen, Hung-yi Lee, Lin-shan Lee |
2017-08-23 | 10:00-12:00 | Poster 3 | Spoken Document Processing | P | Wed-P-6-3-7 | 1752 | Automatic Alignment between Classroom Lecture Utterances and Slide Components | Masatoshi Tsuchiya, Ryo Minamiguchi |
2017-08-23 | 10:00-12:00 | Poster 3 | Spoken Document Processing | P | Wed-P-6-3-8 | 1183 | Compensating Gender Variability in Query-by-Example Search on Speech Using Voice Conversion | Paula Lopez-Otero, Laura Docio-Fernandez, Carmen Garcia-Mateo |
2017-08-23 | 10:00-12:00 | Poster 3 | Spoken Document Processing | P | Wed-P-6-3-9 | 516 | Zero-Shot Learning across Heterogeneous Overlapping Domains | Anjishnu Kumar, Pavankumar Muddireddy, Markus Dreyer, Bjorn Hoffmeister |
2017-08-23 | 10:00-12:00 | Poster 3 | Spoken Document Processing | P | Wed-P-6-3-10 | 392 | Hierarchical Recurrent Neural Network for Story Segmentation | Emiru Tsunoo, Peter Bell, Steve Renals |
2017-08-23 | 10:00-12:00 | Poster 3 | Spoken Document Processing | P | Wed-P-6-3-11 | 1231 | Evaluating automatic topic segmentation as a segment retrieval task | Abdessalam Bouchekif, Delphine Charlet, Geraldine Damnati, Nathalie Camelin, Yannick Estève |
2017-08-23 | 10:00-12:00 | Poster 3 | Spoken Document Processing | P | Wed-P-6-3-12 | 650 | Improving Speech Recognizers by Refining Broadcast Data with Inaccurate Subtitle Timestamps | Jeong-Uk Bang, Mu-Yeol Choi, Sang-Hun Kim, Oh-Wook Kwon |
2017-08-23 | 10:00-12:00 | Poster 3 | Spoken Document Processing | P | Wed-P-6-3-13 | 1087 | A relevance score estimation for spoken term detection based on RNN-generated pronunciation embeddings | Jan Švec, Josef V. Psutka, Luboš Šmídl, Jan Trmal |
2017-08-23 | 13:30-15:30 | Poster 3 | Music and Audio Processing | P | Wed-P-7-3-1 | 17 | Sinusoidal Partials Tracking for Singing Analysis Using the Heuristic of the Minimal Frequency and Magnitude Difference | Kin Wah Edward Lin, Hans Anderson, Clifford So, Simon Lui |
2017-08-23 | 13:30-15:30 | Poster 3 | Music and Audio Processing | P | Wed-P-7-3-2 | 101 | Audio Scene Classification with Deep Recurrent Neural Networks | Huy Phan, Philipp Koch, Fabrice Katzberg, Marco Maass, Radoslaw Mazur, Alfred Mertins |
2017-08-23 | 13:30-15:30 | Poster 3 | Music and Audio Processing | P | Wed-P-7-3-3 | 119 | Automatic time-frequency analysis of echolocation signals using the matched Gaussian multitaper spectrogram | Maria Sandsten, Isabella Reinhold, Josefin Starkhammar |
2017-08-23 | 13:30-15:30 | Poster 3 | Music and Audio Processing | P | Wed-P-7-3-4 | 213 | Classification-Based Detection of Glottal Closure Instants from Speech Signals | Jindrich Matousek, Daniel Tihelka |
2017-08-23 | 13:30-15:30 | Poster 3 | Music and Audio Processing | P | Wed-P-7-3-5 | 222 | A Domain Knowledge-Assisted Nonlinear Model for Head-Related Transfer Functions Based on Bottleneck Deep Neural Network | Xiaoke Qi, Jianhua Tao |
2017-08-23 | 13:30-15:30 | Poster 3 | Music and Audio Processing | P | Wed-P-7-3-6 | 315 | Laryngeal Articulation during Trumpet Performance: An Exploratory Study | Luis M.T. Jesus, Bruno Rocha, Andreia Hall |
2017-08-23 | 13:30-15:30 | Poster 3 | Music and Audio Processing | P | Wed-P-7-3-7 | 395 | Matrix of Polynomials Model based Polynomial Dictionary Learning Method for Acoustic Impulse Response Modeling | Jian Guan, Xuan Wang, Pengming Feng, Jing Dong, Wenwu Wang |
2017-08-23 | 13:30-15:30 | Poster 3 | Music and Audio Processing | P | Wed-P-7-3-8 | 431 | Acoustic Scene Classification using a CNN-SuperVector system trained with Auditory and Spectrogram Image Features | Rakib Hyder, Shabnam Ghaffarzadegan, Zhe Feng, John H.L. Hansen, Taufiq Hasan |
2017-08-23 | 13:30-15:30 | Poster 3 | Music and Audio Processing | P | Wed-P-7-3-9 | 485 | An Environmental Feature Representation for Robust Speech Recognition and for Environment Identification | Xue Feng |
2017-08-23 | 13:30-15:30 | Poster 3 | Music and Audio Processing | P | Wed-P-7-3-10 | 486 | Attention and Localization based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging | Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley |
2017-08-23 | 13:30-15:30 | Poster 3 | Music and Audio Processing | P | Wed-P-7-3-11 | 866 | An audio based piano performance evaluation method using deep neural network based acoustic modeling | Jing Pan, Ming Li, Zhanmei Song, Xin Li, Xiaolin Liu, Hua Yi, Manman Zhu |
2017-08-23 | 13:30-15:30 | Poster 3 | Music and Audio Processing | P | Wed-P-7-3-12 | 1000 | Music Tempo Estimation Using Sub-band Synchrony | Shreyan Chowdhury, Tanaya Guha, Rajesh Hegde |
2017-08-23 | 13:30-15:30 | Poster 3 | Music and Audio Processing | P | Wed-P-7-3-13 | 1469 | A Transfer Learning Based Feature Extractor for Polyphonic Sound Event Detection Using Connectionist Temporal Classification | Yun Wang, Florian Metze |
2017-08-23 | 13:30-15:30 | Poster 3 | Music and Audio Processing | P | Wed-P-7-3-14 | 1590 | A Note Based Query By Humming System using Convolutional Neural Network | Naziba Mostafa, Pascale Fung |
2017-08-23 | 13:30-15:30 | Poster 3 | Music and Audio Processing | P | Wed-P-7-3-15 | 831 | Unsupervised Filterbank Learning Using Convolutional Restricted Boltzmann Machine for Environmental Sound Classification | Hardik Sailor, Dharmesh Agrawal, Hemant Patil |
2017-08-23 | 13:30-15:30 | Poster 3 | Music and Audio Processing | P | Wed-P-7-3-16 | 1422 | Novel Shifted Real Spectrum for Exact Signal Reconstruction | Meet Soni, Rishabh Tak, Hemant Patil |
2017-08-23 | 16:00-18:00 | Poster 3 | Language Understanding and Generation | P | Wed-P-8-3-1 | 638 | Zero-shot Learning for Natural Language Understanding using Domain-Independent Sequential Structure and Question Types | Kugatsu Sadamitsu, Yukinori Homma, Ryuichiro Higashinaka, Yoshihiro Matsuo |
2017-08-23 | 16:00-18:00 | Poster 3 | Language Understanding and Generation | P | Wed-P-8-3-2 | 269 | Parallel Hierarchical Attention Networks with Shared Memory Reader for Multi-Stream Conversational Document Classification | Naoki Sawada, Ryo Masumura, Hiromitsu Nishizaki |
2017-08-23 | 16:00-18:00 | Poster 3 | Language Understanding and Generation | P | Wed-P-8-3-3 | 357 | Internal Memory Gate for Recurrent Neural Networks with Application to Spoken Language Understanding | Mohamed Morchid |
2017-08-23 | 16:00-18:00 | Poster 3 | Language Understanding and Generation | P | Wed-P-8-3-4 | 422 | Character-based Embedding Models and Reranking Strategies for Understanding Natural Language Meal Descriptions | Mandy Korpusik, Zachary Collins, James Glass |
2017-08-23 | 16:00-18:00 | Poster 3 | Language Understanding and Generation | P | Wed-P-8-3-5 | 1029 | Quaternion Denoising Encoder-Decoder for Theme Identification of Telephone Conversations | Titouan Parcollet, Mohamed Morchid, Georges Linares |
2017-08-23 | 16:00-18:00 | Poster 3 | Language Understanding and Generation | P | Wed-P-8-3-6 | 1178 | ASR error management for improving spoken language understanding | Edwin Simonnet, Sahar Ghannay, Nathalie Camelin, Yannick Estève, Renato de Mori |
2017-08-23 | 16:00-18:00 | Poster 3 | Language Understanding and Generation | P | Wed-P-8-3-7 | 1321 | Jointly Trained Sequential Labeling and Classification by Sparse Attention Neural Networks | Mingbo Ma, Kai Zhao, Liang Huang, Bing Xiang, Bowen Zhou |
2017-08-23 | 16:00-18:00 | Poster 3 | Language Understanding and Generation | P | Wed-P-8-3-8 | 1525 | To Plan or not to Plan? Discourse planning in slot-value informed sequence to sequence models for language generation | Neha Nayak, Dilek Hakkani-Tur, Marilyn Walker, Larry Heck |
2017-08-23 | 16:00-18:00 | Poster 3 | Language Understanding and Generation | P | Wed-P-8-3-9 | 921 | Online adaptation of an attention-based neural network for natural language generation | Matthieu Riou, Bassam Jabaian, Stéphane Huet, Fabrice Lefevre |
2017-08-23 | 16:00-18:00 | Poster 3 | Language Understanding and Generation | P | Wed-P-8-3-10 | 275 | Spanish Sign Language Recognition with Different Topology Hidden Markov Models | Carlos-D. Martínez-Hinarejos, Zuzanna Parcheta |
2017-08-23 | 16:00-18:00 | Poster 3 | Language Understanding and Generation | P | Wed-P-8-3-11 | 1382 | OpenMM: An Open-source Multimodal Feature Extraction Tool | Michelle Morales, Stefan Scherer, Rivka Levitan |
2017-08-23 | 16:00-18:00 | Poster 3 | Language Understanding and Generation | P | Wed-P-8-3-12 | 1496 | Speaker Dependency Analysis, Audiovisual Fusion Cues and A Multimodal BLSTM for Conversational Engagement Recognition | Yuyun Huang, Emer Gilmartin, Nick Campbell |
2017-08-23 | 10:00-12:00 | Poster 4 | Speech Intelligibility | P | Wed-P-6-4-1 | 36 | Predicting Automatic Speech Recognition Performance over Communication Channels from Instrumental Speech Quality and Intelligibility Scores | Laura Fernández Gallardo, Sebastian Möller, John Beerends |
2017-08-23 | 10:00-12:00 | Poster 4 | Speech Intelligibility | P | Wed-P-6-4-2 | 105 | Speech intelligibility in cars: the effect of speaking style, noise and listener age | Cassia Valentini-Botinhao, Junichi Yamagishi |
2017-08-23 | 10:00-12:00 | Poster 4 | Speech Intelligibility | P | Wed-P-6-4-3 | 170 | Predicting Speech Intelligibility Using a Gammachirp Envelope Distortion Index Based on the Signal-to-Distortion Ratio | Katsuhiko Yamamoto, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani |
2017-08-23 | 10:00-12:00 | Poster 4 | Speech Intelligibility | P | Wed-P-6-4-4 | 281 | Intelligibilities of Mandarin Chinese Sentences with Spectral “Holes” | Yafan Chen, Yong Xu, Jun Yang |
2017-08-23 | 10:00-12:00 | Poster 4 | Speech Intelligibility | P | Wed-P-6-4-5 | 500 | The effect of situation-specific non-speech acoustic cues on the intelligibility of speech in noise | Lauren Ward, Ben Shirley, Yan Tang, William Davies |
2017-08-23 | 10:00-12:00 | Poster 4 | Speech Intelligibility | P | Wed-P-6-4-6 | 1043 | On the use of Band Importance Weighting in the Short-Time Objective Intelligibility Measure | Asger Heidemann Andersen, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen |
2017-08-23 | 10:00-12:00 | Poster 4 | Speech Intelligibility | P | Wed-P-6-4-7 | 1168 | Listening in the dips: Comparing relevant features for speech recognition in humans and machines | Constantin Spille, Bernd T. Meyer |
2017-08-23 | 13:30-15:30 | Poster 4 | Disorders Related to Speech and Language | P | Wed-P-7-4-1 | 112 | Manual and Automatic Transcriptions in Dementia Detection from Speech | Jochen Weiner, Mathis Engelbart, Tanja Schultz |
2017-08-23 | 13:30-15:30 | Poster 4 | Disorders Related to Speech and Language | P | Wed-P-7-4-2 | 120 | An Affect Prediction Approach through Depression Severity Parameter Incorporation in Neural Networks | Rahul Gupta, Saurabh Sahu, Carol Espy-Wilson, Shrikanth Narayanan |
2017-08-23 | 13:30-15:30 | Poster 4 | Disorders Related to Speech and Language | P | Wed-P-7-4-3 | 216 | Cross-Database Models for the Classification of Dysarthria Presence | Stephanie Gillespie, Yash-Yee Logan, Elliot Moore, Jacqueline Laures-Gore, Scott Russell, Rupal Patel |
2017-08-23 | 13:30-15:30 | Poster 4 | Disorders Related to Speech and Language | P | Wed-P-7-4-4 | 381 | Acoustic evaluation of nasality in cerebellar syndromes | Michal Novotný, Jan Rusz, Karel Spálenka, Jiří Klempíř, Dana Horáková, Evžen Růžička |
2017-08-23 | 13:30-15:30 | Poster 4 | Disorders Related to Speech and Language | P | Wed-P-7-4-5 | 409 | Emotional Speech of Mentally and Physically Disabled Individuals: Introducing The EmotAsS Database and First Findings | Simone Hantke, Hesam Sagha, Nicholas Cummins, Björn Schuller |
2017-08-23 | 13:30-15:30 | Poster 4 | Disorders Related to Speech and Language | P | Wed-P-7-4-6 | 621 | Phonological markers of Oxytocin and MDMA ingestion | Carla Agurto, Raquel Norel, Rachel Ostrand, Gillinder Bedi, Harriet de Wit, Matthew J. Baggott, Matthew G. Kirkpatrick, Margaret Wardle, Guillermo Cecchi |
2017-08-23 | 13:30-15:30 | Poster 4 | Disorders Related to Speech and Language | P | Wed-P-7-4-7 | 690 | An avatar-based system for identifying individuals likely to develop dementia | Bahman Mirheidari, Daniel Blackburn, Kirsty Harkness, Traci Walker, Annalena Venneri, Markus Reuber, Heidi Christensen |
2017-08-23 | 13:30-15:30 | Poster 4 | Disorders Related to Speech and Language | P | Wed-P-7-4-8 | 1015 | Cross-Domain Classification of Drowsiness in Speech: The Case of Alcohol Intoxication and Sleep Deprivation | Yue Zhang, Felix Weninger, Björn Schuller |
2017-08-23 | 13:30-15:30 | Poster 4 | Disorders Related to Speech and Language | P | Wed-P-7-4-9 | 1201 | Depression Detection Using Automatic Transcriptions of De-Identified Speech | Paula Lopez-Otero, Laura Docio-Fernandez, Alberto Abad, Carmen Garcia-Mateo |
2017-08-23 | 13:30-15:30 | Poster 4 | Disorders Related to Speech and Language | P | Wed-P-7-4-10 | 1572 | An N-Gram Based Approach to the Automatic Diagnosis of Alzheimer’s Disease from Spoken Language | Sebastian Wankerl, Elmar Noeth, Stefan Evert |
2017-08-23 | 13:30-15:30 | Poster 4 | Disorders Related to Speech and Language | P | Wed-P-7-4-11 | 1599 | Exploiting Intra-annotator Rating Consistency through Copeland’s Method for Estimation of Ground Truth Labels in Couples’ Therapy | Karel Mundnich, Md Nasir, Panayiotis Georgiou, Shrikanth Narayanan |
2017-08-23 | 13:30-15:30 | Poster 4 | Disorders Related to Speech and Language | P | Wed-P-7-4-12 | 850 | Rhythmic Characteristics of Parkinsonian Speech: A Study on Mandarin and Polish | Massimo Pettorino, Wentao Gu, Paweł Półrola, Ping Fan |
2017-08-23 | 13:30-15:30 | Poster 4 | Disorders Related to Speech and Language | P | Wed-P-7-4-13 | 25 | Float Like a Butterfly Sting Like a Bee: Changes in Speech Preceded Parkinsonism Diagnosis for Muhammad Ali | Visar Berisha, Julie Liss, Timothy Huston, Alan Wisler, Yishan Jiao, Jonathan Eig |
2017-08-23 | 16:00-18:00 | Poster 4 | Voice Conversion 2 | P | Wed-P-8-4-1 | 63 | Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks | Chin-Cheng Hsu, Hsin-Te Hwang, Yichiao Wu, Yu Tsao, Hsin-Min Wang |
2017-08-23 | 16:00-18:00 | Poster 4 | Voice Conversion 2 | P | Wed-P-8-4-2 | 133 | CAB: An Energy-Based Speaker Clustering Model for Rapid Adaptation in Non-Parallel Voice Conversion | Toru Nakashika |
2017-08-23 | 16:00-18:00 | Poster 4 | Voice Conversion 2 | P | Wed-P-8-4-3 | 664 | Phoneme-Discriminative Features for Dysarthric Speech Conversion | Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki |
2017-08-23 | 16:00-18:00 | Poster 4 | Voice Conversion 2 | P | Wed-P-8-4-4 | 694 | Denoising Recurrent Neural Network for Deep Bidirectional LSTM based Voice Conversion | Jie Wu, Dongyan Huang, Lei Xie, Haizhou Li |
2017-08-23 | 16:00-18:00 | Poster 4 | Voice Conversion 2 | P | Wed-P-8-4-5 | 841 | Speaker Dependent Approach for Enhancing a Glossectomy Patient’s Speech via GMM-based Voice Conversion | Kei Tanaka, Sunao Hara, Masanobu Abe, Masaaki Sato, Shogo Minagi |
2017-08-23 | 16:00-18:00 | Poster 4 | Voice Conversion 2 | P | Wed-P-8-4-6 | 962 | Generative Adversarial Network-based Postfilter for STFT Spectrograms | Takuhiro Kaneko, Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi |
2017-08-23 | 16:00-18:00 | Poster 4 | Voice Conversion 2 | P | Wed-P-8-4-7 | 1288 | Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis | Bajibabu Bollepalli, Lauri Juvela, Paavo Alku |
2017-08-23 | 16:00-18:00 | Poster 4 | Voice Conversion 2 | P | Wed-P-8-4-8 | 984 | Emotional Voice Conversion with Adaptive Scales F0 based on Wavelet Transform using Limited Amount of Emotional Data | Zhaojie Luo, Jinhui Chen, Tetsuya Takiguchi, Yasuo Ariki |
2017-08-23 | 16:00-18:00 | Poster 4 | Voice Conversion 2 | P | Wed-P-8-4-9 | 1038 | Speaker adaptation in DNN-based speech synthesis using d-vectors | Rama Sanand Doddipatla, Norbert Braunschweiler, Ranniery Maia |
2017-08-23 | 16:00-18:00 | Poster 4 | Voice Conversion 2 | P | Wed-P-8-4-10 | 1122 | Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice Conversion | Runnan Li, Zhiyong Wu, Yishuang Ning, Lifa Sun, Helen Meng, Lianhong Cai |
2017-08-23 | 16:00-18:00 | Poster 4 | Voice Conversion 2 | P | Wed-P-8-4-11 | 1538 | Segment Level Voice Conversion with Recurrent Neural Networks | Miguel Ramos, Alan W Black, Ramón Astudillo, Isabel Trancoso, Nuno Fonseca |
2017-08-24 | 10:00-10:20 | A2 | Speaker Diarization | O | Thu-O-9-2-1 | 51 | Speaker Diarization Using Convolutional Neural Network for Statistics Accumulation Refinement | Zbynek Zajic, Marek Hruz, Ludek Muller |
2017-08-24 | 10:20-10:40 | A2 | Speaker Diarization | O | Thu-O-9-2-2 | 1650 | Speaker2Vec: Unsupervised Learning and Adaptation of a Speaker Manifold using Deep Neural Networks with an Evaluation on Speaker Segmentation | Arindam Jati, Panayiotis Georgiou |
2017-08-24 | 10:40-11:00 | A2 | Speaker Diarization | O | Thu-O-9-2-3 | 270 | A Triplet Ranking-based Neural Network for Speaker Diarization and Linking | Gaël Le Lan, Delphine Charlet, Anthony Larcher, Sylvain Meignier |
2017-08-24 | 11:00-11:20 | A2 | Speaker Diarization | O | Thu-O-9-2-4 | 492 | Estimating Speaker Clustering Quality Using Logistic Regression | Yishai Cohen, Itshak Lapidot |
2017-08-24 | 11:20-11:40 | A2 | Speaker Diarization | O | Thu-O-9-2-5 | 1067 | Combining speaker turn embedding and incremental structure prediction for low-latency speaker diarization | Guillaume Wisniewski, Hervé Bredin, Gregory Gelly, Claude Barras |
2017-08-24 | 11:40-12:00 | A2 | Speaker Diarization | O | Thu-O-9-2-6 | 411 | pyannote.metrics: a toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems | Hervé Bredin |
2017-08-24 | 13:30-13:50 | A2 | Robust Speaker Recognition | O | Thu-O-10-2-1 | 430 | CNN-based joint mapping of short and long utterance i-vectors for speaker verification using short utterances | Jinxi Guo, Usha Nookala, Abeer Alwan |
2017-08-24 | 13:50-14:10 | A2 | Robust Speaker Recognition | O | Thu-O-10-2-2 | 1199 | Curriculum Learning based Probabilistic Linear Discriminant Analysis for Noise Robust Speaker Recognition | Shivesh Ranjan, Abhinav Misra, John H.L. Hansen |
2017-08-24 | 14:10-14:30 | A2 | Robust Speaker Recognition | O | Thu-O-10-2-3 | 731 | I-vector Transformation Using a Novel Discriminative Denoising Autoencoder for Noise-robust Speaker Recognition | Shivangi Mahto, Hitoshi Yamamoto, Takafumi Koshinaka |
2017-08-24 | 14:30-14:50 | A2 | Robust Speaker Recognition | O | Thu-O-10-2-4 | 727 | Unsupervised Discriminative Training of PLDA for Domain Adaptation in Speaker Verification | Qiongqiong Wang, Takafumi Koshinaka |
2017-08-24 | 14:50-15:10 | A2 | Robust Speaker Recognition | O | Thu-O-10-2-5 | 1240 | Speaker Verification Under Adverse Conditions Using I-vector Adaptation and Neural Networks | Md Jahangir Alam, Patrick Kenny, Gautam Bhattacharya, Marcel Kockmann |
2017-08-24 | 15:10-15:30 | A2 | Robust Speaker Recognition | O | Thu-O-10-2-6 | 605 | Improving Robustness of Speaker Recognition to New Conditions Using Unlabeled Data | Diego Castan, Mitchell McLaren, Luciana Ferrer, Aaron Lawson, Alicia Lozano-Diez |
2017-08-24 | 10:00-10:20 | Aula Magna | Discriminative Training for ASR | O | Thu-O-9-1-1 | 1118 | Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition | Shubham Toshniwal, Hao Tang, Liang Lu, Karen Livescu |
2017-08-24 | 10:20-10:40 | Aula Magna | Discriminative Training for ASR | O | Thu-O-9-1-2 | 639 | Optimizing expected word error rate via sampling for speech recognition | Matt Shannon |
2017-08-24 | 10:40-11:00 | Aula Magna | Discriminative Training for ASR | O | Thu-O-9-1-3 | 231 | Annealed F-smoothing as a Mechanism to Speed up Neural Network Training | Tara Sainath, Vijay Peddinti, Olivier Siohan, Arun Narayanan |
2017-08-24 | 11:00-11:20 | Aula Magna | Discriminative Training for ASR | O | Thu-O-9-1-4 | 583 | Non-Uniform MCE Training of Deep Long Short-Term Memory Recurrent Neural Networks for Keyword Spotting | Zhong Meng, Biing-Hwang (Fred) Juang |
2017-08-24 | 11:20-11:40 | Aula Magna | Discriminative Training for ASR | O | Thu-O-9-1-5 | 1784 | Exploiting Eigenposteriors for Semi-supervised Training of DNN Acoustic Models with Sequence Discrimination | Pranay Dighe, Afsaneh Asaei, Herve Bourlard |
2017-08-24 | 11:40-12:00 | Aula Magna | Discriminative Training for ASR | O | Thu-O-9-1-6 | 221 | Discriminative Autoencoders for Acoustic Modeling | Ming-Han Yang, Hung-Shin Lee, Yu-Ding Lu, Kuan-Yu Chen, Yu Tsao, Berlin Chen, Hsin-Min Wang |
2017-08-24 | 13:30-13:50 | Aula Magna | Neural Network Acoustic Models for ASR 3 | O | Thu-O-10-1-1 | 892 | Deep Neural Factorization for Speech Recognition | Jen-Tzung Chien, Chen Shen |
2017-08-24 | 13:50-14:10 | Aula Magna | Neural Network Acoustic Models for ASR 3 | O | Thu-O-10-1-2 | 1385 | Semi-supervised DNN training with word selection for ASR | Karel Vesely, Lukas Burget, Jan Černocký |
2017-08-24 | 14:10-14:30 | Aula Magna | Neural Network Acoustic Models for ASR 3 | O | Thu-O-10-1-3 | 751 | Gaussian Prediction based Attention for Online End-to-End Speech Recognition | Junfeng Hou, ShiLiang Zhang, Lirong Dai |
2017-08-24 | 14:30-14:50 | Aula Magna | Neural Network Acoustic Models for ASR 3 | O | Thu-O-10-1-4 | 614 | Efficient knowledge distillation from an ensemble of teachers | Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, Samuel Thomas, Jia Cui, Bhuvana Ramabhadran |
2017-08-24 | 14:50-15:10 | Aula Magna | Neural Network Acoustic Models for ASR 3 | O | Thu-O-10-1-5 | 232 | An Analysis of “Attention” in Sequence-to-Sequence Models | Rohit Prabhavalkar, Tara Sainath, Bo Li, Kanishka Rao, Navdeep Jaitly |
2017-08-24 | 15:10-15:30 | Aula Magna | Neural Network Acoustic Models for ASR 3 | O | Thu-O-10-1-6 | 1566 | Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition | Hagen Soltau, Hank Liao, Hasim Sak |
2017-08-24 | 10:00-10:20 | B4 | Spoken Term Detection | O | Thu-O-9-4-1 | 1328 | A Rescoring Approach for Keyword Search Using Lattice Context Information | Zhipeng Chen, Ji Wu |
2017-08-24 | 10:20-10:40 | B4 | Spoken Term Detection | O | Thu-O-9-4-2 | 601 | The Kaldi OpenKWS System: Improving Low Resource Keyword Search | Jan Trmal, Matthew Wiesner, Vijayaditya Peddinti, Xiaohui Zhang, Pegah Ghahremani, Vimal Manohar, Yiming Wang, Hainan Xu, Dan Povey, Sanjeev Khudanpur |
2017-08-24 | 10:40-11:00 | B4 | Spoken Term Detection | O | Thu-O-9-4-3 | 1212 | The STC Keyword Search System For OpenKWS 2016 Evaluation | Yuri Khokhlov, Ivan Medennikov, Aleksei Romanenko, Valentin Mendelev, Maxim Korenevsky, Alexey Prudnikov, Natalia Tomashenko, Alexander Zatvornitskiy |
2017-08-24 | 11:00-11:20 | B4 | Spoken Term Detection | O | Thu-O-9-4-4 | 480 | Compressed time delay neural network for small-footprint keyword spotting | Ming Sun, David Snyder, Yixin Gao, Varun Nagaraja, Mike Rodehorst, Sankaran Panchapagesan, Nikko Strom, Spyros Matsoukas, Shiv Vitaladevuni |
2017-08-24 | 11:20-11:40 | B4 | Spoken Term Detection | O | Thu-O-9-4-5 | 904 | Symbol sequence search from telephone conversation | Masayuki Suzuki, Gakuto Kurata, Abhinav Sethy, Bhuvana Ramabhadran, Kenneth Church, Mark Drake |
2017-08-24 | 11:40-12:00 | B4 | Spoken Term Detection | O | Thu-O-9-4-6 | 1273 | Similarity Learning Based Query Modeling for Keyword Search | Batuhan Gundogdu, Murat Saraclar |
2017-08-24 | 13:30-13:50 | B4 | Multimodal Resources and Annotation | O | Thu-O-10-4-1 | 1305 | CALYOU: A Comparable Spoken Algerian Corpus Harvested from YouTube | Karima Abidi, Mohamed Amine Menacer, Kamel Smaili |
2017-08-24 | 13:50-14:10 | B4 | Multimodal Resources and Annotation | O | Thu-O-10-4-2 | 242 | PRAV: A Phonetically Rich Audio Visual Corpus | Abhishek Avinash Narwekar, Prasanta Ghosh |
2017-08-24 | 14:10-14:30 | B4 | Multimodal Resources and Annotation | O | Thu-O-10-4-3 | 860 | NTCD-TIMIT: A New Database and Baseline for Noise-robust Audio-visual Speech Recognition | Ahmed Hussen Abdelaziz |
2017-08-24 | 14:30-14:50 | B4 | Multimodal Resources and Annotation | O | Thu-O-10-4-4 | 1555 | The Extended SPaRKy Restaurant Corpus: designing a corpus with variable information density | David M. Howcroft, Dietrich Klakow, Vera Demberg |
2017-08-24 | 14:50-15:10 | B4 | Multimodal Resources and Annotation | O | Thu-O-10-4-5 | 1115 | Automatic Construction of the Finnish Parliament Speech Corpus | André Mansikkaniemi, Peter Smit, Mikko Kurimo |
2017-08-24 | 15:10-15:30 | B4 | Multimodal Resources and Annotation | O | Thu-O-10-4-6 | 1357 | Building audio-visual phonetically annotated Arabic corpus for expressive text to speech | Omnia Abdo, Sherif Abdou, Mervat Fashal |
2017-08-24 | 10:00-10:20 | C6 | Noise Reduction | O | Thu-O-9-6-1 | 57 | Deep Recurrent Neural Network based Monaural Speech Separation using Recurrent Temporal Restricted Boltzmann Machines | Suman Samui, Indrajit Chakrabarti, Soumya Kanti Ghosh |
2017-08-24 | 10:20-10:40 | C6 | Noise Reduction | O | Thu-O-9-6-2 | 109 | Improved Codebook-based Speech Enhancement based on MBE Model | Qizheng Huang, Changchun Bao, Xianyun Wang |
2017-08-24 | 10:40-11:00 | C6 | Noise Reduction | O | Thu-O-9-6-3 | 515 | Improving mask learning based speech enhancement system with restoration layers and residual connection | Zhuo Chen, Yan Huang, Jinyu Li, Yifan Gong |
2017-08-24 | 11:00-11:20 | C6 | Noise Reduction | O | Thu-O-9-6-4 | 611 | Exploring Low-Dimensional Structures of Modulation Spectra for Robust Speech Recognition | Bi-Cheng Yan, Chin-Hong Shih, Shih-Hung Liu, Berlin Chen |
2017-08-24 | 11:20-11:40 | C6 | Noise Reduction | O | Thu-O-9-6-5 | 1428 | SEGAN: Speech Enhancement Generative Adversarial Network | Santiago Pascual, Antonio Bonafonte, Joan Serrà |
2017-08-24 | 11:40-12:00 | C6 | Noise Reduction | O | Thu-O-9-6-6 | 1653 | Concatenative resynthesis using twin networks | Soumi Maiti, Michael Mandel |
2017-08-24 | 10:00-10:20 | D8 | Speech Recognition: Multimodal Systems | O | Thu-O-9-8-1 | 85 | Combining Residual Networks with LSTMs for Lipreading | Themos Stafylakis, Georgios Tzimiropoulos |
2017-08-24 | 10:20-10:40 | D8 | Speech Recognition: Multimodal Systems | O | Thu-O-9-8-2 | 106 | Improving computer lipreading via DNN sequence discriminative training techniques | Kwanchiva Thangthai, Richard Harvey |
2017-08-24 | 10:40-11:00 | D8 | Speech Recognition: Multimodal Systems | O | Thu-O-9-8-3 | 421 | Improving Speaker-Independent Lipreading with Domain-Adversarial Training | Michael Wand, Jürgen Schmidhuber |
2017-08-24 | 11:00-11:20 | D8 | Speech Recognition: Multimodal Systems | O | Thu-O-9-8-4 | 799 | Turbo Decoders for Audio-visual Continuous Speech Recognition | Ahmed Hussen Abdelaziz |
2017-08-24 | 11:20-11:40 | D8 | Speech Recognition: Multimodal Systems | O | Thu-O-9-8-5 | 939 | DNN-based Ultrasound-to-Speech Conversion for a Silent Speech Interface | Tamás Gábor Csapó, Tamás Grósz, Gábor Gosztolya, László Tóth, Alexandra Markó |
2017-08-24 | 11:40-12:00 | D8 | Speech Recognition: Multimodal Systems | O | Thu-O-9-8-6 | 502 | Visually grounded learning of keyword prediction from untranscribed speech | Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu |
2017-08-24 | 13:30-13:50 | D8 | Forensic Phonetics and Sociophonetic Varieties | O | Thu-O-10-8-1 | 1368 | What is the relevant population? Considerations for the computation of likelihood ratios in forensic voice comparison | Vincent Hughes, Paul Foulkes |
2017-08-24 | 13:50-14:10 | D8 | Forensic Phonetics and Sociophonetic Varieties | O | Thu-O-10-8-2 | 1080 | Voice disguise vs. Impersonation: Acoustic and perceptual measurements of vocal flexibility in non experts | Veronique Delvaux, Lise Caucheteux, Kathy Huet, Myriam Piccaluga, Bernard Harmegnies |
2017-08-24 | 14:10-14:30 | D8 | Forensic Phonetics and Sociophonetic Varieties | O | Thu-O-10-8-3 | 470 | Schwa Realization in French: Using Automatic Speech Processing to Study Phonological and Socio-linguistic Factors in Large Corpora | Yaru WU, Martine Adda-Decker, Cecile Fougeron, Lori Lamel |
2017-08-24 | 14:30-14:50 | D8 | Forensic Phonetics and Sociophonetic Varieties | O | Thu-O-10-8-4 | 922 | The Social Life of Tswana Ejectives | Daniel Duran, Jagoda Bruni, Grzegorz Dogil, Justus Roux |
2017-08-24 | 14:50-15:10 | D8 | Forensic Phonetics and Sociophonetic Varieties | O | Thu-O-10-8-5 | 50 | How long is too long? How pause features after requests affect the perceived willingness of affirmative answers | Lea S. Kohtz, Oliver Niebuhr |
2017-08-24 | 15:10-15:30 | D8 | Forensic Phonetics and Sociophonetic Varieties | O | Thu-O-10-8-6 | 1433 | Shadowing Synthesized Speech – Segmental Analysis of Phonetic Convergence | Iona Gessinger, Eran Raveh, Sébastien Le Maguer, Bernd Möbius, Ingmar Steiner |
2017-08-24 | 10:00-10:15 | E10 | Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 1 | SS | Thu-SS-9-10-1 | 43 | The INTERSPEECH 2017 Computational Paralinguistics Challenge: Addressee, Cold & Snoring | Björn Schuller, Stefan Steidl, Anton Batliner, Elika Bergelson, Jarek Krajewski, Christoph Janott, Andrei Amatuni, Marisa Casillas, Amanda Seidl, Melanie Soderstrom, Anne Warlaumont, Guillermo Hidalgo, Sebastian Schnieder, Clemens Heiser, Winfried Hohenhorst, Michael Herzog, Maximilian Schmitt, Kun Qian, Yue Zhang, George Trigeorgis, Panagiotis Tzirakis, Stefanos Zafeiriou |
2017-08-24 | 10:15-10:25 | E10 | Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 1 | SS | Thu-SS-9-10-2 | | Description of the UPPER RESPIRATORY TRACT INFECTION CORPUS (URTIC) | Jarek Krajewski, Sebastian Schnieder, Anton Batliner |
2017-08-24 | 10:25-10:35 | E10 | Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 1 | SS | Thu-SS-9-10-3 | | Description of the Munich-Passau Snore Sound Corpus (MPSSC) | Christoph Janott, Anton Batliner |
2017-08-24 | 10:35-10:45 | E10 | Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 1 | SS | Thu-SS-9-10-4 | | Description of the HOMEBANK CHILD/ADULT ADDRESSEE CORPUS (HB-CHAAC) | Elika Bergelson, Andrei Amatuni, Marisa Casillas, Amanda Seidl, Melanie Soderstrom, Anne Warlaumont |
2017-08-24 | 10:45-11:00 | E10 | Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 1 | SS | Thu-SS-9-10-5 | 1261 | It sounds like you have a cold! Testing voice features for the Interspeech 2017 Computational Paralinguistics Cold Challenge | Mark Huckvale, András Beke |
2017-08-24 | 11:00-11:15 | E10 | Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 1 | SS | Thu-SS-9-10-6 | 1445 | End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware Spectrum | Danwei Cai, Zhidong Ni, Wenbo Liu, Weicheng Cai, Gang Li, Ming Li |
2017-08-24 | 11:15-11:30 | E10 | Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 1 | SS | Thu-SS-9-10-7 | 1066 | Infected Phonemes: How a Cold Impairs Speech on a Phonetic Level | Johannes Wagner, Thiago Fraga-Silva, Yvan Josse, Dominik Schiller, Andreas Seiderer, Elisabeth André |
2017-08-24 | 11:30-11:45 | E10 | Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 1 | SS | Thu-SS-9-10-8 | 1550 | Phoneme state posteriorgram features for speech based automatic classification of speakers in cold and healthy conditions | Akshay Kalkunte Suresh, Srinivasa Raghavan K M, Prasanta Ghosh |
2017-08-24 | 11:45-12:00 | E10 | Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 1 | SS | Thu-SS-9-10-9 | 1794 | An Integrated Solution for Snoring Sound Classification Using Bhattacharyya Distance based GMM Supervectors with SVM, Feature Selection with Random Forest and Spectrogram with CNN | Tin Lay Nwe, Tran Huy Dat, Ng Wen Zheng Terence, Bin Ma |
2017-08-24 | 13:30-13:45 | E10 | Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 2 | SS | Thu-SS-10-10-1 | 1211 | A dual source-filter model of snore audio for snorer group classification | Achuth Rao MV, Shivani Yadav, Prasanta Ghosh |
2017-08-24 | 13:45-14:00 | E10 | Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 2 | SS | Thu-SS-10-10-2 | 173 | An ‘End-to-Evolution’ Hybrid Approach for Snore Sound Classification | Michael Freitag, Shahin Amiriparian, Nicholas Cummins, Maurice Gerczuk, Björn Schuller |
2017-08-24 | 14:00-14:15 | E10 | Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 2 | SS | Thu-SS-10-10-3 | 434 | Snore Sound Classification Using Image-based Deep Spectrum Features | Shahin Amiriparian, Maurice Gerczuk, Sandra Ottl, Nicholas Cummins, Michael Freitag, Sergey Pugachevskiy, Alice Baird, Björn Schuller |
2017-08-24 | 14:15-14:30 | E10 | Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 2 | SS | Thu-SS-10-10-4 | 1378 | Exploring Fusion Methods and Feature Space for the Classification of Paralinguistic Information | David Tavarez, Xabier Sarasola, Agustin Alonso, Jon Sanchez, Luis Serrano, Eva Navas, Inma Hernáez |
2017-08-24 | 14:30-14:45 | E10 | Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 2 | SS | Thu-SS-10-10-5 | 905 | DNN-based Feature Extraction and Classifier Combination for Child-Directed Speech, Cold and Snoring Identification | Gábor Gosztolya, Róbert Busa-Fekete, Tamás Grósz, László Tóth |
2017-08-24 | 14:45-15:00 | E10 | Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 2 | SS | Thu-SS-10-10-6 | 653 | Introducing Weighted Kernel Classifiers for Handling Imbalanced Paralinguistic Corpora: Snoring, Addressee and Cold | Heysem Kaya, Alexey Karpov |
2017-08-24 | 15:00-15:15 | E10 | Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 2 | SS | Thu-SS-10-10-7 | | The INTERSPEECH 2017 Computational Paralinguistics Challenge: A Summary of Results | Stefan Steidl |
2017-08-24 | 15:15-15:30 | E10 | Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 2 | SS | Thu-SS-10-10-8 | | Discussion | Björn Schuller, Anton Batliner |
2017-08-24 | 10:00-12:00 | E306 | Show & Tell 7 | S&T | Thu-S&T-9-A-1 | 10002 | Soundtracing for realtime speech adjustment to environmental conditions in 3D simulations | Szymon Pałka, Tomasz Pędzimąż, Bartosz Ziolko |
2017-08-24 | 10:00-12:00 | E306 | Show & Tell 7 | S&T | Thu-S&T-9-A-2 | 10027 | Vocal-tract Model with Static Articulators: Lips, Teeth, Tongue, and More | Takayuki Arai |
2017-08-24 | 10:00-12:00 | E306 | Show & Tell 7 | S&T | Thu-S&T-9-A-3 | 10038 | Remote articulation test system based on WebRTC | Ikuyo Masuda-Katsuse |
2017-08-24 | 10:00-12:00 | E306 | Show & Tell 7 | S&T | Thu-S&T-9-A-4 | 10054 | The ModelTalker Project: A web-based voice banking pipeline for ALS/MND patients | H Timothy Bunnell, Jason Lilley, Kathleen McGrath |
2017-08-24 | 10:00-12:00 | E306 | Show & Tell 7 | S&T | Thu-S&T-9-A-5 | 10055 | Visible Vowels: a Tool for the Visualization of Vowel Variation | Wilbert Heeringa, Hans Van de Velde |
2017-08-24 | 13:30-15:30 | E306 | Show & Tell 7 | S&T | Thu-S&T-10-A-1 | 10002 | Soundtracing for realtime speech adjustment to environmental conditions in 3D simulations | Szymon Pałka, Tomasz Pędzimąż, Bartosz Ziolko |
2017-08-24 | 13:30-15:30 | E306 | Show & Tell 7 | S&T | Thu-S&T-10-A-2 | 10027 | Vocal-tract Model with Static Articulators: Lips, Teeth, Tongue, and More | Takayuki Arai |
2017-08-24 | 13:30-15:30 | E306 | Show & Tell 7 | S&T | Thu-S&T-10-A-3 | 10038 | Remote articulation test system based on WebRTC | Ikuyo Masuda-Katsuse |
2017-08-24 | 13:30-15:30 | E306 | Show & Tell 7 | S&T | Thu-S&T-10-A-4 | 10054 | The ModelTalker Project: A web-based voice banking pipeline for ALS/MND patients | H Timothy Bunnell, Jason Lilley, Kathleen McGrath |
2017-08-24 | 13:30-15:30 | E306 | Show & Tell 7 | S&T | Thu-S&T-10-A-5 | 10055 | Visible Vowels: a Tool for the Visualization of Vowel Variation | Wilbert Heeringa, Hans Van de Velde |
2017-08-24 | 10:00-10:20 | F11 | Special Session: State of the Art in Physics-based Voice Simulation | SS | Thu-SS-9-11-1 | 107 | Acoustic analysis of detailed three-dimensional shape of the human nasal cavity and paranasal sinuses | Tatsuya Kitamura, Hironori Takemoto, Hisanori Makinae, Tetsutaro Yamaguchi, Kotaro Maki |
2017-08-24 | 10:20-10:40 | F11 | Special Session: State of the Art in Physics-based Voice Simulation | SS | Thu-SS-9-11-2 | 448 | A semi-polar grid strategy for the three-dimensional finite element simulation of vowel-vowel sequences | Marc Arnela, Saeed Dabbaghchian, Oriol Guasch, Olov Engwall |
2017-08-24 | 10:40-11:00 | F11 | Special Session: State of the Art in Physics-based Voice Simulation | SS | Thu-SS-9-11-3 | 844 | A Fast Robust 1D Flow Model for a Self-Oscillating Coupled 2D FEM Vocal Fold Simulation | Arvind Vasudevan, Victor Zappi, Peter Anderson, Sidney Fels |
2017-08-24 | 11:00-11:20 | F11 | Special Session: State of the Art in Physics-based Voice Simulation | SS | Thu-SS-9-11-4 | 875 | Waveform patterns in pitch glides near a vocal tract resonance | Tiina Murtola, Jarmo Malinen |
2017-08-24 | 11:20-11:40 | F11 | Special Session: State of the Art in Physics-based Voice Simulation | SS | Thu-SS-9-11-5 | 1239 | A unified numerical simulation of vowel production that comprises phonation and the emitted sound | Niyazi Cem Degirmenci, Johan Jansson, Johan Hoffman, Marc Arnela, Patricia Sanchez-Martin, Oriol Guasch, Sten Ternström |
2017-08-24 | 11:40-12:00 | F11 | Special Session: State of the Art in Physics-based Voice Simulation | SS | Thu-SS-9-11-6 | 1614 | Synthesis of VV Utterances from Muscle Activation to Sound with a 3D Model | Saeed Dabbaghchian, Marc Arnela, Olov Engwall, Oriol Guasch |
2017-08-24 | 13:30-13:50 | F11 | Speech and Audio Segmentation and Classification 1 | O | Thu-O-10-11-1 | 524 | Occupancy Detection in Commercial and Residential Environments Using Audio Signal | Shabnam Ghaffarzadegan, Attila Reiss, Mirko Ruhs, Robert Duerichen, Zhe Feng |
2017-08-24 | 13:50-14:10 | F11 | Speech and Audio Segmentation and Classification 1 | O | Thu-O-10-11-2 | 685 | Data Augmentation, Missing Feature Mask and Kernel Classification for Through-The-Wall Acoustic Surveillance | Tran-Huy Dat, Wen Zheng Terence Ng, Yi Ren Leng |
2017-08-24 | 14:10-14:30 | F11 | Speech and Audio Segmentation and Classification 1 | O | Thu-O-10-11-3 | 284 | Endpoint detection using grid long short-term memory network for streaming speech recognition | Shuo-Yiin Chang, Bo Li, Tara Sainath, Gabor Simko, Carolina Parada |
2017-08-24 | 14:30-14:50 | F11 | Speech and Audio Segmentation and Classification 1 | O | Thu-O-10-11-4 | 666 | Deep Learning Techniques in Tandem with Signal Processing Cues for Phonetic Segmentation for Text to Speech Synthesis in Indian Languages | Arun Baby, Jeena Prakash, Rupak Vignesh, Hema Murthy |
2017-08-24 | 14:50-15:10 | F11 | Speech and Audio Segmentation and Classification 1 | O | Thu-O-10-11-5 | 877 | Gate Activation Signal Analysis for Gated Recurrent Neural Networks and Its Correlation with Phoneme Boundaries | Yu-Hsuan Wang, Cheng-Tao Chung, Hung-yi Lee |
2017-08-24 | 15:10-15:30 | F11 | Speech and Audio Segmentation and Classification 1 | O | Thu-O-10-11-6 | 65 | Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks | Ruiqing Yin, Hervé Bredin, Claude Barras |
2017-08-24 | 10:00-12:00 | Poster 1 | Noise Robust and Far-field ASR | P | Thu-P-9-1-1 | 1096 | Improved Automatic Speech Recognition using Subband Temporal Envelope Features and Time-delay Neural Network Denoising Autoencoder | Cong-Thanh Do, Yannis Stylianou |
2017-08-24 | 10:00-12:00 | Poster 1 | Noise Robust and Far-field ASR | P | Thu-P-9-1-2 | 225 | Factored deep convolutional neural networks for noise robust speech recognition | Masakiyo Fujimoto |
2017-08-24 | 10:00-12:00 | Poster 1 | Noise Robust and Far-field ASR | P | Thu-P-9-1-3 | 230 | Global SNR Estimation of Speech Signals for Unknown Noise Conditions using Noise Adapted Non-linear Regression | Pavlos Papadopoulos, Ruchir Travadi, Shrikanth Narayanan |
2017-08-24 | 10:00-12:00 | Poster 1 | Noise Robust and Far-field ASR | P | Thu-P-9-1-4 | 579 | Joint Training of Multi-channel-condition Dereverberation and Acoustic Modeling of Microphone Array Speech for Robust Distant Speech Recognition | Fengpei Ge, Kehuang Li, Bo Wu, Sabato Marco Siniscalchi, Yonghong Yan, Chin-Hui Lee |
2017-08-24 | 10:00-12:00 | Poster 1 | Noise Robust and Far-field ASR | P | Thu-P-9-1-5 | 793 | Uncertainty decoding with adaptive sampling for noise robust DNN-based acoustic modeling | Tien Dung Tran, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani |
2017-08-24 | 10:00-12:00 | Poster 1 | Noise Robust and Far-field ASR | P | Thu-P-9-1-6 | 805 | Attention-based LSTM with Multi-task Learning for Distant Speech Recognition | Yu Zhang, Pengyuan Zhang, Yonghong Yan |
2017-08-24 | 10:00-12:00 | Poster 1 | Noise Robust and Far-field ASR | P | Thu-P-9-1-7 | 1315 | To Improve the Robustness of LSTM-RNN Acoustic Models Using Higher-order Feedback From Multiple Histories | Hengguan Huang, Brian Mak |
2017-08-24 | 10:00-12:00 | Poster 1 | Noise Robust and Far-field ASR | P | Thu-P-9-1-8 | 1536 | End-to-End Speech Recognition with Auditory Attention for Multi-Microphone Distance Speech Recognition | Suyoun Kim, Ian Lane |
2017-08-24 | 10:00-12:00 | Poster 1 | Noise Robust and Far-field ASR | P | Thu-P-9-1-9 | 1665 | Robust Speech Recognition Based on Binaural Auditory Processing | Anjali Menon, Chanwoo Kim, Richard Stern |
2017-08-24 | 10:00-12:00 | Poster 1 | Noise Robust and Far-field ASR | P | Thu-P-9-1-10 | 1791 | Adaptive Multichannel Dereverberation for Automatic Speech Recognition | Joe Caroselli, Izhak Shafran, Arun Narayanan, Richard Rose |
2017-08-24 | 10:00-12:00 | Poster 3 | Styles, Varieties, Forensics and Tools | P | Thu-P-9-3-1 | 1579 | The effects of real and placebo alcohol on deaffrication | Urban Zihlmann |
2017-08-24 | 10:00-12:00 | Poster 3 | Styles, Varieties, Forensics and Tools | P | Thu-P-9-3-2 | 1390 | Polyglot and Speech Corpus Tools: a system for representing, integrating, and querying speech corpora | Michael McAuliffe, Elias Stengel-Eskin, Michaela Socolof, Morgan Sonderegger |
2017-08-24 | 10:00-12:00 | Poster 3 | Styles, Varieties, Forensics and Tools | P | Thu-P-9-3-3 | 1508 | Mapping across feature spaces in forensic voice comparison: the contribution of auditory-based voice quality to (semi-)automatic system testing | Vincent Hughes, Philip Harrison, Paul Foulkes, Peter French, Colleen Kavanagh, Eugenia San Segundo |
2017-08-24 | 10:00-12:00 | Poster 3 | Styles, Varieties, Forensics and Tools | P | Thu-P-9-3-4 | 449 | Effect of Language, Speaking Style and Speaker on Long-term F0 Estimation | Pablo Arantes, Anders Eriksson, Suska Gutzeit |
2017-08-24 | 10:00-12:00 | Poster 3 | Styles, Varieties, Forensics and Tools | P | Thu-P-9-3-5 | 1503 | Stability of prosodic characteristics across age and gender groups | Jan Volín, Tereza Tykalova, Tomáš Bořil |
2017-08-24 | 10:00-12:00 | Poster 3 | Styles, Varieties, Forensics and Tools | P | Thu-P-9-3-6 | 1392 | Electrophysiological correlates of familiar voice recognition | Julien Plante-Hebert, Victor Boucher, Boutheina Jemel |
2017-08-24 | 10:00-12:00 | Poster 3 | Styles, Varieties, Forensics and Tools | P | Thu-P-9-3-7 | 1280 | Developing an Embosi (Bantu C25) Speech Variant Dictionary to Model Vowel Elision and Morpheme Deletion | Jamison Cooper-Leavitt, Lori Lamel, Annie Rialland, Martine Adda-Decker, Gilles Adda |
2017-08-24 | 10:00-12:00 | Poster 3 | Styles, Varieties, Forensics and Tools | P | Thu-P-9-3-8 | 1448 | Rd as a control parameter to explore affective correlates of the tense-lax continuum | Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl |
2017-08-24 | 10:00-12:00 | Poster 3 | Styles, Varieties, Forensics and Tools | P | Thu-P-9-3-9 | 7 | Cross-linguistic Distinctions between Professional and Non-Professional Speaking Styles | Plinio Barbosa, Sandra Madureira, Philippe Boula de Mareüil |
2017-08-24 | 10:00-12:00 | Poster 3 | Styles, Varieties, Forensics and Tools | P | Thu-P-9-3-10 | 990 | Perception and production of word-final /ʁ/ in broadcast and spontaneous French | Cedric Gendrot |
2017-08-24 | 10:00-12:00 | Poster 3 | Styles, Varieties, Forensics and Tools | P | Thu-P-9-3-11 | 882 | Glottal source estimation from coded telephone speech using a deep neural network | Narendra N P, Manu Airaksinen, Paavo Alku |
2017-08-24 | 10:00-12:00 | Poster 3 | Styles, Varieties, Forensics and Tools | P | Thu-P-9-3-12 | 971 | Automatic Labelling of Prosodic Prominence, Phrasing and Disfluencies in French Speech by Simulating the Perception of Naïve and Expert Listeners | George Christodoulides, Mathieu Avanzi, Anne Catherine Simon |
2017-08-24 | 10:00-12:00 | Poster 3 | Styles, Varieties, Forensics and Tools | P | Thu-P-9-3-13 | 164 | Don’t Count on ASR to Transcribe for You: Breaking Bias with Two Crowds | Michael Levit, Yan Huang, Shuangyu Chang, Yifan Gong |
2017-08-24 | 10:00-12:00 | Poster 3 | Styles, Varieties, Forensics and Tools | P | Thu-P-9-3-14 | 363 | Effects of training data variety in generating glottal pulses from acoustic features with DNNs | Manu Airaksinen, Paavo Alku |
2017-08-24 | 10:00-12:00 | Poster 3 | Styles, Varieties, Forensics and Tools | P | Thu-P-9-3-15 | 406 | Towards Intelligent Crowdsourcing for Audio Data Annotation: Integrating Active Learning in the Real World | Simone Hantke, Zixing Zhang, Björn Schuller |
2017-08-24 | 10:00-12:00 | Poster 4 | Speech Synthesis: Data, Evaluation, and Novel Paradigms | P | Thu-P-9-4-1 | 171 | Principles for learning controllable TTS from annotated and latent variation | Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Junichi Yamagishi |
2017-08-24 | 10:00-12:00 | Poster 4 | Speech Synthesis: Data, Evaluation, and Novel Paradigms | P | Thu-P-9-4-2 | 362 | Sampling-based speech parameter generation using moment-matching networks | Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari |
2017-08-24 | 10:00-12:00 | Poster 4 | Speech Synthesis: Data, Evaluation, and Novel Paradigms | P | Thu-P-9-4-3 | 428 | Unit selection with Hierarchical Cascaded Long Short Term Memory Bidirectional Recurrent Neural Nets | Vincent Pollet, Enrico Zovato, Sufian Irhimeh, Pier Batzu |
2017-08-24 | 10:00-12:00 | Poster 4 | Speech Synthesis: Data, Evaluation, and Novel Paradigms | P | Thu-P-9-4-4 | 465 | Utterance Selection for Optimizing Intelligibility of TTS Voices Trained on ASR Data | Erica Cooper, Xinyue Wang, Alison Chang, Yocheved Levitan, Julia Hirschberg |
2017-08-24 | 10:00-12:00 | Poster 4 | Speech Synthesis: Data, Evaluation, and Novel Paradigms | P | Thu-P-9-4-5 | 479 | Bias and Statistical Significance in Evaluating Speech Synthesis with Mean Opinion Scores | Andrew Rosenberg, Bhuvana Ramabhadran |
2017-08-24 | 10:00-12:00 | Poster 4 | Speech Synthesis: Data, Evaluation, and Novel Paradigms | P | Thu-P-9-4-6 | 587 | Phase Modeling using Integrated Linear Prediction Residual for Statistical Parametric Speech Synthesis | Nagaraj Adiga, S R Mahadeva Prasanna |
2017-08-24 | 10:00-12:00 | Poster 4 | Speech Synthesis: Data, Evaluation, and Novel Paradigms | P | Thu-P-9-4-7 | 802 | Evaluation of a Silent Speech Interface based on Magnetic Sensing and Deep Learning for a Phonetically Rich Vocabulary | Jose A. Gonzalez, Lam A. Cheah, Phil D. Green, James M. Gilbert, Stephen R. Ell, Roger Moore, Ed Holdsworth |
2017-08-24 | 10:00-12:00 | Poster 4 | Speech Synthesis: Data, Evaluation, and Novel Paradigms | P | Thu-P-9-4-8 | 894 | Predicting Head Pose from Speech with a Conditional Variational Autoencoder | David Greenwood, Stephen Laycock, Iain Matthews |
2017-08-24 | 10:00-12:00 | Poster 4 | Speech Synthesis: Data, Evaluation, and Novel Paradigms | P | Thu-P-9-4-9 | 1250 | Real-time reactive speech synthesis: incorporating interruptions | Mirjam Wester, David Braude, Blaise Potard, Matthew Aylett, Francesca Shaw |
2017-08-24 | 10:00-12:00 | Poster 4 | Speech Synthesis: Data, Evaluation, and Novel Paradigms | P | Thu-P-9-4-10 | 1420 | A Neural Parametric Singing Synthesizer | Merlijn Blaauw, Jordi Bonada |
2017-08-24 | 10:00-12:00 | Poster 4 | Speech Synthesis: Data, Evaluation, and Novel Paradigms | P | Thu-P-9-4-11 | 1452 | Tacotron: Towards End-To-End Speech Synthesis | Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous |
2017-08-24 | 10:00-12:00 | Poster 4 | Speech Synthesis: Data, Evaluation, and Novel Paradigms | P | Thu-P-9-4-12 | 1798 | Siri On-Device Deep Learning-Guided Unit Selection Text-to-Speech System | Tim Capes, Paul Coles, Alistair Conkie, Ladan Golipour, Abie Hadjitarkhani, Qiong Hu, Nancy Huddleston, Melvyn Hunt, Jiangchuan Li, Matthias Neeracher, Kishore Prahallad, Tuomo Raitio, Ramya Rasipuram, Greg Townsend, Becci Williamson, David Winarsky, Zhizheng Wu, Hepeng Zhang |
2017-08-24 | 10:00-12:00 | Poster 4 | Speech Synthesis: Data, Evaluation, and Novel Paradigms | P | Thu-P-9-4-13 | 402 | An Expanded Taxonomy of Semiotic Classes for Text Normalization | Daan van Esch, Richard Sproat |
2017-08-24 | 10:00-12:00 | Poster 4 | Speech Synthesis: Data, Evaluation, and Novel Paradigms | P | Thu-P-9-4-14 | 584 | Complex-valued restricted Boltzmann machine for direct learning of frequency spectra | Toru Nakashika, Shinji Takaki, Junichi Yamagishi |