Areas and Topics
1. Speech Perception, Production and Acquisition
1.1 Models of speech production
1.2 Physiology and neurophysiology of speech production
1.3 Coarticulation
1.4 Models of speech perception
1.5 Physiology and neurophysiology of speech perception
1.6 Acoustic and articulatory cues in speech perception
1.7 Interaction between speech production and speech perception
1.8 Multimodal speech perception
1.9 Cognition and brain studies on speech
1.10 Multilingual studies and code switching
1.11 L1 acquisition and bilingual acquisition
1.12 L2 acquisition
1.13 Speech and voice disorders
1.14 Hearing disorders
1.15 Singing voice: production and perception
1.16 Speech and other biosignals
1.17 Special Session: Computational models in child language acquisition
1.18 Special Session: Data collection, transcription and annotation issues in child language acquisition settings
1.19 Special Session: Speech Technologies for Code-Switching in Multilingual Communities
2. Phonetics, Phonology, and Prosody
2.1 Phonetics and phonology
2.2 Language descriptions
2.3 Linguistic systems
2.4 Acoustic phonetics
2.5 Phonation, voice quality
2.6 Articulatory and acoustic features of prosody
2.7 Perception of prosody
2.8 Laboratory phonology
2.9 Phonetic universals
2.10 Sound changes
2.11 Sociophonetics
2.12 Phonetics of L1-L2 interaction
2.13 Forensic phonetics
2.14 Special Session: Acoustic Manifestations of Social Characteristics
3. Analysis of Paralinguistics in Speech and Language
3.1 Analysis of speaker states
3.2 Analysis of speaker traits
3.3 Automatic analysis of speaker states and traits
3.4 Pathological speech and language
3.5 Social signal processing
3.6 Sentiment analysis and opinion mining
3.7 Paralinguistics in singing
3.8 Perception of paralinguistic phenomena
3.9 Multimodal paralinguistics
3.10 Phonetic and linguistic aspects of paralinguistics
3.11 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE)
3.12 Special Session: Voice Attractiveness
4. Speaker and Language Identification
4.1 Language identification and verification, language diarization and code switching
4.2 Dialect and accent recognition
4.3 Speaker verification and identification
4.4 Features for speaker and language recognition
4.5 Robustness to variable and degraded channels
4.6 Speaker confidence estimation
4.7 Speaker diarization
4.8 Higher-level knowledge in speaker and language recognition
4.9 Evaluation of speaker and language identification systems
4.10 Multimodal/multimedia speaker recognition and diarization
4.11 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge
5. Analysis of Speech and Audio Signals
5.1 Speech acoustics
5.2 Speech analysis and representation
5.3 Audio signal analysis and representation
5.4 Speech and audio segmentation and classification
5.5 Voice activity detection
5.6 Pitch and harmonic analysis
5.7 Source separation and computational auditory scene analysis
5.8 Speaker spatial localization
5.9 Music signal processing and understanding
5.10 Singing analysis
6. Speech Coding and Enhancement
6.1 Speech coding and transmission
6.2 Low-bit-rate speech coding
6.3 Perceptual audio coding of speech signals
6.4 Noise reduction for speech signals
6.5 Speech enhancement: single-channel
6.6 Speech enhancement: multi-channel
6.7 Speech intelligibility
6.8 Active noise control
6.9 Speech enhancement in hearing aids
6.10 Adaptive beamforming for speech enhancement
6.11 Dereverberation for speech signals
6.12 Echo cancelation for speech signals
6.13 Evaluation of speech transmission, coding and enhancement
7. Speech Synthesis and Spoken Language Generation
7.1 Grapheme-to-phoneme conversion for synthesis
7.2 Text processing for speech synthesis
7.3 Signal processing/statistical models for synthesis
7.4 Speech synthesis paradigms and methods
7.5 Articulatory speech synthesis
7.6 Segment-level and/or concatenative synthesis
7.7 Unit selection speech synthesis
7.8 Statistical parametric speech synthesis
7.9 Prosody modeling and generation
7.10 Expression, emotion and personality generation
7.11 Synthesis of singing voices
7.12 Voice modification, conversion and morphing
7.13 Concept-to-speech conversion
7.14 Cross-lingual and multilingual aspects in speech synthesis, code switching
7.15 Multimodal synthesis for avatars and talking heads
7.16 Tools and data for speech synthesis
7.17 Evaluation of speech synthesis
7.18 Special Session: State of the art in physics-based voice simulation
8. Speech Recognition: Signal Processing, Acoustic Modeling, Robustness and Adaptation
8.1 Feature extraction and low-level feature modeling for ASR
8.2 Prosodic features and models
8.3 Robustness against noise, reverberation
8.4 Far field and microphone array speech recognition
8.5 Speaker normalization (e.g., VTLN)
8.6 New types of neural network models and learning (e.g., new variants of DNN, CNN)
8.7 Discriminative acoustic training methods for ASR
8.8 Acoustic model adaptation (speaker, bandwidth, emotion, accent)
8.9 Speaker adaptation, speaker adapted training methods
8.10 Pronunciation variants and modeling for speech recognition
8.11 Acoustic confidence measures
8.12 Cross-lingual and multilingual aspects, non-native accents
8.13 Acoustic modeling for conversational speech (dialog, interaction)
9. Speech Recognition: Architecture, Search, and Linguistic Components
9.1 Lexical modeling and access: units and models
9.2 Automatic lexicon learning
9.3 Supervised/unsupervised morphological models
9.4 Prosodic features and models for language modeling
9.5 Discriminative training methods for language modeling
9.6 Language model adaptation (domain, diachronic adaptation)
9.7 Language modeling for conversational speech (dialog, interaction)
9.8 Neural networks for language modeling
9.9 Search methods, decoding algorithms, lattices, multipass strategies
9.10 New computational strategies, data-structures for ASR
9.11 Computational resource constrained speech recognition
9.12 Confidence measures
9.13 Cross-lingual and multilingual components for speech recognition, code switching
9.14 Structured classification approaches
10. Speech Recognition: Technologies and Systems for New Applications
10.1 Multimodal systems
10.2 Applications in education and learning (incl. CALL, assessment of fluency)
10.3 Applications in medical practice (CIS, voice assessment, etc.)
10.4 Speech science in end-user applications
10.5 Rich transcription
10.6 Innovative products and services based on speech technologies
10.7 Sparse, template-based representations
10.8 New paradigms (e.g., articulatory models, silent speech interfaces, topic models)
10.9 Zero-resource speech recognition
11. Spoken Dialog Systems and Analysis of Conversation
11.1 Spoken dialog systems
11.2 Discourse and dialog structures
11.3 Multimodal interaction and interfaces
11.4 Conversation, communication and interaction
11.5 Analysis of verbal, co-verbal and nonverbal behavior
11.6 Interactive systems for speech/language training, therapy, communication aids
11.7 Stochastic modeling for dialog
11.8 Question-answering from speech
11.9 Spoken interaction with social robots
11.10 Systems for spoken language understanding
11.11 Evaluation of speech and multimodal dialog systems
11.12 Special Session: Speech and Human-Robot Interaction
11.13 Special Session: Incremental Processing and Responsive Behaviour
12. Spoken Language Processing: Translation, Information Retrieval, Summarization, Resources and Evaluation
12.1 Spoken machine translation
12.2 Speech-to-speech translation systems
12.3 Transliteration
12.4 Voice search
12.5 Spoken term detection
12.6 Audio indexing
12.7 Spoken document retrieval
12.8 Systems for mining spoken data, search or retrieval of speech documents
12.9 Speech and multimodal resources and annotation, code switching
12.10 Evaluation of speech recognition
12.11 Metadata descriptions of speech, audio and text resources
12.12 Metadata for semantic or content markup
12.13 Metadata for linguistic/discourse structure (disfluencies, boundaries, speech acts)
12.14 Methodologies and tools for language resource construction and annotation
12.15 Automatic segmentation and labeling of resources
12.16 Multilingual resources
12.17 Evaluation and quality assurance of language resources
12.18 Evaluation of translation and information retrieval systems
12.19 Spoken document summarization
12.20 Semantic analysis and classification
12.21 Entity extraction from speech
12.22 Evaluation of summarization and understanding
12.23 Topic spotting and classification
12.24 Special Session: Digital Revolution for Under-resourced Languages