Given the ubiquity of Machine Learning (ML) systems and their relevance in daily life, it is important to ensure the private and safe handling of data alongside equity in human experience. These considerations have gained considerable interest in recent times under the umbrella of Trustworthy ML. Speech processing in particular presents a unique set of challenges, given the rich information carried in its linguistic and paralinguistic content, including speaker traits, interaction characteristics, and state characteristics such as health status. In this workshop on Trustworthy Speech Processing (TSP), we aim to bring together new and experienced researchers working on trustworthy ML and speech processing. We invite novel and relevant submissions from both academic and industrial research groups showcasing theoretical and empirical advancements in TSP.
Topics of interest include (but are not limited to) the following, all centered on speech processing:
- Differential privacy
- Bias and fairness
- Federated learning
- Ethics in speech processing
- Model interpretability
- Quantifying and mitigating bias in speech processing
- New datasets, frameworks, and benchmarks for TSP
- Discovery and defense against emerging privacy attacks
- Trustworthy ML in speech processing applications such as automatic speech recognition (ASR)
Schedule
- 2:00 to 2:05 PM: Opening remarks
- 2:05 to 2:55 PM: Invited talk by Dr. Jennifer Williams, University of Southampton, UK.
- 2:55 to 3:55 PM: Paper presentations (session 1)
- 3:55 to 4:05 PM: Break
- 4:05 to 5:25 PM: Paper presentations (session 2)
- 5:25 to 5:30 PM: Closing thoughts
Invited Talk
Dr. Jennifer Williams, University of Southampton.
Title: When does the next epoch in Speech Technology begin?
Abstract:
As a discipline, speech technology is approaching a crossroads. Foundational research is becoming more interdisciplinary. The drivers of innovation are creating a melting pot. AI regulation and AI safety are not only popular topics; it is now becoming necessary to consider their impacts. For some speech researchers, the culmination of our collective scientific progress may appear to be the most natural progression from a world driven by consumer electronics, the Internet, and global connectivity. Yet for others, how we arrived at this point reflects the ebbs and flows of funding body research priorities, shifting paradigms, and trending sociotechnical matters. This talk illuminates key revelations as to how our discipline is changing, explores several parallel revolutions happening within our field, and opens a discussion of how recent global attitudes toward AI safety may impact our technical work while also providing new research opportunities.
Bio:
Dr. Jennifer Williams is an Assistant Professor at the University of Southampton, where she is PI/Co-PI on several large interdisciplinary projects on voice anonymisation, deepfake detection, public perceptions of speech technology, speech paralinguistics for medical applications, and AI regulation. She is also the South England Engagement Director and Co-I on the new UK National Edge AI Hub, a consortium of 12 universities. She completed her PhD at the University of Edinburgh on representation learning for speech signal disentanglement, while working part-time in industry on edge-device voice recognition. Before that, she was a technical staff member at MIT Lincoln Laboratory for five years, where she developed rapid prototyping solutions for human language technology. She is Chair of the ISCA special interest group on Security and Privacy in Speech Communication (SPSC-SIG) and an affiliate member of the NIST-OSAC subcommittee on speaker recognition for forensic science, developing US and ISO standards for speech.
Accepted Papers
The following papers were accepted for oral presentation at the workshop (ordered alphabetically by title):
- Adversarial speech for voice privacy protection from personalized speech generation. Shihao Chen, Liping Chen, Jie Zhang, Kong Aik Lee, Zhenhua Ling, Lirong Dai.
- Characterizing the temporal dynamics of universal speech representations for generalizable deepfake detection. Yi Zhu, Saurabh Powar, Tiago Falk.
- Improving Membership Inference in ASR Model Auditing with Loss-based Features. Francisco Teixeira, Karla Pizzi, Raphael Olivier, Alberto Abad, Bhiksha Raj, Isabel Trancoso.
- LCANETS++: Robust audio classification using multi-layer neural networks with lateral competition. Sayanton V. Dibbo, Juston Moore, Garrett T. Kenyon, Michael Teti.
- Leveraging Confidence Models for Identifying Challenging Data Subgroups in Speech Models. Alkis Koudounas, Eliana Pastor, Vittorio Mazzia, Manuel Giollo, Thomas Gueudre, Elisa Reale, Giuseppe Attanasio, Luca Cagliero, Sandro Cumani, Luca de Alfaro, Elena Baralis, Daniele Amberti.
- Multi-modal approaches for improving the robustness of audio-based COVID-19 detection systems. Drew Grant, Helena Hahn, Adebayo Eisape, Valerie Rennoll, James West.
- Partial Federated Learning: Unlocking Non-Biometric Text Information Sharing for Federated Learning. Tiantian Feng, Anil Ramakrishna, Jimit Majmudar, Charith Peris, Jixuan Wang, Clement Chung, Richard Zemel, Morteza Ziyadi.
Important Dates
- Paper submission deadline:
- January 27th, 2024, AoE (to be considered for archival on IEEE Xplore).
- February 20th, 2024, AoE (non-archival papers).
- Author notification: Two weeks after each deadline above.
- Workshop date: April 15th, 2024, from 2:00 to 5:30 PM KST (Room 209A).
Organizers
- Anil Ramakrishna, Amazon Inc.
- Shrikanth Narayanan, University of Southern California
- Rahul Gupta, Amazon Inc.
- Isabel Trancoso, University of Lisbon
- Bhiksha Raj, Carnegie Mellon University
- Theodora Chaspari, Texas A&M University
- Tiantian Feng, University of Southern California
Contact
If you have any questions, please contact us at trustworthyspeechproc@gmail.com.