Publications
An Introduction to Signal Processing for Singing-Voice Analysis: High Notes in the Effort to Automate the Understanding of Vocals in Music.
Eric J. Humphrey,
Sravana Reddy,
Prem Seetharaman,
Aparna Kumar,
Rachel M. Bittner,
Andrew Demetriou,
Sankalp Gulati,
Andreas Jansson,
Tristan Jehan,
Bernhard Lehner,
Anna Kruspe,
and
Luwei Yang.
IEEE Signal Processing Magazine, Volume 36 Issue 1, 2019.
paper
abstract
Humans have devised a vast array of musical instruments, but the most prevalent instrument remains the human voice. Thus, techniques for applying audio signal processing methods to the singing voice are receiving much attention as the world continues to move toward music-streaming services and as researchers seek to unlock the deep content understanding necessary to enable personalized listening experiences on a large scale. This article provides an introduction to the topic of singing-voice analysis. It surveys the foundations and state of the art in computational modeling across three main categories of singing: general vocalizations, the musical function of voice, and the singing of lyrics. We aim to establish a starting point for practitioners new to this field and frame near-field opportunities and challenges on the horizon.
Assessing and Addressing Algorithmic Bias in Practice.
Henriette Cramer,
Jean Garcia-Gathright,
Aaron Springer,
and Sravana Reddy.
In ACM Interactions, 2018.
paper
abstract
Unfair algorithmic biases and unintended negative side effects of machine learning (ML) are gaining attention -- and rightly so. We know that machine translation systems trained on existing historical data can make unfortunate errors that perpetuate gender stereotypes. We know that voice devices underperform for users with certain accents, since speech recognition isn't necessarily trained on all speaker dialects. We know that recommendations can amplify existing inequalities. However, pragmatic challenges stand in the way of practitioners committed to addressing these issues. There are no clear guidelines or industry-standard processes that can be readily applied in practice on what biases to assess or how to address them. While researchers have begun to create a rich discourse in this space, the translation of research discussions into practice is challenging. Barriers to action are threefold: understanding the issues, inventing approaches to address the issues, and confronting organizational/institutional challenges to implementing solutions at scale. In this article, we describe our engagement with translating the research literature into practical processes within our own organization, as a start to assessing both intended data and algorithm characteristics and unintended unfair biases. This includes a checklist of questions teams can ask while making decisions in data collection, modeling decisions, and assessing outcomes, as well as a product case study involving voice. We discuss lessons learned along the way and offer suggestions for others seeking guidelines and recommendations for best practices for the mitigation of algorithmic bias and negative effects of ML in consumer-facing product development.
Bring on the crowd! Using online audio crowdsourcing for large-scale New England dialectology
and acoustic sociophonetics.
Chaeyoon Kim, Sravana Reddy,
James Stanford,
Ezra Wyschogrod, and
Jack Grieve.
In American Speech, 2018.
paper
abstract
This study aims to (i) identify current geographic patterns of major New England dialect
features relative to prior work in previous generations and (ii) test the effectiveness of large-scale audio
crowdsourcing by using Amazon Mechanical Turk to collect sociophonetic data. Using recordings from
626 speakers across the region, this is the largest acoustic sociophonetic study ever conducted in New
England. Although face-to-face fieldwork will always be crucial in sociolinguistics and dialectology, we
believe that online audio-recorded crowdsourcing can play a valuable complementary role, offering an
efficient way to greatly increase the scope of sample sizes and sociolinguistic knowledge.
Implementing a Hidden Markov Model Toolkit.
Sravana Reddy.
In
Proceedings of the AAAI 2017 Symposium on Educational Advances in Artificial Intelligence (EAAI):
Model AI Assignments.
abstract
material
Hidden Markov Models (HMMs) are a backbone of speech and natural language processing
(NLP) and computational biology.
They are a great way to teach general concepts such as dynamic programming,
data-driven learning and inference, and expectation maximization for
unsupervised learning.
We have found that asking students in undergraduate NLP or machine learning
courses to implement an HMM program is helpful in solidifying these ideas.
Students also derive a great deal of satisfaction from completing a sizable project.
However, the algorithms can seem abstract,
and the scope of the project may be intimidating to some students.
Our assignment breaks down the HMM into modular chunks, and motivates HMMs
through applications drawn from NLP research rather than toy problems.
We include:
- Starter code in an object-oriented Python framework, and programming guidelines.
- Data files that draw from part-of-speech tagging,
unsupervised discovery of vowels and consonants,
and decipherment of substitution ciphers.
- Lecture slides on HMMs, including step-through visualizations of the
inference algorithms that translate easily into code.
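To give a flavor of the inference algorithms the assignment covers, here is a minimal sketch of Viterbi decoding in log space (this is an illustrative example, not the assignment's starter code; the state names and probability tables are hypothetical):

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence for an observation sequence,
    computed with dynamic programming in log space."""
    # V[t][s] = best log-probability of any path ending in state s at time t
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at time t
            best = max(states, key=lambda r: V[t - 1][r] + math.log(trans_p[r][s]))
            V[t][s] = (V[t - 1][best] + math.log(trans_p[best][s])
                       + math.log(emit_p[s][obs[t]]))
            back[t][s] = best
    # Trace the best path backward from the final timestep
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))
```

On the classic toy weather model (hidden states Rainy/Sunny, observed activities walk/shop/clean), this recovers the familiar most-likely path.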
Obfuscating Gender in Social Media Writing.
Sravana Reddy and
Kevin Knight.
In
Proceedings of the EMNLP 2016 Workshop on Natural Language Processing and Computational Social Science.
paper
abstract
The vast availability of textual data on social media has led
algorithms to automatically predict user attributes such as
gender based on the user's writing.
These methods are valuable for social science research as well as
targeted advertising and profiling,
but also compromise the privacy of users who may not realize that their
personal idiolects can give away their demographic identities.
Can we automatically modify a text so that the author is classified as a
certain target gender, under limited knowledge of the classifier,
while preserving the text's fluency and meaning?
We present a basic model to modify a text using lexical substitution,
show empirical results with Twitter and Yelp data,
and outline ideas for extensions.
Toward completely automated vowel extraction: Introducing
DARLA.
Sravana Reddy and
James Stanford.
In Linguistics Vanguard, 2015.
paper
abstract
Automatic Speech Recognition (ASR) is reaching further and further
into everyday life with Apple's Siri, Google voice search, automated
telephone information systems, dictation devices, closed captioning,
and other applications. Along with such advances in speech
technology, sociolinguists have been considering new methods for
alignment and vowel formant extraction, including techniques like
the Penn Aligner (Yuan and Liberman, 2008) and the FAVE automated
vowel extraction program (Evanini et al., 2009; Rosenfelder et al.,
2011). With humans transcribing audio recordings into sentences,
these semi-automated methods can produce effective vowel formant
measurements (Labov et al., 2013). But as the quality of ASR
improves, sociolinguistics may be on the brink of another
transformative technology: large-scale, completely automated vowel
extraction without any need for human transcription. It would then
be possible to quickly extract vowels from virtually limitless hours
of recordings, such as YouTube, publicly available audio/video
archives, and large-scale personal interviews or streaming
video. How far away is this transformative moment? In this article,
we introduce a fully automated program called DARLA (short for
"Dartmouth Linguistic Automation,"
http://darla.dartmouth.edu),
which automatically generates transcriptions with ASR and extracts vowels using FAVE. Users simply upload an audio recording of speech, and DARLA produces vowel plots, a table of vowel formants, and probabilities of the phonetic environments for each token. In this paper, we describe DARLA and explore its sociolinguistic applications. We test the system on a dataset of the US Southern Shift and compare the results with semi-automated methods.
A Web Application for Automated Dialect
Analysis.
Sravana Reddy and James N. Stanford. In
Proceedings of NAACL 2015 (Demos).
paper
abstract
poster
website
Sociolinguists are regularly faced with the task of measuring phonetic
features from speech, which involves manually transcribing
audio recordings -- a major bottleneck to analyzing large
collections of data. We harness automatic speech
recognition to build an online end-to-end web application
where users upload untranscribed speech collections and
receive formant measurements of the vowels in their
data. We demonstrate this tool by using it to
automatically analyze President Barack Obama’s vowel
pronunciations.
Decoding Running Key Ciphers.
Sravana Reddy and Kevin Knight. In
Proceedings of ACL 2012.
paper
abstract
There has been recent interest in the problem of decoding letter substitution ciphers using techniques inspired by natural language processing. We consider a different type of classical encoding scheme known as the running key cipher, and propose a search solution using Gibbs sampling with a word language model. We evaluate our method on synthetic ciphertexts of different lengths, and find that it outperforms previous work that employs Viterbi
decoding with character-based models.
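As background on the encoding scheme itself (this sketch shows how a running key cipher operates, not the paper's Gibbs sampling decoder): each plaintext letter is shifted by the corresponding letter of a key drawn from running natural-language text, so decryption is trivial when the key is known; the decoding problem is recovering the plaintext without it.

```python
def shift(text, key, sign):
    """Shift each A-Z letter of `text` by the corresponding letter of
    `key` (mod 26); sign=+1 encrypts, sign=-1 decrypts."""
    a = ord('A')
    return ''.join(
        chr((ord(c) - a + sign * (ord(k) - a)) % 26 + a)
        for c, k in zip(text, key)
    )

def encrypt(plaintext, running_key):
    return shift(plaintext, running_key, +1)

def decrypt(ciphertext, running_key):
    return shift(ciphertext, running_key, -1)
```

Because the key is itself fluent text rather than random letters, the ciphertext retains statistical structure that a word language model can exploit.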
G2P Conversion of Proper Names Using Word Origin
Information. Sonjia Waxmonsky
and Sravana Reddy. In Proceedings of NAACL 2012.
paper
abstract
poster
data
Motivated by the fact that the pronunciation of a name may be
influenced by its language of origin, we present methods to improve
pronunciation prediction of proper names using
word origin information.
We train grapheme-to-phoneme (G2P) models on language-specific data
sets and interpolate the outputs.
We perform experiments on US personal surnames, a data set where word origin variation
occurs naturally. Our methods can be used with any G2P algorithm that outputs
posterior probabilities of phoneme sequences for a given word.
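The interpolation step can be sketched as a weighted mixture of the language-specific posteriors (a hypothetical minimal example, not the paper's implementation; the origin labels, weights, and ARPAbet-style pronunciations below are made up for illustration):

```python
def interpolate(posteriors, origin_weights):
    """Mix pronunciation posteriors from language-specific G2P models,
    weighted by the estimated probability of each word origin, and
    return the highest-scoring pronunciation."""
    mixed = {}
    for origin, dist in posteriors.items():
        w = origin_weights.get(origin, 0.0)
        for pron, p in dist.items():
            mixed[pron] = mixed.get(pron, 0.0) + w * p
    return max(mixed, key=mixed.get)
```

For a surname judged 70% likely to be of French origin, the French model's preferred pronunciation dominates the mixture even if the English model disagrees.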
Learning from Mistakes: Expanding Pronunciation
Lexicons Using Word Recognition Errors. Sravana
Reddy and Evandro Gouvêa. In Proceedings of Interspeech 2011.
paper
abstract
slides
We introduce the problem of learning pronunciations of out-of-vocabulary words from word recognition mistakes made by an automatic speech recognition (ASR) system. This question is especially relevant in cases where the ASR engine is a black box -- meaning that the only acoustic cues about the speech data come from the word recognition outputs. This paper presents an expectation maximization approach to inferring pronunciations from ASR word recognition hypotheses, which outperforms pronunciation estimates of a state of the art grapheme-to-phoneme system.
Unsupervised Discovery of Rhyme Schemes. Sravana
Reddy and Kevin Knight. In Proceedings of ACL 2011.
paper
abstract
slides
data
code
This paper describes an unsupervised, language-independent model for finding rhyme schemes in poetry, using no prior knowledge about rhyme or pronunciation.
What We Know About The Voynich Manuscript. Sravana
Reddy and Kevin
Knight. In Proceedings of the ACL 2011 Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities.
paper
abstract
slides
code and data
press
The Voynich Manuscript is an undeciphered document from medieval
Europe. We present current knowledge about the manuscript's text
through a series of questions about its linguistic properties.
An MDL-based Approach to Extracting Subword Units for Grapheme-to-Phoneme Conversion. Sravana
Reddy and John
Goldsmith. In Proceedings of NAACL 2010.
paper
abstract
We address a key problem in grapheme-to-phoneme conversion: the ambiguity in mapping grapheme units to phonemes. Rather than using single letters and phonemes as units, we propose learning chunks, or subwords, to reduce ambiguity. This can be interpreted as learning a lexicon of subwords that has minimum description length. We implement an algorithm to build such a lexicon, as well as a simple decoder that uses these subwords.
Substring-based Transliteration with Conditional Random Fields. Sravana
Reddy and Sonjia
Waxmonsky. In Proceedings of the ACL 2010 Named
Entities Workshop.
paper
abstract
Motivated by phrase-based translation research, we present a transliteration system where characters are grouped into substrings to be mapped atomically into the target language. We show how this substring representation can be incorporated into a Conditional Random Field model that uses local context and phonemic information. Our training and test data consists of three sets: English to Hindi, English to Kannada, and English to Tamil (Kumaran and Kellner, 2007) from the NEWS 2009 Machine Transliteration Shared Task (Li et al., 2009).
Understanding Eggcorns. Sravana
Reddy. In Proceedings of the NAACL 2009 Workshop on Computational Approaches to Linguistic Creativity.
paper
abstract
An eggcorn is a type of linguistic error where a word is substituted with one that is semantically plausible -- that is, the substitution is a semantic reanalysis of what may be a rare, archaic, or otherwise opaque term. We build a system that, given the original word and its eggcorn form, finds a semantic path between the two. Based on these paths, we derive a typology that reflects the different classes of semantic reinterpretation underlying eggcorns.
Presentations
These are non-archival papers at conferences.
A large-scale online study of dialect variation in the US Northeast: Crowdsourcing with Amazon Mechanical Turk.
Chaeyoon Kim, Sravana Reddy, Ezra Wyschogrod, and James Stanford.
In NWAV 2016.
abstract
Due to the Founder Effect and the early English colonies, the US Northeast has some of the smallest dialect sub-regions in North America. Can these fine-grained distinctions be observed in an online crowd-sourced survey? Moreover, Carver predicts that new features emerge along the same lines as previous generations. Is this happening in Eastern New England?
We used Amazon Mechanical Turk for two online crowdsourcing tasks: a self-reporting survey and an audio-recording task. We analyze the data graphically and statistically and compare with prior work.
Automatic speech recognition in sociophonetics.
Sravana Reddy, James
N. Stanford, and Michael Lefkowitz.
Workshop (tutorial) in NWAV 2015.
link
Is the Future Almost
Here? Large-Scale Completely Automated Vowel Extraction of Free
Speech. Sravana Reddy and James
N. Stanford. In NWAV 2014.
abstract
slides
Automatic Speech Recognition (ASR) is reaching farther into everyday
life through applications like Apple’s Siri. Likewise,
sociolinguists have been considering new technologies for vowel
formant extraction, including semi-automated alignment/extraction
techniques like the Penn Aligner and Forced Alignment Vowel
Extraction (FAVE). With humans transcribing recordings into
sentences, these semi-automated methods produce effective
results. But sociolinguistics may be on the brink of another
transformative technology: large-scale, completely automated vowel
extraction without any need for human transcription. It would then
be possible to quickly extract vowels from virtually limitless hours
of recordings, such as YouTube, publicly available audio/video
archives, and even live-streaming video. How far away is this
transformative moment? In the present study, we apply
state-of-the-art ASR to a real-world sociolinguistic dataset
(U.S. Southern Vowel Shift) as a feasibility test.
A
Twitter-Based Study of Newly Formed Clippings in American
English. Sravana Reddy, James
N. Stanford, and Joy Zhong. In ADS 2014.
abstract
slides
press
Following Baclawski (2012), this study uses Twitter to examine newly formed clippings
among younger speakers, including awks (awkward), adorb (adorable), ridic
(ridiculous), hilar (hilarious). We analyzed 94 million tweets from
334,000 U.S. Twitter users who posted during 2013 (cf. Eisenstein et al. 2010; Bamman et al. 2012). We find
that while women and men both use truncated forms, women are the leaders of the newer,
primarily adjectival forms. These recently coined forms are also more common in tweets
from urban locations. We compare our results to classic principles (Labov 2001),
illustrating how large-scale Twitter analyses can be valuable in American dialectology.
A Document Recognition System for Early Modern
Latin. Sravana Reddy and Gregory
Crane. In DHCS 2006.
abstract
Large-scale digitization of manuscripts is facilitated by
high-accuracy optical character recognition (OCR) engines. The
focus of our work is on using these tools to digitize Latin
texts. Many of the texts in the language, especially the early
modern, make heavy use of special characters like ligatures and
accented abbreviations. Current OCR engines are inadequate for our
purpose: their built-in training sets do not include all of these
special characters, and further, since post-processing of OCR output
relies on data and methods specific to the domain language, most
current systems do not implement error-correction tools for
Latin. This abstract outlines the development of a document
recognition system for medieval and early modern Latin texts. We
first evaluate the performance of the open source OCR framework,
Gamera, on these manuscripts. We then incorporate language
modeling functions to sharpen the character recognition output.