Sravana Reddy

sravana.reddy at

I am a researcher at Spotify in Boston, where I work on projects related to natural language processing and machine learning. I also hold a courtesy appointment at Dartmouth College, where I maintain DARLA, a web application for automating sociophonetics.

I received my PhD in Computer Science from the University of Chicago and my bachelor's degree from Brandeis University. I have previously worked at USC ISI, Dartmouth, and Wellesley.

Publications



An Introduction to Signal Processing for Singing-Voice Analysis: High Notes in the Effort to Automate the Understanding of Vocals in Music. Eric J. Humphrey, Sravana Reddy, Prem Seetharaman, Aparna Kumar, Rachel M. Bittner, Andrew Demetriou, Sankalp Gulati, Andreas Jansson, Tristan Jehan, Bernhard Lehner, Anna Kruspe, and Luwei Yang. IEEE Signal Processing Magazine, Volume 36 Issue 1, 2019.


Humans have devised a vast array of musical instruments, but the most prevalent instrument remains the human voice. Thus, techniques for applying audio signal processing methods to the singing voice are receiving much attention as the world continues to move toward music-streaming services and as researchers seek to unlock the deep content understanding necessary to enable personalized listening experiences on a large scale. This article provides an introduction to the topic of singing-voice analysis. It surveys the foundations and state of the art in computational modeling across three main categories of singing: general vocalizations, the musical function of voice, and the singing of lyrics. We aim to establish a starting point for practitioners new to this field and frame near-field opportunities and challenges on the horizon.

Assessing and Addressing Algorithmic Bias in Practice. Henriette Cramer, Jean Garcia-Gathright, Aaron Springer, and Sravana Reddy. In ACM Interactions, 2018.


Unfair algorithmic biases and unintended negative side effects of machine learning (ML) are gaining attention -- and rightly so. We know that machine translation systems trained on existing historical data can make unfortunate errors that perpetuate gender stereotypes. We know that voice devices underperform for users with certain accents, since speech recognition isn't necessarily trained on all speaker dialects. We know that recommendations can amplify existing inequalities. However, pragmatic challenges stand in the way of practitioners committed to addressing these issues. There are no clear guidelines or industry-standard processes that can be readily applied in practice on what biases to assess or how to address them. While researchers have begun to create a rich discourse in this space, the translation of research discussions into practice is challenging. Barriers to action are threefold: understanding the issues, inventing approaches to address them, and confronting organizational and institutional challenges to implementing solutions at scale. In this article, we describe our experience translating the research literature into processes within our own organization, as a start to assessing both intended data and algorithm characteristics and unintended unfair biases. This includes a checklist of questions teams can ask while making decisions in data collection, modeling, and outcome assessment, as well as a product case study involving voice. We discuss lessons learned along the way and offer suggestions for others seeking guidelines and best practices for mitigating algorithmic bias and the negative effects of ML in consumer-facing product development.

Bring on the crowd! Using online audio crowdsourcing for large-scale New England dialectology and acoustic sociophonetics. Chaeyoon Kim, Sravana Reddy, James Stanford, Ezra Wyschogrod, and Jack Grieve. In American Speech, 2018.


This study aims to (i) identify current geographic patterns of major New England dialect features relative to prior work in previous generations and (ii) test the effectiveness of large-scale audio crowdsourcing by using Amazon Mechanical Turk to collect sociophonetic data. Using recordings from 626 speakers across the region, this is the largest acoustic sociophonetic study ever conducted in New England. Although face-to-face fieldwork will always be crucial in sociolinguistics and dialectology, we believe that online audio-recorded crowdsourcing can play a valuable complementary role, offering an efficient way to greatly increase the scope of sample sizes and sociolinguistic knowledge.

Implementing a Hidden Markov Model Toolkit. Sravana Reddy. In Proceedings of the AAAI 2017 Symposium on Educational Advances in Artificial Intelligence (EAAI): Model AI Assignments.


Hidden Markov Models (HMMs) are a backbone of speech and natural language processing (NLP) and computational biology. They are a great way to teach general concepts such as dynamic programming, data-driven learning and inference, and expectation maximization for unsupervised learning. We have found that asking students in undergraduate NLP or machine learning courses to implement an HMM program is helpful in solidifying these ideas. Students also derive a great deal of satisfaction from completing a sizable project. However, the algorithms can seem abstract, and the scope of the project may be intimidating to some students. Our assignment breaks down the HMM into modular chunks, and motivates HMMs through applications drawn from NLP research rather than toy problems. We include:
  • Starter code in an object-oriented Python framework, and programming guidelines.
  • Data files that draw from part-of-speech tagging, unsupervised discovery of vowels and consonants, and decipherment of substitution ciphers.
  • Lecture slides on HMMs, including step-through visualizations of the inference algorithms that translate easily into code.
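For readers new to HMMs, the dynamic-programming inference that the assignment builds up to can be sketched in a few lines. The two-state weather model below is a standard textbook toy, not part of the assignment's starter code or data:

```python
# A minimal Viterbi decoder for a discrete HMM: find the most likely
# hidden-state sequence for a sequence of observations.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observations."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = prob
            back[t][s] = prev
    # Trace the best path backwards from the highest-scoring final state
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

states = ("Rainy", "Sunny")
start = {"Rainy": 0.6, "Sunny": 0.4}
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
        "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

print(viterbi(["walk", "shop", "clean"], states, start, trans, emit))
# → ['Sunny', 'Rainy', 'Rainy']
```

The assignment's modular structure mirrors this decomposition: the trellis fill and the backtrace are separate, testable pieces.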

Obfuscating Gender in Social Media Writing. Sravana Reddy and Kevin Knight. In Proceedings of the EMNLP 2016 Workshop on Natural Language Processing and Computational Social Science.


The vast availability of textual data on social media has led algorithms to automatically predict user attributes such as gender based on the user's writing. These methods are valuable for social science research as well as targeted advertising and profiling, but also compromise the privacy of users who may not realize that their personal idiolects can give away their demographic identities. Can we automatically modify a text so that the author is classified as a certain target gender, under limited knowledge of the classifier, while preserving the text's fluency and meaning? We present a basic model to modify a text using lexical substitution, show empirical results with Twitter and Yelp data, and outline ideas for extensions.

Toward completely automated vowel extraction: Introducing DARLA. Sravana Reddy and James Stanford. Linguistics Vanguard (2015).

pdf paper

Automatic Speech Recognition (ASR) is reaching further and further into everyday life with Apple's Siri, Google voice search, automated telephone information systems, dictation devices, closed captioning, and other applications. Along with such advances in speech technology, sociolinguists have been considering new methods for alignment and vowel formant extraction, including techniques like the Penn Aligner (Yuan and Liberman, 2008) and the FAVE automated vowel extraction program (Evanini et al., 2009, Rosenfelder et al., 2011). With humans transcribing audio recordings into sentences, these semi-automated methods can produce effective vowel formant measurements (Labov et al., 2013). But as the quality of ASR improves, sociolinguistics may be on the brink of another transformative technology: large-scale, completely automated vowel extraction without any need for human transcription. It would then be possible to quickly extract vowels from virtually limitless hours of recordings, such as YouTube, publicly available audio/video archives, and large-scale personal interviews or streaming video. How far away is this transformative moment? In this article, we introduce a fully automated program called DARLA (short for "Dartmouth Linguistic Automation"), which automatically generates transcriptions with ASR and extracts vowels using FAVE. Users simply upload an audio recording of speech, and DARLA produces vowel plots, a table of vowel formants, and probabilities of the phonetic environments for each token. In this paper, we describe DARLA and explore its sociolinguistic applications. We test the system on a dataset of the US Southern Shift and compare the results with semi-automated methods.

A Web Application for Automated Dialect Analysis. Sravana Reddy and James N. Stanford. In Proceedings of NAACL 2015 (Demos).

paper poster website

Sociolinguists are regularly faced with the task of measuring phonetic features from speech, which involves manually transcribing audio recordings -- a major bottleneck to analyzing large collections of data. We harness automatic speech recognition to build an online end-to-end web application where users upload untranscribed speech collections and receive formant measurements of the vowels in their data. We demonstrate this tool by using it to automatically analyze President Barack Obama’s vowel pronunciations.

Decoding Running Key Ciphers. Sravana Reddy and Kevin Knight. In Proceedings of ACL 2012.


There has been recent interest in the problem of decoding letter substitution ciphers using techniques inspired by natural language processing. We consider a different type of classical encoding scheme known as the running key cipher, and propose a search solution using Gibbs sampling with a word language model. We evaluate our method on synthetic ciphertexts of different lengths, and find that it outperforms previous work that employs Viterbi decoding with character-based models.
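To make the encoding scheme concrete: a running key cipher adds a long key text (for example, a passage from a book) to the plaintext letter by letter, mod 26. Encoding and decoding with a known key are trivial, as the sketch below shows; the paper's contribution is recovering the plaintext without the key. The example strings here are illustrative, not from the paper's evaluation data:

```python
# Running key cipher over uppercase A-Z: ciphertext = (plaintext + key) mod 26.
A = ord("A")

def encode(plaintext, key):
    """Add key letters to plaintext letters mod 26."""
    return "".join(chr((ord(p) - A + ord(k) - A) % 26 + A)
                   for p, k in zip(plaintext, key))

def decode(ciphertext, key):
    """Subtract key letters from ciphertext letters mod 26."""
    return "".join(chr((ord(c) - ord(k)) % 26 + A)
                   for c, k in zip(ciphertext, key))

cipher = encode("ATTACKATDAWN", "CALLMEISHMAE")
assert decode(cipher, "CALLMEISHMAE") == "ATTACKATDAWN"
```

Because both the plaintext and the key are natural language, the decoder must search over pairs of plausible texts, which is what motivates the Gibbs sampling approach with a word language model.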

G2P Conversion of Proper Names Using Word Origin Information. Sonjia Waxmonsky and Sravana Reddy. In Proceedings of NAACL 2012.

paper poster data

Motivated by the fact that the pronunciation of a name may be influenced by its language of origin, we present methods to improve pronunciation prediction of proper names using word origin information. We train grapheme-to-phoneme (G2P) models on language-specific data sets and interpolate the outputs. We perform experiments on US personal surnames, a data set where word origin variation occurs naturally. Our methods can be used with any G2P algorithm that outputs posterior probabilities of phoneme sequences for a given word.
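One simple way to combine the per-origin models is a weighted mixture of their posteriors, with weights from an origin classifier. This sketch is illustrative of the interpolation idea, not the paper's exact formulation, and the surname, origin weights, and ARPAbet-style pronunciations below are hypothetical:

```python
# Interpolate phoneme-sequence posteriors from language-specific G2P models,
# weighting each model by the estimated probability of that word origin.

def interpolate(posteriors_by_origin, origin_probs):
    """Return the pronunciation with the highest mixture probability."""
    combined = {}
    for origin, posterior in posteriors_by_origin.items():
        w = origin_probs[origin]
        for pron, p in posterior.items():
            combined[pron] = combined.get(pron, 0.0) + w * p
    return max(combined, key=combined.get)

# Hypothetical posteriors for the surname "Jean" from two origin models
posteriors = {
    "french":  {"ZH AA N": 0.7, "JH AE N": 0.3},
    "english": {"JH AE N": 0.8, "ZH AA N": 0.2},
}
print(interpolate(posteriors, {"french": 0.7, "english": 0.3}))
# → 'ZH AA N'
```

Because the combination only needs posterior probabilities over phoneme sequences, it works with any G2P algorithm that can output them, as the abstract notes.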

Learning from Mistakes: Expanding Pronunciation Lexicons Using Word Recognition Errors. Sravana Reddy and Evandro Gouvêa. In Proceedings of Interspeech 2011.

paper slides

We introduce the problem of learning pronunciations of out-of-vocabulary words from word recognition mistakes made by an automatic speech recognition (ASR) system. This question is especially relevant in cases where the ASR engine is a black box -- meaning that the only acoustic cues about the speech data come from the word recognition outputs. This paper presents an expectation maximization approach to inferring pronunciations from ASR word recognition hypotheses, which outperforms pronunciation estimates of a state-of-the-art grapheme-to-phoneme system.

Unsupervised Discovery of Rhyme Schemes. Sravana Reddy and Kevin Knight. In Proceedings of ACL 2011.

paper slides data code

This paper describes an unsupervised, language-independent model for finding rhyme schemes in poetry, using no prior knowledge about rhyme or pronunciation.

What We Know About The Voynich Manuscript. Sravana Reddy and Kevin Knight. In Proceedings of the ACL 2011 Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities.

paper slides code and data press

The Voynich Manuscript is an undeciphered document from medieval Europe. We present current knowledge about the manuscript's text through a series of questions about its linguistic properties.

An MDL-based Approach to Extracting Subword Units for Grapheme-to-Phoneme Conversion. Sravana Reddy and John Goldsmith. In Proceedings of NAACL 2010.


We address a key problem in grapheme-to-phoneme conversion: the ambiguity in mapping grapheme units to phonemes. Rather than using single letters and phonemes as units, we propose learning chunks, or subwords, to reduce ambiguity. This can be interpreted as learning a lexicon of subwords that has minimum description length. We implement an algorithm to build such a lexicon, as well as a simple decoder that uses these subwords.

Substring-based Transliteration with Conditional Random Fields. Sravana Reddy and Sonjia Waxmonsky. In Proceedings of the ACL 2010 Named Entities Workshop.


Motivated by phrase-based translation research, we present a transliteration system where characters are grouped into substrings to be mapped atomically into the target language. We show how this substring representation can be incorporated into a Conditional Random Field model that uses local context and phonemic information. Our training and test data consists of three sets: English to Hindi, English to Kannada, and English to Tamil (Kumaran and Kellner, 2007) from the NEWS 2009 Machine Transliteration Shared Task (Li et al., 2009).

Understanding Eggcorns. Sravana Reddy. In Proceedings of the NAACL 2009 Workshop on Computational Approaches to Linguistic Creativity.


An eggcorn is a type of linguistic error where a word is substituted with one that is semantically plausible -- that is, the substitution is a semantic reanalysis of what may be a rare, archaic, or otherwise opaque term. We build a system that, given the original word and its eggcorn form, finds a semantic path between the two. Based on these paths, we derive a typology that reflects the different classes of semantic reinterpretation underlying eggcorns.


These are non-archival papers and presentations at conferences.

A large-scale online study of dialect variation in the US Northeast: Crowdsourcing with Amazon Mechanical Turk. Chaeyoon Kim, Sravana Reddy, Ezra Wyschogrod, and James Stanford. In NWAV 2016.

Due to the Founder Effect and the early English colonies, the US Northeast has some of the smallest dialect sub-regions in North America. Can these fine-grained distinctions be observed in an online crowdsourced survey? Moreover, Carver predicts that new features emerge along the same lines as previous generations. Is this happening in Eastern New England? We used Amazon Mechanical Turk for two online crowdsourcing tasks: a self-reporting survey and an audio-recording task. We analyze the data graphically and statistically and compare with prior work.

Automatic speech recognition in sociophonetics. Sravana Reddy, James N. Stanford, and Michael Lefkowitz. Workshop (tutorial) in NWAV 2015. link

Is the Future Almost Here? Large-Scale Completely Automated Vowel Extraction of Free Speech. Sravana Reddy and James N. Stanford. In NWAV 2014.


Automatic Speech Recognition (ASR) is reaching farther into everyday life through applications like Apple’s Siri. Likewise, sociolinguists have been considering new technologies for vowel formant extraction, including semi-automated alignment/extraction techniques like the Penn Aligner and Forced Alignment Vowel Extraction (FAVE). With humans transcribing recordings into sentences, these semi-automated methods produce effective results. But sociolinguistics may be on the brink of another transformative technology: large-scale, completely automated vowel extraction without any need for human transcription. It would then be possible to quickly extract vowels from virtually limitless hours of recordings, such as YouTube, publicly available audio/video archives, and even live-streaming video. How far away is this transformative moment? In the present study, we apply state-of-the-art ASR to a real-world sociolinguistic dataset (U.S. Southern Vowel Shift) as a feasibility test.

A Twitter-Based Study of Newly Formed Clippings in American English. Sravana Reddy, James N. Stanford, and Joy Zhong. In ADS 2014.

slides press

Following Baclawski (2012), this study uses Twitter to examine newly formed clippings among younger speakers, including awks (awkward), adorb (adorable), ridic (ridiculous), hilar (hilarious). We analyzed 94 million tweets from 334,000 U.S. Twitter users who posted during 2013 (cf. Eisenstein et al. 2010; Bamman et al. 2012). We find that while women and men both use truncated forms, women are the leaders of the newer, primarily adjectival forms. These recently coined forms are also more common in tweets from urban locations. We compare our results to classic principles (Labov 2001), illustrating how large-scale Twitter analyses can be valuable in American dialectology.

A Document Recognition System for Early Modern Latin. Sravana Reddy and Gregory Crane. In DHCS 2006.

Large-scale digitization of manuscripts is facilitated by high-accuracy optical character recognition (OCR) engines. The focus of our work is on using these tools to digitize Latin texts. Many texts in the language, especially early modern ones, make heavy use of special characters like ligatures and accented abbreviations. Current OCR engines are inadequate for our purpose: their built-in training sets do not include all of these special characters, and further, since post-processing of OCR output relies on data and methods specific to the domain language, most current systems do not implement error-correction tools for Latin. This abstract outlines the development of a document recognition system for medieval and early modern Latin texts. We first evaluate the performance of the open-source OCR framework Gamera on these manuscripts. We then incorporate language modeling functions to sharpen the character recognition output.

Theses


Learning Pronunciations from Unlabeled Evidence. 2012. Doctoral Dissertation, The University of Chicago. front matter

Part of Speech Induction Using Non-negative Matrix Factorization. 2009. Master's Thesis, The University of Chicago.

Unsupervised part-of-speech induction involves the discovery of syntactic categories in a text, given no additional information other than the text itself. One requirement of an induction system is the ability to handle multiple categories for each word, in order to deal with word sense ambiguity. We construct an algorithm for unsupervised part-of-speech induction, treating the problem as one of soft clustering. The key technical component of the algorithm is the application of the recently developed technique of non-negative matrix factorization to the task of category discovery, using word contexts and morphology as syntactic cues.
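The factorization step can be illustrated with the classic multiplicative-update rules for NMF (Lee and Seung) on a toy word-by-context count matrix. This is a sketch of the general technique under that standard algorithm, not the thesis code, and the matrix below is invented for illustration:

```python
import numpy as np

# Multiplicative-update NMF: factor a nonnegative matrix V (words x contexts)
# into W (words x categories) and H (categories x contexts). Rows of W give
# each word a soft membership over the k induced categories.

def nmf(V, k, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 0.1
    H = rng.random((k, m)) + 0.1
    for _ in range(iters):
        # Updates preserve nonnegativity; small constant avoids division by zero
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# Toy counts: rows 0-1 and rows 2-3 behave like two different word classes
V = np.array([[3., 0., 1.],
              [2., 0., 1.],
              [0., 4., 2.],
              [0., 3., 2.]])
W, H = nmf(V, 2)
```

Because W is dense rather than a hard assignment, a word can carry probability mass in more than one category, which is how the soft-clustering formulation accommodates word sense ambiguity.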

Students Advised

Ian Stewart (Senior Thesis at Dartmouth). Now a PhD student at Georgia Tech.

  • Now We Stronger Than Ever: African-American Syntax in Twitter. In Proceedings of the Student Research Workshop at EACL 2014. paper

Emily Ahn (Senior Thesis at Wellesley). Now a student at Carnegie Mellon LTI.

  • A Computational Approach to Foreign Accent Classification. thesis

Teaching & Service

I was the local organizer for the North American Computational Linguistics Olympiad (NACLO) at Dartmouth, and one of the co-chairs of the demo session at NAACL 2016. I also review for *ACL conferences and workshops, and various journals.

I organized the Wellesley CS Colloquium in my last year.

Tools and Data

This is a collection of resources I created or collected for my work that may be useful to others.

Python Autograder with HTML Output (under development). Sravana Reddy and Daniela Kreimerman. 2016.
Autograder for the Introductory CS class at Wellesley.


Transcriptions for the CSLU Foreign-Accented English Corpus. Emily Ahn and Sravana Reddy. 2016.
The CSLU Foreign-Accented Speech Corpus is a great source of speech data from non-native English speakers. We crowdsourced transcriptions for 7 of the 23 native languages on Mechanical Turk, and are making them available here.


DARLA (Dartmouth Linguistic Automation). Sravana Reddy and James Stanford, with assistance from Irene Feng. 2015-2016.
DARLA is a suite of automated analysis programs tailored to research questions in sociophonetics.

website code available on request

Chicago Rhyming Poetry Corpus. Morgan Sonderegger and Sravana Reddy. 2011.
A collection of rhyming poetry in English and French, manually annotated with rhyme schemes.