I am still working on a functional project. For now, I will document my process, thoughts, and struggles up until this point.
The purpose of this assignment was to “Create something that takes non-speech input from a person and responds with speech synthesis.” I had two approaches to this assignment.
1. Phone Prank Caller
I played around with the API from communications provider Vonage, which lets users have a robocaller read a text prompt to a given call recipient. I chose not to present this project as my assignment because the API is very polished and would effectively have auto-generated a high-quality applet requiring little to no work (I would also have had to pay to call people other than myself, which is lame). That being said, I could see myself using this tool as part of one of my prospective thesis concepts: a tool that automates personal information retrieval from data brokers.
2. Posh to Cockney Accent
In class, Nicole He showed us how to work with the SpeechSynthesis interface of the Web Speech API, as documented by Mozilla. I wanted to adapt her sample code, which alters word pronunciations as it reads text back, so that the voice selected by the IETF language tag for British English (en-GB) speaks in a “cockney” accent instead of its default “posh” one. The “posh” accent is better known as Received Pronunciation, chosen as the “correct” accent of Great Britain because it was the dialect of English spoken by royalty. Cockney, on the other hand, is a regional dialect; any presumptions of social status that one may attach to it are based on one’s acceptance of Received Pronunciation as the correct way to speak English. I find the lack of regional diversity among speech synthesis voices frustrating – who gets to decide which region represents a country’s language, and how do they make that decision? Validating Received Pronunciation without any disclaimer plays into the inherent classism of its adoption, and I would prefer that my use of speech technologies not perpetuate this norm.
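The idea above could be sketched roughly as follows. This is not Nicole He's sample code, only a minimal illustration under stated assumptions: the function names `cockneyify` and `speakCockney` and the respelling table are hypothetical, and the handful of rules (a few word swaps plus dropping a word-initial "h") are nowhere near a full model of Cockney. The browser calls (`speechSynthesis.getVoices`, `speechSynthesis.speak`, `SpeechSynthesisUtterance`) are standard Web Speech API members; whether respelled text actually sounds convincing depends on the voice the browser provides.

```typescript
// Hypothetical respelling table: a few words whose Cockney pronunciation
// can be approximated by spelling them differently for the synthesizer.
const WORD_RESPELLINGS = new Map<string, string>([
  ["think", "fink"],
  ["thing", "fing"],
  ["with", "wiv"],
  ["brother", "bruvver"],
  ["water", "wa'er"],
]);

// Respell text so an en-GB voice reads it with some Cockney features.
function cockneyify(text: string): string {
  return text
    // Swap whole words found in the table (case-insensitive lookup).
    .replace(/[A-Za-z]+/g, (word) => WORD_RESPELLINGS.get(word.toLowerCase()) ?? word)
    // H-dropping: remove an "h" that starts a word and precedes a vowel.
    .replace(/\bh(?=[aeiou])/gi, "");
}

// Speak the respelled text in a browser. Returns false when the
// Web Speech API is unavailable (for example, when run under Node).
function speakCockney(text: string): boolean {
  const g = globalThis as any; // avoids depending on DOM type definitions
  if (!g.speechSynthesis || !g.SpeechSynthesisUtterance) return false;

  const utterance = new g.SpeechSynthesisUtterance(cockneyify(text));
  // Ask for British English; this applies even if no voice is picked below.
  utterance.lang = "en-GB";
  // getVoices() can be empty until the browser has loaded its voice list.
  const voice = g.speechSynthesis.getVoices().find((v: any) => v.lang === "en-GB");
  if (voice) utterance.voice = voice;

  g.speechSynthesis.speak(utterance);
  return true;
}
```

Calling `speakCockney("I think her brother has water with him")` in a browser would hand the string "I fink er bruvver as wa'er wiv im" to the en-GB voice. Keeping the respelling in its own pure function means the rules can be tried out and extended without a browser.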