Siri and the Implications of Speech Recognition

It knows what you’re doing this very instant. A humble servant, it manages your daily activities without compensation. It never sleeps. Finally, the average citizen can afford a personal assistant that talks, listens, and works. And it even fits in your pocket. Siri, the voice-controlled personal assistant for the iPhone 4S, takes technological convenience to a new level. From scheduling meetings to sending text messages, Siri allows the user to do more than ever before without pressing a single button. While Apple loyalists collectively grumbled at the perceived lack of innovation in the new iPhone, Siri quietly peeked its virtual head out from the shadows. Through this application, built into the operating system of every new iPhone 4S, Apple Inc. revolutionized the future of the interface. The technological protocols of speech recognition and natural language processing, with which Siri operates, have begun to change how we use the technologies we take for granted every day. Through the implementation of these design choices, the market will become standardized, consolidating power in the hands of a few and paving the way for the user-input technologies of the future. Consequently, the design choices intrinsic to Siri will have far greater implications for our conceptions of “human” and “machine,” prompting new fears or novel rewards. Siri illustrates how the design choices implemented by a minority affect our use of technology and the market that supplies it, and force us to continuously redefine the human/machine dichotomy.

The technological protocols of Siri influence our use of technologies and who implements them. In fact, the user-input technologies that preceded Siri, with its speech recognition and natural language processing, raise this same question of access. Tim Bajarin, in an article for the technology blog Tech.pinions, discusses Apple Inc.’s innovation throughout the last three decades. Starting in 1984, with the introduction of the Mac, Apple delivered to the world the graphical user interface, as well as the mouse (Bajarin, 2011). While Apple created neither of these technologies, the combination of the two proved “the next user input device that has been at the heart of personal computing for nearly two decades” (Bajarin, 2011). The graphical user interface and the mouse radically changed how we interacted with digital information, a far cry from the text-based interfaces of the technologies before them. The combination also established Apple Inc. as the company to provide this product, and the Mac asserted its dominance in the market. Similarly, in 2007, the introduction of the iPhone “created the touch-user interface and this time married it to the Apple iOS” (Bajarin, 2011). Once again, Apple and its chief executive, Steve Jobs, realized the potential of touch computing and applied existing technologies to change how we interact with digital information.

Apple continued this trend of innovative user-interfaces with Siri, which uses speech recognition and natural language processing to remodel our interactions with technologies. While Siri seems like a spiffed-up chatterbot for the new millennium, its programming is much more complex. After your speech is encoded into digital form, the “server compares your speech against a statistical model to estimate, based on sounds you spoke and the order in which you spoke them, what letters might constitute it” (Nusca, 2011). While the original chatterbot, ELIZA, merely interpreted text and rephrased it in questions, Siri attempts to understand the intentions behind your words (Stokes, 2011). After the speech is understood as letters, Siri then processes them “through a language model, which estimates the words that your speech is comprised of” (Nusca, 2011). Finally, after interpreting the natural language in speech, Siri uses programmed outputs to respond to the recognized words. Unlike a chatterbot, which is confined to the selected outputs the engineer originally programmed, Siri can utilize the cloud. This means that “every time an Apple engineer thinks of a clever response for Siri to give to a particular bit of input, that engineer can insert the new pair into Siri’s repertoire instantaneously, so that the very next instant every one of the service’s millions of users will have access to it” (Stokes, 2011). With the services of the cloud, we can expect the number of responses Siri has at its command to increase rapidly, as users ask more questions, and engineers program more answers. The user-input technology of speech recognition is a collaborative effort, which produces for the first time the semblance of an intelligent conversation between human and machine.
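The steps described above can be made concrete with a deliberately tiny sketch in Python. It is not Apple's implementation: the word list, the frequencies, the response table, and every function name below are hypothetical stand-ins, and a simple edit-distance match takes the place of the statistical models Nusca describes. The sketch only shows the shape of the idea: noisy letter guesses are mapped to known words, the words are matched against a table of input/response pairs, and that table can be extended at any time, unlike an ELIZA-style bot that merely rephrases its input.

```python
# Toy sketch only: every name and value here is hypothetical, not Apple's code.

# Stand-in for the language model: known words with made-up frequencies.
WORD_FREQ = {"what": 8, "time": 7, "is": 10, "it": 9, "call": 6, "mom": 3}

# Stand-in for the cloud-hosted table of (input keywords -> response) pairs.
SERVER_RESPONSES = {("what", "time"): "Let me check the clock."}


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def decode_words(letter_guesses):
    """Pick, for each noisy letter string, the closest known word.

    Ties on distance go to the more frequent word.
    """
    return [
        min(WORD_FREQ, key=lambda w: (edit_distance(guess, w), -WORD_FREQ[w]))
        for guess in letter_guesses
    ]


def respond(words):
    """Return the first stored response whose keywords all appear in the input."""
    for keywords, reply in SERVER_RESPONSES.items():
        if all(k in words for k in keywords):
            return reply
    return "Sorry, I did not understand that."


def eliza_reply(text):
    """ELIZA-style contrast: rephrase the input as a question."""
    return f"Why do you say: {text}?"


if __name__ == "__main__":
    words = decode_words(["wat", "tyme", "iz", "it"])
    print(words, "->", respond(words))

    # An engineer adds a new pair; the very next query can use it.
    print(respond(["call", "mom"]))
    SERVER_RESPONSES[("call", "mom")] = "Calling Mom."
    print(respond(["call", "mom"]))

    print(eliza_reply("I need a personal assistant"))
```

Because the response table lives in one shared place in this sketch, adding a pair changes the answer for every later query, which is the point Stokes makes about the cloud.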

The institutionalization of the technological protocol of speech recognition increases the potential for this design choice to become the next widely adopted user-input technology, which would have major effects on the market and the producers of such technologies. Judging by the history of Apple’s advancement in interfaces, the way we access information is changing. From text to touch, user-input technologies have conquered the senses, and the next frontier logically seems to be voice and speech. Although Apple did not invent the technologies of speech recognition or natural language processing, it has led the surge to “make the man-machine interface easier to use” (Bajarin, 2011). The mouse reduced the need to type, and the touch screen cut out the equally unnecessary steps of pointing and clicking. Speech has the potential to become the next phase in user-input technologies because it makes interfacing even easier, freeing the hands and reducing machine interactions to a conversation. Indeed, Vladimir Sejnoha, CTO of Nuance, most likely the world’s leading speech technology company, reiterates that “speech is no longer an add-on. It is a fundamental building block when designing the next generation of user interfaces” (Wildstrom, 2011).

If speech recognition is the future, then Apple Inc. is the company that will usher it in. Alexander Galloway, in his chapter “Protocol vs. Institutionalization,” stresses that “while the internet is used daily by swaths of diverse communities, the standard-makers at the heart of this technology are a small entrenched group of techno-elite peers” (Galloway, 187). Only a minority possess the skill sets and education to design the technologies we use, so the power to decide how we interact with our technologies is left to a homogenized group of engineers and programmers. Nowhere is this more evident than with Siri and speech recognition. Just as it has over the last three decades, “Apple has been the main company to popularize these new inputs and thus help advance the overall way man communicates with machines” (Bajarin, 2011). This means that how we access our information is decided not only by a minority but by a brand, and its judgment is independent of our wants and desires. This centralized power comes from Apple’s innovation, and its grip on the market can be loosened through competition. As we have seen with the iPod and iTunes, though, competition is scarce and often weaker. Time will tell who can stand up and challenge the dominance of Apple, but until then the power to decide the user-input technologies we use rests with the company that thinks of them first, and that has been and continues to be Apple Inc.

According to Galloway, this unrivaled control is actually necessary for greater freedom. In order to create free technologies, protocols like speech recognition “must promote standardization in order to enable openness” and “organize peer groups into bureaucracies” (Galloway, 196). Through the dominance of Apple and applications like Siri, the protocol of speech recognition becomes the standard for user-input technologies. This standard arises not through the willful force of an authoritarian company, but through the consensus of the technology community that decides which protocol works best. Apple implements the protocol, which prompts technological standardization, but then opens the market up to competitors to engage with this standard. In fact, Galloway adds that “the process of standards creation is, in many ways, simply the recognition of technologies that have experienced success in the market place” (Galloway, 188). Although companies like Apple and applications like Siri lead the initial development of protocols, this dominance actually opens the market to experimentation and competition once the standard is accepted.

The design choice of speech recognition in Siri produces a standardized future for user-input technologies that will challenge our perceptions of what humanity means. For the past twenty years, we have interacted with machines in ways that stress their distinction from humanity. Textual typing, graphical icons, and pointing and clicking are not actions humans can simulate with their limbs or senses. Therefore, we have never needed to challenge the assumption that these technologies are “artificial.” By contrast, Siri offers a natural way to interact with the environment, and as Steve Wildstrom points out in his 2011 article for Tech.pinions, “touch and speech have been around since we were living in caves” (Wildstrom, 2011). Although the comparison is greatly simplified, we essentially perform the same actions as Siri. We listen to speech, recognize words from a common lexicon, and respond based on the information received. Although we are not programmed by engineers to reply with specific outputs, we could be said to be conditioned by the society we inhabit to reply with particular responses. Siri may inhabit the iPhone, but the basic steps it takes to listen and reply are actions humanity performs constantly.

Using Siri is also a dialogue between human and machine, an encounter no interface before it has provided. Not only do we communicate directly with Siri, expecting a response from a program, but we help it to grow just as much as it helps us in our daily routines. Every input the user provides gives Siri more information about the habits of humans in general and of the individual user, prompting more programming from its human engineers as well as customized responses. Technology is becoming a conversation rather than an application, a two-way dialogue where each side asks questions and clarifies misunderstandings. The icon of the graphical user interface was static, but Siri is constantly changing to accommodate us better, and isn’t adaptation a trait humanity prizes most? Through conversation and interaction, Siri is challenging our perceptions of what “human” really is by bearing a striking resemblance to the human characteristics we cherish.

Like Siri, technologies of the past have always contributed to a changing definition of humanity, as we strive to discover the capabilities of machines. Jessica Riskin, in her article “The Defecating Duck, Or, The Ambiguous Origins of Artificial Life,” points out how “not only has our understanding of what constitutes intelligence changed according to what we have been able to make machines do, but, simultaneously, our understanding of what machines can do has altered according to what we have taken intelligence to be” (Riskin, 623). Intelligence is a highly valued trait in humanity, and our superior use of it is arguably what makes us human. But the definition of intelligence, like the term “human,” changes according to the capabilities of “intelligent” machines. As each new technology accomplished more advanced feats, old paradigms of intelligence were demoted to being mere mechanical processes. When a machine was shown to be capable of calculation at the beginning of the nineteenth century, humans decided that “if a machine could calculate, then something else – say, decision making or language – must be emblematic of human intelligence” (Riskin, 628). Rather than determining which traits most defined humans, early philosophers and engineers sought the actions a machine could not accomplish, and proposed that whatever a machine could not do must be human. As recently as 1997, the supercomputer Deep Blue beat grandmaster Garry Kasparov, which prompted a “redefinition of intelligence to exclude the ability to play chess as a defining feature” (Riskin, 623). These constant changes in our own perception of humanity and machine demonstrate the historical contingency of any definition of intelligence, and foreshadow a future of uncertainty in the increasing abilities of technologies. Ten years ago, a machine could not talk or understand the words and vernacular of human speech, let alone remind us to pick up our laundry when we leave the office. But now it can, and the human characteristic of language or speech recognition is just as much a part of Siri as it is a part of us.

The design choices of Siri that prompt us to reconsider our definitions of “human,” just as those of technologies in the past did, also influence our perception of machines in both positive and negative ways. Koert van Mensvoort, in his article “Anthropomorphobia” for the blog Next Nature, takes this blurry line between man and machine and describes its consequences. He describes anthropomorphobia, or “the fear of recognizing human characteristics in non-human objects,” which corresponds with the common belief that there is only “a human capacity for thought and intent” (Mensvoort, 2011). This fear comes from a natural human inclination, and does not mean that such technologies are designed with the intention of seeming human. Mensvoort describes one theory for anthropomorphobia as stemming from the fact that “people fundamentally dislike products acting like humans because it undermines our specialness as people” (Mensvoort, 2011). If Siri can speak and listen, and better yet understand, what makes me unique? This question has constantly been rephrased throughout history, as described above, and it feeds a fear of technology that prompts some to resent it and refuse to embrace it. Another theory is that “anthropomorphobia is a reaction to the inadequate quality of the anthropomorphic products we encounter” (Mensvoort, 2011). Since we naturally seek human characteristics in technology, we are bound to be disappointed by their limited success. Until now, a computer scarcely resembled the human mind to a user. But with Siri, clear characteristics of human action are evident in its very use, and some might reject it because it does not perform them well enough. Indeed, Mensvoort notes that some researchers suggest “anthropomorphism in product design must always be avoided because it generates unrealistic expectations, makes human-product interaction unnecessarily messy and complex, and stands in the way of the development of genuinely powerful tools” (Mensvoort, 2011). In this hypothesis, the seemingly useful human-like qualities of Siri actually distract from its intended goal of assisting us.

On the opposite side of the spectrum, anthropomorphism in products can lessen the distinction between human and machine and produce benefits in the future. Anthropomorphic qualities in technologies help us to understand their uses through convenient approaches, such as speech and voice. Whereas pointing and clicking is an action most of us never have to use outside a computer, speech recognition is something we do every day. Mensvoort stresses that “anthropomorphism, if applied correctly, can offer an important advantage because it makes use of social models people already have access to” (Mensvoort, 2011). The design choices that make us question what is machine and what is human also give us familiar ways to operate and use technologies. Moreover, if people become comfortable using a machine that acts like a human, they might also embrace the idea of a human acting like a machine. Donna Haraway, in her chapter “A Cyborg Manifesto: Science, Technology, and Socialist-Feminism in the Late Twentieth Century,” describes how this transformation is occurring. Haraway states that “by the late twentieth century, our time, a mythic time, we are all chimeras, theorized and fabricated hybrids of machine and organism”; in other words, we have already become the cyborgs of science fiction (Haraway, 150). In the future, this ambivalent description of humanity will continue to progress, and the more technological design choices of Siri may permeate us. Haraway describes how one consequence of our merging with machines is that “our sense of connection to our tools is heightened” (Haraway, 178). As we further notice anthropomorphic traits in machines, and attribute classical machine traits to our own features, we may become even more linked to the technologies we use. This symbiosis can prove beneficial, and the distinction we currently enforce may hinder a more complete use. As Haraway states, “the machine is not an it to be animated and dominated,” but rather it “is us, our processes, an aspect of our embodiment” (Haraway, 180). The technological protocol of speech recognition in Siri challenges our conceptions of human and machine, potentially alleviating any need for a distinction in the first place.

Communication is radically changing. The standard paradigm of humans talking only to humans is a thing of the past. Through Siri, humans talk to machines, and machines talk back. The revolutionary technological protocol of speech recognition influences the hierarchy of power in an increasingly uniform market, and sets a standard for how we access and use technologies. As this protocol is widely adopted, our current definitions of human and machine are constantly being redefined, as they have been through the increasingly advanced technologies of the past two centuries. This ambiguous perspective on the machines we use can lead to anthropomorphobia, prompting fear and a rejection of the technologies meant to make our lives easier. Alternatively, the anthropomorphism inherent in the design choices of Siri can result in a better appreciation of the machines we use, and even of the people who choose to incorporate these technologies further into their lives. Speech recognition can change how we access information in dramatic ways, and Siri is only the beginning.

Bibliography 

Bajarin, Tim, “Why We Witnessed History at the iPhone 4S Launch,” Tech.pinions, Oct. 10, 2011. http://techpinions.com/why-we-witnessed-history-at-the-iphone-4s-launch/3288

Galloway, Alexander R., “Protocol vs. Institutionalization,” in New Media, Old Media: A History and Theory Reader, Chun & Keenan (eds.), 2006.

Haraway, Donna, “A Cyborg Manifesto: Science, Technology, and Socialist-Feminism in the Late Twentieth Century,” in Simians, Cyborgs and Women: The Reinvention of Nature, New York, Routledge Press, 1991.

Mensvoort, Koert van, “Anthropomorphobia,” Next Nature, Oct. 11, 2011. http://www.nextnature.net/2011/11/anthropomorphobia/

Nusca, Andrew, “Say a Command: How Speech Recognition Will Change the World,” Smartplanet, Nov. 2, 2011. http://www.smartplanet.com/blog/smart-takes/say-command-how-speech-recognition-will-change-the-world/19895?tag=content;siu-container

Riskin, Jessica, “The Defecating Duck, Or, The Ambiguous Origins of Artificial Life,” Critical Inquiry 29, no. 4 (Summer)

Stokes, Jon, “With Siri, Apple Could Eventually Build a Real AI,” Wired Cloudline, Nov. 25, 2011, http://www.wired.com/cloudline/2011/10/with-siri-apple-could-eventually-build-a-real-ai/

Wildstrom, Steve, “Nuance Execs on iPhone 4S, Siri and the Future of Speech,” Tech.pinions, Oct. 10, 2011. http://techpinions.com/nuance-exec-on-iphone-4s-siri-and-the-future-of-speech/3307
