An Analysis of the Multivocality of Starbucks


There are over 20,000 students currently enrolled at UCSB, ranging from swarms of fresh-faced freshmen to a sprinkling of full-fledged adults returning to school to earn their degrees. Because the UCSB University Center (UCEN) is the physical heart of the campus, it is almost a guarantee that every student has, at the very least, walked through the UCEN to pick up their Access Card, visit the food bank, buy textbooks, or relax. The sheer number of people who pass through the UCEN daily makes it safe to assume that a large proportion of students have visited the UCEN Starbucks. As the caffeine epicenter of a sleep-deprived university, the UCEN Starbucks is a perfect location to observe a cacophony of voices and study the multivocality of public spaces. While the vocality of Starbucks may appear to begin and end at the register, a more detailed study of the voices within Starbucks reveals a multitude of stylized, mediated, non-traditional, and emotive voices, as well as the creation of a collective voice within a public space.

Observational Information: February 18, 2020 (10:00-11:00am & 4:00-5:00pm), Balcony adjacent to UCEN Starbucks

My partner, Natalie Sanchez, and I made the majority of our observations during our first and second visits to the UCEN Starbucks from the outdoor balcony seating area, within earshot of multiple Starbucks patrons and employees. We recorded additional observations while physically exploring the space and noting the spatial relationships that influenced how voices interacted with one another. With a line extending nearly twenty people outside of the Starbucks during our first visit and a constant atmosphere of sound, we opted to focus on three traditional voices and vocal interactions, two non-traditional voices, one mediated voice, and the collective voice.

The first set of voices we identified belonged to two older bilingual men who flowed seamlessly between English and Spanish as they spoke in the late morning of our first site visit. From our observational perspective, when the two men conversed in English, they spoke in lower voices and maintained a balanced pitch and timbre. However, when their conversation switched back to Spanish, their voices opened up, induced laughter, and were qualitatively sharper and more expressive. Furthermore, correlating their voices with their body language, the two were physically more relaxed and at ease with one another when speaking in Spanish.

The next set of voices we observed occurred between an interviewer and interviewee during our second site visit. The context of their exchange inherently empowered the interviewer, which was apparent through her voice. She spoke succinctly and authoritatively; her voice did not waver and was the vehicle that carried her role as a potential supervisor. Comparatively, the interviewee’s voice was distinctly stylized and performative, featuring a chipper pace that was undercut by nervous tension: towards the end of the interviewee’s sentences, her voice pitched upwards.

The third traditional voice we chose to explore was that of the barista shouting out completed drinks during both visits. The barista’s voice was the only voice we chose to include that did not exist within the confines of a conversation. It was inherently performative and stylized because her job required her to be heard. She constructed her voice to be louder and brasher, and as a result, her voice became deeper than her normal vocal range. Moreover, because the barista, and the drinks she held, were the primary focus of the customers, her voice was inherently privileged. It would cause noticeable dips in the collective voice as attention turned towards her.

Within the multilayered vocal atmosphere of Starbucks, two non-traditional voices were omnipresent: quacking ducks and scraping chairs. While these two voices are entirely different in their production and sonic qualities, they both contribute to establishing a broader collective voice and building the spatial boundaries of our location, which I explore in more depth in the next section.

In the absence of a live performance and a live voice, the vocal atmosphere of the Starbucks was supplemented by a mediated voice through the in-store speakers. The mediated voice emitted by the speakers occupied a mid-level volume and was present during all points of our observations. While the song being played through the speakers changed about every three minutes, the mediated voice it embodied curated the vibe of the Starbucks and operated as the outline for the collective voice.

Overall, the voices we observed during both visits to the UCEN Starbucks acted as independent agents but operated within the broader collective atmosphere of voices. For example, how I interpreted the voices of the two bilingual men influenced how I interpreted the voice of the interviewee. The clothes and body language of Starbucks employees and patrons created biases about what I assumed their voices would be and potentially masked vocal attributes that didn’t conform to my expectations.

Analysis and Conclusions

During our two site visits, my partner and I observed, on numerous occasions, how a single voice must negotiate within the broader collective voice and directly interact with other traditional and non-traditional voices to be contextualized and to be a holistic carrier of information. The interaction between the interviewer and interviewee, as noted above, was characterized by the power disparity between the pair. From a position of power, a speaker’s voice has the advantage of being relaxed and within a healthy range because the speaker has a level of immunity, all of which is expressed through their voice. This becomes evident when a voice of power is juxtaposed against a weaker or more vulnerable voice. For example, the interviewee’s voice is an aspect of how they are being evaluated, and as a result, it has to find a balance between authenticity and enthusiasm. However, in the creation of a more stylized voice, unnatural hesitation can be created, which we observed through our subject’s tonal shifts and uneven pace. By pairing these two voices together, the performative elements of the interviewee’s voice crystallized and succinctly reflected the relationship between the subjects (Lecture Notes, Feb. 10, 2020, 1).

Additionally, the voices in the interview acted as components of the collective and affected how we, as observers, interpreted other voices at our site. The palpable tension between the interviewer and interviewee emphasized the relaxed, friendly conversational and vocal qualities of the two bilingual men. Subtler aspects associated with the voice, such as a regular breathing rate and natural, synchronized body language that corresponded with vocal cues, were more evident between the two men when set against the interview. Voice represents an external extension of a person’s individuality and is a carrier of emotions. As a result, interpersonal vulnerability and comfort have an extreme influence on how a voice sounds.

In addition to the litany of traditional voices among customers and between baristas and customers, the mediated voice of the in-store music defined the collective voice. Starbucks’ music as a form of “the media” represents a unique example of how “the media” influences society outside of more traditional modes such as social media, television, and radio. Every three or so minutes, a new indie pop, soft rock, or bedroom pop song would begin playing through the store speakers. With every shift of the music, the collective vocal atmosphere would maintain the status quo and enable the continuation of conversation and the creation of voices. However, if Starbucks had decided to insert a classical piano or heavy metal song, the equilibrium between traditional voices and the mediated voice would have shattered. The music choices made by Starbucks established a curated “vibe” in which people felt comfortable interacting openly with one another within the store. Through the mediated and controlled voice of its playlist, Starbucks assumed the role of “the media” by acting as a platform for music and voices that intrinsically influenced how its patrons acted and used voice. While Starbucks does not operate as a traditional member of “the media,” it functions as one by mediating our relationships to others and the world through its chosen music (Lecture Notes, Jan. 27, 2020, 2).

As noted above, the barista who was responsible for calling out completed drinks had a privileged voice within Starbucks: she had extraordinary power over the customers because she held what they wanted. During her shouts, other voices were deprioritized in response to the attention given to the barista, joining the already deprioritized voices of people speaking on their phones. Conversations that took place between two people in Starbucks were given a natural amount of privacy and space, and their voices were respected. In comparison, one-sided voices spoken through a phone were commonly ignored by other individuals, which alludes to the personal components of a voice and to how the spatial context a voice operates in contributes to the priority it is given by others and in the collective voice. During my observations, I often overlooked and deprioritized phone calls because they were not visibly present in the way the in-person voices I prioritized were.

The final component of the collective voice was the non-traditional voices that reflected the time and energy of our site. During the late morning rush, the collective voice was bounded by scraping chairs that sharply cut through the cluttered vocal atmosphere at a constant rate as customers came and went. During the same late morning rush, flocks of ducks called out from the adjacent lagoon, working in conjunction with the scraping chairs, as well as the voices of whirring espresso machines and clanking cash registers, to transform the collective voice from a collection of spoken words into a combination of spatially relevant sounds.

In her essay “Honorable Harvest,” Robin Wall Kimmerer alluded to the silent voices of rocks found in nature and extended the definition of voice away from a purely auditory experience and into a broader form of transactional communication between two beings (Kimmerer 186-187). This alternate definition of voice gives credence to the voice of the sun and other inanimate objects surrounding us. It allows us to interpret their impacts on the collective voice not as an auxiliary component but as a central aspect of the multivocal atmosphere. When the sun was speaking with a bright, shining tone in the morning, conversations followed the sun’s lead, whereas later in the day, closer to sunset, when the voice of the sun weakened, so did the traditional voices of Starbucks. By incorporating the sun as a voice and considering the role of music as a mediated voice produced by Starbucks, the multivocal authenticity of our site must be called into question. The mediated voice inserted by Starbucks curated the collective voice the store contained. Starbucks created an environment that enabled a specific kind of voice that aligned with the corporate vision of a Starbucks. Through a mediated voice, Starbucks creates social barriers that isolate aggressive, harsh, and loud voices that do not conform to the desired collective voice. The curated voice of Starbucks is further influenced by the natural ebbs and flows of its environment, such as the intensity of the sun. Ultimately, through our site observations, we concluded that a traditional voice begins as an independent creation, but its context constructs the final emitted sound.

Works Cited

Kimmerer, Robin Wall. An Excerpt from Braiding Sweetgrass: Indigenous Wisdom, Scientific Knowledge and the Teachings of Plants. 11 Aug. 2015.

Strobel, Nicole. “INT 137 EV: Unit 6: Voice, Sound, and Sense.” Interdisciplinary Studies -- Exploring Voices, 10 Feb. 2020, University of California, Santa Barbara. PowerPoint presentation.

Tcharos, Stefanie. “INT 137 EV: Unit 4: Access to Voices.” Interdisciplinary Studies -- Exploring Voices, 27 Jan. 2020, University of California, Santa Barbara. PowerPoint presentation.