What is Voice Search Optimization (VSO)?

Voice Search Optimization (VSO)

Voice Search Optimization (VSO) is an approach focused on maximizing the number of visitors to a website or app by improving its visibility and suggestion rate (or recommendation rate) in voice search results. In general, the more often a site or app is suggested and the higher it is ranked among the recommendations, the more traffic it will receive. VSO can be applied to several voice search platforms, including but not limited to Google, Amazon’s Alexa, Apple’s Siri, and Microsoft’s Cortana.


  1. History
  2. VSO Methods
  3. The Growth of Voice Search
  4. Technology


History

Voice search uses speech recognition technology to enable users to make voice command search queries. As a result, search engines return answers and show a list of results. The search engine matches the query to the most relevant result while considering searcher history and behavior.

But before there was voice search, there was voice recognition technology. And before there was Siri, there was Audrey.

Bell Laboratories pioneered computerized voice recognition in 1952 when it designed “Audrey” to understand digits. The technology only recognized a single voice but was created with the idea of eventually allowing anyone to dial numbers on their phone by voice alone. IBM followed suit in 1962 when it presented Shoebox at the World’s Fair in Seattle. The computer understood 16 words as well as numbers zero through nine. From there, several voice recognition advancements followed in the succeeding decades:

  • The U.S. Department of Defense commissioned Carnegie Mellon’s Harpy speech-understanding system in the 1970s. Harpy represented a major jump in voice recognition technology, as it recognized approximately 1,011 words and introduced more efficient voice recognition by processing a whole sentence rather than individual words.
  • Bell Labs made more progress in the 1980s, as it programmed a computer to distinguish multiple voices, but with limited processing power, the system could only recognize words if they were spoken slowly, and had to be trained to a particular user’s word-base.
  • DragonDictate, the first consumer product, was introduced in the 1990s, but it cost $9,000. In 1997, Dragon NaturallySpeaking was released and could understand 100 words per minute, but it still required training and cost $695.

Voice recognition started to pick up speed in the 2000s when Google dedicated research to searching the web by voice. In 2008, Google added voice search to the BlackBerry version of Google Maps. Later that year, Google voice search was added as an app to other smartphones. In 2011, voice search was added to the Chrome browser.

Around the same time as Google’s voice search research, a Stanford Research Institute spin-off led by Dag Kittlaus, Adam Cheyer, and Tom Gruber worked on an app that understood natural speech. It was released as an iOS app in 2010, and Apple bought it months later. The app eventually became the basis for Siri, the voice recognition assistant that was built into Apple products and released with the iPhone 4S in 2011. Google added its voice assistant, Google Now, to phones in 2012.

While typed searches have been around longer than voice searches and have long been the basis of search engine optimization (SEO), it’s only natural that as voice searches grow, so does the need for voice search optimization (VSO).

Over the years, SEO has evolved from spammy keyword stuffing and black hat tactics to white hat tactics, thanks in part to Google and its search algorithm updates. In particular, Google and other search engines have moved to natural language search, which is search carried out in everyday, conversational language as opposed to keyword-based search. In addition, Google’s Hummingbird algorithm update in 2013 shifted the focus to semantic search: it takes into account the meaning and intent behind user searches, not just the terms. Therefore, pages that match the meaning of the search query rank better than those that just match a few words.

These shifts in search engine algorithms serve as the basis for voice search optimization.


VSO Methods

Optimizing your website for search continues to evolve. Just as the emphasis has shifted from desktop to mobile search optimization over the past several years, the future of search optimization is moving toward voice. And just as mobile search habits differ from desktop search and require different SEO tactics and strategy, voice has its own set of nuances that require a different strategy and different methods for optimization.

Focus on Long-tail Phrases

One of the key differences between voice search queries and typed searches is that voice searches tend to have longer phrases, including complete sentences and questions. The difference is in line with natural language search characteristics. While a business may have targeted some specific keywords for their website, with voice search, they must now focus on more long-tail keywords and full phrases. People are likely to speak in complete sentences and say more than they type. For example, if someone is planning a vacation to Florida, they may type “Disney World vacation” in a search box. But if they are speaking the search, they may say, “What are the best Disney World vacation packages?”

Businesses will need to perform keyword research on how people like to phrase queries and understand the terms they use. This research may include listening to how people talk about a business, industry, or product, and implementing that language on websites, apps and in content.
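
As a rough illustration of the difference between typed and spoken queries, the short Python sketch below tags a query as a question and as long-tail. The function name, the list of question words, and the four-word threshold are all assumptions made for this example, not an industry standard.

```python
import re

# Assumed list of question openers; adjust to match your own keyword research.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which"}


def describe_query(query: str, long_tail_min_words: int = 4) -> dict:
    """Summarize a search query: length, question form, long-tail status.

    The four-word cutoff for "long-tail" is an illustrative assumption.
    """
    words = re.findall(r"[a-z0-9']+", query.lower())
    return {
        "query": query,
        "word_count": len(words),
        "is_question": bool(words) and words[0] in QUESTION_WORDS,
        "is_long_tail": len(words) >= long_tail_min_words,
    }


typed = describe_query("Disney World vacation")
spoken = describe_query("What are the best Disney World vacation packages?")
print(typed)
print(spoken)
```

Running a list of real customer queries through a check like this can help separate short typed-style keywords from the conversational phrases worth targeting for voice.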

Use Questions and Answers

Voice search queries are more often questions than statements or phrases because people tend to ask questions in natural language. There’s been a 61 percent growth year-over-year in question search queries, and phrases that start with “who” have increased by 134 percent while phrases with “how” were up 81 percent. Businesses can leverage the trend by including questions on their website, particularly on frequently asked questions (FAQ) pages.

Businesses should create questions based on long-tail conversational keyword phrases that their potential customers would likely ask. The questions can be grouped by category or product, and different pages can be created for different categories so voice search technologies have a better chance of pulling the information from the site. Various tools can help with coming up with keyword question phrases, such as Answer the Public, Keyword Planner, Moz Keyword Explorer, and Google’s People Also Ask.

FAQ pages are just one strategy. It’s also beneficial to optimize existing content and create new content that answers the questions people will likely ask. Businesses can do this through blog posts, product pages, as well as headings and titles.

Voice search questions also reveal the intent of queries and where customers are on the buyer’s journey. For example:

  • What stores are in the Mall of America? — Reveals a degree of interest
  • How do I get to the Mall of America? — Reveals a greater degree of interest and action
  • Where is the Apple Store in the Mall of America? — Reveals customer is ready to act

Businesses can create content around various voice search questions as it relates to where a consumer is on the buyer’s journey.

Be Conversational

Content optimized for voice search should be in a conversational tone. Many businesses already use a more casual and conversational tone in their website content, particularly in their blog posts. But with voice search, it’s even more important to make the shift. The tone of a business’ website content should be based on how people speak and react in normal conversations.

Optimize for Local Search

Mobile voice searches are three times more likely to be local-based than text searches, and 42 percent of people say they use voice search while driving. All of this points to people looking for information via voice when they’re on the go, which is why optimizing for local search is vital to voice search optimization.

Approximately 22 percent of voice searches seek local content and information. If someone is searching for pizza in town, they’ll likely say, “find pizza near me” or “what’s the best pizza place near me?” Voice search will recognize the user’s location to call up search results. That’s why it’s more important than ever to have microdata up to date, which includes the business location, phone number, hours of operation, directions from highways, and other relevant information.

Search engines, especially those powered by Google, will go to Google My Business for this information, so businesses should make sure they’re listed there with the most up-to-date information. Businesses shouldn’t rely on Google alone but should also list their information on Bing Places and other search engine business sites. Though Google remains the dominant search engine, Siri, Apple’s voice assistant, uses Bing as its default search engine. Businesses should also list their information on review sites such as Yelp.

It’s also important to apply structured data markup using the Schema.org vocabulary. Structured data markup is code on a website that helps search engines return more informative results for users. It tells search engines what the data means, not just what it says, which makes it easier for them to parse a business’ content and understand its context.

Structured data markup is vital in local search because search engine results pages (SERPs) can display rich snippets built from schema markup. These snippets tell more about a page and a business and can appear above the top search result.
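
To make this concrete, here is a minimal Python sketch that builds local business markup as JSON-LD, one common way to embed Schema.org data in a page. The business details are invented placeholders and the helper name is an assumption for this example; check the property list against the Schema.org documentation for your business type before publishing.

```python
import json


def local_business_jsonld(name, street, city, region, postal_code, phone, opening_hours):
    """Return a Schema.org LocalBusiness description as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "telephone": phone,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
            "postalCode": postal_code,
        },
        "openingHours": opening_hours,
    }


# Placeholder business details, for illustration only.
markup = local_business_jsonld(
    name="Example Pizza",
    street="123 Main Street",
    city="Springfield",
    region="IL",
    postal_code="62701",
    phone="+1-555-0100",
    opening_hours="Mo-Su 11:00-22:00",
)

# JSON-LD is placed in the page inside a script tag of this type.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(markup, indent=2)
    + "\n</script>"
)
print(script_tag)
```

The printed tag can be pasted into a page template, and a structured data testing tool can confirm that search engines read it as intended.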

Optimizing for Home Devices

The latest trend in voice search is home virtual assistants such as Amazon Echo and Google Home. These are devices without screens, so they present a new territory in search optimization. These devices answer queries with conversational answers without the opportunity for users to explore search results.

For answers to queries, Google Home, for example, relies on Google’s featured snippets. Featured snippets, or rich answers, are brief explanations that appear at the top of search results pages (above the number one search result). These snippets began appearing in 2015 and are also called knowledge cards. Here’s how to create featured snippets:

  • Identify questions you want to answer and make sure they appear on the website page within a header tag
  • Put content that should appear in featured snippet box beneath the header in a paragraph tag and make sure it is in conversational language
  • Next, add valuable supporting content beyond the direct answer
  • Add structured data so Google can better understand the type of content on the page
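
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name is hypothetical, the question sits in an h2 header, and the structured data uses Schema.org’s FAQPage type. None of this guarantees that a search engine will select the content as a featured snippet.

```python
import html
import json


def snippet_section(question: str, answer: str, supporting: str) -> str:
    """Build a page fragment: question header, direct answer, supporting copy,
    and FAQPage structured data describing the same question and answer."""
    faq = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
        ],
    }
    return "\n".join(
        [
            f"<h2>{html.escape(question)}</h2>",    # question in a header tag
            f"<p>{html.escape(answer)}</p>",        # concise, conversational answer
            f"<p>{html.escape(supporting)}</p>",    # supporting content
            '<script type="application/ld+json">',  # structured data
            json.dumps(faq, indent=2),
            "</script>",
        ]
    )


page = snippet_section(
    "What is voice search optimization?",
    "Voice search optimization is the practice of improving how often a site "
    "is suggested in voice search results.",
    "It builds on SEO with a focus on conversational, question-based queries.",
)
print(page)
```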

Another feature of home voice devices is their ability to perform actions, not just answer queries with information. For Google Home, businesses can create apps and particular conversation actions within those apps that let users do things through a business’ products and services. Amazon’s Echo relies on skills (apps) to provide data and perform actions for users. To gain visibility on either product’s platform, brands can build conversational actions for Google or skills for the Amazon store that can respond to voice commands. Many larger brands such as Domino’s Pizza, Uber, and Stubhub have already done this.

Even if a company doesn’t create an app or skill, it can take advantage of how these devices produce results. For example, Echo uses Yelp’s database to respond to local business queries. It uses Bing for results that aren’t correlated with a skill, so businesses would be wise to optimize their websites for both Google and Bing to increase their chances of supplying an answer on the two primary voice devices.


The Growth of Voice Search

Since voice search arrived on the scene, it has continued to grow and shows no signs of slowing down. It’s not just a new technology that people wanted to try; it’s a disruptive technology that is changing the way people search and even purchase goods and services.

Voice search is an easier and more natural way to find information. The average person can type 40 words per minute but can speak 150 words per minute. Technology is about the speed of information: if there is a way people can get information faster, they’ll use it. That’s why voice search is desirable. In fact, 43 percent of people say they use voice search because it’s quicker than using a website or an app.
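
To see what those speeds mean for a single search, the small Python calculation below compares typing and speaking the same query. The eight-word sample query is reused from the Disney World example earlier in this article, and the calculation assumes the average speeds quoted above apply to short queries.

```python
TYPING_WPM = 40     # average typing speed cited above
SPEAKING_WPM = 150  # average speaking speed cited above


def seconds_to_enter(word_count: int, words_per_minute: int) -> float:
    """Time in seconds to enter a query of the given length."""
    return word_count * 60 / words_per_minute


query = "What are the best Disney World vacation packages"
word_count = len(query.split())

typed_seconds = seconds_to_enter(word_count, TYPING_WPM)
spoken_seconds = seconds_to_enter(word_count, SPEAKING_WPM)

print(f"Typing:   {typed_seconds:.1f} seconds")
print(f"Speaking: {spoken_seconds:.1f} seconds")
print(f"Speaking is {typed_seconds / spoken_seconds:.2f} times faster")
```

For this query, typing takes roughly 12 seconds while speaking takes a little over 3.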

About a third of people also find voice search to be more accurate than typing a search. Mobile phone ownership also plays a role in the rise of voice search, as 21 percent of people don’t like typing on their mobile phone and prefer voice search. In the U.S., 95 percent of people own mobile phones and 77 percent own smartphones.

With a better user experience, voice searches are growing, and agencies are starting to offer voice as a service. Google voice search queries are 35 times higher than they were in 2008, when the company first rolled out the feature, and one in five searches on the Google Android app is a voice search.

Much of the projected growth in the industry is tied to the emergence of voice devices, such as Amazon’s Echo and Google Home. Just look at the growth the last three years:

  • 2015: 1.7 million voice-first devices were shipped
  • 2016: 6.5 million voice-first devices were shipped
  • 2017: 25 million voice-first devices are expected to be shipped.
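
The year-over-year pace behind those figures can be worked out with a few lines of Python. The shipment numbers are the ones listed above (with 2017 being a projection); the function name is just a label for this sketch.

```python
# Voice-first device shipments in millions, from the list above.
shipments_millions = {2015: 1.7, 2016: 6.5, 2017: 25.0}


def year_over_year_multiples(series: dict) -> dict:
    """Map each year (after the first) to its multiple of the prior year."""
    years = sorted(series)
    return {
        year: series[year] / series[prev]
        for prev, year in zip(years, years[1:])
    }


multiples = year_over_year_multiples(shipments_millions)
for year, multiple in multiples.items():
    print(f"{year}: {multiple:.1f}x the previous year")
```

Both steps come out at a little under fourfold growth per year.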

Accuracy of Voice Search

The accuracy of voice recognition devices and apps is a key component of voice search. If Siri or Alexa can’t understand what a user is saying, people won’t use it. Voice recognition has come a long way since Audrey and Shoebox a half century ago and has made impressive strides even in modern systems.

Back in 2013, Google’s voice search had an accuracy rate around 80 percent, meaning it got 20 percent of spoken searches wrong. That figure has improved dramatically, as it is now about 95 percent accurate. The goal is to be 99 percent accurate, and some products are inching closer to that number.
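
Put in terms of mistakes rather than accuracy, the improvement is easier to appreciate. The Python sketch below converts the accuracy figures quoted above into errors per 100 spoken searches; it assumes accuracy is measured per search, as the paragraph above describes.

```python
def errors_per_hundred(accuracy_percent: int) -> int:
    """Number of misunderstood searches out of every 100."""
    return 100 - accuracy_percent


milestones = [("2013", 80), ("today", 95), ("goal", 99)]
for label, accuracy in milestones:
    print(f"{label}: {errors_per_hundred(accuracy)} errors per 100 searches")

# How many times fewer errors at 95 percent than at 80 percent accuracy.
reduction = errors_per_hundred(80) / errors_per_hundred(95)
print(f"Errors fell by a factor of {reduction:.0f}")
```

Moving from 80 to 95 percent accuracy means four times fewer misunderstood searches, and reaching 99 percent would cut the remaining errors by another factor of five.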

Baidu, China’s primary search engine, is even more accurate than most humans at identifying words, with a 96 percent accuracy rate. It can also transcribe words three times faster than humans and understands English and Mandarin. Voice search is particularly popular in China because typing Chinese, which uses thousands of characters rather than an alphabet, is time-consuming. Microsoft’s latest system is said to be close to 94 percent accurate, while Siri and the Hound app are each 95 percent accurate.

Various factors impact the accuracy rate including regional accents, speech impediments, mispronunciations, and background noise. The technology has to distinguish homophones (words with the same pronunciation but different meanings) as well as learn new words and proper names.


Technology

Voice search devices and apps use a combination of technologies to provide results, including natural language processing (NLP), text to speech (TTS), pre-programmed language voice search tools, and artificial intelligence (AI). AI powers voice search technology: it helps search algorithms understand user intent by using semantics, search history, user behavior, and other factors to interpret the context of queries. For example, it takes into account the user’s location, the time of day, and other factors when determining intent and providing results.

Many voice search programs use deep learning, a subset of AI, which involves a system ingesting a significant amount of data to train neural networks—systems modeled after the human brain—to make predictions based on that data. Neural networks improve the quality and accuracy of voice search.

Voice Search Platforms

There are several voice search platforms and products on the market, and more will be added. Here are the most prominent ones:


Siri

Siri is Apple’s virtual assistant and was the first one installed on a phone. It uses a natural language user interface to answer questions, make recommendations, and perform actions by delegating requests to internet services. It’s available on Apple hardware products including the iPhone, iPad, iPod Touch, Mac, and Apple TV. Apple opened up access to Siri to third-party messaging apps as well as payments, ride-sharing, and internet calling apps in 2016. Siri’s default search engine is Bing, but users can change the search preferences.

Google Now and Google Assistant

Google Now and Google Assistant are virtual voice assistants powered by Google and available as apps. The two are expected to merge into one assistant but remain separate for now. Google Now came before Assistant and is available on Android and iOS devices. The app lets users quickly search the web and performs a variety of tasks via voice commands.

Google Assistant does the same things as Google Now and is available through the Google Allo chat app and is integrated into the Google Home device. The difference is Assistant provides information in a more conversational format and presents information in an easy-to-tap card format instead of a search page. It also uses deeper artificial intelligence than other virtual assistants and can hold two-way conversations with users, learn personal details about them, and recall information from previous conversations.


Cortana

Cortana is Microsoft’s virtual assistant and is available on Windows 10, Windows 10 Mobile, Windows Phone, Xbox, as well as on Android and iOS. It can set reminders, answer voice queries, and perform tasks. It can also search for files on a user’s computer and OneDrive and compose emails. All search results come from Microsoft’s Bing and links open in Microsoft Edge.

Amazon Echo

Amazon Echo is a smart speaker that connects to a voice-controlled assistant, Alexa, and responds to user commands. Alexa can perform several functions by voice command including playing music, setting alarms, making to-do lists, and providing real-time information. The Echo has access to skills (similar to apps) which are third-party developed capabilities that can perform certain tasks such as making purchases, ordering food, or calling an Uber. Like apps, businesses can customize Alexa skills to accomplish desired functionality.
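
For a sense of what a skill looks like behind the scenes, the Python sketch below handles a request and returns a spoken reply. It is a simplified illustration rather than a production skill: the JSON shape follows the request and response envelope of Amazon’s custom skill interface as we understand it, the business and the intent name `OpeningHoursIntent` are hypothetical, and a real skill would typically use Amazon’s SDK and a defined interaction model.

```python
def handle_alexa_request(event: dict) -> dict:
    """Return a spoken response for a simplified Alexa-style request."""
    request = event.get("request", {})
    request_type = request.get("type")
    intent_name = request.get("intent", {}).get("name")

    if request_type == "LaunchRequest":
        # The user opened the skill without asking anything specific.
        speech = "Welcome to Example Pizza. What would you like to know?"
        end_session = False
    elif request_type == "IntentRequest" and intent_name == "OpeningHoursIntent":
        # Hypothetical intent defined by the business for opening hours.
        speech = "We are open from 11 a.m. to 10 p.m. every day."
        end_session = True
    else:
        speech = "Sorry, I can't help with that yet."
        end_session = True

    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": end_session,
        },
    }


sample_event = {
    "request": {"type": "IntentRequest", "intent": {"name": "OpeningHoursIntent"}}
}
reply = handle_alexa_request(sample_event)
print(reply["response"]["outputSpeech"]["text"])
```

The conversational answer text matters as much as the code here, since the device reads it aloud and the user has no screen to fall back on.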

The Echo is an Internet of Things device as it can connect to other home devices and control lighting, thermostats, and other smart home products.

Google Home

Similar to the Echo, Google Home is a smart speaker powered by Google Assistant. It responds to voice commands and can distinguish voices of different users. It connects with other Google products which can respond to commands to view photos and videos, play music, and edit calendars. It will soon allow users to make phone calls and control Chromecast-enabled televisions. It also connects to other home devices and can control certain smart home devices.


Bixby

Bixby is Samsung’s virtual assistant that was introduced with the Galaxy S8 and replaces the old voice assistant, S-Voice. The assistant has three platforms: Bixby Voice, Bixby Vision, and Bixby Home.

Bixby Voice is a voice assistant similar to Google Assistant. It can perform many of the same functions and can also open and use apps and perform search queries, though it doesn’t appear to be tied to a particular search engine. The more it is used, the more it learns about the user, and users can give it feedback on how well it’s responding to commands.

Bixby Vision is built into the phone camera and can identify objects in real time, search for them on various services, and offer the user the option to purchase them if available. Bixby Home includes other virtual assistant functions such as providing weather and fitness activity information and controlling smart home gadgets.


Hound

Hound is a virtual assistant app for Android and iOS that is similar to other assistants but considered faster and able to handle more complex queries. It can understand multiple questions at once and find the human context. For example, you can ask “What’s the cost of a three-night hotel stay at the Hilton and the Marriott in Philadelphia?” and it will give you the price of each hotel. It also remembers your questions so you can ask follow-up questions. For example, you can ask it to find a coffee shop that has free Wi-Fi, and afterward, you can tell it to exclude Starbucks in the search, and it will understand. The company has a partnership with Yelp, so businesses should make sure their information is listed on the review site.

In any given month, over 325 million people use some type of voice-controlled function. With the growing popularity of virtual assistants and voice-activated home devices, voice search is becoming more prominent. VSO is the next evolution of SEO, and businesses should build a VSO strategy into their marketing plans.

About the Author

Alyssa is an SEO Specialist at Teknicks where she develops, implements, and executes SEO growth strategies. When she’s not working, she enjoys spending time at the beach, attending concerts, and experimenting in the kitchen.
