A Simple Solution for Prototyping Voice Interfaces

Posted 8 months ago by Shankar

Using the Shortcuts app on iOS

Voice UI was a hot topic last year and will continue to be one in 2018. One thing I have wondered about is how designers prototype conversational interfaces, especially voice UIs.

What is Shortcuts?

Shortcuts is an iOS app that lets you automate tasks ranging from calculating a tip to appending a note to a text file in Dropbox. What makes Shortcuts powerful is its access to system APIs: it can perform calculations as part of workflows, take input from inside and outside the app, write output to different apps, and more.

I use my iPad for a variety of tasks, and a few weeks back, while playing with Shortcuts, I came across its text to speech and speech to text workflow steps. Neither was new to me, since iOS text to speech has been around for a while and Siri transcribes dictation very well. This gave me a simple idea: what if Shortcuts could be used to prototype voice-based UI?

What is the goal?

One of the basic questions in conversational design is "what do people say?". The answer comes from observing real conversations, but what if you want to quickly test what people say in response to your bot/assistant/AI's prompts?

What do people say in response to your AI/Bot’s prompts?

Below are two example workflows that simulate a voice interaction. The first is an event creation workflow that passes natural language to Fantastical (an app that can parse natural language to create events). The second is a simple test of what the opening of a flight-booking conversation might look like; it stores the whole conversation as text for the researcher's later reference. This second example is a proof of concept for storing conversation transcripts when testing voice UI.

Example 1: Event Creation Shortcut

The goal of this workflow is to have the user create an event such as 'Lunch with Dad' by voice. The bot asks the following questions, and we expect an answer from the user for each one:

What is the title of the event?

What is the date of the event?

And the time?

Event creation workflow

The steps involved in this workflow are:

1. We simulate the bot's voice for each of the prompts above using the Speak Text step, then use the Dictate Text step in Shortcuts to get voice input from the user.

2. The voice input is transcribed and stored in a text variable.

3. We repeat these steps for each prompt, then combine all the variables to create the required output.
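For readers who think in code, the three steps above can be modelled in plain Python. This is a sketch of the conversation logic only, not how Shortcuts works internally. `speak` and `dictate` are hypothetical stand-ins for the Speak Text and Dictate Text actions, a dictionary plays the role of Set Variable, and the sentence template in `build_event_sentence` is an assumption chosen to match the parsed output described later in this post.

```python
from typing import Callable, Dict, List, Tuple

# Each (variable name, prompt) pair mirrors one Speak Text -> Dictate Text ->
# Set Variable sequence in the event creation workflow.
PROMPTS: List[Tuple[str, str]] = [
    ("Title", "What is the title of the event?"),
    ("Date", "What is the date of the event?"),
    ("Time", "And the time?"),
]


def run_conversation(
    prompts: List[Tuple[str, str]],
    speak: Callable[[str], None],
    dictate: Callable[[], str],
) -> Dict[str, str]:
    """Ask each prompt in turn and store the transcribed answer under its name."""
    variables: Dict[str, str] = {}
    for name, prompt in prompts:
        speak(prompt)  # stand-in for the Speak Text step
        variables[name] = dictate().strip()  # stand-in for Dictate Text + Set Variable
    return variables


def build_event_sentence(variables: Dict[str, str]) -> str:
    """Combine the stored answers into one sentence for a natural-language parser."""
    # Hypothetical template; the original workflow's exact wording is not shown.
    return f"{variables['Title']} at {variables['Time']} on {variables['Date']}"
```

In a terminal, `run_conversation(PROMPTS, speak=print, dictate=input)` gives a typed version of the test. Answering 'Lunch with Dad at 5pm', '25th December' and '10 pm' produces the same doubled time the author describes below, which is why naive concatenation is worth prototyping before building a real parser.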

Simulating the bot’s voice

Add a Text step containing what you want your bot to say first, then attach a Speak Text step after it. The Speak Text step reads that text aloud using text to speech.

Storing what the user has to say

Once this is done, add a Dictate Text step to your workflow so that you can capture what the user says. The next part is important: to capture what the user said in a form that can be passed to a text file or another app, you have to store it in a variable, so that all the variables can be pulled together in a later step. This is achieved with the Set Variable step. In essence, we are storing the output of Dictate Text in a variable called Title.

I won't go into the rest of the steps, as they are fairly self-explanatory once you look through the workflow. What matters more is the workflow's output, which I display as text inside the workflow so that we can see what is going on. This is the output:

Notice how the 'bot' has parsed the event as "Lunch with Dad at 5pm at 10 pm on 25th December". Looking back at the questions in the conversation, the title and the time of the event are asked separately. However, users tend to volunteer additional information, such as a time, when asked for the title. These are the kinds of behaviors that prototyping voice interfaces can uncover. Beyond understanding the behaviors, you also learn what to watch out for and how to tweak the conversation to fit the way people speak.
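One way to act on this finding, sketched in Python under the assumption that answers arrive as plain transcribed strings: check whether the title answer already contains a time, and if it does, drop the later time question. The regular expression and the `FOLLOW_UPS` list are hypothetical illustrations, a deliberately rough heuristic rather than a real date/time parser.

```python
import re
from typing import List, Optional, Tuple

# Rough, hypothetical pattern for clock times such as "5pm", "10 PM", "7:30 am"
# or "17:00". It misses forms like "5 p.m." or "noon".
TIME_PATTERN = re.compile(
    r"\b(?:\d{1,2}(?::\d{2})?\s?(?:am|pm)|\d{1,2}:\d{2})\b",
    re.IGNORECASE,
)

# Questions that follow the title question in the event creation workflow.
FOLLOW_UPS: List[Tuple[str, str]] = [
    ("Date", "What is the date of the event?"),
    ("Time", "And the time?"),
]


def find_time(answer: str) -> Optional[str]:
    """Return the first time expression in a transcribed answer, if any."""
    match = TIME_PATTERN.search(answer)
    return match.group(0) if match else None


def remaining_prompts(
    title_answer: str, prompts: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Drop the time question when the title answer already contains a time."""
    if find_time(title_answer) is None:
        return list(prompts)
    return [(name, text) for name, text in prompts if name != "Time"]
```

Dropping the question is only one option; confirming the volunteered time back to the user is another. Which one feels more natural is exactly the kind of thing a prototype session can reveal.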

Example 2: Booking flights scenario and storing conversation output in the Apple Notes app

Imagine you're doing voice design research with this prototype. This workflow shows how you can simulate a conversation and store the responses in the Notes app.

The steps involved are very similar to the previous workflow, but instead of sending any of the information on to book flights, we store the conversation in Apple's Notes app. The output could just as easily be stored in any app that accepts text. Here's what the final conversation looks like:
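The transcript-storing idea can be sketched the same way. This assumes each turn is recorded as a (speaker, text) pair; the `Transcript` class and the 'Bot:'/'User:' labels are hypothetical, and a plain text file stands in for the Notes app, since any app that accepts text would work.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Tuple


@dataclass
class Transcript:
    """Records each bot prompt and user reply in the order they happen."""

    turns: List[Tuple[str, str]] = field(default_factory=list)

    def bot(self, text: str) -> None:
        self.turns.append(("Bot", text))

    def user(self, text: str) -> None:
        self.turns.append(("User", text))

    def as_text(self) -> str:
        """Render one 'Speaker: text' line per turn."""
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)

    def save(self, path: Path) -> None:
        """Append the conversation plus a blank line so sessions stay separated."""
        with path.open("a", encoding="utf-8") as f:
            f.write(self.as_text() + "\n\n")
```

For example, after `t = Transcript()`, calling `t.bot("Where would you like to fly?")` and `t.user("New York, next Friday")` makes `t.as_text()` return two lines, `Bot: Where would you like to fly?` and `User: New York, next Friday`. These sample turns are made up for illustration. Appending rather than overwriting keeps every test session in one file for the researcher.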

Why this kind of prototype, again?

These examples may seem oversimplified, since adding an event or booking a flight are commonplace scenarios. The point, however, is that such a prototype helps you observe behaviors, especially how people actually speak. Since we are testing voice interfaces, it is important to prototype in the same medium.


This article was originally published on Shankar's Medium page.

Product Designer at Asurion. Alumnus of HCI/d at Indiana University. @shankarux on twitter.
