Type: Personal Project
Team size: Solo
Timeframe: Jul. – Sep. 2020 (2.5 months)
My role: User Research, Prototyping, Usability Testing

The problem

Busy professionals and many home cooks often prepare meals while following recipes online. However, constantly going back to their phones to check instructions with wet or messy hands is a real problem. Voice assistants like Alexa seem like the perfect fix, so I started with the intention of designing a cooking assistant skill for Alexa.
However, I discovered that a myriad of cooking skills are already available.
Since I cooked every day because of the lockdown, I spent 4 days testing different skills that let Alexa find a recipe, list the ingredients, and take you through the steps.

Unfortunately, my experience mirrored the many unhappy reviews: the skills offered on the Alexa store are riddled with usability issues that impede adoption. Moreover, Alexa device owners are often not even aware that such skills exist.

Research scope

I conducted a usability study to understand the mismatch between home cooks' expectations of a cooking assistant and what it actually offers. I limited my study to 5 skills on Amazon’s Alexa skill store that provide assistance for the entire cooking process – planning, prepping, and cooking. My goals were to:
1. Understand the unmet needs of users when using the skills.
2. Understand how users currently perceive and discover features within the skill.
3. Explore opportunities to improve and make the skills more user friendly.

Research outcome

After many rounds of testing, I created a need-gap analysis and chose two user intents – Search for a recipe and Follow step-by-step recipe instructions – to redefine and redesign. Users particularly found the cost of interaction to outweigh the benefit when searching with voice alone. I proposed new features and improvements under the following overarching themes:

  • Align conversation flow to user’s mental model
  • Build personalized experience
  • Refine error handling 
  • Multi-modal interaction – Integrate skill with companion app in the absence of device display 
  • Set user expectations with short demo videos
  • Improve discoverability of skill
  • Improve support and user-guides
  • Target the right users

See the detailed research outcome

Read the in-depth case study


My research and design process



Alexa skills store

I chose skills that provide assistance for the entire cooking process – planning, prepping, and cooking – prioritizing ones built for an Indian audience.
I narrowed it down to the following:
1. Sanjeev Kapoor Recipes 
2. Recipe Speak
3. My CookBook
4. Home Cooking
5. Youchef  


Understanding home cooks

User research

I divided the research into two phases – surveying a broad set of people, followed by interviews with potential users from that set. I floated a questionnaire among my contacts and in Facebook cooking groups and received 146 responses from people of varied backgrounds and ages.

It helped paint a picture of the average Indian home cook: who they are, their processes around planning and cooking, and their motivations, challenges, and needs while following a recipe. It was clear that people aged 24–30, who have recently started cooking and enjoy trying out new recipes, are the ideal users for digital assistants. I interviewed 14 home cooks in this category and dug into their thought processes in detail.

When it came to the tasks for which a cooking assistant could provide support, home cooks fell into three categories – beginners, intermediate cooks, and veterans. Key findings:

  • Participants expect personalized search results. For instance, most prefer to search on YouTube because the results are tailored to their preferences – favorite channels, cuisine, and cooking style.
  • Participants who cook at least one meal every day keep a meal plan of dishes they can cook without any help, and can afford only 30–60 minutes to prepare a meal. For them, a voice assistant would be more useful for planning than for everyday cooking.
  • Even the more experienced cooks want to follow a recipe to the dot when it comes to baking.
  • Participants do not pay heed to ratings and reviews because they expect the platform to surface the best results on top.
  • 43% of the participants are conscious about following a healthy diet.
  • Indian cuisine is the preferred choice for any meal for the majority of participants (88%).
  • 22% of the participants own smart speakers. However, the majority have not used them beyond playing music, getting weather updates, and controlling the lights. I tried to find out why in my interviews.
  • Many participants mentioned efficiency as an important consideration in whether they would bother asking a question – if it was faster “to do it themselves,” they felt the assistant’s interaction was not worth the cost.
  • Some participants did not think to search for additional skills at all.
  • When asked about privacy concerns, most participants expressed strong skepticism toward the claim that agents only listen when triggered by their wake word. However, they said they would continue using them anyway. One participant said, “I post a lot of personal information online anyway, so I gave up on privacy long back.” Another said, “I am sure Google is listening through my phone anyway. I can’t do anything about that because I need my phone.”

Target Users

I summarized the research and created the following personas to guide my design process. I also created an anti-persona to make sure that I remember who I am not designing for.

Mental Model Mapping

I then analyzed data from the survey and stories from the user interviews to visualize users' current process, understand their expectations, and identify the features in popular Alexa skills that map to those expectations. Voice-only interfaces are new and unfamiliar to most users; to reduce the inherent learning curve, it is important to design an interaction model consistent with their previous experiences. I found mental model maps to be the best tool for this analysis.

There is clearly a large gap between users' expectations of intelligent assistants and what they currently offer, which is hindering adoption.


Is the assistant user-friendly?


Testing Intents

I started by reading the Alexa design guidelines and other blogs before jumping into testing because I wanted to make sure I understood how the conversation model for Alexa works. Plus, I was new to VUIs, and it helped me learn the different methods for testing them.

Intents & Utterances
Intents are the actions users can take within the skill. For example, in this case, Search for a Recipe would be an intent.
Utterances are the words or phrases that invoke a given intent. For example, a user saying “Find me a recipe for biryani” would invoke the Search for a Recipe intent.
Next, based on the mental model map I had created, I identified and listed all possible intents a user would have. Of the 14 intents I identified, only 8 were available in the assistants chosen for the study.
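The intent/utterance relationship can be sketched as a toy matcher. This is purely illustrative – Alexa's real NLU resolves utterances against a declared interaction model, and the intent names and sample utterances below are hypothetical stand-ins for two of the intents in this study.

```python
# Illustrative sketch only: a toy mapping from sample utterances to intents.
# Intent names and samples are hypothetical; Alexa's NLU does this resolution
# against the skill's declared interaction model, not with string matching.
from typing import Optional

SAMPLE_UTTERANCES = {
    "SearchRecipeIntent": [
        "find me a recipe for {dish}",   # {dish} stands in for a slot value
        "search for a {dish} recipe",
    ],
    "NextStepIntent": ["next", "next step", "ok", "done"],
}

def match_intent(utterance: str) -> Optional[str]:
    """Return the intent whose sample utterances match, else None."""
    words = utterance.lower().strip()
    for intent, samples in SAMPLE_UTTERANCES.items():
        for sample in samples:
            # Treat {dish} as a slot that matches any remaining words.
            prefix = sample.split("{")[0].strip()
            if "{" in sample and words.startswith(prefix):
                return intent
            if words == sample:
                return intent
    return None

print(match_intent("Find me a recipe for biryani"))  # SearchRecipeIntent
print(match_intent("next step"))                     # NextStepIntent
```

Real skills declare these samples in the interaction model rather than in code, but the mapping idea is the same.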

Testing plan
I then designed one or more scenarios for each intent, each defining a task for users to perform.
For example-
Intent: Search for a recipe
Scenario: #1
Task: You want to cook a simple delicious breakfast. Ask Alexa for recipes and narrow down to the one you like.
Expected outcome: User should be able to easily search and narrow down to a recipe in one go. User should not experience issues or get frustrated.
Observations: Did the user perform as expected? If not, why?  
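The test-plan template above could be captured as a small data structure so that each intent's scenarios are recorded consistently. This is just a sketch; the field names mirror the plan, not any formal testing framework.

```python
# Sketch: the scenario template (intent, task, expected outcome, observations)
# as a dataclass. The example values come from the test plan above.
from dataclasses import dataclass, field

@dataclass
class TestScenario:
    intent: str
    task: str                 # what the participant is asked to do
    expected_outcome: str     # what success looks like
    observations: list = field(default_factory=list)  # filled in during testing

scenario = TestScenario(
    intent="Search for a recipe",
    task="You want to cook a simple delicious breakfast. "
         "Ask Alexa for recipes and narrow down to the one you like.",
    expected_outcome="User searches and narrows down to a recipe in one go.",
)
# Hypothetical observation noted during a session:
scenario.observations.append("User got frustrated after the third result.")
```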

Finding users for the usability study
I reached out to friends and family who matched my primary user personas to participate in performing the test scenarios. I was able to perform some of the tests in person when they came over. However, due to the COVID restrictions, we had to do most over video call. I was able to get a total of 8 users to participate. 

Device used
Amazon’s Echo Input – a voice-only device with no display screen.

Intent: Search for a recipe

Intent: Step by step instructions


Re-imagining the experience

Design Goals: 
Reduce interaction cost while searching 
Integrate missing intents 
Increase skill and feature discoverability 

Reduce interaction cost while searching

In the research phase, I found that users are unable to decide based on the dish name alone. They want an image of the dish, or a summary of the key ingredients and a brief of the cooking process.

How and where to include this summary?
From the usability testing phase, I already knew that users do not find the description of the dish useful. They found the overview of ingredients, as presented in the Home Cooking skill, more useful. However, presenting 3 summaries one after the other was too much information for the user to process. So, I started experimenting with different conversation flows.
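One of the flows I experimented with can be sketched as presenting a single short summary per conversation turn, instead of reading three summaries back to back. The recipe names and summaries below are hypothetical placeholders.

```python
# Sketch (assumed data): present one search result per turn and let the
# user ask for the next, keeping each spoken response short.
recipes = [  # hypothetical search results
    ("Masala Omelette", "eggs, onion, green chilli; 10 minutes on the stove"),
    ("Egg Fried Rice", "leftover rice, eggs, soy sauce; one pan, 15 minutes"),
    ("Egg Bhurji", "eggs, tomato, spices; scrambled, 12 minutes"),
]

def one_result_per_turn(results):
    """Yield one short spoken prompt per conversation turn."""
    for i, (name, summary) in enumerate(results, start=1):
        yield (f"Result {i}: {name}. {summary}. "
               "Say 'cook this' to start, or 'next' for another option.")

turns = one_result_per_turn(recipes)
print(next(turns))  # first turn: only the first recipe is read out
```

The design choice here is to trade a longer overall dialogue for a lower per-turn memory load – exactly the balance the testing kept pointing at.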

Is multi-modal interaction better in the case of search? 
I wondered if the search experience should be offered on a visual interface rather than voice only. I sketched out different ideas for a multi-modal experience. I concluded that the most user-friendly option would be to provide a companion app for the skill. Users can search and decide the recipe on the app, and with their account linked to the Alexa skill, they can then follow instructions read out by Alexa.

Test scenario – “Search for a rice with egg recipe.”
I created simple mobile UI screens on Adobe XD which displayed search results for a recipe for the user to browse. 
I prepared a conversation script for this scenario with the “summary feature” to help the user choose. 

A/B Testing
I tested both prototypes with users, observed their interaction and took feedback on which they found better.  

Users preferred searching via the companion app on their phone and then using Alexa to guide them through the prepping and cooking steps. They said the summary helped them better understand the recipe; however, they still found the voice-only process slow and said they would have found the recipe faster on their phones.

Therefore, I concluded that multi-modal interaction is the best way to bring down the interaction cost of searching while still taking advantage of Alexa's hands-free experience.

User searching for the recipe on the app prototype
User following Alexa's instructions to cook the chosen recipe

Integrate missing intents

I made a list of all the utterances and actions users tried that were not understood by the skills I tested. I also used my mental model map to add other missing features.

1. Search for a recipe
2. Step by step walk through of the ingredients
3. Step by step walk through of the recipe
4. Stop
5. Repeat
6. Pause
7. Skip
8. Go back
9. More information – summary of ingredients in a recipe
10. More information – summary of the process the recipe follows
11. Search in favorite/saved recipes
12. Search from favorite recipes
13. Search from meal plan
14. Prepare meal plan
15. Cancel/Exit
16. Stop the instructions and go back

Increase skill & feature discoverability

I noticed three ways skill developers can raise awareness of the features and commands a skill supports:
1. The skill description section on the Alexa skill store page.
2. Publishing a user guide or demo videos.
3. Running advertisements.

I found that skills with a detailed user guide have better reviews and fewer comments from users saying they don’t understand how to use the skill. The Allrecipes skill has created several ads that set the right expectations, and it is therefore a very popular skill in the US.


Solutions for the usability issues

Improve Interaction Experience

Intent: Step by step instructions

Account for both Beginner and Intermediate user’s pace
Users should be able to choose whether they want to prep ingredients or start cooking.
The skills in the study already handle basic intents like Next, Previous, and Repeat for each step, which gives users some control over the pace. However, key intents that users expect – such as skipping to the end of the instructions, or cancelling them to logically end the process when they no longer need assistance – are missing.
I found that this frustrates users early on as they explore the skill and dissuades them from using it.

Handle unexpected utterances
First, make sure to include all possible synonyms for the utterances. For example, to go to the next step, the user might say “Next,” “Ok,” “Done,” or “Finished.”
If the user says anything else, direct them to the utterances that can be processed.
For example, Alexa can respond: “Sorry, I didn’t understand that. You can say Next, Repeat, Previous step, or Start over. You can also ask How much to know the quantity of an ingredient. If you want to stop this recipe, say Cancel.”
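This error-handling approach can be sketched in a few lines: normalize synonym utterances onto one intent, and fall back to a help prompt listing valid commands when nothing matches. The synonym sets and intent names below are illustrative, not taken from any real skill.

```python
# Sketch of the error-handling approach described above. Synonym lists
# and intent names are illustrative assumptions.
SYNONYMS = {
    "NextStepIntent": {"next", "ok", "done", "finished"},
    "RepeatIntent": {"repeat", "say that again"},
    "PreviousIntent": {"previous", "go back"},
    "CancelIntent": {"cancel", "stop the recipe"},
}

HELP_PROMPT = ("Sorry, I didn't understand that. You can say Next, Repeat, "
               "Previous step, or Start over. You can also ask How much for "
               "an ingredient quantity. To stop this recipe, say Cancel.")

def resolve(utterance: str) -> str:
    """Map an utterance to an intent; fall back to the help prompt."""
    phrase = utterance.lower().strip()
    for intent, variants in SYNONYMS.items():
        if phrase in variants:
            return intent
    return HELP_PROMPT

print(resolve("Done"))      # NextStepIntent
print(resolve("umm what"))  # falls back to the help prompt
```

In a production skill this fallback would live in the built-in fallback intent handler rather than in string matching, but the principle – absorb synonyms, then teach the valid vocabulary on a miss – is the same.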

Keep the session active throughout the process
The skill Sanjeev Kapoor Recipes does this brilliantly: to keep the skill active so that users don’t have to say “Alexa, ask Sanjeev Kapoor Recipes for the next step” every time, an audio cue plays to indicate that Alexa is actively listening.

Prompt the user
If there is no response after 20 seconds or so, prompt the user for a response.

Intent: Search for recipes

Multi-modal interaction
If skill providers already have a website or mobile application, it is best to direct users to do their searching and meal planning in the app, then invoke the step-by-step voice instructions on Alexa using the recipe name.

Companion App
The user directly invokes cooking instructions using the recipe name they found on the companion app

Personalization & Integration
The Alexa skill development SDK has the capability to personalize experiences and link external accounts to skills.
Based on my research, I recommend:
1. A short survey to classify whether the user is a beginner or an intermediate cook. I found that users do not mind answering these questions because they know from prior experience that it is in their best interest.
2. Personalizing responses according to the user's persona.

Keeping the skill name short
Users found long names hard to say. For instance, saying: “Alexa, ask Sanjeev Kapoor Recipes for the next step” was a mouthful. The name could have been “Chef Sanjeev” instead.

Setting the right expectation

The biggest impediment to adoption that I found was the interaction cost of going back and forth in conversation with Alexa, especially when searching for a recipe. The interaction cost feels even higher when users are not otherwise occupied and are giving Alexa their full attention.
Skill designers should show people multi-tasking while searching for a recipe – for example, deciding what to cook while washing the dishes. Powerful examples of the skill coming in handy when users' hands are occupied can persuade them to see the benefit of this mode of interaction.

Improving discoverability of skill capabilities
In the Alexa skill description, include a list of commands for the core intents and a link to a user guide. Also provide a link to a video showing the skill's features.


Biggest takeaways

Challenges with Voice Interface Design

High expectation from users
Since voice is a natural form of interaction, users get frustrated if they have to learn to talk in a different way. It is important to make this learning curve as small as possible.

Cognitive overload with voice-only interactions
I learned that even listing three long recipe names can cause the user to forget or miss out on the first one and get confused. It is also essential to strike a balance between giving too much information to save interaction cost and too little, making the process time-consuming.

Technology is still evolving
Assistants like Alexa, Google Home, and Siri are still evolving and learning to handle unexpected questions and intents that break the interaction flow. Different companies have their own constraints that make certain designs hard or impossible to implement. In addition, to protect users' privacy, active listening is not possible, and the “wake word” has to be uttered every time. Designers have to work around these constraints.

Lack of standardization and established design patterns
In a GUI, for example, all apps use common navigation patterns. However, due to the lack of established design patterns for voice, different skill providers have different implementations, which steepens the learning curve for users.

It needs more user research and testing compared to mobile or web apps
As a result of the above-mentioned challenges, I realized that the design process for a VUI needs to involve users a lot more.

Privacy concerns

Users had either completely resigned themselves to having no control over their privacy, or said they would never use a smart speaker at home. Moreover, they all said that privacy settings are hard to understand and navigate. Alexa device owners did not know that the mute button was a privacy feature and had never used it. Additionally, reports of Amazon misusing recordings of users' interactions with Alexa created further doubt and anxiety.

Using GUI design principles

I was able to find common design principles that apply to both GUI and VUI. For example, in both cases, designers need to provide just enough information to avoid information overload; we need to provide constraints to guide users through a certain path; we need to focus on the tasks users are trying to accomplish, think about their context, their needs, and provide solutions that best suit their needs within the context.

Future work

I would love to conduct user testing with a more diverse user base, especially visually impaired users. I believe this solution works better than screen readers and can make the process of cooking new recipes more accessible and faster for these users.

An opportunity to learn

Thanks to this project, I’ve learned more techniques for user research and testing – mental model mapping, task analysis, Wizard of Oz prototyping, and A/B testing.
