Type: Personal Project
Team size: Solo
Timeframe: Jul. – Sep. 2020 (2.5 months)
My role: User Research, Prototyping, Usability Testing
Busy professionals and many home cooks prepare meals while following recipes online. However, constantly referring back to instructions on a phone with wet or messy hands is a real pain point. Voice assistants like Alexa are a natural fix, so I started off with the intention of designing a cooking assistant skill for Alexa.
However, I discovered that there are a myriad of cooking skills already available.
Since I was cooking every day because of the lockdown, I spent four days testing different skills that enable Alexa to find a recipe, list the ingredients, and take you through the steps.
Unfortunately, my experience mirrored the many unhappy reviews. The skills offered on the Alexa store are riddled with usability issues that impede their adoption. Moreover, many Alexa device owners are not even aware that such skills exist.
I conducted a usability study to understand the mismatch between home cooks' expectations of a cooking assistant and what the skills actually offer. I limited my study to five skills on Amazon's Alexa skill store that provide assistance for the entire cooking process – planning, prepping, and cooking.
1. Understand the unmet needs of users when using the skills.
2. Understand how users currently perceive and discover features within the skill.
3. Explore opportunities to improve and make the skills more user friendly.
After many rounds of testing, I created a need-gap analysis, from which I chose two user intents – Search for a recipe and Follow step-by-step recipe instructions – to redefine and redesign. Users particularly found the cost of interaction to outweigh the benefit when searching with voice only. I proposed new features and improvements under the following overarching themes:
- Align conversation flow to user’s mental model
- Build personalized experience
- Refine error handling
- Multi-modal interaction – Integrate skill with companion app in the absence of device display
- Set user expectations with short demo videos
- Improve discoverability of skill
- Improve support and user-guides
- Target the right users
See detailed research outcome
Read in depth case study
My research and design process
Alexa skills store
Understanding home cooks
I divided the research into two phases – surveying a broad set of people, followed by interviewing potential users from this set. I floated a questionnaire among my contacts and in Facebook cooking groups and received 146 responses from people of varied backgrounds and ages.
It helped paint a picture of the average Indian home cook: who they are, their processes around planning and cooking, and their motivations, challenges, and needs while following a recipe. It was clear that people aged 24–30, who have recently started cooking and enjoy trying out new recipes, are the ideal users for digital assistants. I interviewed 14 home cooks in this category to understand their thought processes in detail.
When it came to the tasks a cooking assistant can support, home cooks fell into three categories – beginners, intermediate cooks, and veterans.
Participants expect personalized search results. For instance, most prefer to search on YouTube because the results are tailored to their preferences – favorite channels, cuisine, and cooking style.
Participants who cook at least one meal every day keep a meal plan that includes only dishes they can cook without any help. They can afford to spend only 30–60 minutes preparing a meal. For them, a voice assistant would be more useful for planning than for everyday cooking.
Even the more experienced cooks want to follow a recipe to the letter when it comes to baking.
Participants do not pay heed to ratings/reviews because they expect the platform to show them the best results on top.
43% of the participants are conscious about following a healthy diet
Indian cuisine is the preferred choice for any meal for the majority of participants (88%)
22% of the participants own smart speakers. However, the majority have not used them beyond playing music, getting weather updates, and controlling the lights. I tried to find out why in my interviews.
Many participants mentioned efficiency as an important consideration in whether they were likely to bother asking a question — if it was faster “to do it themselves,” they felt the assistant’s interaction was not worth the cost.
Some participants did not think to search for cooking skills at all – they were simply unaware that Alexa offered anything of the sort.
When asked about privacy concerns, most users expressed strong skepticism toward the claim that assistants only listen when triggered by their wake word. However, they said they would continue using them anyway. One participant said, “I post a lot of personal information online anyway, so I gave up on privacy long back.” Another said, “I am sure Google is listening through my phone anyway. I can’t do anything about that because I need my phone.”
I summarized the research and created the following personas to guide my design process. I also created an anti-persona to make sure that I remember who I am not designing for.
Mental Model Mapping
There is clearly a large gap between users’ expectations of intelligent assistants and what they deliver, and this gap is hindering adoption.
Is the assistant user-friendly?
I started by reading the Alexa design guidelines and other blogs before jumping into testing because I wanted to make sure I understood how Alexa’s conversation model works. I was also new to VUIs, and this helped me learn the different methods for testing them.
Intents & Utterances
Intents are the actions users can take within the skill. For example, in this case, Search For a Recipe would be an intent.
Utterances are words or phrases that invoke a given intent. For example, a user might say, “Find me a recipe for biryani”; this would invoke the Search For a Recipe intent.
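To make the intent–utterance relationship concrete, here is a minimal sketch of how an interaction model maps sample utterances to intents. The intent names, slot names, and sample phrases below are my own illustrative assumptions, not taken from any of the tested skills:

```python
# Sketch of an Alexa-style interaction model fragment.
# Intent names, slots, and samples are illustrative assumptions.
interaction_model = {
    "intents": [
        {
            "name": "SearchRecipeIntent",
            "slots": [{"name": "dish", "type": "AMAZON.Food"}],
            "samples": [
                "find me a recipe for {dish}",
                "how do I make {dish}",
                "search for a {dish} recipe",
            ],
        },
        {
            "name": "NextStepIntent",
            "slots": [],
            "samples": ["next", "next step", "done", "what's next"],
        },
    ]
}

def intents_for(utterance_template: str) -> list[str]:
    """Return the intents whose sample utterances include this phrase."""
    return [
        intent["name"]
        for intent in interaction_model["intents"]
        if utterance_template in intent["samples"]
    ]

print(intents_for("how do I make {dish}"))  # ['SearchRecipeIntent']
```

In a real skill, this mapping lives in the interaction model JSON uploaded to the Alexa developer console, and Alexa's NLU does the matching; the lookup here only illustrates the concept.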
Next, based on the Mental Model Map I had created, I identified and listed all possible intents a user would have. Of the 14 intents I identified, only 8 were available in the skills chosen for the study.
I then designed one or more scenarios for each intent, each defining a task for users to perform.
Intent: Search for a recipe
Task: You want to cook a simple delicious breakfast. Ask Alexa for recipes and narrow down to the one you like.
Expected outcome: User should be able to easily search and narrow down to a recipe in one go. User should not experience issues or get frustrated.
Observations: Did the user perform as expected? If not, why?
Finding users for the usability study
I reached out to friends and family who matched my primary user personas to participate in performing the test scenarios. I was able to perform some of the tests in person when they came over. However, due to the COVID restrictions, we had to do most over video call. I was able to get a total of 8 users to participate.
Amazon’s Echo Input – a voice-only device with no additional display screen.
Intent: Search for a recipe
Intent: Step by step instructions
Re-imagining the experience
Reduce interaction cost while searching
Integrate missing intents
Increase skill and feature discoverability
Reduce interaction cost while searching
In the research phase, I found that users are unable to decide based on the dish name alone. They want an image of the dish, or a summary of key ingredients and a brief description of the cooking process.
How and where to include this summary?
From the usability testing phase, I already understood that users do not find the description of the dish useful. They found an overview of the ingredients, as presented in the Home Cooking skill, more useful. However, presenting three summaries one after the other was too much information for the user to process. So, I started experimenting with different conversation flows.
Is multi-modal interaction better in the case of search?
I wondered if the search experience should be offered on a visual interface rather than voice only. I sketched out different ideas for a multi-modal experience. I concluded that the most user-friendly option would be to provide a companion app for the skill. Users can search and decide the recipe on the app, and with their account linked to the Alexa skill, they can then follow instructions read out by Alexa.
Test scenario – “Search for a rice with egg recipe.”
I created simple mobile UI screens on Adobe XD which displayed search results for a recipe for the user to browse.
I prepared a conversation script for this scenario with the “summary feature” to help the user choose.
I tested both prototypes with users, observed their interaction and took feedback on which they found better.
Users preferred searching via the companion app on their phones and then using Alexa to guide them through the prepping and cooking steps. Users said the summary helped them better understand the recipe; however, they still found the voice-only process slow and said they would have found the recipe faster on their phones.
I therefore concluded that multi-modal interaction is the best way to reduce the interaction cost of searching while still taking advantage of Alexa’s hands-free experience.
Integrate missing intents
I made a list of all the utterances and actions users tried that were not understood by the skills I tested. I also used my mental model mapping visualization to add other missing features.
1. Search for a recipe
2. Step by step walk through of the ingredients
3. Step by step walk through of the recipe
8. Go back
9. More information – summary of ingredients in a recipe
10. More information – summary of process recipe follows
11. Search in favorite/saved recipes
13. Search from meal plan
14. Prepare meal plan
16. Stop the instructions and go back
Increase skill & feature discoverability
I noticed three ways skill developers can increase awareness of the features and commands a skill supports:
1. The skill description section on the Alexa skill store page
2. Publishing a user guide or demo videos
3. Advertisements
I found that skills with a detailed user guide have better reviews and fewer comments from users saying they don’t understand how to use them. The Allrecipes skill has created several ads that set the right expectations, which is partly why it is so popular in the US.
Solutions for the usability issues
Improve Interaction Experience
Intent: Step by step instructions
Account for both Beginner and Intermediate user’s pace
Users should be able to choose whether they want to prep ingredients or start cooking.
The skills in the study already handle basic intents like Next, Previous, and Repeat for each step, which gives users some control over the pace. However, key intents users expect – such as skipping to the end of the instructions, or cancelling the instructions to logically end the process when they no longer need assistance – are missing.
I found that this frustrates the users early on as they explore the skill and dissuades them from using it.
Handle unexpected utterances
First, include all plausible synonyms for each utterance. For example, to go to the next step, the user might say “Next”, “Ok”, “Done”, or “Finished”.
If the user says anything other than these, direct them to the utterances that can be processed.
For example, Alexa can respond: “Sorry, I didn’t understand that. You can say Next, Repeat, Previous step, or Start over. You can also ask ‘How much?’ to hear the quantity of an ingredient. If you want to stop this recipe, say Cancel.”
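This synonym-plus-fallback flow can be sketched in plain Python. In a real skill it would be implemented with an AMAZON.FallbackIntent handler in the ASK SDK; the synonym lists and help prompt below are my own illustrative assumptions:

```python
# Sketch of synonym matching with a fallback reprompt.
# Synonym sets and the help prompt are illustrative assumptions.
SYNONYMS = {
    "next": {"next", "ok", "done", "finished"},
    "previous": {"previous", "go back", "back"},
    "repeat": {"repeat", "say that again"},
    "cancel": {"cancel", "stop the recipe"},
}

HELP_PROMPT = (
    "Sorry, I didn't understand that. You can say Next, Repeat, "
    "Previous step, or Start over. To stop this recipe, say Cancel."
)

def resolve(utterance: str) -> str:
    """Map a user utterance to a navigation intent, or reprompt."""
    text = utterance.strip().lower()
    for intent, phrases in SYNONYMS.items():
        if text in phrases:
            return intent
    # Unexpected utterance: guide the user instead of failing silently.
    return HELP_PROMPT

print(resolve("Done"))      # next
print(resolve("umm what"))  # the help prompt above
```

The key design point is the last line of `resolve`: an unrecognized utterance never dead-ends the conversation, it always routes back to a list of valid commands.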
Keep the sessions active throughout this process
The skill Sanjeev Kapoor Recipes does this brilliantly. To keep the skill active so that users don’t have to say “Alexa, ask Sanjeev Kapoor Recipes for the next step” every time, a tone plays to indicate that Alexa is still listening.
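Under the hood, a skill keeps the session open by returning `shouldEndSession: false` along with a reprompt in its response, so the user can just say “Next” without repeating the invocation name. A sketch of such a response payload (the speech text is my own; the field names follow the Alexa response format):

```python
# Sketch of an Alexa response that keeps the session open.
# Speech and reprompt text are illustrative assumptions.
def step_response(step_text: str) -> dict:
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": step_text},
            "reprompt": {
                "outputSpeech": {
                    "type": "PlainText",
                    "text": "Say Next when you're ready.",
                }
            },
            # False keeps the mic session open, so the user can say
            # "Next" directly instead of re-invoking the skill.
            "shouldEndSession": False,
        },
    }

resp = step_response("Step 2: Saute the onions until golden.")
print(resp["response"]["shouldEndSession"])  # False
```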
Intent: Search for recipes
If skill providers already have a website or mobile application, it is best to direct users to perform searching and meal planning in the app, and to invoke the step-by-step voice instructions on Alexa using the recipe name.
Personalization & Integration
The Alexa Skills Kit SDK supports personalizing experiences and linking external accounts to skills.
Based on my research I could recommend the following:
1. A short survey to classify whether the user is a beginner or an intermediate cook. I found that users do not mind answering these questions because they know from prior experience that it is in their best interest.
2. Personalizing responses according to the user’s persona.
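As a sketch of the second recommendation, a skill could store the surveyed skill level as a persisted attribute and branch the verbosity of each step on it. The attribute name and phrasing here are my own assumptions:

```python
# Sketch: tailoring instruction verbosity to the user's skill level.
# The "skill_level" attribute and the example text are assumptions.
def step_speech(step: str, detail: str, skill_level: str) -> str:
    """Beginners hear the step plus extra detail; others hear the short form."""
    if skill_level == "beginner":
        return f"{step}. {detail}"
    return step

print(step_speech(
    "Temper the mustard seeds",
    "Wait until they start to pop, about 30 seconds",
    "beginner",
))
```

In a deployed skill, `skill_level` would come from the persistent attributes saved when the user answered the onboarding survey.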
Keeping the skill name short
Users found long names hard to say. For instance, saying: “Alexa, ask Sanjeev Kapoor Recipes for the next step” was a mouthful. The name could have been “Chef Sanjeev” instead.
Setting the right expectation
The biggest impediment to adoption that I found was the interaction cost of going back and forth in conversation with Alexa, especially when searching for a recipe. The interaction feels particularly costly when users are not otherwise occupied: with their full attention free, simply using their phone is faster.
Skill designers should show people multi-tasking while searching for a recipe – for example, deciding what to cook while washing the dishes. Powerful examples of the skill coming in handy when users’ hands are occupied can persuade them to see the benefit of this mode of interaction.
Improving discoverability of skill capability
In the Alexa skill description, include a list of commands for the core intents and a link to a user guide. Also provide a link to a video showing the skill’s features.
Challenges with Voice Interface Design
High expectation from users
Since voice is a natural form of interaction, users get frustrated if they have to learn to talk in a different way. It is important to make this learning curve as small as possible.
Cognitive overload with voice-only interactions
I learned that even listing three long recipe names can cause the user to forget or miss out on the first one and get confused. It is also essential to strike a balance between giving too much information to save interaction cost and too little, making the process time-consuming.
Technology is still evolving
Assistants like Alexa, Google Home, and Siri are still evolving, learning to handle unexpected questions and intents that break the interaction flow. Different companies have their own constraints that make certain designs hard or impossible to implement. In addition, to protect users’ privacy, continuous active listening is not allowed, and the wake word has to be uttered every time. Designers have to work around these constraints.
Lack of standardization and established design patterns
In a GUI, all apps share common navigation patterns. For voice, however, there are no established design patterns yet, so different skill providers implement interactions differently, which steepens the learning curve for users.
It needs more user research and testing compared to mobile or web apps
As a result of the above-mentioned challenges, I realized that the design process for a VUI needs to include the user a lot more.