Team: Solo (an exploratory project I carried out independently)
Timeframe: Jul – Sept 2020
Methods: Generative and evaluative research, Interaction Design

The problem

Busy professionals and many home cooks prepare meals by following recipes online. However, repeatedly returning to the instructions on their phones with dirty or wet hands is a major pain point. Voice assistants like Alexa seem like the perfect fix, so I initially set out to design a cooking assistant skill for Alexa.
I soon discovered, however, that many cooking skills are already available.

The cooking assistant skills available on the Alexa store are riddled with usability issues that impede their adoption. I wanted to understand why.

Research goals

  1. What are the needs of home cooks?
  2. Are popular Alexa skills meeting users’ needs?
  3. What are the opportunities to improve and make the skills more user-friendly?


Key recommendations

Multimodal interaction: Use GUI for search and VUI for step-by-step instructions
Direct users to search and plan meals on a companion app to reduce interaction costs. They can then invoke the step-by-step voice instructions on Alexa using the recipe name.

User searches for recipe on the Companion App
User then invokes cooking instructions using the recipe name they found on the companion app

Add an intent for assistance to prep ingredients
Many home cooks in my research prefer to gather ingredients on the kitchen counter before starting to cook, and they go back and forth between the recipe and the counter to perform this task.

Account for both Beginner and Intermediate users’ needs
For intermediate users, support intents like skipping to the end of the instructions or canceling them to logically end the process when they feel they don’t need assistance anymore.
For beginners, make sure to include all possible synonyms for the utterances. E.g., to go to the next step, the user can say – “Next”, “Ok”, “Done”, “Finished”, etc.
If the user says anything other than these, direct the user to the utterances that can be processed.
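A minimal sketch of this synonym-and-fallback handling in plain Python (the synonym set, phrases, and function name are illustrative, not from a real skill):

```python
# Hypothetical sketch: map synonym utterances to a "next step" action,
# and redirect unrecognized utterances to the phrases the skill understands.

NEXT_STEP_SYNONYMS = {"next", "ok", "done", "finished"}

def handle_utterance(utterance: str) -> str:
    """Return the assistant's spoken response for a step-navigation utterance."""
    normalized = utterance.strip().lower()
    if normalized in NEXT_STEP_SYNONYMS:
        return "Moving to the next step."
    # Fallback: instead of failing silently, tell the user what they can say.
    options = ", ".join(sorted(NEXT_STEP_SYNONYMS))
    return f"Sorry, I didn't catch that. You can say: {options}."
```

In a production skill, this mapping would live in the interaction model's sample utterances rather than in handler code, but the principle is the same: cover the synonyms beginners actually say, and make the fallback instructive.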

Keep sessions open throughout the intent
Lengthen the session by playing a jingle while waiting for the user to move to the next step. The audio indicates to the user that Alexa is actively listening. This way, users do not have to open the skill again if they take too long to complete the task.
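In implementation terms, Alexa's skill response JSON includes a `shouldEndSession` flag; leaving it `False` keeps the session open after the response is spoken. A minimal sketch of such a response (the helper name and speech text are illustrative):

```python
# Illustrative sketch: build a response in the Alexa skill response JSON shape
# that keeps the session open while the user works through the current step.

def build_step_response(speech_text: str) -> dict:
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            # shouldEndSession=False keeps the session alive, so users do not
            # have to re-invoke the skill before saying "next".
            "shouldEndSession": False,
        },
    }
```

A reprompt (the jingle idea above) would be layered on top of this so the user knows Alexa is still listening.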




I divided the research into 2 phases – surveying a broad set of people followed by interviewing potential users from this set. I floated a questionnaire among my contacts and Facebook cooking groups and got 146 responses from people with varied backgrounds and ages.

It helped paint a picture of the average Indian home cook: who they are, their processes around planning and cooking, and their motivations, challenges, and needs while following a recipe. It was clear that people aged 24-30 who have recently started cooking and who enjoy trying out new recipes are the ideal users for digital assistants. I interviewed 14 home cooks in this category to understand their thought processes in depth.


Home cooks can be divided into three categories based on their needs, cooking process, and behaviors – Beginners, Intermediate cooks, and Veterans. Beginners like to be methodical, follow a recipe, and cook at a slow pace. Intermediate cooks skip steps, improvise, and cook at a faster pace. Veterans are clearly not the target users for assistants.

Mapping out the user’s cooking process flow helped me understand their mental models and expectations when they would interact with the voice assistants.

Key Insights

Users look for multiple data points to select the recipe they desire.

Users prefer to search on YouTube because of personalized recommendations.

Users search based on multiple query types.

When deciding what to cook, users look for multiple things - ingredients, a quick view of the process, ratings, and prep time.

Users refer to the recipe to prep ingredients and lay them out before starting.

Users go back and forth between the recipe on their phones while cooking.

Comparative analysis

The insights uncovered led to the next question – are popular Alexa cooking assistant skills meeting users’ needs? A comparative feature analysis helped me find the answer.

As we can see in the mapping below, there is clearly a gap between the user's mental model and the features provided.

Need for deeper analysis

Based on the findings, two primary intents needed deeper evaluation.


Searching for a recipe


Following the step-by-step instructions


Testing with users

Testing plan
For each intent, I created scenarios that defined a task for users to perform.
For example-
Intent: Search for a recipe
Scenario: #1
Task: You want to cook a simple delicious breakfast. Ask Alexa for recipes and narrow down to the one you like.
Expected outcome: User should be able to easily search and narrow down to a recipe in one go. User should not experience issues or get frustrated.
Observations: Did the user perform as expected? If not, why?

Recruiting users 
I reached out to friends and family who matched my primary user personas to participate in performing the test scenarios. I was able to run some of the tests in person when they came over; however, due to COVID restrictions, we had to conduct many over video call. In total, 8 users participated.

Device used
Amazon’s Echo Input – a voice-only device with no display screen.

Intent: Search for recipe, Usability log
Intent: Step-by-step instructions, Usability log



Based on the findings from research, I prioritized the following design challenges to address in my recommendations

  • How might we reduce interaction costs while searching?
  • How might we support both Beginner and Intermediate home cooks?
  • How might we increase skill and feature discoverability?


Reduce interaction costs while searching
In the research phase, I found that users cannot decide based on the dish name alone. They want an image of the dish, or a summary of key ingredients and a brief overview of the cooking process.

How and where to include this summary?
From my usability testing, I already understood that users do not find the dish’s description useful. They found an overview of the ingredients, as presented in the Home Cooking skill, more useful. However, presenting three summaries one after the other was too much information for the user to process. So, I started experimenting with different conversation flows.

Is multi-modal interaction better in the case of search?
I wondered if the search experience should be offered on a visual interface rather than voice only. I sketched out different ideas for a multi-modal experience. I concluded that the most user-friendly option would be to provide a companion app for the skill. Users can search and decide on the recipe on the app, and with their account linked to the Alexa skill, they can then follow instructions read out by Alexa.

Integrate missing intents & utterances
I made a list of all the intents and utterances users attempted during my testing that the skills did not understand. Additionally, I used the process model mapping visualization to identify other missing intents.

1. Search for a recipe
2. Step-by-step walkthrough of the ingredients
3. Step-by-step walkthrough of the recipe
4. Stop
5. Repeat
6. Pause
7. Skip
8. Go back
9. More information – summary of the ingredients in a recipe
10. More information – summary of the process the recipe follows
11. Search in favorite/saved recipes
12. Search from favorite recipes
13. Search from meal plan
14. Prepare meal plan
15. Cancel/Exit
16. Stop the instructions and go back
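To make this concrete, the missing intents could be sketched as intent names with sample utterances, in the spirit of an Alexa interaction model (the intent names, slot, and utterances here are hypothetical, not from a published skill):

```python
# Illustrative sketch, not a complete Alexa interaction model: a few of the
# intents listed above, expressed as names paired with sample utterances.
# "{dish}" stands in for a slot that would be declared in a real model.

INTERACTION_MODEL = {
    "intents": [
        {"name": "SearchRecipeIntent",
         "samples": ["find a recipe for {dish}", "search for {dish}"]},
        {"name": "IngredientWalkthroughIntent",
         "samples": ["walk me through the ingredients", "what do I need"]},
        {"name": "ProcessSummaryIntent",
         "samples": ["summarize the process", "more information"]},
        {"name": "SearchFavoritesIntent",
         "samples": ["search my saved recipes", "find it in my favorites"]},
    ],
}

def sample_utterances(intent_name: str) -> list:
    """Look up the sample utterances for an intent by name (hypothetical helper)."""
    for intent in INTERACTION_MODEL["intents"]:
        if intent["name"] == intent_name:
            return intent["samples"]
    return []
```

In a real skill, each intent would carry many more sample utterances than shown here, since synonym coverage is what determines whether beginners' phrasings are understood.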

Increase skill & feature discoverability​
I noticed three ways skill providers can increase awareness of the features and commands a skill supports:
1. The skill description section on the Alexa skill store page
2. A published user guide or demo videos
3. Advertisements that set the right expectations
I found that the skills with a detailed user guide have better reviews and fewer comments where users say they don’t understand how to use it.

Prototyping & User Testing

Variant 1: Companion app to search & choose recipe + Alexa assistant to voice out cooking instructions

Variant 2: Searching with voice only where all expected intents and utterances are supported.

Test scenario – “Search for a rice with egg recipe.” I tested both prototypes with users, observed their interactions, and gathered feedback on which they preferred.

Users preferred searching via the companion app on their phone and then using Alexa to guide them through the prepping and cooking steps. Users said that the summary helped them better understand the recipe; however, they still found the process slow and said they would have found the recipe faster on their phone.

User searching for the recipe on the app prototype
User following Alexa's instructions to cook the chosen recipe



High expectations from users
Since voice is a natural form of interaction, users get frustrated if they have to learn to talk in an unnatural way. It is important to keep this learning curve as small as possible.

Cognitive overload with voice-only interactions
I learned that even listing three long recipe names can cause users to forget or miss the first one and get confused. It is also essential to strike a balance: too much information overloads the user, while too little makes the process time-consuming.

Technology is still evolving
Voice assistants like Alexa, Google Home, and Siri are still evolving and learning to handle unexpected questions and intents that break the interaction flow. Different companies have their own constraints that make certain designs hard or impossible to implement. In addition, to protect users’ privacy, continuous listening is not possible, and the “wake word” has to be uttered every time. Designers have to work around these constraints.

Lack of standardization and established design patterns
In GUIs, all apps share common patterns – navigation, for example. However, due to the lack of established design patterns for voice, different skill providers have different implementations, which steepens the learning curve for users.

VUIs need more user research and testing than mobile or web apps
As a result of the challenges above, I realized that the design process for a VUI needs to involve the user much more.
