Conversation design (also called voice design, voice user interface (VUI) design, or conversational user interface (CUI) design) is the practice of building experiences where a user interacts with an Artificial Intelligence (AI) agent to complete tasks. For example, instead of filling out a form by typing into fields on a webpage, a conversational assistant can capture the required information from you through a conversation, either typed (into a chatbot) or spoken (to a device like an Amazon Alexa).
How did you get started in conversation design?
Sid: I took a course in grad school that dealt with text and machine learning (ML) – which in a way was interesting – but there was no course offered that dealt with ML and voice together. As luck would have it, a sales lead came through at Architech for a VUI application for call centre operations, and I was given a chance to build a proof of concept (POC) for it. Once I started working on the POC, I was blown away by the plethora of VUI technologies out there and by the advancements made to enable developers to use them.
Meg: That first POC we worked on last April was my first exposure to conversation design. And then it just started snowballing from there. Like a lot of things in my career, conversation design just kind of snuck up on me in a sink-or-swim kind of way – learning by doing.
What interests you about conversation design?
Sid: For me, to have a smartphone and be able to do things that would usually require a laptop is pretty awesome – VUI lets you do that (to an extent). The most natural use case I see for conversation design is an extension of various analytics dashboards. That opens up a lot of possibilities. Imagine a C-level exec, who has to log into a web app to see data points they are interested in. With VUI, those exact data points can be made available to them via a voice interface like Google Assistant or Alexa, and they can check them by just using their smartphones. And of course, there are call centre workflows. 🙂
Meg: I’m interested in conversation design partially because it appeals to the way I think through problems. I’m not a visual designer – my background is in research and information architecture – so I really like dealing with different structures and flows. Learning new things is always fun too!
What’s the process for a conversation design project?
Meg: This practice is still so new that there aren’t a lot of established processes from the design side. On our most recent client project, there was a lot of trial and error to determine the best way to map out flows and copy decks in a way that works for all the different team members: the developers need something to build against, but it also has to be understandable for the client and QA team.
Sid: There was a lot of hit and miss during the development of one VUI we created, because there still isn’t enough documentation, and there aren’t enough forums to reference when you hit a roadblock.
Meg: On one client project I used a tool called Voiceflow for mapping out the flows of the voice interactions that I wanted; then the engineer and I could work within the same tool, so that the design and copy could live in the same place as the code. It’s really amazing for developing Alexa skills. When we’re building projects on Google Assistant, it’s easier for me to map out conversations in Lucidchart, since Dialogflow doesn’t have a nice visualization interface for the actual conversation flows. Lucidchart is also useful as a copy deck-like document, to capture variations on responses.
Sid: For one recent implementation, we used Dialogflow with Firebase, as they are a natural fit in the GCP (Google Cloud Platform) ecosystem. We tried using the Dialogflow UI to control the various user flows but quickly decided to shift to the backend, driving all the conversation flows from the fulfilment webhook. Dialogflow, in a sense, was used just to capture the voice-to-text inputs and the various data points needed for fulfilment. Everything else, including where the conversation flows next, was driven by backend logic in the webhook. One thing we realized is that, in VUI, the input from the user can be anything, in many varied combinations, and the Assistant should be robust enough to handle these varied utterances.
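As a rough illustration of the backend-driven approach Sid describes: a Dialogflow ES fulfilment webhook receives the matched intent and extracted parameters under `queryResult` and replies with a `fulfillmentText`. The intent names, parameters, and replies below are hypothetical – a minimal sketch, not our production code.

```python
def handle_webhook(request_json):
    """Route a Dialogflow ES webhook request through backend logic.

    `request_json` follows the Dialogflow ES WebhookRequest shape:
    the matched intent name and parameters live under `queryResult`.
    """
    query = request_json.get("queryResult", {})
    intent = query.get("intent", {}).get("displayName", "")
    params = query.get("parameters", {})

    # Backend logic decides what to say and where the conversation goes
    # next, rather than wiring the flows in the Dialogflow console UI.
    if intent == "check.metric":  # hypothetical intent name
        metric = params.get("metric", "revenue")
        reply = f"Here is the latest {metric} figure."
    elif intent == "transfer.agent":  # hypothetical intent name
        reply = "Connecting you to a call centre agent."
    else:
        reply = "Sorry, I didn't catch that. Could you rephrase?"

    # WebhookResponse: `fulfillmentText` is spoken or shown to the user.
    return {"fulfillmentText": reply}
```

Dialogflow here acts only as the speech-to-text and entity-extraction front end; every flow decision lives in one place on the backend.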
Meg: Speaking of utterances – I can try to think of every possible way that someone will phrase a command, but our testers will always find new ones. Testing conversation prototypes is a lot of fun, and SO NECESSARY. The complexity of how people talk is the biggest hurdle we have to jump over to make these experiences work. Of course, we can’t account for every single variety of phrase in the English language, but contextual error redirects are also an important part of the process. Our co-workers here in the office are game for being our guinea pigs during our first rounds of conversation testing, and they always manage to find bugs or training issues.
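The "contextual error redirects" Meg mentions can be sketched as escalating reprompts: each successive misunderstanding in a given context gets a more specific prompt, and after too many failures the user is handed off. The context name and prompt copy here are hypothetical examples, not from an actual project.

```python
FALLBACK_PROMPTS = {
    "book_appointment": [  # hypothetical conversation context
        "Sorry, which day works for you?",
        "I can book Monday to Friday. Which day would you like?",
        "Let me transfer you to someone who can help with booking.",
    ],
}

def fallback_reply(context, retry_count):
    """Pick a reprompt based on the active context and retry count."""
    prompts = FALLBACK_PROMPTS.get(context, ["Sorry, could you say that again?"])
    # Clamp to the last (hand-off) prompt once retries are exhausted.
    return prompts[min(retry_count, len(prompts) - 1)]
```

Testing with real people, as described above, is what surfaces the phrasings these redirects need to cover.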
Sid: One thing’s for sure, the VUI’s greatness is determined by the VUI designer!
Meg: No pressure!
Are there any limitations to conversation design?
Meg: Getting systems to work together isn’t as easy as I wish it would be. We can dream up all these seamless processes using voice, but when those processes involve different accounts and systems, there are often speedbumps that we have to accommodate in our designs. We expect devices like Alexa or Google to work like JARVIS, but we’re definitely not there yet.
Sid: VUI platforms do a great job of translating voice to text, but there are instances where the translation is not perfect, and that causes frustration on the user’s part. Another source of frustration, from the design and development point of view, is that if multiple intents share the same trigger sentence, it causes ambiguity – it is entirely possible that the user will utter the same sentence in more than one scenario.
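One common way to resolve the ambiguity Sid describes is to use conversation state as a tiebreak: the same utterance maps to different intents depending on which context is currently active (Dialogflow models this with input contexts). The intent and context names below are hypothetical – a minimal sketch of the idea.

```python
def resolve_intent(utterance, active_contexts):
    """Map an utterance to an intent, using active contexts as a tiebreak."""
    # Two hypothetical intents that share the same trigger phrase ("yes ..."):
    # each is only eligible while its required context is active.
    candidates = {
        "confirm_order": "awaiting_order_confirmation",
        "confirm_cancellation": "awaiting_cancel_confirmation",
    }
    if utterance.lower().startswith("yes"):
        for intent, required_context in candidates.items():
            if required_context in active_contexts:
                return intent
        # Still ambiguous with no context active: fall back and ask the user.
        return "fallback"
    return "fallback"
```

Scoping each intent to a context keeps a shared phrase like "yes" from firing the wrong branch of the conversation.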
Meg: Yeah, and that’s part of the challenge as well – to determine when a conversational application makes sense as a solution. We don’t want to throw this technology at every problem just because it’s fun and new, we have to make sure that it will work well for the end-user in context.
What are you excited about going forward?
Meg: I read an article that estimated that 50% of web searches this year will be made using voice interfaces – that’s pretty exciting. And a little daunting! There’s a lot of opportunity in this space to develop best practices for the design of these conversations, and I hope we can get it right so that our end users don’t get discouraged from using voice interfaces.
Sid: The ability to integrate directly with the Telephony Gateways in a call centre is pretty exciting. That way simple tasks can be automated with much more human-like offerings than some current standard solutions in the market.
Meg: No more “for service in English, press 1, pour le service en Français, appuyez sur le 2.”
Sid: And another thing that I’m excited about is being able to create a single VUI agent that can serve both text and voice queries.
Meg: Which appeals to me as a millennial – I’ll usually choose a text chat over talking out loud!
Meg is the Senior UX Researcher & Designer at Architech. She is amazed at the sheer number of response variations people will give to the question, “how are you?”.
Sid is a Technical Solution Architect at Architech. He is also the resident mixologist and wants to pit AI-mixed drinks against the human ones to see which turn out better!