Heuristic evaluation of conversational interfaces (& chatbots)

A set of heuristics for evaluating conversational interfaces
We developed a set of heuristics for evaluating chatbots and conversational user interfaces (CUIs). The method is inspired by a commonly used user interface evaluation technique called Heuristic Evaluation, and it has been iteratively developed through multiple rounds of user studies. Our paper received a best paper award at the ACM Conference on Human Factors in Computing Systems.

CI-H1. Visibility of system status

The system should always keep users informed about what is going on, through appropriate feedback within reasonable time, without overwhelming the user.

Example heuristic violations:

  • For a CUI designed to ask a series of questions, if there is no feedback on how far along the user is in the questionnaire.
  • For a voice-based CUI, if it isn’t clear to the user what state the system is in.
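
As a minimal sketch of how the first violation might be avoided, a questionnaire-style CUI could prefix each question with its position. The function name `with_progress` and the message format are assumptions made for illustration; they are not part of the heuristics.

```python
def with_progress(questions: list[str]) -> list[str]:
    """Prefix each question with its position so users can see how far along they are."""
    total = len(questions)
    return [
        f"Question {index} of {total}: {text}"
        for index, text in enumerate(questions, start=1)
    ]


if __name__ == "__main__":
    for prompt in with_progress(["What is your name?", "How can we help?"]):
        print(prompt)
```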

CI-H2. Match between system and the real world

The system should speak the users’ language, using words, phrases, and concepts that are familiar to them rather than system-oriented terms, and it should converse in a natural way that follows real-world conventions.

Example heuristic violations:

  • If the CUI uses words or terms that are unfamiliar to the users.
  • If the CUI speaks in an unnatural way.

CI-H3. User control and freedom

Users often choose system functions by mistake and will need an option to effortlessly leave the unwanted state without having to go through an extended dialogue. Support undo and redo.

Example heuristic violations:

  • Not providing users the ability to stop or cancel a command.
  • Not providing users the ability to redo a command.
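
One way to support undo and redo in a slot-filling dialogue is to snapshot the collected values before each change. The sketch below assumes a hypothetical `DialogueHistory` helper; the class and method names are illustrative only.

```python
class DialogueHistory:
    """Tracks slot values so the user can undo or redo the last change (hypothetical helper)."""

    def __init__(self) -> None:
        self._undo: list[dict[str, str]] = []
        self._redo: list[dict[str, str]] = []
        self.slots: dict[str, str] = {}

    def set_slot(self, name: str, value: str) -> None:
        self._undo.append(dict(self.slots))  # snapshot the state before the change
        self._redo.clear()                   # a new action invalidates the redo history
        self.slots[name] = value

    def undo(self) -> bool:
        """Restore the previous state; return False if there is nothing to undo."""
        if not self._undo:
            return False
        self._redo.append(dict(self.slots))
        self.slots = self._undo.pop()
        return True

    def redo(self) -> bool:
        """Reapply the last undone change; return False if there is nothing to redo."""
        if not self._redo:
            return False
        self._undo.append(dict(self.slots))
        self.slots = self._redo.pop()
        return True
```

A CUI could map utterances such as "undo" or "go back" to `undo()`, so that the user can leave an unwanted state in a single turn.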

CI-H4. Consistency and standards

Users should not have to wonder whether different words, options, or actions mean the same thing. Follow platform conventions for the design of visual and interaction elements. Users should also be able to receive consistent responses even if they communicate the same function in multiple ways (and modalities). Within the interaction, the system should have a consistent voice, style of language, and personality.

Example heuristic violations:

  • If the same command is called different things or responds inconsistently at different points of interaction.
  • If the CUI uses different tones and voices throughout (without a clear purpose/explanation).
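
To illustrate consistent responses across different phrasings, the sketch below resolves several utterances to one intent and answers from a single template. The phrase table, the intent name `cancel_order`, and the response text are all hypothetical; a real system would likely use a language-understanding model rather than exact matching.

```python
# Hypothetical phrase-to-intent table, for illustration only.
SYNONYMS = {
    "cancel": "cancel_order",
    "stop my order": "cancel_order",
    "never mind": "cancel_order",
}

# One response template per intent keeps wording and tone consistent.
RESPONSES = {"cancel_order": "Okay, your order has been cancelled."}


def respond(utterance: str) -> str:
    """Normalize the utterance, resolve it to one intent, and reply from one template."""
    intent = SYNONYMS.get(utterance.strip().lower())
    if intent is None:
        return "Sorry, I didn't understand that."
    return RESPONSES[intent]
```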

CI-H5. Error prevention

Even better than good error messages is a careful design of the conversation and interface that reduces the likelihood of a problem occurring in the first place. Be prepared for pauses, conversation fillers, and interruptions, as well as dialogue failures, dead ends, or sidetracks. Proactively prevent or eliminate potential error-prone conditions, and check and confirm with users before they commit an action.

Example heuristic violations:

  • If users are provided with an answer set and do not have the freedom to choose “none of the above” or to exit the interaction.
  • In contexts where errors are likely (e.g., a voice-based chatbot where the system is not confident about the user input), if the system does not confirm with users before proceeding.
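
Both violations above can be sketched in a few lines. The threshold value, the function names, and the `execute:`/`confirm:` action strings are assumptions for illustration; an appropriate confidence cutoff depends on the recognizer and on the cost of a wrong action.

```python
CONFIRM_THRESHOLD = 0.8  # assumed cutoff; tune for the actual recognizer


def next_action(intent: str, confidence: float) -> str:
    """Execute directly only when recognition is confident; otherwise ask the user first."""
    if confidence >= CONFIRM_THRESHOLD:
        return f"execute:{intent}"
    return f"confirm:{intent}"


def with_escape_options(options: list[str]) -> list[str]:
    """Append choices that let the user reject every option or leave the interaction."""
    return [*options, "None of the above", "Exit"]
```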

CI-H6. Help and guidance

The system should guide the user throughout the dialogue by clarifying system capabilities. Help features should be easy to retrieve and search, focused on the user’s task, list concrete steps to be carried out, and not be too large. Make actions and options visible when appropriate.

Example heuristic violations:

  • If the CUI’s capabilities are not clear to users.
  • For CUIs with unique interface elements, if the CUI does not explain or have a feature to explain how the interface works.

CI-H7. Flexibility and efficiency of use

Support flexible interactions depending on the use context by providing users with the appropriate (or preferred) input and output modality and hardware. Additionally, provide accelerators, such as command abbreviations, that are unseen by novices but speed up the interactions for experts, to ensure that the system is efficient.

Example heuristic violations:

  • For menu/button-based CUIs, if users should be able to select multiple responses but are restricted to selecting only one.
  • For voice-based interactions, if the CUI does not support verbal shortcuts for commands.

CI-H8. Aesthetic, minimalist and engaging design

Dialogues should not contain information which is irrelevant or rarely needed. Provide interactional elements that are necessary to engage the user and fit within the goal of the system. Interfaces should support short interactions and expand on the conversation if the user chooses.

Example heuristic violations:

  • If the CUI asks more questions/collects more information than necessary.
  • If the CUI uses too many irrelevant social utterances.

CI-H9. Help users recognize, diagnose and recover from errors

Error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.

Example heuristic violations:

  • If an error message does not explain the problem.
  • If the user performs an action that is incorrect or that the CUI does not recognize, and the CUI does not constructively guide the user to a solution.
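
As a rough sketch of a constructive error message, the example below states the problem in plain language and suggests the closest known command, falling back to a list of what the user can say. The command set and wording are hypothetical; `difflib.get_close_matches` is from the Python standard library.

```python
import difflib

COMMANDS = ["check balance", "transfer money", "pay bill"]  # hypothetical command set


def error_message(user_input: str) -> str:
    """Explain what went wrong and suggest a way forward instead of a bare error."""
    matches = difflib.get_close_matches(user_input.lower(), COMMANDS, n=1)
    if matches:
        return f'I did not recognize "{user_input}". Did you mean "{matches[0]}"?'
    options = ", ".join(COMMANDS)
    return f'I did not recognize "{user_input}". You can say: {options}.'
```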

CI-H10. Context preservation

Preserve the context of the conversation topic within a session and, if possible, across sessions. Allow the user to reference past messages in later interactions, to support the implicit expectations users bring to conversations.

Example heuristic violations:

  • If the CUI does not retain a memory of the user’s previous responses/interactions within the same session (ideally across sessions).
  • If the CUI turns off or “sleeps” too quickly, so that the user needs to restart the session from the beginning.
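
A minimal sketch of in-session context preservation with an inactivity timeout follows. The `SessionStore` class, its default of 300 seconds, and the injectable `clock` parameter are assumptions made for illustration; what counts as sleeping "too quickly" depends on the product.

```python
import time


class SessionStore:
    """Keeps per-user context alive for `timeout_s` seconds of inactivity (hypothetical helper)."""

    def __init__(self, timeout_s: float = 300.0, clock=time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock  # injectable so the timeout can be tested without waiting
        self._sessions: dict[str, tuple[float, dict]] = {}

    def context(self, user_id: str) -> dict:
        """Return the user's stored context, resetting it only after a long silence."""
        now = self._clock()
        last_seen, ctx = self._sessions.get(user_id, (now, {}))
        if now - last_seen > self._timeout_s:
            ctx = {}  # session expired: the user starts over
        self._sessions[user_id] = (now, ctx)  # every access refreshes the timer
        return ctx
```

Persisting the same dictionary to storage keyed by user would be one way to extend this across sessions.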

CI-H11. Trustworthiness

The system should convey trustworthiness by ensuring privacy of user data, and by being transparent and truthful with the user. The system should not falsely claim to be human.

Example heuristic violations:

  • If the CUI claims to be a human.
  • If the CUI is not explicit about, or does not provide a clear feature for users to explore, how their data will be stored and used.