Categories: Contact Centre, Experience Design, Technology

There is a big difference between NLP (Natural Language Processing) and NLU (Natural Language Understanding).

NLP models, predicts, spots and generates word sequences, learned from terabytes of uncurated language data scraped from the internet (and hence potentially biased, fake or nonsensical).

On the other hand, NLU abstracts beyond the words and even word sequences and tries to assign some kind of meaning to the word sequences (user intent and goal, semantic category, ontology mapping).

There is a parallel in the Voice AI and Speech Recognition world: we juxtapose “word accuracy” (whether the automated speech recognition [ASR] got all the words right) and “concept accuracy” (whether the ASR got the right “words”, i.e. extracted the right user intent or slot value).

A Speech IVR, Voicebot or Chatbot may get only half the words that you speak or type right (50% word accuracy), but may still get at what you actually want to say or do (100% concept accuracy). Very often the opposite is true too: the system may recognise all the words typed into a chat, as they are already in its vocabulary (100% word accuracy), yet have no clue what you are on about (0% concept accuracy), because the relevant concept, goal, intent, slot or slot value has not yet been included in its domain model.
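The word-accuracy/concept-accuracy split can be sketched in a few lines. This is a simplification on two counts (both flagged in the comments): real ASR scoring aligns reference and hypothesis with edit distance rather than position-by-position comparison, and the transcripts and intent labels below are hypothetical.

```python
def word_accuracy(reference: list[str], hypothesis: list[str]) -> float:
    """Naive position-by-position word accuracy. Real ASR scoring uses
    edit-distance alignment (WER); this is purely illustrative."""
    matches = sum(r == h for r, h in zip(reference, hypothesis))
    return matches / len(reference)

def concept_accuracy(expected_intent: str, extracted_intent: str) -> float:
    """1.0 if the right intent was extracted, else 0.0 (single-intent case)."""
    return 1.0 if expected_intent == extracted_intent else 0.0

# Hypothetical example: what the caller said vs. what the ASR heard.
reference  = "i want to pay my bill".split()
hypothesis = "i what to play my bill".split()

print(word_accuracy(reference, hypothesis))  # ~0.67: only 4 of 6 words right
# Despite the misrecognitions, a robust NLU layer could still map the
# utterance to the right goal, so concept accuracy can remain perfect:
print(concept_accuracy("pay_bill", "pay_bill"))  # 1.0
```

This is why the two metrics are reported separately: each can be high while the other is low, exactly as described above.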


Statistics + Linguistics = Explainable NLU (xNLU)

This is why language analysis and modelling, whether it’s for content extraction, text summarisation or Conversational Experience Design, should be a combination of robust data modelling and Computational Linguistics. Neither, by itself, can get it right:

  • Conversation Design that is not based on a sufficient amount of real-world data (80k utterances is the standard in the Contact Centre world) is bound to give rise to Voicebots and Chatbots that will invariably fail in the face of actual language use and user behaviour, irrespective of how natural, friendly or even endearing the system messages are.
  • A Conversation Designer who hasn’t been trained in Linguistics is also bound to get the system messages wrong, by confusing, misleading or even putting the user off with messages that are too vague, ambiguous or chatty.
  • Data Science that does not supplement string manipulation (NLP) with semantic ontologies and Human–Machine interaction patterns (NLU) will also result in natural language interfaces that have limited functionality and are hence unhelpful, if not nonsensical.
  • Engineers who design conversations and write system prompts almost invariably get the messaging wrong (verbose, full of jargon, too formal, confusing), which trips up both the user and the conversation.

Enter Google Imagen

More recently, Google's text-to-image diffusion model Imagen wowed everyone by generating incredibly realistic – usually beautiful – pictures. However, once again, it was trained on large, mostly uncurated, web-scraped datasets, which means it unsurprisingly also replicates social biases and stereotypes, as the researchers themselves admit:

– “an overall bias towards generating images of people with lighter skin tones”

– “portraying different professions to align with Western gender stereotypes”

– “social and cultural biases when generating images of activities, events, and objects”.

It is a good job the code is not in the public domain yet, as, in the wrong hands, Imagen could be used to generate fake, defamatory or even harmful content. The future should be trodden with caution!



In short, don’t get overexcited over GPT-3 and similar Language Models, or even text-to-image algorithms such as Imagen; they may have impressive (creative, beautiful, funny, even scary) generative power, but next to no explanatory power. That is, despite the name, current large Language Models only represent a tiny fraction of what human language use is about and what human language conveys. The right term is “word sequence modelling”, which does not equal actual language understanding (or even … human language modelling)! We should instead strive for explainability: Explainable NLP, NLU, VUI and Conversation Design.


More about the author:

Dr Maria Aretoulaki is Principal Consultant CX Design (Voice & Conversational AI) at GlobalLogic UK&I. She has been designing Voice User Interfaces (VUIs and Speech IVRs) for the past 25+ years, mainly for Contact Centre automation and Customer Self-Service. In 2019, she coined the term “Explainable VUI Design” to promote the principle of data-based Conversation Design and Computational Linguistics-based Language Engineering.


