WP - Powering voice agents using IBM Watson

WHITE PAPER

www.persistent.com

3. Proposed Solution(s)

a. Introduction of Solution and Advantages

Cognitive voice agents are the future of efficient user interaction, a natural evolution of IVR systems, and the preferred user interface of the 21st century. Voice agents will:

— Always be available
— Scale to handle increases in load transparently, resulting in zero or minimal wait time for customers calling in
— Carry out conversations on their own without requiring help from human agents
— Learn from past interactions and get better over time
— Transfer calls to human agents when the customer demands it
— Assist human agents with real-time transcriptions for better responses

Building an intelligent voice agent requires:

— Availability at all times
— Ability to extract intents and entities from natural language
— Ability to execute an action based on the intent and entities to generate a response. The action could involve interactions with internal or external APIs and services to add specific business and domain knowledge or information
— Ability to convey the response back to the user in natural language
— Training and customizing the voice agent for your domain
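The requirements above can be sketched as a small dispatch loop: an intent and its entities select an action, and the action produces a natural-language response. This is only an illustration under stated assumptions, not Watson's API. The `Understanding` type, the keyword-based `extract` function, and the `check_balance` and `transfer_to_human` actions are all hypothetical stand-ins. A real agent would obtain intents and entities from a trained conversation service and call real business APIs inside each action.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Understanding:
    """Hypothetical result of natural-language understanding."""
    intent: str
    entities: Dict[str, str] = field(default_factory=dict)


def extract(utterance: str) -> Understanding:
    """Toy keyword matcher standing in for a trained conversation service."""
    text = utterance.lower()
    if "balance" in text:
        account = "savings" if "savings" in text else "checking"
        return Understanding("check_balance", {"account": account})
    if "agent" in text or "human" in text:
        return Understanding("transfer_to_human")
    return Understanding("unknown")


def check_balance(entities: Dict[str, str]) -> str:
    # A real action would call an internal or external API here.
    return f"Let me look up your {entities['account']} account balance."


def transfer_to_human(entities: Dict[str, str]) -> str:
    # A real action would hand the call over to a human agent queue.
    return "Please hold while I transfer you to an agent."


# Map each intent to the action that generates the response.
ACTIONS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "check_balance": check_balance,
    "transfer_to_human": transfer_to_human,
}


def respond(utterance: str) -> str:
    """Extract intent and entities, run the matching action, return text."""
    understanding = extract(utterance)
    action = ACTIONS.get(understanding.intent)
    if action is None:
        return "Sorry, I did not understand that. Could you rephrase?"
    return action(understanding.entities)
```

Keeping the intent-to-action mapping in a plain dictionary means a new domain skill can be added by registering one more function, without touching the dispatch logic.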

We will discuss three approaches to building and using voice agents with IBM Watson cognitive services. The advantages and limitations of each solution are detailed in the subsequent sections.

— Fully cloud-hosted solution: voice agents using the Twilio service on IBM Bluemix
— Leverage your existing IP-PBX infrastructure: voice agents using an IP-PBX solution and a SIP stack
— IBM Watson Voice Gateway

The above approaches revolve around using Watson's cognitive APIs to consume and emit audio streams in real time over SIP.

The voice agent uses the following IBM Watson cognitive services:

— Speech to Text (STT): Converts incoming audio streams over SIP into a textual representation (transcription).
— Conversation: Extracts intents and entities from the transcript and determines the response based on the intent, entities, and context of the user conversation. The Conversation service is trained in specific domains or skills to converse with the user. Note that training the Conversation service and creating the training corpus are out of scope for this white paper.
— Text to Speech (TTS): Converts the response into an audio stream that is sent back to the user over SIP and eventually relayed over a SIP phone or PSTN line.
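The way the three services fit together in one conversational turn can be sketched as a simple pipeline: audio in, transcript, response text, audio out. The sketch below is an assumption-laden illustration and does not use the Watson SDK. The `VoiceAgent` class, the three callable type aliases, and the `fake_*` stubs are hypothetical names introduced here so the example runs without network access. In a real deployment each callable would wrap the corresponding Watson service, and the audio would arrive and leave over SIP.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical signatures for the three services. Real clients would wrap
# calls to the Speech to Text, Conversation, and Text to Speech services.
SpeechToText = Callable[[bytes], str]
Converse = Callable[[str, Dict[str, str]], Tuple[str, Dict[str, str]]]
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoiceAgent:
    stt: SpeechToText
    converse: Converse
    tts: TextToSpeech
    # Conversation context carried from one turn to the next.
    context: Dict[str, str] = field(default_factory=dict)

    def handle_turn(self, audio_in: bytes) -> bytes:
        """Run one turn: audio -> transcript -> response text -> audio."""
        transcript = self.stt(audio_in)
        reply, self.context = self.converse(transcript, self.context)
        return self.tts(reply)


# Stubs (assumptions for illustration only) so the sketch is runnable.
def fake_stt(audio: bytes) -> str:
    # Pretend the audio bytes are simply the spoken words.
    return audio.decode("utf-8")


def fake_converse(text: str, context: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    # Count turns in the context to show state surviving across turns.
    turns = int(context.get("turns", "0")) + 1
    new_context = {**context, "turns": str(turns)}
    return f"You said: {text} (turn {turns})", new_context


def fake_tts(text: str) -> bytes:
    # Pretend the response text is already synthesized audio.
    return text.encode("utf-8")
```

Injecting the three services as callables keeps the turn logic independent of the transport, so the same `handle_turn` could in principle sit behind any of the three approaches listed earlier.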