
W H I T E P A P E R

© 2017 Persistent Systems Ltd. All rights reserved. 4

www.persistent.com

3. Proposed Solution(s)

a. Introduction of Solution and Advantages

Cognitive voice agents are the future of efficient user interaction: a natural evolution of IVR systems and the preferred user interface of the 21st century. Voice agents will:

Always be available

Scale transparently to handle increases in load, resulting in zero or minimal wait time for callers

Carry out conversations on their own without requiring help from human agents

Learn from past interactions and get better over time

Transfer calls to human agents when customers request it

Assist human agents with real-time transcriptions so they can respond better

Building an intelligent voice agent requires:

High availability

Ability to extract intents and entities from natural language

Ability to execute an action based on the intent and entities to generate a response. The action could involve interacting with internal or external APIs and services to add specific business and domain knowledge or information

Ability to convey the response back to the user in natural language

Ability to train and customize the voice agent for your domain
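The requirements above can be read as a per-turn loop: interpret the utterance into an intent and entities, run an action for that intent, and return a natural-language reply. The minimal sketch below illustrates only the action-execution step, assuming the interpretation has already been produced (for example, by the Conversation service). The intent names (`check_balance`, `agent_request`), the `lookup_balance` stub, and the `transfer_to_human` flag are hypothetical placeholders, not part of any Watson API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Interpretation:
    """Intent and entities extracted from one caller utterance."""
    intent: str
    entities: Dict[str, str] = field(default_factory=dict)


@dataclass
class AgentReply:
    text: str                       # response to convey back to the caller
    transfer_to_human: bool = False  # hypothetical hand-off signal


def lookup_balance(account_type: str) -> float:
    # Hypothetical stand-in for a call to an internal/external API.
    balances = {"checking": 1250.0, "savings": 8400.0}
    return balances.get(account_type, 0.0)


def check_balance(entities: Dict[str, str]) -> AgentReply:
    account = entities.get("account_type", "checking")
    amount = lookup_balance(account)
    return AgentReply(f"Your {account} balance is {amount:.2f} dollars.")


def agent_request(entities: Dict[str, str]) -> AgentReply:
    # The customer explicitly asked for a person: hand the call off.
    return AgentReply("Transferring you to an agent now.", transfer_to_human=True)


# Maps each (hypothetical) intent name to the action that handles it.
ACTIONS: Dict[str, Callable[[Dict[str, str]], AgentReply]] = {
    "check_balance": check_balance,
    "agent_request": agent_request,
}


def execute_action(interp: Interpretation) -> AgentReply:
    """Dispatch on the intent; fall back to a clarifying prompt."""
    handler = ACTIONS.get(interp.intent)
    if handler is None:
        return AgentReply("Sorry, I did not understand. Could you rephrase?")
    return handler(interp.entities)


reply = execute_action(Interpretation("check_balance", {"account_type": "savings"}))
print(reply.text)  # Your savings balance is 8400.00 dollars.
```

A dispatch table keeps domain logic out of the dialog layer: adding a new skill means registering one more handler rather than editing the turn loop.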

We will discuss three approaches to building and using voice agents with IBM Watson cognitive services. The advantages and limitations of each solution are detailed in subsequent sections.

Fully cloud-hosted solution:

Voice agents using the Twilio service on IBM Bluemix

Leverage your existing IP-PBX infrastructure:

Voice agents using an IP-PBX solution and a SIP stack

IBM Watson Voice Gateway

All of the above approaches use Watson's cognitive APIs to consume and emit audio streams in real time over SIP.
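To make "in real time" concrete: in a SIP call, SIP sets up the session while the audio itself is usually carried over RTP in small, fixed-duration packets, so a streaming recognizer is fed those chunks as they arrive rather than a whole recording. The sketch below assumes G.711 audio (8,000 samples per second, one byte per sample) and a 20 ms packetization interval, which gives 160-byte frames; the `frame_size_bytes` and `frames` helpers are illustrative and not part of any Watson or SIP library.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 8000   # narrowband telephony audio, as in G.711 (assumed)
BYTES_PER_SAMPLE = 1    # G.711 mu-law/A-law uses 8-bit samples
FRAME_MS = 20           # common RTP packetization interval (assumed)


def frame_size_bytes(rate_hz: int = SAMPLE_RATE_HZ,
                     bytes_per_sample: int = BYTES_PER_SAMPLE,
                     frame_ms: int = FRAME_MS) -> int:
    """Number of audio payload bytes in one frame."""
    return rate_hz * bytes_per_sample * frame_ms // 1000


def frames(audio: bytes, size: int = frame_size_bytes()) -> Iterator[bytes]:
    """Yield consecutive fixed-size chunks; the last one may be shorter."""
    for start in range(0, len(audio), size):
        yield audio[start:start + size]


one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
chunks = list(frames(one_second))
print(len(chunks), len(chunks[0]))  # 50 160
```

Each chunk would be forwarded to the recognizer as soon as it is produced, and synthesized audio would be sent back to the caller in chunks of the same size.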

The voice agent uses the following IBM Watson cognitive services:

Speech to Text (STT):

Converts incoming audio streams received over SIP into a textual representation (transcription).

Conversation:

Extracts intents and entities from the transcript and determines the response based on the intent, entities, and context of the user conversation. The Conversation service is trained in specific domains or skills to converse with the user. Note that training the Conversation service and creating the training corpus are outside the scope of this white paper.
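The toy sketch below illustrates how intent, entities, and conversation context combine to choose a response. If the caller asks for a balance without naming an account, the turn stores the pending intent in the context, and the next utterance supplies the missing entity. It uses naive keyword matching purely for illustration; the Watson Conversation service relies on trained models instead, and the intent names, keywords, and the `pending_intent` context key are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical keyword tables standing in for a trained model.
INTENT_KEYWORDS = {
    "check_balance": ("balance",),
    "agent_request": ("agent", "human"),
}
ACCOUNT_TYPES = ("checking", "savings")


def interpret(text: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (intent, entities) using naive keyword matching."""
    words = re.findall(r"[a-z]+", text.lower())
    intent = next(
        (name for name, keywords in INTENT_KEYWORDS.items()
         if any(k in words for k in keywords)),
        None,
    )
    entities = {"account_type": w for w in words if w in ACCOUNT_TYPES}
    return intent, entities


def respond(text: str, context: Dict[str, str]) -> str:
    """Choose a reply from intent, entities, and carried-over context."""
    intent, entities = interpret(text)
    # Always clear any pending intent; reuse it only if none was detected now.
    pending = context.pop("pending_intent", None)
    intent = intent or pending
    if intent == "agent_request":
        return "Let me connect you to an agent."
    if intent == "check_balance":
        if "account_type" not in entities:
            context["pending_intent"] = "check_balance"  # remember for next turn
            return "Which account: checking or savings?"
        return f"Fetching your {entities['account_type']} balance."
    return "Sorry, could you rephrase that?"


ctx: Dict[str, str] = {}
print(respond("What is my balance?", ctx))  # Which account: checking or savings?
print(respond("Savings, please", ctx))      # Fetching your savings balance.
```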

Text to Speech (TTS):

Converts the response into an audio stream that is sent back to the user over SIP and eventually relayed to a SIP phone or PSTN line.
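Taken together, each caller turn flows from STT to Conversation to TTS. The sketch below wires those three stages through small interfaces so that real clients (for example, wrappers around the Watson SDKs) could be plugged in. The `Protocol` names, the `transcribe`/`reply`/`synthesize` method names, and the in-memory fakes are hypothetical and do not reflect actual Watson API signatures; the fakes exist only so the pipeline runs without network access.

```python
from typing import Dict, Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class Conversation(Protocol):
    def reply(self, transcript: str, context: Dict[str, str]) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class VoiceAgent:
    """Runs one caller turn: audio in, audio out, with per-call context."""

    def __init__(self, stt: SpeechToText, conv: Conversation, tts: TextToSpeech):
        self.stt, self.conv, self.tts = stt, conv, tts
        self.context: Dict[str, str] = {}

    def handle_turn(self, caller_audio: bytes) -> bytes:
        transcript = self.stt.transcribe(caller_audio)          # STT
        response = self.conv.reply(transcript, self.context)   # Conversation
        return self.tts.synthesize(response)                   # TTS, relayed back to the caller


# Hypothetical fakes standing in for the real services.
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class FakeConversation:
    def reply(self, transcript: str, context: Dict[str, str]) -> str:
        context["last_transcript"] = transcript
        return f"You said: {transcript}"


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


agent = VoiceAgent(FakeSTT(), FakeConversation(), FakeTTS())
print(agent.handle_turn(b"hello"))  # b'You said: hello'
```

Keeping the context on the per-call `VoiceAgent` instance means each concurrent call carries its own conversation state, which matters when the agent scales out to many callers.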