WHITE PAPER
© 2017 Persistent Systems Ltd. All rights reserved.
www.persistent.com
3. Proposed Solution(s)
a. Introduction of Solution and Advantages
Cognitive voice agents are the future of efficient user interaction, a natural evolution of IVR systems, and the
preferred user interface of the 21st century. Voice agents will:
- Always be available
- Scale transparently to handle increases in load, resulting in zero or minimal wait time for callers
- Carry out conversations on their own without requiring help from human agents
- Learn from past interactions and get better over time
- Transfer the call to a human agent when the customer demands it
- Assist human agents with real-time transcriptions so that they can respond better
Building an intelligent voice agent requires:
- Continuous availability
- The ability to extract intents and entities from natural language
- The ability to execute an action based on the intent and entities to generate a response. The action could
  involve interacting with internal or external APIs and services to add business- and domain-specific
  knowledge or information
- The ability to convey the response back to the user in natural language
- Training and customizing the voice agent for your domain
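The step of executing an action based on the extracted intent and entities can be sketched as a simple dispatch table. This is a minimal illustration only, not part of any Watson API: the `Understanding` type, the intent names (`check_balance`, `speak_to_agent`), the `account_type` entity, and the handler functions are all hypothetical. In a real agent, each handler would call the internal or external services that hold the business and domain knowledge.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Understanding:
    """Hypothetical result of intent and entity extraction for one utterance."""
    intent: str
    entities: Dict[str, str] = field(default_factory=dict)


def check_balance(entities: Dict[str, str]) -> str:
    # Placeholder: a real handler would call an internal or external API here.
    account = entities.get("account_type", "default")
    return f"Looking up the balance of your {account} account."


def speak_to_agent(entities: Dict[str, str]) -> str:
    # Placeholder for handing the call over to a human agent on demand.
    return "Transferring you to a human agent."


# Dispatch table: intent name -> action that produces a natural-language reply.
ACTIONS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "check_balance": check_balance,
    "speak_to_agent": speak_to_agent,
}


def respond(understanding: Understanding) -> str:
    """Pick the action for the intent; fall back to a clarifying question."""
    handler = ACTIONS.get(understanding.intent)
    if handler is None:
        return "Sorry, I did not understand that. Could you rephrase?"
    return handler(understanding.entities)


if __name__ == "__main__":
    print(respond(Understanding("check_balance", {"account_type": "savings"})))
    print(respond(Understanding("speak_to_agent")))
    print(respond(Understanding("unknown_intent")))
```

Keeping the mapping in a table means that adding a new skill only requires registering one more handler, which fits the domain-specific customization described above.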
We will discuss three approaches to building voice agents with IBM Watson cognitive services. The
advantages and limitations of each solution are detailed in the subsequent sections.
- Fully cloud-hosted solution: voice agents using the Twilio service on IBM Bluemix
- Leverage your existing IP-PBX infrastructure: voice agents using an IP-PBX solution and a SIP stack
- IBM Watson Voice Gateway
All of the above approaches use Watson's cognitive APIs to consume and emit audio streams
in real time over SIP.
The voice agent uses the following IBM Watson cognitive services:
- Speech to Text (STT): Converts incoming audio streams received over SIP into a textual representation
  (transcription).
- Conversation: Extracts intents and entities from the transcript and determines the response based on the
  intent, entities, and context of the user conversation. The Conversation service is trained on specific
  domains or skills to converse with the user. Note that training the Conversation service and creating the
  training corpus are outside the scope of this white paper.
- Text to Speech (TTS): Converts the response into an audio stream that is sent back to the user over SIP
  and eventually relayed to a SIP phone or PSTN line.
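The way these three services chain together for a single conversational turn can be sketched as follows. This is a minimal sketch under stated assumptions: the function `handle_turn` and the type aliases are hypothetical, and the `fake_*` functions are stand-ins that only imitate the shape of each service. They are not the Watson SDK, and the SIP transport is left out entirely.

```python
from typing import Callable, Dict, Tuple

# Hypothetical service signatures (assumptions, not the Watson SDK):
SpeechToText = Callable[[bytes], str]                   # audio -> transcript
Conversation = Callable[[str, Dict], Tuple[str, Dict]]  # transcript, context -> reply, context
TextToSpeech = Callable[[str], bytes]                   # reply text -> audio


def handle_turn(
    audio_in: bytes,
    context: Dict,
    stt: SpeechToText,
    converse: Conversation,
    tts: TextToSpeech,
) -> Tuple[bytes, Dict]:
    """Run one turn: transcribe, decide on a response, synthesize the reply."""
    transcript = stt(audio_in)
    reply_text, new_context = converse(transcript, context)
    audio_out = tts(reply_text)
    return audio_out, new_context


# Stand-in implementations so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_converse(text: str, context: Dict) -> Tuple[str, Dict]:
    # Carry the conversation context forward by counting turns.
    turns = context.get("turns", 0) + 1
    return f"You said: {text}", {**context, "turns": turns}


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    context: Dict = {}
    for utterance in (b"hello", b"what is my balance"):
        audio, context = handle_turn(utterance, context, fake_stt, fake_converse, fake_tts)
        print(audio, context)
```

Passing the services in as plain callables keeps the turn logic independent of the deployment approach, so the same flow could in principle sit behind any of the three architectures discussed in this paper.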