W H I T E P A P E R
© 2017 Persistent Systems Ltd. All rights reserved.
www.persistent.com
Voice agent pool:
A pool of voice agents waiting to engage with users.
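As a minimal sketch of how such a pool might behave, the Java fragment below keeps idle agents in a bounded queue and hands one out per incoming call. The class names PooledVoiceAgent and VoiceAgentPool, the agent ids, and the timeout parameter are assumptions made for illustration; they are not taken from the actual implementation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a voice agent (a SIP endpoint in this paper).
class PooledVoiceAgent {
    final String id;

    PooledVoiceAgent(String id) {
        this.id = id;
    }
}

// Hypothetical fixed-size pool: idle agents wait in a FIFO queue until a call arrives.
class VoiceAgentPool {
    private final BlockingQueue<PooledVoiceAgent> idle;

    VoiceAgentPool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 1; i <= size; i++) {
            idle.add(new PooledVoiceAgent("agent-" + i));
        }
    }

    // Returns an idle agent, or null if none becomes free within the timeout.
    PooledVoiceAgent acquire(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }

    // Puts the agent back in the pool once its call has ended.
    void release(PooledVoiceAgent agent) {
        idle.offer(agent);
    }

    // Number of agents currently waiting for a call.
    int idleCount() {
        return idle.size();
    }
}
```

A caller would acquire an agent when the IP-PBX routes a call to the pool and release it when the call ends; a null result means every agent is busy.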
Voice agent:
A voice agent is a SIP endpoint built on top of existing SIP libraries such as JSIP or peers-lib. It comprises an
API manager, which orchestrates API calls between multiple services offered by IBM Watson, and a dialog
assistant, which helps tailor conversation dialogs to the organization's needs.
Orchestrator:
The Orchestrator choreographs multiple Watson APIs, such as speech to text and text to speech.
Dialog Assistant:
Interfaces with the Watson conversation API and enriches its responses with business-specific intelligence and
data before passing the response back to the API manager.
For example, the Watson conversation API may be configured to return the following response:
Your next dental checkup is scheduled on {next_schedule_date}
Here, the dialog assistant can fetch the schedule date from a database and embed it in the dialogue.
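The substitution described here can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the actual dialog assistant: the class name DialogTemplate and the method fill are hypothetical, and a plain map stands in for the database lookup.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: fills {placeholder} tokens in a dialogue returned by the conversation service.
class DialogTemplate {
    // Matches tokens such as {next_schedule_date} and captures the name inside the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    // 'values' stands in for data the dialog assistant would fetch from a database.
    static String fill(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = values.get(matcher.group(1));
            // Unknown placeholders are left untouched rather than dropped.
            String replacement = (value != null) ? value : matcher.group(0);
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Given a map that binds next_schedule_date to a date string fetched for the caller, fill returns the sentence above with the date embedded in place of the token.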
Workflow:
1. Customer places a call to a PSTN number for a service request or assistance.
2. The PSTN number is configured with a SIP trunk, which forwards the audio streams from the traditional
phone system. How calls are forwarded is dictated by the origination scheme of your SIP trunk setup.
3. The SIP trunk connects with an organization-wide IP-PBX, which then forwards SIP traffic to endpoints
(voice agents).
4. A voice agent (SIP endpoint) forwards the audio stream to the Watson speech-to-text service over a
WebSocket connection and receives text transcriptions in real time.
5. When a pause in speech is detected, the transcriptions are sent to the Watson conversation service to
fetch the next dialogue.
6. The dialogue is parsed and processed further according to business requirements; the dialog assistant has
access to the dialogue transcriptions, intents, entities, confidence scores and alternative transcriptions.
Third-party API calls and database calls can also be incorporated at this stage to further enrich the dialogue.
7. Once the final dialogue is assembled, the text is sent to a Watson text-to-speech instance, and the resulting
audio stream is received and forwarded to the SIP endpoint.
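The voice agent's part of this workflow can be sketched as a chain of four stages. The Java fragment below is a simplified, synchronous illustration of one conversational turn: the class name CallTurnPipeline and its method are hypothetical, each stage is a plain function standing in for a remote service call, and no real Watson SDK call or WebSocket streaming is shown.

```java
import java.util.function.Function;

// Hypothetical one-turn pipeline; every stage is injected so it can be swapped or faked.
class CallTurnPipeline {
    private final Function<byte[], String> speechToText;    // caller audio -> transcript
    private final Function<String, String> conversation;    // transcript -> next dialogue
    private final Function<String, String> dialogAssistant; // dialogue -> enriched final text
    private final Function<String, byte[]> textToSpeech;    // final text -> audio for the caller

    CallTurnPipeline(Function<byte[], String> speechToText,
                     Function<String, String> conversation,
                     Function<String, String> dialogAssistant,
                     Function<String, byte[]> textToSpeech) {
        this.speechToText = speechToText;
        this.conversation = conversation;
        this.dialogAssistant = dialogAssistant;
        this.textToSpeech = textToSpeech;
    }

    // Handles one utterance, i.e. the audio collected up to a detected pause in speech.
    byte[] handleUtterance(byte[] callerAudio) {
        String transcript = speechToText.apply(callerAudio);
        String dialogue = conversation.apply(transcript);
        String finalText = dialogAssistant.apply(dialogue);
        return textToSpeech.apply(finalText); // audio to forward to the SIP endpoint
    }
}
```

Passing the stages in as functions keeps the orchestration independent of any one vendor, at the cost of hiding the streaming nature of the real speech-to-text connection.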
Advantages:
1. Built on top of an existing SIP stack and libraries
2. Integrates closely with the organization's own IP-PBX, such as Asterisk or 3CX, hosted on premises or in
the cloud. Most organizations already have some form of organization-wide PBX set up.
3. Scale the infrastructure and the number of voice agents up or down at any time
4. Multi-tenancy by connecting one or more direct inward dialing (DID) numbers, as defined in Asterisk
configuration files, to voice agents or by having a pool of voice agents waiting to engage with customers
5. Flexible plug-in architecture: Watson Speech to Text and Watson Text to Speech can be replaced with
other services as business requirements dictate. Voice agents are designed to be modular: one can extend
a voice agent and override its methods to use other services for speech to text or text to speech. For
example, by overriding the method void speak(String text), one can incorporate another TTS service or
SDK such as FreeTTS or eSpeak.
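The plug-in point in the last item might look like the following Java sketch. Only the signature void speak(String text) comes from the text above; the base class body, the TtsEngine interface, and the CustomTtsVoiceAgent subclass are assumptions for illustration, and no real FreeTTS or eSpeak API is called.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class; its real contents are not described in this paper.
class VoiceAgent {
    // Default behaviour: stands in for a call to the configured text-to-speech service.
    void speak(String text) {
        System.out.println("[default TTS] " + text);
    }

    // Called with the final dialogue text; delegates to speak(), which subclasses may override.
    void reply(String dialogue) {
        speak(dialogue);
    }
}

// Hypothetical adapter interface that a wrapper around another TTS engine would implement.
interface TtsEngine {
    byte[] synthesize(String text);
}

// Overrides speak(String) to route synthesis through the alternative engine.
class CustomTtsVoiceAgent extends VoiceAgent {
    private final TtsEngine engine;
    // Audio produced so far; a real agent would stream it to the SIP call instead.
    final List<byte[]> sentAudio = new ArrayList<>();

    CustomTtsVoiceAgent(TtsEngine engine) {
        this.engine = engine;
    }

    @Override
    void speak(String text) {
        sentAudio.add(engine.synthesize(text));
    }
}
```

Because reply() calls speak() through the base class, the rest of the agent stays unchanged when the engine is swapped.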