WHITEPAPER
© 2017 Persistent Systems Ltd. All rights reserved.
www.persistent.com
Components
User:
A phone on any network.
Twilio platform:
Cloud communications platform for building SMS, voice, and messaging applications on an API built for global
scale. The solution uses a global PSTN number and a SIP endpoint provided by Twilio.
API server:
Handles the incoming PSTN call and routes it to the voice agent for processing. The API server is hosted on a
computer with a publicly visible IP address.
Voice agent:
Receives the audio stream over SIP. Uses Watson speech-to-text to convert the audio stream into a transcript. The
transcript is passed to Watson Conversation to extract intents and entities. Based on the intent and entities, a
response is created. The voice agent is then responsible for sending voice data to Twilio, which forwards it to
the user's phone.
The voice agent is hosted on a computer with a publicly visible IP address. It can be colocated with the API server.
The component is configured with settings and credentials provided by the SIP provider, in this case Twilio.
However, any SIP provider can be used, including self-hosted SIP solutions.
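The voice agent's per-utterance processing can be sketched as a chain of stages. This is a minimal illustration, not the whitepaper's implementation: the class and field names (`VoiceAgentTurn`, `Understanding`, `transcribe`, `understand`, `respond`, `synthesize`) are hypothetical, and each stage is an injected callable, so no real Watson or SIP API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Understanding:
    """Intent and entities extracted from one transcript (hypothetical shape)."""
    intent: str
    entities: Dict[str, str]


@dataclass
class VoiceAgentTurn:
    """One caller utterance processed end to end.

    Every stage is injected; in a real deployment each callable would
    wrap the corresponding service client (an assumption, not shown here).
    """
    transcribe: Callable[[bytes], str]           # speech-to-text stage
    understand: Callable[[str], Understanding]   # conversation stage
    respond: Callable[[Understanding], str]      # builds the reply text
    synthesize: Callable[[str], bytes]           # text-to-speech stage

    def handle(self, audio_in: bytes) -> bytes:
        """Run the chain and return the reply audio to send back over SIP."""
        transcript = self.transcribe(audio_in)
        understanding = self.understand(transcript)
        reply_text = self.respond(understanding)
        return self.synthesize(reply_text)


if __name__ == "__main__":
    # Stub stages that treat bytes as UTF-8 text, for demonstration only.
    turn = VoiceAgentTurn(
        transcribe=lambda audio: audio.decode("utf-8"),
        understand=lambda text: Understanding(
            "greeting" if "hello" in text.lower() else "unknown", {}
        ),
        respond=lambda u: (
            "Hi, how can I help?" if u.intent == "greeting"
            else "Sorry, could you repeat that?"
        ),
        synthesize=lambda text: text.encode("utf-8"),
    )
    print(turn.handle(b"Hello there"))
```

Injecting the stages keeps the chain testable without network access and makes it straightforward to swap any one service for another.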
Speech-to-text flow:
Workflow:
1. The user dials the PSTN number provided by Twilio.
2. Twilio looks up the webhook (API server) configured for that number and makes a request to it.
3. The API server returns a TwiML response telling Twilio to dial the SIP endpoint.
4. Twilio dials the SIP endpoint and sets up a conference between the user and the voice agent.
5. The voice agent can then carry on full-duplex audio communication with the user's phone.
6. Audio data from the user's phone is passed through the following API chain: Watson speech-to-text -> Watson
Conversation -> Watson text-to-speech.
7. Audio from Watson text-to-speech is sent back to the user's phone.
8. The call is disconnected when either party hangs up.
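Steps 2 and 3 can be illustrated with a small webhook that answers Twilio's request with TwiML. This is a sketch under assumptions: the SIP URI, the function names, and the use of a bare WSGI callable are illustrative choices, not details from the whitepaper. The TwiML itself uses the standard `<Response>`, `<Dial>`, and `<Sip>` verbs and is built with Python's standard library only.

```python
import xml.etree.ElementTree as ET

# Hypothetical SIP address of the voice agent; replace with a real endpoint.
VOICE_AGENT_SIP_URI = "sip:voice-agent@example.com"


def build_dial_sip_twiml(sip_uri: str) -> str:
    """Return a TwiML document telling Twilio to dial the given SIP URI."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    sip = ET.SubElement(dial, "Sip")
    sip.text = sip_uri  # ElementTree escapes characters such as '&'
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body


def webhook_app(environ, start_response):
    """Minimal WSGI app: reply to every request with the dial-SIP TwiML."""
    twiml = build_dial_sip_twiml(VOICE_AGENT_SIP_URI).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "text/xml"), ("Content-Length", str(len(twiml)))],
    )
    return [twiml]


if __name__ == "__main__":
    # Serve locally for experimentation; Twilio needs a publicly reachable URL.
    from wsgiref.simple_server import make_server

    with make_server("", 8000, webhook_app) as server:
        server.serve_forever()
```

A production webhook would normally also validate that the request really came from Twilio; that check is omitted here for brevity.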
[Figure: Voice agent (raw audio buffer, processed audio buffer) connected to Watson Speech to Text over a WebSocket]
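The figure labels name a raw audio buffer and a processed audio buffer in front of the WebSocket connection. The source does not say what the processing step does. As one plausible reading, the sketch below assumes it re-chunks incoming audio into fixed-size frames before they are streamed. The class name `AudioChunker` and the frame size are hypothetical, and the WebSocket send itself is left out.

```python
from typing import List


class AudioChunker:
    """Collects raw audio bytes and releases fixed-size frames.

    Assumption: "processing" here means re-chunking only; a real agent
    might also resample or transcode the audio.
    """

    def __init__(self, frame_bytes: int) -> None:
        if frame_bytes <= 0:
            raise ValueError("frame_bytes must be positive")
        self.frame_bytes = frame_bytes
        self._raw = bytearray()  # plays the role of the raw audio buffer

    def push(self, data: bytes) -> List[bytes]:
        """Add raw audio and return every complete frame now available."""
        self._raw.extend(data)
        frames: List[bytes] = []
        while len(self._raw) >= self.frame_bytes:
            frames.append(bytes(self._raw[: self.frame_bytes]))
            del self._raw[: self.frame_bytes]
        return frames

    def flush(self) -> bytes:
        """Return any leftover partial frame, e.g. when the call ends."""
        rest = bytes(self._raw)
        self._raw.clear()
        return rest
```

Each frame returned by `push` would then be written to the speech-to-text WebSocket as a binary message, with `flush` called once when the call is disconnected.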