Voice Conference Manager Documentation
How the Telephone Calls Work
Documentation on how to install this system is in the README files distributed with the source code. There isn't any user manual, and since VCM uses speech technology, a user manual is pretty low priority at present.
Here's an overview of how a call works.
The caller, whom we call the "clerk," dials into the telephony server, and the call triggers the main Call Control XML (CCXML) script. The CCXML script then runs a few VoiceXML scripts, which use speech recognition and text-to-speech to ask the clerk their reason for calling.
If the clerk is attempting to make a conference call, the CCXML script hands the call off to another CCXML script, the "conference setup" script (and the original script exits). This CCXML script then starts a VoiceXML script to ask the clerk what phone numbers to place into the conference call. The clerk can either say a name, if the name is pre-registered in the system, or say a telephone number.
Once finished, the clerk hangs up and the CCXML script hands off the call to yet another CCXML script, the "conference manager" script (and exits). The clerk can use a web page to view the status of the call; more about that below.
The conference manager script does not directly interact with anyone; it's there to manage the overall conference, not individual calls. The script starts several CCXML "call leg" scripts, one for each person who will participate in the call. These "call leg" scripts each start several VoiceXML scripts to interact with the people who join the conference: to greet them, for example, or to tell them that the call is over.
When a participant exits the call, the call leg script managing that call informs the conference manager script and exits. If enough call legs terminate, the conference manager tells the remaining call legs to terminate their calls and then exits.
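The shutdown rule in the last paragraph can be sketched in a few lines of Python. This is only an illustration of the logic, not the project's CCXML code. The class name, the leg identifiers, and especially the `min_active_legs` threshold are assumptions, since the text above only says "enough" call legs.

```python
class ConferenceManager:
    """Tracks active call legs and decides when to end the conference."""

    def __init__(self, leg_ids, min_active_legs=2):
        # min_active_legs is an assumed threshold; the documentation
        # only says the conference ends when "enough" legs terminate.
        self.active = set(leg_ids)
        self.min_active_legs = min_active_legs
        self.finished = False

    def leg_exited(self, leg_id):
        """Record that a call leg has exited.

        Returns the legs that should now be told to terminate
        (an empty list while the conference keeps running).
        """
        self.active.discard(leg_id)
        if self.finished or len(self.active) >= self.min_active_legs:
            return []
        # Too few participants remain: end the whole conference.
        self.finished = True
        to_terminate = sorted(self.active)
        self.active.clear()
        return to_terminate


manager = ConferenceManager(["leg-1", "leg-2", "leg-3"])
first = manager.leg_exited("leg-1")   # two legs remain, conference continues
second = manager.leg_exited("leg-2")  # one leg remains, it is told to hang up
print(first, second, manager.finished)
```

With the assumed threshold of two, the first exit changes nothing, and the second exit ends the conference and names the one remaining leg to terminate.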
How the Browser-based Monitoring Works
As each call progresses, the CCXML scripts send call progress reports to a server, which we'll call the "information server." These reports are sent using HTTP POST requests, and contain the phone number being called, the name of the caller if known, and the progress of the call.
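To make the shape of such a report concrete, here is a minimal Python sketch that builds one as a form-encoded POST request. The real reports come from the CCXML scripts, so this only mimics them. The field names (`number`, `name`, `progress`), the URL, and the form encoding are all assumptions for illustration.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request


def build_progress_report(server_url, phone_number, progress, caller_name=None):
    """Build (but do not send) a POST request carrying one progress report."""
    # Field names are hypothetical; the documentation does not list them.
    fields = {"number": phone_number, "progress": progress}
    if caller_name is not None:
        fields["name"] = caller_name  # the name is only included if known
    body = urlencode(fields).encode("utf-8")
    return Request(
        server_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_progress_report(
    "http://localhost:8080/report",  # assumed address of the information server
    "+1 555 0100",
    "ringing",
    "Alice",
)
# Decode the body again to show what the server would receive.
decoded = parse_qs(request.data.decode("utf-8"))
print(request.get_method(), decoded)
```

Passing the request to `urllib.request.urlopen` would actually send it. The sketch stops short of that so it runs without a server.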
Here's a diagram of the information flow.
- The browser logs into the information server. The information server acts as an intermediary between the CCXML interpreter and the browser; it's the place where intelligence about the call will eventually reside. (Examples of "intelligence" include authorization, dynamic VoiceXML grammars, databases, and other functions.)
- As the call progresses, the CCXML interpreter sends updates to the information server.
- The server formats these updates from the CCXML interpreter and sends them to the browser.
Let's expand on that last step. Here's how the browser web page programming works.
- The Java applet connects to the information server; the port to which it connects is designated in the text of the HTML of the web page.
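The intermediary role of the information server described in this section can be sketched as a small relay. This is not the project's actual server. The class and method names are hypothetical, and the JSON line sent to the browser is an assumed format, since the documentation does not say how updates are formatted.

```python
import json


class InformationRelay:
    """Receives updates from the CCXML side and forwards them to browsers."""

    def __init__(self):
        # Each entry is a callable that delivers one line of text to a browser.
        self.browsers = []

    def browser_login(self, send):
        """Register a browser connection, represented here by a callable."""
        self.browsers.append(send)

    def ccxml_update(self, update):
        """Format one update and push it to every registered browser."""
        line = json.dumps(update, sort_keys=True)  # assumed wire format
        for send in self.browsers:
            send(line)
        return line


received = []
relay = InformationRelay()
relay.browser_login(received.append)  # stand-in for the applet's connection
relay.ccxml_update({"number": "5550100", "progress": "connected"})
print(received)
```

Here a plain list stands in for the browser connection, so the formatted update simply ends up in `received`.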
The current version will make calls and run a conference, but it deliberately has some built-in limitations that can be easily removed. Calls cannot last longer than a few minutes; this guards against runaway scripts. There's a long wish list of things to add to this technology; for example, a web page that lets a clerk set up and start the call entirely over the web. There's also very little, if any, security, and at present it's a single-user system: only one person can view the "monitor" web page at a time.
In other words, tune in later for further updates, or contact us if you'd like to join and contribute to the project. If you're interested in this project for commercial use and need consulting, please contact Disaggregate.
Here's a summary of the various technologies that are currently used by the project:
|CCXML: Call Control XML|CCXML is a W3C specification used to control telephony servers. CCXML lets you connect to phone calls (inbound and outbound) and take specific actions. We use it to receive calls, make outbound calls to conference attendees, and exchange information with the information server.|
|VoiceXML|VoiceXML is a W3C specification used to control speech recognition, text-to-speech, and voice biometrics (speaker identification/speaker verification). We use it to interact with the clerk who sets up the call and the conference attendees.|
|Python|Python is a scripting language. The "information server" that connects the output of the CCXML interpreter to the browser web pages is written in Python.|
In the discussion above, the "server" portion that pushes data from CCXML to the web page is in Python. "push2web," a separate package available for download, also pushes data from CCXML to the web page, but via a Java servlet. It's ideal for use with Voxeo's Prophecy 2006 package, which includes a servlet-capable web page server. If you have your own web page server and it supports Java servlets, you won't need a separate server just to tie the CCXML and the clerk's browser together. See the documentation.