Voice Conference Manager

Voice Conference Manager Documentation

How the Telephone Calls Work

Documentation on how to install this system is in the README files distributed with the source code. There is no user manual yet; because VCM is driven by spoken prompts, a user manual is a low priority at present.

Here's an overview of how a call works.

[Figure: Hierarchy of CCXML and VoiceXML modules]

The caller, whom we call the "clerk," dials into the telephony server, and the call triggers the main Call Control XML (CCXML) script. The CCXML script then runs a few VoiceXML scripts, which use speech recognition and text-to-speech to ask the clerk their reason for calling.

If the clerk is attempting to make a conference call, the main CCXML script hands the call off to another CCXML script, the "conference setup" script, and then exits. The conference setup script starts a VoiceXML script to ask the clerk which phone numbers to place into the conference call. The clerk can either say a name, if that name is pre-registered in the system, or say a telephone number.

Once finished, the clerk hangs up and the CCXML script hands off the call to yet another CCXML script, the "conference manager" script (and exits). The clerk can use a web page to view the status of the call; more about that below.

The conference manager script does not directly interact with anyone; it's there to manage the overall conference, not individual calls. The script starts several CCXML "call leg" scripts, one for each person who will participate in the call. Each "call leg" script starts several VoiceXML scripts to interact with the person who joins the conference: to greet them, for example, or to tell them that the call is over.

When a participant exits the call, the call leg script managing that call informs the conference manager script and exits. If enough call legs terminate, the conference manager tells the remaining call legs to terminate their calls and then exits.
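The shutdown rule in the last paragraph can be modeled in a few lines. This is only a sketch in Python, not the project's CCXML: the class name, the method names, and in particular the `min_active_legs` threshold are assumptions made for illustration, since the text above does not say how many exits count as "enough."

```python
from dataclasses import dataclass, field


@dataclass
class ConferenceManager:
    """Toy model of the conference manager's bookkeeping (not real CCXML)."""

    # Assumed threshold: end the conference when fewer legs than this remain.
    min_active_legs: int = 2
    active_legs: set = field(default_factory=set)
    finished: bool = False

    def leg_started(self, leg_id):
        """Record that a call leg script was started for one participant."""
        self.active_legs.add(leg_id)

    def leg_exited(self, leg_id):
        """Record one leg's exit; return the legs that must now be told to hang up."""
        self.active_legs.discard(leg_id)
        if self.finished or len(self.active_legs) >= self.min_active_legs:
            return []
        # Too few participants remain: end the conference for everyone left.
        to_terminate = sorted(self.active_legs)
        self.active_legs.clear()
        self.finished = True
        return to_terminate
```

With three legs and the assumed threshold of two, the first exit returns an empty list and the second exit returns the one remaining leg, after which the manager is marked finished.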

How the Browser-based Monitoring Works

As each call progresses, the CCXML scripts send call progress reports to a server, which we'll call the "information server." These reports are sent as HTTP POST requests and contain the phone number being called, the name of the person being called (if known), and the progress of the call.
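To make the report format concrete, here is a minimal sketch of building and decoding such a POST request with the Python standard library. The field names (`call`, `number`, `name`, `status`), the form-encoded body, and the example URL are assumptions for illustration; the actual reports are sent by the CCXML scripts, and their exact format is defined in the VCM source code.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_progress_report(url, call_id, phone_number, status, callee_name=None):
    """Build (but do not send) a form-encoded POST describing one call leg."""
    fields = {"call": call_id, "number": phone_number, "status": status}
    if callee_name is not None:
        fields["name"] = callee_name  # included only when the name is known
    body = urlencode(fields).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers, method="POST")


def parse_progress_report(body):
    """Decode a form-encoded report body into a plain dict of strings."""
    parsed = parse_qs(body.decode("ascii"))
    return {key: values[0] for key, values in parsed.items()}


# Hypothetical usage; the URL is a placeholder, not a real VCM endpoint.
report = build_progress_report(
    "http://localhost:8080/report", "conf1", "+15551230000", "ringing", "Pat Lee"
)
decoded = parse_progress_report(report.data)
```

Passing the resulting `Request` to `urllib.request.urlopen` would send it; the sketch stops short of that so it can run without a server.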

Here's a diagram of the information flow.

Flow of Data between Server, CCXML Interpreter, and Browser
  1. The browser logs into the information server. The information server acts as an intermediary between the CCXML interpreter and the browser; it's the place where intelligence about the call will eventually reside. (Examples of "intelligence" include authorization, dynamic VoiceXML grammars, databases, and other functions.)
  2. As the call progresses, the CCXML interpreter sends updates to the information server.
  3. The server formats these updates from the CCXML interpreter and sends them to the browser.
  4. The browser receives the updates in a Java applet, which passes them to the browser's JavaScript interpreter. The JavaScript updates the tables that display the status of each leg of the call.
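Steps 2 and 3 amount to a small relay: a report comes in, and one formatted line of text goes out to the viewer. The sketch below models that in Python under stated assumptions: the class name, the dict-shaped report, and the tab-separated line format are all hypothetical, and a text stream stands in for the network connection to the browser.

```python
import io


class InformationServer:
    """Toy relay: takes report fields in, writes one text line out per update."""

    def __init__(self, viewer):
        # `viewer` is any writable text stream standing in for the browser
        # connection; a real server would write to a network socket instead.
        self.viewer = viewer

    @staticmethod
    def format_update(report):
        # Assumed wire format: call, number, name, status separated by tabs,
        # one update per line. The name field is empty when it is not known.
        fields = [
            report["call"],
            report["number"],
            report.get("name", ""),
            report["status"],
        ]
        return "\t".join(fields) + "\n"

    def handle_report(self, report):
        """Format one incoming report and push it to the viewer immediately."""
        self.viewer.write(self.format_update(report))
        self.viewer.flush()


# Hypothetical usage with an in-memory stream in place of a browser.
out = io.StringIO()
server = InformationServer(out)
server.handle_report({"call": "conf1", "number": "5551230000", "status": "ringing"})
```

Sending one complete line per update keeps the receiving side simple, because it can process each line as soon as it arrives.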

Let's expand on that last step. Here's how the browser web page programming works.

Browser Internal Data Flows
  1. When the clerk or other user loads the web page, the browser loads a Java applet as well as some JavaScript utilities.
  2. The Java applet connects to the information server; the port to which it connects is designated in the text of the HTML of the web page.
  3. As the Java applet receives each line of text from the information server, the Java applet calls a VCM JavaScript function and passes it that line of text.
  4. The JavaScript utilities modify the display in the browser using the standard DOM. For each update, they add a table for that particular call if one does not exist yet; or add a row to the table if that caller isn't in it yet; or simply update a cell in an existing row to show the current status of that caller. Note that the page itself is plain HTML and CSS, so the display looks like an ordinary web page, not a Flash-based window or Java applet.
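The three cases in step 4 (new table, new row, updated cell) can be sketched as a single function. The project does this in JavaScript against the DOM; the version below is a Python model that uses nested dicts in place of HTML tables, and the tab-separated line format (call, number, name, status) is an assumption for illustration.

```python
def apply_update(tables, line):
    """Apply one update line to the display model and report what changed.

    `tables` maps a call id to its table; each table maps a phone number
    to a (name, status) row.
    """
    call_id, number, name, status = line.rstrip("\n").split("\t")
    if call_id not in tables:
        # First update for this call: create its table with a single row.
        tables[call_id] = {number: (name, status)}
        return "added table"
    rows = tables[call_id]
    if number not in rows:
        # Known call, new participant: add a row.
        rows[number] = (name, status)
        return "added row"
    # Known participant: overwrite the row with the latest status.
    rows[number] = (name, status)
    return "updated cell"


# Hypothetical usage with three updates for one conference.
tables = {}
first = apply_update(tables, "conf1\t5551230000\tPat Lee\tdialing\n")
second = apply_update(tables, "conf1\t5559870000\t\tdialing\n")
third = apply_update(tables, "conf1\t5551230000\tPat Lee\tconnected\n")
```

Keying rows by phone number means a repeated update for the same participant always lands on the same row, which is what lets the page show the current status rather than a growing log.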

Current Version

The current version will make calls and run a conference, but it deliberately has some built-in limitations that can be easily removed. Calls cannot last longer than a few minutes; this guards against runaway scripts. There's a long wish list of things to add to this technology; for example, a web page that lets a clerk set up and start the call entirely over the web. There's also very little security, if any, and at present it's a single-user system: only one person at a time can view the "monitor" web page.

In other words, tune in later for further updates, or contact us if you'd like to join and contribute to the project. If you're interested in this project for commercial use and need consulting, please contact Disaggregate.

Technology Summary

Here's a summary of the various technologies that are currently used by the project:

CCXML (Call Control XML): CCXML is a W3C specification used to control telephony servers. CCXML lets you connect to phone calls (inbound and outbound) and take specific actions. We use it to receive calls, to make outbound calls to conference attendees, and to send information to and receive information from the information server.
VoiceXML: VoiceXML is a W3C specification used to control speech recognition, text-to-speech, and voice biometrics (speaker identification/speaker verification). We use it to interact with the clerk who sets up the call and with the conference attendees.
Python: Python is a scripting language. The "information server" that connects the output of the CCXML interpreter to the browser web pages is written in Python.
Java: Java is an OS-independent programming language that can also be used to write programs that run inside browsers. The "applet" that runs inside the browser and calls JavaScript functions is written in Java.
JavaScript: Despite the similar name, JavaScript is a separate language from Java, not a lighter version of it. Web browsers can use JavaScript to directly manipulate the information displayed on the screen, manage buttons, check information entered into forms, and so on. We use it to refresh the display in the browser as the status of the call changes.


In the discussion above, the "server" portion that pushes data from CCXML to the web page is written in Python. "push2web," a separate package available for download, also pushes data from CCXML to the web page, but via a Java servlet. It's ideal for use with Voxeo's Prophecy 2006 package, which includes a servlet-capable web server. If you have your own web server and it supports Java servlets, you won't need a separate server just to tie the CCXML interpreter and the clerk's browser together. See the documentation.


The Voice Conference Manager project releases its files via SourceForge.