Optimizing User Interfaces In Speech Synthesizers

Any application that requires users to enter large amounts of data or adjust many parameters must let them do so in a way that is both straightforward and efficient. While the current version of HLsyn does a decent job of accepting parameter input, there are several areas for improvement, and addressing them is one of the main focus areas of my project. The current method of entering parameters in HLsyn uses a table-based entry system, as seen below:

The table works by having the user enter values into its rows and columns. The columns represent the speech parameter being affected, and the rows represent the time at which the parameter is affected. Text in bold marks what is known as a pivot point. When a value is selected as a pivot, HLsyn determines the parameter values at the times between that pivot and the next pivot point. An example of this can be seen below:

As seen in the screenshot, the parameter being affected is ag, which represents the glottis: “the part of the larynx consisting of the vocal cords and the slitlike opening between them. It affects voice modulation through expansion or contraction” (Google Dictionary). The screenshot also shows the ag parameter being set at times 0 ms, 500 ms, 1000 ms, and 1500 ms. The values at 0 ms and 1500 ms are in bold and are thus selected as pivot points, whereas the values at 500 ms and 1000 ms are not bold and are determined by the pivot points. To compute the numeric values of the ag parameter at 500 ms and 1000 ms, HLsyn uses basic linear interpolation.

While this may seem straightforward, there are some critical pitfalls. The first is that you have to be very careful about which values you select as pivot points: sometimes HLsyn seems to deselect a pivot at random, and if you select a pivot point and then realize you made an error, it has already changed multiple values you didn’t want altered. The second issue is how time is handled. Currently, to add a new time at which parameters are affected, the user must go into the edit menu, where they are presented with a box to insert a time value:

While this works in theory, once a user has entered a time value there is no way to change it without deleting the entire row of speech parameter values they may have entered. A potential solution to these problems is a new input method that is in many ways a hybrid table system. Users still enter speech parameters in a way that will feel very familiar from the table, but instead of table cells, a series of numeric input boxes is used to enter speech parameter values, and check boxes next to the input boxes let the user easily select or deselect pivot points. A proposed design for this setup can be seen below:

A menu option would allow the user to insert and delete rows in the same way they are accustomed to in the current version of HLsyn. With the hybrid table, however, users can easily change the time at which the parameters are affected, using the field on the far left, without needing to delete the row they want to modify. This also gives the user much greater manual control over when and how the speech parameters are affected. The check boxes make it easier than ever to choose which points will be pivot points and which won’t. If the current selection doesn’t allow a pivot point to be created, the check box will automatically disable itself, and then re-enable itself when a different selection is made, if needed. Overall, the new hybrid table system will bring familiar controls with a more efficient and frustration-free workflow. In the next post on new input methods, we will focus on arguably the biggest update: a graphical way of drawing speech graphics to represent speech parameter values.
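To make the hybrid table idea a bit more concrete, here is a minimal sketch in Python of how one parameter's rows could be modeled. It is not HLsyn's actual code: the names `Row`, `fill_between_pivots`, and `set_time` are hypothetical, and the sketch assumes every pivot row already has a value. It shows the two behaviors discussed in this post: non-pivot values are filled in by linear interpolation between neighboring pivots, and a row's time can be edited in place without deleting the row.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    """One row of the hybrid table for a single parameter (hypothetical model)."""
    time_ms: float
    value: Optional[float] = None  # None until entered or interpolated
    is_pivot: bool = False         # state of the check box next to the input box


def fill_between_pivots(rows: List[Row]) -> None:
    """Linearly interpolate non-pivot values between neighboring pivots.

    Sorts the list by time in place. Assumes every pivot row has a value.
    """
    rows.sort(key=lambda r: r.time_ms)
    pivots = [r for r in rows if r.is_pivot]
    for left, right in zip(pivots, pivots[1:]):
        span = right.time_ms - left.time_ms
        for row in rows:
            # Only rows strictly between the two pivots are recomputed.
            if left.time_ms < row.time_ms < right.time_ms and not row.is_pivot:
                elapsed = row.time_ms - left.time_ms
                row.value = left.value + (right.value - left.value) * elapsed / span


def set_time(rows: List[Row], index: int, new_time_ms: float) -> None:
    """Change a row's time in place, keeping its pivot state, then refresh values."""
    rows[index].time_ms = new_time_ms
    fill_between_pivots(rows)
```

For example, with pivots at 0 ms and 1500 ms (the numeric values here are made up purely for illustration), `rows = [Row(0, 0.0, True), Row(500), Row(1000), Row(1500, 30.0, True)]` followed by `fill_between_pivots(rows)` fills in the rows at 500 ms and 1000 ms, and `set_time(rows, 1, 750)` moves the second row to 750 ms and recomputes its value without touching the pivots.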

