Design Specifications

Goal/Requirement (justification listed beneath each)
5 independent models
  • 1 for each American English vowel
  • Used as a teaching aid
Smooth cylindrical exterior
  • Easy handling
  • Easy packaging
Translucent Material
  • Interior contours visible
Approximately 17 cm long
  • Average length of male vocal tract
  • Will provide accuracy of sound
Electro-larynx input
  • Emulates vibration of air particles caused by vibration of vocal folds
Similar formant frequencies (compared to human speech sample)
  • Assurance/measure of accuracy

For the project as a whole to be considered successful, the models must provide a useful function to the end user. This will be achieved when the models meet or exceed certain design requirements. These criteria are divided into two categories: appearance requirements and auditory requirements. The two categories could also be called manufacturing requirements and testing requirements.

Appearance/Manufacturing Requirements

The goal of this Senior Capstone Project is to design and build five physical models of the human vocal tract for American English vowels. Most importantly, each model must be a single solid unit. The models may be either 3D printed or carved from a solid material, most likely a resin, to produce a smooth cylindrical exterior and a shaped interior. The first function of the models is to provide a visual representation of the shape the vocal tract takes when producing these sounds. Therefore, I have chosen a building material that is translucent, so the end user can see the shape of the vocal tract carved into the block of resin. This is essential because the vocal tract is, for the most part, concealed from view. The study of speech production relies heavily on computer programs, mathematics, and spectrogram analysis, and after a while these images begin to lose their context. My translucent models will let users see the vocal tract and identify the constrictions and openings that create the features they observe in the simulations and spectrograms.

Each model will be no more than approximately 17 centimeters long. The final length of a model will depend on whether it represents a male or female speaker: male vocal tracts are, on average, longer than female vocal tracts, 16.9 centimeters compared to 14.1 centimeters. Given the required size and material, each model should be relatively lightweight and therefore easy to carry. The cylindrical shape and low weight will be crucial to the models’ role in a classroom setting. The solid cylindrical shape will allow for easy storage and relocation, and the low weight will allow instructors and students to hold and maneuver the models with little difficulty.
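As a rough illustration of why length matters for the accuracy of the sound, the vocal tract can be idealized as a uniform tube that is closed at the glottis and open at the lips, whose resonances fall at odd multiples of c / (4L). This is only a sketch: the uniform-tube idealization, the assumed speed of sound (about 350 m/s), and the function name are assumptions for illustration and are not part of the design itself. Real vowel shapes shift these resonances considerably.

```python
# Sketch: resonances of an idealized uniform tube (closed at one end, open at the other).
# Assumption: speed of sound of about 350 m/s (35000 cm/s) in warm, moist air.
SPEED_OF_SOUND_CM_PER_S = 35000.0


def uniform_tube_resonances(length_cm, count=3, speed=SPEED_OF_SOUND_CM_PER_S):
    """Return the first `count` resonances in Hz using F_n = (2n - 1) * c / (4 * L)."""
    if length_cm <= 0:
        raise ValueError("length_cm must be positive")
    return [(2 * n - 1) * speed / (4.0 * length_cm) for n in range(1, count + 1)]


if __name__ == "__main__":
    # Lengths taken from the specification above.
    for label, length_cm in (("male", 16.9), ("female", 14.1)):
        resonances = uniform_tube_resonances(length_cm)
        print(label, [round(f) for f in resonances])
```

Under these assumptions, the shorter tube yields higher resonances than the longer one, which is why the model length has to be fixed before the interior contours are designed.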

Auditory/Testing Requirements

The second function these models will serve is to provide an accurate auditory representation of each vowel. That is to say, the models should produce a sound so similar to a human voicing the same vowel that a listener would not be able to tell model from human. To accomplish this, the models must be designed to take an input signal generated by an electro-larynx. An electro-larynx is a small device often used by laryngectomy patients. To produce voiced speech sounds we vibrate our vocal folds, also known as vocal cords (the opening between them is the glottis), by forcing air from our lungs up through the trachea. Laryngectomy patients no longer have this ability, so the electro-larynx generates the vibrations for them. When turned on and placed against the throat, the electro-larynx vibrates, and users can speak by moving their mouths to form words normally. The electro-larynx supplies the vibration needed to generate speech by setting the air particles in the vocal tract into vibration. This vibration is what I will use as the sound source for speech production in my models. Each model will have a “glottis end” and a “lips end,” or an input and an output. The model interior should be shaped so that, when the electro-larynx is placed at the glottis end and turned on, the user hears the sound of the vowel from the lips end. The sound generated by the model should be nearly indistinguishable from a human speech sample of the same sound. This can be tested informally with a simple blind listening test. In a more formal analysis, the formant frequencies will be measured and compared to widely accepted published measurements.
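The formal comparison could be scripted along the following lines. This is a minimal sketch, not the project's actual test procedure: the 10 percent tolerance, the function names, and every frequency value shown are placeholders chosen for illustration and would need to be replaced with the measured formants and the published reference values the project adopts.

```python
# Sketch: compare measured formant frequencies against reference values.
# Assumptions: the tolerance and all numeric values below are placeholders.

def percent_deviation(measured_hz, reference_hz):
    """Signed deviation of a measured frequency from its reference, in percent."""
    return 100.0 * (measured_hz - reference_hz) / reference_hz


def compare_formants(measured, reference, tolerance_pct=10.0):
    """Return (label, deviation in percent, within tolerance) for each formant."""
    if len(measured) != len(reference):
        raise ValueError("measured and reference must list the same formants")
    results = []
    for index, (m, r) in enumerate(zip(measured, reference), start=1):
        deviation = percent_deviation(m, r)
        results.append((f"F{index}", deviation, abs(deviation) <= tolerance_pct))
    return results


if __name__ == "__main__":
    reference_example = [700.0, 1100.0, 2500.0]  # placeholder reference F1-F3 (Hz)
    measured_example = [735.0, 1045.0, 2600.0]   # placeholder model measurement (Hz)
    for label, deviation, ok in compare_formants(measured_example, reference_example):
        print(f"{label}: {deviation:+.1f}% {'pass' if ok else 'fail'}")
```

Reporting a signed percentage for each formant, rather than a single pass/fail flag, shows whether a model's resonances sit consistently too high or too low, which hints at whether the length or the interior contour needs adjusting.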