Efficient Speech Encoding/Decoding with 8-bit Microcontrollers
Software IP and 8-bit micros simplify speech recording and playback
Speech recording and playback is an increasingly common feature of applications including security systems, electronic toys, home appliances, automotive ‘infotainment’ systems, safety equipment and even elevators. Companies such as Toshiba are now actively developing software IP that provides high-quality ‘building blocks’ for commonly used functionality and can therefore significantly shorten the time the designer needs to spend on software development.
Toshiba has developed dedicated speech IP ‘building blocks’ that provide the key functionality needed for the compression, storage and decompression of voice messages. Furthermore, this speech IP does not require the designer to choose high-end, expensive microcontrollers. It has been specifically designed to run on cost-effective 8-bit microcontrollers by making use of integrated peripherals and interfaces such as ADCs, DACs and PWM (Pulse Width Modulation) capabilities. This differs significantly from the conventional way of adding speech recording and playback functionality to a product, which has typically relied on a dedicated DSP speech IC. In the majority of cases, the application will also feature a microcontroller for system control (for instance, to control a keypad or a display), meaning that the conventional solution comprises a minimum of two chips.
Figure 1 - Reduced System Cost
As Figure 1 illustrates, using speech software IP and a cost-effective 8-bit microcontroller rather than a dedicated speech hardware IC reduces the number of components and the cost of the overall Bill Of Materials (BOM).
Speech encode and decode
A number of different techniques are used to encode and synthesise speech. These techniques can be grouped into two categories, namely waveform encoding and analysis by synthesis. The algorithm used in waveform encoding is relatively simple, which makes both the encoding (analysis) and the playback (synthesis) stages easy to implement. PCM (Pulse Code Modulation) is the most common method of encoding an analogue voice signal into a digital bit stream. PCM works by first sampling the amplitude of the voice signal at regular intervals, a process known as PAM (Pulse Amplitude Modulation). Each PAM sample is then coded into a binary number consisting of zeros and ones. The voice signal can then be switched, transmitted and stored digitally.
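The quantisation step of PCM can be illustrated with a minimal C sketch. It assumes a unipolar input between 0 V and a reference voltage, and it uses floating point for readability, so it suits a PC-based experiment rather than an 8-bit target. The function name `pcm_encode` and its parameters are hypothetical and are not part of any Toshiba IP.

```c
#include <assert.h>
#include <stdint.h>

/* Quantise one PAM sample (an amplitude in volts, assumed to lie between
 * 0 V and vref) into an unsigned PCM code that is 'bits' wide. */
uint16_t pcm_encode(double volts, double vref, unsigned bits)
{
    assert(bits >= 1 && bits <= 16 && vref > 0.0);

    uint32_t levels = (uint32_t)1 << bits;  /* number of quantisation steps */

    if (volts < 0.0) {
        volts = 0.0;                        /* clip below the input range */
    }
    if (volts >= vref) {
        return (uint16_t)(levels - 1);      /* clip at full scale */
    }
    /* Truncate to the index of the step that contains the sample. */
    return (uint16_t)(volts / vref * (double)levels);
}
```

With a 5 V reference, a 2.5 V sample maps to code 128 at 8 bits and to code 512 at 10 bits, which shows how each extra bit doubles the amplitude resolution.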
In addition to the standard PCM method outlined above, a number of companies have produced their own derivatives. Toshiba, for example, has developed its own process known as the ‘half speech’ algorithm. This original and proprietary technology compares the previous voice sample with the current one and stores the difference between the two. As the name implies, a key benefit of the Toshiba algorithm is that this difference can be expressed in half the data size of the original sample. This in turn means that less memory is required, so the speech CODEC software IP can easily be implemented in an 8-bit microcontroller.
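The details of the ‘half speech’ algorithm are proprietary and are not given here. The general idea of storing differences in half the space can nevertheless be sketched with a generic differential PCM scheme. The example below is an assumption for illustration only and is not Toshiba's algorithm: it takes 8-bit samples, clamps each difference to a signed 4-bit value and packs two differences into each byte. The names `diff_encode` and `diff_decode` and the mid-scale start value of 128 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a difference to the signed 4-bit range -8..7. */
static int clamp4(int d)
{
    return d < -8 ? -8 : (d > 7 ? 7 : d);
}

/* Encode n 8-bit samples as 4-bit differences, two per output byte.
 * 'out' must hold (n + 1) / 2 bytes. Returns the bytes written. */
size_t diff_encode(const uint8_t *in, size_t n, uint8_t *out)
{
    assert(in != NULL && out != NULL);
    int pred = 128;                       /* assumed start value, shared with decoder */
    for (size_t i = 0; i < n; i++) {
        int d = clamp4((int)in[i] - pred);
        pred += d;                        /* track the value the decoder will rebuild */
        uint8_t code = (uint8_t)(d & 0x0F);
        if (i % 2 == 0) {
            out[i / 2] = (uint8_t)(code << 4);   /* high nibble first */
        } else {
            out[i / 2] |= code;                  /* then low nibble */
        }
    }
    return (n + 1) / 2;
}

/* Rebuild n samples from the packed 4-bit differences. */
void diff_decode(const uint8_t *in, size_t n, uint8_t *out)
{
    assert(in != NULL && out != NULL);
    int pred = 128;
    for (size_t i = 0; i < n; i++) {
        int code = (i % 2 == 0) ? (in[i / 2] >> 4) : (in[i / 2] & 0x0F);
        int d = (code & 0x08) ? code - 16 : code;   /* sign-extend the nibble */
        pred += d;
        out[i] = (uint8_t)pred;
    }
}
```

Because the encoder tracks the decoder's reconstructed value rather than the raw input, a clamped difference causes the output to lag a sudden jump instead of drifting permanently. Slowly varying signals such as speech are reproduced closely, while the stored data occupies half the space of the 8-bit samples.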
The hardware - microcontroller
For speech encoding, the input can come either from a PC-based development environment or from analogue values sampled by the ADC incorporated in the microcontroller. For playback, the application converts the decoded speech data back to an analogue signal by using either a PWM output or a DAC.
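The data path from ADC input to PWM output can be sketched as two small scaling steps. The example below assumes a 10-bit ADC reading, an 8-bit speech sample and a PWM timer whose period is a given number of counter ticks. Register access is deliberately left out because it is device specific, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reduce a 10-bit ADC reading (0..1023) to an 8-bit sample by
 * discarding the two least significant bits. */
uint8_t adc10_to_pcm8(uint16_t adc)
{
    assert(adc <= 1023);
    return (uint8_t)(adc >> 2);
}

/* Convert an 8-bit sample to a PWM high time in timer ticks, so that
 * sample 0 gives 0% duty and sample 255 gives 100% duty.
 * The result is rounded to the nearest tick. */
uint16_t pcm8_to_pwm_ticks(uint8_t sample, uint16_t period_ticks)
{
    return (uint16_t)(((uint32_t)sample * period_ticks + 127u) / 255u);
}
```

In a real design, the value returned by `pcm8_to_pwm_ticks` would be written to the timer's compare register once per sample period, and the PWM output would normally pass through a low-pass filter before reaching the speaker amplifier.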
To use Toshiba’s speech CODEC software IP, a number of suitable microcontrollers can be found in the company’s TLCS-870/C family of 8-bit devices, in addition to the TLCS-900 family of 16-bit micros. In both cases, the algorithm’s memory requirements are kept small, with low CPU load for efficient application control.
For illustration, let us consider the 8-bit TMP86FS49AFG, a block diagram of which appears in Figure 2.
As the diagram shows, this 8-bit device incorporates a 10-bit ADC as well as a PWM unit, five serial communication channels and a watchdog timer. In addition, the microcontroller is housed in a small P-QFP64 package, allowing designers to minimise PCB space, while memory includes a large 60kbytes of on-board Flash. The benefit of this large Flash memory is that it can provide the storage needed for some common speech synthesis phrases (‘Please leave your message after the tone’, for instance). Furthermore, the Flash is Silicon Storage Technology’s (SST) SuperFlash™ memory*, which can be programmed much faster than many alternative technologies, allowing designers to program the microcontroller’s on-board 60kbytes in less than five seconds. If even more speech data and memory are needed, 16-bit MCUs such as the TMP91FY42 offer up to 256kbytes of embedded Flash and 16kbytes of RAM.
As well as offering enough memory, in most cases, to eliminate the need for external memory, the on-board Flash provides a high level of pre- and post-production flexibility. This flexibility comes from the availability of three distinct programming modes, namely parallel mode, serial PROM mode and ISP (in-system programming) mode. The latter allows upgrades and fixes in the field under control of the user program, removing the need to switch off the output to an LCD panel or active output controls.
To demonstrate the efficiency of the speech compression/decompression algorithm, it is worth looking at the TMP86FS49 resources that the speech CODEC software IP uses in operation. The IP imposes a relatively low processing overhead and, of the 60kbytes of Flash available, uses only 200bytes for the encode software, 150bytes for the decode software and 300bytes for the combined encode/decode software. As a result, there is easily enough processing power and memory available for the microcontroller to execute many other application control tasks.
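To put these figures in context, the following sketch estimates how much speech fits in the Flash that remains once the codec is in place. Several inputs are assumptions rather than figures from Toshiba: 60kbytes is taken as 61440 bytes, the sampling rate is taken as 8 kHz, and the space needed by the rest of the application is ignored, so the result is an upper bound. The function name `speech_capacity_ms` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Estimate the speech duration, in milliseconds, that fits in the Flash
 * left over after 'reserved_bytes' have been set aside for code. */
uint32_t speech_capacity_ms(uint32_t flash_bytes, uint32_t reserved_bytes,
                            uint32_t sample_rate_hz, uint32_t bits_per_sample)
{
    assert(reserved_bytes <= flash_bytes);
    assert(sample_rate_hz > 0 && bits_per_sample > 0);

    uint64_t free_bits = (uint64_t)(flash_bytes - reserved_bytes) * 8u;
    uint64_t bits_per_second = (uint64_t)sample_rate_hz * bits_per_sample;

    return (uint32_t)(free_bits * 1000u / bits_per_second);
}
```

Under these assumptions, with 150 bytes reserved for the decode software, `speech_capacity_ms(61440, 150, 8000, 8)` gives about 7.7 seconds of uncompressed 8-bit speech, while halving the data size to 4 bits per sample raises this to about 15.3 seconds.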
Developing applications – the reference board
To help the designer evaluate the sound quality of the Toshiba proprietary speech CODEC and to simplify the development of applications using this IP, Toshiba has created a dedicated reference board. This board has a number of elements that will speed the evaluation and prototyping of applications, including key input, an LCD display, and connections for a microphone and an external speaker for speech recording and speech output, respectively. An RS-232C interface provides the connection to a host PC, while a NAND E²PROM is available for data storage.
* This product uses the SuperFlash® technology under licence of Silicon Storage Technology, Inc. SuperFlash® is a registered trademark of Silicon Storage Technology, Inc.
For further information, including the Toshiba Speech IP Product Overview and a TMP86FS49 Datasheet, please click here
Arrow Electronics, Inc is a global provider of products, services and solutions to industrial and commercial users of electronic components and enterprise computing solutions.