Speech enhancement and noise reduction techniques

Section 1

1 Introduction

Hearing aids are among the most important assistive devices for people with hearing loss. A hearing aid is a small electronic instrument that makes sounds louder and makes speech easier to hear and understand. It picks up sound waves with a tiny microphone, amplifies weaker sounds and sends them to the ear through a tiny speaker. With the microchips available today, hearing aids have become steadily smaller and their sound quality has improved significantly. Roughly 10% of the world population suffers from some degree of hearing loss, yet only a portion of these people use a hearing aid. This is due to several factors, including the stigma associated with wearing a hearing aid, customer dissatisfaction with devices that do not meet their expectations, and the cost of the new digital hearing aids [1]. Hearing loss is typically measured as the shift in auditory threshold, relative to that of a normal ear, for detecting a pure tone. Because the degree of loss differs from person to person, hearing aids offer many types of features to address individual needs. Table 1 shows the classification of degrees of hearing loss [2].

A hearing aid is an electronic device that makes sounds louder and can help to offset hearing loss. Its aim is to amplify sound signals in such a way that they become audible to the hearing-impaired person.

Classification of Hearing Loss    Hearing level
Normal hearing                    10 dB to 26 dB
Mild hearing loss                 27 dB to 40 dB
Moderate hearing loss             40 dB to 70 dB
Severe hearing loss               70 dB to 90 dB
Profound hearing loss             Greater than 90 dB

Table 1: Different degrees of Hearing Loss
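Table 1 can be turned into a simple lookup, which is handy when a simulation needs to label a hearing threshold. This is a minimal sketch under stated assumptions: the function name is hypothetical, thresholds are taken in dB as in the table, values below 10 dB are treated as normal, and because the boundaries at 40 dB and 70 dB appear in two rows, a boundary value is assigned to the milder category.

```python
# Minimal sketch of Table 1 as a lookup (hypothetical helper, not from the report).
# Assumption: boundary values shared by two rows (40 dB, 70 dB) go to the milder class.
BANDS = [
    (26.0, "Normal hearing"),
    (40.0, "Mild hearing loss"),
    (70.0, "Moderate hearing loss"),
    (90.0, "Severe hearing loss"),
]

def classify_hearing_loss(threshold_db: float) -> str:
    """Return the Table 1 category for a hearing threshold given in dB."""
    for upper, label in BANDS:
        if threshold_db <= upper:
            return label
    return "Profound hearing loss"  # greater than 90 dB

for level in (20, 35, 55, 80, 95):
    print(level, "dB ->", classify_hearing_loss(level))
```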

Originally, all hearing aids used analogue technology to process sound. The development of digital sound processing has since improved the efficiency of hearing aids. Nowadays, digital hearing aids are small enough to be hidden inside the ear and offer almost perfect sound reproduction.

Research on digital hearing aids has grown, and these devices now contain a small programmable computer capable of amplifying millions of different sound signals, improving the hearing ability of hearing-impaired people. The first digital hearing aids were launched in the mid-1980s, but these early models were somewhat impractical. About ten years later, the digital hearing aid became truly successful, with a small digital device placed either inside or discreetly behind the ear [3].

Today, digital technology is very much a part of daily life. Most households have a variety of digital products, such as telephones, video recorders and personal computers. Hearing aids have also benefited from the emergence of digital technology. Among the advantages of digital signal processing (DSP) is that it allows hands-free operation: the aid automatically adjusts the volume and pitch on its own. It performs thousands of adjustments per second, which reduces background noise and improves listening in noisy situations, sound quality, and support for multiple program settings [4]. The user can switch between a variety of programs for different listening situations.

2 Project Proposal-Aims & Objectives

  1. The project aims initially to study and document the details of the human ear, and the frequency response and characteristics of a normal ear compared with one suffering from various types of hearing loss.
  2. A number of advanced studies and designs have been carried out by many researchers during the past decades.
  3. The study should provide a comprehensive investigation of the field.
  4. In addition to meeting the general requirements for the final year project as set out in the module guide, the specific objectives are set out below:
  5. Once an in-depth understanding of the phenomenon of hearing loss has been achieved, the project will carry out both analysis and simulation of some of the techniques.
  6. The final aim is to provide one or more suggestions for possible hearing aid designs.
  7. These should be backed by appropriate simulation, using either a MATLAB or a C/C++ implementation.
  8. The final report should contain various methods of noise cancellation using fixed, programmable and adaptive filters suitable for hearing aids.

3 Detailed Background Reading.

I. Human Hearing

The human ear is an exceedingly complex organ. To make matters even more difficult, the information from two ears is combined in a perplexing neural network, the human brain. Keep in mind that the following is only a brief overview; there are many subtle effects and poorly understood phenomena related to human hearing.

Figure 2.1 illustrates the major structures and processes that comprise the human ear. The outer ear is composed of two parts: the visible flap of skin and cartilage attached to the side of the head, and the ear canal, a tube about 0.5 cm in diameter extending about 3 cm into the head. These structures direct environmental sounds to the sensitive middle and inner ear organs located safely inside the skull bones. Stretched across the end of the ear canal is a thin sheet of tissue called the tympanic membrane or ear drum. Sound waves striking the tympanic membrane cause it to vibrate. The middle ear is a set of small bones that transfer this vibration to the cochlea (inner ear), where it is converted to neural impulses. The cochlea is a liquid-filled tube roughly 2 mm in diameter and 3 cm in length. Although shown straight in Fig. 2.1, the cochlea is curled up and looks like a small snail shell. In fact, the word cochlea is derived from the Greek word for snail.

When a sound wave tries to pass from air into liquid, only a small fraction of the sound is transmitted through the interface, while the remainder of the energy is reflected. This is because air has a low mechanical impedance (low acoustic pressure and high particle velocity resulting from low density and high compressibility), while liquid has a high mechanical impedance. In less technical terms, it requires more effort to wave your hand in water than it does to wave it in air. This difference in mechanical impedance results in most of the sound being reflected at an air/liquid interface.

The middle ear is an impedance matching network that increases the fraction of sound energy entering the liquid of the inner ear. Fish, for example, do not have an ear drum or middle ear, because they have no need to hear in air. Most of the impedance conversion results from the difference in area between the ear drum (receiving sound from the air) and the oval window (transmitting sound into the liquid, see Fig 2.1). The ear drum has an area of about 60 mm², while the oval window has an area of roughly 4 mm². Since pressure is equal to force divided by area, this difference in area increases the sound wave pressure by about 15 times.
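The area-ratio argument can be checked numerically. This sketch uses only the two areas quoted above and assumes that the whole force on the ear drum is delivered to the oval window (it ignores the lever action of the middle-ear bones), so it reproduces the factor of about 15 and expresses it in decibels.

```python
import math

# Areas quoted in the text (mm^2).
EARDRUM_AREA_MM2 = 60.0
OVAL_WINDOW_AREA_MM2 = 4.0

# The same force spread over a smaller area gives a higher pressure: p = F / A.
pressure_ratio = EARDRUM_AREA_MM2 / OVAL_WINDOW_AREA_MM2
# A pressure (amplitude-like) ratio converts to decibels with 20*log10.
pressure_gain_db = 20 * math.log10(pressure_ratio)

print(f"pressure ratio = {pressure_ratio:.1f}")       # prints: pressure ratio = 15.0
print(f"pressure gain  = {pressure_gain_db:.1f} dB")  # prints: pressure gain  = 23.5 dB
```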

Contained within the cochlea is the basilar membrane, the supporting structure for about 12,000 sensory cells forming the cochlear nerve. The basilar membrane is stiffest near the oval window, and becomes more flexible toward the opposite end, allowing it to act as a frequency spectrum analyzer. When exposed to a high frequency signal, the basilar membrane resonates where it is stiff, resulting in the excitation of nerve cells close to the oval window. Likewise, low frequency sounds excite nerve cells at the far end of the basilar membrane. This makes specific fibres in the cochlear nerve respond to specific frequencies. This organization is called the place principle, and is preserved throughout the auditory pathway into the brain.

Another information encoding scheme is also used in human hearing, called the volley principle. Nerve cells transmit information by generating brief electrical pulses called action potentials. A nerve cell on the basilar membrane can encode audio information by producing an action potential in response to each cycle of the vibration. For example, a 200 hertz sound wave can be represented by a neuron producing 200 action potentials per second. However, this only works at frequencies below about 500 hertz, the maximum rate that neurons can produce action potentials. The human ear overcomes this problem by allowing several nerve cells to take turns performing this single task. For example, a 3000 hertz tone might be represented by ten nerve cells alternately firing at 300 times per second. This extends the range of the volley principle to about 4 kHz, above which the place principle is exclusively used.

Table 22-1 shows the relationship between sound intensity and perceived loudness. It is common to express sound intensity on a logarithmic scale, called decibel SPL (Sound Pressure Level). On this scale, 0 dB SPL is a sound wave power of 10⁻¹⁶ watts/cm², about the weakest sound detectable by the human ear. Normal speech is at about 60 dB SPL, while painful damage to the ear occurs at about 140 dB SPL.
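Because intensity is a power quantity, the decibel level is 10·log10 of the ratio between the intensity and the 10⁻¹⁶ W/cm² reference. A minimal sketch of the conversion in both directions (the function names are illustrative):

```python
import math

# Reference intensity from the text: 0 dB corresponds to 1e-16 W/cm^2.
I_REF_W_PER_CM2 = 1e-16

def intensity_to_db(intensity_w_per_cm2: float) -> float:
    """Intensity (W/cm^2) to level in dB relative to the 1e-16 W/cm^2 reference."""
    return 10 * math.log10(intensity_w_per_cm2 / I_REF_W_PER_CM2)

def db_to_intensity(level_db: float) -> float:
    """Inverse of intensity_to_db: level in dB back to W/cm^2."""
    return I_REF_W_PER_CM2 * 10 ** (level_db / 10)

print(db_to_intensity(60))   # normal speech: about 1e-10 W/cm^2
print(db_to_intensity(140))  # painful level: about 1e-2 W/cm^2
```

Each 10 dB step is therefore a tenfold change in intensity, which is why the scale can span the ear's enormous range with manageable numbers.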

The difference between the loudest and faintest sounds that humans can hear is about 120 dB, a range of one-million in amplitude. Listeners can detect a change in loudness when the signal is altered by about 1 dB (a 12% change in amplitude). In other words, there are only about 120 levels of loudness that can be perceived from the faintest whisper to the loudest thunder. The sensitivity of the ear is amazing; when listening to very weak sounds, the ear drum vibrates less than the diameter of a single molecule!

The range of human hearing is generally considered to be 20 Hz to 20 kHz, but hearing is far more sensitive to sounds between 1 kHz and 4 kHz. For example, listeners can detect sounds as low as 0 dB SPL at 3 kHz, but require 40 dB SPL at 100 hertz (an amplitude increase of 100). Listeners can tell that two tones are different if their frequencies differ by more than about 0.3% at 3 kHz. This increases to 3% at 100 hertz. For comparison, adjacent keys on a piano differ by about 6% in frequency.
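The figures in the last two paragraphs follow from the convention that a level difference of L dB corresponds to an amplitude ratio of 10^(L/20). The short check below reproduces them; the equal-tempered tuning assumed for the piano comparison is an assumption, not something stated in the text.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Amplitude (pressure) ratio for a level difference in dB: 10**(dB/20)."""
    return 10 ** (db / 20)

print(db_to_amplitude_ratio(120))  # about 1e6: faintest to loudest sound
print(db_to_amplitude_ratio(1))    # about 1.12: a 12% amplitude change
print(db_to_amplitude_ratio(40))   # about 100: 0 dB SPL vs 40 dB SPL

# Frequency discrimination and the piano comparison.
print(0.003 * 3000)                # about 9 Hz just-noticeable difference at 3 kHz
semitone = 2 ** (1 / 12) - 1       # adjacent keys, assuming equal temperament
print(f"{semitone:.1%}")           # 5.9%, i.e. roughly 6%
```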

The primary advantage of having two ears is the ability to identify the direction of the sound. Human listeners can detect the difference between two sound sources that are placed as little as three degrees apart, about the width of a person at 10 meters. This directional information is obtained in two separate ways. First, frequencies above about 1 kHz are strongly shadowed by the head. In other words, the ear nearest the sound receives a stronger signal than the ear on the opposite side of the head. The second clue to directionality is that the ear on the far side of the head hears the sound slightly later than the near ear, due to its greater distance from the source. Based on a typical head size (about 22 cm) and the speed of sound (about 340 meters per second), an angular discrimination of three degrees requires a timing precision of about 30 microseconds. Since this timing requires the volley principle, this clue to directionality is predominantly used for sounds less than about 1 kHz.
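The 30 microsecond figure can be reproduced with a simplified geometric model: for a distant source at angle θ from straight ahead, the far ear's extra path length is roughly d·sin θ, where d is the head width. This model is an assumption (it ignores sound diffracting around the head), but it gives the right order of magnitude using only the head size and speed of sound quoted above.

```python
import math

HEAD_WIDTH_M = 0.22     # typical head size from the text
SPEED_OF_SOUND = 340.0  # m/s, from the text

def interaural_delay_s(angle_deg: float) -> float:
    """Arrival-time difference between the ears for a distant source.

    Simplified model (an assumption): the extra path length is
    HEAD_WIDTH_M * sin(angle), ignoring diffraction around the head.
    """
    return HEAD_WIDTH_M * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND

print(f"{interaural_delay_s(3) * 1e6:.0f} us")   # about 34 us for a 3-degree shift
print(f"{interaural_delay_s(90) * 1e6:.0f} us")  # about 647 us, source directly to one side
```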

Both these sources of directional information are greatly aided by the ability to turn the head and observe the change in the signals. An interesting sensation occurs when a listener is presented with exactly the same sounds to both ears, such as listening to monaural sound through headphones. The brain concludes that the sound is coming from the centre of the listener’s head!

While human hearing can determine the direction a sound is from, it does poorly in identifying the distance to the sound source. This is because there are few clues available in a sound wave that can provide this information. Human hearing weakly perceives that high frequency sounds are nearby, while low frequency sounds are distant. This is because sound waves dissipate their higher frequencies as they propagate long distances. Echo content is another weak clue to distance, providing a perception of the room size. For example, sounds in a large auditorium will contain echoes at about 100 millisecond intervals, while 10 milliseconds is typical for a small office. Some species have solved this ranging problem by using active sonar. For example, bats and dolphins produce clicks and squeaks that reflect from nearby objects. By measuring the interval between transmission and echo, these animals can locate objects with about 1 cm resolution. Experiments have shown that some humans, particularly the blind, can also use active echo localization to a small extent.
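The echo figures translate into distances once each echo interval is taken as a round trip to a reflecting surface and back; that interpretation is an assumption. The same round-trip relation gives the timing precision a bat would need for 1 cm resolution in air. Dolphins echolocate in water, where sound travels at roughly 1500 m/s, so the `c` parameter would need changing for them.

```python
SPEED_OF_SOUND_AIR = 340.0  # m/s, as used elsewhere in the text

def range_from_echo(delay_s: float, c: float = SPEED_OF_SOUND_AIR) -> float:
    """Distance to a reflector from the round-trip echo delay (there and back)."""
    return c * delay_s / 2

def delay_resolution_for_range(resolution_m: float, c: float = SPEED_OF_SOUND_AIR) -> float:
    """Timing precision needed to resolve a given range difference."""
    return 2 * resolution_m / c

print(range_from_echo(0.100))  # about 17 m: large auditorium, ~100 ms echoes
print(range_from_echo(0.010))  # about 1.7 m: small office, ~10 ms echoes
print(f"{delay_resolution_for_range(0.01) * 1e6:.0f} us")  # about 59 us for 1 cm in air
```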

II. Causes of Hearing Loss:-

Hearing loss can result from damage to, or disruption of, any part of the hearing system. Causes range from wax blocking the ear canal, through age-related changes to the sensory cells of the cochlea, to brain damage.

Common causes of deafness in adults include presbyacusis (age-related hearing loss due to deterioration of the inner ear), side-effects of medication, acoustic neuroma (a tumour of the nerve which carries hearing signals) and Meniere’s disease.

Common causes of deafness in children include inherited conditions, infection during pregnancy, meningitis, head injury and glue ear (more correctly known as otitis media, where fluid builds up in the middle ear chamber and interferes with the passage of sound vibrations, generally as a result of viral or bacterial infection).

Common temporary causes include earwax, infection, glue ear and foreign body obstruction.

III. Noise & Hearing loss:-

Excessive exposure to noise is an important cause of a particular pattern of hearing loss, contributing to problems for up to 50 per cent of deaf people. Often people fail to realise the damage they’re doing to their ears until it’s too late.

Although loud music is often blamed (and MP3 players are said to be storing up an epidemic of deafness in years to come) research has also blamed tractors (for deafness in children of farmers), aircraft noise, sports shooting and even cordless telephones.

IV. Fundamentals of Digital Signal Processing (DSP):-

A ‘signal’ is a physical quantity that carries information and contains frequencies up to a known limiting value. There are various types of signal. They are:-

  • Continuous time and continuous amplitude.
  • Discrete time and continuous amplitude.
  • Discrete time and discrete amplitude.
  • Continuous time and continuous amplitude with uniform time steps.
  • Continuous time and discrete amplitude with uniform time steps.

The term ‘processing’ refers to a series or sequence of steps or operations performed in order to achieve a particular end. In general, ‘signal processing’ is used to extract particular information from a signal and to convert an information-carrying signal from one form to another. In digital signal processing, the operations are performed by computers, microprocessors and logic circuits, which is why it is termed ‘digital’. DSP has expanded rapidly over the last few years, alongside advances in computer technology and integrated circuit (IC) fabrication.

There are two main concepts in DSP: signals and systems.

A ‘signal’ is defined as any physical quantity that varies with one or more independent variables, such as time or space; mathematically, a signal is simply a function. Most signals of interest are continuous, or analogue, signals, which have a value at every instant of time. Before a computer can process such a signal, it must first be sampled to give a discrete-time signal, whose values at a discrete set of times can be stored in memory. The samples are then quantised to a set of discrete values, and the result, called a ‘digital signal’, can be processed further by logic circuits. Analogue signals are continuous valued, whereas digital signals are discrete valued. Discrete-time signals are usually signals whose independent variable is integer valued [5].

Types of Signals in brief:

  1. Continuous time and continuous amplitude: an analogue signal, which can take any value in a continuous range in both time and amplitude.
  2. Discrete time and continuous amplitude: a sampled signal whose time steps are uniformly spaced but whose amplitude can take any level.
  3. Discrete time and discrete amplitude: a signal quantised in amplitude with uniform time steps, so both amplitude and time are quantised. This type of signal, called a digital signal, is generally produced by an A/D converter.
  4. Continuous time and continuous amplitude with uniform time steps: produced by a sample-and-hold circuit, this signal can take a continuous range of amplitudes resulting from sampling an analogue signal, and is known as a sampled analogue or sampled-data signal.
  5. Continuous time and discrete amplitude with uniform time steps: produced by a D/A converter, which holds each output value between time samples.
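The sample-then-quantise chain described above can be sketched for a pure tone. The tone frequency, sampling rate and 3-bit resolution below are arbitrary illustrative choices, and the uniform rounding quantiser is an assumption; a real A/D converter's quantiser may differ in detail.

```python
import math

def sample_and_quantise(freq_hz, fs_hz, n_samples, bits, amplitude=1.0):
    """Sample a sine wave at fs_hz and round each sample to one of 2**bits uniform levels.

    Returns (sampled, quantised): the discrete-time/continuous-amplitude signal
    and the discrete-time/discrete-amplitude (digital) signal.
    """
    step = 2 * amplitude / (2 ** bits - 1)            # spacing between quantiser levels
    sampled = [amplitude * math.sin(2 * math.pi * freq_hz * n / fs_hz)
               for n in range(n_samples)]            # discrete time, continuous amplitude
    quantised = [-amplitude + step * round((s + amplitude) / step)
                 for s in sampled]                   # discrete time, discrete amplitude
    return sampled, quantised

sampled, quantised = sample_and_quantise(freq_hz=1000, fs_hz=8000, n_samples=8, bits=3)
for s, q in zip(sampled, quantised):
    print(f"{s:+.3f} -> {q:+.3f}")
```

The difference between the two columns is the quantisation error, which never exceeds half a level spacing; adding bits shrinks it.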

A ‘system’ is a device or algorithm that operates on an input sequence to produce an output sequence.

Simple systems can be connected together so that one system’s output becomes another system’s input. Systems can be interconnected in cascade, in parallel or with feedback.
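The three interconnections can be illustrated by modelling each system as a function from one sequence to another. The helper names are hypothetical; the feedback case is written as a recursive difference equation rather than as a generic combinator, because a feedback loop needs the previous output sample.

```python
from typing import Callable, List

Signal = List[float]
System = Callable[[Signal], Signal]

def scale(gain: float) -> System:
    """A simple memoryless system: y[n] = gain * x[n]."""
    return lambda x: [gain * v for v in x]

def delay(x: Signal) -> Signal:
    """A unit delay: y[n] = x[n-1], with zero initial state."""
    return [0.0] + x[:-1]

def cascade(first: System, second: System) -> System:
    """The output of the first system becomes the input of the second."""
    return lambda x: second(first(x))

def parallel(a: System, b: System) -> System:
    """Both systems see the same input; their outputs are added."""
    return lambda x: [ya + yb for ya, yb in zip(a(x), b(x))]

def feedback(a: float) -> System:
    """Output fed back through a delay and gain: y[n] = x[n] + a * y[n-1]."""
    def run(x: Signal) -> Signal:
        y, prev = [], 0.0
        for v in x:
            prev = v + a * prev
            y.append(prev)
        return y
    return run

impulse = [1.0, 0.0, 0.0, 0.0]
print(cascade(scale(2.0), delay)(impulse))   # [0.0, 2.0, 0.0, 0.0]
print(parallel(scale(1.0), delay)(impulse))  # [1.0, 1.0, 0.0, 0.0]
print(feedback(0.5)(impulse))                # [1.0, 0.5, 0.25, 0.125]
```

Note how the feedback system's response to a single impulse never becomes exactly zero, which is the infinite-impulse-response behaviour that distinguishes it from the other two.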

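A ‘discrete-time system’ can be used when analogue signals are converted into discrete-time signals, processed in software, and then converted back into analogue signals. Provided the sampling theorem is satisfied and the quantisation is fine enough, the error introduced by this conversion can be made very small. [6]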


Sampling is one of the most important operations in signal processing. It is the process of measuring an analogue signal at discrete points in time, and it is used in both digital signal processing and communication.

Advantages of Digital representation of analogue signal:

  1. Robustness to noise, allowing more bits per second to be sent reliably.
  2. Use of flexible processing elements.
  3. Reliable equipment.
  4. Use of complex algorithms.

The Sampling Theorem:

When sampling an analogue signal, the sampling frequency must be greater than twice the highest frequency component of the analogue signal; only then can the original signal be reconstructed from the sampled version.
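What goes wrong when the theorem is violated can be shown directly: a tone above half the sampling rate produces exactly the same samples as a lower-frequency tone (aliasing). The 10 kHz sampling rate and the 7 kHz and 3 kHz tones below are illustrative choices.

```python
import math

def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency of a tone f_hz after sampling at fs_hz (folded into 0..fs/2)."""
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))

fs = 10_000.0  # sampling rate (Hz), chosen for illustration
tone_7k = [math.cos(2 * math.pi * 7000 * n / fs) for n in range(20)]
tone_3k = [math.cos(2 * math.pi * 3000 * n / fs) for n in range(20)]

# 7 kHz is above fs/2 = 5 kHz, so its samples are indistinguishable from a 3 kHz tone.
print(max(abs(a - b) for a, b in zip(tone_7k, tone_3k)))  # tiny: floating-point rounding only
print(alias_frequency(7000, fs))  # 3000.0
print(alias_frequency(3000, fs))  # 3000.0: below fs/2, no aliasing
```

Once sampled, nothing distinguishes the two tones, which is why the anti-aliasing low-pass filter must act before the A/D converter.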

Speech Recognition:

The automated recognition of human speech is immensely more difficult than speech generation. Speech recognition is a classic example of things that the human brain does well, but digital computers do poorly. Digital computers can store and recall vast amounts of data, perform mathematical calculations at blazing speeds, and do repetitive tasks without becoming bored or inefficient. Unfortunately, present day computers perform very poorly when faced with raw sensory data. Teaching a computer to send you a monthly electric bill is easy. Teaching the same computer to understand your voice is a major undertaking.

Digital Signal Processing generally approaches the problem of voice recognition in two steps: feature extraction followed by feature matching. Each word in the incoming audio signal is isolated and then analyzed to identify the type of excitation and the resonant frequencies. These parameters are then compared with previous examples of spoken words to identify the closest match. Often, these systems are limited to only a few hundred words, can only accept speech with distinct pauses between words, and must be retrained for each individual speaker. While this is adequate for many commercial applications, these limitations are humbling when compared to the abilities of human hearing. There is a great deal of work to be done in this area, with tremendous financial rewards for those that produce successful commercial products.

A signal can be either continuous or discrete, and it can be either periodic or aperiodic.

Types of Transforms with respect to Signals:

Aperiodic-Continuous: This includes, for example, decaying exponentials and the Gaussian curve. These signals extend to both positive and negative infinity without repeating in a periodic pattern. The Fourier Transform for this type of signal is simply called the Fourier Transform.

Periodic-Continuous: Examples include sine waves, square waves, and any waveform that repeats itself in a regular pattern from negative to positive infinity. This version of the Fourier transform is called the Fourier Series.

Aperiodic-Discrete: These signals are only defined at discrete points between positive and negative infinity, and do not repeat themselves in a periodic fashion. This type of Fourier transform is called the Discrete Time Fourier Transform.

Periodic-Discrete: These are discrete signals that repeat themselves in a periodic fashion from negative to positive infinity. This class of Fourier Transform is sometimes called the Discrete Fourier Series, but is most often called the Discrete Fourier Transform.
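Because a finite block of samples is treated as one period of a discrete, periodic signal, the Discrete Fourier Transform is the transform actually computed in practice. The sketch below implements its defining sum directly, X[k] = Σ x[n]·e^(−j2πkn/N); an FFT computes the same result much faster.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) Discrete Fourier Transform: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_points = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_points) for n in range(n_points))
            for k in range(n_points)]

# One period of a cosine that completes 2 cycles in 8 samples.
x = [math.cos(2 * math.pi * 2 * n / 8) for n in range(8)]
X = dft(x)
print([round(abs(v), 6) for v in X])  # [0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0]
```

The energy appears only in bin 2 and its mirror image, bin 6 (= N − 2), as expected for a real cosine.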

FIR Filter

* Filters are signal conditioners. A filter accepts an input signal, blocks pre-specified frequency components, and passes the original signal minus those components to the output.

* An FIR (Finite Impulse Response) filter is a type of signal processing filter whose impulse response has finite duration: it settles to zero in a finite time. FIR filters can be discrete time or continuous time, and analogue or digital. To meet a given frequency-response specification, an FIR filter usually requires more computation than an IIR (Infinite Impulse Response) filter. [7]
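The finite-duration property is easy to see in a direct-form implementation: feeding in a single impulse returns exactly the filter coefficients and then zeros. The 4-tap moving average below is an illustrative choice of coefficients, not a filter designed for this project.

```python
def fir_filter(coeffs, x):
    """Direct-form FIR filter: y[n] = sum_k coeffs[k] * x[n-k], with zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, b in enumerate(coeffs):
            if n - k >= 0:
                acc += b * x[n - k]
        y.append(acc)
    return y

# 4-tap moving average: a simple low-pass FIR filter.
moving_average = [0.25, 0.25, 0.25, 0.25]

impulse = [1.0] + [0.0] * 7
print(fir_filter(moving_average, impulse))
# [0.25, 0.25, 0.25, 0.25, 0.0, 0.0, 0.0, 0.0]: zero after 4 samples (finite response)

alternating = [1.0, -1.0] * 4  # the highest frequency representable at this sample rate
print(fir_filter(moving_average, alternating))
# [0.25, 0.0, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]: blocked once the filter memory is full
```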

Sampling frequency is the number of samples taken per second of a sound. Common sampling frequencies are 44100 Hz (CD quality) and 22050 Hz (often used for speech, since most of its useful content lies below 11025 Hz, half of that rate). [12]

Signal-to-noise ratio (SNR), on a decibel scale, is the difference between the signal (reference) level and the noise floor. It is a technical term used to characterise the quality of signal detection of a measuring system. For a speech signal, the performance of a noise reduction algorithm can be measured by computing the SNR, either as a plain ratio or in dB. The SNR of a speech signal is given by the ratio of the signal energy to the noise energy (equivalently, the ratio of the squared RMS amplitudes); in decibels, SNR = 10 log10(signal energy / noise energy).
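In simulation, where the clean speech is known, the SNR can be computed by treating the difference between the noisy (or processed) signal and the clean signal as the noise, and taking 10·log10 of the ratio of signal energy to noise energy. A minimal sketch with illustrative sample values:

```python
import math

def snr_db(clean, noisy):
    """SNR in dB: 10*log10(signal energy / noise energy), where noise = noisy - clean."""
    signal_energy = sum(s * s for s in clean)
    noise_energy = sum((y - s) ** 2 for s, y in zip(clean, noisy))
    return 10 * math.log10(signal_energy / noise_energy)

clean = [1.0, -1.0, 1.0, -1.0]
noisy = [1.1, -0.9, 0.9, -1.1]
print(f"{snr_db(clean, noisy):.1f} dB")  # 20.0 dB: signal energy is 100x the noise energy
```

Computing this before and after a noise reduction stage gives a single number for how much the stage helped.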

Adaptive filters

Adaptive filters are digital filters that adapt their performance based on the input signal. Their coefficients are adjusted using the characteristics of the input signal and a second signal that represents the desired behaviour of the filter on its input.

An adaptive filter uses an adaptive algorithm to reduce the error between its output signal and the desired signal. In the system-identification arrangement, the unknown system is placed in parallel with the adaptive filter. The adaptive filter itself can be either an IIR (infinite impulse response) or an FIR (finite impulse response) filter. The form of the filter remains fixed as it runs, but the output of the filter (usually the error output) is fed into a process that recalculates the filter coefficients in order to produce an output closer to the desired form. Such filters process a signal and then adjust themselves to alter the signal characteristics; their correct operation depends on this adaptation remaining stable.

Adaptive finite impulse response (FIR) filtering is usually employed in echo canceller applications to remove the portion of transmitted signal injected in the receiving path in full-duplex baseband data transmission systems. In order to simplify the implementation of the updating algorithms, digital techniques are often utilised to realise the FIR adaptive filter.
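The adapt-and-update loop described above can be sketched with the least-mean-squares (LMS) algorithm, which is one common choice. The report does not name a specific algorithm, so LMS, the step size `mu`, and the "unknown" system coefficients are assumptions. The example uses the system-identification arrangement, with the unknown system in parallel with an adaptive FIR filter driven by the same input.

```python
import random

def lms_identify(x, d, n_taps, mu):
    """Adapt an FIR filter with the LMS rule so its output tracks the desired signal d.

    Weight update: w <- w + mu * e[n] * x_vec[n], where e[n] = d[n] - y[n].
    Returns the final weights and the error sequence.
    """
    w = [0.0] * n_taps
    buf = [0.0] * n_taps                                   # most recent inputs, newest first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))         # filter output
        e = dn - y                                         # error: desired minus output
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]   # recalculate coefficients
        errors.append(e)
    return w, errors

# System identification: a hypothetical "unknown" FIR system in parallel with the adaptive filter.
random.seed(0)
unknown = [0.5, -0.3, 0.2]
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
d = [sum(h * (x[n - k] if n - k >= 0 else 0.0) for k, h in enumerate(unknown))
     for n in range(len(x))]
w, errors = lms_identify(x, d, n_taps=3, mu=0.01)
print([round(v, 3) for v in w])  # close to the unknown coefficients
```

A larger `mu` adapts faster but can make the adaptation unstable, which is the stability concern mentioned above; in a noise canceller the same loop runs with `d` being the noisy signal and `x` a noise reference.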

Section 3: Project Plan:-

Section 4: Progress Report to Date

The work completed on the project to date is described in this section (Section 4) and its subsections.

* Block Diagram for the system:

Fig. 4.1 block Diagram of the whole system

Use of digital hearing aids:-

Approximately 10% of the world’s population suffers from some type of hearing loss, yet only a small percentage of these people use a hearing aid. The stigma associated with wearing a hearing aid, customer dissatisfaction with hearing aid performance, and the cost of a high-performance solution are all causes of this low market penetration.

Current analogue hearing aids have significant limitations due to their inadequate spectral shaping, narrow operating bandwidth, and only partial noise-reduction capability. This leads to sub-optimal restoration of clarity and audibility, and in particular to poor speech perception in noisy environments, which is the concern of this project. Analogue hearing aids are hardware-driven and are therefore difficult to customise to specific hearing problems.

Digital hearing aids can solve these problems. They provide full bandwidth, fine grain spectral shaping, and enhanced noise reduction. As software-driven devices, they are very flexible and easily customizable to a user’s needs.

Working of Digital Hearing Aid:-

The analogue sound signal is converted into the digital domain. The digital signal processor at the heart of a digital hearing aid manipulates the signal while adding very little distortion, so sounds come through more clearly and speech is easier to hear and understand. The DHP combines crisp digital sound with totally hands-free operation, making it a logical choice compared with many of the other, more traditional solutions available.

The following assumptions have been made for the system:

  • The highest frequency that most humans can hear is approximately 20 kHz. Therefore, before the signal enters the A/D converter, it will be low-pass filtered to 20 kHz. By the sampling theorem, the sampling frequency must then be greater than 40 kHz to avoid aliasing during sampling.
  • This hearing aid will be a behind-the-ear design, so I can neglect the effects of acoustic feedback, which are more likely in a small in-the-ear hearing aid where the microphone and speaker are very close to each other.
  • It does not operate in real time, since I take in the entire speech signal and then process it.


Block Diagrams for the System:

Fig. 3.2 block diagram representing three stages of the project.

Stage 1: Noise Reduction

Stage 2: Frequency Shaper

Stage 3: Amplitude Shaper

* Stage 1: Noise Reduction

In everyday situations, there are always external signals that may interfere with the sounds the hearing aid user actually wants to hear. The ability to distinguish a single sound in a noisy environment is a major concern for the hearing impaired. For people with hearing loss, background noise degrades speech intelligibility more than for people with normal hearing, because less redundancy is available to help them recognise the speech signal. Often the problem lies not only in being able to hear the speech, but in understanding it, because of the effects of masking. To compensate for this, I have tried to develop code for noise cancellation using the Fast Fourier Transform (FFT).

Assumptions made:-

To simplify my project, I have assumed that:

  1. The filter will reduce noise independently of the user's level of hearing loss, and
  2. Any external signals, or noise, can be modelled by white Gaussian noise.

White Gaussian noise:-

White Gaussian noise (WGN) has a continuous and uniform frequency spectrum over a specified frequency band, with equal power per hertz of that band. It contains all frequencies at equal intensity and has a normal (Gaussian) probability density function. A hiss, for example, can be modelled as WGN, and the sound of many people talking is often roughly approximated by it. Because white Gaussian noise is random, I can generate it in MATLAB using the random number generator function, random.
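The same idea can be sketched in Python (the report itself uses MATLAB): draw samples from a normal distribution and scale them so that the noisy signal has a chosen SNR. The function name, the 440 Hz test tone standing in for speech, and the 10 dB target are illustrative assumptions; Python's `random.gauss` plays the role of MATLAB's random number generator here.

```python
import math
import random

def add_white_gaussian_noise(signal, snr_db, seed=None):
    """Return signal + zero-mean Gaussian noise scaled to give exactly the requested SNR (dB)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(v * v for v in noise) / len(noise)
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)   # rescale to hit the target exactly
    return [s + scale * v for s, v in zip(signal, noise)]

# A 440 Hz tone sampled at 8 kHz, standing in for a speech signal.
fs = 8000
tone = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
noisy = add_white_gaussian_noise(tone, snr_db=10, seed=1)
```

Fixing the seed makes each simulation run repeatable, which helps when comparing noise reduction methods on the same noisy input.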


Instead of adding white noise to a speech signal, I was able to obtain and generate several .wav sound files of a main speech signal with a white-noise background from a radio.

I experimented with implementing an FIR filter, but after researching various pre-existing MATLAB commands, I tried using the command wdencmp, which performs noise reduction or compression using wavelets. It returns a de-noised version of the input signal obtained by thresholding the wavelet coefficients. I have also used the MATLAB command ddencmp, which provides default values for de-noising or compression.

I have also tried noise cancellation using the FFT.

Both commands are given in the design details part.
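The report does not detail its FFT-based noise cancellation, so what follows is only one simple possibility, stated as an assumption: transform the signal, zero every frequency bin whose magnitude falls below a fraction of the strongest bin (a crude spectral threshold), and transform back. It works well for the single test tone used here, where the signal energy sits in a few bins while white noise is spread over all of them; real speech would need frame-by-frame processing and a noise estimate. The radix-2 FFT is written out so the block runs with the standard library only, and `keep_ratio` is a hypothetical parameter.

```python
import cmath
import math
import random

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(X):
    """Inverse FFT via the conjugation trick."""
    n = len(X)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in X])]

def fft_denoise(x, keep_ratio=0.2):
    """Zero every frequency bin whose magnitude is below keep_ratio * (largest magnitude)."""
    X = fft(x)
    peak = max(abs(v) for v in X)
    X = [v if abs(v) >= keep_ratio * peak else 0j for v in X]
    return [v.real for v in ifft(X)]

random.seed(2)
n = 256
clean = [math.sin(2 * math.pi * 8 * i / n) for i in range(n)]
noisy = [c + random.gauss(0.0, 0.3) for c in clean]
denoised = fft_denoise(noisy)
```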

Advantages of Using Wavelets

Wavelet de-noising is a nonlinear method and does not remove noise by low-pass filtering like many traditional methods. Low-pass filtering approaches, which are linear and time invariant, can blur the sharp features in a signal, and it is sometimes difficult to separate noise from the signal where their Fourier spectra overlap. With wavelets, it is the amplitude of the coefficients, rather than their location in the spectrum, that distinguishes the signal from the noise. This allows the wavelet coefficients to be thresholded to remove the noise. If a signal has its energy concentrated in a small number of wavelet coefficients, their values will be large in comparison with those of the noise, which has its energy spread over a large number of coefficients. These localising properties of the wavelet transform make the filtering of noise from a signal very effective. While linear methods trade off suppression of noise against broadening of the signal features, noise reduction using wavelets allows features in the original signal to remain sharp. A problem with wavelet de-noising is the lack of shift invariance, which means that the wavelet coefficients do not move by the same amount that the signal is shifted. This can be overcome by averaging the de-noising result over all possible shifts of the signal. A MATLAB function for de-noising the speech signal has been written and is listed in the appendix of the report.
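The thresholding idea can be sketched with the simplest wavelet, the Haar wavelet. This is not MATLAB's `wdencmp`. The wavelet, the number of levels, soft thresholding, and the "universal" threshold σ√(2 ln N) (with the noise level σ assumed known) are choices made for illustration. The test signal is a step, the kind of sharp feature that linear low-pass filtering would blur. The sketch does not implement the shift averaging mentioned above.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def haar_step(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    a = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    d = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return a, d

def inverse_haar_step(a, d):
    """Undo haar_step."""
    x = []
    for ai, di in zip(a, d):
        x.extend([(ai + di) / SQRT2, (ai - di) / SQRT2])
    return x

def soft_threshold(c, t):
    """Shrink a coefficient toward zero by t; coefficients with |c| <= t become 0."""
    return math.copysign(max(abs(c) - t, 0.0), c)

def wavelet_denoise(x, threshold, levels):
    """Haar wavelet de-noising: soft-threshold the detail coefficients at each level.

    len(x) must be divisible by 2**levels.
    """
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append([soft_threshold(c, threshold) for c in d])
    for d in reversed(details):
        approx = inverse_haar_step(approx, d)
    return approx

random.seed(3)
n, sigma = 256, 0.2
clean = [0.0] * (n // 2) + [1.0] * (n // 2)      # a sharp step
noisy = [c + random.gauss(0.0, sigma) for c in clean]
threshold = sigma * math.sqrt(2 * math.log(n))   # "universal" threshold, one common choice
denoised = wavelet_denoise(noisy, threshold, levels=4)
```

The step is represented by a few large coefficients that survive the threshold, while the noise, spread thinly over all detail coefficients, is mostly set to zero, so the edge stays sharp.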

The noise cancellation MATLAB program written for this project is also listed in the appendix.

Section 5: Work Remaining

The following tasks are pending and will be completed in the coming weeks of the summer break:-