Over the past several years, a primary focus of speech recognition research has been recognition over the telephone or IP networks, and IP telephony has recently come into widespread use. This paper describes the performance of a speech recognizer on noisy speech transmitted over an H.323 IP telephony network, where the minimum mean-square error log-spectral amplitude (MMSE-LSA) method [1,2] is used to reduce the mismatch between training and deployment conditions in order to achieve robust speech recognition. In the H.323 network environment, the sources of distortion to the speech are packet loss and additive noise. In this work, we first evaluate the impact of packet losses on speech recognition performance, and then explore the effects of uncorrelated additive noise. To explore how additive acoustic noise affects recognition performance, seven types of noise sources are selected for use in our experiments. Finally, the experimental results indica...
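For reference, the MMSE-LSA estimator cited above applies, in each frequency bin k, the well-known Ephraim-Malah log-spectral-amplitude gain. The notation below is the standard one and is assumed here rather than quoted from the paper: ξ_k is the a priori SNR, γ_k the a posteriori SNR, and R_k the noisy spectral amplitude.

$$ \hat{A}_k = G_{\mathrm{LSA}}(k)\,R_k, \qquad G_{\mathrm{LSA}}(k)=\frac{\xi_k}{1+\xi_k}\exp\!\left(\frac{1}{2}\int_{v_k}^{\infty}\frac{e^{-t}}{t}\,dt\right), \qquad v_k=\frac{\xi_k}{1+\xi_k}\,\gamma_k. $$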
In this paper, we propose a new coder based on the algebraic CELP (ACELP) coding technique. Our goal is to improve the quality of the synthesized speech by modifying the excitation signal of the classical coder. This modification is accomplished by varying both the positions of the excitation impulses and their amplitudes. The proposed algorithm divides the 10-ms speech frames into 2-ms sub-frames and assigns one impulse per sub-frame. These impulses are characterized by their optimized positions and amplitudes. Experiments show that our algorithm yields higher-quality speech at a lower bit rate than the classical ACELP algorithm. An improvement of 0.73 dB in segmental SNR is achieved by the proposed coder over the conventional ACELP coder, and informal listening tests show that the synthetic speech obtained using our algorithm is perceptually closer to the original speech.
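The 0.73 dB figure above is a segmental SNR, which is conventionally computed by averaging frame-level SNRs between the original and synthesized signals. A minimal sketch of that measure follows; the frame length and the clamping thresholds are assumptions, not values taken from the paper.

```python
import numpy as np

def segmental_snr(clean, synth, frame_len=160, floor=-10.0, ceil=35.0):
    """Average per-frame SNR (dB) between a clean reference and synthesized speech."""
    n_frames = min(len(clean), len(synth)) // frame_len
    snrs = []
    for i in range(n_frames):
        x = clean[i * frame_len:(i + 1) * frame_len]
        e = x - synth[i * frame_len:(i + 1) * frame_len]
        snr = 10.0 * np.log10(np.sum(x ** 2) / (np.sum(e ** 2) + 1e-12) + 1e-12)
        snrs.append(np.clip(snr, floor, ceil))  # common practice: clamp outlier frames
    return float(np.mean(snrs))
```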
This paper develops a "contingency theory" of technological work reorganization that addresses organizational, managerial, and job-characteristic contingencies in the reorganization of the work process. Substantively, the focus is on the rationales of top decision-makers in a sample of firms for adopting and designing telecommuting jobs. Following the theoretical model developed in the paper, the authors find that telecommuting innovation is primarily contingent on organizational constraints (such as ...
This paper addresses the problem of noise robustness of automatic speech recognition (ASR) systems in various noisy environments using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator (MMSE-STSA). This was accomplished by integrating a Perceptual Weighting Filter (PWF) with the MMSE-STSA algorithm in order to improve the pre-processing speech enhancement performance. The proposed PWF-based STSA algorithm is integrated in the front-end of an ASR system in order to evaluate its robustness in severe interfering noisy environments. Experiments were conducted using a noisy version of speech signals extracted from the TIMIT database. The Hidden Markov Model Toolkit (HTK) was used throughout our experiments. Results show that the proposed approach, when included in the front-end of an HTK-based ASR system, outperforms the conventional recognition process in interfering noisy environments for a wide range of SNRs down to -4 dB.
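For context, the MMSE-STSA gain of Ephraim and Malah applied in each spectral bin k has the well-known form below; the notation is the standard one and is assumed here (ξ_k a priori SNR, γ_k a posteriori SNR, I_0 and I_1 modified Bessel functions of the first kind), and the paper's perceptual weighting is a separate stage not shown.

$$ G(k)=\frac{\sqrt{\pi}}{2}\,\frac{\sqrt{v_k}}{\gamma_k}\,e^{-v_k/2}\left[(1+v_k)\,I_0\!\left(\tfrac{v_k}{2}\right)+v_k\,I_1\!\left(\tfrac{v_k}{2}\right)\right],\qquad v_k=\frac{\xi_k}{1+\xi_k}\,\gamma_k. $$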
This work presents a novel technique to enhance speech signals in the presence of interfering noise. In this paper, the amplitude and frequency modulation (AM-FM) model [7] and a multi-band analysis scheme [5] are applied to extract the speech signal parameters. The enhancement process is performed using a time-warping function that is used to warp the speech signal; this function is extracted from the speech signal using the Smoothed Energy Operator Separation Algorithm (SEOSA) [4]. The warping is capable of increasing the SNR of the high-frequency harmonics of a voiced signal by forcing the quasi-periodic voiced component to be more nearly periodic, and is consequently useful for extracting more robust parameters of the signal in the presence of noise.
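For background, energy-operator separation algorithms of this family build on the discrete Teager-Kaiser energy operator, which for a discrete signal x(n) is defined as

$$ \Psi[x(n)] = x^2(n) - x(n-1)\,x(n+1), $$

and which for a pure sinusoid x(n) = A\cos(\Omega n + \phi) yields \Psi[x(n)] = A^2\sin^2\Omega; this is what allows the amplitude and frequency envelopes of an AM-FM component to be separated.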
There is provided an improved method for particle size and distribution determinations using a photosedimentometer having a centrifugal disc chamber for containing a spin fluid. The spin fluid must be characterized by a density gradient. The present process provides a novel way of composing the spin fluid. The less dense component (water/alcohol) is introduced first, and the more dense component (e.g., water) is introduced second. At no time is the power to the motor driving the disc interrupted, as in prior art methods. Better and more reproducible results are secured.
This paper addresses the problem of noise robustness of automatic speech recognition (ASR) systems in noisy car environments using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator (MMSE-STSA). This was accomplished by integrating an adaptive time-varying Noise Shaping Filter (NSF) with the MMSE-STSA algorithm in order to improve the speech enhancement performance by "whitening" the noisy speech signals. Experiments were conducted using a noisy version of speech signals extracted from the TIMIT database. The proposed NSF-based STSA algorithm is used as a pre-processor for an ASR system in order to evaluate its robustness in severe interfering car-noise environments. The Hidden Markov Model Toolkit (HTK) was used throughout our experiments. Results show that the proposed approach, when included in the front-end of an HTK-based ASR system, outperforms the conventional recognition process in severe interfering car-noise environments for a wide range of SNRs down to -12 dB.
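The paper's adaptive time-varying NSF is not specified in this abstract. As a rough illustration of the "whitening" idea only, one could fit a low-order LPC model to a noise-only segment and filter the noisy signal with the corresponding inverse (whitening) filter; the helper names, the LPC order, and the availability of a noise-only segment are all assumptions of this sketch.

```python
import numpy as np
from scipy.signal import lfilter

def lpc_coeffs(x, order=8):
    """LPC via the autocorrelation method (solve the Yule-Walker equations)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R + 1e-9 * np.eye(order), r[1:order + 1])
    return np.concatenate(([1.0], -a))  # A(z) = 1 - sum_k a_k z^{-k}

def whiten(noisy, noise_estimate, order=8):
    """Flatten the noise spectrum by filtering with the noise LPC inverse filter A(z)."""
    a = lpc_coeffs(noise_estimate, order)
    return lfilter(a, [1.0], noisy)
```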
This study presents a novel technique to reconstruct the missing frequency bands of band-limited telephone speech signals. The technique is based on the Amplitude and Frequency Modulation (AM-FM) model, which models the speech signal as the sum of N successive AM-FM signals. Based on a least-mean-square error criterion, each AM-FM signal is modified using an iterative algorithm in order to regenerate the high-frequency AM-FM signals. These modified signals are then combined in order to reconstruct the broad-band speech signal. Experiments were conducted using speech signals extracted from the NTIMIT database. They demonstrate the ability of the algorithm to recover speech, based on a comparison between the original and synthesized speech and on informal listening tests.
In this paper, a multi-stream paradigm is proposed to improve the performance of automatic speech recognition (ASR) systems. Our goal in this paper is to improve the performance of HMM-based ASR systems by exploiting some features that characterize speech ...
Automatic recognition of spoken digit sequences (such as credit card numbers) is now feasible even in speaker-independent applications over the telephone. However, all recognition tasks perform worse in noisy conditions. If strict limitations are also imposed on the computational resources used for recognition, then robust speech recognition remains a significant challenge, even for a simple digit vocabulary. Since recognition of continuously spoken digits over telephone links is a very practical application, such recognition was investigated here under different conditions. Traditional hidden Markov model approaches with cepstral analysis were not used, because they are computationally intensive and have not always worked well under adverse acoustic conditions. Simpler spectral analysis was used instead, combined with a segmental approach. The analysis focuses on the locations of spectral peaks, similar to formant tracking, but without the need to estimate peaks for all time frames. The limited nature...
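The peak-based analysis is described only at a high level in the abstract. As a minimal illustration of frame-wise spectral-peak picking (not the paper's method), the sketch below locates prominent peaks in a frame's log spectrum; the sampling rate, FFT size, prominence threshold, and the use of scipy's find_peaks are all assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def spectral_peaks(frame, sr=8000, n_fft=512, max_peaks=4):
    """Return the frequencies (Hz) of the most prominent spectral peaks in one frame."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft))
    log_spec = 20.0 * np.log10(spec + 1e-12)
    peaks, props = find_peaks(log_spec, prominence=6.0)   # prominence threshold in dB (assumed)
    keep = np.argsort(props["prominences"])[::-1][:max_peaks]
    return np.sort(peaks[keep] * sr / n_fft)
```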
Packet voice communications generally suffer packet losses as a result of various network- or transmission-related impairments. Upon decoding, these lost packets result in missing speech segments that degrade automatic speech recognition (ASR) performance. We present a novel loss recovery scheme that reproduces the missing speech waveform by interpolating its spectrum from the speech spectra on both sides of a loss. An adaptive mechanism is used to determine the FFT width of the speech waveform before and after a loss to capture as much spectral detail as possible. A linearly weighted spectral interpolation ensues to obtain the spectra of missing speech. The missing speech waveform is then reconstructed through IFFT, followed by smoothing at packet boundaries. Tests on Bluetooth voice packets with a high loss rate of 38% show that our scheme improves ASR performance considerably (up to 20%) while being computationally efficient, as it is an FFT-based scheme.
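A minimal sketch of the core interpolation step is given below. The paper's adaptive FFT-width selection and packet-boundary smoothing are omitted, the frame length is an assumption, and phase handling is simplified (phase is borrowed from the nearer edge rather than interpolated).

```python
import numpy as np

def conceal_loss(before, after, n_lost_frames, frame_len=160):
    """Fill lost frames by linearly interpolating the magnitude spectra of the
    frames bordering the loss and inverting each interpolated spectrum by IFFT."""
    n_fft = 2 * frame_len
    B = np.fft.rfft(before[-frame_len:], n_fft)
    A = np.fft.rfft(after[:frame_len], n_fft)
    frames = []
    for i in range(n_lost_frames):
        w = (i + 1) / (n_lost_frames + 1)          # 0 near the left edge, 1 near the right
        mag = (1 - w) * np.abs(B) + w * np.abs(A)  # linearly weighted spectral interpolation
        phase = np.angle(B) if w < 0.5 else np.angle(A)
        frames.append(np.fft.irfft(mag * np.exp(1j * phase), n_fft)[:frame_len])
    return np.concatenate(frames)
```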
The performance of well-trained speech recognizers using high-quality full-bandwidth speech data is usually degraded when they are used in real-world environments. In particular, telephone speech recognition is extremely difficult due to the limited bandwidth of the transmission channels. In this paper, we concentrate on telephone recognition of Egyptian Arabic speech using syllables. Arabic spoken digits are described in terms of their constituent phonemes, triphones, syllables and words. A speaker-independent hidden Markov model (HMM)-based speech recognition system was designed using the Hidden Markov Model Toolkit (HTK). The database used for both training and testing consists of recordings from forty-four Egyptian speakers. In a clean environment, experiments show that the recognition rate using syllables outperformed the rates obtained using monophones, triphones and words by 2.68%, 1.19% and 1.79%, respectively. Also, over a noisy telephone channel, syllables outperformed the rates obtained using monophones, tr...
In this paper, the implementation of a robust front-end for a large-vocabulary Continuous Speech Recognition (CSR) system based on a Voiced-Unvoiced (V-U) decision is addressed. Our approach is based on the separation of the speech signal into voiced and unvoiced components, so that speech enhancement can be achieved by processing the voiced and unvoiced components separately. Enhancement of the voiced component is performed using adaptive comb filtering, whereas the unvoiced component is enhanced using a modified spectral subtraction approach. We show via experiments that the proposed CSR system is robust in additive noisy environments (SNR down to 0 dB).
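The exact "modified" spectral subtraction used for the unvoiced component is not detailed in this abstract. For orientation, a basic magnitude spectral-subtraction step looks like the sketch below; the over-subtraction factor, spectral floor, window, and FFT size are assumptions, and noise_mag is an externally estimated noise magnitude spectrum of matching length.

```python
import numpy as np

def spectral_subtraction(noisy_frame, noise_mag, alpha=2.0, beta=0.02, n_fft=512):
    """Subtract an estimated noise magnitude spectrum from one noisy frame."""
    spec = np.fft.rfft(noisy_frame * np.hanning(len(noisy_frame)), n_fft)
    mag, phase = np.abs(spec), np.angle(spec)
    clean_mag = mag - alpha * noise_mag            # over-subtraction
    clean_mag = np.maximum(clean_mag, beta * mag)  # spectral floor to limit musical noise
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n_fft)[:len(noisy_frame)]
```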
This paper presents an evaluation of the use of some auditory-based acoustic distinctive features and formant cues for automatic speech recognition (ASR). Comparative experiments indicate that the use of either the formant magnitudes or the formant frequencies, combined with some auditory-based acoustic distinctive features and the classical MFCCs within a multi-stream statistical framework, leads to an improvement in the recognition performance of HMM-based ASR systems. The Hidden Markov Model Toolkit (HTK) was used throughout our experiments to test the use of the new multi-stream feature vector. A series of experiments on speaker-independent continuous-speech recognition was carried out using a subset of the large read-speech corpus TIMIT. Using such a multi-stream paradigm, N-mixture triphone models and a bigram language model, we found that the word error rate was decreased by about 6.46%.
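For context, in HTK's standard multi-stream formulation (presumably the framework behind these experiments, since HTK is used throughout), the state output likelihood is the product of per-stream Gaussian-mixture likelihoods raised to stream weights γ_s:

$$ b_j(\mathbf{o}_t)=\prod_{s=1}^{S}\left[\sum_{m=1}^{M_s} c_{jsm}\,\mathcal{N}\!\left(\mathbf{o}_{st};\,\boldsymbol{\mu}_{jsm},\boldsymbol{\Sigma}_{jsm}\right)\right]^{\gamma_s}. $$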
This research investigates the acoustical characteristics of Al-Madinah Holy Mosque. Extensive field measurements were conducted at different locations of Al-Madinah Holy Mosque to characterize its acoustics. Acoustical characteristics are usually evaluated using objective parameters in unoccupied rooms, for practical reasons. However, under normal conditions, room occupancy can alter such characteristics, owing to the additional sound absorption present in the room or to a change in signal-to-noise ratio. Based on the acoustic measurements carried out in Al-Madinah Holy Mosque with and without occupancy, and the analysis of such measurements, the existence of acoustical deficiencies has been confirmed. Keywords—Worship sound, Al-Madinah Holy Mosque, mosque acoustics, speech intelligibility.
This study presents a novel technique to enhance telephone speech signals. The technique is based on the Amplitude and Frequency Modulation (AM-FM) model, which represents the speech signal as the sum of N successive AM-FM signals. Based on a least-mean-square error criterion, each AM-FM signal is modified using an iterative algorithm in order to compensate for the deformation of the signal caused by the nonlinear telephone channel. These modified signals are then combined in order to reconstruct the enhanced speech signal. Experiments were conducted using speech signals extracted from the NTIMIT database. They demonstrate the ability of the algorithm to enhance speech, based on a comparison between the original and synthesized speech and on informal listening tests.
In this paper, the problem of robust speech recognition is considered. Our approach is based on noise reduction of the parameters used for recognition, namely the Mel-based cepstral coefficients. A Temporal-Correlation-Based Recurrent Multilayer Neural Network (TCRMNN) for noise reduction in the cepstral domain is used in order to obtain less variable parameters that are useful for robust recognition in noisy environments. Experiments show that the use of the parameters enhanced by this approach increases the recognition rate of the continuous speech recognition (CSR) process. The Hidden Markov Model Toolkit (HTK) was used throughout. Experiments were done on a noisy version of the TIMIT database. With such a pre-processing noise reduction technique in the front-end of the HTK-based CSR system, improvements in the recognition accuracy of about 17.77% and 18.58% using single-mixture monophones and triphones, respectively, have be...
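The TCRMNN architecture itself is not described in this abstract. Purely as an illustration of the general idea (a recurrent network mapping noisy cepstral frames to enhanced ones), the sketch below shows a generic Elman-style denoiser; it is not the paper's network, the layer sizes are assumptions, and the random weights are placeholders that would in practice be trained on paired noisy/clean MFCC frames.

```python
import numpy as np

class RecurrentCepstralDenoiser:
    """Generic Elman-style recurrent denoiser for MFCC frames (illustration only)."""

    def __init__(self, n_cep=13, n_hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.standard_normal((n_hidden, n_cep)) * 0.1
        self.W_rec = rng.standard_normal((n_hidden, n_hidden)) * 0.1
        self.W_out = rng.standard_normal((n_cep, n_hidden)) * 0.1

    def denoise(self, noisy_mfcc):
        """noisy_mfcc: array of shape (T, n_cep); returns enhanced frames of the same shape."""
        h = np.zeros(self.W_rec.shape[0])
        out = []
        for frame in noisy_mfcc:
            h = np.tanh(self.W_in @ frame + self.W_rec @ h)  # hidden state carries temporal context
            out.append(self.W_out @ h)
        return np.stack(out)
```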
In this paper, a novel feature vector based on both mel frequency cepstral coefficients (MFCCs) and a mel-based nonlinear discrete-time energy operator (MDEO) is proposed to be used as the input of an HMM-based automatic continuous speech recognition (ACSR) system. Our goal is to improve the performance of such a recognizer using the new feature vector. Experiments show that the use of the new feature vector increases the recognition rate of the ACSR system. The Hidden Markov Model Toolkit (HTK) was used throughout. Experiments were done on both the TIMIT and NTIMIT databases. For the TIMIT database, when the MDEO was included in the feature vector to test a multi-speaker ACSR system, we found that the error rate decreased by about 9.51%. On the other hand, for NTIMIT, the MDEO deteriorates the performance of the recognizer. That is, the new feature vector is useful for clean speech but not for telephone speech.
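The exact MDEO definition is not reproduced in this abstract. One plausible reading, applying the discrete Teager-Kaiser energy operator within mel-spaced bands and log-compressing the mean band energy, is sketched below; the band edges, filter design, and the helper mel_band_edges are all illustrative assumptions rather than the paper's feature.

```python
import numpy as np
from scipy.signal import butter, lfilter

def teager(x):
    """Discrete Teager-Kaiser energy operator."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def mel_band_edges(n_bands, sr, fmin=100.0):
    """Mel-spaced band edges in Hz (helper assumed for illustration)."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    return inv(np.linspace(mel(fmin), mel(0.95 * sr / 2), n_bands + 1))

def mdeo_like_feature(frame, sr=16000, n_bands=13):
    """Log mean Teager energy per mel-spaced band (one plausible MDEO-style feature)."""
    edges = mel_band_edges(n_bands, sr)
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(2, [lo / (sr / 2), hi / (sr / 2)], btype="band")
        band = lfilter(b, a, frame)
        feats.append(np.log(np.mean(np.abs(teager(band))) + 1e-12))
    return np.array(feats)
```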
In this paper, syllables are proposed as acoustic units to improve the performance of automatic speech recognition (ASR) systems for Arabic spoken proverbs in noisy environments. To test our proposed approach, a speaker-independent HMM-based speech recognition system was designed using the Hidden Markov Model Toolkit (HTK). A series of experiments on noisy speech was carried out using an Arabic database consisting of fifty-nine Egyptian speakers. The obtained results show that the recognition rate using syllables outperformed the rates obtained using monophones and triphones by 20.88% and 15.82%, respectively. The use of syllables not only improved the performance of the ASR process in noisy environments, but also limited the complexity of the computation (and consequently the running time) of the recognition process. We also show in this paper that the integration of a pre-processing enhancement technique in the front-end of the syllable-based ASR engine leads t...
We show that the concept of voiced-unvoiced (V-U) classification of speech sounds can be incorporated not only in speech analysis or speech enhancement processes, but can also be useful for recognition. That is, the incorporation of such a classification in a continuous speech recognition (CSR) system not only improves its performance in low-SNR environments, but also limits the time and memory needed to carry out the recognition process. The proposed V-U classification of speech sounds has two principal functions: (1) it allows the enhancement of the voiced and unvoiced parts of speech separately; (2) it limits the Viterbi (1967) search space, so that the recognition process can be carried out in real time without degrading the performance of the system. We show via experiments that such a system outperforms the baseline HTK recognizer when a V-U decision is included in both the front-end and the far-end of the HTK-based recognizer.
The problem that motivated this paper is how to identify naval targets (ships or submarines) from the underwater sound they produce. This paper reports an approach based on Continuous Hidden Markov Models (CHMMs) to identify such naval targets. Mel frequency cepstral coefficients (MFCCs) were selected to describe the input signal. A general Gaussian-density HMM is developed for the CHMM system. Several experiments have been conducted to study the effects of the speed, distance and direction of the naval targets on the identification rate (IR) obtained using our proposed approach. The obtained IR was found to be 100% and remained constant while changing the direction, 91.97% while changing the distance, and 58.3% while changing the speed of the target. The results show that speed has the greatest effect on the identification process.
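For reference, the continuous ("general Gaussian density") observation distribution used by such CHMMs has the standard Gaussian-mixture form for state j (notation assumed here, with mixture weights c_jm, means μ_jm and covariances Σ_jm):

$$ b_j(\mathbf{o}_t)=\sum_{m=1}^{M} c_{jm}\,\mathcal{N}\!\left(\mathbf{o}_t;\,\boldsymbol{\mu}_{jm},\boldsymbol{\Sigma}_{jm}\right),\qquad \sum_{m=1}^{M} c_{jm}=1. $$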
In this paper we investigate the identification of naval targets (ships or submarines) through the identification of the underwater sound they produce. Our approach is based on the use of Continuous Hidden Markov Models (CHMMs) to identify such naval targets. A general Gaussian-density HMM is developed for the CHMM system. Several experiments have been conducted to study the effects of the speed, distance and direction of the naval targets on the identification rate (IR) of such targets using different features: Mel-Frequency Cepstral Coefficients (MFCCs), Perceptual Linear Prediction (PLP), and Relative Spectral PLP (RASTA-PLP). The obtained IR was found to be 100% (MFCCs & PLP) and 91.67% (RASTA) while changing the direction, 91.97% (MFCCs & PLP) and 83.33% (RASTA) while changing the distance, and 58.3% (MFCCs & PLP) and 25% (RASTA) while changing the speed of the target. Results showed that speed has the greatest effect on the identification process. We applied our engine t...
The goal of human-machine (HM) interaction technology is to create intelligent artificial machines capable of interacting with human beings through their voices. Rapid progress in various fields such as digital computing, signal processing and the evolution of statistical methods over the past ten years has contributed enormously to the progress and tremendous growth of research on HM technology. However, the creation of such machines still remains a distant goal, mainly because of the lack of a fundamental understanding of how human beings process speech. In this article, we give a global overview of HM technology, with an emphasis on automatic speech recognition. Finally, we conclude with some perspectives on the fundamental limitations of current technology and on the most promising research directions...
In this paper, the AM–FM modulation model is applied to speech analysis, synthesis and coding. The AM–FM model represents the speech signal as the sum of formant resonance signals each of which contains amplitude and frequency modulation. Multiband filtering and demodulation using the energy separation algorithm are the basic tools used for speech analysis. First, multiband demodulation analysis (MDA) is
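In the usual notation (assumed here), the AM-FM model referred to above writes the speech signal as a sum of N formant resonance signals, each with a time-varying amplitude envelope a_k(t) and instantaneous frequency ω_k(t):

$$ s(t)=\sum_{k=1}^{N} a_k(t)\,\cos\!\left(\int_0^{t}\omega_k(\tau)\,d\tau+\theta_k\right). $$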
