IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences 2008 E91-A(6):1303-1309; doi:10.1093/ietfec/e91-a.6.1303
Copyright © 2008 The Institute of Electronics, Information and Communication Engineers

Special Section on Acoustic Scene Analysis and Reproduction - Papers

Calculating Inverse Filters for Speech Dereverberation

Masato MIYOSHI¹, Marc DELCROIX¹ and Keisuke KINOSHITA¹

¹ The authors are with NTT Communication Science Laboratories, NTT Corporation, Kyoto-fu, 619-0237 Japan. E-mail: miyo{at}cslab.kecl.ntt.co.jp

Speech dereverberation is one of the most difficult tasks in acoustic signal processing. Of the various problems involved in this task, this paper highlights "over-whitening," in which the spectral characteristics of the recovered speech are flattened. This distortion can occur when inverse filters are calculated directly from microphone signals. The paper reviews two studies related to this problem. The first study shows that it is possible to compensate for such over-whitening and thereby achieve precise speech dereverberation. The second study presents a new approach that approximates the original speech by removing the effect of late reflections from the observed reverberant speech.
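The over-whitening effect described in the abstract can be illustrated with a small numerical sketch. This is not the authors' algorithm: it is a single-channel toy in which a prediction-error (inverse) filter is computed directly from the observed signal by ordinary linear prediction. The AR(2) "speech" model, the exponentially decaying "room" response, the prediction order, and all function names are assumptions chosen only for illustration.

```python
import numpy as np

def lp_coefficients(x, order):
    """Forward linear-prediction coefficients from the autocorrelation normal equations."""
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)]) / n
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1 : order + 1])

def prediction_error(x, coeffs):
    """Apply the inverse filter: e[n] = x[n] - sum_k coeffs[k-1] * x[n-k]."""
    inverse_filter = np.concatenate(([1.0], -coeffs))
    return np.convolve(x, inverse_filter)[: len(x)]

def lag1_correlation(x):
    """Normalized lag-1 autocorrelation; close to 0 for a white signal."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

rng = np.random.default_rng(0)
n = 8000
excitation = rng.standard_normal(n)

# Toy "speech": white excitation colored by an assumed stable AR(2) model.
source = np.zeros(n)
for t in range(n):
    source[t] = excitation[t]
    if t >= 1:
        source[t] += 1.3 * source[t - 1]
    if t >= 2:
        source[t] -= 0.6 * source[t - 2]

# Toy "room": a short, exponentially decaying impulse response (assumption).
room = 0.6 ** np.arange(8)
observed = np.convolve(source, room)[:n]

# Inverse filter calculated directly from the observed (microphone) signal.
coeffs = lp_coefficients(observed, order=20)
recovered = prediction_error(observed, coeffs)

print("lag-1 correlation, source:   ", lag1_correlation(source))
print("lag-1 correlation, observed: ", lag1_correlation(observed))
print("lag-1 correlation, recovered:", lag1_correlation(recovered))
```

In this sketch the recovered signal comes out nearly white, while the toy source is strongly correlated: the filter has removed the coloring of the source along with that of the room. This is the kind of flattening the abstract calls over-whitening.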

Key Words: dereverberation, inverse filter, linear prediction, characteristic polynomial, multi-step linear prediction


Manuscript received February 4, 2007.
