Noise reduction for periodic signals using high-resolution frequency analysis
- Toshio Yoshizawa,
- Shigeki Hirobayashi (corresponding author) and
- Tadanobu Misawa
https://doi.org/10.1186/1687-4722-2011-426794
© Yoshizawa et al; licensee Springer. 2011
Received: 27 June 2011
Accepted: 21 September 2011
Published: 21 September 2011
Abstract
The spectrum subtraction method is one of the most common methods by which to remove noise from a spectrum. Like many noise reduction methods, the spectrum subtraction method uses discrete Fourier transform (DFT) for frequency analysis. There is generally a trade-off between frequency and time resolution in DFT. If the frequency resolution is low, then the noise spectrum can overlap with the signal source spectrum, which makes it difficult to extract the latter signal. Similarly, if the time resolution is low, rapid frequency variations cannot be detected. In order to solve this problem, as a frequency analysis method, we have applied non-harmonic analysis (NHA), which has high accuracy for detached frequency components and is only slightly affected by the frame length. Therefore, we examined the effect of the frequency resolution on noise reduction using NHA rather than DFT as the preprocessing step of the noise reduction process. The accuracy in extracting single sinusoidal waves from a noisy environment was first investigated. The accuracy of NHA was found to be higher than the theoretical upper limit of DFT. The effectiveness of NHA and DFT in extracting music from a noisy environment was then investigated. In this case, NHA was found to be superior to DFT, providing an approximately 2 dB improvement in SNR.
1. Introduction
Noise reduction to recover a target signal from an input waveform is important in a number of fields. We usually use a frequency spectrum to remove noise from the input waveform. Although it is difficult to distinguish a signal from the noise in the time domain, this task tends to become easier in the frequency domain. However, it is difficult to filter out noise that is similar to a signal. For example, a consonant is a part of speech whose frequency spectrum is similar to that of noise. This study proposes a basic technology by which to remove noise from musical sounds that include several periodic signals. We selected white noise and pink noise as the noise signals. These noises are common in cities as well as in nature and have a continuous spectrum. Based on this study, we can remove wideband noise, such as impulsive noise and white noise, from an old music recording in order to apply digital remastering in the multimedia industry. We will also be able to remove noise from a recording of a singing voice because this is a periodic signal. When listening to music in a high-noise environment, difficulty in hearing the music and the presence of ambient noise can decrease the level of enjoyment. Therefore, various noise reduction methods are being investigated, and a number of noise reduction techniques have been proposed. The spectral subtraction method (SS method) is a widely used approach [1] in which the target signal is extracted from a noisy signal by measuring the noise in advance and modeling the statistical spectral envelope characteristics [2–4]. The SS method does not require multiple microphones, and highly effective results can be obtained by using a relatively simple algorithm. For this reason, many techniques for improving the SS method have been proposed. Sorensen and Andersen [5] also used the SS method in combination with speech presence detection. Soon and Koh [6] and Ding et al. [7] treated audio signals as graphics and applied 2D and 1D Wiener filters in the frequency domain for noise reduction. The advantage of this method is that frame-to-frame correlation can be exploited. In addition, the amplitude in the frequency domain can be adjusted and an unmodified initial phase can be used. Finally, Virag [8] and Udrea et al. [9] suggested an SS method based on the characteristics of the human auditory system.
However, using unmodified noisy phases limits the noise reduction effect. In general, the discrete Fourier transform (DFT) is used to obtain the spectral characteristics during preprocessing for the SS method. The frequency resolution of the DFT is restricted because it depends on the analytical frame length and the window function. If the frequency resolution is low, the noise spectrum can overlap the spectrum of the signal source, which makes it difficult to extract the original signal. Energy leaks into other bands and side-lobes are generated when the frequency of the analyzed signal does not correspond to an integral multiple of the base frequency. In harmonic frequency analysis, there is then a high probability of overlap between the side-lobes of the source spectrum and the noise spectrum. If the side-lobes are removed, then the signal source cannot be fully recovered. Similarly, if the time resolution is low, then rapid frequency variations cannot be detected. In order to solve this problem, Kauppinen and Roth attempted to increase the frequency resolution by applying an extrapolation method to the signal frame in the time domain [10]. In this study, we have applied non-harmonic analysis (NHA), which has a high frequency resolution with limited influence of the frame length [11], to the problem of noise reduction. For a similar frame length, NHA is expected to achieve better frequency resolution than the length extrapolation method used in [10]. Therefore, we investigated the use of NHA as an alternative preprocessing method to DFT for noise reduction. Since the effects of frequency resolution can best be evaluated for periodic signals, sounds produced by musical instruments were used in this study, and preliminary noise reduction experiments were performed.
The remainder of this article is organized as follows. In Section 2, we provide an introduction to the NHA algorithm. In Section 3, we investigate noise reduction using single sinusoidal waves. Section 4 describes the side-lobe suppression experiments. In Section 5, noise reduction experiments are carried out using sounds produced by musical instruments, and the results are described. Section 6 summarizes the study.
2. The NHA method
2.1 Background
When the sampling interval is Δt and the original signal x(n) has a period of N Δt/k, its DFT X(k) can accurately reflect the spectral structure. However, if a period other than N Δt/k appears in x(n), X(k) is expressed as a combination of several frequency components with periods N Δt/k, and X(k) does not accurately reflect the spectral structure.
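The leakage described above can be illustrated with a short numerical sketch. The function name `dft_peak_and_leakage` and the values fs = 8000 Hz and N = 256 are illustrative assumptions, not settings taken from the experiments; the sketch only compares a tone whose frequency is an exact multiple of fs/N with one that is not.

```python
import numpy as np

def dft_peak_and_leakage(freq_hz, fs=8000.0, n=256):
    """Return (frequency of the largest DFT bin, energy fraction outside it).

    fs and n are illustrative values chosen for this sketch.
    """
    t = np.arange(n) / fs
    x = np.cos(2 * np.pi * freq_hz * t)
    power = np.abs(np.fft.rfft(x)) ** 2
    k = int(np.argmax(power))
    return k * fs / n, 1.0 - power[k] / power.sum()

on_bin = dft_peak_and_leakage(500.0)   # 500 Hz = 16 * fs / n, an exact multiple
off_bin = dft_peak_and_leakage(488.0)  # 488 Hz lies between bins 15 and 16
print(on_bin, off_bin)
```

For the on-bin tone essentially all of the energy stays in one bin, whereas the off-bin tone is reported at the nearest bin frequency and a substantial share of its energy spreads into neighbouring bins.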
In order to increase the frequency resolution, the value of N is generally increased. If the frequency is accompanied by a temporal fluctuation, however, then the average period is extracted and the analytical accuracy deteriorates as N is increased. Some techniques use an analysis window function for x(n) in preprocessing. However, this does not improve the apparent frequency resolution.
2.2 Algorithm of NHA
Finally, we describe the motivation for the structure shown in Figure 2. For the cost function given by Equation 2, the steepest descent method converges slowly but can find the stationary point within a wide range. In contrast, the Newton method can quickly find a nearby stationary point. Therefore, we first use the steepest descent method to approach the stationary point from within a wide range and then use the Newton method to converge on it quickly. In both stages, the convergence calculation for the amplitude A is carried out separately from that for the other parameters so that a local stationary point is not calculated incorrectly.
2.3 Details of NHA
where N is the frame length and f_{s} is the sampling frequency (f_{s} = 1/Δt).
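A cost function consistent with this description can be written as the mean squared difference between the analyzed signal x(n) and a single sinusoid with amplitude A, frequency f, and initial phase ϕ. The form below is an assumed sketch rather than a quotation of Equation 2; in particular, the 1/N normalization and the use of a cosine are assumptions.

$$F(A, f, \phi) = \frac{1}{N}\sum_{n=0}^{N-1}\left\{x(n) - A\cos\left(2\pi\frac{f}{f_{s}}n + \phi\right)\right\}^{2}$$

Because f enters through the continuous ratio f/f_{s} rather than through an integer index k/N, the minimizer of such a function is not restricted to multiples of f_{s}/N.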
2.3.1. Steepest descent method
George and Smith [12, 13] attempted to introduce the signal parameter A and the initial phase ϕ by applying the least mean squares method to the difference signal between the analyzed signal and the modulated harmonic sinusoidal wave.
However, this method is strongly dependent on the frame length and is difficult to apply to the analysis of signals that do not have a simple frequency harmonic structure because frequencies that are dependent on the frame length are used for the group of harmonic frequencies, as in DFT. In other words, small frequency changes cannot be detected.
If the amplitude A, frequency f, and initial phase ϕ of the maximum spectral component determined by DFT are used as the initial values (A_{0,0}, f_{0,0}, ϕ_{0,0}), then the initial values can be given inside the trough containing the minimum of the cost function in Figure 3.
where μ_{m,1} is set to 1.
The next step is the convergence of the amplitude.
2.3.2. Amplitude convergence
Then, ${\widehat{A}}_{m+1,0},{\widehat{f}}_{m+1,0}$, and ${\widehat{\phi}}_{m+1,0}$ are set to ${\widehat{A}}_{m,q},{\widehat{f}}_{m,p}$, and ${\widehat{\phi}}_{m,p}$, and q and p are reset to 1.
Next, the steepest descent method and the amplitude convergence algorithm are repeated until the cost function has partially converged. Newton's method is then applied.
2.3.3. Newton's method
and m is the number of iterations of Newton's method. In addition, μ_{m,p} is similarly obtained from Equation 6. This series of calculations is also repeated to cause ${\widehat{f}}_{m}$ and ${\widehat{\phi}}_{m}$ to converge accurately. After applying Equations 11 and 12, ${\widehat{A}}_{m}$ is made to converge by applying Equation 8 in the same manner as in the steepest descent method, and the series of calculations is repeated. The only difference is that the converging algorithm is repeated using Newton's method instead of the steepest descent method. Thus, the frequency parameters are estimated to a high degree of accuracy and at high speed by using a hybrid process combining the steepest descent method and Newton's method.
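The hybrid idea (a DFT peak as the initial value, steepest descent with a halved step weight, then Newton's method) can be sketched as follows. This is a simplified variant and not the authors' implementation: only the frequency is iterated, the amplitude and phase are obtained in closed form by linear least squares at each trial frequency (a stand-in for the separate amplitude convergence step), and derivatives are taken by finite differences. The names `fit_cost` and `estimate_sinusoid` and all default parameters are hypothetical.

```python
import numpy as np

def fit_cost(x, w):
    """Least-squares fit of a*cos(w n) + b*sin(w n); returns (cost, A, phi)."""
    n = np.arange(len(x))
    basis = np.column_stack([np.cos(w * n), np.sin(w * n)])
    (a, b), *_ = np.linalg.lstsq(basis, x, rcond=None)
    resid = x - basis @ np.array([a, b])
    # a*cos(wn) + b*sin(wn) = A*cos(wn + phi), A = hypot(a, b), phi = atan2(-b, a)
    return np.mean(resid ** 2), np.hypot(a, b), np.arctan2(-b, a)

def estimate_sinusoid(x, fs, n_descent=5, n_newton=20, delta=1e-4):
    """Estimate (A, f, phi) of the dominant sinusoid in x (simplified sketch)."""
    n = len(x)
    cost = lambda w: fit_cost(x, w)[0]
    # Initial value: angular frequency of the largest DFT bin (DC excluded).
    k = 1 + int(np.argmax(np.abs(np.fft.rfft(x))[1:]))
    w = 2 * np.pi * k / n

    def grad(w):
        # Central finite difference of the cost with respect to w.
        return (cost(w + delta) - cost(w - delta)) / (2 * delta)

    def descent_step(w):
        c, g, mu = cost(w), grad(w), 1.0
        for _ in range(60):  # halve the weight until the cost decreases
            w_new = w - mu * g
            if 0 < w_new < np.pi and cost(w_new) < c:
                return w_new
            mu *= 0.5
        return w  # no improving step found

    for _ in range(n_descent):  # stage 1: steepest descent (wide range)
        w = descent_step(w)
    for _ in range(n_newton):   # stage 2: Newton's method (fast, local)
        g = grad(w)
        h = (cost(w + delta) - 2 * cost(w) + cost(w - delta)) / delta ** 2
        w_new = w - g / h if h > 0 else w
        if not (0 < w_new < np.pi and cost(w_new) < cost(w)):
            w_new = descent_step(w)  # fall back when Newton does not help
        converged = abs(w_new - w) < 1e-12
        w = w_new
        if converged:
            break
    _, amp, phi = fit_cost(x, w)
    return amp, w * fs / (2 * np.pi), phi
```

Running the descent stage first matters because the cost is not convex across the whole trough: a DFT peak can lie up to half a bin from the true frequency, where a pure Newton step is not guaranteed to point toward the minimum.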
2.3.4. Sequential reduction
If both A_{j} and A match, then the frequency component of an estimated spectrum can be completely removed from the object signal. Therefore, the problem of acquiring an optimum solution is frequency independent and is applicable even to a signal consisting of several sinusoidal waves by sequential and individual estimation from the object signal. In other words, even when the object signal is a composite sinusoidal wave, several sinusoidal waves can be extracted by performing similar processing on sequential residual signals. If the frequencies of two spectra are adjacent to each other, the other spectrum generates another trough in the trough around the true value shown in Figure 3 and distorts the evaluation function. This may result in an error, as discussed later herein.
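The sequential procedure (estimate one sinusoid, subtract it, and repeat on the residual) can be sketched as below. The per-component estimator here is a stand-in, not NHA: it picks the peak of a zero-padded DFT and fits amplitude and phase by least squares. The function names and the padding factor are hypothetical.

```python
import numpy as np

def strongest_sinusoid(x, fs, pad=64):
    """Stand-in estimator: zero-padded DFT peak plus least-squares amplitude/phase."""
    n = len(x)
    spec = np.abs(np.fft.rfft(x, n * pad))
    k = 1 + int(np.argmax(spec[1:-1]))  # skip the DC and Nyquist bins
    f = k * fs / (n * pad)
    t = np.arange(n) / fs
    basis = np.column_stack([np.cos(2 * np.pi * f * t), np.sin(2 * np.pi * f * t)])
    coef, *_ = np.linalg.lstsq(basis, x, rcond=None)
    return f, basis @ coef, np.hypot(*coef)

def sequential_reduction(x, fs, n_components):
    """Extract sinusoids one at a time from successive residual signals."""
    residual = np.asarray(x, dtype=float).copy()
    found = []
    for _ in range(n_components):
        f, wave, amp = strongest_sinusoid(residual, fs)
        found.append((f, amp))
        residual = residual - wave  # remove the estimated component
    return found, residual
```

The better each component is estimated, the less of it remains in the residual to disturb later estimates, which is why the accuracy of the single-sinusoid estimator governs the whole procedure.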
2.4. Accuracy of NHA
Among the techniques based on DFT, generalized harmonic analysis (GHA or Hirata's algorithm) is generally considered to have the highest accuracy [17–20].
DFT exhibited low analytical accuracy except when the signals had frequencies that were integral multiples of the fundamental frequency. At frequencies above 1 Hz, GHA exhibited accuracies that were two to five orders of magnitude greater than those of DFT. At the same frequencies, NHA was 10 or more orders of magnitude more accurate than DFT. At frequencies below 1 Hz, DFT and GHA were equally accurate, but NHA was able to estimate the frequency and other parameters correctly without being affected by the frame length. Thus, NHA was demonstrated to have an even greater analysis accuracy than GHA, which was developed from DFT.
Accurate estimation at frequencies below 1 Hz means that even object signals having periods longer than the frame length can accurately be analyzed. Therefore, it may be possible to accurately estimate the spectral structures of signals representing stock prices and other fluctuation factors.
The ratio of the amplitudes of the two sinusoidal waves is 1:1 in Figure 7 and 1:10 in Figure 8. The latter is the sinusoidal wave ratio at f = 0.6 Hz. In both cases, NHA is the most accurate, followed by GHA and then DFT. If the two sinusoidal waves have similar amplitudes, the evaluation functions shown in Figure 3 interfere with each other, increasing the distortion, which results in a greater error than that when only one sinusoidal wave is used. As mentioned above, this tendency becomes more noticeable as the frequencies become closer to each other. However, the NHA error is, on average, smaller than the errors of DFT and GHA.
3. Extracting single sinusoidal waves
In this section, a quantitative comparison of the extraction accuracy and the calculation time of DFT and NHA is performed. A single sinusoidal wave in a noisy environment was used for the experiment. For each method, an optimum spectrum (closest to the target signal frequency) was selected and converted to a waveform for evaluation. For DFT, f is necessarily an integral multiple of the fundamental frequency. For the calculations, the frame length was set to 256, and the sampling frequency was set to 488 kHz. The sinusoidal wave was set to 488 Hz in order to investigate frequencies that DFT could not estimate.
Figures 9c and 9e show the signals detected by NHA and DFT, respectively, and Figures 9d and 9f show the residual signals obtained by subtracting (c) and (e) from the target signal. This figure shows that NHA more accurately extracts the original signal. When noise is added to the signal, DFT produces errors if the frequency is not a multiple of the fundamental frequency. The output SNR was approximately 24 dB when NHA was used for extraction and approximately 4 dB when DFT was used. Thus, an improvement of approximately 20 dB was confirmed.
These calculations were performed using a personal computer (CPU: Intel Core i7-930 @ 2.8 GHz, memory: 6 GB). The times required to analyze a signal consisting of 256 samples by DFT and NHA were 2.8 and 12.0 ms, respectively. Note that, in this article, DFT is computed using a radix-2 FFT, its fastest form.
For statistical verification at various target signal frequencies, an extraction experiment was conducted in which the frequency f and the initial phase ϕ of the target signal were varied 1,000 times in different noise environments using uniformly distributed random numbers. The ranges of f and ϕ were 0 < f < 4000 Hz and -π < ϕ < π, respectively. In this case, the amplitude A was maintained constant. The input signal was generated by adding white noise to a single sinusoidal wave. Throughout the experiments, the input SNR was maintained in the range from -10 to +10 dB and was varied in 5-dB steps.
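A trial of this kind can be set up as in the following sketch. The SNR definition used here (ratio of signal energy to error energy, in dB) is the standard one and is assumed rather than taken from the article; the function names, the random seed, and the sampling values in the usage lines are hypothetical.

```python
import numpy as np

def snr_db(reference, estimate):
    """SNR in dB of an estimate with respect to a reference signal."""
    error = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))

def add_white_noise(signal, snr_in_db, rng):
    """Add white Gaussian noise scaled so that the input SNR equals snr_in_db."""
    noise = rng.standard_normal(len(signal))
    scale = np.sqrt(np.sum(signal ** 2) / (np.sum(noise ** 2) * 10 ** (snr_in_db / 10)))
    return signal + scale * noise

# One random trial: f and phi are drawn uniformly, the amplitude is held constant.
rng = np.random.default_rng(0)           # seed chosen only for repeatability
fs, n = 8000.0, 256                       # illustrative values for this sketch
f = rng.uniform(0.0, 4000.0)
phi = rng.uniform(-np.pi, np.pi)
target = np.cos(2 * np.pi * f * np.arange(n) / fs + phi)
noisy = {snr: add_white_noise(target, snr, rng) for snr in (-10, -5, 0, 5, 10)}
```

Applying `snr_db(target, extracted)` to the waveform returned by each analysis method then gives the output SNR for that trial.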
4. Suppression of side-lobes
Parameters of sinusoidal waves
Mark | Amplitude | Target frequency (Hz) |
---|---|---|
(a) | 0.8 | 4.2 |
(b) | 1 | 10.3 |
(c) | 0.1 | 13.7 |
(d) | 0.6 | 20.3 |
Figure 13a shows the case for DFT. The side-lobes of the source spectrum overlap the noise spectrum, making it difficult to estimate the amplitude. In addition, the phase information of the target signal is lost. If the side-lobes are removed, then the signal source cannot fully be recovered. On the other hand, the possibility of any overlap between the source and noise spectrum decreases because NHA is a high-frequency resolution analysis, as shown in Figure 13b. Therefore, there is a high possibility that the information contained in the source spectrum is isolated from the noise spectrum and can be recovered.
5. Constant threshold experiment
5.1. Experimental conditions for the constant threshold experiments
Experimental conditions
Condition | Setting |
---|---|
Analysis method | DFT (rectangular), DFT (Hanning), Ismo (rectangular), Ismo (Hanning), NHA |
Amplitude modification | Spectral extraction (SE), SS |
Sampling frequency | 44.1 kHz |
Length of music | 2 s |
Frame length (samples) | 256, 512, 1024, 2048 |
Shift length | (Frame length)/4 |
Added noise | White Gaussian noise, pink noise |
Input SNR (dB) | -10, -5, 0, 5, 10 |
MIDI instrument | Flute, grand piano, reed organ, overdrive guitar, trumpet |
Music (MIDI) | Do-Re-Mi, For Elise |
Software synthesizer | YAMAHA XG WDM SoftSynthesizer |
5.2 Details of the methods used to obtain the amplitude-modified spectra
Substituting X_{ism}(k) obtained using the Ismo method for X(k) in Equations 18 and 19, we calculate these equations in a similar manner and obtain the output ${\widehat{s}}_{\text{ISMsub}}$ by the SS method and the output ${\widehat{s}}_{\text{ISMex}}$ by the SE method.
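The two amplitude modifications can be sketched for a single DFT frame as follows. The definitions used are assumptions made for this sketch: SS subtracts an estimated noise magnitude from each bin and floors the result at zero, while SE keeps only the bins whose magnitude exceeds a threshold; both reuse the unmodified noisy phase. Equations 18 and 19 may differ in detail, and the function names are hypothetical.

```python
import numpy as np

def spectral_subtraction(frame, noise_mag):
    """SS sketch: subtract a noise magnitude estimate, keep the noisy phase."""
    spec = np.fft.rfft(frame)
    mag = np.maximum(np.abs(spec) - noise_mag, 0.0)  # floor at zero
    return np.fft.irfft(mag * np.exp(1j * np.angle(spec)), len(frame))

def spectral_extraction(frame, threshold):
    """SE sketch: keep bins above the threshold unchanged, zero the rest."""
    spec = np.fft.rfft(frame)
    kept = np.where(np.abs(spec) > threshold, spec, 0.0)
    return np.fft.irfft(kept, len(frame))
```

`noise_mag` may be a scalar or one value per bin, for example a magnitude spectrum averaged over frames known to contain only noise.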
5.3. Results of the fixed-threshold experiment
For the SE method, NHA (indicated by blue solid lines) provided the best results, followed by the Ismo method with a Hanning window, and DFT with a rectangular window provided the worst results. Similarly, for the SS method, NHA provided the best results, and DFT with a rectangular window provided the worst results.
For this sound source, the output SNR calculated by each method has a different magnitude, but these magnitudes change at approximately the same time and exhibit a similar trend.
The signal used here is stable and exhibits only a few changes in its envelope for both the SE and SS methods, as shown in Figures 18, 19, and 20. The calculated results for that signal were ranked in order of NHA, the Ismo method, and DFT. For the SE method, the Ismo method and NHA provided better results than DFT by approximately 5 and 3 dB, respectively, when the envelope changed markedly. For the SS method, the Ismo method and NHA provided better results than DFT by approximately 1.5 and 0.7 dB, respectively, when the envelope changed markedly. The results obtained by NHA may have been superior because the signal source spectrum was not dispersed and the frequency resolution was high. In addition, the results of the Ismo method are comparatively good, in part because the prediction of the signal became easy.
Figure 21a-c shows the results for input SNRs of 10, 0, and -10 dB, respectively, in a white Gaussian noise environment. Based on the results, the average segmental SNR obtained by NHA is the highest for the SE method, followed by the Ismo method using a Hanning window. For the SS method, the average segmental SNR obtained by NHA is high compared to other techniques. Unlike in a previous study [11], the improvement in precision by the Ismo method for the SS method could not be confirmed in the present experiment. However, the higher values are thought to have been obtained using transient detection [21]. In this study, the threshold is chosen so that the segmental SNR is maximized each time the segmental SNR is calculated. The Ismo method is thought to be well suited to real applications (e.g., threshold decision method that considers either human hearing [8] or musical noise [23]) and provides good affinity. Figure 21d-f shows the results for input SNRs of 10, 0, and -10 dB, respectively, in a pink noise environment. In this case, the best NHA results were obtained using either the SE method or the SS method. Moreover, the combination of the Ismo method and a Hanning window provides good results compared to DFT by the SE method.
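The segmental SNR used for evaluation can be sketched as the average of per-frame SNR values. The definition below is the commonly used one and is an assumption here, as are the function name, the default frame length, the quarter-frame hop, and the small constant that guards against division by zero.

```python
import numpy as np

def segmental_snr_db(clean, estimate, frame_len=256, hop=None, eps=1e-12):
    """Average of frame-wise SNR values in dB (common definition, assumed here)."""
    hop = hop or frame_len // 4  # quarter-frame shift by default
    values = []
    for start in range(0, len(clean) - frame_len + 1, hop):
        s = clean[start:start + frame_len]
        e = s - estimate[start:start + frame_len]
        values.append(10 * np.log10((np.sum(s ** 2) + eps) / (np.sum(e ** 2) + eps)))
    return float(np.mean(values))
```

Choosing the threshold that maximizes this value, as described above, amounts to evaluating the function over a set of candidate thresholds and keeping the best one.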
6. Summary
Previous studies have confirmed that, when enhancing the sound quality of an existing recording, the precision of noise suppression is improved by increasing the frequency resolution. In this study, we demonstrated that NHA provides high frequency resolution by suppressing the influence of the window length, and we examined the limit to the improvement in noise suppression precision provided by NHA. Since a frequency spectrum obtained using NHA is not affected by the window length at the time of frequency conversion, the frequency resolution width is regarded as theoretically infinitesimal.
We added white Gaussian noise and pink noise to a music signal and performed experiments to examine the effects of noise suppression by the basic SS method. Segmental SNR was used to evaluate the effectiveness of noise suppression through a fixed-threshold experiment, and NHA and the conventional SS method were compared. The precision of the noise suppression obtained by NHA was confirmed to be better than that obtained by the conventional method. A similar magnitude correlation was confirmed to appear among the methods even if the window length changed. In addition, the improvement in precision of noise suppression by high frequency resolution was confirmed when the envelope was stable. Based on these results, an improvement in noise suppression precision, as compared to that provided by the conventional method, can be expected in various applications by incorporating NHA with a theoretically infinitesimal frequency resolution.
In this study, we attempt only to remaster old music sources. Therefore, the main noise is usually impulsive noise and white noise generated by the old recording device and by the deterioration of the recording media. We do not assume noise encountered in a noisy environment, such as a subway or a roadside.
It may be feasible to apply the proposed technique to sound sources of daily conversations. It appears that vowels can be recovered sufficiently even when noise is mixed in because a vowel sound is a periodic signal over a short time period. However, in the frequency analysis of consonants, the calculation using NHA is approximately equivalent to the calculation using FFT.
In addition, we examined a pink noise as a representative colored noise. Other steady noises can be reduced in the same manner if the outline of the power spectrum is known. However, it appears that we must incorporate new methods other than the proposed method, and the new methods must be dynamically devised because the characteristic of an unsteady noise must be predicted.
At this stage, we have not incorporated the proposed method into an embedded system or a portable device because its calculation time is several times longer than that of DFT (computed in this article using a radix-2 FFT). The high-speed SS method appears to be advantageous for applications such as speech recognition research on daily conversations. Although the calculation time is increased, the proposed technique will be effective if used in an application that requires high precision. We believe that the drawbacks of the proposed method are best left for consideration in a future study in which the method is applied to a portable product or to speech recognition research.
Declarations
Acknowledgements
This work was supported by Grants-in-Aid for Challenging Exploratory Research, MEXT(No.23650110).
References
- Boll SF: Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans Acoust Speech, Signal Process ASSP 1979, 27(2):113-120. 10.1109/TASSP.1979.1163209
- Lin CT: Single-channel speech enhancement in variable noise-level environment. IEEE Trans Syst Man Cybernet A 2003, 33(1):137-143.
- Kamath SD, Loizou PC: A multi-band spectral subtraction method for enhancing speech corrupted by colored noise. Proceedings of the ICASSP 2002, 4164-4167.
- Goh Z, Tan KC, Tan BTG: Postprocessing method for suppressing musical noise generated by spectral subtraction. IEEE Trans Speech Audio Process 1998, 6: 287-292. 10.1109/89.668822
- Sorensen K, Andersen S: Speech enhancement with natural sounding residual noise based on connected time-frequency speech presence regions. EURASIP J Appl Signal Process 2005, 18: 2954-2964.
- Soon IY, Koh SN: Speech enhancement using 2-D Fourier transform. IEEE Trans Speech Audio Process 2003, 11: 717-724. 10.1109/TSA.2003.816063
- Ding H, Soon IY, Koh SN, Yeo CK: A spectral filtering method based on hybrid wiener filters for speech enhancement. Speech Commun 2009, 51: 259-267. 10.1016/j.specom.2008.09.003
- Virag N: Single channel speech enhancement based on masking properties of the human auditory system. IEEE Trans Speech Audio Process 1999, 7(2):126-137. 10.1109/89.748118
- Udrea R, Vizireanu N, Ciochina S: An improved spectral subtraction method for speech enhancement using a perceptual weighting filter. Digital Signal Process 2008, 18(4):581-587. 10.1016/j.dsp.2007.08.002
- Kauppinen I, Roth K: Improved noise reduction in audio signals using spectral resolution enhancement with time-domain signal extrapolation. IEEE Trans Speech Audio Process 2005, 13: 1210-1216.
- Hirobayashi S, Ito F, Yoshizawa T, Yamabuchi T: Estimation of the frequency of non-stationary signals by the steepest descent method. Proceedings of the Fourth Asia-Pacific Conference of Industrial Engineering and Management Systems 2002, 788-791.
- George EB, Smith MJT: Analysis-by-synthesis/overlap add sinusoidal modeling applied to the analysis and synthesis of musical tones. J Audio Eng Soc 1992, 125(40):497-516.
- George EB, Smith MJT: Speech analysis/synthesis and modification using an analysis-by-synthesis/overlap-add sinusoidal model. IEEE Trans Speech Audio Process 1997, 5(5):398-406.
- Tukey JW, Beaton AE: The fitting of power series, meaning polynomials, illustrated on band-spectroscopic-data. Technometrics 1974, 16: 189-192. 10.2307/1267938
- Chambers JM: Computational Methods for Data Analysis. Wiley, New York 1977.
- Gill PE, Murray W: Quasi-Newton methods for unconstrained optimization. J Inst Math Appl 1972, 9: 91-108. 10.1093/imamat/9.1.91
- Terada T, et al.: Non-stationary waveform analysis and synthesis using generalized harmonic analysis. IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis 1994, 429-432.
- Wiener N: The Fourier Integral and Certain of Its Applications. Dover Publications, Inc., New York; 1958:158-199.
- Muraoka T, Kiriu S, Kamiya Y: Fast algorithm for generalized harmonic analysis (GHA). The 47th IEEE International Midwest Symposium on Circuit and Systems 2004, 153-156.
- Hirata Y: Non-harmonic Fourier analysis available for detecting very low-frequency components. J Sound Vib 2005, 287(3):611-613.
- Kauppinen I, Roth K: An adaptive technique for modeling audio signals. In Proceedings of the 4th International Conference on Digital Audio Effects (DAFx-01). Limerick, Ireland; 2001:1-4.
- Kauppinen I, Roth K: Audio signal extrapolation--theory and applications. In Proceedings of the 5th International Conference on Digital Audio Effects (DAFx-02). Hamburg, Germany; 2002:105-110.
- Berouti M, Schwartz R, Makhoul J: Enhancement of speech corrupted by acoustic noise. Proc IEEE ICASSP'79 1979, 208-211.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.