Yonsei University / Byeongjoon Kim, Minah Han, Hyunjung Shim*, Jongduk Baek*
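Representative loss functions used to train convolutional neural networks include the mean squared error (MSE), the mean absolute error (MAE), perceptual loss, and adversarial loss. In this study, we trained convolutional neural networks to remove noise from low-dose CT images, using each of these losses or combinations of them as the loss function. The data consisted of XCAT simulations and measured data provided by the Mayo Clinic. Denoising was applied to low-dose CT images acquired at 25%, 50%, and 75% of the normal dose, and we measured the noise power spectrum, the modulation transfer function, and lesion detectability using a mathematical observer. The results showed that, although it depends on the dose level and the type of lesion, in many cases using the perceptual loss together with the adversarial loss removes noise effectively while preserving resolution well, compared with conventional iterative reconstruction methods.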
Convolutional neural network (CNN)-based image denoising techniques have shown promising results in low-dose CT denoising. However, CNNs often introduce blurring in denoised images when trained with a widely used pixel-level loss function. Perceptual loss and adversarial loss have been proposed recently to further improve image denoising performance. In this paper, we investigate the effect of different loss functions on image denoising performance using task-based image quality assessment methods for various signals and dose levels.
We used a modified version of U-net that was effective at reducing the correlated noise in CT images. The loss functions used for comparison were two pixel-level losses (i.e., the mean-squared error and the mean absolute error), Visual Geometry Group network-based perceptual loss (VGG loss), adversarial loss used to train the Wasserstein generative adversarial network with gradient penalty (WGAN-GP), and their weighted summation. Each image denoising method was applied to reconstructed images and sinogram images independently and validated using the extended cardiac-torso (XCAT) simulation and Mayo Clinic datasets. In the XCAT simulation, we generated fan-beam CT datasets with four different dose levels (25%, 50%, 75%, and 100% of a normal-dose level) using 10 XCAT phantoms and inserted signals in a test set. The signals had two different shapes (spherical and spiculated), sizes (4 and 12 mm), and contrast levels (60 and 160 HU). To evaluate signal detectability, we used a detection task SNR (tSNR) calculated from a non-prewhitening model observer with an eye filter. We also measured the noise power spectrum (NPS) and modulation transfer function (MTF) to compare the noise and signal transfer properties.
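As an illustration of the "weighted summation" of losses mentioned above, the following is a minimal NumPy sketch of a generator-side objective that combines a pixel-level term, a feature-space (perceptual) term, and an adversarial term. It is not the authors' implementation: `feature_fn` stands in for a pretrained VGG feature extractor, `critic_fn` stands in for a WGAN-GP critic, and the weights `w_pixel`, `w_vgg`, and `w_adv` are hypothetical placeholders. The gradient penalty of WGAN-GP enters the critic's own objective and is not shown here.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays."""
    return float(np.mean((a - b) ** 2))

def mae(a, b):
    """Mean absolute error between two arrays."""
    return float(np.mean(np.abs(a - b)))

def generator_loss(denoised, target, feature_fn, critic_fn,
                   w_pixel=1.0, w_vgg=1.0, w_adv=1.0, pixel="mse"):
    """Weighted sum of pixel-level, perceptual, and adversarial terms.

    feature_fn and critic_fn are hypothetical callables (assumptions):
    feature_fn maps an image to a feature array, critic_fn maps an
    image to a realness score.
    """
    # Pixel-level term: MSE or MAE in image space.
    pixel_term = mse(denoised, target) if pixel == "mse" else mae(denoised, target)
    # Perceptual term: MSE between feature maps instead of pixels.
    vgg_term = mse(feature_fn(denoised), feature_fn(target))
    # Wasserstein generator term: raise the critic score of the output.
    adv_term = -float(np.mean(critic_fn(denoised)))
    return w_pixel * pixel_term + w_vgg * vgg_term + w_adv * adv_term
```

Setting a weight to zero recovers the single-loss or two-loss variants, which makes it easy to compare combinations within one training script.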
Compared with CNNs trained without VGG loss, VGG-loss-based CNNs achieved a tSNR closer to that of normal-dose CT for all signals across the different dose levels, except for a small signal at the 25% dose level. For a low-contrast signal at 25% or 50% dose, adding other losses to the VGG loss improved performance beyond that of VGG loss alone. The NPS shapes from VGG-loss-based CNNs closely matched that of normal-dose CT images, whereas CNNs without VGG loss overly reduced the mid-to-high-frequency noise power at all dose levels. The MTF also showed that VGG-loss-based CNNs preserved resolution better at all dose and contrast levels. We also observed that adding WGAN-GP loss helps improve the noise and signal transfer properties of VGG-loss-based CNNs.
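To make the tSNR figure of merit more concrete, the sketch below estimates a detection SNR for a non-prewhitening observer with an eye filter from sample images. Several details are assumptions rather than the paper's stated procedure: the eye filter form E(f) = f^1.3 exp(-c f^2), the choice of c from a hypothetical `peak_freq` parameter, and the estimation of the SNR from the means and variances of the template responses on signal-present and signal-absent images.

```python
import numpy as np

def eye_filter(shape, pixel_size, peak_freq):
    """Radial eye filter E(f) = f^1.3 * exp(-c f^2) (assumed form).

    c is chosen so that E peaks at peak_freq, given in the same
    units as 1 / pixel_size.
    """
    fy = np.fft.fftfreq(shape[0], d=pixel_size)
    fx = np.fft.fftfreq(shape[1], d=pixel_size)
    f = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))
    c = 1.3 / (2.0 * peak_freq ** 2)
    return f ** 1.3 * np.exp(-c * f ** 2)

def npwe_template(signal, eye):
    """Template: the expected signal filtered twice by the eye filter."""
    return np.real(np.fft.ifft2(np.fft.fft2(signal) * eye ** 2))

def detection_snr(signal_present, signal_absent, template):
    """SNR of the template responses for stacks shaped (N, H, W)."""
    t1 = np.tensordot(signal_present, template, axes=([1, 2], [0, 1]))
    t0 = np.tensordot(signal_absent, template, axes=([1, 2], [0, 1]))
    pooled_var = 0.5 * (t1.var(ddof=1) + t0.var(ddof=1))
    return (t1.mean() - t0.mean()) / np.sqrt(pooled_var)
```

In use, `signal_present` and `signal_absent` would be stacks of denoised regions of interest with and without the inserted signal, and the resulting value would be compared against the same quantity computed on normal-dose images.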
The evaluation results using tSNR, NPS, and MTF indicate that VGG-loss-based CNNs are more effective than those without VGG loss for natural denoising of low-dose images, and that WGAN-GP loss improves the denoising performance of VGG-loss-based CNNs, which is consistent with the qualitative evaluation.
Kim B, Han M, Shim H, Baek J.
School of Integrated Technology and Yonsei Institute of Convergence Technology, Yonsei University, Incheon, 21983, South Korea.