Defense of Pablo David Pino Garretón

Report generation from chest X-rays: Analysis of NLP metrics and clinically correct template-based model.

Advisor: Denis Parra Santander


Every year radiologists face an increasing demand for image-based diagnosis from patients, and computer-aided diagnosis (CAD) systems seem like a promising way to alleviate their workload. In recent years, many authors have proposed deep learning models to generate reports from medical images, but they mainly focus on improving Natural Language Processing (NLP) metrics, such as BLEU and CIDEr, which may not be suitable for measuring clinical correctness in the reports, as indicated by multiple authors. Additionally, most approaches are end-to-end black-box models that are difficult or impossible for a human to understand, which would make them very hard to deploy in a clinical scenario.

In this thesis, we contest the state-of-the-art models and evaluations for the task of report generation from chest X-rays. We provide further evidence that traditional NLP metrics are not enough to evaluate this task, by showing their lack of robustness in multiple cases. For example, we show that NLP metrics are not able to discriminate between sentences with opposite clinical meaning, and that slightly altering the wording of a model's reports can increase its NLP performance while maintaining high clinical performance. We also propose a template-based report generation model that detects a set of abnormalities and verbalizes them via fixed sentences into a structured report. We benchmark our model on the IU X-ray and MIMIC-CXR datasets against naive baselines, deep learning-based models, and models from the literature, using the CheXpert labeler and NLP metrics. The proposed model is much simpler than other state-of-the-art methods and inherently interpretable, and it achieves better results in medical correctness metrics, though worse results in NLP metrics. We conclude that there is a need to improve the assessment methods in this research area, by analyzing the available data in detail, performing more extensive evaluations, and involving expert physicians.
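The two main ideas of the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abnormality names, the template sentences, and the helper functions `generate_report` and `unigram_precision` are all hypothetical, and the overlap score is only the clipped 1-gram precision that BLEU builds on, not full BLEU or CIDEr.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical templates: one fixed sentence per abnormality and per outcome.
# The sentences are illustrative and are not taken from the thesis.
TEMPLATES: Dict[str, Dict[bool, str]] = {
    "Cardiomegaly": {
        True: "The heart is enlarged.",
        False: "The heart size is normal.",
    },
    "Pleural Effusion": {
        True: "There is pleural effusion.",
        False: "There is no pleural effusion.",
    },
}


def generate_report(findings: Dict[str, bool]) -> str:
    """Verbalize detected abnormalities with fixed sentences.

    `findings` stands in for the output of an image classifier;
    abnormalities missing from it are treated as absent.
    """
    sentences: List[str] = [
        TEMPLATES[name][bool(findings.get(name, False))] for name in TEMPLATES
    ]
    return " ".join(sentences)


def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision (no brevity penalty, no higher-order n-grams)."""
    cand = candidate.lower().replace(".", "").split()
    ref_counts = Counter(reference.lower().replace(".", "").split())
    matched = 0
    for token in cand:
        if ref_counts[token] > 0:
            matched += 1
            ref_counts[token] -= 1
    return matched / len(cand) if cand else 0.0


report = generate_report({"Cardiomegaly": True})
# A candidate with the opposite clinical meaning still overlaps heavily.
score = unigram_precision(
    "There is pleural effusion.", "There is no pleural effusion."
)
```

In this toy setting every sentence in the report traces back to one classifier decision, which is what makes the approach interpretable, while the overlap score stays high even though the candidate and reference sentences contradict each other.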

