A Deep Learning Model For Chest Radiograph Findings From Pneumonia Etiology Research for Child Health (PERCH)

Primary endpoint pneumonia (PEP) along with non-endpoint/other infiltrates (OI), and non-PEP/OI are defined by the World Health Organization (WHO) as endpoints of evaluation in vaccine effectiveness studies among pediatric population. This is a tool to detect these endpoints on pediatric chest radiograph. The project is written in TensorFlow 2.2. The original study is published on PLOS ONE.

The model was built using DenseNet121 and trained using chest x-ray images from the Pneumonia Etiology Research for Child Health (PERCH) study. The project also used CheXpert dataset for pretraining, and images from WHO-original and WHO-CRES (Chest Radiography in Epidemiological Studies) for testing.

Disclaimer: The definition is designed to be used in epidmiological studies which typically favors specificty over sensitivity, and thus is not meant to be used for clinical diagnosis where a higher sensitivity is preferred to reduce under-diagnosis. Some researchers have recommended dropping OI category from the WHO definition due to its low inter-rater agreement.

Visualization

Primary Endpoint Pneumonia

alt text Frontal radiographs of the chest in a child with WHO-defined primary endpoint pneumonia; the child is rotated to the right with dense opacity in the right upper lobe; the model localizes consolidation with a predicted probability p = 0.980; the discriminative visualization shows fine-grained features important to the predicted class.

Non-Endpoint/Other Infiltrate

alt text Frontal radiograph of the chest presents patchy opacity consistent with non-endpoint infiltrate. The model correctly classifies the image as infiltrate with a probability of p = 0.917 and localizes the areas of opacity. The class discriminative visualization highlights important class features.

User Tutorial

File Structure:

The folder WHO_images contains a toy sample of 9 randomly selected images (3 PEP, 3 OI, and 3 non-PEP/OI) from the WHO-CRES dataset. The folder saved_model contains a pre-trained weight file from the PERCH-CXR study.

Step by Step Instructions:

Note: The instructions work for the toy sample right out of the box. For training on your own data, modify the parameters in config.ini, and structure the data csv file as is in WHO_CRES.csv, with first column containing path to each image, and second onward column containing image labels.

Run pip install -r requirements.txt to install required packages.
Run python generate_tfreocrd.py to transform data into TFRecords file, an optional format for TensorFlow, recommended for working large dataset.
Run python train.py to train and evaluate the model.
Run python test.py to test the model.
Run python grad-cam.py to visualize model's prediction using Grad-CAM.

Config.ini File Explanation

The file contains 4 sections (DATA,TRAIN,TEST,GRAD-CAM), each corresponding to one of the 4 steps.
The [DATA] section contains a sharding parameter. Sharding can be used to acheive a more thoroughly shuffled dataset in order to destroy any large-scale correlations in your data (see "How to shuffle in TensorFlow").
When n_fold>1, the training will run in n-fold cross-validation mode, and the results will be saved to n folders.
The [TRAIN] section contains a class_names parameter. Its order corresponds to the order of outcome columns in the data csv file and determines the order of model outputs. If you are evaluating your pre-trained weight in a new dataset, remember to sort the outcome columns in your new data according to class_names.

Author

Star Chen (starchen1440@gmail.com)

Acknowledgment

The study is sponsored by Merck & Co., Inc.

License

This project is licensed under the terms of the MIT license.