Hospital San Juan de Alicante – University of Alicante
PadChest
A large chest x-ray image dataset with multi-label annotated reports
We present a labeled, large-scale, high-resolution chest x-ray dataset for the automated exploration of medical images along with their associated reports. This dataset includes more than 160,000 images from 67,000 patients that were interpreted and reported by radiologists at Hospital San Juan (Spain) from 2009 to 2017, covering six different position views and additional information on image acquisition and patient demographics.
The reports were labeled with 174 different radiographic findings, 19 differential diagnoses and 104 anatomic locations organized as a hierarchical taxonomy and mapped to standard Unified Medical Language System (UMLS) terminology. Of these reports, 27% were manually annotated by trained physicians, and the remaining set was labeled using a supervised method based on a recurrent neural network with attention mechanisms. The generated labels were then validated, achieving a 0.93 Micro-F1 score on an independent test set.
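For readers unfamiliar with the Micro-F1 metric mentioned above, the following is a minimal sketch of how it is commonly computed for multi-label annotations: true positives, false positives and false negatives are pooled over all reports and labels before a single F1 value is derived. The function name `micro_f1` and the label strings in the example are illustrative assumptions, not the evaluation code or the exact label vocabulary used for PadChest.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 for multi-label data.

    Both arguments are sequences of label collections, one per report.
    Counts are pooled across all reports before computing F1.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # labels predicted but not annotated
        fn += len(truth - pred)   # labels annotated but missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical example with made-up annotations for three reports.
truth = [{"cardiomegaly", "pleural effusion"}, {"normal"}, {"pneumonia"}]
pred = [{"cardiomegaly"}, {"normal"}, {"pneumonia", "atelectasis"}]
print(micro_f1(truth, pred))  # 3 TP, 1 FP, 1 FN -> 0.75
```

Because counts are pooled, frequent labels weigh more heavily in this score than rare ones, which is worth keeping in mind when a taxonomy contains many infrequent findings.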
To the best of our knowledge, this is the public database of chest x-rays annotated with the largest number of different labels suitable for training supervised models on radiographs, and the first one containing radiographic reports in Spanish.
Data availability
Use of PadChest is free to all researchers. Researchers seeking to use the full Clinical Database must formally request access. By requesting access, the user agrees (1) not to share the data and (2) not to attempt to re-identify individuals.
PadChest, although de-identified, still contains information regarding the clinical care of patients and must be treated with appropriate respect.
B2DROP is supported as part of the EUDAT Collaborative Data Infrastructure services (www.eudat.eu). The B2DROP instance used for this work is provided by BSC-CNS.
Dataset Research Use Agreement
Please read the PADCHEST Dataset Research Use Agreement before downloading.
Dataset Statistics

PadChest global statistics
Dataset description

Table 5: Dataset fields: All additional processed fields different from original DICOM fields. Additional information on UMLS Metathesaurus CUIs can be found at https://uts.nlm.nih.gov/home.html
Example 1
Example 2
Researchers
PADCHEST (Pathology Detection in Chest Radiology)
Aurelia Bustos (a), Antonio Pertusa (a), Jose María Salinas (b), María de la Iglesia Vayá (c)
(a) Department of Software and Computing Systems, University Institute for Computing Research, University of Alicante, Spain
(b) Department of Health Informatics, Hospital San Juan de Alicante, Spain
(c) Centre of Excellence in Biomedical Image, Regional Ministry of Health, Valencia, Spain
Contact
If you want to know more about the project, or contact the research team, write to us.