Hospital San Juan de Alicante – University of Alicante


PadChest: A large chest x-ray image dataset with multi-label annotated reports

We present a labeled, large-scale, high-resolution chest x-ray dataset for the automated exploration of medical images along with their associated reports. This dataset includes more than 160,000 images from 67,000 patients that were interpreted and reported by radiologists at Hospital San Juan (Spain) from 2009 to 2017, covering six different position views and additional information on image acquisition and patient demography.

The reports were labeled with 174 different radiographic findings, 19 differential diagnoses and 104 anatomic locations, organized as a hierarchical taxonomy and mapped to standard Unified Medical Language System (UMLS) terminology. Of these reports, 27% were manually annotated by trained physicians, and the remaining set was labeled using a supervised method based on a recurrent neural network with attention mechanisms. The generated labels were validated, achieving a 0.93 Micro-F1 score on an independent test set.
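As an illustration of the validation metric mentioned above, the following is a minimal sketch of how a Micro-F1 score can be computed for multi-label annotations. It is not the evaluation code used for PadChest; the function name `micro_f1` and the toy label lists are assumptions made for this example. Micro-averaging pools true positives, false positives and false negatives over all reports and labels before computing F1.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 over a collection of multi-label annotations.

    Both arguments are sequences of label collections, one per report.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # labels predicted but not annotated
        fn += len(truth - pred)   # labels annotated but not predicted
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Toy data (hypothetical, not taken from the dataset).
truth = [["pneumothorax", "pulmonary mass"], ["cardiomegaly"]]
pred = [["pneumothorax"], ["cardiomegaly", "pacemaker"]]
score = micro_f1(truth, pred)
```

In this toy case there are two true positives, one false positive and one false negative, so the score is 4 / 6.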

To the best of our knowledge, this is the public database of chest x-rays annotated with the largest number of different labels suitable for training supervised models on radiographs, and the first one containing radiographic reports in Spanish.

Data availability
Use of PadChest is free to all researchers. Researchers seeking to use the full Clinical Database must formally request access. By requesting access, users agree that (1) they will not share the data and (2) they will make no attempt to re-identify individuals.
PadChest, although de-identified, still contains information regarding the clinical care of patients and must be treated with appropriate respect.
B2DROP is supported as part of the EUDAT Collaborative Data Infrastructure services. The B2DROP instance used for this work is provided by BSC-CNS.
Dataset Research Use Agreement
Please read the PadChest Dataset Research Use Agreement before downloading.

Dataset Statistics

PadChest global statistics

Most reported radiographic findings. Labels are shown for both the physician-labeled (dark color) and automatically labeled (light color) datasets. See Appendix A.1.1 for counts of labels on each hierarchical tree (paper).

Most common locations of radiographic findings and differential diagnoses. See Appendix A.1.3 for counts on the locations tree (paper).

Dataset description

The generated dataset provides, for each chest x-ray image, two types of fields:


1) Fields that contain the values of the original fields in the DICOM standard: StudyDate, PatientSex, ViewPosition, Modality, Manufacturer, PhotometricInterpretation, PixelRepresentation (data representation of the pixel samples), PixelAspectRatio, SpatialResolution, BitsStored, WindowCenter, WindowWidth, Rows, Columns, XRayTubeCurrent (x-ray tube current), ExposureTime (duration of x-ray exposure), Exposure, ExposureInuAs, RelativeXRayExposure.


2) The remaining fields enrich the PadChest dataset with additional processed information, as described in Table 5.


Table 5: Dataset fields: All additional processed fields different from original DICOM fields. Additional information on UMLS Metathesaurus CUIs can be found at
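As a hedged illustration of how some of the DICOM-derived fields listed above might be used, the sketch below applies the linear windowing function defined in the DICOM standard using WindowCenter, WindowWidth and PhotometricInterpretation. Whether the distributed images require this step is an assumption, as is the idea that each field holds a single numeric value; the function name `apply_window` and the sample values are hypothetical.

```python
def apply_window(pixels, window_center, window_width,
                 photometric="MONOCHROME2", out_min=0.0, out_max=255.0):
    """Map raw pixel values to [out_min, out_max] with a linear window.

    Follows the linear VOI function of the DICOM standard. For
    MONOCHROME1 images the result is inverted, because in that
    interpretation low values are displayed as white.
    """
    if window_width < 1:
        raise ValueError("window_width must be at least 1")
    low = window_center - 0.5 - (window_width - 1) / 2.0
    high = window_center - 0.5 + (window_width - 1) / 2.0
    result = []
    for value in pixels:
        if value <= low:
            mapped = out_min
        elif value > high:
            mapped = out_max
        else:
            fraction = (value - (window_center - 0.5)) / (window_width - 1) + 0.5
            mapped = fraction * (out_max - out_min) + out_min
        if photometric == "MONOCHROME1":
            mapped = out_max + out_min - mapped
        result.append(mapped)
    return result


# Hypothetical 12-bit values with a window covering the full range.
windowed = apply_window([0, 2048, 4095, 5000], 2048, 4096)
inverted = apply_window([0, 4095], 2048, 4096, photometric="MONOCHROME1")
```

The function works on a flat list of values to stay dependency-free; with an array library the same arithmetic can be applied to a whole image at once.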

Example 1



Labels [‘pulmonary mass’, ‘pacemaker’, ‘cardiomegaly’, ‘vascular hilar enlargement’, ‘sternotomy’, ‘dual chamber device’, ‘suture material’]
Localizations [‘loc lung field’, ‘loc right’, ‘loc hemithorax’, ‘loc hilar’, ‘loc cardiac’, ‘loc middle lung field’]
LabelsLocalizationsBySentence [[‘pulmonary mass’, ‘loc right’, ‘loc lung field’, ‘loc middle lung field’, ‘loc hemithorax’], [‘pacemaker’, ‘dual chamber device’], [‘cardiomegaly’, ‘loc cardiac’], [‘cardiomegaly’, ‘loc cardiac’], [‘vascular hilar enlargement’, ‘loc hilar’], [‘sternotomy’, ‘suture material’]]
labelCUIS [‘C0149726’ ‘C0030163’ ‘C0018800’ ‘C0185792’ ‘C2732817’ ‘C4305366’]
LocalizationsCUIS [‘C0225759’ ‘C0444532’ ‘C0934569’ ‘C0205150’ ‘C1522601’ ‘C0929434’]
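The fields in Example 1 are list-valued, and the CUI fields are printed without commas between items. Assuming these fields are stored as text in the form shown above (an assumption about the file format, not something stated here), a minimal parsing sketch could look like the following. The helper names `parse_list_field` and `parse_cui_field` are hypothetical; the CUI pattern relies on UMLS concept identifiers being the letter C followed by seven digits.

```python
import ast
import re

# UMLS concept unique identifiers: "C" followed by seven digits.
CUI_PATTERN = re.compile(r"C\d{7}")


def parse_list_field(raw):
    """Parse a stringified, possibly nested, list such as Labels."""
    # Normalize typographic quotes so the text is a valid Python literal.
    normalized = raw.replace("\u2018", "'").replace("\u2019", "'")
    return ast.literal_eval(normalized)


def parse_cui_field(raw):
    """Extract CUIs from a field whose items are not comma-separated."""
    return CUI_PATTERN.findall(raw)


labels = parse_list_field("['pneumothorax', 'pulmonary mass']")
by_sentence = parse_list_field(
    "[['cardiomegaly', 'loc cardiac'], ['sternotomy', 'suture material']]"
)
cuis = parse_cui_field("['C2073565' 'C0149726']")
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for text read from a file.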

Example 2



Labels [‘pneumothorax’, ‘pulmonary mass’]
Localizations [‘loc apical’, ‘loc right’]
LabelsLocalizationsBySentence [‘pneumothorax’, ‘loc apical’, ‘loc right’, ‘pulmonary mass’, ‘loc right’]
labelCUIS [‘C2073565’ ‘C0149726’]
LocalizationsCUIS [‘C0734296’ ‘C0444532’]
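For supervised multi-label training, label lists such as those in the two examples above are commonly converted to fixed-length binary vectors. The sketch below shows one way to do this using the Labels values from Example 1 and Example 2; the helper names `build_vocabulary` and `multi_hot` are assumptions for illustration, and a real vocabulary would be built from the full label taxonomy rather than from two images.

```python
def build_vocabulary(label_lists):
    """Return the sorted set of all labels seen across the given lists."""
    return sorted({label for labels in label_lists for label in labels})


def multi_hot(labels, vocabulary):
    """Encode one image's labels as a 0/1 vector aligned with vocabulary."""
    present = set(labels)
    return [1 if label in present else 0 for label in vocabulary]


example_1 = ["pulmonary mass", "pacemaker", "cardiomegaly",
             "vascular hilar enlargement", "sternotomy",
             "dual chamber device", "suture material"]
example_2 = ["pneumothorax", "pulmonary mass"]

vocabulary = build_vocabulary([example_1, example_2])
vectors = [multi_hot(labels, vocabulary) for labels in (example_1, example_2)]
```

Here the vocabulary has eight entries, and the only position set in both vectors is the one for "pulmonary mass", the single label the two examples share.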


PADCHEST (Pathology Detection in Chest Radiology)

Aurelia Bustos (a), Antonio Pertusa (a), Jose María Salinas (b), María de la Iglesia Vayá (c)

(a) Department of Software and Computing Systems, University Institute for Computing Research, University of Alicante, Spain
(b) Department of Health Informatics, Hospital San Juan de Alicante, Spain
(c) Centre of Excellence in Biomedical Image, Regional Ministry of Health, Valencia, Spain


If you want to know more about the project, or contact the research team, write to us.