Welcome! The following statistics provide some visual insights into the ORCHESTRA Public Data Set. The Public Data Set comprises patient data from ORCHESTRA after a data cleaning process and includes data from patients documented until March 28, 2022.
The ORCHESTRA Public Data Set originates from the central ORCHESTRA database. The data anonymisation pipeline is described by Jakob et al. in "Design and evaluation of a data anonymisation pipeline to promote Open Science on COVID-19". The public data is anonymised according to our "data protection concept". The anonymisation process was carried out with the "ARX software".
Copyright: This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License. By using this data, you agree to include a proper acknowledgement of the ORCHESTRA study group in any work based on the data set. By working with this notebook, you agree to maintain the confidentiality of the data set at all times and not to attempt to compromise or otherwise violate the privacy of the patients described. To view a copy of the license, visit https://creativecommons.org/licenses/by-nc/4.0/.
If you have any comments on the notebook, please drop us a message at support@orchestra-cohort.eu.
Here we provide information on the basic structure of the ORCHESTRA Public Data Set.
The data set consists of 292 patients before anonymisation, 182 patients after anonymisation, and 11 variables. A row represents anonymised data of a single patient.
The columns are described by the variables:
*The Clinical Phases are defined according to the WHO clinical progression scale:
To get to know the Public Data Set better, the values that each variable takes in this data set are listed below. Please be aware that the Public Data Set is only a part of the complete ORCHESTRA data set. Anonymisation may leave variables with fewer values than in the complete ORCHESTRA data set. For example, the variable 'gender' can also have the value 'diverse', but there is no patient with this gender in the Public Data Set.
age: 18 - 39 years, 40 - 59 years, 60 - 79 years, >= 80 years, nan
gender: Female, Male
quarter_of_diagnosis: 3-2020, 4-2020, 1-2021, 2-2021, 3-2021
most_severe_stage_acute: Mild, Moderate, Severe, Unknown
any_symptom_acute: No, Unknown, Yes
type_of_discharge_acute: Alive, Ambulant, Referral to another insitution, Unknown
hospitalisation: No, Unknown, Yes
intensive_care_treatment: No, Unknown, Yes
highest_level_oxygen_therapy: High flow, Invasive ventilation, Mask or nasal prongs, No oxygen, Non-invasive ventilation, Unknown
availability_6month_followup: Yes
any_symptom_6MFU: No, Yes
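A listing like the one above can be produced with pandas along the following lines. This is only a minimal sketch, not the notebook's actual code: the small `df` is a made-up stand-in with three of the 11 columns, and `list_values` is a hypothetical helper introduced for illustration.

```python
import pandas as pd

# Made-up stand-in for the Public Data Set (illustration only, not real patient data).
df = pd.DataFrame(
    {
        "age": ["18 - 39 years", "40 - 59 years", None, "40 - 59 years"],
        "gender": ["Female", "Male", "Male", "Female"],
        "hospitalisation": ["No", "Yes", "Unknown", "No"],
    }
)


def list_values(frame: pd.DataFrame) -> dict[str, list[str]]:
    """Return the sorted distinct values of each column; missing values appear as 'nan'."""
    values = {}
    for column in frame.columns:
        # Collect the distinct non-missing values in sorted order.
        distinct = sorted(frame[column].dropna().unique())
        # Report missing entries explicitly, as the listing does for 'age'.
        if frame[column].isna().any():
            distinct.append("nan")
        values[column] = distinct
    return values


for column, distinct in list_values(df).items():
    print(f"{column}: {', '.join(distinct)}")
```

Sorting is a choice made for this sketch; the listing above is not strictly sorted (see `quarter_of_diagnosis`), so the original ordering logic may differ.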
The following descriptive statistics are computed in this section:
The number of patients before anonymisation is 292.
The number of patients after anonymisation is 182.
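To put the two patient counts in relation, the share of patients still present after anonymisation can be derived directly from them. The snippet below is a small illustrative calculation using only the two numbers stated above; the variable names are our own.

```python
n_before = 292  # patients before anonymisation (from the text)
n_after = 182   # patients after anonymisation (from the text)

# Difference in patient count and share of patients still present afterwards
n_difference = n_before - n_after
share_retained = n_after / n_before

print(f"Fewer patients after anonymisation: {n_difference}")
print(f"Share of patients retained: {share_retained:.1%}")
```

With these numbers, 110 patients are no longer present after anonymisation, and about 62.3% of the original patients remain.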
The following descriptive statistics on the health status at the end of medical consultation are computed in this section:
Note that we will use a filtered data set for computing the rates, which we describe below.
Type of discharge | Before Anonymisation | After Anonymisation |
---|---|---|
Ambulant | 187 | 128 |
Alive | 92 | 54 |
Referral to another insitution | 9 | 0 |
Unknown | 4 | 0 |
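As a quick consistency check, the counts in the table above can be rebuilt as a pandas DataFrame and summed per column; the totals should match the 292 and 182 patients reported earlier. This is a sketch for illustration only. The percentages it prints are plain unfiltered shares of each column total, which is an assumption of this example and not necessarily the rates computed on the filtered data set mentioned above.

```python
import pandas as pd

# Counts copied from the table above; labels are kept exactly as they appear there.
discharge = pd.DataFrame(
    {
        "before": [187, 92, 9, 4],
        "after": [128, 54, 0, 0],
    },
    index=["Ambulant", "Alive", "Referral to another insitution", "Unknown"],
)

# Column totals: number of patients before and after anonymisation
totals = discharge.sum()

# Unfiltered share of each discharge type per column, in percent
shares = (discharge / totals * 100).round(1)

print(totals)
print(shares)
```

Dividing the DataFrame by `totals` aligns the Series index with the column names, so each column is divided by its own total.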
From here on we will indicate the three clinical phases as
In the following we will plot the: