Exploratory Data Analysis of the ORCHESTRA Public Data Set¶

Welcome! The following statistics provide some visual insights into the ORCHESTRA Public Data Set. The Public Data Set consists of ORCHESTRA patient data after a data cleaning process and includes data from patients documented until March 28, 2022.

The ORCHESTRA Public Data Set originates from the central ORCHESTRA database. The data anonymisation pipeline is described by Jakob et al. in "Design and evaluation of a data anonymisation pipeline to promote Open Science on COVID-19". The public data is anonymised according to our "data protection concept". The anonymisation process was carried out with the "ARX software".

Copyright: This work is licensed under the Creative Commons Attribution Non-Commercial 4.0 License. With the use of this data you agree to include a proper acknowledgement of the ORCHESTRA study group in any work based on the data set. By working with this notebook you agree to maintain the confidentiality of the data set at all times and to not attempt to compromise or otherwise violate the privacy of the patients described. To view a copy of the license, visit https://creativecommons.org/licenses/by-nc/4.0/.

If you have any comments on the notebook, please drop us a message at support@orchestra-cohort.eu.

Data Set Structure¶

Here we provide information on the basic structure of the ORCHESTRA Public Data Set.

The data set contains 292 patients before anonymisation and 182 patients after anonymisation, each described by 11 variables. Each row represents the anonymised data of a single patient.

The columns correspond to the following variables:

  • age (categorical): age group
  • gender (categorical): gender
  • quarter_of_diagnosis (categorical): quarter of first confirmed diagnosis of SARS-CoV-2
  • most_severe_stage_acute (categorical): most severe stage reached, based on the WHO clinical progression scale
  • any_symptom_acute (categorical): any symptom during the acute phase
  • type_of_discharge_acute (categorical): status of the patient at the end of the acute phase of SARS-CoV-2
  • hospitalisation (categorical): patient hospitalised during the acute phase of SARS-CoV-2
  • intensive_care_treatment (categorical): patient received intensive care treatment during the acute phase of SARS-CoV-2
  • highest_level_oxygen_therapy (categorical): highest oxygen therapy level reached by the patient
  • availability_6month_followup (categorical): completion of the 6-month follow-up
  • any_symptom_6MFU (categorical): patient has any symptoms at the 6-month follow-up

*The Clinical Phases are defined according to the WHO clinical progression scale:

[Figure: WHO clinical progression scale (41385_2021_464_Fig1_HTML.png)]

To get to know the Public Data Set better, the values of each variable are listed below as they occur in the data set used here. Please be aware that the Public Data Set is only a part of the complete ORCHESTRA data set. Anonymisation may lead to variables having fewer values than in the complete ORCHESTRA data set. For example, the variable 'gender' can also have the value 'diverse', but there is no patient with this gender in the Public Data Set.

age:
18 - 39 years, 40 - 59 years, 60 - 79 years, >= 80 years, nan

gender:
Female, Male

quarter_of_diagnosis:
3-2020, 4-2020, 1-2021, 2-2021, 3-2021

most_severe_stage_acute:
Mild, Moderate, Severe, Unknown

any_symptom_acute:
No, Unknown, Yes

type_of_discharge_acute:
Alive, Ambulant, Referral to another insitution, Unknown

hospitalisation:
No, Unknown, Yes

intensive_care_treatment:
No, Unknown, Yes

highest_level_oxygen_therapy:
High flow, Invasive ventilation, Mask or nasal prongs, No oxygen, Non-invasive ventilation, Unknown

availability_6month_followup:
Yes

any_symptom_6MFU:
No, Yes
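A listing like the one above can be produced by collecting the distinct values of each column. The following is a minimal sketch using only the Python standard library. The records in `rows` and the helper name `distinct_values` are made up for illustration; they are not taken from the data set or from the original notebook code.

```python
from collections import defaultdict

# Hypothetical toy records: the column names follow the variable list above,
# but the rows themselves are invented for illustration.
rows = [
    {"gender": "Female", "any_symptom_acute": "Yes", "hospitalisation": "No"},
    {"gender": "Male", "any_symptom_acute": "Unknown", "hospitalisation": "Yes"},
    {"gender": "Female", "any_symptom_acute": "No", "hospitalisation": "Unknown"},
]


def distinct_values(records):
    """Return a dict mapping each column to its sorted distinct values."""
    values = defaultdict(set)
    for record in records:
        for column, value in record.items():
            values[column].add(value)
    return {column: sorted(found) for column, found in values.items()}


# Print each column name followed by its comma-separated values.
for column, found in distinct_values(rows).items():
    print(f"{column}:")
    print(", ".join(found))
```

Sorting the values alphabetically gives a stable listing, which is why 'Unknown' appears between 'No' and 'Yes' in several of the variables above.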

1. Descriptive Analysis¶

The following descriptive statistics are computed in this section:

  • Quarter of diagnosis Distribution
  • Gender Distribution
  • Age Distribution
  • Age - Gender Distribution

The number of patients before anonymisation is 292.
The number of patients after anonymisation is 182.
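The distributions listed in this section are frequency counts over one or two categorical variables. As a rough sketch of how such counts might be computed with the standard library, assuming invented toy records and a hypothetical helper `cross_tabulate` (neither is part of the original notebook):

```python
from collections import Counter

# Hypothetical toy records (invented): each tuple is (age group, gender),
# using categories from the value listing of the Public Data Set.
patients = [
    ("18 - 39 years", "Female"),
    ("18 - 39 years", "Male"),
    ("40 - 59 years", "Female"),
    ("40 - 59 years", "Female"),
    ("60 - 79 years", "Male"),
]


def cross_tabulate(pairs):
    """Count how often each (age group, gender) combination occurs."""
    return Counter(pairs)


table = cross_tabulate(patients)

# Marginal distributions: counts per age group and per gender.
age_counts = Counter(age for age, _ in patients)
gender_counts = Counter(gender for _, gender in patients)

for (age, gender), n in sorted(table.items()):
    print(f"{age:<15} {gender:<8} {n}")
```

The joint counts in `table` correspond to the Age - Gender Distribution, while `age_counts` and `gender_counts` correspond to the single-variable distributions.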

2. Patient status at the end of acute phase¶

The following descriptive statistics on the health status at the end of medical consultation are computed in this section:

  • Frequency of Health Status at the End of Medical Consultation
  • Hospitalisation in the acute phase
  • Intensive care treatment in the acute phase
  • Highest oxygen level reached in the acute phase

Note that the rates are computed on a filtered data set, which is described below.
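The exact filter is not spelled out in this text. One common choice, assumed here purely for illustration, is to exclude records whose value is 'Unknown' before computing a rate. The function name `rate`, its parameters, and the toy column below are hypothetical.

```python
def rate(values, positive="Yes", excluded=("Unknown",)):
    """Share of `positive` among the values that are not in `excluded`.

    Returns None when no value remains after filtering.
    """
    kept = [v for v in values if v not in excluded]
    if not kept:
        return None
    return sum(v == positive for v in kept) / len(kept)


# Hypothetical toy column (invented values, not taken from the data set).
hospitalisation = ["Yes", "No", "Unknown", "No", "Yes", "No"]

# One 'Unknown' is dropped, leaving 5 records, 2 of which are 'Yes'.
print(rate(hospitalisation))
```

Whether 'Unknown' should be dropped or kept in the denominator changes the resulting rate, so the choice should be stated alongside any reported figure.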

Frequency of Health Status at the End of Medical Consultation¶

Health status                     Before Anonymisation   After Anonymisation
Ambulant                          187                    128
Alive                             92                     54
Referral to another insitution    9                      0
Unknown                           4                      0
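The effect of anonymisation on each category can be made explicit by dividing the count after anonymisation by the count before. The sketch below uses the counts from the table above; the variable names and the helper `retention` are illustrative and not part of the original notebook.

```python
# Counts copied from the table above (labels spelled as in the data set).
before = {
    "Ambulant": 187,
    "Alive": 92,
    "Referral to another insitution": 9,
    "Unknown": 4,
}
after = {
    "Ambulant": 128,
    "Alive": 54,
    "Referral to another insitution": 0,
    "Unknown": 0,
}


def retention(before_counts, after_counts):
    """Fraction of patients per category that remain after anonymisation."""
    return {
        label: after_counts.get(label, 0) / n
        for label, n in before_counts.items()
        if n > 0
    }


total_before = sum(before.values())
total_after = sum(after.values())

for label, share in retention(before, after).items():
    print(f"{label}: {share:.1%}")
print(f"Overall: {total_after / total_before:.1%}")
```

The totals match the patient numbers stated earlier (292 before and 182 after anonymisation). The two smallest categories are removed entirely.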

Hospitalisation in the acute phase¶

Intensive care treatment in the acute phase¶

Invasive ventilation in the acute phase¶

3. Clinical Phases¶

From here on we will refer to the three clinical phases as

  • Mild Phase
  • Moderate Phase
  • Severe Phase

In the following we will plot the:

  • Maximum phase reached by patients

4. 6 Months Follow Up¶

In the following we will plot the:

  • Any Symptom - 6MFU