Difference between revisions of "Introduction"


Revision as of 09:54, 6 January 2023

The SICdb dataset provides insight into more than 27,000 intensive care admissions, including therapy and data from the preceding surgery. Data was collected between 2013 and 2021 from 4 of the intensive care units at the University Hospital Salzburg, which together handle more than 3,000 intensive care admissions per year on 37 beds. The dataset is deidentified and contains, among other things, case information, laboratory, medication, monitor, and respirator signal data. SICdb provides both aggregated once-per-hour data and highly granular once-per-minute data.


The dataset, version 1.0.4, includes data from more than 27,386 admissions to the Department of Anesthesiology and Intensive Care Medicine at the General Hospital Salzburg and Paracelsus Medical University.

Data Tables

The SICdb dataset consists of billions of data entries across 7 data tables. The main table, "cases", contains a single entry for each intensive care admission and includes information about the patient (such as age, weight, and sex) and case details (such as diagnosis, scores, and ICD-10 codes). The "TimeOfStay" field indicates the time from the first admission to a Metavision-enabled ward to the final closing of the case, including any preceding surgery. The "OffsetOfDeath" field indicates survival in seconds from admission. This value represents in-hospital mortality and may extend beyond the intensive care stay or the current hospital stay. Personal data such as age, weight, and height have been grouped into bins of 5, with ages over 90 placed in the final bin.
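The binning described above can be illustrated with a minimal sketch. It assumes that values are rounded down to the lower edge of their bin and that the final age bin is labeled 90; the documentation excerpt here does not state either detail, and the function name bin_value is hypothetical.

```python
from typing import Optional


def bin_value(value: float, width: int = 5, cap: Optional[int] = None) -> int:
    """Map a value to the lower edge of its bin (assumed convention).

    If cap is given, every bin above it collapses into the cap bin,
    mirroring "ages over 90 placed in the final bin".
    """
    binned = int(value // width) * width
    if cap is not None:
        binned = min(binned, cap)
    return binned


# Illustrative inputs only, not values taken from the dataset.
example_age = bin_value(37, cap=90)
example_old_age = bin_value(93, cap=90)
example_weight = bin_value(72.4)
```

Because the published values are already binned, a sketch like this is mainly useful for bringing external data onto a comparable scale before merging it with SICdb.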

All other data tables are related to the "cases" table through the "CaseID" field. Most data has timing information, with the "offset" field indicating the number of seconds from admission to the time of the event. The "laboratory" table contains laboratory values and the "medication" table provides data on administered drugs. There are several generic data tables that contain data sorted by type. The "data_ref" table contains additional nominal/categorical data, one entry per admission, and the "data_range" table documents items with a start and end time, such as data on central lines or drainages. The "data_float_h" table contains float data, aggregated once per hour, and includes most signal data. To reduce table size, minute data is serialized as a stream of IEEE 754 floats in the "data_float_h.rawdata" field. For further instructions on using this data, see the documentation found in Documentation.pdf or online [7], and refer to the unpacking script example provided on our GitHub repository [8].
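As an illustration of the serialized minute data, the following sketch decodes a byte string of packed IEEE 754 floats. It assumes 32-bit little-endian values and assumes that the "rawdata" field has already been converted from its CSV text form to raw bytes; neither detail is stated here, so consult the documentation [7] and the unpacking script [8] for the authoritative procedure. The function name unpack_minute_values is hypothetical.

```python
import struct


def unpack_minute_values(raw: bytes, byteorder: str = "<") -> list:
    """Decode packed 32-bit IEEE 754 floats into a list of Python floats.

    byteorder uses struct notation: "<" little-endian (assumed), ">" big-endian.
    """
    if len(raw) % 4 != 0:
        raise ValueError("rawdata length must be a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"{byteorder}{count}f", raw))


# Round trip with made-up values to show the encoding and decoding steps.
packed = struct.pack("<3f", 80.0, 82.5, 79.0)
decoded = unpack_minute_values(packed)
```

How missing minutes are encoded and how each value maps to an offset within the hour are not covered by this sketch.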

Reference Table

Nominal data is encoded, and the reference table "d_references" provides additional information about the associated field. The referenced fields in all data tables correspond to the primary key of the "d_references" table, "ReferenceGlobalID". The "ReferenceValue" field in "d_references" gives the variable's value, and the "ReferenceUnit" field holds the unit of measurement, if applicable.
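A lookup against the reference table might be sketched as follows. The column names come from the description above; the sample rows, the ID values, and the helper names build_reference_index and resolve are invented for illustration and do not reflect actual dataset content.

```python
def build_reference_index(rows):
    """Index d_references rows by their primary key, ReferenceGlobalID."""
    return {
        int(row["ReferenceGlobalID"]): (row["ReferenceValue"], row["ReferenceUnit"])
        for row in rows
    }


def resolve(index, reference_id):
    """Return a readable label for an encoded field, with its unit if present."""
    value, unit = index[reference_id]
    return f"{value} [{unit}]" if unit else value


# Hypothetical rows; real IDs and values must be taken from d_references.
sample_rows = [
    {"ReferenceGlobalID": "1", "ReferenceValue": "Heart rate", "ReferenceUnit": "1/min"},
    {"ReferenceGlobalID": "2", "ReferenceValue": "Male", "ReferenceUnit": ""},
]
index = build_reference_index(sample_rows)
label_with_unit = resolve(index, 1)
label_without_unit = resolve(index, 2)
```

Building the index once and reusing it keeps lookups cheap when decoding many rows from the data tables.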

Data Format

GZip-compressed RFC 4180 comma-separated files are provided. The most current documentation, including table schemas, can be found online [7], and an offline copy is included in the files under the name Documentation.pdf. A GitHub repository has been created to share code, report issues, and discuss the dataset [8].
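Reading one of the compressed files can be sketched with the Python standard library alone. The UTF-8 encoding and the demonstration file and column contents are assumptions; the actual file names and schemas are given in the documentation [7].

```python
import csv
import gzip
import os
import tempfile


def read_table(path):
    """Yield each row of a gzip-compressed CSV file as a dict keyed by header."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle)


# Demonstration with a small temporary file standing in for a real table.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv.gz")
    with gzip.open(demo_path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["CaseID", "offset"])
        writer.writerow([1, 3600])
    demo_rows = list(read_table(demo_path))
```

Streaming rows through a generator avoids loading a whole table into memory, which matters for the larger signal tables. All values arrive as strings and need explicit conversion.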