SICdb Documentation
Introduction
he SICdb dataset provides insight to over 27 thousand intensive care admissions, including therapy and data of their preceding surgery. Data was collected between 2013 and 2021 from 4 of the intensive care units at the University Hospital Salzburg, having more than 3 thousand intensive care admissions per year on 37 beds. The dataset is deidentified and contains, amongst others, case information, laboratory, medication, monitor and respirator signal data.SICdb provides aggregated once-per-hour and highly granular once-per-minute data.
File Description
The SICdb dataset consists of 8 files. All data files in the SICdb dataset are provided as comma seperated files (.csv) using the RFC 4180 standard. Additionally all files are compressed with gzip, an commonly used openly available compression method.
List of files
Data Description cases
"The "cases" table is the base table of the SICdb dataset and the "CaseID" field serves as the identifier that relates all data. Each admission to the intensive care unit generates a unique "CaseID." To identify readmissions, a "PatientID" is provided. The "OffsetAfterFirstAdmission" field stores the time between the first admission and the current one.
"TimeOfStay" is the time in seconds from the admission recorded by the MetaVision system to the discharge. This time may include any preceding surgery and ends when the case is closed.
There are three fields indicating survival: "DischargeState" indicates the patient's state upon discharge from the ICU and is taken from the MetaVision discharge form. "HospitalDischargeType" provides information on how the case was closed in the clinical information system. "OffsetOfDeath" is the time in seconds from admission to death and includes additional hospital stays and out of hospital mortality. As it refers to a 1-year mortality it is set to null if death occurred more than one year after admission.
Name | Type | Description | Comment |
---|---|---|---|
CaseID | Integer | A randomly assigned identifier | CaseID is unique to each admission. |
PatientID | Integer | A randomly assigned identifier | PatientID is unique to each Patient, useful to identify readmission |
AdmissionYear | Integer | Year of admission | |
TimeOfStay | Integer | Time of stay (seconds) | Time from primary metavision admission, to last discharge. In cases, where the patient is admitted to ward after surgery, this will include surgery time. |
ICUOffset | Integer | Time of actual ICU admission | SICdb includes preceding surgery if applicable, this field indicates the first transfer to an intensive or intermediate care ward |
saps3 | Float | Simplified Acute Physiology Score III | |
HospitalDischargeType | Reference | Type of hospital discharge. | This field indicates survival. |
HospitalDischargeDay | Integer | The day of hospital discharge after admission | Note that this data is only available in days |
HospitalStayDays | Integer | Days stayed in hospital for this case, including pre icu. | |
DischargeState | Reference | Type of icu discharge | This field indicates survival on icu |
DischargeUnit | Reference | Unit the patient was discharged to, as selected in Metavision discharge form | |
OffsetOfDeath | Integer | 1-year mortality in seconds from primary admission to death | 1 year mortality, including out-of-hospital as far as known. See The "OffsetOfDeath" field denotes the elapsed time, in seconds, from admission to death, taking into account any subsequent hospital stays and out-of-hospital mortality data. The data was gathered from various sources, including government data. It is set to null if the death occurs more than one year after admission, since it is defined to 1-year mortality. For technical reasons observation time is only 6 month for some patients, field "EstimatedSurvivalObservationTime" holds information about that. However, it is worth noting that in some cases, if a patient dies in foreign countries, the information may be scarce. Postal address validity checks were performed, but no statistically significant differences were found, indicating that this may not be an issue of concern. for further information. |
EstimatedSurvivalObservationTime | Reference | Estimation of oversation time, either 1-year or 6-month | |
Sex | Reference | ||
WeightOnAdmission | Float | Rounded to +-5kg | |
HeightOnAdmission | Float | Rounded to +-5cm | |
AgeOnAdmission | Integer | Rounded to +-5y, over 90 set to 90 | |
HospitalUnit | Reference | Last unit using this case | |
ReferringUnit | Reference | Referring unit as selected in admission form | Note: Unfortunatly in some cases "Notaufnahme" is selected here, so the referring unit is not specified in these cases. |
ICD10Main | Text | ICD10 main code | |
ICD10MainText | Text | ICD10 main text | |
SurgicalSite | Reference | ||
InterventionsText | Text | List of interventions | |
HoursOfCRRT | Integer | Hours of continuous renal replacement threapy this admission ** | |
AdmissionFormHasSepsis | Reference | A mandatory field in the admission form ** | |
HeartSurgeryAdditionalData | For heart surgery patients there is additional data collected ** | Yes if applicable | |
HeartSurgeryCPBTime | Integer | Bypass time ** | |
HeartSurgeryBeginOffset | Integer | Offset in seconds from ICU admission to cut ** | |
HeartSurgeryEndOffset | Integer | Offset in seconds from ICU admission to end of surgery** | |
OffsetAfterFirstAdmission | Integer | If a patient has more than one admission, this is the offset in seconds from the first |
* These fields are not available on PhysioNet at the moment. Contact us for further information. ** These fields will be moved to data tables in version 1.1.0
Data Description d_references
The d_references table contains information on all encoded data fields of the SICdb dataset. Each field, that has "Reference" as field type, is associated with the ReferenceGlobalID in the d_references table. Additionally ReferenceUnit describes the unit of measurement used for this field. Refer to chapter SQL Examples to learn how to easily use this table in relational databases.
Name | Type | Description | Comment |
---|---|---|---|
ReferenceGlobalID | Integer | The unique ID for the reference | Use this identifier as dictionary for alle encoded fields |
ReferenceValue | Text | Reference value | i.e. "Creatinine" |
ReferenceName | Text | The name of the reference | i.e. "Laboratory" |
ReferenceUnit | Text | The unit of this item if applicable | i.e. "mg/dl" |
LOINC_code | Text | LOINC Code | |
LOINC_short | Text | LOINC SHORTNAME | |
LOINC_long | Text | LOINC LONG_COMMON_NAME |
Data Description data_float_h
Contains hourly float data associated with a case. If more (minute) data is available, data was aggregated and the corresponding values are stored in rawdata field. Refer to Deserialize Raw Data for more information. Metavision does, in database, not differentiate between qualitative (i.e. blood pressure) and quantitative (i.e. drainage volume) data, this was done programatically saving the sum instead of average. The field cnt indicates the amount of values aggregated.
Name | Type | Description | Comment |
---|---|---|---|
CaseID | Integer | Case identifier | |
DataID | Reference | ||
Offset | Integer | Time in seconds after admission | |
Val | Float | Value | The unit (if applicable) is found in d_references where data_float_h.DataID is associated to d_references.ReferenceGlobalID |
cnt | Integer | Amount of values aggregated | |
rawdata | Blob | List of 60 floats containing raw data | Due to excessive storage needs minute values have been aggregated. |
Refer to chapter "Decode raw_data" for more information
Data Description data_ref
data_ref.csv.gz contains referenced (nominal) data and is unique per icu admission.
Name | Type | Description | Comment |
---|---|---|---|
id | Integer | Primary Key | |
CaseID | Integer | Case identifier | |
FieldID | Reference | Refers to the name of the field | i.e. PreconditionDiabetes |
RefID | Reference |
Data Description laboratory.csv.gz
The laboratory table contains all available lab data, in many cases including the pre-admission labs.
Like all "Reference" fields additional data like name and unit of measurement is found in d_references. See SQL Examples to learn how to connect them using a relational database system.
To simplify lookup for labs of previous stays PatientID was added to this table.
Name | Type | Description | Comment |
---|---|---|---|
id | Integer | Primary Key | |
CaseID | Integer | Case identifier | |
PatientID | Integer | Patient identifier | |
DrugID | Reference | ||
Offset | Integer | Time in seconds after admission | |
LaboratoryValue | Float | Value | Unity of measurement can be found in d_references |
LaboratoryType | Reference | Used for special laboratory types, i.e. arterial blood gas. |
Data Description medication.csv.gz
Contains all medication given during stay. Additional data like name and unit of measurement is found in d_references, where medication.DrugID is associated to d_references.ReferenceGlobalID. Metavision does not differ between continuous or single doses in rangesignals, so a single dose is set to 60 second admission time. Each change of a continuous application will end the current entry and create a new one.
Name | Type | Description | Comment |
---|---|---|---|
id | Integer | Primary Key | |
CaseID | Integer | Case identifier | |
DrugID | Reference | ||
Offset | Integer | Time in seconds after admission | |
OffsetDrugEnd | Integer | If drug was given continuously refers to end of application. Field is set to 60 seconds on bolus application. | |
IsSingleDose | Integer | Set to 1 when given as bolus. | |
Amount | Float | Full amount given regardless of time | Unity of measurement found in d_references |
AmountPerMinute | Float | Dosage per minute | Unity of measurement found in d_references |
Data Description treatment.csv.gz
The treatment table contains time range data, i.e. ECMO or CRRT.
Name | Type | Description | Comment | |
---|---|---|---|---|
id | Integer | PrimaryKey | ||
CaseID | Integer | |||
TreatmentID | Reference | |||
Offset | Integer | Seconds from admission | ||
OffsetEnd | Integer | End of item, Seconds from admission | ||
TreatmentValue | Float | Value, if applicable | ||
TreatmentData | Text | JSon object containing additional data |
Preprocessed fields
Several fields have been preprocessed and added to the database. This includes peprocessed-for-convenience and preprocessed-for-anonymization elements. The data is found in data_ref.csv.gz, the text values are referenced in d_references, where d_references.ReferenceGlobalID relates to data_ref. FieldID and data_ref.RefID.
Name | Type | Description |
---|---|---|
PremedicationBetablocker | Referenced | Yes/No |
PremedicationACEInhibitor | Referenced | Yes/No |
PremedicationStatin | Referenced | Yes/No |
PremedicationDiuretic | Referenced | Yes/No |
PreconditionDiabetes | Referenced | Any type of diabetes (IDDM or NIDDM) |
PreconditionLungDisease | Referenced | Any kind of lung severe disease (COPD, restrictive lung diseases, ...). Asthma is included when medication is taken daily |
PreconditionArtHypertension | Referenced | Any kind of treated or diagnosed arterial hypertension |
PreconditionRenalDysfunction | Referenced | If stated in admission protocol (omitted due to anonymity issues) |
Case Selection
The SICdb dataset contains all admissions to metavision, that were treated on an intensive care unit. This includes patients that are admitted post-surgery as well as internal medicine cases. Note that non-surgical reasons for admission are highly underrepresentated in this dataset because most internal medicine patients are treated on a unit not using an electric PDMS.
All admissions to participating intensiv care units are included, as long as following conditions do not apply:
- Less the 10 minutes of Stay
- Missing mandatory admission data
- No monitor signals
- No medication data
- hospital number mismatch
Intensive care or intermediate care?
To provide as much data as possible all patients admitted to one of our intensive or intermediate care units are included. While the associated beds have a designated care level, the patients treatment level is not defined and are treated according to their needs, they may be transferred multiply between the strongly cooperating units. In general, patients that are uniquely associated (see unitlog) to INIC or INID did not receive high level intensive care treatment. Please note that the field cases.HospitalUnit gives no information about the level of intensive care a patient received. We may provide a preprocessed field in future versions of the SICdb dataset concerning the level of treatment.
Unit Description
The participating metavision enabled wards are CWIN, INBD, INIC, and INIC. INCV and CWCV are used for cases associated with SARS-COV-2, this includes SARS-COV-2 ARDS and patients with coincidental positive test and other cause of intensive care. CWIN and INBD, INCV and CWCVC are high level intensive care units. INIC and INID are lower level of care and do not provide invasive mechanical ventilation (NIV possible on INID, HFNC on all wards), CRRT or ECMO.
Validation
All cases are cross-matched with a second database. In case of unsuccessful matching, cases were excluded, as these refer to entries for testing and demo.
The dataset is in constant development. Feel free to contact us when you find a significant amount of invalid data.
SQL Examples
While the SICdb dataset is provided as csv raw data, it is designed to be used in a a relational database management system. Refer to Get Started for further information building up the database and corresponding software. Our software includes hundreds of configurable one-click SQL snippets like creatinine on day one/two/..., max/min/average lactate and much more.
Using Referenced Fields
One of the most important tasks is decoding the referenced fields. While our Software does this automatically the database design makes it also quite simple in raw SQL code.
Simple Join Query
SELECT CaseID, d_references.ReferenceValue as Survived FROM `cases` LEFT JOIN d_references ON d_references.ReferenceGlobalID=cases.HospitalDischargeType;
Join Query including Unit-Of-Measurement
SELECT caseID,Offset,LaboratoryValue,d_references.ReferenceValue as LaboratoryName, d_references.ReferenceUnit as LabUnit FROM `laboratory` LEFT JOIN d_references ON d_references.ReferenceGlobalID=laboratory.LaboratoryID;
Subquery Examples
A quite flexible query solution are subqueries. These allow better configuration for the field. Note this is a MySQL 8.x query and may need to be adapted in other database systems. These SQL code snippets have been generated by our software.
Select Maximum Creatinine Example
SELECT `cases`.`CaseID` AS CaseID ,( SELECT Max(laboratory.LaboratoryValue) FROM laboratory WHERE laboratory.CaseID = cases.CaseID AND laboratory.LaboratoryID = 367 AND Offset >= 0 AND Offset < 7 * 86400 ) AS MaximumCreatinineInFirstWeek FROM cases
Select First Creatinine Example
SELECT `cases`.`CaseID` AS CaseID ,( SELECT laboratory.LaboratoryValue AS val FROM laboratory WHERE laboratory.CaseID = cases.CaseID AND laboratory.LaboratoryID = 367 AND Offset >= 0 ORDER BY laboratory.Offset ASC LIMIT 0 ,1 ) AS FirstCreatinineAfterAdmission FROM cases
Select Second Creatinine Example
SELECT `cases`.`CaseID` AS CaseID ,( SELECT laboratory.LaboratoryValue AS val FROM laboratory WHERE laboratory.CaseID = cases.CaseID AND laboratory.LaboratoryID = 367 AND Offset >= 0 ORDER BY laboratory.Offset ASC LIMIT 1 ,1 ) AS SecondCreatinineAfterAdmission FROM cases
Select Second Creatine With Time Example
A little more advanced example including the time-of-measurement.
SELECT `cases`.`CaseID` AS CaseID ,( SELECT laboratory.LaboratoryValue AS val FROM laboratory WHERE laboratory.CaseID = cases.CaseID AND laboratory.LaboratoryID = 367 AND Offset >= 0 ORDER BY laboratory.Offset ASC LIMIT 1 ,1 ) AS SecondCreatinineAfterAdmission ,( SELECT laboratory.Offset AS val FROM laboratory WHERE laboratory.CaseID = cases.CaseID AND laboratory.LaboratoryID = 367 AND Offset >= 0 ORDER BY laboratory.Offset ASC LIMIT 1 ,1 ) / 3600 AS HoursAfterAdmission FROM cases
Version Information
The SICdb dataset uses a versioning system. It contains 3 numbers major, minor and patch. A suffix may be added for changes, the do not necessarily needs to be applied.
A major version change generally occurs when the database was incrementally updated and more cases are included.
A minor version change occurs when data was altered that will likely change most study data. (i.e. removal of a significant amount of invalid cases) A minor update also applies when new data is added (i.e. a new table) but the existing data is not altered.
A patch version change occurs when only specific data was altered and will not affect all exports. Please read changelog to check if your data is affected.
1.0.8 (07/2024)
- Added LOINC codes to laboratory references
1.0.7 (04/2024)
- Added field `cases`.`HospitalDischargeDay` `HospitalStayDays` representing the day of release from hospital after admission and the full lenght of hospital stay, respectively.
- Added field `cases`.`AdmissionUrgency`, depicting the urgency of admission
- Added High Flow (HFNC) therapy data
- Added Richmond Agitation-Sedation Scale (RASS) score
- Added Numeric Rating Scale (NRS-11)
- Added SOFA Score
- Removed 36 invalid cases
- Recalculated field `cases`.`OffsetAfterFirstAdmission`, fixing an issue that occasionally led to inadequate values
1.0.6 (05/2023)
- [Important] Update a major error at the heart rate signal (invalid mapping of ecg signal)
- [Important] Renamed heart rate signals
- [Important] Updated some (~150) invalid weight/height signals
- [Fix] Due to a change in table structure the Premedication_ fields missed reference id, fixed
- Added KDIGO_AKI_168 and published algorithm (note: in general be careful with using urine output KDIGO in automated datasets)
- Added field ICUOffset for better comparability with other ICU datasets
- Added more signal data
- Corrected a small issue in the norepinephrine per kg algorithm (change is not significant)
1.0.5
- Merged additional mortality data and changed OffsetOfDeath from in-hospital-mortality to general 1-year-mortality
- Removed or fixed 190 cases of inplausible height or weight data
- Added some more CRRT data
- Recalculated DrainageSum
1.0.4
- Added unity of measurements for signal data
- Structural changes