Deserialize Raw Data
Latest revision as of 12:59, 21 June 2023
Why is this data serialized?
The intermediate data, in which minute values are not aggregated, is roughly 240 GB in size and is expected to grow by about 15% with the 2023 update. Aggregating and serializing these values compresses the database to under 10 GB. The trade-off is that the data has to be deserialized before it can be used. Our software can export the minute values with one click, but working with the raw data directly requires a bit of coding.
Refer to the Full Unpack section to learn how to completely unpack the minute data.
Encoding
The raw data field is a stream of 60 little-endian IEEE 754 single-precision (32-bit) floats, so it is exactly 240 bytes long. The first 4 bytes represent the first minute of the hour, the next 4 bytes the second minute, and so on. Note that 0x00000000 (four zero bytes) is defined to be NULL (no value).
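To make the layout concrete, the following sketch builds a 240-byte field by hand and decodes it again using only Python's standard struct module. The helper names encode_hour and decode_hour are illustrative and not part of our software; None stands in for NULL.

```python
import struct

FLOATS_PER_FIELD = 60               # one value per minute of the hour
FIELD_SIZE = FLOATS_PER_FIELD * 4   # 240 bytes
NULL = b"\x00\x00\x00\x00"          # four zero bytes mean "no value"

def encode_hour(values):
    """Pack 60 values (float or None) into a 240-byte little-endian field."""
    if len(values) != FLOATS_PER_FIELD:
        raise ValueError("expected exactly 60 values")
    return b"".join(NULL if v is None else struct.pack("<f", v) for v in values)

def decode_hour(field):
    """Unpack a 240-byte field into 60 entries, None where the slot is NULL."""
    if len(field) != FIELD_SIZE:
        raise ValueError("expected exactly 240 bytes")
    out = []
    for (chunk,) in struct.iter_unpack("4s", field):
        out.append(None if chunk == NULL else struct.unpack("<f", chunk)[0])
    return out

values = [None] * 60
values[0] = 36.5     # first minute of the hour, stored in bytes 0..3
values[59] = 120.0   # last minute of the hour, stored in bytes 236..239
field = encode_hour(values)
decoded = decode_hour(field)
```

One consequence of this encoding worth keeping in mind: a value of exactly 0.0 packs to the same four zero bytes as NULL, so it would read back as no value.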
Python example
 import struct

 def GetRawValues(data):
     """Return the non-NULL minute values of one raw data field, in order."""
     ret = []
     for i in range(len(data) // 4):
         chunk = data[i*4:i*4+4]
         if chunk == b'\x00\x00\x00\x00': continue # remove null values
         ret.append(struct.unpack('<f', chunk)[0]) # '<f' = little-endian 32-bit float
     return ret
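This simple example does not keep the correct offset (the minute within the hour) for the values, because NULL values are removed from the result. See our unpack script (https://github.com/nrodemund/sicdb/tree/main/Scripts/Unpack%20raw%20data) for an example of how to compute the offset.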
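One way to keep the offset, sketched here as an illustration and not as a copy of the unpack script, is to return each value together with its minute index. The function name get_raw_values_with_minute is hypothetical.

```python
import struct

def get_raw_values_with_minute(data):
    """Return (minute_index, value) pairs for every non-NULL slot.

    minute_index is 0 for the first minute of the hour and 59 for the last.
    """
    pairs = []
    for minute in range(len(data) // 4):
        chunk = bytes(data[minute * 4:minute * 4 + 4])
        if chunk == b"\x00\x00\x00\x00":
            continue  # NULL slot: skipped, but later indices stay correct
        pairs.append((minute, struct.unpack("<f", chunk)[0]))
    return pairs

# Example field: values only in minute 2 and minute 10, everything else NULL.
raw = bytearray(240)
raw[8:12] = struct.pack("<f", 1.5)
raw[40:44] = struct.pack("<f", -2.25)
pairs = get_raw_values_with_minute(raw)
```

Because the index is derived from the byte position and not from the position in the output list, dropping NULL slots no longer shifts the remaining values.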
C# example
       public static float?[] GetRawValues(byte[] data)
       {
           byte[] buf = new byte[4];
           float?[] ret = new float?[data.Length / 4];
           for(int i = 0; i + 3 < data.Length; i += 4) // stop before an incomplete trailing group of bytes
           {
               buf[0] = data[i];
               buf[1] = data[i+1];
               buf[2] = data[i+2];
               buf[3] = data[i+3];
               if (buf[0] == 0 && buf[1] == 0 && buf[2] == 0 && buf[3] == 0) continue; // null value: ret[i / 4] stays null
               if (!BitConverter.IsLittleEndian) Array.Reverse(buf); // the data is little endian, so flip buf on a big endian machine
               ret[i / 4] = BitConverter.ToSingle(buf, 0);
           }
           return ret;
       }
Full Unpack
We provide a simple Python unpack script in our GitHub code repository (https://github.com/nrodemund/sicdb/tree/main/Scripts/Unpack%20raw%20data).
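The script in the repository is the reference. As a rough illustration of what a full unpack involves, the sketch below expands several hourly fields into one flat list of timed values. The row layout used here, a pair of (hour_start_s, raw bytes) with hour_start_s in seconds, is an assumption made for this example and is not taken from the database schema; the function name unpack_rows is likewise hypothetical.

```python
import struct

def unpack_rows(rows):
    """Expand (hour_start_s, raw_bytes) rows into (time_s, value) tuples.

    Assumption: hour_start_s is the time of the first minute of the hour,
    in seconds, so minute n of that hour lies at hour_start_s + n * 60.
    """
    out = []
    for hour_start_s, raw in rows:
        for minute, (value_bytes,) in enumerate(struct.iter_unpack("4s", raw)):
            if value_bytes == b"\x00\x00\x00\x00":
                continue  # NULL: no value recorded for this minute
            value = struct.unpack("<f", value_bytes)[0]
            out.append((hour_start_s + minute * 60, value))
    return out

# Two hypothetical hourly rows with a single value each.
first_hour = bytearray(240)
first_hour[4:8] = struct.pack("<f", 80.0)    # second minute of the hour
second_hour = bytearray(240)
second_hour[0:4] = struct.pack("<f", 81.0)   # first minute of the hour
timed = unpack_rows([(0, first_hour), (3600, second_hour)])
```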