Hour Identifiers in Cobalt 1.1

go/fx-cobalt-hour-id

Document	Info
Last Update	2020-06-04
First Proposed	2020-06-01
Author	zmbush
Status	Final

Goals

This document defines an hour_id that can be used in Cobalt 1.1 for hourly aggregation. It focuses on the implementation issues that arise in LOCAL, since UTC does not need to account for time zones or daylight saving time. Any solution that works for LOCAL should then work for UTC without issue.

Requirements

An hour index needs to:

1. Be a monotonically increasing value for every 60-minute interval in the chosen time zone (Note: this is normally a trivial requirement, but daylight saving time means that a given hour past midnight could be repeated in a day)
2. Align with day_index (i.e. a new index should start when a day index starts, and end when a day index ends) (Note: this seems like a trivial requirement, but time zones with non-whole-hour offsets, like India (UTC+5:30), mean it is not as simple as just using UTC everywhere)
3. Be easily convertible into a day_index

Why We Need This

The primary use case of this hour_id is storing and aggregating events on device before sending them to the server. The hour_id needs to meet requirement 1 to make backfill possible, since we need to be able to easily determine whether one hour_id is before another. The hour_id needs to meet requirement 2 so that we can guarantee that events are never assigned to the wrong day, even if the local time zone has a non-whole-hour offset from UTC. Finally, we need to meet requirement 3 so that we can later assign observations generated for a given hour_id to the appropriate day_index.

Definitions

Time Struct

To calculate the hour id, we make use of the following fields of struct tm from ctime:

tm_hour

Represents the number of hours since midnight in the current time zone. Its range is 0-23.

tm_isdst

Is positive when DST is in effect, and zero otherwise (a negative value means the information is not available). For time zones that don't practice DST (e.g. UTC), this will always be zero.

Method for calculating hour id

DST-Aware Hour Identifier

The DST-aware hour id gives up accurately representing the number of hours since the Unix epoch in exchange for allowing duplicate tm_hour values within a day. It is calculated as: (day_index * 48) + (tm_hour * 2) + (tm_isdst ? 0 : 1). With this method, DST hours are always even and non-DST hours are always odd, so the two never collide. A DST ‘fall back’ could then produce the sequence of hours 24, 26, 28, 29, 31, 33 (the repeated hour maps to 28 and then to 29), and a DST ‘spring forward’ could produce the sequence 25, 27, 29, 32, 34, 36 (the skipped hour leaves a gap between 29 and 32). In both cases the index increases every hour, usually by 2, but by 1 across a fall back and by 3 across a spring forward.
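The formula above can be sketched as a small constexpr function. The names HourId and kHourIdsPerDay are hypothetical, and taking the DST flag as a bool (rather than the raw tm_isdst) is an assumption made to keep the sketch simple. The static_asserts reproduce a few values from the two example sequences with day_index 0.

```cpp
#include <cstdint>

// Hypothetical constant: each day spans 48 identifier slots
// (24 hours, each with a DST slot and a non-DST slot).
constexpr std::uint32_t kHourIdsPerDay = 48;

// Sketch of (day_index * 48) + (tm_hour * 2) + (dst ? 0 : 1).
// DST hours land on even values, non-DST hours on odd values.
constexpr std::uint32_t HourId(std::uint32_t day_index, int tm_hour,
                               bool dst_active) {
  return day_index * kHourIdsPerDay +
         static_cast<std::uint32_t>(tm_hour) * 2 + (dst_active ? 0u : 1u);
}

// Fall back: hour 14 occurs twice, first in DST and then in standard time.
static_assert(HourId(0, 12, true) == 24, "hour 12, DST");
static_assert(HourId(0, 14, true) == 28, "hour 14, DST");
static_assert(HourId(0, 14, false) == 29, "hour 14 repeated, non-DST");
static_assert(HourId(0, 16, false) == 33, "hour 16, non-DST");

// Spring forward: hour 15 is skipped, so the id jumps from 29 to 32.
static_assert(HourId(0, 14, false) == 29, "hour 14, non-DST");
static_assert(HourId(0, 16, true) == 32, "hour 16, DST");
```

Because the repeated hour differs only in the DST flag, the two occurrences get adjacent ids (28 and 29) that are still strictly ordered.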

This hour index can be converted back to a day index by dividing by 48 and truncating.
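A minimal sketch of that conversion, using hypothetical names (DayIndexFromHourId, kHourIdsPerDay) that are not taken from Cobalt. Unsigned integer division in C++ truncates, which is exactly the "divide by 48 and truncate" step.

```cpp
#include <cstdint>

// Hypothetical constant: number of hour id slots per day in this scheme.
constexpr std::uint32_t kHourIdsPerDay = 48;

// Recovers the day index from an hour id; integer division truncates.
constexpr std::uint32_t DayIndexFromHourId(std::uint32_t hour_id) {
  return hour_id / kHourIdsPerDay;
}

static_assert(DayIndexFromHourId(47) == 0, "last slot of day 0");
static_assert(DayIndexFromHourId(48) == 1, "first slot of day 1");
```

Every id from day_index * 48 through day_index * 48 + 47 maps back to the same day, so both the DST and non-DST slots of any hour resolve to the day they were recorded in.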

Solution

This solution easily solves both the DST and fractional timezone problems, without increasing the complexity of the protobuf. This identifier does not have the downsides of the other two hour identification methods, at the cost of its value not increasing by 1 every hour. This is fine, because we don't actually care about the value of the identifier as long as it still meets the requirements, which this does.