Data Types

Introduction

Project Haystack defines a fixed set of data types called kinds, which are mapped to Python objects in Phable.

Map for singleton data types

| Project Haystack | Phable        |
|------------------|---------------|
| Marker           | phable.Marker |
| NA               | phable.NA     |
| Remove           | phable.Remove |

Map for scalar atomic data types

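| Project Haystack | Phable            |
|------------------|-------------------|
| Bool             | bool              |
| Number           | phable.Number     |
| Str              | str               |
| Uri              | phable.Uri        |
| Ref              | phable.Ref        |
| Symbol           | phable.Symbol     |
| Date             | datetime.date     |
| Time             | datetime.time     |
| DateTime         | datetime.datetime |
| Coord            | phable.Coord      |
| XStr             | phable.XStr       |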

Note: A datetime.datetime used in Phable must be timezone-aware to represent Project Haystack's DateTime.
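
One way to check the timezone requirement before passing values to Phable is sketched below. It uses only the standard library. The helper name `is_aware` is an illustrative assumption and is not part of Phable; the test it applies is the one given in Python's datetime documentation.

```python
from datetime import datetime, timezone


def is_aware(dt: datetime) -> bool:
    """Return True if dt carries usable timezone information."""
    return dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None


naive = datetime(2024, 1, 1, 12, 0)
aware = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

print(is_aware(naive))  # False
print(is_aware(aware))  # True
```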

Map for collection data types

| Project Haystack | Phable          |
|------------------|-----------------|
| List             | typing.Sequence |
| Dict             | typing.Mapping  |
| Grid             | phable.Grid     |

Note: Project Haystack data types are immutable, but Python's lists and dicts are mutable. Type checkers can use typing.Sequence and typing.Mapping to detect mutations while giving programmers flexibility to use either mutable (list/dict) or immutable (tuple/frozendict) types at runtime. Native frozendict support may be added in Python 3.15.
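
The following is a minimal sketch of what the note describes, assuming a hypothetical helper `tag_names` and made-up sample rows. Because the parameter is annotated with `Sequence` and `Mapping`, the function accepts mutable and read-only inputs alike, and a type checker would reject an assignment such as `rows[0]["dis"] = "x"` inside it.

```python
from types import MappingProxyType
from typing import Any, Mapping, Sequence


def tag_names(rows: Sequence[Mapping[str, Any]]) -> list[str]:
    """Collect tag names in first-seen order without mutating the input."""
    names: list[str] = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    return names


# A mutable list of dicts and a read-only tuple of mapping proxies both work.
mutable_rows = [{"dis": "Meter 1", "power": 12.5}]
immutable_rows = (MappingProxyType({"dis": "Meter 1", "power": 12.5}),)

print(tag_names(mutable_rows))    # ['dis', 'power']
print(tag_names(immutable_rows))  # ['dis', 'power']
```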

Data Types in Phable Only

As a convenience, Phable defines these data types, which are not defined in Project Haystack:

  • phable.DateRange
  • phable.DateTimeRange

Marker

Marker data type defined by Project Haystack (https://project-haystack.org/doc/docHaystack/Kinds#marker). Marker is a singleton used to create "label" tags.

Example:

from phable.kinds import Marker

meter_equip = {"meter": Marker(), "equip": Marker()}
Source code in src/phable/kinds.py
class Marker:
    """`Marker` data type defined by Project Haystack
    [here](https://project-haystack.org/doc/docHaystack/Kinds#marker). `Marker` is a
    singleton used to create "label" tags.

    **Example:**
    ```python
    from phable.kinds import Marker

    meter_equip = {"meter": Marker(), "equip": Marker()}
    ```
    """

    __instance = None

    def __new__(cls):
        if Marker.__instance is None:
            Marker.__instance = object.__new__(cls)
        return Marker.__instance

    def __str__(self):
        return "\u2713"
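
To illustrate what "singleton" means here without requiring Phable to be installed, the sketch below copies the `__new__` pattern from the source above into a hypothetical stand-in class named `Label`. Every call to the constructor returns the same object, so identity checks with `is` are reliable.

```python
class Label:
    """Hypothetical stand-in that copies the singleton pattern shown above."""

    __instance = None

    def __new__(cls):
        # Create the single instance on first use, then keep returning it.
        if Label.__instance is None:
            Label.__instance = object.__new__(cls)
        return Label.__instance


a = Label()
b = Label()
tags = {"meter": a, "equip": b}

print(a is b)                          # True
print(tags["meter"] is tags["equip"])  # True
```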

NA

NA data type defined by Project Haystack (https://project-haystack.org/doc/docHaystack/Kinds#na). NA is a singleton to indicate a data value that is not available. In Project Haystack it is most often used in historized data to indicate a timestamp sample is in error.

Source code in src/phable/kinds.py
class NA:
    """`NA` data type defined by Project Haystack
    [here](https://project-haystack.org/doc/docHaystack/Kinds#na). `NA` is a
    singleton to indicate a data value that is not available. In Project Haystack it is
    most often used in historized data to indicate a timestamp sample is in error.
    """

    __instance = None

    def __new__(cls):
        if NA.__instance is None:
            NA.__instance = object.__new__(cls)
        return NA.__instance

    def __str__(self):
        return "NA"

Remove

Remove data type defined by Project Haystack (https://project-haystack.org/doc/docHaystack/Kinds#remove). Remove is a singleton used in a dict to indicate removal of a tag.

Source code in src/phable/kinds.py
class Remove:
    """`Remove` data type defined by Project Haystack
    [here](https://project-haystack.org/doc/docHaystack/Kinds#remove). `Remove` is a
    singleton used in a `dict` to indicate removal of a tag.
    """

    __instance = None

    def __new__(cls):
        if Remove.__instance is None:
            Remove.__instance = object.__new__(cls)
        return Remove.__instance

    def __str__(self):
        return "remove"
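
As a rough illustration of how a dict carrying a removal sentinel might be interpreted, the sketch below applies a set of changes to a record. Everything in it is an assumption made for illustration: `RemoveTag`, `REMOVE`, and `apply_changes` are hypothetical names, and the code does not describe how Phable or a Haystack server processes `Remove`.

```python
from typing import Any, Mapping


class RemoveTag:
    """Hypothetical stand-in sentinel playing the role of Remove."""


REMOVE = RemoveTag()


def apply_changes(rec: Mapping[str, Any], changes: Mapping[str, Any]) -> dict[str, Any]:
    """Return a new dict with changes applied; sentinel values drop the tag."""
    updated = dict(rec)
    for name, val in changes.items():
        if val is REMOVE:
            updated.pop(name, None)
        else:
            updated[name] = val
    return updated


rec = {"dis": "Meter 1", "elec": True, "note": "temporary"}
result = apply_changes(rec, {"note": REMOVE, "dis": "Main Meter"})
print(result)  # {'dis': 'Main Meter', 'elec': True}
```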

Number dataclass

Number data type defined by Project Haystack (https://project-haystack.org/doc/docHaystack/Kinds#number).

Parameters:

  • val (float, required): Floating point value.
  • unit (str | None, default None): Optional unit of measurement defined in Project Haystack's standard unit database (https://project-haystack.org/doc/docHaystack/Units). Note: Phable does not validate a defined unit at this time.

Source code in src/phable/kinds.py
@dataclass(frozen=True, slots=True)
class Number:
    """`Number` data type defined by Project Haystack
    [here](https://project-haystack.org/doc/docHaystack/Kinds#number).

    Parameters:
        val: Floating point value.
        unit:
            Optional unit of measurement defined in Project Haystack's standard unit
            database [here](https://project-haystack.org/doc/docHaystack/Units).

            **Note**: Phable does not validate a defined unit at this time.
    """

    val: float
    unit: str | None = None

    def __str__(self):
        if self.unit is not None:
            return f"{self.val}{self.unit}"
        else:
            return f"{self.val}"

Uri dataclass

Uri data type defined by Project Haystack (https://project-haystack.org/doc/docHaystack/Kinds#uri).

Example:

from phable.kinds import Uri

uri = Uri("http://project-haystack.org/")

Parameters:

  • val (str, required): Universal Resource Identifier according to RFC 3986.

Source code in src/phable/kinds.py
@dataclass(frozen=True, slots=True)
class Uri:
    """`Uri` data type defined by Project Haystack
    [here](https://project-haystack.org/doc/docHaystack/Kinds#uri).

    **Example:**
    ```python
    from phable.kinds import Uri

    uri = Uri("http://project-haystack.org/")
    ```

    Parameters:
        val:
            Universal Resource Identifier according to
            [RFC 3986](https://datatracker.ietf.org/doc/html/rfc3986).
    """

    val: str

    def __str__(self):
        return self.val

Ref dataclass

Ref data type defined by Project Haystack (https://project-haystack.org/doc/docHaystack/Kinds#ref).

Parameters:

  • val (str, required): Unique identifier for an entity.
  • dis (str | None, default None): Optional human display name.

Source code in src/phable/kinds.py
@dataclass(frozen=True, slots=True)
class Ref:
    """`Ref` data type defined by Project Haystack
    [here](https://project-haystack.org/doc/docHaystack/Kinds#ref).

    Parameters:
        val: Unique identifier for an entity.
        dis: Optional human display name.
    """

    val: str
    dis: str | None = None

    def __str__(self) -> str:
        return self.val

Symbol dataclass

Symbol data type defined by Project Haystack (https://project-haystack.org/doc/docHaystack/Kinds#symbol).

Parameters:

  • val (str, required): def identifier. Consists of only ASCII letters, digits, underbar, colon, dash, period, or tilde.

Source code in src/phable/kinds.py
@dataclass(frozen=True, slots=True)
class Symbol:
    """`Symbol` data type defined by Project Haystack
    [here](https://project-haystack.org/doc/docHaystack/Kinds#symbol).

    Parameters:
        val:
            [def](https://project-haystack.org/doc/docHaystack/Defs) identifier.
            Consists of only ASCII letters, digits, underbar, colon, dash, period, or
            tilde.
    """

    val: str

    def __str__(self):
        return f"^{self.val}"
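
The source above does not show `Symbol` validating `val`, so a caller may want a pre-check. The sketch below tests only the character set stated in the parameter description; `looks_like_symbol` is a hypothetical helper, and any additional rules in the Project Haystack specification are not covered.

```python
import re

# Characters allowed by the description above: ASCII letters, digits,
# underbar, colon, dash, period, and tilde.
_SYMBOL_CHARS = re.compile(r"[A-Za-z0-9_:\-.~]+")


def looks_like_symbol(val: str) -> bool:
    """Hypothetical check of the allowed character set only."""
    return _SYMBOL_CHARS.fullmatch(val) is not None


print(looks_like_symbol("elec-meter"))  # True
print(looks_like_symbol("lib:phIoT"))   # True
print(looks_like_symbol("hot water"))   # False
```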

Coord dataclass

Coord data type defined by Project Haystack (https://project-haystack.org/doc/docHaystack/Kinds#coord).

Parameters:

  • lat (decimal.Decimal, required): Latitude represented in decimal degrees.
  • lng (decimal.Decimal, required): Longitude represented in decimal degrees.

Source code in src/phable/kinds.py
@dataclass(frozen=True, slots=True)
class Coord:
    """`Coord` data type defined by Project Haystack
    [here](https://project-haystack.org/doc/docHaystack/Kinds#coord).

    Parameters:
        lat:
            Latitude represented in
            [decimal degrees](https://en.wikipedia.org/wiki/Decimal_degrees).
        lng:
            Longitude represented in
            [decimal degrees](https://en.wikipedia.org/wiki/Decimal_degrees).
    """

    lat: Decimal
    lng: Decimal

    def __str__(self):
        getcontext().prec = 6
        return f"C({self.lat}, {self.lng})"
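
Since `lat` and `lng` are typed as `decimal.Decimal`, how the values are constructed matters. The standard-library sketch below, using made-up coordinates, shows that building a `Decimal` from a string keeps exactly the digits written, whereas building it from a float carries over the float's binary approximation.

```python
from decimal import Decimal

# Building from strings keeps exactly the digits that were written.
lat = Decimal("37.545")
lng = Decimal("-77.449")

# Building from a float carries over the float's binary approximation.
from_float = Decimal(37.545)

print(f"C({lat}, {lng})")  # C(37.545, -77.449)
print(from_float == lat)   # False
```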

XStr dataclass

XStr data type defined by Project Haystack (https://project-haystack.org/doc/docHaystack/Kinds#xstr).

Parameters:

  • type (str, required): Type name that follows Project Haystack's tag naming rules, except it must start with an ASCII uppercase letter (A-Z).
  • val (str, required): String encoded value.

Source code in src/phable/kinds.py
@dataclass(frozen=True, slots=True)
class XStr:
    """`XStr` data type defined by Project Haystack
    [here](https://project-haystack.org/doc/docHaystack/Kinds#xstr).

    Parameters:
        type:
            Type name that follows Project Haystack's
            [tag naming](https://project-haystack.org/doc/docHaystack/Kinds#names)
            rules, except it must start with an ASCII uppercase letter (A-Z).
        val: String encoded value.
    """

    type: str
    val: str

    def __str__(self):
        return f"({self.type}, {self.val})"

Grid dataclass

Grid data type defined by Project Haystack (https://project-haystack.org/doc/docHaystack/Kinds#grid).

Parameters:

  • meta (typing.Mapping[str, typing.Any], required): Metadata for the entire Grid.
  • cols (typing.Sequence[phable.kinds.GridCol], required): Column definitions for the Grid.
  • rows (typing.Sequence[typing.Mapping[str, typing.Any]], required): Row data for Grid.

Source code in src/phable/kinds.py
@dataclass(frozen=True, slots=True)
class Grid:
    """`Grid` data type defined by Project Haystack
    [here](https://project-haystack.org/doc/docHaystack/Kinds#grid).

    Parameters:
        meta: Metadata for the entire `Grid`.
        cols: Column definitions for the `Grid`.
        rows: Row data for `Grid`.
    """

    meta: Mapping[str, Any]
    cols: Sequence[GridCol]
    rows: Sequence[Mapping[str, Any]]

    def __str__(self):
        return "Haystack Grid"

    @staticmethod
    def to_grid(
        rows: Mapping[str, Any] | Sequence[Mapping[str, Any]],
        meta: Mapping[str, Any] | None = None,
    ) -> Grid:
        """Creates a `Grid` using row data and optional metadata.

        If parameters include history data, assumes the history rows are in
        chronological order to establish `hisStart` and `hisEnd` in `meta`.

        Parameters:
            rows: Row data for `Grid`.
            meta: Optional metadata for the entire `Grid`.
        """
        normalized_rows: Sequence[Mapping[str, Any]]
        if isinstance(rows, Mapping):
            normalized_rows = [cast(Mapping[str, Any], rows)]
        else:
            normalized_rows = rows

        # might be able to find a nicer way to do this
        col_names: list[str] = []
        for row in normalized_rows:
            for col_name in row.keys():
                if col_name not in col_names:
                    col_names.append(col_name)

        cols = [GridCol(name) for name in col_names]

        grid_meta: dict[str, Any] = {"ver": "3.0"}

        if meta is not None:
            grid_meta = grid_meta | dict(meta)

        his_start = normalized_rows[0].get("ts", None)
        his_end = normalized_rows[-1].get("ts", None)

        if his_start is not None and his_end is not None:
            grid_meta["hisStart"] = his_start
            grid_meta["hisEnd"] = his_end + timedelta(minutes=1)

        return Grid(meta=grid_meta, cols=cols, rows=normalized_rows)

    def to_pandas(self):
        """Converts time-series `Grid` to a long-format Pandas DataFrame.

        **Note:** This method is experimental and subject to change.

        **Requirements:**
        - Phable's optional Pandas dependency must be installed.
        - `Grid` must have history data (`hisStart` in Grid metadata that is timezone-aware).
        - Grid column metadata must have an `id` of type `Ref`.
        - Grid row value types must be `Number`, `bool`, `str`, or `NA`.
        - Row timestamps must use the same timezone as `hisStart` in Grid metadata.

        When converting to a long-format DataFrame, history data for one or more points are combined into columns.
        Values are split into typed columns (`val_bool`, `val_str`, `val_num`) to use native DataFrame types for
        performance, since different points may have different value types. All value columns are always present for
        schema consistency to enable predictable programmatic access.

        For each DataFrame row: if the Grid value is Project Haystack's `NA`, the `na` column is `True` and all typed
        value columns are `None`. Otherwise, `na` is `False` and exactly one typed value column is populated based on
        type: `val_bool` for `bool`, `val_str` for `str`, or `val_num` for `Number`.

        | Column     | Pandas Type                  | Nullable | Description                                    |
        |------------|------------------------------|----------|------------------------------------------------|
        | `id`       | `Categorical`                | No       | Point identifier from Ref (without `@` prefix) |
        | `ts`       | `timestamp[us, tz][pyarrow]` | No       | Timestamp of the reading                       |
        | `val_bool` | `bool[pyarrow]`              | Yes      | Boolean value (when `kind` tag is `Bool`)      |
        | `val_str`  | `string[pyarrow]`            | Yes      | String value (when `kind` tag is `Str`)        |
        | `val_num`  | `double[pyarrow]`            | Yes      | Numeric value (when `kind` tag is `Number`)    |
        | `na`       | `bool[pyarrow]`              | No       | `True` when value is Project Haystack's `NA`   |

        The resultant DataFrame is sorted by `id` and `ts`.

        Phable users are encouraged to interpolate data while in the long format dataframe using `na` before
        pivoting to a wide format dataframe, since pivoting loses `NA` semantics which define where
        interpolation should not occur.

        **Example:**

        ```python
        # convert to long-format pandas dataframe
        df_long = his_grid.to_pandas()

        # pivot to wide format (one column per point, indexed by timestamp)
        # use the appropriate value column (val_bool, val_str, or val_num) based on point kind
        df_wide = df_long.pivot_table(index="ts", columns="id", values="val_num")
        ```

        Raises:
            ValueError:
                If `Grid` does not have `hisStart` in metadata, `hisStart` is not timezone-aware,
                row timestamps have a different timezone than `hisStart`, columns are missing required `id`
                metadata of type Ref, or values are unsupported types.
        """
        import pandas as pd
        import pyarrow as pa

        tz, data = _structure_long_format_for_df(self)

        schema = pa.schema(
            [
                ("id", pa.dictionary(pa.int32(), pa.string())),
                ("ts", pa.timestamp("us", tz=tz.key)),
                ("val_bool", pa.bool_()),
                ("val_str", pa.string()),
                ("val_num", pa.float64()),
                ("na", pa.bool_()),
            ]
        )

        table = pa.Table.from_pylist(data, schema=schema)
        df = table.to_pandas(types_mapper=pd.ArrowDtype)

        unique_ids = sorted(df["id"].unique())
        df["id"] = df["id"].astype(
            pd.CategoricalDtype(categories=unique_ids, ordered=False)
        )

        return df.sort_values(["id", "ts"]).reset_index(drop=True)

    def to_polars(self):
        """Converts time-series `Grid` to a long-format Polars DataFrame.

        **Note:** This method is experimental and subject to change.

        **Requirements:**
        - Phable's optional Polars dependency must be installed.
        - `Grid` must have history data (`hisStart` in Grid metadata that is timezone-aware).
        - Grid column metadata must have an `id` of type `Ref`.
        - Grid row value types must be `Number`, `bool`, `str`, or `NA`.
        - Row timestamps must use the same timezone as `hisStart` in Grid metadata.

        When converting to a long-format DataFrame, history data for one or more points are combined into columns.
        Values are split into typed columns (`val_bool`, `val_str`, `val_num`) to use native DataFrame types for
        performance, since different points may have different value types. All value columns are always present for
        schema consistency to enable predictable programmatic access.

        For each DataFrame row: if the Grid value is Project Haystack's `NA`, the `na` column is `True` and all typed
        value columns are `None`. Otherwise, `na` is `False` and exactly one typed value column is populated based on
        type: `val_bool` for `bool`, `val_str` for `str`, or `val_num` for `Number`.

        | Column     | Polars Type        | Nullable | Description                                    |
        |------------|--------------------|----------|------------------------------------------------|
        | `id`       | `Categorical`      | No       | Point identifier from Ref (without `@` prefix) |
        | `ts`       | `Datetime[us, tz]` | No       | Timestamp of the reading                       |
        | `val_bool` | `Boolean`          | Yes      | Boolean value (when `kind` tag is `Bool`)      |
        | `val_str`  | `String`           | Yes      | String value (when `kind` tag is `Str`)        |
        | `val_num`  | `Float64`          | Yes      | Numeric value (when `kind` tag is `Number`)    |
        | `na`       | `Boolean`          | No       | `True` when value is Project Haystack's `NA`   |

        The resultant DataFrame is sorted by `id` and `ts`.

        Phable users are encouraged to interpolate data while in the long format dataframe using `na` before
        pivoting to a wide format dataframe, since pivoting loses `NA` semantics which define where
        interpolation should not occur.

        **Example:**

        ```python
        # convert to long-format polars dataframe
        df_long = his_grid.to_polars()

        # pivot to wide format (one column per point, indexed by timestamp)
        # use the appropriate value column (val_bool, val_str, or val_num) based on point kind
        df_wide = df_long.pivot(on="id", index="ts", values="val_num")
        ```

        Raises:
            ValueError:
                If `Grid` does not have `hisStart` in metadata, `hisStart` is not timezone-aware,
                row timestamps have a different timezone than `hisStart`, columns are missing required `id`
                metadata of type Ref, or values are unsupported types.
        """
        import polars as pl  # ty: ignore[unresolved-import]

        tz, data = _structure_long_format_for_df(self)

        schema = {
            "id": pl.Categorical,
            "ts": pl.Datetime(time_unit="us", time_zone=tz.key),
            "val_bool": pl.Boolean,
            "val_str": pl.String,
            "val_num": pl.Float64,
            "na": pl.Boolean,
        }

        return pl.DataFrame(data=data, schema=schema).sort("id", "ts")

to_grid staticmethod

to_grid(rows, meta=None)

Creates a Grid using row data and optional metadata.

If parameters include history data, assumes the history rows are in chronological order to establish hisStart and hisEnd in meta.

Parameters:

  • rows (typing.Mapping[str, typing.Any] | typing.Sequence[typing.Mapping[str, typing.Any]], required): Row data for Grid.
  • meta (typing.Mapping[str, typing.Any] | None, default None): Optional metadata for the entire Grid.

Source code in src/phable/kinds.py
@staticmethod
def to_grid(
    rows: Mapping[str, Any] | Sequence[Mapping[str, Any]],
    meta: Mapping[str, Any] | None = None,
) -> Grid:
    """Creates a `Grid` using row data and optional metadata.

    If parameters include history data, assumes the history rows are in
    chronological order to establish `hisStart` and `hisEnd` in `meta`.

    Parameters:
        rows: Row data for `Grid`.
        meta: Optional metadata for the entire `Grid`.
    """
    normalized_rows: Sequence[Mapping[str, Any]]
    if isinstance(rows, Mapping):
        normalized_rows = [cast(Mapping[str, Any], rows)]
    else:
        normalized_rows = rows

    # might be able to find a nicer way to do this
    col_names: list[str] = []
    for row in normalized_rows:
        for col_name in row.keys():
            if col_name not in col_names:
                col_names.append(col_name)

    cols = [GridCol(name) for name in col_names]

    grid_meta: dict[str, Any] = {"ver": "3.0"}

    if meta is not None:
        grid_meta = grid_meta | dict(meta)

    his_start = normalized_rows[0].get("ts", None)
    his_end = normalized_rows[-1].get("ts", None)

    if his_start is not None and his_end is not None:
        grid_meta["hisStart"] = his_start
        grid_meta["hisEnd"] = his_end + timedelta(minutes=1)

    return Grid(meta=grid_meta, cols=cols, rows=normalized_rows)
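
The two derivations in the source above, column names in first-seen order and the `hisStart`/`hisEnd` span, can be sketched as standalone functions. This is not Phable's code: `column_names` and `his_span` are hypothetical names, `dict.fromkeys` is one possible alternative to the nested loop, and the guard for empty `rows` is an addition in this sketch that the source above does not have.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Mapping, Optional, Sequence


def column_names(rows: Sequence[Mapping[str, Any]]) -> list[str]:
    """Collect unique keys in first-seen order."""
    # dicts preserve insertion order, so fromkeys de-duplicates and keeps order.
    return list(dict.fromkeys(name for row in rows for name in row))


def his_span(rows: Sequence[Mapping[str, Any]]) -> Optional[tuple[Any, Any]]:
    """Return (hisStart, hisEnd) from the first and last rows, or None."""
    if not rows:  # guard added in this sketch only
        return None
    start, end = rows[0].get("ts"), rows[-1].get("ts")
    if start is None or end is None:
        return None
    # As in the source above, hisEnd is one minute past the last timestamp.
    return start, end + timedelta(minutes=1)


t0 = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
rows = [
    {"ts": t0, "v0": 1.0},
    {"ts": t0 + timedelta(minutes=15), "v0": 2.0, "v1": 5.0},
]

print(column_names(rows))  # ['ts', 'v0', 'v1']
print(his_span(rows))
```

Because only the first and last rows are inspected, rows that are not in chronological order would produce a misleading span, which is why the method assumes ordered input.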

to_pandas

to_pandas()

Converts time-series Grid to a long-format Pandas DataFrame.

Note: This method is experimental and subject to change.

Requirements:

  • Phable's optional Pandas dependency must be installed.
  • Grid must have history data (hisStart in Grid metadata that is timezone-aware).
  • Grid column metadata must have an id of type Ref.
  • Grid row value types must be Number, bool, str, or NA.
  • Row timestamps must use the same timezone as hisStart in Grid metadata.

When converting to a long-format DataFrame, history data for one or more points are combined into columns. Values are split into typed columns (val_bool, val_str, val_num) to use native DataFrame types for performance, since different points may have different value types. All value columns are always present for schema consistency to enable predictable programmatic access.

For each DataFrame row: if the Grid value is Project Haystack's NA, the na column is True and all typed value columns are None. Otherwise, na is False and exactly one typed value column is populated based on type: val_bool for bool, val_str for str, or val_num for Number.

| Column   | Pandas Type                | Nullable | Description                                  |
|----------|----------------------------|----------|----------------------------------------------|
| id       | Categorical                | No       | Point identifier from Ref (without @ prefix) |
| ts       | timestamp[us, tz][pyarrow] | No       | Timestamp of the reading                     |
| val_bool | bool[pyarrow]              | Yes      | Boolean value (when kind tag is Bool)        |
| val_str  | string[pyarrow]            | Yes      | String value (when kind tag is Str)          |
| val_num  | double[pyarrow]            | Yes      | Numeric value (when kind tag is Number)      |
| na       | bool[pyarrow]              | No       | True when value is Project Haystack's NA     |

The resultant DataFrame is sorted by id and ts.
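
The per-row rule described above can be sketched in plain Python. This is not the library's internal helper, whose source is not shown here: `Num`, `NotAvailable`, `NA_VALUE`, and `long_row` are hypothetical stand-ins, and the timestamps are placeholder strings.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Num:
    """Hypothetical stand-in for Number (value only)."""

    val: float


class NotAvailable:
    """Hypothetical stand-in for the NA singleton."""


NA_VALUE = NotAvailable()


def long_row(point_id: str, ts: Any, val: Any) -> dict[str, Any]:
    """Build one long-format record following the column rules described above."""
    row = {"id": point_id, "ts": ts, "val_bool": None, "val_str": None,
           "val_num": None, "na": False}
    if val is NA_VALUE:
        row["na"] = True  # all typed value columns stay None
    elif isinstance(val, bool):
        row["val_bool"] = val
    elif isinstance(val, str):
        row["val_str"] = val
    elif isinstance(val, Num):
        row["val_num"] = val.val
    else:
        raise ValueError(f"unsupported value type: {type(val).__name__}")
    return row


print(long_row("p1", "t0", Num(21.5)))
print(long_row("p1", "t1", NA_VALUE))
```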

Phable users are encouraged to interpolate data while it is still in the long-format DataFrame, using the na column, before pivoting to a wide-format DataFrame. Pivoting loses the NA semantics that define where interpolation should not occur.

Example:

# convert to long-format pandas dataframe
df_long = his_grid.to_pandas()

# pivot to wide format (one column per point, indexed by timestamp)
# use the appropriate value column (val_bool, val_str, or val_num) based on point kind
df_wide = df_long.pivot_table(index="ts", columns="id", values="val_num")

Raises:

  • ValueError: If Grid does not have hisStart in metadata, hisStart is not timezone-aware, row timestamps have a different timezone than hisStart, columns are missing required id metadata of type Ref, or values are unsupported types.

Source code in src/phable/kinds.py
def to_pandas(self):
    """Converts time-series `Grid` to a long-format Pandas DataFrame.

    **Note:** This method is experimental and subject to change.

    **Requirements:**
    - Phable's optional Pandas dependency must be installed.
    - `Grid` must have history data (`hisStart` in Grid metadata that is timezone-aware).
    - Grid column metadata must have an `id` of type `Ref`.
    - Grid row value types must be `Number`, `bool`, `str`, or `NA`.
    - Row timestamps must use the same timezone as `hisStart` in Grid metadata.

    When converting to a long-format DataFrame, history data for one or more points are combined into columns.
    Values are split into typed columns (`val_bool`, `val_str`, `val_num`) to use native DataFrame types for
    performance, since different points may have different value types. All value columns are always present for
    schema consistency to enable predictable programmatic access.

    For each DataFrame row: if the Grid value is Project Haystack's `NA`, the `na` column is `True` and all typed
    value columns are `None`. Otherwise, `na` is `False` and exactly one typed value column is populated based on
    type: `val_bool` for `bool`, `val_str` for `str`, or `val_num` for `Number`.

    | Column     | Pandas Type                  | Nullable | Description                                    |
    |------------|------------------------------|----------|------------------------------------------------|
    | `id`       | `Categorical`                | No       | Point identifier from Ref (without `@` prefix) |
    | `ts`       | `timestamp[us, tz][pyarrow]` | No       | Timestamp of the reading                       |
    | `val_bool` | `bool[pyarrow]`              | Yes      | Boolean value (when `kind` tag is `Bool`)      |
    | `val_str`  | `string[pyarrow]`            | Yes      | String value (when `kind` tag is `Str`)        |
    | `val_num`  | `double[pyarrow]`            | Yes      | Numeric value (when `kind` tag is `Number`)    |
    | `na`       | `bool[pyarrow]`              | No       | `True` when value is Project Haystack's `NA`   |

    The resultant DataFrame is sorted by `id` and `ts`.

    Phable users are encouraged to interpolate data while in the long format dataframe using `na` before
    pivoting to a wide format dataframe, since pivoting loses `NA` semantics which define where
    interpolation should not occur.

    **Example:**

    ```python
    # convert to long-format pandas dataframe
    df_long = his_grid.to_pandas()

    # pivot to wide format (one column per point, indexed by timestamp)
    # use the appropriate value column (val_bool, val_str, or val_num) based on point kind
    df_wide = df_long.pivot_table(index="ts", columns="id", values="val_num")
    ```

    Raises:
        ValueError:
            If `Grid` does not have `hisStart` in metadata, `hisStart` is not timezone-aware,
            row timestamps have a different timezone than `hisStart`, columns are missing required `id`
            metadata of type Ref, or values are unsupported types.
    """
    import pandas as pd
    import pyarrow as pa

    tz, data = _structure_long_format_for_df(self)

    schema = pa.schema(
        [
            ("id", pa.dictionary(pa.int32(), pa.string())),
            ("ts", pa.timestamp("us", tz=tz.key)),
            ("val_bool", pa.bool_()),
            ("val_str", pa.string()),
            ("val_num", pa.float64()),
            ("na", pa.bool_()),
        ]
    )

    table = pa.Table.from_pylist(data, schema=schema)
    df = table.to_pandas(types_mapper=pd.ArrowDtype)

    unique_ids = sorted(df["id"].unique())
    df["id"] = df["id"].astype(
        pd.CategoricalDtype(categories=unique_ids, ordered=False)
    )

    return df.sort_values(["id", "ts"]).reset_index(drop=True)

to_polars

to_polars()

Converts time-series Grid to a long-format Polars DataFrame.

Note: This method is experimental and subject to change.

Requirements:

  • Phable's optional Polars dependency must be installed.
  • Grid must have history data (hisStart in Grid metadata that is timezone-aware).
  • Grid column metadata must have an id of type Ref.
  • Grid row value types must be Number, bool, str, or NA.
  • Row timestamps must use the same timezone as hisStart in Grid metadata.

When converting to a long-format DataFrame, history data for one or more points are combined into columns. Values are split into typed columns (val_bool, val_str, val_num) to use native DataFrame types for performance, since different points may have different value types. All value columns are always present for schema consistency to enable predictable programmatic access.

For each DataFrame row: if the Grid value is Project Haystack's NA, the na column is True and all typed value columns are None. Otherwise, na is False and exactly one typed value column is populated based on type: val_bool for bool, val_str for str, or val_num for Number.

| Column   | Polars Type      | Nullable | Description                                  |
|----------|------------------|----------|----------------------------------------------|
| id       | Categorical      | No       | Point identifier from Ref (without @ prefix) |
| ts       | Datetime[us, tz] | No       | Timestamp of the reading                     |
| val_bool | Boolean          | Yes      | Boolean value (when kind tag is Bool)        |
| val_str  | String           | Yes      | String value (when kind tag is Str)          |
| val_num  | Float64          | Yes      | Numeric value (when kind tag is Number)      |
| na       | Boolean          | No       | True when value is Project Haystack's NA     |

The resultant DataFrame is sorted by id and ts.

Phable users are encouraged to interpolate data while it is still in the long-format DataFrame, using the na column, before pivoting to a wide-format DataFrame. Pivoting loses the NA semantics that define where interpolation should not occur.
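
Why pivoting loses this information can be seen in a small pure-Python sketch that needs no DataFrame library. The records and the `pivot_wide` helper are hypothetical. After the pivot, a sample that was flagged NA and a sample that never existed both appear as an empty cell, so the wide table can no longer tell them apart.

```python
from typing import Any

# Long-format records: p1 has an NA sample at t1, p2 simply has no sample at t1.
long_rows = [
    {"id": "p1", "ts": "t0", "val_num": 1.0, "na": False},
    {"id": "p1", "ts": "t1", "val_num": None, "na": True},
    {"id": "p2", "ts": "t0", "val_num": 7.0, "na": False},
]


def pivot_wide(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Pivot to {ts: {id: val_num}}, filling absent cells with None."""
    ids = sorted({r["id"] for r in rows})
    wide: dict[str, dict[str, Any]] = {}
    for r in rows:
        wide.setdefault(r["ts"], dict.fromkeys(ids))[r["id"]] = r["val_num"]
    return wide


wide = pivot_wide(long_rows)
print(wide["t1"])  # {'p1': None, 'p2': None}
```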

Example:

# convert to long-format polars dataframe
df_long = his_grid.to_polars()

# pivot to wide format (one column per point, indexed by timestamp)
# use the appropriate value column (val_bool, val_str, or val_num) based on point kind
df_wide = df_long.pivot(on="id", index="ts", values="val_num")

Raises:

  • ValueError: If Grid does not have hisStart in metadata, hisStart is not timezone-aware, row timestamps have a different timezone than hisStart, columns are missing required id metadata of type Ref, or values are unsupported types.

Source code in src/phable/kinds.py
def to_polars(self):
    """Converts time-series `Grid` to a long-format Polars DataFrame.

    **Note:** This method is experimental and subject to change.

    **Requirements:**
    - Phable's optional Polars dependency must be installed.
    - `Grid` must have history data (`hisStart` in Grid metadata that is timezone-aware).
    - Grid column metadata must have an `id` of type `Ref`.
    - Grid row value types must be `Number`, `bool`, `str`, or `NA`.
    - Row timestamps must use the same timezone as `hisStart` in Grid metadata.

    When converting to a long-format DataFrame, history data for one or more points are combined into columns.
    Values are split into typed columns (`val_bool`, `val_str`, `val_num`) to use native DataFrame types for
    performance, since different points may have different value types. All value columns are always present for
    schema consistency to enable predictable programmatic access.

    For each DataFrame row: if the Grid value is Project Haystack's `NA`, the `na` column is `True` and all typed
    value columns are `None`. Otherwise, `na` is `False` and exactly one typed value column is populated based on
    type: `val_bool` for `bool`, `val_str` for `str`, or `val_num` for `Number`.

    | Column     | Polars Type        | Nullable | Description                                    |
    |------------|--------------------|----------|------------------------------------------------|
    | `id`       | `Categorical`      | No       | Point identifier from Ref (without `@` prefix) |
    | `ts`       | `Datetime[us, tz]` | No       | Timestamp of the reading                       |
    | `val_bool` | `Boolean`          | Yes      | Boolean value (when `kind` tag is `Bool`)      |
    | `val_str`  | `String`           | Yes      | String value (when `kind` tag is `Str`)        |
    | `val_num`  | `Float64`          | Yes      | Numeric value (when `kind` tag is `Number`)    |
    | `na`       | `Boolean`          | No       | `True` when value is Project Haystack's `NA`   |

    The resultant DataFrame is sorted by `id` and `ts`.

    Phable users are encouraged to interpolate data while in the long format dataframe using `na` before
    pivoting to a wide format dataframe, since pivoting loses `NA` semantics which define where
    interpolation should not occur.

    **Example:**

    ```python
    # convert to long-format polars dataframe
    df_long = his_grid.to_polars()

    # pivot to wide format (one column per point, indexed by timestamp)
    # use the appropriate value column (val_bool, val_str, or val_num) based on point kind
    df_wide = df_long.pivot(on="id", index="ts", values="val_num")
    ```

    Raises:
        ValueError:
            If `Grid` does not have `hisStart` in metadata, `hisStart` is not timezone-aware,
            row timestamps have a different timezone than `hisStart`, columns are missing required `id`
            metadata of type Ref, or values are unsupported types.
    """
    import polars as pl  # ty: ignore[unresolved-import]

    tz, data = _structure_long_format_for_df(self)

    schema = {
        "id": pl.Categorical,
        "ts": pl.Datetime(time_unit="us", time_zone=tz.key),
        "val_bool": pl.Boolean,
        "val_str": pl.String,
        "val_num": pl.Float64,
        "na": pl.Boolean,
    }

    return pl.DataFrame(data=data, schema=schema).sort("id", "ts")

GridCol dataclass

GridCol defines a column in a Grid.

Example:

from phable.kinds import GridCol

# Column with metadata
temp_col = GridCol("temp", {"unit": "°F", "dis": "Temperature"})

# Simple column without metadata
id_col = GridCol("id")

Parameters:

Name Type Description Default
name str Column name following Haystack tag naming rules (lowercase start). required
meta dict[str, typing.Any] | None Optional metadata dictionary for the column (e.g., unit, display name). None

Source code in src/phable/kinds.py
@dataclass(frozen=True, slots=True)
class GridCol:
    """`GridCol` defines a column in a `Grid`.

    **Example:**
    ```python
    from phable.kinds import GridCol

    # Column with metadata
    temp_col = GridCol("temp", {"unit": "°F", "dis": "Temperature"})

    # Simple column without metadata
    id_col = GridCol("id")
    ```

    Parameters:
        name: Column name following Haystack tag naming rules (lowercase start).
        meta: Optional metadata dictionary for the column (e.g., unit, display name).
    """

    name: str
    meta: dict[str, Any] | None = None

GridBuilder

Builder for constructing Project Haystack Grid objects.

Provides a builder pattern with method chaining for adding columns, metadata, and rows before creating a Grid.

Source code in src/phable/grid_builder.py
class GridBuilder:
    """Builder for constructing Project Haystack `Grid` objects.

    Provides a builder pattern with method chaining for adding columns, metadata,
    and rows before creating a `Grid`.
    """

    def __init__(self):
        self._meta = {"ver": "3.0"}
        self._cols = []
        self._rows = []

    _meta: dict[str, Any]
    _cols: list[GridCol]
    _rows: list[dict[str, Any]]

    @property
    def col_names(self) -> list[str]:
        """Column names.

        Returns:
            List of column names in the order they were added.
        """
        return [col.name for col in self._cols]

    def set_meta(self, meta: Mapping[str, Any]) -> Self:
        """Set or update grid-level metadata.

        Parameters:
            meta: Metadata dictionary to merge with existing grid metadata.

        Returns:
            Self for method chaining.
        """
        self._meta = self._meta | dict(meta)
        return self

    def add_col(self, name: str, meta: Mapping[str, Any] | None = None) -> Self:
        """Adds a column to the grid.

        Parameters:
            name: Column name following Haystack tag naming rules (lowercase start).
            meta: Optional metadata for the column (e.g., unit, display name).

        Returns:
            Self for method chaining.

        Raises:
            ValueError: If column name is invalid or already exists.
        """
        if not _is_tag_name(name):
            raise ValueError(f"Invalid column name: {name}")

        # verify the column does not already exist
        for c in self._cols:
            if c.name == name:
                raise ValueError(f"Duplicate column name: {name}")

        col = GridCol(name, dict(meta) if meta is not None else None)

        self._cols.append(col)
        return self

    def set_col_meta(self, col_name: str, meta: Mapping[str, Any]) -> Self:
        """Set or update metadata for an existing column.

        Parameters:
            col_name: Name of the column to update.
            meta: Metadata to merge with existing column metadata.

        Returns:
            Self for method chaining.

        Raises:
            ValueError: If column does not exist.
        """
        col_found = False
        for i, c in enumerate(self._cols):
            if c.name == col_name:
                col_found = True
                existing_meta = c.meta or {}
                new_meta = existing_meta | dict(meta)
                self._cols[i] = GridCol(c.name, new_meta)
                break

        if not col_found:
            raise ValueError(f"Column not found: {col_name}")

        return self

    def add_row(self, row: Mapping[str, Any]) -> Self:
        """Adds a row of data to the grid.

        Parameters:
            row: Dictionary mapping column names to values.

        Returns:
            Self for method chaining.

        Raises:
            ValueError: If any row key does not match an existing column name.
        """
        col_names = self.col_names
        for key in row.keys():
            if key not in col_names:
                raise ValueError(f"Row key '{key}' does not match any column name")

        self._rows.append(dict(row))
        return self

    def build(self) -> Grid:
        """Builds a Grid from the accumulated columns, rows, and metadata.

        Returns:
            A constructed `Grid` instance.
        """
        return Grid(self._meta, self._cols, self._rows)

col_names property

col_names

Column names.

Returns:

Type Description
list[str] List of column names in the order they were added.

add_col

add_col(name, meta=None)

Adds a column to the grid.

Parameters:

Name Type Description Default
name str Column name following Haystack tag naming rules (lowercase start). required
meta typing.Mapping[str, typing.Any] | None Optional metadata for the column (e.g., unit, display name). None

Returns:

Type Description
typing.Self Self for method chaining.

Raises:

Type Description
ValueError If column name is invalid or already exists.

Source code in src/phable/grid_builder.py
def add_col(self, name: str, meta: Mapping[str, Any] | None = None) -> Self:
    """Adds a column to the grid.

    Parameters:
        name: Column name following Haystack tag naming rules (lowercase start).
        meta: Optional metadata for the column (e.g., unit, display name).

    Returns:
        Self for method chaining.

    Raises:
        ValueError: If column name is invalid or already exists.
    """
    if not _is_tag_name(name):
        raise ValueError(f"Invalid column name: {name}")

    # verify the column does not already exist
    for c in self._cols:
        if c.name == name:
            raise ValueError(f"Duplicate column name: {name}")

    col = GridCol(name, dict(meta) if meta is not None else None)

    self._cols.append(col)
    return self

add_row

add_row(row)

Adds a row of data to the grid.

Parameters:

Name Type Description Default
row typing.Mapping[str, typing.Any] Dictionary mapping column names to values. required

Returns:

Type Description
typing.Self Self for method chaining.

Raises:

Type Description
ValueError If any row key does not match an existing column name.

Source code in src/phable/grid_builder.py
def add_row(self, row: Mapping[str, Any]) -> Self:
    """Adds a row of data to the grid.

    Parameters:
        row: Dictionary mapping column names to values.

    Returns:
        Self for method chaining.

    Raises:
        ValueError: If any row key does not match an existing column name.
    """
    col_names = self.col_names
    for key in row.keys():
        if key not in col_names:
            raise ValueError(f"Row key '{key}' does not match any column name")

    self._rows.append(dict(row))
    return self
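
The check in the listing above only walks the keys of the row, so a row may leave columns out but may not introduce unknown ones. The sketch below restates that rule in plain Python; `unknown_keys` is a hypothetical helper written for this illustration, not part of Phable.

```python
col_names = ["ts", "v0", "v1"]


def unknown_keys(row, col_names):
    """Return the row keys that do not match any column name."""
    return [key for key in row if key not in col_names]


sparse_row = {"ts": "2024-11-22T08:19:00", "v0": 72.5}  # omits "v1": accepted
bad_row = {"ts": "2024-11-22T08:19:00", "v2": 1.0}      # "v2" is not a column

print(unknown_keys(sparse_row, col_names))
print(unknown_keys(bad_row, col_names))
```

`add_row` itself raises `ValueError` on the first unknown key rather than collecting all of them.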

build

build()

Builds a Grid from the accumulated columns, rows, and metadata.

Returns:

Type Description
phable.kinds.Grid A constructed Grid instance.

Source code in src/phable/grid_builder.py
def build(self) -> Grid:
    """Builds a Grid from the accumulated columns, rows, and metadata.

    Returns:
        A constructed `Grid` instance.
    """
    return Grid(self._meta, self._cols, self._rows)

set_col_meta

set_col_meta(col_name, meta)

Set or update metadata for an existing column.

Parameters:

Name Type Description Default
col_name str Name of the column to update. required
meta typing.Mapping[str, typing.Any] Metadata to merge with existing column metadata. required

Returns:

Type Description
typing.Self Self for method chaining.

Raises:

Type Description
ValueError If column does not exist.

Source code in src/phable/grid_builder.py
def set_col_meta(self, col_name: str, meta: Mapping[str, Any]) -> Self:
    """Set or update metadata for an existing column.

    Parameters:
        col_name: Name of the column to update.
        meta: Metadata to merge with existing column metadata.

    Returns:
        Self for method chaining.

    Raises:
        ValueError: If column does not exist.
    """
    col_found = False
    for i, c in enumerate(self._cols):
        if c.name == col_name:
            col_found = True
            existing_meta = c.meta or {}
            new_meta = existing_meta | dict(meta)
            self._cols[i] = GridCol(c.name, new_meta)
            break

    if not col_found:
        raise ValueError(f"Column not found: {col_name}")

    return self
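
The merge in the listing above relies on Python's dict union operator (`|`), where keys from the right-hand operand win. The sketch below reproduces just that expression with made-up metadata, including the case where a column starts with no metadata (`None`); the same operator is used by `set_meta` for grid-level metadata.

```python
existing_meta = None  # a column added without metadata has meta=None

# First update: `existing_meta or {}` turns None into an empty dict before merging.
first = (existing_meta or {}) | dict({"unit": "°F"})

# Second update: "unit" is overwritten and "dis" is added.
second = first | dict({"unit": "°C", "dis": "Temperature"})

print(first)
print(second)
```

The union builds a new dict each time, so `first` is left unchanged by the second update.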

set_meta

set_meta(meta)

Set or update grid-level metadata.

Parameters:

Name Type Description Default
meta typing.Mapping[str, typing.Any] Metadata dictionary to merge with existing grid metadata. required

Returns:

Type Description
typing.Self Self for method chaining.

Source code in src/phable/grid_builder.py
def set_meta(self, meta: Mapping[str, Any]) -> Self:
    """Set or update grid-level metadata.

    Parameters:
        meta: Metadata dictionary to merge with existing grid metadata.

    Returns:
        Self for method chaining.
    """
    self._meta = self._meta | dict(meta)
    return self

DateRange dataclass

DateRange data type, defined by Phable, describes a time range using dates.

Note: Project Haystack does not define a kind for DateRange.

Parameters:

Name Type Description Default
start datetime.date Midnight of the start date (inclusive) for the range. required
end datetime.date Midnight of the end date (exclusive) for the range. required

Source code in src/phable/kinds.py
@dataclass(frozen=True, slots=True)
class DateRange:
    """`DateRange` data type, defined by `Phable`, describes a time range using dates.

    **Note:** Project Haystack does not define a kind for `DateRange`.

    Parameters:
        start: Midnight of the start date (inclusive) for the range.
        end: Midnight of the end date (exclusive) for the range.
    """

    start: date
    end: date

    def __str__(self):
        return self.start.isoformat() + "," + self.end.isoformat()

DateTimeRange dataclass

DateTimeRange data type, defined by Phable, describes a time range using date, time, and timezone information.

datetime objects used for start and end must be timezone aware using ZoneInfo as a concrete implementation of the datetime.tzinfo abstract base class.

Example:

from datetime import datetime
from zoneinfo import ZoneInfo

from phable.kinds import DateTimeRange

tzinfo = ZoneInfo("America/New_York")
start = datetime(2024, 11, 22, 8, 19, 0, tzinfo=tzinfo)
end = datetime(2024, 11, 22, 9, 19, 0, tzinfo=tzinfo)

range_with_end = DateTimeRange(start, end)
range_without_end = DateTimeRange(start)

Note: Project Haystack does not define a kind for DateTimeRange.

Parameters:

Name Type Description Default
start datetime.datetime Start timestamp (inclusive) which is timezone aware using ZoneInfo. required
end datetime.datetime | None Optional end timestamp (exclusive) which is timezone aware using ZoneInfo. If end is undefined, then assume end to be when the last data value was recorded. None

Source code in src/phable/kinds.py
@dataclass(frozen=True, slots=True)
class DateTimeRange:
    """`DateTimeRange` data type, defined by `Phable`, describes a time range using
    date, time, and timezone information.

    `datetime` objects used for `start` and `end` must be timezone aware using
    `ZoneInfo` as a concrete implementation of the `datetime.tzinfo` abstract base
    class.

    **Example:**

    ```python
    from datetime import datetime
    from zoneinfo import ZoneInfo

    from phable.kinds import DateTimeRange

    tzinfo = ZoneInfo("America/New_York")
    start = datetime(2024, 11, 22, 8, 19, 0, tzinfo=tzinfo)
    end = datetime(2024, 11, 22, 9, 19, 0, tzinfo=tzinfo)

    range_with_end = DateTimeRange(start, end)
    range_without_end = DateTimeRange(start)
    ```

    **Note:** Project Haystack does not define a kind for `DateTimeRange`.

    Parameters:
        start: Start timestamp (inclusive) which is timezone aware using `ZoneInfo`.
        end:
            Optional end timestamp (exclusive) which is timezone aware using
            `ZoneInfo`. If end is undefined, then assume end to be when the last data
            value was recorded.
    """

    start: datetime
    end: datetime | None = None

    def __str__(self):
        if self.end is None:
            return _to_haystack_datetime(self.start)
        else:
            return (
                _to_haystack_datetime(self.start)
                + ","
                + _to_haystack_datetime(self.end)
            )

    def __post_init__(self):
        start_ok = isinstance(self.start.tzinfo, ZoneInfo)
        end_ok = self.end is None

        if isinstance(self.end, datetime):
            end_ok = isinstance(self.end.tzinfo, ZoneInfo)

        if start_ok is False or end_ok is False:
            raise ValueError
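
The validation in `__post_init__` checks for `ZoneInfo` specifically, so a datetime that is timezone aware through `datetime.timezone` (for example `timezone.utc`) does not pass. The sketch below mirrors that `isinstance` check using only the standard library; `uses_zoneinfo` is a hypothetical helper for illustration, and the example assumes the system time zone database is available.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def uses_zoneinfo(dt: datetime) -> bool:
    """Mirror of the isinstance check performed in DateTimeRange.__post_init__."""
    return isinstance(dt.tzinfo, ZoneInfo)


naive = datetime(2024, 11, 22, 8, 19)
fixed_offset = datetime(2024, 11, 22, 8, 19, tzinfo=timezone.utc)

# Converting with astimezone keeps the same instant but attaches a ZoneInfo.
zoned = fixed_offset.astimezone(ZoneInfo("America/New_York"))

print(uses_zoneinfo(naive))         # False
print(uses_zoneinfo(fixed_offset))  # False
print(uses_zoneinfo(zoned))         # True
```

Converting with `astimezone(ZoneInfo(...))` is one way to adapt an existing aware datetime before constructing a `DateTimeRange`.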