Data Types

Introduction

Project Haystack defines a fixed set of data types called kinds, which are mapped to Python objects in Phable.

Map for singleton data types

Project Haystack	Phable
`Marker`	`phable.Marker`
`NA`	`phable.NA`
`Remove`	`phable.Remove`

Map for scalar atomic data types

Project Haystack	Phable
`Bool`	`bool`
`Number`	`phable.Number`
`Str`	`str`
`Uri`	`phable.Uri`
`Ref`	`phable.Ref`
`Symbol`	`phable.Symbol`
`Date`	`datetime.date`
`Time`	`datetime.time`
`DateTime`	`datetime.datetime`
`Coord`	`phable.Coord`
`XStr`	`phable.XStr`

Note: A datetime.datetime must be timezone-aware to represent Project Haystack's DateTime.
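
The timezone requirement in the note above can be checked with the standard library alone. The sketch below is illustrative only: `is_timezone_aware` is a hypothetical helper written for this example and is not part of Phable.

```python
from datetime import datetime, timedelta, timezone


def is_timezone_aware(dt: datetime) -> bool:
    """Return True if dt carries a tzinfo that yields a UTC offset."""
    return dt.tzinfo is not None and dt.utcoffset() is not None


# Naive: no tzinfo, so it cannot represent a Haystack DateTime.
naive = datetime(2024, 1, 15, 8, 30)

# Aware: one in UTC, one with a fixed -05:00 offset.
aware_utc = datetime(2024, 1, 15, 8, 30, tzinfo=timezone.utc)
aware_offset = datetime(2024, 1, 15, 8, 30, tzinfo=timezone(timedelta(hours=-5)))

print(is_timezone_aware(naive))         # False
print(is_timezone_aware(aware_utc))     # True
print(is_timezone_aware(aware_offset))  # True
```

Whether Phable accepts fixed-offset zones everywhere is not stated here. The Grid conversion code on this page reads `tz.key`, which suggests `zoneinfo.ZoneInfo` time zones are expected for those conversions.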

Map for collection data types

Project Haystack	Phable
`List`	`typing.Sequence`
`Dict`	`typing.Mapping`
`Grid`	`phable.Grid`

Note: Project Haystack data types are immutable, but Python's lists and dicts are mutable. Annotating values as typing.Sequence and typing.Mapping lets type checkers flag mutations while leaving programmers free to use either mutable (list/dict) or immutable (tuple/frozendict) types at runtime. Native frozendict support may be added in Python 3.15.
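
As a rough illustration of the note above, the sketch below annotates parameters with the read-only abstract types and passes both mutable and immutable objects at runtime. The function names and sample tags are hypothetical, and `types.MappingProxyType` stands in for an immutable mapping since `frozendict` is not a built-in today.

```python
from collections.abc import Mapping, Sequence
from types import MappingProxyType
from typing import Any


def tag_names(entity: Mapping[str, Any]) -> list[str]:
    """Read-only use of a Haystack Dict: list its tag names in sorted order."""
    return sorted(entity)


def first_or_none(items: Sequence[Any]) -> Any:
    """Read-only use of a Haystack List: return its first item, if any."""
    return items[0] if items else None


mutable_entity = {"dis": "Main Meter", "site": "Campus"}
read_only_entity = MappingProxyType(mutable_entity)  # read-only view of the dict

print(tag_names(mutable_entity))    # ['dis', 'site']
print(tag_names(read_only_entity))  # ['dis', 'site']
print(first_or_none(["a", "b"]))    # a
print(first_or_none(("a", "b")))    # a

# A type checker would reject the following inside tag_names, because
# Mapping declares no __setitem__:
#     entity["area"] = 100
```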

Data Types in Phable Only

As a convenience, Phable defines these data types, which are not defined in Project Haystack:

phable.DateRange
phable.DateTimeRange

Marker

Marker data type defined by Project Haystack [here](https://project-haystack.org/doc/docHaystack/Kinds#marker). Marker is a singleton used to create "label" tags.

Example:

from phable.kinds import Marker

meter_equip = {"meter": Marker(), "equip": Marker()}

Source code in src/phable/kinds.py

class Marker:
    """`Marker` data type defined by Project Haystack
    [here](https://project-haystack.org/doc/docHaystack/Kinds#marker). `Marker` is a
    singleton used to create "label" tags.

    **Example:**
    ```python
    from phable.kinds import Marker

    meter_equip = {"meter": Marker(), "equip": Marker()}
    ```
    """

    __instance = None

    def __new__(cls):
        if Marker.__instance is None:
            Marker.__instance = object.__new__(cls)
        return Marker.__instance

    def __str__(self):
        return "\u2713"
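
Because `__new__` always returns the same instance, identity checks with `is` are enough to detect marker tags. The sketch below repeats the class from the listing above as a local copy so it runs without Phable installed; the sample tags are made up for illustration.

```python
class Marker:
    """Local copy of the Marker class shown on this page, for illustration."""

    __instance = None

    def __new__(cls):
        if Marker.__instance is None:
            Marker.__instance = object.__new__(cls)
        return Marker.__instance

    def __str__(self):
        return "\u2713"


a = Marker()
b = Marker()

print(a is b)  # True: every call returns the same object
print(str(a))  # ✓

# Identity makes it straightforward to pick out the marker tags of a dict.
meter_equip = {"meter": Marker(), "equip": Marker(), "dis": "Main Meter"}
marker_tags = sorted(name for name, val in meter_equip.items() if val is Marker())
print(marker_tags)  # ['equip', 'meter']
```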

NA

NA data type defined by Project Haystack [here](https://project-haystack.org/doc/docHaystack/Kinds#na). NA is a singleton to indicate a data value that is not available. In Project Haystack it is most often used in historized data to indicate a timestamp sample is in error.

Source code in src/phable/kinds.py

class NA:
    """`NA` data type defined by Project Haystack
    [here](https://project-haystack.org/doc/docHaystack/Kinds#na). `NA` is a
    singleton to indicate a data value that is not available. In Project Haystack it is
    most often used in historized data to indicate a timestamp sample is in error.
    """

    __instance = None

    def __new__(cls):
        if NA.__instance is None:
            NA.__instance = object.__new__(cls)
        return NA.__instance

    def __str__(self):
        return "NA"
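
A minimal sketch of how `NA` might appear in history data follows. It uses a local copy of the class from the listing above, hypothetical sample rows, and plain floats standing in for `Number` values to keep the example short.

```python
from datetime import datetime, timezone


class NA:
    """Local copy of the NA class shown on this page, for illustration."""

    __instance = None

    def __new__(cls):
        if NA.__instance is None:
            NA.__instance = object.__new__(cls)
        return NA.__instance

    def __str__(self):
        return "NA"


utc = timezone.utc

# Hypothetical history rows: the 08:15 sample is marked as not available.
his_rows = [
    {"ts": datetime(2024, 1, 15, 8, 0, tzinfo=utc), "val": 71.5},
    {"ts": datetime(2024, 1, 15, 8, 15, tzinfo=utc), "val": NA()},
    {"ts": datetime(2024, 1, 15, 8, 30, tzinfo=utc), "val": 72.0},
]

# NA is a singleton, so an identity check is enough to find bad samples.
good_vals = [row["val"] for row in his_rows if row["val"] is not NA()]
bad_times = [row["ts"].isoformat() for row in his_rows if row["val"] is NA()]

print(good_vals)  # [71.5, 72.0]
print(bad_times)  # ['2024-01-15T08:15:00+00:00']
```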

Remove

Remove data type defined by Project Haystack [here](https://project-haystack.org/doc/docHaystack/Kinds#remove). Remove is a singleton used in a dict to indicate removal of a tag.

Source code in src/phable/kinds.py

class Remove:
    """`Remove` data type defined by Project Haystack
    [here](https://project-haystack.org/doc/docHaystack/Kinds#remove). `Remove` is a
    singleton used in a `dict` to indicate removal of a tag.
    """

    __instance = None

    def __new__(cls):
        if Remove.__instance is None:
            Remove.__instance = object.__new__(cls)
        return Remove.__instance

    def __str__(self):
        return "remove"
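
To make the "removal of a tag" idea concrete, the sketch below applies a dict of changes to an entity and drops any tag whose value is `Remove`. `apply_diff` is a hypothetical helper written for this illustration, not a Phable function, and the class is a local copy of the listing above.

```python
from collections.abc import Mapping
from typing import Any


class Remove:
    """Local copy of the Remove class shown on this page, for illustration."""

    __instance = None

    def __new__(cls):
        if Remove.__instance is None:
            Remove.__instance = object.__new__(cls)
        return Remove.__instance

    def __str__(self):
        return "remove"


def apply_diff(entity: Mapping[str, Any], diff: Mapping[str, Any]) -> dict[str, Any]:
    """Return a new dict with diff applied; Remove values delete the tag."""
    updated = dict(entity)
    for name, val in diff.items():
        if val is Remove():
            updated.pop(name, None)  # tag may already be absent
        else:
            updated[name] = val
    return updated


entity = {"dis": "Main Meter", "area": 100, "temp": True}
diff = {"area": 250, "temp": Remove()}

print(apply_diff(entity, diff))  # {'dis': 'Main Meter', 'area': 250}
print(entity)  # the original dict is left unchanged
```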

Number `dataclass`

Number data type defined by Project Haystack [here](https://project-haystack.org/doc/docHaystack/Kinds#number).

Parameters:

Name	Type	Description	Default
`val`	`float`	Floating point value.	required
`unit`	`str | None`	Optional unit of measurement defined in Project Haystack's standard unit database [here](https://project-haystack.org/doc/docHaystack/Units). Note: Phable does not validate a defined unit at this time.	`None`

Source code in src/phable/kinds.py

@dataclass(frozen=True, slots=True)
class Number:
    """`Number` data type defined by Project Haystack
    [here](https://project-haystack.org/doc/docHaystack/Kinds#number).

    Parameters:
        val: Floating point value.
        unit:
            Optional unit of measurement defined in Project Haystack's standard unit
            database [here](https://project-haystack.org/doc/docHaystack/Units).

            **Note**: Phable does not validate a defined unit at this time.
    """

    val: float
    unit: str | None = None

    def __str__(self):
        if self.unit is not None:
            return f"{self.val}{self.unit}"
        else:
            return f"{self.val}"

Uri `dataclass`

Uri data type defined by Project Haystack [here](https://project-haystack.org/doc/docHaystack/Kinds#uri).

Example:

from phable.kinds import Uri

uri = Uri("http://project-haystack.org/")

Parameters:

Name	Type	Description	Default
`val`	`str`	Universal Resource Identifier according to RFC 3986.	required

Source code in src/phable/kinds.py

@dataclass(frozen=True, slots=True)
class Uri:
    """`Uri` data type defined by Project Haystack
    [here](https://project-haystack.org/doc/docHaystack/Kinds#uri).

    **Example:**
    ```python
    from phable.kinds import Uri

    uri = Uri("http://project-haystack.org/")
    ```

    Parameters:
        val:
            Universal Resource Identifier according to
            [RFC 3986](https://datatracker.ietf.org/doc/html/rfc3986).
    """

    val: str

    def __str__(self):
        return self.val

Ref `dataclass`

Ref data type defined by Project Haystack [here](https://project-haystack.org/doc/docHaystack/Kinds#ref).

Parameters:

Name	Type	Description	Default
`val`	`str`	Unique identifier for an entity.	required
`dis`	`str | None`	Optional human display name.	`None`

Source code in src/phable/kinds.py

@dataclass(frozen=True, slots=True)
class Ref:
    """`Ref` data type defined by Project Haystack
    [here](https://project-haystack.org/doc/docHaystack/Kinds#ref).

    Parameters:
        val: Unique identifier for an entity.
        dis: Optional human display name.
    """

    val: str
    dis: str | None = None

    def __str__(self) -> str:
        return self.val

Symbol `dataclass`

Symbol data type defined by Project Haystack [here](https://project-haystack.org/doc/docHaystack/Kinds#symbol).

Parameters:

Name	Type	Description	Default
`val`	`str`	def identifier. Consists of only ASCII letters, digits, underbar, colon, dash, period, or tilde.	required

Source code in src/phable/kinds.py

@dataclass(frozen=True, slots=True)
class Symbol:
    """`Symbol` data type defined by Project Haystack
    [here](https://project-haystack.org/doc/docHaystack/Kinds#symbol).

    Parameters:
        val:
            [def](https://project-haystack.org/doc/docHaystack/Defs) identifier.
            Consists of only ASCII letters, digits, underbar, colon, dash, period, or
            tilde.
    """

    val: str

    def __str__(self):
        return f"^{self.val}"

Coord `dataclass`

Coord data type defined by Project Haystack [here](https://project-haystack.org/doc/docHaystack/Kinds#coord).

Parameters:

Name	Type	Description	Default
`lat`	`decimal.Decimal`	Latitude represented in decimal degrees.	required
`lng`	`decimal.Decimal`	Longitude represented in decimal degrees.	required
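
Since `lat` and `lng` are `decimal.Decimal` values, how they are constructed affects what is stored. The sketch below uses only the standard library and an illustrative latitude; it shows why building a `Decimal` from a string is usually preferable to building it from a float.

```python
from decimal import Decimal

# From a string: stores exactly the digits written.
lat_from_str = Decimal("37.545")

# From a float: stores the float's exact binary value, which is not 37.545.
lat_from_float = Decimal(37.545)

print(lat_from_str)                    # 37.545
print(lat_from_str == lat_from_float)  # False
print(len(str(lat_from_float)) > len(str(lat_from_str)))  # True

# If only a float is available, going through str() keeps the short form.
lat_via_str = Decimal(str(37.545))
print(lat_via_str == lat_from_str)  # True
```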

Source code in src/phable/kinds.py

@dataclass(frozen=True, slots=True)
class Coord:
    """`Coord` data type defined by Project Haystack
    [here](https://project-haystack.org/doc/docHaystack/Kinds#coord).

    Parameters:
        lat:
            Latitude represented in
            [decimal degrees](https://en.wikipedia.org/wiki/Decimal_degrees).
        lng:
            Longitude represented in
            [decimal degrees](https://en.wikipedia.org/wiki/Decimal_degrees).
    """

    lat: Decimal
    lng: Decimal

    def __str__(self):
        getcontext().prec = 6
        return f"C({self.lat}, {self.lng})"

XStr `dataclass`

XStr data type defined by Project Haystack [here](https://project-haystack.org/doc/docHaystack/Kinds#xstr).

Parameters:

Name	Type	Description	Default
`type`	`str`	Type name that follows Project Haystack's tag naming rules, except it must start with an ASCII uppercase letter (A-Z).	required
`val`	`str`	String encoded value.	required

Source code in src/phable/kinds.py

@dataclass(frozen=True, slots=True)
class XStr:
    """`XStr` data type defined by Project Haystack
    [here](https://project-haystack.org/doc/docHaystack/Kinds#xstr).

    Parameters:
        type:
            Type name that follows Project Haystack's
            [tag naming](https://project-haystack.org/doc/docHaystack/Kinds#names)
            rules, except it must start with an ASCII uppercase letter (A-Z).
        val: String encoded value.
    """

    type: str
    val: str

    def __str__(self):
        return f"({self.type}, {self.val})"

Grid `dataclass`

Grid data type defined by Project Haystack [here](https://project-haystack.org/doc/docHaystack/Kinds#grid).

Parameters:

Name	Type	Description	Default
`meta`	`typing.Mapping[str, typing.Any]`	Metadata for the entire `Grid`.	required
`cols`	`typing.Sequence[phable.kinds.GridCol]`	Column definitions for the `Grid`.	required
`rows`	`typing.Sequence[typing.Mapping[str, typing.Any]]`	Row data for `Grid`.	required

Source code in src/phable/kinds.py

@dataclass(frozen=True, slots=True)
class Grid:
    """`Grid` data type defined by Project Haystack
    [here](https://project-haystack.org/doc/docHaystack/Kinds#grid).

    Parameters:
        meta: Metadata for the entire `Grid`.
        cols: Column definitions for the `Grid`.
        rows: Row data for `Grid`.
    """

    meta: Mapping[str, Any]
    cols: Sequence[GridCol]
    rows: Sequence[Mapping[str, Any]]

    def __str__(self):
        return "Haystack Grid"

    @staticmethod
    def to_grid(
        rows: Mapping[str, Any] | Sequence[Mapping[str, Any]],
        meta: Mapping[str, Any] | None = None,
    ) -> Grid:
        """Creates a `Grid` using row data and optional metadata.

        If parameters include history data, assumes the history rows are in
        chronological order to establish `hisStart` and `hisEnd` in `meta`.

        Parameters:
            rows: Row data for `Grid`.
            meta: Optional metadata for the entire `Grid`.
        """
        normalized_rows: Sequence[Mapping[str, Any]]
        if isinstance(rows, Mapping):
            normalized_rows = [cast(Mapping[str, Any], rows)]
        else:
            normalized_rows = rows

        # might be able to find a nicer way to do this
        col_names: list[str] = []
        for row in normalized_rows:
            for col_name in row.keys():
                if col_name not in col_names:
                    col_names.append(col_name)

        cols = [GridCol(name) for name in col_names]

        grid_meta: dict[str, Any] = {"ver": "3.0"}

        if meta is not None:
            grid_meta = grid_meta | dict(meta)

        his_start = normalized_rows[0].get("ts", None)
        his_end = normalized_rows[-1].get("ts", None)

        if his_start is not None and his_end is not None:
            grid_meta["hisStart"] = his_start
            grid_meta["hisEnd"] = his_end + timedelta(minutes=1)

        return Grid(meta=grid_meta, cols=cols, rows=normalized_rows)

    def to_pandas(self):
        """Converts time-series `Grid` to a long-format Pandas DataFrame.

        **Note:** This method is experimental and subject to change.

        **Requirements:**
        - Phable's optional Pandas dependency must be installed.
        - `Grid` must have history data (`hisStart` in Grid metadata that is timezone-aware).
        - Grid column metadata must have an `id` of type `Ref`.
        - Grid row value types must be `Number`, `bool`, `str`, or `NA`.
        - Row timestamps must use the same timezone as `hisStart` in Grid metadata.

        When converting to a long-format DataFrame, history data for one or more points are combined into columns.
        Values are split into typed columns (`val_bool`, `val_str`, `val_num`) to use native DataFrame types for
        performance, since different points may have different value types. All value columns are always present for
        schema consistency to enable predictable programmatic access.

        For each DataFrame row: if the Grid value is Project Haystack's `NA`, the `na` column is `True` and all typed
        value columns are `None`. Otherwise, `na` is `False` and exactly one typed value column is populated based on
        type: `val_bool` for `bool`, `val_str` for `str`, or `val_num` for `Number`.

        | Column     | Pandas Type                  | Nullable | Description                                    |
        |------------|------------------------------|----------|------------------------------------------------|
        | `id`       | `Categorical`                | No       | Point identifier from Ref (without `@` prefix) |
        | `ts`       | `timestamp[us, tz][pyarrow]` | No       | Timestamp of the reading                       |
        | `val_bool` | `bool[pyarrow]`              | Yes      | Boolean value (when `kind` tag is `Bool`)      |
        | `val_str`  | `string[pyarrow]`            | Yes      | String value (when `kind` tag is `Str`)        |
        | `val_num`  | `double[pyarrow]`            | Yes      | Numeric value (when `kind` tag is `Number`)    |
        | `na`       | `bool[pyarrow]`              | No       | `True` when value is Project Haystack's `NA`   |

        The resultant DataFrame is sorted by `id` and `ts`.

        Phable users are encouraged to interpolate data while in the long format dataframe using `na` before
        pivoting to a wide format dataframe, since pivoting loses `NA` semantics which define where
        interpolation should not occur.

        **Example:**

        ```python
        # convert to long-format pandas dataframe
        df_long = his_grid.to_pandas()

        # pivot to wide format (one column per point, indexed by timestamp)
        # use the appropriate value column (val_bool, val_str, or val_num) based on point kind
        df_wide = df_long.pivot_table(index="ts", columns="id", values="val_num")
        ```

        Raises:
            ValueError:
                If `Grid` does not have `hisStart` in metadata, `hisStart` is not timezone-aware,
                row timestamps have a different timezone than `hisStart`, columns are missing required `id`
                metadata of type Ref, or values are unsupported types.
        """
        import pandas as pd
        import pyarrow as pa

        tz, data = _structure_long_format_for_df(self)

        schema = pa.schema(
            [
                ("id", pa.dictionary(pa.int32(), pa.string())),
                ("ts", pa.timestamp("us", tz=tz.key)),
                ("val_bool", pa.bool_()),
                ("val_str", pa.string()),
                ("val_num", pa.float64()),
                ("na", pa.bool_()),
            ]
        )

        table = pa.Table.from_pylist(data, schema=schema)
        df = table.to_pandas(types_mapper=pd.ArrowDtype)

        unique_ids = sorted(df["id"].unique())
        df["id"] = df["id"].astype(
            pd.CategoricalDtype(categories=unique_ids, ordered=False)
        )

        return df.sort_values(["id", "ts"]).reset_index(drop=True)

    def to_polars(self):
        """Converts time-series `Grid` to a long-format Polars DataFrame.

        **Note:** This method is experimental and subject to change.

        **Requirements:**
        - Phable's optional Polars dependency must be installed.
        - `Grid` must have history data (`hisStart` in Grid metadata that is timezone-aware).
        - Grid column metadata must have an `id` of type `Ref`.
        - Grid row value types must be `Number`, `bool`, `str`, or `NA`.
        - Row timestamps must use the same timezone as `hisStart` in Grid metadata.

        When converting to a long-format DataFrame, history data for one or more points are combined into columns.
        Values are split into typed columns (`val_bool`, `val_str`, `val_num`) to use native DataFrame types for
        performance, since different points may have different value types. All value columns are always present for
        schema consistency to enable predictable programmatic access.

        For each DataFrame row: if the Grid value is Project Haystack's `NA`, the `na` column is `True` and all typed
        value columns are `None`. Otherwise, `na` is `False` and exactly one typed value column is populated based on
        type: `val_bool` for `bool`, `val_str` for `str`, or `val_num` for `Number`.

        | Column     | Polars Type        | Nullable | Description                                    |
        |------------|--------------------|----------|------------------------------------------------|
        | `id`       | `Categorical`      | No       | Point identifier from Ref (without `@` prefix) |
        | `ts`       | `Datetime[us, tz]` | No       | Timestamp of the reading                       |
        | `val_bool` | `Boolean`          | Yes      | Boolean value (when `kind` tag is `Bool`)      |
        | `val_str`  | `String`           | Yes      | String value (when `kind` tag is `Str`)        |
        | `val_num`  | `Float64`          | Yes      | Numeric value (when `kind` tag is `Number`)    |
        | `na`       | `Boolean`          | No       | `True` when value is Project Haystack's `NA`   |

        The resultant DataFrame is sorted by `id` and `ts`.

        Phable users are encouraged to interpolate data while in the long format dataframe using `na` before
        pivoting to a wide format dataframe, since pivoting loses `NA` semantics which define where
        interpolation should not occur.

        **Example:**

        ```python
        # convert to long-format polars dataframe
        df_long = his_grid.to_polars()

        # pivot to wide format (one column per point, indexed by timestamp)
        # use the appropriate value column (val_bool, val_str, or val_num) based on point kind
        df_wide = df_long.pivot(on="id", index="ts", values="val_num")
        ```

        Raises:
            ValueError:
                If `Grid` does not have `hisStart` in metadata, `hisStart` is not timezone-aware,
                row timestamps have a different timezone than `hisStart`, columns are missing required `id`
                metadata of type Ref, or values are unsupported types.
        """
        import polars as pl  # ty: ignore[unresolved-import]

        tz, data = _structure_long_format_for_df(self)

        schema = {
            "id": pl.Categorical,
            "ts": pl.Datetime(time_unit="us", time_zone=tz.key),
            "val_bool": pl.Boolean,
            "val_str": pl.String,
            "val_num": pl.Float64,
            "na": pl.Boolean,
        }

        return pl.DataFrame(data=data, schema=schema).sort("id", "ts")

to_grid `staticmethod`

to_grid(rows, meta=None)

Creates a Grid using row data and optional metadata.

If parameters include history data, assumes the history rows are in chronological order to establish hisStart and hisEnd in meta.

Parameters:

Name	Type	Description	Default
`rows`	`typing.Mapping[str, typing.Any] | typing.Sequence[typing.Mapping[str, typing.Any]]`	Row data for `Grid`.	required
`meta`	`typing.Mapping[str, typing.Any] | None`	Optional metadata for the entire `Grid`.	`None`

Source code in src/phable/kinds.py

@staticmethod
def to_grid(
    rows: Mapping[str, Any] | Sequence[Mapping[str, Any]],
    meta: Mapping[str, Any] | None = None,
) -> Grid:
    """Creates a `Grid` using row data and optional metadata.

    If parameters include history data, assumes the history rows are in
    chronological order to establish `hisStart` and `hisEnd` in `meta`.

    Parameters:
        rows: Row data for `Grid`.
        meta: Optional metadata for the entire `Grid`.
    """
    normalized_rows: Sequence[Mapping[str, Any]]
    if isinstance(rows, Mapping):
        normalized_rows = [cast(Mapping[str, Any], rows)]
    else:
        normalized_rows = rows

    # might be able to find a nicer way to do this
    col_names: list[str] = []
    for row in normalized_rows:
        for col_name in row.keys():
            if col_name not in col_names:
                col_names.append(col_name)

    cols = [GridCol(name) for name in col_names]

    grid_meta: dict[str, Any] = {"ver": "3.0"}

    if meta is not None:
        grid_meta = grid_meta | dict(meta)

    his_start = normalized_rows[0].get("ts", None)
    his_end = normalized_rows[-1].get("ts", None)

    if his_start is not None and his_end is not None:
        grid_meta["hisStart"] = his_start
        grid_meta["hisEnd"] = his_end + timedelta(minutes=1)

    return Grid(meta=grid_meta, cols=cols, rows=normalized_rows)

to_pandas

to_pandas()

Converts time-series Grid to a long-format Pandas DataFrame.

Note: This method is experimental and subject to change.

Requirements:

- Phable's optional Pandas dependency must be installed.
- Grid must have history data (hisStart in Grid metadata that is timezone-aware).
- Grid column metadata must have an id of type Ref.
- Grid row value types must be Number, bool, str, or NA.
- Row timestamps must use the same timezone as hisStart in Grid metadata.

When converting to a long-format DataFrame, history data for one or more points are combined into columns. Values are split into typed columns (val_bool, val_str, val_num) to use native DataFrame types for performance, since different points may have different value types. All value columns are always present for schema consistency to enable predictable programmatic access.

For each DataFrame row: if the Grid value is Project Haystack's NA, the na column is True and all typed value columns are None. Otherwise, na is False and exactly one typed value column is populated based on type: val_bool for bool, val_str for str, or val_num for Number.

Column	Pandas Type	Nullable	Description
`id`	`Categorical`	No	Point identifier from Ref (without `@` prefix)
`ts`	`timestamp[us, tz][pyarrow]`	No	Timestamp of the reading
`val_bool`	`bool[pyarrow]`	Yes	Boolean value (when `kind` tag is `Bool`)
`val_str`	`string[pyarrow]`	Yes	String value (when `kind` tag is `Str`)
`val_num`	`double[pyarrow]`	Yes	Numeric value (when `kind` tag is `Number`)
`na`	`bool[pyarrow]`	No	`True` when value is Project Haystack's `NA`

The resultant DataFrame is sorted by id and ts.

Phable users are encouraged to interpolate data while it is in the long-format DataFrame, using the na column, before pivoting to a wide-format DataFrame, since pivoting loses the NA semantics that define where interpolation should not occur.

Example:

# convert to long-format pandas dataframe
df_long = his_grid.to_pandas()

# pivot to wide format (one column per point, indexed by timestamp)
# use the appropriate value column (val_bool, val_str, or val_num) based on point kind
df_wide = df_long.pivot_table(index="ts", columns="id", values="val_num")

Raises:

Type	Description
`ValueError`	If `Grid` does not have `hisStart` in metadata, `hisStart` is not timezone-aware, row timestamps have a different timezone than `hisStart`, columns are missing required `id` metadata of type Ref, or values are unsupported types.

Source code in src/phable/kinds.py

def to_pandas(self):
    """Converts time-series `Grid` to a long-format Pandas DataFrame.

    **Note:** This method is experimental and subject to change.

    **Requirements:**
    - Phable's optional Pandas dependency must be installed.
    - `Grid` must have history data (`hisStart` in Grid metadata that is timezone-aware).
    - Grid column metadata must have an `id` of type `Ref`.
    - Grid row value types must be `Number`, `bool`, `str`, or `NA`.
    - Row timestamps must use the same timezone as `hisStart` in Grid metadata.

    When converting to a long-format DataFrame, history data for one or more points are combined into columns.
    Values are split into typed columns (`val_bool`, `val_str`, `val_num`) to use native DataFrame types for
    performance, since different points may have different value types. All value columns are always present for
    schema consistency to enable predictable programmatic access.

    For each DataFrame row: if the Grid value is Project Haystack's `NA`, the `na` column is `True` and all typed
    value columns are `None`. Otherwise, `na` is `False` and exactly one typed value column is populated based on
    type: `val_bool` for `bool`, `val_str` for `str`, or `val_num` for `Number`.

    | Column     | Pandas Type                  | Nullable | Description                                    |
    |------------|------------------------------|----------|------------------------------------------------|
    | `id`       | `Categorical`                | No       | Point identifier from Ref (without `@` prefix) |
    | `ts`       | `timestamp[us, tz][pyarrow]` | No       | Timestamp of the reading                       |
    | `val_bool` | `bool[pyarrow]`              | Yes      | Boolean value (when `kind` tag is `Bool`)      |
    | `val_str`  | `string[pyarrow]`            | Yes      | String value (when `kind` tag is `Str`)        |
    | `val_num`  | `double[pyarrow]`            | Yes      | Numeric value (when `kind` tag is `Number`)    |
    | `na`       | `bool[pyarrow]`              | No       | `True` when value is Project Haystack's `NA`   |

    The resultant DataFrame is sorted by `id` and `ts`.

    Phable users are encouraged to interpolate data while in the long format dataframe using `na` before
    pivoting to a wide format dataframe, since pivoting loses `NA` semantics which define where
    interpolation should not occur.

    **Example:**

    ```python
    # convert to long-format pandas dataframe
    df_long = his_grid.to_pandas()

    # pivot to wide format (one column per point, indexed by timestamp)
    # use the appropriate value column (val_bool, val_str, or val_num) based on point kind
    df_wide = df_long.pivot_table(index="ts", columns="id", values="val_num")
    ```

    Raises:
        ValueError:
            If `Grid` does not have `hisStart` in metadata, `hisStart` is not timezone-aware,
            row timestamps have a different timezone than `hisStart`, columns are missing required `id`
            metadata of type Ref, or values are unsupported types.
    """
    import pandas as pd
    import pyarrow as pa

    tz, data = _structure_long_format_for_df(self)

    schema = pa.schema(
        [
            ("id", pa.dictionary(pa.int32(), pa.string())),
            ("ts", pa.timestamp("us", tz=tz.key)),
            ("val_bool", pa.bool_()),
            ("val_str", pa.string()),
            ("val_num", pa.float64()),
            ("na", pa.bool_()),
        ]
    )

    table = pa.Table.from_pylist(data, schema=schema)
    df = table.to_pandas(types_mapper=pd.ArrowDtype)

    unique_ids = sorted(df["id"].unique())
    df["id"] = df["id"].astype(
        pd.CategoricalDtype(categories=unique_ids, ordered=False)
    )

    return df.sort_values(["id", "ts"]).reset_index(drop=True)

to_polars

to_polars()

Converts time-series Grid to a long-format Polars DataFrame.

Note: This method is experimental and subject to change.

Requirements:

- Phable's optional Polars dependency must be installed.
- Grid must have history data (hisStart in Grid metadata that is timezone-aware).
- Grid column metadata must have an id of type Ref.
- Grid row value types must be Number, bool, str, or NA.
- Row timestamps must use the same timezone as hisStart in Grid metadata.

When converting to a long-format DataFrame, history data for one or more points are combined into columns. Values are split into typed columns (val_bool, val_str, val_num) to use native DataFrame types for performance, since different points may have different value types. All value columns are always present for schema consistency to enable predictable programmatic access.

For each DataFrame row: if the Grid value is Project Haystack's NA, the na column is True and all typed value columns are None. Otherwise, na is False and exactly one typed value column is populated based on type: val_bool for bool, val_str for str, or val_num for Number.

Column	Polars Type	Nullable	Description
`id`	`Categorical`	No	Point identifier from Ref (without `@` prefix)
`ts`	`Datetime[us, tz]`	No	Timestamp of the reading
`val_bool`	`Boolean`	Yes	Boolean value (when `kind` tag is `Bool`)
`val_str`	`String`	Yes	String value (when `kind` tag is `Str`)
`val_num`	`Float64`	Yes	Numeric value (when `kind` tag is `Number`)
`na`	`Boolean`	No	`True` when value is Project Haystack's `NA`

The resultant DataFrame is sorted by id and ts.

Phable users are encouraged to interpolate data while it is in the long-format DataFrame, using the na column, before pivoting to a wide-format DataFrame, since pivoting loses the NA semantics that define where interpolation should not occur.

Example:

# convert to long-format polars dataframe
df_long = his_grid.to_polars()

# pivot to wide format (one column per point, indexed by timestamp)
# use the appropriate value column (val_bool, val_str, or val_num) based on point kind
df_wide = df_long.pivot(on="id", index="ts", values="val_num")
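
To see why interpolating before pivoting is recommended, the plain-Python sketch below pivots a few hypothetical long-format rows into a wide layout. It does not use Polars; `pivot_wide` is an illustrative helper, and timestamps are shortened to strings for readability.

```python
from typing import Any

# Hypothetical long-format rows for two points, p1 and p2.
long_rows = [
    {"id": "p1", "ts": "08:00", "val_num": 71.5, "na": False},
    {"id": "p1", "ts": "08:15", "val_num": None, "na": True},  # flagged NA
    {"id": "p2", "ts": "08:00", "val_num": 40.0, "na": False},
    # p2 simply has no sample at 08:15
]


def pivot_wide(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Pivot to {ts: {id: val_num}}, filling missing cells with None."""
    ids = sorted({row["id"] for row in rows})
    wide: dict[str, dict[str, Any]] = {}
    for row in rows:
        wide.setdefault(row["ts"], dict.fromkeys(ids))[row["id"]] = row["val_num"]
    return wide


wide = pivot_wide(long_rows)
print(wide["08:00"])  # {'p1': 71.5, 'p2': 40.0}
print(wide["08:15"])  # {'p1': None, 'p2': None}
```

At 08:15 both cells are empty, yet only p1 was explicitly marked NA; p2 just had no sample. After the pivot the two cases look identical, so any decision about where not to interpolate has to be made while the na column is still available.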

Raises:

Type	Description
`ValueError`	If `Grid` does not have `hisStart` in metadata, `hisStart` is not timezone-aware, row timestamps have a different timezone than `hisStart`, columns are missing required `id` metadata of type Ref, or values are unsupported types.

Source code in src/phable/kinds.py

def to_polars(self):
    """Converts time-series `Grid` to a long-format Polars DataFrame.

    **Note:** This method is experimental and subject to change.

    **Requirements:**
    - Phable's optional Polars dependency must be installed.
    - `Grid` must have history data (`hisStart` in Grid metadata that is timezone-aware).
    - Grid column metadata must have an `id` of type `Ref`.
    - Grid row value types must be `Number`, `bool`, `str`, or `NA`.
    - Row timestamps must use the same timezone as `hisStart` in Grid metadata.

    When converting to a long-format DataFrame, history data for one or more points are combined into columns.
    Values are split into typed columns (`val_bool`, `val_str`, `val_num`) to use native DataFrame types for
    performance, since different points may have different value types. All value columns are always present for
    schema consistency to enable predictable programmatic access.

    For each DataFrame row: if the Grid value is Project Haystack's `NA`, the `na` column is `True` and all typed
    value columns are `None`. Otherwise, `na` is `False` and exactly one typed value column is populated based on
    type: `val_bool` for `bool`, `val_str` for `str`, or `val_num` for `Number`.

    | Column     | Polars Type        | Nullable | Description                                    |
    |------------|--------------------|----------|------------------------------------------------|
    | `id`       | `Categorical`      | No       | Point identifier from Ref (without `@` prefix) |
    | `ts`       | `Datetime[us, tz]` | No       | Timestamp of the reading                       |
    | `val_bool` | `Boolean`          | Yes      | Boolean value (when `kind` tag is `Bool`)      |
    | `val_str`  | `String`           | Yes      | String value (when `kind` tag is `Str`)        |
    | `val_num`  | `Float64`          | Yes      | Numeric value (when `kind` tag is `Number`)    |
    | `na`       | `Boolean`          | No       | `True` when value is Project Haystack's `NA`   |

    The resultant DataFrame is sorted by `id` and `ts`.

    Phable users are encouraged to interpolate data while in the long format dataframe using `na` before
    pivoting to a wide format dataframe, since pivoting loses `NA` semantics which define where
    interpolation should not occur.

    **Example:**

    ```python
    # convert to long-format polars dataframe
    df_long = his_grid.to_polars()

    # pivot to wide format (one column per point, indexed by timestamp)
    # use the appropriate value column (val_bool, val_str, or val_num) based on point kind
    df_wide = df_long.pivot(on="id", index="ts", values="val_num")
    ```

    Raises:
        ValueError:
            If `Grid` does not have `hisStart` in metadata, `hisStart` is not timezone-aware,
            row timestamps have a different timezone than `hisStart`, columns are missing required `id`
            metadata of type Ref, or values are unsupported types.
    """
    import polars as pl  # ty: ignore[unresolved-import]

    tz, data = _structure_long_format_for_df(self)

    schema = {
        "id": pl.Categorical,
        "ts": pl.Datetime(time_unit="us", time_zone=tz.key),
        "val_bool": pl.Boolean,
        "val_str": pl.String,
        "val_num": pl.Float64,
        "na": pl.Boolean,
    }

    return pl.DataFrame(data=data, schema=schema).sort("id", "ts")
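
The per-row rule described in the docstring above can be sketched in plain Python. This is an illustration only, not Phable's `_structure_long_format_for_df`: the `NA` sentinel and the `long_row` helper are hypothetical names, and a plain `int` or `float` stands in for `phable.Number`.

```python
from datetime import datetime, timezone
from typing import Any

# Hypothetical stand-in for phable.NA, used only in this sketch.
NA = object()


def long_row(point_id: str, ts: datetime, val: Any) -> dict[str, Any]:
    """Build one long-format row: at most one typed value column is populated."""
    row: dict[str, Any] = {
        "id": point_id,
        "ts": ts,
        "val_bool": None,
        "val_str": None,
        "val_num": None,
        "na": False,
    }
    if val is NA:
        # NA rows keep every typed value column empty.
        row["na"] = True
    elif isinstance(val, bool):
        # Check bool before numbers, since bool is a subclass of int.
        row["val_bool"] = val
    elif isinstance(val, str):
        row["val_str"] = val
    elif isinstance(val, (int, float)):
        row["val_num"] = float(val)
    else:
        raise ValueError(f"Unsupported value type: {type(val).__name__}")
    return row


ts = datetime(2024, 11, 22, 8, 0, tzinfo=timezone.utc)
na_row = long_row("p:demo", ts, NA)
num_row = long_row("p:demo", ts, 72.5)
```

Here `na_row` has `na` set to `True` with all three value columns left as `None`, while `num_row` populates only `val_num`.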

GridCol `dataclass`

GridCol defines a column in a Grid.

Example:

from phable.kinds import GridCol

# Column with metadata
temp_col = GridCol("temp", {"unit": "°F", "dis": "Temperature"})

# Simple column without metadata
id_col = GridCol("id")

Parameters:

Name	Type	Description	Default
`name`	`str`	Column name following Haystack tag naming rules (lowercase start).	required
`meta`	`dict[str, typing.Any] | None`	Optional metadata dictionary for the column (e.g., unit, display name).	`None`

Source code in src/phable/kinds.py

@dataclass(frozen=True, slots=True)
class GridCol:
    """`GridCol` defines a column in a `Grid`.

    **Example:**
    ```python
    from phable.kinds import GridCol

    # Column with metadata
    temp_col = GridCol("temp", {"unit": "°F", "dis": "Temperature"})

    # Simple column without metadata
    id_col = GridCol("id")
    ```

    Parameters:
        name: Column name following Haystack tag naming rules (lowercase start).
        meta: Optional metadata dictionary for the column (e.g., unit, display name).
    """

    name: str
    meta: dict[str, Any] | None = None

GridBuilder

Builder for constructing Project Haystack Grid objects.

Provides a builder pattern with method chaining for adding columns, metadata, and rows before creating a Grid.

Source code in src/phable/grid_builder.py

class GridBuilder:
    """Builder for constructing Project Haystack `Grid` objects.

    Provides a builder pattern with method chaining for adding columns, metadata,
    and rows before creating a `Grid`.
    """

    def __init__(self):
        self._meta = {"ver": "3.0"}
        self._cols = []
        self._rows = []

    _meta: dict[str, Any]
    _cols: list[GridCol]
    _rows: list[dict[str, Any]]

    @property
    def col_names(self) -> list[str]:
        """Column names.

        Returns:
            List of column names in the order they were added.
        """
        return [col.name for col in self._cols]

    def set_meta(self, meta: Mapping[str, Any]) -> Self:
        """Set or update grid-level metadata.

        Parameters:
            meta: Metadata dictionary to merge with existing grid metadata.

        Returns:
            Self for method chaining.
        """
        self._meta = self._meta | dict(meta)
        return self

    def add_col(self, name: str, meta: Mapping[str, Any] | None = None) -> Self:
        """Adds a column to the grid.

        Parameters:
            name: Column name following Haystack tag naming rules (lowercase start).
            meta: Optional metadata for the column (e.g., unit, display name).

        Returns:
            Self for method chaining.

        Raises:
            ValueError: If column name is invalid or already exists.
        """
        if not _is_tag_name(name):
            raise ValueError(f"Invalid column name: {name}")

        # verify the column does not already exist
        for c in self._cols:
            if c.name == name:
                raise ValueError(f"Duplicate column name: {name}")

        col = GridCol(name, dict(meta) if meta is not None else None)

        self._cols.append(col)
        return self

    def set_col_meta(self, col_name: str, meta: Mapping[str, Any]) -> Self:
        """Set or update metadata for an existing column.

        Parameters:
            col_name: Name of the column to update.
            meta: Metadata to merge with existing column metadata.

        Returns:
            Self for method chaining.

        Raises:
            ValueError: If column does not exist.
        """
        col_found = False
        for i, c in enumerate(self._cols):
            if c.name == col_name:
                col_found = True
                existing_meta = c.meta or {}
                new_meta = existing_meta | dict(meta)
                self._cols[i] = GridCol(c.name, new_meta)
                break

        if not col_found:
            raise ValueError(f"Column not found: {col_name}")

        return self

    def add_row(self, row: Mapping[str, Any]) -> Self:
        """Adds a row of data to the grid.

        Parameters:
            row: Dictionary mapping column names to values.

        Returns:
            Self for method chaining.

        Raises:
            ValueError: If any row key does not match an existing column name.
        """
        col_names = self.col_names
        for key in row.keys():
            if key not in col_names:
                raise ValueError(f"Row key '{key}' does not match any column name")

        self._rows.append(dict(row))
        return self

    def build(self) -> Grid:
        """Builds a Grid from the accumulated columns, rows, and metadata.

        Returns:
            A constructed `Grid` instance.
        """
        return Grid(self._meta, self._cols, self._rows)

col_names `property`

col_names

Column names.

Returns:

Type	Description
`list[str]`	List of column names in the order they were added.

add_col

add_col(name, meta=None)

Adds a column to the grid.

Parameters:

Name	Type	Description	Default
`name`	`str`	Column name following Haystack tag naming rules (lowercase start).	required
`meta`	`typing.Mapping[str, typing.Any] | None`	Optional metadata for the column (e.g., unit, display name).	`None`

Returns:

Type	Description
`typing.Self`	Self for method chaining.

Raises:

Type	Description
`ValueError`	If column name is invalid or already exists.

Source code in src/phable/grid_builder.py

def add_col(self, name: str, meta: Mapping[str, Any] | None = None) -> Self:
    """Adds a column to the grid.

    Parameters:
        name: Column name following Haystack tag naming rules (lowercase start).
        meta: Optional metadata for the column (e.g., unit, display name).

    Returns:
        Self for method chaining.

    Raises:
        ValueError: If column name is invalid or already exists.
    """
    if not _is_tag_name(name):
        raise ValueError(f"Invalid column name: {name}")

    # verify the column does not already exist
    for c in self._cols:
        if c.name == name:
            raise ValueError(f"Duplicate column name: {name}")

    col = GridCol(name, dict(meta) if meta is not None else None)

    self._cols.append(col)
    return self
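
The private `_is_tag_name` helper is not shown on this page. As a rough guide to which names `add_col` is likely to accept, the sketch below encodes Project Haystack's tag naming rules (an ASCII lowercase first letter, followed by ASCII letters, digits, or underscores). The pattern and the `looks_like_tag_name` function are assumptions for illustration, and Phable's actual check may differ in detail.

```python
import re

# Assumed pattern based on Project Haystack's tag naming rules;
# not copied from Phable's private `_is_tag_name` helper.
_TAG_NAME = re.compile(r"[a-z][a-zA-Z0-9_]*")


def looks_like_tag_name(name: str) -> bool:
    """Return True if the whole string matches the assumed tag name pattern."""
    return _TAG_NAME.fullmatch(name) is not None


results = {
    candidate: looks_like_tag_name(candidate)
    for candidate in ("temp", "dischargeAirTemp", "Temp", "air temp", "2ndFloor")
}
```

Under this assumed pattern, only `temp` and `dischargeAirTemp` pass. The other three fail on an uppercase first letter, a space, and a leading digit respectively.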

add_row

add_row(row)

Adds a row of data to the grid.

Parameters:

Name	Type	Description	Default
`row`	`typing.Mapping[str, typing.Any]`	Dictionary mapping column names to values.	required

Returns:

Type	Description
`typing.Self`	Self for method chaining.

Raises:

Type	Description
`ValueError`	If any row key does not match an existing column name.

Source code in src/phable/grid_builder.py

def add_row(self, row: Mapping[str, Any]) -> Self:
    """Adds a row of data to the grid.

    Parameters:
        row: Dictionary mapping column names to values.

    Returns:
        Self for method chaining.

    Raises:
        ValueError: If any row key does not match an existing column name.
    """
    col_names = self.col_names
    for key in row.keys():
        if key not in col_names:
            raise ValueError(f"Row key '{key}' does not match any column name")

    self._rows.append(dict(row))
    return self
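
The check in `add_row` only rejects keys that have no matching column. It does not itself require every column to appear in the row. The standalone sketch below mirrors that loop with a hypothetical `unknown_row_keys` helper. Whether `Grid` applies further validation is not shown on this page.

```python
from collections.abc import Mapping
from typing import Any


def unknown_row_keys(row: Mapping[str, Any], col_names: list[str]) -> list[str]:
    """Return the row keys that do not match any column name."""
    return [key for key in row if key not in col_names]


col_names = ["ts", "val"]

# A row that omits "val" passes this check.
sparse_row = unknown_row_keys({"ts": "2024-11-22"}, col_names)

# A misspelled key ("value" instead of "val") is what triggers the ValueError.
typo_row = unknown_row_keys({"ts": "2024-11-22", "value": 72.5}, col_names)
```

In practice this means columns should be added with `add_col` before any row that refers to them.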

build

build()

Builds a Grid from the accumulated columns, rows, and metadata.

Returns:

Type	Description
`phable.kinds.Grid`	A constructed `Grid` instance.

Source code in src/phable/grid_builder.py

def build(self) -> Grid:
    """Builds a Grid from the accumulated columns, rows, and metadata.

    Returns:
        A constructed `Grid` instance.
    """
    return Grid(self._meta, self._cols, self._rows)

set_col_meta

set_col_meta(col_name, meta)

Set or update metadata for an existing column.

Parameters:

Name	Type	Description	Default
`col_name`	`str`	Name of the column to update.	required
`meta`	`typing.Mapping[str, typing.Any]`	Metadata to merge with existing column metadata.	required

Returns:

Type	Description
`typing.Self`	Self for method chaining.

Raises:

Type	Description
`ValueError`	If column does not exist.

Source code in src/phable/grid_builder.py

def set_col_meta(self, col_name: str, meta: Mapping[str, Any]) -> Self:
    """Set or update metadata for an existing column.

    Parameters:
        col_name: Name of the column to update.
        meta: Metadata to merge with existing column metadata.

    Returns:
        Self for method chaining.

    Raises:
        ValueError: If column does not exist.
    """
    col_found = False
    for i, c in enumerate(self._cols):
        if c.name == col_name:
            col_found = True
            existing_meta = c.meta or {}
            new_meta = existing_meta | dict(meta)
            self._cols[i] = GridCol(c.name, new_meta)
            break

    if not col_found:
        raise ValueError(f"Column not found: {col_name}")

    return self
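
`set_col_meta` replaces the list entry rather than mutating it because `GridCol` is a frozen dataclass. The sketch below shows the same idea using only the standard library. `Col` is a hypothetical stand-in with the same fields as `GridCol`, and `dataclasses.replace` is used here in place of the direct constructor call in the source above.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Any, Optional


# Hypothetical stand-in with the same fields as GridCol (not Phable's class).
@dataclass(frozen=True)
class Col:
    name: str
    meta: Optional[dict[str, Any]] = None


cols = [Col("temp", {"unit": "°F"}), Col("id")]

# Assigning to a field of a frozen instance raises FrozenInstanceError.
in_place_update_failed = False
try:
    cols[0].meta = {"unit": "°C"}
except FrozenInstanceError:
    in_place_update_failed = True

# Instead, build a replacement instance that carries the merged metadata.
for i, c in enumerate(cols):
    if c.name == "id":
        # `c.meta or {}` covers columns created without metadata (meta is None).
        cols[i] = replace(c, meta=(c.meta or {}) | {"dis": "Point ID"})
        break
```

The `or {}` fallback matters for columns such as `id` here, whose `meta` starts out as `None`.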

set_meta

set_meta(meta)

Set or update grid-level metadata.

Parameters:

Name	Type	Description	Default
`meta`	`typing.Mapping[str, typing.Any]`	Metadata dictionary to merge with existing grid metadata.	required

Returns:

Type	Description
`typing.Self`	Self for method chaining.

Source code in src/phable/grid_builder.py

def set_meta(self, meta: Mapping[str, Any]) -> Self:
    """Set or update grid-level metadata.

    Parameters:
        meta: Metadata dictionary to merge with existing grid metadata.

    Returns:
        Self for method chaining.
    """
    self._meta = self._meta | dict(meta)
    return self
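
The merge in `set_meta` is a plain dict union, so keys from the new mapping win over existing ones and untouched keys are kept. A minimal sketch of those semantics with ordinary dicts (the tag values are made up for illustration):

```python
# Starting point, matching the default set in GridBuilder.__init__.
meta = {"ver": "3.0"}

# New keys are added and the existing "ver" entry is kept.
meta = meta | {"dis": "Site energy"}

# A later call with the same key overwrites the earlier value.
meta = meta | {"dis": "Site energy (kWh)"}

# The default "ver" entry is not protected: passing it again replaces it.
overridden = meta | {"ver": "2.0"}
```

Because nothing guards the default `ver` entry, callers who pass their own `ver` key will replace it.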

DateRange `dataclass`

DateRange data type, defined by Phable, describes a time range using dates.

Note: Project Haystack does not define a kind for DateRange.

Parameters:

Name	Type	Description	Default
`start`	`datetime.date`	Midnight of the start date (inclusive) for the range.	required
`end`	`datetime.date`	Midnight of the end date (exclusive) for the range.	required

Source code in src/phable/kinds.py

@dataclass(frozen=True, slots=True)
class DateRange:
    """`DateRange` data type, defined by `Phable`, describes a time range using dates.

    **Note:** Project Haystack does not define a kind for `DateRange`.

    Parameters:
        start: Midnight of the start date (inclusive) for the range.
        end: Midnight of the end date (exclusive) for the range.
    """

    start: date
    end: date

    def __str__(self):
        return self.start.isoformat() + "," + self.end.isoformat()
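
The string form produced by `__str__` is two ISO 8601 dates joined by a comma. The same expression can be tried with the standard library alone (the dates are arbitrary examples):

```python
from datetime import date

start = date(2024, 11, 22)
end = date(2024, 11, 23)

# Same expression as DateRange.__str__: ISO 8601 dates joined by a comma.
range_str = start.isoformat() + "," + end.isoformat()
print(range_str)  # 2024-11-22,2024-11-23
```

Since `end` is exclusive, this range covers only 22 November 2024.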

DateTimeRange `dataclass`

DateTimeRange data type, defined by Phable, describes a time range using date, time, and timezone information.

datetime objects used for start and end must be timezone aware using ZoneInfo as a concrete implementation of the datetime.tzinfo abstract base class.

Example:

from datetime import datetime
from zoneinfo import ZoneInfo

from phable.kinds import DateTimeRange

tzinfo = ZoneInfo("America/New_York")
start = datetime(2024, 11, 22, 8, 19, 0, tzinfo=tzinfo)
end = datetime(2024, 11, 22, 9, 19, 0, tzinfo=tzinfo)

range_with_end = DateTimeRange(start, end)
range_without_end = DateTimeRange(start)

Note: Project Haystack does not define a kind for DateTimeRange.

Parameters:

Name	Type	Description	Default
`start`	`datetime.datetime`	Start timestamp (inclusive) which is timezone aware using `ZoneInfo`.	required
`end`	`datetime.datetime | None`	Optional end timestamp (exclusive) which is timezone aware using `ZoneInfo`. If end is undefined, then assume end to be when the last data value was recorded.	`None`

Source code in src/phable/kinds.py

@dataclass(frozen=True, slots=True)
class DateTimeRange:
    """`DateTimeRange` data type, defined by `Phable`, describes a time range using
    date, time, and timezone information.

    `datetime` objects used for `start` and `end` must be timezone aware using
    `ZoneInfo` as a concrete implementation of the `datetime.tzinfo` abstract base
    class.

    **Example:**

    ```python
    from datetime import datetime
    from zoneinfo import ZoneInfo

    from phable.kinds import DateTimeRange

    tzinfo = ZoneInfo("America/New_York")
    start = datetime(2024, 11, 22, 8, 19, 0, tzinfo=tzinfo)
    end = datetime(2024, 11, 22, 9, 19, 0, tzinfo=tzinfo)

    range_with_end = DateTimeRange(start, end)
    range_without_end = DateTimeRange(start)
    ```

    **Note:** Project Haystack does not define a kind for `DateTimeRange`.

    Parameters:
        start: Start timestamp (inclusive) which is timezone aware using `ZoneInfo`.
        end:
            Optional end timestamp (exclusive) which is timezone aware using
            `ZoneInfo`. If end is undefined, then assume end to be when the last data
            value was recorded.
    """

    start: datetime
    end: datetime | None = None

    def __str__(self):
        if self.end is None:
            return _to_haystack_datetime(self.start)
        else:
            return (
                _to_haystack_datetime(self.start)
                + ","
                + _to_haystack_datetime(self.end)
            )

    def __post_init__(self):
        start_ok = isinstance(self.start.tzinfo, ZoneInfo)
        end_ok = self.end is None

        if isinstance(self.end, datetime):
            end_ok = isinstance(self.end.tzinfo, ZoneInfo)

        if start_ok is False or end_ok is False:
            raise ValueError
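
The validation in `__post_init__` is stricter than "timezone aware": it requires `tzinfo` to be a `ZoneInfo` instance. The sketch below reproduces that check with a hypothetical `has_zoneinfo` helper, assuming the IANA time zone database is available on the system.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def has_zoneinfo(dt: datetime) -> bool:
    """Mirror the isinstance check used in DateTimeRange.__post_init__."""
    return isinstance(dt.tzinfo, ZoneInfo)


naive = datetime(2024, 11, 22, 8, 19, 0)
fixed_offset = datetime(2024, 11, 22, 8, 19, 0, tzinfo=timezone(timedelta(hours=-5)))
zoned = datetime(2024, 11, 22, 8, 19, 0, tzinfo=ZoneInfo("America/New_York"))

print(has_zoneinfo(naive))         # False
print(has_zoneinfo(fixed_offset))  # False: timezone aware, but not a ZoneInfo
print(has_zoneinfo(zoned))         # True
```

A `datetime` built with `datetime.timezone` would therefore cause `DateTimeRange` to raise `ValueError`, even though it is timezone aware.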
