Gait Sequence Iterator#

As part of most pipelines, we need to iterate over the gait sequences to apply all further algorithms to them individually. This can be a bit cumbersome, as we need to iterate over the data and aggregate the results at the same time. Hence, we provide some helpers for that.

We provide two ways of iterating. The first one only handles the iteration and does not aggregate the results. The second approach also supports you in aggregating the results.

Getting Some Example Data#

import numpy as np
import pandas as pd
from mobgap.data import LabExampleDataset

lab_example_data = LabExampleDataset(reference_system="INDIP")
long_trial = lab_example_data.get_subset(
    cohort="MS", participant_id="001", test="Test11", trial="Trial1"
)
long_trial_gs = long_trial.reference_parameters_.wb_list

long_trial_gs
/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/data/_mobilised_matlab_loader.py:1082: UserWarning: There were multiple ICs with the same index value, but different LR labels. This is likely an issue with the reference system you should further investigate. For now, we set the `lr_label` of the stride corresponding to this IC to Nan. However, both values still remain in the IC list.
  return parse_reference_parameters(
       start    end  n_strides  duration_s   length_m  avg_walking_speed_mps  avg_cadence_spm  avg_stride_length_m  termination_reason
wb_id
0       1019   1768          9        7.48   4.468932               0.847668       107.795850             0.942678               Pause
1       4534   5549         11       10.14   2.900453               0.365176        93.396106             0.483923               Pause
2       9665  10569          9        9.03   2.140232               0.294058        75.981133             0.506458               Pause
3      12337  14633         28       22.95  11.201110               0.634425        92.337768             0.803933               Pause
4      20151  20982         11        8.30   2.390709               0.371746        87.915774             0.507484               Pause
5      21378  22129          9        7.50   2.517558               0.492965        95.365740             0.599360               Pause


Simple Functional Interface#

We provide the iter_gs function to iterate over the gait sequences. It takes the data and the gait sequence list and cuts the data accordingly. In each iteration, the function yields the gait sequence information as a tuple (i.e. the “row” of the gs dataframe as a namedtuple) and the corresponding data. Note that the index of the data is not changed. Hence, we recommend using iloc to access the data (iloc[0] will return the first sample of the gait sequence).
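To make the "cut and yield" idea concrete, here is a minimal sketch in plain Python. It is not mobgap's implementation: Region is a hypothetical stand-in built with collections.namedtuple (with the fields id, start, end, and id_origin), iter_regions_sketch is an illustrative helper, and a list of integers stands in for the sensor data.

```python
from collections import namedtuple

# Hypothetical stand-in for the per-GS tuple (not mobgap's own class).
Region = namedtuple("Region", ["id", "start", "end", "id_origin"])


def iter_regions_sketch(data, regions):
    """Yield each region together with the slice of data it covers."""
    for region in regions:
        yield region, data[region.start : region.end]


samples = list(range(3000))  # dummy "recording": each value equals its index
regions = [Region(0, 1019, 1768, "wb_id")]  # first row of the table above

for region, chunk in iter_regions_sketch(samples, regions):
    # Positional unpacking follows the field order ...
    gs_id, start, end, id_origin = region
    # ... and named access returns the same values.
    assert (gs_id, start, end) == (region.id, region.start, region.end)
    n_samples = len(chunk)
    first_sample = chunk[0]
```

Because the slice is taken from the original data, the first element of the chunk is sample 1019 of the recording, which is the plain-Python analogue of the unchanged dataframe index.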

Using our example data and gs, we can iterate over the data as follows:

from mobgap.pipeline import iter_gs

for gs, data in iter_gs(long_trial.data_ss, long_trial_gs):
    # Note that the key to access the id is called "wb_id" here, as we loaded the WB from the reference system.
    # If these were "actual" gait sequences, as calculated by one of the GSD algorithms, the key would be "gs_id".
    print("Gait Sequence: ", gs)
    print("Expected N-samples in gs: ", gs.end - gs.start)
    print("N-samples in gs: ", len(data))
    print("First sample of gs:\n", data.iloc[0], end="\n\n")
Gait Sequence:  Region(id=0, start=1019, end=1768, id_origin='wb_id')
Expected N-samples in gs:  749
N-samples in gs:  749
First sample of gs:
 acc_x    10.746760
acc_y     0.390074
acc_z    -2.088171
gyr_x    13.226900
gyr_y    -4.914900
gyr_z    19.874400
Name: 2020-10-30 12:53:33.211999893+00:00, dtype: float64

Gait Sequence:  Region(id=1, start=4534, end=5549, id_origin='wb_id')
Expected N-samples in gs:  1015
N-samples in gs:  1015
First sample of gs:
 acc_x     9.993886
acc_y    -0.864273
acc_z    -0.297190
gyr_x     3.193700
gyr_y    12.861200
gyr_z    -5.371300
Name: 2020-10-30 12:54:08.361999989+00:00, dtype: float64

Gait Sequence:  Region(id=2, start=9665, end=10569, id_origin='wb_id')
Expected N-samples in gs:  904
N-samples in gs:  904
First sample of gs:
 acc_x    8.889232
acc_y   -0.136809
acc_z   -3.005737
gyr_x   -7.543100
gyr_y   -5.688700
gyr_z    0.470900
Name: 2020-10-30 12:54:59.671999931+00:00, dtype: float64

Gait Sequence:  Region(id=3, start=12337, end=14633, id_origin='wb_id')
Expected N-samples in gs:  2296
N-samples in gs:  2296
First sample of gs:
 acc_x    10.232635
acc_y    -1.686375
acc_z    -0.285600
gyr_x    84.994600
gyr_y   -17.645800
gyr_z   -24.538000
Name: 2020-10-30 12:55:26.391999960+00:00, dtype: float64

Gait Sequence:  Region(id=4, start=20151, end=20982, id_origin='wb_id')
Expected N-samples in gs:  831
N-samples in gs:  831
First sample of gs:
 acc_x      9.132286
acc_y     -1.796825
acc_z      0.479311
gyr_x   -137.510300
gyr_y     16.480800
gyr_z    -21.008900
Name: 2020-10-30 12:56:44.532000064+00:00, dtype: float64

Gait Sequence:  Region(id=5, start=21378, end=22129, id_origin='wb_id')
Expected N-samples in gs:  751
N-samples in gs:  751
First sample of gs:
 acc_x     9.891845
acc_y    -3.079999
acc_z    -1.069887
gyr_x    56.595700
gyr_y    -5.242100
gyr_z   -16.025800
Name: 2020-10-30 12:56:56.802000046+00:00, dtype: float64

Note

The gs named tuples returned by the iterator are of type Region. They contain the fields id, start, and end, in this order, followed by id_origin (visible in the output above). When using named access, the id field corresponds to either the gs_id or the wb_id of the input dataframe, depending on what type of list was provided; id_origin records which of the two it was.

You can see that this way it is pretty easy to iterate over the data. However, if you are planning to run calculations on the data, you need to aggregate the results yourself. If you want to collect multiple pieces of results, this can become cumbersome. See the tpcp example for more information about this. Therefore, we also provide an iterator class based on TypedIterator.

Class based Interface#

Note

Learn more about the general approach of using TypedIterator classes in this tpcp example.

Compared to the functional interface, the class interface also attempts to solve the problem of collecting and aggregating the results that you produce per GS. In a typical pipeline you might want to calculate the initial contacts, cadence, stride length, and gait speed for each gait sequence. With the class based interface, you can easily collect all of these results and then aggregate them into one predefined data structure.

The class based interface can be used in two ways: first, in the “default” configuration, which is set up to work with the calculations and results that you would expect from a typical processing pipeline; and second, in a custom way, where you define the expected “results” per iteration yourself.

The simple case#

The simple case requires basically no more setup than the functional interface. However, it assumes that your results are a subset of initial contacts, cadence, stride length, and gait speed, and that all of them are stored in the expected mobgap datatypes (i.e. pandas dataframes). The iterator will then automatically aggregate the per-iteration dataframes into one combined dataframe, handling the sample offsets of the gait sequences for you.

Below we will show how this works, by “simulating” the calculation of some initial contacts and cadence.

We start by setting up an iterator object. We can leave everything at the default values, as we do not need any custom aggregation functions.

from mobgap.pipeline import GsIterator

iterator = GsIterator()
dt = iterator.data_type

The default result datatype per iteration is defined as follows:

import inspect

from IPython.core.display_functions import display

display(inspect.getsource(dt))
@dataclass
class FullPipelinePerGsResult:
    """Default expected result type for the gait-sequence iterator.

    When using the :class:`~mobgap.pipeline.GsIterator` with the default configuration, an instance of this dataclass
    will be created for each gait-sequence.

    Each value is expected to be a dataframe.

    Attributes
    ----------
    ic_list
        The initial contacts for each gait-sequence.
        This is a dataframe with a column called ``ic``.
        The values of this ic-column are expected to be samples relative to the start of the gait-sequence.
    turn_list
        The turn list for each gait-sequence.
        The dataframe has at least columns called ``start`` and ``end``.
        The values of these columns are expected to be samples relative to the start of the gait-sequence.
    cadence_per_sec
        The cadence values within each gait-sequence.
        This dataframe has no further requirements relevant for the iterator.
    stride_length_per_sec
        The stride length values within each gait-sequence.
        This dataframe has no further requirements relevant for the iterator.
    walking_speed_per_sec
        The gait speed values within each gait-sequence.
        This dataframe has no further requirements relevant for the iterator.

    """

    ic_list: pd.DataFrame
    turn_list: pd.DataFrame
    cadence_per_sec: pd.DataFrame
    stride_length_per_sec: pd.DataFrame
    walking_speed_per_sec: pd.DataFrame

This means you are only allowed to use the available attributes. However, you don’t need to set all of them. Below we will only “calculate” the initial contacts and cadence.

In each iteration the iterator will give us a tuple of the gait sequence information, the data for the iteration, and a new empty result object.

from mobgap.utils.conversions import as_samples

for (gs, data), result in iterator.iterate(long_trial.data_ss, long_trial_gs):
    # Now we can just "calculate" the initial contacts and set it on the result object.
    result.ic_list = pd.DataFrame(
        np.arange(0, len(data), 100, dtype="int64"), columns=["ic"]
    ).rename_axis(index="step_id")
    # For cadence, we just set a dummy value to the wb_id for each 1 second bout of the data.
    n_seconds = int(len(data) // long_trial.sampling_rate_hz)
    result.cadence_per_sec = pd.DataFrame(
        [gs.id] * n_seconds,
        columns=["cadence_spm"],
        index=as_samples(
            np.arange(0, n_seconds) + 0.5, long_trial.sampling_rate_hz
        ),
    ).rename_axis(index="sec_center_samples")

After the iteration, we can access the aggregated results using the results_ property of the iterator. For the initial contacts, this is iterator.results_.ic_list:

                  ic
wb_id step_id
0     0         1019
      1         1119
      2         1219
      3         1319
      4         1419
...   ...        ...
5     3        21678
      4        21778
      5        21878
      6        21978
      7        22078

69 rows × 1 columns



We can see that we only get a single dataframe with all the results. And all ICs are offset, so that they are relative to the start of the recording and not the start of the gait sequence anymore.
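The offset step can be reproduced by hand. The following sketch uses plain Python lists instead of dataframes and a hypothetical helper offset_to_recording; it only illustrates the arithmetic and is not how the iterator is implemented. The start and end values are taken from the gait sequence table above.

```python
# (start, end) of two gait sequences from the reference table above.
gait_sequences = {0: (1019, 1768), 5: (21378, 22129)}


def offset_to_recording(gs_start, local_samples):
    """Shift samples that are relative to a GS start to the recording start."""
    return [gs_start + s for s in local_samples]


global_ics = {}
for gs_id, (start, end) in gait_sequences.items():
    # Same dummy ICs as in the example: one every 100 samples within the GS.
    local_ics = list(range(0, end - start, 100))
    global_ics[gs_id] = offset_to_recording(start, local_ics)
```

For wb_id 0 this gives 1019, 1119, 1219, and so on, and the last IC of wb_id 5 ends up at 22078, matching the first and last rows of the aggregated dataframe.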

For the cadence value, the index represents the sample of the center of the second the cadence value belongs to. This value was originally relative to the start of the GS. We can see that in the aggregated results this is transformed back to be relative to the start of the recording.

                          cadence_spm
wb_id sec_center_samples
0     1069                          0
      1169                          0
      1269                          0
      1369                          0
      1469                          0
...   ...                         ...
5     21628                         5
      21728                         5
      21828                         5
      21928                         5
      22028                         5

63 rows × 1 columns
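The index values of the cadence output can be checked with a few lines of arithmetic. This sketch assumes a sampling rate of 100 Hz (inferred from the 10 ms spacing of the timestamps in the printed data) and that converting seconds to samples amounts to multiplying by the sampling rate; second_centers_in_samples is a hypothetical helper, not part of mobgap.

```python
SAMPLING_RATE_HZ = 100  # assumption: inferred from the 10 ms timestamp spacing


def second_centers_in_samples(n_samples, sampling_rate_hz):
    """Return the center of every full second, in samples relative to the GS start."""
    n_seconds = n_samples // sampling_rate_hz
    return [int((sec + 0.5) * sampling_rate_hz) for sec in range(n_seconds)]


gs_start, gs_end = 1019, 1768  # first gait sequence from the reference table
local_centers = second_centers_in_samples(gs_end - gs_start, SAMPLING_RATE_HZ)
# The aggregation shifts the index by the start of the gait sequence.
global_centers = [gs_start + c for c in local_centers]
```

The local centers are 50, 150, 250, and so on; adding the GS start of 1019 yields 1069, 1169, 1269, which are the index values shown for wb_id 0.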



But what if you don’t want to use the default result datatype?

Custom Results#

This requires a little bit more setup. First we need to decide what results we expect. This is done by defining a dataclass that represents the results.

Here we create a new dataclass that only expects two dummy results, but you can add as many as you want. You could also subclass the default dataclass and just add the additional results.

The first result here is n_samples, which is just a dummy result indicating the number of samples the data has. The second result is filtered_data (we will just add some dummy data here). This is expected to be a pd.DataFrame to demonstrate that you can also return more complex results.

from dataclasses import dataclass


@dataclass
class ResultType:
    n_samples: int
    filtered_data: pd.DataFrame

For each iteration (i.e. for each gait sequence), we will create one instance of this dataclass. The list of these instances will be available as the raw_results_ attribute of the iterator.

We can also decide to aggregate the results. We provide some default aggregation functions (see GsIterator.DEFAULT_AGGREGATORS) that you could use. However, here we will create our own aggregation function.

It might be nice to turn the n_samples into a pandas series with the gs identifier as index. For this we define an aggregation function that expects the list of TypedIteratorResultTuple. These are named tuples of the following shape:

class TypedIteratorResultTuple(NamedTuple, Generic[InputTypeT, ResultT]):
    iteration_name: str
    input: InputTypeT
    result: ResultT
    iteration_context: dict[str, Any]

The types of the input and the result depend on the dataclass you defined and the iterator you use. For the gait sequence iterator, the input type will be tuple[Region, pd.DataFrame] and the result type will be the dataclass you defined. The other fields provide additional context that might be needed in advanced cases (see lower down in this example).

To simplify typing of functions that use these types, we provide GsIterator.IteratorResult which already has the input type bound and is generic with respect to the output type. We can see in the function below how to use it.

As mentioned, an aggregation function will get a list of these named tuples. Note that each tuple carries the entire result object, and that parts of the result object might be NOT_SET. To filter out the NOT_SET values and replace the result attribute with just one specific value, we provide the GsIterator.filter_iterator_results function (see below).

With that, our aggregate function takes the gs-id from the inputs and the n_samples from the results and creates a pandas series with the gs-id as index and the n_samples as values.

def aggregate_n_samples(values: list[GsIterator.IteratorResult[ResultType]]):
    non_null_results: list[GsIterator.IteratorResult[int]] = (
        GsIterator.filter_iterator_results(values, "n_samples")
    )
    results = {r.input[0].id: r.result for r in non_null_results}
    return pd.Series(results, name="N-Samples")


aggregations = [("n_samples", aggregate_n_samples)]
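To illustrate what the filtering step does conceptually, here is a simplified sketch in plain Python. All names in it (the NOT_SET sentinel, ResultTupleSketch, filter_results_sketch) are hypothetical stand-ins written for this illustration; the real helper may behave differently in its details.

```python
from types import SimpleNamespace
from typing import Any, NamedTuple


class _NotSet:
    """Hypothetical sentinel marking a result attribute that was never assigned."""

    def __repr__(self):
        return "NOT_SET"


NOT_SET = _NotSet()


class ResultTupleSketch(NamedTuple):
    """Simplified stand-in for the result tuples described above."""

    input: Any
    result: Any


def filter_results_sketch(values, attribute):
    """Keep tuples whose attribute was set and replace result with that value."""
    filtered = []
    for v in values:
        value = getattr(v.result, attribute)
        if value is not NOT_SET:
            filtered.append(v._replace(result=value))
    return filtered


raw = [
    ResultTupleSketch((0, "data"), SimpleNamespace(n_samples=749)),
    ResultTupleSketch((1, "data"), SimpleNamespace(n_samples=NOT_SET)),
]
# Only the first entry survives, and its result is now the bare integer.
n_samples_by_id = {
    r.input[0]: r.result for r in filter_results_sketch(raw, "n_samples")
}
```

The aggregation function then only has to deal with the values that were actually set, which is why the dictionary comprehension in aggregate_n_samples can stay this short.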

Now we can create an instance of the iterator. Note that if we want the result type to be inferred correctly, we need to use the somewhat weird square-bracket typing syntax when creating the iterator. This allows your editor to autocomplete the attributes of the result type.

custom_iterator = GsIterator[ResultType](ResultType, aggregations=aggregations)

Iterating over the iterator now provides us with the row from the gait sequence list (which we ignore here), the data for each iteration, and the empty result object that we can fill in each iteration.

for (_, data), custom_result in custom_iterator.iterate(
    long_trial.data_ss, long_trial_gs
):
    # We just calculate the length, but you can imagine any other calculation here.
    # Then we just set the result.
    custom_result.n_samples = len(data)
    # For the "filtered" data we just subtract 1 from the input
    custom_result.filtered_data = data - 1

Then we can easily inspect the aggregated results, for example custom_iterator.results_.n_samples. Note that while the typing system can correctly infer the available attributes of the result object, the types of the attributes might be wrong, as Python cannot infer them from the aggregations. We have to explicitly cast the value if we care about type correctness.

0     749
1    1015
2     904
3    2296
4     831
5     751
Name: N-Samples, dtype: int64
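The explicit cast mentioned above can be done with typing.cast from the standard library. The sketch below uses a hypothetical ResultSketch dataclass and a plain dictionary in place of the pandas series, so it only demonstrates the typing pattern, not the iterator itself.

```python
from dataclasses import dataclass
from typing import cast


@dataclass
class ResultSketch:
    # Declared per-iteration type, as in the dataclass above.
    n_samples: int


# Hypothetical aggregated results: the attribute now holds a mapping, not an int.
aggregated = ResultSketch(n_samples={0: 749, 1: 1015})  # type: ignore[arg-type]

# cast() does nothing at runtime; it only informs static type checkers.
n_samples_by_gs = cast(dict[int, int], aggregated.n_samples)
longest_gs = max(n_samples_by_gs, key=n_samples_by_gs.get)
```

Since cast returns its argument unchanged, this has no runtime cost; it only documents what the aggregation actually produced.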

For the filtered data, we did not apply any aggregation and hence just get a list of all results.

[                                        acc_x     acc_y  ...    gyr_y    gyr_z
time                                                     ...
2020-10-30 12:53:33.211999893+00:00  9.746760 -0.609926  ...  -5.9149  18.8744
2020-10-30 12:53:33.221999884+00:00  9.212541 -1.843216  ... -16.0051  24.6507
2020-10-30 12:53:33.232000113+00:00  8.315173 -2.915390  ... -21.3116  26.9805
2020-10-30 12:53:33.242000103+00:00  7.357514 -3.853508  ... -22.8495  25.4291
2020-10-30 12:53:33.252000092+00:00  6.386339 -4.408384  ... -21.4698  21.1185
...                                       ...       ...  ...      ...      ...
2020-10-30 12:53:40.651999950+00:00  7.841718 -3.162747  ...   3.7649 -10.4524
2020-10-30 12:53:40.661999941+00:00  7.816116 -3.056435  ...   5.7176 -11.1201
2020-10-30 12:53:40.671999931+00:00  7.897385 -2.964248  ...   6.3879 -11.5961
2020-10-30 12:53:40.681999922+00:00  7.913504 -2.904070  ...   5.8304 -12.1774
2020-10-30 12:53:40.691999912+00:00  8.062056 -2.800103  ...   6.3839 -12.1929

[749 rows x 6 columns],                                         acc_x     acc_y  ...    gyr_y    gyr_z
time                                                     ...
2020-10-30 12:54:08.361999989+00:00  8.993886 -1.864273  ...  11.8612  -6.3713
2020-10-30 12:54:08.371999979+00:00  9.006853 -1.821367  ...  11.6298  -4.6801
2020-10-30 12:54:08.381999969+00:00  9.066191 -1.717037  ...  11.0973  -3.1574
2020-10-30 12:54:08.391999960+00:00  9.389386 -1.495249  ...   9.9199  -0.4397
2020-10-30 12:54:08.401999950+00:00  9.781203 -1.396804  ...   7.7379   3.6206
...                                       ...       ...  ...      ...      ...
2020-10-30 12:54:18.461999893+00:00  7.044788  0.514322  ... -27.7005  35.8898
2020-10-30 12:54:18.471999884+00:00  6.966694  0.439490  ... -26.9190  37.7245
2020-10-30 12:54:18.482000113+00:00  6.716884  0.505251  ... -26.5421  38.8936
2020-10-30 12:54:18.492000103+00:00  6.448211  0.742959  ... -26.5874  39.8177
2020-10-30 12:54:18.502000093+00:00  6.191566  0.939965  ... -26.6665  40.8314

[1015 rows x 6 columns],                                         acc_x     acc_y  ...    gyr_y   gyr_z
time                                                     ...
2020-10-30 12:54:59.671999931+00:00  7.889232 -1.136809  ...  -6.6887 -0.5291
2020-10-30 12:54:59.681999922+00:00  7.987170 -1.144480  ...  -7.2629 -0.7600
2020-10-30 12:54:59.691999912+00:00  8.074680 -1.182571  ...  -7.6096 -1.1997
2020-10-30 12:54:59.701999903+00:00  8.150059 -1.142325  ...  -7.9950 -1.5970
2020-10-30 12:54:59.711999893+00:00  8.128522 -1.144701  ...  -8.3241 -1.6867
...                                       ...       ...  ...      ...     ...
2020-10-30 12:55:08.661999941+00:00  8.133023 -2.345900  ...  16.1862 -7.2686
2020-10-30 12:55:08.671999931+00:00  8.149110 -2.267799  ...  17.1322 -8.0022
2020-10-30 12:55:08.681999922+00:00  8.179423 -2.251566  ...  17.5279 -8.3387
2020-10-30 12:55:08.691999912+00:00  8.302335 -2.481964  ...  18.2932 -8.8593
2020-10-30 12:55:08.701999903+00:00  8.430995 -2.731003  ...  18.5796 -7.7286

[904 rows x 6 columns],                                         acc_x     acc_y  ...    gyr_y    gyr_z
time                                                     ...
2020-10-30 12:55:26.391999960+00:00  9.232635 -2.686375  ... -18.6458 -25.5380
2020-10-30 12:55:26.401999950+00:00  9.363166 -2.469547  ... -20.7625 -24.8423
2020-10-30 12:55:26.411999941+00:00  9.262668 -2.263549  ... -18.0376 -24.7323
2020-10-30 12:55:26.421999931+00:00  8.993393 -2.105027  ... -10.3876 -24.2681
2020-10-30 12:55:26.431999922+00:00  8.917060 -1.906179  ...  -1.9655 -23.3928
...                                       ...       ...  ...      ...      ...
2020-10-30 12:55:49.302000046+00:00  7.714016  0.326042  ...  15.9933   7.4403
2020-10-30 12:55:49.312000036+00:00  7.725419  0.574019  ...  15.3581   5.5792
2020-10-30 12:55:49.322000027+00:00  7.813756  0.665189  ...  15.8199   4.5902
2020-10-30 12:55:49.332000017+00:00  8.071623  0.308971  ...  17.9715   3.6861
2020-10-30 12:55:49.342000008+00:00  8.267141 -0.338464  ...  20.9984   3.2169

[2296 rows x 6 columns],                                         acc_x     acc_y  ...    gyr_y    gyr_z
time                                                     ...
2020-10-30 12:56:44.532000064+00:00  8.132286 -2.796825  ...  15.4808 -22.0089
2020-10-30 12:56:44.542000055+00:00  8.472326 -2.708309  ...  16.3716 -23.4909
2020-10-30 12:56:44.552000046+00:00  8.850238 -2.523199  ...  15.7210 -24.0634
2020-10-30 12:56:44.562000036+00:00  9.348547 -2.173151  ...  14.0635 -24.1058
2020-10-30 12:56:44.572000027+00:00  9.730036 -1.798863  ...  12.3673 -23.7012
...                                       ...       ...  ...      ...      ...
2020-10-30 12:56:52.792000055+00:00  8.033643 -2.490667  ...  13.5385   1.7647
2020-10-30 12:56:52.802000046+00:00  8.542713 -2.340519  ...  11.4386   6.4245
2020-10-30 12:56:52.812000036+00:00  8.818854 -2.348388  ...  12.2948  10.9618
2020-10-30 12:56:52.822000027+00:00  9.123138 -2.335998  ...  15.8081  14.4861
2020-10-30 12:56:52.832000017+00:00  9.461653 -2.308538  ...  17.4874  17.1938

[831 rows x 6 columns],                                         acc_x     acc_y  ...    gyr_y    gyr_z
time                                                     ...
2020-10-30 12:56:56.802000046+00:00  8.891845 -4.079999  ...  -6.2421 -17.0258
2020-10-30 12:56:56.812000036+00:00  8.841221 -4.293471  ... -12.8417 -14.2214
2020-10-30 12:56:56.822000027+00:00  8.438049 -4.353188  ... -10.4114 -12.7980
2020-10-30 12:56:56.832000017+00:00  7.941118 -4.371653  ...  -0.0473 -10.9684
2020-10-30 12:56:56.842000008+00:00  7.879858 -4.434756  ...  11.1010  -7.4715
...                                       ...       ...  ...      ...      ...
2020-10-30 12:57:04.262000084+00:00  7.598598  1.484613  ...  -2.9975  -8.6763
2020-10-30 12:57:04.272000074+00:00  8.016810  1.889539  ...  -1.3660  -9.6691
2020-10-30 12:57:04.282000065+00:00  8.434351  2.205847  ...  -0.3714  -8.8523
2020-10-30 12:57:04.292000055+00:00  8.746053  2.303438  ...  -2.6950  -7.8553
2020-10-30 12:57:04.302000046+00:00  9.100351  2.272933  ... -10.0859  -6.3404

[751 rows x 6 columns]]

Sub-Iterations (Advanced)#

Using the iterator to iterate GSs or other types of regions of interest works well if all of them are defined at the start of the processing. However, sometimes you might want to iterate over sub-regions of the gait sequences, where the regions are only calculated during the iteration. In this case, you would need to create multiple instances of the iterator. This is cumbersome and redundant, as both iterators share a lot of information. Hence, we support this special case with the iterate_subregions method. It takes a gait-sequence list as input that is defined relative to the gait sequence that is currently processed. It then iterates over the sub-regions, provides new result objects for each sub-region, and then magically aggregates everything after the main iteration ends.

Note

There is one use case that we do not support at the moment: accessing the results of the sub-iterations in the outer loop. The results are only available after the main iteration ends. For this use case, you can create a new instance of your iterator within the outer loop instead of using iterate_subregions.

Below we show an “artificial” example, where we split each outer gs dynamically into 3 subparts. We then calculate the length of each subpart and detect some “fake” events.

As before, we start by defining a type for the results.

@dataclass
class CustomNestedResults:
    n_samples: int
    outer_regions: pd.DataFrame
    events: pd.DataFrame

And 3 aggregators:

  1. A df-aggregator that adjusts the index of the events to be relative to the start of the original data.

  2. A df-aggregator that adjusts the start/end of the outer_regions to be relative to the start of the original data.

  3. An aggregator that turns the n_samples into a pandas series with the gs identifier as index.

For the first two aggregators, we can just use the default aggregator for dataframes and tell it which columns to modify based on the start of the respective GS: the ev column for the events, and the start and end columns for the outer_regions.

events_agg = GsIterator.DefaultAggregators.create_aggregate_df("events", ["ev"])
outer_regions_agg = GsIterator.DefaultAggregators.create_aggregate_df(
    "outer_regions", ["start", "end"]
)

For the third, we will use a modified version of the aggregator we used before. The only difference is that we will make use of the iteration_context. In the case of a nested iteration, the context contains the parent GS under the key parent_region.

def aggregate_n_samples(values: list[GsIterator.IteratorResult[ResultType]]):
    non_null_results: list[GsIterator.IteratorResult[int]] = (
        GsIterator.filter_iterator_results(values, "n_samples")
    )
    results = [r.result for r in non_null_results]
    ids = [
        (r.iteration_context["parent_region"].id, r.input[0].id)
        for r in non_null_results
    ]
    index_col_names = [
        non_null_results[0].iteration_context["parent_region"].id_origin,
        non_null_results[0].input.region.id_origin,
    ]
    index = pd.MultiIndex.from_tuples(ids, names=index_col_names)
    return pd.Series(results, index=index, name="N-Samples")

Now we can define the iterator.

nested_iterator = GsIterator[CustomNestedResults](
    CustomNestedResults,
    aggregations=[
        ("n_samples", aggregate_n_samples),
        ("events", events_agg),
        ("outer_regions", outer_regions_agg),
    ],
)

When we loop the iterator, we will reuse the outer iteration as before, but then “simulate” an algorithm that identifies sub-regions within the gait sequence. Note, that we can write some results in the outer scope and some results in the inner scope.

for (_, data), r in nested_iterator.iterate(long_trial.data_ss, long_trial_gs):
    print(
        f"Length of outer data: {len(data)} samples. Divided by 3: {len(data) // 3} samples."
    )
    r.outer_regions = pd.DataFrame(
        {
            "start": [0, len(data) // 3, 2 * len(data) // 3],
            "end": [len(data) // 3, 2 * len(data) // 3, len(data)],
        }
    ).rename_axis("sub_roi_id")

    # Then we iterate over the sub-regions and calculate the length of each sub-region and identify fake events
    for (_, nested_data), nr in nested_iterator.iterate_subregions(
        r.outer_regions
    ):
        nr.n_samples = len(nested_data)
        nr.events = pd.DataFrame(
            {"ev": np.linspace(0, len(nested_data), 3, dtype="int64")}
        ).rename_axis("step_id")
Length of outer data: 749 samples. Divided by 3: 249 samples.
Length of outer data: 1015 samples. Divided by 3: 338 samples.
Length of outer data: 904 samples. Divided by 3: 301 samples.
Length of outer data: 2296 samples. Divided by 3: 765 samples.
Length of outer data: 831 samples. Divided by 3: 277 samples.
Length of outer data: 751 samples. Divided by 3: 250 samples.

After the iteration, we can access the aggregated results. Let’s start with the unspectacular outer_regions.

As we wrote them in the outer scope, iteration and aggregation worked just like before. We can see that the start and end values are now relative to the start of the recording and match the original gait sequences (see below).

                  start    end
wb_id sub_roi_id
0     0            1019   1268
      1            1268   1518
      2            1518   1768
1     0            4534   4872
      1            4872   5210
      2            5210   5549
2     0            9665   9966
      1            9966  10267
      2           10267  10569
3     0           12337  13102
      1           13102  13867
      2           13867  14633
4     0           20151  20428
      1           20428  20705
      2           20705  20982
5     0           21378  21628
      1           21628  21878
      2           21878  22129


For reference the outer GSs:

       start    end  n_strides  duration_s   length_m  avg_walking_speed_mps  avg_cadence_spm  avg_stride_length_m  termination_reason
wb_id
0       1019   1768          9        7.48   4.468932               0.847668       107.795850             0.942678               Pause
1       4534   5549         11       10.14   2.900453               0.365176        93.396106             0.483923               Pause
2       9665  10569          9        9.03   2.140232               0.294058        75.981133             0.506458               Pause
3      12337  14633         28       22.95  11.201110               0.634425        92.337768             0.803933               Pause
4      20151  20982         11        8.30   2.390709               0.371746        87.915774             0.507484               Pause
5      21378  22129          9        7.50   2.517558               0.492965        95.365740             0.599360               Pause
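The relation between the two tables can be verified by hand for the first gait sequence. The sketch below uses hypothetical helpers (split_into_thirds, to_recording_frame) and plain tuples; it mirrors the arithmetic of the example rather than the iterator's internals.

```python
def split_into_thirds(n_samples):
    """Relative (start, end) pairs, computed as in the loop above."""
    bounds = [0, n_samples // 3, 2 * n_samples // 3, n_samples]
    return list(zip(bounds[:-1], bounds[1:]))


def to_recording_frame(parent_start, regions):
    """Shift regions defined relative to a parent GS to recording coordinates."""
    return [(parent_start + s, parent_start + e) for s, e in regions]


parent_start, parent_end = 1019, 1768  # wb_id 0 in the table above
relative = split_into_thirds(parent_end - parent_start)
absolute = to_recording_frame(parent_start, relative)
```

The first sub-region starts exactly at the GS start (1019) and the last one ends exactly at the GS end (1768), which is why the aggregated outer_regions line up with the original gait sequences.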


We can see that our n_samples are now a multi-index series with both gs-levels as index. Each value is roughly one third of the outer length that we printed during iteration (see above), and the three values per gait sequence add up to it.

wb_id  sub_roi_id
0      0             249
       1             250
       2             250
1      0             338
       1             338
       2             339
2      0             301
       1             301
       2             302
3      0             765
       1             765
       2             766
4      0             277
       1             277
       2             277
5      0             250
       1             250
       2             251
Name: N-Samples, dtype: int64
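The role of the iteration context in this aggregation can be sketched without pandas. RecordSketch and the parent_region_id key are hypothetical simplifications made for this illustration; the values are the three entries for wb_id 0 from the output above.

```python
from typing import Any, NamedTuple


class RecordSketch(NamedTuple):
    """Simplified stand-in for one stored result of a sub-iteration."""

    region_id: int
    result: int
    iteration_context: dict[str, Any]


records = [
    RecordSketch(0, 249, {"parent_region_id": 0}),
    RecordSketch(1, 250, {"parent_region_id": 0}),
    RecordSketch(2, 250, {"parent_region_id": 0}),
]

# Key every value by (parent id, sub-region id), similar to the multi-index above.
n_samples = {
    (r.iteration_context["parent_region_id"], r.region_id): r.result
    for r in records
}

# The sub-region lengths add up to the length of the parent gait sequence.
total_for_parent_0 = sum(v for (parent, _), v in n_samples.items() if parent == 0)
```

The sum for parent 0 is 749, the length of the first outer gait sequence.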

The events are also a multi-index dataframe containing both gs-levels. All ev values are modified to be relative to the start of the recording.

                             ev
wb_id sub_roi_id step_id
0     0          0         1019
                 1         1143
                 2         1268
      1          0         1268
                 1         1393
                 2         1518
      2          0         1518
                 1         1643
                 2         1768
1     0          0         4534
                 1         4703
                 2         4872
      1          0         4872
                 1         5041
                 2         5210
      2          0         5210
                 1         5379
                 2         5549
2     0          0         9665
                 1         9815
                 2         9966
      1          0         9966
                 1        10116
                 2        10267
      2          0        10267
                 1        10418
                 2        10569
3     0          0        12337
                 1        12719
                 2        13102
      1          0        13102
                 1        13484
                 2        13867
      2          0        13867
                 1        14250
                 2        14633
4     0          0        20151
                 1        20289
                 2        20428
      1          0        20428
                 1        20566
                 2        20705
      2          0        20705
                 1        20843
                 2        20982
5     0          0        21378
                 1        21503
                 2        21628
      1          0        21628
                 1        21753
                 2        21878
      2          0        21878
                 1        22003
                 2        22129
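Because the events were written in the inner scope, two offsets are involved: the start of the sub-region within the parent GS and the start of the parent GS within the recording. The following sketch reproduces one group of values by hand; fake_events is a hypothetical pure-Python equivalent of the np.linspace call used in the loop.

```python
def fake_events(n_samples):
    """Start, midpoint, and end; mirrors np.linspace(0, n, 3, dtype="int64")."""
    return [0, n_samples // 2, n_samples]


parent_start = 1019  # wb_id 0, relative to the recording
sub_start, sub_end = 249, 499  # sub_roi_id 1, relative to the parent GS

# Events are relative to the sub-region, so both offsets have to be added.
events_abs = [
    parent_start + sub_start + ev for ev in fake_events(sub_end - sub_start)
]
```

This yields 1268, 1393, and 1518, the values listed for wb_id 0 and sub_roi_id 1.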


Single nested regions/aka refined GS (advanced)#

In some cases, you might want to iterate over a single sub-region of the gait sequence. While you could use the iterate_subregions method, this is a bit cumbersome and makes the code harder to read. For this we provide the with_subregion and the subregion methods, where the latter is syntactic sugar for the former. Both methods return the same output that you would get per iteration, but only once. Below is a short example of how this works. We start with the subregion version, as this is the recommended way to use it: we think it is easier to read, even though it might be a bit surprising that Python allows this.
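The potentially surprising part is that the target of a with statement can destructure nested tuples. The sketch below demonstrates this with a hypothetical context manager built on contextlib.contextmanager; subregion_sketch is purely illustrative and makes no claim about how the subregion method is implemented.

```python
from contextlib import contextmanager


@contextmanager
def subregion_sketch(data, start, end):
    """Hypothetical context manager yielding ((region, data), result)."""
    result = {}
    yield ((start, end), data[start:end]), result
    # Code placed here would run when the with-block exits.


samples = list(range(749))  # dummy data with the length of the first GS

# The "as" target unpacks the nested tuple, just like a for-loop target would.
with subregion_sketch(samples, 5, len(samples) - 5) as ((_, refined), result):
    result["n_samples"] = len(refined)
```

The indented block makes it visually clear which code belongs to the refined region, which is the readability argument made above.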

We are going to reuse most of the setup from the previous example.

flat_nested_iterator = GsIterator[CustomNestedResults](
    CustomNestedResults,
    aggregations=[
        ("n_samples", aggregate_n_samples),
        ("events", events_agg),
        ("outer_regions", outer_regions_agg),
    ],
)

But then we will use subregion to run some computations in the context of the refined GS. The return value of subregion acts as a context manager, which lets you visually encapsulate the code that is run in the context of the refined GS.

for (_, data), r in flat_nested_iterator.iterate(
    long_trial.data_ss, long_trial_gs
):
    r.outer_regions = pd.DataFrame(
        {
            "start": [5],
            "end": [len(data) - 5],
        }
    ).rename_axis("refined_gs_id")

    with flat_nested_iterator.subregion(r.outer_regions.iloc[[0]]) as (
        (_, refined_data),
        refined_result,
    ):
        refined_result.n_samples = len(refined_data)
        refined_result.events = pd.DataFrame(
            {"ev": np.linspace(0, len(refined_data), 3, dtype="int64")}
        ).rename_axis("step_id")

This is equivalent to the following code, using with_subregion:

for (_, data), r in flat_nested_iterator.iterate(
    long_trial.data_ss, long_trial_gs
):
    r.outer_regions = pd.DataFrame(
        {
            "start": [5],
            "end": [len(data) - 5],
        }
    ).rename_axis("refined_gs_id")

    (_, refined_data), refined_result = flat_nested_iterator.with_subregion(
        r.outer_regions.iloc[[0]]
    )
    refined_result.n_samples = len(refined_data)
    refined_result.events = pd.DataFrame(
        {"ev": np.linspace(0, len(refined_data), 3, dtype="int64")}
    ).rename_axis("step_id")

And in both cases everything is aggregated as expected.

                     start    end
wb_id refined_gs_id
0     0               1024   1763
1     0               4539   5544
2     0               9670  10564
3     0              12342  14628
4     0              20156  20977
5     0              21383  22124


wb_id  refined_gs_id
0      0                 739
1      0                1005
2      0                 894
3      0                2286
4      0                 821
5      0                 741
Name: N-Samples, dtype: int64
                                ev
wb_id refined_gs_id step_id
0     0             0         1024
                    1         1393
                    2         1763
1     0             0         4539
                    1         5041
                    2         5544
2     0             0         9670
                    1        10117
                    2        10564
3     0             0        12342
                    1        13485
                    2        14628
4     0             0        20156
                    1        20566
                    2        20977
5     0             0        21383
                    1        21753
                    2        22124


Nested Iterations - under the Hood#

These nested iterators are a little bit of black magic… If you are working with them, it helps to have some understanding of what is going on.

When a new item is yielded during iteration (in the outer or the inner loop), the iterator creates a new instance of the result object and internally stores it together with some metadata. This metadata includes an indicator of whether we are in the parent or the sub-iteration scope and, in the case of the sub-scope, the parent GS we are iterating.
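As a mental model, the bookkeeping could look roughly like the following sketch. It is a heavily simplified, hypothetical re-implementation in plain Python (BookkeepingSketch and StoredResultSketch are invented for this illustration, and regions are plain tuples); the actual iterator does considerably more.

```python
from typing import Any, NamedTuple


class StoredResultSketch(NamedTuple):
    iteration_name: str
    input: Any
    result: Any
    iteration_context: dict[str, Any]


class BookkeepingSketch:
    """Hypothetical, simplified model of the bookkeeping described above."""

    def __init__(self):
        self.raw_results = []
        self._parent = None

    def iterate(self, regions):
        for region in regions:
            self._parent = region  # remember which outer region is active
            result = {}
            self.raw_results.append(
                StoredResultSketch("__main__", region, result, {})
            )
            yield region, result
        self._parent = None

    def iterate_subregions(self, sub_regions):
        for sub in sub_regions:
            result = {}
            context = {"parent_region": self._parent}
            self.raw_results.append(
                StoredResultSketch("__sub_iter__", sub, result, context)
            )
            yield sub, result


book = BookkeepingSketch()
for region, r in book.iterate([(1019, 1768)]):
    r["outer"] = True
    for sub, nr in book.iterate_subregions([(0, 249), (249, 499)]):
        nr["n_samples"] = sub[1] - sub[0]

names = [entry.iteration_name for entry in book.raw_results]
```

Because every result object is stored in one flat list together with its scope and parent, the aggregation functions can later reconstruct the nesting without the loops having to return anything.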

We can see the stored information by inspecting raw_results_. We will do that for the nested iterator we used before. We will format them a little to make things easier to read.

from pprint import pprint

pprint(
    [
        v._replace(result="...", input=(v.input[0], "..."))
        for v in nested_iterator.raw_results_
    ]
)
[TypedIteratorResultTuple(iteration_name='__main__', input=(Region(id=0, start=1019, end=1768, id_origin='wb_id'), '...'), result='...', iteration_context={}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=0, start=0, end=249, id_origin='sub_roi_id'), '...'), result='...', iteration_context={'parent_region': Region(id=0, start=1019, end=1768, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=1, start=249, end=499, id_origin='sub_roi_id'), '...'), result='...', iteration_context={'parent_region': Region(id=0, start=1019, end=1768, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=2, start=499, end=749, id_origin='sub_roi_id'), '...'), result='...', iteration_context={'parent_region': Region(id=0, start=1019, end=1768, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__main__', input=(Region(id=1, start=4534, end=5549, id_origin='wb_id'), '...'), result='...', iteration_context={}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=0, start=0, end=338, id_origin='sub_roi_id'), '...'), result='...', iteration_context={'parent_region': Region(id=1, start=4534, end=5549, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=1, start=338, end=676, id_origin='sub_roi_id'), '...'), result='...', iteration_context={'parent_region': Region(id=1, start=4534, end=5549, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=2, start=676, end=1015, id_origin='sub_roi_id'), '...'), result='...', iteration_context={'parent_region': Region(id=1, start=4534, end=5549, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__main__', input=(Region(id=2, start=9665, end=10569, id_origin='wb_id'), '...'), result='...', iteration_context={}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=0, start=0, end=301, id_origin='sub_roi_id'), '...'), result='...', iteration_context={'parent_region': Region(id=2, start=9665, end=10569, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=1, start=301, end=602, id_origin='sub_roi_id'), '...'), result='...', iteration_context={'parent_region': Region(id=2, start=9665, end=10569, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=2, start=602, end=904, id_origin='sub_roi_id'), '...'), result='...', iteration_context={'parent_region': Region(id=2, start=9665, end=10569, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__main__', input=(Region(id=3, start=12337, end=14633, id_origin='wb_id'), '...'), result='...', iteration_context={}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=0, start=0, end=765, id_origin='sub_roi_id'), '...'), result='...', iteration_context={'parent_region': Region(id=3, start=12337, end=14633, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=1, start=765, end=1530, id_origin='sub_roi_id'), '...'), result='...', iteration_context={'parent_region': Region(id=3, start=12337, end=14633, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=2, start=1530, end=2296, id_origin='sub_roi_id'), '...'), result='...', iteration_context={'parent_region': Region(id=3, start=12337, end=14633, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__main__', input=(Region(id=4, start=20151, end=20982, id_origin='wb_id'), '...'), result='...', iteration_context={}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=0, start=0, end=277, id_origin='sub_roi_id'), '...'), result='...', iteration_context={'parent_region': Region(id=4, start=20151, end=20982, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=1, start=277, end=554, id_origin='sub_roi_id'), '...'), result='...', iteration_context={'parent_region': Region(id=4, start=20151, end=20982, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=2, start=554, end=831, id_origin='sub_roi_id'), '...'), result='...', iteration_context={'parent_region': Region(id=4, start=20151, end=20982, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__main__', input=(Region(id=5, start=21378, end=22129, id_origin='wb_id'), '...'), result='...', iteration_context={}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=0, start=0, end=250, id_origin='sub_roi_id'), '...'), result='...', iteration_context={'parent_region': Region(id=5, start=21378, end=22129, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=1, start=250, end=500, id_origin='sub_roi_id'), '...'), result='...', iteration_context={'parent_region': Region(id=5, start=21378, end=22129, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=2, start=500, end=751, id_origin='sub_roi_id'), '...'), result='...', iteration_context={'parent_region': Region(id=5, start=21378, end=22129, id_origin='wb_id')})]

All entries marked as __sub_iter__ are sub-iterations, and we can see that their iteration_context contains the parent GS.

If we look at the result values, we can see that n_samples is only set on result objects that come from the inner scope. For the result objects from the outer scope, n_samples is still NOT_SET.

pprint(
    [
        (v.iteration_name, v.result.n_samples)
        for v in nested_iterator.raw_results_
    ]
)
[('__main__', NOT_SET),
 ('__sub_iter__', 249),
 ('__sub_iter__', 250),
 ('__sub_iter__', 250),
 ('__main__', NOT_SET),
 ('__sub_iter__', 338),
 ('__sub_iter__', 338),
 ('__sub_iter__', 339),
 ('__main__', NOT_SET),
 ('__sub_iter__', 301),
 ('__sub_iter__', 301),
 ('__sub_iter__', 302),
 ('__main__', NOT_SET),
 ('__sub_iter__', 765),
 ('__sub_iter__', 765),
 ('__sub_iter__', 766),
 ('__main__', NOT_SET),
 ('__sub_iter__', 277),
 ('__sub_iter__', 277),
 ('__sub_iter__', 277),
 ('__main__', NOT_SET),
 ('__sub_iter__', 250),
 ('__sub_iter__', 250),
 ('__sub_iter__', 251)]

The second piece of “magic” happens in the aggregation functions. There we use the filter_iterator_results function to filter out the NOT_SET values, so that we can operate on the actual values and use their context to make adjustments/aggregate them.

pprint(
    [
        v._replace(input=(v.input[0], "..."))
        for v in GsIterator.filter_iterator_results(
            nested_iterator.raw_results_, "n_samples"
        )
    ]
)
[TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=0, start=0, end=249, id_origin='sub_roi_id'), '...'), result=249, iteration_context={'parent_region': Region(id=0, start=1019, end=1768, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=1, start=249, end=499, id_origin='sub_roi_id'), '...'), result=250, iteration_context={'parent_region': Region(id=0, start=1019, end=1768, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=2, start=499, end=749, id_origin='sub_roi_id'), '...'), result=250, iteration_context={'parent_region': Region(id=0, start=1019, end=1768, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=0, start=0, end=338, id_origin='sub_roi_id'), '...'), result=338, iteration_context={'parent_region': Region(id=1, start=4534, end=5549, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=1, start=338, end=676, id_origin='sub_roi_id'), '...'), result=338, iteration_context={'parent_region': Region(id=1, start=4534, end=5549, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=2, start=676, end=1015, id_origin='sub_roi_id'), '...'), result=339, iteration_context={'parent_region': Region(id=1, start=4534, end=5549, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=0, start=0, end=301, id_origin='sub_roi_id'), '...'), result=301, iteration_context={'parent_region': Region(id=2, start=9665, end=10569, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=1, start=301, end=602, id_origin='sub_roi_id'), '...'), result=301, iteration_context={'parent_region': Region(id=2, start=9665, end=10569, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=2, start=602, end=904, id_origin='sub_roi_id'), '...'), result=302, iteration_context={'parent_region': Region(id=2, start=9665, end=10569, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=0, start=0, end=765, id_origin='sub_roi_id'), '...'), result=765, iteration_context={'parent_region': Region(id=3, start=12337, end=14633, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=1, start=765, end=1530, id_origin='sub_roi_id'), '...'), result=765, iteration_context={'parent_region': Region(id=3, start=12337, end=14633, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=2, start=1530, end=2296, id_origin='sub_roi_id'), '...'), result=766, iteration_context={'parent_region': Region(id=3, start=12337, end=14633, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=0, start=0, end=277, id_origin='sub_roi_id'), '...'), result=277, iteration_context={'parent_region': Region(id=4, start=20151, end=20982, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=1, start=277, end=554, id_origin='sub_roi_id'), '...'), result=277, iteration_context={'parent_region': Region(id=4, start=20151, end=20982, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=2, start=554, end=831, id_origin='sub_roi_id'), '...'), result=277, iteration_context={'parent_region': Region(id=4, start=20151, end=20982, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=0, start=0, end=250, id_origin='sub_roi_id'), '...'), result=250, iteration_context={'parent_region': Region(id=5, start=21378, end=22129, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=1, start=250, end=500, id_origin='sub_roi_id'), '...'), result=250, iteration_context={'parent_region': Region(id=5, start=21378, end=22129, id_origin='wb_id')}),
 TypedIteratorResultTuple(iteration_name='__sub_iter__', input=(Region(id=2, start=500, end=751, id_origin='sub_roi_id'), '...'), result=251, iteration_context={'parent_region': Region(id=5, start=21378, end=22129, id_origin='wb_id')})]

After the filtering, we only have cases where the value was provided (only inner-iterations in this case). Based on this we can do further processing.
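
As an illustration of such further processing, the sketch below filters a hand-written list of records and keys the remaining values by parent and sub-region id. It is a simplified stand-in, not the library code: `filter_results`, `ToyRecord` and the sentinel class are hypothetical, and the records are copied by hand from the first gait sequence in the output above.

```python
from typing import Any, NamedTuple


class _NotSet:
    def __repr__(self):
        return "NOT_SET"


NOT_SET = _NotSet()  # stand-in sentinel, compared by identity


class Region(NamedTuple):
    id: int
    start: int
    end: int
    id_origin: str


class ToyRecord(NamedTuple):
    iteration_name: str
    region: Region
    result: Any
    iteration_context: dict


parent = Region(0, 1019, 1768, "wb_id")
raw_results = [
    ToyRecord("__main__", parent, {"n_samples": NOT_SET}, {}),
    ToyRecord("__sub_iter__", Region(0, 0, 249, "sub_roi_id"),
              {"n_samples": 249}, {"parent_region": parent}),
    ToyRecord("__sub_iter__", Region(1, 249, 499, "sub_roi_id"),
              {"n_samples": 250}, {"parent_region": parent}),
    ToyRecord("__sub_iter__", Region(2, 499, 749, "sub_roi_id"),
              {"n_samples": 250}, {"parent_region": parent}),
]


def filter_results(records, attribute):
    """Keep records where `attribute` was set and unpack its value."""
    return [
        r._replace(result=r.result[attribute])
        for r in records
        if r.result[attribute] is not NOT_SET
    ]


filtered = filter_results(raw_results, "n_samples")

# Use the stored context to build a (parent id, sub-region id) key.
aggregated = {
    (r.iteration_context["parent_region"].id, r.region.id): r.result
    for r in filtered
}
print(aggregated)
```

The two-level key plays the role of the two index levels seen in the aggregated outputs earlier; a real aggregation would typically build a pandas object instead of a dictionary.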
