.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/pipeline/_01_gs_iterator.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_pipeline__01_gs_iterator.py:

.. _gs_iterator_example:

Gait Sequence Iterator
======================

As part of most pipelines, we need to iterate over the gait sequences and apply all further algorithms to each of them individually.
This can be cumbersome, as we need to iterate over the data and aggregate the results at the same time.
Hence, we provide some helpers for that.

We provide two ways of iterating.
The first one only handles the iteration and does not aggregate the results.
The second one also supports you in aggregating the results.

Getting Some Example Data
-------------------------

.. GENERATED FROM PYTHON SOURCE LINES 20-32

.. code-block:: default

    import numpy as np
    import pandas as pd

    from mobgap.data import LabExampleDataset

    lab_example_data = LabExampleDataset(reference_system="INDIP")
    long_trial = lab_example_data.get_subset(cohort="MS", participant_id="001", test="Test11", trial="Trial1")
    long_trial_gs = long_trial.reference_parameters_.wb_list

    long_trial_gs

.. raw:: html
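=====  =====  =====  =========  ==========  =========  =============  ===============  ===================  ==================
wb_id  start  end    n_strides  duration_s  length_m   avg_speed_mps  avg_cadence_spm  avg_stride_length_m  termination_reason
=====  =====  =====  =========  ==========  =========  =============  ===============  ===================  ==================
1      1019   1768   9          7.48        4.468932   0.847668       107.795850       0.942678             Pause
2      4534   5549   11         10.14       2.900453   0.365176       93.396106        0.483923             Pause
3      9665   10569  9          9.03        2.140232   0.294058       75.981133        0.506458             Pause
4      12337  14633  28         22.95       11.201110  0.634425       92.337768        0.803933             Pause
5      20151  20982  11         8.30        2.390709   0.371746       87.915774        0.507484             Pause
6      21378  22129  9          7.50        2.517558   0.492965       95.365740        0.599360             Pause
=====  =====  =====  =========  ==========  =========  =============  ===============  ===================  ==================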
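
To make the iteration problem described above concrete, a hand-written loop over the ``start``/``end`` values of the walking bouts could look roughly like the sketch below.
It uses a plain Python list as a dummy signal instead of the IMU data, and it assumes that ``end`` is exclusive.
``Bout`` and ``manual_iter`` are hypothetical names for illustration and are not part of mobgap.

.. code-block:: python

    from typing import Iterator, List, NamedTuple, Sequence, Tuple


    class Bout(NamedTuple):
        """Hypothetical stand-in for one row of the gait sequence list."""

        id: int
        start: int
        end: int


    # Start/end samples copied from the walking bout table above.
    BOUTS: List[Bout] = [
        Bout(1, 1019, 1768),
        Bout(2, 4534, 5549),
        Bout(3, 9665, 10569),
        Bout(4, 12337, 14633),
        Bout(5, 20151, 20982),
        Bout(6, 21378, 22129),
    ]


    def manual_iter(signal: Sequence[int], bouts: Sequence[Bout]) -> Iterator[Tuple[Bout, Sequence[int]]]:
        """Yield each bout together with the matching slice of the signal."""
        for bout in bouts:
            # Assumption: ``end`` is exclusive, so a slice holds ``end - start`` samples.
            yield bout, signal[bout.start : bout.end]


    # Dummy signal: sample i simply holds the value i.
    signal = list(range(23000))

    lengths = {}
    for bout, chunk in manual_iter(signal, BOUTS):
        # Results have to be collected by hand in this approach.
        lengths[bout.id] = len(chunk)

    print(lengths)

Even in this small sketch, the bookkeeping for the results lives outside the loop, which is the part that becomes tedious once several results per gait sequence are involved.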


.. GENERATED FROM PYTHON SOURCE LINES 33-43

Simple Functional Interface
---------------------------
We provide the :func:`~mobgap.pipeline.iter_gs` function to iterate over the gait sequences.
It takes the data and the gait sequence list and cuts the data accordingly.
In each iteration, the function yields the gait sequence information as a named tuple (i.e. the "row" of the gs dataframe) together with the data of that gait sequence.
Note that the index of the data is not changed.
Hence, we recommend using `iloc` to access the data (`iloc[0]` will return the first sample of the gait sequence).

Using our example data and gs, we can iterate over the data as follows:

.. GENERATED FROM PYTHON SOURCE LINES 43-53

.. code-block:: default

    from mobgap.pipeline import iter_gs

    for gs, data in iter_gs(long_trial.data["LowerBack"], long_trial_gs):
        # Note that the key to access the id is called "wb_id" here, as we loaded the WB from the reference system.
        # If this were an "actual" gait sequence, as calculated by one of the GSD algorithms, the key would be "gs_id".
        print("Gait Sequence: ", gs)
        print("Expected N-samples in gs: ", gs.end - gs.start)
        print("N-samples in gs: ", len(data))
        print("First sample of gs:\n", data.iloc[0], end="\n\n")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Gait Sequence:  WalkingBout(wb_id=1, start=1019, end=1768)
    Expected N-samples in gs:  749
    N-samples in gs:  749
    First sample of gs:
     acc_x    10.750431
    acc_y     0.390207
    acc_z    -2.088885
    gyr_x    13.226900
    gyr_y    -4.914900
    gyr_z    19.874400
    Name: 2020-10-30 12:53:33.211999893+00:00, dtype: float64

    Gait Sequence:  WalkingBout(wb_id=2, start=4534, end=5549)
    Expected N-samples in gs:  1015
    N-samples in gs:  1015
    First sample of gs:
     acc_x     9.997300
    acc_y    -0.864568
    acc_z    -0.297291
    gyr_x     3.193700
    gyr_y    12.861200
    gyr_z    -5.371300
    Name: 2020-10-30 12:54:08.361999989+00:00, dtype: float64

    Gait Sequence:  WalkingBout(wb_id=3, start=9665, end=10569)
    Expected N-samples in gs:  904
    N-samples in gs:  904
    First sample of gs:
     acc_x     8.892269
    acc_y    -0.136856
    acc_z    -3.006764
    gyr_x    -7.543100
    gyr_y    -5.688700
    gyr_z     0.470900
    Name: 2020-10-30 12:54:59.671999931+00:00, dtype: float64

    Gait Sequence:  WalkingBout(wb_id=4, start=12337, end=14633)
    Expected N-samples in gs:  2296
    N-samples in gs:  2296
    First sample of gs:
     acc_x     10.236130
    acc_y     -1.686952
    acc_z     -0.285697
    gyr_x     84.994600
    gyr_y    -17.645800
    gyr_z    -24.538000
    Name: 2020-10-30 12:55:26.391999960+00:00, dtype: float64

    Gait Sequence:  WalkingBout(wb_id=5, start=20151, end=20982)
    Expected N-samples in gs:  831
    N-samples in gs:  831
    First sample of gs:
     acc_x       9.135405
    acc_y      -1.797439
    acc_z       0.479475
    gyr_x    -137.510300
    gyr_y      16.480800
    gyr_z     -21.008900
    Name: 2020-10-30 12:56:44.532000064+00:00, dtype: float64

    Gait Sequence:  WalkingBout(wb_id=6, start=21378, end=22129)
    Expected N-samples in gs:  751
    N-samples in gs:  751
    First sample of gs:
     acc_x      9.895224
    acc_y     -3.081051
    acc_z     -1.070252
    gyr_x     56.595700
    gyr_y     -5.242100
    gyr_z    -16.025800
    Name: 2020-10-30 12:56:56.802000046+00:00, dtype: float64

.. GENERATED FROM PYTHON SOURCE LINES 54-95

.. note::
    The ``gs`` named-tuples returned by the iterator can either be of type ``GaitSequence`` or ``WalkingBout``.
    In both cases they contain the fields ``id``, ``start``, and ``end`` in this order.
    With named access, the ``id`` field is also available as ``wb_id`` or ``gs_id`` (depending on the type of the gait sequence).

You can see that this way it is pretty easy to iterate over the data.
However, if you are planning to run calculations on the data, you need to aggregate the results yourself.
If you are planning to collect multiple pieces of results, this can become cumbersome.
See this `tpcp example `__ for more information about this.

Therefore, we also provide an iterator class based on :class:`~tpcp.misc.TypedIterator`.

Class based Interface
---------------------

.. note::
    Learn more about the general approach of using :class:`~tpcp.misc.TypedIterator` classes in this `tpcp example `__.

Compared to the functional interface, the class interface also attempts to solve the problem of collecting and aggregating the results that you produce per GS.
In a typical pipeline you might want to calculate the initial contacts, cadence, stride length, and gait speed for each gait sequence.
With the class based interface, you can easily collect all of these results and then aggregate them into one predefined data structure.

The class based interface can be used in two ways.
First, in the "default" configuration, which is set up to work with the calculations and results that you would expect from a typical processing pipeline.
Second, in a custom way, where you need to define the expected "results" per iteration yourself.

The simple case
---------------
The simple case requires basically no more setup than the functional interface.
However, it assumes that your results are a subset of initial contacts, cadence, stride length, and gait speed, and that all of them are stored in the expected mobgap datatypes (aka pandas dataframes).
The iterator will then automatically aggregate the dataframes from each iteration into one combined dataframe, handling the sample offsets of the gait sequences for you.
Below we will show how this works, by "simulating" the calculation of some initial contacts and cadence.

We start by setting up an iterator object.
We can leave everything at the default values, as we do not need any custom aggregation functions.

.. GENERATED FROM PYTHON SOURCE LINES 95-100

.. code-block:: default

    from mobgap.pipeline import GsIterator

    iterator = GsIterator()
    dt = iterator.data_type

.. GENERATED FROM PYTHON SOURCE LINES 101-102

The default result datatype per iteration is defined as follows:

.. GENERATED FROM PYTHON SOURCE LINES 102-108

.. code-block:: default

    import inspect

    from IPython.core.display_functions import display

    display(inspect.getsource(iterator.data_type))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    @dataclass
    class FullPipelinePerGsResult:
        """Default expected result type for the gait-sequence iterator.

        When using the :class:`~mobgap.pipeline.GsIterator` with the default configuration, an instance of this dataclass
        will be created for each gait-sequence.

        Each value is expected to be a dataframe.

        Attributes
        ----------
        ic_list
            The initial contacts for each gait-sequence.
            This is a dataframe with a column called ``ic``.
            The values of this ic-column are expected to be samples relative to the start of the gait-sequence.
        cad_per_sec
            The cadence values within each gait-sequence.
            This dataframe has no further requirements relevant for the iterator.
        stride_length
            The stride length values within each gait-sequence.
            This dataframe has no further requirements relevant for the iterator.
        gait_speed
            The gait speed values within each gait-sequence.
            This dataframe has no further requirements relevant for the iterator.

        """

        ic_list: pd.DataFrame
        cad_per_sec: pd.DataFrame
        stride_length: pd.DataFrame
        gait_speed: pd.DataFrame

.. GENERATED FROM PYTHON SOURCE LINES 109-115

This means you are only allowed to use the available attributes.
However, you don't need to set all of them.
Below we will only "calculate" the initial contacts and cadence.

In each iteration, the iterator gives us the gait sequence information and the data for the iteration (as a tuple), together with a new, empty result object.

.. GENERATED FROM PYTHON SOURCE LINES 115-128

.. code-block:: default

    from mobgap.utils.conversions import as_samples

    for (gs, data), result in iterator.iterate(long_trial.data["LowerBack"], long_trial_gs):
        # Now we can just "calculate" the initial contacts and set them on the result object.
        result.ic_list = pd.DataFrame(np.arange(0, len(data), 100), columns=["ic"]).rename_axis(index="step_id")
        # For cadence, we just use the wb_id as a dummy value for each 1-second window of the data.
        n_seconds = int(len(data) // long_trial.sampling_rate_hz)
        result.cad_per_sec = pd.DataFrame(
            [gs.id] * n_seconds,
            columns=["cad_spm"],
            index=as_samples(np.arange(0, n_seconds) + 0.5, long_trial.sampling_rate_hz),
        ).rename_axis(index="sec_center_samples")

.. GENERATED FROM PYTHON SOURCE LINES 129-130

After the iteration, we can access the aggregated results either using the `results_` property of the iterator:

.. GENERATED FROM PYTHON SOURCE LINES 130-132

.. code-block:: default

    iterator.results_.ic_list

.. raw:: html
ic
wb_id step_id
1 0 1019
1 1119
2 1219
3 1319
4 1419
... ... ...
6 3 21678
4 21778
5 21878
6 21978
7 22078

69 rows × 1 columns



.. GENERATED FROM PYTHON SOURCE LINES 133-135

Or via direct dynamic property access, where we add a trailing underscore to the name of the result (`result.ic_list` -> `iterator.ic_list_`).

.. GENERATED FROM PYTHON SOURCE LINES 135-137

.. code-block:: default

    iterator.ic_list_

.. raw:: html
ic
wb_id step_id
1 0 1019
1 1119
2 1219
3 1319
4 1419
... ... ...
6 3 21678
4 21778
5 21878
6 21978
7 22078

69 rows × 1 columns
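
The trailing-underscore access used above can be pictured with a small ``__getattr__`` hook.
The sketch below is purely illustrative: ``MiniIterator`` and ``Results`` are hypothetical names, and the mechanism is an assumption about the general pattern, not the actual implementation in mobgap or tpcp.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import List


    @dataclass
    class Results:
        """Hypothetical container holding the aggregated results."""

        ic_list: List[int] = field(default_factory=list)


    class MiniIterator:
        """Toy object that forwards ``<name>_`` to ``results_.<name>``."""

        def __init__(self) -> None:
            self.results_ = Results(ic_list=[1019, 1119, 1219])

        def __getattr__(self, name: str):
            # ``__getattr__`` is only called when the normal attribute lookup fails.
            if name.endswith("_") and not name.startswith("_"):
                # Read from ``__dict__`` directly to avoid recursing into this method.
                results = self.__dict__.get("results_")
                if results is not None and hasattr(results, name[:-1]):
                    return getattr(results, name[:-1])
            raise AttributeError(name)


    mini = MiniIterator()
    print(mini.ic_list_ == mini.results_.ic_list)

Both spellings return the same object, so the short form is only a convenience.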



.. GENERATED FROM PYTHON SOURCE LINES 138-145

We can see that we only get a single dataframe with all the results.
All ICs are offset, so that they are relative to the start of the recording and no longer to the start of the gait sequence.

For the cadence values, the index represents the sample at the center of the second the cadence value belongs to.
This value was originally relative to the start of the GS.
We can see that in the aggregated results it is transformed to be relative to the start of the recording.

.. GENERATED FROM PYTHON SOURCE LINES 145-148

.. code-block:: default

    iterator.results_.cad_per_sec

.. raw:: html
cad_spm
wb_id sec_center_samples
1 1069 1
1169 1
1269 1
1369 1
1469 1
... ... ...
6 21628 6
21728 6
21828 6
21928 6
22028 6

63 rows × 1 columns
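
The values in the two aggregated tables can be reproduced by hand, which makes the offset handling easier to follow.
The sketch below uses only plain Python and assumes a sampling rate of 100 Hz (the example itself reads the rate from ``long_trial.sampling_rate_hz``).
It recomputes the dummy ICs and second centers per walking bout and shifts them by the bout start.

.. code-block:: python

    SAMPLING_RATE_HZ = 100  # Assumption for this sketch.

    # (start, end) of the six walking bouts from the example data.
    BOUTS = {
        1: (1019, 1768),
        2: (4534, 5549),
        3: (9665, 10569),
        4: (12337, 14633),
        5: (20151, 20982),
        6: (21378, 22129),
    }

    ic_rows = []
    cad_rows = []
    for wb_id, (start, end) in BOUTS.items():
        n_samples = end - start
        # Dummy ICs every 100 samples, relative to the start of the bout ...
        local_ics = range(0, n_samples, 100)
        # ... shifted by the bout start so they are relative to the recording.
        ic_rows += [(wb_id, step_id, ic + start) for step_id, ic in enumerate(local_ics)]

        # Center of each full second in samples, again shifted by the bout start.
        n_seconds = n_samples // SAMPLING_RATE_HZ
        centers = [int((sec + 0.5) * SAMPLING_RATE_HZ) for sec in range(n_seconds)]
        cad_rows += [(wb_id, center + start, wb_id) for center in centers]

    print(ic_rows[0], ic_rows[-1], len(ic_rows))
    print(cad_rows[0], cad_rows[-1], len(cad_rows))

Under these assumptions the first and last rows, as well as the row counts (69 and 63), match the aggregated tables shown above.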



.. GENERATED FROM PYTHON SOURCE LINES 149-164

But what if you don't want to use the default result datatype?

Custom Results
--------------
This requires a little bit more setup.
First we need to decide what results we expect.
This is done by defining a dataclass that represents the results.

Here we create a new dataclass that only expects two dummy results, but you can add as many as you want.
You could also subclass the default dataclass and just add the additional results.

The first result here is ``n_samples``, which is just a dummy result indicating the number of samples the data has.
The second result is ``filtered_data`` (we will just add some dummy data here).
This is expected to be a ``pd.DataFrame`` to demonstrate that you can also return more complex results.

.. GENERATED FROM PYTHON SOURCE LINES 164-173

.. code-block:: default

    from dataclasses import dataclass


    @dataclass
    class ResultType:
        n_samples: int
        filtered_data: pd.DataFrame

.. GENERATED FROM PYTHON SOURCE LINES 174-183

For each iteration (i.e. for each gait sequence), we will create one instance of this dataclass.
The list of these instances will be available as the `raw_results_` attribute of the iterator.

We can also decide to aggregate the results.
We provide some default aggregation functions (see ``GsIterator.DEFAULT_AGGREGATORS``) that you could use.
However, here we will create our own aggregation function.
It might be nice to turn the ``n_samples`` into a pandas series with the gs identifier as index.
For this we define an aggregation function that takes the list of inputs and the list of results as arguments.

.. GENERATED FROM PYTHON SOURCE LINES 183-192

.. code-block:: default

    def aggregate_n_samples(inputs, results):
        gait_sequences, _ = zip(*inputs)
        return pd.Series(results, index=[gs.id for gs in gait_sequences], name="N-Samples")


    aggregations = [("n_samples", aggregate_n_samples)]

.. GENERATED FROM PYTHON SOURCE LINES 193-197

Now we can create an instance of the iterator.
Note that if we want the result type to be inferred correctly, we need to use the somewhat unusual square-bracket typing syntax when creating the iterator.
This enables autocompletion of the attributes of the result type.

.. GENERATED FROM PYTHON SOURCE LINES 197-201

.. code-block:: default

    from mobgap.pipeline import GsIterator

    custom_iterator = GsIterator[ResultType](ResultType, aggregations=aggregations)

.. GENERATED FROM PYTHON SOURCE LINES 202-204

Iterating over the iterator now provides us with the row from the gait sequence list (which we ignore here), the data for each iteration, and the empty result object that we can fill in each iteration.

.. GENERATED FROM PYTHON SOURCE LINES 204-212

.. code-block:: default

    for (_, data), custom_result in custom_iterator.iterate(long_trial.data["LowerBack"], long_trial_gs):
        # We just calculate the length, but you can imagine any other calculation here.
        # Then we just set the result.
        custom_result.n_samples = len(data)
        # For the "filtered" data we just subtract 1 from the input.
        custom_result.filtered_data = data - 1

.. GENERATED FROM PYTHON SOURCE LINES 213-214

Then we can easily inspect the aggregated results.

.. GENERATED FROM PYTHON SOURCE LINES 214-216

.. code-block:: default

    custom_iterator.results_.n_samples

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    1     749
    2    1015
    3     904
    4    2296
    5     831
    6     751
    Name: N-Samples, dtype: int64

.. GENERATED FROM PYTHON SOURCE LINES 217-218

For the filtered data, we did not apply any aggregation and hence just get a list of all results.

.. GENERATED FROM PYTHON SOURCE LINES 218-219

.. code-block:: default

    custom_iterator.results_.filtered_data

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [                                       acc_x     acc_y  ...     gyr_y     gyr_z
    time                                                     ...
    2020-10-30 12:53:33.211999893+00:00  9.750431 -0.609793  ...   -5.9149   18.8744
    2020-10-30 12:53:33.221999884+00:00  9.216030 -1.843504  ...  -16.0051   24.6507
    2020-10-30 12:53:33.232000113+00:00  8.318355 -2.916044  ...  -21.3116   26.9805
    2020-10-30 12:53:33.242000103+00:00  7.360369 -3.854483  ...  -22.8495   25.4291
    2020-10-30 12:53:33.252000092+00:00  6.388862 -4.409548  ...  -21.4698   21.1185
    ...                                       ...       ...  ...       ...       ...
    2020-10-30 12:53:40.651999950+00:00  7.844738 -3.163485  ...    3.7649  -10.4524
    2020-10-30 12:53:40.661999941+00:00  7.819128 -3.057138  ...    5.7176  -11.1201
    2020-10-30 12:53:40.671999931+00:00  7.900424 -2.964919  ...    6.3879  -11.5961
    2020-10-30 12:53:40.681999922+00:00  7.916549 -2.904721  ...    5.8304  -12.1774
    2020-10-30 12:53:40.691999912+00:00  8.065151 -2.800718  ...    6.3839  -12.1929

    [749 rows x 6 columns],                 acc_x     acc_y  ...     gyr_y     gyr_z
    time                                                     ...
    2020-10-30 12:54:08.361999989+00:00  8.997300 -1.864568  ...   11.8612   -6.3713
    2020-10-30 12:54:08.371999979+00:00  9.010271 -1.821648  ...   11.6298   -4.6801
    2020-10-30 12:54:08.381999969+00:00  9.069629 -1.717282  ...   11.0973   -3.1574
    2020-10-30 12:54:08.391999960+00:00  9.392935 -1.495418  ...    9.9199   -0.4397
    2020-10-30 12:54:08.401999950+00:00  9.784886 -1.396940  ...    7.7379    3.6206
    ...                                       ...       ...  ...       ...       ...
    2020-10-30 12:54:18.461999893+00:00  7.047536  0.514839  ...  -27.7005   35.8898
    2020-10-30 12:54:18.471999884+00:00  6.969416  0.439982  ...  -26.9190   37.7245
    2020-10-30 12:54:18.482000113+00:00  6.719520  0.505765  ...  -26.5421   38.8936
    2020-10-30 12:54:18.492000103+00:00  6.450755  0.743554  ...  -26.5874   39.8177
    2020-10-30 12:54:18.502000093+00:00  6.194022  0.940628  ...  -26.6665   40.8314

    [1015 rows x 6 columns],                acc_x     acc_y  ...     gyr_y     gyr_z
    time                                                     ...
    2020-10-30 12:54:59.671999931+00:00  7.892269 -1.136856  ...   -6.6887   -0.5291
    2020-10-30 12:54:59.681999922+00:00  7.990240 -1.144530  ...   -7.2629   -0.7600
    2020-10-30 12:54:59.691999912+00:00  8.077780 -1.182633  ...   -7.6096   -1.1997
    2020-10-30 12:54:59.701999903+00:00  8.153184 -1.142374  ...   -7.9950   -1.5970
    2020-10-30 12:54:59.711999893+00:00  8.131641 -1.144750  ...   -8.3241   -1.6867
    ...                                       ...       ...  ...       ...       ...
    2020-10-30 12:55:08.661999941+00:00  8.136143 -2.346360  ...   16.1862   -7.2686
    2020-10-30 12:55:08.671999931+00:00  8.152235 -2.268232  ...   17.1322   -8.0022
    2020-10-30 12:55:08.681999922+00:00  8.182559 -2.251994  ...   17.5279   -8.3387
    2020-10-30 12:55:08.691999912+00:00  8.305513 -2.482470  ...   18.2932   -8.8593
    2020-10-30 12:55:08.701999903+00:00  8.434217 -2.731595  ...   18.5796   -7.7286

    [904 rows x 6 columns],                 acc_x     acc_y  ...     gyr_y     gyr_z
    time                                                     ...
    2020-10-30 12:55:26.391999960+00:00  9.236130 -2.686952  ...  -18.6458  -25.5380
    2020-10-30 12:55:26.401999950+00:00  9.366706 -2.470049  ...  -20.7625  -24.8423
    2020-10-30 12:55:26.411999941+00:00  9.266174 -2.263981  ...  -18.0376  -24.7323
    2020-10-30 12:55:26.421999931+00:00  8.996807 -2.105405  ...  -10.3876  -24.2681
    2020-10-30 12:55:26.431999922+00:00  8.920447 -1.906488  ...   -1.9655  -23.3928
    ...                                       ...       ...  ...       ...       ...
    2020-10-30 12:55:49.302000046+00:00  7.716993  0.326495  ...   15.9933    7.4403
    2020-10-30 12:55:49.312000036+00:00  7.728400  0.574557  ...   15.3581    5.5792
    2020-10-30 12:55:49.322000027+00:00  7.816767  0.665758  ...   15.8199    4.5902
    2020-10-30 12:55:49.332000017+00:00  8.074722  0.309418  ...   17.9715    3.6861
    2020-10-30 12:55:49.342000008+00:00  8.270306 -0.338238  ...   20.9984    3.2169

    [2296 rows x 6 columns],                acc_x     acc_y  ...     gyr_y     gyr_z
    time                                                     ...
    2020-10-30 12:56:44.532000064+00:00  8.135405 -2.797439  ...   15.4808  -22.0089
    2020-10-30 12:56:44.542000055+00:00  8.475562 -2.708892  ...   16.3716  -23.4909
    2020-10-30 12:56:44.552000046+00:00  8.853603 -2.523719  ...   15.7210  -24.0634
    2020-10-30 12:56:44.562000036+00:00  9.352082 -2.173552  ...   14.0635  -24.1058
    2020-10-30 12:56:44.572000027+00:00  9.733701 -1.799136  ...   12.3673  -23.7012
    ...                                       ...       ...  ...       ...       ...
    2020-10-30 12:56:52.792000055+00:00  8.036729 -2.491177  ...   13.5385    1.7647
    2020-10-30 12:56:52.802000046+00:00  8.545972 -2.340976  ...   11.4386    6.4245
    2020-10-30 12:56:52.812000036+00:00  8.822208 -2.348849  ...   12.2948   10.9618
    2020-10-30 12:56:52.822000027+00:00  9.126596 -2.336454  ...   15.8081   14.4861
    2020-10-30 12:56:52.832000017+00:00  9.465227 -2.308985  ...   17.4874   17.1938

    [831 rows x 6 columns],                 acc_x     acc_y  ...     gyr_y     gyr_z
    time                                                     ...
    2020-10-30 12:56:56.802000046+00:00  8.895224 -4.081051  ...   -6.2421  -17.0258
    2020-10-30 12:56:56.812000036+00:00  8.844582 -4.294596  ...  -12.8417  -14.2214
    2020-10-30 12:56:56.822000027+00:00  8.441273 -4.354333  ...  -10.4114  -12.7980
    2020-10-30 12:56:56.832000017+00:00  7.944172 -4.372805  ...   -0.0473  -10.9684
    2020-10-30 12:56:56.842000008+00:00  7.882891 -4.435929  ...   11.1010   -7.4715
    ...                                       ...       ...  ...       ...       ...
    2020-10-30 12:57:04.262000084+00:00  7.601535  1.485462  ...   -2.9975   -8.6763
    2020-10-30 12:57:04.272000074+00:00  8.019890  1.890526  ...   -1.3660   -9.6691
    2020-10-30 12:57:04.282000065+00:00  8.437574  2.206942  ...   -0.3714   -8.8523
    2020-10-30 12:57:04.292000055+00:00  8.749383  2.304567  ...   -2.6950   -7.8553
    2020-10-30 12:57:04.302000046+00:00  9.103801  2.274051  ...  -10.0859   -6.3404

    [751 rows x 6 columns]]
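
The pattern behind the class based interface (a fresh result object per iteration, followed by optional aggregation) can be sketched in a few lines of plain Python.
``MiniTypedIterator`` and ``ToyResult`` are hypothetical, heavily simplified stand-ins: the real :class:`~tpcp.misc.TypedIterator` is more general, and this sketch returns the aggregated results as a dict instead of an object with attribute access.

.. code-block:: python

    from dataclasses import dataclass, fields
    from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Tuple, Type


    @dataclass
    class ToyResult:
        """Hypothetical per-iteration result; all fields start out empty."""

        n_samples: Optional[int] = None
        label: Optional[str] = None


    class MiniTypedIterator:
        def __init__(self, result_type: Type, aggregations: Iterable[Tuple[str, Callable]] = ()) -> None:
            self.result_type = result_type
            self.aggregations = dict(aggregations)

        def iterate(self, inputs: Iterable[Any]) -> Iterator[Tuple[Any, Any]]:
            self.inputs_: List[Any] = []
            self.raw_results_: List[Any] = []
            for inp in inputs:
                result = self.result_type()
                self.inputs_.append(inp)
                self.raw_results_.append(result)
                # The caller fills ``result`` in place during the loop body.
                yield inp, result

        @property
        def results_(self) -> Dict[str, Any]:
            out = {}
            for f in fields(self.result_type):
                values = [getattr(r, f.name) for r in self.raw_results_]
                agg = self.aggregations.get(f.name)
                # Without an aggregator, the plain list of values is returned.
                out[f.name] = agg(self.inputs_, values) if agg else values
            return out


    def aggregate_n_samples(inputs, results):
        # Map the id of each (id, data) input to its result.
        return {gs_id: n for (gs_id, _), n in zip(inputs, results)}


    toy_iterator = MiniTypedIterator(ToyResult, aggregations=[("n_samples", aggregate_n_samples)])
    for (gs_id, data), result in toy_iterator.iterate([(1, [0.1, 0.2, 0.3]), (2, [0.4, 0.5])]):
        result.n_samples = len(data)
        result.label = f"gs-{gs_id}"

    print(toy_iterator.results_)

As in the example above, the field with an aggregator (``n_samples``) is combined into one structure, while the field without one (``label``) stays a list with one entry per iteration.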