Working with reference data

Often you want to test an algorithmic step in isolation or validate the output of an algorithm. In both cases, you need reference data, either as input or as a basis for comparison.

As explained in the data example, reference data that is stored in .mat files can be easily loaded using the existing tooling.

In this example, we will go into more detail about common patterns of using this reference data.

Reference data as input on a GS level

Most algorithms that run after gait sequence (GS) detection expect the data of a single GS only. To test such an algorithm, you need to use the GS or walking bout (WB) information from the reference data to cut the data accordingly. You may also want the reference information that belongs to each GS/WB.

This can be achieved using the GsIterator (or the iter_gs function).

But first, we need to load some example data.

from mobgap.data import LabExampleDataset

dataset = LabExampleDataset(reference_system="INDIP")
datapoint = dataset.get_subset(
    cohort="HA", participant_id="001", test="Test11", trial="Trial1"
)
data = datapoint.data_ss
data
                                         acc_x      acc_y      acc_z      gyr_x      gyr_y      gyr_z
time
2020-12-10 13:09:03.815000057+00:00   9.690482  -0.498529  -0.135459  -2.326209   2.967921  -0.916732
2020-12-10 13:09:03.825000048+00:00   9.656917  -0.489721  -0.132339  -1.993893   2.647065  -0.813600
2020-12-10 13:09:03.835000038+00:00   9.611984  -0.476001  -0.097731  -1.535527   2.188699  -0.561499
2020-12-10 13:09:03.845000029+00:00   9.624743  -0.467484  -0.144403  -0.882355   1.879302  -0.355234
2020-12-10 13:09:03.855000019+00:00   9.636681  -0.488557  -0.197406  -0.320856   1.478231  -0.595876
...                                        ...        ...        ...        ...        ...        ...
2020-12-10 13:11:21.355000019+00:00   9.605895  -0.272558  -0.233749  -2.245995   1.294885  -2.417882
2020-12-10 13:11:21.365000010+00:00   9.606560  -0.333937  -0.245102  -2.727279   0.641713  -2.910626
2020-12-10 13:11:21.375000+00:00      9.577996  -0.333360  -0.271078  -4.549285  -0.160428  -3.701307
2020-12-10 13:11:21.384999990+00:00   9.592582  -0.389587  -0.202253  -5.890006  -1.226130  -4.365938
2020-12-10 13:11:21.394999981+00:00   9.640033  -0.392071  -0.276513  -7.013003  -2.234535  -4.595122

13759 rows × 6 columns



Next, we load the reference data. Two versions are available: one in which all values are relative to the start of the recording, and one in which they are relative to the start of the respective GS/WB. The first version is shown below. Here, the walking bouts, the initial contacts (ICs), and the other parameters are all given relative to the start of the recording.

ref_data = datapoint.reference_parameters_
ref_data.wb_list
       start    end  n_strides  duration_s  length_m  avg_walking_speed_mps  avg_cadence_spm  avg_stride_length_m  termination_reason
wb_id
0        632    988          5        3.55  3.428989               0.975373       104.069084             1.124391               Pause
1       2864   3325          4        4.60  1.452572               0.411857        81.296475             0.581029               Pause
2       3853   5085         16       12.31  7.044042               0.617801        89.246331             0.838960               Pause
3       7641   8621         12        9.79  4.396574               0.510108        94.370318             0.645176               Pause
4       9451   9932          6        4.80  3.545277               0.755728        88.778698             1.021695               Pause
5      11989  12517          6        5.27  3.514735               0.880632        95.832693             1.021576               Pause


ref_data.ic_list
                  ic  lr_label
wb_id step_id
0     0          632      left
      1          709     right
      2          763      left
      3          824     right
      4          876      left
...   ...        ...       ...
5     3        12162      left
      4        12220     right
      5        12277      left
      6        12335     right
      7        12516      left

63 rows × 2 columns



However, because we want to use the reference data as input to an algorithm on a GS level, we use the version that provides values relative to the start of each GS/WB.

The start and end values of the reference WBs are still relative to the start of the recording.

ref_data_rel = datapoint.reference_parameters_relative_to_wb_
ref_walking_bouts = ref_data_rel.wb_list
ref_walking_bouts

       start    end  n_strides  duration_s  length_m  avg_walking_speed_mps  avg_cadence_spm  avg_stride_length_m  termination_reason
wb_id
0        632    988          5        3.55  3.428989               0.975373       104.069084             1.124391               Pause
1       2864   3325          4        4.60  1.452572               0.411857        81.296475             0.581029               Pause
2       3853   5085         16       12.31  7.044042               0.617801        89.246331             0.838960               Pause
3       7641   8621         12        9.79  4.396574               0.510108        94.370318             0.645176               Pause
4       9451   9932          6        4.80  3.545277               0.755728        88.778698             1.021695               Pause
5      11989  12517          6        5.27  3.514735               0.880632        95.832693             1.021576               Pause


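However, the IC sample indices are now relative to the start of the respective GS/WB.

ref_ics_rel = datapoint.reference_parameters_relative_to_wb_.ic_list
ref_ics_rel.loc[0]  # First WB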

          ic  lr_label
step_id
0          0      left
1         77     right
2        131      left
3        192     right
4        244      left
5        301     right
6        355      left


ref_ics_rel.loc[1]  # Second WB
          ic  lr_label
step_id
0          0     right
1         71      left
2        132     right
3        194      left
4        263     right
5        460      left
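
The relationship between the two versions of the reference data can be checked by hand: subtracting the start of a WB from its ICs in the recording-relative version should give the WB-relative values. The following is a minimal sketch of that arithmetic in plain Python, using the first WB and its first five ICs from the tables above. It only illustrates the conversion and is not how mobgap computes these values internally.

```python
# Start of the first reference WB and its first five ICs, both in samples
# relative to the start of the recording (values taken from the tables above).
wb_start = 632
ics_abs = [632, 709, 763, 824, 876]

# Subtracting the WB start makes each IC relative to the start of the WB.
ics_rel = [ic - wb_start for ic in ics_abs]
print(ics_rel)  # [0, 77, 131, 192, 244]
```

These values match the first five rows of the WB-relative IC table for the first WB.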


Now we can use the GsIterator to iterate over the data. Check out the gs_iterator example for more information.

from mobgap.pipeline import GsIterator

gs_iterator = GsIterator()

# For most use cases, the default configuration of GsIterator should be sufficient.
# It allows you to store the following results:
gs_iterator.data_type

If you want to change the default behaviour, you can create a custom dataclass (see the gs_iterator example mentioned above).

The iterator yields the cut data together with an object that holds the information about the respective GS/WB. The latter can be used to index other aspects of the reference data.

for (wb, data_per_wb), result in gs_iterator.iterate(data, ref_walking_bouts):
    print("GS/WB id: ", wb.id)
    print(
        "Expected N-samples in wb: ",
        ref_walking_bouts.loc[wb.id].end - ref_walking_bouts.loc[wb.id].start,
    )
    print("N-samples in wb: ", len(data_per_wb))

    # We can use the wb.id to get the reference initial contacts that belong to this GS/WB
    ics_per_wb = ref_ics_rel.loc[wb.id]
    # These could be used in some algorithm.
    # Here we will just store them in the results.
    result.ic_list = ics_per_wb
GS/WB id:  0
Expected N-samples in wb:  356
N-samples in wb:  356
GS/WB id:  1
Expected N-samples in wb:  461
N-samples in wb:  461
GS/WB id:  2
Expected N-samples in wb:  1232
N-samples in wb:  1232
GS/WB id:  3
Expected N-samples in wb:  980
N-samples in wb:  980
GS/WB id:  4
Expected N-samples in wb:  481
N-samples in wb:  481
GS/WB id:  5
Expected N-samples in wb:  528
N-samples in wb:  528
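
The sample counts above are consistent with cutting the recording as a half-open slice from `start` (inclusive) to `end` (exclusive), so that each WB contains `end - start` samples. The following is a minimal sketch of that idea, using a plain Python list as a stand-in for the sensor data. `cut_to_wb` is a hypothetical helper for illustration and not part of mobgap, and the real iterator may handle details differently.

```python
# Start/end sample indices of the first three reference WBs shown above.
wb_bounds = {0: (632, 988), 1: (2864, 3325), 2: (3853, 5085)}

# Stand-in for the recording: one value per sample (13759 rows in the example data).
recording = list(range(13759))


def cut_to_wb(samples, start, end):
    """Return the samples of one WB as the half-open slice [start, end)."""
    return samples[start:end]


# Number of samples per WB after cutting.
n_samples_per_wb = {
    wb_id: len(cut_to_wb(recording, start, end))
    for wb_id, (start, end) in wb_bounds.items()
}
print(n_samples_per_wb)  # {0: 356, 1: 461, 2: 1232}
```

With a pandas DataFrame, the equivalent positional slice would be `data.iloc[start:end]`.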

The iterator will also conveniently aggregate the results for us. You can see that the initial contacts are now stored in a single dataframe and the values are transformed back to be relative to the start of the recording and not the GS anymore.

gs_iterator.results_.ic_list
                  ic  lr_label
wb_id step_id
0     0          632      left
      1          709     right
      2          763      left
      3          824     right
      4          876      left
...   ...        ...       ...
5     3        12162      left
      4        12220     right
      5        12277      left
      6        12335     right
      7        12516      left

63 rows × 2 columns
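
The aggregation step can be thought of as the inverse of the cutting step: each per-WB result is shifted back by the start of its WB, and the results are then combined under a `(wb_id, step_id)` index. The following is a minimal sketch of that idea in plain Python. It is an assumption about the general principle, not mobgap's actual implementation, and it uses only the first three WB-relative ICs of the first two WBs.

```python
# Start of each WB in samples relative to the start of the recording.
wb_starts = {0: 632, 1: 2864}

# Per-WB results, relative to the start of the respective WB
# (first three ICs of each WB, taken from the relative IC tables above).
ics_rel_per_wb = {0: [0, 77, 131], 1: [0, 71, 132]}

# Combine everything into flat (wb_id, step_id, ic) rows and shift each IC
# back so that it is relative to the start of the recording again.
aggregated = [
    (wb_id, step_id, ic + wb_starts[wb_id])
    for wb_id, ics in ics_rel_per_wb.items()
    for step_id, ic in enumerate(ics)
]

for row in aggregated:
    print(row)
```

For the first WB, the shifted values (632, 709, 763) match the first rows of the aggregated table above.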


