Working with reference data

Often you want to test an algorithmic step in isolation or validate the output of an algorithm. In both cases, you need reference data, either as input or for comparison.

As explained in the data example, reference data that is stored in .mat files can be easily loaded using the existing tooling.

In this example, we will go into more detail about common patterns of using this reference data.

Reference data as input on a GS level

Most algorithms (after the GS detection) expect the data of only a single gait sequence (GS). To test such an algorithm, you need to use the GS/walking bout (WB) information of the reference data to cut the data accordingly. You might also want the reference information that belongs to that GS/WB.

This can be achieved using the GsIterator (or the iter_gs function).

But first, we need to load some example data.

from mobgap.data import LabExampleDataset

dataset = LabExampleDataset(reference_system="INDIP")
datapoint = dataset.get_subset(
    cohort="HA", participant_id="001", test="Test11", trial="Trial1"
)
data = datapoint.data_ss
data

	acc_x	acc_y	acc_z	gyr_x	gyr_y	gyr_z
time
2020-12-10 13:09:03.815000057+00:00	9.690482	-0.498529	-0.135459	-2.326209	2.967921	-0.916732
2020-12-10 13:09:03.825000048+00:00	9.656917	-0.489721	-0.132339	-1.993893	2.647065	-0.813600
2020-12-10 13:09:03.835000038+00:00	9.611984	-0.476001	-0.097731	-1.535527	2.188699	-0.561499
2020-12-10 13:09:03.845000029+00:00	9.624743	-0.467484	-0.144403	-0.882355	1.879302	-0.355234
2020-12-10 13:09:03.855000019+00:00	9.636681	-0.488557	-0.197406	-0.320856	1.478231	-0.595876
...	...	...	...	...	...	...
2020-12-10 13:11:21.355000019+00:00	9.605895	-0.272558	-0.233749	-2.245995	1.294885	-2.417882
2020-12-10 13:11:21.365000010+00:00	9.606560	-0.333937	-0.245102	-2.727279	0.641713	-2.910626
2020-12-10 13:11:21.375000+00:00	9.577996	-0.333360	-0.271078	-4.549285	-0.160428	-3.701307
2020-12-10 13:11:21.384999990+00:00	9.592582	-0.389587	-0.202253	-5.890006	-1.226130	-4.365938
2020-12-10 13:11:21.394999981+00:00	9.640033	-0.392071	-0.276513	-7.013003	-2.234535	-4.595122

13759 rows × 6 columns

Then we load the reference data. Two versions are available: one where all values are relative to the start of the recording, and one where they are relative to the start of the respective GS/WB. The first version is shown below: both the walking bouts and the initial contacts (and other parameters) are relative to the start of the recording.

ref_data = datapoint.reference_parameters_
ref_data.wb_list

	start	end	n_strides	duration_s	length_m	avg_walking_speed_mps	avg_cadence_spm	avg_stride_length_m	termination_reason
wb_id
0	632	988	5	3.55	3.428989	0.975373	104.069084	1.124391	Pause
1	2864	3325	4	4.60	1.452572	0.411857	81.296475	0.581029	Pause
2	3853	5085	16	12.31	7.044042	0.617801	89.246331	0.838960	Pause
3	7641	8621	12	9.79	4.396574	0.510108	94.370318	0.645176	Pause
4	9451	9932	6	4.80	3.545277	0.755728	88.778698	1.021695	Pause
5	11989	12517	6	5.27	3.514735	0.880632	95.832693	1.021576	Pause

ref_data.ic_list

		ic	lr_label
wb_id	step_id
0	0	632	left
	1	709	right
	2	763	left
	3	824	right
	4	876	left
...	...	...	...
5	3	12162	left
	4	12220	right
	5	12277	left
	6	12335	right
	7	12516	left

63 rows × 2 columns

However, as we want to use the reference data as input to an algorithm on a GS level, we use the version that provides values relative to the start of the GS/WB.

The start and end values of the reference WBs are still relative to the start of the recording.

ref_data_rel = datapoint.reference_parameters_relative_to_wb_
ref_walking_bouts = ref_data_rel.wb_list
ref_walking_bouts

	start	end	n_strides	duration_s	length_m	avg_walking_speed_mps	avg_cadence_spm	avg_stride_length_m	termination_reason
wb_id
0	632	988	5	3.55	3.428989	0.975373	104.069084	1.124391	Pause
1	2864	3325	4	4.60	1.452572	0.411857	81.296475	0.581029	Pause
2	3853	5085	16	12.31	7.044042	0.617801	89.246331	0.838960	Pause
3	7641	8621	12	9.79	4.396574	0.510108	94.370318	0.645176	Pause
4	9451	9932	6	4.80	3.545277	0.755728	88.778698	1.021695	Pause
5	11989	12517	6	5.27	3.514735	0.880632	95.832693	1.021576	Pause

The sample indices of the initial contacts (ICs), however, are now relative to the start of the respective GS/WB.

ref_ics_rel = ref_data_rel.ic_list
ref_ics_rel.loc[0]  # First WB

	ic	lr_label
step_id
0	0	left
1	77	right
2	131	left
3	192	right
4	244	left
5	301	right
6	355	left

ref_ics_rel.loc[1]  # Second WB

	ic	lr_label
step_id
0	0	right
1	71	left
2	132	right
3	194	left
4	263	right
5	460	left
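
The relation between the two versions is a per-WB offset: each relative value is the recording-relative value minus the start of its WB. The following is a minimal sketch in plain Python that reproduces the first rows of the tables above. The helper `to_wb_relative` is a hypothetical name used for illustration only and is not part of mobgap.

```python
# Values copied from the reference tables above (first walking bout, wb_id 0).
wb_start = 632
ics_recording = [632, 709, 763, 824, 876]  # relative to the start of the recording


def to_wb_relative(ics, start):
    """Shift sample indices so that 0 corresponds to the start of the WB.

    Hypothetical helper for illustration; not a mobgap function.
    """
    return [ic - start for ic in ics]


ics_wb = to_wb_relative(ics_recording, wb_start)
print(ics_wb)  # [0, 77, 131, 192, 244]
```

The printed values match the first five rows of `ref_ics_rel.loc[0]`.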

Now we can use the GsIterator to iterate over the data. Check out the gs_iterator example for more information.

from mobgap.pipeline import GsIterator

gs_iterator = GsIterator()

# For most use cases, the default configuration of the GsIterator should be sufficient.
# It allows you to specify the following results:
gs_iterator.data_type

If you want to change the default behaviour, you can create a custom dataclass (see the example linked above).

The iterator provides the cut data and an object representing all information about the respective GS/WB. The latter can be used to index other parts of the reference data.

for (wb, data_per_wb), result in gs_iterator.iterate(data, ref_walking_bouts):
    print("GS/WB id: ", wb.id)
    print(
        "Expected N-samples in wb: ",
        ref_walking_bouts.loc[wb.id].end - ref_walking_bouts.loc[wb.id].start,
    )
    print("N-samples in wb: ", len(data_per_wb))

    # We can use the wb.id to get the reference initial contacts that belong to this GS/WB
    ics_per_wb = ref_ics_rel.loc[wb.id]
    # These could be used in some algorithm.
    # Here we will just store them in the results.
    result.ic_list = ics_per_wb

GS/WB id:  0
Expected N-samples in wb:  356
N-samples in wb:  356
GS/WB id:  1
Expected N-samples in wb:  461
N-samples in wb:  461
GS/WB id:  2
Expected N-samples in wb:  1232
N-samples in wb:  1232
GS/WB id:  3
Expected N-samples in wb:  980
N-samples in wb:  980
GS/WB id:  4
Expected N-samples in wb:  481
N-samples in wb:  481
GS/WB id:  5
Expected N-samples in wb:  528
N-samples in wb:  528
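
Conceptually, the cutting step amounts to slicing the recording with the start and end of each WB. The sketch below illustrates this in plain Python. It is not how mobgap implements the iterator: `iter_walking_bouts` is a hypothetical helper, the recording is replaced by a list of sample indices, and `end` is assumed to be exclusive, which is consistent with the sample counts printed above.

```python
def iter_walking_bouts(data, wb_list):
    """Yield (wb_id, data slice) for every walking bout.

    Hypothetical helper for illustration; not a mobgap function.
    `wb_list` maps wb_id -> (start, end) in samples relative to the start
    of the recording, with `end` treated as exclusive.
    """
    for wb_id, (start, end) in wb_list.items():
        yield wb_id, data[start:end]


# Stand-in for the 13759-sample recording: one value per sample.
data = list(range(13759))
# Start/end values copied from the first three rows of the reference WB table.
wb_list = {0: (632, 988), 1: (2864, 3325), 2: (3853, 5085)}

lengths = {wb_id: len(chunk) for wb_id, chunk in iter_walking_bouts(data, wb_list)}
print(lengths)  # {0: 356, 1: 461, 2: 1232}
```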

The iterator also conveniently aggregates the results for us. The initial contacts are now stored in a single dataframe, and their values are transformed back to be relative to the start of the recording rather than the start of the GS.

gs_iterator.results_.ic_list

		ic	lr_label
wb_id	step_id
0	0	632	left
	1	709	right
	2	763	left
	3	824	right
	4	876	left
...	...	...	...
5	3	12162	left
	4	12220	right
	5	12277	left
	6	12335	right
	7	12516	left

63 rows × 2 columns
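
The back-transformation during aggregation can be pictured as adding the start of each WB to its relative values and combining everything under a `(wb_id, step_id)` key. The following plain-Python sketch shows the idea with a few values copied from the tables above. It uses dictionaries instead of dataframes and is an illustration only, not mobgap's aggregation code.

```python
# WB-relative initial contacts, copied from the tables above (first entries only).
ics_per_wb = {
    0: [0, 77, 131],
    1: [0, 71, 132],
}
# Start of each WB relative to the start of the recording.
wb_starts = {0: 632, 1: 2864}

# Add each WB's start to get recording-relative indices again and
# key the combined result by (wb_id, step_id).
aggregated = {
    (wb_id, step_id): ic + wb_starts[wb_id]
    for wb_id, ics in ics_per_wb.items()
    for step_id, ic in enumerate(ics)
}
print(aggregated[(0, 1)])  # 709
print(aggregated[(1, 2)])  # 2996
```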
