Loading Digital Mobility Outcome (DMO) data

Besides raw data, mobgap also has utilities to load pre-calculated DMO data in the format published by Mobilise-D. Specifically, this is the data of the TVS (technical validation study) and the individual CVS (clinical validation study) visits. For each CVS visit, the data is available as a single CSV file that contains the DMO data for all walking bouts of all participants.

In addition, you might have access to the weartime reports, published by McRoberts. They can be loaded optionally together with the DMO data (see below).

To get started, the data should be organised as follows:

The main DMO file with a name like cvs-T1-wb-dmo-27-11-2023.csv. The important part is that the second element (separated by -) indicates the visit ID (T1, T2, …).
A file that contains the mapping from the participant ID to the measurement site. This file should have at least the columns Local.Participant and Participant.Site.
If you are planning to load the weartime reports, you need a folder with the individual weartime reports and a “compliance report” that contains the total weartime per day. The compliance report should follow the naming schema CVS-wear-complicance-*.xlsx and should be placed in the same folder as the weartime reports.
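The naming convention for the main DMO file can be checked before loading. The following is a minimal sketch and not part of mobgap: visit_id_from_filename is a hypothetical helper that assumes the visit ID is always the second hyphen-separated element of the file name, as described above.

```python
from pathlib import Path


def visit_id_from_filename(path):
    """Return the visit ID (e.g. "T1") encoded in a DMO file name.

    Hypothetical helper for illustration only; it is not part of mobgap.
    """
    # Drop the directory and the extension, then split on the separator.
    parts = Path(path).stem.split("-")
    if len(parts) < 2 or not parts[1]:
        raise ValueError(f"Cannot find a visit ID in file name: {path}")
    return parts[1]


print(visit_id_from_filename("cvs-T1-wb-dmo-27-11-2023.csv"))  # T1
```

A check like this can catch renamed or misplaced files early, before any data is read.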

If the data is organised as described above, you can load it using the MobilisedCvsDmoDataset class. Below, we will use the example data included in the mobgap package, which contains the data of two participants.

Loading data using these classes handles many common edge cases, in particular the correct handling of timezones, and is hence the recommended way to load the data.

We will only show loading the data without the weartime reports, as no example weartime reports are included in the package at the moment.

from mobgap.data import MobilisedCvsDmoDataset, get_example_cvs_dmo_data_path

example_data_base_path = get_example_cvs_dmo_data_path()
dmo_data_path = example_data_base_path / "cvs-T1-test_data.csv"
mapping_path = example_data_base_path / "cvs-T1-test_data_mapping.csv"

dataset = MobilisedCvsDmoDataset(
    dmo_path=dmo_data_path, site_pid_map_path=mapping_path
)
dataset

MobilisedCvsDmoDataset [14 groups/rows]

	visit_type	participant_id	measurement_date
0	T1	10004	2021-04-13
1	T1	10004	2021-04-14
2	T1	10004	2021-04-15
3	T1	10004	2021-04-16
4	T1	10004	2021-04-17
5	T1	10004	2021-04-18
6	T1	10004	2021-04-19
7	T1	10005	2021-04-14
8	T1	10005	2021-04-15
9	T1	10005	2021-04-16
10	T1	10005	2021-04-17
11	T1	10005	2021-04-18
12	T1	10005	2021-04-19
13	T1	10005	2021-04-20

We can access all DMO data (i.e. all individual DMOs per walking bout) of the entire dataset using the following line. This might take a second, as the data is loaded from the CSV file (in particular when using the full dataset instead of the example data).

dataset.data

				wbday	duration_s	n_steps	n_turns	cadence_spm	walking_speed_mps	stride_length_m	stride_duration_s	visit_date_utc	site	timezone
visit_type	participant_id	measurement_date	wb_id
T1	10004	2021-04-13	wb_2021_4_13_0001_1	1	4.823538	8	0	90.888440	0.530627	0.774706	1.383975	2021-04-12 22:00:00+00:00	CAU	Europe/Berlin
			wb_2021_4_13_0002_1	1	8.039229	11	0	80.188857	0.529198	0.876706	1.634663	2021-04-12 22:00:00+00:00	CAU	Europe/Berlin
			wb_2021_4_13_0003_1	1	13.817425	19	1	91.352944	0.708731	1.027429	1.481877	2021-04-12 22:00:00+00:00	CAU	Europe/Berlin
			wb_2021_4_13_0004_1	1	13.264728	21	0	99.749524	0.668984	0.894819	1.420747	2021-04-12 22:00:00+00:00	CAU	Europe/Berlin
			wb_2021_4_13_0005_1	1	32.207162	37	1	79.089075	0.609849	1.029953	1.495227	2021-04-12 22:00:00+00:00	CAU	Europe/Berlin
	...	...	...	...	...	...	...	...	...	...	...	...	...	...
	10005	2021-04-20	wb_2021_4_20_0366_7	7	4.748170	6	0	87.497580	0.459190	0.694687	0.994593	2021-04-19 22:00:00+00:00	CAU	Europe/Berlin
			wb_2021_4_20_0367_7	7	7.360919	10	1	78.690165	0.546124	0.908011	1.445640	2021-04-19 22:00:00+00:00	CAU	Europe/Berlin
			wb_2021_4_20_0368_7	7	7.788003	12	0	85.342554	0.734981	1.124061	1.320190	2021-04-19 22:00:00+00:00	CAU	Europe/Berlin
			wb_2021_4_20_0369_7	7	9.194868	10	1	81.725056	0.752255	1.202214	1.651615	2021-04-19 22:00:00+00:00	CAU	Europe/Berlin
			wb_2021_4_20_0370_7	7	11.254921	10	7	76.918853	1.290913	2.199377	2.174182	2021-04-19 22:00:00+00:00	CAU	Europe/Berlin

4753 rows × 11 columns
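Because the output above has a multi-level index (visit_type, participant_id, measurement_date, wb_id), standard pandas group-by operations can be used to summarise it. The sketch below assumes the data behaves like a regular pandas DataFrame and uses a small hand-made frame with the same index levels; the rows are invented for illustration and are not taken from the example data.

```python
import pandas as pd

# Toy stand-in for `dataset.data`; the index levels mirror the output above,
# but the rows are invented for illustration.
index = pd.MultiIndex.from_tuples(
    [
        ("T1", "10004", "2021-04-13", "wb_1"),
        ("T1", "10004", "2021-04-13", "wb_2"),
        ("T1", "10004", "2021-04-14", "wb_3"),
        ("T1", "10005", "2021-04-14", "wb_4"),
    ],
    names=["visit_type", "participant_id", "measurement_date", "wb_id"],
)
toy_data = pd.DataFrame(
    {"duration_s": [5.0, 8.0, 14.0, 7.5], "n_steps": [8, 11, 19, 10]},
    index=index,
)

# Group by every level except the walking-bout ID to get one row per day.
per_day = toy_data.groupby(
    level=["visit_type", "participant_id", "measurement_date"]
).agg(
    n_wbs=("duration_s", "count"),
    total_duration_s=("duration_s", "sum"),
)
print(per_day)
```

This is only a manual summary for quick inspection. It does not take data quality flags into account.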

We can also access a data_mask that flags potential data quality issues. If a value in the mask is False, the corresponding DMO value is outside expert-defined thresholds. Depending on the analysis, you might want to exclude these values. Further methods like the MobilisedAggregator allow you to pass this data mask so that these values are excluded correctly from further analysis.

dataset.data_mask

				duration_s	n_steps	n_turns	cadence_spm	walking_speed_mps	stride_length_m	stride_duration_s
visit_type	participant_id	measurement_date	wb_id
T1	10004	2021-04-13	wb_2021_4_13_0001_1	True	True	True	True	True	True	True
			wb_2021_4_13_0002_1	True	True	True	True	True	True	True
			wb_2021_4_13_0003_1	True	True	True	True	True	True	True
			wb_2021_4_13_0004_1	True	True	True	True	True	True	True
			wb_2021_4_13_0005_1	True	True	True	True	True	True	True
	...	...	...	...	...	...	...	...	...	...
	10005	2021-04-20	wb_2021_4_20_0366_7	True	True	True	True	True	True	True
			wb_2021_4_20_0367_7	True	True	True	True	True	True	True
			wb_2021_4_20_0368_7	True	True	True	True	True	True	True
			wb_2021_4_20_0369_7	True	True	True	True	True	True	True
			wb_2021_4_20_0370_7	True	True	True	True	True	True	True

4753 rows × 7 columns
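If you want to apply such a mask manually, one possible approach is to replace all flagged values with NaN. The sketch below uses invented toy frames in place of dataset.data and dataset.data_mask. It shows one way to do this with plain pandas and is not necessarily how MobilisedAggregator handles the mask internally.

```python
import pandas as pd

# Toy stand-ins for `dataset.data` and `dataset.data_mask` (values invented).
toy_data = pd.DataFrame(
    {
        "walking_speed_mps": [0.53, 9.99, 0.71],
        "stride_length_m": [0.77, 0.88, 5.00],
    },
    index=["wb_1", "wb_2", "wb_3"],
)
toy_mask = pd.DataFrame(
    {
        "walking_speed_mps": [True, False, True],
        "stride_length_m": [True, True, False],
    },
    index=toy_data.index,
)

# Keep values where the mask is True; flagged values (False) become NaN.
# Selecting the mask's columns first ignores data columns without a mask.
masked = toy_data[toy_mask.columns].where(toy_mask)

# pandas reductions such as mean() skip NaN values by default.
print(masked.mean())
```

Masking individual values instead of dropping whole rows keeps the remaining valid DMOs of a walking bout available for analysis.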

Each day of the recording is listed as a separate entry in the dataset index and can hence be accessed individually.

dataset

MobilisedCvsDmoDataset [14 groups/rows]

	visit_type	participant_id	measurement_date
0	T1	10004	2021-04-13
1	T1	10004	2021-04-14
2	T1	10004	2021-04-15
3	T1	10004	2021-04-16
4	T1	10004	2021-04-17
5	T1	10004	2021-04-18
6	T1	10004	2021-04-19
7	T1	10005	2021-04-14
8	T1	10005	2021-04-15
9	T1	10005	2021-04-16
10	T1	10005	2021-04-17
11	T1	10005	2021-04-18
12	T1	10005	2021-04-19
13	T1	10005	2021-04-20

single_participant = dataset.get_subset(participant_id="10004")
single_participant

MobilisedCvsDmoDataset [7 groups/rows]

	visit_type	participant_id	measurement_date
0	T1	10004	2021-04-13
1	T1	10004	2021-04-14
2	T1	10004	2021-04-15
3	T1	10004	2021-04-16
4	T1	10004	2021-04-17
5	T1	10004	2021-04-18
6	T1	10004	2021-04-19

This allows us to access the measurement site and timezone of the participant. Note that this is usually not that important, as the class handles timezone conversions internally and provides all time values (e.g. the start of a walking bout) in the local time of the measurement site.

single_participant.measurement_site

'CAU'

single_participant.timezone

'Europe/Berlin'
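To illustrate what such a timezone conversion involves, the sketch below converts local midnight of the first measurement day to UTC using only the standard library. It assumes a fixed UTC+2 offset, which Europe/Berlin observes in April (daylight saving time). This is a simplification for illustration and not how mobgap implements the conversion.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Europe/Berlin observes CEST (UTC+2) in April.
cest = timezone(timedelta(hours=2), name="CEST")

# Local midnight at the start of the measurement day 2021-04-13.
local_midnight = datetime(2021, 4, 13, 0, 0, tzinfo=cest)
as_utc = local_midnight.astimezone(timezone.utc)

print(as_utc.isoformat())  # 2021-04-12T22:00:00+00:00
```

The result matches the visit_date_utc value shown in the data output above for the measurement date 2021-04-13, which is why the UTC timestamp falls on the previous calendar day.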
