The Mobilise-D pipeline: Step-by-Step Breakdown

This example shows how to build a full gait analysis pipeline using the mobgap package. Note that we provide pre-built pipelines for common use cases; check out the respective examples if you want to learn how to use them.

This example is meant to provide a better understanding of the individual steps and should serve as a blueprint to build completely custom pipelines. We are following the pipeline explained in [1] (Fig. 7) step by step.

For more information about the individual steps, please refer to the respective examples for the algorithms.

Load example data

We load example data from the lab dataset and use a single long trial from a participant of the MS cohort for this example.

Note that we directly convert the data into the body frame, as almost all algorithms require body-frame data (and all others support it). This is possible because we know that the data is already well aligned with the sensor-frame conventions. If this were not the case, the data would need to be rotated/aligned before running it through the pipeline.

import pandas as pd
from mobgap.data import LabExampleDataset
from mobgap.utils.conversions import to_body_frame
from mobgap.utils.interpolation import naive_sec_paras_to_regions

lab_example_data = LabExampleDataset(reference_system="INDIP")
long_trial = lab_example_data.get_subset(
    cohort="MS", participant_id="001", test="Test11", trial="Trial1"
)
imu_data = to_body_frame(long_trial.data_ss)
sampling_rate_hz = long_trial.sampling_rate_hz
participant_metadata = long_trial.participant_metadata

Step 1: Gait Sequence Detection

       start    end
gs_id
0        750   1651
1       4650   6151
2      12900  14851
3      20100  21151
4      21300  22501


Starting from here, all the processing will happen per gait sequence. We will go through the steps just for a single gait sequence first and later put everything in a loop.
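To make the per-gait-sequence processing concrete, the sketch below uses plain Python lists instead of mobgap's DataFrames. It takes the start/end samples from the table above, slices a placeholder signal per gait sequence, and converts each gait sequence into a duration. Two assumptions: the sampling rate is 100 Hz (inferred from the stride durations later in this example, where 84 samples correspond to 0.84 s), and `end` is treated as exclusive. The helper names are hypothetical.

```python
# Stand-in for the gait-sequence table above: (gs_id, start, end) in samples.
gait_sequences = [
    (0, 750, 1651),
    (1, 4650, 6151),
    (2, 12900, 14851),
    (3, 20100, 21151),
    (4, 21300, 22501),
]
# Assumption: 100 Hz, inferred from the stride durations later in this example.
SAMPLING_RATE_HZ = 100


def iter_gait_sequences(signal, gs_list):
    """Yield (gs_id, gs_start, data_slice) for every gait sequence."""
    for gs_id, start, end in gs_list:
        # `end` is treated as exclusive (assumption). Downstream algorithms only
        # see this slice, so their outputs are relative to `start`.
        yield gs_id, start, signal[start:end]


def gs_durations_s(gs_list, sampling_rate_hz):
    """Duration of each gait sequence in seconds."""
    return {gs_id: (end - start) / sampling_rate_hz for gs_id, start, end in gs_list}


fake_signal = list(range(23000))  # placeholder for a single IMU channel
for gs_id, start, chunk in iter_gait_sequences(fake_signal, gait_sequences):
    print(gs_id, start, len(chunk))
print(gs_durations_s(gait_sequences, SAMPLING_RATE_HZ))
```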

Step 2: Initial Contact Detection

          ic
step_id
0         45
1        129
2        235
3        308
4        366
5        421
6        476
7        531
8        590
9        646
10       710
11       771
12       847
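The IC positions are sample indices relative to the start of the gait sequence. The time between consecutive ICs (the step time) follows directly from their differences; the minimal sketch below assumes the 100 Hz sampling rate of this recording.

```python
SAMPLING_RATE_HZ = 100  # assumption: inferred sampling rate of this recording

# IC positions of the first gait sequence, copied from the table above.
ic_samples = [45, 129, 235, 308, 366, 421, 476, 531, 590, 646, 710, 771, 847]


def step_times_s(ics, sampling_rate_hz):
    """Time between consecutive initial contacts, in seconds."""
    return [(b - a) / sampling_rate_hz for a, b in zip(ics, ics[1:])]


steps = step_times_s(ic_samples, SAMPLING_RATE_HZ)
print([round(t, 2) for t in steps])
```

For this gait sequence, the first two step times (0.84 s and 1.06 s) are noticeably longer than the 0.55 s steps in the middle of the sequence.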


Step 2.5: Laterality Detection

For each IC, we detect the laterality, i.e., whether it belongs to the left or the right foot.

Gait Sequence Refinement

After detecting the ICs within the gait sequence, we can use them to refine the gait sequence. In essence, we restrict the gait sequence to the region between the first and the last IC. This is meant to ensure that the subsequent steps only receive data that contains detectable gait.
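A minimal sketch of the refinement idea (an illustration, not mobgap's `refine_gs` implementation): the refined gait sequence starts at the first IC and ends at the last IC, both shifted by the gait sequence start, and the ICs are re-referenced to the new start. For the first gait sequence (start 750, ICs from 45 to 847), this gives samples 795 to 1597, the same start and end as the first walking bout reported further down.

```python
def refine_gait_sequence(gs_start, ic_relative):
    """Restrict a gait sequence to the span between its first and last IC.

    gs_start: absolute start sample of the original gait sequence.
    ic_relative: sorted IC positions relative to gs_start.
    Returns (refined_start, refined_end, ICs relative to refined_start).
    """
    if not ic_relative:
        raise ValueError("A gait sequence without ICs cannot be refined.")
    first, last = ic_relative[0], ic_relative[-1]
    refined_ics = [ic - first for ic in ic_relative]
    return gs_start + first, gs_start + last, refined_ics


ics = [45, 129, 235, 308, 366, 421, 476, 531, 590, 646, 710, 771, 847]
refined_start, refined_end, refined_ics = refine_gait_sequence(750, ics)
print(refined_start, refined_end, refined_ics[:3])
```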

Step 3: Cadence Calculation

                    cadence_spm
sec_center_samples
50                    63.157895
150                   82.191781
250                  103.448276
350                  109.090909
450                  105.263158
550                  107.142857
650                   96.000000
750                   78.947368
850                   78.947368
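According to its parameters printed further down, `CadFromIc` smooths step times with a Hampel filter and interpolates gaps of up to 3 s. The simplified sketch below leaves both out and keeps only the core idea: ICs are re-referenced to the first IC (the refined gait sequence), each step is assigned to the one-second window in which it starts, and the mean step time per window is converted to cadence as 60 / step time. This binning rule is our assumption; with the ICs from Step 2, it reproduces the first eight values of the table above, for example 60 / mean(0.84 s, 1.06 s) ≈ 63.157895 steps/min.

```python
import math

SAMPLING_RATE_HZ = 100


def cadence_per_second(ic_samples, sampling_rate_hz):
    """Simplified per-second cadence (steps/min) from IC sample positions.

    - ICs are re-referenced to the first IC (the refined gait sequence).
    - Each step (IC to next IC) is assigned to the one-second window in
      which it starts.
    - cadence = 60 / mean step time of the window.
    Windows without a step start map to None; no Hampel smoothing and no gap
    interpolation are applied (unlike CadFromIc).
    Returns {window center sample: cadence}.
    """
    first = ic_samples[0]
    rel = [ic - first for ic in ic_samples]
    n_windows = math.ceil(rel[-1] / sampling_rate_hz)
    step_times = {k: [] for k in range(n_windows)}
    for start, end in zip(rel, rel[1:]):
        step_times[start // sampling_rate_hz].append((end - start) / sampling_rate_hz)
    result = {}
    for k, times in step_times.items():
        center = k * sampling_rate_hz + sampling_rate_hz // 2
        result[center] = 60 / (sum(times) / len(times)) if times else None
    return result


ics = [45, 129, 235, 308, 366, 421, 476, 531, 590, 646, 710, 771, 847]
for center, cadence in cadence_per_second(ics, SAMPLING_RATE_HZ).items():
    print(center, cadence)
```

The last window (center 850) contains no step start, so the sketch returns None there; in the table, it carries the same value as its neighbour, consistent with mobgap filling such gaps.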


Step 4: Stride Length Calculation

                    stride_length_m
sec_center_samples
50                         0.739409
150                        0.942794
250                        1.156495
350                        1.180773
450                        1.098688
550                        1.031297
650                        0.931708
750                        0.713847
850                        0.713847
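As its name suggests, `SlZijlstra` builds on the inverted-pendulum model of Zijlstra and Hof, in which the step length is estimated from the vertical excursion h of the centre of mass and a pendulum length l as 2 * sqrt(2 * l * h - h^2). The sketch below shows only this core formula. The scaling factor 1.14675 is taken from the `step_length_scaling_factor` printed with the pipeline parameters further down; treating a stride as two consecutive steps, and all input values, are assumptions for illustration. How mobgap obtains h (presumably by integrating the vertical acceleration, given the acceleration and speed filters listed in its parameters) and l is not shown.

```python
import math


def inverted_pendulum_step_length(h_m, l_m, scaling_factor=1.0):
    """Step length in metres from the inverted-pendulum model.

    h_m: vertical excursion of the centre of mass during the step (m).
    l_m: pendulum length (m); how mobgap derives it is not shown here.
    The geometric estimate 2 * sqrt(2*l*h - h**2) is multiplied by an
    empirical scaling factor.
    """
    if not 0.0 <= h_m <= 2.0 * l_m:
        raise ValueError("h_m must lie between 0 and 2 * l_m")
    return scaling_factor * 2.0 * math.sqrt(2.0 * l_m * h_m - h_m**2)


def stride_lengths(step_lengths_m):
    """Assumption for this sketch: a stride is two consecutive steps."""
    return [a + b for a, b in zip(step_lengths_m, step_lengths_m[1:])]


# Hypothetical inputs: excursions of 2.0 and 2.5 cm, pendulum length 0.95 m.
steps = [
    inverted_pendulum_step_length(h, 0.95, scaling_factor=1.14675)
    for h in (0.02, 0.025)
]
print(steps, stride_lengths(steps))
```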


Step 5: Walking Speed Calculation

Finally, we can calculate the walking speed. This could be done by another sophisticated algorithm that estimates walking speed from the raw data. However, in the Mobilise-D pipeline, we opted to base the walking speed calculation on cadence and stride length to avoid a walking speed estimate that is not coherent with the other results.

To allow for changes in the future and for different walking speed calculation algorithms, we still encapsulate the walking speed calculation in a separate class that takes all previous results as input.

                    walking_speed_mps
sec_center_samples
50                           0.389163
150                          0.645749
250                          0.996979
350                          1.073430
450                          0.963761
550                          0.920801
650                          0.745366
750                          0.469636
850                          0.469636
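The three tables are linked by a simple relation: cadence in steps/min divided by 60 gives steps/s, and since one stride consists of two steps, each step covers half a stride length. Hence, walking speed = cadence / 60 × stride length / 2. The check below (our reading of the output, not a copy of `WsNaive`) reproduces the first two rows of the walking speed table from the cadence and stride length tables.

```python
def walking_speed_mps(cadence_spm, stride_length_m):
    """Walking speed (m/s) from cadence (steps/min) and stride length (m).

    cadence / 60 gives steps per second; one stride is two steps, so each
    step covers stride_length / 2.
    """
    return cadence_spm / 60 * stride_length_m / 2


# First two rows of the cadence and stride length tables above.
cadence = [63.157895, 82.191781]
stride_length = [0.739409, 0.942794]
speeds = [walking_speed_mps(c, s) for c, s in zip(cadence, stride_length)]
print([round(v, 6) for v in speeds])
```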


Step 6: Turn Detection

Independent of all other parameters, we detect turns in the gait sequence. This does not influence the calculation of the other parameters and is only used to obtain the n_turns parameter. Note that turn detection is usually performed on the non-refined gait sequence, but this should not make a meaningful difference.

Warning

The turning algorithm has not been evaluated. If you plan to use detailed turning information, you should perform your own evaluation for your specific use case.

         start  end  duration_s  angle_deg direction
turn_id
0          552  901        3.49   -70.0964     right
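The turn table can be read as follows: the turn spans samples 552 to 901, i.e., (901 - 552) / 100 Hz = 3.49 s, and a negative angle is reported as a right turn. The sketch below is a heavily simplified threshold-and-integrate detector inspired by the `TdElGohary` parameters printed further down (`lower_threshold_velocity_dps=5`, `min_peak_angle_velocity_dps=15`, `allowed_turn_angle_deg=(45, inf)`, `allowed_turn_duration_s=(0.5, 10)`). It is not El-Gohary's algorithm: it skips the low-pass `smoothing_filter` and the `min_gap_between_turns_s` handling, and it expects a yaw-rate signal in deg/s as input. Labelling positive angles as "left" is our assumption.

```python
def detect_turns(yaw_rate_dps, sampling_rate_hz, lower_dps=5.0, min_peak_dps=15.0,
                 min_angle_deg=45.0, duration_range_s=(0.5, 10.0)):
    """Simplified turn detection on a yaw angular-velocity signal in deg/s.

    1. Find contiguous regions where |yaw rate| exceeds `lower_dps`.
    2. Keep regions whose peak |yaw rate| reaches `min_peak_dps`.
    3. Integrate the yaw rate (rectangle rule) to get the turn angle and keep
       turns with |angle| >= `min_angle_deg` and a duration in `duration_range_s`.
    """
    yaw = list(yaw_rate_dps)
    dt = 1.0 / sampling_rate_hz
    turns, start = [], None
    for i, w in enumerate(yaw + [0.0]):  # trailing 0 closes a region at the end
        if abs(w) > lower_dps:
            if start is None:
                start = i
            continue
        if start is None:
            continue
        region = yaw[start:i]
        angle = sum(region) * dt
        duration = (i - start) / sampling_rate_hz
        if (max(abs(v) for v in region) >= min_peak_dps
                and abs(angle) >= min_angle_deg
                and duration_range_s[0] <= duration <= duration_range_s[1]):
            # Negative angle -> "right" (as in the table above); positive -> "left".
            turns.append({"start": start, "end": i, "duration_s": duration,
                          "angle_deg": angle,
                          "direction": "right" if angle < 0 else "left"})
        start = None
    return turns


# Synthetic signal at 100 Hz: 1 s still, 2 s turning at -40 deg/s, 1 s still.
signal = [0.0] * 100 + [-40.0] * 200 + [0.0] * 100
print(detect_turns(signal, 100))
```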


After going through the steps for a single gait sequence, we would then put all the data together to calculate the final results per WB. But let’s first put all the processing into an easy-to-read loop.

Actual Pipeline

We first define all the algorithms we want to use.

from mobgap.cadence import CadFromIc
from mobgap.gait_sequences import GsdIluz
from mobgap.initial_contacts import IcdShinImproved, refine_gs
from mobgap.laterality import LrcUllrich
from mobgap.stride_length import SlZijlstra
from mobgap.turning import TdElGohary
from mobgap.walking_speed import WsNaive

gsd = GsdIluz()
icd = IcdShinImproved()
lrc = LrcUllrich()
cad = CadFromIc()
sl = SlZijlstra()
speed = WsNaive()
turn = TdElGohary()

Then we calculate the gait sequences as before.

Note that some of the algorithms might need the participant metadata. Hence, we pass it as keyword arguments to all algorithms.

Then we use a nested iterator to go through all the gait sequences and process them. To learn more about this iterator, check out the example about the Gait Sequence Iterator. Note that we use the special r object to store the results of each step and the subregion method to elegantly handle the refined gait sequence.

Now we can access all accumulated and offset-corrected results from the iterator.
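The offset correction itself is simple: positions computed inside a gait sequence are relative to its start, so adding the gait sequence start gives positions in the full recording. For the first gait sequence (start 750), the relative ICs 45, 129, 235, 308 and 366 from Step 2 become 795, 879, 985, 1058 and 1116, which are the first values in the table below. The hypothetical helpers here only illustrate what the iterator takes care of for us.

```python
def to_absolute(gs_start, relative_positions):
    """Shift sample positions from gait-sequence-relative to recording-absolute."""
    return [gs_start + p for p in relative_positions]


def collect_ics(per_gs_results):
    """Concatenate per-GS IC lists into (gs_id, step_id, absolute_ic) rows.

    per_gs_results: iterable of (gs_id, gs_start, relative_ics).
    """
    rows = []
    for gs_id, gs_start, relative_ics in per_gs_results:
        for step_id, ic in enumerate(to_absolute(gs_start, relative_ics)):
            rows.append((gs_id, step_id, ic))
    return rows


# First five ICs of gait sequence 0 (relative to its start at sample 750).
rows = collect_ics([(0, 750, [45, 129, 235, 308, 366])])
print(rows)
```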

ic lr_label
gs_id step_id
0 0 795 left
1 879 left
2 985 left
3 1058 left
4 1116 right
... ... ... ...
4 10 22094 left
11 22219 right
12 22313 right
13 22348 left
14 22436 left

94 rows × 2 columns



We combine all per-sec results into one.

Note that we remove the r_gs_id index, as we no longer need it and each original gait sequence maps to exactly one refined gait sequence anyway. If there were multiple refined gait sequences per original gait sequence, we would need to keep the r_gs_id index.

cadence_spm stride_length_m walking_speed_mps
gs_id sec_center_samples
0 845 63.157895 0.739409 0.389163
945 82.191781 0.942794 0.645749
1045 103.448276 1.156495 0.996979
1145 109.090909 1.180773 1.073430
1245 105.263158 1.098688 0.963761
... ... ... ... ...
4 22016 63.829787 0.974711 0.518463
22116 63.829787 1.045141 0.555926
22216 63.829787 1.076485 0.572598
22316 68.181818 0.611750 0.347585
22416 68.181818 0.783434 0.445133

63 rows × 3 columns



Using the combined results, we want to define walking bouts. As walking bouts in the context of Mobilise-D are defined based on strides, we need to turn the ICs into strides and the per-second values into per-stride values by using interpolation. We also calculate the stride duration here.

from mobgap.laterality import strides_list_from_ic_lr_list

stride_list = (
    results.ic_list.groupby("gs_id", group_keys=False)
    .apply(strides_list_from_ic_lr_list)
    .assign(
        stride_duration_s=lambda df_: (df_.end - df_.start) / sampling_rate_hz
    )
)
stride_list
start end lr_label stride_duration_s
gs_id s_id
0 0 795 879 left 0.84
1 879 985 left 1.06
2 985 1058 left 0.73
3 1058 1171 left 1.13
4 1116 1226 right 1.10
... ... ... ... ... ...
4 8 21920 22219 right 2.99
9 22017 22094 left 0.77
10 22094 22348 left 2.54
11 22219 22313 right 0.94
13 22348 22436 left 0.88

84 rows × 4 columns
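The stride list pairs each IC with the next IC of the same foot, and the stride id is the index of the starting IC within its gait sequence. For example, in gait sequence 4, IC 10 (22094, left) is paired with the next left IC, 22348, giving a 2.54 s stride. The sketch below implements this pairing in plain Python as our reading of `strides_list_from_ic_lr_list`, not its actual code.

```python
def strides_from_ics(ics, labels, sampling_rate_hz):
    """Pair each IC with the next IC of the same foot to form strides.

    ics: sorted IC sample positions of one gait sequence.
    labels: "left"/"right" for each IC.
    The stride id is the index of the starting IC; an IC without a later
    same-side IC does not start a stride.
    """
    strides = []
    for i, (start, side) in enumerate(zip(ics, labels)):
        for end, other_side in zip(ics[i + 1:], labels[i + 1:]):
            if other_side == side:
                strides.append({
                    "s_id": i,
                    "start": start,
                    "end": end,
                    "lr_label": side,
                    "stride_duration_s": (end - start) / sampling_rate_hz,
                })
                break
    return strides


# The last five ICs of gait sequence 4 (step_id 10 to 14 in the IC table).
ics = [22094, 22219, 22313, 22348, 22436]
labels = ["left", "right", "right", "left", "left"]
for stride in strides_from_ics(ics, labels, 100):
    print(stride)
```

Because the sketch works on a local list, the ids come out as 0, 1 and 3; in the table above, the same strides appear as s_id 10, 11 and 13. ICs without a later IC of the same foot (here the ones at 22313 and 22436) do not start a stride.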



This initial stride list is completely unfiltered and may contain very long strides in areas where initial contacts were not detected or where the participant briefly stopped walking. The stride list is filtered later as part of the WB assembly.

For now, we use a simple (naive) interpolation to map the per-second values (cadence, stride length, and walking speed) to per-stride values and derive approximate stride parameters.

from mobgap.utils.df_operations import create_multi_groupby

stride_list_with_approx_paras = create_multi_groupby(
    stride_list,
    combined_results,
    "gs_id",
    group_keys=False,
).apply(naive_sec_paras_to_regions, sampling_rate_hz=sampling_rate_hz)

stride_list_with_approx_paras
start end lr_label stride_duration_s cadence_spm stride_length_m walking_speed_mps
gs_id s_id
0 0 795 879 left 0.84 63.157895 0.739409 0.389163
1 879 985 left 1.06 79.318741 0.912094 0.607019
2 985 1058 left 0.73 100.536427 1.127221 0.948865
3 1058 1171 left 1.13 107.243321 1.172824 1.048397
4 1116 1226 right 1.10 108.012179 1.157640 1.042524
... ... ... ... ... ... ... ... ...
4 8 21920 22219 right 2.99 63.526077 1.015147 0.537524
9 22017 22094 left 0.77 63.829787 1.000322 0.532086
10 22094 22348 left 2.54 65.234774 0.917567 0.495230
11 22219 22313 right 0.94 66.005803 0.844118 0.460092
13 22348 22436 left 0.88 68.181818 0.748316 0.425180

84 rows × 7 columns
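Concretely, the values in this table are reproduced by a simple scheme: each per-second value is treated as constant over its one-second window of the refined gait sequence (which starts at sample 795 for gait sequence 0), and every stride receives the average of the window values, weighted by how many of the stride's samples fall into each window. For the second stride (samples 879 to 985, i.e., 84 to 190 relative to the refined start), this gives (16 × 63.157895 + 90 × 82.191781) / 106 ≈ 79.318741. The sketch below implements this reading; it is inferred from the output and is not the code of `naive_sec_paras_to_regions`.

```python
def stride_value(sec_values, stride_start, stride_end, sampling_rate_hz):
    """Overlap-weighted mean of per-second values for one stride.

    sec_values[k] is assumed constant over samples
    [k * sampling_rate_hz, (k + 1) * sampling_rate_hz), relative to the start
    of the refined gait sequence; stride_start/stride_end use the same
    reference (end exclusive).
    """
    total, weight = 0.0, 0
    for k, value in enumerate(sec_values):
        win_start = k * sampling_rate_hz
        win_end = win_start + sampling_rate_hz
        overlap = min(stride_end, win_end) - max(stride_start, win_start)
        if overlap > 0:
            total += overlap * value
            weight += overlap
    return total / weight if weight else float("nan")


cadence_sec = [63.157895, 82.191781, 103.448276]  # first three per-second values
refined_start = 795
for start, end in [(795, 879), (879, 985), (985, 1058)]:
    value = stride_value(cadence_sec, start - refined_start, end - refined_start, 100)
    print(start, end, round(value, 6))
```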



Next, the strides are regrouped into walking bouts. For this, we ignore which gait sequence a stride belongs to. Hence, we remove gs_id from the index (combining it with s_id into a single string ID) but keep it as a column (original_gs_id) for debugging.

from mobgap.wba import StrideSelection, WbAssembly

flat_index = pd.Index(
    [
        "_".join(str(e) for e in s_id)
        for s_id in stride_list_with_approx_paras.index
    ],
    name="s_id",
)
stride_list_with_approx_paras = (
    stride_list_with_approx_paras.reset_index("gs_id")
    .rename(columns={"gs_id": "original_gs_id"})
    .set_index(flat_index)
)

Then we apply the stride selection (note that we have additional rules in case the stride length is available) and then group the remaining strides into walking bouts.

original_gs_id start end lr_label stride_duration_s cadence_spm stride_length_m walking_speed_mps
wb_id s_id
0 0_0 0 795 879 left 0.84 63.157895 0.739409 0.389163
0_1 0 879 985 left 1.06 79.318741 0.912094 0.607019
0_2 0 985 1058 left 0.73 100.536427 1.127221 0.948865
0_3 0 1058 1171 left 1.13 107.243321 1.172824 1.048397
0_4 0 1116 1226 right 1.10 108.012179 1.157640 1.042524
... ... ... ... ... ... ... ... ... ...
3 4_8 4 21920 22219 right 2.99 63.526077 1.015147 0.537524
4_9 4 22017 22094 left 0.77 63.829787 1.000322 0.532086
4_10 4 22094 22348 left 2.54 65.234774 0.917567 0.495230
4_11 4 22219 22313 right 0.94 66.005803 0.844118 0.460092
4_13 4 22348 22436 left 0.88 68.181818 0.748316 0.425180

83 rows × 8 columns
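The stride selection rules used here are listed with the `StrideSelection` parameters printed further down: a stride is kept if its duration lies in (0.2 s, 3.0 s] and, where available, its stride length is above 0.15 m (`inclusive=(False, True)` means the lower bound is exclusive and the upper bound inclusive). The plain-Python sketch below applies the same two checks; it is not the mobgap implementation, and the second and third example strides are hypothetical.

```python
def keep_stride(stride, min_duration_s=0.2, max_duration_s=3.0, min_stride_length_m=0.15):
    """Return True if a stride passes both selection rules.

    Duration: min_duration_s < duration <= max_duration_s.
    Stride length (only checked if present): > min_stride_length_m.
    """
    duration = stride["stride_duration_s"]
    if not min_duration_s < duration <= max_duration_s:
        return False
    stride_length = stride.get("stride_length_m")
    return stride_length is None or stride_length > min_stride_length_m


strides = [
    {"s_id": "4_8", "stride_duration_s": 2.99, "stride_length_m": 1.015147},
    {"s_id": "a", "stride_duration_s": 3.5, "stride_length_m": 1.0},  # hypothetical: too long
    {"s_id": "b", "stride_duration_s": 1.0, "stride_length_m": 0.1},  # hypothetical: too short
]
print([s["s_id"] for s in strides if keep_stride(s)])
```

This is why stride 4_8, with a duration of 2.99 s, is still part of the last WB above.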



We also have meta information about the WBs available.

per_wb_params = wba.wb_meta_parameters_
per_wb_params.drop(columns="rule_obj").T
wb_id               0          1          2          3
start             795       4668      12944      20161
end              1597       6049      14819      22436
n_strides          11         17         30         25
rule_name   max_break  max_break  max_break  max_break
duration_s       8.02      13.81      18.75      22.75
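The WB `duration_s` is (end - start) / sampling rate, e.g., (1597 - 795) / 100 = 8.02 s for WB 0. The assembly rules are listed with the `WbAssembly` parameters printed further down: a WB needs at least 4 strides, including at least 3 left and 3 right ones (`NStridesCriteria`), and is terminated by a break longer than 3 s (`MaxBreakCriteria`). The sketch below is a simplified version of these two rules; measuring a break as the gap between the latest stride end so far and the next stride start is our assumption, and the input strides are synthetic.

```python
def assemble_wbs(strides, sampling_rate_hz, max_break_s=3.0,
                 min_strides=4, min_left=3, min_right=3):
    """Group strides into walking bouts (simplified).

    strides: dicts with "start", "end" (samples) and "lr_label".
    A new WB starts when the gap between the latest stride end so far and the
    next stride start exceeds max_break_s. WBs failing the stride-count rules
    are dropped.
    """
    groups, current, current_end = [], [], 0
    for s in sorted(strides, key=lambda s: s["start"]):
        if current and (s["start"] - current_end) / sampling_rate_hz > max_break_s:
            groups.append(current)
            current = []
        current_end = max(current_end, s["end"]) if current else s["end"]
        current.append(s)
    if current:
        groups.append(current)

    wbs = []
    for group in groups:
        n_left = sum(s["lr_label"] == "left" for s in group)
        n_right = sum(s["lr_label"] == "right" for s in group)
        if len(group) >= min_strides and n_left >= min_left and n_right >= min_right:
            start = group[0]["start"]
            end = max(s["end"] for s in group)
            wbs.append({"start": start, "end": end, "n_strides": len(group),
                        "duration_s": (end - start) / sampling_rate_hz})
    return wbs


# Synthetic input: 8 alternating strides, then 2 strides after a 15.5 s pause.
strides = [{"start": 50 * i, "end": 50 * i + 100,
            "lr_label": "left" if i % 2 == 0 else "right"} for i in range(8)]
strides += [{"start": 2000, "end": 2100, "lr_label": "left"},
            {"start": 2050, "end": 2150, "lr_label": "right"}]
print(assemble_wbs(strides, 100))
```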


We extend them further with the per-stride parameters.

params_to_aggregate = [
    "stride_duration_s",
    "cadence_spm",
    "stride_length_m",
    "walking_speed_mps",
]
per_wb_params = pd.concat(
    [
        per_wb_params,
        final_strides.reindex(columns=params_to_aggregate)
        .groupby(["wb_id"])
        .mean(),
    ],
    axis=1,
)

per_wb_params.drop(columns="rule_obj").T
wb_id                      0          1           2          3
start                    795       4668       12944      20161
end                     1597       6049       14819      22436
n_strides                 11         17          30         25
rule_name          max_break  max_break   max_break  max_break
duration_s              8.02      13.81       18.75      22.75
stride_duration_s   0.983636   1.510588       1.213     1.3508
cadence_spm        95.088665  81.602047  100.088146  86.501758
stride_length_m     0.987869   0.816786    0.853348   0.872176
walking_speed_mps   0.801656   0.537168    0.714887   0.622995


For each WB we can then apply thresholds to check if the calculated parameters are within the expected range.

from mobgap.aggregation import apply_thresholds, get_mobilised_dmo_thresholds

thresholds = get_mobilised_dmo_thresholds()

per_wb_params_mask = apply_thresholds(
    per_wb_params,
    thresholds,
    cohort=long_trial.participant_metadata["cohort"],
    height_m=long_trial.participant_metadata["height_m"],
    measurement_condition=long_trial.recording_metadata[
        "measurement_condition"
    ],
)
per_wb_params_mask.T
wb_id                 0     1     2     3
start               NaN   NaN   NaN   NaN
end                 NaN   NaN   NaN   NaN
n_strides           NaN   NaN   NaN   NaN
rule_name           NaN   NaN   NaN   NaN
rule_obj            NaN   NaN   NaN   NaN
duration_s          NaN   NaN   NaN   NaN
stride_duration_s  True  True  True  True
cadence_spm        True  True  True  True
stride_length_m    True  True  True  True
walking_speed_mps  True  True  True  True


For each parameter, we get either NaN (parameter not checked) or a True/False value indicating whether it lies within the expected range.
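Conceptually, each check compares a per-WB value against a cohort- and condition-specific [min, max] range. The sketch below uses the MS `free_living` bounds from the threshold table printed in the pipeline output further down; which condition is applied to this lab recording depends on `measurement_condition`, and inclusive bounds are an assumption. With the values of WB 0, all four checks pass, in line with the mask above, and unchecked parameters map to None (the NaN in the mask).

```python
# MS / free_living bounds copied from the threshold table printed further down.
MS_FREE_LIVING_BOUNDS = {
    "cadence_spm": (38.443211, 155.394496),
    "walking_speed_mps": (0.085918, 1.920262),
    "stride_length_m": (0.191099, 1.940537),
    "stride_duration_s": (0.775597, 3.057750),
}


def check_thresholds(wb_params, bounds):
    """Map each parameter to True/False (within bounds) or None (not checked)."""
    return {
        name: (bounds[name][0] <= value <= bounds[name][1]) if name in bounds else None
        for name, value in wb_params.items()
    }


# Per-WB values of WB 0 from the table above.
wb0 = {"duration_s": 8.02, "stride_duration_s": 0.983636, "cadence_spm": 95.088665,
       "stride_length_m": 0.987869, "walking_speed_mps": 0.801656}
print(check_thresholds(wb0, MS_FREE_LIVING_BOUNDS))
```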

This output, together with the per-WB parameters, would then normally be used in an aggregation step to calculate single values per participant, day, or other grouping criterion. Depending on the use case, this aggregation can be performed within the “per-recording” pipeline or as a separate step after processing all recordings.

Here, we perform it per recording and calculate summary values across all WBs.

from mobgap.aggregation import MobilisedAggregator

agg = MobilisedAggregator(
    **MobilisedAggregator.PredefinedParameters.single_recording
)
agg_results = agg.aggregate(
    per_wb_params, wb_dmos_mask=per_wb_params_mask
).aggregated_data_
agg_results.T
all_wbs
wb_all__count 4
total_walking_duration_h 0.017592
wb_all__duration_s__avg 16.28
wb_all__duration_s__max 21.55
wb_all__duration_s__var 0.401938
wb_all__cadence_spm__avg 90.820154
wb_all__stride_duration_s__avg 1.264506
wb_all__cadence_spm__var 0.091625
wb_all__stride_duration_s__var 0.176564
wb_10_30__count 3
wb_10_30__walking_speed_mps__avg 0.625017
wb_10_30__stride_length_m__avg 0.847437
wb_10__count 3
wb_10__walking_speed_mps__max 0.696509
wb_30__count 0
wb_30__walking_speed_mps__avg NaN
wb_30__stride_length_m__avg NaN
wb_30__cadence_spm__avg NaN
wb_30__stride_duration_s__avg NaN
wb_30__walking_speed_mps__max NaN
wb_30__cadence_spm__max NaN
wb_30__walking_speed_mps__var NaN
wb_30__stride_length_m__var NaN
wb_60__count 0
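Several of these numbers can be recomputed from the per-WB table, which helps to interpret the column names. `total_walking_duration_h` is the sum of the four WB durations (63.33 s) in hours. For `duration_s`, however, the values labelled `__avg`, `__max` and `__var` match the median, the 90th percentile (linear interpolation) and the coefficient of variation (sample standard deviation divided by the mean), while `wb_all__cadence_spm__avg` is the plain mean of the per-WB cadences. This is our reading of the output, not documentation of `MobilisedAggregator`.

```python
import statistics

# WB durations (s) and per-WB mean cadences (steps/min) from the per-WB table.
wb_durations_s = [8.02, 13.81, 18.75, 22.75]
wb_cadences_spm = [95.088665, 81.602047, 100.088146, 86.501758]

total_walking_duration_h = sum(wb_durations_s) / 3600
duration_median_s = statistics.median(wb_durations_s)
# Last of the 9 decile cut points = 90th percentile; method="inclusive"
# interpolates linearly between the sorted data points.
duration_p90_s = statistics.quantiles(wb_durations_s, n=10, method="inclusive")[-1]
# Coefficient of variation: sample standard deviation divided by the mean.
duration_cv = statistics.stdev(wb_durations_s) / statistics.mean(wb_durations_s)
cadence_mean_spm = statistics.mean(wb_cadences_spm)

print(total_walking_duration_h, duration_median_s, duration_p90_s,
      duration_cv, cadence_mean_spm)
```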


Running as a single pipeline

The steps shown above are exactly the steps performed in the Mobilise-D pipeline. Hence, we can also use the pre-built pipeline to perform the same steps with the same algorithm instances. This is shown below.

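from mobgap.pipeline import GenericMobilisedPipeline

# `ss` (StrideSelection) and `wba` (WbAssembly) are assumed to be the instances
# configured in the stride selection / WB assembly step above.
pipeline = GenericMobilisedPipeline(
    gait_sequence_detection=gsd,
    initial_contact_detection=icd,
    laterality_classification=lrc,
    cadence_calculation=cad,
    stride_length_calculation=sl,
    turn_detection=turn,
    walking_speed_calculation=speed,  # the WsNaive instance defined above
    stride_selection=ss,
    wba=wba,
    dmo_thresholds=thresholds,
    dmo_aggregation=agg,
)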

pipeline.safe_run(long_trial)
GenericMobilisedPipeline(cadence_calculation=CadFromIc(max_interpolation_gap_s=3, step_time_smoothing=HampelFilter(half_window_size=2, n_sigmas=3.0)), dmo_aggregation=MobilisedAggregator(groupby=None, unique_wb_id_column='wb_id', use_original_names=False), dmo_thresholds=condition                free_living              ...     global
threshold_type                   min         max  ...        min      max
dmo               cohort                          ...
cadence_spm       CHF      40.594770  167.942930  ...  40.000000  172.900
                  COPD     38.880484  151.558718  ...  40.000000  172.900
                  HA       35.751898  156.946955  ...  40.000000  172.900
                  MS       38.443211  155.394496  ...  40.000000  172.900
                  PD       40.957969  142.121947  ...  40.000000  172.900
                  PFF      42.191719  157.613665  ...  40.000000  172.900
walking_speed_mps CHF       0.112478    1.757103  ...   0.081515    2.220
                  COPD      0.090731    1.641773  ...   0.081515    2.220
                  HA        0.097413    1.965728  ...   0.081515    2.220
                  MS        0.085918    1.920262  ...   0.081515    2.220
                  PD        0.081515    1.735348  ...   0.081515    2.220
                  PFF       0.104547    1.553186  ...   0.081515    2.220
stride_length_m   CHF       0.185308    2.166646  ...   0.150523    2.190
                  COPD      0.176403    1.711645  ...   0.150523    2.190
                  HA        0.155126    2.024694  ...   0.150523    2.190
                  MS        0.191099    1.940537  ...   0.150523    2.190
                  PD        0.150523    1.982926  ...   0.150523    2.190
                  PFF       0.206787    1.697251  ...   0.150523    2.190
stride_duration_s CHF       0.702000    3.030857  ...   0.460000    3.000
                  COPD      0.770000    3.000000  ...   0.460000    3.000
                  HA        0.735435    3.254400  ...   0.460000    3.000
                  MS        0.775597    3.057750  ...   0.460000    3.000
                  PD        0.836253    2.928000  ...   0.460000    3.000
                  PFF       0.744000    2.769000  ...   0.460000    3.000
step_duration_s   CHF       0.376000    1.504500  ...   0.140000    2.124
                  COPD      0.390400    1.788000  ...   0.140000    2.124
                  HA        0.367972    1.730400  ...   0.140000    2.124
                  MS        0.388084    1.905000  ...   0.140000    2.124
                  PD        0.417216    1.605600  ...   0.140000    2.124
                  PFF       0.371429    2.124000  ...   0.140000    2.124

[30 rows x 8 columns], gait_sequence_detection=GsdIluz(acc_v_standing_threshold=4.903325, allowed_acc_v_change_per_window=0.15, allowed_steps_per_s=(0.5, 3), mean_activity_threshold=-0.980665, min_gsd_duration_s=5, pre_filter=FirFilter(cutoff_freq_hz=(0.5, 3), filter_type='bandpass', order=200, window='hamming', zero_phase=True), sin_template_freq_hz=2, std_activity_threshold=0.0980665, step_detection_thresholds=(3.92266, 14.709975), window_length_s=3, window_overlap=0.5), initial_contact_detection=IcdShinImproved(axis='norm'), laterality_classification=LrcUllrich(clf_pipe=Pipeline(steps=[('scaler_old', MinMaxScaler()),
                ('clf_old', SVC(C=0.1, kernel='linear'))]), smoothing_filter=ButterworthFilter(cutoff_freq_hz=(0.5, 2), filter_type='bandpass', order=4, zero_phase=True)), recommended_cohorts=None, stride_length_calculation=SlZijlstra(acc_smoothing=ButterworthFilter(cutoff_freq_hz=0.1, filter_type='highpass', order=4, zero_phase=True), max_interpolation_gap_s=3, orientation_method=None, speed_smoothing=ButterworthFilter(cutoff_freq_hz=1, filter_type='highpass', order=4, zero_phase=True), step_length_scaling_factor=1.14675, step_length_smoothing=HampelFilter(half_window_size=2, n_sigmas=3.0)), stride_selection=StrideSelection(incompatible_rules='warn', rules=[('stride_duration_thres', IntervalDurationCriteria(inclusive=(False, True), max_duration_s=3.0, min_duration_s=0.2)), ('stride_length_thres', IntervalParameterCriteria(inclusive=(False, True), lower_threshold=0.15, parameter='stride_length_m', upper_threshold=None))]), turn_detection=TdElGohary(allowed_turn_angle_deg=(45, inf), allowed_turn_duration_s=(0.5, 10), lower_threshold_velocity_dps=5, min_gap_between_turns_s=0.05, min_peak_angle_velocity_dps=15, orientation_estimation=None, smoothing_filter=ButterworthFilter(cutoff_freq_hz=0.5, filter_type='lowpass', order=4, zero_phase=True)), walking_speed_calculation=WsNaive(), wba=WbAssembly(rules=[('min_strides', NStridesCriteria(min_strides=4, min_strides_left=3, min_strides_right=3)), ('max_break', MaxBreakCriteria(consider_end_as_break=True, max_break_s=3, remove_last_ic=False))]))

The results are stored on the pipeline object, and essentially all intermediate results shown above are available there as well.

For example, the per-stride parameters:

start end lr_label stride_duration_s cadence_spm stride_length_m walking_speed_mps
gs_id s_id
0 0 795 879 left 0.84 63.157895 0.739409 0.389163
1 879 985 left 1.06 79.318741 0.912094 0.607019
2 985 1058 left 0.73 100.536427 1.127221 0.948865
3 1058 1171 left 1.13 107.243321 1.172824 1.048397
4 1116 1226 right 1.10 108.012179 1.157640 1.042524
... ... ... ... ... ... ... ... ...
4 8 21920 22219 right 2.99 63.526077 1.015147 0.537524
9 22017 22094 left 0.77 63.829787 1.000322 0.532086
10 22094 22348 left 2.54 65.234774 0.917567 0.495230
11 22219 22313 right 0.94 66.005803 0.844118 0.460092
13 22348 22436 left 0.88 68.181818 0.748316 0.425180

84 rows × 7 columns



The per-WB parameters:

start end n_strides rule_name rule_obj duration_s stride_duration_s cadence_spm stride_length_m walking_speed_mps
wb_id
0 795 1597 11 max_break MaxBreakCriteria(consider_end_as_break=True, m... 8.02 0.983636 95.088665 0.987869 0.801656
1 4668 6049 17 max_break MaxBreakCriteria(consider_end_as_break=True, m... 13.81 1.510588 81.602047 0.816786 0.537168
2 12944 14819 30 max_break MaxBreakCriteria(consider_end_as_break=True, m... 18.75 1.213000 100.088146 0.853348 0.714887
3 20161 22436 25 max_break MaxBreakCriteria(consider_end_as_break=True, m... 22.75 1.350800 86.501758 0.872176 0.622995


And the aggregated parameters:

wb_all__count total_walking_duration_h wb_all__duration_s__avg wb_all__duration_s__max wb_all__duration_s__var wb_all__cadence_spm__avg wb_all__stride_duration_s__avg wb_all__cadence_spm__var wb_all__stride_duration_s__var wb_10_30__count wb_10_30__walking_speed_mps__avg wb_10_30__stride_length_m__avg wb_10__count wb_10__walking_speed_mps__max wb_30__count wb_30__walking_speed_mps__avg wb_30__stride_length_m__avg wb_30__cadence_spm__avg wb_30__stride_duration_s__avg wb_30__walking_speed_mps__max wb_30__cadence_spm__max wb_30__walking_speed_mps__var wb_30__stride_length_m__var wb_60__count
all_wbs 4 0.017592 16.28 21.55 0.401938 90.820154 1.264506 0.091625 0.176564 3 0.625017 0.847437 3 0.696509 0 NaN NaN NaN NaN NaN NaN NaN NaN 0


Total running time of the script: (0 minutes 6.502 seconds)

Estimated memory usage: 10 MB
