The Mobilise-D pipeline: Step-by-Step Breakdown#

This example shows how to build a full gait analysis pipeline using the mobgap package. Note, that we provide pre-built pipelines for common use-cases. Checkout the examples for those, if you want to understand how to use them.

This example is meant to provide a better understanding of the individual steps and should serve as a blueprint to build completely custom pipelines. We are following the pipeline explained in [1] (Fig. 7) step by step.

For more information about the individual steps, please refer to the respective examples for the algorithms.

Load example data#

We load example data from the lab dataset, and we will use a single long-trail from an “MS” participant for this example.

Note, that we directly convert it into the body frame, as bascially all algorithms require the body frame data (and all others support it). We can do that, because we know that the data is already well aligned with the sensor frame conventions. If this would not be the case, the data would need to be rotated/algigned before running through the pipeline.

import pandas as pd
from mobgap.data import LabExampleDataset
from mobgap.utils.conversions import to_body_frame
from mobgap.utils.interpolation import naive_sec_paras_to_regions

lab_example_data = LabExampleDataset(reference_system="INDIP")
long_trial = lab_example_data.get_subset(
    cohort="MS", participant_id="001", test="Test11", trial="Trial1"
)
imu_data = to_body_frame(long_trial.data_ss)
sampling_rate_hz = long_trial.sampling_rate_hz
participant_metadata = long_trial.participant_metadata

Step 1: Gait Sequence Detection#



Starting from here, all the processing will happen per gait sequence. We will go through the steps just for a single gait sequence first and later put everything in a loop.

Step 2: Initial Contact Detection#



Step 2.5: Laterality Detection#

For each IC we want to detect the laterality.

Gait Sequence Refinement#

After detecting the ICs within the gait sequence, we can refine the gait sequence using the ICs. Basically, we restrict the area of the gait sequence to the area between the first and the last IC. This should ensure that the subsequent steps are only getting data that contains detectable gait.

Step 3: Cadence Calculation#



Step 4: Stride Length Calculation#



Step 5: Walking Speed Calculation#

Finally, we can calculate the walking speed. This could be done by another sophisticated algorithm that uses the raw data to estimate the walking speed. However, in the Mobilise-D pipeline we opted to base the walking speed calculation on the cadence and stride length to avoid adding a walking speed that would not be coherent with the other results.

To allow to modify this in the future and to allow for different walking speed calculation algorithms, we still encapsulate the walking speed calculation in a separate class that takes all the previous results as input.



Step 6: Turn Detection#

Independent of all other parameters, we detect turns in the gait sequence. This does not influence the calculation of the other parameters, and is only used to gather the n_turns parameter. Note, that the turning detection is usually done on the non-refined gs. But this shouldn’t really matter.

Warning

The turning algorithm is not evaluated. If you are planning to use detailed turning information, you should perform your own evaluation for your specific use-case.



After going through the steps for a single gait sequence, we would then put all the data together to calculate the final results per WB. But let’s first put all the processing into an easy-to-read loop.

Actual Pipeline#

We first define all the algorithms we want to use.

from mobgap.cadence import CadFromIc
from mobgap.gait_sequences import GsdIluz
from mobgap.initial_contacts import IcdShinImproved, refine_gs
from mobgap.laterality import LrcUllrich
from mobgap.stride_length import SlZijlstra
from mobgap.turning import TdElGohary
from mobgap.walking_speed import WsNaive

gsd = GsdIluz()
icd = IcdShinImproved()
lrc = LrcUllrich()
cad = CadFromIc()
sl = SlZijlstra()
speed = WsNaive()
turn = TdElGohary()

Then we calculate the gait sequences as before.

Note that some of the algorithms might need the participant metadata. Hence, we pass it as keyword argument to all the algorithms.

Then we use a nested iterator to go through all the gait sequences and process them. To learn more about this iterator, check out the example about the Gait Sequence Iterator. Note, that we use the special r object to store the results of each step and the subregion method to elegantly handle the refined gait sequence.

Now we can access all accumulated and offset-corrected results from the iterator.



We combine all per-sec results into one.

Note, that we remove the r_gs_id index, as we don’t need it anymore and each normal gs is mapped to a single refined gs anyway. In case we would have multiple refined gs per normal gs, we might need to keep the r_gs_id index around.



Using the combined results, we want to define walking bouts. As walking bouts in the context of Mobilise-D are defined based on strides, we need to turn the ICs into strides and the per-second values into per-stride values by using interpolation. We also calculate the stride duration here.

from mobgap.laterality import strides_list_from_ic_lr_list

stride_list = (
    results.ic_list.groupby("gs_id", group_keys=False)
    .apply(strides_list_from_ic_lr_list)
    .assign(
        stride_duration_s=lambda df_: (df_.end - df_.start) / sampling_rate_hz
    )
)
stride_list


This initial stride list is completely unfiltered, and might contain very long strides, in areas where initial contacts were not detected, or the participant was not walking for a short moment. The stride list will be filtered later as part of the WB assembly.

For now, we are using linear interpolation to map the per-second cadence values to per-stride values and derive approximated stride parameters.

from mobgap.utils.df_operations import create_multi_groupby

stride_list_with_approx_paras = create_multi_groupby(
    stride_list,
    combined_results,
    "gs_id",
    group_keys=False,
).apply(naive_sec_paras_to_regions, sampling_rate_hz=sampling_rate_hz)

stride_list_with_approx_paras


Now the final strides are regrouped into walking bouts. For this we ignore which gait sequence the strides belong to, hence we remove the gs_id from the index, but keep it around as column for debugging.

from mobgap.wba import StrideSelection, WbAssembly

flat_index = pd.Index(
    [
        "_".join(str(e) for e in s_id)
        for s_id in stride_list_with_approx_paras.index
    ],
    name="s_id",
)
stride_list_with_approx_paras = (
    stride_list_with_approx_paras.reset_index("gs_id")
    .rename(columns={"gs_id": "original_gs_id"})
    .set_index(flat_index)
)

Then we apply the stride selection (note that we have additional rules in case the stride length is available) and then group the remaining strides into walking bouts.



We also have meta information about the WBs available.

per_wb_params = wba.wb_meta_parameters_
per_wb_params.drop(columns="rule_obj").T


We extend them further with the per-stride parameters.

params_to_aggregate = [
    "stride_duration_s",
    "cadence_spm",
    "stride_length_m",
    "walking_speed_mps",
]
per_wb_params = pd.concat(
    [
        per_wb_params,
        final_strides.reindex(columns=params_to_aggregate)
        .groupby(["wb_id"])
        .mean(),
    ],
    axis=1,
)

per_wb_params.drop(columns="rule_obj").T


For each WB we can then apply thresholds to check if the calculated parameters are within the expected range.

from mobgap.aggregation import apply_thresholds, get_mobilised_dmo_thresholds

thresholds = get_mobilised_dmo_thresholds()

per_wb_params_mask = apply_thresholds(
    per_wb_params,
    thresholds,
    cohort=long_trial.participant_metadata["cohort"],
    height_m=long_trial.participant_metadata["height_m"],
    measurement_condition=long_trial.recording_metadata[
        "measurement_condition"
    ],
)
per_wb_params_mask.T


We can see that we either get NaN (for parameters that are not checked) or True/False values for each parameter.

This output together with the per-WB parameters would then normally be used in some aggregation step to calculate single values per participant, day, or other grouping criteria. Depending on the use-case, this aggregation can be performed withing the “per-recording” pipeline or as a separate step after processing all recordings.

Here, we perform it per recording and calculate a single values from all the WBs.

from mobgap.aggregation import MobilisedAggregator

agg = MobilisedAggregator(
    **MobilisedAggregator.PredefinedParameters.single_recording
)
agg_results = agg.aggregate(
    per_wb_params, wb_dmos_mask=per_wb_params_mask
).aggregated_data_
agg_results.T


Running as a single pipeline#

The steps that are shown above, are exactly the steps that are performed in the Mobilise-D pipeline. Hence, we can also use the pre-built pipeline to perform the same steps. This is shown below.

from mobgap.pipeline import GenericMobilisedPipeline

pipeline = GenericMobilisedPipeline(
    gait_sequence_detection=gsd,
    initial_contact_detection=icd,
    laterality_classification=lrc,
    cadence_calculation=cad,
    stride_length_calculation=sl,
    turn_detection=turn,
    walking_speed_calculation=ws,
    stride_selection=ss,
    wba=wba,
    dmo_thresholds=thresholds,
    dmo_aggregation=agg,
)

pipeline.safe_run(long_trial)
GenericMobilisedPipeline(cadence_calculation=CadFromIc(max_interpolation_gap_s=3, step_time_smoothing=HampelFilter(half_window_size=2, n_sigmas=3.0)), dmo_aggregation=MobilisedAggregator(groupby=None, unique_wb_id_column='wb_id', use_original_names=False), dmo_thresholds=condition                free_living              ...     global
threshold_type                   min         max  ...        min      max
dmo               cohort                          ...
cadence_spm       CHF      40.594770  167.942930  ...  40.000000  172.900
                  COPD     38.880484  151.558718  ...  40.000000  172.900
                  HA       35.751898  156.946955  ...  40.000000  172.900
                  MS       38.443211  155.394496  ...  40.000000  172.900
                  PD       40.957969  142.121947  ...  40.000000  172.900
                  PFF      42.191719  157.613665  ...  40.000000  172.900
walking_speed_mps CHF       0.112478    1.757103  ...   0.081515    2.220
                  COPD      0.090731    1.641773  ...   0.081515    2.220
                  HA        0.097413    1.965728  ...   0.081515    2.220
                  MS        0.085918    1.920262  ...   0.081515    2.220
                  PD        0.081515    1.735348  ...   0.081515    2.220
                  PFF       0.104547    1.553186  ...   0.081515    2.220
stride_length_m   CHF       0.185308    2.166646  ...   0.150523    2.190
                  COPD      0.176403    1.711645  ...   0.150523    2.190
                  HA        0.155126    2.024694  ...   0.150523    2.190
                  MS        0.191099    1.940537  ...   0.150523    2.190
                  PD        0.150523    1.982926  ...   0.150523    2.190
                  PFF       0.206787    1.697251  ...   0.150523    2.190
stride_duration_s CHF       0.702000    3.030857  ...   0.460000    3.000
                  COPD      0.770000    3.000000  ...   0.460000    3.000
                  HA        0.735435    3.254400  ...   0.460000    3.000
                  MS        0.775597    3.057750  ...   0.460000    3.000
                  PD        0.836253    2.928000  ...   0.460000    3.000
                  PFF       0.744000    2.769000  ...   0.460000    3.000
step_duration_s   CHF       0.376000    1.504500  ...   0.140000    2.124
                  COPD      0.390400    1.788000  ...   0.140000    2.124
                  HA        0.367972    1.730400  ...   0.140000    2.124
                  MS        0.388084    1.905000  ...   0.140000    2.124
                  PD        0.417216    1.605600  ...   0.140000    2.124
                  PFF       0.371429    2.124000  ...   0.140000    2.124

[30 rows x 8 columns], gait_sequence_detection=GsdIluz(acc_v_standing_threshold=4.903325, allowed_acc_v_change_per_window=0.15, allowed_steps_per_s=(0.6666666666666666, 3), mean_activity_threshold=-0.980665, min_gsd_duration_s=5, pre_filter=FirFilter(cutoff_freq_hz=(0.5, 3), filter_type='bandpass', order=100, window='hamming', zero_phase=True), sin_template_freq_hz=2, std_activity_threshold=0.0980665, step_detection_thresholds=(0.0392266, 0.04903325), use_original_peak_detection=False, window_length_s=3, window_overlap=0.5), initial_contact_detection=IcdShinImproved(axis='norm'), laterality_classification=LrcUllrich(clf_pipe=Pipeline(steps=[('scaler', MinMaxScaler()), ('clf', SVC(C=1, kernel='linear'))]), smoothing_filter=ButterworthFilter(cutoff_freq_hz=(0.5, 2), filter_type='bandpass', order=4, zero_phase=True)), recommended_cohorts=None, stride_length_calculation=SlZijlstra(acc_smoothing=ButterworthFilter(cutoff_freq_hz=0.1, filter_type='highpass', order=4, zero_phase=True), max_interpolation_gap_s=3, orientation_method=None, speed_smoothing=ButterworthFilter(cutoff_freq_hz=1, filter_type='highpass', order=4, zero_phase=True), step_length_scaling_factor=1.14675, step_length_smoothing=HampelFilter(half_window_size=2, n_sigmas=3.0)), stride_selection=StrideSelection(incompatible_rules='warn', rules=[('stride_duration_thres', IntervalDurationCriteria(inclusive=(False, True), max_duration_s=3.0, min_duration_s=0.2)), ('stride_length_thres', IntervalParameterCriteria(inclusive=(False, True), lower_threshold=0.15, parameter='stride_length_m', upper_threshold=None))]), turn_detection=TdElGohary(allowed_turn_angle_deg=(45, inf), allowed_turn_duration_s=(0.5, 10), lower_threshold_velocity_dps=5, min_gap_between_turns_s=0.05, min_peak_angle_velocity_dps=15, orientation_estimation=None, smoothing_filter=ButterworthFilter(cutoff_freq_hz=0.5, filter_type='lowpass', order=4, zero_phase=True)), walking_speed_calculation=WsNaive(), wba=WbAssembly(rules=[('min_strides', NStridesCriteria(min_strides=4, min_strides_left=3, min_strides_right=3)), ('max_break', MaxBreakCriteria(consider_end_as_break=True, max_break_s=3, remove_last_ic=False))]))

The results are stored in the pipeline object. And basically all the individual results that are shown above are also available in the pipeline object.

For example the per stride parameters:



The per-wb parameters:



And the aggregated parameters:



Total running time of the script: (0 minutes 6.052 seconds)

Estimated memory usage: 81 MB

Gallery generated by Sphinx-Gallery