LabExampleDataset#

class mobgap.data.LabExampleDataset( *, raw_data_sensor: Literal['SU', 'INDIP', 'INDIP2'] = 'SU', reference_system: Literal['INDIP', 'Stereophoto'] | None = None, reference_para_level: Literal['wb', 'lwb'] = 'wb', sensor_positions: Sequence[str] = ('LowerBack',), single_sensor_position: str = 'LowerBack', sensor_types: Sequence[Literal['acc', 'gyr', 'mag', 'bar']] = ('acc', 'gyr'), missing_sensor_error_type: Literal['raise', 'warn', 'ignore', 'skip'] = 'raise', missing_reference_error_type: Literal['raise', 'warn', 'ignore', 'skip'] = 'ignore', memory: Memory = Memory(location=None), groupby_cols: list[str] | str | None = None, subset_index: DataFrame | None = None, )[source]#

A dataset containing all lab example data provided with mobgap.

Parameters:

raw_data_sensor: Which sensor to load the raw data for. One of “SU”, “INDIP”, “INDIP2”. SU is usually the “normal” lower back sensor. INDIP and INDIP2 are only available under special circumstances for the Mobilise-D TVS data. Note that loading multiple sensors at once is not supported.
reference_system: When specified, reference gait parameters are loaded using the specified reference system.
sensor_positions: Which sensor positions to load the raw data for. For “SU”, only “LowerBack” is available, but for other sensors, more positions might be available. If a sensor position is not available, an error is raised.
single_sensor_position: The sensor position that is considered the “single sensor”. This is the sensor that you expect to be the input to all pipelines and algorithms. For most Mobilise-D datasets, this should be kept at “LowerBack”, but other sensors are supported as well.
sensor_types: Which sensor types to load the raw data for. This can be used to reduce the amount of data loaded, if only e.g. acc and gyr data is required. Some sensors might only have a subset of the available sensor types. If a sensor type is not available, it is ignored.
missing_sensor_error_type: Whether to throw an error (“raise”), a warning (“warn”) or ignore (“ignore”) when a sensor is missing. In all three cases, the trial is still included in the index, but the IMU data is not available. If you want to skip the trial entirely, set this to “skip”. Then the trial will not appear in the index at all. Note that “skip” will skip the trial if ANY of the specified sensor positions is missing. Specifying “skip” only affects the initial data loading. Changing the value after you have already created a subset of the data has no effect.
missing_reference_error_type: Whether to throw an error (“raise”), a warning (“warn”) or ignore (“ignore”) when reference data is missing. If you want to skip the trial entirely when the reference data is not available, set this to “skip”. Specifying “skip” only affects the initial data loading. Changing the value after you have already created a subset of the data has no effect.
memory: A joblib memory object to cache the results of the data loading. This is highly recommended if you have many large data files. Otherwise, the initial index creation can take a long time.
reference_para_level: Whether to provide “wb” (walking bout) or “lwb” (level-walking bout) reference when loading reference_parameters_. raw_reference_parameters_ will always contain both in an unformatted way.
groupby_cols: Columns to group the data by. See Dataset for details.
subset_index: The selected subset of the index. See Dataset for details.
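The four values of missing_sensor_error_type can be read as a small policy table. The following is only a toy sketch in plain Python and not mobgap's implementation; `build_index`, `load_imu`, and the trial dictionaries are hypothetical names made up for illustration.

```python
import warnings

# Hypothetical trials: "imu" is None when the sensor data is missing.
TRIALS = [
    {"trial": "Trial1", "imu": [0.1, 0.2]},
    {"trial": "Trial2", "imu": None},
]


def build_index(trials, error_type):
    """Return the trial names that end up in the (toy) index."""
    if error_type == "skip":
        # Only "skip" removes trials, and only while the index is built.
        return [t["trial"] for t in trials if t["imu"] is not None]
    return [t["trial"] for t in trials]


def load_imu(trial, error_type):
    """Return the IMU data of one trial, honouring the chosen policy."""
    if trial["imu"] is None:
        if error_type == "raise":
            raise ValueError(f"No IMU data for {trial['trial']}")
        if error_type == "warn":
            warnings.warn(f"No IMU data for {trial['trial']}")
    return trial["imu"]


index_ignore = build_index(TRIALS, "ignore")
index_skip = build_index(TRIALS, "skip")
```

In this sketch, `index_ignore` still lists both trials, while `index_skip` only lists the trial that has data. This mirrors the statement above that “skip” only affects the initial creation of the index.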

Attributes:

data: The raw IMU data of all available sensors. This is a dictionary with the sensor name as key and the data as value.
data_ss: The IMU data of the “single sensor”. Unlike data, this contains only the data of a single sensor. Which sensor is considered the “single sensor” might be different for each dataset. Most datasets use a configuration of single_sensor_name=... to allow the user to select the sensor.
sampling_rate_hz: The sampling rate of the IMU data in Hz.
participant_metadata: General participant metadata. Contains at least the keys listed in ParticipantMetadata.
recording_metadata: General recording metadata. Contains at least the keys listed in RecordingMetadata.
reference_parameters_: Parsed reference parameters. This contains the reference parameters in a format that can be used as input and output to many of the mobgap algorithms. See ReferenceData for details. Note that these reference parameters are expected to be relative to the start of the recording and all timing parameters (like the start and end of a walking bout) are expected to be in samples.
reference_parameters_relative_to_wb_: Same as reference_parameters_, but all timing parameters are relative to the start of the walking bout. This is useful for algorithms that only act in the context of a walking bout.
reference_sampling_rate_hz_: The sampling rate of the reference data in Hz.
participant_metadata_as_df: The participant metadata as a DataFrame. This contains the same information as participant_metadata, but the property can be accessed even when the dataset still contains multiple participants. It contains one row per participant, along with all columns of the index that are required to uniquely identify a single measurement.
recording_metadata_as_df: The recording metadata as a DataFrame. This contains the same information as recording_metadata, but the property can be accessed even when the dataset still contains multiple participants. It contains one row for each row in the dataset index.
raw_reference_parameters_: The raw reference parameters (if available). Check other attributes with a trailing underscore for the reference parameters converted into a more standardized format.
metadata: The metadata of the selected test.
UNITS: Representation of IMU units in gait datasets.
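The difference between reference_parameters_ (samples relative to the start of the recording) and reference_parameters_relative_to_wb_ (samples relative to the start of the walking bout) comes down to simple arithmetic. The sketch below uses made-up numbers and hypothetical helper names; the sampling rate of 100 Hz is an assumption, and in practice the rate should be read from the dataset.

```python
def samples_to_seconds(sample, sampling_rate_hz):
    """Convert a sample index (or a number of samples) to seconds."""
    return sample / sampling_rate_hz


def relative_to_wb(sample, wb_start):
    """Shift a recording-relative sample index to the start of its walking bout."""
    return sample - wb_start


SAMPLING_RATE_HZ = 100.0  # assumed value for illustration only

# Hypothetical walking bout and initial contacts, in samples from recording start.
wb = {"start": 1200, "end": 2000}
initial_contacts = [1250, 1310, 1372]

ics_relative = [relative_to_wb(ic, wb["start"]) for ic in initial_contacts]
wb_duration_s = samples_to_seconds(wb["end"] - wb["start"], SAMPLING_RATE_HZ)
```

Here `ics_relative` is `[50, 110, 172]` and the walking bout lasts 8.0 seconds.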

See also

Dataset: For details about the groupby_cols and subset_index parameters.
load_mobilised_matlab_format

Methods

`UNITS`()	Representation of IMU units in gait datasets.
`as_attrs`()	Return a version of the Dataset class that can be subclassed using `attrs` defined classes.
`as_dataclass`()	Return a version of the Dataset class that can be subclassed using dataclasses.
`assert_is_single`(groupby_cols, property_name)	Raise an error if the index contains more than one group/row with the given groupby settings.
`assert_is_single_group`(property_name)	Raise an error if the index contains more than one group/row.
`clone`()	Create a new instance of the class with all parameters copied over.
`create_index`()	Create the dataset index.
`create_precomputed_test_list`([n_jobs])	Compute and store a json test list for a data.mat file.
`create_string_group_labels`(label_cols)	Generate a list of string labels for each group/row in the dataset.
`get_params`([deep])	Get parameters for this algorithm.
`get_subset`(*[, group_labels, index, bool_map])	Get a subset of the dataset.
`groupby`(groupby_cols)	Return a copy of the dataset grouped by the specified columns.
`index_as_tuples`()	Get all datapoint labels of the dataset (i.e. a list of the rows of the index as named tuples).
`is_single`(groupby_cols)	Return True if index contains only one row/group with the given groupby settings.
`is_single_group`()	Return True if index contains only one group.
`iter_level`(level)	Return generator object containing a subset for every category from the selected level.
`set_params`(**params)	Set the parameters of this Algorithm.

create_group_labels

__init__( *, raw_data_sensor: Literal['SU', 'INDIP', 'INDIP2'] = 'SU', reference_system: Literal['INDIP', 'Stereophoto'] | None = None, reference_para_level: Literal['wb', 'lwb'] = 'wb', sensor_positions: Sequence[str] = ('LowerBack',), single_sensor_position: str = 'LowerBack', sensor_types: Sequence[Literal['acc', 'gyr', 'mag', 'bar']] = ('acc', 'gyr'), missing_sensor_error_type: Literal['raise', 'warn', 'ignore', 'skip'] = 'raise', missing_reference_error_type: Literal['raise', 'warn', 'ignore', 'skip'] = 'ignore', memory: Memory = Memory(location=None), groupby_cols: list[str] | str | None = None, subset_index: DataFrame | None = None, ) → None[source]#

class UNITS[source]#

Representation of IMU units in gait datasets.

Parameters:

acc: acceleration unit, default = ms^-2
gyr: gyroscope unit, default = deg/s
mag: magnetometer unit, default = uT

classmethod as_attrs()[source]#

Return a version of the Dataset class that can be subclassed using attrs defined classes.

Note, this requires attrs to be installed!

classmethod as_dataclass()[source]#: Return a version of the Dataset class that can be subclassed using dataclasses.

assert_is_single( groupby_cols: list[str] | str | None, property_name, ) → None[source]#

Raise an error if the index contains more than one group/row with the given groupby settings.

This should be used when implementing access to data values that can only be accessed when a single trial/participant/etc. exists in the dataset.

Parameters:

groupby_cols: None (no grouping) or a valid subset of the columns available in the dataset index.
property_name: Name of the property this check is used in. Used to format the error message.

assert_is_single_group(property_name) → None[source]#

Raise an error if the index contains more than one group/row.

Note that this is different from assert_is_single as it is aware of the current grouping. Instead of checking that a certain combination of columns is left in the dataset, it checks that only a single group exists with the already selected grouping as defined by self.groupby_cols.

Parameters:

property_name: Name of the property this check is used in. Used to format the error message.

clone() → Self[source]#

Create a new instance of the class with all parameters copied over.

This will create a new instance of the class itself and all nested objects.

create_index() → DataFrame[source]#

Create the dataset index.

The index columns will consist of the metadata extracted from the columns and the test names.

create_precomputed_test_list(n_jobs: int | None = None) → None[source]#

Compute and store a json test list for a data.mat file.

This function should be used by Dataset developers to precompute the test list for a data.mat file. This can massively reduce initial loading time, as the dataset index can be generated without loading the data.

When this is used to generate the test-list, the _relpath_to_precomputed_test_list method must be implemented.

Warning

Don’t create test lists for datasets that are likely to be changed. Otherwise, you might end up with outdated files and hard to debug errors. If you want to recreate the test list (either because of a mobgap or a dataset update), delete all test list files and recreate them using this method.

create_string_group_labels(label_cols: str | list[str]) → list[str][source]#

Generate a list of string labels for each group/row in the dataset.

Note

This has a different use case than the dataset-wide groupby. Using groupby reduces the effective size of the dataset to the number of groups. This method produces a group label for each group/row that is already in the dataset, without changing the dataset.

The output of this method can be used in combination with GroupKFold as the group label.

Parameters:

label_cols: The columns that should be included in the label. If the dataset is already grouped, this must be a subset of self.groupby_cols.
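As a rough illustration of the idea, the sketch below builds one string label per index row and then collects the row numbers that share a label, which is the information a group-aware splitter such as GroupKFold relies on. It is a plain-Python toy, not the library's implementation: the index rows, the function name `string_group_labels`, and the `_` separator are all assumptions.

```python
from collections import defaultdict

# Hypothetical index rows; the real index is a DataFrame.
index = [
    {"participant_id": "001", "test": "Test5", "trial": "Trial1"},
    {"participant_id": "001", "test": "Test11", "trial": "Trial1"},
    {"participant_id": "002", "test": "Test5", "trial": "Trial1"},
]


def string_group_labels(rows, label_cols):
    """Return one string label per row, built from the selected columns."""
    if isinstance(label_cols, str):
        label_cols = [label_cols]
    # The "_" separator is an arbitrary choice for this sketch.
    return ["_".join(row[col] for col in label_cols) for row in rows]


labels = string_group_labels(index, "participant_id")

# Rows with the same label must stay together in a group-wise split.
rows_per_group = defaultdict(list)
for row_number, label in enumerate(labels):
    rows_per_group[label].append(row_number)
```

Note that `labels` has the same length as the index: the dataset itself is not reduced, in contrast to groupby.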

get_params(deep: bool = True) → dict[str, Any][source]#

Get parameters for this algorithm.

Parameters:

deep: Only relevant if the object contains nested algorithm objects. If this is the case and deep is True, the params of these nested objects are included in the output using a prefix like nested_object_name__ (note the two “_” at the end).

Returns:

params: Parameter names mapped to their values.
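The double-underscore prefix for nested parameters can be illustrated with a small stand-in. The classes below (`ToyParams`, `ToyFilter`, `ToyPipeline`) are hypothetical and only mimic the naming convention described above; they are not the actual base classes used by mobgap.

```python
class ToyParams:
    """Minimal stand-in for an object that exposes get_params."""

    def get_params(self, deep=True):
        # Start with the object's own attributes.
        params = dict(vars(self))
        if deep:
            for name, value in vars(self).items():
                if isinstance(value, ToyParams):
                    # Nested parameters get the "<name>__" prefix.
                    for nested_name, nested_value in value.get_params(deep=True).items():
                        params[f"{name}__{nested_name}"] = nested_value
        return params


class ToyFilter(ToyParams):
    def __init__(self, cutoff_hz=5.0):
        self.cutoff_hz = cutoff_hz


class ToyPipeline(ToyParams):
    def __init__(self, smoothing, window_s=10):
        self.smoothing = smoothing
        self.window_s = window_s


pipeline = ToyPipeline(ToyFilter(cutoff_hz=3.0))
shallow = pipeline.get_params(deep=False)
deep = pipeline.get_params(deep=True)
```

With `deep=False`, only `smoothing` and `window_s` are reported. With `deep=True`, the additional key `smoothing__cutoff_hz` appears.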

get_subset( *, group_labels: list[tuple[str, ...]] | None = None, index: DataFrame | None = None, bool_map: Sequence[bool] | None = None, **kwargs: list[str] | str, ) → Self[source]#

Get a subset of the dataset.

Note

All arguments are mutually exclusive!

Parameters:

group_labels: A valid row locator or slice that can be passed to self.grouped_index.loc[locator, :]. This basically needs to be a subset of self.group_labels. Note that this is the only indexer that works on the grouped index. All other indexers work on the pure index.
index: pd.DataFrame that is a valid subset of the current dataset index.
bool_map: bool-map that is used to index the current index-dataframe. The list must be of same length as the number of rows in the index.
**kwargs: The key must be the name of an index column. The value is a list containing strings that correspond to the categories that should be kept. For examples see above.

Returns:

subset: New dataset object filtered by specified parameters.
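To make the selector semantics more concrete, here is a toy version that works on a list of dictionaries instead of a DataFrame. It covers only bool_map and the keyword form, and it is a sketch under assumptions rather than the real method: in particular, combining several keyword filters with a logical AND and the exact error types are assumptions.

```python
def get_subset(rows, *, bool_map=None, **kwargs):
    """Toy subset selection on a list of row dictionaries."""
    if bool_map is not None and kwargs:
        raise ValueError("Only one kind of selector may be used at a time.")
    if bool_map is None and not kwargs:
        raise ValueError("At least one selector is required.")
    if bool_map is not None:
        if len(bool_map) != len(rows):
            raise ValueError("bool_map must have one entry per index row.")
        return [row for row, keep in zip(rows, bool_map) if keep]
    selected = rows
    for column, allowed in kwargs.items():
        # A single string is treated like a one-element list.
        allowed = [allowed] if isinstance(allowed, str) else list(allowed)
        selected = [row for row in selected if row[column] in allowed]
    return selected


# Hypothetical index rows.
index = [
    {"participant_id": "001", "test": "Test5"},
    {"participant_id": "001", "test": "Test11"},
    {"participant_id": "002", "test": "Test5"},
]

by_kwargs = get_subset(index, participant_id="001", test=["Test5"])
by_bool_map = get_subset(index, bool_map=[False, False, True])
```

In this sketch, `by_kwargs` keeps only the first row and `by_bool_map` keeps only the last row. Passing both a bool_map and a keyword filter raises an error.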

property group: GroupLabelT#: Get the current group label. Deprecated, use group_label instead.

property group_label: GroupLabelT#

Get the current group label.

The group is defined by the current groupby settings.

Note that this attribute can only be used if there is just a single group. This will return a named tuple. The tuple will contain only one entry if there is only a single groupby column or column in the index. The elements of the named tuple will have the same names as the groupby columns and will be in the same order.

property group_labels: list[GroupLabelT]#

Get all group labels of the dataset based on the set groupby level.

This will return a list of named tuples. The tuples will contain only one entry if there is only one groupby level or index column.

The elements of the named tuples will have the same names as the groupby columns and will be in the same order.

Note that if one of the groupby levels/index columns is not a valid Python attribute name (e.g. it contains spaces or starts with a number), the named tuple will not contain the correct column name! For more information see the documentation of the rename parameter of collections.namedtuple.

For some examples and additional explanation see this example.
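The effect of the rename parameter of collections.namedtuple mentioned above can be reproduced with the standard library alone. The column names below are hypothetical, and whether the dataset creates its named tuples in exactly this way is an assumption; the sketch only shows the standard-library behaviour.

```python
from collections import namedtuple

# Two of these hypothetical column names are not valid Python identifiers.
columns = ["participant_id", "test name", "1st_trial"]

# With rename=True, invalid field names are replaced by "_<position>".
GroupLabel = namedtuple("GroupLabel", columns, rename=True)
label = GroupLabel("001", "Test5", "Trial1")

fields = GroupLabel._fields
```

Here `fields` is `("participant_id", "_1", "_2")`, so the values of the renamed columns can only be reached by position or via `label._1` and `label._2`.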

groupby( groupby_cols: list[str] | str | None, ) → Self[source]#

Return a copy of the dataset grouped by the specified columns.

This does not change the order of the rows of the dataset index.

Each unique group represents a single data point in the resulting dataset.

Parameters:

groupby_cols: None (no grouping) or a valid subset of the columns available in the dataset index.

property grouped_index: DataFrame#: Return the index with the groupby columns set as multiindex.

property groups: list[GroupLabelT]#: Get the current group labels. Deprecated, use group_labels instead.

property index: DataFrame#: Get index.

index_as_tuples() → list[GroupLabelT][source]#: Get all datapoint labels of the dataset (i.e. a list of the rows of the index as named tuples).

property index_is_unchanged: bool#

Returns True if the index is the same as the one created by create_index.

This can be used to check whether the index represents a subset or the actual full index. Note that this is independent of the groupby_cols setting.

Note

Under the hood, this uses the attrs functionality of pandas to store a hash of the original index on the dataframe. If the index is modified or a new index is created, this attribute either no longer exists or its content has changed.

is_single(groupby_cols: list[str] | str | None) → bool[source]#

Return True if index contains only one row/group with the given groupby settings.

If groupby_cols=None this checks if there is only a single row left. If you want to check if there is only a single group within the current grouping, use is_single_group instead.

Parameters:

groupby_cols: None (no grouping) or a valid subset of the columns available in the dataset index.
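The check can be thought of as counting unique groups. The following plain-Python sketch uses hypothetical helper names and index rows and is not the library's implementation; it only illustrates how the result depends on groupby_cols.

```python
def n_groups(rows, groupby_cols):
    """Count rows (no grouping) or unique column combinations (grouping)."""
    if groupby_cols is None:
        return len(rows)
    if isinstance(groupby_cols, str):
        groupby_cols = [groupby_cols]
    return len({tuple(row[col] for col in groupby_cols) for row in rows})


def is_single(rows, groupby_cols):
    """Return True if exactly one row/group is left."""
    return n_groups(rows, groupby_cols) == 1


# Hypothetical index with one participant and two tests.
index = [
    {"participant_id": "001", "test": "Test5"},
    {"participant_id": "001", "test": "Test11"},
]

single_row = is_single(index, None)
single_participant = is_single(index, "participant_id")
single_test = is_single(index, ["participant_id", "test"])
```

Without grouping, the two rows are not “single”. Grouped by `participant_id` they form one group, so `single_participant` is True, while `single_row` and `single_test` are False.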

is_single_group() → bool[source]#: Return True if index contains only one group.

iter_level( level: str, ) → Iterator[Self][source]#

Return generator object containing a subset for every category from the selected level.

Parameters:

level: The name of the level that shall be used for iterating. This must be one of the column names of the index.

Returns:

subset: New dataset object containing only one category in the specified level.
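Conceptually, iterating over a level yields one subset per category of that column. The generator below is a toy sketch on a list of dictionaries; the function body, the index rows, and the choice to yield categories in order of first appearance are assumptions made for illustration.

```python
def iter_level(rows, level):
    """Yield one list of rows per category found in the given column."""
    if rows and level not in rows[0]:
        raise ValueError(f"{level!r} is not a column of the index.")
    categories = []
    for row in rows:
        # Keep categories in order of first appearance (an assumption).
        if row[level] not in categories:
            categories.append(row[level])
    for category in categories:
        yield [row for row in rows if row[level] == category]


# Hypothetical index rows.
index = [
    {"participant_id": "001", "test": "Test5"},
    {"participant_id": "002", "test": "Test5"},
    {"participant_id": "001", "test": "Test11"},
]

subsets = list(iter_level(index, "participant_id"))
```

In this sketch, `subsets` contains two entries: the two rows of participant “001” followed by the single row of participant “002”.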

set_params(**params: Any) → Self[source]#

Set the parameters of this Algorithm.

To set parameters of nested objects use nested_object_name__para_name=.

property shape: tuple[int]#

Get the shape of the dataset.

This only reports a single dimension. This is equal to the number of rows in the index, if self.groupby_cols=None. Otherwise, it is equal to the number of unique groups.

Examples using `mobgap.data.LabExampleDataset`#

Loading example data

Working with reference data

Mobilise-D TVS Dataset

Custom Data and Datasets

Gait Sequence Iterator

The Mobilise-D pipeline: Step-by-Step Breakdown

Preconfigured Mobilised Pipelines

GSD Iluz

This example shows how to use the GSD Iluz algorithm and some examples on how the results compare to the original

GSD Paraschiv-Ionescu

GSD Evaluation

GSD Evaluation Challenges

McCamley L/R Classifier

Ullrich L/R Classifier

LRC Evaluation

Cadence from Initial Contacts

Cadence Evaluation

SL Zijlstra

Stride Length Evaluation

ElGohary Turning Algo

General Filter Introduction

Resampling data

Continuous Wavelet Transform (CWT) - Filter

Gaussian Smoothing

Savitzky-Golay Filter Example
