MobilisedCvsDmoDataset#
- class mobgap.data.MobilisedCvsDmoDataset(
- dmo_path: str | Path,
- site_pid_map_path: str | Path,
- *,
- weartime_reports_base_path: str | Path | None = None,
- pre_compute_daily_weartime: bool = True,
- memory: Memory = Memory(location=None),
- groupby_cols: list[str] | str | None = None,
- subset_index: DataFrame | None = None,
A dataset representing calculated DMO data of the clinical validation study.
Warning
This dataset will not provide the raw data of the clinical validation study, but rather the official export of the calculated DMO data as provided by WP3.
Warning
We assume that the dmo data file has the structure
WP6-{visit_type}-...
- Parameters:
- dmo_path
The path to the calculated DMO data export. This should be the path to the approx. 1 GB CSV file with all the DMOs. Note that we only support DMO files that contain the data of a single visit (e.g. T1, T2, …) at a time.
- site_pid_map_path
The path to the file that contains the mapping between all the participants and their site. This is required to calculate the correct timezone for each participant.
- weartime_reports_base_path
The base path to the folder that contains the wear-time compliance reports. These exports are provided by McRoberts and contain the wear-time per minute data.
- pre_compute_daily_weartime
If True, the daily weartime will be pre-computed and a new file will be stored within the weartime_reports_base_path. This computation will be performed for all participants when you first attempt to load any weartime-related features. This might take multiple minutes, but will speed up the loading of the data in the future. If you are only planning to access a small subset of the data, you might want to set this to False.
Warning
Remember to delete the pre-computed weartime file if you obtain a new version of the weartime reports.
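The compute-once-then-reuse behaviour described for pre_compute_daily_weartime can be pictured with a minimal sketch. This is not mobgap's implementation: the helper `load_or_compute`, the cache file name, the JSON format, and the values are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def load_or_compute(cache_file: Path, compute) -> dict:
    """Return cached results if the cache file exists, otherwise compute and store them."""
    if cache_file.exists():
        # Fast path: reuse the result of an earlier run.
        return json.loads(cache_file.read_text())
    result = compute()  # Slow path: runs once for all participants.
    cache_file.write_text(json.dumps(result))
    return result


calls = []


def slow_daily_weartime() -> dict:
    # Stand-in for the expensive aggregation; the values are made up.
    calls.append(1)
    return {"participant_a": 14.5, "participant_b": 9.0}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "daily_weartime_cache.json"  # hypothetical file name
    first = load_or_compute(cache, slow_daily_weartime)
    second = load_or_compute(cache, slow_daily_weartime)  # served from the file
    n_calls_with_cache = len(calls)

    # A new version of the reports makes the cache stale: delete it to force a recompute.
    cache.unlink()
    third = load_or_compute(cache, slow_daily_weartime)
    n_calls_after_delete = len(calls)
```

The second call never touches the slow function, which is why a stale cache file has to be removed by hand after the reports change.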
- Attributes:
- data
The DMO data per WB.
- data_mask
The DMO data mask per WB.
- group
Get the current group label.
- group_label
Get the current group label.
- group_labels
Get all group labels of the dataset based on the set groupby level.
- grouped_index
Return the index with the groupby columns set as multiindex.
- groups
Get the current group labels.
- index
Get index.
- index_is_unchanged
Returns True if the index is the same as the one created by create_index.
- measurement_site
The measurement site of the dataset.
- shape
Get the shape of the dataset.
- timezone
The timezone of the measurement site.
- visit_type
The visit type (T1 - TN) of the dataset.
- weartime_daily
The daily weartime per participant.
Methods
- as_attrs()
Return a version of the Dataset class that can be subclassed using attrs defined classes.
- as_dataclass()
Return a version of the Dataset class that can be subclassed using dataclasses.
- assert_is_single(groupby_cols, property_name)
Raise error if index does contain more than one group/row with the given groupby settings.
- assert_is_single_group(property_name)
Raise error if index does contain more than one group/row.
- clone()
Create a new instance of the class with all parameters copied over.
- create_index()
Create the full index for the dataset.
- create_string_group_labels(label_cols)
Generate a list of string labels for each group/row in the dataset.
- get_params([deep])
Get parameters for this algorithm.
- get_subset(*[, group_labels, index, bool_map])
Get a subset of the dataset.
- groupby(groupby_cols)
Return a copy of the dataset grouped by the specified columns.
- index_as_tuples()
Get all datapoint labels of the dataset (i.e. a list of the rows of the index as named tuples).
- is_single(groupby_cols)
Return True if index contains only one row/group with the given groupby settings.
- is_single_group()
Return True if index contains only one group.
- iter_level(level)
Return generator object containing a subset for every category from the selected level.
- set_params(**params)
Set the parameters of this Algorithm.
- create_group_labels
- __init__(
- dmo_path: str | Path,
- site_pid_map_path: str | Path,
- *,
- weartime_reports_base_path: str | Path | None = None,
- pre_compute_daily_weartime: bool = True,
- memory: Memory = Memory(location=None),
- groupby_cols: list[str] | str | None = None,
- subset_index: DataFrame | None = None,
- classmethod as_attrs()[source]#
Return a version of the Dataset class that can be subclassed using attrs defined classes.
Note, this requires attrs to be installed!
- classmethod as_dataclass()[source]#
Return a version of the Dataset class that can be subclassed using dataclasses.
- assert_is_single(groupby_cols, property_name) None[source]#
Raise error if index does contain more than one group/row with the given groupby settings.
This should be used when implementing access to data values that can only be accessed when a single trial/participant/etc. exists in the dataset.
- Parameters:
- groupby_cols
None (no grouping) or a valid subset of the columns available in the dataset index.
- property_name
Name of the property this check is used in. Used to format the error message.
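How such a guard is typically placed inside a property can be sketched with a toy class. `ToyDataset`, its simplified `assert_is_single` (without groupby_cols), the column names, and the use of `ValueError` are assumptions for illustration and do not reproduce the library's code.

```python
class ToyDataset:
    """Toy stand-in for a dataset; not the tpcp/mobgap implementation."""

    def __init__(self, rows: list[tuple[str, str]]):
        # Each row is (participant, day); both are hypothetical index columns.
        self.rows = rows

    def assert_is_single(self, property_name: str) -> None:
        if len(self.rows) != 1:
            raise ValueError(
                f"`{property_name}` requires exactly one row, "
                f"but the index has {len(self.rows)}."
            )

    @property
    def participant(self) -> str:
        # Guard the access: the value is ambiguous with more than one row.
        self.assert_is_single("participant")
        return self.rows[0][0]


single = ToyDataset([("p1", "day1")])
multiple = ToyDataset([("p1", "day1"), ("p2", "day1")])

value = single.participant
try:
    multiple.participant
    error_message = None
except ValueError as e:
    error_message = str(e)
```

Passing the property name into the check keeps the error message specific to the attribute the user tried to read.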
- assert_is_single_group(property_name) None[source]#
Raise error if index does contain more than one group/row.
Note that this is different from assert_is_single as it is aware of the current grouping. Instead of checking that a certain combination of columns is left in the dataset, it checks that only a single group exists with the already selected grouping as defined by self.groupby_cols.
- Parameters:
- property_name
Name of the property this check is used in. Used to format the error message.
- clone() Self[source]#
Create a new instance of the class with all parameters copied over.
This will create a new instance of the class itself and all nested objects.
- create_index() DataFrame[source]#
Create the full index for the dataset.
This needs to be implemented by the subclass.
Warning
Make absolutely sure that the dataframe you return is deterministic and does not change between runs! A non-deterministic index can lead to some nasty bugs! We try to catch them internally, but it is not always possible. As tips, avoid reliance on random numbers and make sure that the order does not depend on things like file system order when creating an index by scanning a directory. Particularly nasty are cases using non-sorted containers like set, which sometimes maintain their order, but sometimes don't. At the very least, we recommend sorting the final dataframe you return in create_index.
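The sorting advice can be illustrated with a small sketch that builds index rows from a file listing. The paths, the `(participant, day)` layout, and the helper `build_index_rows` are hypothetical; only the principle of sorting before returning comes from the warning above.

```python
from pathlib import PurePosixPath

# Hypothetical file listing, deliberately in an arbitrary order,
# as a directory scan might return it.
scanned = [
    PurePosixPath("data/p2/day1.csv"),
    PurePosixPath("data/p1/day2.csv"),
    PurePosixPath("data/p1/day1.csv"),
]


def build_index_rows(paths) -> list[tuple[str, str]]:
    """Turn file paths into (participant, day) rows in a reproducible order."""
    rows = [(p.parent.name, p.stem) for p in paths]
    # Sorting removes any dependence on file-system or set iteration order.
    return sorted(rows)


rows_a = build_index_rows(scanned)
rows_b = build_index_rows(reversed(scanned))
rows_c = build_index_rows(set(scanned))  # even an unordered container is safe now
```

With a pandas DataFrame the same effect can be achieved by sorting on all columns and resetting the row index before returning, for example `df.sort_values(list(df.columns)).reset_index(drop=True)`.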
- create_string_group_labels(label_cols) list[str][source]#
Generate a list of string labels for each group/row in the dataset.
Note
This has a different use case than the dataset-wide groupby. Using groupby reduces the effective size of the dataset to the number of groups. This method produces a group label for each group/row that is already in the dataset, without changing the dataset.
The output of this method can be used in combination with GroupKFold as the group label.
- Parameters:
- label_cols
The columns that should be included in the label. If the dataset is already grouped, this must be a subset of self.groupby_cols.
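The difference between per-row string labels and a dataset-wide groupby can be sketched in plain Python. The label format (values joined with "_"), the index columns, and the hand-written leave-one-group-out loop are assumptions for illustration; the actual method may format its labels differently.

```python
# Hypothetical index rows: (participant, day).
index_rows = [("p1", "day1"), ("p1", "day2"), ("p2", "day1"), ("p3", "day1")]
columns = ("participant", "day")


def string_labels(rows, cols, label_cols) -> list[str]:
    """Build one string label per row from the selected columns."""
    positions = [cols.index(c) for c in label_cols]
    # The "_" separator is an arbitrary choice for this sketch.
    return ["_".join(row[i] for i in positions) for row in rows]


# One label per row: the number of data points is unchanged.
labels = string_labels(index_rows, columns, ["participant"])

# Leave-one-group-out by hand: rows sharing a label never end up on both sides.
splits = []
for held_out in sorted(set(labels)):
    train = [i for i, label in enumerate(labels) if label != held_out]
    test = [i for i, label in enumerate(labels) if label == held_out]
    splits.append((train, test))
```

With scikit-learn, a list like `labels` would be passed as the `groups` argument of `GroupKFold.split` instead of looping by hand.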
- property data: DataFrame#
The DMO data per WB.
This will provide a df with all DMOs, where each row corresponds to a single WB. The df will include the data of all participants and days currently selected in the index of the dataset.
- property data_mask: DataFrame#
The DMO data mask per WB.
A "true"/"false" flag for each individual DMO. A "false" indicates that the specific DMO value might be incorrect. These flags are determined using expert-defined thresholds for likely valid ranges of DMOs.
The shape of the flags-df is identical to the shape of the DMO data-df, so that they can be directly overlaid.
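Because both frames share one shape, the mask can be applied with a single pandas call. The sketch below uses two small hand-made frames; the column names and values are invented, and it assumes that the mask holds plain booleans.

```python
import pandas as pd

# Made-up values; the column names are hypothetical DMO names.
data = pd.DataFrame(
    {"duration_s": [12.0, 30.5, 8.0], "cadence_spm": [95.0, 310.0, 102.0]}
)
data_mask = pd.DataFrame(
    {"duration_s": [True, True, True], "cadence_spm": [True, False, True]}
)

# Same shape, so the mask can be laid directly over the data.
same_shape = data.shape == data_mask.shape

# Keep values flagged True, replace values flagged False with NaN.
cleaned = data.where(data_mask)

# Count how many values were flagged as implausible per DMO.
n_flagged = (~data_mask).sum()
```

Whether flagged values should be dropped, replaced, or only reported depends on the analysis; `where` is just one way to make them visible.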
- get_params(deep: bool = True) dict[str, Any][source]#
Get parameters for this algorithm.
- Parameters:
- deep
Only relevant if object contains nested algorithm objects. If this is the case and deep is True, the params of these nested objects are included in the output using a prefix like nested_object_name__ (note the two "_" at the end).
- Returns:
- params
Parameter names mapped to their values.
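The double-underscore prefix is easiest to see on a toy pair of nested objects. `Inner` and `Outer` are hypothetical classes written for this sketch; they only mimic the naming convention and are not the library's base classes.

```python
from typing import Any


class Inner:
    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def get_params(self) -> dict[str, Any]:
        return {"threshold": self.threshold}


class Outer:
    def __init__(self, inner: Inner, window: int = 10):
        self.inner = inner
        self.window = window

    def get_params(self, deep: bool = True) -> dict[str, Any]:
        params: dict[str, Any] = {"inner": self.inner, "window": self.window}
        if deep:
            # Nested parameters are exposed as "<name>__<param>" (two underscores).
            for key, value in self.inner.get_params().items():
                params[f"inner__{key}"] = value
        return params


obj = Outer(Inner(threshold=0.8))
shallow = obj.get_params(deep=False)
deep = obj.get_params(deep=True)
```

Here `shallow` only lists the direct parameters, while `deep` additionally contains the key `inner__threshold`.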
- get_subset(
- *,
- group_labels: list[tuple[str, ...]] | None = None,
- index: DataFrame | None = None,
- bool_map: Sequence[bool] | None = None,
- **kwargs: list[str] | str,
Get a subset of the dataset.
Note
All arguments are mutually exclusive!
- Parameters:
- group_labels
A valid row locator or slice that can be passed to self.grouped_index.loc[locator, :]. This basically needs to be a subset of self.group_labels. Note that this is the only indexer that works on the grouped index. All other indexers work on the pure index.
- index
pd.DataFrame that is a valid subset of the current dataset index.
- bool_map
bool-map that is used to index the current index-dataframe. The list must be of same length as the number of rows in the index.
- **kwargs
The key must be the name of an index column. The value is a list containing strings that correspond to the categories that should be kept. For examples see above.
- Returns:
- subset
New dataset object filtered by specified parameters.
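The selection rules can be mimicked on a plain list of index rows. The sketch covers only bool_map and the keyword form; the function `get_subset_rows`, the index columns, and the `ValueError` exceptions are assumptions and do not mirror the real method's internals.

```python
from typing import Optional, Sequence

COLUMNS = ("participant", "day")  # hypothetical index columns
INDEX = [("p1", "day1"), ("p1", "day2"), ("p2", "day1")]


def get_subset_rows(rows, *, bool_map: Optional[Sequence[bool]] = None, **kwargs):
    """Filter rows either by a boolean map or by column categories, never both."""
    if bool_map is not None and kwargs:
        raise ValueError("Only one selection method may be used at a time.")
    if bool_map is not None:
        if len(bool_map) != len(rows):
            raise ValueError("bool_map must have one entry per index row.")
        return [row for row, keep in zip(rows, bool_map) if keep]
    selected = list(rows)
    for column, categories in kwargs.items():
        # Accept a single string or a list of strings per column.
        wanted = [categories] if isinstance(categories, str) else categories
        position = COLUMNS.index(column)
        selected = [row for row in selected if row[position] in wanted]
    return selected


by_map = get_subset_rows(INDEX, bool_map=[True, False, True])
by_column = get_subset_rows(INDEX, participant="p1")
try:
    get_subset_rows(INDEX, bool_map=[True, True, True], day="day1")
    conflict_error = None
except ValueError as e:
    conflict_error = str(e)
```

Rejecting combined selectors up front avoids having to define which of them takes precedence.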
- property group: GroupLabelT#
Get the current group label. Deprecated, use group_label instead.
- property group_label: GroupLabelT#
Get the current group label.
The group is defined by the current groupby settings.
Note, this attribute can only be used, if there is just a single group. This will return a named tuple. The tuple will contain only one entry if there is only a single groupby column or column in the index. The elements of the named tuple will have the same names as the groupby columns and will be in the same order.
- property group_labels: list[GroupLabelT]#
Get all group labels of the dataset based on the set groupby level.
This will return a list of named tuples. The tuples will contain only one entry if there is only one groupby level or index column.
The elements of the named tuples will have the same names as the groupby columns and will be in the same order.
Note that if one of the groupby levels/index columns is not a valid Python attribute name (e.g. it contains spaces or starts with a number), the named tuple will not contain the correct column name! For more information see the documentation of the rename parameter of collections.namedtuple.
For some examples and additional explanation see this example.
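The effect of the rename parameter can be checked directly with the standard library. The tuple type names and the column names below are made up; only the behaviour of `collections.namedtuple` itself is taken as given.

```python
from collections import namedtuple

# Column names that are valid Python identifiers are kept as field names.
ValidLabel = namedtuple("ValidLabel", ["participant", "day"], rename=True)
valid = ValidLabel("p1", "day1")

# Invalid names (a space, a leading digit) are replaced by positional names.
RenamedLabel = namedtuple("RenamedLabel", ["participant id", "1st_day"], rename=True)
renamed = RenamedLabel("p1", "day1")

valid_fields = valid._fields      # ("participant", "day")
renamed_fields = renamed._fields  # ("_0", "_1")
```

The values are still accessible by position or via the generated names (`renamed._0`), but the original column names are lost in the tuple.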
- groupby(groupby_cols) Self[source]#
Return a copy of the dataset grouped by the specified columns.
This does not change the order of the rows of the dataset index.
Each unique group represents a single data point in the resulting dataset.
- Parameters:
- groupby_cols
None (no grouping) or a valid subset of the columns available in the dataset index.
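How grouping changes the number of data points can be sketched on a plain list of rows. The index columns and the helper `unique_groups` are hypothetical, and listing groups in order of first appearance is an assumption of this sketch rather than documented behaviour.

```python
INDEX = [("p1", "day1"), ("p1", "day2"), ("p2", "day1"), ("p1", "day3")]
COLUMNS = ("participant", "day")  # hypothetical index columns


def unique_groups(rows, cols, groupby_cols):
    """Return the unique groups in order of first appearance."""
    if groupby_cols is None:
        return list(rows)  # no grouping: every row is its own data point
    positions = [cols.index(c) for c in groupby_cols]
    keys = (tuple(row[i] for i in positions) for row in rows)
    # dict.fromkeys de-duplicates while keeping insertion order.
    return list(dict.fromkeys(keys))


ungrouped = unique_groups(INDEX, COLUMNS, None)
by_participant = unique_groups(INDEX, COLUMNS, ["participant"])

n_points_ungrouped = len(ungrouped)      # one data point per row
n_points_grouped = len(by_participant)   # one data point per participant
```

The underlying rows stay untouched; only the unit that counts as a single data point changes.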
- property grouped_index: DataFrame#
Return the index with the groupby columns set as multiindex.
- property groups: list[GroupLabelT]#
Get the current group labels. Deprecated, use group_labels instead.
- property index: DataFrame#
Get index.
- index_as_tuples() list[GroupLabelT][source]#
Get all datapoint labels of the dataset (i.e. a list of the rows of the index as named tuples).
- property index_is_unchanged: bool#
Returns True if the index is the same as the one created by create_index.
This can be used to check if the index represents a subset or the actual full index. Note that this is independent of the groupby_cols setting.
Note
Under the hood this uses the attrs functionality of pandas to store a hash of the original index on the dataframe. If the index is modified or a new index is created, this property either does not exist anymore or its content is modified.
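The idea of remembering a hash in the DataFrame's attrs can be sketched as follows. The attrs key `original_hash` and the helpers `index_hash` and `is_unchanged` are hypothetical, and the library's actual hashing scheme may differ.

```python
import pandas as pd


def index_hash(df: pd.DataFrame) -> int:
    # Content hash over all rows (hypothetical helper).
    return int(pd.util.hash_pandas_object(df, index=True).sum())


full_index = pd.DataFrame(
    {"participant": ["p1", "p1", "p2"], "day": ["d1", "d2", "d1"]}
)
# Remember the hash of the freshly created index on the frame itself.
full_index.attrs["original_hash"] = index_hash(full_index)


def is_unchanged(df: pd.DataFrame) -> bool:
    stored = df.attrs.get("original_hash")
    return stored is not None and stored == index_hash(df)


subset = full_index.iloc[:2]                        # fewer rows than the original
rebuilt = pd.DataFrame(full_index.to_dict("list"))  # new frame without stored hash

full_ok = is_unchanged(full_index)
subset_ok = is_unchanged(subset)
rebuilt_ok = is_unchanged(rebuilt)
```

The check fails both when a stored hash no longer matches the content and when no hash is present at all, which corresponds to the two cases named in the note.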
- is_single(groupby_cols: list[str] | str | None) bool[source]#
Return True if index contains only one row/group with the given groupby settings.
If groupby_cols=None this checks if there is only a single row left. If you want to check if there is only a single group within the current grouping, use is_single_group instead.
- Parameters:
- groupby_cols
None (no grouping) or a valid subset of the columns available in the dataset index.
- iter_level(
- level: str,
Return generator object containing a subset for every category from the selected level.
- Parameters:
- level
Optional str that sets the level which shall be used for iterating. This must be one of the column names of the index.
- Returns:
- subset
New dataset object containing only one category in the specified level.
- property measurement_site: Literal['CAU', 'CHUM', 'ICL', 'KUL1', 'KUL2', 'NTNU', 'PFLG', 'RBMF', 'TASMC', 'TFG', 'UKER', 'UNEW', 'UNN', 'USFD', 'USR', 'UZH1', 'UZH2', 'UZH3', 'ISG1', 'ISG2', 'ISG3', 'ISG4']#
The measurement site of the dataset.
This can only be accessed if the dataset only contains data from a single participant.
- set_params(**params: Any) Self[source]#
Set the parameters of this Algorithm.
To set parameters of nested objects use nested_object_name__para_name=.
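Routing a double-underscore key to a nested object can be sketched with two toy classes. `Inner`, `Outer`, and this `set_params` body are hypothetical and only mimic the naming convention; they are not the library's implementation.

```python
class Inner:
    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold


class Outer:
    def __init__(self, inner: Inner, window: int = 10):
        self.inner = inner
        self.window = window

    def set_params(self, **params):
        for key, value in params.items():
            # "inner__threshold" -> nested object "inner", parameter "threshold".
            name, separator, sub_name = key.partition("__")
            if separator:
                setattr(getattr(self, name), sub_name, value)
            else:
                setattr(self, name, value)
        return self  # returning self allows chaining


obj = Outer(Inner()).set_params(window=20, inner__threshold=0.9)
```

After the call, `obj.window` is 20 and the nested `obj.inner.threshold` is 0.9.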
- property shape: tuple[int]#
Get the shape of the dataset.
This only reports a single dimension. This is equal to the number of rows in the index if self.groupby_cols=None. Otherwise, it is equal to the number of unique groups.
- property timezone: str#
The timezone of the measurement site.
This can only be accessed if the dataset only contains data from a single participant.
- property visit_type: str#
The visit type (T1 - TN) of the dataset.
Each dataset instance can only load data from a single visit type. This is determined by the visit type in the filename of the dmo export file.
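Reading the visit type from the file name could look roughly like the sketch below. The helper `visit_type_from_filename`, the regular expression, and the example file name are invented for illustration; only the leading `WP6-{visit_type}-` part and the "T" plus number form of the visit type are taken from this page.

```python
import re
from pathlib import PurePath

# Matches the leading "WP6-{visit_type}-" part of the file name.
_VISIT_PATTERN = re.compile(r"^WP6-(?P<visit_type>T\d+)-")


def visit_type_from_filename(path: str) -> str:
    """Extract the visit type from a DMO export file name (hypothetical helper)."""
    match = _VISIT_PATTERN.match(PurePath(path).name)
    if match is None:
        raise ValueError(
            f"{path!r} does not follow the expected 'WP6-{{visit_type}}-' naming."
        )
    return match.group("visit_type")


# Invented file name that only follows the documented prefix.
visit = visit_type_from_filename("exports/WP6-T2-made-up-suffix.csv")

try:
    visit_type_from_filename("exports/unrelated.csv")
    naming_error = None
except ValueError as e:
    naming_error = str(e)
```

Failing loudly on an unexpected name is safer than guessing, since every other part of the dataset relies on the visit type being correct.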
- property weartime_daily: DataFrame#
The daily weartime per participant.
This is calculated from the minute to minute weartime reports provided by McRoberts. This is optional, and you might not have access to the weartime reports.
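The aggregation from minute-level reports to daily values can be pictured with a small standard-library sketch. The record layout (timestamp plus worn flag), the synthetic data, and the helper `daily_weartime_hours` are assumptions; the actual report format and the library's computation may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical per-minute records: (timestamp, worn flag).
# Two hours of data starting at 23:00, worn every second minute.
start = datetime(2024, 1, 1, 23, 0)
minutes = [(start + timedelta(minutes=i), i % 2 == 0) for i in range(120)]


def daily_weartime_hours(records) -> dict:
    """Sum the worn minutes per calendar day and convert them to hours."""
    worn_minutes = defaultdict(int)
    for timestamp, worn in records:
        if worn:
            worn_minutes[timestamp.date()] += 1
    return {day: count / 60 for day, count in sorted(worn_minutes.items())}


daily = daily_weartime_hours(minutes)
```

Because the records cross midnight, the result contains two days with half an hour of weartime each. Which calendar day a minute belongs to depends on the local time of the participant, which is one reason the timezone matters.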