.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/aggregation/_99_cvs_agg_pipeline_no_exc.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_aggregation__99_cvs_agg_pipeline_no_exc.py: CVS Official Aggregation Script =============================== This example shows how the raw CVS per walking bout results are combined with the weartime reports and aggregated to a daily and weekly level. For more details on the individual steps see the other aggregation examples. This example should primarily serve as an easy to use script for people that want to run/replicate the CVS aggregation. Before you start you need 3 things: - DMO data:A csv file obtained from the Mobilise-D datawarehouse that contains the data of one CVS measurement time points (e.g. T1). Example file name: `cvs-T3-wb-dmo-14-05-2024.csv` - PID-Map: A mapping between patient ids and measurement sites (short pid-map). Example file name: `study-instances-Cohort Site-2023-08-08h22m09s48.csv` - Weartime: The minute-by-minute weartime reports obtained by McRoberts. This is a folder with a large amount of csv files. In addition to the csv files, we also expect a file with the pattern `CVS-wear-compliance-*.xlsx` in the same folder. This is used to map the measurement-ids to the correct weartime reports. Use the full path to these files/folders in the `path_config` dictionary below. The "outpath" key in the `path_config` dictionary specifies the folder where the aggregated data should be saved. .. GENERATED FROM PYTHON SOURCE LINES 24-34 .. code-block:: Python from pathlib import Path path_config = { "dmo": ..., "pid_map": ..., "weartime_reports": ..., "outpath": ..., } .. GENERATED FROM PYTHON SOURCE LINES 35-38 Load the data ------------- We create a new dataset instance that contains all the data and allows use to load and query it efficiently. .. GENERATED FROM PYTHON SOURCE LINES 38-49 .. code-block:: Python from joblib import Memory from mobgap.data import MobilisedCvsDmoDataset cache = Memory(".cache") ds = MobilisedCvsDmoDataset( path_config["dmo"], path_config["pid_map"], memory=cache, weartime_reports_base_path=path_config["weartime_reports"], ) .. GENERATED FROM PYTHON SOURCE LINES 50-56 QA: Check for duplicated DMOs ----------------------------- We check if there are any duplicated DMOs in the data. If yes, we create a warning and save a file for further investigation. .. note:: The initial loading of the data might take some time .. GENERATED FROM PYTHON SOURCE LINES 56-62 .. code-block:: Python data = ds.data duplicated_dmos = data.index.to_frame()[data.index.to_frame().duplicated()] if not duplicated_dmos.empty: print("Warning: Duplicated DMOs found. Saving to 'duplicated_dmos.csv'") duplicated_dmos.to_csv(Path(path_config["outpath"]) / "duplicated_dmos.csv") .. GENERATED FROM PYTHON SOURCE LINES 63-67 Daily agg --------- We aggregate the data per day for each participant. We then merge the results with the weartime reports to later filter out days that have not enough weartime. .. GENERATED FROM PYTHON SOURCE LINES 67-74 .. code-block:: Python from mobgap.aggregation import MobilisedAggregator daily_agg = MobilisedAggregator( **MobilisedAggregator.PredefinedParameters.cvs_dmo_data ) daily_agg.aggregate(data, data_mask=ds.data_mask) .. GENERATED FROM PYTHON SOURCE LINES 75-77 To exactly match the output format of the original Mobilise-D R-Script, we round the output to 3 decimal places and convert the stride length values to cm. .. GENERATED FROM PYTHON SOURCE LINES 77-81 .. code-block:: Python agg_values = daily_agg.aggregated_data_ agg_values[["strlen_1030_avg", "strlen_30_avg"]] *= 100 agg_values = agg_values.round(3) .. GENERATED FROM PYTHON SOURCE LINES 82-83 Further, we express the variance parameters in "%". .. GENERATED FROM PYTHON SOURCE LINES 83-94 .. code-block:: Python agg_values[ [ "wbdur_all_var", "cadence_all_var", "strdur_all_var", "ws_30_var", "strlen_30_var", ] ] *= 100 .. GENERATED FROM PYTHON SOURCE LINES 95-101 Merge with weartime. .. note:: The initial loading of the weartime data will take some time. After it is loaded once, a new file ``daily_weartime_pre_computed.csv`` will be created in the weartime folder and used as cache for subsequent loads. .. GENERATED FROM PYTHON SOURCE LINES 101-106 .. code-block:: Python daily_weartime = ds.weartime_daily daily_aggregated = agg_values.merge( daily_weartime, left_index=True, right_index=True ) .. GENERATED FROM PYTHON SOURCE LINES 107-108 We drop the "wb_1030_sum" column, because it was not officially verified as a DMO .. GENERATED FROM PYTHON SOURCE LINES 108-110 .. code-block:: Python daily_aggregated = daily_aggregated.drop(columns=["wb_1030_sum"]) .. GENERATED FROM PYTHON SOURCE LINES 111-114 Weartime Filtering ------------------ We filter out days that do not have enough weartime (or not weartime info at all). .. GENERATED FROM PYTHON SOURCE LINES 114-119 .. code-block:: Python daily_aggregated_filtered = daily_aggregated.query( "total_worn_during_waking_h > 12" ) .. GENERATED FROM PYTHON SOURCE LINES 120-123 QA: Number of recording days per participant -------------------------------------------- We check the number of recording days per participant. .. GENERATED FROM PYTHON SOURCE LINES 123-152 .. code-block:: Python import pandas as pd date_range = ( daily_aggregated_filtered.index.to_frame() .reset_index(drop=True) .groupby("participant_id")["measurement_date"] .agg(["min", "max", "count"]) .rename(columns={"count": "n_days", "min": "first", "max": "last"}) .assign( day_diff=lambda df_: ( df_[["first", "last"]].map(pd.to_datetime).eval("last - first") ) ) ) date_range.to_csv(Path(path_config["outpath"]) / "measurement_ranges.csv") if not (over_7 := date_range.query("n_days > 7")).empty: print( f"Warning: {len(over_7)} Participants with more than 7 valid recording days found." ) if not (over_6_diff := date_range.query("day_diff > '6 days'")).empty: print( f"Warning: {len(over_6_diff)} Participants with more than 6 days between first and last valid recording day " "found." ) .. GENERATED FROM PYTHON SOURCE LINES 153-156 Weekly Aggregation ------------------ We aggregate the daily data to a weekly level and remove weeks with insufficient data. .. GENERATED FROM PYTHON SOURCE LINES 156-169 .. code-block:: Python weekly_aggregated = ( daily_aggregated_filtered.drop( columns=["total_worn_h", "total_worn_during_waking_h"] ) .groupby(["visit_type", "participant_id"]) .mean(numeric_only=True) .assign( n_days=daily_aggregated_filtered["walkdur_all_sum"] .groupby(["visit_type", "participant_id"]) .count() ) ) .. GENERATED FROM PYTHON SOURCE LINES 170-171 Formatting: .. GENERATED FROM PYTHON SOURCE LINES 171-187 .. code-block:: Python round_to_int = [ "walkdur_all_sum", "turns_all_sum", "wb_all_sum", "wb_10_sum", "wb_30_sum", "wb_60_sum", ] round_to_three_decimals = weekly_aggregated.columns[ ~weekly_aggregated.columns.isin(round_to_int) ] weekly_aggregated[round_to_int] = weekly_aggregated[round_to_int].round() weekly_aggregated[round_to_three_decimals] = weekly_aggregated[ round_to_three_decimals ].round(3) .. GENERATED FROM PYTHON SOURCE LINES 188-189 Filtering: .. GENERATED FROM PYTHON SOURCE LINES 189-192 .. code-block:: Python weekly_aggregated_filtered = weekly_aggregated[weekly_aggregated["n_days"] >= 3] .. GENERATED FROM PYTHON SOURCE LINES 193-195 Export ------ .. GENERATED FROM PYTHON SOURCE LINES 195-201 .. code-block:: Python from datetime import datetime current_date = datetime.now().strftime("%Y_%m_%d") outdir = Path(path_config["outpath"]) / f"export_{current_date}" outdir.mkdir(exist_ok=True, parents=True) .. GENERATED FROM PYTHON SOURCE LINES 202-216 .. code-block:: Python daily_aggregated.add_suffix("_d").to_csv(outdir / "daily_agg_all.csv") daily_aggregated_filtered.add_suffix("_d").to_csv( outdir / "daily_agg_filtered.csv" ) weekly_aggregated.add_suffix("_w").to_csv(outdir / "weekly_agg_all.csv") weekly_aggregated_filtered.add_suffix("_w").to_csv( outdir / "weekly_agg_filtered.csv" ) daily_weartime.to_csv(outdir / "daily_weartime.csv") daily_weartime[ daily_weartime["total_worn_h"].isna() ].index.to_frame().reset_index(drop=True).to_csv( outdir / "missing_weartime.csv" ) **Estimated memory usage:** 0 MB .. _sphx_glr_download_auto_examples_aggregation__99_cvs_agg_pipeline_no_exc.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: _99_cvs_agg_pipeline_no_exc.ipynb <_99_cvs_agg_pipeline_no_exc.ipynb>` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: _99_cvs_agg_pipeline_no_exc.py <_99_cvs_agg_pipeline_no_exc.py>` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: _99_cvs_agg_pipeline_no_exc.zip <_99_cvs_agg_pipeline_no_exc.zip>` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_