.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/data/_04_tvs_data_no_exc.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_data__04_tvs_data_no_exc.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_data__04_tvs_data_no_exc.py:


Mobilise-D TVS Dataset
======================

As part of the Mobilise-D technical validation study (TVS), an extensive dataset was collected from 115 participants
(5 different indications plus healthy adults) who wore a lower-back sensor both in the lab and during a 2.5-hour
free-living period.
In the lab, all trials were recorded with a synchronized motion capture system and the multi-modal wearable INDIP
system as references.
During the 2.5-hour free-living period, only the INDIP system was used as reference [1]_.

This makes it one of the few datasets that contain data from patients with multiple indications AND high-granularity
reference information from both lab and free-living settings.
It is therefore a valuable resource for developing and validating algorithms for lower-back sensor data.
The dataset has already been used extensively to benchmark the Mobilise-D algorithms individually [2]_ and in
combination as a pipeline [3]_.

The dataset is published on Zenodo and can be accessed here: https://zenodo.org/records/13899386

The recommended way to work with the dataset is to load it with the classes provided by this library.
This ensures that the data is loaded consistently and preprocessed correctly.

This example demonstrates how to do this.

.. warning:: This example only shows the code, not its output, as running it requires the dataset to be available
   when this website is built.
   We highly recommend running this code on your local machine to see the output.
   For this, replace the environment variable loaded below with the actual path to the dataset on your local machine.

.. [1] Salis F, Bertuletti S, Bonci T, Caruso M, Scott K, Alcock L, Buckley E, Gazit E, Hansen C, Schwickert L,
       Aminian K, Becker C, Brown P, Carsin A, Caulfield B, Chiari L, D’Ascanio I, Del Din S, Eskofier B,
       Garcia-Aymerich J, Hausdorff J, Hume E, Kirk C, Kluge F, Koch S, Kuederle A, Maetzler W, Micó-Amigo E, Mueller A,
       Neatrour I, Paraschiv-Ionescu A, Palmerini L, Yarnall A, Rochester L, Sharrack B, Singleton D, Vereijken B,
       Vogiatzis I, Della Croce U, Mazzà C, Cereatti A. A multi-sensor wearable system for the assessment of diseased
       gait in real-world conditions. Frontiers in Bioengineering and Biotechnology. 2023;11.
       doi: 10.3389/fbioe.2023.1143248.
.. [2] Micó-Amigo ME, Bonci T, Paraschiv-Ionescu A, Ullrich M, Kirk C, Soltani A, Küderle A, Gazit E, Salis F, Alcock L,
       Aminian K, Becker C, Bertuletti S, Brown P, Buckley E, Cantu A, Carsin AE, Caruso M, Caulfield B, Cereatti A,
       Chiari L, D'Ascanio I, Eskofier B, Fernstad S, Froehlich M, Garcia-Aymerich J, Hansen C, Hausdorff JM, Hiden H,
       Hume E, Keogh A, Kluge F, Koch S, Maetzler W, Megaritis D, Mueller A, Niessen M, Palmerini L, Schwickert L,
       Scott K, Sharrack B, Sillén H, Singleton D, Vereijken B, Vogiatzis I, Yarnall AJ, Rochester L, Mazzà C,
       Del Din S; Mobilise-D consortium. Assessing real-world gait with digital technology? Validation, insights and
       recommendations from the Mobilise-D consortium. J Neuroeng Rehabil. 2023 Jun 14;20(1):78.
       doi: 10.1186/s12984-023-01198-5. Erratum in: J Neuroeng Rehabil. 2024 May 3;21(1):71.
       doi: 10.1186/s12984-024-01361-6. PMID: 37316858; PMCID: PMC10265910.
.. [3] Kirk C, Küderle A, Micó-Amigo ME, Bonci T, Paraschiv-Ionescu A, Ullrich M, Soltani A, Gazit E, Salis F, Alcock L,
       Aminian K, Becker C, Bertuletti S, Brown P, Buckley E, Cantu A, Carsin A, Caruso M, Caulfield B, Cereatti A,
       Chiari L, D’Ascanio I, Garcia-Aymerich J, Hansen C, Hausdorff JM, Hiden H, Hume E, Keogh A, Kluge F, Koch S,
       Maetzler W, Megaritis D, Mueller A, Niessen M, Palmerini L, Schwickert L, Scott K, Sharrack B, Sillén H,
       Singleton D, Vereijken B, Vogiatzis I, Yarnall AJ, Rochester L, Mazzà C, Eskofier BM, Del Din S, Mobilise-D
       consortium. Mobilise-D insights to estimate real-world walking speed in multiple conditions with a wearable
       device. Sci Rep. 2024 Jan 19;14:1754. doi: 10.1038/s41598-024-51766-5.

.. GENERATED FROM PYTHON SOURCE LINES 56-61

Dataset Path
------------
For this example, we get the path from an environment variable, so that we don't have to hardcode it here.
When you run this on your local machine, you can either set the environment variable to the path of the dataset or
replace the path in the code below.

.. GENERATED FROM PYTHON SOURCE LINES 61-71

.. code-block:: default

    import os
    from pathlib import Path

    if "MOBGAP_TVS_DATASET_PATH" not in os.environ:
        raise ValueError(
            "Please set the environment variable MOBGAP_TVS_DATASET_PATH to the path of the TVS dataset."
        )

    dataset_path = Path(os.getenv("MOBGAP_TVS_DATASET_PATH"))


.. GENERATED FROM PYTHON SOURCE LINES 72-85

Load the Dataset
----------------
We use the :class:`~mobgap.data.TVSLabDataset` class to load the lab data and show most of the possible interactions.
The :class:`~mobgap.data.TVSFreeLivingDataset` can be used in the same way to load the free-living data.

.. note:: Creating the dataset and selecting values from the index does not actually load the data into RAM.
          The data is only loaded once the actual (meta)data attributes are accessed.
          Even then, data is only loaded per recording and not all at once.

.. note:: Loading the data from the ``data.mat`` files can still be slow. If you work with the TVS dataset a lot
          (and have enough disk space available), it is highly recommended to use a disk cache by passing a joblib
          ``Memory`` instance to the ``memory`` parameter of the dataset class.
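
The lazy-loading behaviour described in the first note can be illustrated with ``functools.cached_property``.
This is only a minimal sketch under our own assumptions, not mobgap's actual implementation: the ``LazyRecording``
class and its ``_load_from_disk`` method are hypothetical stand-ins for a single recording and its expensive file read.

.. code-block:: python

    from functools import cached_property


    class LazyRecording:
        """Hypothetical stand-in for a single recording of a dataset."""

        def __init__(self, recording_id: str) -> None:
            self.recording_id = recording_id
            self.load_count = 0  # how often the (expensive) loader actually ran

        def _load_from_disk(self) -> list[float]:
            # Placeholder for an expensive read, e.g. from a large data.mat file.
            self.load_count += 1
            return [0.0, 0.1, 0.2]

        @cached_property
        def data(self) -> list[float]:
            # Only executed on first access; the result is then kept in memory.
            return self._load_from_disk()


    recordings = [LazyRecording(f"trial_{i}") for i in range(3)]
    # Creating the objects has not loaded anything yet.
    assert all(r.load_count == 0 for r in recordings)

    first = recordings[0].data  # triggers the load for this recording only
    first_again = recordings[0].data  # served from the in-memory cache

Unlike this in-memory cache, a joblib ``Memory`` object stores results on disk, so they can be reused across Python
sessions.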


.. GENERATED FROM PYTHON SOURCE LINES 85-90

.. code-block:: default

    from mobgap.data import TVSLabDataset

    labdata = TVSLabDataset(dataset_path, reference_system="Stereophoto")
    labdata


.. GENERATED FROM PYTHON SOURCE LINES 91-100

Selecting data
--------------
The index of the dataset shows all available trials, with each individual recording of a participant represented by
one row.
By default, the dataset also shows trials that might not have valid reference data.
If you want to skip them, you can use the ``missing_reference_error_type`` parameter.

If you compare the number of rows, you can see that this removes a few hundred trials.
If you are planning to use the reference data, you should always set this parameter to ``"skip"``.

.. GENERATED FROM PYTHON SOURCE LINES 100-107

.. code-block:: default

    labdata = TVSLabDataset(
        dataset_path,
        reference_system="Stereophoto",
        missing_reference_error_type="skip",
    )
    labdata


.. GENERATED FROM PYTHON SOURCE LINES 108-110

On the remaining data, we can easily filter by all columns that are in the index.
For example, we could filter for only Test 11 (simulated activities of daily living) and Test 5 (straight walking).

.. GENERATED FROM PYTHON SOURCE LINES 110-113

.. code-block:: default

    test_subset = labdata.get_subset(test=["Test5", "Test11"])
    test_subset


.. GENERATED FROM PYTHON SOURCE LINES 114-117

We could then further filter by cohort, for example to only get the data from Parkinson's disease (PD) patients.
Note that this could also be done in a single call to ``get_subset``.
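
Conceptually, filtering by keyword arguments keeps all index rows whose column values are among the provided values.
The following pandas sketch mimics this on a small, made-up index (invented participant IDs and a reduced set of
columns) to show that one combined filter selects the same rows as two consecutive ones.
The helper ``filter_index`` is hypothetical and only illustrates the logic; it is not the actual mobgap/tpcp
implementation.

.. code-block:: python

    import pandas as pd

    # Made-up index with a reduced set of columns (values are illustrative only).
    index = pd.DataFrame(
        {
            "cohort": ["PD", "PD", "HA", "PD", "MS"],
            "participant_id": ["1001", "1001", "2001", "1002", "3001"],
            "test": ["Test5", "Test11", "Test5", "Test3", "Test11"],
        }
    )


    def filter_index(df: pd.DataFrame, **filters) -> pd.DataFrame:
        """Keep rows where every given column takes one of the allowed values."""
        mask = pd.Series(True, index=df.index)
        for column, allowed in filters.items():
            # Accept a single value or a list of values.
            allowed = allowed if isinstance(allowed, list) else [allowed]
            mask &= df[column].isin(allowed)
        return df[mask]


    two_steps = filter_index(filter_index(index, test=["Test5", "Test11"]), cohort="PD")
    one_step = filter_index(index, test=["Test5", "Test11"], cohort="PD")
    print(one_step)

With the dataset itself, the single call would be ``labdata.get_subset(test=["Test5", "Test11"], cohort="PD")``.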

.. GENERATED FROM PYTHON SOURCE LINES 117-120

.. code-block:: default

    test_subset_pd = test_subset.get_subset(cohort="PD")
    test_subset_pd


.. GENERATED FROM PYTHON SOURCE LINES 121-126

We can also filter based on information that is not directly in the index.
For example, we could further filter for only the participants that are taller than 1.7m.

For this, we access the ``participant_information`` (or ``participant_metadata``, more on the difference below)
attribute of the dataset.

.. GENERATED FROM PYTHON SOURCE LINES 126-129

.. code-block:: default

    test_subset_pd_p_info = test_subset_pd.participant_information
    test_subset_pd_p_info


.. GENERATED FROM PYTHON SOURCE LINES 130-133

Note that the participant information only contains the information of the participants in the current subset.
This makes it easy to get the list of participants that are taller than 1.7 m.

.. GENERATED FROM PYTHON SOURCE LINES 133-140

.. code-block:: default

    tall_participants = (
        test_subset_pd_p_info[test_subset_pd_p_info["height_m"] > 1.7]
        .reset_index()["participant_id"]
        .to_list()
    )
    tall_participants


.. GENERATED FROM PYTHON SOURCE LINES 141-144

We can then use this list to get a subset that only contains the data of the tall participants.

Similarly, we could filter based on any other information in the participant information (like disease status).
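
The same pattern works for any combination of conditions: build a boolean mask on the participant information, turn
the matching participant IDs into a list, and keep only the trials of those participants.
This pandas sketch uses made-up data; ``height_m`` mirrors the column used above, while ``age`` is a hypothetical
extra column that may be named differently (or be absent) in the real participant information.

.. code-block:: python

    import pandas as pd

    # Made-up participant information, indexed by participant ID.
    p_info = pd.DataFrame(
        {"height_m": [1.65, 1.82, 1.75], "age": [71, 58, 66]},
        index=pd.Index(["1001", "1002", "1003"], name="participant_id"),
    )

    # Made-up dataset index with one row per trial.
    index = pd.DataFrame(
        {
            "participant_id": ["1001", "1002", "1002", "1003"],
            "test": ["Test5", "Test5", "Test11", "Test11"],
        }
    )

    # Combine any number of conditions on the participant information ...
    selected_ids = p_info[(p_info["height_m"] > 1.7) & (p_info["age"] > 60)].index.to_list()
    # ... and keep only the trials of the selected participants.
    selected_trials = index[index["participant_id"].isin(selected_ids)]
    print(selected_ids)
    print(selected_trials)

With the dataset, the list would be passed to ``get_subset(participant_id=...)`` in the same way as for the tall
participants.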

.. GENERATED FROM PYTHON SOURCE LINES 144-149

.. code-block:: default

    test_subset_tall_pd = test_subset_pd.get_subset(
        participant_id=tall_participants
    )
    test_subset_tall_pd


.. GENERATED FROM PYTHON SOURCE LINES 150-156

We can also perform more complex selections by directly manipulating the index.
Let's say that, for all remaining test data, we want the last trial of each test.
Note that, for some participants, some tests have one trial and others have two.
However, as the index is sorted, we know that the last trial is always the last row of each
test/participant/time-measure group.

We can use normal pandas operations on the dataset index and then pass the resulting index to ``get_subset``.
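
To see what ``groupby(...).tail(1)`` does, here is a small pandas sketch on a made-up, sorted index.
The values (including the ``trial`` column) are invented for illustration and may not match the real TVS index
exactly.

.. code-block:: python

    import pandas as pd

    # Made-up, sorted index: participant 1001 has two trials of Test5 and one of Test11.
    index = pd.DataFrame(
        {
            "participant_id": ["1001", "1001", "1001", "1002"],
            "time_measure": ["TimeMeasure1"] * 4,
            "test": ["Test5", "Test5", "Test11", "Test5"],
            "trial": ["Trial1", "Trial2", "Trial1", "Trial1"],
        }
    )

    # tail(1) keeps the last row of every group, preserving the original row order.
    last_trials = index.groupby(["test", "participant_id", "time_measure"]).tail(1)
    print(last_trials)

``tail(1)`` simply picks the last row per group; it only corresponds to the last trial because the index is sorted
by trial.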

.. GENERATED FROM PYTHON SOURCE LINES 156-165

.. code-block:: default

    test_subset_tall_pd_last_trial = test_subset_tall_pd.get_subset(
        index=(
            test_subset_tall_pd.index.groupby(
                ["test", "participant_id", "time_measure"]
            ).tail(1)
        )
    )
    test_subset_tall_pd_last_trial


.. GENERATED FROM PYTHON SOURCE LINES 166-170

Metadata
--------
Now that we have selected the data we want to work with, let's have a look at what it contains.
We rename the variable of our data subset to avoid typing the long name.

.. GENERATED FROM PYTHON SOURCE LINES 170-172

.. code-block:: default

    subset = test_subset_tall_pd_last_trial


.. GENERATED FROM PYTHON SOURCE LINES 173-175

Next, we access the available meta information.
The main demographic information is stored in the ``participant_information`` attribute (which we already saw above).

.. GENERATED FROM PYTHON SOURCE LINES 175-177

.. code-block:: default

    subset.participant_information


.. GENERATED FROM PYTHON SOURCE LINES 178-182

It contains basic information about the participants, general clinical scores, disease-specific clinical scores, and
information about the potential use of walking aids.

If the entire table is too much, we can also access the respective subsets individually.

.. GENERATED FROM PYTHON SOURCE LINES 182-184

.. code-block:: default

    subset.demographic_information


.. GENERATED FROM PYTHON SOURCE LINES 185-187

.. code-block:: default

    subset.general_clinical_information


.. GENERATED FROM PYTHON SOURCE LINES 188-190

.. code-block:: default

    subset.cohort_clinical_information


.. GENERATED FROM PYTHON SOURCE LINES 191-193

.. code-block:: default

    subset.walking_aid_use_information


.. GENERATED FROM PYTHON SOURCE LINES 194-205

Whenever a data value was not recorded or is not applicable for a participant, the value is set to NaN.

All the information provided above is extracted from the ``participant_information.xlsx`` file provided with the
dataset.
A small subset of this information is also provided in the ``infoForAlgo.mat`` files.
It contains the information deemed directly necessary for some of the algorithms.

When using the standard pipelines, the information provided in the ``infoForAlgo.mat`` files is directly forwarded to
the action method (e.g. ``detect`` or ``calculate``) of the algorithms as keyword arguments.
In the dataset the information can be accessed via the ``participant_metadata_as_df`` attribute for multiple
participants or the ``participant_metadata`` attribute for a single participant.
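
To illustrate what forwarding the metadata as keyword arguments means, here is a minimal sketch.
The algorithm class ``ToyStrideLengthAlgo`` and the metadata keys are hypothetical; the actual keys come from the
``infoForAlgo.mat`` files, and the real mobgap algorithms may declare their parameters differently.

.. code-block:: python

    # Hypothetical metadata dict for a single participant (keys are illustrative only).
    participant_metadata = {"height_m": 1.75, "sensor_height_m": 0.95, "cohort": "PD"}


    class ToyStrideLengthAlgo:
        """Hypothetical algorithm whose action method accepts metadata as keyword arguments."""

        def calculate(self, data: list, *, sensor_height_m: float, **_: object) -> "ToyStrideLengthAlgo":
            # Only the metadata this algorithm needs is named explicitly;
            # all other keys are accepted and ignored via ``**_``.
            self.used_sensor_height_m_ = sensor_height_m
            self.n_samples_ = len(data)
            return self


    # Unpacking the dict passes each entry as a separate keyword argument.
    algo = ToyStrideLengthAlgo().calculate([0.0, 0.1, 0.2], **participant_metadata)
    print(algo.used_sensor_height_m_)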

.. GENERATED FROM PYTHON SOURCE LINES 205-207

.. code-block:: default

    subset.participant_metadata_as_df


.. GENERATED FROM PYTHON SOURCE LINES 208-209

Or, for a single participant (here, simply selecting the first row):

.. GENERATED FROM PYTHON SOURCE LINES 209-211

.. code-block:: default

    subset[0].participant_metadata


.. GENERATED FROM PYTHON SOURCE LINES 212-219

Similarly, we can access recording metadata.

Note that the recording metadata is stored in the actual ``data.mat`` file.
These files can be relatively large, hence accessing the metadata (in particular for multiple participants) can
be slow.
To speed things up, you can use the caching mechanism provided by the dataset via the ``memory`` parameter.

.. code-block:: default

    subset.recording_metadata_as_df

.. GENERATED FROM PYTHON SOURCE LINES 221-222

Or, for a single trial (here, simply selecting the first row):

.. GENERATED FROM PYTHON SOURCE LINES 222-224

.. code-block:: default

    subset[0].recording_metadata


.. GENERATED FROM PYTHON SOURCE LINES 225-239

The final piece of meta information that is available is the data quality of the SU (wearable sensor) and the
reference data.
This is a simple quality score (0 to 3) plus additional comments, provided for each recording.

The numbers can be interpreted as follows:

- 0: Recording discarded completely (these recordings are likely not included in the dataset in the first place)
- 1: Recording has issues, but included in the dataset. Individual tests or trials might be missing, or might have
  degraded quality.
- 2: Recording had some issues, but they could be fixed. Actual data should be good (INDIP only)
- 3: Recording is good

Depending on your requirements, it might be necessary to filter out data with a quality score of 1.
The comments can provide further insight into the issues.
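
A filter like the one suggested above could look as follows.
This pandas sketch works on a made-up quality table; the column names (``su_quality``, ``indip_quality``,
``comment``) are hypothetical and will likely differ from the actual structure of ``data_quality``, so inspect the
real output first.

.. code-block:: python

    import pandas as pd

    # Made-up quality table; column names are hypothetical.
    quality = pd.DataFrame(
        {
            "participant_id": ["1001", "1002", "1003"],
            "su_quality": [3, 1, 3],
            "indip_quality": [2, 3, 1],
            "comment": ["", "SU signal dropouts", "INDIP partially missing"],
        }
    )

    # Keep only recordings where neither the wearable (SU) nor the reference scored 0 or 1.
    good = quality[(quality["su_quality"] > 1) & (quality["indip_quality"] > 1)]
    print(good["participant_id"].to_list())

The remaining rows could then be mapped back to the dataset index, for example via ``get_subset``.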

.. GENERATED FROM PYTHON SOURCE LINES 239-241

.. code-block:: default

    subset.data_quality


.. GENERATED FROM PYTHON SOURCE LINES 242-245

IMU and Reference Data
----------------------
As explained in the other data loader examples, the data can only be accessed once a single trial is selected.

.. GENERATED FROM PYTHON SOURCE LINES 245-248

.. code-block:: default

    single_trial = subset[0]
    single_trial


.. GENERATED FROM PYTHON SOURCE LINES 249-251

The IMU data is stored in the ``data_ss`` attribute.
This is the data of the single sensor that was selected during the dataset creation.

.. GENERATED FROM PYTHON SOURCE LINES 251-253

.. code-block:: default

    single_trial.data_ss


.. GENERATED FROM PYTHON SOURCE LINES 254-255

The reference data is stored in the ``reference_parameters_`` / ``reference_parameters_relative_to_wb_`` attribute.

.. GENERATED FROM PYTHON SOURCE LINES 255-257

.. code-block:: default

    single_trial.reference_parameters_


.. GENERATED FROM PYTHON SOURCE LINES 258-265

Usage in Algorithms and Pipelines
---------------------------------
The TVS datasets follow the exact same API as the other datasets in mobgap and can hence be used in the same way.
Everything that you can do with the :class:`~mobgap.data.LabExampleDataset` can also be done with the TVS datasets.
This means that in all examples and tutorials that use the :class:`~mobgap.data.LabExampleDataset`, you can simply
replace it with one of the TVS datasets.
For more information check out the other data and algorithm examples.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.000 seconds)

**Estimated memory usage:**  0 MB


.. _sphx_glr_download_auto_examples_data__04_tvs_data_no_exc.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: _04_tvs_data_no_exc.py <_04_tvs_data_no_exc.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: _04_tvs_data_no_exc.ipynb <_04_tvs_data_no_exc.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_