modelarrayio.utils.misc.load_and_normalize_cohort

modelarrayio.utils.misc.load_and_normalize_cohort(cohort_file, scalar_columns=None) → tuple[DataFrame, str]

Load a cohort CSV, normalise it, and detect the neuroimaging modality.

This is the single entry point for cohort ingestion, shared by all *_to_h5 converters. It performs, in order:

  1. Reads the file with pd.read_csv.

  2. Normalises wide- or long-format input to long format with cohort_to_long_dataframe.

  3. Validates that the cohort is not empty.

  4. Detects the modality from the extension of each unique source_file.

  5. Validates that all rows share the same modality (mixed modalities are rejected).
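Steps 3 to 5 can be sketched in plain Python as follows. This is a hypothetical stand-in (detect_modality_sketch is not part of modelarrayio), and the extension table is an assumption: the real implementation may recognise other suffixes. CIFTI suffixes are tested before .nii because CIFTI files such as .dscalar.nii also end in .nii.

```python
from collections.abc import Iterable

# ASSUMPTION: illustrative suffix-to-modality table; modelarrayio's real
# mapping may differ. More specific CIFTI suffixes come first so that
# ".dscalar.nii" is not mistaken for plain NIfTI.
_SUFFIXES = (
    (".dscalar.nii", "cifti"),
    (".nii.gz", "nifti"),
    (".nii", "nifti"),
    (".mif", "mif"),
)


def detect_modality_sketch(source_files: Iterable[str]) -> str:
    """Hypothetical sketch of empty-cohort check, modality detection,
    and mixed-modality validation."""
    unique_files = set(source_files)
    if not unique_files:
        raise ValueError("Cohort is empty after normalisation.")

    modalities = set()
    for path in unique_files:
        lower = path.lower()
        for suffix, modality in _SUFFIXES:
            if lower.endswith(suffix):
                modalities.add(modality)
                break
        else:
            # No suffix matched, so the extension is unrecognised.
            raise ValueError(f"Unrecognised extension for source file: {path}")

    if len(modalities) > 1:
        raise ValueError(f"Cohort contains mixed modalities: {sorted(modalities)}")
    return modalities.pop()
```

For example, detect_modality_sketch(["sub-01_FA.nii.gz", "sub-02_FA.nii.gz"]) returns 'nifti', while mixing a .nii.gz file with a .mif file raises ValueError.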

Parameters:
  • cohort_file (path-like) – Path to the cohort CSV file.

  • scalar_columns (list of str, optional) – Column names for wide-format cohort files. If omitted, the CSV must already contain scalar_name and source_file columns.
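The difference between wide and long input can be illustrated with a stdlib-only sketch. It assumes that, in a wide-format file, each column listed in scalar_columns holds one source-file path per subject and that every other column is carried over unchanged. wide_to_long_sketch and the FA/MD column names are hypothetical; this is not necessarily how cohort_to_long_dataframe is implemented.

```python
import csv
import io


def wide_to_long_sketch(csv_text: str, scalar_columns: list[str]) -> list[dict[str, str]]:
    """Hypothetical wide-to-long reshaping.

    ASSUMPTION: each column in ``scalar_columns`` holds one source-file path
    per subject; all other columns are copied onto every long row.
    """
    long_rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Non-scalar columns (e.g. a subject ID) are shared by all long rows.
        shared = {k: v for k, v in row.items() if k not in scalar_columns}
        for scalar in scalar_columns:
            long_rows.append({**shared, "scalar_name": scalar, "source_file": row[scalar]})
    return long_rows


wide_csv = (
    "subject_id,FA,MD\n"
    "sub-01,sub-01_FA.nii.gz,sub-01_MD.nii.gz\n"
    "sub-02,sub-02_FA.nii.gz,sub-02_MD.nii.gz\n"
)
rows = wide_to_long_sketch(wide_csv, ["FA", "MD"])
```

Each subject row expands into one long row per scalar column, so this two-subject, two-scalar input yields four rows, each carrying scalar_name and source_file.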

Returns:

  • cohort_long (pandas.DataFrame) – Normalised long-format cohort dataframe.

  • modality (str) – Detected modality: 'nifti', 'mif', or 'cifti'.

Raises:

ValueError – If the cohort is empty after normalisation, if a source file has an unrecognised extension, or if the cohort contains mixed modalities.