modelarrayio.utils.misc.load_and_normalize_cohort

modelarrayio.utils.misc.load_and_normalize_cohort(cohort_file, scalar_columns=None) → tuple[DataFrame, str]

Load a cohort CSV, normalise it, and detect the neuroimaging modality.

This is the single entry-point for cohort ingestion shared by all *_to_h5 converters. It performs, in order:

1. pd.read_csv the file.
2. cohort_to_long_dataframe to normalise wide/long format.
3. Empty-cohort validation.
4. Modality detection from every unique source_file extension.
5. Mixed-modality validation (all rows must be the same modality).
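
The last three steps (empty-cohort validation, modality detection, mixed-modality validation) can be illustrated with a minimal sketch that uses only the standard library. It is not the library's implementation: the helper names detect_modality and detect_cohort_modality are hypothetical, and the extension table in particular is an assumption. The extensions that modelarrayio actually recognises may differ.

```python
# Hypothetical extension table (an assumption, not taken from modelarrayio).
# CIFTI file names also end in ".nii", so the more specific suffixes come first.
_SUFFIX_TO_MODALITY = (
    (".dscalar.nii", "cifti"),
    (".mif.gz", "mif"),
    (".mif", "mif"),
    (".nii.gz", "nifti"),
    (".nii", "nifti"),
)


def detect_modality(source_file):
    """Map one file name to a modality label, or raise ValueError."""
    name = str(source_file).lower()
    for suffix, modality in _SUFFIX_TO_MODALITY:
        if name.endswith(suffix):
            return modality
    raise ValueError(f"Unrecognised extension: {source_file!r}")


def detect_cohort_modality(source_files):
    """Return the single modality shared by all files, or raise ValueError."""
    unique_files = set(source_files)
    if not unique_files:
        raise ValueError("Cohort is empty.")
    modalities = {detect_modality(path) for path in unique_files}
    if len(modalities) > 1:
        raise ValueError(f"Mixed modalities in cohort: {sorted(modalities)}")
    return modalities.pop()
```

All three failure cases raise ValueError here, mirroring the single exception type documented under Raises.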

Parameters:

cohort_file (path-like) – Path to the cohort CSV file.
scalar_columns (list of str, optional) – Column names for wide-format cohort files. If omitted, the CSV must already contain scalar_name and source_file columns.
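
To make the two layouts concrete, the sketch below reshapes a wide cohort table into the long layout using only the standard library. It assumes that a wide-format file holds one column per scalar, with each cell giving the path to that scalar's image. The column names subject_id, FD, and FDC, the file names, and the helper wide_to_long are all hypothetical, and cohort_to_long_dataframe may differ in detail.

```python
import csv
import io

# Hypothetical wide-format cohort: one column per scalar (assumed layout).
WIDE_CSV = """subject_id,FD,FDC
sub-01,sub-01_FD.mif,sub-01_FDC.mif
sub-02,sub-02_FD.mif,sub-02_FDC.mif
"""


def wide_to_long(rows, scalar_columns):
    """Emit one output row per (input row, scalar column) pair."""
    long_rows = []
    for row in rows:
        # Columns that are not scalars (e.g. subject_id) are carried over unchanged.
        other = {k: v for k, v in row.items() if k not in scalar_columns}
        for scalar in scalar_columns:
            long_rows.append(
                {**other, "scalar_name": scalar, "source_file": row[scalar]}
            )
    return long_rows


rows = list(csv.DictReader(io.StringIO(WIDE_CSV)))
long_rows = wide_to_long(rows, scalar_columns=["FD", "FDC"])
```

Each of the two input rows yields two long-format rows, one per scalar, so long_rows holds four entries that each carry scalar_name and source_file.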

Returns:

cohort_long (pandas.DataFrame) – Normalised long-format cohort dataframe.
modality (str) – Detected modality: 'nifti', 'mif', or 'cifti'.

Raises:

ValueError – If the cohort is empty after normalisation, if a source file has an unrecognised extension, or if the cohort contains mixed modalities.