modelarrayio.utils.mif.load_cohort_mif

modelarrayio.utils.mif.load_cohort_mif(cohort_long, s3_workers)

Load all MIF scalar rows from the cohort, optionally in parallel.

When s3_workers > 1, a ThreadPoolExecutor runs the mrconvert calls concurrently. Each call is a subprocess, so the thread waiting on it releases the GIL and the calls overlap. Results are collected with as_completed and keyed by (scalar_name, subj_idx), so the final per-scalar lists are rebuilt in subject order regardless of which call finishes first.
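The ordering scheme described above can be illustrated with a minimal sketch. This is not the library's implementation: `fake_load` and `load_rows` are hypothetical names, `fake_load` stands in for the real per-file mrconvert call, and the serial branch for a worker count of 1 or less is an assumption based on the description.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def fake_load(path):
    """Hypothetical stand-in for the per-file loader (not the real mrconvert call)."""
    n = int(path[3])  # "sub0.mif" -> 0
    # Earlier files sleep longer, so completion order differs from submit order.
    time.sleep(0.01 * (3 - n))
    return [float(n)]


def load_rows(rows, workers):
    """rows: list of (scalar_name, source_file) pairs in cohort order."""
    # Give each row its position within its scalar (the subject index).
    counts = {}
    jobs = []
    for scalar_name, source_file in rows:
        subj_idx = counts.get(scalar_name, 0)
        counts[scalar_name] = subj_idx + 1
        jobs.append((scalar_name, subj_idx, source_file))

    # Pre-size the output lists so each result can be placed by index.
    scalars = {name: [None] * n for name, n in counts.items()}
    sources = {name: [None] * n for name, n in counts.items()}

    if workers > 1:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {
                pool.submit(fake_load, src): (name, idx, src)
                for name, idx, src in jobs
            }
            # Results arrive in completion order; the key restores the position.
            for fut in as_completed(futures):
                name, idx, src = futures[fut]
                scalars[name][idx] = fut.result()
                sources[name][idx] = src
    else:
        # Assumed serial fallback when no parallelism is requested.
        for name, idx, src in jobs:
            scalars[name][idx] = fake_load(src)
            sources[name][idx] = src
    return scalars, sources


rows = [("FD", "sub0.mif"), ("FD", "sub1.mif"), ("FD", "sub2.mif")]
scalars, sources = load_rows(rows, workers=3)
```

Because every result is written to a slot chosen by its key rather than appended, the parallel and serial paths in this sketch produce identical lists.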

Parameters:
  • cohort_long (pandas.DataFrame) – Long-format cohort dataframe with columns ‘scalar_name’ and ‘source_file’.

  • s3_workers (int) – Number of parallel workers for loading. Values greater than 1 enable the thread pool.

Returns:

  • scalars (dict[str, list[np.ndarray]]) – Per-scalar ordered list of 1-D subject arrays, ready for stripe-write.

  • sources_lists (dict[str, list[str]]) – Per-scalar ordered list of source file paths (for HDF5 metadata).