modelarrayio.utils.nifti.load_cohort_voxels

modelarrayio.utils.nifti.load_cohort_voxels(cohort_long, group_mask_matrix, s3_workers)

Load all voxel rows from the cohort, optionally in parallel.

When s3_workers > 1, a ThreadPoolExecutor is used. Threads share memory, so group_mask_matrix is read directly without being copied. Results arrive via as_completed in completion order and are keyed by (scalar_name, subj_idx), so the final ordered lists are reconstructed correctly regardless of which load finishes first.
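The keyed-result pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the function's actual implementation: fake_load and load_rows are hypothetical names, the mask is a plain list rather than a numpy array, and the serial fallback for a single worker is an assumption.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_load(source_file, mask):
    """Hypothetical stand-in for reading one subject's masked voxels."""
    # The shared mask is only read, never modified, so threads use it directly.
    return [source_file for keep in mask if keep]


def load_rows(rows, mask, workers):
    """rows: list of (scalar_name, source_file) tuples in cohort order."""
    # Give each row its position within its scalar, following cohort order.
    jobs = []
    counts = defaultdict(int)
    for scalar_name, source_file in rows:
        jobs.append((scalar_name, counts[scalar_name], source_file))
        counts[scalar_name] += 1

    results = {}
    if workers > 1:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Map each future back to its (scalar_name, subj_idx) key.
            futures = {
                pool.submit(fake_load, src, mask): (name, idx)
                for name, idx, src in jobs
            }
            # Futures complete in arbitrary order; the key restores the slot.
            for future in as_completed(futures):
                results[futures[future]] = future.result()
    else:
        # Assumed serial path when only one worker is requested.
        for name, idx, src in jobs:
            results[(name, idx)] = fake_load(src, mask)

    # Rebuild per-scalar lists in subject order from the keyed results.
    scalars = {
        name: [results[(name, i)] for i in range(n)]
        for name, n in counts.items()
    }
    sources_lists = {name: [] for name in counts}
    for name, _idx, src in jobs:
        sources_lists[name].append(src)
    return scalars, sources_lists
```

Because every result is stored under its (scalar_name, subj_idx) key before the lists are rebuilt, the parallel and serial paths in this sketch return identical output.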

Parameters:
  • cohort_long (pandas.DataFrame) – Long-format cohort dataframe with columns ‘scalar_name’, ‘source_file’, and ‘source_mask_file’.

  • group_mask_matrix (numpy.ndarray) – Boolean group mask array.

  • s3_workers (int) – Number of parallel workers for loading. Values greater than 1 enable the ThreadPoolExecutor path.

Returns:

  • scalars (dict[str, list[np.ndarray]]) – Per-scalar ordered list of 1-D subject arrays, ready for stripe-write.

  • sources_lists (dict[str, list[str]]) – Per-scalar ordered list of source file paths (for HDF5 metadata).