modelarrayio.cli.nifti_to_h5.nifti_to_h5

modelarrayio.cli.nifti_to_h5.nifti_to_h5(group_mask_file, cohort_long, backend='hdf5', output=PosixPath('voxelarray.h5'), storage_dtype='float32', compression='gzip', compression_level=4, shuffle=True, chunk_voxels=0, target_chunk_mb=2.0, workers=1, s3_workers=1, split_outputs=False)

Load all volume data and write it to an HDF5 file or a TileDB directory.

Parameters:
  • group_mask_file (str) – Path to a NIfTI-1 binary group mask file.

  • cohort_long (pandas.DataFrame) – Normalised long-format cohort dataframe (from load_and_normalize_cohort()).

  • backend (str) – Storage backend: 'hdf5' (default) or 'tiledb'.

  • output (pathlib.Path) – Output path. For the hdf5 backend, path to an .h5 file; for the tiledb backend, path to a .tdb directory.

  • storage_dtype (str) – Floating-point type used to store values. Options: 'float32' (default), 'float64'.

  • compression (str) – Compression filter. 'gzip' (default) works for both backends; 'lzf' is HDF5-only; 'zstd' is TileDB-only.

  • compression_level (int) – Compression level (codec-dependent). Default 4.

  • shuffle (bool) – Enable shuffle filter. Default True.

  • chunk_voxels (int) – Chunk/tile size along the voxel axis. If 0, auto-compute. Default 0.

  • target_chunk_mb (float) – Target chunk/tile size in MiB when auto-computing. Default 2.0.

  • workers (int) – Maximum number of parallel TileDB write workers. Default 1. Has no effect when backend='hdf5'.

  • s3_workers (int) – Number of parallel workers for S3 downloads. Default 1.

  • split_outputs (bool) – If True, write one output file per scalar. Default False.
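
The backend, compression, and output rules listed above can be checked before a long conversion is started. The following is a minimal sketch of such a pre-flight check. The helper check_storage_options and the two lookup tables are hypothetical and not part of modelarrayio; they encode only what the parameter list states. Values not named there (for example, a way to disable compression) are reported as "not listed" because this page does not say whether the function accepts them.

```python
from pathlib import Path

# Codecs per backend, taken from the `compression` parameter description.
# Assumption: only the three codecs named in the docs are covered here.
LISTED_CODECS = {
    "hdf5": {"gzip", "lzf"},
    "tiledb": {"gzip", "zstd"},
}

# Output suffix per backend, taken from the `output` parameter description.
EXPECTED_SUFFIX = {"hdf5": ".h5", "tiledb": ".tdb"}


def check_storage_options(backend, compression, output):
    """Hypothetical helper (not part of modelarrayio).

    Returns a list of human-readable problems; an empty list means the
    combination matches what the parameter documentation describes.
    """
    if backend not in LISTED_CODECS:
        return [f"backend {backend!r} is not one of 'hdf5' or 'tiledb'"]

    problems = []
    if compression not in LISTED_CODECS[backend]:
        problems.append(
            f"compression {compression!r} is not listed for backend {backend!r}"
        )
    suffix = Path(output).suffix
    if suffix != EXPECTED_SUFFIX[backend]:
        problems.append(
            f"output suffix {suffix!r} does not match "
            f"{EXPECTED_SUFFIX[backend]!r} for backend {backend!r}"
        )
    return problems


if __name__ == "__main__":
    # zstd is documented as TileDB-only, so this reports one problem.
    for message in check_storage_options("hdf5", "zstd", "voxelarray.h5"):
        print(message)
```

If the returned list is empty, the same three values can be passed on as the backend, compression, and output arguments of nifti_to_h5.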
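
To give a feel for how chunk_voxels=0 and target_chunk_mb might interact, here is a rough sketch of one way a chunk length could be derived from a size target. It is not the library's implementation. It assumes the stored array has one row per subject and one column per voxel, that each chunk spans all subjects, and that 1 MiB is 1024 * 1024 bytes. The function estimate_chunk_voxels and its arguments n_subjects and n_voxels are hypothetical names introduced for illustration.

```python
# Bytes per element for the two documented storage_dtype options.
ITEM_SIZE = {"float32": 4, "float64": 8}


def estimate_chunk_voxels(n_subjects, n_voxels,
                          storage_dtype="float32", target_chunk_mb=2.0):
    """Hypothetical estimate of a chunk length along the voxel axis.

    Assumption: a chunk holds every subject for `chunk_voxels` voxels,
    so one voxel column costs n_subjects * itemsize bytes.
    """
    target_bytes = int(target_chunk_mb * 1024 * 1024)
    bytes_per_voxel = n_subjects * ITEM_SIZE[storage_dtype]
    chunk_voxels = target_bytes // bytes_per_voxel
    # Keep the result between 1 and the total number of voxels.
    return max(1, min(n_voxels, chunk_voxels))


if __name__ == "__main__":
    # 100 subjects, one million in-mask voxels, default dtype and target.
    print(estimate_chunk_voxels(100, 1_000_000))  # 5242
```

Under these assumptions, switching storage_dtype to 'float64' halves the estimated chunk length for the same target size, since each voxel column takes twice as many bytes.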