tables_io.io_utils.write

I/O write functions for tables_io

Functions

`write`(→ Optional[str])	Writes Table-like or TableDict-like objects to a file or files.
`write_native`(→ Optional[str])	Writes Table-like or TableDict-like objects to a file or files.
`initialize_HDF5_write_single`(filepath[, groupname, comm])	Prepares an HDF5 file for output, where the file will have datasets in only one group.
`initialize_HDF5_write`(filepath[, comm])	Prepares an HDF5 file for output, where the file will be split up into one or more groups.
`write_dict_to_HDF5_chunk_single`(fout, odict, start, ...)	Writes a data chunk from a Table-like object to an hdf5 file
`write_dict_to_HDF5_chunk`(groups, odict, start, end, **kwds)	Writes a data chunk from an OrderedDict or TableDict-like object to an hdf5 file in groups.
`finalize_HDF5_write`(fout[, groupname])	Writes any last data given as keyword arguments, and closes an hdf5 file.
`write_ap_tables_to_fits`(tables, filepath, **kwargs)	Writes a dictionary of astropy.table.Table to a single FITS file
`write_ap_tables_to_HDF5`(tables, filepath, **kwargs)	Writes a dictionary of astropy.table.Table to a single hdf5 file
`write_recarrays_to_fits`(recarrays, filepath, **kwargs)	Writes a dictionary of np.recarray to a single FITS file
`write_dict_to_HDF5`(odict, filepath, groupname, **kwargs)	Writes a dictionary of numpy.array or jaxlib.xla_extension.DeviceArray to a single hdf5 file
`write_dicts_to_HDF5`(odicts, filepath)	Writes a TableDict-like object, an OrderedDict of dictionaries of numpy.array, to a single hdf5 file.
`write_dataframes_to_HDF5`(dataFrames, filepath)	Writes a dictionary of pandas.DataFrame to a single hdf5 file
`write_dataframes_to_pq`(dataFrames, filepath, **kwargs)	Writes a dictionary of pandas.DataFrame to parquet files
`write_dataframes_to_csv`(dataFrames, filepath, **kwargs)	Writes a dictionary of pandas.DataFrame to csv files
`write_table_to_HDF5`(table, filepath, key)	Writes a pyarrow.Table to a single hdf5 file
`write_tables_to_HDF5`(tables, filepath)	Writes a dictionary of pyarrow.Table to a single hdf5 file
`write_tables_to_pq`(tables, filepath, **kwargs)	Writes a dictionary of pyarrow.Table to parquet files.
`write_json`(tables, filepath, **kwargs)	Writes a json string to a json file.

Module Contents

write(obj, filepath: str, fmt: str | None = None) → str | None

Writes Table-like or TableDict-like objects to a file or files. If the format (fmt) is given, or the file has a suffix, the function will convert the given data to the associated tabular type, and then write out the file as the requested type. If no file type is requested, the function will use write_native() to write the file to the default file type for the tabular type.

Note: This function will automatically overwrite any previously existing files at the given filepath.

Parameters:

obj (Table-like or TableDict-like) – The data to write
filepath (str) – File name for the file to write. If there’s no suffix, it will be applied based on the object type.
fmt (str or None) – The output file format. If None and the file path has no suffix, write_native() is used to write the default file type for the given tabular type.

Returns:

filepath – Returns the path to the new file, or None if there was no data given.

Return type:

str or None

Example

>>> import tables_io
>>> import pandas as pd
>>> tab = pd.DataFrame({'col1': [2,4,6], 'col2': [5,7,9]})
>>> tables_io.write(tab, 'data','h5') # tells the function to write it to PANDAS_HDF5
'data.h5'
>>> tables_io.write(tab, 'data.h5') # does the same thing as the line above
'data.h5'
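The precedence between fmt and the file suffix described above can be sketched in plain Python. This is only an illustration: `choose_format` is a hypothetical helper, not a tables_io function, and it assumes that an explicit fmt takes priority when both fmt and a suffix are present.

```python
import os


def choose_format(filepath, fmt=None):
    """Return the requested format string, or None to fall back to the native default.

    Hypothetical helper for illustration; not part of tables_io.
    """
    _, ext = os.path.splitext(filepath)
    suffix = ext.lstrip(".")
    if fmt is not None:
        # Assumption: an explicitly requested format wins over the suffix.
        return fmt
    if suffix:
        # No fmt given: infer the format from the file suffix.
        return suffix
    # Neither fmt nor suffix: the caller would defer to write_native().
    return None


print(choose_format("data", "h5"))
print(choose_format("data.h5"))
print(choose_format("data"))
```

The first two calls mirror the two example calls above; the third corresponds to the case where write_native() picks the default file type.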

write_native(odict, filepath: str) → str | None

Writes Table-like or TableDict-like objects to a file or files. The file type will be determined by the default file type given the tabular format. The supported file types are: astropyHDF5 (“.hf5”), numpyHDF5 (“.hdf5”), numpyFits (“.fit”), pandasParquet (“.parq”), pyarrowParquet (“.parquet”).

To write to a specific file format, use write() instead.

Note: This function will automatically overwrite any previously existing files at the given filepath.

Parameters:

odict (Table-like or TableDict-like) – The data to write
filepath (str) – File name for the file to write. If there’s no suffix, it will be applied based on the object type.

Returns:

filepath – Returns the path to the new file, or None if there was no data given.

Return type:

str or None

Example

>>> import tables_io
>>> from astropy.table import Table
>>> tab = Table([[1,3,5],[10,8,6]], names=('c1','c2'))
>>> tables_io.write(tab, 'data') # writes the file to ASTROPY_HDF5 by default
'data.hf5'
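As a rough sketch of how a default suffix might be applied, the mapping below copies the file types and suffixes listed in the description. `NATIVE_SUFFIXES` and `native_filepath` are hypothetical names used for illustration only, and the string keys stand in for whatever type identifiers the library uses internally.

```python
import os

# Suffixes taken from the list of supported file types above.
NATIVE_SUFFIXES = {
    "astropyHDF5": "hf5",
    "numpyHDF5": "hdf5",
    "numpyFits": "fit",
    "pandasParquet": "parq",
    "pyarrowParquet": "parquet",
}


def native_filepath(filepath, file_type):
    """Append the default suffix for file_type if filepath has none.

    Hypothetical helper for illustration; not part of tables_io.
    """
    _, ext = os.path.splitext(filepath)
    if ext:
        # A suffix is already present, so the path is used as given.
        return filepath
    return f"{filepath}.{NATIVE_SUFFIXES[file_type]}"


print(native_filepath("data", "astropyHDF5"))
```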

initialize_HDF5_write_single(filepath: str, groupname: str | None = None, comm=None, **kwds)

Prepares an HDF5 file for output, where the file will have datasets in only one group. The keyword arguments (**kwds) are required. They provide the data structure of the file. The name of each keyword argument provides the name of the dataset, and the value of the argument should be a dictionary with the dataset information (see below for details).

The function will run serially if no MPI communicator (comm) is provided. To write the file in parallel, the MPI communicator argument is required.

Parameters:

filepath (str) – The output file name
groupname (str or None) – The output group name
comm (communicator) – MPI communicator to do parallel writing
**kwds (one or more dictionaries) – Each keyword should provide a tuple of ((shape), (dtype)):

shape (tuple of int) – The shape of the data for this dataset
dtype (str) – The data type for this dataset

Returns:

group (h5py.File or h5py.Group) – The group to write to.
fout (h5py.File) – The output file

Example

To initialize an HDF5 file with two datasets with different shapes:

>>> from tables_io import hdf5
>>> data = dict(scalar=((100000,), 'f4'), vect=((100000, 3), 'f4'))
>>> group, fout = hdf5.initialize_HDF5_write_single('test.hdf5', data=data)
>>> print(group.name)
/data

To do the same in parallel with MPI using mpi4py:

>>> from tables_io import hdf5
>>> from mpi4py import MPI
>>> data = dict(scalar=((100000,), 'f4'), vect=((100000, 3), 'f4'))
>>> group, fout = hdf5.initialize_HDF5_write_single('test.hdf5', comm=MPI.COMM_WORLD, data=data)
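To make the keyword structure concrete, the sketch below interprets each keyword as a dataset name paired with a (shape, dtype) tuple, following the parameter description above. It does not open a file or call h5py; `plan_datasets` is a hypothetical helper that only reports the layout such a call would describe.

```python
from math import prod


def plan_datasets(**kwds):
    """Map each keyword name to its dataset shape, dtype, and element count.

    Hypothetical helper for illustration; not part of tables_io.
    """
    layout = {}
    for name, (shape, dtype) in kwds.items():
        layout[name] = {
            "shape": tuple(shape),
            "dtype": dtype,
            "size": prod(shape),  # total number of elements in the dataset
        }
    return layout


layout = plan_datasets(scalar=((100000,), "f4"), vect=((100000, 3), "f4"))
for name, info in layout.items():
    print(name, info)
```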

initialize_HDF5_write(filepath: str, comm=None, **kwds)

Prepares an HDF5 file for output, where the file will be split up into one or more groups. The keyword arguments (**kwds) are required. They provide the data structure of the file. The name of each keyword argument provides the group name, and the value of the argument should be a dictionary with dataset name and information (see below for details).

The function will run serially if no MPI communicator (comm) is provided. To write the file in parallel, the MPI communicator argument is required.

Parameters:

filepath (str) – The output file name
comm (communicator) – MPI communicator to do parallel writing
**kwds (one or more dict arguments) – Each keyword should provide a dictionary with the group name and dataset information of the form: group = {'data1': ((shape1), (dtype1)), 'data2': ((shape2), (dtype2))}

group (str) – Name of the HDF5 group
data (str) – Name of the column to be written
shape (tuple of int) – The shape of the data for this dataset
dtype (str) – The data type for this dataset

Returns:

group (dict of h5py.File or h5py.Group) – A dictionary of the groups to write to.
fout (h5py.File) – The output file

Example

To initialize an HDF5 file with two groups named group1 and group2:

>>> from tables_io import hdf5
>>> group1 = {'data1' : ((10,), 'f8'), 'data2': ((50,2), 'f8')}
>>> group2 = {'data3': ((20,20), 'f8')}
>>> groups, fout = hdf5.initialize_HDF5_write('test.hdf5', group1=group1, group2=group2)

To do the same in parallel with MPI using mpi4py:

>>> from tables_io import hdf5
>>> from mpi4py import MPI
>>> group1 = {'data1' : ((10,), 'f8'), 'data2': ((50,2), 'f8')}
>>> group2 = {'data3': ((20,20), 'f8')}
>>> groups, fout = hdf5.initialize_HDF5_write('test.hdf5', comm=MPI.COMM_WORLD, group1=group1, group2=group2)
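The nested group/dataset structure can be illustrated by flattening the keyword arguments into HDF5-style paths. This is a sketch only: `plan_groups` is a hypothetical helper, it does not create a file, and the "/group/dataset" paths are an assumption about how the groups and datasets would be laid out.

```python
def plan_groups(**kwds):
    """Flatten group specifications into a path -> (shape, dtype) mapping.

    Hypothetical helper for illustration; not part of tables_io.
    """
    paths = {}
    for group, datasets in kwds.items():
        for data, (shape, dtype) in datasets.items():
            # Assumed layout: one dataset per column inside its group.
            paths[f"/{group}/{data}"] = (tuple(shape), dtype)
    return paths


group1 = {"data1": ((10,), "f8"), "data2": ((50, 2), "f8")}
group2 = {"data3": ((20, 20), "f8")}
paths = plan_groups(group1=group1, group2=group2)
for path, spec in paths.items():
    print(path, spec)
```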

write_dict_to_HDF5_chunk_single(fout, odict: Mapping, start: int, end: int, **kwds)

Writes a data chunk from a Table-like object to an hdf5 file

Parameters:

fout (h5py.File) – The file
odict (OrderedDict, (str, numpy.array)) – The data being written
start (int) – Starting row number to place the data
end (int) – Ending row number to place the data

Notes

The kwds can be used to control the output locations, i.e., to rename the columns in odict when they go into the output file.

For each item in odict, the output location is set as

k_out = kwds.get(key, key)

This checks whether kwds contains key and, if so, returns the corresponding value. Otherwise it returns key unchanged.

I.e., if key is present in kwds, it will override the name.
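The renaming rule can be tried in isolation with ordinary dictionaries. `output_names` below is a hypothetical helper that applies the same `kwds.get(key, key)` lookup to every column and returns the resulting input-to-output name mapping; the column names are made up for the example.

```python
def output_names(odict, **kwds):
    """Return the output name chosen for each column of odict.

    Hypothetical helper for illustration; not part of tables_io.
    """
    # kwds.get(key, key) yields the override if present, else the original key.
    return {key: kwds.get(key, key) for key in odict}


names = output_names({"ra": [1.0], "dec": [2.0]}, ra="right_ascension")
print(names)
```

Here only `ra` has an override, so `dec` keeps its original name.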

write_dict_to_HDF5_chunk(groups, odict: Mapping, start: int, end: int, **kwds)

Writes a data chunk from an OrderedDict or TableDict-like object to an hdf5 file in groups.

Parameters:

groups (h5py.Group) – The h5py groups or file object (which is also a group object)
odict (OrderedDict, (str, OrderedDict(str, numpy.array))) – The data being written
start (int) – Starting row number to place the data
end (int) – Ending row number to place the data

Notes

The kwds can be used to control the output locations, i.e., to rename the columns in the input data when they go into the output file. The format of kwds should be old_key = new_key, where old_key is the key to be replaced by new_key.

For each item in the input data, the output location is set as

k_out = kwds.get(key, key)

This will check the kwds to see if they contain key and if so, will return the corresponding value. Otherwise it will just return key.

I.e., if key is present in kwds it will override the name.
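A chunked write loop with start and end row numbers can be sketched using plain lists as stand-ins for preallocated datasets. Several things here are assumptions: `write_chunk` is a hypothetical helper rather than the library function, end is treated as exclusive (Python slice convention), and the rename is applied to column names inside each group.

```python
def write_chunk(groups, odict, start, end, **kwds):
    """Copy rows [start, end) of each column into preallocated buffers.

    Hypothetical stand-in for illustration; lists play the role of datasets.
    """
    for group_name, table in odict.items():
        for key, values in table.items():
            k_out = kwds.get(key, key)  # optional rename of the column
            groups[group_name][k_out][start:end] = values


# One group with a single six-row column, preallocated with zeros.
groups = {"group1": {"z_spec": [0.0] * 6}}
chunks = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

start = 0
for chunk in chunks:
    end = start + len(chunk)
    # Input column "z" is written to the output column "z_spec".
    write_chunk(groups, {"group1": {"z": chunk}}, start, end, z="z_spec")
    start = end

print(groups["group1"]["z_spec"])
```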

finalize_HDF5_write(fout, groupname: str | None = None, **kwds)

Writes any last data given as keyword arguments, and closes an hdf5 file. If groupname is given, a group with that name is created before the data is written. If not, no new group is created.

Parameters:

fout (h5py.File) – The file
groupname (None or str) – The name to give the group. If None, no group will be created.

Notes

The keywords can be used to write additional data, where key is the name of the dataset and value should be the dataset to write.
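The optional-group behaviour can be mimicked with nested dictionaries standing in for the file. This is a sketch under stated assumptions: `finalize_sketch` is a hypothetical name, the data is assumed to be written inside the new group when groupname is given, and the closing of the file is not modelled.

```python
def finalize_sketch(fout, groupname=None, **kwds):
    """Write the remaining keyword data into fout, optionally inside a new group.

    Hypothetical stand-in for illustration; a dict plays the role of the file.
    """
    if groupname is None:
        target = fout  # no new group: write at the top level
    else:
        target = fout.setdefault(groupname, {})  # create the group first
    for key, value in kwds.items():
        target[key] = value  # key is the dataset name, value the data
    return fout


with_group = finalize_sketch({}, "meta", bins=[0, 1, 2])
without_group = finalize_sketch({}, bins=[0, 1, 2])
print(with_group)
print(without_group)
```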

write_ap_tables_to_fits(tables: Mapping, filepath: str, **kwargs)

Writes a dictionary of astropy.table.Table to a single FITS file

Parameters:

tables (dict of astropy.table.Table) – Keys will be HDU names, values will be tables
filepath (str) – Path to output file
kwargs – kwargs are passed to astropy.io.fits.writeto call.

write_ap_tables_to_HDF5(tables: Mapping, filepath: str, **kwargs)

Writes a dictionary of astropy.table.Table to a single hdf5 file

Parameters:

tables (dict of astropy.table.Table) – Keys will be passed to ‘path’ parameter
filepath (str) – Path to output file
kwargs – kwargs are passed to astropy.table.Table call.

write_recarrays_to_fits(recarrays: Mapping, filepath: str, **kwargs)

Writes a dictionary of np.recarray to a single FITS file

Parameters:

recarrays (dict of np.recarray) – Keys will be HDU names, values will be tables
filepath (str) – Path to output file
kwargs – kwargs are passed to astropy.io.fits.writeto call.

write_dict_to_HDF5(odict: Mapping, filepath: str, groupname: str | None, **kwargs)

Writes a dictionary of numpy.array or jaxlib.xla_extension.DeviceArray to a single hdf5 file

Parameters:

odict (Mapping, (str, numpy.array or jaxlib.xla_extension.DeviceArray)) – The data being written
filepath (str) – Path to output file
groupname (str or None) – The groupname for the data

write_dicts_to_HDF5(odicts: Mapping, filepath: str)

Writes a TableDict-like object, an OrderedDict of dictionaries of numpy.array, to a single hdf5 file.

Note: This will remove any previously existing files at the filepath.

Parameters:

odicts (OrderedDict, (str, Table-like)) – The data being written
filepath (str) – Path to output file

write_dataframes_to_HDF5(dataFrames: Mapping, filepath: str)

Writes a dictionary of pandas.DataFrame to a single hdf5 file

Parameters:

dataFrames (dict of pandas.DataFrame) – Keys will be passed to ‘key’ parameter
filepath (str) – Path to output file

write_dataframes_to_pq(dataFrames: Mapping, filepath: str, **kwargs)

Writes a dictionary of pandas.DataFrame to parquet files

Parameters:

dataFrames (dict of pandas.DataFrame) – Keys will be passed to ‘path’ parameter
filepath (str) – Path to output file

write_dataframes_to_csv(dataFrames: Mapping, filepath: str, **kwargs)

Writes a dictionary of pandas.DataFrame to csv files

Parameters:

dataFrames (dict of pandas.DataFrame) – Keys will be passed to ‘path’ parameter
filepath (str) – Path to output file

write_table_to_HDF5(table, filepath: str, key: str)

Writes a pyarrow.Table to a single hdf5 file

Parameters:

table (pyarrow.Table) – The table to write
filepath (str) – Path to output file
key (str) – The hdf5 groupname

write_tables_to_HDF5(tables: Mapping, filepath: str)

Writes a dictionary of pyarrow.Table to a single hdf5 file

Parameters:

tables (dict of pyarrow.Table) – Keys will be passed to ‘key’ parameter
filepath (str) – Path to output file

write_tables_to_pq(tables: Mapping, filepath: str, **kwargs)

Writes a dictionary of pyarrow.Table to parquet files. If no extension is given in the base path, it will be written as a .parq file.

Parameters:

tables (dict of pyarrow.Table) – Keys will be passed to ‘path’ parameter
filepath (str) – Path to output file

write_json(tables: Mapping, filepath: str, **kwargs)

Writes a json string to a json file. If no extension is given in the base path, it will be written as a .json file.

Parameters:

tables (dict of str) – Dict with the json strings of each table
filepath (str) – Path to output file