API Documentation for tables_io
TableDict interface
tables_io.TableDict
Functions to store analysis results as astropy data tables
- class tables_io.tableDict.TableDict
Object to collect various types of table-like objects
This class is a dictionary mapping names to table-like objects, plus a few helper functions, e.g., to add new tables to the dictionary and to read and write files, either as FITS or HDF5 files.
- write(basepath, fmt=None)
Write tables to the corresponding file type
- Parameters
basepath (str) – Base path for output files. The suffix will be added based on the file type
fmt (str or None) – The output file format. If None, this will use writeNative
- convert(tType)
Build a new TableDict by converting all the tables in the object to a different type
- Parameters
tType (int) – The type to convert to
- Returns
td – The new TableDict
- Return type
TableDict
- classmethod read(filepath, tType=None, fmt=None, keys=None)
Read a file to the corresponding table type
- Parameters
filepath (str) – File to load
tType (int or None) – Table type. If None, this will use readNative
fmt (str or None) – File format. If None, it will be taken from the file extension
keys (list or None) – Keys to read for parquet files
- Returns
tableDict – The data
- Return type
TableDict
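Conceptually, a TableDict is an ordered name-to-table mapping that only accepts table-like values. The sketch below shows that idea using only the standard library. TableDictSketch and is_tablelike_sketch are hypothetical names and are not part of tables_io. A dict of equal-length lists stands in for the real table types.

```python
from collections import OrderedDict
from collections.abc import Mapping


def is_tablelike_sketch(obj):
    """Hypothetical stand-in for tables_io.types.istablelike.

    Treats a non-empty Mapping of str -> list with equal-length columns as a table.
    """
    if not isinstance(obj, Mapping) or not obj:
        return False
    if not all(isinstance(name, str) and isinstance(col, list) for name, col in obj.items()):
        return False
    return len({len(col) for col in obj.values()}) == 1


class TableDictSketch(OrderedDict):
    """Ordered mapping of table name -> table-like object."""

    def __setitem__(self, key, value):
        # Validate every insertion so the mapping only ever holds tables
        if not isinstance(key, str):
            raise KeyError(f"table names must be str, got {type(key).__name__}")
        if not is_tablelike_sketch(value):
            raise TypeError(f"value for {key!r} is not table-like")
        super().__setitem__(key, value)


td = TableDictSketch()
td["photometry"] = {"id": [1, 2, 3], "mag": [20.1, 21.4, 19.8]}
```

Assigning a value whose columns have different lengths raises TypeError, and nothing is stored in the mapping.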
Table Types and File Formats
tables_io.types
Type definitions for tables_io
- tables_io.types.tableType(obj)
Identify the type of table we have
- Parameters
obj (object) – The input object
- Returns
otype – The object type, one of TABULAR_FORMATS.keys()
- Return type
int
- Raises
TypeError – The object is not a supported type
IndexError – One of the columns in a Mapping is the wrong length
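The IndexError case applies to Mapping inputs, because all of their columns must have the same length to form a table. The plain-Python sketch below illustrates that rule. check_mapping_columns is a hypothetical helper, not the tables_io implementation. It only mimics the two documented exception types.

```python
from collections.abc import Mapping


def check_mapping_columns(obj):
    """Hypothetical helper: return the shared column length of a Mapping.

    Raises TypeError for non-Mapping input and IndexError when a column's
    length differs from the first column's, mirroring the documented errors.
    Returns None for an empty Mapping.
    """
    if not isinstance(obj, Mapping):
        raise TypeError(f"{type(obj).__name__} is not a supported type")
    nrow = None
    for name, col in obj.items():
        if nrow is None:
            nrow = len(col)  # the first column sets the expected length
        elif len(col) != nrow:
            raise IndexError(f"column {name!r} has length {len(col)}, expected {nrow}")
    return nrow
```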
- tables_io.types.istablelike(obj)
Test to see if an object is one of the supported table types
- Parameters
obj (object) – The input object
- Returns
tablelike – True if the object is Tablelike, False otherwise
- Return type
bool
- tables_io.types.istabledictlike(obj)
Test to see if an object is a Mapping, (str, Tablelike)
- Parameters
obj (object) – The input object
- Returns
tabledict – True if the object is a Mapping, (str, Tablelike), False otherwise
- Return type
bool
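The Mapping, (str, Tablelike) test can be sketched as follows. The table-likeness check is passed in as a predicate, so this hypothetical istabledictlike_sketch does not depend on any table library. It is not the tables_io source.

```python
from collections.abc import Mapping


def istabledictlike_sketch(obj, istablelike):
    """Return True if obj is a Mapping with str keys whose values pass istablelike.

    The table test is injected as a predicate; an empty Mapping passes.
    """
    if not isinstance(obj, Mapping):
        return False
    return all(isinstance(key, str) and istablelike(value) for key, value in obj.items())
```

For example, `istabledictlike_sketch({"t": {"x": [1]}}, lambda v: isinstance(v, dict))` is True, while a list of tables is rejected because it is not a Mapping.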
- tables_io.types.fileType(filepath, fmt=None)
Identify the type of file we have
- Parameters
filepath (str) – The path to the file
fmt (str or None) – Overrides the file extension
- Returns
otype – The file type, one of FILE_FORMATS.keys()
- Return type
int
- Raises
KeyError – The file format is not a supported value
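The documented behavior is: derive the format from the file extension unless fmt overrides it, and raise KeyError for unknown formats. A sketch of that behavior follows. FILE_FORMATS_SKETCH and its integer codes are hypothetical, and the real FILE_FORMATS in tables_io may differ. The suffixes ‘fits’, ‘hdf5’ and ‘h5’ appear elsewhere on this page; ‘pq’ for parquet is an assumption.

```python
import os

# Hypothetical suffix -> integer code table; not the real tables_io FILE_FORMATS
FILE_FORMATS_SKETCH = {"fits": 0, "hdf5": 1, "h5": 2, "pq": 3}


def file_type_sketch(filepath, fmt=None):
    """Return the integer file-type code, letting fmt override the extension."""
    if fmt is None:
        # os.path.splitext("data/cat.hdf5") -> ("data/cat", ".hdf5"); drop the dot
        fmt = os.path.splitext(filepath)[1].lstrip(".")
    try:
        return FILE_FORMATS_SKETCH[fmt]
    except KeyError:
        raise KeyError(f"unsupported file format {fmt!r}") from None
```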
Infrastructure and Core functionality
tables_io.arrayUtils
Array-related utility functions for tables_io
- tables_io.arrayUtils.arrayLength(arr)
Get the length of an array
This works on both scalars and arrays, so it is safe to use on either.
For scalars it returns 0; for arrays it returns np.shape(arr)[0].
- Parameters
arr (array-like) – The input array
- Returns
length – The object length
- Return type
int
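Below is a plain-Python sketch of the documented rule. It is not the tables_io implementation, which is described as using np.shape. For sized objects, including numpy arrays with at least one dimension, len(arr) equals np.shape(arr)[0]. Scalars have no len() and fall through to 0. array_length_sketch is a hypothetical name.

```python
def array_length_sketch(arr):
    """First-axis length of arr, or 0 for scalars."""
    if isinstance(arr, (str, bytes)):
        # numpy treats strings as scalars: np.shape("abc") == ()
        return 0
    try:
        return len(arr)
    except TypeError:
        # ints, floats and 0-d numpy arrays have no len()
        return 0
```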
- tables_io.arrayUtils.forceToPandables(arr, check_nrow=None)
Forces a numpy.array into a format that pandas can handle
- Parameters
arr (numpy.array) – The input array
check_nrow (int or None) – If not None, require that arr.shape[0] match this value
- Returns
out – Something that pandas can handle
- Return type
numpy.array or list of numpy.array
- tables_io.arrayUtils.getGroupInputDataLength(hg)
Return the length of an HDF5 group
- Parameters
hg (h5py.Group or h5py.File) – The input data group
- Returns
length – The length of the data
- Return type
int
Notes
For a multi-D array this returns the length of the first axis and not the total size of the array.
Normally that is what you want to be iterating over.
The group is meant to represent a table, hence all child datasets should be the same length.
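The notes above can be illustrated with a dict of nested lists standing in for an h5py group. group_length_sketch is hypothetical. Raising ValueError on mismatched columns is a choice made for this sketch; the documentation only says the datasets should be the same length.

```python
def group_length_sketch(group):
    """Return the shared first-axis length of every column in a dict-like group."""
    lengths = {name: len(col) for name, col in group.items()}
    distinct = set(lengths.values())
    if len(distinct) > 1:
        # A group represents a table, so mismatched columns are an error here
        raise ValueError(f"columns have different lengths: {lengths}")
    return distinct.pop() if distinct else 0


group = {
    "redshift": [0.1, 0.5, 0.9],
    "flux": [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],  # 3 rows x 2 values
}
```

Here group_length_sketch(group) returns 3, the number of rows, rather than 6, the total size of the flux array.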
- tables_io.arrayUtils.printDictShape(in_dict)
Print the shape of arrays in a dictionary. This is useful for debugging astropy.table.Table creation.
- Parameters
in_dict (dict) – The dictionary to print
- tables_io.arrayUtils.sliceDict(in_dict, subslice)
Create a new dict by taking a slice of every array in a dict
- Parameters
in_dict (dict) – The dictionary to extract from
subslice (int or slice) – Used to slice the arrays
- Returns
out_dict – The sliced dictionary
- Return type
dict
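The slicing behavior can be sketched in one comprehension. Slicing is delegated to each value, so the same code works for lists and numpy arrays. slice_dict_sketch is a hypothetical name, not the tables_io code.

```python
def slice_dict_sketch(in_dict, subslice):
    """Return a new dict with the same int or slice applied to every array."""
    return {key: value[subslice] for key, value in in_dict.items()}


columns = {"x": [0, 1, 2, 3], "y": [10, 11, 12, 13]}
```

`slice_dict_sketch(columns, slice(1, 3))` gives `{"x": [1, 2], "y": [11, 12]}`, and an integer such as 0 picks out a single row, `{"x": 0, "y": 10}`. The input dict is left unchanged.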
tables_io.convUtils
Conversion functions for tables_io
- tables_io.convUtils.dataFrameToTable(df)
Convert a pandas.DataFrame to an astropy.table.Table
- Parameters
df (pandas.DataFrame) – The dataframe
- Returns
tab – The table
- Return type
astropy.table.Table
- tables_io.convUtils.convertTable(obj)
Convert an object to an astropy.table.Table
- Parameters
obj (object) – The object being converted
- Returns
tab – The table
- Return type
astropy.table.Table
- tables_io.convUtils.tableToDict(tab)
Convert an astropy.table.Table to an OrderedDict of str : numpy.array
- Parameters
tab (astropy.table.Table) – The table
- Returns
data – The table data
- Return type
OrderedDict, (str : numpy.array)
- tables_io.convUtils.dataFrameToDict(df)
Convert a pandas.DataFrame to an OrderedDict of str : numpy.array
- Parameters
df (pandas.DataFrame) – The dataframe
- Returns
data – The table data
- Return type
OrderedDict, (str : numpy.array)
- tables_io.convUtils.hdf5GroupToDict(hg)
Convert an hdf5 object to an OrderedDict, (str, numpy.array)
- Parameters
hg (h5py.File or h5py.Group) – The hdf5 object
- Returns
data – The table data
- Return type
OrderedDict, (str : numpy.array)
- tables_io.convUtils.convertDict(obj)
Convert an object to an OrderedDict, (str, numpy.array)
- Parameters
obj (object) – The object being converted
- Returns
data – The converted data
- Return type
OrderedDict, (str, numpy.array)
- tables_io.convUtils.tableToDataFrame(tab)
Convert an astropy.table.Table to a pandas.DataFrame
- Parameters
tab (astropy.table.Table) – The table
- Returns
df – The dataframe
- Return type
pandas.DataFrame
- tables_io.convUtils.dictToDataFrame(odict, meta=None)
Convert an OrderedDict, (str, numpy.array) to a pandas.DataFrame
- Parameters
odict (OrderedDict, (str, numpy.array)) – The dict
meta (dict or None) – Optional dictionary of metadata
- Returns
df – The dataframe
- Return type
pandas.DataFrame
- tables_io.convUtils.convertDataFrame(obj)
Convert an object to a pandas.DataFrame
- Parameters
obj (object) – The object being converted
- Returns
df – The dataframe
- Return type
pandas.DataFrame
- tables_io.convUtils.convertObj(obj, tType)
Convert an object to a specific type of Tablelike
- Parameters
obj (object) – The object being converted
tType (int) – The type of object to convert to, one of TABULAR_FORMAT_NAMES
- Returns
out – The converted object
- Return type
Tablelike
- tables_io.convUtils.convertTables(odict)
Convert several objects to astropy.table.Table
- Parameters
odict (Mapping, (str, Tablelike)) – The input objects
- Returns
tabs – The tables
- Return type
OrderedDict of astropy.table.Table
- tables_io.convUtils.convertDicts(odict)
Convert several objects to OrderedDict, (str, numpy.array)
- Parameters
odict (Mapping, (str, Tablelike)) – The input objects
- Returns
tabs – The tables
- Return type
OrderedDict of OrderedDict, (str, numpy.array)
tables_io.ioUtils
IO Functions for tables_io
- tables_io.ioUtils.readHdf5DatasetToArray(dataset, start=None, end=None)
Reads part of an hdf5 dataset into a numpy.array
- Parameters
dataset (h5py.Dataset) – The input dataset
start (int or None) – Starting row
end (int or None) – Ending row
- Returns
out – Something that pandas can handle
- Return type
numpy.array or list of numpy.array
- tables_io.ioUtils.getInputDataLengthHdf5(filepath, groupname=None)
Open an HDF5 file and return the size of a group
- Parameters
filepath (str) – Path to input file
groupname (str or None) – The groupname for the data
- Returns
length – The length of the data
- Return type
int
Notes
For a multi-D array this returns the length of the first axis and not the total size of the array.
Normally that is what you want to be iterating over.
- tables_io.ioUtils.initializeHdf5Write(filepath, groupname=None, **kwds)
Prepares an hdf5 file for output
- Parameters
filepath (str) – The output file name
groupname (str or None) – The output group name
- Returns
group (h5py.File or h5py.Group) – The group to write to
fout (h5py.File) – The output file
Notes
The keywords are used to create datasets within the hdf5 file. Each keyword should provide a tuple of ((shape), (dtype)):
- shape (tuple of int)
The shape of the data for this dataset
- dtype (str)
The data type for this dataset
For example, initializeHdf5Write('test.hdf5', scalar=((100000,), 'f4'), vect=((100000, 3), 'f4'))
would initialize an hdf5 file with two datasets, with the shapes and data types given.
- tables_io.ioUtils.writeDictToHdf5Chunk(fout, odict, start, end, **kwds)
Writes a data chunk to an hdf5 file
- Parameters
fout (h5py.File) – The file
odict (OrderedDict, (str, numpy.array)) – The data being written
start (int) – Starting row number to place the data
end (int) – Ending row number to place the data
Notes
The kwds can be used to control the output locations, i.e., to rename the columns in odict when they go into the output file.
For each item in odict, the output location is set as
k_out = kwds.get(key, key)
This checks whether kwds contains key and, if so, returns the corresponding value; otherwise it just returns key.
I.e., if key is present in kwds, the corresponding value overrides the name.
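The renaming rule is easiest to see with plain Python containers. In the sketch below, a dict of preallocated lists stands in for the hdf5 file created by initializeHdf5Write. write_chunk_sketch is a hypothetical name; only the `k_out = kwds.get(key, key)` line comes from the documentation. The length check is added because list slice assignment would otherwise silently resize the list.

```python
def write_chunk_sketch(out, odict, start, end, **kwds):
    """Copy each column of odict into rows [start, end) of the matching output column.

    out stands in for the open hdf5 file: a dict of str -> preallocated list.
    kwds maps input column names to output names, as in writeDictToHdf5Chunk.
    """
    for key, value in odict.items():
        k_out = kwds.get(key, key)  # rename only if key appears in kwds
        if len(value) != end - start:
            raise ValueError(f"{key!r} has {len(value)} rows, expected {end - start}")
        out[k_out][start:end] = value


out = {"flux": [0.0] * 4, "z": [0.0] * 4}
write_chunk_sketch(out, {"mag": [1.0, 2.0], "z": [0.5, 0.6]}, 2, 4, mag="flux")
```

After the call, the mag values land in rows 2 and 3 of flux, while z keeps its own name.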
- tables_io.ioUtils.finalizeHdf5Write(fout, groupname=None, **kwds)
Writes any last data and closes an hdf5 file
- Parameters
fout (h5py.File) – The file
Notes
The keywords can be used to write additional data
- tables_io.ioUtils.iterHdf5ToDict(filepath, chunk_size=100000, groupname=None)
Iterator for sending chunks of data in hdf5.
- Parameters
filepath (str) – Input file name
chunk_size (int) – Size of chunk to iterate over
groupname (str or None) – The groupname for the data
- Returns
output – Iterator over chunks. Each chunk is a tuple (start, end, data): start is the start index (int), end is the ending index (int), and data is a dictionary of all data from start:end (dict)
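The (start, end, data) contract can be sketched over an in-memory dict of columns. iter_chunks_sketch is hypothetical and does not open any file. Treating end as exclusive follows the start:end slice notation used above.

```python
def iter_chunks_sketch(data, chunk_size=100000):
    """Yield (start, end, chunk) tuples covering every row of a dict of columns.

    end is exclusive, so chunk holds the rows data[key][start:end].
    """
    num_rows = len(next(iter(data.values()))) if data else 0
    for start in range(0, num_rows, chunk_size):
        end = min(start + chunk_size, num_rows)  # the last chunk may be short
        yield start, end, {key: col[start:end] for key, col in data.items()}


for start, end, chunk in iter_chunks_sketch({"x": list(range(5))}, chunk_size=2):
    print(start, end, chunk)
# 0 2 {'x': [0, 1]}
# 2 4 {'x': [2, 3]}
# 4 5 {'x': [4]}
```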
- tables_io.ioUtils.iterH5ToDataFrame(filepath, chunk_size=100000, groupname=None)
Iterator for sending chunks of data in hdf5.
- Parameters
filepath (str) – Input file name
chunk_size (int) – Size of chunk to iterate over
groupname (str or None) – The groupname for the data
- Returns
output – Iterator over chunks. Each chunk is a tuple (start, end, data): start is the start index (int), end is the ending index (int), and data is a pandas.DataFrame of all data from start:end
- tables_io.ioUtils.iterPqToDataFrame(filepath)
Iterator for sending chunks of data in parquet.
- Parameters
filepath (str) – Input file name
- Returns
output – Iterator over chunks. Each chunk is a tuple (start, end, data): start is the start index (int), end is the ending index (int), and data is a pandas.DataFrame of all data from start:end
- tables_io.ioUtils.writeTablesToFits(tables, filepath, **kwargs)
Writes a dictionary of astropy.table.Table to a single FITS file
- Parameters
tables (dict of astropy.table.Table) – Keys will be HDU names, values will be tables
filepath (str) – Path to output file
kwargs – Passed to the astropy.io.fits.writeto call
- tables_io.ioUtils.readFitsToTables(filepath)
Reads astropy.table.Table objects from a FITS file.
- Parameters
filepath (str) – Path to input file
- Returns
tables – Keys will be HDU names, values will be tables
- Return type
OrderedDict of astropy.table.Table
- tables_io.ioUtils.writeTablesToHdf5(tables, filepath, **kwargs)
Writes a dictionary of astropy.table.Table to a single hdf5 file
- Parameters
tables (dict of astropy.table.Table) – Keys will be passed to the ‘path’ parameter
filepath (str) – Path to output file
kwargs – Passed to the astropy.table.Table call
- tables_io.ioUtils.readHdf5ToTables(filepath)
Reads astropy.table.Table objects from an hdf5 file.
- Parameters
filepath (str) – Path to input file
- Returns
tables – Keys will be ‘paths’, values will be tables
- Return type
OrderedDict of astropy.table.Table
- tables_io.ioUtils.readHdf5Group(filepath, groupname=None)
Read and return a group from an hdf5 file.
- Parameters
filepath (str) – File in question
groupname (str or None) – For hdf5 files, the groupname for the data
- Returns
grp (h5py.Group or h5py.File) – The requested group
infp (h5py.File) – The input file (returned so that the user can explicitly close the file)
- tables_io.ioUtils.readHdf5GroupToDict(hg, start=None, end=None)
Reads numpy.array objects from an hdf5 group.
- Parameters
hg (h5py.Group or h5py.File) – The input data group
start (int or None) – Starting row
end (int or None) – Ending row
- Returns
tables – Keys will be ‘paths’, values will be tables
- Return type
OrderedDict of numpy.array
- tables_io.ioUtils.writeDictToHdf5(odict, filepath, groupname, **kwargs)
Writes a dictionary of numpy.array to a single hdf5 file
- Parameters
odict (Mapping, (str, numpy.array)) – The data being written
filepath (str) – Path to output file
groupname (str or None) – The groupname for the data
- tables_io.ioUtils.writeDictsToHdf5(odicts, filepath)
Writes a dictionary of table-like objects to a single hdf5 file
- Parameters
odicts (OrderedDict, (str, Tablelike)) – The data being written
filepath (str) – Path to output file
- tables_io.ioUtils.readHdf5ToDicts(filepath)
Reads numpy.array objects from an hdf5 file.
- Parameters
filepath (str) – Path to input file
- Returns
dicts – The data
- Return type
OrderedDict, (str, OrderedDict, (str, numpy.array) )
- tables_io.ioUtils.readHdf5ToDataFrame(filepath, key=None)
Reads a pandas.DataFrame object from an hdf5 file.
- Parameters
filepath (str) – Path to input file
key (str or None) – The key in the hdf5 file
- Returns
df – The dataframe
- Return type
pandas.DataFrame
- tables_io.ioUtils.readH5ToDataFrames(filepath)
Open an h5 file and return a dictionary of pandas.DataFrame
- Parameters
filepath (str) – Path to input file
- Returns
tab – The data
- Return type
OrderedDict (str : pandas.DataFrame)
Notes
We are using the file suffix ‘h5’ to specify ‘hdf5’ files written from DataFrames using pandas. They have a different structure than ‘hdf5’ files written with h5py or astropy.table.
- tables_io.ioUtils.writeDataFramesToH5(dataFrames, filepath)
Writes a dictionary of pandas.DataFrame to a single hdf5 file
- Parameters
dataFrames (dict of pandas.DataFrame) – Keys will be passed to the ‘key’ parameter
filepath (str) – Path to output file
- tables_io.ioUtils.readPqToDataFrame(filepath)
Reads a pandas.DataFrame object from a parquet file.
- Parameters
filepath (str) – Path to input file
- Returns
df – The data frame
- Return type
pandas.DataFrame
- tables_io.ioUtils.writeDataFramesToPq(dataFrames, filepath, **kwargs)
Writes a dictionary of pandas.DataFrame to parquet files
- Parameters
dataFrames (dict of pandas.DataFrame) – Keys will be passed to the ‘path’ parameter
filepath (str) – Path to output file
- tables_io.ioUtils.readPqToDataFrames(basepath, keys=None)
Reads pandas.DataFrame objects from parquet files.
- Parameters
basepath (str) – Base path for the input files
keys (list) – Keys for the input objects. Used to complete filepaths
- Returns
tables – Keys will be taken from keys
- Return type
OrderedDict of pandas.DataFrame
- tables_io.ioUtils.readPqToDict(filepath, columns=None)
Open a parquet file and return a dictionary of numpy.array
- Parameters
filepath (str) – Path to input file
columns (list (str) or None) – Names of the columns to read. If None, all the columns are read
- Returns
tab – The data
- Return type
OrderedDict (str : numpy.array)
- tables_io.ioUtils.readH5ToDict(filepath, groupname=None)
Open an h5 file and return a dictionary of numpy.array
- Parameters
filepath (str) – Path to input file
groupname (str or None) – The group with the data
- Returns
tab – The data
- Return type
OrderedDict (str : numpy.array)
Notes
We are using the file suffix ‘h5’ to specify ‘hdf5’ files written from DataFrames using pandas. They have a different structure than ‘hdf5’ files written with h5py or astropy.table.
- tables_io.ioUtils.readHdf5ToDict(filepath, groupname=None)
Read in h5py hdf5 data and return a dictionary of all of the keys
- Parameters
filepath (str) – Path to input file
groupname (str or None) – The groupname for the data
- Returns
tab – The data
- Return type
OrderedDict (str : numpy.array)
Notes
We are using the file suffix ‘hdf5’ to specify ‘hdf5’ files written with h5py or astropy.table. They have a different structure than ‘h5’ files written with pandas.
- tables_io.ioUtils.readNative(filepath, fmt=None, keys=None)
Read a file to the corresponding table type
- Parameters
filepath (str) – File to load
fmt (str or None) – File format. If None, it will be taken from the file extension
keys (list or None) – Keys to read for parquet files
- Returns
data – The data
- Return type
OrderedDict ( str -> Tablelike )
- tables_io.ioUtils.read(filepath, tType=None, fmt=None, keys=None)
Read a file to the corresponding table type
- Parameters
filepath (str) – File to load
tType (int or None) – Table type. If None, this will use readNative
fmt (str or None) – File format. If None, it will be taken from the file extension
keys (list or None) – Keys to read for parquet files
- Returns
data – The data
- Return type
OrderedDict ( str -> Tablelike )
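The relationship between read and readNative described above is: read the file natively, and convert only when tType is given. It is sketched below. This control flow is an assumption inferred from the parameter descriptions, not a copy of the tables_io code. read_sketch is hypothetical, the keys argument is omitted, and the native reader and converter are passed in as callables.

```python
def read_sketch(filepath, read_native, convert, tType=None, fmt=None):
    """Read natively, then optionally convert every table to tType.

    read_native(filepath, fmt) -> dict of name -> table
    convert(table, tType)      -> converted table
    Both are injected callables, so no real file or table library is needed.
    """
    odict = read_native(filepath, fmt)
    if tType is None:
        return odict  # keep whatever types the file format produces natively
    return {key: convert(table, tType) for key, table in odict.items()}
```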
- tables_io.ioUtils.iterateNative(filepath, fmt=None, **kwargs)
Read a file to the corresponding table type and iterate over the file
- Parameters
filepath (str) – File to load
fmt (str or None) – File format. If None, it will be taken from the file extension
- Returns
data – The data
- Return type
Tablelike
Notes
The kwargs are passed to the specific iterator type
- tables_io.ioUtils.iterate(filepath, tType=None, fmt=None, **kwargs)
Read a file to the corresponding table type and iterate over the file
- Parameters
filepath (str) – File to load
tType (int or None) – Table type. If None, this will use readNative
fmt (str or None) – File format. If None, it will be taken from the file extension
groupname (str or None) – For hdf5 files, the groupname for the data
- Returns
data – The data
- Return type
OrderedDict ( str -> Tablelike )
- tables_io.ioUtils.writeNative(odict, basename)
Write a file or files with tables
- Parameters
odict (OrderedDict, (str, Tablelike)) – The data to write
basename (str) – Basename for the file to write. The suffix will be applied based on the object type.
- tables_io.ioUtils.write(obj, basename, fmt=None)
Write a file or files with tables
- Parameters
obj (Tablelike or TableDictLike) – The data to write
basename (str) – Basename for the file to write. The suffix will be applied based on the object type.
fmt (str or None) – The output file format. If None, this will use writeNative
tables_io.testUtils
Utilities for testing
- tables_io.testUtils.compare_tables(t1, t2)
Compare all the columns in two astropy.table.Table objects
- Parameters
t1 (astropy.table.Table) – One table
t2 (astropy.table.Table) – Another table
- Returns
identical – True if the tables are identical, False otherwise
- Return type
bool
Notes
For now this explicitly flattens each of the columns, to avoid issues with shape.
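Below is a sketch of the flattening comparison, using dicts of nested lists in place of astropy tables. compare_tables_sketch and flatten are hypothetical helpers, not the tables_io implementation. Ignoring column order is also a choice made for this sketch.

```python
def flatten(values):
    """Recursively flatten nested lists/tuples into one flat list."""
    flat = []
    for item in values:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def compare_tables_sketch(t1, t2):
    """True if two dict-of-columns tables have the same column names and flattened values."""
    if set(t1) != set(t2):
        return False
    return all(flatten(t1[name]) == flatten(t2[name]) for name in t1)


a = {"id": [1, 2], "pos": [[0.1, 0.2], [0.3, 0.4]]}
b = {"id": (1, 2), "pos": [(0.1, 0.2), (0.3, 0.4)]}
```

compare_tables_sketch(a, b) is True even though the nested containers differ. The trade-off is that flattening also hides genuine shape differences: a "pos" column stored as [0.1, 0.2, 0.3, 0.4] would compare equal too.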
- tables_io.testUtils.compare_table_dicts(d1, d2, strict=False)
Compare all the tables in two OrderedDict, (str, astropy.table.Table)
- Parameters
d1 (OrderedDict, (str, astropy.table.Table)) – One dictionary of tables
d2 (OrderedDict, (str, astropy.table.Table)) – Another dictionary of tables
- Returns
identical – True if all the tables are identical, False otherwise
- Return type
bool