API Documentation for tables_io

Stuff

TableDict interface

tables_io.TableDict

Functions to store analysis results as astropy data tables

class tables_io.tableDict.TableDict

Object to collect various types of table-like objects

This class is a dictionary mapping names to table-like objects, with a few helper functions, e.g., to add new tables to the dictionary and to read and write files, either as FITS or HDF5 files.

write(basepath, fmt=None)

Write tables to the corresponding file type

Parameters

basepath (str) – Base path for output files. A suffix will be added based on the type.
fmt (str or None) – The output file format. If None, this will use writeNative.

convert(tType)

Build a new TableDict by converting all the tables in the object to a different type

Parameters: tType (int) – The type to convert to
Returns: td – The new TableDict
Return type: TableDict

classmethod read(filepath, tType=None, fmt=None, keys=None)

Read a file to the corresponding table type

Parameters

filepath (str) – File to load
tType (int or None) – Table type. If None, this will use readNative.
fmt (str or None) – File format. If None, it will be taken from the file extension.
keys (list or None) – Keys to read for parquet files

Returns

tableDict – The data

Return type

TableDict

Table Types and File Formats

tables_io.types

Type definitions for tables_io

tables_io.types.tableType(obj)

Identify the type of table we have

Parameters: obj (object) – The input object
Returns: otype – The object type, one of TABULAR_FORMATS.keys()
Return type: int

Raises

TypeError – The object is not a supported type
IndexError – One of the columns in a Mapping is the wrong length

tables_io.types.istablelike(obj)

Test to see if an object is one of the supported table types

Parameters: obj (object) – The input object
Returns: tablelike – True if the object is Tablelike, False otherwise
Return type: bool

tables_io.types.istabledictlike(obj)

Test to see if an object is a Mapping, (str, Tablelike)

Parameters: obj (object) – The input object
Returns: tabledict – True if the object is a Mapping, (str, Tablelike), False otherwise
Return type: bool

tables_io.types.fileType(filepath, fmt=None)

Identify the type of file we have

Parameters

filepath (str) – The path to the file
fmt (str or None) – Overrides the file extension

Returns

otype – The file type, one of FILE_FORMATS.keys()

Return type

int

Raises

KeyError – The file format is not a supported value
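The lookup described above can be sketched as follows. It resolves the format from the file extension unless `fmt` overrides it, and raises `KeyError` for unknown formats. `file_type` and `FILE_FORMAT_SUFFIXES` are hypothetical names, and their integer codes are made up for illustration. The real `FILE_FORMATS` mapping in tables_io may use different keys and values.

```python
import os

# Hypothetical suffix -> integer format code table; the real
# tables_io FILE_FORMATS mapping may use different names and values.
FILE_FORMAT_SUFFIXES = {"fits": 0, "hdf5": 1, "h5": 2, "pq": 3}


def file_type(filepath, fmt=None):
    """Return the format code for filepath; fmt overrides the extension."""
    if fmt is None:
        # os.path.splitext("data/out.hdf5") -> ("data/out", ".hdf5")
        fmt = os.path.splitext(filepath)[1][1:]
    try:
        return FILE_FORMAT_SUFFIXES[fmt]
    except KeyError:
        raise KeyError(f"Unsupported file format {fmt!r} for {filepath}") from None
```

For example, `file_type("out.dat", fmt="pq")` ignores the `.dat` extension and returns the code for `"pq"`.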

Infrastructure and Core functionality

tables_io.arrayUtils

Array-related utility functions for tables_io

tables_io.arrayUtils.arrayLength(arr)

Get the length of an array

This works on both scalars and arrays, so it is safe to use on either.

For scalars it returns 0; for arrays it returns np.shape(arr)[0].

Parameters: arr (array-like) – The input array
Returns: length – The object length
Return type: int
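The rule above can be sketched in a few lines, assuming numpy. `array_length` is a hypothetical name used only for illustration, not the library function.

```python
import numpy as np


def array_length(arr):
    """Hypothetical re-implementation: first-axis length, or 0 for scalars."""
    shape = np.shape(arr)
    # np.shape returns () for scalars, so there is no first axis to measure
    if not shape:
        return 0
    return shape[0]
```

Because `np.shape` also accepts lists and other array-likes, the same check covers both scalars and arrays.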

tables_io.arrayUtils.forceToPandables(arr, check_nrow=None)

Forces a numpy.array into a format that pandas can handle

Parameters

arr (numpy.array) – The input array
check_nrow (int or None) – If not None, require that arr.shape[0] match this value

Returns

out – Something that pandas can handle

Return type

numpy.array or list of numpy.array
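One plausible way to get the documented return types is sketched below, assuming numpy. A 1-D array is returned unchanged. A multi-dimensional array is flattened per row into a list of 1-D arrays, which pandas can store as a single object column. `force_to_pandables` is a hypothetical name, and the flattening strategy and the `ValueError` are assumptions. The library may handle these cases differently.

```python
import numpy as np


def force_to_pandables(arr, check_nrow=None):
    """Hypothetical sketch: make arr storable as a single pandas column."""
    arr = np.asarray(arr)
    nrow = arr.shape[0]
    if check_nrow is not None and nrow != check_nrow:
        raise ValueError(f"Number of rows {nrow} does not match expected {check_nrow}")
    if arr.ndim == 1:
        # 1-D arrays map directly onto a column
        return arr
    # Flatten the trailing axes so each row becomes one 1-D array
    return list(arr.reshape(nrow, -1))
```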

tables_io.arrayUtils.getGroupInputDataLength(hg)

Return the length of an HDF5 group

Parameters: hg (h5py.Group or h5py.File) – The input data group
Returns: length – The length of the data
Return type: int

Notes

For a multi-D array this returns the length of the first axis and not the total size of the array.

Normally that is what you want to be iterating over.

The group is meant to represent a table, hence all child datasets should be the same length
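Because the group represents a table, its length is the shared first-axis length of its datasets. The sketch below assumes numpy and treats the group as any Mapping of name to array-like. An `h5py.Group` exposes `items()` and `.shape` in the same way, but a plain dict of numpy arrays is used here so the example runs without h5py. `group_data_length`, the `ValueError` on mismatched lengths, and returning 0 for an empty group are all assumptions, not documented library behavior.

```python
import numpy as np


def group_data_length(group):
    """Hypothetical sketch: common first-axis length of every dataset in a group.

    `group` can be any Mapping of name -> array-like with a shape,
    e.g. an h5py.Group or a plain dict of numpy arrays.
    """
    # Only the first axis counts, so a (10, 3) dataset has length 10
    lengths = {name: np.shape(data)[0] for name, data in group.items()}
    unique = set(lengths.values())
    if len(unique) > 1:
        raise ValueError(f"Datasets have different lengths: {lengths}")
    return unique.pop() if unique else 0
```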

tables_io.arrayUtils.printDictShape(in_dict)

Print the shape of arrays in a dictionary. This is useful for debugging astropy.table.Table creation.

Parameters: in_dict (dict) – The dictionary to print

tables_io.arrayUtils.sliceDict(in_dict, subslice)

Create a new dict by taking a slice of every array in a dict

Parameters

in_dict (dict) – The dictionary to extract from
subslice (int or slice) – Used to slice the arrays

Returns

out_dict – The sliced dictionary

Return type

dict
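The slicing operation amounts to applying the same index to every value. A minimal sketch, assuming numpy, with the hypothetical name `slice_dict`:

```python
import numpy as np


def slice_dict(in_dict, subslice):
    """Hypothetical sketch: apply the same int or slice to every array."""
    return {key: np.asarray(val)[subslice] for key, val in in_dict.items()}
```

Passing a `slice` keeps each value an array, while passing an `int` selects a single row from each array.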

tables_io.arrayUtils.checkKeys(in_dicts)

Check that the keys in all the in_dicts match

Parameters

in_dicts (list, (OrderedDict, (str, numpy.array))) – The dictionaries for which to compare keys

Raises

KeyError – The keys of one of the dictionaries do not match
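A minimal sketch of the key check, using only the standard library. `check_keys` is a hypothetical name. Comparing keys as sets, so that order is ignored, is an assumption of this sketch.

```python
def check_keys(in_dicts):
    """Hypothetical sketch: raise KeyError unless every dict has the same keys.

    Keys are compared as sets, so ordering differences are ignored
    (an assumption of this sketch).
    """
    if not in_dicts:
        return
    master = set(in_dicts[0].keys())
    for i, in_dict in enumerate(in_dicts[1:], start=1):
        if set(in_dict.keys()) != master:
            raise KeyError(f"Keys of dict {i} do not match those of dict 0")
```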

tables_io.arrayUtils.concatenateDicts(in_dicts)

Create a new dict by concatenating each array in in_dicts

Parameters: in_dicts (list, (OrderedDict, (str, numpy.array))) – The dictionaries to stack
Returns: out_dict – The stacked dictionary
Return type: dict
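Stacking dictionaries with matching keys reduces to one `numpy.concatenate` per key. The sketch below uses the hypothetical name `concatenate_dicts`. It takes the key list from the first dictionary, so a dictionary missing one of those keys raises `KeyError`.

```python
import numpy as np


def concatenate_dicts(in_dicts):
    """Hypothetical sketch: stack same-keyed dicts of arrays along axis 0."""
    if not in_dicts:
        return {}
    keys = list(in_dicts[0].keys())
    # d[key] raises KeyError if a later dict lacks one of the first dict's keys
    return {key: np.concatenate([d[key] for d in in_dicts]) for key in keys}
```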

tables_io.convUtils

Conversion functions for tables_io

tables_io.convUtils.dataFrameToTable(df)

Convert a pandas.DataFrame to an astropy.table.Table

Parameters: df (pandas.DataFrame) – The dataframe
Returns: tab – The table
Return type: astropy.table.Table

tables_io.convUtils.convertTable(obj)

Convert an object to an astropy.table.Table

Parameters: obj (object) – The object being converted
Returns: tab – The table
Return type: astropy.table.Table

tables_io.convUtils.tableToDict(tab)

Convert an astropy.table.Table to an OrderedDict of str : numpy.array

Parameters

tab (astropy.table.Table) – The table

Returns

data – The table data

Return type

OrderedDict, (str : numpy.array)

tables_io.convUtils.dataFrameToDict(df)

Convert a pandas.DataFrame to an OrderedDict of str : numpy.array

Parameters

df (pandas.DataFrame) – The dataframe

Returns

data – The table data

Return type

OrderedDict, (str : numpy.array)

tables_io.convUtils.hdf5GroupToDict(hg)

Convert an hdf5 object to an OrderedDict, (str, numpy.array)

Parameters

hg (h5py.File or h5py.Group) – The hdf5 object

Returns

data – The table data

Return type

OrderedDict, (str : numpy.array)

tables_io.convUtils.convertDict(obj)

Convert an object to an OrderedDict, (str, numpy.array)

Parameters: obj (object) – The object being converted
Returns: data – The converted data
Return type: OrderedDict, (str, numpy.array)

tables_io.convUtils.tableToDataFrame(tab)

Convert an astropy.table.Table to a pandas.DataFrame

Parameters: tab (astropy.table.Table) – The table
Returns: df – The dataframe
Return type: pandas.DataFrame

tables_io.convUtils.dictToDataFrame(odict, meta=None)

Convert an OrderedDict, (str, numpy.array) to a pandas.DataFrame

Parameters

odict (OrderedDict, (str, numpy.array)) – The dict
meta (dict or None) – Optional dictionary of metadata

Returns

df – The dataframe

Return type

pandas.DataFrame

tables_io.convUtils.convertDataFrame(obj)

Convert an object to a pandas.DataFrame

Parameters: obj (object) – The object being converted
Returns: df – The dataframe
Return type: pandas.DataFrame

tables_io.convUtils.convertObj(obj, tType)

Convert an object to a specific type of Tablelike

Parameters

obj (object) – The object being converted
tType (int) – The type of object to convert to, one of TABULAR_FORMAT_NAMES

Returns

out – The converted object

Return type

Tablelike

tables_io.convUtils.convertTables(odict)

Convert several objects to astropy.table.Table

Parameters: odict (Mapping, (str, Tablelike)) – The input objects
Returns: tabs – The tables
Return type: OrderedDict of astropy.table.Table

tables_io.convUtils.convertDicts(odict)

Convert several objects to OrderedDict, (str, numpy.array)

Parameters: odict (Mapping, (str, Tablelike)) – The input objects
Returns: tabs – The tables
Return type: OrderedDict of OrderedDict, (str, numpy.array)

tables_io.convUtils.convertDataFrames(odict)

Convert several objects to pandas.DataFrame

Parameters: odict (Mapping, (str, Tablelike)) – The input objects
Returns: df – The dataframes
Return type: OrderedDict of pandas.DataFrame

tables_io.convUtils.convert(obj, tType)

Convert several objects to a specific type

Parameters

obj (Tablelike or TableDictlike) – The input object
tType (int) – One of TABULAR_FORMAT_NAMES.keys()

Returns

out – The converted data

Return type

Tablelike or TableDictlike

tables_io.ioUtils

IO Functions for tables_io

tables_io.ioUtils.readHdf5DatasetToArray(dataset, start=None, end=None)

Reads part of an hdf5 dataset into a numpy.array

Parameters

dataset (h5py.Dataset) – The input dataset
start (int or None) – Starting row
end (int or None) – Ending row

Returns

out – The requested rows of the dataset

Return type

numpy.array or list of numpy.array

tables_io.ioUtils.getInputDataLengthHdf5(filepath, groupname=None)

Open an HDF5 file and return the size of a group

Parameters

filepath (str) – Path to input file
groupname (str or None) – The groupname for the data

Returns

length – The length of the data

Return type

int

Notes

For a multi-D array this returns the length of the first axis and not the total size of the array.

Normally that is what you want to be iterating over.

tables_io.ioUtils.initializeHdf5Write(filepath, groupname=None, **kwds)

Prepares an hdf5 file for output

Parameters

filepath (str) – The output file name
groupname (str or None) – The output group name

Returns

group (h5py.File or h5py.Group) – The group to write to
fout (h5py.File) – The output file

Notes

The keywords are used to create datasets within the hdf5 file. Each keyword should provide a tuple of ((shape), (dtype)):

shape (tuple of int): The shape of the data for this dataset
dtype (str): The data type for this dataset

For example, initializeHdf5Write('test.hdf5', scalar=((100000,), 'f4'), vect=((100000, 3), 'f4'))

would initialize an hdf5 file with two datasets, with the shapes and data types as given.
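The keyword convention maps each dataset name to a `((shape), dtype)` pair. The sketch below shows it without requiring h5py: `numpy.empty` stands in for the dataset-creation step, which in h5py would be a call such as `group.create_dataset(name, shape, dtype=dtype)`. `preallocate` is a hypothetical name, and this is an in-memory analogue, not the library's implementation.

```python
import numpy as np


def preallocate(**kwds):
    """Hypothetical in-memory analogue of the ((shape), dtype) keyword convention.

    numpy.empty stands in for creating an hdf5 dataset of the same shape and dtype.
    """
    return {name: np.empty(shape, dtype=dtype) for name, (shape, dtype) in kwds.items()}


# Mirrors the example above: two datasets with the given shapes and types
buffers = preallocate(scalar=((100000,), "f4"), vect=((100000, 3), "f4"))
```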

tables_io.ioUtils.writeDictToHdf5Chunk(fout, odict, start, end, **kwds)

Writes a data chunk to an hdf5 file

Parameters

fout (h5py.File) – The file
odict (OrderedDict, (str, numpy.array)) – The data being written
start (int) – Starting row number to place the data
end (int) – Ending row number to place the data

Notes

The kwds can be used to control the output locations, i.e., to rename the columns in odict when they go into the output file.

For each item in odict, the output location is set as

k_out = kwds.get(key, key)

This will check the kwds to see if they contain key and if so, will return the corresponding value. Otherwise it will just return key.

I.e., if key is present in kwds it will override the name.
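The renaming rule can be tried out in memory. In the sketch below, a dict of preallocated numpy arrays plays the role of the open hdf5 file, which is an assumption made so that the example runs without h5py. `write_chunk` is a hypothetical name.

```python
import numpy as np


def write_chunk(fout, odict, start, end, **kwds):
    """Hypothetical sketch of the k_out = kwds.get(key, key) renaming rule.

    fout is any Mapping of preallocated arrays standing in for the hdf5 file;
    kwds maps an input column name to the dataset name it should be written to.
    """
    for key, val in odict.items():
        k_out = kwds.get(key, key)  # renamed only if key appears in kwds
        fout[k_out][start:end] = val


fout = {"mag": np.zeros(6), "redshift": np.zeros(6)}
# "z" is written to the "redshift" dataset; "mag" keeps its own name
write_chunk(fout, {"mag": np.ones(3), "z": np.full(3, 0.5)}, 0, 3, z="redshift")
```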

tables_io.ioUtils.finalizeHdf5Write(fout, groupname=None, **kwds)

Writes any last data and closes an hdf5 file

Parameters: fout (h5py.File) – The file

Notes

The keywords can be used to write additional data

tables_io.ioUtils.iterHdf5ToDict(filepath, chunk_size=100000, groupname=None)

Iterator for sending chunks of data from an hdf5 file.

Parameters

filepath (str) – Input file name
chunk_size (int) – Size of chunk to iterate over
groupname (str or None) – The groupname for the data

Yields

A tuple (start, end, data) for each chunk:

start (int) – Start index
end (int) – Ending index
data (dict) – Dictionary of all data from start:end
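The `(start, end, data)` chunking pattern can be sketched in memory. Here a dict of equal-length numpy arrays stands in for the hdf5 group, an assumption made so that the example runs without h5py. `iter_chunks` is a hypothetical name. The last chunk is shorter when the row count is not a multiple of `chunk_size`.

```python
import numpy as np


def iter_chunks(data, chunk_size=100000):
    """Hypothetical in-memory sketch of the (start, end, data) chunking pattern.

    `data` is a dict of equal-length arrays standing in for an hdf5 group.
    """
    num_rows = len(next(iter(data.values()))) if data else 0
    for start in range(0, num_rows, chunk_size):
        # Clip the final chunk to the number of rows actually present
        end = min(start + chunk_size, num_rows)
        yield start, end, {key: val[start:end] for key, val in data.items()}


chunks = list(iter_chunks({"x": np.arange(7)}, chunk_size=3))
```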

tables_io.ioUtils.iterH5ToDataFrame(filepath, chunk_size=100000, groupname=None)

Iterator for sending chunks of data from an h5 file.

Parameters

filepath (str) – Input file name
chunk_size (int) – Size of chunk to iterate over
groupname (str or None) – The groupname for the data

Yields

A tuple (start, end, data) for each chunk:

start (int) – Start index
end (int) – Ending index
data (pandas.DataFrame) – All data from start:end

tables_io.ioUtils.iterPqToDataFrame(filepath)

Iterator for sending chunks of data from a parquet file.

Parameters

filepath (str) – Input file name

Yields

A tuple (start, end, data) for each chunk:

start (int) – Start index
end (int) – Ending index
data (pandas.DataFrame) – All data from start:end

tables_io.ioUtils.writeTablesToFits(tables, filepath, **kwargs)

Writes a dictionary of astropy.table.Table to a single FITS file

Parameters

tables (dict of astropy.table.Table) – Keys will be HDU names, values will be tables
filepath (str) – Path to output file
kwargs – Passed to the astropy.io.fits.writeto call

tables_io.ioUtils.readFitsToTables(filepath)

Reads astropy.table.Table objects from a FITS file.

Parameters: filepath (str) – Path to input file
Returns: tables – Keys will be HDU names, values will be tables
Return type: OrderedDict of astropy.table.Table

tables_io.ioUtils.writeTablesToHdf5(tables, filepath, **kwargs)

Writes a dictionary of astropy.table.Table to a single hdf5 file

Parameters

tables (dict of astropy.table.Table) – Keys will be passed to ‘path’ parameter
filepath (str) – Path to output file
kwargs – Passed to the astropy.table.Table.write call

tables_io.ioUtils.readHdf5ToTables(filepath)

Reads astropy.table.Table objects from an hdf5 file.

Parameters

filepath (str) – Path to input file

Returns

tables – Keys will be 'paths', values will be tables

Return type

OrderedDict of astropy.table.Table

tables_io.ioUtils.readHdf5Group(filepath, groupname=None)

Read and return group from an hdf5 file.

Parameters

filepath (str) – File in question
groupname (str or None) – For hdf5 files, the groupname for the data

Returns

grp (h5py.Group or h5py.File) – The requested group
infp (h5py.File) – The input file (returned so that the user can explicitly close the file)

tables_io.ioUtils.readHdf5GroupToDict(hg, start=None, end=None)

Reads numpy.array objects from an hdf5 group.

Parameters

hg (h5py.Group or h5py.File) – The input data group
start (int or None) – Starting row
end (int or None) – Ending row

Returns

tables – Keys will be the dataset names, values will be arrays

Return type

OrderedDict of numpy.array

tables_io.ioUtils.writeDictToHdf5(odict, filepath, groupname, **kwargs)

Writes a dictionary of numpy.array to a single hdf5 file

Parameters

odict (Mapping, (str, numpy.array)) – The data being written
filepath (str) – Path to output file
groupname (str or None) – The groupname for the data

tables_io.ioUtils.writeDictsToHdf5(odicts, filepath)

Writes a dictionary of Tablelike objects to a single hdf5 file

Parameters

odicts (OrderedDict, (str, Tablelike)) – The data being written
filepath (str) – Path to output file

tables_io.ioUtils.readHdf5ToDicts(filepath)

Reads numpy.array objects from an hdf5 file.

Parameters: filepath (str) – Path to input file
Returns: dicts – The data
Return type: OrderedDict, (str, OrderedDict, (str, numpy.array) )

tables_io.ioUtils.readHdf5ToDataFrame(filepath, key=None)

Reads pandas.DataFrame objects from an hdf5 file.

Parameters

filepath (str) – Path to input file
key (str or None) – The key in the hdf5 file

Returns

df – The dataframe

Return type

pandas.DataFrame

tables_io.ioUtils.readH5ToDataFrames(filepath)

Open an h5 file and return a dictionary of pandas.DataFrame

Parameters: filepath (str) – Path to input file
Returns: tab – The data
Return type: OrderedDict (str : pandas.DataFrame)

Notes

We use the file suffix 'h5' to specify hdf5 files written from DataFrames using pandas. They have a different structure than 'hdf5' files written with h5py or astropy.table.

tables_io.ioUtils.writeDataFramesToH5(dataFrames, filepath)

Writes a dictionary of pandas.DataFrame to a single hdf5 file

Parameters

dataFrames (dict of pandas.DataFrame) – Keys will be passed to ‘key’ parameter
filepath (str) – Path to output file

tables_io.ioUtils.readPqToDataFrame(filepath)

Reads a pandas.DataFrame object from a parquet file.

Parameters: filepath (str) – Path to input file
Returns: df – The data frame
Return type: pandas.DataFrame

tables_io.ioUtils.writeDataFramesToPq(dataFrames, filepath, **kwargs)

Writes a dictionary of pandas.DataFrame to parquet files

Parameters

dataFrames (dict of pandas.DataFrame) – Keys will be passed to 'path' parameter
filepath (str) – Path to output file

tables_io.ioUtils.readPqToDataFrames(basepath, keys=None)

Reads pandas.DataFrame objects from parquet files.

Parameters

basepath (str) – Path to input file
keys (list) – Keys for the input objects. Used to complete filepaths

Returns

tables – Keys will be taken from keys

Return type

OrderedDict of pandas.DataFrame

tables_io.ioUtils.readPqToDict(filepath, columns=None)

Open a parquet file and return a dictionary of numpy.array

Parameters

filepath (str) – Path to input file
columns (list (str) or None) – Names of the columns to read, None will read all the columns

Returns

tab – The data

Return type

OrderedDict (str : numpy.array)

tables_io.ioUtils.readH5ToDict(filepath, groupname=None)

Open an h5 file and return a dictionary of numpy.array

Parameters

filepath (str) – Path to input file
groupname (str or None) – The group with the data

Returns

tab – The data

Return type

OrderedDict (str : numpy.array)

Notes

We use the file suffix 'h5' to specify hdf5 files written from DataFrames using pandas. They have a different structure than 'hdf5' files written with h5py or astropy.table.

tables_io.ioUtils.readHdf5ToDict(filepath, groupname=None)

Read in h5py hdf5 data, return a dictionary of all of the keys

Parameters

filepath (str) – Path to input file
groupname (str or None) – The groupname for the data

Returns

tab – The data

Return type

OrderedDict (str : numpy.array)

Notes

We use the file suffix 'hdf5' to specify hdf5 files written with h5py or astropy.table. They have a different structure than 'h5' files written with pandas.

tables_io.ioUtils.readNative(filepath, fmt=None, keys=None)

Read a file to the corresponding table type

Parameters

filepath (str) – File to load
fmt (str or None) – File format. If None, it will be taken from the file extension.
keys (list or None) – Keys to read for parquet files

Returns

data – The data

Return type

OrderedDict ( str -> Tablelike )

tables_io.ioUtils.read(filepath, tType=None, fmt=None, keys=None)

Read a file to the corresponding table type

Parameters

filepath (str) – File to load
tType (int or None) – Table type. If None, this will use readNative.
fmt (str or None) – File format. If None, it will be taken from the file extension.
keys (list or None) – Keys to read for parquet files

Returns

data – The data

Return type

OrderedDict ( str -> Tablelike )

tables_io.ioUtils.iterateNative(filepath, fmt=None, **kwargs)

Read a file to the corresponding table type and iterate over the file

Parameters

filepath (str) – File to load
fmt (str or None) – File format. If None, it will be taken from the file extension.

Returns

data – The data

Return type

TableLike

Notes

The kwargs are passed to the specific iterator type.

tables_io.ioUtils.iterate(filepath, tType=None, fmt=None, **kwargs)

Read a file to the corresponding table type and iterate over the file

Parameters

filepath (str) – File to load
tType (int or None) – Table type. If None, this will use readNative.
fmt (str or None) – File format. If None, it will be taken from the file extension.
groupname (str or None) – For hdf5 files, the groupname for the data

Returns

data – The data

Return type

OrderedDict ( str -> Tablelike )

tables_io.ioUtils.writeNative(odict, basename)

Write a file or files with tables

Parameters

odict (OrderedDict, (str, Tablelike)) – The data to write
basename (str) – Basename for the file to write. The suffix will be applied based on the object type.

tables_io.ioUtils.write(obj, basename, fmt=None)

Write a file or files with tables

Parameters

obj (Tablelike or TableDictLike) – The data to write
basename (str) – Basename for the file to write. The suffix will be applied based on the object type.
fmt (str or None) – The output file format. If None, this will use writeNative.

tables_io.testUtils

Utilities for testing

tables_io.testUtils.compare_tables(t1, t2)

Compare two astropy.table.Table objects

Parameters

t1 (astropy.table.Table) – One table
t2 (astropy.table.Table) – Another table

Returns

identical – True if the tables are identical, False otherwise

Return type

bool

Notes

For now this explicitly flattens each of the columns, to avoid issues with shape
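The effect of flattening each column before comparing can be sketched with plain dicts of numpy arrays standing in for tables. `astropy.table.Table` also provides `keys()` and column access by name, but that path is not exercised here. `compare_columns` is a hypothetical name, and requiring column names to match in order is an assumption of the sketch. Because of the flattening, an `(N, 1)` column and an `(N,)` column holding the same values compare as equal.

```python
import numpy as np


def compare_columns(t1, t2):
    """Hypothetical sketch: compare two Mapping-of-columns tables by flattened values.

    Column names must match in order (an assumption of this sketch).
    """
    if list(t1.keys()) != list(t2.keys()):
        return False
    for name in t1.keys():
        c1 = np.asarray(t1[name]).flatten()
        c2 = np.asarray(t2[name]).flatten()
        # Flattening means (N, 1) and (N,) columns with equal values compare as equal
        if not np.array_equal(c1, c2):
            return False
    return True
```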

tables_io.testUtils.compare_table_dicts(d1, d2, strict=False)

Compare all the tables in two OrderedDict, (str, astropy.table.Table)

Parameters

d1 (OrderedDict, (str, astropy.table.Table)) – One dictionary of tables
d2 (OrderedDict, (str, astropy.table.Table)) – Another dictionary of tables

Returns

identical – True if all the tables are identical, False otherwise

Return type

bool

tables_io.testUtils.make_test_data()

Make and return some test data