API Documentation for tables_io

Stuff

TableDict interface

tables_io.TableDict

Functions to store analysis results as astropy data tables

class tables_io.tableDict.TableDict[source]

Object to collect various types of table-like objects

This class is a dictionary mapping name to table-like and a few helper functions, e.g., to add new tables to the dictionary and to read and write files, either as FITS or HDF5 files.

write(basepath, fmt=None)[source]

Write tables to the corresponding file type

Parameters
  • basepath (str) – base path for output files. Suffix will be added based on type

  • fmt (str or None) – The output file format. If None, this will use writeNative

convert(tType)[source]

Build a new TableDict by converting all the tables in the object to a different type

Parameters

tType (int) – The type to convert to

Returns

td – The new TableDict

Return type

TableDict

classmethod read(filepath, tType=None, fmt=None, keys=None)[source]

Read a file to the corresponding table type

Parameters
  • filepath (str) – File to load

  • tType (int or None) – Table type, if None this will use readNative

  • fmt (str or None) – File format, if None it will be taken from the file extension

  • keys (list or None) – Keys to read for parquet files

Returns

tableDict – The data

Return type

TableDict

Table types and File Formats

tables_io.types

Type definitions for tables_io

tables_io.types.tableType(obj)[source]

Identify the type of table we have

Parameters

obj (object) – The input object

Returns

otype – The object type, one of TABULAR_FORMATS.keys()

Return type

int

Raises
  • TypeError – The object is not a supported type

  • IndexError – One of the columns in a Mapping is the wrong length
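The IndexError case can be illustrated with a minimal sketch. It does not reproduce tableType itself: the helper name check_column_lengths is hypothetical and only shows the kind of consistency check the description implies for a Mapping of columns.

```python
from collections import OrderedDict

import numpy as np


def check_column_lengths(columns):
    """Hypothetical helper: return the common row count of a mapping of columns.

    Raises IndexError if any column has a different first-axis length.
    """
    nrow = None
    for name, col in columns.items():
        length = np.shape(col)[0]
        if nrow is None:
            nrow = length
        elif length != nrow:
            raise IndexError(f"Column {name} has length {length}, expected {nrow}")
    return nrow


# Columns may be multi-dimensional; only the first axis has to agree
good = OrderedDict(a=np.arange(4), b=np.zeros((4, 2)))
bad = OrderedDict(a=np.arange(4), b=np.arange(3))
```

Calling check_column_lengths(good) returns the shared row count, while check_column_lengths(bad) raises IndexError.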

tables_io.types.istablelike(obj)[source]

Test to see if an object is one of the supported table types

Parameters

obj (object) – The input object

Returns

tablelike – True if the object is Tablelike, False otherwise

Return type

bool

tables_io.types.istabledictlike(obj)[source]

Test to see if an object is a Mapping, (str, Tablelike)

Parameters

obj (object) – The input object

Returns

tabledict – True if the object is a Mapping, (str, Tablelike), False otherwise

Return type

bool

tables_io.types.fileType(filepath, fmt=None)[source]

Identify the type of file we have

Parameters
  • filepath (str) – The path to the file

  • fmt (str or None) – Overrides the file extension

Returns

otype – The object type, one of FILE_FORMATS.keys()

Return type

int

Raises

KeyError – The file format is not a supported value
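The lookup described here (take the format from the file extension unless fmt overrides it, and fail with KeyError for unknown formats) might look roughly like the sketch below. The table HYPOTHETICAL_FILE_FORMATS and its integer codes are invented for illustration; the real values are defined in tables_io.types.

```python
import os

# Hypothetical suffix -> integer code table, for illustration only
HYPOTHETICAL_FILE_FORMATS = {"fits": 0, "hdf5": 1, "h5": 2, "pq": 3}


def file_type_sketch(filepath, fmt=None):
    """Return a format code from the file extension, or from fmt if given."""
    if fmt is None:
        # "data/table.hdf5" -> "hdf5"
        fmt = os.path.splitext(filepath)[1].lstrip(".")
    # A plain dict lookup raises KeyError for unsupported formats
    return HYPOTHETICAL_FILE_FORMATS[fmt]
```

With this sketch, file_type_sketch("out.dat", fmt="fits") uses the override and ignores the .dat suffix.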

Infrastructure and Core functionality

tables_io.arrayUtils

Array-related utility functions for tables_io

tables_io.arrayUtils.arrayLength(arr)[source]

Get the length of an array

This works on both scalars and arrays, so it is safe to use on either.

For scalars it returns 0. For arrays it returns np.shape(arr)[0].

Parameters

arr (array-like) – The input array

Returns

length – The object length

Return type

int
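A minimal sketch of the behavior described above, assuming only NumPy. The function name array_length_sketch is hypothetical and this is not the library's implementation.

```python
import numpy as np


def array_length_sketch(arr):
    """Return 0 for scalars and the first-axis length for arrays."""
    shape = np.shape(arr)
    if not shape:
        # np.shape of a scalar is the empty tuple
        return 0
    return shape[0]
```

For a two-dimensional input such as np.zeros((5, 3)) this returns 5, the length of the first axis rather than the total size.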

tables_io.arrayUtils.forceToPandables(arr, check_nrow=None)[source]

Forces a numpy.array into a format that pandas can handle

Parameters
  • arr (numpy.array) – The input array

  • check_nrow (int or None) – If not None, require that arr.shape[0] match this value

Returns

out – Something that pandas can handle

Return type

numpy.array or list of numpy.array

tables_io.arrayUtils.getGroupInputDataLength(hg)[source]

Return the length of an HDF5 group

Parameters

hg (h5py.Group or h5py.File) – The input data group

Returns

length – The length of the data

Return type

int

Notes

For a multi-D array this returns the length of the first axis and not the total size of the array.

Normally that is what you want to be iterating over.

The group is meant to represent a table, hence all child datasets should be the same length

tables_io.arrayUtils.printDictShape(in_dict)[source]

Print the shape of arrays in a dictionary. This is useful for debugging astropy.Table creation.

Parameters

in_dict (dict) – The dictionary to print

tables_io.arrayUtils.sliceDict(in_dict, subslice)[source]

Create a new dict by taking a slice of every array in a dict

Parameters
  • in_dict (dict) – The dictionary to extract from

  • subslice (int or slice) – Used to slice the arrays

Returns

out_dict – The converted dictionary

Return type

dict
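The slicing described above can be sketched as follows. This is an illustration under the assumption that every value supports NumPy-style indexing; slice_dict_sketch is a hypothetical name, not the library function.

```python
from collections import OrderedDict

import numpy as np


def slice_dict_sketch(in_dict, subslice):
    """Apply the same int or slice index to every array in the dict."""
    return OrderedDict((key, val[subslice]) for key, val in in_dict.items())


data = OrderedDict(x=np.arange(10), y=np.arange(20).reshape(10, 2))
first_three = slice_dict_sketch(data, slice(0, 3))  # rows 0, 1, 2 of each column
row_five = slice_dict_sketch(data, 5)               # a single row of each column
```

An int index drops the first axis, so row_five["x"] is a scalar while row_five["y"] is a length-2 array.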

tables_io.arrayUtils.checkKeys(in_dicts)[source]

Check that the keys in all the in_dicts match

Parameters

in_dicts (list, (OrderedDict, (str, numpy.array))) – The dictionaries whose keys are compared

Raises

KeyError – The keys of one of the dictionaries do not match

tables_io.arrayUtils.concatenateDicts(in_dicts)[source]

Create a new dict by concatenating each array in in_dicts

Parameters

in_dicts (list, (OrderedDict, (str, numpy.array))) – The dictionaries to stack

Returns

out_dict – The stacked dictionary

Return type

dict
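A rough sketch of stacking several dictionaries column by column, with a key check beforehand. Two assumptions are made here that the text does not confirm: keys are compared as sets (ignoring order), and arrays are joined along the first axis with np.concatenate.

```python
from collections import OrderedDict

import numpy as np


def concatenate_dicts_sketch(in_dicts):
    """Stack each column of several dicts along the first axis."""
    if not in_dicts:
        return OrderedDict()
    keys = list(in_dicts[0].keys())
    for other in in_dicts[1:]:
        # Assumption: key order does not matter, only the set of names
        if set(other.keys()) != set(keys):
            raise KeyError(f"Keys {list(other.keys())} do not match {keys}")
    return OrderedDict(
        (key, np.concatenate([d[key] for d in in_dicts])) for key in keys
    )


part_a = OrderedDict(x=np.arange(3))
part_b = OrderedDict(x=np.arange(3, 5))
stacked = concatenate_dicts_sketch([part_a, part_b])
```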

tables_io.convUtils

IO Functions for tables_io

tables_io.convUtils.dataFrameToTable(df)[source]

Convert a pandas.DataFrame to an astropy.table.Table

Parameters

df (pandas.DataFrame) – The dataframe

Returns

tab – The table

Return type

astropy.table.Table

tables_io.convUtils.convertTable(obj)[source]

Convert an object to an astropy.table.Table

Parameters

obj (object) – The object being converted

Returns

tab – The table

Return type

astropy.table.Table

tables_io.convUtils.tableToDict(tab)[source]

Convert an astropy.table.Table to an OrderedDict of str : numpy.array

Parameters

tab (astropy.table.Table) – The table

Returns

data – The table data

Return type

OrderedDict, (str : numpy.array)

tables_io.convUtils.dataFrameToDict(df)[source]

Convert a pandas.DataFrame to an OrderedDict of str : numpy.array

Parameters

df (pandas.DataFrame) – The dataframe

Returns

data – The table data

Return type

OrderedDict, (str : numpy.array)

tables_io.convUtils.hdf5GroupToDict(hg)[source]

Convert an hdf5 object to an OrderedDict, (str, numpy.array)

Parameters

hg (h5py.File or h5py.Group) – The hdf5 object

Returns

data – The table data

Return type

OrderedDict, (str : numpy.array)

tables_io.convUtils.convertDict(obj)[source]

Convert an object to an OrderedDict, (str, numpy.array)

Parameters

obj (object) – The object being converted

Returns

data – The converted data

Return type

OrderedDict, (str, numpy.array)

tables_io.convUtils.tableToDataFrame(tab)[source]

Convert an astropy.table.Table to a pandas.DataFrame

Parameters

tab (astropy.table.Table) – The table

Returns

df – The dataframe

Return type

pandas.DataFrame

tables_io.convUtils.dictToDataFrame(odict, meta=None)[source]

Convert an OrderedDict, (str, numpy.array) to a pandas.DataFrame

Parameters
  • odict (OrderedDict, (str, numpy.array)) – The dict

  • meta (dict or None) – Optional dictionary of metadata

Returns

df – The dataframe

Return type

pandas.DataFrame

tables_io.convUtils.convertDataFrame(obj)[source]

Convert an object to a pandas.DataFrame

Parameters

obj (object) – The object being converted

Returns

df – The dataframe

Return type

pandas.DataFrame

tables_io.convUtils.convertObj(obj, tType)[source]

Convert an object to a specific type of Tablelike

Parameters
  • obj (object) – The object being converted

  • tType (int) – The type of object to convert to, one of TABULAR_FORMAT_NAMES

Returns

out – The converted object

Return type

Tablelike

tables_io.convUtils.convertTables(odict)[source]

Convert several objects to astropy.table.Table

Parameters

odict (Mapping, (str, Tablelike)) – The input objects

Returns

tabs – The tables

Return type

OrderedDict of astropy.table.Table

tables_io.convUtils.convertDicts(odict)[source]

Convert several objects to OrderedDict, (str, numpy.array)

Parameters

odict (Mapping, (str, Tablelike)) – The input objects

Returns

tabs – The tables

Return type

OrderedDict of OrderedDict, (str, numpy.array)

tables_io.convUtils.convertDataFrames(odict)[source]

Convert several objects to pandas.DataFrame

Parameters

odict (Mapping, (str, Tablelike)) – The input objects

Returns

df – The dataframes

Return type

OrderedDict of pandas.DataFrame

tables_io.convUtils.convert(obj, tType)[source]

Convert several objects to a specific type

Parameters
  • obj (Tablelike or TableDictlike) – The input object

  • tType (int) – One of TABULAR_FORMAT_NAMES.keys()

Returns

out – The converted data

Return type

Tablelike or TableDictlike

tables_io.ioUtils

IO Functions for tables_io

tables_io.ioUtils.readHdf5DatasetToArray(dataset, start=None, end=None)[source]

Reads part of an hdf5 dataset into a numpy.array

Parameters
  • dataset (h5py.Dataset) – The input dataset

  • start (int or None) – Starting row

  • end (int or None) – Ending row

Returns

out – Something that pandas can handle

Return type

numpy.array or list of numpy.array

tables_io.ioUtils.getInputDataLengthHdf5(filepath, groupname=None)[source]

Open an HDF5 file and return the size of a group

Parameters
  • filepath (str) – Path to input file

  • groupname (str or None) – The groupname for the data

Returns

length – The length of the data

Return type

int

Notes

For a multi-D array this returns the length of the first axis and not the total size of the array.

Normally that is what you want to be iterating over.

tables_io.ioUtils.initializeHdf5Write(filepath, groupname=None, **kwds)[source]

Prepares an hdf5 file for output

Parameters
  • filepath (str) – The output file name

  • groupname (str or None) – The output group name

Returns

  • group (h5py.File or h5py.Group) – The group to write to

  • fout (h5py.File) – The output file

Notes

The keywords should be used to create datasets within the hdf5 file. Each keyword should provide a tuple of ( (shape), (dtype) )

  • shape (tuple of int) – The shape of the data for this dataset

  • dtype (str) – The data type for this dataset

For example, initializeHdf5Write('test.hdf5', scalar=((100000,), 'f4'), vect=((100000, 3), 'f4'))

would initialize an hdf5 file with two datasets, with shapes and data types as given
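To make the keyword convention concrete, the sketch below unpacks the same ( (shape), (dtype) ) tuples, but allocates in-memory NumPy arrays instead of hdf5 datasets. The function initialize_buffers_sketch is a hypothetical stand-in, not part of tables_io.

```python
import numpy as np


def initialize_buffers_sketch(**kwds):
    """Hypothetical stand-in: allocate one zero-filled array per keyword.

    Each keyword value is a tuple of (shape, dtype), as in the note above.
    """
    return {
        name: np.zeros(shape, dtype=dtype) for name, (shape, dtype) in kwds.items()
    }


buffers = initialize_buffers_sketch(
    scalar=((100000,), "f4"),
    vect=((100000, 3), "f4"),
)
```

Each keyword name becomes the name of one output array, and the tuple fixes its shape and data type up front.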

tables_io.ioUtils.writeDictToHdf5Chunk(fout, odict, start, end, **kwds)[source]

Writes a data chunk to an hdf5 file

Parameters
  • fout (h5py.File) – The file

  • odict (OrderedDict, (str, numpy.array)) – The data being written

  • start (int) – Starting row number to place the data

  • end (int) – Ending row number to place the data

Notes

The kwds can be used to control the output locations, i.e., to rename the columns in odict when they go into the output file.

For each item in odict, the output location is set as

k_out = kwds.get(key, key)

This will check the kwds to see if they contain key and, if so, will return the corresponding value. Otherwise it will just return key.

I.e., if key is present in kwds it will override the name.
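The renaming rule can be sketched with plain NumPy arrays standing in for the hdf5 datasets. The function write_chunk_sketch and the column names are hypothetical; only the k_out = kwds.get(key, key) lookup is taken from the note above.

```python
import numpy as np


def write_chunk_sketch(out, odict, start, end, **kwds):
    """Copy each column into rows start:end of its (possibly renamed) target."""
    for key, val in odict.items():
        # Use the override from kwds if present, else keep the original name
        k_out = kwds.get(key, key)
        out[k_out][start:end] = val


out = {"ra": np.zeros(6), "mag": np.zeros(6)}
chunk = {"ra": np.array([1.0, 2.0, 3.0]), "m": np.array([20.0, 21.0, 22.0])}

# Column "m" is written to the output location "mag"; "ra" keeps its name
write_chunk_sketch(out, chunk, 3, 6, m="mag")
```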

tables_io.ioUtils.finalizeHdf5Write(fout, groupname=None, **kwds)[source]

Writes any last data and closes an hdf5 file

Parameters

fout (h5py.File) – The file

Notes

The keywords can be used to write additional data

tables_io.ioUtils.iterHdf5ToDict(filepath, chunk_size=100000, groupname=None)[source]

Iterator for sending chunks of data in hdf5.

Parameters
  • filepath (str) – Input file name

  • chunk_size (int) – Size of chunk to iterate over

Returns

output – Iterator over chunks. Currently only implemented for hdf5. Each chunk is a tuple of:
  • start (int) – Start index

  • end (int) – Ending index

  • data (dict) – Dictionary of all data from start:end
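The chunked iteration pattern can be sketched on an in-memory dictionary of arrays. This is not the hdf5 reader itself: iter_chunks_sketch is a hypothetical name, and treating end as an exclusive index (as in a Python slice) is an assumption.

```python
import numpy as np


def iter_chunks_sketch(data, chunk_size=100000):
    """Yield (start, end, chunk) tuples covering all rows of a dict of arrays."""
    nrow = len(next(iter(data.values())))
    for start in range(0, nrow, chunk_size):
        # The last chunk may be shorter than chunk_size
        end = min(start + chunk_size, nrow)
        yield start, end, {key: val[start:end] for key, val in data.items()}


data = {"x": np.arange(10), "y": np.arange(10) * 2.0}
bounds = [(start, end) for start, end, _ in iter_chunks_sketch(data, chunk_size=4)]
```

Iterating this way keeps only one chunk of each column in hand at a time, which is the point of reading a large file in pieces.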

tables_io.ioUtils.iterH5ToDataFrame(filepath, chunk_size=100000, groupname=None)[source]

Iterator for sending chunks of data in hdf5.

Parameters
  • filepath (str) – Input file name

  • chunk_size (int) – Size of chunk to iterate over

Returns

output – Iterator over chunks. Currently only implemented for hdf5. Each chunk is a tuple of:
  • start (int) – Start index

  • end (int) – Ending index

  • data (pandas.DataFrame) – All data from start:end

tables_io.ioUtils.iterPqToDataFrame(filepath)[source]

Iterator for sending chunks of data in parquet.

Parameters

filepath (str) – Input file name

Returns

output – Iterator over chunks. Currently only implemented for hdf5. Each chunk is a tuple of:
  • start (int) – Start index

  • end (int) – Ending index

  • data (pandas.DataFrame) – All data from start:end

tables_io.ioUtils.writeTablesToFits(tables, filepath, **kwargs)[source]

Writes a dictionary of astropy.table.Table to a single FITS file

Parameters
  • tables (dict of astropy.table.Table) – Keys will be HDU names, values will be tables

  • filepath (str) – Path to output file

  • kwargs – Passed to the astropy.io.fits.writeto call

tables_io.ioUtils.readFitsToTables(filepath)[source]

Reads astropy.table.Table objects from a FITS file.

Parameters

filepath (str) – Path to input file

Returns

tables – Keys will be HDU names, values will be tables

Return type

OrderedDict of astropy.table.Table

tables_io.ioUtils.writeTablesToHdf5(tables, filepath, **kwargs)[source]

Writes a dictionary of astropy.table.Table to a single hdf5 file

Parameters
  • tables (dict of astropy.table.Table) – Keys will be passed to ‘path’ parameter

  • filepath (str) – Path to output file

  • kwargs – Passed to the astropy.table.Table call

tables_io.ioUtils.readHdf5ToTables(filepath)[source]

Reads astropy.table.Table objects from an hdf5 file.

Parameters

filepath (str) – Path to input file

Returns

tables – Keys will be ‘paths’, values will be tables

Return type

OrderedDict of astropy.table.Table

tables_io.ioUtils.readHdf5Group(filepath, groupname=None)[source]

Read and return group from an hdf5 file.

Parameters
  • filepath (str) – File in question

  • groupname (str or None) – For hdf5 files, the groupname for the data

Returns

  • grp (h5py.Group or h5py.File) – The requested group

  • infp (h5py.File) – The input file (returned so that the user can explicitly close the file)

tables_io.ioUtils.readHdf5GroupToDict(hg, start=None, end=None)[source]

Reads numpy.array objects from an hdf5 file.

Parameters
  • hg (h5py.File or h5py.Group) – The hdf5 object

  • start (int or None) – Starting row

  • end (int or None) – Ending row

Returns

tables – Keys will be ‘paths’, values will be tables

Return type

OrderedDict of numpy.array

tables_io.ioUtils.writeDictToHdf5(odict, filepath, groupname, **kwargs)[source]

Writes a dictionary of numpy.array to a single hdf5 file

Parameters
  • odict (Mapping, (str, numpy.array)) – The data being written

  • filepath (str) – Path to output file

  • groupname (str or None) – The groupname for the data

tables_io.ioUtils.writeDictsToHdf5(odicts, filepath)[source]

Writes a dictionary of numpy.array to a single hdf5 file

Parameters
  • odicts (OrderedDict, (str, Tablelike)) – The data being written

  • filepath (str) – Path to output file

tables_io.ioUtils.readHdf5ToDicts(filepath)[source]

Reads numpy.array objects from an hdf5 file.

Parameters

filepath (str) – Path to input file

Returns

dicts – The data

Return type

OrderedDict, (str, OrderedDict, (str, numpy.array) )

tables_io.ioUtils.readHdf5ToDataFrame(filepath, key=None)[source]

Reads pandas.DataFrame objects from an hdf5 file.

Parameters
  • filepath (str) – Path to input file

  • key (str or None) – The key in the hdf5 file

Returns

df – The dataframe

Return type

pandas.DataFrame

tables_io.ioUtils.readH5ToDataFrames(filepath)[source]

Open an h5 file and return a dictionary of pandas.DataFrame

Parameters

filepath (str) – Path to input file

Returns

tab – The data

Return type

OrderedDict (str : pandas.DataFrame)

Notes

We are using the file suffix ‘h5’ to specify ‘hdf5’ files written from DataFrames using pandas. They have a different structure than ‘hdf5’ files written with h5py or astropy.table.

tables_io.ioUtils.writeDataFramesToH5(dataFrames, filepath)[source]

Writes a dictionary of pandas.DataFrame to a single hdf5 file

Parameters
  • dataFrames (dict of pandas.DataFrame) – Keys will be passed to ‘key’ parameter

  • filepath (str) – Path to output file

tables_io.ioUtils.readPqToDataFrame(filepath)[source]

Reads a pandas.DataFrame object from a parquet file.

Parameters

filepath (str) – Path to input file

Returns

df – The data frame

Return type

pandas.DataFrame

tables_io.ioUtils.writeDataFramesToPq(dataFrames, filepath, **kwargs)[source]

Writes a dictionary of pandas.DataFrame to parquet files

Parameters
  • dataFrames (dict of pandas.DataFrame) – Keys will be passed to ‘path’ parameter

  • filepath (str) – Path to output file

tables_io.ioUtils.readPqToDataFrames(basepath, keys=None)[source]

Reads pandas.DataFrame objects from a parquet file.

Parameters
  • basepath (str) – Path to input file

  • keys (list) – Keys for the input objects. Used to complete filepaths

Returns

tables – Keys will be taken from keys

Return type

OrderedDict of pandas.DataFrame

tables_io.ioUtils.readPqToDict(filepath, columns=None)[source]

Open a parquet file and return a dictionary of numpy.array

Parameters
  • filepath (str) – Path to input file

  • columns (list (str) or None) – Names of the columns to read, None will read all the columns

Returns

tab – The data

Return type

OrderedDict (str : numpy.array)

tables_io.ioUtils.readH5ToDict(filepath, groupname=None)[source]

Open an h5 file and return a dictionary of numpy.array

Parameters
  • filepath (str) – Path to input file

  • groupname (str or None) – The group with the data

Returns

tab – The data

Return type

OrderedDict (str : numpy.array)

Notes

We are using the file suffix ‘h5’ to specify ‘hdf5’ files written from DataFrames using pandas. They have a different structure than ‘hdf5’ files written with h5py or astropy.table.

tables_io.ioUtils.readHdf5ToDict(filepath, groupname=None)[source]

Read in h5py hdf5 data, return a dictionary of all of the keys

Parameters
  • filepath (str) – Path to input file

  • groupname (str or None) – The groupname for the data

Returns

tab – The data

Return type

OrderedDict (str : numpy.array)

Notes

We are using the file suffix ‘hdf5’ to specify ‘hdf5’ files written with h5py or astropy.table. They have a different structure than ‘h5’ files written with pandas.

tables_io.ioUtils.readNative(filepath, fmt=None, keys=None)[source]

Read a file to the corresponding table type

Parameters
  • filepath (str) – File to load

  • fmt (str or None) – File format, if None it will be taken from the file extension

  • keys (list or None) – Keys to read for parquet files

Returns

data – The data

Return type

OrderedDict ( str -> Tablelike )

tables_io.ioUtils.read(filepath, tType=None, fmt=None, keys=None)[source]

Read a file to the corresponding table type

Parameters
  • filepath (str) – File to load

  • tType (int or None) – Table type, if None this will use readNative

  • fmt (str or None) – File format, if None it will be taken from the file extension

  • keys (list or None) – Keys to read for parquet files

Returns

data – The data

Return type

OrderedDict ( str -> Tablelike )

tables_io.ioUtils.iterateNative(filepath, fmt=None, **kwargs)[source]

Read a file to the corresponding table type and iterate over the file

Parameters
  • filepath (str) – File to load

  • fmt (str or None) – File format, if None it will be taken from the file extension

Returns

data – The data

Return type

TableLike

Notes

The kwargs are passed to the specific iterator type

tables_io.ioUtils.iterate(filepath, tType=None, fmt=None, **kwargs)[source]

Read a file to the corresponding table type and iterate over the file

Parameters
  • filepath (str) – File to load

  • tType (int or None) – Table type, if None this will use iterateNative

  • fmt (str or None) – File format, if None it will be taken from the file extension

  • groupname (str or None) – For hdf5 files, the groupname for the data

Returns

data – The data

Return type

OrderedDict ( str -> Tablelike )

tables_io.ioUtils.writeNative(odict, basename)[source]

Write a file or files with tables

Parameters
  • odict (OrderedDict, (str, Tablelike)) – The data to write

  • basename (str) – Basename for the file to write. The suffix will be applied based on the object type.

tables_io.ioUtils.write(obj, basename, fmt=None)[source]

Write a file or files with tables

Parameters
  • obj (Tablelike or TableDictLike) – The data to write

  • basename (str) – Basename for the file to write. The suffix will be applied based on the object type.

  • fmt (str or None) – The output file format. If None, this will use writeNative

tables_io.testUtils

Utilities for testing

tables_io.testUtils.compare_tables(t1, t2)[source]

Compare two astropy.table.Table objects

Parameters
  • t1 (astropy.table.Table) – One table

  • t2 (astropy.table.Table) – Another table

Returns

identical – True if the tables are identical, False otherwise

Return type

bool

Notes

For now this explicitly flattens each of the columns, to avoid issues with shape
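The effect of flattening before comparing can be sketched with dictionaries of NumPy arrays standing in for tables. The helper compare_columns_sketch is hypothetical, and the use of np.allclose for the element-wise check is an assumption rather than the library's documented behavior.

```python
import numpy as np


def compare_columns_sketch(t1, t2):
    """Compare two dicts of arrays column by column after flattening."""
    if set(t1.keys()) != set(t2.keys()):
        return False
    for name in t1:
        # Flattening makes a (2, 2) column comparable to a (4,) column
        c1 = np.asarray(t1[name]).flatten()
        c2 = np.asarray(t2[name]).flatten()
        if c1.shape != c2.shape or not np.allclose(c1, c2):
            return False
    return True


nested = {"v": np.arange(4).reshape(2, 2)}
flat = {"v": np.arange(4)}
shifted = {"v": np.arange(4) + 1}
```

Here nested and flat compare as identical because only the flattened values are checked, while shifted differs in its values.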

tables_io.testUtils.compare_table_dicts(d1, d2, strict=False)[source]

Compare all the tables in two OrderedDict, (str, astropy.table.Table)

Parameters
  • d1 (OrderedDict, (str, astropy.table.Table)) – One dictionary of tables

  • d2 (OrderedDict, (str, astropy.table.Table)) – Another dictionary of tables

Returns

identical – True if all the tables are identical, False otherwise

Return type

bool

tables_io.testUtils.make_test_data()[source]

Make and return some test data