Using tables_io.TableDict

The class tables_io.TableDict is an ordered dictionary of tables.

The tables can be in any of the formats that tables_io supports; these are described later in this notebook.

Let’s have a look.

# Standard imports
import os
import numpy as np
import tables_io
import astropy.table as apTable
#from tables_io.testUtils import make_test_data

Some test data.

OK, let’s make some test data and have a look at it.

def make_test_data():
    nrow = 1000
    vect_size = 20
    mat_size = 5
    scalar = np.random.uniform(size=nrow)
    vect = np.random.uniform(size=nrow * vect_size).reshape(nrow, vect_size)
    matrix = np.random.uniform(size=nrow * mat_size * mat_size).reshape(nrow, mat_size, mat_size)
    data = dict(scalar=scalar, vect=vect, matrix=matrix)
    table = apTable.Table(data)
    table.meta["a"] = 1
    table.meta["b"] = None
    table.meta["c"] = [3, 4, 5]
    small_table = apTable.Table(dict(a=np.ones(21), b=np.zeros(21)))
    small_table.meta["small"] = True
    return dict(data=table, md=small_table)
data = make_test_data()
data

Building a table dict

We can use any Mapping (i.e., something that allows us to iterate over key-value pairs) to build a TableDict.

So let’s make a TableDict.

td = tables_io.TableDict(data)
td

TableDict inherits from the collections.OrderedDict class, so it has the standard interface for Python dictionaries.

td.keys()
td['data']
td['md']

TableDict will not accept values that are not tables.

try:
    td['bad'] = 'a'
except TypeError as msg:
    print("Caught attempt to add non table to TableDict: %s" % msg)
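The behaviour above (an ordered dictionary that raises TypeError for non-table values) can be sketched with the standard library alone. This is only an illustration, not the tables_io implementation: the class name CheckedDict is hypothetical, and a plain Mapping of columns stands in for a real table type.

```python
from collections import OrderedDict
from collections.abc import Mapping


class CheckedDict(OrderedDict):
    """Hypothetical stand-in for TableDict: only accepts Mapping values."""

    def __setitem__(self, key, value):
        # Assumption: a "table" here is any Mapping of column name -> values.
        if not isinstance(value, Mapping):
            raise TypeError(
                f"value for {key!r} is a {type(value).__name__}, not a Mapping"
            )
        super().__setitem__(key, value)


cd = CheckedDict()
cd["data"] = {"a": [1, 2]}
cd["md"] = {"b": [3]}

rejected = None
try:
    cd["bad"] = "a"  # a str is not a Mapping, so this is refused
except TypeError as msg:
    rejected = str(msg)
    print("Caught attempt to add non table: %s" % rejected)
```

Because the check lives in `__setitem__`, the rejected key never enters the dictionary and the insertion order of the accepted keys is unchanged.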

Supported Table types and converting between them

TableDict supports several different types of tables. These include:

  1. astropy Tables: astropy.table.Table objects

  2. Mapping of str, numpy.array

  3. pandas DataFrames: pandas.DataFrame objects

Let’s convert to each of these

td_ap = td.convert(tables_io.types.AP_TABLE)
td_ap
td_np = td.convert(tables_io.types.NUMPY_DICT)
td_np
td_pd = td.convert(tables_io.types.PD_DATAFRAME)
td_pd
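Conceptually, converting a dictionary of tables means applying one conversion to every table while keeping the keys and their order. A minimal sketch of that idea follows, using only the standard library. The helper names convert_all and columns_to_rows are hypothetical, and plain dicts of lists stand in for the real table types; this does not show how tables_io performs its conversions.

```python
from collections import OrderedDict


def columns_to_rows(table):
    """Turn a dict of equal-length columns into a list of row dicts."""
    names = list(table)
    return [dict(zip(names, values)) for values in zip(*table.values())]


def convert_all(tables, converter):
    """Apply converter to every table, keeping keys and their order."""
    return OrderedDict((key, converter(tab)) for key, tab in tables.items())


# Stand-in tables: column name -> list of values.
tables = OrderedDict(
    data={"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]},
    md={"a": [1], "b": [0]},
)
rows = convert_all(tables, columns_to_rows)
print(list(rows.keys()))
print(rows["data"][0])
```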

File IO with TableDict

We can write tables into several different formats. These include:

  1. fits: Writing astropy.table.Table objects to FITS files (with the suffix ‘fits’)

  2. hf5: Writing astropy.table.Table objects to HDF5 files (with the suffix ‘hf5’)

  3. hdf5: Writing numpy.array objects to HDF5 files (with the suffix ‘hdf5’)

  4. h5: Writing pandas.DataFrame objects to HDF5 files (with the suffix ‘h5’)

  5. pq: Writing pandas.DataFrame objects to parquet files (with the suffix ‘pq’)

Also, each table type has a ‘native’ format that we use as a default. Setting the fmt to None in function calls will typically use the ‘native’ format.

all_fmts = list(tables_io.types.FILE_FORMAT_SUFFIXS.keys()) + [None]
print(all_fmts)
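The suffix list above amounts to a lookup from file suffix to table type and on-disk format. The sketch below restates that list as a plain dictionary; SUFFIX_INFO and describe_path are hypothetical names for illustration and are not part of tables_io.

```python
import os

# Suffix -> (in-memory table type, on-disk format), restating the list above.
SUFFIX_INFO = {
    "fits": ("astropy.table.Table", "FITS"),
    "hf5": ("astropy.table.Table", "HDF5"),
    "hdf5": ("numpy.array", "HDF5"),
    "h5": ("pandas.DataFrame", "HDF5"),
    "pq": ("pandas.DataFrame", "parquet"),
}


def describe_path(path):
    """Return (table type, file format) for a path, based on its suffix."""
    suffix = os.path.splitext(path)[1].lstrip(".")
    try:
        return SUFFIX_INFO[suffix]
    except KeyError:
        raise ValueError(f"unrecognized suffix {suffix!r} in {path!r}") from None


for name in ("test_out.fits", "test_out.hdf5", "test_out.pq"):
    print(name, "->", describe_path(name))
```

Note that three different suffixes (‘hf5’, ‘hdf5’, ‘h5’) all denote HDF5 files; the suffix is what distinguishes the table type stored inside.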

OK, let’s write the data to different files.

for fmt in all_fmts:
    if fmt is None:
        basename = 'test_native'
    else:
        basename = 'test_out'
    print("Writing to %s using format %s" % (basename, fmt))
    try:
        # Remove any stale output left over from a previous run
        os.unlink('%s.%s' % (basename, fmt))
    except OSError:
        pass
    try:
        td.write(basename, fmt)
    except ImportError as msg:
        print("Skipping format %s because %s" % (fmt, msg))
! ls test_*
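The loop above deletes any existing output file before writing. That cleanup step can be sketched on its own with pathlib, catching only the "file is missing" case so that other errors (for example, permission problems) still surface. The helper name remove_if_present is hypothetical, and the example runs in a temporary directory so it does not touch the files written above.

```python
import tempfile
from pathlib import Path


def remove_if_present(path):
    """Delete path if it exists; return True when a file was removed."""
    try:
        Path(path).unlink()
    except FileNotFoundError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmpdir:
    target = Path(tmpdir) / "test_out.fits"
    target.write_bytes(b"")             # create an empty placeholder file
    first = remove_if_present(target)   # the file existed and is removed
    second = remove_if_present(target)  # the file is already gone
    print(first, second)
```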

Ok, now let’s read things back

td_r_fits = tables_io.TableDict.read("test_out.fits")
td_r_fits
td_r_hdf5 = tables_io.TableDict.read("test_out.hdf5")
td_r_hdf5
td_r_hf5 = tables_io.TableDict.read("test_out.hf5")
td_r_hf5
td_r_pq = tables_io.TableDict.read("test_out.pq", keys=list(td.keys()))
td_r_pq
td_r_h5 = tables_io.TableDict.read("test_out.h5")
td_r_h5
td_native = tables_io.TableDict.read("test_native.hf5")
td_native