Using tables_io.TableDict
The class tables_io.TableDict is just an Ordered Dictionary of Tables.
The tables can be in any of the formats that tables_io supports; see more on that later in this notebook.
Let’s have a look.
# Standard imports
import os
import numpy as np
import tables_io
import astropy.table as apTable
#from tables_io.testUtils import make_test_data
Some test data.
OK, let's make some test data and have a look at it.
def make_test_data():
    nrow = 1000
    vect_size = 20
    mat_size = 5
    scalar = np.random.uniform(size=nrow)
    vect = np.random.uniform(size=nrow * vect_size).reshape(nrow, vect_size)
    matrix = np.random.uniform(size=nrow * mat_size * mat_size).reshape(nrow, mat_size, mat_size)
    data = dict(scalar=scalar, vect=vect, matrix=matrix)
    table = apTable.Table(data)
    table.meta["a"] = 1
    table.meta["b"] = None
    table.meta["c"] = [3, 4, 5]
    small_table = apTable.Table(dict(a=np.ones(21), b=np.zeros(21)))
    small_table.meta["small"] = True
    return dict(data=table, md=small_table)
data = make_test_data()
data
Building a table dict
We can use any Mapping (i.e., something that allows us to iterate over key-value pairs) to build a TableDict.
So let's make a TableDict.
td = tables_io.TableDict(data)
td
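As a small illustration of what "any Mapping" means here, the sketch below uses only the standard library: an object that implements the collections.abc.Mapping interface can be walked as key-value pairs and copied into an OrderedDict. The class name ReadOnlyTables and the string placeholders are hypothetical stand-ins for real tables and are not part of tables_io.

```python
from collections import OrderedDict
from collections.abc import Mapping


class ReadOnlyTables(Mapping):
    """Hypothetical minimal Mapping; the values stand in for real tables."""

    def __init__(self, **tables):
        self._tables = dict(tables)

    def __getitem__(self, key):
        return self._tables[key]

    def __iter__(self):
        return iter(self._tables)

    def __len__(self):
        return len(self._tables)


source = ReadOnlyTables(data="big table", md="small table")

# Any Mapping can be iterated as key-value pairs and copied into an OrderedDict.
ordered = OrderedDict(source.items())
print(list(ordered.keys()))
```

A plain dict, as returned by make_test_data, satisfies the same interface, which is why it can be passed in directly.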
TableDict inherits from the collections.OrderedDict class, so it has the standard interface of a Python dictionary.
td.keys()
td['data']
td['md']
TableDict will not accept values that are not tables.
try:
    td['bad'] = 'a'
except TypeError as msg:
    print("Caught attempt to add non table to TableDict: %s" % msg)
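One way such a check could be built is to override __setitem__ in an OrderedDict subclass. The sketch below is an assumption about the general pattern, not the actual tables_io implementation; CheckedDict and allowed_types are hypothetical names, and dict and list stand in for real table classes.

```python
from collections import OrderedDict


class CheckedDict(OrderedDict):
    """Hypothetical ordered dict that only accepts values of allowed types."""

    allowed_types = (dict, list)  # stand-ins for real table classes

    def __setitem__(self, key, value):
        if not isinstance(value, self.allowed_types):
            raise TypeError(f"{type(value).__name__} is not an allowed value type")
        super().__setitem__(key, value)


cd = CheckedDict()
cd["good"] = {"a": [1, 2, 3]}  # accepted: a dict stands in for a table

try:
    cd["bad"] = "a"  # rejected: a str is not an allowed type
except TypeError as msg:
    print(f"Caught attempt to add non table: {msg}")
```

Because the check runs before the value is stored, a rejected assignment leaves the dictionary unchanged.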
Supported Table types and converting between them
TableDict supports several different types of tables. These include:
astropy Tables: astropy.table.Table objects
Mapping of str, numpy.array
pandas DataFrames: pandas.DataFrame objects
Let’s convert to each of these
td_ap = td.convert(tables_io.types.AP_TABLE)
td_ap
td_np = td.convert(tables_io.types.NUMPY_DICT)
td_np
td_pd = td.convert(tables_io.types.PD_DATAFRAME)
td_pd
File IO with TableDict
We can write tables into several different formats. These include:
fits: Writing astropy.table.Table objects to FITS files (with the suffix ‘fits’)
hf5: Writing astropy.table.Table objects to HDF5 files (with the suffix ‘hf5’)
hdf5: Writing numpy.array objects to HDF5 files (with the suffix ‘hdf5’)
h5: Writing pandas.DataFrame objects to HDF5 files (with the suffix ‘h5’)
pq: Writing pandas.DataFrame objects to parquet files (with the suffix ‘pq’)
Also, each table type has a ‘native’ format that we use as a default. Setting the fmt to None in function calls will typically use the ‘native’ format.
all_fmts = list(tables_io.types.FILE_FORMAT_SUFFIXS.keys()) + [None]
print(all_fmts)
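To make the "None means native" convention concrete, here is a minimal sketch of how a format argument could be turned into an output file name. The SUFFIXES table is copied by hand from the list above, and output_path and its native_fmt default are hypothetical; how tables_io resolves formats internally may differ.

```python
# Illustrative suffix table based on the formats listed above; not read from tables_io.
SUFFIXES = {"fits": "fits", "hf5": "hf5", "hdf5": "hdf5", "h5": "h5", "pq": "pq"}


def output_path(basename, fmt=None, native_fmt="hf5"):
    """Build an output file name, falling back to a native format when fmt is None."""
    if fmt is None:
        # Assumed default; the real native format depends on the table type.
        fmt = native_fmt
    return f"{basename}.{SUFFIXES[fmt]}"


print(output_path("test_out", "pq"))
print(output_path("test_native"))
```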
OK, let’s write the data to different files.
for fmt in all_fmts:
    if fmt is None:
        basename = 'test_native'
    else:
        basename = 'test_out'
    print("Writing to %s using format %s" % (basename, fmt))
    # Remove any stale output file from a previous run
    try:
        os.unlink('%s.%s' % (basename, fmt))
    except OSError:
        pass
    # Some formats need optional packages; skip those that are not installed
    try:
        td.write(basename, fmt)
    except ImportError as msg:
        print("Skipping format %s because %s" % (fmt, msg))
! ls test_*
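The "remove the old file if it exists" step in the loop above can also be written without a try/except. The sketch below uses pathlib from the standard library (Python 3.8 or later for missing_ok) and works in a temporary directory so that it does not touch the files written above; the file name is only an example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    target = Path(tmpdir) / "test_out.fits"
    target.write_bytes(b"stale contents")

    target.unlink(missing_ok=True)  # removes the existing file
    target.unlink(missing_ok=True)  # no error even though the file is already gone

    still_there = target.exists()

print(still_there)
```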
OK, now let’s read things back.
td_r_fits = tables_io.TableDict.read("test_out.fits")
td_r_fits
td_r_hdf5 = tables_io.TableDict.read("test_out.hdf5")
td_r_hdf5
td_r_hf5 = tables_io.TableDict.read("test_out.hf5")
td_r_hf5
td_r_pq = tables_io.TableDict.read("test_out.pq", keys=list(td.keys()))
td_r_pq
td_r_h5 = tables_io.TableDict.read("test_out.h5")
td_r_h5
td_native = tables_io.TableDict.read("test_native.hf5")
td_native