Using tables_io.read, tables_io.write and tables_io.convert

These functions read and write single tables and convert them between formats.

The tables can be in any of the formats that tables_io supports; see the notebook below for more on that.

Let’s have a look.

# Standard imports
import os
import numpy as np
import tables_io
import astropy.table as apTable
#from tables_io.testUtils import make_test_data
# Build one test table with scalar, vector and matrix columns plus some metadata
def make_test_data():
    nrow = 1000
    vect_size = 20
    mat_size = 5
    scalar = np.random.uniform(size=nrow)
    vect = np.random.uniform(size=nrow * vect_size).reshape(nrow, vect_size)
    matrix = np.random.uniform(size=nrow * mat_size * mat_size).reshape(nrow, mat_size, mat_size)
    data = dict(scalar=scalar, vect=vect, matrix=matrix)
    table = apTable.Table(data)
    table.meta["a"] = 1
    table.meta["b"] = None
    table.meta["c"] = [3, 4, 5]
    return table

data = make_test_data()
data_np = tables_io.convert(data, tables_io.types.NUMPY_DICT)
data_np
data_pd = tables_io.convert(data, tables_io.types.PD_DATAFRAME)
data_pd

File IO with tables_io

We can write tables into several different formats. These include:

  1. fits: Writing astropy.table.Table objects to FITS files (with the suffix ‘fits’)

  2. hf5: Writing astropy.table.Table objects to HDF5 files (with the suffix ‘hf5’)

  3. hdf5: Writing numpy.array objects to HDF5 files (with the suffix ‘hdf5’)

  4. h5: Writing pandas.DataFrame objects to HDF5 files (with the suffix ‘h5’)

  5. pq: Writing pandas.DataFrame objects to parquet files (with the suffix ‘pq’)

Also, each table type has a ‘native’ format that we use as a default. Setting the fmt to None in function calls will typically use the ‘native’ format.

all_fmts = list(tables_io.types.FILE_FORMAT_SUFFIXS.keys()) + [None]
print(all_fmts)
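
The suffix-to-format relationship listed above can be sketched as a plain dictionary lookup. This is only an illustration: `SUFFIX_TO_FORMAT` and `guess_format` are hypothetical names that restate the list in this notebook, not part of the tables_io API, and the library's own lookup may work differently.

```python
import os

# Hypothetical mapping that restates the list above; not the tables_io table itself.
SUFFIX_TO_FORMAT = {
    "fits": "astropy.table.Table in FITS",
    "hf5": "astropy.table.Table in HDF5",
    "hdf5": "numpy.array in HDF5",
    "h5": "pandas.DataFrame in HDF5",
    "pq": "pandas.DataFrame in parquet",
}


def guess_format(path):
    """Return the description matching the file suffix, or None if it is unknown."""
    suffix = os.path.splitext(path)[1].lstrip(".")
    return SUFFIX_TO_FORMAT.get(suffix)


print(guess_format("test_single_out.pq"))
```

Returning None for an unknown suffix lets the caller decide whether to fall back to a default or raise an error.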

OK, let’s write the data to different files.

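for fmt in all_fmts:
    # fmt=None selects the native format of the table type
    if fmt is None:
        basename = 'test_single_native'
    else:
        basename = 'test_single_out'
    print("Writing to %s using format %s" % (basename, fmt))
    # Remove any stale output file left over from a previous run
    try:
        os.unlink('%s.%s' % (basename, fmt))
    except OSError:
        pass
    # Skip formats whose backing package is not installed
    try:
        tables_io.write(data, basename, fmt)
    except ImportError as msg:
        print("Skipping format %s because %s" % (fmt, msg))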
! ls test_single_*
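
The cleanup and listing steps above can also be done in plain Python, which helps when the notebook runs somewhere `ls` is not available. The snippet below is a minimal sketch using only the standard library; `remove_if_present` and `list_outputs` are hypothetical helper names, not tables_io functions.

```python
import glob
import os


def remove_if_present(path):
    """Delete path if it exists; return True if a file was removed."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        return False
    return True


def list_outputs(pattern="test_single_*"):
    """Return the sorted file names matching pattern, like `ls pattern`."""
    return sorted(glob.glob(pattern))


print(list_outputs())
```

Catching only FileNotFoundError means other problems, such as a permissions error, still surface instead of being silently ignored.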

OK, now let’s read things back.

data_r_fits = tables_io.read("test_single_out.fits")
data_r_fits
data_r_hdf5 = tables_io.read("test_single_out.hdf5")
data_r_hdf5
data_r_hf5 = tables_io.read("test_single_out.hf5")
data_r_hf5
data_r_pq = tables_io.read("test_single_out.pq", keys=[''])
data_r_pq
data_r_h5 = tables_io.read("test_single_out.h5")
data_r_h5
data_native = tables_io.read("test_single_native.hf5")
data_native