Class CacheData

Inheritance Relationships

Derived Types

Class Documentation

class ral::cache::CacheData

Base class for all CacheData. A CacheData combines a schema with a container for a dataframe. This gives us a single type that can be passed around and whose only purpose is to hold data until it is ready to be operated on, at which point the decache method is called.

Subclassed by ral::cache::CacheDataIO, ral::cache::CacheDataLocalFile, ral::cache::ConcatCacheData, ral::cache::CPUCacheData, ral::cache::GPUCacheData

Public Functions

inline CacheData(CacheDataType cache_type, std::vector<std::string> col_names, std::vector<cudf::data_type> schema, size_t n_rows)

Constructor for CacheData. This is only invoked by the derived classes during their own construction.

Parameters
  • cache_type: The CacheDataType of this cache letting us know where the data is stored.

  • col_names: The names of the columns in the dataframe.

  • schema: The types of the columns in the dataframe.

  • n_rows: The number of rows in the dataframe.
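
The forwarding pattern described above can be sketched as follows. This is a minimal illustration using stand-in types, not the BlazingSQL sources: SketchCacheType, SketchDType, SketchCacheData, SketchCPUCacheData and make_example are hypothetical names, and the body of num_columns() is an assumption about how the column count is derived.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for CacheDataType and cudf::data_type; illustration only.
enum class SketchCacheType { GPU, CPU, LOCAL_FILE };
enum class SketchDType { INT32, FLOAT64, STRING };

// Mirrors the shape of the documented base class: the constructor only
// records where the data lives plus the schema information.
class SketchCacheData {
public:
    SketchCacheData(SketchCacheType cache_type,
                    std::vector<std::string> col_names,
                    std::vector<SketchDType> schema,
                    std::size_t n_rows)
        : cache_type(cache_type),
          col_names(std::move(col_names)),
          schema(std::move(schema)),
          n_rows(n_rows) {}
    virtual ~SketchCacheData() = default;

    std::vector<std::string> names() const { return col_names; }
    // Assumption: the column count is taken from the stored names.
    std::size_t num_columns() const { return col_names.size(); }
    std::size_t num_rows() const { return n_rows; }
    SketchCacheType get_type() const { return cache_type; }

protected:
    SketchCacheType cache_type;
    std::vector<std::string> col_names;
    std::vector<SketchDType> schema;
    std::size_t n_rows;
};

// A derived class fixes the cache type and forwards the rest to the base.
class SketchCPUCacheData : public SketchCacheData {
public:
    SketchCPUCacheData(std::vector<std::string> col_names,
                       std::vector<SketchDType> schema,
                       std::size_t n_rows)
        : SketchCacheData(SketchCacheType::CPU, std::move(col_names),
                          std::move(schema), n_rows) {}
};

// Hypothetical helper that builds a two-column, 1000-row entry.
inline SketchCPUCacheData make_example() {
    return SketchCPUCacheData({"id", "price"},
                              {SketchDType::INT32, SketchDType::FLOAT64},
                              1000);
}
```

Callers never pass a cache type themselves in this sketch; each derived class supplies its own, which keeps get_type() consistent with the concrete storage.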

inline CacheData()

virtual std::unique_ptr<ral::frame::BlazingTable> decache() = 0

Remove the payload from this CacheData. A pure virtual function. After this call the CacheData will almost always go out of scope and be destroyed.

Return

a BlazingTable generated from the source of data for this CacheData
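
The ownership transfer that decache implies can be sketched with stand-in types. SketchTable, SketchCacheData, SketchInMemoryCacheData and consume are hypothetical names used only for illustration; the real payload type is ral::frame::BlazingTable, and real derived classes may read from host memory or disk instead of moving a pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for ral::frame::BlazingTable; illustration only.
struct SketchTable {
    std::vector<int> values;
};

class SketchCacheData {
public:
    virtual ~SketchCacheData() = default;
    // Hands the payload to the caller, as the documented decache() does.
    virtual std::unique_ptr<SketchTable> decache() = 0;
};

// Hypothetical derived class that keeps its payload in memory.
class SketchInMemoryCacheData : public SketchCacheData {
public:
    explicit SketchInMemoryCacheData(std::unique_ptr<SketchTable> table)
        : table_(std::move(table)) {}

    // Moving out leaves table_ null, so the payload is gone afterwards.
    std::unique_ptr<SketchTable> decache() override {
        return std::move(table_);
    }

private:
    std::unique_ptr<SketchTable> table_;
};

// Typical use: take ownership of the cache entry, decache once, and let
// the entry be destroyed when this function returns.
inline std::size_t consume(std::unique_ptr<SketchCacheData> cached) {
    std::unique_ptr<SketchTable> table = cached->decache();
    return table ? table->values.size() : 0;
}
```

In this sketch a second call to decache() on the same object yields a null pointer, which is one reason the entry is normally discarded right after the first call.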

virtual size_t sizeInBytes() const = 0

Get the size of the payload held by this CacheData. A pure virtual function. Unlike decache, this is a const query and does not remove the payload.

Return

the number of bytes our dataframe occupies in whatever format it is being stored
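
Because each storage format reports its own size, callers can total the footprint of mixed entries through the base interface. The sketch below assumes hypothetical stand-in classes (SketchCacheData, SketchInt32CacheData, SketchFileCacheData) and a hypothetical helper total_bytes; how the real derived classes compute their sizes is not shown in this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

class SketchCacheData {
public:
    virtual ~SketchCacheData() = default;
    virtual std::size_t sizeInBytes() const = 0;
};

// Hypothetical in-memory entry: one column of 4-byte integers.
class SketchInt32CacheData : public SketchCacheData {
public:
    explicit SketchInt32CacheData(std::size_t n_rows) : n_rows_(n_rows) {}
    std::size_t sizeInBytes() const override {
        return n_rows_ * sizeof(std::int32_t);
    }

private:
    std::size_t n_rows_;
};

// Hypothetical file-backed entry: reports a size recorded at write time.
class SketchFileCacheData : public SketchCacheData {
public:
    explicit SketchFileCacheData(std::size_t file_bytes)
        : file_bytes_(file_bytes) {}
    std::size_t sizeInBytes() const override { return file_bytes_; }

private:
    std::size_t file_bytes_;
};

// Sums the reported sizes without caring how each entry is stored.
inline std::size_t total_bytes(
    const std::vector<std::unique_ptr<SketchCacheData>>& entries) {
    std::size_t total = 0;
    for (const auto& entry : entries) {
        total += entry->sizeInBytes();
    }
    return total;
}
```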

virtual void set_names(const std::vector<std::string> &names) = 0

Set the names of the columns.

Parameters
  • names: a vector of the column names.

inline virtual ~CacheData()

Destructor

inline std::vector<std::string> names() const

Get the names of the columns.

Return

a vector of the column names

inline std::vector<cudf::data_type> get_schema() const

Get the cudf::data_type of each column.

Return

a vector of the cudf::data_type of each column.

inline size_t num_columns() const

Get the number of columns this CacheData will generate with decache.

inline size_t num_rows() const

Get the number of rows this CacheData will generate with decache.

inline CacheDataType get_type() const

Get the CacheDataType that was used to construct this CacheData.

Return

The CacheDataType that is used to store the dataframe representation.

inline void setMetadata(MetadataDictionary new_metadata)

Set the MetadataDictionary.

inline MetadataDictionary getMetadata()

Get the MetadataDictionary.

Return

The MetadataDictionary which is used in routing and planning.

Public Static Functions

static std::unique_ptr<CacheData> downgradeCacheData(std::unique_ptr<CacheData> cacheData, std::string id, std::shared_ptr<Context> ctx)

Utility function that takes a CacheData and, if it is a standard GPU cache data, downgrades it to CPU or disk.

Return

If the input CacheData is not of a type that can be downgraded, the original input is returned unchanged. Otherwise the downgraded CacheData is returned.
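
The return contract above can be sketched with stand-in types. Everything here is hypothetical: SketchCacheType, SketchCacheData and sketch_downgrade are illustrative names, and the host_memory_available flag is an assumed stand-in for whatever criterion the real function uses (via its id and ctx arguments) to choose between CPU and disk. Moving the actual payload is omitted.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Stand-in for CacheDataType; illustration only.
enum class SketchCacheType { GPU, CPU, LOCAL_FILE, CONCATENATING };

class SketchCacheData {
public:
    explicit SketchCacheData(SketchCacheType type) : cache_type(type) {}
    virtual ~SketchCacheData() = default;
    SketchCacheType get_type() const { return cache_type; }

protected:
    SketchCacheType cache_type;
};

// Assumption: a boolean decides between CPU and disk as the target.
inline std::unique_ptr<SketchCacheData> sketch_downgrade(
    std::unique_ptr<SketchCacheData> data, bool host_memory_available) {
    if (data->get_type() != SketchCacheType::GPU) {
        // Not downgradable: hand back the very same object.
        return data;
    }
    SketchCacheType target = host_memory_available
                                 ? SketchCacheType::CPU
                                 : SketchCacheType::LOCAL_FILE;
    // A real implementation would copy the payload off the GPU here.
    return std::make_unique<SketchCacheData>(target);
}
```

Taking and returning std::unique_ptr, as the documented signature does, lets the function either pass the original object through or replace it, without the caller needing to know which happened.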

Protected Attributes

CacheDataType cache_type

The CacheDataType that is used to store the dataframe representation.

std::vector<std::string> col_names

A vector storing the names of the columns in the dataframe representation.

std::vector<cudf::data_type> schema

A vector storing the cudf::data_type of the columns in the dataframe representation.

size_t n_rows

Stores the number of rows in the dataframe representation.

MetadataDictionary metadata

The metadata used for routing and planning.