blazingsql.BlazingContext.hdfs

BlazingContext.hdfs(prefix, **kwargs)

Register a Hadoop Distributed File System (HDFS) Cluster.

name : string that represents the name with which you will refer to your HDFS cluster (passed as the first positional argument, prefix in the signature above).

host : string IP address of your HDFS NameNode.

port : integer port number of your HDFS NameNode.

user : string of the HDFS user on your NameNode.

kerb_ticket (optional) : string file path to your ticket for Kerberos authentication.
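The keyword arguments above can be collected and checked before they reach bc.hdfs. The following is a minimal sketch and not part of BlazingSQL: the helper name hdfs_kwargs and its validation rules are assumptions made for illustration. Only the keyword names host, port, user and kerb_ticket come from the parameter list above.

```python
def hdfs_kwargs(host, port, user, kerb_ticket=None):
    """Build keyword arguments for BlazingContext.hdfs (hypothetical helper)."""
    if not isinstance(host, str) or not host:
        raise ValueError("host must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(port, bool) or not isinstance(port, int):
        raise TypeError("port must be an integer")
    if not isinstance(user, str) or not user:
        raise ValueError("user must be a non-empty string")

    kwargs = {"host": host, "port": port, "user": user}
    # kerb_ticket is optional, so include it only when a path was given.
    if kerb_ticket is not None:
        kwargs["kerb_ticket"] = kerb_ticket
    return kwargs
```

Assuming bc is an existing BlazingContext, the result could be passed on as bc.hdfs('dir_name', **hdfs_kwargs('name_node_ip', port_number, 'hdfs_user')), where the three values are placeholders for your own cluster settings.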

You may also need to set the following environment variables to properly interface with HDFS.

HADOOP_HOME: the root of your installed Hadoop distribution.

JAVA_HOME: the location of your Java SDK installation (should point to CONDA_PREFIX).

ARROW_LIBHDFS_DIR: explicit location of libhdfs.so if not installed at $HADOOP_HOME/lib/native.

CLASSPATH: must contain the Hadoop jars.
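One way to assemble these variables from Python is sketched below. It is an illustration under stated assumptions, not a BlazingSQL API: the function name hdfs_environment is hypothetical, and the jar search under HADOOP_HOME/share/hadoop assumes the layout of a standard Hadoop binary distribution, which may not match your installation.

```python
import glob
import os


def hdfs_environment(hadoop_home, java_home, libhdfs_dir=None):
    """Return the environment variables listed above as a dict (hypothetical helper)."""
    env = {"HADOOP_HOME": hadoop_home, "JAVA_HOME": java_home}

    # ARROW_LIBHDFS_DIR is only needed when libhdfs.so is not in
    # $HADOOP_HOME/lib/native, so it is optional here.
    if libhdfs_dir is not None:
        env["ARROW_LIBHDFS_DIR"] = libhdfs_dir

    # Assumption: the Hadoop jars live somewhere under share/hadoop.
    pattern = os.path.join(hadoop_home, "share", "hadoop", "**", "*.jar")
    jars = sorted(glob.glob(pattern, recursive=True))
    if jars:
        env["CLASSPATH"] = os.pathsep.join(jars)
    return env
```

The returned dict could then be applied with os.environ.update(hdfs_environment(...)), most likely early in the process and before HDFS is first used. If the hadoop command is on your PATH, the output of `hadoop classpath --glob` is an alternative source for CLASSPATH.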

Register and create table from HDFS:

>>> bc.hdfs('dir_name', host='name_node_ip', port=port_number,
...         user='hdfs_user')
>>> bc.create_table('table_name', 'hdfs://dir_name/file.csv')
<pyblazing.apiv2.context.BlazingTable at 0x7f11897c0310>
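In the example above, the registered name takes the place of the host in the hdfs:// path given to create_table. A small sketch of building such paths follows. The helper hdfs_uri is hypothetical and only mirrors the 'hdfs://dir_name/file.csv' pattern shown above; it does not contact the cluster or check that the file exists.

```python
def hdfs_uri(name, path):
    """Join a registered cluster name and a file path into an hdfs:// URI."""
    if not name or "/" in name:
        raise ValueError("name must be a non-empty string without '/'")
    # Drop leading slashes so the result has exactly one separator after the name.
    return "hdfs://{}/{}".format(name, path.lstrip("/"))
```

With a cluster registered as 'dir_name', hdfs_uri('dir_name', 'file.csv') yields the same string that is passed to bc.create_table in the example.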

Docs: https://docs.blazingdb.com/docs/hdfs