blazingsql.BlazingContext.hdfs

BlazingContext.hdfs(prefix, **kwargs)

Register a Hadoop Distributed File System (HDFS) cluster.
- name : string that represents the name with which you will refer to your HDFS cluster.
- host : string, IP address of your HDFS NameNode.
- port : integer, port number of your HDFS NameNode.
- user : string, HDFS user on your NameNode.
- kerb_ticket (optional) : string, file path to your ticket for Kerberos authentication.
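As a minimal sketch, a Kerberos-secured cluster might be registered as follows; the host, port, user, and ticket path are placeholder values chosen for illustration, not defaults provided by the library.

>>> from blazingsql import BlazingContext
>>> bc = BlazingContext()
>>> # Placeholder connection details; substitute your NameNode address and credentials.
>>> bc.hdfs('secure_hdfs',
...         host='192.168.1.10',
...         port=8020,
...         user='analyst',
...         kerb_ticket='/tmp/krb5cc_1000')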
You may also need to set the following environment variables to properly interface with HDFS.
- HADOOP_HOME: the root of your installed Hadoop distribution.
- JAVA_HOME: the location of your Java SDK installation (should point to CONDA_PREFIX).
- ARROW_LIBHDFS_DIR: explicit location of libhdfs.so if it is not installed at $HADOOP_HOME/lib/native.
- CLASSPATH: must contain the Hadoop jars.
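One way to set these is from Python before calling BlazingContext.hdfs(); the sketch below assumes a conda-based install, and the paths shown are illustrative, not required values.

>>> import os
>>> # Illustrative paths; adjust to your Hadoop and Java installation.
>>> os.environ['HADOOP_HOME'] = '/usr/local/hadoop'
>>> os.environ['JAVA_HOME'] = os.environ['CONDA_PREFIX']  # assumes a conda environment
>>> # Only needed if libhdfs.so is not at $HADOOP_HOME/lib/native:
>>> os.environ['ARROW_LIBHDFS_DIR'] = '/opt/hadoop/lib/native'
>>> # CLASSPATH must list the Hadoop jars; `hadoop classpath --glob` expands them.
>>> os.environ['CLASSPATH'] = os.popen('hadoop classpath --glob').read().strip()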
Register and create a table from HDFS:
>>> bc.hdfs('dir_name', host='name_node_ip', port=port_number, user='hdfs_user')
>>> bc.create_table('table_name', 'hdfs://dir_name/file.csv')
<pyblazing.apiv2.context.BlazingTable at 0x7f11897c0310>
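Once registered, the HDFS-backed table can be queried through the same context; the query below is an illustrative follow-up that assumes the table created above.

>>> # Query the HDFS-backed table registered above; bc.sql returns a cudf DataFrame.
>>> result = bc.sql('SELECT * FROM table_name LIMIT 10')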