This page is a copy-paste style tutorial to get familiar with HDFS commands. It mainly focuses on user commands (uploading data to and downloading data from HDFS).
Requirements
- SSH (for Windows, use PuTTY and see how to create a key with PuTTY)
- An account in the DAPLAB (send your SSH public key to Benoit)
- A browser -- well, if you can access this page, you should have met this requirement :)
Resources
While the source of truth for HDFS commands is the source code, the documentation page describing the hdfs dfs commands is really useful.
Basic Manipulations
Listing a folder
Your home folder
$ hdfs dfs -ls
Found 28 items
...
-rw-r--r-- 3 bperroud daplab_user 6398990 2015-03-13 11:01 data.csv
...
^^^^^^^^^^ ^ ^^^^^^^^ ^^^^^^^^^^^ ^^^^^^^ ^^^^^^^^^^ ^^^^^ ^^^^^^^^
1          2 3        4           5       6          7     8
Columns, as numbered above, represent:
1. Permissions, in Unix-style syntax.
2. Replication factor (RF for short); the default is 3 for a file. Directories are not replicated themselves and show - in this column.
3. Owner.
4. Group owning the file.
5. Size of the file, in bytes. Note that to compute the physical space used, this number should be multiplied by the RF.
6. Modification date. As HDFS is mostly a write-once-read-many filesystem, this date often means creation date.
7. Modification time. Same remark as for the date.
8. Filename, within the listed folder.
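As a small illustration of the size remark, the physical space used by a file can be computed from its listing line by multiplying the replication factor column by the size column. This is only a sketch: it works on the sample line from the listing above, pasted into a variable (physical_bytes is a name chosen here), and assumes awk is available on your machine.

```shell
# Sample line copied from the listing above.
line='-rw-r--r-- 3 bperroud daplab_user 6398990 2015-03-13 11:01 data.csv'

# Field 2 is the replication factor, field 5 the logical size in bytes.
physical_bytes=$(printf '%s\n' "$line" | awk '{ print $2 * $5 }')

echo "$physical_bytes"   # prints 19196970 (6398990 bytes x 3 replicas)
```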
Listing the /tmp folder
$ hdfs dfs -ls /tmp
Uploading a file
In /tmp
$ hdfs dfs -copyFromLocal localfile.txt /tmp/
The first arguments after -copyFromLocal point to local files or folders, while the last argument is a file (if only one file is listed as source) or a directory in HDFS.
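To make the argument order concrete, here is a minimal sketch of a wrapper that uploads several local files in one call. The function name upload_to_hdfs and the file names in the example call are hypothetical, chosen only for illustration. The wrapper simply places the sources first and the HDFS destination last.

```shell
# Hypothetical helper: upload one or more local files into an HDFS directory.
# Usage: upload_to_hdfs <hdfs-dir> <local-file>...
upload_to_hdfs() {
  dest="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "usage: upload_to_hdfs <hdfs-dir> <local-file>..." >&2
    return 1
  fi
  # Sources come first, the HDFS destination is always the last argument.
  hdfs dfs -copyFromLocal "$@" "$dest"
}

# Example call (assumes both local files exist):
# upload_to_hdfs /tmp/ localfile.txt otherfile.txt
```

With more than one source, the destination should be an existing directory in HDFS.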
Note: hdfs dfs -put does about the same thing, but -copyFromLocal is more explicit when you're uploading a local file and is thus preferred.
Downloading a file
From /tmp
$ hdfs dfs -copyToLocal /tmp/remotefile.txt .
The first arguments after -copyToLocal point to files or folders in HDFS, while the last argument is a local file (if only one file is listed as source) or a local directory.
Note: hdfs dfs -get does about the same thing, but -copyToLocal is more explicit when you're downloading a file and is thus preferred.
Creating a folder
In your home folder
$ hdfs dfs -mkdir dummy-folder
In /tmp
$ hdfs dfs -mkdir /tmp/dummy-folder
Note that relative paths point to your home folder, /user/bperroud for instance.
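The resolution rule can be sketched as a small shell function. The name hdfs_abs_path is hypothetical, and the function assumes the home folder is /user/ followed by the user name, as in the example above. It does not contact HDFS; it only shows which absolute path a given argument would refer to.

```shell
# Hypothetical helper: print the absolute HDFS path an argument refers to.
# Assumes the home folder is /user/<username>.
# Usage: hdfs_abs_path <path> [username]   (username defaults to $USER)
hdfs_abs_path() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;                           # already absolute
    *)  printf '/user/%s/%s\n' "${2:-$USER}" "$1" ;;    # relative to home
  esac
}

hdfs_abs_path dummy-folder bperroud        # prints /user/bperroud/dummy-folder
hdfs_abs_path /tmp/dummy-folder bperroud   # prints /tmp/dummy-folder
```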