13 Hadoop HDFS Commands Every Data Engineer Must Know

If you are not used to working with servers from the command line, welcome. Here are the Hadoop commands you need to know to explore the HDFS landscape.

If you have worked with servers before, you already know the importance of managing files from the command line.

Let's get started:

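1) hadoop dfs -ls <path name>

This command lists all the files and folders in a particular path. Recent Hadoop releases print a deprecation warning for hadoop dfs and recommend hdfs dfs instead; hadoop fs also works.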

There is no cd (change directory) command in HDFS, because the HDFS shell does not keep a current directory between commands. To go deeper, list a directory and then pass the full path of the subdirectory you want to the next ls call.
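
As a rough sketch of browsing without cd, the commands below walk down a directory tree one listing at a time. The paths under /user/demo are hypothetical, and HDFS_CMD is a helper variable introduced for this example (it is not part of Hadoop): by default it only prints each command, so set HDFS_CMD=hdfs on a machine with a Hadoop client to run them for real.

```shell
# Assumption: HDFS_CMD is a helper for this sketch, not a Hadoop feature.
# It defaults to a dry run that prints each command; use HDFS_CMD=hdfs to execute.
HDFS_CMD="${HDFS_CMD:-echo hdfs}"

# List the root of the file system
$HDFS_CMD dfs -ls /

# No cd: pass the full path of the directory you want to look into
$HDFS_CMD dfs -ls /user/demo        # hypothetical directory

# -R lists everything beneath a path recursively
$HDFS_CMD dfs -ls -R /user/demo
```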

2) hadoop dfs -mkdir <path name>

To create a new folder in HDFS, use the mkdir command. If you provide a relative path, it is resolved against your HDFS home directory (typically /user/<username>).

Now let us check if the folder was created.
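
A minimal sketch of creating a folder and then checking for it might look like this. The folder names are made up for illustration, and HDFS_CMD is a helper variable assumed here so the snippet prints the commands by default; set HDFS_CMD=hdfs to run them against a real cluster.

```shell
# Assumption: HDFS_CMD is a helper for this sketch; it prints commands unless set to "hdfs".
HDFS_CMD="${HDFS_CMD:-echo hdfs}"

# Relative path: the folder is created under your HDFS home directory
$HDFS_CMD dfs -mkdir reports

# -p also creates any missing parent directories (hypothetical path)
$HDFS_CMD dfs -mkdir -p /user/demo/raw/2020

# Confirm the new folder shows up in the parent listing
$HDFS_CMD dfs -ls /user/demo/raw
```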

3) hadoop dfs -cp <file path> <path of copied file>

As I am working with a Hadoop cluster on Google Cloud, let's try to copy a file from Google Cloud Storage into our Hadoop cluster.

Other commands that you can use to move files from one location to another are:

a) hadoop dfs -put: Copies files from the local file system into HDFS.

b) hadoop dfs -moveFromLocal: Works like the put command but deletes the local source file upon completion.

c) hadoop dfs -get: Copies data from Hadoop (HDFS) to a local destination.
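
The copy commands above could be used roughly as follows. The bucket name and file names are hypothetical, and the gs:// line assumes the cluster has the Cloud Storage connector available (Dataproc clusters do). HDFS_CMD is a helper variable assumed for this sketch: it prints each command by default, and running with HDFS_CMD=hdfs executes them.

```shell
# Assumption: HDFS_CMD is a helper for this sketch; it prints commands unless set to "hdfs".
HDFS_CMD="${HDFS_CMD:-echo hdfs}"

# Copy from Cloud Storage into HDFS (hypothetical bucket and paths)
$HDFS_CMD dfs -cp gs://example-bucket/sales.csv /user/demo/sales.csv

# Local file system -> HDFS, keeping the local file
$HDFS_CMD dfs -put ./local_sales.csv /user/demo/

# Local file system -> HDFS, deleting the local file afterwards
$HDFS_CMD dfs -moveFromLocal ./staging.csv /user/demo/

# HDFS -> local file system
$HDFS_CMD dfs -get /user/demo/sales.csv ./sales_copy.csv
```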

4) hadoop dfs -tail <file path>

With this command, you can preview the end of a file (its last kilobyte) to see how the data is structured.
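
For illustration, here is how tail might be used on a data file and on a growing log. Both file names are hypothetical. HDFS_CMD is a helper variable assumed for this sketch; it prints the commands by default, and HDFS_CMD=hdfs runs them.

```shell
# Assumption: HDFS_CMD is a helper for this sketch; it prints commands unless set to "hdfs".
HDFS_CMD="${HDFS_CMD:-echo hdfs}"

# Show the end of a file
$HDFS_CMD dfs -tail /user/demo/sales.csv

# -f keeps printing as data is appended; press Ctrl+C to stop
$HDFS_CMD dfs -tail -f /user/demo/app.log
```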

5) hadoop dfs -cat <file path>

The cat command prints the full contents of the selected file to the terminal. For a large file, pipe the output through head or less if you only want a preview.
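
Since cat streams the whole file, one common pattern is to pipe it through head so only the first few lines appear. The snippet below is a sketch with a hypothetical file path; HDFS_CMD is a helper variable assumed here that prints the command by default (set HDFS_CMD=hdfs to execute).

```shell
# Assumption: HDFS_CMD is a helper for this sketch; it prints commands unless set to "hdfs".
HDFS_CMD="${HDFS_CMD:-echo hdfs}"

# Print only the first 5 lines of a file stored in HDFS
$HDFS_CMD dfs -cat /user/demo/sales.csv | head -n 5
```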

6) hdfs dfs -du <path>

The du command lets you check the disk usage of a file, or of each item inside a directory.
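
A short sketch of the usual du variants follows. The directory is hypothetical, and HDFS_CMD is a helper variable assumed for this example that prints the commands unless you set HDFS_CMD=hdfs.

```shell
# Assumption: HDFS_CMD is a helper for this sketch; it prints commands unless set to "hdfs".
HDFS_CMD="${HDFS_CMD:-echo hdfs}"

# Size of each item inside the directory, in bytes
$HDFS_CMD dfs -du /user/demo

# -h prints human-readable sizes (K, M, G)
$HDFS_CMD dfs -du -h /user/demo

# -s prints one summary total for the whole path
$HDFS_CMD dfs -du -s -h /user/demo
```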

7) hadoop dfs -mv <file location> <new file location>

Moves a file from one location to another within HDFS.
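
As an illustration, mv covers both renaming a file and moving it into another directory. The paths are hypothetical, and HDFS_CMD is a helper variable assumed for this sketch; it prints the commands by default and runs them when set to hdfs.

```shell
# Assumption: HDFS_CMD is a helper for this sketch; it prints commands unless set to "hdfs".
HDFS_CMD="${HDFS_CMD:-echo hdfs}"

# Rename a file in place
$HDFS_CMD dfs -mv /user/demo/sales.csv /user/demo/sales_2020.csv

# Move it into a directory that already exists
$HDFS_CMD dfs -mv /user/demo/sales_2020.csv /user/demo/archive/
```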

8) hadoop dfs -rm <file location>

This command deletes the file. To delete a directory and everything in it, add the -r flag.
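
The variants below are a sketch of common deletions, using hypothetical paths. HDFS_CMD is a helper variable assumed for this example, and its print-only default is deliberate here: nothing is deleted until you set HDFS_CMD=hdfs.

```shell
# Assumption: HDFS_CMD is a helper for this sketch; it prints commands unless set to "hdfs".
HDFS_CMD="${HDFS_CMD:-echo hdfs}"

# Delete a single file
$HDFS_CMD dfs -rm /user/demo/old.csv

# -r deletes a directory and its contents
$HDFS_CMD dfs -rm -r /user/demo/tmp

# -skipTrash bypasses the trash (if enabled) and deletes immediately
$HDFS_CMD dfs -rm -skipTrash /user/demo/huge.csv
```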

9) hadoop fs -chmod <mode> <file location>

If you need to change the permissions of files in the Hadoop cluster, this works in the same way as in Linux.
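
To make the Linux parallel concrete, here is a sketch using octal and symbolic modes. The paths and permission values are only examples, and HDFS_CMD is a helper variable assumed for this snippet; it prints the commands by default and runs them when set to hdfs.

```shell
# Assumption: HDFS_CMD is a helper for this sketch; it prints commands unless set to "hdfs".
HDFS_CMD="${HDFS_CMD:-echo hdfs}"

# Octal mode: owner read/write, group read, others nothing
$HDFS_CMD dfs -chmod 640 /user/demo/sales.csv

# -R applies the mode to a directory and everything beneath it
$HDFS_CMD dfs -chmod -R 750 /user/demo/archive

# Symbolic mode: add write permission for the group
$HDFS_CMD dfs -chmod g+w /user/demo/sales.csv
```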

About the author


Mastering data engineering and data science one project at a time. I have worked on and developed multiple startups before, and this blog treats my own journey as a startup in itself, where I iterate and learn.


Copyright © 2020. Created by Meks. Powered by WordPress.