Table of Contents
- 1 When designing Hadoop applications, HDFS block size may affect performance. What are the problems when we set it too large? How about too small?
- 2 What happens to existing data if we change the block size in HDFS?
- 3 What if my file size is smaller than the HDFS block size?
- 4 What are the advantages of HDFS?
- 5 How does block division work in HDFS?
When designing Hadoop applications, HDFS block size may affect performance. What are the problems when we set it too large? How about too small?
If the block size is too large, there are two main issues: 1) the cluster is underutilized, because a larger block size produces fewer input splits and therefore fewer map tasks, which slows the job down; 2) parallelism decreases, since there are fewer tasks to spread across the cluster's nodes. If the block size is too small, the opposite problem appears: the file is carved into a huge number of blocks, and the NameNode must track metadata for every one of them (see the 4 KB question below).
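As a rough sketch in plain Java (no Hadoop dependencies), the number of input splits, and hence map tasks, is roughly ceil(fileSize / blockSize) when one split corresponds to one block, which is the MapReduce default; the 10 GB file size below is only illustrative:

```java
public class SplitCount {
    public static void main(String[] args) {
        long fileSize = 10L * 1024 * 1024 * 1024; // a 10 GB input file (illustrative)

        // One input split per block (the MapReduce default), so
        // splits = ceil(fileSize / blockSize).
        for (long blockMB : new long[] {64, 128, 512, 1024}) {
            long blockSize = blockMB * 1024 * 1024;
            long splits = (fileSize + blockSize - 1) / blockSize;
            System.out.printf("block size %4d MB -> %3d splits / map tasks%n",
                              blockMB, splits);
        }
    }
}
```

With 64 MB blocks the job gets 160 map tasks; with 1 GB blocks it gets only 10, which may leave most of the cluster idle.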
What happens to existing data if we change the block size in HDFS?
Changing the block size does not affect the existing files in HDFS: they keep the block size they were written with. Only files created after the change pick up the new value.
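A minimal sketch of this behavior using the standard FileSystem API; the path and sizes here are made up for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeForNewFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Applies only to files written from now on; existing files keep
        // the block size they were created with.
        conf.set("dfs.blocksize", "268435456"); // 256 MB

        FileSystem fs = FileSystem.get(conf);

        // The block size can also be set per file at create time:
        // (path, overwrite, bufferSize, replication, blockSize).
        Path p = new Path("/tmp/example.dat"); // hypothetical path
        try (FSDataOutputStream out =
                 fs.create(p, true, 4096, (short) 3, 256L * 1024 * 1024)) {
            out.writeUTF("hello");
        }

        // A file always reports the block size it was created with.
        System.out.println(fs.getFileStatus(p).getBlockSize());
    }
}
```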
What will happen if the block size in a Hadoop cluster is set to 4 KB?
HDFS stores huge data sets, i.e. terabytes and petabytes of data. If HDFS used a 4 KB block size, as a Linux file system does, we would end up with far too many blocks and therefore far too much metadata. Managing that number of blocks and their metadata on the NameNode would create enormous overhead.
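The back-of-the-envelope arithmetic, in plain Java, for a single terabyte of data; each block is an object tracked in NameNode memory, so the block count drives the overhead:

```java
public class BlockCount {
    public static void main(String[] args) {
        long dataSize = 1L << 40; // 1 TB of data (illustrative)

        long smallBlock = 4L * 1024;          // 4 KB, like a local file system
        long hdfsBlock  = 128L * 1024 * 1024; // 128 MB, the HDFS 2.0 default

        System.out.println("4 KB blocks:   " + dataSize / smallBlock); // 268,435,456
        System.out.println("128 MB blocks: " + dataSize / hdfsBlock);  // 8,192
    }
}
```

Over 268 million block entries versus about 8,000, for the same terabyte of data.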
What if my file size is smaller than the HDFS block size?
If your file is smaller than the HDFS block size, the file is not split: it is stored as a single block. This case is not frequent, since Hadoop is used for Big Data processing where files run into terabytes.
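One way to confirm this, sketched with the standard FileSystem API; the 10 MB file path is hypothetical and a 128 MB block size is assumed:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SmallFileBlocks {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Hypothetical 10 MB file on a cluster with 128 MB blocks.
        FileStatus st = fs.getFileStatus(new Path("/tmp/small.dat"));
        BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());

        // A file smaller than the block size occupies exactly one block,
        // and that block is only as long as the file itself.
        System.out.println("blocks: " + blocks.length);               // 1
        System.out.println("block length: " + blocks[0].getLength()); // ~10 MB, not 128 MB
    }
}
```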
What are the advantages of HDFS?
The major advantage of using HDFS for storing files in a cluster is that if a file is smaller than the block size, it does not occupy a full block's worth of underlying storage: the datanodes hold only the actual bytes.
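A quick worked example of the resulting disk usage, in plain Java; the file size and replication factor are illustrative:

```java
public class SmallFileDiskUsage {
    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // 128 MB logical block
        long fileSize  = 10L * 1024 * 1024;  // 10 MB file (illustrative)
        short replication = 3;

        // HDFS blocks are not padded: each datanode replica stores
        // only the file's actual bytes.
        long actual   = fileSize * replication;  // 30 MB on disk
        long ifPadded = blockSize * replication; // 384 MB if blocks were padded

        System.out.println("actual disk usage: " + actual / (1024 * 1024) + " MB");
        System.out.println("not " + ifPadded / (1024 * 1024) + " MB");
    }
}
```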
What is the default block size in Hadoop?
The default block size in HDFS was 64 MB in Hadoop 1.0 and is 128 MB in Hadoop 2.0. The block size can be configured for the entire cluster via dfs.blocksize in hdfs-site.xml, or overridden for specific files when they are created.
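A small sketch showing where the configured value can be read from; the 128 MB fallback is just the Hadoop 2.0 default mentioned above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DefaultBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Cluster-wide default that new files inherit unless a per-file
        // block size is given at create time.
        System.out.println("dfs.blocksize = "
            + conf.getLongBytes("dfs.blocksize", 128L * 1024 * 1024));

        // The value actually in force for a given path:
        System.out.println(fs.getDefaultBlockSize(new Path("/")));
    }
}
```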
How does block division work in HDFS?
Block division in HDFS is purely logical: it is built on top of the physical blocks of the underlying file system (e.g. ext3/FAT). The underlying file system is not physically partitioned into chunks of 64 MB, 128 MB, or whatever the configured block size is.
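These logical blocks, and the datanodes holding their replicas, can be listed through the FileSystem API; a sketch, assuming a hypothetical 300 MB file on a cluster with 128 MB blocks:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LogicalBlocks {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Hypothetical 300 MB file: with 128 MB blocks it is carved into
        // three logical blocks of 128 MB, 128 MB, and 44 MB.
        FileStatus st = fs.getFileStatus(new Path("/data/big.dat"));
        for (BlockLocation b : fs.getFileBlockLocations(st, 0, st.getLen())) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                b.getOffset(), b.getLength(),
                String.join(",", b.getHosts()));
        }
    }
}
```

Each printed line describes one logical block: its offset within the file, its length, and the datanodes holding its replicas.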