In Hadoop MapReduce, a job is broken down into smaller tasks that run in parallel. This execution model makes the job considerably more efficient than running the same work sequentially.
The problem arises with slow tasks: a job cannot finish until its last task does, so a few slow tasks can hold up the entire job. This is common in full-fledged production environments, where thousands of jobs may be running in parallel.
In such scenarios, speculative execution helps the job as a whole. Hadoop tries to detect slow-running tasks and launches a duplicate of each one as a backup. This process is called speculative execution.
Speculative execution does not mean launching two copies of every task at the same time so that they can race, since that would waste cluster resources. Rather, a speculative task is launched only after the original has run for a significant amount of time and the framework detects that it is progressing slowly compared with the other tasks of the same job.
Once one of the two attempts completes successfully, the Hadoop framework kills the one that is still running. In other words, whichever attempt finishes first is kept, and the slower one is terminated.
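To make the detection rule concrete, here is a minimal sketch of the idea in Python. It is not Hadoop's actual speculator: the TaskStatus type, the function name, and both thresholds (min_runtime_secs, slowness_ratio) are assumptions made for illustration. It flags a task only if it has run long enough to be judged and its progress rate is well below the average of the job's running tasks.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskStatus:
    task_id: str
    progress: float      # fraction complete, 0.0 to 1.0
    runtime_secs: float  # how long the attempt has been running


def find_speculation_candidates(tasks, min_runtime_secs=60.0, slowness_ratio=0.5):
    """Return IDs of running tasks whose progress rate lags the job average.

    min_runtime_secs and slowness_ratio are illustrative thresholds,
    not Hadoop defaults.
    """
    running = [t for t in tasks if t.progress < 1.0 and t.runtime_secs > 0]
    if len(running) < 2:
        return []  # nothing to compare against
    rates = {t.task_id: t.progress / t.runtime_secs for t in running}
    average_rate = mean(rates.values())
    return [
        t.task_id
        for t in running
        if t.runtime_secs >= min_runtime_secs
        and rates[t.task_id] < slowness_ratio * average_rate
    ]


tasks = [
    TaskStatus("map_0", progress=0.90, runtime_secs=100),
    TaskStatus("map_1", progress=0.85, runtime_secs=100),
    TaskStatus("map_2", progress=0.20, runtime_secs=100),  # straggler
    TaskStatus("map_3", progress=0.30, runtime_secs=10),   # too young to judge
]
candidates = find_speculation_candidates(tasks)
print(candidates)  # ['map_2']
```

Only map_2 is flagged: it has run as long as its peers but has made far less progress, while map_3 is skipped because it has not yet run for the minimum time.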
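The "first to finish wins" rule can be sketched with two threads standing in for the original and the speculative attempt. This is only a toy simulation under assumed names (run_attempt, race) and made-up durations; in Hadoop the attempts run on different nodes and the framework does the killing.

```python
import threading
import time


def run_attempt(name, duration_secs, killed, result, lock):
    """Simulate one task attempt; stop early if the other attempt already won."""
    deadline = time.monotonic() + duration_secs
    while time.monotonic() < deadline:
        if killed.is_set():
            return  # this attempt was killed before it could finish
        time.sleep(0.01)
    with lock:
        if "winner" not in result:
            result["winner"] = name
            killed.set()  # tell the remaining attempt to stop


def race(original_secs, speculative_secs):
    """Run both attempts concurrently and return the name of the one that finished first."""
    killed = threading.Event()
    result = {}
    lock = threading.Lock()
    attempts = [
        threading.Thread(target=run_attempt,
                         args=("original", original_secs, killed, result, lock)),
        threading.Thread(target=run_attempt,
                         args=("speculative", speculative_secs, killed, result, lock)),
    ]
    for attempt in attempts:
        attempt.start()
    for attempt in attempts:
        attempt.join()
    return result["winner"]


# The original attempt is stuck on a slow node; the backup finishes first.
print(race(original_secs=0.5, speculative_secs=0.05))
```

Because the losing attempt checks the shared event on every loop iteration, it exits shortly after the winner completes instead of running to the end.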
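Speculative execution is a MapReduce job optimization technique that is enabled by default in Hadoop. The property names used here are the older ones; Hadoop 2 and later call them mapreduce.map.speculative and mapreduce.reduce.speculative and treat the old names as deprecated aliases. We can disable speculative execution for mappers and reducers in mapred-site.xml as shown below: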
<property>
<name>mapred.map.tasks.speculative.execution</name>
<value>false</value>
</property>
<property>
<name>mapred.reduce.tasks.speculative.execution</name>
<value>false</value>
</property>
Note that speculative execution can also reduce cluster efficiency and hurt overall throughput, because the duplicate tasks consume resources that other work could have used. There is a good case for turning off speculative execution for reduce tasks in particular: any duplicate reduce task has to fetch the same map outputs as the original task, and this can significantly increase network traffic on the cluster.
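As a rough back-of-the-envelope illustration of the network cost, the sketch below sums the map output that would be fetched a second time if some reducers received a duplicate attempt. The partition sizes are hypothetical, and the figure is an upper bound that assumes each duplicate fetches its whole partition before one of the two attempts is killed.

```python
GIB = 1024 ** 3


def extra_shuffle_bytes(partition_bytes, speculated_reducers):
    """Bytes fetched a second time when the given reducers get a duplicate attempt."""
    return sum(partition_bytes[r] for r in speculated_reducers)


# Hypothetical map output destined for each of three reducers.
partition_bytes = {0: 4 * GIB, 1: 6 * GIB, 2: 5 * GIB}

extra = extra_shuffle_bytes(partition_bytes, speculated_reducers=[1, 2])
print(extra / GIB)  # 11.0
```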