14 Jul 2013
Once a MapReduce job has been debugged on a single node (in local and pseudo-distributed mode), developers are likely to move on to a fully distributed development Hadoop cluster. There are usually still some things to polish and tweak before the most obvious bugs are sorted out.
In fully distributed mode, the logs of a single job are spread across the
cluster nodes in ${hadoop.log.dir}/userlogs, so they can't easily be grepped
for debug output from a single machine in one pass. Even on a single machine,
this is complicated by the fact that, with JVM reuse, multiple task attempts
write to the same physical log file, although the web UI separates them
correctly by means of the log.index file. MapR addresses this problem with a
feature called Central Logging. When it is enabled, task logs are streamed to
MapR-FS instead of to the local filesystems of individual nodes, and the logs
are then accessible throughout the cluster. This sounds expensive, but it does
not actually involve any cross-node operations, because the log volumes, just
like the MapReduce shuffle volumes, are local to the node. Once the job has
finished, you can set up a job-centric view of the job log directories across
the cluster using maprcli job linklogs, as explained in the documentation.
This makes it easy to grep only mappers, only reducers, or only the tasks on a
specific node. It is even easier when the cluster is mounted via NFS.
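To make the log.index mechanism more concrete, here is a minimal Python sketch of how one attempt's output could be cut out of a log file shared by several attempts. It assumes the Hadoop 1.x-era index layout (a LOG_DIR: line followed by stdout:, stderr: and syslog: lines that each carry a start offset and a length); check a log.index file on your own cluster before relying on it. The function names are made up for this illustration.

```python
from typing import Dict, Tuple


def parse_log_index(text: str) -> Tuple[str, Dict[str, Tuple[int, int]]]:
    """Parse the contents of a log.index file.

    Assumed layout (not verified against every Hadoop version):
        LOG_DIR:<directory holding the physical log files>
        stdout:<start offset> <length>
        stderr:<start offset> <length>
        syslog:<start offset> <length>
    """
    log_dir = ""
    ranges: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so paths containing ':' survive.
        key, _, value = line.partition(":")
        if key == "LOG_DIR":
            log_dir = value.strip()
        else:
            start, length = value.split()
            ranges[key] = (int(start), int(length))
    return log_dir, ranges


def slice_log(data: bytes, start: int, length: int) -> bytes:
    """Return the part of a shared log file that belongs to one attempt."""
    return data[start:start + length]
```

With the ranges in hand, you would read the stdout, stderr or syslog file from the directory named by LOG_DIR and pass its bytes to slice_log to get the text of a single attempt.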
Even with an Apache Hadoop distribution other than MapR, there is a tool
called HadooSh (an interactive Hadoop shell). HadooSh provides sensible
completions for hadoop commands (local and HDFS file names, job and task
attempt ids). Its tlog command makes it easy to grep task logs in
moderate-size clusters. It relies on the fact that user logs are accessible
via the Jetty web server embedded in each TaskTracker.
# Show all logs for a job:
gera > tlog -job job_201306131712_0004
# Show all mapper logs for a teragen job:
gera > tlog -dir tgen -taskpattern *_m_*
# Grep logs for job tasks that ran on certain nodes:
gera > tlog -job job_201306131712_0004 -hostpattern *.rack.company.com | grep needle
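For readers who want to see the idea behind fetching logs over HTTP, here is a rough Python sketch. It is not HadooSh's implementation. It assumes the Hadoop 1.x TaskTracker serves logs at /tasklog with an attemptid query parameter and listens on the default HTTP port 50060; the helper names and the example host names are invented for this illustration.

```python
from fnmatch import fnmatchcase
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen


def tasklog_url(host: str, attempt_id: str, port: int = 50060) -> str:
    """Build the log URL for one task attempt on one TaskTracker.

    The /tasklog path, the attemptid and all parameters, and port 50060
    are assumptions based on Hadoop 1.x defaults.
    """
    query = urlencode({"attemptid": attempt_id, "all": "true"})
    return "http://%s:%d/tasklog?%s" % (host, port, query)


def select_hosts(hosts: List[str], pattern: str) -> List[str]:
    """Keep only the hosts matching a shell-style pattern."""
    return [h for h in hosts if fnmatchcase(h, pattern)]


def grep_lines(text: str, needle: str) -> List[str]:
    """Return the lines of text that contain needle."""
    return [line for line in text.splitlines() if needle in line]


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download a log page; needs a reachable TaskTracker."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

A caller would loop over select_hosts(...) and the attempt ids of a job, fetch each URL, and pass the result through grep_lines. The page returned with all=true is HTML, so matching lines may contain markup.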