How to suppress YARN logs for a Spark streaming job running on EMR

2022-12-01T03:14:04

I am running a Java Spark (3.1.2) Streaming application on EMR 6.5.0. I am getting a continuous stream of YARN INFO messages like the ones below:

22/11/26 15:19:47 INFO Client: Application report for application_166818165768_0028 (state: RUNNING)
22/11/26 15:19:48 INFO Client: Application report for application_166818165768_0028 (state: RUNNING)
22/11/26 15:19:49 INFO Client: Application report for application_166818165768_0028 (state: RUNNING)
22/11/26 15:19:50 INFO Client: Application report for application_166818165768_0028 (state: RUNNING)
22/11/26 15:19:51 INFO Client: Application report for application_166818165768_0028 (state: RUNNING)

Over days and weeks these occupy a significant amount of storage, so I would like to silence these INFO messages. I tried a custom log4j.properties, shown below. I believe these messages are coming from org.apache.hadoop.yarn.client (I looked into the YARN code but could not pinpoint the class logging this message!), so I set that logger's level to ERROR, yet it had no impact. What am I doing wrong?

log4j-stream.properties file

log4j.rootLogger=ERROR, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.conversionPattern=[%d] %p %m (%c)%n
log4j.appender.rolling.maxFileSize=50MB
log4j.appender.rolling.maxBackupIndex=10
log4j.appender.rolling.file=${spark.yarn.app.container.log.dir}/${vm.logging.name}-executor.log
log4j.appender.rolling.encoding=UTF-8
log4j.logger.org.apache.spark=${vm.logging.level}
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.apache.hadoop.yarn.client=ERROR
log4j.logger.org.apache.hadoop.yarn.server.timeline=ERROR

log4j.logger.org.apache.hadoop.yarn.factories.impl=WARN
log4j.logger.org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor=WARN
log4j.logger.org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl=WARN
log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.security=WARN
log4j.logger.org.apache.hadoop.yarn.event.AsyncDispatcher=WARN
log4j.logger.org.apache.hadoop.yarn.util.AbstractLivelinessMonitor=WARN
log4j.logger.org.apache.hadoop.yarn.server.nodemanager.security=WARN
log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMNMInfo=WARN
log4j.logger.org.apache.hadoop.yarn.server.timeline.security.TimelineDelegationTokenSecretManagerService=WARN
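
For what it is worth, the "Application report for ... (state: RUNNING)" text seems to come from Spark's own YARN client class, org.apache.spark.deploy.yarn.Client, rather than from the org.apache.hadoop.yarn.client package (the sample lines only show the abbreviated category "Client", so I cannot be sure). If that guess is right, an explicit entry like the following would be the one that matters, although the org.apache.spark entry above should already cover it when vm.logging.level is ERROR:

log4j.logger.org.apache.spark.deploy.yarn.Client=ERROR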

My spark-submit looks like this:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --name UAT-PII_D-1-LOG \
  --driver-cores 3 \
  --driver-memory 5g \
  --executor-cores 4 \
  --executor-memory 5g \
  --num-executors 6 \
  --conf spark.yarn.maxAppAttempts=1 \
  --conf spark.scheduler.allocation.file=fairscheduler.xml \
  --conf spark.hadoop.hive.metastore.uris=thrift://dp-emr-prod-hive-server.private.dataplatform.link:9083 \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-stream.properties -Dlog4j.debug=true -Dvm.logging.level=ERROR -Dvm.logging.name=Stream-App" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-stream.properties -Dlog4j.debug=true -Dvm.logging.level=ERROR -Dvm.logging.name=Stream-App" \
  --class com.ex.app.StreamJob \
  --files /home/hadoop/log4j-stream.properties
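
One thing I am not sure about: log4j 1.x first tries to interpret the value of -Dlog4j.configuration as a URL and only then falls back to a classpath lookup, so examples I have seen use an explicit file: prefix when the file is shipped with --files (it ends up in each YARN container's working directory). An untested variant of the two conf lines:

  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j-stream.properties -Dlog4j.debug=true -Dvm.logging.level=ERROR -Dvm.logging.name=Stream-App" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j-stream.properties -Dlog4j.debug=true -Dvm.logging.level=ERROR -Dvm.logging.name=Stream-App" \

Since -Dlog4j.debug=true is already set, the container stderr should contain a "log4j: Using URL [...] for automatic log4j configuration" line showing which file was actually loaded, which would confirm or rule this out.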

I am using a bootstrap action to copy log4j-stream.properties to the driver and executor nodes.
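
As an alternative to the bootstrap copy, EMR supports a spark-log4j configuration classification that rewrites the cluster-wide /etc/spark/conf/log4j.properties at provisioning time (EMR 6.5.0 still ships log4j 1.x, so the classification is spark-log4j, not spark-log4j2). A sketch, assuming the loggers above are the right ones to silence:

[
  {
    "Classification": "spark-log4j",
    "Properties": {
      "log4j.logger.org.apache.hadoop.yarn.client": "ERROR",
      "log4j.logger.org.apache.spark.deploy.yarn.Client": "ERROR"
    }
  }
]

As far as I can tell, this would also apply to the spark-submit launcher JVM itself, which reads the default log4j.properties from /etc/spark/conf rather than the -Dlog4j.configuration passed through extraJavaOptions.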

Copyright License:
Author: Jeevan, reproduced under the CC BY-SA 4.0 license with link to original source & disclaimer.
Link: https://stackoverflow.com/questions/74633053/how-to-suppress-yarn-logs-for-spark-streaming-job-running-on-emr
