Digging into EMR failed applications logs

Moses Liao GZ
2 min read · Nov 28, 2023

Have you ever run into very vague errors when running Spark applications on AWS EMR? Sometimes the error is as uninformative as this:

Application application_1698851646537_0035 finished with a failed status 
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1191)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1582)

OR

19/11/04 05:24:42 INFO Client: Application report for application_1572839353552_0008 (state: ACCEPTED)

Nothing in the verbose client log helps you figure out what is wrong with your EMR job. So what do you do?

When you see errors like these, start by looking at the application master logs. When a Spark job runs in cluster mode, the Spark driver runs inside the application master, which is the first container started for the job.

The log location on S3 usually looks like this:

s3://<bucket-name>/<some-folder-name>/<emr-id>/containers/application_<the-number-shown-in-error>/container_<same-number>_01_000001/stderr.gz

For example, if the failed application is application_1698851646537_0035, the application master logs are stored at an S3 path like this:

s3://emr-app-bucket/logs/j-2L0SZ6ILEVI3D/containers/application_1698851646537_0035/container_1698851646537_0035_01_000001/stderr.gz
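As a rough sketch of the path layout above, the application ID can be pulled out of the error text and turned into the application master log location. The function name am_log_uri and its parameters are made up for illustration, and the layout is assumed to match the example path shown here; your log bucket and folder may differ.

```python
import re

# Matches a YARN application ID such as application_1698851646537_0035.
APP_ID_RE = re.compile(r"application_(\d+)_(\d+)")


def am_log_uri(log_root, cluster_id, text, attempt=1, stream="stderr"):
    """Build the S3 URI of the application master log.

    log_root   -- hypothetical log prefix, e.g. "s3://emr-app-bucket/logs"
    cluster_id -- EMR cluster ID, e.g. "j-2L0SZ6ILEVI3D"
    text       -- any text containing the application ID (the error message)
    attempt    -- application attempt number (assumed to be 1 here)
    stream     -- "stderr" or "stdout"
    """
    match = APP_ID_RE.search(text)
    if match is None:
        raise ValueError("no YARN application ID found in text")
    cluster_ts, app_seq = match.groups()
    app_id = f"application_{cluster_ts}_{app_seq}"
    # Container 000001 of an attempt is the application master.
    container_id = f"container_{cluster_ts}_{app_seq}_{attempt:02d}_000001"
    return (
        f"{log_root.rstrip('/')}/{cluster_id}/containers/"
        f"{app_id}/{container_id}/{stream}.gz"
    )
```

Calling it with the error line from the top of this post, the log root s3://emr-app-bucket/logs and the cluster ID j-2L0SZ6ILEVI3D gives the example path above.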

Now you may ask why the container name ends in 01_000001. YARN container IDs have the form container_<cluster-timestamp>_<application-number>_<attempt>_<container-number>. Here 01 is the first application attempt, and 000001 is the first container of that attempt, which is the application master. Start with its stderr.gz. If the error is not there, download container_1698851646537_0035_01_000002/stderr.gz, container_1698851646537_0035_01_000003/stderr.gz and so on until you find the error. It will be there somewhere.
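Reading each downloaded stderr.gz by hand gets tedious, so here is a minimal sketch that scans one gzipped log for suspicious lines. The function name find_error_lines and the search pattern are my own assumptions, not something EMR provides; adjust the pattern to whatever your job logs.

```python
import gzip
import re

# Assumed markers of trouble in a Spark container log; tune as needed.
ERROR_RE = re.compile(r"\bERROR\b|Exception|Caused by:")


def find_error_lines(gz_bytes, pattern=ERROR_RE):
    """Return (line_number, line) pairs for log lines matching the pattern.

    gz_bytes is the raw content of a downloaded stderr.gz or stdout.gz file.
    """
    text = gzip.decompress(gz_bytes).decode("utf-8", errors="replace")
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]
```

After downloading a container log, you could read it with open("stderr.gz", "rb") and pass the bytes to find_error_lines, then repeat for the next container until something turns up.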

It can show up like:

java.nio.file.AccessDeniedException: /aaa/bbb/ccc/_temporary/0/:PUT 0-byte object on aaa/bbb
com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: Access Denied; Request ID:xxxYYYzzz

OR

19/11/04 05:24:45 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: Executor memory 134217728 must be at least 471859200. Please increase executor memory using the --executor-memory option or spark.executor.memory in Spark configuration.
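The numbers in the executor memory message are in bytes, which makes them hard to read at a glance. A small sketch like the one below (the helper name explain_memory_error is made up for this post) converts them to MiB, so you can see that the job asked for 128 MiB while Spark wanted at least 450 MiB.

```python
import re

# Matches the two byte counts in Spark's executor memory error message.
MEMORY_ERROR_RE = re.compile(r"Executor memory (\d+) must be at least (\d+)")

MIB = 1024 * 1024


def explain_memory_error(message):
    """Return (requested_mib, minimum_mib), or None if the message differs."""
    match = MEMORY_ERROR_RE.search(message)
    if match is None:
        return None
    requested, minimum = (int(group) for group in match.groups())
    return requested // MIB, minimum // MIB


message = (
    "java.lang.IllegalArgumentException: Executor memory 134217728 "
    "must be at least 471859200."
)
print(explain_memory_error(message))  # (128, 450)
```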

From there you can troubleshoot what is going on with your EMR application. Happy looking!
