Updated May 9th, 2022 by ashritha.laxminarayana

ProtoSerializer stack overflow error in DBConnect

Problem: You are using DBConnect (AWS | Azure | GCP) to run a PySpark transformation on a DataFrame with more than 100 columns, and you get a stack overflow error.
py4j.protocol.Py4JJavaError: An error occurred while calling o945.count.
: java.lang.StackOverflowError
    at java.lang.Class.getEnclosingMethodInfo(Class.java:1072)
    at java.lang.Clas...

1 min reading time
Updated May 10th, 2022 by ashritha.laxminarayana

Object lock error when writing Delta Lake tables to S3

Problem: You are trying to perform a Delta write operation to an S3 bucket and get an error message.
com.amazonaws.services.s3.model.AmazonS3Exception: Content-MD5 HTTP header is required for Put Part requests with Object Lock parameters
Cause: Delta Lake does not support S3 buckets with object lock enabled.
Solution: You should use an S3 bucket that do...

0 min reading time
Updated May 23rd, 2022 by ashritha.laxminarayana

Error when running MSCK REPAIR TABLE in parallel

Problem: You are trying to run MSCK REPAIR TABLE <table-name> commands for the same table in parallel and are getting java.net.SocketTimeoutException: Read timed out or out of memory error messages.
Cause: When you try to add a large number of new partitions to a table with MSCK REPAIR in parallel, the Hive metastore becomes a limiting factor, a...

0 min reading time