Delta Lake write job fails with java.lang.UnsupportedOperationException

Learn how to prevent java.lang.UnsupportedOperationException in Delta Lake write jobs.

Written by Adam Pavlacka

Last published at: May 10th, 2022

Problem

Delta Lake write jobs sometimes fail with the following exception:

java.lang.UnsupportedOperationException: com.databricks.backend.daemon.data.client.DBFSV1.putIfAbsent(path: Path, content: InputStream).
DBFS v1 doesn't support transactional writes from multiple clusters. Please upgrade to DBFS v2.
Or you can disable multi-cluster writes by setting 'spark.databricks.delta.multiClusterWrites.enabled' to 'false'.
If this is disabled, writes to a single table must originate from a single cluster.

Cause

Delta Lake multi-cluster writes are only supported with DBFS v2. Databricks clusters use DBFS v2 by default, and all SparkSession objects use DBFS v2.

However, if the application uses the Hadoop FileSystem API and calls FileSystem.close(), the file system client falls back to the default version, which is v1. Once that happens, Delta Lake multi-cluster write operations fail.
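For reference, this is a minimal sketch (not the exact application code) of the pattern that triggers the fallback; the path value is a placeholder:

%scala

// Hypothetical illustration of the problematic pattern: building the FileSystem
// client from an empty Hadoop Configuration and then closing it. Closing the
// shared client causes later file system access to fall back to DBFS v1.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

val path = "dbfs:/tmp/example"  // placeholder path
val fileSystem = FileSystem.get(new java.net.URI(path), new Configuration())
// ... application reads or writes files here ...
fileSystem.close()  // avoid this call: it triggers the DBFS v1 fallback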

The following log trace shows that the file system object fell back to the default v1 version.

<date> <time> INFO DBFS: Initialized DBFS with DBFSV1 as the delegate.

Solution

There are two approaches to prevent this:

  1. Never call FileSystem.close() inside the application code. If it is necessary to call the close() API, first instantiate a new FileSystem client object with a configuration object from the current Apache Spark session, instead of an empty configuration object:
    %scala
    
    import org.apache.hadoop.fs.FileSystem
    val fileSystem = FileSystem.get(new java.net.URI(path), sparkSession.sessionState.newHadoopConf())
  2. Alternatively, this code sample achieves the same goal:
    %scala
    
    import org.apache.hadoop.fs.FileSystem
    val fileSystem = FileSystem.get(new java.net.URI(path), sc.hadoopConfiguration)


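If multi-cluster writes are not required, the exception message itself points to another option: disable them by setting spark.databricks.delta.multiClusterWrites.enabled to false, keeping in mind that all writes to a given table must then originate from a single cluster. A minimal sketch of setting this for the current session:

%scala

// Sketch: disable Delta Lake multi-cluster writes for the current session.
// After this, writes to any given table must come from a single cluster.
spark.conf.set("spark.databricks.delta.multiClusterWrites.enabled", "false")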