Data too long for column error

If a column exceeds 4000 characters it is too big for the default datatype and returns an error.

Written by Adam Pavlacka

Last published at: May 16th, 2022

Problem

You are trying to insert a struct into a table, but you get a java.sql.SQLException: Data too long for column error.

Caused by: java.sql.SQLException: Data too long for column 'TYPE_NAME' at row 1
Query is: INSERT INTO COLUMNS_V2 (CD_ID,COMMENT,`COLUMN_NAME`,TYPE_NAME,INTEGER_IDX) VALUES (?,?,?,?,?) , parameters [103182,<null>,'address','struct<street_address1:struct<street_number:int,street_name:string,street_type:string,country:string,postal_code:string>,street_address2:struct<street_number:int,street_name:string,street_type:string,country:string,postal_code:string>,street_address3:struct<street_number:int,street_name:string,street_type:string,country:string,postal_code:string>,street_address4:struct<street_number:int,street_name:string,street_type:string,country:string,postal_code:string>,street_address5:struct<street_number:int,street_name:string,street_type:string,country:string,postal_code:string>,street_address6:struct<street_number:int,street_name:string,street_type:string,country:string,postal_code:string>,street_address7:struct<street_number:int,street_name:string,street_type:string,country:string,postal_code:string>,street_address8:struct<street_number:int,street_name:string,street_type:string,coun...
        at org.mariadb.jdbc.internal.util.LogQueryTool.exceptionWithQuery(LogQueryTool.java:153)
        at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executeQuery(AbstractQueryProtocol.java:255)
        at org.mariadb.jdbc.MariaDbPreparedStatementClient.executeInternalBatch(MariaDbPreparedStatementClient.java:368)
        at org.mariadb.jdbc.MariaDbPreparedStatementClient.executeBatch(MariaDbPreparedStatementClient.java:280)

Cause

The root cause of this issue is a default datatype of varchar(4000) for the column in your table.

If you have a deeply nested struct that exceeds more than 4000 characters in length, it exceeds the size of the default datatype and results in an error message.

You can validate this by describing the column you are trying to insert the data into. It will return a datatype of varchar(4000).

Solution

You should use an external metastore if you are going to exceed 4000 characters within a column. The default datatype for the Databricks Hive metastore is varchar(4000) and cannot be changed.

When you use an external metastore, you have full control over the length of column and database names. You also have control over the collation of column names, database names, and table names.

Review the external Apache Hive metastore (AWS | Azure | GCP) documentation to learn how to setup an external metastore.

Once the metastore tables have been created, you can directly modify the column data types by using the ALTER TABLE command.

Was this article helpful?