
With TimestampBasedKeyGenerator, the field values are interpreted as timestamps and not just converted to string while generating the partition path value for records. The record key is still chosen by field name. Users are expected to set a few more configs to use this KeyGenerator, including the timestamp type, which is one of the supported types (UNIX_TIMESTAMP, DATE_STRING, MIXED, EPOCHMILLISECONDS, SCALAR).

Let's go over some example values for TimestampBasedKeyGenerator.

Partition path generated from key generator: “ 12”
If input field value is null for some rows, partition path generated from key generator: “ 08”

Timestamp is DATE_STRING
If input field value is null for some rows, partition path generated from key generator: “ 12:00:00”

Partition path generated from key generator: “ 12”

ISO8601WithMsZ with Single Input format
Partition path generated from key generator: "2020040113"

ISO8601WithMsZ with Multiple Input formats
Input date formats: "yyyy-MM-dd'T'HH:mm:ssZ,yyyy-MM-dd'T'HH:mm:ss.SSSZ"
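To make the timestamp handling above concrete, here is a minimal Python sketch of how a timestamp field value might be turned into a partition path. It is not Hudi's implementation: the function names, the `tz_offset_hours` parameter, the use of Python `strftime`/`strptime` patterns in place of the Java-style patterns, and the choice to treat a null value as epoch 0 are all assumptions made for illustration. Only three of the timestamp types are covered.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical input formats for this sketch, written as Python strptime
# patterns rather than the Java-style patterns quoted in the post.
ISO_FORMATS = ("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%dT%H:%M:%S.%f%z")


def parse_date_string(value, input_formats):
    """Try each input format in order and return the first successful parse."""
    for fmt in input_formats:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        # Assumption: a string without a UTC offset is taken to be UTC.
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed
    raise ValueError(f"{value!r} matches none of {input_formats}")


def timestamp_partition_path(value, timestamp_type, output_format,
                             tz_offset_hours=0, input_formats=()):
    """Turn a timestamp field value into a partition path string."""
    if value is None:
        # Assumption of this sketch: a null value is treated as epoch 0.
        moment = datetime.fromtimestamp(0, tz=timezone.utc)
    elif timestamp_type == "EPOCHMILLISECONDS":
        moment = datetime.fromtimestamp(value / 1000, tz=timezone.utc)
    elif timestamp_type == "UNIX_TIMESTAMP":
        moment = datetime.fromtimestamp(value, tz=timezone.utc)
    elif timestamp_type == "DATE_STRING":
        moment = parse_date_string(value, input_formats)
    else:
        raise ValueError(f"type not covered by this sketch: {timestamp_type}")
    # Shift into the requested output time zone before formatting.
    output_tz = timezone(timedelta(hours=tz_offset_hours))
    return moment.astimezone(output_tz).strftime(output_format)


print(timestamp_partition_path(1578283932000, "EPOCHMILLISECONDS",
                               "%Y-%m-%d %H", tz_offset_hours=8))
# prints 2020-01-06 12
```

With several input formats configured, the sketch simply tries them in order, so `timestamp_partition_path("2020-04-01T13:01:33.428Z", "DATE_STRING", "%Y%m%d%H", input_formats=ISO_FORMATS)` falls through the first pattern and succeeds on the second.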
ComplexKeyGenerator
Both record key and partition paths comprise one or more than one field by name (combination of multiple fields). Fields are expected to be comma separated in the config value. For example ".field" : "col1,col4"

GlobalDeleteKeyGenerator
Global index deletes do not require partition value. So this key generator avoids using partition value for generating HoodieKey.

TimestampBasedKeyGenerator relies on timestamps for the partition field.
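The multi-field and global-delete cases can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Hudi's code: `HoodieKeySketch`, `complex_key`, and `global_delete_key` are hypothetical names, and the exact string layout chosen here (`field:value` pairs joined by commas for the record key, values joined by `/` for the partition path, an empty partition path for global deletes) is an assumption of the sketch.

```python
from typing import List, Mapping, NamedTuple


class HoodieKeySketch(NamedTuple):
    """Hypothetical stand-in for a (record key, partition path) pair."""
    record_key: str
    partition_path: str


def parse_fields(config_value: str) -> List[str]:
    """Split a comma-separated config value such as "col1,col4"."""
    return [name.strip() for name in config_value.split(",") if name.strip()]


def complex_key(record: Mapping, record_key_config: str,
                partition_path_config: str) -> HoodieKeySketch:
    """Build both parts of the key from one or more fields each."""
    key_fields = parse_fields(record_key_config)
    partition_fields = parse_fields(partition_path_config)
    # Assumed layout: "field:value" pairs for the record key.
    record_key = ",".join(f"{name}:{record[name]}" for name in key_fields)
    # Assumed layout: one path segment per partition field.
    partition_path = "/".join(str(record[name]) for name in partition_fields)
    return HoodieKeySketch(record_key, partition_path)


def global_delete_key(record: Mapping, record_key_config: str) -> HoodieKeySketch:
    """Build a key that ignores the partition value entirely."""
    key_fields = parse_fields(record_key_config)
    record_key = ",".join(f"{name}:{record[name]}" for name in key_fields)
    return HoodieKeySketch(record_key, "")


row = {"col1": "a", "col4": 7, "region": "eu", "day": "2020-01-06"}
print(complex_key(row, "col1,col4", "region,day"))
print(global_delete_key(row, "col1,col4"))
```

Leaving the partition path empty in `global_delete_key` mirrors the point above: when the index is global, the record key alone is enough to locate the record to delete.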

The key generator class could refer to any of the available ones or a user-defined one. One option, when set to true, URL-encodes the partition path. Another, when set to true, uses hive style partitioning, where the partition field name is prefixed to the value. There are a few more configs involved if you are looking for TimestampBasedKeyGenerator; those are covered in the respective section.

Let's go over the different key generators available to be used with Hudi.

In the single-field case, the record key refers to one field (column in dataframe) by name, and the partition path refers to one field (single column in dataframe) by name. This is one of the most commonly used key generators. Values are interpreted as is from the dataframe and converted to string.
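A small Python sketch of the single-field case, including the two boolean options described above, may help. The function and parameter names are hypothetical, and Python's `urllib.parse.quote` stands in for whatever URL encoder Hudi actually uses, so encoded output may differ in detail.

```python
from urllib.parse import quote


def single_field_key(record, record_key_field, partition_path_field,
                     hive_style=False, url_encode=False):
    """Return (record_key, partition_path) built from one field each."""
    # Values are taken as is from the record and converted to string.
    record_key = str(record[record_key_field])
    partition_value = str(record[partition_path_field])
    if url_encode:
        # Encode every reserved character, including "/".
        partition_value = quote(partition_value, safe="")
    if hive_style:
        # Hive style prefixes the partition field name to the value.
        partition_value = f"{partition_path_field}={partition_value}"
    return record_key, partition_value


row = {"id": 42, "day": "2020/01/06"}
print(single_field_key(row, "id", "day"))
print(single_field_key(row, "id", "day", hive_style=True))
print(single_field_key(row, "id", "day", hive_style=True, url_encode=True))
```

The encoding step runs before the prefix is added so that the `=` separator of the hive style form is never itself encoded.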
This blog goes over all the different types of key generators that are readily available to use. The interface for KeyGenerator in Hudi is available for your reference.

Before diving into the different types of key generators, let's go over some of the common configs required to be set for key generators. One of them refers to the key generator class (including full path).
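As a rough illustration of how these common configs fit together, the Python sketch below assembles them into a plain options dictionary. The helper name is hypothetical. The option names and the class path do not appear in the text above; they are the Spark datasource write options as commonly documented for Hudi and should be treated as assumptions to verify against the configuration reference for your Hudi version.

```python
def key_generator_options(record_key_field, partition_path_field,
                          key_generator_class,
                          hive_style=False, url_encode=False):
    """Collect the common key generator configs into one dictionary."""
    return {
        # Assumed option names; check them against your Hudi version.
        "hoodie.datasource.write.recordkey.field": record_key_field,
        "hoodie.datasource.write.partitionpath.field": partition_path_field,
        # Full class path of a built-in or user-defined key generator.
        "hoodie.datasource.write.keygenerator.class": key_generator_class,
        "hoodie.datasource.write.hive_style_partitioning": str(hive_style).lower(),
        "hoodie.datasource.write.partitionpath.urlencode": str(url_encode).lower(),
    }


options = key_generator_options(
    "id", "day",
    "org.apache.hudi.keygen.SimpleKeyGenerator",  # assumed class path
    hive_style=True,
)
for name, value in options.items():
    print(f"{name} = {value}")

# With PySpark available, a dictionary like this could be passed along the
# lines of: df.write.format("hudi").options(**options)
```

Boolean settings are rendered as lowercase strings because write options are passed as string key-value pairs.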

Every record in Hudi is uniquely identified by a primary key, which is a pair of record key and the partition path where the record belongs. Using primary keys, Hudi can a) impose a partition-level uniqueness integrity constraint and b) enable fast updates and deletes on records. One should choose the partitioning scheme wisely, as it could be a determining factor for your ingestion and query latency.

In general, Hudi supports both partitioned and global indexes. For a dataset with a partitioned index (which is most commonly used), each record is uniquely identified by a pair of record key and partition path. With a global index, each record is uniquely identified by just the record key, so there won't be any duplicate record keys across partitions.

Hudi provides several key generators out of the box that users can use based on their need, while having a pluggable implementation for users to implement and use their own KeyGenerator.
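The difference between the two index types comes down to what counts as a record's identity. The toy Python sketch below models a table as a dictionary to show this; `upsert` and its arguments are hypothetical, and keeping only the latest write per identity is a simplification made for the example, not a description of how Hudi resolves updates.

```python
def upsert(table, records, global_index=False):
    """Write records into a dict keyed by each record's identity."""
    for record_key, partition_path, payload in records:
        if global_index:
            # Global index: the record key alone identifies the record.
            identity = record_key
        else:
            # Partitioned index: record key plus partition path.
            identity = (record_key, partition_path)
        # Simplification: the latest write for an identity wins.
        table[identity] = (partition_path, payload)
    return table


writes = [
    ("id1", "2020/01/06", "first version"),
    ("id1", "2020/01/07", "second version"),
]

partitioned = upsert({}, writes)
global_table = upsert({}, writes, global_index=True)
print(len(partitioned))   # two entries, one per partition
print(len(global_table))  # one entry, keyed by record key only
```

Under the partitioned index the same record key can exist once per partition, while under the global index the second write replaces the first.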
