DISTRIBUTE

DISTRIBUTE(recordset [, UNORDERED | ORDERED( bool ) ] [, STABLE | UNSTABLE ] [, PARALLEL [ ( numthreads ) ] ] [, ALGORITHM( name ) ] )

DISTRIBUTE(recordset, expression [, MERGE( sorts ) ] [, UNORDERED | ORDERED( bool ) ] [, STABLE | UNSTABLE ] [, PARALLEL [ ( numthreads ) ] ] [, ALGORITHM( name ) ] )

DISTRIBUTE(recordset, index [, joincondition ] [, UNORDERED | ORDERED( bool ) ] [, STABLE | UNSTABLE ] [, PARALLEL [ ( numthreads ) ] ] [, ALGORITHM( name ) ] )

DISTRIBUTE(recordset, SKEW( maxskew [, skewlimit ] ) [, UNORDERED | ORDERED( bool ) ] [, STABLE | UNSTABLE ] [, PARALLEL [ ( numthreads ) ] ] [, ALGORITHM( name ) ] )

recordsetThe set of records to distribute.
expressionAn integer expression that specifies how to distribute the recordset, usually using one the HASH functions for efficiency.
MERGESpecifies the data is redistributed maintaining the local sort order on each node.
sortsThe sort expressions by which the data has been locally sorted.
indexThe name of an INDEX attribute definition, which provides the appropriate distribution.
joinconditionOptional. A logical expression that specifies how to link the records in the recordset and the index. The keywords LEFT and RIGHT may be used as dataset qualifiers for fields in the recordset and index.
SKEWSpecifies the allowable data skew values.
maxskewA value between zero (0) and one (1.0 = 100%) indicating the maximum percentage of skew to allow before the job fails (the default skew is 1.0 / <number of worker nodes on cluster>).
skewlimitOptional. A value between zero (0) and one (1.0 = 100%) indicating the desired maximum percentage of skew to allow (the default skew is 1.0 / <number of worker nodes on cluster>).
UNORDEREDOptional. Specifies the output record order is not significant.
ORDEREDSpecifies the significance of the output record order.
boolWhen False, specifies the output record order is not significant. When True, specifies the default output record order.
STABLEOptional. Specifies the input record order is significant.
UNSTABLEOptional. Specifies the input record order is not significant.
PARALLELOptional. Try to evaluate this activity in parallel.
numthreadsOptional. Try to evaluate this activity using numthreads threads.
ALGORITHMOptional. Override the algorithm used for this activity.
nameThe algorithm to use for this activity. Must be from the list of supported algorithms for the SORT function's STABLE and UNSTABLE options.
Return:DISTRIBUTE returns a set of records.

The DISTRIBUTE function re-distributes records from the recordset across all the nodes of the cluster.

"Random" DISTRIBUTE

DISTRIBUTE(recordset )

This form redistributes the recordset "randomly" so there is no data skew across nodes, but without the disadvantages the RANDOM() function could introduce. This is functionally equivalent to distributing by a hash of the entire record.