BUILD


[attrname := ] BUILD(baserecset, [ indexrec ] , indexfile [, options ] );

[attrname := ] BUILD(baserecset, keys, payload, indexfile [, options ] );

[attrname := ] BUILD( indexdef [,indexfile] [, options ] );

[attrname := ] BUILD( indexdef, dataset [, options ] );

BUILD( library );

attrname	Optional. The action name, which turns the action into an attribute definition; the action is therefore not executed until the attrname is used as an action.
baserecset	The set of data records for which the index file will be created. This may be a record set derived from the base data with the key fields and file position.
indexrec	Optional. The RECORD structure of the fields in the indexfile that contains key and file position information for referencing into the baserecset. Field names and types must match the baserecset fields (REAL and DECIMAL value type fields are not supported). This may also contain additional fields not present in the baserecset (computed fields). If omitted, all fields in the baserecset are used. The last field must be the name of an UNSIGNED8 field defined using the {VIRTUAL(fileposition)} field modifier in the DATASET declaration of the baserecset.
keys	The RECORD structure of key fields that reference into the baserecset (the "search terms" for the INDEX). Key fields may be baserecset fields or computed fields. REAL and DECIMAL types are not supported as "search term" fields. If omitted, all fields in the baserecset are used. This RECORD structure is typically defined inline within the BUILD using curly braces ({}), but may also be a separately defined RECORD structure. If the RECORD structure is separately defined it must meet the same requirements as used by the TABLE() function (the RECORD structure must define the type, name, and source of the data for each field), otherwise the BUILD action will not syntax check.
payload	The RECORD structure of the indexfile that contains additional fields not used as "search term" keys. This may contain fields from the baserecset and/or computed fields. If the name of the baserecset is in this structure, it specifies that "all other fields not already named in the keys parameter" are added. The payload fields do not take up space in the non-leaf nodes of the index and cannot be referenced in a KEYED() filter clause. Any field with the {BLOB} modifier (to allow more than 32K of data per index entry) is stored within the indexfile, but not with the rest of the record; accessing the BLOB data requires an additional seek. This RECORD structure is typically defined inline within the BUILD using curly braces ({}), but may also be a separately defined RECORD structure. If the RECORD structure is separately defined it must meet the same requirements as used by the TABLE() function (the RECORD structure must define the type, name, and source of the data for each field), otherwise the BUILD action will not syntax check.
indexfile	A string constant containing the logical filename of the index to produce. See the Scope & Logical Filenames article for more on logical filenames.
options	Optional. One or more of the options listed below.
indexdef	The name of the INDEX attribute to build.
dataset	The name of the DATASET to use when you omit the base dataset parameter from the INDEX definition.
library	The name of a MODULE attribute with the LIBRARY option.
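
The keys and payload parameters described above can be sketched with separately defined RECORD structures. This is a minimal illustration, not a definitive implementation: the logical filenames, the Vehicles layout, and the attribute names SearchTerms and Payload are all assumptions made for the example.

```ecl
// Hypothetical base dataset; the filename and field layout are assumptions.
Vehicles := DATASET('~example::vehicles',
                    {STRING2   st,
                     STRING20  city,
                     STRING20  lname,
                     UNSIGNED8 filepos{VIRTUAL(fileposition)}},
                    FLAT);

// keys: the "search term" fields. Each entry names its source field,
// which supplies the type, name, and source as TABLE() requires.
SearchTerms := RECORD
  Vehicles.st;
  Vehicles.city;
END;

// payload: additional fields that are not search terms.
Payload := RECORD
  Vehicles.lname;
END;

// Second form: BUILD(baserecset, keys, payload, indexfile [, options])
BUILD(Vehicles, SearchTerms, Payload,
      '~example::key::vehicles::st.city', OVERWRITE);
```

Because lname is a payload field, it can be returned from the index but cannot be used in a KEYED() filter.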

The first four forms of the BUILD action create index files. Indexes are automatically compressed, minimizing overhead associated with using indexed record access. The keyword BUILDINDEX may be used in place of BUILD in these forms.
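
As a sketch of the indexdef form together with the optional attrname, the following defines an INDEX, names the BUILD action, and then executes it. The dataset, filenames, and attribute names are hypothetical and chosen only for illustration.

```ecl
// Hypothetical base dataset; the filename and field layout are assumptions.
Vehicles := DATASET('~example::vehicles',
                    {STRING2   st,
                     STRING20  city,
                     STRING20  lname,
                     UNSIGNED8 filepos{VIRTUAL(fileposition)}},
                    FLAT);

// INDEX definition that supplies the baserecset, keys, payload, and indexfile.
VehicleKey := INDEX(Vehicles, {st, city}, {lname},
                    '~example::key::vehicles::st.city');

// Giving the action a name makes it an attribute definition,
// so nothing is built at this point.
BuildVehicleKey := BUILD(VehicleKey, OVERWRITE);

// The index is built only when the named attribute is used as an action.
BuildVehicleKey;
```

Naming the action is useful when the build should run as one step of a larger job rather than at the point of definition.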

The fifth form creates an external query library--a workunit that implements the specified library. This is similar to creating a .DLL in Windows programming, or a .SO in Linux.
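
A rough sketch of the library form follows, assuming a MODULE defined with the LIBRARY option against an INTERFACE. Every name here (NamesRec, FilterLibIface, FilterDsLib, and the workunit name) is hypothetical and not taken from this page.

```ecl
NamesRec := RECORD
  INTEGER1 NameID;
  STRING20 FName;
  STRING20 LName;
END;

// Interface that the library implements (hypothetical).
FilterLibIface(DATASET(NamesRec) ds, STRING search) := INTERFACE
  EXPORT DATASET(NamesRec) matches;
  EXPORT DATASET(NamesRec) others;
END;

// MODULE attribute carrying the LIBRARY option.
FilterDsLib(DATASET(NamesRec) ds, STRING search) := MODULE, LIBRARY(FilterLibIface)
  EXPORT matches := ds(LName = search);
  EXPORT others  := ds(LName != search);
END;

// Name the workunit so the library can be identified (assumed name),
// then build the workunit that implements the library.
#WORKUNIT('name', 'Example.FilterDsLib');
BUILD(FilterDsLib);
```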

Index BUILD Options

The following options are available only on the four index forms of BUILD:

[, CLUSTER( target ) ] | [, PLANE( targetPlane ) ] [, SORTED ] [, DISTRIBUTE( key ) [, MERGE ] ] [, DATASET( basedataset ) ] [, OVERWRITE ] [, UPDATE ] [, EXPIRE( [ days ] ) ] [, FEW ] [, FILEPOSITION( flag ) ] [, LOCAL ] [, NOROOT ] [, DISTRIBUTED ] [, COMPRESSED( option ) ] [, WIDTH( nodes ) ] [, DEDUP ] [, SKEW( limit [, target ] ) [, THRESHOLD( size ) ] ] [, MAXLENGTH [ ( value ) ] ] [, UNORDERED | ORDERED( bool ) ] [, STABLE | UNSTABLE ] [, PARALLEL [ ( numthreads ) ] ] [, ALGORITHM( name ) ] [, SET( option, value ) ]

CLUSTER	Specifies writing the indexfile to the specified list of target clusters. If omitted, the indexfile is written to the cluster on which the workunit executes. The number of physical file parts written to disk is always determined by the number of nodes in the cluster on which the workunit executes, regardless of the number of nodes on the target cluster(s) unless the WIDTH option is also specified. Use this option for bare-metal deployments.
target	A comma-delimited list of string constants containing the names of the clusters to write the indexfile to. The names must be listed as they appear on the ECL Watch Activity page or returned by the Std.System.Thorlib.Group() function, optionally with square brackets containing a comma-delimited list of node-numbers (1-based) and/or ranges (specified with a dash, as in n-m) to indicate the specific set of nodes to write to.
PLANE	Specifies writing the indexfile to the specified list of target planes. If omitted, the indexfile is written to the default plane. Planes are used by containerized systems, but since bare-metal clusters are implicitly backed with a plane with the same name, you can use PLANE('clustername') for bare-metal deployments.
targetPlane	A comma-delimited list of string constants containing the names of the plane(s) to write the indexfile to. The targetPlane names must be listed as they are defined in the deployment.
SORTED	Specifies that the baserecset is already sorted, implying that the automatic sort based on all the indexrec fields is not required before the index is created.
DISTRIBUTE	Specifies building the indexfile based on the distribution of the key.
key	The name of an existing INDEX attribute definition.
MERGE	Optional. Specifies merging the resulting index into the specified key.
DATASET	This is only needed when the baserecset is the result of an operation (such as a JOIN) whose result makes it ambiguous as to which physical dataset is being indexed (in other words, use this option only when you receive an error that it cannot be deduced). Naming the basedataset ensures that the proper record links are used in the index.
basedataset	The name of the DATASET attribute from which the baserecset is derived.
OVERWRITE	Specifies overwriting the indexfile if it already exists.
UPDATE	Specifies that the file should be rewritten only if the code or input data has changed.
EXPIRE	Optional. Specifies the file is a temporary file that may be automatically deleted after the specified number of days since the file was read.
days	Optional. The number of days from last file read after which the file may be automatically deleted. If omitted, the default is seven (7).
FILEPOSITION	Optional. If flag is FALSE, prevents the implicit fileposition field from being created and will not treat a trailing integer field any differently from the rest of the payload.
flag	Optional. TRUE or FALSE, indicating whether or not to create the implicit fileposition field.
FEW	Specifies the indexfile is created as a single one-part file. Used only for small datasets (typically lookup-type files, such as 2-character state codes). This option is now deprecated in favor of WIDTH(1).
indexdef	The name of an existing INDEX attribute definition that provides the baserecset, indexrec, and indexfile parameters to use.
LOCAL	Specifies the operation is performed on each supercomputer node independently, without requiring interaction with all other nodes to acquire data; the operation maintains the distribution of any previous DISTRIBUTE function.
NOROOT	Specifies that the index is not globally sorted, and there is no root index to indicate which part of the index will contain a particular entry. This may be useful in Roxie queries in conjunction with ALLNODES use.
DISTRIBUTED	Specifies both the LOCAL and NOROOT options (congruent with the DISTRIBUTED option on an INDEX declaration, which specifies the index was built with the LOCAL and NOROOT options).
COMPRESSED	Optional. Specifies the index should be compressed using the type of compression specified. If omitted, the default is LZW, a variant of the Lempel-Ziv-Welch algorithm.
option	See Indexes and Compression for options.
WIDTH	Specifies writing the indexfile to a different number of physical file parts than the number of nodes in the cluster on which the workunit executes. If omitted, the default is the number of nodes in the cluster on which the workunit executes. This option is primarily to create indexes on a large Thor that are destined to be deployed to a smaller Roxie (making the Roxie queries more efficient).
nodes	The number of physical file parts to write. If set to one (1), this operates exactly the same as the FEW option, above.
DEDUP	Specifies that duplicate entries are eliminated from the INDEX.
SKEW	Indicates that you know the data will not be spread evenly across nodes (it will be skewed) and that you choose to override the default by specifying your own limit value, allowing the job to continue despite the skew.
limit	A value between zero (0) and one (1.0 = 100%) indicating the maximum percentage of skew to allow before the job fails (the default skew is 1.0 / <number of worker nodes on cluster>).
target	Optional. A value between zero (0) and one (1.0 = 100%) indicating the desired maximum percentage of skew to allow (the default skew is 1.0 / <number of worker nodes on cluster>).
THRESHOLD	Indicates the minimum size for a single part before the SKEW limit is enforced.
size	An integer value indicating the minimum number of bytes for a single part. Default is 1GB.
MAXLENGTH	Optional. This option is used to create indexes that are backward compatible for platform versions prior to 3.0. Specifies the maximum length of a variable-length index record. Fixed length records always use the minimum size required. If the default maximum length causes inefficiency problems, it can be explicitly overridden.
value	Optional. An integer value indicating the maximum length. If omitted, the maximum size is calculated from the record structure. Variable-length records that do not specify MAXLENGTH may be slightly inefficient.
UNORDERED	Optional. Specifies the output record order is not significant.
ORDERED	Specifies the significance of the output record order.
bool	When False, specifies the output record order is not significant. When True, specifies the default output record order.
STABLE	Optional. Specifies the input record order is significant.
UNSTABLE	Optional. Specifies the input record order is not significant.
PARALLEL	Optional. Try to evaluate this activity in parallel.
numthreads	Optional. Try to evaluate this activity using numthreads threads.
ALGORITHM	Optional. Override the algorithm used for this activity.
name	The algorithm to use for this activity. Must be from the list of supported algorithms for the SORT function's STABLE and UNSTABLE options.
SET	Optional. SET is used to set a value to a named metadata option. This allows you to set user metadata whose use and purpose is up to the developer. Currently _nodeSize is the only system-defined metadata, though other names starting with an underscore (_) should be considered reserved for system use. You may want to use SET('_nodeSize', '32768') if your hardware and usage pattern work better with larger page sizes. The default (8192) may not be optimal for all scenarios on modern hardware. We recommend using a power of 2 and not smaller than 8k.
option	A case sensitive string constant containing the name of the option to set.
value	The value to set the option to. This may be any type of value, dependent on what the option expects to be.
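
Several of the options above can be combined on one BUILD, as in this minimal sketch. The dataset, filenames, and option values are assumptions chosen for illustration; the _nodeSize setting repeats the value mentioned in the SET description and is not a general recommendation.

```ecl
// Hypothetical base dataset; the filename and field layout are assumptions.
Vehicles := DATASET('~example::vehicles',
                    {STRING2   st,
                     STRING20  city,
                     STRING20  lname,
                     UNSIGNED8 filepos{VIRTUAL(fileposition)}},
                    FLAT);

VehicleKey := INDEX(Vehicles, {st, city}, {lname},
                    '~example::key::vehicles::st.city');

BUILD(VehicleKey,
      OVERWRITE,                   // replace the indexfile if it already exists
      WIDTH(1),                    // write a single physical file part
      EXPIRE(30),                  // eligible for deletion 30 days after last read
      SET('_nodeSize', '32768'));  // larger index page size than the 8192 default
```

WIDTH(1) is used here in place of the deprecated FEW option.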
