[attrname := ] BUILD(baserecset, [ indexrec ] , indexfile [, options ] );
[attrname := ] BUILD(baserecset, keys, payload, indexfile [, options ] );
[attrname := ] BUILD( indexdef [,indexfile] [, options ] );
[attrname := ] BUILD( indexdef, dataset [, options ] );
BUILD( library );
attrname | Optional. The action name, which turns the action into an attribute definition, so it is not executed until the attrname is used as an action. |
baserecset | The set of data records for which the index file will be created. This may be a record set derived from the base data with the key fields and file position. |
indexrec | Optional. The RECORD structure of the fields in the indexfile that contains key and file position information for referencing into the baserecset. Field names and types must match the baserecset fields (REAL and DECIMAL value type fields are not supported). This may also contain additional fields not present in the baserecset (computed fields). If omitted, all fields in the baserecset are used. The last field must be the name of an UNSIGNED8 field defined using the {VIRTUAL(fileposition)} field modifier in the DATASET declaration of the baserecset. |
keys | The RECORD structure of key fields that reference into the baserecset (the "search terms" for the INDEX). Key fields may be baserecset fields or computed fields. REAL and DECIMAL types are not supported as "search term" fields. If omitted, all fields in the baserecset are used. This RECORD structure is typically defined inline within the BUILD using curly braces ({}), but may also be a separately defined RECORD structure. If the RECORD structure is separately defined it must meet the same requirements as used by the TABLE() function (the RECORD structure must define the type, name, and source of the data for each field), otherwise the BUILD action will not syntax check. |
payload | The RECORD structure of the indexfile that contains additional fields not used as "search term" keys. This may contain fields from the baserecset and/or computed fields. If the name of the baserecset is in this structure, it specifies that all other fields not already named in the keys parameter are added. The payload fields do not take up space in the non-leaf nodes of the index and cannot be referenced in a KEYED() filter clause. Any field with the {BLOB} modifier (to allow more than 32K of data per index entry) is stored within the indexfile, but not with the rest of the record; accessing the BLOB data requires an additional seek. This RECORD structure is typically defined inline within the BUILD using curly braces ({}), but may also be a separately defined RECORD structure. If the RECORD structure is separately defined it must meet the same requirements as used by the TABLE() function (the RECORD structure must define the type, name, and source of the data for each field), otherwise the BUILD action will not syntax check. |
indexfile | A string constant containing the logical filename of the index to produce. See the Scope & Logical Filenames article for more on logical filenames. |
options | Optional. One or more of the options listed below. |
indexdef | The name of the INDEX attribute to build. |
dataset | The name of the DATASET to use when you omit the base dataset parameter from the INDEX definition. |
library | The name of a MODULE attribute with the LIBRARY option. |
The first four forms of the BUILD action create index files. Indexes are automatically compressed, minimizing overhead associated with using indexed record access. The keyword BUILDINDEX may be used in place of BUILD in these forms.
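As a rough illustration of the index-building forms, the sketch below builds three indexes over a hypothetical Vehicles dataset. The dataset layout, the attribute names, and the logical filenames (everything under `~demo::`) are assumptions made for this example only. The fourth form (indexdef plus a separate dataset) is not shown.

```ecl
// Hypothetical base dataset; layout and logical filenames are illustrative only.
Vehicles := DATASET('~demo::vehicles',
                    {STRING2   st,
                     STRING20  city,
                     STRING20  lname,
                     UNSIGNED8 filepos{VIRTUAL(fileposition)}},
                    FLAT);

// Form 1: baserecset, indexrec, indexfile.
// The last indexrec field is the virtual file position field.
BuildByName := BUILD(Vehicles, {lname, filepos},
                     '~demo::key::vehicles::lname', OVERWRITE);

// Form 2: baserecset, keys, payload, indexfile.
// st and city are the search terms; lname is carried as payload.
BuildByCity := BUILD(Vehicles, {st, city}, {lname},
                     '~demo::key::vehicles::st.city', OVERWRITE);

// Form 3: build from an existing INDEX attribute definition.
VehicleKey   := INDEX(Vehicles, {st, city}, {lname},
                      '~demo::key::vehicles::st.city.v2');
BuildFromDef := BUILD(VehicleKey, OVERWRITE);

// Naming each BUILD makes it a definition; nothing runs until it is used as an action.
SEQUENTIAL(BuildByName, BuildByCity, BuildFromDef);
```

Defining the index once with INDEX and building it with the third form keeps the layout and filename in a single place, so queries that read the index and the job that builds it cannot drift apart.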
The fifth form creates an external query library--a workunit that implements the specified library. This is similar to creating a .DLL in Windows programming, or a .SO in Linux.
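A minimal sketch of the library form follows, assuming a hypothetical interface and module (NamesRec, FilterLibIface, FilterDsLib) and a hypothetical workunit name. None of these names come from this reference.

```ecl
// Hypothetical record layout used by the library.
NamesRec := RECORD
  INTEGER1 NameID;
  STRING20 FName;
  STRING20 LName;
END;

// Interface that the library implements.
FilterLibIface(DATASET(NamesRec) ds, STRING search) := INTERFACE
  EXPORT DATASET(NamesRec) matches;
  EXPORT DATASET(NamesRec) others;
END;

// MODULE attribute with the LIBRARY option: this is what BUILD expects.
FilterDsLib(DATASET(NamesRec) ds, STRING search) := MODULE, LIBRARY(FilterLibIface)
  EXPORT matches := ds(LName = search);
  EXPORT others  := ds(LName != search);
END;

// Name the workunit so that queries can refer to the library by this name.
#WORKUNIT('name', 'Demo.FilterDsLib');
BUILD(FilterDsLib);
```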
The following options are available on all four index forms of BUILD (only):
[, CLUSTER( target )] [, SORTED] [, DISTRIBUTE( key ) [, MERGE ] ] [, DATASET( basedataset )] [, OVERWRITE] [, UPDATE] [, EXPIRE( [ days ] )] [, FEW] [, FILEPOSITION( flag )] [, LOCAL] [, NOROOT] [, DISTRIBUTED] [, COMPRESSED( option )] [, WIDTH( nodes )] [, DEDUP] [, SKEW( limit [, target ] ) [, THRESHOLD( size ) ] ] [, MAXLENGTH [ ( value ) ] ] [, UNORDERED | ORDERED( bool )] [, STABLE | UNSTABLE] [, PARALLEL [ ( numthreads ) ] ] [, ALGORITHM( name )] [, SET( option, value )]
CLUSTER | Specifies writing the indexfile to the specified list of target clusters. If omitted, the indexfile is written to the cluster on which the workunit executes. The number of physical file parts written to disk is always determined by the number of nodes in the cluster on which the workunit executes, regardless of the number of nodes on the target cluster(s) unless the WIDTH option is also specified. |
target | A comma-delimited list of string constants containing the names of the clusters to write the indexfile to. The names must be listed as they appear on the ECL Watch Activity page or returned by the Std.System.Thorlib.Group() function, optionally with square brackets containing a comma-delimited list of node-numbers (1-based) and/or ranges (specified with a dash, as in n-m) to indicate the specific set of nodes to write to. |
SORTED | Specifies that the baserecset is already sorted, implying that the automatic sort based on all the indexrec fields is not required before the index is created. |
DISTRIBUTE | Specifies building the indexfile based on the distribution of the key. |
key | The name of an existing INDEX attribute definition. |
MERGE | Optional. Specifies merging the resulting index into the specified key. |
DATASET | This is only needed when the baserecset is the result of an operation (such as a JOIN) whose result makes it ambiguous as to which physical dataset is being indexed (in other words, use this option only when you receive an error that it cannot be deduced). Naming the basedataset ensures that the proper record links are used in the index. |
basedataset | The name of the DATASET attribute from which the baserecset is derived. |
OVERWRITE | Specifies overwriting the indexfile if it already exists. |
UPDATE | Specifies that the file should be rewritten only if the code or input data has changed. |
EXPIRE | Optional. Specifies the file is a temporary file that may be automatically deleted after the specified number of days since the file was read. |
FILEPOSITION | Optional. If flag is FALSE, prevents the implicit fileposition field from being created and will not treat a trailing integer field any differently from the rest of the payload. |
flag | Optional. TRUE or FALSE, indicating whether or not to create the implicit fileposition field. |
days | Optional. The number of days from last file read after which the file may be automatically deleted. If omitted, the default is seven (7). |
FEW | Specifies the indexfile is created as a single one-part file. Used only for small datasets (typically lookup-type files, such as 2-character state codes). This option is now deprecated in favor of using WIDTH(1). |
indexdef | The name of an existing INDEX attribute definition that provides the baserecset, indexrec, and indexfile parameters to use. |
LOCAL | Specifies the operation is performed on each supercomputer node independently, without requiring interaction with all other nodes to acquire data; the operation maintains the distribution of any previous DISTRIBUTE function. |
NOROOT | Specifies that the index is not globally sorted, and there is no root index to indicate which part of the index will contain a particular entry. This may be useful in Roxie queries in conjunction with ALLNODES use. |
DISTRIBUTED | Specifies both the LOCAL and NOROOT options (congruent with the DISTRIBUTED option on an INDEX declaration, which specifies the index was built with the LOCAL and NOROOT options). |
COMPRESSED | Optional. Specifies the index should be compressed using the type of compression specified. If omitted, the default is LZW, a variant of the Lempel-Ziv-Welch algorithm. |
option | See Indexes and Compression for options. |
WIDTH | Specifies writing the indexfile to a different number of physical file parts than the number of nodes in the cluster on which the workunit executes. If omitted, the default is the number of nodes in the cluster on which the workunit executes. This option is primarily to create indexes on a large Thor that are destined to be deployed to a smaller Roxie (making the Roxie queries more efficient). |
nodes | The number of physical file parts to write. If set to one (1), this operates exactly the same as the FEW option, above. |
DEDUP | Specifies that duplicate entries are eliminated from the INDEX. |
SKEW | Indicates that you know the data will not be spread evenly across nodes (will be skewed) and you choose to override the default by specifying your own limit value to allow the job to continue despite the skewing. |
limit | A value between zero (0) and one (1.0 = 100%) indicating the maximum percentage of skew to allow before the job fails (the default skew is 1.0 / <number of worker nodes on cluster>). |
target | Optional. A value between zero (0) and one (1.0 = 100%) indicating the desired maximum percentage of skew to allow (the default skew is 1.0 / <number of worker nodes on cluster>). |
THRESHOLD | Indicates the minimum size for a single part before the SKEW limit is enforced. |
size | An integer value indicating the minimum number of bytes for a single part. Default is 1GB. |
MAXLENGTH | Optional. This option is used to create indexes that are backward compatible with platform versions prior to 3.0. Specifies the maximum length of a variable-length index record. Fixed length records always use the minimum size required. If the default maximum length causes inefficiency problems, it can be explicitly overridden. |
value | Optional. An integer value indicating the maximum length. If omitted, the maximum size is calculated from the record structure. Variable-length records that do not specify MAXLENGTH may be slightly inefficient. |
UNORDERED | Optional. Specifies the output record order is not significant. |
ORDERED | Specifies the significance of the output record order. |
bool | When False, specifies the output record order is not significant. When True, specifies the default output record order. |
STABLE | Optional. Specifies the input record order is significant. |
UNSTABLE | Optional. Specifies the input record order is not significant. |
PARALLEL | Optional. Try to evaluate this activity in parallel. |
numthreads | Optional. Try to evaluate this activity using numthreads threads. |
ALGORITHM | Optional. Override the algorithm used for this activity. |
name | The algorithm to use for this activity. Must be from the list of supported algorithms for the SORT function's STABLE and UNSTABLE options. |
SET | Optional. SET is used to set a value to a named metadata option. This allows you to set user metadata whose use and purpose is up to the developer. Currently _nodeSize is the only system-defined metadata, though other names starting with an underscore (_) should be considered reserved for system use. You may want to use SET('_nodeSize', '32768') if your hardware and usage pattern work better with larger page sizes. The default (8192) may not be optimal for all scenarios on modern hardware. We recommend using a power of 2 and not smaller than 8k. |
option | A case sensitive string constant containing the name of the option to set. |
value | The value to set the option to. This may be any type of value, dependent on what the option expects to be. |
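The sketch below shows how a few of these options might be combined in practice. The Vehicles dataset, the index definition, and the logical filenames are hypothetical, and the option values (30 days, a node size of 32768) are illustrative choices rather than recommendations.

```ecl
// Hypothetical base dataset and index definition, for illustration only.
Vehicles := DATASET('~demo::vehicles',
                    {STRING2   st,
                     STRING20  city,
                     STRING20  lname,
                     UNSIGNED8 filepos{VIRTUAL(fileposition)}},
                    FLAT);

VehicleKey := INDEX(Vehicles, {st, city}, {lname},
                    '~demo::key::vehicles::st.city');

// Rewrite the index only if the code or input data has changed, and allow
// automatic deletion 30 days after the file was last read.
RefreshKey := BUILD(VehicleKey, UPDATE, EXPIRE(30));

// Write a single-part index (WIDTH(1) replaces the deprecated FEW option)
// and request a larger node size through the _nodeSize metadata option.
OnePartKey := BUILD(Vehicles, {st, city}, {lname},
                    '~demo::key::vehicles::onepart',
                    OVERWRITE, WIDTH(1), SET('_nodeSize', '32768'));

// DISTRIBUTED is shorthand for LOCAL plus NOROOT: each node indexes only
// its own records and no root index is produced.
LocalKey := BUILD(Vehicles, {lname, filepos},
                  '~demo::key::vehicles::lname.local',
                  OVERWRITE, DISTRIBUTED);

SEQUENTIAL(RefreshKey, OnePartKey, LocalKey);
```

An index built with DISTRIBUTED would normally be declared with the matching DISTRIBUTED option on the INDEX definition that reads it.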