attr := DATASET( file, struct, filetype [,LOOKUP]);
attr := DATASET( dataset, file, filetype [,LOOKUP]);
attr := DATASET( WORKUNIT( [ wuid , ] namedoutput ), struct );
[ attr := ] DATASET( recordset [, recstruct ] );
DATASET( row )
DATASET( childstruct [, COUNT( count ) | LENGTH( size ) ] [, CHOOSEN( maxrecs ) ] )
[GROUPED] [LINKCOUNTED] [STREAMED] DATASET( struct )
DATASET( dict )
DATASET( count, transform [, DISTRIBUTED | LOCAL ] )
attr | The name of the DATASET for later use in other definitions. |
file | A string constant containing the logical file name. See the Scope & Logical Filenames section for more on logical filenames. |
struct | The RECORD structure defining the layout of the fields. This may use RECORDOF. |
filetype | One of the following keywords, optionally followed by relevant options for that specific type of file: THOR/FLAT, CSV, XML, JSON, PIPE. Each of these is discussed in its own section, below. |
dataset | A previously-defined DATASET or recordset from which the record layout is derived. This form is primarily used by the BUILD action and is equivalent to: ds := DATASET('filename',RECORDOF(anotherdataset), ... ) |
LOOKUP | Optional. Specifies that the file layout should be looked up at compile time. See File Layout Resolution at Compile Time in the Programmer's Guide for more details. |
WORKUNIT | Specifies the DATASET is the result of an OUTPUT with the NAMED option within the same or another workunit. |
wuid | Optional. A string expression that specifies the workunit identifier of the job that produced the NAMED OUTPUT. |
namedoutput | A string expression that specifies the name given in the NAMED option. |
recordset | A set of in-line data records. This can simply name a previously-defined set definition or explicitly use square brackets to indicate an in-line set definition. Within the square brackets records are separated by commas. The records are specified by either: 1) Using curly braces ({}) to surround the field values for each record. The field values within each record are comma-delimited. 2) A comma-delimited list of in-line transform functions that produce the data rows. All the transform functions in the list must produce records in the same result format. |
recstruct | Optional. The RECORD structure of the recordset. This may be omitted only if the recordset parameter is just one record or a list of in-line transform functions. |
row | A single data record. This may be a single-record passed parameter, or the ROW or PROJECT function that defines a 1-row dataset. |
childstruct | The RECORD structure of the child records being defined. This may use the RECORDOF function. |
COUNT | Optional. Specifies the number of child records attached to the parent (for use when interfacing to external file formats). |
count | An expression defining the number of child records. This may be a constant or a field in the enclosing RECORD structure (addressed as SELF.fieldname). |
LENGTH | Optional. Specifies the size of the child records attached to the parent (for use when interfacing to external file formats). |
size | An expression defining the size of child records. This may be a constant or a field in the enclosing RECORD structure (addressed as SELF.fieldname). |
CHOOSEN | Optional. Limits the number of child records attached to the parent. This implicitly uses the CHOOSEN function wherever the child dataset is read. |
maxrecs | An expression defining the maximum number of child records for a single parent. |
GROUPED | Specifies the DATASET being passed has been grouped using the GROUP function. |
LINKCOUNTED | Specifies the DATASET being passed or returned uses the link counted format (each row is stored as a separate memory allocation) instead of the default (embedded) format where the rows of a dataset are all stored in a single block of memory. This is primarily for use in BEGINC++ functions or external C++ library functions. |
STREAMED | Specifies the DATASET being returned is returned as a pointer to an IRowStream interface (see the eclhelper.hpp include file for the definition). Valid only as a return type. This is primarily for use in BEGINC++ functions or external C++ library functions. |
struct | The RECORD structure of the dataset field or parameter. This may use the RECORDOF function. |
dict | The name of a DICTIONARY definition. |
count | An integer expression specifying the number of records to create. |
transform | The TRANSFORM function that will create the records. This may take an integer COUNTER parameter. |
DISTRIBUTED | Optional. Specifies distributing the created records across all nodes of the cluster. If omitted, all records are created on node 1. |
LOCAL | Optional. Specifies records are created on every node. |
The DATASET declaration defines a file of records, on disk or in memory. The layout of the records is specified by a RECORD structure (the struct or recstruct parameters described above). In general, the distribution of records across execution nodes is undefined, because it depends on how the DATASET came to be (sprayed in from a landing zone or written to disk by an OUTPUT action), the size of the cluster on which it resides, and the size of the cluster on which it is used. To specify distribution requirements for a particular operation, see the DISTRIBUTE function.
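As a rough illustration of imposing a known distribution before an operation that needs one, the sketch below redistributes a small in-line dataset by a hash of one field. The layout Rec and the names ds and distDs are hypothetical and exist only for this example.

```ecl
// Hypothetical layout and data, for illustration only
Rec := {UNSIGNED4 id, STRING10 name};
ds  := DATASET([{1,'Ann'},{2,'Bob'},{3,'Cy'}], Rec);

// Redistribute so that records with the same id land on the same node
distDs := DISTRIBUTE(ds, HASH32(id));

OUTPUT(distDs);
```

DISTRIBUTE changes only where the records reside, not which records exist, so the record count is the same before and after.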
The first two forms are alternatives to each other and either may be used with any of the filetypes described below (THOR/FLAT, CSV, XML, JSON, PIPE).
The third form defines the result of an OUTPUT with the NAMED option within the same workunit or the workunit specified by the wuid (see Named Output DATASETs below).
The fourth form defines an in-line dataset (see In-line DATASETs below).
The fifth form is only used in an expression context to allow you to in-line a single-record dataset (see Single-row DATASET Expressions below).
The sixth form is only used as a value type in a RECORD structure to define a child dataset (see Child DATASETs below).
The seventh form is only used as a value type to pass DATASET parameters (see DATASET as a Parameter Type below).
The eighth form is used to define a DICTIONARY as a DATASET (see DATASET from DICTIONARY below).
The ninth form is used to create a DATASET using a TRANSFORM function (see DATASET from TRANSFORM below).
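The forms that do not read a file from disk can be tried without any existing data. The following is a minimal sketch of the fourth through ninth forms. All names in it (NameRec, ChildRec, ParentRec, MakeRec, CountRows, and the data values) are hypothetical and chosen only for illustration.

```ecl
// Hypothetical layout used throughout this sketch
NameRec := RECORD
  UNSIGNED1 id;
  STRING10  name;
END;

// Fourth form: in-line records, each surrounded by curly braces
ds1 := DATASET([{1,'Ann'},{2,'Bob'}], NameRec);

// Fourth form: a list of in-line TRANSFORMs, so recstruct is omitted
ds2 := DATASET([TRANSFORM(NameRec, SELF.id := 3; SELF.name := 'Cy'),
                TRANSFORM(NameRec, SELF.id := 4; SELF.name := 'Di')]);

// Fifth form: a single row used as a one-record dataset
ds3 := DATASET(ROW({5,'Eve'}, NameRec));

// Sixth form: DATASET as a value type defining a child dataset
ChildRec  := {STRING10 nickname};
ParentRec := RECORD
  UNSIGNED1 id;
  DATASET(ChildRec) nicknames;
END;
parents := DATASET([{1, [{'Annie'},{'A'}]}], ParentRec);

// Seventh form: DATASET as a parameter type
UNSIGNED CountRows(DATASET(NameRec) d) := COUNT(d);

// Eighth form: a DICTIONARY turned back into a DATASET
nameDict := DICTIONARY(ds1, {id => name});
ds5 := DATASET(nameDict);

// Ninth form: create three records with a TRANSFORM that takes COUNTER
NameRec MakeRec(UNSIGNED c) := TRANSFORM
  SELF.id   := c;
  SELF.name := 'Row' + (STRING)c;
END;
ds4 := DATASET(3, MakeRec(COUNTER));

OUTPUT(ds1 + ds2 + ds3, NAMED('InlineRows'));
OUTPUT(parents, NAMED('ParentsWithChildren'));
OUTPUT(CountRows(ds1), NAMED('RowCount'));
OUTPUT(ds5, NAMED('FromDictionary'));
OUTPUT(ds4, NAMED('Generated'));
```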
attr := DATASET( file, struct, THOR [,__COMPRESSED__] [,OPT] [,UNSORTED] [,PRELOAD([nbr])] [,ENCRYPT(key)] );
attr := DATASET( file, struct, FLAT [,__COMPRESSED__] [,OPT] [,UNSORTED] [,PRELOAD([nbr])] [,ENCRYPT(key)] );
This form defines a THOR file that exists in the Data Refinery. This could contain either fixed-length or variable-length records, depending on the layout specified in the RECORD struct.
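To make the fixed-length versus variable-length distinction concrete, here is a small sketch of two layouts. The names FixedRec, VarRec, and the logical file names are hypothetical. OPT is used so that the definitions are treated as empty datasets if the files do not exist.

```ecl
// Fixed-length layout: every record occupies the same number of bytes
FixedRec := RECORD
  UNSIGNED4 id;
  STRING20  name;
END;

// Variable-length layout: STRING without a length makes record size vary
VarRec := RECORD
  UNSIGNED4 id;
  STRING    name;
END;

// Hypothetical logical file names; THOR and FLAT are interchangeable keywords here
fixedDs := DATASET('~example::fixedfile', FixedRec, THOR, OPT);
varDs   := DATASET('~example::varfile',   VarRec,   FLAT, OPT);

OUTPUT(SIZEOF(FixedRec), NAMED('FixedRecordSize'));
OUTPUT(COUNT(fixedDs),   NAMED('FixedCount'));
OUTPUT(COUNT(varDs),     NAMED('VarCount'));
```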
The struct may contain an UNSIGNED8 field with either {VIRTUAL(fileposition)} or {VIRTUAL(localfileposition)} appended to the field name. This indicates the field contains the record's position within the file (or part), and is used for those instances where a usable pointer to the record is needed, such as the BUILD action.
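One common use of the virtual file position field is building an index whose entries point back to the base records. The sketch below assumes a THOR file already exists under the hypothetical logical name '~example::people'. The names PersonRec, people, peopleIdx, and the key file name are likewise illustrative only.

```ecl
// Hypothetical layout; recPos is filled in with each record's file position
PersonRec := RECORD
  STRING25  lname;
  STRING15  fname;
  UNSIGNED8 recPos {VIRTUAL(fileposition)};
END;

// Assumes this logical file already exists on the cluster
people := DATASET('~example::people', PersonRec, THOR);

// The index keys on name and carries recPos as the pointer to the base record
peopleIdx := INDEX(people, {lname, fname, recPos}, '~example::people.key');

BUILD(peopleIdx);
```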
Example:
PtblRec := RECORD
  STRING2  State := Person.per_st;
  STRING20 City  := Person.per_full_city;
  STRING25 Lname := Person.per_last_name;
  STRING15 Fname := Person.per_first_name;
END;
Tbl := TABLE(Person,PtblRec);
PtblOut := OUTPUT(Tbl,,'RTTEMP::TestFile');
//write a THOR file
Ptbl := DATASET('~Thor400::RTTEMP::TestFile',
                {PtblRec,UNSIGNED8 __fpos {VIRTUAL(fileposition)}},
                THOR,OPT);
// __fpos contains the "pointer" to each record
// Thor400 is the scope name and RTTEMP is the
// directory in which TestFile is located
//using ENCRYPT
OUTPUT(Tbl,,'~Thor400::RTTEMP::TestFileEncrypted',ENCRYPT('mykey'));
PtblE := DATASET('~Thor400::RTTEMP::TestFileEncrypted',
                 PtblRec,
                 THOR,OPT,ENCRYPT('mykey'));