PARSE(dataset, data, pattern, result , flags [, MAXLENGTH( length ) ])
PARSE(dataset, data, result , XML( path ) [, UNORDERED | ORDERED( bool ) ] [, STABLE | UNSTABLE ] [, PARALLEL [ ( numthreads ) ] ] [, ALGORITHM( name ) ] )
| Parameter | Description |
|---|---|
| dataset | The set of records to process. |
| data | An expression specifying the text to parse, typically the name of a field in the dataset. |
| pattern | The parsing pattern to match. |
| result | The name of either the RECORD structure attribute that specifies the format of the output record set (as in the TABLE function), or the TRANSFORM function that produces the output record set (as in PROJECT). |
| flags | One or more parsing options, listed below. |
| MAXLENGTH | Optional. Specifies the maximum length the pattern can match. If omitted, the default length is 4096. |
| length | An integer constant specifying the maximum number of matching characters. |
| XML | Specifies that the dataset contains XML data. |
| path | A string constant containing the XPath to the tag that delimits the XML data in the dataset. |
| UNORDERED | Optional. Specifies that the output record order is not significant. |
| ORDERED | Specifies the significance of the output record order. |
| bool | When FALSE, specifies that the output record order is not significant. When TRUE, specifies the default output record order. |
| STABLE | Optional. Specifies that the input record order is significant. |
| UNSTABLE | Optional. Specifies that the input record order is not significant. |
| PARALLEL | Optional. Try to evaluate this activity in parallel. |
| numthreads | Optional. Try to evaluate this activity using numthreads threads. |
| ALGORITHM | Optional. Override the algorithm used for this activity. |
| name | The algorithm to use for this activity. Must be one of the algorithms supported by the SORT function's STABLE and UNSTABLE options. |
| Return: | PARSE returns a record set. |
The PARSE function performs a text or XML parsing operation.
The first form operates on the dataset, finding records whose data contains a match for the pattern and producing a result set of those matches in the result format. If the pattern finds multiple matches in the data, a result record is generated for each match. Each match for a PARSE is effectively a single path through the pattern. If more than one path matches, the result transform is either called once for each path or, if the BEST option is used, called only for the path with the lowest penalty.
If the result names a RECORD structure, then this form of PARSE operates like the TABLE function to generate the result set, but may also operate on variable length text. If the result names a TRANSFORM function, then the transform generates the result set. The TRANSFORM function must take at least one parameter: a LEFT record of the same format as the dataset. The format of the resulting record set does not need to be the same as the input.
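As a minimal sketch of the TRANSFORM form (the input lines and the names lineRec, ds, nameWord, sonOf, outRec, xf, and parsed are all hypothetical), the transform below takes the input record as LEFT, pulls the matched names with MATCHTEXT, and also copies a value computed from the LEFT record itself. The trailing LAST anchors the pattern to the end of each variable-length line, so each line should yield a single match path.

```ecl
// Hypothetical input: one variable-length line of text per record
lineRec := {STRING line};
ds := DATASET([{'Eliphaz the son of Adah'},
               {'Reuel the son of Bashemath'}], lineRec);

// Illustrative pattern: <Name> the son of <Name>, anchored at the end of the line
PATTERN nameWord := PATTERN('[A-Z][a-zA-Z]+');
PATTERN sonOf := nameWord ' the son of ' nameWord LAST;

outRec := RECORD
  STRING30  child;
  STRING30  parent;
  UNSIGNED4 lineLen;   // computed from the LEFT input record, not from the match
END;

// The TRANSFORM receives the input record as its LEFT parameter
outRec xf(lineRec L) := TRANSFORM
  SELF.child   := MATCHTEXT(nameWord[1]);   // first nameWord instance in the match
  SELF.parent  := MATCHTEXT(nameWord[2]);   // second nameWord instance in the match
  SELF.lineLen := LENGTH(L.line);
END;

parsed := PARSE(ds, line, sonOf, xf(LEFT), SCAN ALL);
OUTPUT(parsed);
```

Because the output format is set by outRec rather than by lineRec, the result records need not resemble the input records.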
Flags can have the following values:
Example:
rec := {STRING10000 line};
datafile := DATASET([
{'Ge 34:2 And when Shechem the son of Hamor the Hivite, prince of the country, saw her,'+
' he took her, and lay with her, and defiled her.'},
{'Ge 36:10 These are the names of Esaus sons; Eliphaz the son of Adah the wife of Esau,'+
' Reuel the son of Bashemath the wife of Esau.'}],rec);
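// Separator: one or two spaces, tabs, or commas
PATTERN ws1 := [' ','\t',','];
PATTERN ws := ws1 ws1?;
// Start or end of the text, or a separator (defined but not used by the rules below)
PATTERN patStart := FIRST | ws;
PATTERN patEnd := LAST | ws;
PATTERN article := ['A','The','Thou','a','the','thou'];
// Any word (unused below), and a capitalized word treated as a name
TOKEN patWord := PATTERN('[a-zA-Z]+');
TOKEN Name := PATTERN('[A-Z][a-zA-Z]+');
// A name, optionally followed by "the", "king of", or "prince of" and another name
RULE Namet := name OPT(ws ['the','king of','prince of'] ws name);
// Phrases describing how two names are related
PATTERN produced := OPT(article ws) ['begat','father of','mother of'];
PATTERN produced_by := OPT(article ws) ['son of','daughter of'];
PATTERN produces_with := OPT(article ws) ['wife of'];
RULE relationtype := ( produced | produced_by | produces_with);
// Top-level rule: <name> <relationship> <name>
RULE progeny := namet ws relationtype ws namet;
// Output format: MATCHTEXT returns the text matched by each part of the rule
results := RECORD
  STRING60 Le := MATCHTEXT(Namet[1]);
  STRING60 Ri := MATCHTEXT(Namet[2]);
  STRING30 RelationPhrase := MatchText(relationtype);
END;
outfile1 := PARSE(datafile,line,progeny,results,SCAN ALL);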
OUTPUT(outfile1);
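The second (XML) form takes no pattern. As a sketch, assuming the XMLTEXT function that ECL provides for reading values inside an XML-form PARSE, the example below extracts an attribute and a child element from each record; the input fragments and the names xmlRec, xmlDs, personRec, xfPerson, and people are hypothetical.

```ecl
// Hypothetical input: each record holds one XML fragment as text
xmlRec := {STRING line};
xmlDs := DATASET([{'<person id="P1"><name>JOHN SMITH</name></person>'},
                  {'<person id="P2"><name>JANE DOE</name></person>'}], xmlRec);

personRec := RECORD
  STRING10 id;
  STRING40 name;
END;

// XMLTEXT reads values relative to the element selected by the XML path
personRec xfPerson(xmlRec L) := TRANSFORM
  SELF.id   := XMLTEXT('@id');    // attribute of <person>
  SELF.name := XMLTEXT('name');   // text of the <name> child element
END;

people := PARSE(xmlDs, line, xfPerson(LEFT), XML('/person'));
OUTPUT(people);
```

Here the path '/person' marks the tag that delimits each XML record, matching the role of the path parameter described in the table.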