attr := DATASET( file, struct, XML( xpath [, NOROOT ] ) [,ENCRYPT(key) ]);
XML | Specifies the file is an XML file. |
xpath | A string constant containing the full XPATH to the tag that delimits the records in the file. |
NOROOT | Specifies the file is an XML file with no file tags, only row tags. |
ENCRYPT | Optional. Specifies the file was created by OUTPUT with the ENCRYPT option. |
key | A string constant containing the encryption key used to create the file. |
This form is used to read an XML file into the Data Refinery. The xpath parameter defines the record delimiter tag using a subset of standard XPATH (www.w3.org/TR/xpath) syntax (see the XPATH Support section under the RECORD structure discussion for a description of the supported subset).
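The optional NOROOT and ENCRYPT parameters follow directly from the syntax above. The following is a minimal sketch only: the logical file names, the record layout, and the key value are hypothetical and are assumed purely for illustration.

```ecl
// Hypothetical layout: both tags are lower case and match the field names
rowRec := RECORD
  STRING author;
  STRING title;
END;

// NOROOT: the (assumed) file contains only <book> row tags, with no enclosing file tag
rowsOnly := DATASET('~demo::books_noroot', rowRec, XML('book', NOROOT));

// ENCRYPT: the (assumed) file was written by OUTPUT with ENCRYPT('myKey'),
// so the same key must be supplied to read it
secured := DATASET('~demo::books_encrypted', rowRec, XML('library/book'), ENCRYPT('myKey'));

OUTPUT(rowsOnly);
OUTPUT(secured);
```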
The key to getting individual field values from the XML lies in the RECORD structure field definitions. If the field name exactly matches a lower case XML tag containing the data, then nothing special is required. Otherwise, {xpath(xpathtag)} appended to the field name (where the xpathtag is a string constant containing standard XPATH syntax) is required to extract the data. An XPATH consisting of empty angle brackets (<>) indicates the field receives the entire record. An absolute XPATH is used to access properties of parent elements. Because XML is case sensitive, and ECL identifiers are case insensitive, xpaths need to be specified if the tag contains any upper case characters.
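The field-level rules above can be sketched as a single RECORD structure. The XML layout, tag names, and logical file name here are hypothetical and exist only to show each case: an upper case tag, a matching lower case tag, an absolute XPATH, and the empty angle brackets form.

```ecl
/* Assumed input, for illustration only:
<Catalog region="EU">
  <Item>
    <SKU>A-100</SKU>
    <price>9.99</price>
  </Item>
</Catalog>
*/
itemRec := RECORD
  STRING sku    {XPATH('SKU')};              // tag contains upper case characters, so an xpath is required
  STRING price;                              // lower case tag exactly matches the field name
  STRING region {XPATH('/Catalog/@region')}; // absolute XPATH to an attribute of the parent element
  STRING raw    {XPATH('<>')};               // receives the entire record
END;

items := DATASET('~demo::catalog', itemRec, XML('Catalog/Item'));
OUTPUT(items);
```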
NOTE: XML reading and parsing can consume a large amount of memory, depending on the usage. In particular, if the specified xpath matches a very large amount of data, then a correspondingly large data structure is provided to the transform. Therefore, the more you match, the more resources you consume per match. For example, if you have a very large document and you match an element near the root that encompasses virtually the whole document, then the whole document is constructed as a referenceable structure that the ECL code can access.
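To illustrate the note, the sketch below contrasts a broad match with a narrow one against a library/book layout. It assumes a file named 'MyFile' whose root tag is library and whose rows are book tags; the nested child dataset is one possible way to reach the books from a root-level match, not the only one.

```ecl
bookRec := RECORD
  STRING author;
  STRING title;
END;

// Broad match: one record per <library>, so every <book> in the
// document is built into a single row before the ECL code sees it
libRec := RECORD
  DATASET(bookRec) books {XPATH('book')};
END;
wholeDoc := DATASET('MyFile', libRec, XML('library'));

// Narrow match: one small record per <book>, which keeps the
// structure constructed per match small
perBook := DATASET('MyFile', bookRec, XML('library/book'));

OUTPUT(perBook);
```

With a large document, the narrow form is generally the safer choice because each match carries only the data for one row.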
Example:
/* an XML file called "MyFile" contains this XML data:
<library>
<book isbn="123456789X">
<author>Bayliss</author>
<title>A Way Too Far</title>
</book>
<book isbn="1234567801">
<author>Smith</author>
<title>A Way Too Short</title>
</book>
</library>
*/
rform := RECORD
STRING author; //data from author tag -- tag name is lowercase and matches field name
STRING name {XPATH('title')}; //data from title tag, renaming the field
STRING isbn {XPATH('@isbn')}; //isbn attribute data from book tag
END;
books := DATASET('MyFile',rform,XML('library/book'));