Next: Translating PhysioNet annotations to Up: Some introductory notes to Previous: What can be included   Contents

Introduction to XML inside EDF+

To stress the importance of coding information in EDF+, we are going to get into the mind of a person trying to create some kind of code to describe the values used in a nerve conduction study.

Imagine a person trying to code the amplitude, the latency and the characteristics of the stimulus that describe a segment of a motor nerve conduction study. He or she would probably come up with something like this:

3.8 7.8 0.2

This string would mean a latency of 3.8 milliseconds and an amplitude of 7.8 mV, obtained with a stimulus whose duration was 0.2 ms.
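A minimal Python sketch of how a program would have to read such a string. The function name and the dictionary keys are hypothetical; the point is that the field order exists only in the reader's head, not in the data.

```python
def parse_positional(record: str) -> dict:
    """Split a positional record; the field order is an assumption of this sketch."""
    # Nothing in the string says which number is which: we rely on position alone.
    latency, amplitude, duration = (float(item) for item in record.split())
    return {
        "latency_ms": latency,
        "amplitude_mV": amplitude,
        "stimulus_duration_ms": duration,
    }


values = parse_positional("3.8 7.8 0.2")
```

If the writer and the reader disagree about the order, the values are silently swapped and nothing in the string reveals the mistake.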

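This way of coding has a subtle inconvenience: nothing in the string says which value is which, so the reader must know the order of the fields in advance.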

Then we decide to include the field names together with the values. Although we could use spaces as separators, we decide to make our notation somewhat clearer. Let us see the result:

latency = 3.8 / amplitude = 7.8 / duration = 0.2/
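As a rough illustration, the named notation can be read with a few lines of Python. The function name is hypothetical, and the sketch assumes that "/" separates fields and "=" separates a name from its value, exactly as in the line above.

```python
def parse_fields(record: str) -> dict:
    """Read 'name = value / name = value /' into a dictionary of floats."""
    fields = {}
    for part in record.split("/"):
        part = part.strip()
        if not part:  # skip the empty piece left by the trailing "/"
            continue
        name, value = part.split("=")
        fields[name.strip()] = float(value)
    return fields


fields = parse_fields("latency = 3.8 / amplitude = 7.8 / duration = 0.2/")
```

The order of the fields no longer matters, but the result is still flat: the dictionary cannot say what duration belongs to.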

We have introduced several new rules (field separators, the equal sign), and the solution works well when there are very few fields. However, measurements and characteristics of the stimulus are mixed, so we cannot tell whether duration is a characteristic of the stimulus or a characteristic of the curve. We need some additional structure: it would be nice to separate the characteristics that belong to the stimulus from those that belong to the measurements. To nest information we need an opening mark and a closing mark. Let us see the result:

measurements(/latency = 3.8/ amplitude = 7.8/) / stimulus (duration = 0.2)

Now the measurements and the properties of the stimulus are no longer mixed, so it is evident that duration is a property of the stimulus. But what are we speaking about? What on earth is 7.8? We need units, and we want a general method to annotate the results of our segment. That calls for another level of nesting. But we do not want to alter the existing field names, so we decide to use an internal field that acts as a property of the field. Let us see the result:

measurements (latency <ms> = 3.8 / amplitude <mV> =7.8) /
stimulus (duration <ms> = 0.2)

The line has been folded for display only; the actual string contains no carriage return. The result is quite nice.

At this point we have reinvented the basics of XML: tags, character data, attributes and nesting. The same segment coded in XML looks like this:

<?xml version = "1.0" encoding = "UTF-8" ?>
<segment>
  <measurements>
    <latency unit =  "ms">3.8</latency>
    <amplitude unit = "mV">7.8</amplitude>
  </measurements>
  <stimulus>
    <duration unit = "ms">0.2</duration>
  </stimulus>
</segment>

Notice the analogy between our result and the coding in XML. We coded a specific set of fields and values, whereas XML is a general way to code information: it uses marks in a well-defined way and, above all, it is extensible. For those not familiar with XML, its main components are:

  • segment, measurements, stimulus, latency, amplitude and duration are elements. Elements can contain other elements, can contain text or can even contain nothing (these last ones are called empty elements).
  • unit is an attribute. An attribute only contains a value. One element can have several attributes.
  • <measurements> and <stimulus> are start tags, and </measurements> and </stimulus> are end tags.
  • 3.8, 7.8 and 0.2 are the contents (character data) of the latency, amplitude and duration elements.

Notice that these elements are included in the body of the document. There is also a prolog (in this case, <?xml version = "1.0" encoding = "UTF-8" ?>) that indicates how to process the document.
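The components just described can be inspected with the xml.etree.ElementTree module of the Python standard library. This is only a sketch: the variable names are ours, and the document is the segment example from this page.

```python
import xml.etree.ElementTree as ET

# The prolog must be the very first thing in the text, so the string starts with it.
DOCUMENT = """<?xml version = "1.0" encoding = "UTF-8" ?>
<segment>
  <measurements>
    <latency unit = "ms">3.8</latency>
    <amplitude unit = "mV">7.8</amplitude>
  </measurements>
  <stimulus>
    <duration unit = "ms">0.2</duration>
  </stimulus>
</segment>
"""

root = ET.fromstring(DOCUMENT)  # root is the <segment> element

# Visit every element and keep those that carry a unit attribute.
readings = {}
for element in root.iter():
    unit = element.get("unit")  # attribute value, or None if absent
    if unit is not None:
        readings[element.tag] = (float(element.text), unit)

# Nesting tells us what duration belongs to: it is found under <stimulus>.
stimulus_duration = root.find("stimulus/duration")
```

Here readings maps each element name to its content and its unit, so the value 7.8 is no longer a bare number.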

Introducing XML as a way of coding is worthwhile because:

  • It provides a very simple and complete syntax whose correctness can be checked. A document is well formed if it follows some simple rules, for example that every start tag must have a matching end tag and that an XML document has only one root element.
  • It is well handled by specific libraries in most modern languages. Very powerful operations such as extracting a sub-tree, finding a specific tag or creating a document from several sources can be easily carried out.
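Both points can be tried out with Python's standard xml.etree.ElementTree module. The helper name is_well_formed and the sample strings are hypothetical; the sketch only checks well-formedness, not validity against any schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


good = '<segment><stimulus><duration unit="ms">0.2</duration></stimulus></segment>'
missing_end_tag = "<segment><stimulus></segment>"  # <stimulus> is never closed
two_roots = "<segment/><segment/>"                 # more than one root element

# Extracting a sub-tree: find the <stimulus> element and serialize it on its own.
stimulus = ET.fromstring(good).find("stimulus")
subtree_text = ET.tostring(stimulus, encoding="unicode")
```

The first string is accepted, while the other two are rejected by the parser before any application code has to look at them.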

EDF-XML annotations could be considered "subdocuments": pieces of XML that can be grouped to form a document. This avoids very repetitive coding, such as defining a prolog for each data record.
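The grouping idea can be sketched in Python as follows. The fragment strings, the function name and the choice of root element are assumptions made for illustration; how the fragments are actually read from an EDF+ file is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments, each one well formed on its own and without a prolog.
fragments = [
    '<latency unit="ms">3.8</latency>',
    '<amplitude unit="mV">7.8</amplitude>',
]


def group_fragments(pieces, root_tag="measurements"):
    """Attach each fragment to a single root element to form one document."""
    root = ET.Element(root_tag)
    for piece in pieces:
        root.append(ET.fromstring(piece))
    return ET.ElementTree(root)


tree = group_fragments(fragments)
# Writing the tree with tree.write(path, encoding="UTF-8", xml_declaration=True)
# would emit the prolog once for the whole document.
```

The prolog is then written a single time, when the grouped document is saved, instead of once per fragment.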

Of course, it is very easy to include these annotations in a true XML document.

EDF+ can be used to code very different signals and procedures. Trying to create a rigid system of annotations for EDF+ is at present a very difficult task. However, some common labels (not in XML) have been fixed and are included in a companion page to the normative document of EDF under the title Some standard texts.


je 2006-10-12