As you know, PhysioNet is at once a kind of forum, a huge collection of data, and a software package. PhysioNet data include a very well designed set of annotations. The annotation scheme is relatively simple and can be generalized to other fields. Translating the annotations used in PhysioNet is therefore a good example of adapting an external structure to EDF+.
First, I would like to suggest an analogy with graphics formats. When you use The GIMP, or a similar package, and select Save As, you can choose among many formats to store the image. Some of them compress the picture, some allow a transparent background, some allow layers, some can be displayed by web browsers, some can be included in LaTeX documents, and so on. The choice of format depends on many circumstances, and all of the formats are in use. Sometimes you have to lose some features when you translate from one format to another.
The situation is similar for the formats used in Clinical Neurophysiology. Some instruments save data in EDF format, but some properties of their native format cannot be easily translated. I would like to recall the title of the normative document of EDF+: EDF+, an update of EDF for the archival and exchange of ENG, EMG, EP, ECG, EEG and PSG data. EDF+ is a format designed for exchanging data from a wide range of techniques using the same basic structure.
The annotations of PhysioNet, on the other hand, are very well designed. They were built with at least the following characteristics:
Since it is a very well defined system, we can try to adapt the code to XML. There is a nice application (rdann) that exports annotations to ASCII. It lets you choose among several output formats. Although the sample number is a good way to locate annotations in continuous records, for discontinuous ones it is easier to use the time in seconds relative to the beginning of the file. Let us see the output of the application when it is run with option -x on the record 100s included in the package. Only some rows are shown.
The output contains (from left to right) the time of the annotation in seconds, minutes and hours; a mnemonic for the annotation type; the annotation subtyp, chan, and num fields.
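As an illustration of the column layout just described, here is a minimal Python sketch that splits one such row into named fields. The names `Annotation` and `parse_rdann_x_line` are hypothetical and are not part of the PhysioNet software. The sketch assumes whitespace-separated columns and treats anything after the num field as optional free text.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    seconds: float  # time from the beginning of the record, in seconds
    mnem: str       # mnemonic for the annotation type, e.g. "N" or "A"
    subtyp: int
    chan: int
    num: int
    aux: str        # trailing text after num, empty when absent


def parse_rdann_x_line(line: str) -> Annotation:
    """Parse a row laid out as: seconds minutes hours mnem subtyp chan num [text]."""
    fields = line.split(None, 7)
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 columns, got {len(fields)}: {line!r}")
    # Minutes and hours repeat the time given in seconds, so they are ignored.
    seconds, _minutes, _hours, mnem, subtyp, chan, num = fields[:7]
    aux = fields[7].strip() if len(fields) > 7 else ""
    return Annotation(float(seconds), mnem, int(subtyp), int(chan), int(num), aux)


if __name__ == "__main__":
    print(parse_rdann_x_line("0.050 0.00083 0.0000139 + 0 0 0 (N"))
    print(parse_rdann_x_line("5.678 0.09463 0.0015772 A 0 0 0"))
```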
Our objective is to encode the information contained in an annotation file as an EDF+ annotation signal. Different annotation files referring to the same record would be included as separate annotation signals, so that, for example, the results of different algorithms could be compared.
There are other ways to approach this problem, but here we try to encode the annotations in XML. The first three columns express the same time in different units, so we only need the first one (seconds), which becomes the onset of the annotation. We also discard the last column. Finally, we have to decide the length of the data record. Imagine that we define a data record of 5 seconds. The result would be
Notice that <20> is a single byte with value 20. Each annotation is followed by a character 0 (<0>), not shown, and the unused space in each data record is also filled with <0>. Each data record begins with an empty annotation that marks its start time. Since we decided to use 5 seconds, the data records are marked by +0, +5 and +10. Although several other forms of encoding would be possible, we decided to code the fields as attributes because they can be defined more precisely (it is easy to define a set of possible values).
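The encoding described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the record size of 600 bytes is the approximate figure used in this section, onsets are assumed to be non-negative and written with three decimals, and only the mnemonic varies between annotations.

```python
from xml.sax.saxutils import quoteattr

SEP = b"\x14"  # the <20> byte (decimal 20) that delimits the fields
NUL = b"\x00"  # the <0> byte that ends each annotation and pads the record


def format_onset(seconds):
    """Render an onset as '+5' or '+0.050'; assumes seconds >= 0."""
    if seconds == int(seconds):
        return f"+{int(seconds)}"
    return f"+{seconds:.3f}"


def wfdb_annotation(seconds, mnem, subtyp=0, chan=0, num=0):
    """Encode one annotation as onset<20><WFDBann ... /><20><0>."""
    xml = (f'<WFDBann mnem={quoteattr(mnem)} subtyp="{subtyp}" '
           f'chan="{chan}" num="{num}" />')
    return format_onset(seconds).encode("ascii") + SEP + xml.encode("utf-8") + SEP + NUL


def build_records(annotations, record_seconds=5, record_bytes=600):
    """Group (seconds, mnem) pairs into fixed-size, <0>-padded data records."""
    if not annotations:
        return []
    last = max(t for t, _ in annotations)
    records = []
    for i in range(int(last // record_seconds) + 1):
        start = i * record_seconds
        # Empty annotation that marks the start time of this data record.
        body = format_onset(start).encode("ascii") + SEP + SEP + NUL
        for t, mnem in annotations:
            if start <= t < start + record_seconds:
                body += wfdb_annotation(t, mnem)
        if len(body) > record_bytes:
            raise ValueError(f"record starting at {start} s needs {len(body)} bytes")
        records.append(body.ljust(record_bytes, NUL))
    return records


if __name__ == "__main__":
    rows = [(0.050, "+"), (0.214, "N"), (5.025, "N"), (5.678, "A"), (10.728, "N")]
    for record in build_records(rows):
        print(record.rstrip(NUL))
```

Raising an error when a record overflows is a design choice of this sketch; a real converter would need some policy for dense segments, such as a larger record.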
It has not been very difficult to translate concepts between two very different methods of annotation. This case is a good example for evaluating some of the properties of EDF+. With this arrangement each data record uses about 600 characters. Since we are using a data record of 5 seconds, we need 120 characters per second. Each sample occupies two bytes, so we have to define a sampling rate of 60 Hz for this annotation signal. Consider that the file is formed by two signals sampled at 360 Hz. The annotations then take 60 of every 780 samples, a little under one tenth of the file.
Obviously, many annotations may be concentrated in a short segment of time. Because every data record has the same size, this would waste a lot of storage space in the remaining, almost empty, annotation records. Several considerations could help with this problem:
At the end of the process, you have a single file that contains both data and annotations. This file can be easily edited and segmented. It can be opened with many viewers, because the annotation signals will not interfere with the remaining signals in the file.
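The arithmetic behind these figures can be checked with a few lines of Python. The variable names are only illustrative; the one fact added here is that EDF stores each sample as a 2-byte integer.

```python
record_seconds = 5     # length of one data record
record_bytes = 600     # approximate space used by the annotations per record
bytes_per_sample = 2   # EDF stores each sample as a 2-byte integer
n_signals = 2          # ordinary signals in the file
signal_rate_hz = 360   # sampling rate of each ordinary signal

# Samples per second that the annotation signal must declare.
annotation_rate_hz = record_bytes / bytes_per_sample / record_seconds

# Fraction of all samples in the file that belong to the annotation signal.
signal_samples_per_second = n_signals * signal_rate_hz
annotation_share = annotation_rate_hz / (annotation_rate_hz + signal_samples_per_second)

print(annotation_rate_hz)          # 60.0
print(round(annotation_share, 3))  # 0.077
```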
0.050 0.00083 0.0000139 + 0 0 0 (N
0.214 0.00356 0.0000594 N 0 0 0
1.028 0.01713 0.0002855 N 0 0 0
1.839 0.03065 0.0005108 N 0 0 0
2.628 0.04380 0.0007299 N 0 0 0
3.419 0.05699 0.0009498 N 0 0 0
4.208 0.07014 0.0011690 N 0 0 0
5.025 0.08375 0.0013958 N 0 0 0
5.678 0.09463 0.0015772 A 0 0 0
6.672 0.11120 0.0018534 N 0 0 0
7.517 0.12528 0.0020880 N 0 0 0
8.328 0.13880 0.0023133 N 0 0 0
9.117 0.15194 0.0025324 N 0 0 0
9.889 0.16481 0.0027469 N 0 0 0
10.728 0.17880 0.0029799 N 0 0 0
11.583 0.19306 0.0032176 N 0 0 0
12.406 0.20676 0.0034460 N 0 0 0
...
+0<20><20>
+0.050<20><WFDBann mnem="+" subtyp="0" chan="0" num="0" /><20>
+0.214<20><WFDBann mnem="N" subtyp="0" chan="0" num="0" /><20>
+1.028<20><WFDBann mnem="N" subtyp="0" chan="0" num="0" /><20>
+1.839<20><WFDBann mnem="N" subtyp="0" chan="0" num="0" /><20>
+2.628<20><WFDBann mnem="N" subtyp="0" chan="0" num="0" /><20>
+3.419<20><WFDBann mnem="N" subtyp="0" chan="0" num="0" /><20>
+4.208<20><WFDBann mnem="N" subtyp="0" chan="0" num="0" /><20>
+5<20><20>
+5.025<20><WFDBann mnem="N" subtyp="0" chan="0" num="0" /><20>
+5.678<20><WFDBann mnem="A" subtyp="0" chan="0" num="0" /><20>
+6.672<20><WFDBann mnem="N" subtyp="0" chan="0" num="0" /><20>
+7.517<20><WFDBann mnem="N" subtyp="0" chan="0" num="0" /><20>
+8.328<20><WFDBann mnem="N" subtyp="0" chan="0" num="0" /><20>
+9.117<20><WFDBann mnem="N" subtyp="0" chan="0" num="0" /><20>
+9.889<20><WFDBann mnem="N" subtyp="0" chan="0" num="0" /><20>
+10<20><20>
+10.728<20><WFDBann mnem="N" subtyp="0" chan="0" num="0" /><20>
+11.583<20><WFDBann mnem="N" subtyp="0" chan="0" num="0" /><20>
+12.406<20><WFDBann mnem="N" subtyp="0" chan="0" num="0" /><20>
...
je
2006-10-12