The annotated texts are saved in XML, which the linguistic community has adopted as the standard way of representing corpora. Unlike morpho-syntactic annotation, for which XCES exists, syntactic annotation does not yet have a standard set of XML tags, so DGA uses a minimal tag set inspired by XCES. The XML files produced by DGA can therefore be transformed easily, by means of XSLT, into XML files based on a different vocabulary (tag set) that meets the user's requirements or conforms to a future standard.
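
As a rough illustration of such a vocabulary change, the sketch below renames DGA tags with Python's standard xml.etree.ElementTree module instead of XSLT (a deliberate substitution, made only to keep the example free of third-party libraries). The target names in TAG_MAP (sentence, token, form and so on) are hypothetical and do not correspond to any existing standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical target vocabulary: these names are assumptions made for
# illustration only and are not part of DGA, XCES or any other standard.
TAG_MAP = {
    "s": "sentence",
    "tok": "token",
    "orth": "form",
    "ordno": "id",
    "ctag": "pos",
    "syn": "dependency",
    "head": "governor",
    "reltype": "relation",
}


def rename_tags(xml_text: str) -> str:
    """Return xml_text with every DGA tag renamed according to TAG_MAP."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # iter() also visits the root element itself
        # Tags that are missing from the map are left unchanged.
        elem.tag = TAG_MAP.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "<s><tok><orth>John</orth><ordno>1</ordno></tok></s>"
    print(rename_tags(sample))
```

Because the DGA tag set is small and flat, a one-to-one renaming of this kind is often all that a conversion needs; an XSLT stylesheet would be the better choice when the target format also changes the nesting of the elements.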

To illustrate the tag set, the following fragment of an XML file shows the annotation of the sentence "John has apples" (see What is DGA).

<s>
   <tok>
      <orth>John</orth>
      <ordno>1</ordno>
      <ctag>Noun</ctag>
      <syn>
         <head>2</head>
         <reltype>Subject</reltype>
      </syn>
   </tok>
   <tok>
      <orth>has</orth>
      <ordno>2</ordno>
      <ctag>Verb</ctag>
      <syn>
         <head>4</head>
         <reltype>Predicate</reltype>
      </syn>
   </tok>
   <tok>
      <orth>apples</orth>
      <ordno>3</ordno>
      <ctag>Noun</ctag>
      <syn>
         <head>2</head>
         <reltype>Object</reltype>
      </syn>
   </tok>
</s>

Each sentence is enclosed in <s> ... </s>. Each word of the sentence, together with all of its annotation, is enclosed in <tok> ... </tok>. Within this element, <orth> ... </orth> holds the orthographic form as it occurs in the annotated text, and <ordno> ... </ordno> gives the position of the word within the sentence, counted from the beginning of the sentence. <ctag> ... </ctag> specifies the part of speech, while <syn> ... </syn> holds the syntactic information. Within <syn> ... </syn>, <head> ... </head> identifies the head word by its number within the sentence, and <reltype> ... </reltype> specifies the type of the dependency relation between the two words (the word being annotated and its head).
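
The tag set described above can be read with a few lines of Python. The following is a minimal sketch, not part of DGA itself: the names Token, read_sentence and describe are hypothetical, and the code assumes that every <tok> element contains all of the child tags shown in the fragment. A head number that matches no token in the input (such as the 4 given for "has" in the fragment) is reported as a bare number.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class Token(NamedTuple):
    """One <tok> element of a DGA sentence (hypothetical helper type)."""
    ordno: int    # position of the word within the sentence
    orth: str     # orthographic form
    ctag: str     # part of speech
    head: int     # ordno of the head word
    reltype: str  # type of the dependency relation


def read_sentence(s_elem: ET.Element) -> List[Token]:
    """Convert an <s> element into a list of Token records."""
    tokens = []
    for tok in s_elem.findall("tok"):
        # Assumes every child tag is present; a missing tag raises an error.
        tokens.append(Token(
            ordno=int(tok.findtext("ordno")),
            orth=tok.findtext("orth"),
            ctag=tok.findtext("ctag"),
            head=int(tok.findtext("syn/head")),
            reltype=tok.findtext("syn/reltype"),
        ))
    return tokens


def describe(tokens: List[Token]) -> List[str]:
    """Render each dependency as 'word --relation--> head word'."""
    forms = {t.ordno: t.orth for t in tokens}
    lines = []
    for t in tokens:
        # Fall back to the bare number when the head is not among the tokens.
        head_form = forms.get(t.head, f"#{t.head}")
        lines.append(f"{t.orth} --{t.reltype}--> {head_form}")
    return lines


SAMPLE = """
<s>
  <tok><orth>John</orth><ordno>1</ordno><ctag>Noun</ctag>
    <syn><head>2</head><reltype>Subject</reltype></syn></tok>
  <tok><orth>has</orth><ordno>2</ordno><ctag>Verb</ctag>
    <syn><head>4</head><reltype>Predicate</reltype></syn></tok>
  <tok><orth>apples</orth><ordno>3</ordno><ctag>Noun</ctag>
    <syn><head>2</head><reltype>Object</reltype></syn></tok>
</s>
"""

if __name__ == "__main__":
    for line in describe(read_sentence(ET.fromstring(SAMPLE))):
        print(line)
```

Looking up the head through a dictionary keyed by <ordno> keeps the sketch independent of the order in which the <tok> elements appear in the file.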
