qad_doc2xml - Tutorial 3: Converting a Word document to XML/TEI

Download the sample-files: files_tutorial3.zip (24k)

Word document

Before starting qad_doc2xml, check the Word document that is going to be converted. In this case it is an excerpt from Lessing's drama "Emilia Galotti", stored as a Word document that uses different paragraph styles.

Word Paragraph-Style | Description | Example | TEI Tag(s)
Überschrift 1 | act (name) | "Erster Aufzug" | <div1 type="act"><head>...</head></div1>
Überschrift 2 | scene (name/nr) | "Erster Auftritt" | <div2 type="scene"><head>...</head></div2>
BAnweisung | stage instruction | "Die Szene, ein Kabinett des Prinzen" | <stage type="setting">...</stage>
Auftritt | stage instruction | "Conti. Der Prinz" | <stage type="persons">...</stage>
Standard | person speech | "Prinz, die Kunst geht nach Brot." | <l>...</l>
Person_Prinz | person | "DER PRINZ." | <sp><speaker>DER PRINZ</speaker><l>...</l></sp>
all other persons: Person_XY | person | "XY" | <sp><speaker>XY</speaker><l>...</l></sp>

Start qad_doc2xml and select the Word document

Select the Word document "Emilia_Galotti.doc" in the "Source" section. To avoid an error message, close Word before you start working with qad_doc2xml.


Select the target file


Load DTD

To edit conversion rules you can load a list of all tags from a DTD (if one is available). Click "Get Taglist from DTD" and select the "teixlight.dtd" file.

Note: The DTD should not be in Unix file format.
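To make the idea of a tag list more concrete, here is a minimal Python sketch of how element names could be collected from a DTD. It is not how qad_doc2xml itself works; the names `tag_list`, `normalize_newlines` and `sample_dtd` are made up for this illustration, and the sample declarations are not taken from the real DTD file. The sketch only finds element names that are written out literally, so declarations that use parameter entities as names would be missed. The second helper assumes that "Unix file format" refers to LF-only line endings.

```python
import re

# Matches element declarations such as <!ELEMENT stage (#PCDATA)>
# and captures the element name (literal names only).
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([A-Za-z_][\w.\-]*)")


def tag_list(dtd_text):
    """Return the declared element names, sorted and without duplicates."""
    return sorted(set(ELEMENT_DECL.findall(dtd_text)))


def normalize_newlines(text):
    """Convert LF-only (Unix) line endings to CRLF (Windows) ones."""
    return text.replace("\r\n", "\n").replace("\n", "\r\n")


# Hypothetical miniature DTD, used only for this example.
sample_dtd = (
    "<!ELEMENT sp (speaker?, l+)>\n"
    "<!ELEMENT speaker (#PCDATA)>\n"
    "<!ELEMENT l (#PCDATA)>\n"
    "<!ELEMENT stage (#PCDATA)>\n"
)

print(tag_list(sample_dtd))
```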


Simple conversion rules

Map the Word paragraph style "Standard" to the TEI-XML tag "<l>":


Conversion rules and attributes

Click the "Special" text fields to set the attributes:
Style "BAnweisung" to "<stage type="setting">...</stage>" and
style "Auftritt" to "<stage type="persons">...</stage>".
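The rules defined so far can be pictured as a lookup table from paragraph style to tag and attributes. The following Python sketch is only an illustration of that mapping, not the tool's internal logic; `RULES` and `convert_paragraph` are hypothetical names, and the sample paragraphs are the examples from the style table.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical rule table: Word paragraph style -> (tag, attributes)
RULES = {
    "Standard": ("l", {}),
    "BAnweisung": ("stage", {"type": "setting"}),
    "Auftritt": ("stage", {"type": "persons"}),
}


def convert_paragraph(style, text):
    """Wrap one paragraph in the element its style is mapped to."""
    tag, attrs = RULES[style]
    # quoteattr adds the surrounding quotes and escapes the value
    attr_str = "".join(f" {name}={quoteattr(value)}" for name, value in attrs.items())
    return f"<{tag}{attr_str}>{escape(text)}</{tag}>"


paragraphs = [
    ("BAnweisung", "Die Szene, ein Kabinett des Prinzen"),
    ("Auftritt", "Conti. Der Prinz"),
    ("Standard", "Prinz, die Kunst geht nach Brot."),
]
xml_lines = [convert_paragraph(style, text) for style, text in paragraphs]
print("\n".join(xml_lines))
```

Keeping the attributes separate from the tag name mirrors the split between the tag column and the "Special" field in the rule list.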


"Text in Child"

So far the result should look like sample (1):

Change the following settings to get the result shown in example (2):


Test

Set all person styles (Person_XY) to "sp" and click the "Convert" button.


Check results

Click "View XML" to view the result in your browser, or click "View Code" to view it in Notepad.

The result should now look like this (Internet Explorer):


XML Structure

So far the XML Code looks like (1). Change the "level" column as follows to get the result shown in (2):

Result:


"Hard" formats - Bold, Italics

Set "Italics" to "<stage type="action">".


Special Characters

Select "Use char_conversion_table.txt" to convert characters into entities (e.g. ü into &uuml;). You can edit the char_conversion_table.txt file with Notepad.
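The effect of such a character table can be sketched in a few lines of Python. This is an illustration only: the layout of char_conversion_table.txt is not described here, so the table is written directly as a dictionary, and `CHAR_TABLE` and `to_entities` are hypothetical names.

```python
# Hypothetical in-memory conversion table: character -> entity
CHAR_TABLE = {
    "ä": "&auml;",
    "ö": "&ouml;",
    "ü": "&uuml;",
    "Ü": "&Uuml;",
    "ß": "&szlig;",
}


def to_entities(text, table=CHAR_TABLE):
    """Replace every character found in the table; leave all others as-is."""
    return "".join(table.get(ch, ch) for ch in text)


converted = to_entities("Überschrift für Gemälde")
print(converted)
```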


Templates

See the sample template file "tei_template.txt":

Edit the text before and after the <! - - word text --> tag (that is the place where qad_doc2xml will insert the results).

You can add a header by clicking "Select Tmpl." and choosing "tei_template.txt".
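The template mechanism amounts to replacing a placeholder with the converted text. The Python sketch below shows the idea under stated assumptions: the placeholder is written here as `<!-- word text -->`, which may not match the exact spelling in your template file, and the template text, `fill_template` and the other names are invented for the example rather than taken from tei_template.txt.

```python
# Assumed spelling of the placeholder; check it against your template file.
PLACEHOLDER = "<!-- word text -->"


def fill_template(template, body, placeholder=PLACEHOLDER):
    """Insert the converted text at the first occurrence of the placeholder."""
    if placeholder not in template:
        raise ValueError("placeholder not found in template")
    return template.replace(placeholder, body, 1)


# Hypothetical miniature template, not the content of tei_template.txt.
template = (
    "<TEI.2>\n"
    "<teiHeader>...</teiHeader>\n"
    "<text><body>\n"
    "<!-- word text -->\n"
    "</body></text>\n"
    "</TEI.2>"
)

result = fill_template(template, "<l>Prinz, die Kunst geht nach Brot.</l>")
print(result)
```

Raising an error when the placeholder is missing makes a misspelled tag visible at once, instead of silently producing a file without the converted text.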

Result:

 


Save conversion rules

Save the ruleset to convert similar documents.


see also: XML Tutorial
