Visual Explain XML Specification
Introduction
XML file is used by Visual Explain to transfer information between database
engine side and client side. Java api collects information and generates
XML file , this XML file will be used as information input to Visual Explain.
The basic construct in the specification remains very flexible while
includes specific enough structures that will enable us to explain and
describe the relationship among the tables and descriptors that are part
of the entire knowledge base for Visual Explain. We have currently adopted
the theology to employ a list-based structure to flexibly represent any
possible combination of number of children or sibling nodes in the structure,
and flexible enough for us to add more relationships later without too
much change to our fundamentals.
DTD for Visual Explain
Following is the complete DTD for XML format used by Visual Explain. The
detailed explanation of this DTD file will be found in the following section.
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE osc[
<!ELEMENT osc (info)+>
<!ATTLIST osc appversion CDATA
#REQUIRED
db2version CDATA #REQUIRED
db2platform CDATA "Z/OS"
timestamp CDATA #REQUIRED>
<!ELEMENT info ( source,attrview*,diagram*,
descriptor*, overview*)>
<!ATTLIST info type (service|management)
#REQUIRED
usage
CDATA #REQUIRED
mode
(standard| compatible) #IMPLIED>
<!ELEMENT source ( query+)>
<!ATTLIST source type
( singlequery | scriptfile ) #REQUIRED
name CDATA #IMPLIED>
<!ELEMENT query (#PCDATA)>
<!ELEMENT diagram ( node+ )>
<!ATTLIST diagram
structure (tree|table) "tree"
id ID #REQUIRED
name CDATA #REQUIRED
seqno CDATA #IMPLIED>
<!ELEMENT node (label+, descriptorlink?,node*)>
<!ATTLIST node id
CDATA #REQUIRED
type CDATA #REQUIRED
alignment ( hleft | hmid | hright | vabove | vmid | vbelow ) #REQUIRED
seqno CDATA #IMPLIED>
<!ELEMENT label EMPTY>
<!ATTLIST label type CDATA
#REQUIRED
text CDATA #REQUIRED>
<!ELEMENT descriptorlink EMPTY>
<!ATTLIST descriptorlink descriptorid
IDREF #REQUIRED
type
CDATA #IMPLIED>
<!ELEMENT descriptor (attrTitle,attr*,attrviewId*
, descriptorlinklist*)>
<!ATTLIST descriptor
id ID #REQUIRED
name CDATA #IMPLIED
type CDATA #REQUIRED>
<!ELEMENT attrTitle EMPTY>
<!ATTLIST attrTitle nameTitle CDATA
"Name"
valueTitle CDATA "Value"
otherTitle1 CDATA #IMPLIED
otherTitle2 CDATA #IMPLIED>
<!ELEMENT attr (descriptorlink?)>
<!ATTLIST attr id CDATA #REQUIRED
name CDATA #REQUIRED
value CDATA #REQUIRED
otherValue1 CDATA #IMPLIED
otherValue2 CDATA #IMPLIED
type CDATA #REQUIRED
level CDATA #IMPLIED>
<!ELEMENT attrview (attrseq+)>
<!ATTLIST attrview id ID #REQUIRED
type CDATA #REQUIRED>
<!ELEMENT attrviewId EMPTY>
<!ATTLIST attrviewId id IDREF #REQUIRED>
<!ELEMENT attrseq EMPTY>
<!ATTLIST attrseq attrid CDATA #REQUIRED
order CDATA #REQUIRED>
<!ELEMENT descriptorlinklist (descriptorlink*)>
<!ATTLIST descriptorlinklist name CDATA #REQUIRED>
<!ELEMENT overview (overviewnode*)>
<!ATTLIST overview diagramid IDREF #REQUIRED
name CDATA #REQUIRED
type CDATA
#REQUIRED>
<!ELEMENT overviewnode (overviewnode*)>
<!ATTLIST overviewnode diagramid IDREF #REQUIRED
nodeid CDATA #REQUIRED
alignment ( hleft | hmid | hright | vabove | vmid | vbelow ) #REQUIRED
seqno CDATA #IMPLIED>
]>
Detailed Explanation
Note : In this section , XML DTD is in blue
color , corresponding
explanation is in black color , key words are in bold and italic
format .
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE osc[
<!ELEMENT osc (info)+>
<!ATTLIST osc appversion CDATA
#REQUIRED
db2version CDATA #REQUIRED
db2platform CDATA "Z/OS"
timestamp CDATA #REQUIRED>
-
The outer most document name is "osc" , which represents " optimization
service center" . This element has four attributes:
-
appversion : shows the application version which was used to generate
this XML file. Our first version of Visual Explain application will be
1.0 , and later on it will become 2.0, 3.0...
-
db2version: shows the DB2 version used to generate this XML file
, it could be version 7 , 8 ...
-
db2platform: shows the platform DB2 was running on when the XML
file was generated, it could be windows, unix , OS/2, AS400, VM, Z/OS,
the default value is Z/OS.
-
timestamp: This attribute shows the generation time of the XML file.
<!ELEMENT info ( source,attrview*,diagram*,
descriptor*, overview*)>
<!ATTLIST info type (service|management)
#REQUIRED
usage
CDATA #REQUIRED
mode
(standard| compatible) #IMPLIED>
-
Inside "osc" element, there could be several "info" element , each of them
represents one type of information, which should be general enough for
future extension . This element has three attributes:
-
type : it could be either service or management . If the type is
service , this info element contains information for one single query .
if the type is management, this info element's content is based on one
scriptfile which could contain multiple query statements.
-
usage : This attribute shows the usage of the information contained
inside this "info" element. The possible values could be "ive" , which
represents visual explain, or other possible tools we will have in the
future.
-
mode : This attributes works for Visual Explain. If the current
"info" element is used by visual explain, this attribute should be set
, it is either standard or compatible. Standard mode represents the new
XML format for version 8 Visual Explain , compatible mode is for backward
compatibility to support version 6 or 7 Visual Explain .
<!ELEMENT source ( query+)>
<!ATTLIST source type
( singlequery | scriptfile ) #REQUIRED
name CDATA #IMPLIED>
-
Each "info" elements can have one "source" element , which has the source
text for the current information , it has two attributes:
-
type : The value could be singlequery or scriptfile.
-
name: This could be the file name if current source text is a scriptfile
or could be empty if current source text is only one query
<!ELEMENT query (#PCDATA)>
-
There could be one or more than one "query" element inside the "source"
element. each of them represents one sql statement .
<!ELEMENT diagram ( node+ )>
<!ATTLIST diagram
structure (tree|table) "tree"
id ID #REQUIRED
name CDATA #REQUIRED
seqno CDATA #IMPLIED>
-
There could be one or more "diagram" elements inside "info" element. This
"diagram" element serves to represent the major structure of the information.
If the current "info" element represents the Visual Explain information
, then , the number of diagram could be one or more than one , depending
on if the xml file was generated in DB2 version 8 (under standard mode
) or in previous version ( backward compatible mode ) . If the standard
mode was used , there will be only one diagram , which represents all the
queryblocks inside the current statement. If the backward compatible mode
was used , then there could be multiple diagrams , each represents one
query block.This element has four attributes:
-
structure : Right now , we have only two structures , one is " tree
" , another is " table" . Default value is tree. If the information is
used for Visual Explain , the structure will be "tree" . and each diagram
only includes one top level node , which is the root node. If the information
is used by other application which requires the tabular format of information,
this attribute will be "table" . The diagram will most likely includes
more than one node , each node representing one row in the table.
-
id: it represents the id of the diagram , should be unique value
within the document
-
name : Name of the diagram, in visual explain, if the mode is backward
compatible , this name will be the query block name , will be shown in
the
-
tab of the display panel.
-
seqno: This represents the sequence number of current diagram in
the whole group. This value can be empty . It will be used if there are
multiple diagrams in the current "info" element. The diagram will be displayed
in this sequence.
<!ELEMENT node (label+, descriptorlink?,node*)>
<!ATTLIST node id
CDATA #REQUIRED
type CDATA #REQUIRED
alignment ( hleft | hmid | hright | vabove | vmid | vbelow ) #REQUIRED
seqno CDATA #IMPLIED>
-
There could be one or more "node" elements in each "diagram" element. If
the diagram's type is "tree" , there should be only one node in the top
level , which represents the root. each node could have one or more child
nodes . It should have one or more "label " element, which represents the
node label . Also, one node contains at most one element of "descriptorlink"
. That means , one node can be linked with only one descriptor. The "node"
element has four attributes:
-
id : This uniquely identifies the node inside each diagram.
-
type: This defines the node type, this will be used to find the
display type when we draw the graph
-
alignment: This shows the relevant position of parent and children
nodes. The possible values are : hleft , hmid , hright , vabove , vmid
, vbelow.
-
seqno: This represents the left to right , or up and down sequence
of the nodes inside one parent node. The node with the smaller number will
be drawn at the left or top of the nodes with the larger number.
<!ELEMENT label EMPTY>
<!ATTLIST label type CDATA
#REQUIRED
text CDATA #REQUIRED>
-
Each node has one or more "label" elements, which represents the label
text shown on node's surface when we draw the node. This element doesn't
have any child element, it has two attributes:
-
type: This attribute gives the "label" element a type, such as "totalcost".
User can specify to see particular type of the label . Each label should
have an unique id within one node.
-
text: This attribute shows the text on the label.
<!ELEMENT descriptorlink EMPTY>
<!ATTLIST descriptorlink descriptorid
IDREF #REQUIRED
type
CDATA #IMPLIED>
-
Element "descriptorlink" represents one linkage to a descriptor. This link
could be inside a node or inside other descriptors. It doesn't have children
elements, and it has two attributes:
-
descriptorid : Which is used to find the descriptor this link pointing
to .
-
type : This shows the type of the linkage. by default , this value
is empty, but it is here for later extension if we need to support some
special linkage type.
<!ELEMENT descriptor (attrTitle,attr*,attrviewId*
, descriptorlinklist*)>
<!ATTLIST descriptor
id ID #REQUIRED
name CDATA #IMPLIED
type CDATA #REQUIRED>
-
Element "descriptor" is outside of element " diagram " but inside element
"info" . For each "info" element, it has multiple "descriptor" elements
. Each descriptor can be linked by more than one node or descriptor using
"descriptorlink" . Each descriptor can have multiple attributes and multiple
views . It also can have multiple lists of linkage to other descriptors.
We will explain the meaning of these elements next , now , the attributes
of "descriptor" element will be shown first:
-
id : This uniquely identifies each descriptor inside one "info"
element. It is used by "descriptorlink" to find the required descriptor.
-
name: This is the name of the descriptor , it most likly is for
display purpose only .
-
type: The type of descriptor as it shows different database element
<!ELEMENT attrTitle EMPTY>
<!ATTLIST attrTitle nameTitle CDATA
"Name"
valueTitle CDATA "Value"
otherTitle1 CDATA #IMPLIED
otherTitle2 CDATA #IMPLIED>
-
Element attrTitle is used to display the title column
of attribute , it can support max four columns
-
nameTitle : The title for name column
-
valueTitle : The title for value column
-
otherTitle1 : The title for the first option
column
-
otherTitle2: The title for the second option
column
<!ELEMENT attr (descriptorlink?)>
<!ATTLIST attr id CDATA #REQUIRED
name CDATA #REQUIRED
value CDATA #REQUIRED
otherValue1 CDATA #IMPLIED
otherValue2 CDATA #IMPLIED
type CDATA #REQUIRED
level CDATA #IMPLIED>
-
Element "attr" represents one attribute of the parent descriptor. It has
three attributes:
-
id: The id uniquely identifies this "attr" element inside the parent
"descriptor" element.
-
name : This is the name of the current attribute.
-
value : This is the value of the current attribute.
-
otherValue1 : The value of the first option column
-
otherValue2: The value of the second option column
-
type : the type of the attribute
-
level : the level of the attribute
<!ELEMENT attrview (attrseq+)>
<!ATTLIST attrview id ID #REQUIRED
type CDATA #REQUIRED>
-
Element "attrview" is for displaying the attributes in different views
. This adds the flexibility of displaying the attributes to the user. For
example , in Visual Explain , User can choose to see "all" of the attributes
or only "statistics" . If no attrview is specified ,all the attrs will
be displayed in the sequence This element has one attribute :
-
id : The id of the view
-
type : the type of the view
<!ELEMENT attrviewId EMPTY>
<!ATTLIST attrviewId id IDREF #REQUIRED>
-
Element "attrviewId" is used to link the attrview
to each descriptor .
-
id: The id of the view it is linking
to
<!ELEMENT attrseq EMPTY>
<!ATTLIST attrseq attrid CDATA #REQUIRED
order CDATA #REQUIRED>
-
Inside "attrview" element, there are multiple "attrseq" elements . Each
"attrseq" element has two attributes:
-
attrid : This matches to the "id" value of "attr" element.
-
order : This gives the order of the "attr" referred to by "attrid"
.
<!ELEMENT descriptorlinklist (descriptorlink*)>
<!ATTLIST descriptorlinklist name CDATA #REQUIRED>
-
Element "descriptorlinklist" is a container for several "descriptorlink"
elements. The descriptors linked by one "descriptorlinklist" have some
common attributes. Such as , inside one table descriptor, there are many
"column" descriptor, they are grouped by one descriptorlinklist container
, which is called "columns" . This element has one attribute :
-
name: This is the name of the list , which can be used for display
purpose.
<!ELEMENT overview (overviewnode*)>
<!ATTLIST overview diagramid IDREF #REQUIRED
name CDATA #REQUIRED
type CDATA
#REQUIRED>
<!ELEMENT overviewnode (overviewnode*)>
<!ATTLIST overviewnode diagramid IDREF #REQUIRED
nodeid CDATA #REQUIRED
alignment ( hleft | hmid | hright | vabove | vmid | vbelow ) #REQUIRED
seqno CDATA #IMPLIED>
]>
-
Element "overview" is to show the queryblock overview
or other general purpose overview , it is optional to have an overview
element
XML Example
This XML example shows the information needed by Visual Explain . The structure
is in a tree mode.
This XML example is based on the following query " SELECT
SUM(N_NATIONKEY), SUM(N_NATIONKEY), SUM(N_NATIONKEY), SUM(N_NATIONKEY),
SUM(N_NATIONKEY), AVG(N_NATIONKEY), AVG(N_NATIONKEY), AVG(N_NATIONKEY),
AVG(N_NATIONKEY), AVG(N_NATIONKEY), COUNT(*) FROM NATION_NP"
This XML is in the format of DB2 Version 8 , standard mode , so there
is only one diagram for the whole query .
The content of this example is as follows: