Thursday, March 22, 2012

XPath (XML Path Language)

XPath, defined by the World Wide Web Consortium (W3C), is a query language for selecting elements, attributes, and other information from an XML document. It is an integral part of XSLT (Extensible Stylesheet Language Transformations). XPath models an XML document as a tree, and XPath expressions traverse that tree to select elements and attributes by a variety of criteria.

An XML Document

<?xml version="1.0" encoding="ISO-8859-1"?>
<School>
    <Students>
        <Student id="1">
            <Name>John</Name>
            <Age>20</Age>
        </Student>
        <Student id="2">
            <Name>Shaya</Name>
            <Age>20</Age>
        </Student>
    </Students>
    <Teachers>
        <Teacher id="1">
            <Name>Tim</Name>
            <Age>40</Age>
            <Gender>M</Gender>
        </Teacher>
    </Teachers>
</School>

Element: The names enclosed in angle brackets (< >) are called elements, e.g., School is an element.
Attribute: A property of an element, e.g., id is an attribute of the Student element.
Each element contains either a set of child elements or text data.

XPath Summary

1. Absolute path to select elements
We specify the complete path from the root to the element we want to select. For example, the XPath expression /School/Students/Student selects all the Student elements. A path starting with '/' is always an absolute path. The same path without the leading '/' (School/Students/Student) is a relative path: it selects the same elements only when it is evaluated from the document root.

2. Relative path to select elements
A relative path selects elements relative to the current element. For example, when the current element is School, Teachers/Teacher selects all the Teacher elements inside its Teachers child.
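As a rough illustration of relative paths, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes a compact copy of the School document above; the variable names are only for illustration. Note that ElementTree implements a limited subset of XPath and evaluates every path relative to the element it is called on, so absolute paths are not used here.

```python
import xml.etree.ElementTree as ET

# Compact copy of the School document from this post.
XML = """
<School>
  <Students>
    <Student id="1"><Name>John</Name><Age>20</Age></Student>
    <Student id="2"><Name>Shaya</Name><Age>20</Age></Student>
  </Students>
  <Teachers>
    <Teacher id="1"><Name>Tim</Name><Age>40</Age><Gender>M</Gender></Teacher>
  </Teachers>
</School>
"""

root = ET.fromstring(XML)  # root is the School element

# Path evaluated relative to School (the current element).
students = root.findall("Students/Student")
print([s.find("Name").text for s in students])  # ['John', 'Shaya']

# Change the current element to Teachers, then use a shorter relative path.
teachers_elem = root.find("Teachers")
teachers = teachers_elem.findall("Teacher")
print([t.find("Name").text for t in teachers])  # ['Tim']
```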

3. Selecting elements without specifying the full absolute or relative path
'//' is used to perform this task. It selects matching elements at any depth below the starting point. For example, the XPath expression //* selects all the elements in the XML document, and //Name selects all the Name elements wherever they appear. We can use Students//Name to find all the Name elements anywhere inside the Students element. Here we do not need to specify the full path.
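The '//' step can be tried out with a short Python sketch, again assuming xml.etree.ElementTree and a compact copy of the School document. Because ElementTree paths are relative to the element they are called on, the document-wide search is written as .//Name rather than //Name.

```python
import xml.etree.ElementTree as ET

# Compact copy of the School document from this post.
XML = """
<School>
  <Students>
    <Student id="1"><Name>John</Name><Age>20</Age></Student>
    <Student id="2"><Name>Shaya</Name><Age>20</Age></Student>
  </Students>
  <Teachers>
    <Teacher id="1"><Name>Tim</Name><Age>40</Age><Gender>M</Gender></Teacher>
  </Teachers>
</School>
"""

root = ET.fromstring(XML)

# Every Name element at any depth below School.
all_names = [n.text for n in root.findall(".//Name")]
print(all_names)  # ['John', 'Shaya', 'Tim']

# Only the Name elements somewhere inside Students.
student_names = [n.text for n in root.findall("Students//Name")]
print(student_names)  # ['John', 'Shaya']
```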

4. Selecting parent elements of a given element.
We can select the parent element by using '..'. For example, Students/.. selects the parent of Students, which is School.
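A small Python sketch of the '..' step, assuming xml.etree.ElementTree and a compact copy of the School document. One caveat of this library: '..' cannot climb above the element the search was started from.

```python
import xml.etree.ElementTree as ET

# Compact copy of the School document from this post.
XML = """
<School>
  <Students>
    <Student id="1"><Name>John</Name><Age>20</Age></Student>
    <Student id="2"><Name>Shaya</Name><Age>20</Age></Student>
  </Students>
  <Teachers>
    <Teacher id="1"><Name>Tim</Name><Age>40</Age><Gender>M</Gender></Teacher>
  </Teachers>
</School>
"""

root = ET.fromstring(XML)

# Parent of Students is School.
parents_of_students = [e.tag for e in root.findall("Students/..")]
print(parents_of_students)

# Parents of every Name element; each parent is reported once.
parents_of_names = [e.tag for e in root.findall(".//Name/..")]
print(parents_of_names)
```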

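5. Selecting all the child or descendant elements of a given element
We can select all the child elements of a given element using the wildcard *. For example, School/* selects only the direct children of School (Students and Teachers). To select all the descendant elements (Students, Student, Name, Age, Teachers, Teacher, etc.) of School, combine the wildcard with '//': School//*.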
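The difference between children and descendants is easy to check in code. The following is a minimal Python sketch, assuming xml.etree.ElementTree and a compact copy of the School document, with School as the current element.

```python
import xml.etree.ElementTree as ET

# Compact copy of the School document from this post.
XML = """
<School>
  <Students>
    <Student id="1"><Name>John</Name><Age>20</Age></Student>
    <Student id="2"><Name>Shaya</Name><Age>20</Age></Student>
  </Students>
  <Teachers>
    <Teacher id="1"><Name>Tim</Name><Age>40</Age><Gender>M</Gender></Teacher>
  </Teachers>
</School>
"""

root = ET.fromstring(XML)

# "*" matches direct children only.
children = [e.tag for e in root.findall("*")]
print(children)  # ['Students', 'Teachers']

# ".//*" matches every descendant at any depth.
descendants = [e.tag for e in root.findall(".//*")]
print(len(descendants), sorted(set(descendants)))
```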

6. Selecting elements with predicates
XPath provides predicates, specified using square brackets [ ], for more flexible element selection. A predicate is written immediately after the step it filters. Positions are counted from 1. For example,
School/Students/Student[1]  selects the first Student element of Students
School/Students/Student[last()-1]  selects the second-to-last Student element of Students (here, the first one, since there are only two)
School/Students/Student[position()<2]  selects the first Student element of Students
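Positional predicates can be sketched in Python as follows, assuming xml.etree.ElementTree and a compact copy of the School document. The standard library documents support for [n], [last()] and [last()-n], but not for the position() function, so the last case is emulated with a list slice (an assumption of this sketch, not XPath itself).

```python
import xml.etree.ElementTree as ET

# Compact copy of the School document from this post.
XML = """
<School>
  <Students>
    <Student id="1"><Name>John</Name><Age>20</Age></Student>
    <Student id="2"><Name>Shaya</Name><Age>20</Age></Student>
  </Students>
  <Teachers>
    <Teacher id="1"><Name>Tim</Name><Age>40</Age><Gender>M</Gender></Teacher>
  </Teachers>
</School>
"""

root = ET.fromstring(XML)

def names(elements):
    """Hypothetical helper: text of the Name child of each element."""
    return [e.find("Name").text for e in elements]

first = names(root.findall("Students/Student[1]"))
last = names(root.findall("Students/Student[last()]"))
second_to_last = names(root.findall("Students/Student[last()-1]"))

# Stand-in for Student[position()<2]: keep only the first match.
before_second = names(root.findall("Students/Student")[:1])

print(first, last, second_to_last, before_second)
```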

7. Selecting attributes
We can select an attribute using an XPath expression that specifies a path to the element followed by '@' and the attribute name. For example, School/Students/Student/@id selects the id attribute of every Student element, and //@id selects all the id attributes in the document.

We can use the wildcard to select all the attributes: //@* selects every attribute in the document. Inside a predicate, the wildcard tests for the presence of attributes. For example, //Student[@*] selects all the Student elements that have at least one attribute.
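Here is a minimal Python sketch of working with attributes, assuming xml.etree.ElementTree and a compact copy of the School document. ElementTree paths return elements rather than attribute nodes, so the sketch selects elements with attribute predicates ([@id], [@id='2']) and reads the values with get().

```python
import xml.etree.ElementTree as ET

# Compact copy of the School document from this post.
XML = """
<School>
  <Students>
    <Student id="1"><Name>John</Name><Age>20</Age></Student>
    <Student id="2"><Name>Shaya</Name><Age>20</Age></Student>
  </Students>
  <Teachers>
    <Teacher id="1"><Name>Tim</Name><Age>40</Age><Gender>M</Gender></Teacher>
  </Teachers>
</School>
"""

root = ET.fromstring(XML)

# Rough equivalent of School/Students/Student/@id: read id from each Student.
student_ids = [s.get("id") for s in root.findall("Students/Student")]
print(student_ids)  # ['1', '2']

# Elements anywhere in the tree that carry an id attribute.
tagged = [(e.tag, e.get("id")) for e in root.findall(".//*[@id]")]
print(tagged)

# Filter on an attribute value.
student_two = root.find(".//Student[@id='2']")
print(student_two.find("Name").text)  # Shaya
```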

8. Combining multiple XPath expressions
We can use the bar symbol '|' (the union operator) to combine the results of multiple XPath expressions. For example, /School/Students | /School/Teachers selects both the Students and the Teachers elements.
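The standard library's xml.etree.ElementTree does not implement the '|' operator, so the following Python sketch only emulates a union by merging the results of two separate searches. The union helper is hypothetical and written for this example. Unlike a real XPath union, which returns nodes in document order, it keeps the order in which the result lists are passed in.

```python
import xml.etree.ElementTree as ET

# Compact copy of the School document from this post.
XML = """
<School>
  <Students>
    <Student id="1"><Name>John</Name><Age>20</Age></Student>
    <Student id="2"><Name>Shaya</Name><Age>20</Age></Student>
  </Students>
  <Teachers>
    <Teacher id="1"><Name>Tim</Name><Age>40</Age><Gender>M</Gender></Teacher>
  </Teachers>
</School>
"""

root = ET.fromstring(XML)

def union(*node_lists):
    """Hypothetical helper: merge result lists, dropping repeated nodes."""
    seen = set()
    merged = []
    for nodes in node_lists:
        for node in nodes:
            if id(node) not in seen:  # compare by identity, not by content
                seen.add(id(node))
                merged.append(node)
    return merged

# Emulates /School/Students | /School/Teachers with School as current element.
combined = union(root.findall("Students"), root.findall("Teachers"))
combined_tags = [e.tag for e in combined]
print(combined_tags)  # ['Students', 'Teachers']

# A node matched by both searches appears only once.
people = union(root.findall(".//Student"), root.findall("Students/Student"))
print(len(people))  # 2
```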

9. XPath also provides axes, functions, and operators to perform more complex selections.
