Introduction: Basic understanding of XPath and its related concepts

Introduction to XPath:

XPath stands for XML Path Language. It is a query language designed to traverse an XML document and select the required nodes using XPath expressions and XPath functions, which I will discuss in the next chapters. XPath is a World Wide Web Consortium (W3C) Recommendation. This tutorial is based on XPath 2.0; the W3C has since published XPath 3.0 and 3.1.

This specification is designed to be referenced normatively from other specifications defining a host language for it. It is not intended to be implemented outside a host language. The implementation ability of this specification has been tested in the context of its normative inclusion in host languages defined by the XQuery and XSLT.


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<dev:Corporation xmlns:dev="http://developprojects.com">
	<dev:Organization>
		<dev:website category="technical">
			<dev:name>developprojects.com</dev:name>
			<dev:topic>XPath Tutorial</dev:topic>
			<dev:author>Viswa Tej Swarup Reddy</dev:author>
			<dev:price>FREE</dev:price>
		</dev:website>
	</dev:Organization>
</dev:Corporation>

 

Regular XPath terminology:

The common XPath terms you must know before proceeding further are:

  1. Nodes
  2. Atomic Values
  3. Items

Nodes:
Every XML document is a tree of nodes. The node types you will meet most often are

  1. Document (root) node
  2. Element nodes
  3. Attribute nodes
  4. Text nodes

In the above XML example,
<Corporation> is the root element; the document node, which is the root of the tree, sits just above it.
<Organization> is an element node.
<topic>XPath Tutorial</topic> is an element node.
category="technical" is an attribute node.
XPath Tutorial is a text node.

These nodes can in turn be described by their relationships to one another.

The main relationships are

  1. Ancestor: The ancestors of a node are its parent, the parent of that parent, and so on up to the root. In the above example, the ancestors of the <author> node are <website>, <Organization> and <Corporation>.
  2. Descendant: The descendants of a node are its children, the children of those children, and so on down to the lowest node. In the above example, the descendants of the <Organization> node are <website>, <name>, <topic>, <author> and <price>.
  3. Parent: The parent is the immediate parent of the selected element or attribute. In the above example, <website> is the parent of <name>, <topic>, <author> and <price>.
  4. Child: The children are the immediate child nodes of the selected node. In the above example, the children of <website> are <name>, <topic>, <author> and <price>.
  5. Siblings: Nodes which share a common parent are called siblings. In the above example, <name>, <topic>, <author> and <price> are siblings which share the common parent <website>.
  6. Text: Text nodes hold the character content of the individual elements. In the above example, developprojects.com, XPath Tutorial, Viswa Tej Swarup Reddy and FREE are all text nodes.
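
The relationships above map directly onto XPath axes (ancestor, descendant, parent, child and the two sibling axes). The following is a minimal sketch, not a definitive implementation: it assumes the standard javax.xml.xpath API that ships with the JDK, inlines the sample document as a string, and uses a class name (XPathAxesDemo) and a helper (select) that I made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathAxesDemo {

    static final String NS = "http://developprojects.com";

    // The sample document from this chapter, inlined so the sketch is self-contained.
    static final String XML =
          "<dev:Corporation xmlns:dev=\"" + NS + "\">"
        + "<dev:Organization><dev:website category=\"technical\">"
        + "<dev:name>developprojects.com</dev:name>"
        + "<dev:topic>XPath Tutorial</dev:topic>"
        + "<dev:author>Viswa Tej Swarup Reddy</dev:author>"
        + "<dev:price>FREE</dev:price>"
        + "</dev:website></dev:Organization></dev:Corporation>";

    // Hypothetical helper: returns the local names of the nodes the expression selects.
    public static List<String> select(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the "dev" prefix used in the expressions to the namespace URI.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "dev".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String namespaceURI) {
                    return NS.equals(namespaceURI) ? "dev" : null;
                }

                @Override
                public Iterator<String> getPrefixes(String namespaceURI) {
                    String prefix = getPrefix(namespaceURI);
                    return prefix == null
                            ? Collections.<String>emptyIterator()
                            : Collections.singletonList(prefix).iterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getLocalName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Ancestors of author:        " + select("//dev:author/ancestor::*"));
        System.out.println("Descendants of Organization: " + select("//dev:Organization/descendant::*"));
        System.out.println("Parent of name:             " + select("//dev:name/parent::*"));
        System.out.println("Children of website:        " + select("//dev:website/child::*"));
        System.out.println("Siblings of topic:          "
                + select("//dev:topic/preceding-sibling::* | //dev:topic/following-sibling::*"));
    }
}
```

Because the document uses the dev prefix, the parser has to be namespace aware and the XPath object needs a NamespaceContext; without it, an expression such as //dev:author cannot be resolved.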

It may sound redundant but we can also categorize node types based on the node’s functionality and purpose. They are:

  1. Document Node or Root Node
  2. Elements
  3. Attributes
  4. Comments
  5. Namespace
  6. Text
  7. Processing Instruction

 

  1. Document Node or Root Node: The document node is the root of the tree and represents the whole XML document. Every other node lies within it, including the topmost element. In the above example, <Corporation> is the topmost (root) element and the only element child of the document node.
  2. Elements: Elements are contained within the document node. Anything in an XML document with an opening and a closing tag (or a single self-closing tag) is an element.
  3. Attributes: Attributes describe an element and are written in its opening tag. Example: category="technical" says that the website element belongs to the technical category. If two elements have the same name, we can tell them apart by their attributes.
  4. Comments: Comments are text written in the XML document to describe things to other users. They are written between <!-- and -->.
  5. Namespace: Different XML vocabularies may use the same tag name for different things, so a name conflict is always possible. To avoid it, put a prefix before the tag name and bind the prefix to a namespace URI with an xmlns attribute. The syntax is <prefix:element xmlns:prefix="URI">.
  6. Text: Text nodes hold the character data inside an element, for example XPath Tutorial. An attribute value such as technical is the string value of the attribute node, not a separate text node.
  7. Processing Instruction: A processing instruction passes information to the application that processes the document, which may be a browser or any other program that uses the XML. It is written between <? and ?>. The XML declaration on the first line of our example looks like a processing instruction but, strictly speaking, is not one, and XPath does not expose it as a node. It is still the most important line of an XML document in real-world scenarios, because it tells the parser the XML version, the encoding and whether the document is standalone. In our example, encoding="UTF-8" says that the document is encoded in UTF-8, a Unicode encoding built from 8-bit units, and standalone="yes" says that the XML document is self-contained. That in turn means one of three things:
    1. There is no DOCTYPE declaration in it.
    2. The DOCTYPE declaration is inline only.
    3. The DOCTYPE declaration is external or combined, but the external part contains no data that changes the infoset representation of the document.
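
To see these node kinds from code, XPath offers node tests such as comment(), processing-instruction() and text(). Below is a minimal sketch under stated assumptions: the small document, its xml-stylesheet instruction and its comment are invented for the illustration, and NodeKindsDemo and count are hypothetical names, not part of any library.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NodeKindsDemo {

    // Illustrative document: one processing instruction, one comment,
    // two elements, one attribute and one text node.
    static final String XML =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<?xml-stylesheet type=\"text/xsl\" href=\"style.xsl\"?>"
        + "<!-- site catalogue -->"
        + "<website category=\"technical\"><topic>XPath Tutorial</topic></website>";

    // Hypothetical helper: counts the nodes selected by the given expression.
    public static int count(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double result = (Double) xpath.evaluate(
                    "count(" + expression + ")", doc, XPathConstants.NUMBER);
            return result.intValue();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Document nodes:          " + count("/"));
        System.out.println("Elements:                " + count("//*"));
        System.out.println("Attributes:              " + count("//@*"));
        System.out.println("Comments:                " + count("/comment()"));
        System.out.println("Text nodes:              " + count("//text()"));
        // The XML declaration is not counted here, only the xml-stylesheet instruction.
        System.out.println("Processing instructions: " + count("/processing-instruction()"));
    }
}
```

The processing-instruction count is the interesting one: the document starts with two constructs written between <? and ?>, but only one of them is a node.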

Properties of Nodes:
A node can have the following properties:

  1. Name: Elements, attributes, namespaces and processing instructions have a name. A name may combine a prefix and a local name. We can extract the parts using the standard functions: name() returns the qualified name, local-name() returns the local name and namespace-uri() returns the namespace URI that the prefix is bound to. Example: in dev:Corporation, dev is the prefix, Corporation is the local name and http://developprojects.com is the namespace URI.
  2. String value: Every node has a string value. The string() function returns it; for an element, it is the concatenated text of all the text nodes below it.
  3. Base URI
  4. Attributes
  5. Namespace
  6. Parent
  7. Children
  8. Type Annotation
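
The name and string-value properties can be read with the functions mentioned above. Here is a small sketch, assuming the JDK's javax.xml.xpath API and the sample document of this chapter inlined as a string; NodeNameFunctionsDemo and eval are names I chose for the illustration.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NodeNameFunctionsDemo {

    // The sample document from this chapter, inlined so the sketch is self-contained.
    static final String XML =
          "<dev:Corporation xmlns:dev=\"http://developprojects.com\">"
        + "<dev:Organization><dev:website category=\"technical\">"
        + "<dev:name>developprojects.com</dev:name>"
        + "<dev:topic>XPath Tutorial</dev:topic>"
        + "<dev:author>Viswa Tej Swarup Reddy</dev:author>"
        + "<dev:price>FREE</dev:price>"
        + "</dev:website></dev:Organization></dev:Corporation>";

    // Hypothetical helper: evaluates the expression and returns the result as a string.
    public static String eval(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for local-name() and namespace-uri()
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // /* selects the root element, so no prefix has to be resolved here.
        System.out.println(eval("name(/*)"));          // dev:Corporation
        System.out.println(eval("local-name(/*)"));    // Corporation
        System.out.println(eval("namespace-uri(/*)")); // http://developprojects.com
        // String value of an element and of an attribute.
        System.out.println(eval("string(//*[local-name()='topic'])")); // XPath Tutorial
        System.out.println(eval("string(//@category)"));               // technical
    }
}
```

Selecting by local-name() is a shortcut that avoids registering a namespace context; for real code, binding the prefix explicitly is usually the cleaner choice.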

Atomic values: Atomic values are not nodes. They are single values of a simple type such as a string, a number or a boolean, and they have no parent or child nodes. The strings developprojects.com, XPath Tutorial, Viswa Tej Swarup Reddy and FREE, obtained as the values of the text nodes above, are all atomic values.

Items: Every XPath 2.0 expression evaluates to a sequence of items. An item is either a node or an atomic value.
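
The difference between a node and a plain value also shows up in Java. Note the assumption: the javax.xml.xpath API in the JDK implements XPath 1.0, which has no sequences or items, so the sketch below only shows the closest analogue, asking for a result as a node, a node set, a string, a number or a boolean. The document is the chapter's sample without the dev prefix, and ResultTypesDemo and eval are hypothetical names.

```java
import java.io.StringReader;

import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ResultTypesDemo {

    // The chapter's sample document, written without the namespace prefix for brevity.
    static final String XML =
          "<Corporation><Organization><website category=\"technical\">"
        + "<name>developprojects.com</name>"
        + "<topic>XPath Tutorial</topic>"
        + "<author>Viswa Tej Swarup Reddy</author>"
        + "<price>FREE</price>"
        + "</website></Organization></Corporation>";

    // Hypothetical helper: evaluates the expression and returns the requested result type.
    public static Object eval(String expression, QName returnType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc, returnType);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // A node: it has a name, a parent and children.
        Node topic = (Node) eval("//topic", XPathConstants.NODE);
        System.out.println(topic.getNodeName() + " inside " + topic.getParentNode().getNodeName());

        // A set of nodes.
        NodeList children = (NodeList) eval("//website/*", XPathConstants.NODESET);
        System.out.println(children.getLength() + " child elements");

        // Plain values: a string, a number and a boolean.
        System.out.println(eval("string(//topic)", XPathConstants.STRING));     // XPath Tutorial
        System.out.println(eval("count(//website/*)", XPathConstants.NUMBER));  // 4.0
        System.out.println(eval("//price = 'FREE'", XPathConstants.BOOLEAN));   // true
    }
}
```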

 

References:

http://www.w3.org/TR/2014/WD-xpath-31-20140424/

http://en.wikipedia.org/wiki/XPath_2.0

http://www.codingforums.com/xml/39839-xml-standalone-when-why.html

http://www.w3schools.com/XPath/xpath_nodes.asp

http://www.w3.org/TR/REC-xml/#vc-check-rmd

 

How to convert an XML file into a CSV file using Java?

We use a property file to hold the XPath expressions for the elements we want to extract from the XML file.
If you have any doubt about property files, you can go through my tutorial on how to read a property file.
Here I parse the XML file into a Document object and use the XPath expressions configured in the
property file to retrieve the values from the XML document.

XMLToCSVMappings.java

package com.developprojects.java;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;

public class XMLToCSVMappings {
    private static XPathFactory xPathFactory = null;
    private static DocumentBuilderFactory domFactory = null;

    public static void main(String args[]) {
        domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        xPathFactory = XPathFactory.newInstance();

        readXML();
    }

    // Reads sample.xml, evaluates every configured XPath expression and
    // writes one "key,value" row per property to the CSV file.
    public static void readXML() {
        System.out.println("In readXML method");
        File xmlFile = new File(
                "D:/Develop Projects Workspace/java/resources/sample.xml");
        // try-with-resources closes the stream and the writer even if an exception is thrown
        try (InputStream fis = new FileInputStream(xmlFile);
                FileWriter writer = new FileWriter("c:\\SampleXMLtoCSVFile.csv")) {
            Document xmlDoc = getDocFromXMLString(fis);
            HashMap<String, String> propertiesKeypair = readPropertyFile();

            // Header row
            writer.append("Key");
            writer.append(',');
            writer.append("Value");
            writer.append('\n');

            for (Map.Entry<String, String> entry : propertiesKeypair.entrySet()) {
                // The property value is the XPath expression to evaluate
                String value = getElementValue(entry.getValue(), xmlDoc);
                System.out.println("Key : " + entry.getKey()
                        + " Xpath value is::" + value);

                writer.append(entry.getKey());
                writer.append(',');
                writer.append(value);
                writer.append('\n');
            }
            writer.flush();
            System.out.println("ResultMap Updated. CSV File is being generated...");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Parses the XML input stream into a DOM Document
    public static Document getDocFromXMLString(InputStream xml)
            throws Exception {
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        return builder.parse(xml);
    }

    // Evaluates the XPath expression against the document and returns the
    // string value of the first match (an empty string if evaluation fails)
    public static String getElementValue(final String xpathExpression,
            final Document doc) {
        String textValue = "";
        try {
            XPath xpath = xPathFactory.newXPath();
            textValue = xpath.evaluate(xpathExpression, doc);
        } catch (final XPathExpressionException xpathException) {
            xpathException.printStackTrace();
        }
        return textValue;
    }

    // Loads JustProperties.properties into a map of key -> XPath expression
    public static HashMap<String, String> readPropertyFile() {
        System.out.println("In readPropertyFile method");
        Properties prop = new Properties();
        HashMap<String, String> propvals = new HashMap<String, String>();
        // Without a leading "/", the name is resolved relative to the package of this class
        try (InputStream input = XMLToCSVMappings.class
                .getResourceAsStream("JustProperties.properties")) {
            if (input == null) {
                System.out.println("JustProperties.properties not found on the class path");
                return propvals;
            }
            prop.load(input);
            System.out.println("Property File Loaded Successfully");
            for (String property : prop.stringPropertyNames()) {
                propvals.put(property, prop.getProperty(property));
            }
            System.out.println("HashMap generated::" + propvals);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return propvals;
    }
}

sample.xml

<?xml version="1.0" encoding="UTF-8" ?>
<channel>
 <title>DEVELOP PROJECTS</title>
 <link>http://www.developprojects.com</link>
 <Author>Swarup Reddy Kovvuri</Author>
 <language>Java</language>
</channel>

JustProperties.properties

Author //Author
Website //link
Language //language

Output file generated:

SampleXMLtoCSVFile.csv

Key,Value
Language,Java
Website,http://www.developprojects.com
Author,Swarup Reddy Kovvuri
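
One caveat about the program above: it writes each value as-is, so a value that itself contains a comma or a double quote would break the row. The following is only a sketch of one common way to handle that, quoting in the style of RFC 4180; CsvFieldDemo, escape and row are hypothetical names and not part of the original program.

```java
public class CsvFieldDemo {

    // Wraps the field in double quotes when it contains a comma, a double quote
    // or a line break, and doubles any embedded double quotes.
    public static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Builds one "key,value" row with both fields escaped.
    public static String row(String key, String value) {
        return escape(key) + "," + escape(value);
    }

    public static void main(String[] args) {
        System.out.println(row("Language", "Java"));                 // Language,Java
        System.out.println(row("Author", "Kovvuri, Swarup Reddy"));  // Author,"Kovvuri, Swarup Reddy"
    }
}
```

In the program above, the same idea would mean passing the key and the XPath result through such a helper before appending them to the writer.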
http://developprojects.com/code-snippets/how-to-read-a-property-file-into-hash-map-in-java/

How to Read a Property File into Hash Map in Java

A property file is a plain text file of key and value pairs with the extension .properties. It usually holds system-level or server-level configuration. Each line typically defines one configurable parameter in the format key=value, where the key is the property name and the value is its setting. This is the simple properties file used in our sample program.

AuthorInfo.properties

Author=Viswa Teja Swarup Reddy
Website=www.DevelopProjects.com
Language=English

Example Program to Read a Property file in Java

package com.developprojects.java;

import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Properties;
import java.util.Set;

public class ReadPropertyFile {

    public static void main(String args[]) {
        readPropertyFile();
    }

    public static void readPropertyFile() {
        System.out.println("In readPropertyFile method");
        Properties prop = new Properties();
        HashMap<String, String> propvals = new HashMap<String, String>();
        // The leading "/" looks the file up from the root of the class path;
        // try-with-resources closes the stream when we are done
        try (InputStream input = ReadPropertyFile.class
                .getResourceAsStream("/AuthorInfo.properties")) {
            if (input == null) {
                System.out.println("AuthorInfo.properties not found on the class path");
                return;
            }
            prop.load(input);
            System.out.println("Property File Loaded Successfully");
            Set<String> propertyNames = prop.stringPropertyNames();
            for (String property : propertyNames) {
                System.out.println(property + ":" + prop.getProperty(property));
                propvals.put(property, prop.getProperty(property));
            }
            System.out.println("HashMap generated::" + propvals);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

output:

In readPropertyFile method
Property File Loaded Successfully
Language:English
Website:www.DevelopProjects.com
Author:Viswa Teja Swarup Reddy
HashMap generated::{Website=www.DevelopProjects.com, Author=Viswa Teja Swarup Reddy, Language=English}

Explanation:

Properties prop = new Properties() creates a new Properties object.

HashMap<String, String> propvals = new HashMap<String, String>() creates the propvals HashMap object.

readPropertyFile() is the custom function which we have written to read the property file from the class path.

The method getResourceAsStream returns an InputStream. When it is called on a class with a name that starts with /, it searches for the resource from the root of the class path; without the leading /, the name is resolved relative to the package of the class. It returns null if the resource is not found.

The load method reads the key value pairs from the given input stream.

The rest of the program just iterates over the entries of the property file.

stringPropertyNames() returns a Set containing all the property keys.

We iterate over the set and get the value of each key using the method getProperty(<propertyName>).

We store each key value pair in our HashMap. Note that neither stringPropertyNames() nor HashMap preserves the order of the lines in the file, which is why the output above is not in file order.

In the next program, I show you how to write into a property file.
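
If a predictable order matters, one option is to copy the properties into a sorted map instead of a HashMap. This is a minimal sketch, assuming the content of AuthorInfo.properties is passed in as a string so the example runs without any file; SortedPropertiesDemo and load are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class SortedPropertiesDemo {

    // Loads key=value text and returns the pairs sorted by key.
    public static Map<String, String> load(String text) {
        Properties prop = new Properties();
        try {
            prop.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read the properties text", e);
        }
        // TreeMap keeps its keys in natural (alphabetical) order
        Map<String, String> sorted = new TreeMap<String, String>();
        for (String name : prop.stringPropertyNames()) {
            sorted.put(name, prop.getProperty(name));
        }
        return sorted;
    }

    public static void main(String[] args) {
        String text = "Author=Viswa Teja Swarup Reddy\n"
                + "Website=www.DevelopProjects.com\n"
                + "Language=English\n";
        // prints {Author=Viswa Teja Swarup Reddy, Language=English, Website=www.DevelopProjects.com}
        System.out.println(load(text));
    }
}
```

A TreeMap gives alphabetical order, not file order; it is used here only because it makes the output deterministic.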

How to Create XML Document or String Object in Java Dynamically

Explanation:

This sample constructs an XML DOM Document object in Java.

We create a new instance of DocumentBuilderFactory using the DocumentBuilderFactory.newInstance() method. Its newDocumentBuilder() method creates a new DocumentBuilder with the configured parameters, and the newDocument() method of the builder creates a new, empty Document object on which to build a DOM tree. Element is an interface from the org.w3c.dom package which extends the Node interface. An Element represents an element of an XML document and may have attributes and child nodes. The interface provides methods which help us build, read and edit a complex DOM tree.
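
Before the full program, here is a compact sketch of the same steps (factory, builder, empty document, elements, attributes, text). It is an illustration under assumptions, not the program of this post: it builds only part of the tree, uses Element.setAttribute() as a shortcut for creating an Attr node, and DomBuilderSketch, appendChild and build are names I made up.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomBuilderSketch {

    // Hypothetical helper: creates <name flag="...">text</name> and appends it to the parent.
    // The text may be null for an empty element.
    static Element appendChild(Document doc, Element parent, String name,
            String flag, String text) {
        Element child = doc.createElement(name);
        child.setAttribute("flag", flag);
        if (text != null) {
            child.appendChild(doc.createTextNode(text));
        }
        parent.appendChild(child);
        return child;
    }

    // Builds a small part of the DevelopProjects tree.
    public static Document build() {
        try {
            DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
            Document doc = docBuilder.newDocument();

            Element root = doc.createElement("DevelopProjects");
            root.setAttribute("rootAttr", "true");
            doc.appendChild(root);

            Element first = appendChild(doc, root, "FirstChild", "1",
                    "This is the First Child in DevelopProjects.com");
            appendChild(doc, first, "FirstChildNode", "1.1",
                    "This is the Child Element of FirstChild");
            appendChild(doc, root, "SecondChild", "2",
                    "This is the Second Child in DevelopProjects.com");
            appendChild(doc, root, "ThirdChild", "3", null);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create a DocumentBuilder", e);
        }
    }

    public static void main(String[] args) {
        Element root = build().getDocumentElement();
        System.out.println(root.getTagName() + " has "
                + root.getChildNodes().getLength() + " children");
    }
}
```

A helper like this mainly removes repetition; spelling out every createAttribute() and setAttributeNode() call works just as well and shows each DOM call explicitly.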

This is the XML which we wish to generate dynamically using Java Document Object Model (DOM)

 

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<DevelopProjects rootAttr="true">
	<FirstChild flag="1">
		This is the First Child in DevelopProjects.com
		<FirstChildNode flag="1.1">
			This is the Child Element of FirstChild
			<FirstSubChildNode flag="1.1.1">This is the Sub Child
				Element of FirstChild</FirstSubChildNode>
		</FirstChildNode>
	</FirstChild>
	<SecondChild flag="2">This is the Second Child in
		DevelopProjects.com</SecondChild>
	<ThirdChild flag="3" />
	<FourthChild flag="4" />
	<FifthChild flag="5" />
	<SixthChild flag="6" />
	<SeventhChild flag="7" />
	<EigthChild flag="8" />
	<NinethChild flag="9" />
	<TenthElement>This is how I'm 10 Now</TenthElement>
</DevelopProjects>

Sample Java Program to Create XML Object

 

package com.developprojects.java;

import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
public class CreateXML {

 public static void main(String[] args) {
 System.out.println(generateXML());
 }
 public static String generateXML(){
 System.out.println("generateXML() Entry");
 Document doc=null;
 StringWriter writer = new StringWriter();

 try {
 final String FLAG = "flag";
 DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
 DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
 doc = docBuilder.newDocument();
 Element rootElement = doc.createElement("DevelopProjects");
 doc.appendChild(rootElement);
 Attr attr = doc.createAttribute("rootAttr");
 attr.setValue("true");
 rootElement.setAttributeNode(attr);

 Element firstChildNode = doc.createElement("FirstChildNode");
 attr = doc.createAttribute(FLAG);
 attr.setValue("1.1");
 firstChildNode.setAttributeNode(attr);
 firstChildNode.appendChild(doc.createTextNode("This is the Child Element of FirstChild"));

 Element firstSubChildNode = doc.createElement("FirstSubChildNode");
 attr = doc.createAttribute(FLAG);
 attr.setValue("1.1.1");
 firstSubChildNode.setAttributeNode(attr);
 firstSubChildNode.appendChild(doc.createTextNode("This is the Sub Child Element of FirstChild"));

 Element firstChildElement = doc.createElement("FirstChild");
 attr = doc.createAttribute(FLAG);
 attr.setValue("1");
 firstChildElement.setAttributeNode(attr);
 firstChildElement.appendChild(doc.createTextNode("This is the First Child in DevelopProjects.com"));
 firstChildElement.appendChild(firstChildNode);
 firstChildNode.appendChild(firstSubChildNode);
 rootElement.appendChild(firstChildElement);

 Element secondChildElement = doc.createElement("SecondChild");
 attr = doc.createAttribute(FLAG);
 attr.setValue("2");
 secondChildElement.setAttributeNode(attr);
 secondChildElement.appendChild(doc.createTextNode("This is the Second Child in DevelopProjects.com"));
 rootElement.appendChild(secondChildElement);

 Element thirdChildElement = doc.createElement("ThirdChild");
 attr = doc.createAttribute(FLAG);
 attr.setValue("3");
 thirdChildElement.setAttributeNode(attr);
 rootElement.appendChild(thirdChildElement);

 Element fourthChildElement = doc.createElement("FourthChild");
 attr = doc.createAttribute(FLAG);
 attr.setValue("4");
 fourthChildElement.setAttributeNode(attr);
 rootElement.appendChild(fourthChildElement);

 Element fifthChildElement = doc.createElement("FifthChild");
 attr = doc.createAttribute(FLAG);
 attr.setValue("5");
 fifthChildElement.setAttributeNode(attr);
 rootElement.appendChild(fifthChildElement);

 Element sixthChildElement = doc.createElement("SixthChild");
 attr = doc.createAttribute(FLAG);
 attr.setValue("6");
 sixthChildElement.setAttributeNode(attr);
 rootElement.appendChild(sixthChildElement);

 Element seventhChildElement = doc.createElement("SeventhChild");
 attr = doc.createAttribute(FLAG);
 attr.setValue("7");
 seventhChildElement.setAttributeNode(attr);
 rootElement.appendChild(seventhChildElement);

 Element eighthChildElement = doc.createElement("EigthChild");
 attr = doc.createAttribute(FLAG);
 attr.setValue("8");
 eighthChildElement.setAttributeNode(attr);
 rootElement.appendChild(eighthChildElement);

 Element ninthChildElement = doc.createElement("NinethChild");
 attr = doc.createAttribute(FLAG);
 attr.setValue("9");
 ninthChildElement.setAttributeNode(attr);
 rootElement.appendChild(ninthChildElement);

 Element tenthElement = doc.createElement("TenthElement");
 tenthElement.appendChild(doc.createTextNode("This is how I'm 10 Now"));
 rootElement.appendChild(tenthElement);

 TransformerFactory transformerFactory = TransformerFactory.newInstance();
 Transformer transformer = transformerFactory.newTransformer();
 DOMSource source = new DOMSource(doc);
 StreamResult result = new StreamResult(writer);
 transformer.transform(source, result);

 } catch (ParserConfigurationException pce) {
 System.out.println("Exception occurred while creating the DocumentBuilder::"+pce);
 }catch (TransformerException tfe) {
 System.out.println("Exception occurred while transforming the XML::"+tfe);
 }
 System.out.println("generateXML() Exit");
 return writer.toString();
 }

}

How to Convert a Document DOM Object to XML String Object

Explanation:

Document is an interface in the org.w3c.dom package which extends the Node interface. This example converts a Document object to a String object which holds valid XML. The method getXMLFromDOM() takes a Document object as its argument, so we first create a document object using the method createDocument().
The createDocument() method is in turn explained in my post How to Create or generate a Document Object.
In another post, you can find a sample program to convert an XML String to a DOM / Document object.

In the method getXMLFromDOM() of the current example,
DOMSource acts as a holder for a transformation source tree in the form of a Document Object Model (DOM) tree. It implements the Source interface.
StreamResult acts as a holder for a transformation result, which may be XML, plain text, HTML, or some other form of markup. It is a class in the javax.xml.transform.stream package which implements the Result interface.
Transformer is an abstract class in the javax.xml.transform package; we obtain an instance from transformerFactory.newTransformer(). Its transform() method transforms the DOM source into the stream result. The getWriter() method of StreamResult returns the Writer which received the output (here a StringWriter), and its toString() method gives us the XML as a String object.
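
A Transformer can also be configured before transform() is called. The sketch below is an illustration under assumptions: it uses the standard OutputKeys constants to drop the XML declaration and to request indentation (the exact whitespace produced depends on the JDK), the text hello inside SampleDoc is added only for the demonstration, and TransformerOptionsSketch, sampleDocument and toXml are hypothetical names.

```java
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class TransformerOptionsSketch {

    // Builds <SampleDoc attr="true">hello</SampleDoc> for the demonstration.
    public static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("SampleDoc");
            root.setAttribute("attr", "true");
            root.appendChild(doc.createTextNode("hello"));
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create a DocumentBuilder", e);
        }
    }

    // Serializes the document, optionally without the <?xml ...?> declaration.
    public static String toXml(Document doc, boolean omitDeclaration) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION,
                    omitDeclaration ? "yes" : "no");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize the document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = sampleDocument();
        System.out.println(toXml(doc, false)); // with the XML declaration
        System.out.println(toXml(doc, true));  // element content only
    }
}
```

Omitting the declaration is handy when the generated string is embedded in a larger XML message, where a second declaration would not be allowed.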

 

 Java example to Convert a DOM / Document Object to XML String

package com.developprojects.java;

import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DocumentToString {

    private static TransformerFactory transformerFactory = null;

    public static void main(String[] args) {
        transformerFactory = TransformerFactory.newInstance();
        String xml = "";
        try {
            Document sampleDoc = createDocument();
            xml = getXMLFromDOM(sampleDoc);
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (TransformerException e) {
            // Also covers TransformerConfigurationException, which is a subclass
            e.printStackTrace();
        }
        System.out.println("String Generated from Document is " + xml);
    }

    // This method will convert a Document to String Object
    public static String getXMLFromDOM(Document requestXMLDoc)
            throws TransformerException {
        if (requestXMLDoc == null) {
            System.out.println("Supplied document object is null");
            throw new IllegalArgumentException("Document cannot be null");
        }
        DOMSource source = new DOMSource(requestXMLDoc);
        StreamResult result = new StreamResult(new StringWriter());
        // Failures are passed on to the caller instead of being swallowed here
        Transformer transformer = transformerFactory.newTransformer();
        transformer.transform(source, result);
        return result.getWriter().toString();
    }

    // This method will create a Document Object
    public static Document createDocument()
            throws ParserConfigurationException {
        DocumentBuilderFactory docFactory = DocumentBuilderFactory
                .newInstance();
        DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
        Document doc = docBuilder.newDocument();

        Element rootElement = doc.createElement("SampleDoc");
        doc.appendChild(rootElement);
        Attr attr = doc.createAttribute("attr");
        attr.setValue("true");
        rootElement.setAttributeNode(attr);
        return doc;
    }
}