Prashant Sahni Blog

Parsing XML using XPath with Nokogiri in Ruby

XPath is a language for finding information in an XML document: it lets you navigate through the elements and attributes of the document. We can use XPath to traverse an XML file in Ruby too, and we will use the Nokogiri gem for that.

XPath is a very powerful tool for fetching exactly the information you need, whether that means reading elements or attributes from an XML file.

Please read about the XPath syntax at the following link:

For this demo, let us consider an XML file, employee.xml, that holds information about employees.

<?xml version="1.0"?>
<Employees>
    <Employee id="1111" type="admin">
        <firstname>John</firstname>
        <lastname>Watson</lastname>
        <age>30</age>
        <email>johnwatson@sh.com</email>
    </Employee>
    <Employee id="2222" type="admin">
        <firstname>Sherlock</firstname>
        <lastname>Homes</lastname>
        <age>32</age>
        <email>sherlock@sh.com</email>
    </Employee>
    <Employee id="3333" type="user">
        <firstname>Jim</firstname>
        <lastname>Moriarty</lastname>
        <age>52</age>
        <email>jim@sh.com</email>
    </Employee>
    <Employee id="4444" type="user">
        <firstname>Mycroft</firstname>
        <lastname>Holmes</lastname>
        <age>41</age>
        <email>mycroft@sh.com</email>
    </Employee>
</Employees>

As we can see, there are 4 employees. Each Employee element has two attributes (id and type) and four child nodes (firstname, lastname, age and email).

Let's write some code.

We are going to use the Nokogiri Ruby gem, which provides a beautiful API for parsing XML along with the ability to search documents via XPath. More info: nokogiri

Examples:

Ex 1. Read firstname of all employees

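require 'nokogiri'

# Parse employee.xml; the block form closes the file once parsing is done
doc = File.open("employee.xml") { |f| Nokogiri::XML(f) }

puts "== First name of all employees"
expression = "/Employees/Employee/firstname"
nodes = doc.xpath(expression)

# Each match is a <firstname> element; .text returns its content
nodes.each do |node|
  p node.text
end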
Output:
"John"
"Sherlock"
"Jim"
"Mycroft"

Ex 2. Read a specific employee using employee id

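# doc is the document parsed in Ex 1; match the employee whose id attribute is 2222
expression = "/Employees/Employee[@id='2222']"
nodes = doc.xpath(expression)

# Print each child element, skipping the whitespace text nodes between them
nodes.children.each do |node|
  p "#{node.name}: #{node.text}" if node.element?
end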

Output:
"firstname: Sherlock"
"lastname: Homes"
"age: 32"
"email: sherlock@sh.com"

Ex 3. Read firstname of all employees who are admin

# [@type='admin'] keeps only Employee elements whose type attribute is admin
expression = "/Employees/Employee[@type='admin']/firstname"
nodes = doc.xpath(expression)

nodes.each do |node|
  p node.text
end
Output:
"John"
"Sherlock"

Ex 4. Read firstname of all employees who are older than 40 years

# [age>40] compares the text of each Employee's age child as a number
expression = "/Employees/Employee[age>40]/firstname"
nodes = doc.xpath(expression)

nodes.each do |node|
  p node.text
end
Output:
"Jim"
"Mycroft"

Ex 5. Read firstname of the first two employees (in the order they appear in the XML file)

# position() is the 1-based index of each Employee among the selected Employee elements
expression = "/Employees/Employee[position() <= 2]/firstname"
nodes = doc.xpath(expression)

nodes.each do |node|
  p node.text
end

Output:
"John"
"Sherlock"

Thanks for reading :-)
