

XML:

Using XPath with the XmlDocument class

In a previous chapter, we used the XmlDocument class to get information out of an XML file. We did it by using the ChildNodes property. Although it was a simple example, the code was not easy to read, so in this chapter we will look at a different approach. The technology we will use is called XPath, and it is maintained by the same organization that created the XML standard. XPath is in fact a complete query language with lots of possibilities, but since this article is not an XPath tutorial, we will only look at some basic queries. However, even with simple queries, XPath is still very useful, as you will see in the following examples.

The XmlDocument class has several methods that take an XPath expression as a parameter and return the XmlNode(s) matching that expression. In this chapter we will look at two of them: SelectSingleNode(), which returns the first node that matches the XPath expression, and SelectNodes(), which returns an XmlNodeList collection containing every XmlNode that matches the XPath expression.
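To make the difference between the two methods concrete, here is a minimal sketch. The XML string and the element names in it (users, user, name) are invented for illustration, and the helper class SelectDemo with its methods FirstMatch() and CountMatches() is hypothetical. LoadXml() is used instead of Load() so the sketch works without a file or a network connection.

```csharp
using System;
using System.Xml;

namespace ParsingXml
{
    static class SelectDemo
    {
        // Hypothetical sample document, invented for this sketch
        const string SampleXml =
            "<users>" +
            "<user><name>John</name></user>" +
            "<user><name>Jane</name></user>" +
            "</users>";

        // Returns the text of the first node matching the XPath, or null if nothing matches
        public static string FirstMatch(string xpath)
        {
            XmlDocument xmlDoc = new XmlDocument();
            xmlDoc.LoadXml(SampleXml);
            XmlNode node = xmlDoc.SelectSingleNode(xpath);
            return node != null ? node.InnerText : null;
        }

        // Returns how many nodes match the XPath
        public static int CountMatches(string xpath)
        {
            XmlDocument xmlDoc = new XmlDocument();
            xmlDoc.LoadXml(SampleXml);
            return xmlDoc.SelectNodes(xpath).Count;
        }
    }
}
```

With this sample, SelectDemo.FirstMatch("//users/user/name") returns "John" even though two nodes match, while SelectDemo.CountMatches("//users/user/name") returns 2. A query that matches nothing, such as "//users/user/email", makes FirstMatch() return null, which is why the examples in this chapter check for null before using a node.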

We will try both of the methods mentioned above, but instead of using the XML from the previous chapters, we will use an RSS document. RSS feeds are XML documents built with a specific structure, which allows different RSS readers to interpret the same information and display it in their own way.

We will use an RSS feed from CNN, located at http://rss.cnn.com/rss/edition_world.rss, with news from across the world. If you open it in your browser, it may be rendered in a nice way that gives you an overview of the feed and lets you subscribe to it, but don't be fooled by that: under the hood, it's just XML, which you will see if you do a "View source" in your browser. The root element is called "rss". The rss element normally contains a single "channel" element, and within that element we find information about the feed as well as the "item" nodes, which are the news items we usually want.

In the following example, we will use the SelectSingleNode() method to get the title of the feed. If you look at the XML, you will see that there is a <title> element as a child of the <channel> element, which in turn is a child of the <rss> element, the root. That query can be described like this in XPath:

//rss/channel/title

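We simply write the names of the elements we're looking for, separated by a forward slash (/), which states that each element should be a child of the element named before the slash. The leading double slash (//) tells XPath to look for the first element, rss, anywhere in the document rather than only at the top. Using this XPath is as simple as this: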

using System;
using System.Xml;

namespace ParsingXml
{
    class Program
    {
        static void Main(string[] args)
        {
            // Download and parse the RSS feed
            XmlDocument xmlDoc = new XmlDocument();
            xmlDoc.Load("http://rss.cnn.com/rss/edition_world.rss");

            // Select the first <title> element below <rss>/<channel>
            XmlNode titleNode = xmlDoc.SelectSingleNode("//rss/channel/title");
            if(titleNode != null)
                Console.WriteLine(titleNode.InnerText);
            Console.ReadKey();
        }
    }
}

We use the SelectSingleNode() method to locate the <title> element, which simply takes our XPath as a string parameter. We then check to make sure that it returned a result, and if it did, we print the InnerText of the located node, which should be the title of the RSS feed.
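The query above starts with a double slash, which matches at any depth of the document. The following sketch shows how the start of the path affects what is matched. It uses a made-up, heavily trimmed RSS-like string rather than the real CNN feed, and the helper class PathDemo with its Count() method is hypothetical.

```csharp
using System;
using System.Xml;

namespace ParsingXml
{
    static class PathDemo
    {
        // Hypothetical, heavily trimmed RSS-like document, invented for this sketch
        const string SampleRss =
            "<rss><channel>" +
            "<title>Feed title</title>" +
            "<item><title>First item</title></item>" +
            "<item><title>Second item</title></item>" +
            "</channel></rss>";

        // Returns how many nodes in the sample document match the XPath
        public static int Count(string xpath)
        {
            XmlDocument xmlDoc = new XmlDocument();
            xmlDoc.LoadXml(SampleRss);
            return xmlDoc.SelectNodes(xpath).Count;
        }
    }
}
```

With this sample, PathDemo.Count("/rss/channel/title") and PathDemo.Count("//rss/channel/title") both return 1, while PathDemo.Count("//title") returns 3 because it also matches the title of each item. Spelling out the path is therefore the safer choice when the same element name appears at several levels, as it does in an RSS feed.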

In the next example, we will use the SelectNodes() method to find all the item nodes in the RSS feed and then print out information about them:

using System;
using System.Xml;

namespace ParsingXml
{
    class Program
    {
        static void Main(string[] args)
        {
            // Download and parse the RSS feed
            XmlDocument xmlDoc = new XmlDocument();
            xmlDoc.Load("http://rss.cnn.com/rss/edition_world.rss");

            // Select every <item> element below <rss>/<channel>
            XmlNodeList itemNodes = xmlDoc.SelectNodes("//rss/channel/item");
            foreach(XmlNode itemNode in itemNodes)
            {
                // These paths are relative to the current item node
                XmlNode titleNode = itemNode.SelectSingleNode("title");
                XmlNode dateNode = itemNode.SelectSingleNode("pubDate");
                if((titleNode != null) && (dateNode != null))
                    Console.WriteLine(dateNode.InnerText + ": " + titleNode.InnerText);
            }
            Console.ReadKey();
        }
    }
}

The SelectNodes() method takes an XPath query as a string, just like we saw in the previous example, and returns the matching XmlNode objects in an XmlNodeList collection. We iterate through it with a foreach loop, and from each of the item nodes, we ask for the child nodes called title and pubDate (published date) by calling SelectSingleNode() directly on the item node. If we get both of them, we print out the date and the title on the same line and then move on.

In our example, we wanted two different values from each item node, which is why we asked for the item nodes and then processed each of them. However, if we only needed, for example, the title of each item, we could change the XPath query to something like this:

//rss/channel/item/title

It will match each title node in each of the item nodes. Here's the query with some C# code to make it all happen:

XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load("http://rss.cnn.com/rss/edition_world.rss");
// Select the <title> element of every <item> in one query
XmlNodeList titleNodes = xmlDoc.SelectNodes("//rss/channel/item/title");
foreach(XmlNode titleNode in titleNodes)
    Console.WriteLine(titleNode.InnerText);
Console.ReadKey();
