Easy XML parsing
Posted by Davy Brion on 29th September 2007
I’ve mentioned my manual xml parsing in Noma before, but i wanted to give a detailed overview of how it works. I think this approach is pretty nice, and should be very easy to reuse. First of all, let me restate what i needed before i started working on it and why i ultimately chose this solution.
I needed to parse nhibernate mapping files (which have to conform to the nhibernate-mapping.xsd schema) and i wanted all of the data to be available in an easy-to-use object model. The easiest approach would be to generate classes from nhibernate-mapping.xsd with the xsd.exe tool. Then it would simply be a matter of deserializing the nhibernate mapping file and you’d have your data available. Unfortunately, using classes generated by xsd.exe is a pain and i’d only consider it for a very simple schema. The nhibernate mapping schema is very extensive and anything but simple, so i didn’t go for this approach.

I could also let xsd.exe generate a DataSet based on the schema. I’ll go out of my way to avoid DataSets even when i get paid to use them, so i sure as hell won’t use them for a project i’m working on in my spare time.

Another interesting approach was offered by a coworker of mine. The idea was to use xslt to transform the nhibernate mapping xml into another xml format that i could simply deserialize into my own object model. That is a very interesting approach, but i don’t like using xslt so i didn’t go for it either.
I didn’t really find another good approach so i just started parsing the xml files manually. It was a pain at first and the work progressed very slowly. But i had my tests to back me up, so i was often refactoring my parsing code to make it more reusable and to allow me to write less code as i went along to parse different kinds of xml elements. The result is a generic NodeParser class which really makes it easy to do your own xml parsing. If you ever need to parse xml files and you don’t have control over the schema, this approach is definitely worthy of consideration.
It all starts with the following abstract base class:
public abstract class NodeParser<T>
This class defines the following abstract method:
public abstract T ParseNode(XmlNode node);
If you need to parse a specific xml element, simply inherit from this class and pass the type of the resulting object (the one that will hold the data offered by the xml element) as the generic parameter. You will then implement the ParseNode method so it will create an instance of the type you defined and pass it the data contained in the given XmlNode. The NodeParser class provides several protected methods to make it very easy to retrieve the data from the XmlNode.
The first thing i wanted was a way to retrieve the value of an attribute of the element, so the following protected method is provided:
protected string GetAttributeValue(XmlNode node, string attribute, string defaultValue)
{
    if (node.Attributes[attribute] == null)
    {
        return defaultValue;
    }

    return node.Attributes[attribute].Value;
}
As you can see, there’s not much to it. But you obviously don’t want to write this check each and every time you need an attribute’s value, so you can just use this method. Specific overloads also exist to retrieve boolean or numeric attribute values instead of string values.
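The boolean and numeric overloads aren’t shown in this post. A minimal sketch of what they could look like, assuming they fall back to the string overload and convert the result with XmlConvert. The class name AttributeReader and the Demo method are hypothetical and only there to keep the example self-contained:

```csharp
using System.Xml;

// Hypothetical stand-in for NodeParser<T>, reduced to the attribute helpers.
public class AttributeReader
{
    protected string GetAttributeValue(XmlNode node, string attribute, string defaultValue)
    {
        XmlAttribute attr = node.Attributes[attribute];
        return attr == null ? defaultValue : attr.Value;
    }

    // Assumed overload: XmlConvert.ToBoolean accepts "true", "false", "1" and "0".
    protected bool GetAttributeValue(XmlNode node, string attribute, bool defaultValue)
    {
        string value = GetAttributeValue(node, attribute, (string)null);
        return value == null ? defaultValue : XmlConvert.ToBoolean(value);
    }

    // Assumed overload for integer attributes such as batch-size.
    protected int GetAttributeValue(XmlNode node, string attribute, int defaultValue)
    {
        string value = GetAttributeValue(node, attribute, (string)null);
        return value == null ? defaultValue : XmlConvert.ToInt32(value);
    }

    // Public entry point so the overloads can be exercised from outside the class.
    public static string Demo()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<list inverse=\"true\" batch-size=\"10\" />");
        XmlNode node = doc.DocumentElement;
        AttributeReader reader = new AttributeReader();

        bool inverse = reader.GetAttributeValue(node, "inverse", false);
        int batchSize = reader.GetAttributeValue(node, "batch-size", 1);
        bool lazy = reader.GetAttributeValue(node, "lazy", false); // attribute is missing

        return inverse + "," + batchSize + "," + lazy;
    }
}
```

The type of the default value selects the overload, which is why the string version has to be called with (string)null when no default is wanted.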
That takes care of retrieving simple values of attributes. The nhibernate mapping xsd defines a lot of enumerations though, so i needed something easy to retrieve those values as well. Basically i just wanted a method where i could pass the xml attribute containing the enumeration value as a string, and it would return my own custom enumeration value. The following method provides that ability:
protected TValue GetAttributeValue<TValue>(XmlNode node, string attribute, TValue defaultValue,
    ValueMapper<TValue> mapper)
{
    string value = GetAttributeValue(node, attribute, (string)null);

    if (value == null)
    {
        return defaultValue;
    }

    return mapper.GetValue(value);
}
More information about the ValueMapper type can be found here. This allows me to retrieve whatever enumerated value i want to, provided that i have a class derived from ValueMapper which maps the string values to the specific enumerated values.
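To make the idea concrete without the original ValueMapper source at hand, here is a rough sketch of what such a mapper might look like. Only the GetValue(string) method is confirmed by the code above. The dictionary-based implementation, the Map helper, the FetchMode members and the FetchModeMapper class are all assumptions made for illustration:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enumeration: the members of the real FetchMode are not shown in the post.
public enum FetchMode { Select, Join }

// Hypothetical base class: maps the string found in the xml to a typed value.
public abstract class ValueMapper<TValue>
{
    private readonly Dictionary<string, TValue> _values = new Dictionary<string, TValue>();

    // Derived classes register each xml string they understand.
    protected void Map(string xmlValue, TValue value)
    {
        _values.Add(xmlValue, value);
    }

    public TValue GetValue(string xmlValue)
    {
        TValue value;
        if (!_values.TryGetValue(xmlValue, out value))
        {
            throw new ArgumentException("Unknown value: " + xmlValue);
        }
        return value;
    }
}

public class FetchModeMapper : ValueMapper<FetchMode>
{
    public FetchModeMapper()
    {
        Map("select", FetchMode.Select);
        Map("join", FetchMode.Join);
    }
}
```

With a mapper like this, new FetchModeMapper().GetValue("join") yields FetchMode.Join, and an unexpected string fails loudly instead of silently falling back to a default.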
So what do we have now? A very easy way to retrieve the value of an attribute in any way we want to: a string, a numeric value, a boolean, or an enumerated value. If we need support for more types (dates for instance), it’ll be easy to add another overload for that specific type. We can get all of the values of all the attributes of an xml element very easily now. Now let’s move on to child elements and collections of child elements.
The following methods provide a way to retrieve one child node, or a list of child nodes based on an xpath query:
protected XmlNodeList FindChildNodes(XmlNode parentNode, string xpath)
{
    return parentNode.SelectNodes(xpath, GetNamespaceManager(parentNode));
}

protected XmlNode FindChildNode(XmlNode parentNode, string xpath)
{
    return parentNode.SelectSingleNode(xpath, GetNamespaceManager(parentNode));
}

private static XmlNamespaceManager GetNamespaceManager(XmlNode node)
{
    XmlNamespaceManager nsManager = new XmlNamespaceManager(node.OwnerDocument.NameTable);
    nsManager.AddNamespace(Constants.XmlNamespacePrefix, Constants.XmlNamespace);
    return nsManager;
}
Even though xpath is very powerful, i don’t really like using it so if i can hide it in a method, i surely will:
protected XmlNode GetChildNodeOfType(string elementName, XmlNode parentNode)
{
    return FindChildNode(parentNode, "./" + Constants.XmlNamespacePrefix + ":" + elementName);
}

protected XmlNodeList GetChildNodesOfType(string elementName, XmlNode parentNode)
{
    return FindChildNodes(parentNode, "./" + Constants.XmlNamespacePrefix + ":" + elementName);
}
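The namespace manager and the prefix in these xpath expressions matter because the elements of a mapping file live in a default namespace. The following sketch shows the difference on a tiny fragment. The prefix "nh" and the namespace "urn:nhibernate-mapping-2.2" are assumed values, since the contents of the Constants class aren’t shown here:

```csharp
using System.Xml;

public static class NamespaceLookupDemo
{
    // Assumed values: the post doesn't show what Constants contains.
    private const string Prefix = "nh";
    private const string Namespace = "urn:nhibernate-mapping-2.2";

    // Returns { matches without prefix, matches with prefix }.
    public static int[] CountKeyNodes()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<list name=\"Lines\" xmlns=\"" + Namespace + "\">" +
            "<key column=\"OrderId\" />" +
            "</list>");

        XmlNamespaceManager nsManager = new XmlNamespaceManager(doc.NameTable);
        nsManager.AddNamespace(Prefix, Namespace);

        XmlNode list = doc.DocumentElement;

        // An unprefixed name in xpath only matches elements in no namespace.
        int unprefixed = list.SelectNodes("./key").Count;

        // With the prefix bound to the mapping namespace, the child is found.
        int prefixed = list.SelectNodes("./" + Prefix + ":key", nsManager).Count;

        return new int[] { unprefixed, prefixed };
    }
}
```

The unprefixed query finds nothing and the prefixed one finds the key element, which is exactly the detail the two helper methods hide from the rest of the parsing code.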
So now i can use these methods to easily retrieve a child element, or a list of elements simply by passing the parent xml node and the name of the element(s) i’m looking for. That’s nice, but then i still need to convert that element to a typed object, or perhaps a list of typed objects. Fortunately, i can reuse the NodeParser type for this:
protected TObject GetObjectFromNode<TObject>(string elementType, XmlNode node,
    NodeParser<TObject> nodeParser)
{
    XmlNode objectNode = GetChildNodeOfType(elementType, node);

    if (objectNode == null)
    {
        return default(TObject);
    }

    return nodeParser.ParseNode(objectNode);
}
This method looks up the requested child element in the xml node passed into it and returns a typed object for it, using the NodeParser that was passed in to do the actual parsing. If the child element isn’t present, it returns the default value of the type (null for classes). A NodeParser could very well contain other NodeParser types to parse specific child elements. These contained NodeParsers can then be passed to this method to retrieve other typed objects. This might seem weird at first, but you’ll see an example of this later on which will make it clear.
One more thing i needed was an easy way to retrieve a whole collection of typed objects representing multiple child elements within an xml node. We can use the same trick we used in the GetObjectFromNode method for this:
protected Dictionary<string, TObject> CreateDictionaryOfObjectsInNode<TObject>(string elementType,
    XmlNode node, NodeParser<TObject> nodeParser)
    where TObject : INamedMapping
{
    Dictionary<string, TObject> objects = new Dictionary<string, TObject>();

    foreach (XmlNode objectNode in GetChildNodesOfType(elementType, node))
    {
        TObject typedObject = nodeParser.ParseNode(objectNode);
        objects.Add(typedObject.Name, typedObject);
    }

    return objects;
}
This method returns a dictionary containing typed objects representing all of the child elements in the parent xml node that match the given elementType. And obviously, it uses another typed NodeParser to parse those child elements.
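The INamedMapping constraint is what lets the method use Name as the dictionary key. The interface itself isn’t shown in this post, so the following is only a guess at its shape, together with a simplified, hypothetical PropertyMapping to show how the resulting dictionary would be used:

```csharp
using System.Collections.Generic;

// Assumed shape: the post only shows that the dictionary is keyed on Name.
public interface INamedMapping
{
    string Name { get; }
}

// Simplified, hypothetical mapping class with just a name and a column.
public class PropertyMapping : INamedMapping
{
    private readonly string _name;
    private readonly string _column;

    public PropertyMapping(string name, string column)
    {
        _name = name;
        _column = column;
    }

    public string Name { get { return _name; } }
    public string Column { get { return _column; } }
}

public static class NamedMappingDemo
{
    public static string ColumnFor(string propertyName)
    {
        Dictionary<string, PropertyMapping> properties = new Dictionary<string, PropertyMapping>();

        foreach (PropertyMapping mapping in new PropertyMapping[] {
            new PropertyMapping("Street", "street"),
            new PropertyMapping("City", "city_name") })
        {
            // Same call as in CreateDictionaryOfObjectsInNode: Add throws on a duplicate name.
            properties.Add(mapping.Name, mapping);
        }

        return properties[propertyName].Column;
    }
}
```

Because Dictionary.Add throws an ArgumentException for a key that already exists, two child elements with the same name would make the parse fail rather than silently overwrite each other.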
So far, all of this has been pretty abstract and i can imagine it’ll be a lot clearer once you take a look at the following examples.
Let’s start with a simple one:
public class ManyToManyMappingParser : NodeParser<ManyToManyMapping>
{
    public override ManyToManyMapping ParseNode(XmlNode node)
    {
        return new ManyToManyMapping(
            GetAttributeValue(node, Attributes.Class, DefaultValues.ManyToManyClassName),
            GetAttributeValue(node, Attributes.Column, DefaultValues.ManyToManyColumnName),
            GetAttributeValue<FetchMode>(node, Attributes.Fetch, DefaultValues.ManyToManyFetchMode,
                MapperProvider.FetchModeMapper),
            GetAttributeValue<NotFoundAction>(node, Attributes.NotFound, DefaultValues.ManyToManyNotFoundAction,
                MapperProvider.NotFoundActionMapper),
            GetAttributeValue(node, Attributes.Where, DefaultValues.ManyToManyWhereClause));
    }
}
This NodeParser parses nhibernate’s many-to-many element and converts it to a ManyToManyMapping instance. Three simple attribute values are retrieved (Class, Column and Where), together with two enumerated values (FetchMode and NotFoundAction). As you can see, this is not a lot of code and it is very readable.
The many-to-many element is a possible child element of a list mapping. So let’s see how a list element is parsed:
public class ListMappingParser : NodeParser<ListMapping>
{
    private readonly CompositeElementMappingParser _compositeElementParser = new CompositeElementMappingParser();
    private readonly ElementMappingParser _elementMappingParser = new ElementMappingParser();
    private readonly IndexMappingParser _indexMappingParser = new IndexMappingParser();
    private readonly ManyToManyMappingParser _manyToManyMappingParser = new ManyToManyMappingParser();
    private readonly OneToManyMappingParser _oneToManyMappingParser = new OneToManyMappingParser();

    public override ListMapping ParseNode(XmlNode node)
    {
        return new ListMapping(
            GetAttributeValue<Access>(node, Attributes.Access, DefaultValues.ListAccess, MapperProvider.AccessMapper),
            GetAttributeValue(node, Attributes.BatchSize, DefaultValues.ListBatchSize),
            GetAttributeValue<Cascade>(node, Attributes.Cascade, DefaultValues.ListCascade,
                MapperProvider.CascadeMapper),
            GetAttributeValue(node, Attributes.Check, DefaultValues.ListCheck),
            GetObjectFromNode<CompositeElementMapping>(Elements.CompositeElement, node, _compositeElementParser),
            GetObjectFromNode<ElementMapping>(Elements.Element, node, _elementMappingParser),
            GetAttributeValue<FetchMode>(node, Attributes.Fetch, DefaultValues.ListFetchMode,
                MapperProvider.FetchModeMapper),
            GetAttributeValue(node, Attributes.Generic, DefaultValues.ListGeneric),
            GetObjectFromNode<IndexMapping>(Elements.Index, node, _indexMappingParser),
            GetAttributeValue(node, Attributes.Inverse, DefaultValues.ListInverse),
            GetAttributeValueFromChildElement(Elements.Key, Attributes.Column, node),
            GetObjectFromNode<ManyToManyMapping>(Elements.ManyToMany, node, _manyToManyMappingParser),
            GetAttributeValue(node, Attributes.Name, DefaultValues.ListName),
            GetObjectFromNode<OneToManyMapping>(Elements.OneToMany, node, _oneToManyMappingParser),
            GetAttributeValue(node, Attributes.Persister, DefaultValues.ListPersister),
            GetAttributeValue(node, Attributes.OptimisticLock,
                DefaultValues.ListReAcquireOptimisticLockWhenDirty),
            GetAttributeValue(node, Attributes.Schema, DefaultValues.ListSchema),
            GetAttributeValue(node, Attributes.Table, DefaultValues.ListTableName),
            GetAttributeValue(node, Attributes.Where, DefaultValues.ListWhere));
    }
}
Whoa, that’s a lot all of a sudden! But when you think about the complexity of the list element in an nhibernate mapping file, this is actually not a lot of code. This specific NodeParser contains references to other NodeParsers to parse CompositeElementMappings, ElementMappings, IndexMappings, ManyToManyMappings and OneToManyMappings, all of which are possible child elements of a list mapping. Take a look at the calls to GetObjectFromNode. By passing along a NodeParser to parse a specific child element, the GetObjectFromNode method can retrieve the typed mapping object you want. And all you needed to do was call GetObjectFromNode. Well, you also had to develop the specific NodeParser class, but at least you can reuse that one later on. Anyway, i hope the usage of GetObjectFromNode is now very clear.
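The ListMappingParser also calls GetAttributeValueFromChildElement, which isn’t shown in this post. Judging by its arguments, it reads one attribute from a named child element (here the column of the key element). A possible sketch, assuming it returns null when either the child element or the attribute is missing. The class name, the prefix "nh" and the namespace value are assumptions as well:

```csharp
using System.Xml;

// Hypothetical stand-in for NodeParser<T>, reduced to what this helper needs.
public class ChildAttributeReader
{
    private const string Prefix = "nh";                            // assumed
    private const string Namespace = "urn:nhibernate-mapping-2.2"; // assumed

    protected XmlNode GetChildNodeOfType(string elementName, XmlNode parentNode)
    {
        XmlNamespaceManager nsManager = new XmlNamespaceManager(parentNode.OwnerDocument.NameTable);
        nsManager.AddNamespace(Prefix, Namespace);
        return parentNode.SelectSingleNode("./" + Prefix + ":" + elementName, nsManager);
    }

    // Assumed behaviour: null when the child element or the attribute is missing.
    protected string GetAttributeValueFromChildElement(string elementName, string attribute,
        XmlNode parentNode)
    {
        XmlNode child = GetChildNodeOfType(elementName, parentNode);
        if (child == null)
        {
            return null;
        }

        XmlAttribute attr = child.Attributes[attribute];
        return attr == null ? null : attr.Value;
    }

    // Public entry point so the helper can be tried on a small xml fragment.
    public static string KeyColumnOf(string listXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(listXml);
        return new ChildAttributeReader()
            .GetAttributeValueFromChildElement("key", "column", doc.DocumentElement);
    }
}
```

Flattening a one-attribute child element into a plain string like this keeps the mapping classes smaller than giving every such element its own NodeParser.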
I still haven’t shown you an example of the CreateDictionaryOfObjectsInNode method being used so here’s one:
public class CompositeElementMappingParser : NodeParser<CompositeElementMapping>
{
    private readonly ManyToOneMappingParser _manyToOneMappingParser = new ManyToOneMappingParser();
    private readonly PropertyMappingParser _propertyMappingParser = new PropertyMappingParser();

    public override CompositeElementMapping ParseNode(XmlNode node)
    {
        return new CompositeElementMapping(
            GetAttributeValue(node, Attributes.Class, DefaultValues.CompositeElementClassName),
            GetAttributeValueFromChildElement(Elements.Parent, Attributes.Name, node),
            CreateDictionaryOfObjectsInNode<ManyToOneMapping>(Elements.ManyToOne, node, _manyToOneMappingParser),
            CreateDictionaryOfObjectsInNode<PropertyMapping>(Elements.Property, node, _propertyMappingParser));
    }
}
A composite element mapping can contain one or more property mappings, and one or more many-to-one mappings. With the little bit of code shown above we can parse the entire node and convert it to a typed object that will be easy to use.
I hope the entire usage of the NodeParser class is clear and i hope i demonstrated that doing your own xml parsing doesn’t have to be that cumbersome. It’s still a lot more work than simply deserializing into classes generated by xsd.exe but i think the end result of this approach is a lot better.