A relaxing approach to XSD

In my post last week, I said I'd write up the guidelines I use when I work with XSD. The approach I take is shaped by two factors: RelaxNG and a desire to design schemas that are easy to work with both as XML and when mapped to objects. It embraces how XML works and avoids the ways that XSD seeks to extend that. Here are the guidelines I follow:

1) Focus on document shape, not named complex types

Apps that consume XML look at element and attribute names and namespaces in order to decide what to do. This is true for code you write against XML APIs like XmlReader or XPathDocument. It's also true for XmlSerializer, which drives its behavior off of a CLR type annotated with element and attribute names and namespaces. In practice, the only time we really deal with XSD types directly in an app is when we map the single text value inside an element or attribute to a simple type like a date or an integer. (We also deal with types directly when someone uses xsi:type to substitute one complex type for another, but as I'll argue shortly, you should not allow that in a schema.) Your schemas should focus on the structure of the elements and attributes, not their named complex types.

So, in practice, instead of writing this:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example-org:xsd"
           xmlns:tns="urn:example-org:xsd"
           elementFormDefault="qualified">

  <xs:complexType name="person">
    <xs:sequence>
      <xs:element name="name" type="xs:string" />
      <xs:element name="age" type="xs:int" />
    </xs:sequence>
  </xs:complexType>

  <xs:element name="person" type="tns:person" />

</xs:schema>

I prefer this:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example-org:xsd"
           xmlns:tns="urn:example-org:xsd"
           elementFormDefault="qualified">

  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string" />
        <xs:element name="age" type="xs:int" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>

The second approach reflects the structure of a person instance, not the complex type. Introducing the complex type as a named entity adds no benefit in this case, because no one processing the XML will ever deal with that information. It just confuses things.
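
To see why, consider an instance document. As a sketch (the values are invented for illustration), this document is valid against both of the schemas above, and nothing in it reveals which one was used:

<person xmlns="urn:example-org:xsd">
  <name>Alice</name>
  <age>30</age>
</person>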

2) Prefer anonymous types

It follows from (1) that anonymous types are preferable to named types. Named types just confuse the issue because, as I observed above, they don't really mean anything to 99% of the code consuming XML instances. At least, they don't mean anything at run-time. They may be used at dev time to generate classes for XML to object serializers to use, but any serializer worth its salt should work well with anonymous types too. XmlSerializer certainly does.

There is only one XML construct that requires a named type in order to describe it properly in XSD: an element with text-only content and attributes. This requires a named simple type which is extended by a complex type. The complex type may be anonymous. If you are extending one of the built-in XSD simple types, that's your named type and you don't need another.
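
As an illustration of that one case, here is a sketch of an element with text-only content and an attribute. The price element and its currency attribute are hypothetical names, not part of the running example. The named type being extended is the built-in xs:decimal, and the complex type itself stays anonymous:

  <xs:element name="price">
    <xs:complexType>
      <xs:simpleContent>
        <!-- xs:decimal is the named simple type being extended -->
        <xs:extension base="xs:decimal">
          <xs:attribute name="currency" type="xs:string" use="required" />
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

Placed in a schema like the ones above, this would describe an instance such as <price xmlns="urn:example-org:xsd" currency="USD">9.99</price>.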

3) Mark your schema blockDefault="#all"

If you can't bring yourself to use anonymous types as per (2), then make sure you control substitution. The vast majority of XSDs I've seen use named complex types and do not restrict either type or element substitution. This is a mistake because the vast majority of apps consuming XML are not prepared for substitutions that aren't known a priori. If you open the door for type or element substitution, you can't restrict it to a well-known set of possible substitute values that you define. Since no one writes apps that are prepared to handle all possible substitutions, you're much better off turning this all off by marking your schema blockDefault="#all", like this:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example-org:xsd"
           xmlns:tns="urn:example-org:xsd"
           elementFormDefault="qualified"
           blockDefault="#all">

  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string" />
        <xs:element name="age" type="xs:int" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>

This doesn't mean that you can't have named complex types that derive from one another, just that you can't substitute one for another at runtime. If you want that sort of polymorphism, focus on containment and choice, which better reflect how XML really works and how people program with it.
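
To make that concrete, here is a sketch of the kind of substitution the setting rules out. It assumes a schema that, unlike the one above, declares person with a named complex type tns:person and also defines a hypothetical tns:employee type that extends it with a salary element. Without blockDefault, a validator would accept the following instance; with blockDefault="#all", it should reject it, because the xsi:type substitution is blocked:

<person xmlns="urn:example-org:xsd"
        xmlns:tns="urn:example-org:xsd"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:type="tns:employee">
  <name>Alice</name>
  <age>30</age>
  <!-- salary comes from the hypothetical derived type, not from person -->
  <salary>50000</salary>
</person>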

Of course, if you can bring yourself to use anonymous types, this is a non-issue because no one can derive from them.

4) Reuse elements and attributes

Once you've restricted how you use complex types to (2) prefer anonymous types or (3) block substitution at the very least, you need a way to reuse definitions. I prefer to reuse elements and attributes, reflecting how XML really works. To make an element definition reusable, you have to declare it globally. Here's an example:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example-org:xsd"
           xmlns:tns="urn:example-org:xsd"
           elementFormDefault="qualified"
           blockDefault="#all">

  <xs:element name="name" type="xs:string" />
  <xs:element name="age" type="xs:int" />

  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="tns:name" />
        <xs:element ref="tns:age" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="names">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="tns:name" minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>

In this case, the {urn:example-org:xsd}name element is reused in the definition of {urn:example-org:xsd}person and {urn:example-org:xsd}names.

I really like global element decls for a couple of reasons, in addition to reuse. First, a GED can be the root of a validation episode, so you can validate any subtree in an instance as long as it starts on a GED. Similarly, a GED can be the root of an instance document, making it possible to treat a subtree as a valid document in its own right. Second, it pushes you toward globally unique names; that is, each element qname has exactly one meaning.

It's worth noting that some people don't like to use more than one GED because they want only one possible root element for their document. If you have more than one GED, it isn't clear which one(s) can be the root of a document being sent from A to B. I would solve that problem with a marker of some sort, either an attribute that extends XSD or an element inside xs:appinfo. Note as well that, even with one GED, you have to check that an instance doc you are handed has the right root element. The XSD validation process will simply skip a document with an unexpected root element; it won't raise an error.
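
Here is a sketch of the marker idea, using an element inside xs:appinfo. The ext namespace and its root element are hypothetical; XSD itself assigns them no meaning, so the application (not the validator) would have to look for the marker and check the root element of each instance against it:

  <xs:element name="person">
    <xs:annotation>
      <xs:appinfo>
        <!-- hypothetical marker: this GED may be a document root -->
        <ext:root xmlns:ext="urn:example-org:xsd-extensions" />
      </xs:appinfo>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="tns:name" />
        <xs:element ref="tns:age" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

The attribute form would put something like ext:root="true" directly on the xs:element, which XSD permits because schema elements accept attributes from other namespaces.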

5) If you want to reuse structure, consider groups

If you want to reuse a structure that combines multiple elements without wrapping them in their own element, use a group. You can do the same thing with attributes using attribute groups. So, if I wanted to repeat the structure of a person, I could do this:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example-org:xsd"
           xmlns:tns="urn:example-org:xsd"
           elementFormDefault="qualified"
           blockDefault="#all">

  <xs:element name="name" type="xs:string" />
  <xs:element name="age" type="xs:int" />

  <xs:group name="personContent">
    <xs:sequence>
      <xs:element ref="tns:name" />
      <xs:element ref="tns:age" />
    </xs:sequence>
  </xs:group>

  <xs:element name="person">
    <xs:complexType>
      <xs:group ref="tns:personContent" />
    </xs:complexType>
  </xs:element>

  <xs:element name="salary" type="xs:double" />

  <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:group ref="tns:personContent" />
        <xs:element ref="tns:salary" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:group name="personLike">
    <xs:choice>
      <xs:element ref="tns:person" />
      <xs:element ref="tns:employee" />
    </xs:choice>
  </xs:group>

  <xs:element name="crowd">
    <xs:complexType>
      <xs:sequence minOccurs="3" maxOccurs="unbounded">
        <xs:group ref="tns:personLike" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>

In this case, the definition of {urn:example-org:xsd}employee reuses the structure of person, without containing or deriving from a person. The definition of the {urn:example-org:xsd}crowd element shows how to use choice to achieve the equivalent of polymorphic elements without using element substitution groups. I think this approach is much cleaner because, again, it reflects how XML itself works.

There are limitations to using groups that it's important to be aware of. You can't define a group that includes both elements and attributes. There are element groups and attribute groups, but they are separate. You also can't use a group to define the structure of a simple type, e.g., the facets of its restriction. If you need those things, go ahead and define a type, but remember to (3) block substitution to avoid unpleasant surprises. In most of the schemas I write, I have no named types. If I have any, they are simple types.
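
For the attribute side, here is a sketch of an attribute group used alongside an element group. The commonAttributes group and its id and lang attributes are hypothetical names; personContent is the group defined above. The two kinds of group stay separate, and an element that wants both simply references both:

  <xs:attributeGroup name="commonAttributes">
    <xs:attribute name="id" type="xs:ID" use="required" />
    <xs:attribute name="lang" type="xs:language" />
  </xs:attributeGroup>

  <xs:element name="person">
    <xs:complexType>
      <!-- the content model comes first, attribute references after it -->
      <xs:group ref="tns:personContent" />
      <xs:attributeGroup ref="tns:commonAttributes" />
    </xs:complexType>
  </xs:element>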

6) Use all instead of sequence if you want to

A lot of people discount the need to order elements in an XSD content model of any complexity as a small price to pay. In my experience, however, it can be a real -- and unexpected -- pain. In the system I've been working on lately, we use XML extensively. We have a large library of XML test cases. As we modify the structure of our documents, maintaining those tests is hard, because if you maintain order, you have to ensure that changes to the tests conform to the new order. Most people are dissatisfied with an order that reflects the order of edits to the schema because it doesn't make any sense to someone approaching the latest version of the schema for the first time. They want some sort of semantic pattern that will make sense to schema users instead. That often means reshuffling elements to make some sort of sense as changes are made; and that means updating loads of test cases. That experience left me wanting my elements to be either unordered or alphabetically ordered; either one makes it easy to know how to keep things working.

XSD uses the all compositor to support unordered content. All isn't nearly as good as RelaxNG's interleave pattern, but you can make do. Basically, all requires every element it contains to appear 0 or 1 times. If you are willing to wrap elements that appear multiple times in a container element, this works fine. So if a person had multiple addresses, you'd need to do this:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example-org:xsd"
           xmlns:tns="urn:example-org:xsd"
           elementFormDefault="qualified"
           blockDefault="#all">

  <xs:element name="name" type="xs:string" />
  <xs:element name="age" type="xs:int" />
  <xs:element name="address" type="xs:string" />
  <xs:element name="addresses">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="tns:address" minOccurs="1" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:group name="personContent">
    <xs:all>
      <xs:element ref="tns:name" />
      <xs:element ref="tns:age" />
      <xs:element ref="tns:addresses" />
    </xs:all>
  </xs:group>

  <xs:element name="person">
    <xs:complexType>
      <xs:group ref="tns:personContent" />
    </xs:complexType>
  </xs:element>

  <xs:element name="salary" type="xs:double" />

  <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="tns:person" />
        <xs:element ref="tns:salary" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:group name="personLike">
    <xs:choice>
      <xs:element ref="tns:person" />
      <xs:element ref="tns:employee" />
    </xs:choice>
  </xs:group>

  <xs:element name="crowd">
    <xs:complexType>
      <xs:sequence minOccurs="3" maxOccurs="unbounded">
        <xs:group ref="tns:personLike" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
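
As a sketch of what all buys you (the values are invented for illustration), both of these instances should validate against the person declaration above, even though their children appear in different orders:

<person xmlns="urn:example-org:xsd">
  <name>Alice</name>
  <age>30</age>
  <addresses>
    <address>1 Main Street</address>
  </addresses>
</person>

<person xmlns="urn:example-org:xsd">
  <addresses>
    <address>1 Main Street</address>
    <address>2 Side Street</address>
  </addresses>
  <age>30</age>
  <name>Alice</name>
</person>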

This approach has one big drawback: once you've defined a group with an all compositor, you can only reuse that group as the top level compositor of an element and you can't add additional content. So the definition of {urn:example-org:xsd}employee would have to be modified to actually contain a person (or some other element with the desired content model). This has nothing to do with the use of anonymous types or global elements; named types and local elements would present the same problem. It's simply an intrinsic limitation of the all compositor.

One way to work around that is to define elements that contain wildcards restricted to the target namespace. Mixed with global element decls, you get a very flexible content model indeed. It makes versioning simpler, but occurrence constraints harder. My current system works this way, but the jury's out on whether it's the best approach.
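
A sketch of that workaround, assuming the same global name, age and addresses elements as in the schema above. This is one possible shape, not necessarily the one my system uses. The wildcard admits any globally declared element from the target namespace, in any order and any number:

  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <!-- any GED from urn:example-org:xsd, zero or more times -->
        <xs:any namespace="##targetNamespace" processContents="strict"
                minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

With processContents="strict", each child must match a global element declaration, so the children are still validated individually. What is lost is the ability to say, for example, that name must appear exactly once.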

Anyway, that's all that I have time to capture right now. The big topic I didn't talk about is the different approaches to versioning. More on that later. Also, in case you're wondering, all of the schemas here work with XmlSerializer. I haven't tested with other tools yet.


Posted Aug 24 2004, 10:09 AM by tim-ewald

Comments

on 08-24-2004 5:53 PM
Randy Charles Morin wrote re: A relaxing approach to XSD
on 08-24-2004 8:40 PM
I don't buy #1 and #2. I restrict use of xsd:element to elements that truly can be instantiated as a document root. When using tools like XMLSpy, if you use xsd:element to define all element types, then XMLSpy will allow you to use those elements as root elements. If you only use xsd:element to denote true root elements, then you can better guide the XSD user.
Fumiaki Yoshimatsu wrote re: A relaxing approach to XSD
on 08-24-2004 10:06 PM
Tim, you are basically following Kohsuke Kawaguchi's guidelines, written years ago. http://www.kohsuke.org/xmlschema/XMLSchemaDOsAndDONTs.html What do you think of his guidelines?
At Your Service wrote More on using global element decls
on 08-25-2004 4:51 AM
Confluence: XFire wrote XML Schema Best Practices
on 08-25-2004 6:51 AM
How should we be writing our types? That is the question. As schema groups and then reference them as complex types? See A relaxing approach to XSD|http://pluralsight.com/blogs/tewald/archive/2004/08/24/2020.aspx and a followup|http://pluralsight....
At Your Service wrote XSD could make me very happy
on 08-25-2004 7:09 AM
The XML Files wrote XSD Can Make Us Happy
on 08-25-2004 11:45 AM
At Your Service wrote WS-I to profile schema? This can only end badly!
on 08-26-2004 3:23 PM
Dan Finucane wrote re: A relaxing approach to XSD
on 08-26-2004 5:53 PM
I have been following your advice here to develop my first WSDL. So far I am happy with what I have but I have run into a problem. There are a handful of places where I would like the WSDL to describe arrays of complex types. I can't do that with anonymous types can I?
Tim wrote re: A relaxing approach to XSD
on 08-27-2004 9:13 AM
Yes, you can describe arrays of complex types. Here's an example with an items element containing n item elements:

<xs:element name="items">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            ...
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

You could also make the item element a GED that's reused by reference.

Tim-
The XML Files wrote When something is
on 08-27-2004 9:45 AM
Erik Johnson wrote re: A relaxing approach to XSD
on 09-02-2004 8:51 AM
We also adopted most of the ideas presented here (I had forgotten to include the blockDefault="#all" bit).

One tools issue we ran into was that potential callers who want to create VS Web References weren't able to get .NET types created until we stuck a named complexType into the schema like this:

<xs:schema>
  <!-- GED -->
  <xs:element name="x"/>

  <!-- Named type for VS XSD tool -->
  <xs:complexType name="typeX">
    <xs:sequence>
      <xs:element ref="tns:x"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>

I don't know if it's still an issue in Whidbey.


At Your Service wrote A free market for XSD and tool interop
on 09-03-2004 8:24 AM
Jason Dossett wrote re: A relaxing approach to XSD
on 09-15-2004 5:22 AM
This was a really good article. Using these rules simplifies XSD to the most basic thing needed to do almost all of what I typically use XSD for.

A question: if you have an element that needs two references to the same global element, but needs them specified in such a way that they are distinct (as opposed to min and max equal to 2), how would you propose defining that? Would you use a group and declare two different global elements based on that group?

I've started an add-in for a popular UML tool to generate XSD from a class diagram, and this is the most complex case I have -- a class with two composite relationships to the same child class. To be consistent, I am considering mapping every class in the diagram to a group in the schema, then creating a global element every time a class is used in a relationship, using the role name as the element name.

Thoughts?
Confluence: XFire wrote Designing Web Services
on 11-09-2004 10:46 AM
Over the last several years we've learned a lot about web services. What to do and what not to do. This section should help piece some of that together and provide a set of best practices. Web Services Are Not Objects RPC/Encoded vs....
