The XML in your objects is closer than it appears

DaveO and MarkNot replied to last night's post. Both raised interesting points, to which I'm compelled to respond.

To start with, though, let me be clear: I want interoperable web services just as much as anyone else. If web services don't interoperate, then we spent the last 5 years working on them for nothing. I just disagree with the path being proposed.

Thinking about posts from the last couple of weeks, I couldn't help noticing the odd path they've taken... I said I want RelaxNG. Some agreed with me. Aaron (and others) said, too bad, there's industry consensus on XSD and that's what we're using. Dare said that XSD was easier for object mapping than RNG. Then Dave responded to my original post about not profiling, indicating that in fact people can't get XSD-based object mappers to work right and WS-I should act. So what exactly is the consensus on XSD: that we have to use it, but it doesn't work so we shouldn't use it really, we should derive a new language from it instead? (And, as Mark asks, whose consensus is it anyway?) If, just as I was starting to accept using XSD again (as I periodically do), people say “well, actually not XSD but something else derived from it”, I have to wonder why, again, we can't move to RNG.

Now, to comments (my favorite part of the morning :-)...

Mark rejects my concern about subsetting XSD pointing us toward RPC:

That doesn't follow. Just because we're limited to a subset of Schema doesn't mean that it magically becomes RPC; it's possible to have document-oriented messages without using the Infoset and Schema; that was precisely the point of my original entry.

When we worked on the WS-I Basic Profile, a lot of people focused on mapping operations to methods and messages to objects. Some of us viewed that as an implementation detail and while we wanted to support it, we didn't want to restrict people to that. We talked at the time about subsetting XSD to remove, for instance, redefine and element substitution groups. These were specific concerns to people working on object serialization plumbing. I argued at the time that you could always simply surface XML whenever you couldn't do a reasonable object mapping, but most people rejected that because they don't want developers to have to see any angle brackets.

So, frankly, the heart of my concern about WS-I subsetting XSD is that the subset will be restricted to the things that map obviously to modern OO languages. For someone like me, who is willing to (and even likes to) deal with XML and happens to work in a document-centric problem space, that's troubling. I don't want to lose options for describing certain XML structures just because someone else wants to pretend that Web services are about methods and objects. I also don't want to be forced to have a very loose schema definition and then some additional document that says what I really mean (don't think I haven't thought about making all my schemas wildcards with an RNG definition in appInfo ;-), as Mark seems to suggest. Why should I have to do that, just because someone else has a tool that isn't working for them? If users care, as DaveO suggests, do they care about people like me?

The case in point is my argument about using XHTML. Mark thinks I'm not gaining much because there aren't any semantics captured there anyway. There are several counter arguments. First, as a system that publishes documents for presentation, that is the semantic of my system. Next, even if there is no semantic information, the syntax validation is helpful (especially since we use an XHTML 1.1 modular doctype that does not allow certain elements, like script). Finally, does XSD ever capture semantics, or is it just a complicated way to define XML syntax? Anyway, if my system consumes and produces XHTML, why shouldn't I be able to use the W3C XSD definition as part of my Web service's contract? Why should I have to do something looser and non-standard and non-interoperable because of how someone's object mapper works? The pushback is only coming from people who want to program against my service with a certain tool and find that it doesn't work well. If using XHTML as an example confuses the issue, consider the other vertical languages I mentioned in my last post: XBRL and HL7. They convey data, not presentation. Are they to be restricted away or redefined if they don't fit the profile? Everyone agreed (so they tell me) to use XSD, and they did. We can't rescind that now.

In short, I connect subsetting XSD to RPC because I believe that's the force that's really driving it. It's because developers writing classes and exposing them as Web services are having trouble with their tools. If people were willing to move to a simpler language like RNG, instead of trying to derive such from XSD, I could get on board. But RNG doesn't necessarily map all that well to objects either. The real problem is that XML doesn't map that well to objects, because the content model of an XML element is much more flexible than the content model of a class with fields and properties.
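
To make that last point concrete, here is a minimal Python sketch using only the standard library. The `Paragraph` class and the `flatten` helper are hypothetical names invented for illustration, not the output of any real mapper. It shows a mixed-content element whose ordered text runs have no natural home in a class with one field per child element.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Mixed content: text and child elements interleaved, as in XHTML.
doc = ET.fromstring("<p>Call <b>Tim</b> before <i>noon</i> today.</p>")


def flatten(elem):
    """Return the element's content as an ordered list of text runs and (tag, text) pairs."""
    parts = []
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        parts.append((child.tag, child.text))
        if child.tail:
            parts.append(child.tail)
    return parts


@dataclass
class Paragraph:
    """Hypothetical class with one field per child element, as a naive mapping might produce."""
    b: str
    i: str


# The XML view keeps every text run, in document order.
content = flatten(doc)

# The field-per-element view keeps "Tim" and "noon" but drops the surrounding text and its order.
naive = Paragraph(b=doc.findtext("b"), i=doc.findtext("i"))
```

The class is not wrong so much as lossy: nothing in it records that "Call" came before the bold text or that "today." came last.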

That brings me around to Dave's latest post on this thread. He feels customer pain and is frustrated that all the vendors can say is “wait until the tools get better”. I agree. He concludes that, since that position is unacceptable for developers, profiling XSD seems like the best bet, unless someone has a concrete counter-proposal. I disagree. So, here's my counter-proposal: don't build better tools, build different tools.

Steve chimed in last night and said something I've felt and said myself for years: the object/XML mapping problem is akin to the object/relational mapping problem, and just as hard. We've spent more than a decade trying to get O/R mappings to work correctly with very little success. We're 5 years into O/X mappings and we're seeing the same problems. In fact, XML and SQL are more alike in their ability to construct flexible data “shapes” than objects are. It kills me that people want to build systems that store data on the back-end in a relational database and send it out on the wire as XML, but in between it has to all be objects. There are two reasons for this that I see. One is that “developers shouldn't have to know”, which is a fantasy, at least at this point. You only get to not know when the tooling actually does an effective job for you (for instance, I'm so confident a good optimizing compiler can generate more efficient assembly than I can that I don't even think about it anymore). We've never gotten anywhere near that with these data mappings. The other is that they need a good way to implement their system's logic and it's hard to do on top of raw SQL data and raw XML data. Steve's suggestion is that we have to stop trying to hide the XML inside compile-time-generated classes and I totally agree. That doesn't mean (to me at least) that you should never map XML to objects if that's what you want. But it should be a deliberate act when desired, not the default behavior of the tools you use. You should treat it as one of many options you have for implementing your system, and one with significant limitations at that.

I often hear people say that if you don't map Web services to something familiar, i.e., objects and methods, developers won't adopt them. I don't buy it. The success of ASP and its descendants on all platforms says that's not true. ASP used objects and script to provide a model for constructing pages that was different from what had gone before. It was easy and effective and developers loved it. So it's not familiarity but simplicity that matters. I also often hear people say that XML is too hard and that objects make it easy for people. Well, based on this thread, clearly the second part is false. It isn't XSD that's at fault though, it's the two layers of OO serialization plumbing between the author of a service and the author of its client.

The solution, I believe, is to make tools and languages that surface XML and make it easier to deal with directly. XPath, XSLT, XSD are languages that allow you to act on XML in particular ways and you can use them to implement portions of your system. What's missing is good, simple ways to manage and deploy them, invoke them from code, etc. Yes, this implies that people have to learn something about these languages, the same way they had to learn about HTML and JScript and SQL, etc. (this is one of the reasons I think RNG is the place to go). But this shouldn't surprise us. We can't build an entire distributed system infrastructure on XML while at the same time saying that no one should have to know anything about it. It just won't work.

I still meet people who've designed a Web service that passes one string in and one string out. The strings contain XML. They do this to bypass their marshalers, and it works. Yes, they have to read and write XML, but so what? It's effective and interoperable; and because they are deliberate about the data they consume and produce, it opens the door to loose-coupling in ways that static object mappers never will.
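
As a rough illustration of that style, here is a minimal Python sketch using only the standard library. The `handle` function and the `article`, `summary` and `paragraphCount` vocabulary are assumptions made up for this example, not part of any real service. The operation takes one XML string, reads only the parts it cares about, and returns one XML string.

```python
import xml.etree.ElementTree as ET


def handle(request_xml: str) -> str:
    """Hypothetical service operation: one XML string in, one XML string out."""
    request = ET.fromstring(request_xml)

    # Read only what this operation needs. Elements it does not know about
    # (such as <author> below) are simply ignored rather than breaking a mapper.
    title = request.findtext("title", default="(untitled)")
    paragraphs = request.findall("body/p")

    response = ET.Element("summary")
    ET.SubElement(response, "title").text = title
    ET.SubElement(response, "paragraphCount").text = str(len(paragraphs))
    return ET.tostring(response, encoding="unicode")


reply = handle(
    "<article><title>Profiling XSD</title>"
    "<author>someone</author>"
    "<body><p>One.</p><p>Two.</p></body></article>"
)
```

Because the code names exactly the data it consumes, a sender can add elements later without the receiver having to regenerate anything.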

Finally, Dave asks what it means for WS-I if it can't solve this problem. I worry about what it means if it does. If WS-I restricts XSD to a subset that makes OO programmers happy, but XML programmers sad, we run the risk of derailing the whole train. Companies adopt standards as a matter of policy. Where I work, people have asserted that all services should be WS-I BP compliant. If, someday, they also asserted that all services be WS-I XSD subset compliant, we might well be forced to either abandon plans for some services or abandon the WS stack and go back to raw XML over HTTP. Frankly, for the work I do, I'd rather that WS-I was irrelevant than WS-* was irrelevant.


Posted Sep 03 2004, 06:45 AM by tim-ewald

Comments

Paul Downey wrote re: The XML in your objects is closer than it appears
on 09-03-2004 6:50 AM
Tim, i want to applaud you for promoting this discussion,

i hate the "soap shovel" approach of shoving an XML document into a string. Without a schema that's like going back to screen scraping. it's not the serialising/deserialising of objects that's the issue here, it's the wanting to process messages based upon a formal description.

Like you i'd be happy to move to a different description language, RelaxNG, heck why not RDF? but that's like the joke about being lost asking for directions and being told "i wouldn't start from here if i were you".

So i'd like to ask you the question: what should a company do if it wants to publish a Web service today and reach a market place populated by developers wanting to use .NET, BEA, IBM, Apache, SOAP::Lite, etc tools?

i certainly don't want to make document people sad, and doubt giving them a clear road to travel will do so. i'd say that the publisher of a vertical standard will thank the WS-I for saying here is a vocabulary that is *known* to work with tools rather than just "drink the whole of W3C schema".

i'd like to remind folks that what sold Web services to many people was across the board interoperability *and* convenient tools.
Erik Johnson wrote re: The XML in your objects is closer than it appears
on 09-03-2004 8:15 AM
I can’t say whether profiling schema does anything useful beyond letting tools vendors off the hook in providing quality products. My initial conclusion is that it does not. But I haven’t seen a list of issues or situations yet that drove the WS-I to crank up the WG.

It seems like many tools vendors started on the wrong foot with respect to XML Schema. They started by generating schemas from type/class definitions. They then built tools to reconstitute types from those schemas (for client-side processing). There was little need to broadly implement the XML Schema spec – you only built what was needed to close your own loop using only Schema features that intersect with the target platform. It’s no wonder interop problems pop up when you then try and aim these stacks at each other. But it’s really not the fault of the Schema Spec, is it?

Take xs:nonPositiveInteger as an example. There is no analogous native type in many languages, so does this mean I can't use it anymore? Is the fact that XSD.EXE in .NET (a type building tool) emits nonPositiveInteger as a string an interop problem? Not to me, so long as my service rejects messages where that value > 0. Of course users of the generated type could easily be misled into thinking a string ("-1") is a valid value. This is where I think the tools simply need to improve. I wouldn't throw out .NET altogether -- just bypass what I think is a problem in one area.
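
A minimal sketch of that service-side check, in Python rather than .NET and with a hypothetical helper name, might look like the following. It assumes the value arrives as a string (as a generated type would hand it over) and approximates XML Schema whitespace handling by trimming spaces, tabs and line breaks.

```python
import re

# Lexical form assumed here: an optional sign followed by decimal digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def parse_non_positive_integer(lexical: str) -> int:
    """Hypothetical check for a value declared as xs:nonPositiveInteger but delivered as a string."""
    text = lexical.strip(" \t\r\n")
    if not _INTEGER.fullmatch(text):
        raise ValueError(f"not an integer: {lexical!r}")
    value = int(text)
    if value > 0:
        # The schema type only allows zero and negative values.
        raise ValueError(f"value must be <= 0, got {value}")
    return value


accepted = parse_non_positive_integer("-1")
```

A service that runs this check on receipt rejects "1" and "abc" alike, whatever type the client-side tool generated.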

In the meantime, should my using nonPositiveInteger keep me from being WS-I Profile conformant? I would hope not.

But if a caller cannot determine my message format because of ambiguity in XML Schema, then there is a case for profiling. If there is an area of XML Schema as anachronistic as SOAP encoding (outlawed by the Basic Profile practically on the first day) then that is also a case for profiling.

I agree with Tim that history is not on our side. The Basic Profile carries baggage just for RPC method signature divination. A "WS-I Schema Profile" could become dominated by restrictions aimed solely at simplifying serializer/code generator implementations. If that happens, then we are headed down the road to multiple web service conventions (if not standards).
The XML Files wrote What does the XSD profiling mean?
on 09-03-2004 8:42 AM
Dan Diephouse wrote re: The XML in your objects is closer than it appears
on 09-03-2004 8:45 AM
I've been working on a tool to create document oriented web services with your object model. It's only for Java right now, but you can see an example here:

http://xfire.codehaus.org/Aegis

It allows me to map an arbitrary object model to an arbitrary document via OGNL (like XPath, but for objects). It uses a pseudo-xsd type descriptor to define the services/create WSDL. It's still a bit RPC oriented, but it sure beats writing WSDL/XSD by hand or serializing your object model.

I hope to do an alpha release next week, but I thought you might find it interesting.
Dilip wrote re: The XML in your objects is closer than it appears
on 09-03-2004 11:22 AM
"We've spent more than a decade trying to get O/R mappings to work correctly with very little success"

At least the "Other" side doesn't think so. Just look at http://www.hibernate.org
Tim wrote re: The XML in your objects is closer than it appears
on 09-03-2004 11:32 AM
There are plenty of people on "This" side who like the O/R idea too. I'm not a fan of it on any side.

Tim-
Dilip wrote re: The XML in your objects is closer than it appears
on 09-04-2004 4:59 AM
Tim

I am not talking about simple likes & dislikes. When I say "just look at http://www.hibernate.org" I meant "look at how successful this particular O/R implementation has turned out to be". It directly contradicts the comment you made in your post about O/R mappers.
Steve Loughran wrote re: O/X mapping and O/R mapping
on 09-04-2004 6:33 AM
Dilip, Tim was repeating my claim that O/R mapping still sucks, and I am on "the other side", assuming that means the Java camp, though I pine for Prolog, which can represent graphs properly.

Hibernate is the best yet in Java land, certainly compared to EJB. And how many years has it taken to come up with Hibernate, which is just mapping tables and queries to Java objects? Also, popular doesn't mean perfect. Hey, EJB1.x was popular for a while too :)

Now look at XML: arbitrary graphs of stuff. XML Schema lets you describe subclasses through subtraction as well as extension, which no OO language does yet (AFAIK). Then add the fact that a sufficiently complex hand-written XSD is extensible with arbitrary stuff, something your endpoint has to handle.

For example, my current endpoint uses this schema: http://cvs.sourceforge.net/viewcvs.py/smartfrog/core/components/cddlm/src/org/smartfrog/services/cddlm/xsd/deployAPIschema.xsd?rev=1.16&view=markup

and it has things that are trouble
1. enums. Axis handles these but names them 'value1', 'value2' instead of giving them meaningful names.

2. anywhere I have xsd:any my endpoint hands me arrays of org.apache.axis.MessageElement structs. What can I do with those? I want to bind them to XOM, which is a painful translation.

3. the runtime completely ignores minOccurs and maxOccurs, so I have to do that myself, which adds extra grunt work to the endpoint logic
( http://cvs.sourceforge.net/viewcvs.py/smartfrog/core/components/cddlm/src/org/smartfrog/services/cddlm/api/DeployProcessor.java?rev=1.11&view=markup ) and more complex tests. Or you forget about the extra logic and end up with code that crashes when someone sends too much or too little data.
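
To give a feel for the grunt work in item 3, here is a minimal Python sketch using only the standard library. It is not the Axis API or the linked DeployProcessor code; the `RULES` table, the `check_occurrences` helper and the `deploy`, `name` and `option` element names are hypothetical. It counts child elements by hand against minOccurs/maxOccurs-style limits.

```python
import xml.etree.ElementTree as ET

# Hypothetical occurrence rules: child tag -> (minOccurs, maxOccurs); None means unbounded.
RULES = {"name": (1, 1), "option": (0, 3)}


def check_occurrences(parent, rules):
    """Return a list of violations; the list is empty when every count is in range."""
    problems = []
    for tag, (low, high) in rules.items():
        count = len(parent.findall(tag))
        if count < low:
            problems.append(f"{tag}: expected at least {low}, found {count}")
        elif high is not None and count > high:
            problems.append(f"{tag}: expected at most {high}, found {count}")
    return problems


ok = check_occurrences(
    ET.fromstring("<deploy><name>a</name><option/></deploy>"), RULES
)
bad = check_occurrences(
    ET.fromstring("<deploy><option/><option/><option/><option/></deploy>"), RULES
)
```

Every rule in such a table duplicates something the schema already says, which is the extra logic (and the extra tests) the comment complains about.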

We have to do better than this. I know what I am doing and it hurts. Think how much harder it is for people who don't know what they are doing...



Tim wrote re: The XML in your objects is closer than it appears
on 09-05-2004 9:55 AM
Dilip,

Your example of the Hibernate framework does not directly contradict my quote, which was "We've spent more than a decade trying to get O/R mappings to work correctly with *very little success* [emphasis added]." Yes, there are cases where people have solved a particular problem by using an automatic O/R mapping. I concede that point. But as far as I know, no such approach has ever become really mainstream, so much so that it has replaced traditional database access techniques. How many people use Hibernate compared to JDBC?

Perhaps I should have said that I've never encountered an O/R mapping layer that did what I needed, at the level of performance I required, and worked on top of the stored procedures and views that made up the API to my database. (Certainly, I've seen plenty of attempts that didn't do what I needed, were really slow, and required changes to my database.) But hey, this is my blog reflecting my opinion, which is that we've put a ton of effort into O/R mapping and seen very little benefit. If Hibernate turns out to be the answer, great. I'll admit I was wrong and will adopt it straight away. But I'm not holding my breath.

The same is happening with XML. People who insist that the only way they want to process XML is by mapping it to objects (the "XML at the edge" view) are really missing a lot of the power of the technology while at the same time creating a lot of problems for themselves.

Interestingly, in my current project, we got a lot of traction out of building an X/R mapper that used XSLT to generate an intermediate language that drives a generic data access API. I was only using it for shredding XML into the database, breaking some of it down into tables and leaving other pieces as blobs (which are written in as chunks for efficiency), and it worked quite well, without anything special happening in the database to support it.
Peter Rodgers wrote re: The XML in your objects is closer than it appears
on 09-06-2004 12:01 AM
Back in 99 at HP, before SOAP/WS/XSD etc, we developed a bunch of engines for XML message protocols using a data binding approach (X/O mapping).

The problem we had was that, whilst you can always create an object mapping, the XML messages (like any web-service, anyone had to update a web site recently?) are constantly in flux. That's not technology, that's business. Building these very early WS systems wasn't a technical problem, it was that the economics of the systems didn't add up: you demand flexibility, but object binding/procedural approaches to handling XML are, in general, brittle, and the ongoing maintenance costs were too high.

I've been banging on to Steve ( Loughran ) for years about the current state of XML systems being directly analogous to early RDBMS (see http://www.1060research.com/whitepaper/netkernel.html). RDBMSs written as procedures were brittle and too costly to maintain. But with XML-services there's not been a declarative abstraction like SQL to raise the lowest common denominator. So by (a loose) analogy with SQL it makes sense to do XML processing in a declarative environment.

Incidentally, for a good summary of why it's not just about which Schema technology you choose see the summary of the 'schema meets reality' thread on xml-dev, http://lists.xml.org/archives/xml-dev/200409/msg00027.html.
Dilip wrote re: The XML in your objects is closer than it appears
on 09-06-2004 5:36 AM
Tim

Before this discussion gets sidetracked, let me just point out that I am in complete agreement with you guys regarding trying to shoehorn XMLism into OOism or vice versa. I was just trying to draw your attention to that one point you (& Steve) made about O/R Mappers in your otherwise excellent post.

Too many times in the past we have been talking about how "difficult" it is to create such mapping implementations and how "little success" we've had so far. Well, there are some people who have gone ahead and created such frameworks just to prove the opposite. One such framework is Hibernate. I can't give you any statistics on how many people use Hibernate in favor of JDBC but you'd be extremely surprised at the answer. A casual surfing of www.serverside.com will tell you that Hibernate has been heralded as nothing less than Christ's second coming!

I don't work on the Java side so I have no idea how _effective_ the framework is, but it's available as open source (the .NET contemporaries LLBLGenPro and EntityBroker are paid frameworks). You could just download it and give it a spin.

Trust me, you'd be pleasantly surprised. Did you get a chance to look at this?
http://www.hibernate.org/21.html
