
Fight the power

Friday, October 19, 2007

Bill deHora has a great response to one of my (somewhat) recent REST posts. The timing is perfect, because I've been thinking a lot about what I said in those posts. I've also just finished reading the RESTful Web Services book, which is the most useful technical book I've read in a while, and have been thinking a lot about what it says too (and talking about it with Craig). I've been mulling over a post for a couple of days, and Bill's post is the impetus I needed to share what I've been thinking.

In RESTful Web Services, the authors define and argue the merits of Resource Oriented Architecture. ROA maps directly to the REST-as-CRUD model for thinking of the world, where interaction with your system is modeled entirely in terms of create, read, update and delete operations against data resources. For data-centric services, like the del.icio.us service that they refactor in the book, this model makes a ton of sense.

But what if your problem domain is more focused on processes than data? As the authors show with their proposal for modeling transactions, you can map any process to ROA with the following steps (or ones like them):

  • PUT/POST to create a new process resource
  • PUT to update that resource to include the data it needs
  • PUT to execute that process
  • GET to fetch the result

This maps any process to a set of resources. From this perspective, my argument that HTTP is all about POST and everything else is an optimization or unnecessary doesn’t make sense.
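The four steps above can be sketched with an in-memory stand-in for the server. This is an illustration only, written in Python for brevity: the `ProcessStore` class, the `/transfers` URIs and the transfer data are all assumptions, not anything taken from the book or from a real system.

```python
import itertools


class ProcessStore:
    """Hypothetical in-memory stand-in for a server exposing process resources."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._processes = {}

    def post(self, collection_uri):
        """POST /transfers: create a new, empty process resource."""
        uri = f"{collection_uri}/{next(self._ids)}"
        self._processes[uri] = {"data": {}, "committed": False, "result": None}
        return uri

    def put_data(self, uri, data):
        """PUT on the process resource: supply the data it needs."""
        self._processes[uri]["data"] = dict(data)

    def put_commit(self, uri):
        """PUT to execute the process; repeating it has no further effect."""
        process = self._processes[uri]
        if not process["committed"]:
            data = process["data"]
            process["result"] = (
                f"moved {data['amount']} from {data['src']} to {data['dest']}"
            )
            process["committed"] = True

    def get(self, uri):
        """GET the process resource: fetch the result."""
        return self._processes[uri]["result"]


store = ProcessStore()
uri = store.post("/transfers")
store.put_data(uri, {"src": "checking", "dest": "savings", "amount": 50})
store.put_commit(uri)
store.put_commit(uri)  # a repeated PUT does not run the transfer again
print(uri, "->", store.get(uri))
```

Note that the process resource has to be kept somewhere between requests, which is the build-up cost the next paragraph asks about.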

So could I model all processes the ROA way? Undoubtedly yes. But what’s the value of this over just doing a POST with the data I want to process and getting the result back? That is much easier to implement, because I don’t need the build-up of the process resource to span multiple stateless requests. I don’t ask this rhetorically; I really want to understand why people think this approach to modeling processes is better. (This is especially true if core bits of network infrastructure don’t support PUT/DELETE and you end up having to tunnel them through POST anyway.)

Maybe the process perspective is just the wrong way to think about the world, but it’s where a lot of people’s heads are at, including almost all the SOA/WS folks out there. In looking at my own system, I see one interface that is obviously data-centric and fits easily into the ROA model. Another interface is very process oriented. Right now we’re doing it with GET for read and POST for write, and it works and people get it. While I can see how to map it into the ROA model, I’m still unclear on how it helps.

So, to summarize, Bill is right: you shouldn’t follow what I said in that earlier post. (I always reserve the right to get smarter, and I’m using that now.) I’m still debating between the full ROA and GET for read and POST for write (again, especially if I have to tunnel through POST anyway). I really need to build some more stuff and see how it plays out.

posted @ 12:51 PM | Feedback (4)

Sunday, August 26, 2007

I haven't been reading or writing much lately - too heads down getting to 1.0. But I did happen across Don's recent post about retiring the four tenets of SOA and asking for input on what we'd like to write when we implement services. Let me take these topics in order. First, the four tenets of SOA...

What people are trying to build are loosely-coupled systems where pieces can be changed without breaking other pieces. I've built a couple of systems that meet that goal to a reasonable degree, so I don't think it's unreasonable for me to say that it's possible but hard and it takes a long time. In fact, it's just like building reusable components, which is also possible, hard and takes a long time. It isn't clear to me that many organizations are actually prepared to put in the work that loose-coupling really requires. What is clear to me is that the 4 tenets and the current tools aren't going to get them there. The only thing they really do is distract Capital-A-architects, which isn't a bad thing.

Which gets me to the second topic, what does the code I want to write look like. Here's my current thinking...

There are 4 essential distributed technologies: sockets, message queuing, RPC, and REST (in the distributed state machine sense I was writing about a couple months ago). There is a place for each in my current system. We use sockets to support certain industry and legacy protocols, both TCP and UDP, including multicast. We're looking at using message queuing for some ancillary processing off the main line of execution. We're using RPC to talk to SQL Server - either TDS, which is essentially RPC with streams, or SOAP via WCF, which is also essentially RPC despite what the proponents of mass customization exhort you to, in cases where a physical deployment separates our client from our database and the IT guys don't want to open 1433. For loosely-coupled cross-component integration we use REST.

I don't care that there are different APIs for all these technologies. In fact, I think it's a good thing because (a) I get an API tailored for the technology I'm using which (b) makes the differences between them clear. The only API that is lacking in .NET, IMO, is the REST one. If you aren't doing pages, ASP.NET leaves you hanging with IHttpHandler. Similarly, HttpListener isn't much of an API. Yes, I could put the new WCF REST bits on top, but I haven't, for several reasons. First, we needed a solution before they were available, so we already have a UriTemplate-based dispatcher of my own. Second, we don't want to change .NET libraries at this point in the project, and can't move to a beta anyway. And finally, I'm not sure I want to build on a layer designed to factor HTTP in on top of a layer that was designed to factor it out.

The lack of a strong REST API is problematic because REST is a much better alternative for loose-coupling than RPC is. So, what do I want for an API? Easy: I want a UriTemplate matching layer (with a back-door for arbitrary regexes) that works on ASP.NET or HttpListener, abstracting away the differences between their respective context/request/response objects (which are just annoying). And I want a text-templating engine for output, a la NVelocity, with editor support. I'm close to building my own, since I already have the first half, with NVelocity, but haven't had time yet. (Yes, I've looked at MonoRail, but it feels like more than I want and I don't think it integrates with HttpListener.)

I'd like all that to run on top of 2.0 if possible, or at least in a kit that runs on XP and 2003, even if it requires 3.0 or 3.5. And I'd like it in the next 6 months or so (which I know is a pipe dream). The lack of these interfaces is a big part of my interest in Rails.

As to an example based on TransferMoney, here's what I'd do...

Design an idempotent implementation with a unique request id that can be saved on a client before initiating the request, used by a client in the case of failure to ask if the operation was processed (maybe by looking through a list of the operations processed in the last 24 hours), and used by the server to avoid processing the same request twice. (I'd make this part of the application logic instead of the transport protocol because reliability of the latter doesn't mean I can't crash between getting a message and executing logic based on it, unless I go all transactional messaging, which introduces other issues.)
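A minimal sketch of that idea, in Python for brevity. The `TransferService` class, its method names and the account names are hypothetical. It keeps processed request ids in memory and omits the 24-hour retention window. A real implementation would presumably record the request id in the same transaction as the balance change.

```python
import uuid


class TransferService:
    """Hypothetical server that deduplicates transfers by request id."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.processed = {}  # request id -> result of the first attempt

    def transfer_money(self, request_id, src, dest, amount):
        # A repeated request id returns the earlier result without re-executing.
        if request_id in self.processed:
            return self.processed[request_id]
        if self.balances[src] < amount:
            result = "rejected: insufficient funds"
        else:
            self.balances[src] -= amount
            self.balances[dest] += amount
            result = "ok"
        self.processed[request_id] = result
        return result

    def was_processed(self, request_id):
        """Lets a client ask, after a failure, whether its request went through."""
        return request_id in self.processed


service = TransferService({"checking": 100, "savings": 0})
request_id = str(uuid.uuid4())  # the client saves this before sending anything
first = service.transfer_money(request_id, "checking", "savings", 30)
retry = service.transfer_money(request_id, "checking", "savings", 30)
print(first, retry, service.balances)
```

The retry is safe because the server answers from its record of the first attempt, so the balance moves only once.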

I would map out the state machines for the client, the server and the protocol between them. I would map the protocol itself to a set of URLs, as per my recent posts on REST. I would implement the endpoints by mapping URLs to methods using UriTemplates, parse the inbound data using whatever felt right and emit a response using a text templating engine.
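The endpoint mapping can be sketched roughly as follows. This is not the .NET UriTemplate class or NVelocity. It is a small Python stand-in, with a hypothetical `Dispatcher`, a regex-based template compiler and `string.Template` for output, meant only to show the shape of "URL template in, templated text out". The template compiler handles simple paths only, and the output template does no XML escaping.

```python
import re
from string import Template


def compile_template(uri_template):
    """Turn '/accounts/{account}' into a regex with one named group per variable."""
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", uri_template)
    return re.compile(f"^{pattern}$")


class Dispatcher:
    """Hypothetical dispatcher mapping (method, URI template) pairs to handlers."""

    def __init__(self):
        self._routes = []

    def route(self, method, uri_template):
        def register(handler):
            self._routes.append((method, compile_template(uri_template), handler))
            return handler
        return register

    def dispatch(self, method, path):
        for route_method, regex, handler in self._routes:
            match = regex.match(path)
            if route_method == method and match:
                # Template variables become keyword arguments of the handler.
                return handler(**match.groupdict())
        return "404 Not Found"


RESPONSE = Template('<transfer account="$account" id="$request_id" status="$status" />')

dispatcher = Dispatcher()


@dispatcher.route("GET", "/accounts/{account}/transfers/{request_id}")
def get_transfer(account, request_id):
    return RESPONSE.substitute(account=account, request_id=request_id, status="processed")


print(dispatcher.dispatch("GET", "/accounts/42/transfers/abc"))
```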

I'd test it with the browser. I'd document it in prose, and maybe with an XSD for reference or to generate code if desired.

Don't know if that's concrete enough. It echoes what we're doing in our system now, which works great.

If the question was what do I want to see as a SOAP API, I don't have any issue with the [WebMethod]-esque style that WCF and every other stack uses. SOAP/WS-* turned out to be RPC/CORBA and the API is fine. That isn't as negative a statement as you might think.

posted @ 12:17 PM | Feedback (4)

Thursday, May 10, 2007

My recent posts are my attempt to describe how I think about the Web and how it can be leveraged to integrate systems independent of the browser. I got a ton of feedback in comments and email, which was all great. I’ve been away for a bit, so I thought it would be good to come back and summarize, and to ask a question about what to call the model I’ve adopted. Specifically, some worry that the term REST means too many things to too many people to have any real meaning at all. But before I get to that, let me summarize and respond to some of the key points people made.

 

Most of the feedback I got can be divided into a couple of bins...

 

First, that it’s just RPC. That’s wrong for all the reasons I listed in my last 4 posts.

 

Second, it’s not RPC, so it isn’t appropriate beyond browsing a hypertext graph of multimedia content. The first half is right. I strongly believe the second half is wrong, but we don’t have enough experience applying the model in other systems to say concretely. We need to get that experience.

 

Third, and most interesting, what you describe isn’t REST, for a range of reasons. Some of this was because I wasn’t crisp in defining the difference between managing system state, session state, and transitions between nodes in a protocol state machine. Clients make HTTP requests to move through a protocol state machine. This is completely unrelated to changing system state or to maintaining session state. Specifically, a GET is a move between nodes (or states) in the protocol, but does not (or should not) affect system state. I adopted the term “node” here to describe moving between states in the protocol state machine. Is that clearer?

 

Some felt I didn’t conform to REST because I didn’t include the notion of changes over time. Specifically, I said that each state in a protocol state machine has a URI, which I think is true. Each time the client enters this state, they get a representation of it. That representation may change over time. That doesn’t mean that each state in the protocol doesn’t have a unique URI, it means that each representation does not (unless you mix perma-links into the protocol).

 

The most important feedback worried about the fact that the Web isn’t really RESTian or that my model doesn’t align very well with the HTTP spec, which focuses on CRUD operations on entities. Let me address those concerns.

 

The Web does not fully conform to Fielding’s paper, but the reality is that the Web is RESTful enough. An implementation is never as pure as an idea. (Look deeply at the SOAP header processing model and the not-so-independent WS-* specs for another great example.) Yes, the modern Web uses cookies extensively, which don’t conform to Fielding’s ideas, but are extremely useful. More importantly, they do not sacrifice the notion of a stateless server, as long as there is a shared backend store. We could debate the merits of a stateless client vs. a stateless server, but the latter is how the Web really works, and pragmatically, that’s what I’m after, whether it conforms to Fielding or not.

 

About the model I’ve adopted not aligning with the HTTP spec, I disagree. The spec is written substantially in terms of CRUD operations on resources. The goal of the early Web was essentially distributed content publishing, the ancestor of the Wiki. The HTTP spec reflects that. We’ve moved well past that problem domain to all sorts of other things, e.g., shopping at Amazon. It’s easy to think of the shopping process in terms of a protocol state machine. It’s harder to think of the content management problem that way, but you can if you try. In that problem space, the protocol state machine and the CRUD operations on entities converge into one model – in content management, the CRUD operations on the entities are the transitions in the protocol state machine.

 

I’ve been very careful in my last several posts not to get hung up on which HTTP methods are being used to transition between states. The HTTP spec positions the four main methods as equals, but I don’t look at it that way. POST is the core method. GET is an optimization that enables caching and, more importantly, makes it possible to bookmark protocol states as a single text string. PUT and DELETE made it in because of how the Web was first conceived. If it had started with the Amazon shopping cart, those verbs might not have been there. They might have been in an extension protocol, like WebDAV, but it didn’t happen that way. (Hopefully that isn’t too inflammatory.) I back up this position with the observation that while we encounter GET and POST all the time, PUT and DELETE are very rare. (Maybe APP will change that, but it hasn’t happened yet, and falls in the content management problem space for which those special verbs were defined anyway.)

 

But all of that is really moot. I don’t want to get lost in discussions of architectural styles, whether the Web really conforms to Fielding (my answer is it conforms enough), whether REST and HTTP align, whether it is possible to be RESTian at all, the meaning of PUT and DELETE, or other distractions. I want to focus on digging into HTTP-based distributed state machines (a term I got from Sam) that leverage the pragmatic solutions for scalability, reliability, and security that we’ve developed on the Web.

 

So given all that, is REST the best term to use for this, or is there something better?

posted @ 1:06 PM | Feedback (12)

Saturday, April 28, 2007

I've gotten several comments saying that, at the end of the day, REST is just RPC. That's wrong, for at least 3 reasons:

1) Each unique state in your protocol state machine has its own URI. That's different from an RPC endpoint that maintains a black-boxed state machine at a single endpoint. Being able to do state transition processing at disparate locations is hugely powerful. Watch the URLs you are navigating through as you browse, shop and checkout at Amazon. A single process can span machines offering differing levels of scalability, reliability and security.

RPC doesn't do that. You could conceivably build an RPC system that did do that, but if that happens it is a very rare occurrence indeed. The more likely solution would be to have multiple RPC interfaces to the different machines involved in the process and then tie their protocols together. The application in the middle would have to understand how to mix the two protocols, passing data from one into the other, and back the other way. Yes, you can make it work, but it bakes a lot of protocol detail into the application mixing the calls, creating a much more complex solution.

2) With transition URIs embedded in the representations of states, it's easier to transition between states in the protocol. All of the URIs are accessed the same way; there is no separate interface per endpoint because each endpoint represents the transition to an individual state, not all of the transitions required by a protocol. Again, technically this is doable with RPC, but nobody does it this way and solutions are more complex as a result.

3) The messages aren't interpreted as serialized call-stacks. The purpose of RPC is to copy a stack frame from one process to another, where it exists for the duration of a method call. That makes RPC very natural to anyone used to invoking a method, but it also makes it very hard to alter one side of the system without altering the other. While HTTP-based systems still use request/response messages, they aren't call-stacks. The fact that WS tools want to treat them that way is one of the reasons they have so much trouble handling large binary data, e.g., JPEG.

None of this is to say that RPC isn't useful. A couple people have mentioned callbacks to clients behind NAT-based firewalls, a problem that RPC based on WCF duplex channels is very good at solving (protocols like XMPP and plain old TCP work well too). My system uses both RPC and REST. But it's a mistake to say that REST is RPC just because HTTP is request/response, we often use XML to represent state, and there are tools to map XML to and from call-stacks. I made that mistake for a long time, until I realized what REST really is.

posted @ 7:01 PM | Feedback (8)

Friday, April 27, 2007

Ittay commented on my REST post:

the thing is, when you write software, you use an RPC model. what bothers me about REST is that it is not only an API. it enforces you to change your programming model.

that is not to say i don't like it. i do, for its simplicity and self documentation (e.g., provide all moves as links), but there is a price you pay.

When you write software, you use a programming model that works. And sometimes you have to change models. We changed them for the Web: we moved to the notion of pages. It wasn't RPC, it wasn't even objects (at least from most developers' perspectives originally). But it was simple and did what it was supposed to do. I've done RPC, CORBA, DCOM, Remoting, RMI, and Web services. All of those technologies have their place. But they all struggle in a loosely-coupled, massively distributed world. I'll happily change my programming model to solve that.

posted @ 3:04 PM | Feedback (4)
 

I got a lot of great comments on last night's post, including a couple about REST being no different from XML-based RPC. I used to think so too, which is why my recent epiphany was so eye opening.

Consider a protocol for finding and reserving a flight between two cities. The client is in one of these states:

<ready>
- searched
- retrieved details
- reserved

These states map to URIs:

<none>
- http://quuxTravel.com/searched
- ??? depends on previous state
- ??? depends on previous state

A client begins by navigating to the searched state by GETting http://quuxTravel.com/searched?src=London&dest=NYC. The client gets back some XML like this:

<itineraries>
  <itinerary src="London" dest="NYC" price="400.03">
    <getDetails uri="http://quuxTravel.com/details?itinerary=402" />
    <reserve uri="http://reservations.bookingsunlimited.com/quuxTravel?itinerary=402" />
  </itinerary>
  <itinerary src="London" dest="NYC" price="109.88">
    <getDetails uri="http://quuxTravel.com/details?itinerary=219" />
    <reserve uri="http://reservations.bookingsunlimited.com/quuxTravel?itinerary=219" />
  </itinerary>
</itineraries>

The client is now in the searched state. It scans the list of itineraries to find the one with the lowest price. If the client wanted some other criterion that isn't surfaced in this state, e.g., total flight time, it could transition to the retrieved details state by GETting the URI stored in the itinerary's getDetails/@uri attribute. The system would return an XML representation of that state that contained flight info. The client would then return to the searched state (either by an explicit back-link or a history a la the browser).

When the client has chosen a flight, it transitions to the reserved state by POSTing to the URI stored in the itinerary's reserve/@uri attribute. It gets back an XML document confirming the reservation. At this point the protocol is complete. The client can begin again if desired, or go do something else.
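The client side of this walk can be sketched without any network access. The snippet below is an illustration in Python with a hypothetical `choose_cheapest` helper. It parses a copy of the search result shown earlier, picks the cheapest itinerary and extracts the URI the client would POST to. The POST itself is left out.

```python
import xml.etree.ElementTree as ET

# A copy of the searched-state representation from the post.
SEARCH_RESULT = """
<itineraries>
  <itinerary src="London" dest="NYC" price="400.03">
    <getDetails uri="http://quuxTravel.com/details?itinerary=402" />
    <reserve uri="http://reservations.bookingsunlimited.com/quuxTravel?itinerary=402" />
  </itinerary>
  <itinerary src="London" dest="NYC" price="109.88">
    <getDetails uri="http://quuxTravel.com/details?itinerary=219" />
    <reserve uri="http://reservations.bookingsunlimited.com/quuxTravel?itinerary=219" />
  </itinerary>
</itineraries>
"""


def choose_cheapest(xml_text):
    """Return (price, reserve URI) for the lowest-priced itinerary."""
    root = ET.fromstring(xml_text.strip())
    cheapest = min(root.findall("itinerary"), key=lambda it: float(it.get("price")))
    # The client never builds this URI itself; it only follows what the server sent.
    return float(cheapest.get("price")), cheapest.find("reserve").get("uri")


price, reserve_uri = choose_cheapest(SEARCH_RESULT)
print(price, reserve_uri)
```

The client only needs to know the meaning of the `reserve` element. Where that URI points, including at a partner's host, is entirely up to the server.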

Now, why is this different from RPC? Imagine the following interface for implementing this same protocol:

interface IFlightSystem
{
    Itineraries Search(string src, string dest);
    Details GetDetails(int itineraryId);
    Confirmation Reserve(int itineraryId);
}

This interface exposes the same protocol, with more or less the same requirements on the client to know what the data being sent and received means. The difference is that in this case, the client is talking to one endpoint and mapping request/response payloads to call stacks. In the previous case neither of those things was true.

The REST model opens the door for the protocol to be implemented across different endpoints. This is useful for scalability, partitioning and data-directed routing, and integration with external systems (note that the transition to the reserved state uses a URI at a partner company). In other words, it's actually a web of endpoints. Further, those URIs are dynamically constructed, so you can change them based on user, time of day, the data they're interested in, locale they're from, state of your data center, or whatever. That is hugely powerful. Because the documents being sent around are not mapped to call stacks, and may not even be mapped to objects, it's easier to stream data, add extra stuff over time, etc.

In one of his comments, Ittay asked what the REST model for a method “string Foo(string, int, bool)” would be. In response, I described a simple protocol with one state, “FooInvoked”. To get to that state, you'd access a URI for /foo, passing a string, int and bool as query string parameters or in the request body. You'd get back a result state that contained a string. He countered that that felt just like a function call, which isn't surprising because that's where we started. The key, for me, is to look at problems not from the perspective of methods, as Ittay did, or entities, as Joe describes, but as states and the transitions between them. Then it really starts to make sense. And the power of it is real.

posted @ 2:56 PM | Feedback (14)

Thursday, April 26, 2007

Yeah, I'm alive. And I remember the password to my blog. I've been away for a bit, working on something very cool involving the TV. If all goes well, you'll hear about it in a big way. Anyway, I'm still having a ball out here in reality. Building something real has a way of focusing your decisions about technology. My app is a distributed system, some of which runs in a cable plant head-end or telco office (whatever's on the other end of the wire in your living room), and some of which runs elsewhere. We also connect to some things on the Web. These connections have to be extremely flexible and the bar to adoption has to be low. The thing I finally realized (some of you will say “Duh!”) is that Web services are not a good way to do this.

It's depressing to think that SOAP started just about 10 years ago and that now that everything is said and done, we built RPC again. I know SOAP is really an XML messaging protocol, you can do oneway async stuff, etc, etc, but let's face it. The tools make the technology and the tools (and the examples and the advice you get) point at RPC. And we know what the problems with RPC are. If you want to build something that is genuinely loosely-coupled, RPC is a pretty hard path to take.

That realization would have gotten me down if not for the fact that something else jazzed me up an hour or so later. I was in the process of considering the alternatives when I finally understood REST. And wow, it was eye-opening. REST is often positioned as CRUD operations against entities identified by URIs. Then it is dismissed as too simplistic to be useful. You can't build with just CRUD, the reasoning goes, just think about why we write sprocs. I've been down that path any number of times and always ended up in the same place. But I had it all wrong.

I skimmed Fielding's thesis a while back, but it wasn't until I read Sam Ruby's recent posts that it really sank in. Here's what I came to understand. Every communication protocol has a state machine. For some protocols they are very simple, for others they are more complex. When you implement a protocol via RPC, you build methods that modify the state of the communication. That state is maintained as a black box at the endpoint. Because the protocol state is hidden, it is easy to get things wrong. For instance, you might call Process before calling Init. People have been looking for ways to avoid these problems by annotating interface type information for a long time, but I'm not aware of any mainstream solutions. The fact that the state of the protocol is encapsulated behind method invocations that modify that state in non-obvious ways also makes versioning interesting.

The essence of REST is to make the states of the protocol explicit and addressable by URIs. The current state of the protocol state machine is represented by the URI you just operated on and the state representation you retrieved. You change state by operating on the URI of the state you're moving to, making that your new state. A state's representation includes the links (arcs in the graph) to the other states that you can move to from the current state. This is exactly how browser-based apps work, and there is no reason that your app's protocol can't work that way too. (The Atom Publishing Protocol is the canonical example, though it's easy to think that it's about entities, not a state machine.)
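One way to picture this is a table of state representations keyed by URI, where each representation carries the links out of that state. The sketch below is a Python illustration only. The `STATES` table, its URIs and the `follow` helper are invented for the example, with a dict lookup standing in for an HTTP request.

```python
# Hypothetical protocol graph: URI -> representation (state name plus outgoing links).
STATES = {
    "/start": {"state": "ready", "links": {"search": "/searched"}},
    "/searched": {
        "state": "searched",
        "links": {"details": "/details/219", "reserve": "/reserved/219"},
    },
    "/details/219": {
        "state": "retrieved details",
        "links": {"back": "/searched", "reserve": "/reserved/219"},
    },
    "/reserved/219": {"state": "reserved", "links": {}},
}


def follow(start_uri, link_names):
    """Walk the graph by link name, using only the links each state advertises."""
    uri = start_uri
    visited = [STATES[uri]["state"]]
    for name in link_names:
        links = STATES[uri]["links"]
        if name not in links:
            # The representation simply does not offer this transition.
            raise ValueError(f"no '{name}' transition from {uri}")
        uri = links[name]
        visited.append(STATES[uri]["state"])
    return visited


print(follow("/start", ["search", "details", "back", "reserve"]))
```

Because the legal transitions are spelled out in each representation, the equivalent of calling Process before Init shows up as a missing link rather than as a hidden ordering rule.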

The “state machine as node graph traversed via URI” view of the world has really interesting implications for being able to suspend and resume a protocol. Because links to other states are embedded in a state's representation there are interesting ways to solve dynamic load-balancing, data-directed-routing, versioning and other problems using normal Web infrastructure. And because it's HTTP based, you get all the features that protocol supplies, including streaming and support for non-XML MIME types (a huge concern when you're doing TV stuff). The one thing that's really missing here is a simple framework for implementing a URI graph on top of an HTTP handler (similar to what Marc's been working on for Java). I'm building my own now.

The thing I love about this model is that, as Sam says, it is of the Web, not over the Web. That doesn't mean I'll use it for everything. I use TDS to get to SQL Server. I use WCF for RPC-style communication between distributed components within major systems. I'll use this model when I cross major system boundaries, especially when I don't own both sides. I'll let you know how it turns out.

 

 

posted @ 2:39 PM | Feedback (236)

Wednesday, June 14, 2006

I heard about Excel Services a while ago, but hadn't had any time to look at them even briefly until now. Basically, it's a server-side system that lets you access data and calculations in Excel spreadsheets via Web services. Think about how much business data and calculation is done with Excel. Now imagine being able to leverage that directly. Want to change the algorithm you use to compute some key financial data? Let the analyst modify the spreadsheet and copy the update to your server and you're done. Now *this* is the way to align technology and business. Of course, that assumes it all actually works well - I haven't done anything yet. But still, it has *tons* of potential. Very cool idea, definitely something to spend more time with.
posted @ 6:58 AM | Feedback (8)

Monday, June 12, 2006

One of the last things I did before leaving the MSDN team was to prototype a Web service for retrieving content programmatically. It's been a while since then, but a production version is now live. Craig provides an excellent introduction here. The prototype client we always talked about was msdnman - a command line tool like the Unix man command. Craig was the lucky one - he got to build it. It will be interesting to see what people do with these services. Off the top of my head, I can imagine including actual docs in a technology site, integrating with tools (a la' Reflector), and building a team level doc repository on TFS that allows you to annotate, add your own docs, etc. Anyway, congrats to the MTPS team for a job well done!

posted @ 12:58 PM | Feedback (3)

Friday, May 19, 2006

I ran into two Web service related articles recently. One really resonated with me: Enable the Service Oriented Enterprise, in the MS Architecture Journal. It presents the Enterprise Service Orientation Maturity Model, or ESOMM. Okay, I know what you're thinking: eeeeeeeewwww, a maturity model! But it's a lot more interesting and useful than you think (and they distance themselves from that other MM in a sidebar). Lots of developers and some architects think about service orientation in terms of the famous four tenets. They are good guiding principles and very useful, but not what a lot of people mean when they talk about SOA, all caps. Sure, there's a lot of hype around SOA, but there is a real point there too. Many companies are trying to redesign their software infrastructure into a portfolio of coarse-grained, reusable services. To do that successfully, you have to go way past the four tenets for individual services and think about how you're going to organize the whole thing. I spent a lot of time thinking about that problem when I was at Mindreef. This article really summarized a lot of what I'd thought about, and a bunch of stuff I hadn't, in a nice, easy to understand way. If you work at a company doing “big SOA”, look at ESOMM.

The other article, Avoid XML Schema Wildcards For Web Services Interfaces, appeared in Internet Computing, but can't be downloaded without purchasing (which is pretty crummy, guys!). I got my copy forwarded by an interested reader who wanted my opinion on the position it took. I agree with the beginning of the article, which was disagreeing with the schema techniques described by Dave Orchard (and reiterated with slight variation by Dare) that were embraced by the W3C TAG. This approach is just too complicated in practice. The second part of the article described a versioning model that supported backward compatibility (old sender, new receiver) but did not address forward compatibility (new sender, old receiver). The problem with this one-way compatibility is that it just doesn't work. Imagine a typical request/response message exchange between an old client and a new service. The request message must support backward compatibility so that the old sender (the client) can communicate with the new receiver (the service). But the response message must support forward compatibility so that the new sender (the service) can communicate with the old receiver (the client). Having one without the other is essentially useless. Supporting both was a big part of what led me to my own versioning model. I'm not saying there are no other approaches to versioning, but if they don't support both backward AND forward compatibility, then they're not useful in the context of most Web services today.
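To make the two directions concrete, here is a small Python sketch of one common way to get both: old receivers ignore fields they don't recognize, and new receivers default fields that old senders omit. This is not the versioning model referred to above, which isn't described in this post. The message shape, the `currency` field and its "USD" default are assumptions for illustration.

```python
def read_order_v1(message):
    """Old receiver: takes the fields it knows and ignores anything newer."""
    return {"id": message["id"], "amount": message["amount"]}


def read_order_v2(message):
    """New receiver: accepts old messages by defaulting the field added in v2."""
    return {
        "id": message["id"],
        "amount": message["amount"],
        "currency": message.get("currency", "USD"),  # assumed default
    }


old_request = {"id": 7, "amount": 10}                      # sent by the old client
new_response = {"id": 7, "amount": 10, "currency": "EUR"}  # sent by the new service

# Backward compatibility: old sender, new receiver (the request).
print(read_order_v2(old_request))
# Forward compatibility: new sender, old receiver (the response).
print(read_order_v1(new_response))
```

A single request/response exchange between an old client and a new service exercises both functions, which is why having only one direction is not enough.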

posted @ 2:30 PM | Feedback (3)


 
   
 
© 2004 Pluralsight.