What should I think about when using transactions with the file system?

 

In a comment Sahil Malik asked:

 

"Some guidelines around using Transactional filesystem versus otherwise would be helpful. I am sure TxFs has its uses and advantages but it must have some downsides as well."  I interpreted this to mean 'what should I think about when using the file system as part of a transaction?'.  This is a pretty interesting question, and one that deserved enough space and visibility that I wanted to address it in its own note.

 

As I'd mentioned in my reply:

 

Like any use of transactions, there are tradeoffs to consider.  The fact that the data is stored in a file doesn't change the concerns you'd have with transactions anyway: how long can data be locked before it is a problem?  Can you reasonably handle in-doubt transactions?  Do you have the storage for the recovery information?

 

In other words, the introduction of a file system store into a transaction does not relax any of the design concerns that you have whenever you use transactions today.  If it is unreasonable to have the data locked in a row in a database, then it is still unreasonable if it is in a file instead.

 

There are also some similarities that can be a little surprising.  Today, if you move a database around between systems you generally need to have completed recovery against any distributed (as in multi-resource or network distributed) transactions before doing so.  This is also likely to be true with the file system and the registry.  It is in the nature of things that it is easier to find yourself using multiple volumes, say, than multiple databases.  That can lead to locked, in-doubt data if you move such a volume.

 

Finally, there are a few ways in which file systems and registry hives differ from databases.  One of the more obvious is that their information may be needed earlier in system startup.  That probably isn't often the case, but if it is in yours, then you need to think about what happens if some of the information is in doubt.  If it needs, say, MSDTC to resolve the transaction, can you wait until MSDTC has an opportunity to start up?

 

A second one is that the registry and the file system support the read committed isolation level.  Note that the default isolation level for COM+ is Serializable.  That is the simplest isolation level to understand, but also the most pessimal for execution.  This will occasionally cause a surprise if you are depending on the differences between these two isolation levels (e.g. that reads will block later writes in another concurrent transaction).
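
As a sketch of how the isolation level can be stated explicitly instead of being left to a default, here is one way to do it with the System.Transactions API that comes up in the comments. TransactionScope also defaults to Serializable, so read committed has to be requested through TransactionOptions. The class and method names are only illustrative, and the sketch says nothing about how the file system or registry resource managers treat the requested level.

```csharp
using System;
using System.Transactions;

class IsolationLevelSketch
{
    // Illustrative helper: opens a scope that asks for read committed
    // and reports the level the ambient transaction actually carries.
    public static IsolationLevel Run()
    {
        var options = new TransactionOptions
        {
            IsolationLevel = IsolationLevel.ReadCommitted,
            Timeout = TimeSpan.FromSeconds(30) // arbitrary value for the sketch
        };

        using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
        {
            IsolationLevel observed = Transaction.Current.IsolationLevel;
            scope.Complete();
            return observed;
        }
    }

    static void Main()
    {
        Console.WriteLine(Run());
    }
}
```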


Posted Sep 15 2005, 10:12 AM by jim-johnson

Comments

Sahil Malik wrote re: What should I think about when using transactions with the file system?
on 09-15-2005 12:17 PM
Thank you Jim. Wow it is news to me that the default isolation level for reg and FS is ReadCommitted. Thanks for pointing that out. Also, I'm guessing TxFs works under Durable Enlistment.

Y'know Sys.Tx opens doors of possibilities, but it is important to understand the responsibilities that come with it.

Recently, I was having a discussion with someone who insisted on putting DataAdapter.Fill and DataAdapter.Update within the same transaction scope.

My stance was, that is a bad idea, and you should use SqlTransaction over Sys.Tx in that scenario. And it is an EVEN worse idea if you are using CommittableTransaction instead.

I'll spare the details (unless you are interested), but due to deadlocks and a higher isolation level, that code would be simply unusable - but on the face of it, it looks so innocuous.

So to use this judiciously, one must know what implications this may have ... "It just works" is not the right answer. :)
Sahil Malik [MVP C#] wrote System.Transactions: Judicious use of Transactional FileSystem in Windows Vista
on 09-15-2005 3:07 PM
So if you've been following my blog for a while, you'd know that I have a dedicated category for System.Transactions ....
Robert Hurlbut wrote re: What should I think about when using transactions with the file system?
on 09-15-2005 7:22 PM
Well said, Sahil. You can't blindly apply this without understanding the implications.
Sahil Malik wrote re: What should I think about when using transactions with the file system?
on 09-16-2005 9:24 AM
Thanks Robert .. :) .. But nonetheless, Sys.Tx is great .. but y'know possibilities = responsibilities (That's my Vote for Sahil Catchline).

Sam Gentile::Longhorn wrote New and Notable 78
on 09-17-2005 3:09 PM


Yet another big gap (August 3rd) [:D]. As you can tell from all the latest posts, I am playing a...
Jim Johnson wrote re: What should I think about when using transactions with the file system?
on 09-19-2005 10:33 PM
Thanks Sahil,

That's a bunch of questions there. First, by Windows Vista beta 2, the Oletx interface (via the MSDTC proxy) implements support to delegate directly to the KTM component in a manner similar to PSPE. The KTM (kernel transaction manager) component provides the direct transaction coordination for the kernel resources of the file system and the registry.

This means that transactions that are in a single process and have only the file system or the registry for their durable resource managers will have the transaction move directly from the unmanaged 'LTM' in the MSDTC proxy to the KTM in the kernel. Much like PSPE, if the transaction then acquires another durable resource or distributes to another process, the transaction will be promoted to MSDTC.

Note that while the effect is much like PSPE, the actual mechanism is more specific and targeted to the KTM component.

Second, yes, there are a number of general rules and guidelines that apply whenever you use two-phase commit. I'm not sure that I follow that there are that many cases that apply only to System.Transactions. For instance, if a single SQL transaction is reasonable for the two DataAdapter calls, I don't immediately follow that putting them into a System.Transactions transaction is notably a bad decision. Can you elaborate?

Finally, ease of use aside, I don't understand the distinction around CommittableTransaction -- that is, in fact, what TransactionScope creates behind the scenes when it is building the outer scope.
Sahil Malik wrote re: What should I think about when using transactions with the file system?
on 10-03-2005 10:00 PM
Jim,

Thank you for your answers, and I apologize for my delay in getting back.

Here is an elaboration to what I was saying above ...

By putting the two calls to DataAdapter.Fill and Update within the same transaction scope, the following happens.

1. Fill = Open SqlConnection, FILLDATA, Close SqlConnection.
2. Update = Open SqlConnection, UPDATEDATA, Close SqlConnection.

However, SqlConnection.State = Closed means nothing because the underlying database connection is still open (due to connection pooling), and that connection locks the resources that the data was filled out of. This happens for 1 minute - the default timeout.

So now what happens is that during Update, another connection is opened, which bumps up the isolation level to serializable, everything saves - life is good.

Why is it a problem? Consider the following scenario.

user1:Fill - Connection #1 opened.
user2:Fill - Connection #2 opened.
user2:Update - Connection #3 opened - isolation level serializable. goes in a timeout since user1:fill:connection#1 has locked the resource
user1:Update - Connection #4 opened - isolation level serializable, deadlocks connection #3, but since #3 came first, #4 is made the deadlock victim, and since #3 is blocked by #1, he just times out !! BUMMER !!

So what you have here is -
a) A concurrency management scheme that is much more expensive.
b) Both parties fail in event of concurrent updates.

The above behavior is Sql2k5 BTW, Sql2k would be different (as you already must know that is serializable by default, no promotable enlistment).

Now why is CommittableTransaction important here? Because TransactionScope in a using construct keeps the resources tied up for (hopefully) a short duration - at least one code scope. CommittableTransaction gives you a programming model that allows for on-demand enlistment, and on-demand commit, i.e. it encourages the programmer to do something like

load asp.net page - start transaction,
wait for user input,
postback - commit transaction

Of course the above is do-able via TransactionScope also, it is just that CommittableTransaction seems more encouraging :)
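
For readers who have not used it, a minimal sketch of the explicit model being contrasted with TransactionScope follows. It assumes only the documented CommittableTransaction and Transaction.Current members of System.Transactions; the class name and the placeholder for the work are hypothetical. The point is that nothing ties Commit to a code block, which is what makes the page-load to postback pattern above tempting.

```csharp
using System;
using System.Transactions;

class CommittableSketch
{
    // Illustrative helper: creates a transaction by hand, makes it ambient
    // for a while, then commits it on demand and reports the final status.
    public static TransactionStatus Run()
    {
        using (var tx = new CommittableTransaction())
        {
            Transaction previous = Transaction.Current;
            Transaction.Current = tx; // make it ambient so resources can enlist
            try
            {
                // ... transactional work would go here (placeholder) ...
            }
            finally
            {
                // Always restore whatever was ambient before.
                Transaction.Current = previous;
            }

            // The commit happens only when the caller decides to ask for it.
            tx.Commit();
            return tx.TransactionInformation.Status;
        }
    }

    static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

Every lock taken by the work stays held until Commit is called, so the longer the gap between creating the transaction and committing it, the longer other users wait.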

Sahil
Jim Johnson wrote re: What should I think about when using transactions with the file system?
on 10-06-2005 9:48 AM
Sahil,

Thanks for your explanation. It clarified the issues you've been presenting quite well. I have some observations and suggestions to make about this. Forgive the length of this reply, but you bring up a number of interesting issues.

First, there are many times when a single atomic transaction is inappropriate. Absolutely. Some typical examples are:

- When the action is going to run "long", where long is defined by whatever measure holds up too much work for that business;

- When there are consistent points during the action that you want to make available for other threads or users;

- When the compensation you want to do is not a simple undo, but something else entirely.

There is also a variation that isn't purely part of the consensus protocol, but rather is associated with how the isolation is handled. That's the case where the isolation level that has been chosen and the requirements for forward progress do not mesh -- either due to deadlocks or due to too high a rate of cascading aborts. Handling this issue is somewhat more difficult, in that atomic transactions may be a legitimate part of the solution. What is needed is to think about how the application's data flow and progress requirements mesh. This seems to be part of what your DataAdapter issue is getting at.

All these points are true of any atomic transaction scheme that I know about, and I can agree with them. I would point out, though, that these are independent of which API is used to drive that transaction -- in other words, these are generic to transactions, and not specific to System.Transactions.

When I read your first comment, I wasn't sure exactly what the DataAdapter scenarios were. Let me try to walk through a few. First, there's one where each thread does:

using (TransactionScope scope = new TransactionScope ())
{
    dataAdapter.Fill (); // pseudo code
    // ... do some stuff ...
    dataAdapter.Update (); // pseudo code

    scope.Complete ();
}

This is little different, in fact, from doing the same thing with a single SQL transaction. According to people I've talked to here, this really shouldn't be deadlocking due to object pooling -- System.Data goes to some length to reuse a closed connection that was part of the current transaction. If you are seeing deadlocks, please send me some details offline and I'll pass them along to someone to look at. (Btw: I'll talk about locking levels in a second.)

The alternative is to split these into different transactions, where the dataAdapter.Fill call and the dataAdapter.Update call do not share a transaction. Roughly, if I use System.Transactions to represent this for simplicity, this is:

using (TransactionScope scope = new TransactionScope ())
{
    dataAdapter.Fill (); // pseudo code

    scope.Complete ();
}

// ... do some stuff ...

using (TransactionScope scope = new TransactionScope ())
{
    dataAdapter.Update (); // pseudo code

    scope.Complete ();
}

However, this suffers from the problem that if there is any contention on the data, the dataAdapter.Update call may silently overwrite other updates, since the data does not remain locked between the dataAdapter.Fill call and the dataAdapter.Update call. So, it won't deadlock, but it may silently lose updates.
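
The lost update described above can be reproduced without a database at all. The following is only a sketch: the class LostUpdateSketch and its balance field are hypothetical stand-ins for a row and for two users' private copies of it, not ADO.NET code.

```csharp
using System;

class LostUpdateSketch
{
    // Hypothetical stand-in for a single database row.
    static int balance = 100;

    public static int Run()
    {
        balance = 100;

        // "Fill" in its own short transaction: each user takes a private copy.
        int user1Copy = balance;
        int user2Copy = balance;

        // ... do some stuff, with no locks held on the row ...
        user1Copy += 10;
        user2Copy += 25;

        // "Update" in a second transaction: each user writes back a whole copy.
        balance = user1Copy;
        balance = user2Copy; // overwrites user1's change without any error

        return balance;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints 125, not the 135 both changes would give
    }
}
```

Neither write fails and nothing deadlocks, yet user1's change is gone, which is the tradeoff against holding the data locked across both calls.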

You also talked about isolation levels changing. I think I understand what you're saying, but it isn't that the isolation levels change. I suspect that you're talking about lock promotion. The idea is that within a given isolation level there are rules about both read and write locks, and for some levels those rules allow read locks to be promoted to write locks. That promotion could cause deadlocks under the right conditions, and transaction timeout is the traditional way to handle that.

Is that what you're seeing? If not, again, please send me an example offline. Thanks.

Finally, I did follow your example with CommittableTransaction. You're right, that in that case it is hard to not create the raw transaction. There is another relatively long topic there -- I'll put that as a topic to write about soon.
Sahil Malik wrote re: What should I think about when using transactions with the file system?
on 10-07-2005 9:26 AM
LOL I like how you describe it .. "raw transaction".

Umm .. I'd like my transaction done Medium-Well please .. thanks !!
Lars-Inge Tønnessen wrote re: What should I think about when using transactions with the file system?
on 11-05-2005 11:38 AM
> Umm .. I'd like my transaction done Medium-Well please .. thanks !!


:o)


> That promotion could cause deadlocks under
> the right conditions, and transaction
> timeout is the traditional way to handle that.


Hi Jim! :o)

A timeout is probably the easy solution to deadlocks. A deadlock graph algorithm would probably outperform a timeout. I thought the industry used an algorithm, so that was interesting to know. If the TM is tightly integrated with the resources, it could abort a deadlocked transaction by using a strict rule. This could probably give the highest performance.
Jim Johnson wrote re: What should I think about when using transactions with the file system?
on 11-08-2005 7:12 AM
Hello back!

Now talk of deadlock graphs brings back some memories. Have you looked at the distributed lock manager (DLM) in OpenVMS? It was a great tool for the environment that it was in. It never understood transactions, so using it in a resource manager could be 'interesting' at times.

Jim.
Jim Johnson wrote re: What should I think about when using transactions with the file system?
on 11-08-2005 7:14 AM
Also, I wanted to close part of this thread with a note that I did put Sahil into contact with some folks in ADO.Net offline. Last I heard, they had gotten back to him and were discussing the DataAdapter issues with him.

Jim.
Sahil Malik wrote re: What should I think about when using transactions with the file system?
on 11-11-2005 6:56 PM
Yes Jim and thank you for that :). I will be writing an article for Code-magazine on this topic shortly. It's a tasty article IMO.
Sahil Malik wrote Code-Magazine: New Article on System.Transactions
on 04-09-2006 3:18 PM
My new article on System.Transactions should now be available in the newest issue of code-magazine. It...
