More on Persisting RSS Data…

You Can Take it With You


I received an excellent question following last week's webcast on persisting data returned from Windows Live Services (and similar RSS & non-RSS services). As you'll recall, in the webcast we showed using a Typed DataSet as an easy way to manage and persist RSS data; the Typed DataSet was generated using the RSS returned by http://qna.live.com. This person's question specifically addressed the issue of merging two DataSets containing RSS data.

The problem is that when this person attempts to merge the two DataSets, the merge fails because of the AutoIncrement columns that are created on the rss & channel DataTables. These AutoIncrement columns (which are also supposed to be unique) start at zero (0) in both DataSets, so when the DataSets are merged, the values conflict.

In the scenario we've been working in, where the first DataSet is populated before the second, one easy way I've found to deal with this issue is to set the AutoIncrementSeed on the tables in the new DataSet to a value higher than the largest value in the first DataSet.

Using a function like the following you can find the largest value of the AutoIncrement column in the rss DataTable (rss_id) and then return the next value to use:

private int GetNextRssId()
{
    // newDataSet is the original, already-populated DataSet.
    // Get idx of the last row of rss table
    int idxLastRssRow = newDataSet.rss.Rows.Count - 1;
    // Get the rss_id from last rss row
    return newDataSet.rss[idxLastRssRow].rss_Id + 1;
}

Do the same thing for the channel_id column on the channel DataTable:

private int GetNextChannelId()
{
    // Get idx of the last row of channel table
    int idxLastChannelRow = newDataSet.channel.Rows.Count - 1;
    // Get the channel_id from the last channel row
    return newDataSet.channel[idxLastChannelRow].channel_Id + 1;
}
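The last-row lookup assumes the table has at least one row and that rows were added in id order. Where either assumption might not hold, something along the following lines could be used instead. This is only a sketch: the NextAutoIncrementId helper is a made-up name that is not part of the sample download, it works against a plain untyped DataTable rather than the typed DataSet, and it assumes DataTable.Compute is available on your target framework.

```csharp
using System;
using System.Data;

static class AutoIncrementHelper
{
    // Hypothetical helper (not from the sample download): returns the next
    // free id for an integer column, regardless of row order.
    // Assumes columnName contains no ']' character.
    public static int NextAutoIncrementId(DataTable table, string columnName)
    {
        // MAX(...) yields DBNull.Value when the table has no rows.
        object max = table.Compute("MAX([" + columnName + "])", string.Empty);
        return max is DBNull ? 0 : Convert.ToInt32(max) + 1;
    }

    static void Main()
    {
        var channel = new DataTable("channel");
        channel.Columns.Add("channel_Id", typeof(int));
        Console.WriteLine(NextAutoIncrementId(channel, "channel_Id")); // prints 0

        // Rows deliberately added out of id order.
        channel.Rows.Add(5);
        channel.Rows.Add(2);
        Console.WriteLine(NextAutoIncrementId(channel, "channel_Id")); // prints 6
    }
}
```

The trade-off is that the aggregate scans the whole table, so the last-row lookup remains the cheaper choice when the load order is known.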

With this, you can set the AutoIncrementSeed on the rss and channel DataTables in the new DataSet to start at the non-overlapping value as shown here:

NewDataSet tempDataSet = new NewDataSet();
tempDataSet.channel.channel_IdColumn.AutoIncrementSeed = GetNextChannelId();
tempDataSet.rss.rss_IdColumn.AutoIncrementSeed = GetNextRssId();

With non-overlapping values, you can now populate the second DataSet and merge it with the original without difficulty.
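To see the effect of the seed in isolation, here is a minimal sketch that uses a single untyped channel table instead of the typed DataSet from the webcast. The SeedThenMergeDemo class, the BuildChannelSet helper and the feed titles are all invented for illustration; only the AutoIncrementSeed and Merge calls correspond to what the post describes.

```csharp
using System;
using System.Data;

static class SeedThenMergeDemo
{
    // Builds a stand-in for the channel table: an AutoIncrement primary key
    // starting at the given seed, plus a title column.
    static DataSet BuildChannelSet(int seed, params string[] titles)
    {
        var channel = new DataTable("channel");
        DataColumn id = channel.Columns.Add("channel_Id", typeof(int));
        id.AutoIncrement = true;
        id.AutoIncrementSeed = seed;
        id.AutoIncrementStep = 1;
        channel.Columns.Add("title", typeof(string));
        channel.PrimaryKey = new[] { id };

        foreach (string title in titles)
        {
            DataRow row = channel.NewRow();   // channel_Id is assigned here
            row["title"] = title;
            channel.Rows.Add(row);
        }

        var set = new DataSet("NewDataSet");
        set.Tables.Add(channel);
        return set;
    }

    // Merges a second set into the first. The second set is seeded either at 0
    // (overlapping ids) or just past the first set's last id.
    public static int MergedRowCount(bool useNonOverlappingSeed)
    {
        DataSet first = BuildChannelSet(0, "feed A", "feed B");   // ids 0 and 1
        DataTable firstChannel = first.Tables["channel"];
        int nextId = (int)firstChannel.Rows[firstChannel.Rows.Count - 1]["channel_Id"] + 1;

        DataSet second = BuildChannelSet(useNonOverlappingSeed ? nextId : 0, "feed C", "feed D");
        first.Merge(second);
        return first.Tables["channel"].Rows.Count;
    }

    static void Main()
    {
        // Colliding keys: rows with matching keys are merged, not appended.
        Console.WriteLine(MergedRowCount(false));
        // Non-overlapping seed: all four rows survive the merge.
        Console.WriteLine(MergedRowCount(true));
    }
}
```

In this simplified single-table setup a key collision shows up as rows being folded together rather than as an error. The typed DataSet in the sample also carries the rss, channel and item relationships, so the symptom there may differ.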

But wait, there's more…

Merging the two RSS-filled DataSets works well, especially if you need to operate on the two RSS feeds separately before combining them. However, if all you want to do is add additional feeds to the existing DataSet, you don't have to populate a separate DataSet then merge them; you can instead read the additional feeds directly into the DataSet as we did when populating the original DataSet.

As you'll recall, when we populated the original DataSet, we read one feed at a time from the HTTP stream into the DataSet. There's no reason this same technique can't be used to add more feeds later.
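As a rough illustration of reading several feeds into one DataSet, the sketch below uses an untyped DataSet and in-memory strings in place of the typed DataSet and the HTTP stream. The DirectReadDemo class, the SampleFeed helper and the tiny feed documents are invented for the example; the schema is inferred from the first feed and reused for the rest, which is an assumption standing in for the pre-generated typed schema.

```csharp
using System;
using System.Data;
using System.IO;

static class DirectReadDemo
{
    // Tiny stand-in for a feed; real feeds come from the HTTP response stream.
    public static string SampleFeed(string title)
    {
        return "<rss version=\"2.0\"><channel><title>" + title + "</title>" +
               "<item><title>first answer</title></item></channel></rss>";
    }

    // Reads each feed into the same DataSet, one at a time.
    public static DataSet ReadFeeds(params string[] feeds)
    {
        var set = new DataSet("NewDataSet");
        foreach (string feed in feeds)
        {
            // Infer the schema from the first feed, then reuse it for the rest.
            XmlReadMode mode = set.Tables.Count == 0
                ? XmlReadMode.InferSchema
                : XmlReadMode.IgnoreSchema;
            using (var reader = new StringReader(feed))
            {
                set.ReadXml(reader, mode);
            }
        }
        return set;
    }

    static void Main()
    {
        DataSet set = ReadFeeds(SampleFeed("question 1"), SampleFeed("question 2"));
        // Each read appends rows, so every channel gets its own channel_Id.
        foreach (DataRow row in set.Tables["channel"].Rows)
        {
            Console.WriteLine(row["channel_Id"] + ": " + row["title"]);
        }
    }
}
```

Because every read goes through the same DataTables, the AutoIncrement columns keep counting from where the previous feed left off and no seed adjustment is needed.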

You can download an updated QnA RSS Feed Sample that adds two new menu options, "Get More RSS then Merge" and "Get More RSS - Read Directly", which demonstrate adding feeds to the original DataSet by merging the DataSets and by reading directly into the original DataSet, respectively.

Overlapping Feeds…

One thing you'll want to be sure to remember is that both of these techniques assume that you're adding new feeds. If you'll be doing a mixture of refreshing existing feeds and adding new feeds, you'll need to add logic that decides whether to add a whole new RSS entry or simply update the details of an existing RSS entry.
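What that logic looks like depends on how you decide two entries are the same feed. As one possible sketch, the code below treats a matching link value as meaning "same feed", updates the title in that case, and adds a new row otherwise. The FeedUpsert class, the AddOrUpdateChannel helper, the link-based matching rule and the example URLs are all assumptions for illustration, and the table is a simplified untyped stand-in for the channel DataTable.

```csharp
using System;
using System.Data;

static class FeedUpsert
{
    // Simplified stand-in for the channel table.
    public static DataTable CreateChannelTable()
    {
        var channel = new DataTable("channel");
        DataColumn id = channel.Columns.Add("channel_Id", typeof(int));
        id.AutoIncrement = true;
        channel.Columns.Add("link", typeof(string));
        channel.Columns.Add("title", typeof(string));
        return channel;
    }

    // Hypothetical rule: a feed is "the same" when its link matches an existing row.
    public static DataRow AddOrUpdateChannel(DataTable channel, string link, string title)
    {
        foreach (DataRow row in channel.Rows)
        {
            if (object.Equals(row["link"], link))
            {
                row["title"] = title;      // refresh the details in place
                return row;
            }
        }

        DataRow added = channel.NewRow();  // new feed: gets the next channel_Id
        added["link"] = link;
        added["title"] = title;
        channel.Rows.Add(added);
        return added;
    }

    static void Main()
    {
        DataTable channel = CreateChannelTable();
        AddOrUpdateChannel(channel, "http://example.com/q1", "Question 1");
        AddOrUpdateChannel(channel, "http://example.com/q2", "Question 2");
        AddOrUpdateChannel(channel, "http://example.com/q1", "Question 1 (refreshed)");
        Console.WriteLine(channel.Rows.Count); // prints 2
    }
}
```

A real refresh would also need to reconcile the child item rows of an updated channel, which this sketch leaves out.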

 


Posted Jan 31 2007, 08:45 AM by jim-wilson

Comments

Andy wrote re: More on Persisting RSS Data…
on 01-31-2007 11:29 AM
Thank you for your tips.
I did test out your suggested solution. The tempDataSet channel_id autoincrement is not working as I expected. After setting the AutoIncrementSeed for the new feeds, all new channel ids remain the same (channel_id = 21). I loaded the technology category and then merged it with the life category. The result was 21 channels. The 21st channel had 20 questions with channel_id = 21. Am I missing something?
What I wanted to do was get the IDs as part of the download from the QnA Live service, instead of using our own identity fields, and use them to eliminate duplicate feeds when the DataSets are merged. Not sure that's possible.

Andy
Jim Wilson wrote re: More on Persisting RSS Data…
on 01-31-2007 1:19 PM
Andy;

I think the problem is that you're only setting the AutoIncrementSeed on the rss DataTable. In the 2nd paragraph of the above post it mentions that you have to set the AutoIncrementSeed on both the channel and rss DataTables.

I'm sorry if showing only the rss DataTable part of the code in the post was misleading; I didn't want to clutter up the post with duplicate code (in hindsight that was a mistake on my part). If you create a GetNextChannelId function that works like the GetNextRssId function does and then use it to set the AutoIncrementSeed on the channel_Id column of the channel DataTable, I think you'll find that all is well.

The download sample I mention at the end of the post has the complete implementations of both GetNextChannelId and GetNextRssId functions along with the merge of the 2 DataSets if you want to grab the code from there. (http://jwhh.com/downloads/WindowsLiveEx1_WithRssMerge.zip)

Come on back if you're still running into trouble and we'll get it figured out.

-Jim
Jim Wilson wrote re: More on Persisting RSS Data…
on 01-31-2007 1:27 PM
Andy;

In looking at the post again, the post is too confusing without showing both the rss and channel sides of the issue. I've added the GetNextChannelId function and the line that sets the channel_id in the new DataSet.

Again my apologies for not making the post more clear.

-Jim
theCoach wrote re: More on Persisting RSS Data…
on 02-01-2007 5:41 AM
Perhaps I am missing something. Is there always a guarantee that the last item will have the largest idx (these always come sorted that way?)

This seems like a standard enough problem that the MS guys should be working on some plumbing that does this, and does it properly with a standard property setting or standard function.
Jim Wilson wrote re: More on Persisting RSS Data…
on 02-01-2007 6:40 AM
Whether the last row is guaranteed to have the largest index depends on how the DataTable is loaded. When loading the RSS data into the DataSet one feed at a time from the HTTP stream (as we're doing in this example), yes, we can assume this to be the case: each read adds another row to the DataTable, with each row getting the next AutoIncrement value. In an application model with a more complex DataTable population mechanism, it may not be safe to make this assumption about the order, but since the example focuses on the specific issue of loading RSS data into the DataSet, I wanted to demonstrate the most efficient way to determine the next AutoIncrement value in this usage.

I understand your point that it’d be nice for Microsoft to provide an @@Identity/IDENT_CURRENT() equivalent for each DataTable, but to my knowledge no such property exists. If we were in a scenario where we didn’t have direct control over the DataTable loading as we do in the RSS scenario, and therefore didn’t know that the order of the DataTable rows corresponds to the order of the AutoIncrement values, I’d use the DataTable Select method to get the current value:

int GetCurrentAutoIncrementValue(DataTable dataTable, string autoIncrementColumnName)
{
    string sortString = autoIncrementColumnName + " DESC";
    DataRow[] orderedRows = dataTable.Select("", sortString);
    return orderedRows.Length > 0 ? ((int)orderedRows[0][autoIncrementColumnName]) : -1;
}

This function builds a sort clause to return the DataTable rows in descending order by the AutoIncrement value. The largest AutoIncrement value is simply the first member of the returned DataRow array. With that I could then set the AutoIncrementSeed values on the other DataSet like this:

int channelIdCurrentValue = GetCurrentAutoIncrementValue(newDataSet.channel, "channel_id");
int rssIdCurrentValue = GetCurrentAutoIncrementValue(newDataSet.rss, "rss_id");
tempDataSet.channel.channel_IdColumn.AutoIncrementSeed = channelIdCurrentValue + 1;
tempDataSet.rss.rss_IdColumn.AutoIncrementSeed = rssIdCurrentValue + 1;

By finding the max AutoIncrement value using this technique, I am guaranteed to get the correct value without regard for the current order of the rows. But as you can see, it involves a fair bit more processing overhead than simply retrieving the value from the last row in the DataTable. It’s a classic case of using the appropriate technique for the scenario. Because I know how the RSS data is being loaded, I can use the optimization. If I didn’t have that knowledge and control, I’d use the DataTable.Select technique.

Hopefully, that addresses your concerns. If not, keep the questions coming. :-)

-Jim

Andy wrote re: More on Persisting RSS Data…
on 02-01-2007 7:32 AM
I simply copied and pasted your source code.

private void menuGetMoreRssThenMerge_Click(object sender, EventArgs e)
{
    NewDataSet tempDataSet = new NewDataSet();
    tempDataSet.channel.channel_IdColumn.AutoIncrementSeed = GetNextChannelId();
    tempDataSet.rss.rss_IdColumn.AutoIncrementSeed = GetNextRssId();

    PopulateDataSet(tempDataSet, urlListB);
    newDataSet.Merge(tempDataSet);
    DisplayNumberOfChannels();
}

private int GetNextChannelId()
{
    // Get idx of the last row of channel table
    int idxLastChannelRow = newDataSet.channel.Rows.Count - 1;
    // Get the channel_id from the last channel row
    return newDataSet.channel[idxLastChannelRow].channel_Id + 1;
}

private int GetNextRssId()
{
    // Get idx of the last row of rss table
    int idxLastRssRow = newDataSet.rss.Rows.Count - 1;
    // Get the rss_id from last rss row
    return newDataSet.rss[idxLastRssRow].rss_Id + 1;
}

private void menuGetMoreRssReadDirectly_Click(object sender, EventArgs e)
{
    PopulateDataSet(newDataSet, urlListB);
    DisplayNumberOfChannels();
}

private void DisplayNumberOfChannels()
{
    statusBar1.Text = "Number of Channels: " +
        (newDataSet != null && newDataSet.channel != null ?
            newDataSet.channel.Rows.Count.ToString() : "0");
}

What I did was load two categories:
technology = "http://qna.live.com/Browse.aspx?tag=technology&limit=10.000000&count=20&format=rss";

life = "http://qna.live.com/Browse.aspx?tag=life&limit=10.000000&count=20&format=rss";

I first loaded technology, then merged the life category into the DataSet, and that was the result I got. The first download gave me 20 channels (channel ids 1...20), which is expected; after the DataSets were merged I got 21 channels in total. The new channel has 20 questions, all under channel id = 21.
It seems the schemas got out of sync during the merging process.
I don't know the merging process in enough depth to understand how it operates under the covers. If you have a solution for it, I would love to hear it from you. Thanks again.

Andy

Jim Wilson wrote re: More on Persisting RSS Data…
on 02-01-2007 2:54 PM
Andy;

I'm going to try those specific URLs shortly when I have a few moments. That said, I'm not sure that the DataSet.Merge solution is the optimal choice for this scenario.

The situation you're describing seems perfect for using the Direct Read method I describe in the above blog post (under the "But wait, there's more..." section). Is there a specific feature of the Merge solution that you're using that the faster Direct Read solution doesn't provide?

- Jim
Jim Wilson wrote re: More on Persisting RSS Data…
on 02-01-2007 3:15 PM
Andy;

I think I see what the problem is.

The URLs that you're using return data differently from what the application was architected for. When we built the application during the Webcast, we used the http://qna.live.com/ShowQuestion.aspx call to retrieve the content; that call retrieves a single question along with its answers. With this call, the question is the Channel and the answers are the Items.

The URLs that you're using are for the http://qna.live.com/Browse.aspx call which returns the questions belonging to a particular category. With this call, the category is the Channel and then each question is the Item.

Although both results are returning RSS formatted data, the nature of the data in the hierarchy is different. In effect, they're different schemas.

Probably the best way to go is to attack the problem in layers. Use the Browse.aspx call as you are to get the list of questions in a category. Store these in a temporary DataSet. You can then iterate through the question URLs stored in this temporary DataSet and then load the results of the question URLs directly into the DataSet that the UI is bound to. This way the DataSet that the UI is bound to is consistent with the data hierarchy for which it was designed.

-Jim
