Wednesday, June 27, 2007

Rico Mariani's DLinq Performance Tips (Part 2) and Compiled Queries

Rico Mariani continues his analysis of LINQ to SQL performance issues with a new DLinq (Linq to SQL) Performance (Part 2) post dated June 25, 2007 by recommending compiled LINQ query expressions.

LINQ to SQL query expressions take advantage of lazy evaluation. Your code doesn't hit the SQL Server instance until the IEnumerable<T> sequence's iterator (a foreach or For Each loop) requests the first row. However, before the iterator starts its pass, the compiler builds an expression tree, which in turn generates the T-SQL statement and sends it to the database. As Rico observes:

In many cases all that will be different from one invocation to another is a single integer filtering parameter.
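Deferred execution is easy to demonstrate even with plain LINQ to Objects; this minimal sketch (my own illustration, not Rico's code) logs when elements are actually evaluated:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DeferredDemo
{
    static void Main()
    {
        var evaluated = new List<int>();
        int[] source = { 1, 2, 3 };

        // Defining the query runs nothing; no element has been touched yet.
        IEnumerable<int> query =
            source.Select(n => { evaluated.Add(n); return n * 10; });
        Console.WriteLine(evaluated.Count);   // 0

        // Iteration (foreach) finally pulls the elements through.
        foreach (int n in query) { }
        Console.WriteLine(evaluated.Count);   // 3
    }
}
```

With LINQ to SQL, the analogous "pull" is the point at which the generated T-SQL is sent to the server.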

The expression tree incorporates logic to send a parameterized query when the query expression contains one or more parameters in the Where clause. For example:

SELECT [t0].[OrderID], [t0].[CustomerID], [t0].[EmployeeID], [t0].[ShippedDate]
FROM [dbo].[Orders] AS [t0]
WHERE [t0].[ShipCountry] = @p0
ORDER BY [t0].[OrderID] DESC
-- @p0: Input NVarChar (Size = 3; Prec = 0; Scale = 0) NOT NULL [USA]
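For reference, a C# 3.0 query expression along the following lines produces parameterized T-SQL like the statement above; `db` is assumed to be a Northwind DataContext with the Orders table mapped, and the local variable `country` becomes the @p0 parameter:

```csharp
// Hypothetical query; entity and property names are assumed to
// match the Northwind mapping used in the test harness.
string country = "USA";
var q = from o in db.Orders
        where o.ShipCountry == country
        orderby o.OrderID descending
        select new { o.OrderID, o.CustomerID, o.EmployeeID, o.ShippedDate };
```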

SQL Server caches the query plan and uses the cached version for different parameter values. Thus the only option remaining to the developer for improving performance is to compile the query, which eliminates building the expression tree and generating the T-SQL on each execution. 

Compiling Parameterized LINQ Query Expressions

Rico provides an example of a query delegate type that actually will compile. The only item missing is the delegate function:

private IEnumerable<Order> GetOrderById(int orderId)
{
    return q(Northwinds, orderId);
}
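For context, the delegate `q` that the function invokes is created with CompiledQuery.Compile. Here's a hedged reconstruction (not Rico's exact code; the Northwinds DataContext type and member names are assumptions chosen to match his snippet):

```csharp
// q is typically declared as a static field so the query is
// compiled only once per AppDomain rather than per call.
static readonly Func<Northwinds, int, IQueryable<Order>> q =
    CompiledQuery.Compile((Northwinds db, int orderId) =>
        from o in db.Orders
        where o.OrderID == orderId
        select o);
```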

The LINQ to SQL: .NET Language-Integrated Query for Relational Data white paper by Dinesh Kulkarni, Luca Bolognese, Matt Warren, Anders Hejlsberg, and Kit George (March 2007) is one of the few (about 7) hits you'll get on a search for CompiledQuery.Compile. The white paper has a very brief section that covers the CompiledQuery class and its Compile method for compiling parameterized query expressions, with a single example that will compile but won't work more than once during a session.

Note: The paucity of posts about compiled LINQ queries is surprising.

A bug—presumably in a System.Data.Linq class—causes a null reference exception if you create new DataContext objects for successive executions of the compiled query during a session. The sample code for the function that invokes the query delegate type does exactly that (and has a variable name error, to boot):

public IEnumerable<Customer> GetCustomersByCity(string city)
{
   Northwind db = new Northwind();
   return Queries.CustomersByCity(myDb, city);
}

Delete the Northwind db = new Northwind(); instruction, change the erroneous myDb argument to your DataContext variable, and the sample code should work as expected. However, this is another example of untested sample code. The authors might not have found the multiple-invocation bug, but they certainly couldn't have compiled the function with the wrong argument name.
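With those two fixes applied, the function reduces to something like this (a sketch; `db` is assumed to be a class-level Northwind DataContext field):

```csharp
public IEnumerable<Customer> GetCustomersByCity(string city)
{
    // Reuse the existing DataContext rather than constructing a new
    // Northwind per call (which trips the Beta 1 bug), and pass the
    // correctly named variable to the compiled query delegate.
    return Queries.CustomersByCity(db, city);
}
```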

Note: Anders Borum confirmed the bug and pointed out the source of the problem in an answer to my Problem with Reuse of Compiled LINQ to SQL Query in Beta 1 post in the LINQ Project General forum. Matt Warren confirmed that the bug was detected and fixed in the Beta 1 timeframe but didn't make it into the Beta 1 bits. It's fixed for the forthcoming Beta 2 drop.

Where's the Meat?

Rico didn't offer any performance numbers in his Part 2 post, possibly because the missing function prevented him from comparing the execution time of his compiled query with baseline SqlDataReader data. VB 9.0 doesn't support lambda expressions in VS 2008 Beta 1, so I ported my original VB test harness to C# 3.0, added code to generate a combo box of Northwind country names with the number of orders per country, and added a parameterized query option for the LINQ and SqlDataReader queries plus a compiled query option for LINQ. Here's a screen capture of the test harness's form:

The combo box contains the name and order count for the 21 distinct ShipCountry values in the Northwind Orders table. The count varies from 6 (Norway) to 122 (USA and Germany). The DataContext's Orders object has only the fields required for the query expression (the three fields of the original test harness and Rico's first test object) plus the ShipCountry field for the Where clause constraint.

Following is the performance penalty data for executing 500 parameterized and compiled queries with differing numbers of rows:


Rows   Base, s.   LINQ (Param), s.   Param/Base   LINQ (Compiled), s.   Compiled/Base   Param/Compiled
  0      0.293         1.941            6.63             0.589               2.01             3.30
  6      0.317         2.051            6.47             0.664               2.09             3.08
 22      0.340         2.299            6.76             0.803               2.36             2.86
 77      0.429         3.164            7.38             1.273               2.96             2.49
122      0.507         3.894            7.68             1.684               3.32             2.31

Base is the parameterized SqlDataReader execution time in seconds. Param/Base is the ratio of parameterized LINQ query execution time to the Base time. Compiled/Base is the ratio of compiled, parameterized LINQ query execution time to the Base time. Param/Compiled is the performance ratio of compiled-parameterized to parameterized LINQ queries.

Data: Rows contain 36 bytes of data: two integers (8 bytes), one 5-character CustomerID (10 bytes), one DateTime field (8 bytes), and an nvarchar ShipCountry field that averages 5 characters (10 bytes), for a total of 36 bytes. The rows of the Part 1 tests averaged 26 bytes (no ShipCountry field).

Hardware: 2005-era, single core 2.26-GHz Pentium 4 Dell 400SC server with 1 GB RAM and 80-GB 7,200 RPM UltraATA-100 drive (IBM IC35L090AVV207-0, 2MB buffer, 8.8 ms. seek time).

Conclusions: With Beta 1 bits, LINQ to SQL parameterized queries are about 7 times slower than the baseline SqlDataReader; LINQ to SQL compiled queries are about 2.5 times slower than the baseline SqlDataReader. The LINQ to SQL query performance penalty increases for both types of LINQ to SQL queries as the number of rows returned increases. The performance benefit of compiled over parameterized LINQ queries decreases as the number of rows increases.

Note: The 330% performance factor for compiled queries in this scenario doesn't match the 500% performance factor reported by Anders Borum, probably because of the masking effect of database access.

What About Compiled Query Caching?

Frans Bouma, the developer of the LLBLGen Pro object/relational mapping (O/RM) tool for .NET, raises in a comment the issue of compiled query caching between DataContext refreshes. My assumption is that the compiled query is cached independently of the DataContext, just as the database connection is independent of the DataContext and is closed once the object(s) of interest have been hydrated.

Matt Warren says in a reply to the Problem with Reuse of Compiled LINQ to SQL Query in Beta 1 post:

We kind of knew we needed something like compiled queries before we even started seriously looking at perf.  We already had experience with ObjectSpaces to tell us that an ORM would have translation overhead.  Usually that overhead is not a big deal for a client-app, but we figured we could do better on server apps where the same query gets executed over and over again.

Due to the Beta 1 bug, I can't test inter-DataContext caching currently. I'll see what I can find out and update the post later.

Jomo Fisher, a member of the LINQ to SQL team, subsequently confirmed my assumption about DataContext-independent compiled queries in a June 28, 2007 comment:

Compiled queries are not bound to a particular [DataContext]. An intended usage is to compile a query once and use it across many DataContexts.
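That intended usage pattern looks roughly like this (a hedged sketch; the Northwind DataContext and entity names are assumptions):

```csharp
// Compile once, typically in a static field ...
static readonly Func<Northwind, string, IQueryable<Customer>> customersByCity =
    CompiledQuery.Compile((Northwind db, string city) =>
        from c in db.Customers
        where c.City == city
        select c);

// ... then reuse the delegate across many short-lived DataContexts.
using (var db1 = new Northwind())
{
    var londoners = customersByCity(db1, "London").ToList();
}
using (var db2 = new Northwind())
{
    var parisians = customersByCity(db2, "Paris").ToList();
}
```

In Beta 1, the second DataContext triggers the null reference exception described earlier; with the fix promised for Beta 2, this pattern should work as intended.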

Where's Beta 2?

I'm seeing increasing numbers of "beta 2 fixes that," "beta 2 is faster," "wait for beta 2" and "it's better in beta 2" comments from the LINQ to ADO.NET folks. Rico's added his comment to the first post in his series:

Some things made it into Beta 1 but the bulk of what I'm going to post in the next few days didn't happen until after. You'll first see it in Beta 2. [Emphasis added.]

and Matt Warren seconds Rico's motion in this comment to his LINQ to SQL: Learning to Crawl post about Rico's item: 

Major improvements coming. :-)

Let's hope bug fixes are coming, too.

Mini-Connectionless DataContext Is Gone from VS 2008

The "mini-connectionless DataContext" that Matt Warren described in his April 12, 2007 post to the Update existing disconnected objects (dlinq) thread in the LINQ Project General forum won't be in VS 2008's LINQ to SQL implementation. Matt says in June 18 and 27, 2007 posts to the LINQ: Next CTP thread:

No, this feature will not be in the first release of LINQ to SQL.

It is only missing because of lack of time.  When I mentioned it, I was talking about our forward thinking and a sample I'm trying to put together.

It will be interesting to see how Matt proposes to run LINQ to SQL v1.0 in the middle tier as a WCF service. I expected to see this feature in Beta 2, so I've updated my Changes Coming to LINQ for SQL post of May 15, 2007 with the bad news.

Update 6/28/2007: Fixed major formatting problems caused by <pre> elements, minor typos, and added sections on caching and the missing mini-connectionless DataContext.

Update 6/29/2007: Matt Warren said Rico suspects I have a faster disk drive, so I corrected and updated the drive specs on the earlier post and added a hardware section here. Clarified the row size (in bytes) and DataContext specs, also.

VB 9.0 Won't Support Collection Initializers (or Array Initializers!)

Paul Vick answered my Will Visual Basic 9.0 Have Collection Initializers? question in his June 27, 2007 What's in VB 2008? What's out? post. The answer, as I anticipated, is no. However, I was surprised to learn that VB 9.0 won't have array initializers, either. My limited tests indicated that array initializers were behaving as expected (i.e., like C# array initializers) in Beta 1. Paul says:

For VB 2008, we will only support initializing read-write fields of non-collection objects.

And explains the disappearance of array and collection initializers as follows:

Our original plans, going back to PDC05, included several more features for object initializers, such as being able to write to read-only properties, as well as collection and array initializers. In the end, the schedule for VS 2008 was not sufficient to implement these features with a high degree of confidence. Which unfortunately means that they will have to wait to a release beyond VS 2008.

Array and collection initializers deliver a major productivity boost, especially for unit testing where you don't want to hit the database because of greatly increased test times.

On the LINQ front, VB 9.0's lambda expressions won't support statement blocks, but that's been known for some time and probably isn't a major issue.

I was surprised to learn that the VB 9.0 compiler will deliver expression trees. Neither the September 2005 "Overview of Visual Basic 9.0" .doc file nor the current HTML Overview of Visual Basic 9.0 version of February 2007 mentions expression trees, so I assumed that support for writing them with VB was gone, too.

I doubt if you'll see many expression trees written in VB. Folks who write custom LINQ for Whatever implementations for specialized data domains are certain to do so with C# 3.0.

On the whole, I think most—if not all—developers who prefer writing VB code would rather have array and collection initializers than the capability to write expression trees.

Slightly off-topic: The more C# 3.0 LINQ code I write, the better I like C#'s "fat arrow" lambda (=>) operator (argument-list => expression-or-statement-block) rather than VB's forthcoming inline Function(argument-list) expression syntax.
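For the record, the two C# 3.0 lambda forms look like this (a trivial illustration of my own):

```csharp
Func<int, int> square = x => x * x;   // expression lambda
Func<int, bool> isEven = x =>         // statement-block lambda, the form
{                                     // VB 9.0's inline Function won't support
    return x % 2 == 0;
};
// VB 9.0's inline equivalent of the first form: Function(x) x * x
```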

Sunday, June 24, 2007

Visual Basic Team Starts LINQ Cookbook Series

Serial blogs about LINQ topics appear to be gaining popularity. The VB Team has embarked on writing the LINQ Cookbook and posted the first two recipes on 6/22/2007.

Update 7/12/2007: Added recipes 5, 6, and 7.

Note: O'Reilly Media appears to claim the word Cookbook™ as a trademark, which seems outrageous to me.

LINQ Cookbook, Recipe 1: Change the font for all labels on a windows form iterates a sample WinForm's controls collection and partially refactors your project by changing the fonts of Label controls to the Comic Sans MS family.

LINQ Cookbook, Recipe 2: Find all capitalized words in a phrase and sort by length (then alphabetically) uses the String.Split() method to divide a block of text into an array of words, then uses LINQ to iterate the array, discarding words that aren't initial-letter-capped, and populating a list box with a sequence sorted first by word length and then in alphabetical order.
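A hedged C# analog of Recipe 2's approach (the sample phrase and splitter set are mine, not the recipe's):

```csharp
using System;
using System.Linq;

class Recipe2Analog
{
    static void Main()
    {
        string phrase = "The Quick brown Fox jumps Over the Lazy dog";
        var splitters = new[] { ' ', ',', '.', ';', ':', '!', '?' };
        var result = phrase
            .Split(splitters, StringSplitOptions.RemoveEmptyEntries)
            .Where(w => char.IsUpper(w[0]))   // keep initial-letter-capped words
            .OrderBy(w => w.Length)           // sort by length first...
            .ThenBy(w => w);                  // ...then alphabetically
        Console.WriteLine(string.Join(", ", result.ToArray()));
        // Fox, The, Lazy, Over, Quick
    }
}
```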

LINQ Cookbook, Recipe 3: Find all the prime numbers in a given range accepts two integer arguments and returns all prime numbers between and including the argument values. This project requires Beta 2 because it uses Group By in the query expression; it won't compile in Beta 1 (and might not in Beta 2 because of what appears to be a conflict between Count(), an array, and Count, an integer).
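Recipe 3's goal can also be reached without Group By; this hedged C# sketch of mine filters a range with a simple trial-division primality test:

```csharp
using System.Collections.Generic;
using System.Linq;

static class Primes
{
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    // Returns all primes between and including the argument values
    // (assumes low <= high).
    public static IEnumerable<int> Between(int low, int high)
    {
        return Enumerable.Range(low, high - low + 1).Where(IsPrime);
    }
}
// Primes.Between(1, 20) yields 2, 3, 5, 7, 11, 13, 17, 19
```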

LINQ Cookbook, Recipe 4: Find all complex types in a given assembly fills a list box with the names of "complex types" from Mscorlib.dll (the System.Core namespace). The project defines a complex type as "having more than 10 public methods, of which at least one has more than 3 arguments." The original code contained a typo: t.GetMethods() should be type.GetMethods(). After fixing the typo, the code compiles in Beta 1. Update: The typo is fixed.

LINQ Cookbook, Recipe 5: Concatenating the selected strings from a CheckedListBox is a contrived app that does what it says (comma-separated) with the LINQ Aggregate ... In construct. Several comments recommend code improvements.

Linq Cookbook, Recipe 6: Your first Linq Application using Northwind shows users new to LINQ to SQL how to install and connect to the Northwind database on an SQL Express database instance. As usual, the author recommends a connection to the NORTHWND.MDF file rather than using SQL Server Management Studio Express (SSMSX) to attach the .mdf file to the instance.

LINQ Cookbook, Recipe 7: Selecting Pages of Data from Northwind demonstrates use of the Skip() and Take() methods for paging data items from LINQ to SQL queries.

Update 7/14/2007: Bill McCarthy's July 14, 2007 Inside Recipe 7 post observes that running Recipe 7's LINQ query within a function that has no return type requires setting Option Strict Off, which isn't a recommended practice. Bill also provides an example of the convoluted T-SQL syntax required to support the Skip and Take operators.

Recipe 4 is the most interesting project so far, but it would have been even more interesting if it returned LINQ-related types with a query such as this:

Dim q = From type In System.Reflection.Assembly.GetAssembly( _
        GetType(Funclet)).GetTypes(), _
        m In type.GetMethods() _
        Where (type.IsPublic) _
        AndAlso (type.ToString.Contains("Linq")) _
        Select type Distinct

After giving the projects a quick test drive, I have some recommendations for these and future recipes:

  • Use a conventional <pre> block for code to be copied rather than the inline style (MsoNormal with Courier), which double-spaces the code in VS 2008 and gives line continuations the fits. See Will Visual Basic 9.0 Have Collection Initializers? as an example of a post and VB 9.0 Code for the LINQ to SQL Performance Test Harness as a page with <pre> blocks.
  • Encourage folks to set Option Strict On by strongly typing variables and literals. For example, in Recipe 2 Dim Text As String and New Char() {CChar(","), CChar("."), CChar(";"), CChar(":"), CChar(""""), CChar("!"), CChar("?"), CChar(" ")} instead of New Char() {",", ".", "!", " "} (I added a few extra splitter chars to cover more bases.)  Some examples won't run with Option Strict On.
  • Break the habit of using & as the string concatenation character; + has been preferred since VB 7.0 (see the text variable in Recipe 2).
  • Don't establish recipe prerequisites that don't apply and aren't available. Most of us don't have VS 2008 Beta 2 and it isn't needed for the first two projects.

Making VS 2008 a prerequisite for these projects indicates to me that it's coming soon. Let's hope.

Update 6/25/2007: Bill McCarthy set me straight on use of the + and & operators for string concatenation. I got into the habit of using + in VB 7.0 as the result of using it in C# and, as I recall, some problem I encountered with & early on. As the "Concatenation Operators in Visual Basic" online help topic says:

The & operator is recommended for string concatenation because it is defined exclusively for strings and reduces your chances of generating an unintended conversion.

However, it's not going to be easy to break a seven-year (bad) habit. Also, &= looks a lot stranger to me than +=.

Surprise!: In doing a search to try to remember what problem I had encountered with the & operator, I found an example of Billy Hollis using + to concatenate strings (Gasp!) on page 1 of his "Straighten Out Your Strings" article for Visual Studio Magazine:

Dim sTest As String
For i As Integer = 0 To 15000
    sTest = sTest + "Strings in VB"
Next

To his credit, he uses & elsewhere.

Nick Randolph noted in a comment that there's a simpler VB syntax for specifying literal strings as Char: New Char() {","c, "."c, ";"c, ":"c, """"c, "!"c, "?"c, " "c}. (An even shorter syntax that I should have thought of is ",.;:""!? ".ToCharArray, which Billy Hollis discusses at length in his article.) I just used the syntax suggested by the Error Correction Options popup: "Replace "," with CChar(",")." Another case of typing when I should have been thinking. On the other hand, maybe the popup should have suggested ToCharArray.

Mea culpa. With this much egg on my face, I think I'll stop making VB syntax recommendations.

Update 7/4/2007: Added Kit George's Recipe 5.

Saturday, June 23, 2007

Rico Mariani Starts LINQ to SQL Performance Analysis Series

Rico Mariani is a "Performance Preacher" in Microsoft's Developer Division and a co-author of the "Improving .NET Application Performance and Scalability" patterns & practices white paper. His Channel9 video, "Rico Mariani: Writing better, faster code," was the featured "Behind the Code" show segment earlier this year. He also starred in "Rico Mariani - Making .NET Perform (Happy Birthday Video #2)" produced in February 2005.

DLinq (Linq to SQL) Performance (Part 1) is Rico's first post about LINQ to SQL performance on his Performance Tips blog. His tests pit a simple LINQ to SQL query against an SqlDataReader with the Northwind sample database's Orders table as the data source. Rico's results for 500 executions of a simple query that iterated the orders table and created an instance of an Order object with OrderID, CustomerID, EmployeeID, and ShippedDate properties were as follows:

Time for 500 Queries

                             Time, s.   Queries/s.
May 2006 CTP LINQ to SQL      8.027       62.29
Raw Cost (SQL Data Reader)    1.094      457.04

To see if I could duplicate Rico's results with the VS 2008 Beta 1 bits running under Vista Ultimate, I built a simple test harness that emulated his test conditions (to the extent that I could determine them from the code provided in Rico's post).

Note: You can copy and paste the code for my LINQtoSQLPerf.sln test harness from my VB.NET Code for the LINQ to SQL Performance Test Harness Web page to a VB 2008 WinForm project. The O/R Designer contains only the Order object with OrderID, CustomerID, EmployeeID, and ShippedDate properties, so loading of related entities isn't an issue.

Rico must have a considerably faster machine than my two-years-old, single core 2.26-GHz Pentium 4 Dell 400SC server with 1 GB RAM and 80-GB 7,200 RPM UltraATA-100 drive (IBM IC35L090AVV207-0, 2MB buffer, 8.8 ms. seek time). Here are my results:

Time for 500 Queries

                             Time, s.   Queries/s.
VS 2008 Beta 1 LINQ to SQL   14.698       34.01
Raw Cost (SQL Data Reader)    1.465      341.30

Changes, such as running a networked instance of SQL Server Express or moving from the Debug to the Release configuration, made less than a 0.5-second difference in the LINQ to SQL query time and less than a 0.2-second difference in the SqlDataReader time. Removing unneeded columns from the Order object in the O/R Designer didn't change the LINQ to SQL time, either; CPU usage was pinned at 100% for the entire ~14.5 seconds, and about 500 MB of physical memory was free during the tests.

Rico says:

In May 2006 DLinq is running at about 1/8 the speed of the underlying provider (13.62%).*  We can do better than that.  And we did... Stay tuned for the details and some modern era DLinq results.

*Remember no real application would ever see a result as poor as 13.62% because of course they would be doing "actual work" as well as the DLinq operations resulting in more comparable performance.

My tests show LINQ to SQL to be an order of magnitude slower than the underlying SqlDataReader. I'll be interested to see how much better Rico did with "modern era" LINQ to SQL bits. Rico says in a comment:

Some things made it into Beta 1 but the bulk of what I'm going to post in the next few days didn't happen until after.  You'll first see it in Beta 2.

That's what they all say. Stay tuned.

Updated 6/29/2007: Corrected and added fixed disk specs.


Friday, June 22, 2007

Will Visual Basic 9.0 Have Collection Initializers?

For reasons unknown, all mention of collection initializers for Visual Basic 9.0 has disappeared. The disappearance seems to have occurred without comment from the VB team or VB users.

The September 2005 "Overview of Visual Basic 9.0" .doc file covered the topic briefly, but the current HTML Overview of Visual Basic 9.0 version of February 2007 discusses only object initializers and array initializers. VS 2008 Beta 1 online help has a "collection initializers (C#)" topic but no corresponding VB topic.

The 2005 Overview's section 1.3 "Object and Collection Initializers" says this about collection initializers:

As we have seen, object initializers are also convenient for creating collections of complex objects. Any collection that supports an Add method can be initialized using a collection initializer expression. For instance, given the declaration for cities as the partial class,

Partial Class City
  Public Property Name As String
  Public Property Country As String
  Public Property Longitude As Float 
  Public Property Latitude As Float
End Class

we can create a List(Of City) of capital cities of our example countries as follows:

Dim Capitals = New List(Of City){ _
  { .Name = "Antanarivo", _
    .Country = "Madagascar", _
    .Longitude = 47.4, _
    .Lattitude = -18.6 }, _
  { .Name = "Belmopan", _
    .Country = "Belize", _
    .Longitude = -88.5, _
    .Latitude = 17.1 }, _
  { .Name = "Monaco", _
    .Country = "Monaco", _
    .Longtitude = 7.2, _
    .Latitude = 43.7 }, _
  { .Country = "Palau",
    .Name = "Koror", _
    .Longitude = 135, _
    .Latitude = 8 } _

This example also uses nested object initial[iz]ers, where the constructors of the nested initializers are inferred from the context. In this case, each nested initializer is precisely equivalent to the full form New City{…}.

Note: The preceding excerpt also appears in the presentation of the same name given at the XML 2005 Conference & Exhibition, held in Atlanta November 14-18, 2005. I doubt that this code worked in any VB 9.0 preview.

Neither the class nor the sample collection initializer will compile. It's one thing to change the syntax for a new feature, but another to use a defective class declaration in sample code.

The authors of the original version, Erik Meijer, Amanda Silver, and Paul Vick, imported and modified the class code and reworked the List(Of City) code to use the current New Type With syntax in "Object and Array Initializers" of the February 2007 version:

As we have seen, object initializers are also convenient for creating collections of complex objects. Arrays can be initialized and the element types inferred by using an array initializer expression. For instance, given the declaration for cities as the class,

Partial Class City
  Public Property Name As String
  Public Property Country As String
  Public Property Longitude As Long 
  Public Property Latitude As Long
End Class

we can create an array of capital cities for our example countries as follows:

Dim Capitals = { _
  New City With { _
    .Name = "Antanarivo", _
    .Country = "Madagascar", _
    .Longitude = 47.4, _
    .Lattitude = -18.6 }, _
  New City With { _
    .Name = "Belmopan", _
    .Country = "Belize", _
    .Longitude = -88.5, _
    .Latitude = 17.1 }, _
  New City With { _
    .Name = "Monaco", _
    .Country = "Monaco", _
    .Longtitude = 7.2, _
    .Latitude = 43.7 }, _
  New City With { _
    .Country = "Palau",
    .Name = "Koror", _
    .Longitude = 135, _
    .Latitude = 8 } _

After fixing the class declaration by removing Property and changing Long to Float, the array initializer expression won't compile because there's a missing New City() type declaration in the first line, which should read Dim Capitals = New City(){ _. It's clear that no one tested the code before publishing the update.

Note: It's a shame that VB didn't adopt the Dim corollary to C#'s var arrayName = new[] { ... } syntax, as Ralf Ehlert mentions in his "Problems with New Features of VB9" post in the Visual Studio VB Express Orcas forum.

The authors made no mention of VB 9.0 collection initializers, which were also missing from Jean-Marie Pirelli's "C# 3.0 and VB 9.0" session at Microsoft TechDays 07 March 27-28, 2007 in Geneva. Uncharacteristically, Jean-Marie provided both C# and VB examples, but he dispensed with VB examples when he reached the Collection Initializers slide. (Anders Hejlsberg and Jay Schmelzer received credit for the slides.)

It's interesting that Jeff Bramwell's Collection Initializer post in the Visual Basic Orcas forum didn't receive a reply from a Microsoft representative. The suggestion he received from Klaus Even Enevoldsen wasn't apropos collection initializers.

Note: My March 26, 2007 Updated "Overview of Visual Basic 9.0" Stealth Post lists other problems with the Overview reported by Jim Wooley. These errors relate to unimplemented VB 9.0 features scheduled for Beta 2.

There also were errors in the LINQ to SQL documentation that I reported in my April 26, 2007 More Issues with VB 9.0 Samples in the Updated LINQ to SQL Documentation post.

Update 6/24/2007: Added comment about data type error in Cities class declaration and corrected minor typos.

Update 6/25/2007: Bill McCarthy takes issue with my "It's clear that no one tested the code before publishing the update" statement about the second sample that uses the New Type With syntax in his VB9 and collection initializers post.

Regardless of syntax issues with the array initializer code, the class declaration has both syntax (Property) and data type (Long) errors.

Regarding the failure of the initializer code to compile, Bill says:

[I]t is clear the document is forward looking, and talking about how things will/should be.

So, given that, if we stop and analyze the syntax for array initializers it should be basically as presented. Perhaps one could argue the syntax should be Dim Capitals() = {.    It should also support the syntax you can use today in VS 2005, e.g: Dim ints() As Int32 = {1,2,3} , or as per the code Roger suggests Dim ints() As Int32 = New Int32(){1,2,3}

Okay so given the minimum it needs to support (current syntax), in VB9 we add anonymous types and inferred typing.  Inferred typing means we remove the "As XXX" part.  So this means the syntax Dim Capitals As City() becomes Dim Capitals and the Dim Capitals() As City syntax becomes  Dim Capitals().  Anonymous types means we can't define the type name. So New City() { would become New() { _ , and given that the existing syntax doesn't even require the New Int32(), the New() becomes superfluous because the set brackets { } define the array data.

IOW: the syntax as shown is as it should be. (IMO)

Explicitly typing the array members by adding New City With { ... syntax precludes the array being of an anonymous type or use of inferred typing and the Dim Capitals = New City(){ ... statement is required. I didn't propose Dim Capitals() As City = {...} or Dim Capitals As City() = New City(){...}  as inferred by Bill's "Dim ints() As Int32 = {1,2,3} , or as per the code Roger suggests Dim ints() As Int32 = New Int32(){1,2,3}" sentence.

You can create anonymously typed array elements by substituting New With for New City With, but Dim Capitals() = {...} returns an array of type Object with Option Strict Off and won't compile with Option Strict On. If VB 9.0 finally supports the New() construct for anonymously typed arrays, the preceding code should compile with Dim Capitals = New(){ _, but I don't believe that New(){ _ would be superfluous. Here are two C# array initializer examples that use new[]:

/* Array initializer with object initializers (array is inferred type LineItem) */

var LineItems = new[]
{
    new LineItem {OrderID = 11000, ProductID = 11, Quantity = 10, QuantityPerUnit = "24 500-g bottles", UnitPrice = 15.55M, Discount = 0.0F},
    new LineItem {OrderID = 11000, ProductID = 21, Quantity = 20, QuantityPerUnit = "12 1-kg cartons", UnitPrice = 20.2M, Discount = 0.1F},
    new LineItem {OrderID = 11000, ProductID = 31, Quantity = 30, QuantityPerUnit = "24 1-kg bags", UnitPrice = 25.45M, Discount = 0.15F}
};

/* Array initializer with object initializers (array and elements are anonymous types) */

var LineItems2 = new[]
{
    new {OrderID = 11000, ProductID = 11, Quantity = 10, QuantityPerUnit = "24 500-g bottles", UnitPrice = 15.55M, Discount = 0.0F},
    new {OrderID = 11000, ProductID = 21, Quantity = 20, QuantityPerUnit = "12 1-kg cartons", UnitPrice = 20.2M, Discount = 0.1F},
    new {OrderID = 11000, ProductID = 31, Quantity = 30, QuantityPerUnit = "24 1-kg bags", UnitPrice = 25.45M, Discount = 0.15F}
};

However, I haven't heard anything about VB 9.0 implementing New(). On the other hand, I'm not sure why I would want to create an anonymously typed array or what its use would be.

Tuesday, June 19, 2007

LINQ to SharePoint 0.2 Alpha to Meet Major WSS 3.0 Template/Apps Surge

Bart De Smet's LINQ to SharePoint - Announcing the 0.2 alpha release post yesterday and Mary Jo Foley's SharePoint: Microsoft’s Web 2.0 hub column today are an interesting coincidence.

Bart's enhancing LINQ to SharePoint with these new features:

  • Enhanced support for SharePoint list field types, including Lookup and LookupMulti fields with lazy loading support and subquery support
  • Changes to the entity model used by LINQ to SharePoint, in preparation for update support down the road
  • Optimization of the CAML queries generated by the query parser
  • Support for First and FirstOrDefault query operators
  • Introduction of a CamlMethods static class with helper functions, including DateRangeOverlaps
  • Support for Now and Today elements
  • Multiple Where clauses per query are supported
  • SpMetal improvements and separation of front-end and back-end functionality, allowing for hosting of SpMetal functionality in other environments (such as an IDE)

Bart also announced that work is starting on the 0.3 Alpha version, which will include update capability. The current code, a draft of the technical spec, and the first unit tests for the query parser are available now on his CodePlex site. Final 0.2 Alpha code will be ready in a few days.

Bart picked a great data domain for his third-party LINQ implementation. SharePoint is getting much more of Microsoft's attention and resources as the company prepares to repulse the attack on its Office hegemony by online competitors Google, et al.

Update 7/26/2007: Jeff Raikes reported during the Microsoft's annual financial analysts conference that SharePoint generated $800 million in revenue during fiscal 2007. (From Joe Wilcox of Microsoft Watch).

SharePoint "Social Computing" Enhancements

Almost simultaneously, Mary Jo Foley announced the availability of the Community Kit for SharePoint (CKS) 2.0 beta, which opened on the CKS CodePlex site Monday. CKS 2.0 adds community-oriented features to Windows SharePoint Services (WSS) 3.0 and its big brother, Microsoft Office SharePoint Server (MOSS) 2007.

The Microsoft SharePoint Products and Technologies Team Blog's June 18, 2007 Community Kit for SharePoint 2.0 Pre-Release announcement says:

The ultimate goal of the CKS is to enable community oriented features and solutions by leveraging, enhancing, and extending SharePoint as a social computing platform.

100 "Next-Generation" SharePoint Business Apps Coming

Mary Jo also reports that Derek Burney, general manager of Microsoft's SharePoint Platform and Tools group, will commit today at the Enterprise 2.0 Conference to delivering 100 "next-generation" business applications (not templates) to SharePoint users over the next 12 months. (As the update below notes, the applications turned out to be for internal use by Microsoft employees.)

Sandy Kemsley, who's covering Enterprise 2.0 in her EbizQ Column 2 blog, didn't mention anything about this topic in her Enterprise 2.0: Derek Burney item. The same is true for Michael Sampson (Michael's Thoughts, Notes on Derek Burney, "Amplify the Impact of Your People with Enterprise 2.0 Technologies"), John Eckman (openparenthesis, Liveblogging Enterprise 2.0 - Microsoft's Derek Burney), and Mike Gotta (Collaborative Thinking, Amplify the Impact of Your People with Enterprise 2.0 Technologies). Michael, John, and Mike mention "Next Generation Applications" but not that Microsoft is giving 100 of them to users.

Update 6/20/2007: According to later reports in Network World (Lotus, Microsoft jostle to land social networking customers by John Fontana) and eWeek (IBM, Microsoft Show Web 2.0 Wares by Renee Boucher Ferguson and Darryl K. Taft), the 100 apps will be for internal use only. Fontana writes:

In addition, Microsoft said it is committing to build 100 social networking business applications before June 2008 for use inside the company. One currently in development is SharePointPedia, which helps users find SharePoint technical and support information from both Microsoft and other sources.

If SharePointPedia is an example, at least some internal apps might reach SharePoint customers in the form of templates. According to Lawrence Liu's post in the CodePlex site for CKS:SharePointPedia:

Microsoft is embarking on an ambitious project to create an application codenamed "SharePointPedia" that will be used to enable a "community driven and supplemented content lifecycle." ... [I]t's being designed (yes, the project kicked off just last week) to be used primarily by the community. ...

SPP is scheduled to go live ... by the end of October.

Community Kit for SharePoint Background

The Community Kit for SharePoint Vision and Scope Document describes CKS:

At the most basic level, the CKS is a site template that enables practically anyone to create very quickly a functional community website on Windows SharePoint Services 3.0 or Microsoft Office SharePoint Server 2007. The “Standard Edition” will require nothing more than the out-of-the-box Web Parts that come with WSS 3.0. In this way, the CKS:SE is just like the Application Templates for WSS, but that is where the similarity ends.

Instead of being solely developed by Microsoft, the CKS will be a collaborative development project hosted on CodePlex, an online software development environment for open and shared source developers to create, host, and manage projects throughout the entire software development lifecycle.

Here's the CKS vision statement from Project Management and Evangelism Lead Lawrence Liu:

  • A set of best practices, templates, Web Parts, tools, and source code that enables practically anyone to create a community website based on SharePoint technology for practically any group of people with a common interest.
  • A technology framework that sits on top of Windows SharePoint Services or Office SharePoint Server and can be further customized or extended to suit the community website implementer’s needs.
  • A shared source community development project that is provided at no cost and allows anyone to use for commercial or non-commercial purposes.

As mentioned in the earlier Vision and Scope Document quote, you don't need to run a pricey Microsoft Office SharePoint Server (MOSS) 2007 version:

Targeted Platform: Given that Windows SharePoint Services 3.0 was released on November 16, 2006 and is available for free to licensed customers of Windows Server 2003, the development efforts on the CKS should be targeted at this version of SharePoint. Opportunities for “feature light up” when Office SharePoint Server 2007 is present should also be considered.

Just Say No to Web and Enterprise Two-Point-Oh?

"Web 2.0" and "Enterprise 2.0" are two terms that I've come to distrust—if not despise—as overhyped and basically without meaning. However, Dion Hinchcliffe's May 2006 A round of Web 2.0 reductionism item and July 2006 Enable richer business outcomes: Free your intranet with Web 2.0 post shed some light on the two topics in the enterprise.

Sunday, June 17, 2007

Entity Framework: Complex Types Redux in Beta 2

The June 2006 ADO.NET Tech Preview: Entity Data Model specification asserts: "An EntityType can have one or more properties of the specified SimpleType, ComplexType, or RowType. Properties can be either single-valued or multi-valued." I've assumed (until now) that the Entity Framework (EF), which implements the Entity Data Model (EDM), supported complex types.

Eric Evans calls complex types value objects in Domain-Driven Design: Tackling Complexity in the Heart of Software and defines them as follows:

An object that represents a descriptive aspect of the domain with no conceptual identity is called a VALUE OBJECT. VALUE OBJECTS are instantiated to represent elements of the design that we care about only for what they are, not who or which they are.

The classic value object that appears in most domains is an Address type. In ordinary contexts, the particular Address instance that Order.ShipAddress or Vendor.Address represents isn't important if the Address property values are correct. The existence of multiple instances of the Address type with the same property values, or slight variations (such as ST and Street, or E and East), isn't significant. However, Address is an entity for gas, water, electric, telephone, and cable TV utilities because it represents a service and billing endpoint and might be the location of a utility's assets, such as a set-top box.

I also assumed that I'd need the missing EDM Designer to define complex types, because the current (Beta 1) EDM Wizard doesn't have a feature to define a table as persisting a ComplexType or to group table fields into a property of a ComplexType. I didn't want to spend the time to research the syntax and hand-code the mapping (MSL) and conceptual (CSDL) layer XML files to test a feature that might change in the EF CTP promised for June.

So I was surprised to learn from Danny Simmons' Non-scalar Value Objects aka "Complex Types" post of late last night that the EF doesn't support complex types. I cringed as I read:

At the moment the EF CTPs do not support complex types, and someone has asked for that feature we just can't support. [Emphasis added.]

Why can't we support it?  Well, first off the lack of a property or properties that can be the key is fundamental.  Secondly, if you think about it, what would insert and delete operations that affect one but not both of the entities in a row really mean?  Things get crazy really quick.

I fully expected to see the persistence ignorance (PI) ruckus raised anew with even greater ferocity. The inability to generate a property of the ComplexType from a group of fields in the table that persists the containing entity appeared to me to be an especially egregious failure to support PI.

Earlier in the post, Danny had mentioned that "the system doesn't let you map two entities to the same row in the same table." That restriction didn't appear to me to prevent mapping an entity and one or more complex types to a single row. For example, a SalesOrder entity might have BillAddress and ShipAddress complex types. Something wasn't adding up here.

Finally, I read the [April Fool!] punch line:

Fortunately, we did decide to implement complex types, and they will appear in an upcoming CTP.

Update 6/18/2007: Danny says in his comment that he intends to clarify his post.

Here's the full story from Danny's answer to Szymon Kobalczyk's June 15, 2007 EDM: Mapping two entities from single table message in the ADO.NET Orcas forum:

A complex type is a value type that does not have identity of its own but it does have structure.  You can define a complex type for address info, and that type can appear as a property of one or more entities.  When we generate code for the model you will have a separate class for the complex type and it will appear as a direct member of the entity class, so you will get a structure much like you define above, but you won't actually have a navigation property from the supplier to the address you would instead just have a property of the supplier whose type is an address. 

With complex types you can actually map a complex type's properties to different columns in the same row with the entity that contains the type--in fact the identity of the complex type is slaved to the identity of the entity which contains it so it must be part of the same row (or rows in multiple tables if you are using entity splitting or table-per-type inheritance or something like that where parts of a single entity appear in multiple tables). 

The nice thing about this kind of construct is that you can fill out the address class with methods and validation and such and share that logic across all instances of an address regardless of which entity type they appear in--it's just that each of those instances is merely a part of the containing entity rather than an entity of its own.

It's my suspicion that the 2nd approach is what you want, but you'll just have to wait until beta 2 before that's available.

The ability to validate a common Address type, as well as AddressUS, AddressCA, AddressUK, AddressEU, etc. subtypes, is an extremely important feature of an object/relational mapping (O/RM) tool. EF's complex types support inheritance, according to the spec. Value objects usually map to NHibernate components. I'm not certain how LLBLGen Pro handles them. Frans?

Recursive complex types (a complex type containing a complex type) would be nice in v1. They're presently in the spec's "EDM Future Directions" section, along with conditional association, many-to-many relationships with payload, n-ary relationships, relationship inheritance, and dynamic entity extensibility.

Update 6/18/2007: Danny Simmons' comment indicates that v1 won't implement ComplexType inheritance but will support recursive (nested) complex types. I'd gladly forego ComplexType nesting for inheritance and I believe most other EF users would, too.

Recommended Reading: Agile Joe [Ocampo] has almost-chapter-length posts about entities and value objects with different analogies, as well as domain-driven design as a whole. He offers a substantial number of posts about NHibernate, also. (Some of Joe's posts are even longer than mine.)

More Recommended Reading: John Papa says today that his July 2007 Data Points column has been posted to the MSDN Magazine Web site as "ADO.NET Entity Framework Overview". MSDN should have given him the 7,000 words from his draft version. I was surprised to see "ADO.NET in the next release of Visual Studio® code-named 'Orcas' features the new Entity Framework" in the online lead, because the Entity Framework hasn't been a part of VS 2008 since April 28, 2007.

Thursday, June 14, 2007

Sahil Malik Takes on LINQ, LINQ to SQL, and Entity Framework

From the "How Did I Miss This?" department. (I subscribe to Sahil's blah!bLaH!BLOG from IE7, but his June 9, 2007 post in question probably was buried under a long SharePoint entry. Thanks to Sam Gentile for yesterday's New and Notable item, which also includes a link to Ian Cooper's June 10, 2007 Being Ignorant with LINQ to SQL post, the subject of this review.)

ADO.NET and SharePoint guru Sahil Malik's My Views about the ORM space, Entity Framework and all such stuff! dissertation on Microsoft's "New Generation Data Access" efforts for ADO.NET 3.5 delivers the following conclusions:

  • LINQ is Microsoft's latest "making a molehill from a mountain" project.

"What LINQ does give me, is a way to simplify my code for about 10-20% of the use cases in my code. It will revolutionize the 10-20%, but really - in an entire project - 10-20% of a revolution that took 3 years to incubate? I'm beginning to not get impressed."

  • LINQ to SQL isn't a production-grade technology. 

"LINQ to SQL is great for scratch and sniff - concept projects. ... LINQ to SQL is to .NET 3.5, what TableAdapters were to .NET 2.0. In my honest and not so humble opinion, most production projects will and should stay away from it."

  • The Entity Framework and Entity SQL get Sahil "really excited."

"The whole concept of the Entity Data Mapper, the Mapping provider, Entity Model and the best thing around—Entity SQL—are quite awesome. ... eSQL is what differentiates Entity Framework from the rest of the ORMs. eSQL is quite kickass. eSQL + LINQ gives you the organizability(!?) of C#, and the ease of Foxpro. What is there not to like? ... 

I am quite disappointed to learn that it won't be a part of Orcas. ... I am bored of data access, and will continue to watch it from the sidelines until we have some serious progress on the only one possible MSFT winner out of the above at this time, the Entity framework."

Yet Sahil finally concludes:

Note that LINQ will also fly, but IMO is a different animal and has nothing to do with Data Access. But really, LINQ isn't that big or complex (or even impressive! :-/).

Sounds like damnation by faint praise (or weak condemnation) to me.

LINQ and Data Access

Sahil is right; LINQ has nothing to do with data access.

LINQ is an enabling technology for applying a common SQL-like query syntax to a wide variety of data domains. LINQ's strongly typed queries consist of C# 3.0 or VB 9.0 keywords so they're checked at compile time—not runtime—and provide IntelliSense and statement completion.
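As a concrete illustration of that compile-time checking, here's a minimal LINQ to Objects sketch with an invented Order class: misspell ShipCountry in the where clause and the compiler, not the database, reports the error.

```csharp
using System.Linq;

class Order
{
    public int OrderID { get; set; }
    public string ShipCountry { get; set; }
}

class QueryDemo
{
    static void Main()
    {
        var orders = new[]
        {
            new Order { OrderID = 1, ShipCountry = "USA" },
            new Order { OrderID = 2, ShipCountry = "UK" }
        };

        // Strongly typed: o.ShipCountry is checked at compile time,
        // and IntelliSense completes the member names
        var usOrders = from o in orders
                       where o.ShipCountry == "USA"
                       orderby o.OrderID descending
                       select o;
    }
}
```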

LINQ to SQL is simply a domain-specific LINQ implementation. LINQ to Entities is another domain-specific implementation. There are many third-party LINQ implementations in process, including Ayende's LINQ to NHibernate and Bart De Smet's LINQ to SharePoint.

Note: Forgot to mention in prior posts that Bart De Smet is going to work for Microsoft's WPF team in the Developer Division. Hopefully, he'll continue working on his third-party LINQ implementations.

Update 6/16/2007: Bart says in this comment that he plans to continue work on his LINQ implementations after moving to Microsoft in October and that an update to LINQ to SharePoint is scheduled for this month.

Update 6/17/2007: True to his word, Bart posted a set of samples that illustrate how to use LINQ to Sharepoint today. (I bet he'll miss the moules aux vin blanc from the joints off the Grand Place like Chez Leon on the petit rue des Bouchers or less touristy places in the outskirts of Brussels.)

LINQ also is responsible for adding many new constructs to C# 3.0 and VB 9.0, including some from functional languages, such as Haskell:

  • Local variable type inference implemented by C# 3.0’s var and VB 9.0’s Dim keywords to shorten the syntax for declaring and instantiating generic types and support anonymous types
  • Object initializers to simplify object construction and initialization with syntax similar to array initializers
  • Collection initializers to combine the concept of array initializers and object initializers and extend it to generic collections
  • Anonymous types to define inline CLR types without writing a formal class declaration for the type
  • Lambda expressions to simplify the syntax of C# 2.0’s anonymous methods, deliver inline functions to Visual Basic developers, and aid type inference and conversion for expression trees
  • Extension methods to enable chaining of extensions that add custom methods to a CLR type without the need to subclass or compile it
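A short sketch that exercises most of these constructs in one place (all class and member names are invented for illustration):

```csharp
using System.Collections.Generic;
using System.Linq;

static class StringExtensions
{
    // Extension method: adds Shorten() to string without subclassing it
    public static string Shorten(this string s, int max)
    {
        return s.Length <= max ? s : s.Substring(0, max);
    }
}

class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
}

class FeatureDemo
{
    static void Main()
    {
        // Collection initializer wrapping object initializers
        var customers = new List<Customer>
        {
            new Customer { Name = "Alfreds Futterkiste", City = "Berlin" },
            new Customer { Name = "Around the Horn", City = "London" }
        };

        // var (type inference), a lambda expression, an anonymous type,
        // and the extension method defined above, chained together
        var labels = customers
            .Where(c => c.City == "London")
            .Select(c => new { Label = c.Name.Shorten(10), c.City });
    }
}
```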

However, Sahil complains in his May 12, 2007 A different point of view post (linked from my New Series on Closures in Visual Basic 9.0 item) that:

It's a shame that the language is getting so complex. This is the same mistake C++ made 7 years ago, and why .NET was so successful. It's a shame that we are seeing the same mistake being made all over again.

It seems to me that LINQ and its query expressions make the language simpler by cloaking the complexity of some of these new language features with "syntactic sugar." Sahil seems to abandon his "LINQ has nothing to do with data access" point when he says:

Finally, don't forget - LINQ doesn't buy you any performance gain, or set based theory like Foxpro did, it is more or less syntactical sugar and bunch of .NET code under the scenes that you didn't have to write. [Emphasis added.]

And Sahil did say earlier that LINQ "gives you ... the ease of Foxpro," which implies the relational data domain to me.

Sahil on Entity SQL as the "Best Thing Around"

I believe Sahil waxes a bit too enthusiastic when describing Entity SQL (eSQL) as the "best thing around" in his paean to the Entity Framework. Apparently, he missed the several posts that mention the omission of Data Manipulation Language (DML) constructs (INSERT, UPDATE and DELETE) from eSQL v1. Perhaps most of his work involves read-only data access. Microsoft recommends using the ObjectContext—presumably with LINQ to Entities for UPDATEs and DELETEs—for DML operations.

Erik Meijer, known as the "Creator" of LINQ, has these plans for updatable views and O/R mapping in LINQ 2.0:

Just as we provided deep support for XML in Visual Basic, in LINQ 2.0 we hope to directly support relationships and updatable views at the programming-language level. In that case, we only need a very thin layer of non-programmable default mapping at the edge between the relation and object world and allow programmers to express everything else in their own favourite LINQ-enabled programming language. The result is that just as LINQ 1.0 side-stepped the impedance mismatch “problem” with something better (monads and monad comprehensions), LINQ 2.0 will sidestep the mapping “problem” with something better (composable and programmable mapping).

It sounds to me as if LINQ 2.0 might transcend eSQL and potentially replace the Entity Framework.

Comments Tell the Tale

Sahil's post had 16 comments on June 14, 2007, many of which were from .NET luminaries, such as Don Demsak (donxml), Aaron Erickson (author of the i4o LINQ indexing extension), Ian Cooper, and Frans Bouma (lead developer of the LLBLGen Pro O/RM). I have the feeling that this post caused (or at least contributed to) Ian's Being Ignorant with LINQ to SQL post. The comments include this astounding claim by Damon:

No vendor (open source or for purchase) has anything now production quality except NHIbernate that can really call itself an ORM

Sahil's item also elicited a Sahil on O/RM response from Ayende and this comment from Sahil:

eSQL gives you runtime ability to run queries against your objects.

You might suggest that LINQ does the same, but not really. eSQL gives you the ability to truly bring set based theory into higher level programming languages. There is a query optimizer built into the eSQL framework, so the queries on your object model take advantage of db concepts.

This is something current ORMs cannot do.

Secondly, I am pretty firm on my LINQ to SQL views - I am pretty sure of that.

The Entity Framework's EntityCommand and ObjectQuery objects take eSQL strings and bring no more "set based theory into higher level programming languages" than T-SQL or PL/SQL strings do. Only LINQ incorporates a query syntax "into higher level programming languages" (e.g., C# and VB). That's LINQ's claim to fame and the objective of LINQ to Entities.

The way I understand the plan: eSQL is a SQL dialect for querying either the entities that the Entity Data Model's (EDM's) conceptual schema layer defines in CSDL (Conceptual Schema Definition Language) or the optional Object Services layer's ObjectContext. A command tree in the EntityClient's custom query pipeline for the RDBMS (limited to SQL Server 200x and SQL Server Express Edition at present) translates eSQL to the RDBMS's SQL dialect. Query optimization, if any, takes place on the database server.
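As a hedged sketch of what that looks like from code (the connection string, container, and property names are hypothetical, and the API shown is the Beta-era System.Data.EntityClient surface), an eSQL query goes through EntityClient as an opaque string:

```csharp
using System.Data;
using System.Data.EntityClient;

class EsqlDemo
{
    static void Main()
    {
        using (var conn = new EntityConnection("name=NorthwindEntities"))
        {
            conn.Open();
            EntityCommand cmd = conn.CreateCommand();
            // eSQL is just a string here; nothing is checked until runtime
            cmd.CommandText =
                "SELECT VALUE o FROM NorthwindEntities.Orders AS o " +
                "WHERE o.ShipCountry = @country ORDER BY o.OrderID DESC";
            cmd.Parameters.AddWithValue("country", "USA");

            // EntityCommand requires SequentialAccess readers
            using (EntityDataReader rdr =
                cmd.ExecuteReader(CommandBehavior.SequentialAccess))
            {
                while (rdr.Read())
                {
                    // Translation to the store's SQL dialect happens in the
                    // provider's command-tree pipeline; any query
                    // optimization is the database server's job
                }
            }
        }
    }
}
```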

I have my doubts that eSQL will become the lingua franca of "SQL for Entities" any time soon, although IBM, Oracle, MySQL and others appear to be developing custom EntityClient implementations. IBM's interest appears to be LINQ-enabling DB2; the remaining third parties haven't stated their goals.

Update 6/15/2007: LINQ to SQL architect Matt Warren elaborates in this comment on eSQL and confirms that eSQL doesn't include a query optimizer.

Michael Pizzo offers an architect's view of the Entity Framework with emphasis on inheritance in his "An Application-Oriented Model for Relational Data" article for Microsoft's The Architectural Journal #12. He says the following about Client Views generated by eSQL:

The Entity Framework uses a Client View mechanism to expand queries and updates written against the conceptual model into queries against the storage schema. The expanded queries are evaluated entirely within the database; there is no client-side query processing. These Client Views may be compiled into your application for performance, or generated at runtime from mapping metadata provided in terms of XML files, allowing deployed applications to work against different or evolving storage schemas without recompilation.

Update 6/14/2007: Entity Framework developer Danny Simmons clarifies the relative ease with which third parties can develop EntityClients for their RDBMSs in this comment.

LINQ to Entities and LINQ to SQL are analogous implementations; their command trees translate LINQ expressions to eSQL and T-SQL, respectively. From what I've seen of eSQL, I would use LINQ to Entities unless there was some eSQL construct I badly needed and LINQ to Entities couldn't translate. (Such an issue would appear to me to qualify as a bug.)

Microsoft Issues Hotfix for SQL Server 2005 Ordered View and Inline Function Issue

My September 11, 2006 SQL Server 2005 Ordered View and Inline Function Problems post described a change of behavior between SQL Server 2000 and 2005 when displaying views created with a SELECT TOP 100 PERCENT ... ORDER BY Whatever query: SQL Server 2000 sorts the resultset and SQL Server 2005 [Express] doesn't.

The post received many comments, including several that objected to my claiming this behavior was an issue for SQL Server users because the behavior was by design. Unfortunately, comments posted before the change of the OakLeaf blog format were lost in the transition to the wider page.

Yesterday Microsoft issued a hotfix, FIX: When you query through a view that uses the ORDER BY clause in SQL Server 2005, the result is still returned in random order, which is available only from Microsoft Support. The knowledge base article describes the following symptoms:

You have a view in a database in SQL Server 2005. In the definition of the view, the SELECT statement meets the following requirements:

  • The SELECT statement uses the TOP (100) PERCENT expression.
  • The SELECT statement uses the ORDER BY clause.

When you query through the view, the result is returned in random order.
However, this behavior is different in Microsoft SQL Server 2000. In SQL Server 2000, the result is returned in the order that is specified in the ORDER BY clause.

The hotfix involves modifications to 11 files, including Sqlservr.exe, and the workaround described in my post is simple, so Microsoft must have received many complaints about the problem.
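That workaround amounts to moving the ORDER BY from the view definition to the query against the view. A sketch with illustrative table and view names:

```sql
-- SQL Server 2005 ignores this ORDER BY when the view is queried;
-- TOP (100) PERCENT makes the clause legal in the view but not meaningful
CREATE VIEW dbo.OrdersByDateDesc AS
SELECT TOP (100) PERCENT OrderID, CustomerID, OrderDate
FROM dbo.Orders
ORDER BY OrderDate DESC;
GO

-- Workaround: sort in the outer query instead of the view definition
SELECT OrderID, CustomerID, OrderDate
FROM dbo.OrdersByDateDesc
ORDER BY OrderDate DESC;
```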

Thanks to IDisposable (Marc Brooks) for the heads-up on the hotfix.


Monday, June 11, 2007

Ian Cooper Takes on DDD, TDD and PI with LINQ to SQL

UK developer Ian Cooper posted Sunday a detailed analysis of how LINQ to SQL fits into domain-driven design (DDD) and test-driven development (TDD), and then raises the issue of the LINQ implementation's persistence ignorance (PI). His bio says:

Ian has over 15 years of experience delivering Microsoft platform solutions in government, healthcare, and finance. During that time he has worked for the DTi, Reuters, Sungard, Misys and Beazley delivering everything from bespoke enterprise solutions to 'shrink-wrapped' products to thousands of customers. Ian is a passionate exponent of the benefits of OO and Agile [programming]. He is test-infected and contagious. When he is not writing C# code he is also the founder of the London .NET user group.


Ian's Being Ignorant with LINQ to SQL essay starts by contrasting the data-centric versus domain-centric design approaches: "Data-centric designs tend to flow the relational model into the code" while "[t]hose who tend to be domain-centric flow the domain model out to their persistent store." Ian classifies LINQ to SQL "as a domain-centric tool because of its design goal of making it possible to share on[e] query syntax across many collection types and in the feature set provided by data context." The capability to employ a common set of LINQ queries over in-memory collections and the persistent store is critical to his final conclusion:

LINQ to SQL is usable with a TDD/DDD approach to development. Indeed the ability to swap between LINQ to Objects and LINQ to SQL promises to make much more of the code easily testable via unit tests than before. [Emphasis added.]

Ian goes on to analyze LINQ to SQL's feature set in terms of patterns from Martin Fowler's Patterns of Enterprise Application Architecture. This is the first example of such an analysis that I've seen for LINQ to SQL. I wouldn't characterize the ActiveRecord pattern, which Ruby on Rails and MonoRail use, as domain-centric. As Ian notes in a reply to Gregory Young's comment:

I understand some folks like ActiveRecord, but I think it has issues for PI, because it is usually intrusive into the domain classes via a template method or reference to protected variables.

Update 6/12/2007: In a 6/12/2007 reply to a comment from Gregory Young, Ian agrees that ActiveRecord is data-centric.

I've added a request that the ADO.NET team provide a similar analysis for the Entity Framework (EF) and Entity Data Model (EDM) as #17 to my suggestions for Defining the Direction of LINQ to Entities/EDM.

Persistence Ignorance

He tests LINQ to SQL against Jimmy Nilsson's eight conditions that preclude persistence ignorance and concludes:

LINQ to SQL scores pretty well against the PI checklist. As always there are trade-offs where performance can be obtained by specific features. It would be nice if we could choose to trade off lazy loading for standard collections so that we could obviate the need to use specific collection types for associations unless we needed lazy loading, but otherwise there is nothing to complain about here.

The "specific collection types" Ian refers to are EntityRef and EntitySet for associations. A reader named Wuz notes that they can be replaced by a plain object reference and a list type, if you don't mind giving up lazy loading and specifying the entities to load with DataShape [to become DataLoadOptions in Beta 2 and later.]
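A rough sketch of the trade-off Ian and Wuz describe (class and member names are invented): LINQ to SQL's EntitySet/EntityRef types buy lazy loading, while plain CLR types are more persistence ignorant but require eager loading.

```csharp
using System.Collections.Generic;
using System.Data.Linq;

// LINQ to SQL association types: EntitySet enables lazy loading of Orders
class Customer
{
    public int CustomerID;
    public EntitySet<Order> Orders = new EntitySet<Order>();
}

class Order
{
    public int OrderID;
}

// PI-friendly alternative: a plain list carries no persistence machinery,
// so related Orders must be loaded eagerly (e.g., via DataLoadOptions)
class PocoCustomer
{
    public int CustomerID;
    public IList<PocoOrder> Orders = new List<PocoOrder>();
}

class PocoOrder
{
    public int OrderID;
}
```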

There's been little or no discussion up to this point about PI in LINQ to SQL. The lack of interest on the part of the participants in the PI in EF and EDM controversy probably is due to LINQ to SQL's permanent connection at the hip to SQL Server 200x.

TDD and Code Generation

Ian isn't a fan of code generation for creating classes or databases:

This article is about a TDD approach to using LINQ which means that I am not using the code-generation made available through the designers in Orcas. ...

SQLMetal provides code generation support for strongly typed data-contexts in LINQ to SQL (for both mapping file and attribute based approaches); Orcas will ship with designers for people who don't like working with a command line. I prefer to avoid them for anything that is not demo based or first-cut.

He then goes on to describe his approach to TDD with LINQ to SQL and LINQ to Objects and demonstrates how to switch between an in-memory repository and the persistent store (SQL Server) for test doubling.

The Entity Framework versus LINQ to Entities

Ian is a proponent of LINQ to SQL and an Entity Framework detractor. In an earlier LINQ to Entities and Occam's Razor post, which I quoted in my LINQ to SQL:Entity Framework::REST:SOAP? entry, he says:

The key to most ORM toolsets adoption is the productivity benefits they bring and the clean programming model - persistence ignorance - that they support. When I look at LINQ to Entities I see the former being dragged-down by additional abstractions and in the latter case entirely absent; by contrast, LINQ to SQL hits both of these spots.

LINQ to Entities is overcomplex for many needs and its use in many scenarios defies Occam's Razor - Entities should not be multiplied beyond necessity. For simple mapping scenarios, LINQ to Entities feels bloated and I don't want to use until I have to use it. The very design goals for LINQ to Entities preclude it ever being a simple solution.

He then goes on with a plea to Microsoft to enable LINQ to SQL for databases other than SQL Server. I agree that EF and EDM are far too heavyweight for ordinary object persistence needs and that a single-file or attribute-based approach is likely to satisfy 90% of developers' needs for an object/relational mapping tool. But it won't if it's locked into SQL Server 200x. It especially surprises me that the ADO.NET team would choose EF and EDM over LINQ to SQL as the O/RM tool for their lightweight SQL Server Compact Edition.

Note: It took 13 seconds to dynamically generate a default EDM for the Northwind database in my test of Using the Entity Framework with IronPython 1.1 in Project Jasper. This is considerably better than the approximately 30 seconds it took in Sam Drucker and Shyam Pather's DEV18 - Rapidly Building Data Driven Web Pages with Dynamic ADO.NET MIX07 video (19:17 to 19:47).

You can read more about Microsoft's travels from persistence schizophrenia to persistence parsimony with LINQ to SQL here: Future LINQ to SQL Support for Multiple Databases?