Rico Mariani continues his analysis of LINQ to SQL performance issues with a new DLinq (Linq to SQL) Performance (Part 2) post dated June 25, 2007 by recommending compiled LINQ query expressions.
LINQ to SQL query expressions take advantage of lazy evaluation. Your code doesn't hit the SQL Server instance until the the IEnumerable<T> sequence's iterator (foreach or For Each structure) requests the first row. However, before the iterator starts it's pass, the compiler builds an expression tree, which in turn generates the T-SQL statement, and sends it to the database. As Rico observes:
In many cases all that will be different from one invocation to another is a single integer filtering parameter.
The expression tree incorporates logic to send a parameterized query when the query expression contains one or more parameters in the Where clause. For example:
SELECT [t0].[OrderID], [t0].[CustomerID], [t0].[EmployeeID], [t0].[ShippedDate]
FROM [dbo].[Orders] AS [t0]
WHERE [t0].[ShipCountry] = @p0
ORDER BY [t0].[OrderID] DESC
-- @p0: Input NVarChar (Size = 3; Prec = 0; Scale = 0) NOT NULL [USA]
SQL Server caches the query plan and uses the cached version for different parameter values. Thus the only option remaining to the developer for improving performance is to compile the query, which eliminates building the expression tree and generating the T-SQL on each execution.
Compiling Parameterized LINQ Query Expressions
Rico provides an example of a query delegate type that actually will compile. The only item missing is the delegate function:
private IEnumerable<Order> GetOrderById(int orderId)
return q(Northwinds, orderId);
The LINQ to SQL: .NET Language-Integrated Query for Relational Data white paper by Dinesh Kulkarni, Luca Bolognese, Matt Warren, Anders Hejlsberg, and Kit George (March 2007) is one of the few (about 7) hits you'll get on a search for compiledquery.compile. The white paper has a very brief section that covers the the CompiledQuery class and its Compile method for compiling parameterized query expressions with a single example that will compile but won't work more than once during a session.
Note: The paucity of posts about compiled LINQ queries is surprising.
A bug—presumably in a System.Data.Linq class—causes a null reference exception if you create new DataContext objects for successive executions of the compiled query during a session. The sample code for the function that invokes the query delegate type does exactly that (and has a variable name error, to boot):
public IEnumerable<Customer> GetCustomersByCity(string city)
Northwind db = new Northwind();
return Queries.CustomersByCity(myDb, city);
Delete the Northwind db = new Northwind(); instruction, change the erroneous myDB argument to your DataContext variable and the sample code should work as expected. However, this is another example of untested sample code. The authors might not have found the multiple invocations bug, but certainly wouldn't have been able to compile the function with the wrong argument name.
Note: Anders Borum confirmed the bug and pointed out the source of the problem in an answer to my Problem with Reuse of Compiled LINQ to SQL Query in Beta 1 post in the LINQ Project General forum. Matt Warren confirmed that the bug was detected and fixed in the Beta 1 timeframe but didn't make it into the Beta 1 bits. It's fixed for the forthcoming Beta 2 drop.
Where's the Meat?
Rico didn't offer any performance numbers in his Part 2 post, possibly because the missing function prevented him from comparing the execution time of his compiled query with baseline SqlDataReader data. VB 9.0 doesn't support lambda expressions in VS 2008 Beta 1, so I ported my original VB test harness to C# 3.0, added code to generate a combo box of Northwind country names with the number of orders per country, and wrote the code to add a parameterized query option for LINQ and SqlDataReader queries, and a compiled query option for LINQ. Here's a screen capture of the test harness's form:
The combo box contains the name and count of orders from 21 distinct ShipCountry values from the Northwind Orders table. The count varies from 6 (Norway) to 122 (USA and Germany.) The DataContext's Orders object has only the fields required for the query expression (the three fields of the original test harness and Rico's first test object) plus the ShipCountry field for the Where clause constraint.
Following is the performance penalty data for executing 500 parameterized and compiled queries with differing numbers of rows:
|LINQ (Param), s.||Param / Base||LINQ (Compiled), s.||Compiled / Base||Param / Compiled|
Base is the parameterized SqlDataReader execution time in seconds. Param/Base is the ratio of parameterized LINQ query execution time to the Base time. Compiled/Base is the ratio of compiled, parameterized LINQ query execution time to the Base time. Param/Compiled is the performance ratio of compiled-parameterized to parameterized LINQ queries.
Data: Rows contain 36 bytes of date: 2 integers (8 bytes), 1 5-character CustomerID (10 bytes), one DateTime field (8 bytes), and an nvarchar ShipCountry field that averages 5 characters (10 bytes) for a total of 36 bytes. The rows of the Part 1 tests contained an average of 26 bytes (no ShipCountry field).
Hardware: 2005-era, single core 2.26-GHz Pentium 4 Dell 400SC server with 1 GB RAM and 80-GB 7,200 RPM UltraATA-100 drive (IBM IC35L090AVV207-0, 2MB buffer, 8.8 ms. seek time).
Conclusions: With Beta 1 bits, LINQ to SQL parameterized queries are about 7 times slower than the baseline SqlDataReader; LINQ to SQL compiled queries are about 2.5 times slower than the baseline SqlDataReader. The LINQ to SQL query performance penalty increases for both types of LINQ to SQL queries as the number of rows returned increases. The performance benefit of compiled over parameterized LINQ queries decreases as the number of rows increases.
Note: The the 330% performance factor of compiled queries in this scenario doesn't match the 500% performance factor reported by Anders Borum, probably because of the masking effect of database access.
What About Compiled Query Caching?
Frans Bouma, the developer of the LLBLGen Pro object/relational mapping (O/RM) tool for .NET, raises in a comment the issue of compiled query caching between DataContext refreshes. My assumption is that the compiled query is cached independently of the DataContext, just as the database connection is independent of the DataContext and is closed once the object(s) of interest have been hydrated.
Matt Warren says in a reply to the Problem with Reuse of Compiled LINQ to SQL Query in Beta 1 post:
We kind of knew we needed something like compiled queries before we even started seriously looking at perf. We already had experience with ObjectSpaces to tell us that an ORM would have translation overhead. Usually that overhead is not a big deal for a client-app, but we figured we could do better on server apps where the same query gets executed over and over again.
Due to the Beta 1 bug, I can't test inter-DataContext caching currently. I'll see what I can find out and update the post later.
Compiled queries are not bound to a particular [DataContext]. An intended usage is to compile a query once and use it across many DataContexts.
Where's Beta 2?
I'm seeing increasing numbers of "beta 2 fixes that," "beta 2 is faster," "wait for beta 2" and "it's better in beta 2" comments from the LINQ to ADO.NET folks. Rico's added his comment to the first post in his series:
Some things made it into Beta 1 but the bulk of what I'm going to post in the next few days didn't happen until after. You'll first see it in Beta 2. [Emphasis added.]
Major improvements coming. :-)
Let's hope bug fixes are coming, too.
Mini-Connectionless DataContext Is Gone from VS 2008
The "mini-connectionless DataContext" that Matt Warren described in his April 12, 2007 post to the Update existing disconnected objects (dlinq) thread in the LINQ Project General forum won't be in VS 2008's LINQ to SQL implementation. Matt says in June 18 and 27, 2007 posts to the LINQ: Next CTP thread:
No, this feature will not be in the first release of LINQ to SQL.
It is only missing because of lack of time. When I mentioned it, I was talking about our forward thinking and a sample I'm trying to put together.
It will be interesting to see how Matt proposes to run LINQ to SQL v1.0 in the middle tier as a WCF service. I expected to see this feature in Beta 2, so I've updated my Changes Coming to LINQ for SQL post of May 15, 2007 with the bad news.
Update 6/28/2007: Fixed major formatting problems caused by <pre> elements, minor typos, and added sections on caching and the missing mini-connectionless DataContext.
Update 6/29/2007: Matt Warren said Rico suspects I have a faster disk drive, so I corrected and updated the drive specs on the earlier post and added a hardware section here. Clarified the row size (in bytes) and DataContext specs, also.