Thursday, August 25, 2011

LINQ Language-Integrated Query (LINQ) .Net Framework


 Introduction

One of the key elements of the ADO.NET programming model is that ability to explicitly cache data in a disconnected and backend agnostic manner using the DataSet.  A DataSet represents a set of tables and relationships, along with the appropriate metadata to describe the structure and constraints of the data contained in it. ADO.NET includes various classes that make it easy to load data from a database into a DataSet, and push changes made within a DataSet back into the database.
One interesting aspect of DataSet is that it allows applications to bring a subset of the information contained in a database into the application space and then manipulate it in-memory while retaining its relational shape. This enables many scenarios that require flexibility in how data is represented and handled. In particular, generic reporting, analysis, and intelligence applications support this method of manipulation.
In order to query the data within a DataSet, the DataSet API includes methods, such as DataTable.Select(), for searching for data in certain, somewhat predefined ways. However, there hasn’t been a general mechanism available for rich query over DataSet objects that provided the typical expressiveness required in many data-centric applications.
LINQ provides a unique opportunity to introduce rich query capabilities on top of DataSet and to do it in a way that integrates with the environment.


 

What’s in the Preview

This LINQ Preview now includes support for LINQ over the DataSet. This version supports the standard query operators, plus a few DataSet-specific sequence operators. Since LINQ over DataSet fully implements the operator patterns expected by LINQ-enabled languages, the query comprehensions syntax in VB can also be used against DataTable objects.
In the LINQ Preview, DataSet exposes DataTable objects as enumerations of DataRow objects. The standard query operators’ implementation actually executes the queries on enumerations of DataRow objects. This results in a very straightforward implementation and one that is highly consistent with the standard query operators. The implication is that certain classes of queries do not get optimized like a typical database would do; for example, even if there is an index over the columns involved in join condition, the query would not use it for execution.
Additionally, this LINQ Preview includes a modified version of the typed-DataSet generator, which enables the compiler to validate field names, data types, etc. while generating objects from DataRows.
Finally the “101 LINQ Samples” applied to DataSet have been included. To see these samples, start the SampleQueries sample application included with the LINQ Preview to see how to formulate various LINQ queries against DataSet.

 

  Using LINQ over DataSets

   

1.              Creating a Visual Studio project that supports LINQ over DataSets

For this preview, to use LINQ queries on top of the DataSet, you will need to create one of the special LINQ projects provided in the preview. The pre-existing Visual Studio projects will not include LINQ query capabilities for DataSet.
With the LINQ Preview projects, Visual Studio will use the new VB compiler that supports LINQ, and will also add a reference to System.Data.Extensions.dll (where LINQ over DataSet support is implemented); in these project types Visual Studio will also use an updated typed-DataSet generator that generates LINQ-compatible DataSets. As noted above, LINQ support for DataSet is implemented entirely in System.Data.Extensions.dll and doesn’t require a new version of System.Data.dll.

2.              Loading data into DataSets

Before using LINQ queries on your DataSets, you’ll first need to populate them with data. From the LINQ over DataSet perspective, how data is loaded into the DataSet is not important. Two common methods are to use the DataAdapter class to retrieve data from a database or take advantage of DLinq to query the database and then load the resulting data into a DataSet.
For example, to load data into the DataSet using a DataAdapter from the same database, you could use the following code snippet.

Sub FillOrders(ByVal ds As DataSet, _
               ByVal connString As String)

    Dim da = New SqlDataAdapter( _
        "SELECT SalesOrderID, OrderDate, " & _
        "OnlineOrderFlag, SalesOrderNumber " & _
        "FROM Sales.SalesOrderHeader " & _
        "WHERE DATEPART(YEAR, OrderDate) = @year; " & _
        "SELECT d.SalesOrderID,” & _
        “d.SalesOrderDetailID, d.OrderQty, " & _
        "d.ProductID, d.UnitPrice " & _
        "FROM Sales.SalesOrderDetail d " & _
        "INNER JOIN Sales.SalesOrderHeader h " & _
        "ON d.SalesOrderID = h.SalesOrderID " & _
        "WHERE DATEPART(YEAR, OrderDate) = @year", _
connString)

    da.SelectCommand().Parameters.AddWithValue("@year", 2002)
    da.TableMappings.Add("Table", "SalesOrderHeader")
    da.TableMappings.Add("Table1", "SalesOrderDetail")

    da.Fill(ds)
End Sub

For typed DataSets that have associated TableAdapters, you can also use the TableAdapter methods to load the data.

3.              Querying DataSets

Once data has been loaded into the DataSet you can begin querying it. Formulating queries over DataSet with LINQ is just like using LINQ against any other data-source. When using LINQ queries over Dataset, the only particular aspect to keep in mind is that instead of querying an enumeration of a given custom type, you are querying an enumeration of DataRow objects, so you have all the members of DataRow available for query expressions.  And as such, these query expressions can be as rich and complex as any that you currently use. 
To enable LINQ over DataSet, your VB project must import the System.Data.DataTableExtensions type, either at the project or file level. This class contains a number of extension methods that allow LINQ over DataSet to work without changes to the underlying System.Data.dll assembly.

For example, continuing the example from the previous section, if the ID and date of the orders that were submitted online was required:

Dim ds = new DataSet()
FillOrders(ds, myConnectionString)

Dim orders = ds.Tables("SalesOrderHeader").ToQueryable()
Dim orderLines = ds.Tables("SalesOrderDetail").ToQueryable()

Dim query = From o In orders _
            Where o!OnlineOrderFlag = True _
Select SalesOrderID := o!SalesOrderID, _
                   OrderDate := o!OrderDate

LINQ and Query Comprehensions work on sources that are IEnumerable(of T) or IQueryable(of T).  As DataTable does not implement either interface, you must call DataTable.ToQueryable() and use the returned IQueryable(of DataRow) object as the source in a LINQ Queries.

The query above only referred to a single table. It is also possible to do cross-table queries. The following example does a traditional join to obtain the order lines for the online orders for August:

Dim orders = ds.Tables("SalesOrderHeader").ToQueryable()
Dim orderLines = ds.Tables("SalesOrderDetail").ToQueryable()

Dim query = From o In (From t In orders _
                      Where t!OnlineOrderFlag = True _
                      And t!OrderDate.Month = 8 _
                      Select t), ol In orderLines _
                Where o!SalesOrderID = ol!SalesOrderID _
                Select  SalesOrderID := o!SalesOrderID, _
                  SalesOrderDetailID := ol!SalesOrderDetailID, _
                  OrderDate := o!OrderDate, _
                  ProductID := ol!ProductID

If you have a relationship set up in DataSet between tables, you can also use that to walk through relationships without having to explicitly state a join condition; for example, if there was a relationship between orders and order lines:
ds.Relations.Add("OrderLines", _
         ds.Tables("SalesOrderHeader").Columns("SalesOrderID"),       
         ds.Tables("SalesOrderDetail").Columns("SalesOrderID"))

then it would be possible to write the query as follows (this is equivalent to the join query above):


Dim query = From o In orders,  _
ol In o.GetChildRows("OrderLines") _
            Where o!OnlineOrderFlag = True _
            And o!OrderDate.Month = 8 _
            Select  SalesOrderID := o!SalesOrderID, _
                SalesOrderDetailID := ol!SalesOrderDetailID, _
                OrderDate := o!OrderDate, _
                ProductID := ol!ProductID


4.              Querying typed DataSets

LINQ queries over typed-DataSets work exactly like they do over regular DataSets. The difference is that since type information is present, one can directly call the strongly typed properties for the typed DataRow thereby greatly simplifying query formulation and enhancing readability. For example, in the previous section the query was:

Dim orders = ds.Tables("SalesOrderHeader").ToQueryable()
Dim orderLines = ds.Tables("SalesOrderDetail").ToQueryable()

Dim query = From o In (From t In orders _
                      Where t!OnlineOrderFlag = True _
                      And t!OrderDate.Month = 8 _
                      Select t), ol In orderLines _
                Where o!SalesOrderID = ol!SalesOrderID _
                Select  SalesOrderID := o!SalesOrderID, _
                  SalesOrderDetailID := ol!SalesOrderDetailID, _
                  OrderDate := o!OrderDate, _
                  ProductID := ol!ProductID

Using a typed DataSet with Orders and OrderLines typed tables, the query would become:

Dim query = From o In northwind.SalesOrderHeader, _
            od In o.GetSalesOrderDetailRows() _
            Where o.OnlineOrderFlag = true AND _
                o.OrderDate.Month = 8 _
            Select o.SalesOrderID, od.SalesOrderDetailID, _
                o.OrderDate, od.ProductID 

By using typed-DataSets, the query becomes much easier to formulate and read.

5.              Using LINQ over DataSet with TableAdapters

TableAdapters are a combination of a typed DataSet, pre-configured adapters that encapsulate connection information, and predefined queries that are exposed as methods and can either fill an existing typed DataTable or create and return a newly populated one.
In all cases, as TableAdapters operate on typed DataSets support for LINQ queries over them applies as described in section 3.4 above.

6.              Custom Operators

To enable LINQ queries over DataSet there are a number of custom operators added to enable querying over a set of DataRows.  These custom operators and their behaviors are described below:

1.ToDataTable

The ToDataTable operator creates a DataTable from a source sequence:

Public Function ToDataTable(of T)(source as IEnumerable(Of T)) as DataTable

The ToDataTable operator enumerates the source sequence and returns a DataTable containing the elements of the sequence. To accomplish this, each element within the sequence is iterated, and each property on the element is added as a new DataColumn and the value of each property is inserted.  When new properties are found on subsequent elements in the sequence, new columns are added to the Table in order to support full data fidelity.
Here is an example that creates a new DataTable from a given source:

Dim orders = ds.Tables("SalesOrderHeader").ToQueryable()
Dim orderLines = ds.Tables("SalesOrderDetail").ToQueryable()


Dim newDataTable = (From o In orders _
                    Where o!OnlineOrderFlag = true _
                    Select o!SalesOrderID, o!OrderDate) _
                   .ToDataTable()

7.LoadSequence

The LoadSequence operator adds data into an existing DataTable from a source sequence.

Public Function LoadSequence(of T)(source as IEnumerable(Of T))

Public Function LoadSequence(of T)(source as IEnumerable(Of T), _
Options as System.Data.LoadOption)

The LoadSequence method enumerates the source sequence and adds DataRows to the existing DataTable. To accomplish this, each element within the sequence is iterated, and each property on the element is added a a new DataColumn, the value of each property is inserted into that specific DataColumn, and then the DataRow is added to the existing DataTable.  When new properties are found on subsequent elements in the sequence, new columns are added to the DataTable in order to support full data fidelity.

Dim orders = ds.Tables("SalesOrderHeader").ToQueryable()
Dim orderLines = ds.Tables("SalesOrderDetail").ToQueryable()

Dim myDataTable = new DataTable("myTable")

myDataTable.Columns.Add( _
    new DataColumn("SalesOrderID", GetType(Integer)))
myDataTable.Columns.Add( _
    new DataColumn("OrderDate", GetType(DateTime)))


Dim query = From o In orders _
            Where o!OnlineOrderFlag = true _
            Select o!SalesOrderID, o!OrderDate

myDataTable.LoadSequence(query)

8.DistinctRows

The DistinctRows operator eliminates duplicate DataRows from a source sequence.

Public Function DistinctRows( _
        source as IEnumerable(Of DataRow)) _
        as IEnumerable(of DataRow)

The DistinctRows operator allocates and returns an enumerable set of DataRow’s that capture the source argument. An ArgumentNullException is thrown if the source argument is null.
When the enumerable set of DataRow’s returned by DistinctRows is enumerated, it enumerates the source sequence, yielding each DataRow that hasn’t previously been yielded. DataRows are compared using the number of Columns, the static type for each Column, and then using IComparable on the dynamic type or the Equals static method in System.Object.

LINQ defines a number of set operators (Distinct, EqualAll, Union, Intersect, Except) which compares equality of source elements by calling the element’s GetHashCode and Equals methods.  In the case of DataRows, this does a reference comparison which is generally not the expected behavior for set operations over tabular data.  Hence, DataSet adds a number of custom operators (DistinctRows, EqualAllRows, UnionRows, IntersectRows, and ExceptRows) that compare row values instead.

9.EqualAllRows

The EqualAllRows operator checks whether two sequences of DataRows are equal:

Public Function EqualAllRows( _
        first as IEnumerable(Of DataRow), _
        second as IEnumerable(Of DataRow)) _
        as IEnumerable(of DataRow)


The EqualAllRows operator enumerates the two source sequences in parallel and compares corresponding DataRows by comparing the number of Columns, the static type for each Column, and then using IComparable on the dynamic type or the Equals static method in System.Object. The method returns true if all corresponding elements compare equal and the two sequences are of equal length. Otherwise, the method returns false.

10.UnionRows

The UnionRows operator produces the set union of two sequences of DataRows.

Public Function UnionRows( _
        first as IEnumerable(Of DataRow), _
        second as IEnumerable(Of DataRow)) _
        as IEnumerable(of DataRow)

The Union operator allocates and returns an enumerable object that captures the arguments passed to the operator. An ArgumentNullException is thrown if any argument is null.
When the object returned by Union is enumerated, it enumerates the first and second sequences, in that order, yielding each element that hasn’t previously been yielded. DataRows are compared using the number of Columns, the static type for each Column, and then using IComparable on the dynamic type or the Equals static method in System.Object.

11.IntersectRows

The IntersectRows operator produces the set intersection of two sequences of DataRows:

Public Function IntersectRows( _
        first as IEnumerable(Of DataRow), _
        second as IEnumerable(Of DataRow)) _
        as IEnumerable(of DataRow)

The IntersectRows operator allocates and returns an enumerable set of DataRows that captures the arguments passed to the operator.
When the object returned by IntersectRows is enumerated, it enumerates the first sequence, collecting all distinct DataRows of that sequence. It then enumerates the second sequence, marking those DataRows that occur in both sequences. It finally yields the marked DataRows in the order in which they were collected. DataRows are compared using the number of Columns, the static type for each Column, and then using IComparable on the dynamic type or the Equals static method in System.Object.



12. ExceptRows

The ExceptRows operator produces the set difference between two sequences of DataRows.

Public Function ExceptRows( _
        first as IEnumerable(Of DataRow), _
        second as IEnumerable(Of DataRow)) _
        as IEnumerable(of DataRow)

The ExceptRows operator allocates and returns an enumerable set of DataRow’s that captures the arguments passed to the operator. An ArgumentNullException is thrown if any argument is null.
When the sequence of DataRows returned by ExceptRows is enumerated, it enumerates the first sequence, collecting all distinct DataRows of that sequence. It then enumerates the second sequence, removing those DataRows that were also contained in the first sequence. It finally yields the remaining DataRows in the order in which they were collected. DataRows are compared using the number of Columns, the static type for each Column, and then using IComparable on the dynamic type or the Equals static method in System.Object.






No comments:

Post a Comment