Monday, August 26, 2013

Data Conversion Methodology

A data conversion methodology typically comprises three phases, as outlined below:

Phase 1: Analysis & Design
Phase 2: Development
Phase 3: Go Live



Phase 1: Analysis & Design

In this phase we perform an in-depth qualitative and quantitative gap analysis of the source systems and the data structure model, which gives us critical, detailed insight into the source data. Based on this analysis, a decision must be made in this phase on whether data cleansing is necessary, what level of cleansing is needed to maximize the data that can be migrated, and what the corresponding tradeoffs are.

During this phase we also study the architecture of the existing data and evaluate the data to identify potential anomalies and major complications. We identify conversion patterns and produce a set of mappings and transformation algorithms, along with the business rules that will be used to convert and migrate the data. These are refined in subsequent phases as the requirements and their consequences evolve.

At the end of this phase we perform the Data Mapping and Data Cleansing activities in parallel, as outlined below:

Data Mapping

During this activity we map the source data elements to the target data elements. While mapping, we examine and address two kinds of gaps: data elements required by the target system but unavailable in the source system, and business-critical data in the source system that has no defined place in the target system. We formulate the data mapping rules together with the business analysts and technical teams. Data mapping is an iterative process that continues until consensus is reached on all data elements.
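The two kinds of gaps described above can be found mechanically once the mapping is written down as data. The following is only a minimal sketch: the MappingGaps class, its method names and every field name are made up for illustration and do not come from any real system.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MappingGaps
{
    // Target fields that no source field is mapped to.
    public static List<string> UnmappedTargets(
        IDictionary<string, string> map, IEnumerable<string> targetFields)
    {
        var mapped = new HashSet<string>(map.Values);
        return targetFields.Where(t => !mapped.Contains(t)).ToList();
    }

    // Source fields that have no place in the target system yet.
    public static List<string> UnmappedSources(
        IDictionary<string, string> map, IEnumerable<string> sourceFields)
    {
        return sourceFields.Where(s => !map.ContainsKey(s)).ToList();
    }

    public static void Main()
    {
        // Hypothetical field names, for illustration only.
        var source = new[] { "CUST_NO", "CUST_NM", "FAX_NO" };
        var target = new[] { "CustomerId", "CustomerName", "Email" };
        var map = new Dictionary<string, string>
        {
            { "CUST_NO", "CustomerId" },
            { "CUST_NM", "CustomerName" }
        };

        Console.WriteLine(string.Join(",", UnmappedTargets(map, target))); // Email
        Console.WriteLine(string.Join(",", UnmappedSources(map, source))); // FAX_NO
    }
}
```

Each item in either list is a question for the business analysts, which is why the mapping takes several iterations to settle.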

Data Cleansing

Data cleansing is critical to the success of the data conversion. If it is found to be necessary, the cleansing activity starts in this phase. Data cleansing is an iterative process, and improving data quality may require generous time scales. It requires the business analysts to work closely with us. Cleansing is done in parallel with Data Mapping and also overlaps with the next phase. It can be done manually or through automated scripts.

Phase 2: Development

This is the most important phase of the data conversion process. Based on the mapping specifications and rules, we develop the conversion scripts. During this phase we need to identify all conversion issues and formulate appropriate conversion strategies, such as updating the data mapping rules and preparing risk mitigation and contingency plans.

Once the conversion scripts are developed, we perform a mock conversion and validate the result. Script development, mock conversion and validation form an iterative cycle that is repeated until the desired conversion is achieved and accepted.
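What "validate the conversion" means depends on the project. As one possible sketch, a basic reconciliation compares the record count and the keys of the source data against the converted data. The MockConversionCheck class and the sample records below are assumptions made for illustration; a real validation would also compare field values and apply the business rules.

```csharp
using System;
using System.Collections.Generic;

public static class MockConversionCheck
{
    // Returns a list of problems found; an empty list means the
    // mock conversion passed this (deliberately basic) check.
    public static List<string> Validate(
        IDictionary<int, string> source, IDictionary<int, string> converted)
    {
        var issues = new List<string>();

        if (source.Count != converted.Count)
            issues.Add($"Row count mismatch: source={source.Count}, target={converted.Count}");

        foreach (int key in source.Keys)
        {
            if (!converted.ContainsKey(key))
                issues.Add($"Missing key {key}");
        }
        return issues;
    }

    public static void Main()
    {
        // Hypothetical records keyed by id; record 3 was lost in conversion.
        var source = new Dictionary<int, string> { { 1, "Emp One" }, { 2, "Emp Two" }, { 3, "Emp Three" } };
        var converted = new Dictionary<int, string> { { 1, "Emp One" }, { 2, "Emp Two" } };

        foreach (string issue in Validate(source, converted))
            Console.WriteLine(issue);
    }
}
```

A check like this can be rerun after every mock conversion, so each iteration ends with a concrete list of what still needs fixing.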

Phase 3: Go Live

At the end of the data conversion work, and before the cutover to the target system, we perform the final conversion during a planned downtime. After the new system goes into production, we also provide support for any unforeseen data issues.

Saturday, August 24, 2013

Lambda Expressions Explained


In this article I take a simple lambda expression, show how the same thing can be done using delegates (C# 1.0), and trace how Microsoft transitioned to the current syntax.

Definition

As per the MSDN documentation:

A lambda expression is an anonymous function that can contain expressions and statements, and can be used to create delegates or expression tree types.

All lambda expressions use the lambda operator =>, which is read as "goes to". The left side of the lambda operator specifies the input parameters (if any) and the right side holds the expression or statement block. The lambda expression x => x * x is read "x goes to x times x."
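To make the definition concrete, here is a small sketch that assigns x => x * x once to a delegate type and once to an expression tree type. The LambdaBasics class and its member names are mine and are only there for illustration.

```csharp
using System;
using System.Linq.Expressions;

public static class LambdaBasics
{
    // The lambda from the definition, used to create a delegate.
    public static readonly Func<int, int> Square = x => x * x;

    // The same lambda as an expression tree: a data structure describing
    // the code, which Compile() can turn into a delegate.
    public static readonly Expression<Func<int, int>> SquareTree = x => x * x;

    public static void Main()
    {
        Console.WriteLine(Square(5));                // 25
        Console.WriteLine(SquareTree.Compile()(5));  // 25
        Console.WriteLine(SquareTree.Body.NodeType); // Multiply
    }
}
```

The rest of this article deals only with the delegate form.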

Simple Lambda Expression
Let's take a list of employees, each with an EmployeeID and a Name, as shown below:

static List<Employee> lst = new List<Employee>() {  
                    new Employee() { EmployeeID=1, Name="Emp One"},
                    new Employee() { EmployeeID=2, Name="Emp Two"},
                    new Employee() { EmployeeID=3, Name="Emp Three"},
                    new Employee() { EmployeeID=4, Name="Emp Four"},
                    new Employee() { EmployeeID=5, Name="Emp Five"}
                    };

Using a lambda expression, we can then find a specific employee as shown below:

var emp = lst.Find(e => e.EmployeeID == 3);
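The snippets above do not show the Employee class. For readers who want to run them, the following is a minimal self-contained sketch. The shape of Employee is assumed from the object initializers, and the FindDemo wrapper class is mine.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of Employee; the article does not show its definition.
public class Employee
{
    public int EmployeeID { get; set; }
    public string Name { get; set; }
}

public static class FindDemo
{
    public static readonly List<Employee> lst = new List<Employee>
    {
        new Employee { EmployeeID = 1, Name = "Emp One" },
        new Employee { EmployeeID = 2, Name = "Emp Two" },
        new Employee { EmployeeID = 3, Name = "Emp Three" },
        new Employee { EmployeeID = 4, Name = "Emp Four" },
        new Employee { EmployeeID = 5, Name = "Emp Five" }
    };

    public static void Main()
    {
        // List<T>.Find returns the first element matching the predicate.
        var emp = lst.Find(e => e.EmployeeID == 3);
        Console.WriteLine(emp.Name); // Emp Three

        // When nothing matches, Find returns default(T), which is null for a class.
        var missing = lst.Find(e => e.EmployeeID == 42);
        Console.WriteLine(missing == null); // True
    }
}
```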

Using Delegates


Without lambda expressions, the same functionality can be achieved using delegates. First, let's define a delegate for finding an employee. Its signature is:

public delegate bool EmpFind(Employee emp);

Similar to the List.Find method, we write our own Find method, which iterates through the list and calls the delegate to check whether the current item matches the condition. The code below shows our Find function.

public Employee Find(EmpFind findPredicate)
{
    foreach (Employee emp in lst)
    {
        if (findPredicate(emp))
            return emp;
    }
    return null;
}

Now we can call the above Find method and pass in our predicate as below:

Employee emp = Find(FindPredicate);
         
private bool FindPredicate(Employee e)
{
    if (e.EmployeeID == 4)
        return true;
    else
        return false;
}

In the above code we defined a FindPredicate function which matches our employee id. This predicate is passed as an input to the Find method.

Using Anonymous methods

C# 2.0 introduced anonymous methods, which eliminate the need to define the predicate function explicitly. Using an anonymous method, the call to Find can be rewritten as:

Employee emp = Find(delegate(Employee e) {
                        if (e.EmployeeID == 4)
                            return true;
                        else
                            return false;
                    });

Using Lambda Expressions

With C# 3.0 lambda expressions and type inference, the above code can be simplified to:

Employee emp = Find(e => {
                        if (e.EmployeeID == 4)
                            return true;
                        else
                            return false;
                    });

Because the body only returns the result of a boolean expression, we can eliminate the if statement and reduce the code to:

Employee emp = Find(e => e.EmployeeID == 4);
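To check that all four forms really are interchangeable, the sketch below passes each of them to the same custom Find method. It reuses the names from this article (EmpFind, Find, FindPredicate); the Employee class shape, the EvolutionDemo class and the FindAllWays helper are assumptions added so the sample compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of Employee; the article does not show its definition.
public class Employee
{
    public int EmployeeID { get; set; }
    public string Name { get; set; }
}

public delegate bool EmpFind(Employee emp);

public static class EvolutionDemo
{
    static readonly List<Employee> lst = new List<Employee>
    {
        new Employee { EmployeeID = 3, Name = "Emp Three" },
        new Employee { EmployeeID = 4, Name = "Emp Four" }
    };

    public static Employee Find(EmpFind findPredicate)
    {
        foreach (Employee emp in lst)
        {
            if (findPredicate(emp))
                return emp;
        }
        return null;
    }

    static bool FindPredicate(Employee e)
    {
        return e.EmployeeID == 4;
    }

    // Hypothetical helper: runs the same search in all four styles.
    public static string[] FindAllWays()
    {
        Employee a = Find(new EmpFind(FindPredicate));                         // C# 1.0: named method
        Employee b = Find(delegate(Employee e) { return e.EmployeeID == 4; }); // C# 2.0: anonymous method
        Employee c = Find(e => { return e.EmployeeID == 4; });                 // C# 3.0: statement lambda
        Employee d = Find(e => e.EmployeeID == 4);                             // C# 3.0: expression lambda
        return new[] { a.Name, b.Name, c.Name, d.Name };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", FindAllWays()));
        // Emp Four, Emp Four, Emp Four, Emp Four
    }
}
```

In every case the compiler produces an EmpFind delegate instance; only the amount of code we have to write changes.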

Finally, Microsoft provides predefined generic delegates that cover simple operations like ours, so we don't need to declare our own delegate. Instead we can use Microsoft's Func delegate, declared as shown below. (List<T>.Find itself takes the closely related predefined delegate Predicate<T>.)

public delegate TResult Func<in T, out TResult>(T arg);
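As a rough sketch of what that looks like in practice, the helper below is the same loop as our Find method with Func<T, bool> in place of the EmpFind delegate. FindFirst and FuncDemo are hypothetical names of mine, not framework APIs, and the Employee class shape is assumed. The last lines of Main show the built-in List<T>.Find, which takes a Predicate<T>.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of Employee; the article does not show its definition.
public class Employee
{
    public int EmployeeID { get; set; }
    public string Name { get; set; }
}

public static class FuncDemo
{
    // Hypothetical generic helper: no custom delegate type is declared.
    public static T FindFirst<T>(IEnumerable<T> items, Func<T, bool> match) where T : class
    {
        foreach (T item in items)
        {
            if (match(item))
                return item;
        }
        return null;
    }

    public static void Main()
    {
        var lst = new List<Employee>
        {
            new Employee { EmployeeID = 3, Name = "Emp Three" },
            new Employee { EmployeeID = 4, Name = "Emp Four" }
        };

        // The lambda is converted to Func<Employee, bool>.
        Employee viaFunc = FindFirst(lst, e => e.EmployeeID == 4);
        Console.WriteLine(viaFunc.Name); // Emp Four

        // The same lambda text converted to Predicate<Employee> for List<T>.Find.
        Predicate<Employee> isFour = e => e.EmployeeID == 4;
        Employee viaPredicate = lst.Find(isFour);
        Console.WriteLine(viaPredicate.Name); // Emp Four
    }
}
```

Func<T, bool> and Predicate<T> have the same shape but are distinct delegate types, so an instance of one cannot be assigned to the other directly. A lambda, however, converts to either.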

References:


Lambda Expressions: http://msdn.microsoft.com/en-us/magazine/cc163362.aspx 
Func<T, TResult> Delegate: http://msdn.microsoft.com/en-us/library/bb549151.aspx
Further reading: http://www.codeproject.com/Articles/24255/Exploring-Lambda-Expression-in-C