Thursday, June 16, 2011

Dynamic search in SQL

There may be a number of reasons to implement dynamic SQL; some are good, some bad, and others... well, how else would we learn?
In this post I am going to focus on one of the better uses for dynamic SQL: building a dynamic search based on conditions.
Let's take an example scenario where we need to search a table based on user input. For the sake of this demonstration, let's assume the application logic implies that empty fields should be excluded from the search conditions. This leaves the question of how to implement the supporting functionality on the server side. While this may not be a problem for databases that use row modeling, where data is represented across multiple rows, most of today's relational databases use good ol' columnar representation.
One of the ways to implement a dynamic search query in MS SQL Server 2005/2008 is to use the COALESCE function.

Select Firstname, LastName, Phone, CustID
from myTable
where Firstname = COALESCE(@name, Firstname)
  and Phone = COALESCE(@phone, Phone)


This functionality can easily be replaced with ISNULL checks if you are running an earlier version of SQL Server.
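For completeness, here is what that might look like on an older version, a sketch using the same hypothetical table and parameters as the query above:

```sql
-- ISNULL-based equivalent for pre-2005 SQL Server:
-- ISNULL(@name, Firstname) returns @name when it is supplied,
-- and the column's own value (always equal to itself) when @name is NULL.
Select Firstname, LastName, Phone, CustID
from myTable
where Firstname = ISNULL(@name, Firstname)
  and Phone = ISNULL(@phone, Phone)
```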
The first nuance, which is rarely addressed, is the issue of NULLs in tuples. Without introducing additional NULL checks on the column contents, you run the risk of having a portion of your query turn into a NULL = NULL comparison, which evaluates to UNKNOWN and silently excludes those rows from the result set.
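One way around this is to test the parameter instead of wrapping the column, so a NULL column value can never end up on either side of a NULL = NULL comparison. A sketch, using the same hypothetical table as above:

```sql
-- Each predicate short-circuits to TRUE when the parameter is NULL,
-- so rows whose column value is NULL are no longer silently filtered out.
Select Firstname, LastName, Phone, CustID
from myTable
where (@name is null or Firstname = @name)
  and (@phone is null or Phone = @phone)
```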
Another drawback to keep in mind is that queries like this are fine only for small tables and a small set of searchable variables. COALESCE usage may cause full table scans, significantly affecting performance. As a faster option for searching large tables we can use dynamic queries. The statement will now look like this:

Declare @QueryString as nvarchar(1000)
-- Start with an always-true predicate so each condition can be appended with 'and'
set @QueryString = 'Select Name, Address, Phone from Customer where 1=1'

if @CustName is not null
    set @QueryString += ' and Name = @CustName'
if @CustAddress is not null
    set @QueryString += ' and Address = @CustAddress'

-- Pass the values as parameters rather than concatenating them into the string,
-- which avoids quoting bugs and SQL injection
EXECUTE sp_executesql @QueryString,
    N'@CustName nvarchar(100), @CustAddress nvarchar(200)',
    @CustName, @CustAddress

While the second option performs faster and generally uses fewer resources, there are a few caveats to keep in mind. As Erland Sommarskog warns in his much-referenced "The Curse and Blessings of Dynamic SQL", there are consequences to using dynamic SQL, a few of which are:
  * You cannot access local variables within the block of dynamic SQL. We can still pass parameters in and out, though.
  * Temp tables created in dynamic SQL will not be accessible to the rest of the stored procedure (as a matter of fact, they will be dropped as soon as the dynamic SQL executes).
  * USE statements within the dynamic SQL do not affect the calling stored procedure.
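The first caveat is softened by sp_executesql's parameter list: values can be passed into the dynamic batch, and passed back out through OUTPUT parameters. A minimal sketch (the @RowCount variable and the literal search value are hypothetical):

```sql
Declare @Sql nvarchar(200), @RowCount int
set @Sql = N'Select @cnt = count(*) from Customer where Name = @CustName'

-- The second argument declares the parameters the dynamic batch uses;
-- marking @cnt as OUTPUT lets its value flow back into the outer @RowCount.
EXECUTE sp_executesql @Sql,
    N'@CustName nvarchar(100), @cnt int OUTPUT',
    @CustName = N'Smith',
    @cnt = @RowCount OUTPUT
```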
I have mentioned above that the dynamic search query performs better than the COALESCE search. Now it is time to put the money where the cursor is and deliver the test results. For this performance test I created three tables, with 78,000, 430,000 and 2,200,000 records. I ran each query several times, each time introducing more search variables to see whether the number of search variables may (or may not) affect performance.
A trace on the smallest table yields the following results:

78,000 rows

COALESCE Search
Search Variables    CPU      Duration    Reads
1                   47       65          1598
2                   31       69          1598
4                   63       62          1598
6                   32       70          1680
avg                 43.25    66.5        1618.5

Dynamic Query
Search Variables    CPU      Duration    Reads
1                   0        49          1598
2                   31       34          1598
4                   16       42          1598
6                   31       42          1680
avg                 19.5     41.75       1618.5

As we can see, even on the small table the dynamic query is somewhat faster than the COALESCE search.


For the larger table, the trace yields the following:
430,000 rows

COALESCE Search
Search Variables    CPU      Duration    Reads
1                   297      231         8834
2                   295      217         8834
4                   249      221         8834
6                   281      181         8834
avg                 280.5    212.5       8834

Dynamic Query
Search Variables    CPU      Duration    Reads
1                   125      102         8870
2                   126      128         8876
4                   172      118         8878
6                   171      126         8834
avg                 148.5    118.5       8864.5

It is clear from the second table that the performance gap increases.
And the largest table yields:
2,200,000 rows

COALESCE Search
Search Variables    CPU      Duration    Reads
1                   1450     1120        44568
2                   1435     1117        44568
4                   1451     1068        44568
6                   1295     1049        44568
avg                 1407.75  1088.5      44568

Dynamic Query
Search Variables    CPU      Duration    Reads
1                   719      483         44502
2                   671      434         44518
4                   811      604         44520
6                   857      497         44568
avg                 764.5    504.5       44527
Now we can see that the dynamic query performs nearly twice as fast as the COALESCE search, with roughly half the CPU utilization. Based on the trace results it is also clear that there is no significant performance change associated with the number of search variables.

Saturday, June 11, 2011

Programmers vs Non-Programmers

Buzzwords like business logic, analytics diagrams, project estimation, and cloud computing are the words business people build their careers on, that managers try to use in every applicable (and not-so-applicable) circumstance, and that programmers run far away from and hide... Hide, when possible, to avoid being trapped into committing to finishing a development phase within a certain period of time. Programmers realize that estimating far ahead of time, committing to a project that relies on code you have never seen before, is a trap.



All the hot-shot "young" programmers believe that, given some time, they would re-implement the buggy modules they were given rather than spend tedious hours debugging and trying to make sense of project components whose sales pitch your managers have bought. With years of programming comes wisdom, and with wisdom comes the realization that you will never be able to sell the top-notch, perfect solution that you enjoyed every second of developing... Your so-perfect solution will never be able to compete with your competitors' solutions: with a really ugly interface and a much uglier back-end, they were able to glue a couple of open source projects together, reuse some other components, and release their solution to the market in a fraction of your development and deployment time, with a fraction of the man-hours spent on the project, at a fraction of your price.

Relatively early in my career, Graduate Student Research Assistant was my title. Given the freedom to choose what project I wanted to work on and what results I was pursuing, it seemed I had all the time in the world to convert my dreams into C# lines of code. Armed with resources such as time, caffeine and the Internet, I was thrown into the dark woods of exploring the great and grand world of Artificial Intelligence. Overwhelmed with all the possibilities, I worked on the interface for displaying the algorithms for five months, writing polished and re-usable code and re-thinking every detail of the project. Needless to say, I never quite finalized the project; although parts and pieces of it were perfection, the time came to move to a "real" job, where everything is much better structured, and my dream project was left unfinished.

Looking back, and analyzing the value of the re-usable and clean code I used to cherish so dearly, I conclude that the experience was a great exercise and made a great "code sample" for some interviews. What counts, however, is that nobody ever re-used a single module from that code, and the project was left undone. Comparing this with the model of successful small-to-medium-scale software companies brings the realization that marketing folks [who announce worldwide the non-existent features of your product], sales folks [who hide from your customers the truth about pricing going up], support [who tell your customers that the feature they requested will be added soon, although from an engineering perspective it is near-impossible], and project management [who force developers like yourself to commit to a deadline that seems unrealistic] are all very important to the success of your company, to generating the revenue, and to keeping you employed.

They will never understand the pure joy of writing beautiful code or the importance of the scientific discoveries you are about to make, but this collaboration, with its frequent disagreements and compromises between engineering and non-technical staff, is what keeps your product revenue and sales high and maintains the balance that ensures all the effort is utilized optimally.