I am creating a search engine for a website i'm working on and I'm trying to decide which of the following would be best for performance, as I need to keep overhead to a minimum. Also note that I am using SQL Server 2008 R2
The general idea is to create a paging type effect for search results by users.
Originally I was going to do it like so:
The user submits the search criteria to the stored procedure. The stored procedure creates a local temproary table that will only contain ids of search results (unique identifiers) and an auto-incremented int row Id column. A select statement using inner
joins on 4 tables is performed and the selected results Id's are inserted into the temp table. A variable is used to determine how many results are to be returned per "page" of results. For this example I'll say 20 results are allowed per page.
The first page of results would be returned by then performing a select with inner joins to the 4 permanent tables and this time including the temp table in the inner joins (on the Id column) to return the first 20 rows of data required from the 4 permanent
tables based on a WHERE clause on the rowId of the temp table.
Then I was going to have ALL the unique identifier Id's from the temp table concatenated into a string separated with a delimiter char and inserted into a "SearchResults" table with a unique identifier for the search itself as well as an expiry datetime.
This would be used later to "continue" the existing search.
An output parameter is also included in the query to return the total number of results found for the search. This would be used to determine how many pages of results should be offered to be selected by the user (number of results divided by 20 results
per page = number of pages)
Then if the user continues the search to get the next "page" of results ( or previous page, or a target page number, first or last page, etc. ) another stored procedure is called accepting a unique identifier parameter as the search id and int parameter
indicating which page of results to retrieve. Within the stored procedure, it would get the concatenated string of result ids with the matching search id and, using a custom function, split the string on its delimiters, casting each string back to a unique
identifierand inserting each into a temp table like in the 1st procedure. Then using the target page variable it would calculate which 20 results should be gathered and returned.
I had figured doing this would be less overhead than having the user perform the whole search of the 1st procedure every time they gather a new page of results, but after looking at the custom function I would need to do this, I'm thinking it might actually
be less overhead to just perform the 1st procedure each time a page of results is required, rather than storing the concatenated string and using a custom function, etc.
One alternative would be to return the concatenated result id's string, do the split and cast into an array from within the app and do a foreach from within the app to get each result required 1 at a time but 21 round trips to the database per page of results
is absolutely HORRIBLE practice! lol
Am I right in assuming that the function would create just as much, if not more overhead than if i was to omitt the "part 2" of the 1st idea?
I should also mention that each search shouldnt contain more than a few thousand results but the app will have a fairly large amount of concurrent users (40'000 + or more at peak times)
Ideas or suggestions are welcome
When the going get's tough, the tough outsource and take a vacation... lol I wish :(
You are moving in right direction, Really this solution is useful for better performance, I also had a same situation before couple of years ago, I did it in the same way which you are thinking to do and got a higher performance.
I was dealing a database having millions of products and products specs and there is a multiple searchs option for user to search products by entering some criteria. What I did in this situation is pass the search criteria to stored procedures, retuns a
comma separated Ids of products which come to search criteria as output parameter, and with a result set holding record fors 1st page depending upon page size.
Then I saved this search criteria in session also save current page number and total number of products. So, when user clicks on the next page result I was just reteriving all product ids stored in session of particular user and getting all the product Id's
required for requested page, then pass these Ids to stored procedure and this time stored procedure returns only products whose Ids passed from application.
I was refreshing all these Id's on search criteria changed and storing the newly Id's in session for new criteria.
Only thing which was little bit frustating is when user changed search criteria or request for the first time then stored procdues was taking sometime to execute but in navigation between pages was like as you are view data from cache.
Surely I'll recommend you this approach, I implement this while doing search optimization and win my company trusts.
Yes thats almost exactly the way I was thinking of doing it. I'm forcing cookie-based sessions though to free up extra memory so I would be storing search related items in a dedicated cookie. The implementation I was considering was storing current results
page number, results per page and a byte value indicating how the results have been ordered (by date posted ascending, etc) but rather than storing the string of id's in the cookie, I was going to store an id representing what row the search result id's are
stored in a results table in the database, then the id would be used in the database to retrieve the string, a function would be used DB-side to split the string and insert into a temp table, get the requested set of results, etc.
After you mentioned it though, I am considering storing the concatenated string client-side in the cookie.
One question though, how did you handle the string of id's in your app? did you split the string, perform the logic in the app and get each new result set one at a time? Or did you pass back to the database a concatenated string and deal with splitting the
string in the database?
When the going get's tough, the tough outsource and take a vacation... lol I wish :(
What I did to deal with string of Id's is as follows.
1-Get back string from user session. (Suppose this string have 100 comma separated product ids).
2-Check the requested page number and page size.
3-Fetch product Ids from string depending on page number and page size (Suppose if user request for page 2 and page size is 20 then fetch product ids from product ids string after spliting it from index 20 to 39)
4-Pass on these Id in comma separated string to your stored procedure. (Remember: Your stored procedure will return two tables if your productIds "comma separated string" parameter is null, 1-table for first 20 products 2-table holding productIds which you
need to kept in session in comma separated string. In else case if productIds "comma separated string" parameter isnot null then your procedure returns one table holding 20 products which Ids are passed to stored procedure in parameter productIds.)
5-Now your stored procedure splits these ids and return all the products having same Ids.
(Note: These is just because once user sets a search criteria and now he is moving forward and back into search result pages, but be carefull as when as user changes his search criteria then you need to set product Ids =String.Empty and then all query your
stored procedure with empty string or null string of productIds, so that procedure re run logic and return you the table of first 20 products a table holding next productsIds.)
Now according to your question, yes I split string, perform logic of paging to fetch next product ids according to page size and then pass on these 20 ids to stored procedure to get back results. But at this time stored procedure will return only product
result set it will not return me result set for product ids, because now I know that user want to see next 20 products.
I hope i answer your question. Is it? If you have still any confusion, then please visit http://www.friferie.dk try to use left search box on home page, I implemented this logic here and this is dealing with lot of complex logic I ever seen in my life, any
how after using this please put your question, surely I ll answer it.
Have fun .
Marked as answer by magicmike2011 on Apr 27, 2012 03:02 PM
Excellent solution! I guess I was on the right track, was just overlooking a few peices of the puzzle.
I've had to modify my execution plan after re-analyzing my app's needs and reconsidering some possible situations.
The hosting environment I will be using at launch time will have a fairly low amount of RAM to start which is, needless to say, very frustrating lol. If every stage of development isn't carefully planned, server resource consumption could easily get out of
hand. But on a positive note, it has been an incredible learning experience and really helped me develop some very good habits that will stick with me for the rest of my career. I've come to realize optimization is a very delicate art lol.
I've marked your answer as the solution because clearly it is a great solution. But I would love to hear your thoughts on the changes I have made to my approach and why.
Since I am using strict cookie based sessions and must heavily limit the use of in-proc data (even viewstate is being minimized heavily, except where necessary) I had planned on attempting to store the string containing all id's in a session cookie. However,
after doing some calculations, I realized there will be instances where the string will require more space than a cookie will reliably allow. So I decided back to the database with the string. I had hoped to be able to use a UDF to split the string sql server
side but after doing some heavy research on numerous methods of doing so, I found that performing the string split in sql server would be a huge performance hit because of the possible number of ids unless I went with a SQL CLR function, which isn't an option
in my case.
But I found something I had overlooked. Since I am using SQL Server 2008, I can pass table variables from .NET directly to SQL with ADO.NET! This still makes the full procedure use 2 trips to the database, but it allows me to perform the result id processing
logic within the app and gathering the rows from the database would be as simple as using a join on the table variable input param containing the desired results ids and the database tables.
Even though it would be using 2 round trips, does this still sound like a reasonable plan?
When the going get's tough, the tough outsource and take a vacation... lol I wish :(
I think with Sql server 2008 this will be best solution for your problem, and you are right that spliting a string in UDF is very expensive in SQL server, but in my case I was using SQL Server 2005 and CLR UDF as well, So it was not effecting performance
by using CLR UDF for string spliting. But according to you this isn't in your case, So it will be better to use table type parameter in stored procedure.
I would like to appreciate your solution while working with SQL Server 2008 it is the best way to use less and gain more.
magicmike201...
Contributor
2021 Points
481 Posts
Which solution would be better for performance?
Apr 26, 2012 07:58 PM|LINK
I am creating a search engine for a website i'm working on and I'm trying to decide which of the following would be best for performance, as I need to keep overhead to a minimum. Also note that I am using SQL Server 2008 R2
The general idea is to create a paging type effect for search results by users.
Originally I was going to do it like so:
The user submits the search criteria to the stored procedure. The stored procedure creates a local temproary table that will only contain ids of search results (unique identifiers) and an auto-incremented int row Id column. A select statement using inner joins on 4 tables is performed and the selected results Id's are inserted into the temp table. A variable is used to determine how many results are to be returned per "page" of results. For this example I'll say 20 results are allowed per page.
The first page of results would be returned by then performing a select with inner joins to the 4 permanent tables and this time including the temp table in the inner joins (on the Id column) to return the first 20 rows of data required from the 4 permanent tables based on a WHERE clause on the rowId of the temp table.
Then I was going to have ALL the unique identifier Id's from the temp table concatenated into a string separated with a delimiter char and inserted into a "SearchResults" table with a unique identifier for the search itself as well as an expiry datetime. This would be used later to "continue" the existing search.
An output parameter is also included in the query to return the total number of results found for the search. This would be used to determine how many pages of results should be offered to be selected by the user (number of results divided by 20 results per page = number of pages)
Then if the user continues the search to get the next "page" of results ( or previous page, or a target page number, first or last page, etc. ) another stored procedure is called accepting a unique identifier parameter as the search id and int parameter indicating which page of results to retrieve. Within the stored procedure, it would get the concatenated string of result ids with the matching search id and, using a custom function, split the string on its delimiters, casting each string back to a unique identifierand inserting each into a temp table like in the 1st procedure. Then using the target page variable it would calculate which 20 results should be gathered and returned.
I had figured doing this would be less overhead than having the user perform the whole search of the 1st procedure every time they gather a new page of results, but after looking at the custom function I would need to do this, I'm thinking it might actually be less overhead to just perform the 1st procedure each time a page of results is required, rather than storing the concatenated string and using a custom function, etc.
One alternative would be to return the concatenated result id's string, do the split and cast into an array from within the app and do a foreach from within the app to get each result required 1 at a time but 21 round trips to the database per page of results is absolutely HORRIBLE practice! lol
Am I right in assuming that the function would create just as much, if not more overhead than if i was to omitt the "part 2" of the 1st idea?
I should also mention that each search shouldnt contain more than a few thousand results but the app will have a fairly large amount of concurrent users (40'000 + or more at peak times)
Ideas or suggestions are welcome
ammasbhatti
Member
124 Points
22 Posts
Re: Which solution would be better for performance?
Apr 26, 2012 08:41 PM|LINK
Hi,
You are moving in right direction, Really this solution is useful for better performance, I also had a same situation before couple of years ago, I did it in the same way which you are thinking to do and got a higher performance.
I was dealing a database having millions of products and products specs and there is a multiple searchs option for user to search products by entering some criteria. What I did in this situation is pass the search criteria to stored procedures, retuns a comma separated Ids of products which come to search criteria as output parameter, and with a result set holding record fors 1st page depending upon page size.
Then I saved this search criteria in session also save current page number and total number of products. So, when user clicks on the next page result I was just reteriving all product ids stored in session of particular user and getting all the product Id's required for requested page, then pass these Ids to stored procedure and this time stored procedure returns only products whose Ids passed from application.
I was refreshing all these Id's on search criteria changed and storing the newly Id's in session for new criteria.
Only thing which was little bit frustating is when user changed search criteria or request for the first time then stored procdues was taking sometime to execute but in navigation between pages was like as you are view data from cache.
Surely I'll recommend you this approach, I implement this while doing search optimization and win my company trusts.
magicmike201...
Contributor
2021 Points
481 Posts
Re: Which solution would be better for performance?
Apr 26, 2012 11:04 PM|LINK
Yes thats almost exactly the way I was thinking of doing it. I'm forcing cookie-based sessions though to free up extra memory so I would be storing search related items in a dedicated cookie. The implementation I was considering was storing current results page number, results per page and a byte value indicating how the results have been ordered (by date posted ascending, etc) but rather than storing the string of id's in the cookie, I was going to store an id representing what row the search result id's are stored in a results table in the database, then the id would be used in the database to retrieve the string, a function would be used DB-side to split the string and insert into a temp table, get the requested set of results, etc.
After you mentioned it though, I am considering storing the concatenated string client-side in the cookie.
One question though, how did you handle the string of id's in your app? did you split the string, perform the logic in the app and get each new result set one at a time? Or did you pass back to the database a concatenated string and deal with splitting the string in the database?
ammasbhatti
Member
124 Points
22 Posts
Re: Which solution would be better for performance?
Apr 27, 2012 01:19 PM|LINK
What I did to deal with string of Id's is as follows.
1-Get back string from user session. (Suppose this string have 100 comma separated product ids).
2-Check the requested page number and page size.
3-Fetch product Ids from string depending on page number and page size (Suppose if user request for page 2 and page size is 20 then fetch product ids from product ids string after spliting it from index 20 to 39)
4-Pass on these Id in comma separated string to your stored procedure. (Remember: Your stored procedure will return two tables if your productIds "comma separated string" parameter is null, 1-table for first 20 products 2-table holding productIds which you need to kept in session in comma separated string. In else case if productIds "comma separated string" parameter isnot null then your procedure returns one table holding 20 products which Ids are passed to stored procedure in parameter productIds.)
5-Now your stored procedure splits these ids and return all the products having same Ids.
(Note: These is just because once user sets a search criteria and now he is moving forward and back into search result pages, but be carefull as when as user changes his search criteria then you need to set product Ids =String.Empty and then all query your stored procedure with empty string or null string of productIds, so that procedure re run logic and return you the table of first 20 products a table holding next productsIds.)
Now according to your question, yes I split string, perform logic of paging to fetch next product ids according to page size and then pass on these 20 ids to stored procedure to get back results. But at this time stored procedure will return only product result set it will not return me result set for product ids, because now I know that user want to see next 20 products.
I hope i answer your question. Is it? If you have still any confusion, then please visit http://www.friferie.dk try to use left search box on home page, I implemented this logic here and this is dealing with lot of complex logic I ever seen in my life, any how after using this please put your question, surely I ll answer it.
Have fun
.
magicmike201...
Contributor
2021 Points
481 Posts
Re: Which solution would be better for performance?
Apr 27, 2012 03:28 PM|LINK
Excellent solution! I guess I was on the right track, was just overlooking a few peices of the puzzle.
I've had to modify my execution plan after re-analyzing my app's needs and reconsidering some possible situations.
The hosting environment I will be using at launch time will have a fairly low amount of RAM to start which is, needless to say, very frustrating lol. If every stage of development isn't carefully planned, server resource consumption could easily get out of hand. But on a positive note, it has been an incredible learning experience and really helped me develop some very good habits that will stick with me for the rest of my career. I've come to realize optimization is a very delicate art lol.
I've marked your answer as the solution because clearly it is a great solution. But I would love to hear your thoughts on the changes I have made to my approach and why.
Since I am using strict cookie based sessions and must heavily limit the use of in-proc data (even viewstate is being minimized heavily, except where necessary) I had planned on attempting to store the string containing all id's in a session cookie. However, after doing some calculations, I realized there will be instances where the string will require more space than a cookie will reliably allow. So I decided back to the database with the string. I had hoped to be able to use a UDF to split the string sql server side but after doing some heavy research on numerous methods of doing so, I found that performing the string split in sql server would be a huge performance hit because of the possible number of ids unless I went with a SQL CLR function, which isn't an option in my case.
But I found something I had overlooked. Since I am using SQL Server 2008, I can pass table variables from .NET directly to SQL with ADO.NET! This still makes the full procedure use 2 trips to the database, but it allows me to perform the result id processing logic within the app and gathering the rows from the database would be as simple as using a join on the table variable input param containing the desired results ids and the database tables.
Even though it would be using 2 round trips, does this still sound like a reasonable plan?
ammasbhatti
Member
124 Points
22 Posts
Re: Which solution would be better for performance?
Apr 27, 2012 03:44 PM|LINK
Excellent Man!
I think with Sql server 2008 this will be best solution for your problem, and you are right that spliting a string in UDF is very expensive in SQL server, but in my case I was using SQL Server 2005 and CLR UDF as well, So it was not effecting performance by using CLR UDF for string spliting. But according to you this isn't in your case, So it will be better to use table type parameter in stored procedure.
I would like to appreciate your solution while working with SQL Server 2008 it is the best way to use less and gain more.