?

Log in

No account? Create an account
 
 
07 May 2011 @ 10:00 pm
Scalable Full Text search with location  
I have a question for backend developers and admins.

Say you search for resumes on PostJobFree
If you search for some common term, such as software developer - the results are coming back pretty fast.
However if you search for something obscure - such search query may take several seconds (or more) to execute.

Currently such search is handled by Microsoft SQL Server Full-Text (see example of such query below).
If query can quickly find matching resumes in recent records - such query is fast.
If query has to go through all 150K+ resume records in order to find matches - it takes noticeably longer.

My question to database admins:
How do I make that query work faster?
Should I use separate server with plenty of memory (8 GB+) and build Full-Text Catalog for Resume table there?

My question to developers:
Are there other ways to do fast full-text search with location?
May be some search servers or search components?

Here's the example of SQL queries behind resume search query:
declare @upTo int;
set @upTo = @startRowIndex * @pageSize;

with p as (
select top (@upTo)
row_number() over (order by PostedDate desc) as Row,
JobId,PostedDate,JobTitle,JobDescription,CompanyName,Country,State,City,Latitude,Longitude,EmailAlias,UrlDescription
from JobPost with (nolock)
where CONTAINS((JobDescription, JobTitle, CompanyName, City, PostalCode),'SOFTWARE&DEVELOPER')

and (Latitude between 40.352045 and 41.076660) and (Longitude between -74.482126 and -73.529820))
select JobId,PostedDate,JobTitle,JobDescription,CompanyName,Country,State,City,Latitude,Longitude,EmailAlias,UrlDescription
from p where Row between @startRowIndex and @startRowIndex+@pageSize-1

option (force order);

select @totalCount = count(1) from JobPost with (nolock)

where CONTAINS((JobDescription, JobTitle, CompanyName, City, PostalCode),'SOFTWARE&DEVELOPER')

and (Latitude between 40.352045 and 41.076660) and (Longitude between -74.482126 and -73.529820)
option (force order);


Upadate:
Found interesting alternatives to SQL Server full-text search (FTS): Lucene/SOLR and Sphinx.
http://beerpla.net/2009/09/03/comparison-between-solr-and-sphinx-search-servers-solr-vs-sphinx-fight/
 
 
 
 
 
Dmitry Skavishskavish on May 9th, 2011 04:42 pm (UTC)
are you sure you need fulltext search? it seems you just need to search by all words. and yes as far as I know people use lucene for that.
Dennis Gorelikdennisgorelik on May 9th, 2011 04:47 pm (UTC)
What's the difference between full-text search and "search by all words"?
Dmitry Skavishskavish on May 9th, 2011 05:02 pm (UTC)
sorry, somehow I thought you need "substring search"
Dennis Gorelikdennisgorelik on May 9th, 2011 05:24 pm (UTC)
No - only full words search -- like Google Search (but with location).
The closest example: www.simplyhired.com
Dennis Gorelikdennisgorelik on May 9th, 2011 04:50 pm (UTC)
I have some doubts about Lucene:
1) Lucene is Java-based (my software development experience is mostly with .NET)
That might be not a problem if I use Lucene as a black box.
2) Lucene is not integrated with database.
So if I need to search for all words + filter by latitude/longitude range of resume - how would I do that with combination of Lucene and SQL Server?
Get list of Lucene text matches first and then filter that list by latitude/longitude range?
Dmitry Skavishskavish on May 9th, 2011 05:03 pm (UTC)
I don't know much about lucene, I just know that people use it for that.
Dennis Gorelikdennisgorelik on May 9th, 2011 05:27 pm (UTC)
I noticed that some people who used Lucene in the past don't use it anymore.

Traffic stats from Compete.com confirm that:
http://siteanalytics.compete.com/lucene.apache.org/

:-(