Doing Relevance Ranked Full-Text Searches In MySQL
I’m going out on a limb to say MySQL’s full-text indexing and searching features are underused. They appeared in MySQL 3.23.23 (most people are using 4.x, and 5 is in development), but they’ve been news to most of the people I know.
Here’s the deal: the MATCH() function can search a full-text index for a string of text (one or more words) and return relevance-ranked results. It’s at the core of the list of related links at the bottom of every post here.
For that query, I put all the tag names into a single variable that might look like this:
$keywords = "mysql database php select full-text search full-text searching docs documentation";
Then I do a select that looks something like this:
SELECT * FROM wp_posts WHERE MATCH(post_title,post_content) AGAINST('$keywords');
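That select only works if the columns named in MATCH() are covered by a FULLTEXT index with the same column list. Here’s a minimal sketch of the setup and of a variation that also returns the relevance score. It assumes a stock wp_posts table; the index name ft_title_content, the shortened keyword string, and the LIMIT of 5 are made up for illustration and aren’t the exact query used on this site.

```sql
-- One-time setup: index the same column list that MATCH() will name.
-- ft_title_content is a hypothetical index name.
ALTER TABLE wp_posts ADD FULLTEXT ft_title_content (post_title, post_content);

-- Natural-language search that also exposes the relevance score.
SELECT ID, post_title,
       MATCH(post_title, post_content)
         AGAINST('mysql database full-text search') AS score
FROM wp_posts
WHERE MATCH(post_title, post_content)
      AGAINST('mysql database full-text search')
ORDER BY score DESC
LIMIT 5;
```

Repeating the MATCH() expression in the select list and the WHERE clause looks wasteful, but the docs say the optimizer notices the two are identical and runs the full-text search only once. Keep in mind that, with default settings, words shorter than four characters (like "php") are ignored.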
The docs give a lot more detail, including how to do boolean searches.
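As a rough idea of what a boolean search looks like, here’s a sketch against the same columns. The search words are invented for the example, and boolean mode needs MySQL 4.0.1 or later.

```sql
-- Boolean mode: + requires a word, - excludes it,
-- and double quotes match an exact phrase.
SELECT ID, post_title
FROM wp_posts
WHERE MATCH(post_title, post_content)
      AGAINST('+mysql -oracle "relevance ranking"' IN BOOLEAN MODE);
```

Unlike the natural-language search, a boolean search doesn’t automatically sort its results by relevance.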
tags: boolean search, boolean searches, boolean searching, database, db, docs, documentation, full text, full text index, full text search, full text searching, fulltext, fulltext search, keywords, match(), mysql, rank, relevance, relevance rank, relevance ranked, relevance ranking, search, search full text
8 Comments
[...] Why? Because MySQL 3.x doesn’t support query caching, boolean full-text searching, or complex subqueries. [...]
[...] MySQL provides two types of fulltext searches - boolean and natural language. I’m going to focus on the natural language search because it is more mathematically intense. The underlying concept behind the method used in MySQL is that each term in each document is assigned a specific weight which is used to decide a query’s “distance” or “score” with respect to that document. The weights are assigned such that the weight is increased if the term occurs frequently in the document, but decreased if the term occurs frequently among all documents. For a description of how the weights are computed, check out the MySQL documentation. For the curious reader, this article also explains the computation of word-document weights. There is also a slew of articles on using fulltext search in practice. [...]
[...] And I’m fully confident that when I put our entire catalog into WPopac, all 330,000 bib records (resulting in about 6.2 million atomic records), performance will still be up to the task. And my math suggests everything should be ducky on a relatively budget server up beyond about 1 million bib records, but what happens for libraries that have more than that, say, perhaps 6 to 8 million bib records (again, 110 to 150 million atomic records; again, all full-text indexed in MySQL)? [...]
Thanks Maison, I was looking for this. I wanted to sort the results of a SELECT query by relevance, but I was stuck with ‘%LIKE%’. You have helped me out.
Thanks a lot dear!