Oracle Text: a simple way to implement a scoring text search engine on Oracle DB

Oracle Text is an Oracle Database extension for building text query and document classification applications.
In this post I will describe the text query functionality.
My customer wanted search functionality over several database columns, with results ordered by relevance.
A LIKE clause lets you find rows that contain a word, but it doesn't tell you how relevant each match is.
For this purpose you can use the Oracle Text extension.

Oracle Text must be installed on your database. I won't explain how to do this here; for this task refer to the Oracle documentation: http://download.oracle.com/docs/cd/B19306_01/install.102/e10319/initmedia.htm

Once it is installed, to run queries on your text columns you must create an index. Oracle Text provides three types of index for your text/documents; I report their descriptions as they appear on the Oracle site:

CONTEXT:
Use this index to build a text retrieval application when your text consists of large coherent documents. You can index documents of different formats such as Microsoft Word, HTML, XML, or plain text.
You can customize your index in a variety of ways.
This index uses the CONTAINS operator.

CTXCAT:
Use this index type to improve mixed query performance. Suitable for querying small text fragments with structured criteria like dates, item names, and prices that are stored across columns.
This index uses the CATSEARCH operator.

CTXRULE:
Use to build a document classification application. You create this index on a table of queries, where each query has a classification.
Single documents (plain text, HTML, or XML) can be classified by using the MATCHES operator.
This index uses the MATCHES operator.
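For comparison, this is roughly how the other two index types are created and queried. It is a minimal sketch, assuming a hypothetical PRODUCTS table with a TITLE column and a hypothetical CLASSIFICATION_QUERIES table with CATEGORY and QUERY_STRING columns. The third CATSEARCH argument is the structured part of the query; it is passed as NULL here to keep the sketch minimal, since filtering on columns such as a price needs extra index setup not covered in this post.

```sql
-- CTXCAT: hypothetical table products(id, title, price)
CREATE INDEX products_title_idx ON products(title)
  INDEXTYPE IS CTXSYS.CTXCAT;

-- CATSEARCH(column, text_query, structured_query); NULL = no structured part
SELECT id, title
FROM products
WHERE CATSEARCH(title, 'digital camera', NULL) > 0;

-- CTXRULE: hypothetical table classification_queries(category, query_string),
-- where each row stores a query and the category it stands for
CREATE INDEX class_queries_idx ON classification_queries(query_string)
  INDEXTYPE IS CTXSYS.CTXRULE;

-- Which stored queries (and therefore which categories) match this document?
SELECT category
FROM classification_queries
WHERE MATCHES(query_string, 'Oracle Text makes full-text search easy') > 0;
```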

To use the SCORE function, which returns each result's relevance, you have to use a CONTEXT index, which is a kind of inverted index.

To create a CONTEXT index, run this statement on your database:

CREATE INDEX myindex ON mytable(mycolumn) INDEXTYPE IS CTXSYS.CONTEXT;

where myindex is the index name, mytable is the table name and mycolumn… (did you guess it?)
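Applied to the ENGINEERS table used in the examples of this post, the statement could look like the sketch below (the index name engineers_desc_idx is hypothetical). One detail worth knowing: by default a CONTEXT index is not updated as part of your DML, so rows inserted or updated after the index is built are not found by CONTAINS until the index is synchronized. Since Oracle 10g you can ask for synchronization at commit time, or you can synchronize manually (after a load, or from a scheduled job) with CTX_DDL.SYNC_INDEX. Calling CTX_DDL requires execute privilege on it, normally granted through the CTXAPP role.

```sql
-- Option 1: synchronize the index automatically at every COMMIT (10g and later)
CREATE INDEX engineers_desc_idx ON engineers(description)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('SYNC (ON COMMIT)');

-- Option 2: with a default CONTEXT index, process pending DML manually
BEGIN
  CTX_DDL.SYNC_INDEX('engineers_desc_idx');
END;
/
```

Sync on commit keeps results current at the cost of extra work on each commit; a periodic manual sync batches that work but leaves a window in which new rows are not searchable.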

Remember that if Oracle Text is not installed on your database, the CTXSYS user doesn't exist (and neither does the CTXSYS.CONTEXT index type).
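A quick way to check whether Oracle Text is available is to query the data dictionary. This sketch assumes you can read the DBA_ views (otherwise ask your DBA); in DBA_REGISTRY, Oracle Text is registered under the component id CONTEXT.

```sql
-- Is the Oracle Text component installed, and is it VALID?
SELECT comp_id, comp_name, version, status
FROM dba_registry
WHERE comp_id = 'CONTEXT';

-- Does the CTXSYS schema exist? (ALL_USERS is readable without DBA privileges)
SELECT username
FROM all_users
WHERE username = 'CTXSYS';
```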

Now everything is ready to run your query, using the SCORE function and the CONTAINS operator.

For example, suppose you have a table ENGINEERS with a text column DESCRIPTION (indexed with a CONTEXT index), and you want to find all engineers whose description contains the word JAVA:

SELECT NAME, SURNAME, CODE, SCORE(1) AS RATING
FROM ENGINEERS
WHERE CONTAINS(DESCRIPTION,'JAVA',1) > 0 
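To try the query above end to end, here is a minimal scratch script. The column types, sizes and sample rows are assumptions made up for illustration. Because the index is built after the rows are committed, those rows are indexed at creation time; later DML would require the index to be synchronized.

```sql
-- Hypothetical schema for the ENGINEERS example
CREATE TABLE engineers (
  code        NUMBER PRIMARY KEY,
  name        VARCHAR2(50),
  surname     VARCHAR2(50),
  description VARCHAR2(4000)
);

INSERT INTO engineers VALUES (1, 'Anna', 'Rossi',
  'Java developer: Java EE, Spring and some Java ME');
INSERT INTO engineers VALUES (2, 'Marco', 'Bianchi',
  'PL/SQL developer with some Java experience');
INSERT INTO engineers VALUES (3, 'Luca', 'Verdi',
  'C# and .NET developer');
COMMIT;

-- Build the CONTEXT index after loading, so the rows above are indexed
CREATE INDEX engineers_desc_idx ON engineers(description)
  INDEXTYPE IS CTXSYS.CONTEXT;

-- Same query as above, with the most relevant rows first
SELECT name, surname, code, SCORE(1) AS rating
FROM engineers
WHERE CONTAINS(description, 'JAVA', 1) > 0
ORDER BY rating DESC;
```

Row 3 does not mention Java, so it should not be returned; row 1 repeats the word, so you would expect it to rate higher than row 2.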

Let's see what this means.
CONTAINS is an operator that takes as parameters:
1) the name of the text column (or document column, etc.)
2) the search terms
3) a numeric label [optional, default is 1]

and returns the relevance of the match, a number between 0 and 100. A relevance of 0 means the row does not match the search terms (for example, the word is not present in the text column).

SCORE is a function that takes a CONTAINS label as its parameter and returns the relevance computed by the CONTAINS call with that label; this way you can include the score in your results.
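Because CONTAINS returns the relevance itself, the score can be used directly in two places: in ORDER BY, so the best matches come first, and in the WHERE condition as a minimum-relevance threshold. A sketch on the same ENGINEERS table; the threshold 20 is an arbitrary value chosen for illustration.

```sql
-- Only rows with relevance above 20, best matches first.
-- SCORE(1) refers to the CONTAINS call labelled 1.
SELECT name, surname, code, SCORE(1) AS rating
FROM engineers
WHERE CONTAINS(description, 'JAVA', 1) > 20
ORDER BY SCORE(1) DESC;
```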

For example, suppose the ENGINEERS table also has a CURRICULUM text column (with its own CONTEXT index). You want to find engineers with JAVA in their description OR PROJECT MANAGER in their curriculum, ordered by the sum of the two relevances:

SELECT NAME, SURNAME, CODE, SCORE(1) + SCORE(2) AS RATING
FROM ENGINEERS
WHERE CONTAINS(DESCRIPTION, 'JAVA', 1) > 0 OR CONTAINS(CURRICULUM, 'PROJECT MANAGER', 2) > 0
ORDER BY RATING DESC
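Since RATING is an ordinary SQL expression, nothing forces both columns to count equally. A sketch, assuming you want a match in DESCRIPTION to weigh twice as much as one in CURRICULUM (the factor 2 is an arbitrary choice):

```sql
-- Weighted relevance: a DESCRIPTION match counts double
SELECT name, surname, code,
       2 * SCORE(1) + SCORE(2) AS rating
FROM engineers
WHERE CONTAINS(description, 'JAVA', 1) > 0
   OR CONTAINS(curriculum, 'PROJECT MANAGER', 2) > 0
ORDER BY rating DESC;
```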

By default CONTAINS ignores case and matches whole words, so 'JAVA' does not match JAVASCRIPT. As with LIKE, you can add % as a wildcard.
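A few variations of the search string on the same DESCRIPTION column. In a CONTAINS query, % matches any sequence of characters and _ a single character; AND, OR and NOT combine terms; a multi-word string with no operators, such as 'PROJECT MANAGER', is searched as a phrase. Unlike LIKE, the wildcards apply to the indexed words, not to the whole column value.

```sql
-- Words starting with JAV (JAVA, JAVASCRIPT, ...)
SELECT name, surname, SCORE(1) AS rating
FROM engineers
WHERE CONTAINS(description, 'JAV%', 1) > 0
ORDER BY rating DESC;

-- Both words must appear, in any position
SELECT name, surname, SCORE(1) AS rating
FROM engineers
WHERE CONTAINS(description, 'JAVA AND ORACLE', 1) > 0
ORDER BY rating DESC;

-- JAVA but not PHP
SELECT name, surname, SCORE(1) AS rating
FROM engineers
WHERE CONTAINS(description, 'JAVA NOT PHP', 1) > 0
ORDER BY rating DESC;
```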