Monday, September 21, 2009

Text search in SQL query

Oracle TEXT is designed for such things. You can create a multi_column_datastore and use it as a parameter when creating a context index, then you can search quickly for various words in any of the columns in the datastore in various ways. The following is a simple example that searches for any rows that have both CPM and EK12 but not DXB plus the date condition. There are many other conditions that can be specified.

SCOTT@orcl_11g> CREATE TABLE test_tab
2 (id NUMBER,
3 rcvd_dt DATE,
4 msgtext VARCHAR2(15),
5 toptext VARCHAR2(15),
6 text_cols VARCHAR2(1))
7 /

Table created.

SCOTT@orcl_11g> INSERT ALL
2 INTO test_tab VALUES (1, SYSDATE - 1, 'CPM EK12', 'DXB', NULL)
3 INTO test_tab VALUES (2, SYSDATE - 1, 'CPM EK12', 'WHATEVER', NULL)
4 INTO test_tab VALUES (3, SYSDATE - 1, 'ABC DEF', 'WHATEVER', NULL)
5 INTO test_tab VALUES (4, SYSDATE, 'CPM EK12', 'WHATEVER', NULL)
6 SELECT * FROM DUAL
7 /

4 rows created.

SCOTT@orcl_11g> BEGIN
2 CTX_DDL.CREATE_PREFERENCE ('test_multi', 'MULTI_COLUMN_DATASTORE');
3 CTX_DDL.SET_ATTRIBUTE ('test_multi', 'COLUMNS', 'msgtext, toptext');
4 END;
5 /

PL/SQL procedure successfully completed.

SCOTT@orcl_11g> CREATE INDEX test_text_idx ON test_tab (text_cols)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS ('DATASTORE test_multi')
4 /

Index created.

SCOTT@orcl_11g> SELECT *
2 FROM test_tab
3 WHERE CONTAINS (text_cols, 'CPM AND EK12 NOT DXB') > 0
4 AND TRUNC (RCVD_DT) = TRUNC (SYSDATE-1)
5 /

ID RCVD_DT MSGTEXT TOPTEXT T
---------- --------- --------------- --------------- -
2 29-NOV-08 CPM EK12 WHATEVER

SCOTT@orcl_11g>

No comments: