SQL Server full-text search for phrase containing a hyphen doesn't return expected results


We have an application that uses a SQL Server 2008 database and full-text search. I'm trying to understand why the following searches behave differently:

First, a phrase containing a hyphenated word, like this:

contains(column_name, '"one two-three-four five"')

And second, an identical phrase, where the hyphens are replaced by spaces:

contains(column_name, '"one two three four five"')

The full-text index uses the ENGLISH (1033) locale, and the default system stoplist.

From my observations of other full-text searches containing hyphenated words, the first one should allow for matches on either one two three four five or one twothreefour five. Instead, it only matches one twothreefour five (and not one two-three-four five).

Test Case

Setup:

create table ftTest 
(
    Id int identity(1,1) not null, 
    Value nvarchar(100) not null, 
    constraint PK_ftTest primary key (Id)
);

insert ftTest (Value) values ('one two-three-four five');
insert ftTest (Value) values ('one twothreefour five');

create fulltext catalog ftTest_catalog;
create fulltext index on ftTest (Value language 1033)
    key index PK_ftTest on ftTest_catalog;
GO

Queries:

--returns one match
select * from ftTest where contains(Value, '"one two-three-four five"');

--returns two matches
select * from ftTest where contains(Value, '"one two three four five"');
select * from ftTest where contains(Value, 'one and "two-three-four five"');
select * from ftTest where contains(Value, '"one two-three-four" and five');
GO

Cleanup:

drop fulltext index on ftTest;
drop fulltext catalog ftTest_catalog;
drop table ftTest;

3 Answers

#1


8  

http://support.microsoft.com/default.aspx?scid=kb;en-us;200043

"Where non-alphanumeric character must be used in the search critera (primarily the dash '-' character), use the Transact-SQL LIKE clause instead of the FULLTEXT or CONTAINS predicates."

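As a rough sketch of that advice, assuming the ftTest table from the question's test case, the hyphenated phrase can be matched literally with LIKE. The second query is one possible variation, not something the KB article prescribes: it lets the full-text index narrow the candidates first and then applies LIKE as an exact filter.

```sql
-- Literal substring match: hyphens are compared as ordinary characters,
-- so only the row that actually contains the hyphenated phrase qualifies.
select * from ftTest
where Value like '%one two-three-four five%';

-- Variation (an assumption, not from the KB article): use the full-text
-- index to find candidate rows, then filter them with LIKE.
select * from ftTest
where contains(Value, '"one two three four five"')
  and Value like '%one two-three-four five%';
```

A pattern with a leading % cannot use an ordinary index seek, so the first form scans every row. The combined form may be preferable on large tables.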

#2


5  

In cases like these, where you can't anticipate the behavior of the word-breaker, it's always a good idea to run sys.dm_fts_parser on your strings to get an idea of how the words are going to be split and stored in the internal index.

For instance, running sys.dm_fts_parser on '"one two-three-four five"' results in the following:

select * from sys.dm_fts_parser('"one two-three-four five"', 1033, NULL, 0)
--edited--
group_id  phrase_id  occurrence  special_term  display_term
1         0          1           Exact Match   one
1         0          2           Exact Match   two-three-four
1         0          2           Exact Match   two
1         0          3           Exact Match   three
1         0          4           Exact Match   four
1         0          5           Exact Match   five

As you can see from the returned results, the word-breaker parses the string and outputs six forms which may explain the results you see when running your CONTAINS query.
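To compare how the stored values and the search phrases are tokenized, the same function can be run on each variant from the question. This is only a sketch: it assumes the caller is permitted to run sys.dm_fts_parser (it requires sysadmin membership), and it passes 0 as the stoplist argument to use the system stoplist, as the question's index does, whereas the call shown earlier in this answer passes NULL for no stoplist.

```sql
-- The hyphenated phrase from the question.
select occurrence, special_term, display_term
from sys.dm_fts_parser('"one two-three-four five"', 1033, 0, 0);

-- The same phrase with the hyphens replaced by spaces.
select occurrence, special_term, display_term
from sys.dm_fts_parser('"one two three four five"', 1033, 0, 0);

-- The second stored value, where the hyphenated part is a single word.
select occurrence, special_term, display_term
from sys.dm_fts_parser('"one twothreefour five"', 1033, 0, 0);
```

Comparing the occurrence values across the three result sets shows which terms the word-breaker places at the same position, which is what a phrase match depends on.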

#3


1  

A full-text search considers a word to be a string of characters without spaces or punctuation. The occurrence of a non-alphanumeric character can "break" a word during a search. Because the SQL Server full-text search is a word-based engine, punctuation generally is not considered and is ignored when searching the index. Therefore, a CONTAINS clause like 'CONTAINS(testing, "computer-failure")' would match a row with the value, "The failure to find my computer would be expensive.".

For the reasoning behind this, see: https://support.microsoft.com/en-us/kb/200043

