Using Lucene

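Lucene is a full-text search library. For its principles and background, see "Lucene简介" (Introduction to Lucene). This article covers how to use it.

Indexing and searching are separate processes in Lucene, so we cover them one at a time.

I. Indexing

Lucene indexes a collection of documents. Each record is a document; a document contains multiple fields, and each field holds one kind of data. In relational-database terms, a document corresponds to a row, a field to a column, and the field's content to the value in that column. More precisely, in Lucene a document is a sequence of fields, a field is a sequence of terms, and a term, which can be thought of as a word or other unit of indexing, is a sequence of bytes.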
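To make the document/field/term model concrete, here is a minimal sketch of an inverted index in plain Java. It is not Lucene code and does not reflect how Lucene stores data internally. The class `ToyInvertedIndex` and its methods are hypothetical names invented for this illustration, and terms are produced by naive lowercasing and whitespace splitting rather than by a real analyzer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model, not Lucene: an inverted index from "field:term" to the ids of matching documents.
class ToyInvertedIndex {
  // Postings: for each field/term pair, the sorted ids of the documents that contain it.
  private final Map<String, Set<Integer>> postings = new HashMap<>();
  private int nextDocId = 0;

  // Adds one document (field name -> field text) and returns the id assigned to it.
  int addDocument(Map<String, String> fields) {
    int docId = nextDocId++;
    for (Map.Entry<String, String> field : fields.entrySet()) {
      // Naive analysis: lowercase the text and split it on whitespace to obtain terms.
      for (String term : field.getValue().toLowerCase(Locale.ROOT).split("\\s+")) {
        if (!term.isEmpty()) {
          postings.computeIfAbsent(field.getKey() + ":" + term, k -> new TreeSet<>()).add(docId);
        }
      }
    }
    return docId;
  }

  // Looks up the documents whose given field contains exactly this term.
  Set<Integer> search(String field, String term) {
    return postings.getOrDefault(field + ":" + term, Collections.emptySet());
  }

  // Convenience helper for building a two-field document.
  static Map<String, String> person(String firstName, String lastName) {
    Map<String, String> fields = new LinkedHashMap<>();
    fields.put("firstName", firstName);
    fields.put("lastName", lastName);
    return fields;
  }

  public static void main(String[] args) {
    ToyInvertedIndex index = new ToyInvertedIndex();
    index.addDocument(person("Lokesh", "Gupta"));   // doc 0
    index.addDocument(person("Brian", "Schultz"));  // doc 1
    index.addDocument(person("Jack", "Brown"));     // doc 2

    System.out.println(index.search("firstName", "brian")); // [1]
    System.out.println(index.search("lastName", "brown"));  // [2]
    System.out.println(index.search("firstName", "brown")); // [] (the term is in another field)
  }
}
```

The point of the sketch is that a search never scans documents: it looks up a term, scoped to a field, and reads off the matching document ids.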

A simple Lucene indexing program looks like this:

package dufei.lucene.main;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Index-building example based on Lucene 6.1.0
 * Created by Du Fei on 2018/11/14.
 */
public class ASimpleLucene {

  public static void main(String[] args) throws IOException {

    // Location where the index is stored
    String indexDir = "c:/temp/lucene6index";

    // step 1: create the index writer
    IndexWriter writer = createWriter(indexDir);

    // step 2: create the data to index
    List<Document> documents = new ArrayList<>();

    Document document1 = createDocument(1, "Lokesh", "Gupta", "howtodoinjava.com");
    documents.add(document1);

    Document document2 = createDocument(2, "Brian", "Schultz", "brainschultz.com");
    documents.add(document2);

    Document document3 = createDocument(7, "Jack", "Brown", "jackbrown.com");
    documents.add(document3);

    // step 3: add the data to the index
    writer.addDocuments(documents);

    // step 4: commit the index
    writer.commit();

    // step 5: close the writer
    writer.close();

  }

  // Create the object that writes the index
  private static IndexWriter createWriter(String indexDir) throws IOException {

    // Open the index directory; there are several ways to do this, this is the most basic one
    FSDirectory dir = FSDirectory.open(Paths.get(indexDir));

    // IndexWriterConfig holds all indexing configuration. Every index needs an analyzer;
    // here we use the simplest standard one, StandardAnalyzer
    IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
    config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);    // open mode: create a new index or append to an existing one
    config.setUseCompoundFile(false);                         // whether to pack each segment's files into a compound file
    return new IndexWriter(dir, config);
  }

  // Helper that creates a document
  private static Document createDocument(Integer id, String firstName, String lastName, String website) {
    Document document = new Document();
    document.add(new StringField("id", id.toString(), Field.Store.YES));
    document.add(new TextField("firstName", firstName, Field.Store.YES));
    document.add(new TextField("lastName", lastName, Field.Store.YES));
    document.add(new TextField("website", website, Field.Store.YES));
    return document;
  }

}

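The main steps are:

Create the index writer
Create the data objects
Add the data to be indexed
Commit the index
Close the index writer

A few things to note. Each IndexWriter acquires a lock file, write.lock, in the index directory when it is created. While the lock is held, no other writer can write to or modify the index; once the writer is closed and the lock is released, the index can be modified again.

After the code above has run, a set of files appears under c:/temp/lucene6index.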

The sections below describe each class in detail.

1. Create the index directory

FSDirectory dir = FSDirectory.open(Paths.get(indexDir));

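Lucene implements its own low-level storage layer, so all index reads and writes go through its own I/O API. The line above opens an FSDirectory, a directory in the local file system, to hold the index. FSDirectory is an abstract base class with three subclasses: SimpleFSDirectory, NIOFSDirectory, and MMapDirectory.

The three implementations behave as follows:

SimpleFSDirectory: implemented directly with Files.newByteChannel. Performance is poor, and concurrent access to the same file from multiple threads becomes a bottleneck.
NIOFSDirectory: implemented with FileChannel from the java.nio package. A bug on Windows affects this class, so use it with caution on that platform; it is fine on other platforms.
MMapDirectory: based on memory-mapped I/O. If enough memory is available, this is the one to choose.

With FSDirectory.open(), as in the code above, Lucene inspects the environment and picks an implementation automatically.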

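2. Choose how the data is analyzed and add that to the index configuration

// IndexWriterConfig holds all indexing configuration. Every index needs an analyzer;
// here we use the simplest standard one, StandardAnalyzer
IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());

// Open mode: create a new index or append to an existing one
config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

// Whether to pack each segment's files into a compound file
config.setUseCompoundFile(false);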

IndexWriter writer = new IndexWriter(dir, config);

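Note that Lucene is built for full-text search, and a key part of full-text search is analyzing text before indexing it. This analysis is done by an Analyzer, whose job is to turn text into a stream of tokens (a TokenStream), that is, a sequence of words. Different kinds of input text call for different analyzers. Here we use StandardAnalyzer, which splits words according to the Unicode Text Segmentation algorithm. For non-text data the choice of analyzer does not matter.

Next comes the IndexWriterConfig object, which holds all configuration for the indexing operation, including the buffer size, the number of buffered documents, the open mode, and whether compound files are used. Two settings are changed here.

setOpenMode sets how the index is opened. There are three modes:

CREATE: creates a new index. If one already exists, its contents are removed, but the index generation number keeps increasing.
APPEND: opens an existing index and appends to it.
CREATE_OR_APPEND: creates the index if it does not exist; otherwise opens the existing one and appends to it.

setUseCompoundFile controls whether the index uses the compound file format. This is a packing format, not compression. While indexing, Lucene produces many different files for each segment, holding postings, stored data, term vectors, and so on. For a small index you can enable the compound format, which packs most of each segment's files into a pair of files: a .cfs file and a .cfe file.

With the directory and the configuration in place, the IndexWriter can be created from those two arguments.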

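3. Create the data

Lucene stores a collection of documents (Document). Each document contains several fields (Field), each field contains several terms (Term), and each term is a sequence of bytes. Creating the data to index therefore means creating Document objects and populating the fields of each one.

A Lucene field can be in several states:

indexed: the field is indexed and can be searched. Indexing is independent of storing: a field can be indexed without being stored, as the tests later in this article confirm.

stored: the field's value is stored, but the field is not necessarily indexed. A field that is stored but not indexed cannot be found by a search. This state suits IDs and other identifiers that never need to be searched but must be shown in the results.

A field that is neither indexed nor stored can be neither searched nor displayed. The individual field types are described below.

Let us look at an example first: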


private static Document createDocument(Integer id, String firstName, String lastName, String website) {
    Document document = new Document();
    document.add(new StringField("id", id.toString(), Field.Store.YES));
    document.add(new TextField("firstName", firstName, Field.Store.YES));
    document.add(new TextField("lastName", lastName, Field.Store.YES));
    document.add(new TextField("website", website, Field.Store.YES));
    return document;
  }

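Lucene fields fall into two broad categories: string fields and non-string fields. There are two string field types, StringField and TextField, which differ as follows:

StringField: a string field that is not tokenized or otherwise analyzed; the whole value is indexed as a single term. It suits URLs, identifiers, and similar values.

TextField: a text field whose content is tokenized, for example by the StandardAnalyzer above.

Non-string field types cover integers, floating-point numbers, and other common types: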
BinaryPoint
IntPoint
LongPoint
FloatPoint

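These field types are handled differently. String fields are constructed like this:

new TextField("lastName", lastName, Field.Store.YES);
new StringField("firstName", firstName, Field.Store.YES);

The first line creates a field named lastName whose content is the lastName argument. Field.Store.YES says that the field's value is stored. Testing shows that if the field is not stored (Field.Store.NO), the record can still be found by searching on it, but the field's value cannot be displayed; if it is stored, the value can be displayed. A field built with StringField or TextField is always indexed.

Now for adding an integer field: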

document.add(new IntPoint("age", age));
document.add(new StoredField("age", age));

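Integer fields are created differently from string fields. Most importantly, Lucene does not store integer values by default, and IntPoint offers no option for storing them; a separate StoredField is needed. With only the first line above, age can be searched but not displayed. Adding the second line makes it displayable.

Finally, to store a field without making it searchable, use: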

document.add(new StoredField("otherName", otherName));

A field added this way can be displayed but cannot be searched.
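The three combinations discussed in this section (indexed and stored, indexed only, stored only) can be pictured as two independent data structures. The following is a toy sketch in plain Java, not Lucene code: `ToyFieldStore` and its methods are hypothetical names made up for this illustration, and the `nickname` field is an invented example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model, not Lucene: "indexed" and "stored" are two independent structures.
class ToyFieldStore {
  // Search structure: "field:value" -> document ids (only indexed fields land here).
  private final Map<String, Set<Integer>> indexed = new HashMap<>();
  // Display structure: per document, field name -> original value (only stored fields land here).
  private final List<Map<String, String>> stored = new ArrayList<>();

  // Starts a new, empty document and returns its id.
  int newDocument() {
    stored.add(new HashMap<>());
    return stored.size() - 1;
  }

  void addField(int docId, String name, String value, boolean index, boolean store) {
    if (index) {
      indexed.computeIfAbsent(name + ":" + value, k -> new TreeSet<>()).add(docId);
    }
    if (store) {
      stored.get(docId).put(name, value);
    }
  }

  Set<Integer> search(String name, String value) {
    return indexed.getOrDefault(name + ":" + value, Collections.emptySet());
  }

  // Returns the stored value, or null if the field was not stored.
  String display(int docId, String name) {
    return stored.get(docId).get(name);
  }

  public static void main(String[] args) {
    ToyFieldStore store = new ToyFieldStore();
    int doc = store.newDocument();
    store.addField(doc, "firstName", "Brian", true, true);  // like StringField with Field.Store.YES
    store.addField(doc, "nickname", "bri", true, false);    // like StringField with Field.Store.NO
    store.addField(doc, "otherName", "B.", false, true);    // like StoredField

    System.out.println(store.search("nickname", "bri"));  // [0]: searchable
    System.out.println(store.display(doc, "nickname"));   // null: not displayable
    System.out.println(store.search("otherName", "B."));  // []: not searchable
    System.out.println(store.display(doc, "otherName"));  // B.: displayable
  }
}
```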

4. Build the index and close the writer


writer.addDocuments(documents);
writer.commit();
writer.close();

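Once the data has been created, it can be added and committed to build the index. While the writer is open it holds the write.lock file, so other programs cannot modify the index, although they can still query it; the index can be modified again after close(). Note that every commit writes new files to the index directory, distinguished by a number in the file name. For example, each commit writes a new segments_N file, where N increases with each commit.

II. Searching

Searching in Lucene involves four APIs:

IndexSearcher: the gateway to searching an index. Every search goes through an IndexSearcher instance.

Query: has a number of subclasses, one per query type. A Query instance is passed to the IndexSearcher.

QueryParser: parses human-entered query strings, such as the AND/OR syntax familiar from search engines, into Query objects.

TopDocs: holds the search results; all hits are read from it. (Early Lucene versions used a class named Hits for this; it has since been removed, and the code below uses TopDocs.)

2.1 General search process

This article uses two ways of building a query: QueryParser and TermQuery. QueryParser is meant for querying TextField content and must be paired with an analyzer (such as the StandardAnalyzer used earlier, or KeywordAnalyzer). TermQuery searches a field for an exact term and needs no analyzer. Both are created as follows: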


// Open the directory that contains the index
Directory dir = FSDirectory.open(Paths.get(INDEX_DIR));
IndexReader reader = DirectoryReader.open(dir);

// Create the searcher over the index
IndexSearcher searcher = new IndexSearcher(reader);

// Create a QueryParser with StandardAnalyzer; the first argument names the field to search, e.g. firstName
QueryParser query = new QueryParser("firstName", new StandardAnalyzer());

// Parse the query text; "brian" is the search term entered by the user
Query qr = query.parse("brian");

// --------- To use TermQuery instead, build the query like this:
// Query qr = new TermQuery(new Term("firstName", "brian"));

// Run the search; the second argument, 10, is the maximum number of hits to return
TopDocs foundDocs = searcher.search(qr, 10);

// Print the number of hits
System.out.println(foundDocs.totalHits);

// Print each result
for (ScoreDoc doc : foundDocs.scoreDocs) {
  Document result = searcher.doc(doc.doc);
  System.out.println(result.get("id") + "\t" + result.get("firstName"));
}

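2.2 Differences between searching StringField and TextField

As noted above, indexing requires an analyzer. The most common one is StandardAnalyzer, which tokenizes text, removes common English stop words, and converts all letters to lowercase. The analyzer is applied only to TextField content. When querying a TextField you can use QueryParser. Because QueryParser processes the user's query text, it needs an analyzer too, and that analyzer must be the same one used at index time, or the query will not find the expected results. For example, if the index was built with StandardAnalyzer, the terms were lowercased and the stop words removed. If the query then uses KeywordAnalyzer, the query text is not lowercased, so a search term containing uppercase letters finds nothing. Likewise, a TermQuery on a TextField finds nothing when it looks for a stop word such as "a" or "the", or for a term containing uppercase letters. Consider the following example: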

package dufei.lucene.main;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.analysis.core.StopAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.*;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.index.TermsEnum;

import java.io.File;
import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.List;

/**
 * a simple example of lucene
 * Created by Du Fei on 2018/11/5.
 */
public class FirstTest {

  private static final String INDEX_DIR = "c:/temp/lucene6index";

  public static void main(String[] args) throws Exception {
    IndexWriter writer = createWriter();
    List<Document> documents = new ArrayList<>();

    Document document1 = createDocument(1, "Lokesh", "Gupta", "Hi, I am from Anhui province", 10);
    documents.add(document1);

    Document document4 = createDocument(2, "Lokesh2", "Gupta2", "nice to meet you! I come from Jiangsu province", 10);
    documents.add(document4);

    Document document2 = createDocument(3, "Brian", "Schultz", "I am glad to be here, I am from Hefei University of Technology", 20);
    documents.add(document2);

    Document document3 = createDocument(4, "Brian2", "Schultz2", "I come from Nanjing, I graduate from Nanjing University", 30);
    documents.add(document3);

    Document document5 = createDocument(5, "Brown", "Jack", "I come from Nanjing, I am very glad to be here. I graduate from Nanjing University", 30);
    documents.add(document5);

    //Let's clean everything first
    writer.deleteAll();

    writer.addDocuments(documents);
    writer.commit();
    writer.close();

    IndexSearcher searcher = createSearcher();

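    // Which fields are indexed and which are stored
    // id: stored only, not indexed
    // firstName: StringField, stored and indexed
    // lastName: StringField, stored and indexed
    // introduction: TextField, stored and indexed
    // age: integer field, stored and indexed
    System.out.println("-------------- Query firstName using QueryParser with StandardAnalyzer ---------------------");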
    TopDocs foundDocs = queryFieldWithStandardAnalyzer("firstName", "Brian", searcher);
    System.out.println(foundDocs.totalHits);

    for (ScoreDoc doc : foundDocs.scoreDocs) {
      Document result = searcher.doc(doc.doc);
      System.out.println(result.get("id") + "\t" + result.get("firstName"));
    }

    System.out.println("-------------- Query firstName using QueryParser with KeywordAnalyzer ---------------------");
    foundDocs = queryFieldWithKeywordAnalyzer("firstName", "Brian", searcher);
    System.out.println(foundDocs.totalHits);

    for (ScoreDoc doc : foundDocs.scoreDocs) {
      Document result = searcher.doc(doc.doc);
      System.out.println(result.get("id") + "\t" + result.get("firstName"));
    }

    System.out.println("-------------- Query firstName using TermQuery ---------------------");
    foundDocs = queryFieldWithTermQuery("firstName", "Brian", searcher);
    System.out.println(foundDocs.totalHits);

    for (ScoreDoc doc : foundDocs.scoreDocs) {
      Document result = searcher.doc(doc.doc);
      System.out.println(result.get("id") + "\t" + result.get("firstName"));
    }

    System.out.println("-------------- Query introduction using TermQuery: stop word in a TextField ---------------------");
    foundDocs = queryFieldWithTermQuery("introduction", "a", searcher);
    System.out.println(foundDocs.totalHits);

    for (ScoreDoc doc : foundDocs.scoreDocs) {
      Document result = searcher.doc(doc.doc);
      System.out.println(result.get("id") + "\t" + result.get("firstName"));
    }

    System.out.println("-------------- Query introduction using TermQuery: non-stop word in a TextField, lowercase ---------------------");
    foundDocs = queryFieldWithTermQuery("introduction", "hefei", searcher);
    System.out.println(foundDocs.totalHits);

    for (ScoreDoc doc : foundDocs.scoreDocs) {
      Document result = searcher.doc(doc.doc);
      System.out.println(result.get("id") + "\t" + result.get("firstName"));
    }

    System.out.println("-------------- Query introduction using TermQuery: non-stop word in a TextField, capitalized ---------------------");
    foundDocs = queryFieldWithTermQuery("introduction", "Hefei", searcher);
    System.out.println(foundDocs.totalHits);

    for (ScoreDoc doc : foundDocs.scoreDocs) {
      Document result = searcher.doc(doc.doc);
      System.out.println(result.get("id") + "\t" + result.get("firstName"));
    }

  }

  // Create a document to index
  private static Document createDocument(Integer id, String firstName, String lastName, String introduction, int age) {
    Document document = new Document();
    document.add(new StoredField("id", id.toString()));
    document.add(new StringField("firstName", firstName, Field.Store.YES));
    document.add(new StringField("lastName", lastName, Field.Store.YES));
    document.add(new TextField("introduction", introduction, Field.Store.YES));
    document.add(new IntPoint("age", age));
    document.add(new StoredField("age", age));
    return document;
  }

  // Create the index writer
  private static IndexWriter createWriter() throws IOException {
    FSDirectory dir = FSDirectory.open(Paths.get(INDEX_DIR));
    IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
    config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
    config.setUseCompoundFile(false);
    return new IndexWriter(dir, config);
  }

  // Create the index searcher
  private static IndexSearcher createSearcher() throws IOException {
    Directory dir = FSDirectory.open(Paths.get(INDEX_DIR));
    IndexReader reader = DirectoryReader.open(dir);
    return new IndexSearcher(reader);
  }

  // Query through QueryParser with the standard analyzer
  private static TopDocs queryFieldWithStandardAnalyzer(String field, String content, IndexSearcher searcher) throws IOException, org.apache.lucene.queryparser.classic.ParseException {
    QueryParser query = new QueryParser(field, new StandardAnalyzer());
    Query qr = query.parse(content);

    return searcher.search(qr, 10);
  }

  // Query through QueryParser with the keyword analyzer
  private static TopDocs queryFieldWithKeywordAnalyzer(String field, String content, IndexSearcher searcher) throws IOException, org.apache.lucene.queryparser.classic.ParseException {
    QueryParser query = new QueryParser(field, new KeywordAnalyzer());
    Query qr = query.parse(content);

    return searcher.search(qr, 10);
  }

  // Query with TermQuery; no analyzer is involved
  private static TopDocs queryFieldWithTermQuery(String field, String content, IndexSearcher searcher) throws IOException {
    Query qr = new TermQuery(new Term(field, content));
    return searcher.search(qr, 10);
  }

}

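In this example we create five documents and index them. Each document has five fields; firstName and lastName are StringFields and introduction is a TextField. We build three kinds of query: a QueryParser query with StandardAnalyzer, a QueryParser query with KeywordAnalyzer, and a TermQuery. The cases tested are:

Query firstName using QueryParser with StandardAnalyzer, searching for "Brian": no results. A StringField indexes the original value verbatim, but StandardAnalyzer turns the query "Brian" into "brian", and the index contains only "Brian".

Query firstName using QueryParser with KeywordAnalyzer: "Brian" is found, because KeywordAnalyzer performs no case conversion or other processing.

Query firstName using TermQuery: results are found. This is an exact match; neither the query nor the indexed value is processed.

Query introduction using TermQuery, stop word in a TextField: this looks for the word "a" in introduction. "a" is a stop word, and StandardAnalyzer removes stop words from introduction at index time, so nothing is found. (None of the sample introductions actually contains "a"; a stop word that does occur in them, such as "to", is equally unfindable.)

Query introduction using TermQuery, non-stop word in a TextField, lowercase: searching for "hefei" finds a result, because at index time the TextField content was lowercased and "hefei" was indexed as a term.

Query introduction using TermQuery, non-stop word in a TextField, capitalized: searching for "Hefei" finds nothing. The index holds "hefei", and TermQuery is an exact match that does no lowercasing.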
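The six outcomes above all follow from one rule: a query matches only if the term it looks up is byte-for-byte the term that was indexed. The sketch below imitates that rule in plain Java. It is a toy model, not Lucene: `standardLike` is only a rough stand-in for StandardAnalyzer (a regular-expression split instead of Unicode segmentation, and a small assumed stop-word subset), `keywordLike` stands in for both KeywordAnalyzer and the verbatim indexing of a StringField, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model, not Lucene: imitates the effect of analysis on indexed terms and query terms.
class ToyAnalysisDemo {
  // Illustrative subset of English stop words (an assumption, not Lucene's full list).
  static final Set<String> STOP_WORDS =
      new HashSet<>(Arrays.asList("a", "an", "the", "to", "be", "of"));

  // Rough stand-in for StandardAnalyzer: lowercase, split on non-alphanumerics, drop stop words.
  static List<String> standardLike(String text) {
    List<String> terms = new ArrayList<>();
    for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
      if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
        terms.add(token);
      }
    }
    return terms;
  }

  // Stand-in for KeywordAnalyzer, a StringField, or a raw TermQuery term: one unchanged term.
  static List<String> keywordLike(String text) {
    return Collections.singletonList(text);
  }

  // True if any query term is exactly equal to one of the indexed terms.
  static boolean matches(List<String> indexedTerms, List<String> queryTerms) {
    for (String term : queryTerms) {
      if (indexedTerms.contains(term)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<String> firstName = keywordLike("Brian"); // StringField: indexed verbatim
    List<String> introduction =                    // TextField: analyzed at index time
        standardLike("I am glad to be here, I am from Hefei University of Technology");

    System.out.println(matches(firstName, standardLike("Brian")));    // false: query became "brian"
    System.out.println(matches(firstName, keywordLike("Brian")));     // true: query left unchanged
    System.out.println(matches(introduction, keywordLike("to")));     // false: stop word was dropped
    System.out.println(matches(introduction, keywordLike("hefei")));  // true: index holds "hefei"
    System.out.println(matches(introduction, keywordLike("Hefei")));  // false: exact match only
  }
}
```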

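2.3 Querying non-string fields

Non-string fields such as integers are queried differently. The queries shown so far return nothing for an integer field. Continuing with the code above, suppose we want to search by age: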

System.out.println("-------------- Query the integer field using QueryParser with KeywordAnalyzer ---------------------");
foundDocs = queryFieldWithKeywordAnalyzer("age", "10", searcher);
System.out.println(foundDocs.totalHits);

for (ScoreDoc doc : foundDocs.scoreDocs) {
  System.out.println(searcher.doc(doc.doc).get("id"));
}

System.out.println("-------------- Query the integer field using QueryParser with StandardAnalyzer ---------------------");
foundDocs = queryFieldWithStandardAnalyzer("age", "10", searcher);
System.out.println(foundDocs.totalHits);

for (ScoreDoc doc : foundDocs.scoreDocs) {
  System.out.println(searcher.doc(doc.doc).get("id"));
}

System.out.println("-------------- Query the integer field using TermQuery ---------------------");
foundDocs = queryFieldWithTermQuery("age", "10", searcher);
System.out.println(foundDocs.totalHits);

for (ScoreDoc doc : foundDocs.scoreDocs) {
  System.out.println(searcher.doc(doc.doc).get("id"));
}

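All three queries return 0 hits, which means none of these query types works on an integer field. The correct way to query it is:

System.out.println("-------------- Result of querying the integer field ---------------------");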
foundDocs = queryExactIntField(10, searcher);
System.out.println(foundDocs.totalHits);

for (ScoreDoc doc : foundDocs.scoreDocs) {
  System.out.println(searcher.doc(doc.doc).get("id"));
}

private static TopDocs queryExactIntField(Integer num, IndexSearcher searcher) throws IOException {
  Query query = IntPoint.newExactQuery("age", num);
  System.out.println(query);

  return searcher.search(query, 10);
}

The key line is:

Query query = IntPoint.newExactQuery("age", num);

This is how to run an exact-match query on an integer field.
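One way to picture why the term-based queries return nothing for age is that numeric points live in their own value-ordered structure, separate from the term dictionary. The sketch below is a toy model in plain Java, not Lucene code: Lucene indexes points in a BKD tree, and the `TreeMap` here is only a stand-in for it. `ToyPointIndex` and its methods are hypothetical names. The `range` method mirrors what `IntPoint.newRangeQuery("age", lower, upper)` expresses in Lucene, with both bounds inclusive.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model, not Lucene: numeric values are kept in a sorted structure of their own.
class ToyPointIndex {
  // Sorted map from numeric value to the ids of documents holding that value.
  private final NavigableMap<Integer, Set<Integer>> points = new TreeMap<>();

  void add(int docId, int value) {
    points.computeIfAbsent(value, k -> new TreeSet<>()).add(docId);
  }

  // Exact match on the numeric value.
  Set<Integer> exact(int value) {
    return points.getOrDefault(value, Collections.emptySet());
  }

  // Inclusive range [lower, upper]; lower must not exceed upper.
  Set<Integer> range(int lower, int upper) {
    Set<Integer> result = new TreeSet<>();
    for (Set<Integer> ids : points.subMap(lower, true, upper, true).values()) {
      result.addAll(ids);
    }
    return result;
  }

  public static void main(String[] args) {
    ToyPointIndex ages = new ToyPointIndex();
    int[] sampleAges = {10, 10, 20, 30, 30}; // the ages used in the example documents
    for (int docId = 0; docId < sampleAges.length; docId++) {
      ages.add(docId, sampleAges[docId]);
    }

    System.out.println(ages.exact(10));      // [0, 1]
    System.out.println(ages.range(10, 20));  // [0, 1, 2]
    System.out.println(ages.exact(15));      // []
  }
}
```

Because the values are ordered numerically, a range lookup only has to walk the entries between the two bounds, which is something a dictionary of text terms cannot offer for numbers.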

The query types Lucene currently supports include:

TermQuery
BooleanQuery
WildcardQuery
PhraseQuery
PrefixQuery
MultiPhraseQuery
FuzzyQuery
RegexpQuery
TermRangeQuery
PointRangeQuery
ConstantScoreQuery
DisjunctionMaxQuery
MatchAllDocsQuery

2.4 Multi-field queries

Multi-field queries use MultiFieldQueryParser:

IndexSearcher searcher = createSearcher();

// Match when any of the fields matches
Query query = MultiFieldQueryParser.parse(new String[]{"Brown", "nanjing"},
    new String[]{"firstName", "introduction"}, new KeywordAnalyzer());
TopDocs foundDocs = searcher.search(query, 10);
System.out.println("num of hits:" + foundDocs.totalHits);

for (ScoreDoc doc : foundDocs.scoreDocs) {
  Document result = searcher.doc(doc.doc);
  System.out.println(result.get("id") + "\t" + result.get("firstName") + "\t" + result.get("introduction"));
}

// Match only when every field matches
query = MultiFieldQueryParser.parse(new String[]{"Brown", "nanjing"},
    new String[]{"firstName", "introduction"},
    new BooleanClause.Occur[]{BooleanClause.Occur.MUST, BooleanClause.Occur.MUST},
    new KeywordAnalyzer());

foundDocs = searcher.search(query, 10);
System.out.println("num of hits:" + foundDocs.totalHits);

for (ScoreDoc doc : foundDocs.scoreDocs) {
  Document result = searcher.doc(doc.doc);
  System.out.println(result.get("id") + "\t" + result.get("firstName") + "\t" + result.get("introduction"));
}

2.5 Lucene QueryParser syntax

Lucene supports a query syntax with operators such as AND and OR. To search the content of introduction, we can write:

System.out.println("-------------- QueryParser syntax test 1 ---------------------");
foundDocs = queryFieldWithStandardAnalyzer("firstName", "introduction:\"nanjing\" AND introduction:\"glad\"", searcher);
System.out.println(foundDocs.totalHits);

for (ScoreDoc doc : foundDocs.scoreDocs) {
  Document result = searcher.doc(doc.doc);
  System.out.println(result.get("id") + "\t" + result.get("firstName") + "\t" + result.get("introduction"));
}

System.out.println("-------------- QueryParser syntax test 2 ---------------------");
foundDocs = queryFieldWithStandardAnalyzer("firstName", "introduction:\"nanjing\" OR introduction:\"glad\"", searcher);
System.out.println(foundDocs.totalHits);

for (ScoreDoc doc : foundDocs.scoreDocs) {
  Document result = searcher.doc(doc.doc);
  System.out.println(result.get("id") + "\t" + result.get("firstName") + "\t" + result.get("introduction"));
}

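As you can see, the default field passed to the parser no longer matters here; what matters is the field named inside the query string. The first query returns only 1 hit and the second returns 3.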
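The hit counts of 1 and 3 can be checked by hand if AND and OR are read as set intersection and set union over the documents that contain each term. The sketch below does this in plain Java for the five example documents, numbered 0 to 4 in insertion order. It is a toy illustration of the semantics only: Lucene itself evaluates Boolean queries by iterating over postings and scoring, not by materializing sets, and `ToyBooleanQuery` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Toy model, not Lucene: AND/OR expressed as intersection/union of document-id sets.
class ToyBooleanQuery {
  // AND: documents present in both sets.
  static Set<Integer> and(Set<Integer> left, Set<Integer> right) {
    Set<Integer> result = new TreeSet<>(left);
    result.retainAll(right);
    return result;
  }

  // OR: documents present in either set.
  static Set<Integer> or(Set<Integer> left, Set<Integer> right) {
    Set<Integer> result = new TreeSet<>(left);
    result.addAll(right);
    return result;
  }

  public static void main(String[] args) {
    // Documents whose introduction contains each term, read off the example data.
    Set<Integer> nanjing = new TreeSet<>(Arrays.asList(3, 4));
    Set<Integer> glad = new TreeSet<>(Arrays.asList(2, 4));

    System.out.println(and(nanjing, glad)); // [4]: one hit
    System.out.println(or(nanjing, glad));  // [2, 3, 4]: three hits
  }
}
```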

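The query syntax Lucene currently supports includes:
Boolean queries: AND/OR/NOT
Wildcard queries: te?t, test*, te*t
Regular expressions: /[mb]oat/
Fuzzy queries: roam~
Proximity queries: "jakarta apache"~10
Range queries: mod_date:[20020101 TO 20030101]
Field grouping: title:(+return +"pink panther")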