How to Get Tokens from a TokenStream in Lucene 3.5.0


While studying the Lucene 3.5.0 documentation, I compared the API changes across Lucene releases and found one change note that matters here:
  • LUCENE-2302: Deprecated TermAttribute and replaced by a new CharTermAttribute. The change is backwards compatible, so mixed new/old TokenStreams all work on the same char[] buffer independent of which interface they use. CharTermAttribute has shorter method names and implements CharSequence and Appendable. This allows usage like Java's StringBuilder in addition to direct char[] access. Also terms can directly be used in places where CharSequence is allowed (e.g. regular expressions). (Uwe Schindler, Robert Muir)
  • From the note above, the old way of reading tokens through TermAttribute is deprecated and should no longer be used:
    StringReader reader = new StringReader(s);
    TokenStream ts = analyzer.tokenStream(s, reader);
    TermAttribute ta = ts.getAttribute(TermAttribute.class);

  • The API documentation shows that CharTermAttribute is the interface that replaces TermAttribute.
  • So I wrote an example that extracts tokens from a TokenStream through CharTermAttribute:


    package com.segment;

    import java.io.StringReader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.wltea.analyzer.lucene.IKAnalyzer;

    public class Segment {

        // Runs the analyzer over s and returns the tokens separated by spaces.
        public static String show(Analyzer a, String s) throws Exception {
            StringReader reader = new StringReader(s);
            // The first argument is a field name; the original passed the text
            // itself, so "content" is only a placeholder here.
            TokenStream ts = a.tokenStream("content", reader);
            // Fetch the attribute once; it is refreshed by every incrementToken().
            CharTermAttribute ta = ts.addAttribute(CharTermAttribute.class);
            StringBuilder sb = new StringBuilder();
            try {
                ts.reset();
                while (ts.incrementToken()) {
                    sb.append(ta.toString()).append(' ');
                }
                ts.end();
            } finally {
                ts.close();
            }
            return sb.toString();
        }

        public String segment(String s) throws Exception {
            Analyzer a = new IKAnalyzer();
            return show(a, s);
        }

        public static void main(String[] args) {
            // Chinese sample sentence used as segmentation input.
            String name = "我是俊傑,我愛編程,我的測試用例";
            Segment s = new Segment();
            try {
                System.out.println(s.segment(name));
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
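
The change note quoted above says that CharTermAttribute implements CharSequence and Appendable. The sketch below illustrates what that means for helper code: methods typed against those two JDK interfaces would accept a term attribute directly, without a toString() call. It is a minimal illustration that depends only on the JDK. The class TermUtil and its methods are hypothetical names, and StringBuilder stands in for CharTermAttribute because it implements the same two interfaces.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of Lucene.
public class TermUtil {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Takes any CharSequence, so a String, a StringBuilder, or (by the
    // change note) a CharTermAttribute could be matched directly.
    public static boolean isNumeric(CharSequence term) {
        return DIGITS.matcher(term).matches();
    }

    // Writes the terms, separated by sep, into any Appendable target.
    public static <A extends Appendable> A join(A out,
            Iterable<? extends CharSequence> terms, char sep) {
        try {
            boolean first = true;
            for (CharSequence term : terms) {
                if (!first) {
                    out.append(sep);
                }
                out.append(term);
                first = false;
            }
            return out;
        } catch (IOException e) {
            // Appendable declares IOException; in-memory targets never throw it.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // StringBuilder is a stand-in for CharTermAttribute in this sketch.
        List<StringBuilder> terms = Arrays.asList(
                new StringBuilder("lucene"), new StringBuilder("35"));
        System.out.println(join(new StringBuilder(), terms, ' ')); // prints "lucene 35"
        System.out.println(isNumeric(terms.get(1)));               // prints "true"
    }
}
```

One caveat if this pattern is used with a real TokenStream: the attribute instance is reused for every token, so a term that must outlive the next incrementToken() call still needs to be copied, for example with toString().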

