Word segmentation, POS tagging, dependency parsing, and named entity recognition with pyltp


For installation, see my other blog post.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys, os

ROOTDIR = os.path.join(os.path.dirname(__file__), os.pardir)
sys.path = [os.path.join(ROOTDIR, "lib")] + sys.path

# Set your own model path
MODELDIR = os.path.join(ROOTDIR, "ltp_data")

from pyltp import SentenceSplitter, Segmentor, Postagger, Parser, NamedEntityRecognizer, SementicRoleLabeller

paragraph = '平素體質:健康狀況:良,既往有“高血壓病史”多年'

sentence = SentenceSplitter.split(paragraph)[0]

segmentor = Segmentor()
segmentor.load(os.path.join(MODELDIR, "cws.model"))
words = segmentor.segment(sentence)
print("\t".join(words))

postagger = Postagger()
postagger.load(os.path.join(MODELDIR, "pos.model"))
postags = postagger.postag(words)
# a list-of-strings parameter is supported since pyltp 0.1.5
# postags = postagger.postag(["中國","進出口","銀行","與","中國銀行","加強","合作"])
print("\t".join(postags))

parser = Parser()
parser.load(os.path.join(MODELDIR, "parser.model"))
arcs = parser.parse(words, postags)

print("\t".join("%d:%s" % (arc.head, arc.relation) for arc in arcs))
for arc in arcs:
    print(arc.head)
    print(arc.relation)

recognizer = NamedEntityRecognizer()
recognizer.load(os.path.join(MODELDIR, "ner.model"))
netags = recognizer.recognize(words, postags)
print("\t".join(netags))


segmentor.release()
postagger.release()
parser.release()
recognizer.release()

Experiment results:

/usr/bin/python3.4 /home/ubuntu/PycharmProjects/pythonproject/ltplearning/pos.py
平素 體質 : 健康 狀況 : 良 , 既 往 有 “ 高血壓 病史 ” 多年
a n wp a n wp a wp c p v wp n n wp m
2:ATT 0:HED 2:WP 5:ATT 2:COO 5:WP 5:COO 7:WP 11:ADV 11:ADV 7:COO 14:WP 14:ATT 11:VOB 14:WP 11:CMP
O O O O O O O O O O O O O O O O

Process finished with exit code 0
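In the dependency line of the output, each token is printed as `head:relation`, where the head is a 1-based index into the word list and 0 marks the virtual root (the HED token). As an illustration (plain Python, no pyltp required), a small helper can turn those pairs into readable dependent–relation–head triples; the `arc_triples` name and the hand-copied arcs below are mine, not part of pyltp:

```python
def arc_triples(words, arcs):
    """Convert (head, relation) pairs into (dependent, relation, head-word)
    triples. Heads are 1-based; head 0 means the virtual ROOT node."""
    triples = []
    for i, (head, rel) in enumerate(arcs):
        head_word = "ROOT" if head == 0 else words[head - 1]
        triples.append((words[i], rel, head_word))
    return triples

# first few tokens of the example sentence, arcs copied from the output above
words = ["平素", "體質", ":", "健康", "狀況"]
arcs = [(2, "ATT"), (0, "HED"), (2, "WP"), (5, "ATT"), (2, "COO")]
for dep, rel, head in arc_triples(words, arcs):
    print("%s --%s--> %s" % (dep, rel, head))
```

So `2:ATT` on the first token means 平素 is an attribute (ATT) of word 2, 體質, and `0:HED` marks 體質 as the sentence head.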

See the official documentation for the tag set and what each tag means:

https://ltp.readthedocs.io/zh_CN/latest/appendix.html
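The NE tags in this run are all `O` because the sentence contains no named entities. Per the LTP appendix linked above, pyltp emits a B/I/E/S position prefix plus an entity type (`Nh` person, `Ni` organization, `Ns` place). A sketch of grouping tagged words into entities (pure Python; the sample words and tags below are hand-written for illustration, not real model output):

```python
def extract_entities(words, netags):
    """Group words into entities using pyltp's BIES-style NE tags:
    S- single-word entity, B-/I-/E- begin/inside/end, O = not an entity."""
    entities, buf, etype = [], [], None
    for word, tag in zip(words, netags):
        if tag == "O":
            buf, etype = [], None
            continue
        pos, _, typ = tag.partition("-")
        if pos == "S":
            entities.append((word, typ))
        elif pos == "B":
            buf, etype = [word], typ
        elif pos == "I":
            buf.append(word)
        elif pos == "E":
            buf.append(word)
            entities.append(("".join(buf), etype or typ))
            buf, etype = [], None
    return entities

words = ["中國", "進出口", "銀行", "與", "中國銀行", "加強", "合作"]
netags = ["B-Ni", "I-Ni", "E-Ni", "O", "S-Ni", "O", "O"]
print(extract_entities(words, netags))
# [('中國進出口銀行', 'Ni'), ('中國銀行', 'Ni')]
```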

pyltp documentation:

https://pyltp.readthedocs.io/zh_CN/latest/api.html#id15
