
Natural Language Processing: Chinese Word Segmentation

NLPIR/ICTCLAS is a Chinese word segmentation system (http://ictclas.nlpir.org).
PyNLPIR is a Python wrapper for that segmentation system (http://pynlpir.readthedocs.io...).

Installation steps:
① pip install pynlpir
② pynlpir update (downloads the latest NLPIR license file)

Chinese word segmentation example from the official documentation:

import pynlpir

pynlpir.open()  # initialize the NLPIR API
text = '欢迎科研人员、技术工程师、企事业单位与个人参与 NLPIR 平台的建设工作。'
result = pynlpir.segment(text)  # list of (word, part-of-speech) tuples
print(result)
# output: [('欢迎', 'verb'), ('科研', 'noun'), ('人员', 'noun'), ('、', 'punctuation mark'), ('技术', 'noun'), ('工程师', 'noun'), ('、', 'punctuation mark'), ('企事业', 'noun'), ('单位', 'noun'), ('与', 'conjunction'), ('个人', 'noun'), ('参与', 'verb'), ('NLPIR', 'noun'), ('平台', 'noun'), ('的', 'particle'), ('建设', 'verb'), ('工作', 'verb'), ('。', 'punctuation mark')]
pynlpir.close()  # release the NLPIR API when finished
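
The segmenter returns a list of (word, part-of-speech) tuples, so ordinary list operations are enough to post-process it. The following is a minimal sketch, not part of PyNLPIR: the helper names `words_only`, `pos_counts`, and `words_with_pos` are hypothetical, and the `tagged` data is simply copied from the output shown above so the snippet runs without the library installed.

```python
from collections import Counter

# Sample (word, part-of-speech) pairs copied from the documented output.
tagged = [
    ('欢迎', 'verb'), ('科研', 'noun'), ('人员', 'noun'), ('、', 'punctuation mark'),
    ('技术', 'noun'), ('工程师', 'noun'), ('、', 'punctuation mark'), ('企事业', 'noun'),
    ('单位', 'noun'), ('与', 'conjunction'), ('个人', 'noun'), ('参与', 'verb'),
    ('NLPIR', 'noun'), ('平台', 'noun'), ('的', 'particle'), ('建设', 'verb'),
    ('工作', 'verb'), ('。', 'punctuation mark'),
]


def words_only(pairs):
    """Return just the words, discarding the part-of-speech tags."""
    return [word for word, _ in pairs]


def pos_counts(pairs):
    """Count how many tokens carry each part-of-speech tag."""
    return Counter(pos for _, pos in pairs)


def words_with_pos(pairs, wanted):
    """Return the words whose tag equals `wanted`, in original order."""
    return [word for word, pos in pairs if pos == wanted]


if __name__ == "__main__":
    print(words_only(tagged))
    print(pos_counts(tagged))
    print(words_with_pos(tagged, 'verb'))
```

In a real script, `tagged` would be replaced by the return value of `pynlpir.segment(...)`.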

Possible problems:
① raise RuntimeError("NLPIR function 'NLPIR_Init' failed.")

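Solution:
Visit the https://github.com/NLPIR-team... repository,
download a license, for example the NLPIR.user file under "NLPIR-ICTCLAS 分词系统授权" (the NLPIR-ICTCLAS segmentation system license),
and use it to replace the file of the same name under path_to_local_python/Lib/site-packages/pynlpir/Data, which updates the license.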
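
The manual replacement can also be scripted. This is a rough sketch under stated assumptions: `install_license` is a hypothetical helper (not a PyNLPIR function), the license file has already been downloaded by hand, and the caller passes in the Data directory explicitly. The backup step is an addition for safety and is not part of the original instructions.

```python
import shutil
from pathlib import Path


def install_license(new_license, data_dir, backup_suffix=".bak"):
    """Copy `new_license` over the same-named file in `data_dir`.

    An existing file is first saved next to it with `backup_suffix` appended.
    Returns the path of the installed license file.
    """
    new_license = Path(new_license)
    data_dir = Path(data_dir)
    if not new_license.is_file():
        raise FileNotFoundError(f"license file not found: {new_license}")
    if not data_dir.is_dir():
        raise NotADirectoryError(f"data directory not found: {data_dir}")

    target = data_dir / new_license.name
    if target.exists():
        # Keep the old license so the change can be undone.
        shutil.copy2(target, target.with_name(target.name + backup_suffix))
    shutil.copy2(new_license, target)
    return target


# Example call (both paths are placeholders to adjust for the local machine):
# install_license("Downloads/NLPIR.user",
#                 "path_to_local_python/Lib/site-packages/pynlpir/Data")
```

If the package is importable, the Data directory can usually be located as `Path(pynlpir.__file__).parent / "Data"`, which matches the path given above.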

Chinese stop word list:

["啊","阿","哎","哎呀","哎哟","唉","俺","俺们","按","按照","吧","吧哒","把","罢了","被","本","本着","比","比方","比如","鄙人","彼","彼此","边","别","别的","别说","并","并且","不比","不成","不单","不但","不独","不管","不光","不过","不仅","不拘","不论","不怕","不然","不如","不特","不惟","不问","不只","朝","朝着","趁","趁着","乘","冲","除","除此之外","除非","除了","此","此间","此外","从","从而","打","待","但","但是","当","当着","到","得","的","的话","等","等等","地","第","叮咚","对","对于","多","多少","而","而况","而且","而是","而外","而言","而已","尔后","反过来","反过来说","反之","非但","非徒","否则","嘎","嘎登","该","赶","个","各","各个","各位","各种","各自","给","根据","跟","故","故此","固然","关于","管","归","果然","果真","过","哈","哈哈","呵","和","何","何处","何况","何时","嘿","哼","哼唷","呼哧","乎","哗","还是","还有","换句话说","换言之","或","或是","或者","极了","及","及其","及至","即","即便","即或","即令","即若","即使","几","几时","己","既","既然","既是","继而","加之","假如","假若","假使","鉴于","将","较","较之","叫","接着","结果","借","紧接着","进而","尽","尽管","经","经过","就","就是","就是说","据","具体地说","具体说来","开始","开外","靠","咳","可","可见","可是","可以","况且","啦","来","来着","离","例如","哩","连","连同","两者","了","临","另","另外","另一方面","论","嘛","吗","慢说","漫说","冒","么","每","每当","们","莫若","某","某个","某些","拿","哪","哪边","哪儿","哪个","哪里","哪年","哪怕","哪天","哪些","哪样","那","那边","那儿","那个","那会儿","那里","那么","那么些","那么样","那时","那些","那样","乃","乃至","呢","能","你","你们","您","宁","宁可","宁肯","宁愿","哦","呕","啪达","旁人","呸","凭","凭借","其","其次","其二","其他","其它","其一","其余","其中","起","起见","岂但","恰恰相反","前后","前者","且","然而","然后","然则","让","人家","任","任何","任凭","如","如此","如果","如何","如其","如若","如上所述","若","若非","若是","啥","上下","尚且","设若","设使","甚而","甚么","甚至","省得","时候","什么","什么样","使得","是","是的","首先","谁","谁知","顺","顺着","似的","虽","虽然","虽说","虽则","随","随着","所","所以","他","他们","他人","它","它们","她","她们","倘","倘或","倘然","倘若","倘使","腾","替","通过","同","同时","哇","万一","往","望","为","为何","为了","为什么","为着","喂","嗡嗡","我","我们","呜","呜呼","乌乎","无论","无宁","毋宁","嘻","吓","相对而言","像","向","向着","嘘","呀","焉","沿","沿着","要","要不","要不然","要不是","要么","要是","也","也罢","也好","一","一般","一旦","一方面","一来","一切","一样","一则","依","依照","矣","以","以便","以及","以免","以至","以至于","以致","抑或","因","因此","因而","因为","哟","用","由","由此可见","由于","有","有的","有关","有些","又","于","于是","于是乎","与","与此同时","与否","与其","越是","云云","哉","再说","再者","在","在下","咱","
咱们","则","怎","怎么","怎么办","怎么样","怎样","咋","照","照着","者","这","这边","这儿","这个","这会儿","这就是说","这里","这么","这么点儿","这么些","这么样","这时","这些","这样","正如","吱","之","之类","之所以","之一","只是","只限","只要","只有","至","至于","诸位","着","着呢","自","自从","自个儿","自各儿","自己","自家","自身","综上所述","总的来看","总的来说","总的说来","总而言之","总之","纵","纵令","纵然","纵使","遵照","作为","兮","呃","呗","咚","咦","喏","啐","喔唷","嗬","嗯","嗳","啊哈","啊呀","啊哟","挨次","挨个","挨家挨户","挨门挨户","挨门逐户","挨着","按理","按期","按时","按说","暗地里","暗中","暗自","昂然","八成","白白","半","梆","保管","保险","饱","背地里","背靠背","倍感","倍加","本人","本身","甭","比起","比如说","比照","毕竟","必","必定","必将","必须","便","别人","并非","并肩","并没","并没有","并排","并无","勃然","不","不必","不常","不大","不得","不得不","不得了","不得已","不迭","不定","不对","不妨","不管怎样","不会","不仅仅","不仅仅是","不经意","不可开交","不可抗拒","不力","不了","不料","不满","不免","不能不","不起","不巧","不然的话","不日","不少","不胜","不时","不是","不同","不能","不要","不外","不外乎","不下","不限","不消","不已","不亦乐乎","不由得","不再","不择手段","不怎么","不曾","不知不觉","不止","不止一次","不至于","才","才能","策略地","差不多","差一点","常","常常","常言道","常言说","常言说得好","长此下去","长话短说","长期以来","长线","敞开儿","彻夜","陈年","趁便","趁机","趁热","趁势","趁早","成年","成年累月","成心","乘机","乘胜","乘势","乘隙","乘虚","诚然","迟早","充分","充其极","充其量","抽冷子","臭","初","出","出来","出去","除此","除此而外","除此以外","除开","除去","除却","除外","处处","川流不息","传","传说","传闻","串行","纯","纯粹","此后","此中","次第","匆匆","从不","从此","从此以后","从古到今","从古至今","从今以后","从宽","从来","从轻","从速","从头","从未","从无到有","从小","从新","从严","从优","从早到晚","从中","从重","凑巧","粗","存心","达旦","打从","打开天窗说亮话","大","大不了","大大","大抵","大都","大多","大凡","大概","大家","大举","大略","大面儿上","大事","大体","大体上","大约","大张旗鼓","大致","呆呆地","带","殆","待到","单","单纯","单单","但愿","弹指之间","当场","当儿","当即","当口儿","当然","当庭","当头","当下","当真","当中","倒不如","倒不如说","倒是","到处","到底","到了儿","到目前为止","到头","到头来","得起","得天独厚","的确","等到","叮当","顶多","定","动不动","动辄","陡然","都","独","独自","断然","顿时","多次","多多","多多少少","多多益善","多亏","多年来","多年前","而后","而论","而又","尔等","二话不说","二话没说","反倒","反倒是","反而","反手","反之亦然","反之则","方","方才","方能","放量","非常","非得","分期","分期分批","分头","奋勇","愤然","风雨无阻","逢","弗","甫","嘎嘎","该当","概","赶快","赶早不赶晚","敢","敢情","敢于","刚","刚才","刚好","刚巧","高低","格外","隔日","隔夜","个人","各式","更","更加","更进一步","更为","公然","共","共总","够瞧的","姑且","古来","故而","故意",
"固","怪","怪不得","惯常","光","光是","归根到底","归根结底","过于","毫不","毫无","毫无保留地","毫无例外","好在","何必","何尝","何妨","何苦","何乐而不为","何须","何止","很","很多","很少","轰然","后来","呼啦","忽地","忽然","互","互相","哗啦","话说","还","恍然","会","豁然","活","伙同","或多或少","或许","基本","基本上","基于","极","极大","极度","极端","极力","极其","极为","急匆匆","即将","即刻","即是说","几度","几番","几乎","几经","既...又","继之","加上","加以","间或","简而言之","简言之","简直","见","将才","将近","将要","交口","较比","较为","接连不断","接下来","皆可","截然","截至","藉以","借此","借以","届时","仅","仅仅","谨","进来","进去","近","近几年来","近来","近年来","尽管如此","尽可能","尽快","尽量","尽然","尽如人意","尽心竭力","尽心尽力","尽早","精光","经常","竟","竟然","究竟","就此","就地","就算","居然","局外","举凡","据称","据此","据实","据说","据我所知","据悉","具体来说","决不","决非","绝","绝不","绝顶","绝对","绝非","均","喀","看","看来","看起来","看上去","看样子","可好","可能","恐怕","快","快要","来不及","来得及","来讲","来看","拦腰","牢牢","老","老大","老老实实","老是","累次","累年","理当","理该","理应","历","立","立地","立刻","立马","立时","联袂","连连","连日","连日来","连声","连袂","临到","另方面","另行","另一个","路经","屡","屡次","屡次三番","屡屡","缕缕","率尔","率然","略","略加","略微","略为","论说","马上","蛮","满","没","没有","每逢","每每","每时每刻","猛然","猛然间","莫","莫不","莫非","莫如","默默地","默然","呐","那末","奈","难道","难得","难怪","难说","内","年复一年","凝神","偶而","偶尔","怕","砰","碰巧","譬如","偏偏","乒","平素","颇","迫于","扑通","其后","其实","奇","齐","起初","起来","起首","起头","起先","岂","岂非","岂止","迄","恰逢","恰好","恰恰","恰巧","恰如","恰似","千","万","千万","千万千万","切","切不可","切莫","切切","切勿","窃","亲口","亲身","亲手","亲眼","亲自","顷","顷刻","顷刻间","顷刻之间","请勿","穷年累月","取道","去","权时","全都","全力","全年","全然","全身心","然","人人","仍","仍旧","仍然","日复一日","日见","日渐","日益","日臻","如常","如此等等","如次","如今","如期","如前所述","如上","如下","汝","三番两次","三番五次","三天两头","瑟瑟","沙沙","上","上来","上去","一.","一一","一下","一个","一些","一何","一则通过","一天","一定","一时","一次","一片","一番","一直","一致","一起","一转眼","一边","一面","上升","上述","上面","下","下列","下去","下来","下面","不一","不久","不变","不可","不够","不尽","不尽然","不敢","不断","不若","不足","与其说","专门","且不说","且说","严格","严重","个别","中小","中间","丰富","为主","为什麽","为止","为此","主张","主要","举行","乃至于","之前","之后","之後","也就是说","也是","了解","争取","二来","云尔","些","亦","产生","人","人们","什麽","今","今后","今天","今年","今後","介于","从事","他是","他的","代替","以上","以下","以为","以前","以后","以外","以後","以故","以期","以来","任务","企图","伟大","似乎","但凡","何以","余
外","你是","你的","使","使用","依据","依靠","便于","促进","保持","做到","傥然","儿","允许","元/吨","先不先","先后","先後","先生","全体","全部","全面","共同","具体","具有","兼之","再","再其次","再则","再有","再次","再者说","决定","准备","凡","凡是","出于","出现","分别","则甚","别处","别是","别管","前此","前进","前面","加入","加强","十分","即如","却","却不","原来","又及","及时","双方","反应","反映","取得","受到","变成","另悉","只","只当","只怕","只消","叫做","召开","各人","各地","各级","合理","同一","同样","后","后者","后面","向使","周围","呵呵","咧","唯有","啷当","喽","嗡","嘿嘿","因了","因着","在于","坚决","坚持","处在","处理","复杂","多么","多数","大力","大多数","大批","大量","失去","她是","她的","好","好的","好象","如同","如是","始而","存在","孰料","孰知","它们的","它是","它的","安全","完全","完成","实现","实际","宣布","容易","密切","对应","对待","对方","对比","小","少数","尔","尔尔","尤其","就是了","就要","属于","左右","巨大","巩固","已","已矣","已经","巴","巴巴","帮助","并不","并不是","广大","广泛","应当","应用","应该","庶乎","庶几","开展","引起","强烈","强调","归齐","当前","当地","当时","形成","彻底","彼时","往往","後来","後面","得了","得出","得到","心里","必然","必要","怎奈","怎麽","总是","总结","您们","您是","惟其","意思","愿意","成为","我是","我的","或则","或曰","战斗","所在","所幸","所有","所谓","扩大","掌握","接著","数/","整个","方便","方面","无","无法","既往","明显","明确","是不是","是以","是否","显然","显著","普通","普遍","曾","曾经","替代","最","最后","最大","最好","最後","最近","最高","有利","有力","有及","有所","有效","有时","有点","有的是","有着","有著","末##末","本地","来自","来说","构成","某某","根本","欢迎","欤","正值","正在","正巧","正常","正是","此地","此处","此时","此次","每个","每天","每年","比及","比较","没奈何","注意","深入","清楚","满足","然後","特别是","特殊","特点","犹且","犹自","现代","现在","甚且","甚或","甚至于","用来","由是","由此","目前","直到","直接","相似","相信","相反","相同","相对","相应","相当","相等","看出","看到","看看","看见","真是","真正","眨眼","矣乎","矣哉","知道","确定","种","积极","移动","突出","突然","立即","竟而","第二","类如","练习","组成","结合","继后","继续","维持","考虑","联系","能否","能够","自后","自打","至今","至若","致","般的","良好","若夫","若果","范围","莫不然","获得","行为","行动","表明","表示","要求","规定","觉得","譬喻","认为","认真","认识","许多","设或","诚如","说明","说来","说说","诸","诸如","谁人","谁料","贼死","赖以","距","转动","转变","转贴","达到","迅速","过去","过来","运用","还要","这一来","这次","这点","这种","这般","这麽","进入","进步","进行","适应","适当","适用","逐步","逐渐","通常","造成","遇到","遭到","遵循","避免","那般","那麽","部分","采取","里面","重大","重新","重要","针对","问题","防止","附近","限制","随后","随时","随著","难道说","集中","需要","非特","非独","高兴","若
果 "]
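
A stop word list like this is typically applied after segmentation to drop function words. The sketch below assumes the list has been saved as JSON array text; `load_stop_words` and `remove_stop_words` are hypothetical helper names, and the `excerpt` and `tagged` values are small illustrative samples rather than the full data. Line breaks are stripped before parsing because a pasted copy of the list may wrap in the middle of an entry, and surrounding spaces are trimmed from each word.

```python
import json


def load_stop_words(raw_text):
    """Parse a JSON array of stop words into a set.

    Line breaks are removed first (a pasted list may wrap mid-entry),
    and whitespace around each entry is trimmed.
    """
    cleaned = raw_text.replace("\r", "").replace("\n", "")
    return {word.strip() for word in json.loads(cleaned) if word.strip()}


def remove_stop_words(pairs, stop_words, drop_pos=("punctuation mark",)):
    """Keep words that are neither stop words nor tagged with a dropped POS."""
    return [word for word, pos in pairs
            if word not in stop_words and pos not in drop_pos]


# A short excerpt of the list, deliberately wrapped inside the last entry.
excerpt = '''["的","与","欢迎","个
人"]'''
stop_words = load_stop_words(excerpt)

# A few (word, part-of-speech) pairs in the shape returned by the segmenter.
tagged = [
    ('欢迎', 'verb'), ('科研', 'noun'), ('人员', 'noun'), ('、', 'punctuation mark'),
    ('个人', 'noun'), ('参与', 'verb'), ('NLPIR', 'noun'), ('平台', 'noun'),
    ('的', 'particle'), ('建设', 'verb'), ('工作', 'verb'), ('。', 'punctuation mark'),
]

if __name__ == "__main__":
    print(remove_stop_words(tagged, stop_words))
```

Using a set keeps each membership test constant-time, which matters once the full list of well over a thousand entries is loaded. Punctuation is filtered by tag because the list above does not include punctuation marks.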
