Does anyone know of a tool or algorithm that can combine multiple words into phrases?

This topic was created 1841 days ago; the information in it may have changed or become outdated.

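I've recently been trying to build a feature like this to improve the match quality of Elasticsearch full-text search.
Currently, when I search Elasticsearch for the string "某数学兴趣小组开展了一次课外活动" ("a certain math interest group held an extracurricular activity"), it is tokenized with the ik analyzer in ik_smart mode, which gives the following result: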

{
  "tokens" : [
    {
      "token" : "某",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "数学",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "兴趣小组",
      "start_offset" : 3,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "开展",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "了",
      "start_offset" : 9,
      "end_offset" : 10,
      "type" : "CN_CHAR",
      "position" : 4
    },
    {
      "token" : "一次",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "课外活动",
      "start_offset" : 12,
      "end_offset" : 16,
      "type" : "CN_WORD",
      "position" : 6
    }
  ]
}
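If you want to experiment outside Elasticsearch, the analyzer output above is easy to load in Python. The sketch below is only an illustration and assumes the response has been saved as a JSON string (trimmed here to the fields it uses). It orders the tokens by `position` and joins runs of adjacent tokens into "shingles", which is similar in spirit to Elasticsearch's `shingle` token filter. Note the limitation: shingles only join neighbouring tokens, so they can produce 数学兴趣小组 or 一次课外活动 but never a reordered phrase like 一次数学课外活动.

```python
import json
from typing import List

# The ik_smart output from above, trimmed to the fields this sketch uses.
ANALYZE_RESPONSE = """
{"tokens": [
  {"token": "某",       "type": "CN_CHAR", "position": 0},
  {"token": "数学",     "type": "CN_WORD", "position": 1},
  {"token": "兴趣小组", "type": "CN_WORD", "position": 2},
  {"token": "开展",     "type": "CN_WORD", "position": 3},
  {"token": "了",       "type": "CN_CHAR", "position": 4},
  {"token": "一次",     "type": "CN_WORD", "position": 5},
  {"token": "课外活动", "type": "CN_WORD", "position": 6}
]}
"""


def load_tokens(raw: str) -> List[str]:
    """Return the token strings sorted by their position field."""
    data = json.loads(raw)
    ordered = sorted(data["tokens"], key=lambda t: t["position"])
    return [t["token"] for t in ordered]


def shingles(tokens: List[str], min_size: int = 2, max_size: int = 3) -> List[str]:
    """Concatenate every run of min_size..max_size adjacent tokens (no separator)."""
    result = []
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            result.append("".join(tokens[start:start + size]))
    return result


tokens = load_tokens(ANALYZE_RESPONSE)
print(tokens)
print(shingles(tokens))
```

With seven tokens this yields 6 two-token and 5 three-token shingles. Indexing such shingles (or using phrase queries) rewards documents that keep the original word order, but it does not invent new combinations.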

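However, this result doesn't meet my needs. I want to take the tokens above and recombine them into new, meaningful phrases, for example 一次数学课外活动 ("one math extracurricular activity").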
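A naive way to reach reordered phrases such as 一次数学课外活动 is brute force. This is a minimal sketch under two assumptions: single-character tokens (`CN_CHAR`, here 某 and 了) can be discarded, and candidates are limited to 2 to 3 words. It then tries every ordering of every small subset of the remaining words. It only *generates* candidates. Deciding which ones are actually meaningful would require something like an n-gram or neural language model, or a phrase dictionary, and none of that is shown here.

```python
from itertools import permutations
from typing import List, Tuple

# Tokens from the ik_smart output above, as (text, type) pairs.
TOKENS: List[Tuple[str, str]] = [
    ("某", "CN_CHAR"), ("数学", "CN_WORD"), ("兴趣小组", "CN_WORD"),
    ("开展", "CN_WORD"), ("了", "CN_CHAR"), ("一次", "CN_WORD"),
    ("课外活动", "CN_WORD"),
]


def candidate_phrases(tokens: List[Tuple[str, str]],
                      min_words: int = 2, max_words: int = 3) -> List[str]:
    """Brute-force every ordering of min_words..max_words content words.

    Single-character tokens (CN_CHAR) are dropped first. The number of
    candidates grows factorially, so this only suits very short inputs.
    """
    words = [text for text, kind in tokens if kind == "CN_WORD"]
    result = []
    for size in range(min_words, max_words + 1):
        for combo in permutations(words, size):
            result.append("".join(combo))
    return result


candidates = candidate_phrases(TOKENS)
print(len(candidates))                  # 80
print("一次数学课外活动" in candidates)  # True
```

The target phrase is in the list, but so are 79 others, most of them nonsense (e.g. 课外活动开展数学). That is why the hard part of the problem is ranking the candidates, not generating them.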

Is there any tool or algorithm that can do this?

1 reply • 2019-11-10 10:29:45 +08:00

noreplay

2019-11-10 10:29:45 +08:00 via Android

Try a GAN? I'm just throwing that out there. (doge) (grin) Please don't hit me 😁