Elasticsearch Learning Notes

These notes record how I learned and used Elasticsearch. They cover installation and configuration, and accessing Elasticsearch from Python. Note: Elasticsearch is installed on a Linux server.

  • Installing and configuring Elasticsearch
  • Accessing Elasticsearch from Python
  • References

Installing and configuring Elasticsearch

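  • Download the installation package: Download Elasticsearch;
  • Extract the archive: tar -xvf elasticsearch-7.15.2-linux-x86_64.tar.gz;
  • Edit elasticsearch.yml in the config directory to allow access from the local network: network.host: 0.0.0.0;
  • Change to the bin directory and run ./elasticsearch to start Elasticsearch. The following errors appear: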
    ERROR: [2] bootstrap checks failed. You must address the points described in the following [2] lines before starting Elasticsearch.
    bootstrap check failure [1] of [2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
    bootstrap check failure [2] of [2]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured

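  • The first failure means that the kernel limit on the number of memory map areas a process may own (vm.max_map_count) is lower than Elasticsearch requires. Set vm.max_map_count=262144 in the system configuration file /etc/sysctl.conf and reboot (or run sysctl -w vm.max_map_count=262144 as root to apply the value immediately; on its own that change does not survive a reboot);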
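
Before restarting, it can help to confirm that the new limit is actually in effect. The following is a minimal sketch, assuming a Linux host where the value is exposed at /proc/sys/vm/max_map_count; the function names are made up for this note.

```python
from pathlib import Path

# Minimum value named in the bootstrap check message above.
REQUIRED_MAX_MAP_COUNT = 262144


def parse_max_map_count(text: str) -> int:
    """Parse the contents of /proc/sys/vm/max_map_count into an integer."""
    return int(text.strip())


def is_sufficient(current: int, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """Return True when the current limit meets the required minimum."""
    return current >= required


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the live kernel value (Linux only)."""
    return parse_max_map_count(Path(path).read_text())
```

On the server, is_sufficient(read_max_map_count()) should return True once the setting has been applied.
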
  • The second failure appears because binding to a non-loopback address (network.host: 0.0.0.0) makes Elasticsearch enforce its bootstrap checks, and none of the following settings is specified:

    discovery.seed_hosts: the list of hosts in the cluster;
    discovery.seed_providers: obtains the list of cluster hosts from a seed provider such as a file;
    cluster.initial_master_nodes: the master-eligible nodes that take part in the first master election when the cluster starts; required in production.

    Edit elasticsearch.yml and set discovery.seed_hosts: ["192.168.1.xx"] and cluster.initial_master_nodes: ["192.168.1.xx:9300"];

  • Restart Elasticsearch and open http://192.168.1.xx:9200/ in a browser:
    {
      "name" : "xxxx",
      "cluster_name" : "elasticsearch",
      "cluster_uuid" : "4urQVMKyQgGl0oTM_wvgjQ",
      "version" : {
        "number" : "7.15.2",
        "build_flavor" : "default",
        "build_type" : "tar",
        "build_hash" : "93d5a7f6192e8a1a12e154a2b81bf6fa7309da0c",
        "build_date" : "2021-11-04T14:04:42.515624022Z",
        "build_snapshot" : false,
        "lucene_version" : "8.9.0",
        "minimum_wire_compatibility_version" : "6.8.0",
        "minimum_index_compatibility_version" : "6.0.0-beta1"
      },
      "tagline" : "You Know, for Search"
    }

  • Installation and configuration succeeded!
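
The same check can be scripted instead of using a browser. This is a small sketch using only the standard library; the function names are illustrative, and the base URL passed to fetch_root is a placeholder you would replace with your own host.

```python
import json
from urllib.request import urlopen


def summarize_root_response(payload: dict) -> str:
    """Build a one-line summary from the JSON returned by GET /."""
    version = payload["version"]
    return (
        f'{payload["cluster_name"]}: Elasticsearch {version["number"]} '
        f'(Lucene {version["lucene_version"]})'
    )


def fetch_root(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch and decode the root endpoint, e.g. http://<host>:9200/ (placeholder)."""
    with urlopen(base_url, timeout=timeout) as response:
        return json.load(response)
```

With the node running, print(summarize_root_response(fetch_root(url))) prints the cluster name and version on one line.
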

Accessing Elasticsearch from Python

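  • Install the Elasticsearch Python client: conda install elasticsearch (the code below uses the 7.x client API, so a 7.x release of the client is assumed);
  • Connect to Elasticsearch:
    from elasticsearch import Elasticsearch

    es = Elasticsearch(hosts=['192.168.1.xx'])
    # ignore=400 suppresses the HTTP 400 error returned when an index
    # with the same name already exists
    result = es.indices.create(index='news_and_events', ignore=400)
    print(result)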
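
An alternative to ignoring the 400 status is to check for the index first. The sketch below assumes only that the client exposes indices.exists and indices.create; ensure_index is a name invented for this note, and the SimpleNamespace object is a stand-in so that the function can be tried without a server.

```python
from types import SimpleNamespace


def ensure_index(client, name: str) -> bool:
    """Create the index only if it is missing; return True if it was created."""
    if client.indices.exists(index=name):
        return False
    client.indices.create(index=name)
    return True


# Offline stand-in for the real client (hypothetical, for demonstration only).
names = set()
fake = SimpleNamespace(
    indices=SimpleNamespace(
        exists=lambda index: index in names,
        create=lambda index: names.add(index),
    )
)

created_first = ensure_index(fake, "news_and_events")
created_again = ensure_index(fake, "news_and_events")
```

With a real client the call would be ensure_index(es, 'news_and_events'). Another process could still create the index between the two calls, so ignore=400 remains the simpler option when several writers start at once.
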

  • Install the elasticsearch-analysis-ik plugin to give Elasticsearch Chinese word segmentation:
    ./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.15.2/elasticsearch-analysis-ik-7.15.2.zip

    After the installation succeeds, restart Elasticsearch;
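
To confirm that the plugin is loaded, the analyze API can be asked to tokenize a sample string. This is a sketch assuming the 7.x Python client, where indices.analyze accepts a body argument; the helper names are made up for this note, and no particular IK output is assumed.

```python
def token_texts(analyze_response: dict) -> list:
    """Extract the token strings from an _analyze API response."""
    return [item["token"] for item in analyze_response.get("tokens", [])]


def analyze_with_ik(client, text: str, analyzer: str = "ik_max_word") -> list:
    """Ask Elasticsearch to tokenize text with the given analyzer."""
    response = client.indices.analyze(body={"analyzer": analyzer, "text": text})
    return token_texts(response)
```

Calling analyze_with_ik(es, '成都重庆双城经济圈') should return a list of tokens if the analyzer is available, and raise an error from the client if it is not.
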

  • Populate the index:
    from elasticsearch import Elasticsearch
    from tqdm import tqdm

    # Local project modules
    from database import SessionLocal
    from models import Record

    es = Elasticsearch(hosts=['192.168.1.xx'])
    mapping = {
        'properties': {
            'title': {
                'type': 'text',
                'analyzer': 'ik_max_word',
                'search_analyzer': 'ik_max_word'
            },
            'content': {
                'type': 'text',
                'analyzer': 'ik_max_word',
                'search_analyzer': 'ik_max_word'
            }
        }
    }
    # An Elasticsearch index is roughly analogous to a database in a relational system
    es.indices.create(index='news_and_events', ignore=400)
    # doc_type is roughly analogous to a relation schema; mapping types are deprecated in 7.x
    es.indices.put_mapping(index='news_and_events', doc_type='records', body=mapping, include_type_name=True)

    # Query the database and export all news items and announcements
    db = SessionLocal()
    result_set = db.query(Record).all()

    for record in tqdm(result_set):
        data = {
            'record_id': record.record_id,
            'title': record.title,
            'content': record.content
        }
        es.create(index='news_and_events', doc_type='records', id=record.record_id, document=data)

    # Close the database session and the Elasticsearch connection
    db.close()
    es.close()
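
The loop above sends one request per record. For larger tables, the client's bulk helper (elasticsearch.helpers.bulk) is a common alternative. The sketch below only builds the actions; build_actions is a name invented for this note, and it assumes records expose record_id, title and content as in the loop above. It omits the mapping type, so if your index relies on the records type you may need to add "_type": "records" to each action.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def build_actions(records: Iterable, index: str) -> Iterator[dict]:
    """Yield one bulk 'create' action per record."""
    for record in records:
        yield {
            "_op_type": "create",  # fail if the id already exists, like es.create
            "_index": index,
            "_id": record.record_id,
            "_source": {
                "record_id": record.record_id,
                "title": record.title,
                "content": record.content,
            },
        }


# Offline demonstration with a stand-in record (hypothetical data).
sample = [SimpleNamespace(record_id=1, title="t", content="c")]
actions = list(build_actions(sample, "news_and_events"))

# Against a live cluster the actions would be sent with:
#   from elasticsearch.helpers import bulk
#   bulk(es, build_actions(result_set, "news_and_events"))
```

Because build_actions is a generator, the records are turned into actions lazily as the helper consumes them.
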

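  • Query the data:
    q = {
        'query': {
            'multi_match': {
                # "Chengdu-Chongqing twin-city economic circle"
                'query': '成都重庆双城经济圈',
                # title^2 boosts matches in the title field by a factor of 2
                'fields': ['title^2', 'content']
            }
        }
    }
    results = es.search(body=q, index='news_and_events', doc_type='records')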

    Result returned by Elasticsearch:
    {
      "took": 10,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 623,
          "relation": "eq"
        },
        "max_score": 57.087215,
        "hits": [
          {
            "_index": "news_and_events",
            "_type": "records",
            "_id": "2508",
            "_score": 57.087215,
            "_source": {
              "record_id": 2508,
              "title": "关于成渝地区双城经济圈创新创业峰会的通知",
              "content": "各学院,各位老师和同学:\n现转发重庆市教育委员会和四川省教育厅等六部门联合发布的《关于举办\"智创巴蜀\"首届成渝地区双城经济圈创新创业峰会的通知》,详见附件。欢迎积极参加。\n联系人:x老师\n联系电话:xxxxxxxx\n教务处\nxxxx年xx月xx日\n附件1-川渝6部门联合发峰会正式文件"
            }
          }
        ]
      }
    }
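
The response is a nested dictionary, so a couple of small helpers make it easier to read. This is a sketch based on the response shape shown above; the function names are illustrative.

```python
def total_hits(response: dict) -> int:
    """Return the number of matching documents reported by the search."""
    return response["hits"]["total"]["value"]


def summarize_hits(response: dict) -> list:
    """Return (id, score, title) tuples for each returned hit, in ranked order."""
    return [
        (hit["_id"], hit["_score"], hit["_source"].get("title"))
        for hit in response["hits"]["hits"]
    ]
```

For the response above, total_hits(results) gives 623, while summarize_hits(results) contains only the hits actually returned in this page of results.
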

References:

  • Important system configuration / Virtual memory
  • Error when starting elasticsearch: max virtual memory areas vm.max_map_count
  • ES startup error: the default discovery settings are unsuitable for production use…
  • master not discovered yet, this node has not…
  • A basic introduction to Elasticsearch and how to use it from Python
  • Python Elasticsearch Client