
Error reading from and writing to Elasticsearch with PySpark, help needed

  •   SlipStupig · 2017-10-20 21:11:01 +08:00 · 4706 views

    I get an error when reading from and writing to ES with Spark:

    py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
    : java.lang.ClassNotFoundException: org.elasticsearch.hadoop.mr.LinkedMapWritable
    

    The code is as follows:

    from pyspark import SparkConf, SparkContext
    
    conf = SparkConf().setMaster("local").setAppName("elasticsearch-hadoop")
    sc = SparkContext(conf=conf)
    
    # read in ES index/type "products/kcosmetics"
    es_rdd = sc.newAPIHadoopRDD(
        inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf={ "es.resource" : "products" })
    print(es_rdd.first())
    
    kcosmetics_availability = es_rdd.map(lambda item: ("key",{
        'id': item[0] , ## _id from products/kcosmetics
        'availability': item[1]['availability']
    }))
    
    # write the results to "products/kcosmetics_stocks"
    kcosmetics_availability.saveAsNewAPIHadoopFile(
        path='-',
        outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf={
            "es.index.auto.create": "true", # create the index automatically on insert
            "es.mapping.id": "id",          # use the "id" field as the document _id
            "es.resource" : "products/kcosmetics_stocks" })
    

    Based on the error message I then installed elasticsearch-hadoop as well, and got this instead:

    java.lang.Error: Multiple ES-Hadoop versions detected in the classpath; please use only one
    jar:file:/home/andy/Desktop/spark-2.2.0-bin-hadoop2.7/jars/elasticsearch-spark-20_2.11-5.6.3.jar
    jar:file:/home/andy/Desktop/spark-2.2.0-bin-hadoop2.7/jars/elasticsearch-hadoop-mr-5.6.3.jar
    

    Spark version: 2.2; elasticsearch-spark version: 5.6.3

    3 replies    2017-10-21 08:40:03 +08:00
    badttt · #1 · 2017-10-20 21:39:58 +08:00 via iPhone · ❤️ 1
    It's a jar version conflict. Delete the ES Spark jar.
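
To check for the conflict described above before launching a job, something like the following could list the ES-Hadoop jars sitting in a Spark `jars` directory. This is only a rough sketch: the helper names `find_es_hadoop_jars` and `has_conflict` are made up for illustration, and the two filename prefixes are an assumption taken from the jar names in the error message, not an official list.

```python
from pathlib import Path

# Assumption: these prefixes cover the ES-Hadoop artifacts named in the
# "Multiple ES-Hadoop versions detected" error; other artifacts may exist.
ES_HADOOP_PREFIXES = ("elasticsearch-hadoop", "elasticsearch-spark")


def find_es_hadoop_jars(jars_dir):
    """Return the sorted file names of ES-Hadoop jars directly inside jars_dir."""
    return sorted(
        p.name
        for p in Path(jars_dir).glob("*.jar")
        if p.name.startswith(ES_HADOOP_PREFIXES)
    )


def has_conflict(jars_dir):
    """Return True when more than one ES-Hadoop jar is present in jars_dir."""
    return len(find_es_hadoop_jars(jars_dir)) > 1
```

Pointing `find_es_hadoop_jars` at the `jars` folder of the Spark install shows which files to remove; keeping exactly one of them should avoid the error.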
    ligyxy · #2 · 2017-10-21 03:40:42 +08:00 · ❤️ 1
    Of those jars, you only need to install elasticsearch-hadoop-5.6.3.jar
    SlipStupig · #3 · OP · 2017-10-21 08:40:03 +08:00
    @ligyxy That worked.
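
The thread fixed this by keeping a single jar in the Spark `jars` folder. An alternative that was not discussed above is to leave that folder alone and pass the one jar through `spark-submit --jars`. A minimal sketch, assuming `spark-submit` is on the PATH; the function name `build_submit_command`, the script name `es_job.py` and the jar path are all placeholders:

```python
import shlex


def build_submit_command(script, es_hadoop_jar):
    """Build a spark-submit argument list that ships a single ES-Hadoop jar."""
    # --jars takes a comma-separated list of jars; here exactly one is passed.
    return ["spark-submit", "--jars", es_hadoop_jar, script]


# Placeholder paths, replace with the real script and jar locations.
cmd = build_submit_command("es_job.py", "/path/to/elasticsearch-hadoop-5.6.3.jar")
print(" ".join(shlex.quote(part) for part in cmd))
```

The list form can be handed to `subprocess.run(cmd)` directly, so no shell quoting is needed when the job is actually launched.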