I'm using Spark to read from and write to Elasticsearch and I'm getting this error:
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.ClassNotFoundException: org.elasticsearch.hadoop.mr.LinkedMapWritable
The code is as follows:
from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("elasticsearch-hadoop")
sc = SparkContext(conf=conf)
# read in ES index/type "products/kcosmetics"
es_rdd = sc.newAPIHadoopRDD(
    inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={"es.resource": "products"})
print(es_rdd.first())

kcosmetics_availability = es_rdd.map(lambda item: ("key", {
    'id': item[0],  # _id from products/kcosmetics
    'availability': item[1]['availability']
}))

# write the results to "products/kcosmetics_stocks"
kcosmetics_availability.saveAsNewAPIHadoopFile(
    path='-',
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={
        "es.index.auto.create": "true",  # create the target index automatically on insert
        "es.mapping.id": "id",           # use the "id" field as the document _id
        "es.resource": "products/kcosmetics_stocks"})
Following that error message, I went and installed elasticsearch-hadoop as well, which then gave me:
java.lang.Error: Multiple ES-Hadoop versions detected in the classpath; please use only one
jar:file:/home/andy/Desktop/spark-2.2.0-bin-hadoop2.7/jars/elasticsearch-spark-20_2.11-5.6.3.jar
jar:file:/home/andy/Desktop/spark-2.2.0-bin-hadoop2.7/jars/elasticsearch-hadoop-mr-5.6.3.jar
Spark version: 2.2; elasticsearch-spark version: 5.6.3.
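A quick way to confirm the duplicate is to list every ES-Hadoop artifact in the jars directory named by the error; after cleanup, exactly one should remain. A minimal sketch (the directory path is taken from the error output above):

import glob
import os

spark_jars = "/home/andy/Desktop/spark-2.2.0-bin-hadoop2.7/jars"
# Spark puts everything in this directory on the classpath, so any
# elasticsearch-* duplicates here trigger the "Multiple ES-Hadoop
# versions" error.
for jar in sorted(glob.glob(os.path.join(spark_jars, "elasticsearch-*.jar"))):
    print(jar)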
#1 badttt 2017-10-20 21:39:58 +08:00 via iPhone
It's a jar version conflict; delete the es-spark one.
#2 ligyxy 2017-10-21 03:40:42 +08:00
You only need to install the elasticsearch-hadoop-5.6.3.jar from that package.
#3 SlipStupig (OP)
@ligyxy That works now.
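For later readers, the resolution was to keep a single ES-Hadoop artifact on Spark's classpath: elasticsearch-hadoop-5.6.3.jar is the uber jar that already bundles the Map/Reduce and Spark integrations, so the separate elasticsearch-spark and elasticsearch-hadoop-mr jars have to go. A minimal sketch to verify that the class from the original ClassNotFoundException now resolves (sc._jvm is a PySpark internal, used here only as a quick interactive check):

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("elasticsearch-hadoop")
sc = SparkContext(conf=conf)

# Resolve the class named in the original ClassNotFoundException; this
# raises if no (or a broken) ES-Hadoop jar is on the classpath.
sc._jvm.java.lang.Class.forName("org.elasticsearch.hadoop.mr.LinkedMapWritable")
print("ES-Hadoop connector is on the classpath")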