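I've run into some difficulties with my graduation project, and I'm new to Hadoop. I have some text files of scraped business information, in roughly this format: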
{
"status": 0,
"message": "ok",
"result": {
"name": "台北纯K(江北店)",
"location": {
"lng": 106.55089,
"lat": 29.585814
},
"address": "江北区北城天街46号九街高屋1楼(近同创国际)",
"telephone": "(023)67116711",
"uid": "d874d4cbb060e92e2bd7ab37",
"detail_info": {
"tag": "休闲娱乐;ktv",
"detail_url": "http://api.map.baidu.com/place/detail?uid\u003dd874d4cbb060e92e2bd7ab37\u0026output\u003dhtml\u0026source\u003dplaceapi_v2",
"type": "life",
"price": "95",
"overall_rating": "4.5",
"service_rating": "0",
"environment_rating": "0",
"image_num": "30",
"comment_num": "100",
"shop_hours": "11:00-",
"description": "门店介绍:"
}
}
}
{
"status": 0,
"message": "ok",
"result": {
"name": "欢乐迪KTV(未来国际店)",
"location": {
"lng": 106.53938,
"lat": 29.580435
},
"address": "观音桥步行街未来国际大厦5楼",
"telephone": "023-67704888",
"uid": "f696f3c267d7b4f21b11d5cd",
"detail_info": {
"tag": "休闲娱乐;ktv",
"detail_url": "http://api.map.baidu.com/place/detail?uid\u003df696f3c267d7b4f21b11d5cd\u0026output\u003dhtml\u0026source\u003dplaceapi_v2",
"type": "life",
"price": "23",
"overall_rating": "5.0",
"service_rating": "3.4",
"environment_rating": "3.6",
"image_num": "30",
"groupon_num": "10",
"comment_num": "1052",
"shop_hours": "13-00-次日凌晨3:30",
"alias": "HappydayKTV",
"description": "门店介绍:"
}
}
}
Now I'd like to use Hadoop to extract a few of the fields and output them, for example in the following format:
name overall_rating address
台北纯K(江北店) 4.5 江北区北城天街46号九街高屋1楼(近同创国际)
欢乐迪KTV(未来国际店) 5.0 观音桥步行街未来国际大厦5楼
Then I want to sort the records by rating and output the sorted result. Could anyone who knows this give me some guidance? Thanks~
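For reference, the extraction and sort described above can be tried outside Hadoop first. This is only a minimal Python sketch, not a Hadoop job: it assumes the whole file fits in memory and that the records are JSON objects written back to back, as in the sample. The helper names `iter_json_objects`, `extract`, and `sort_by_rating` are made up for illustration.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON object from text that holds several objects back to back."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def extract(record):
    """Pick name, overall_rating (as a float) and address out of one record."""
    result = record["result"]
    detail = result.get("detail_info", {})
    # Assumption: a missing rating is treated as 0
    rating = float(detail.get("overall_rating", "0"))
    return (result["name"], rating, result["address"])


def sort_by_rating(text):
    """Return (name, rating, address) rows, highest rating first."""
    rows = [extract(record) for record in iter_json_objects(text)]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Calling `sort_by_rating(open(path, encoding="utf-8").read())` on a file like the sample would give one tuple per business, ready to be printed as tab-separated lines.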
1
woshifyz 2015-05-24 22:19:21 +08:00
Are you sure what you need is M/R, and not an external sort?
|
2
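alschw OP @woshifyz Thanks for the reply. I looked up external sorting (the mapper first sorts each chunk of data, and the reducer's shuffle merges the sorted chunks), and that would work too. But I've only just started with Hadoop and really don't understand how to write the map and reduce functions. It took me a long time to get the environment set up; I can now compile and run wordcount in Eclipse on Windows. I just don't know how to go about processing the data I described above. 0.0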
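The external-sort idea mentioned in this reply (sort each chunk, then merge the sorted chunks) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: in-memory lists stand in for the on-disk chunk files a real external sort would use, and the function names `sort_chunks` and `merge_sorted_chunks` are hypothetical.

```python
import heapq


def sort_chunks(records, chunk_size):
    """'Map' side: split the input into chunks and sort each chunk independently."""
    return [
        sorted(records[i:i + chunk_size])
        for i in range(0, len(records), chunk_size)
    ]


def merge_sorted_chunks(chunks):
    """'Reduce' side: k-way merge of chunks that are already sorted."""
    return list(heapq.merge(*chunks))
```

With records such as `(rating, name)` tuples, `merge_sorted_chunks(sort_chunks(records, 2))` returns the same ascending order as sorting everything at once, while only ever sorting two records at a time.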
|
3
lujiajing1126 2015-05-25 09:21:49 +08:00 via Android
Use Hive →_→
|
5
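alschw OP @woshifyz My idea is this: use Hadoop to process the rating data in the text, compute a simple custom score, and output the records sorted by it. Only a few of the keys are needed: name, overall_rating, comment_num, and address. Take the values of overall_rating (O) and comment_num (C), apply the formula "score=O*0.5+C/(C+200)*0.5", sort by score, and output one line per record in descending order of score, in the format "score name address". I really don't know how to do this. Could you give me some help?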
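The scoring and ranking step described above could look roughly like this in Python. It is a sketch under assumptions, not a Hadoop job: records are taken to be already parsed into dicts (for example with `json.loads`), missing values default to 0, the output is tab-separated with the score rounded to four decimals, and the names `score`, `map_record`, and `rank` are invented for illustration.

```python
def score(overall_rating, comment_num):
    """score = O*0.5 + C/(C+200)*0.5, the formula from the post."""
    return overall_rating * 0.5 + comment_num / (comment_num + 200) * 0.5


def map_record(record):
    """'Map' step: turn one parsed record into a (score, name, address) tuple."""
    result = record["result"]
    detail = result.get("detail_info", {})
    # Assumption: a missing overall_rating or comment_num counts as 0
    o = float(detail.get("overall_rating", 0))
    c = int(detail.get("comment_num", 0))
    return (score(o, c), result["name"], result["address"])


def rank(records):
    """Sort by score, highest first, and format each row as 'score name address'."""
    rows = sorted(
        (map_record(record) for record in records),
        key=lambda row: row[0],
        reverse=True,
    )
    return ["%.4f\t%s\t%s" % row for row in rows]
```

For the two sample businesses this gives a score of about 2.9201 for 欢乐迪KTV (O=5.0, C=1052) and about 2.4167 for 台北纯K (O=4.5, C=100). In an actual MapReduce job the same `map_record` logic would sit in the mapper with the score as the output key; since the framework sorts keys in ascending order by default, a descending result would need a custom comparator or a negated key.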
|