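Today I compared Python's re with Google's re2. Running findall on a string 1,000,000 times took about 57 s with Python's re and only about 11 s with re2 (timings measured with cProfile). So here's the question: Tornado relies heavily on re for argument lookup, invalid-data filtering, and route matching. Would swapping in re2 bring a comparable performance gain there?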
python-re2 : https://pypi.python.org/pypi/re2/
google re2: https://github.com/google/re2
BTW, on installing python-re2: first build and install the RE2 source from GitHub, then install re2 with pip.
Test code (a minimal Tornado app):
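For anyone who wants to reproduce the findall comparison mentioned at the top, here is a rough sketch. The pattern, the input string, and the repeat count are made up, since the post doesn't say what was actually searched, and `time_findall` is just a helper name for this example. It also assumes python-re2 exposes the same `compile`/`findall` interface as `re`, and it uses `timeit` rather than cProfile, so the absolute numbers won't match the ones quoted.

```python
import re
import timeit

try:
    import re2  # the python-re2 binding; assumed to mirror the `re` API
except ImportError:
    re2 = None  # not installed: only the standard library gets timed


def time_findall(module, pattern, text, repeat):
    """Return the seconds spent on `repeat` findall calls with a precompiled pattern."""
    compiled = module.compile(pattern)
    return timeit.timeit(lambda: compiled.findall(text), number=repeat)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only for illustration.
    pattern = r"[0-9]+"
    text = "/getgamebyid/12345?page=67&size=89"
    repeat = 100000
    print("re : %.3fs" % time_findall(re, pattern, text, repeat))
    if re2 is not None:
        print("re2: %.3fs" % time_findall(re2, pattern, text, repeat))
```

The pattern is compiled once outside the timed loop, so this measures matching only. If the original test called `re.findall(pattern, text)` directly, part of its time would have gone to the pattern cache lookup as well.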
from datetime import date

import tornado.escape
import tornado.ioloop
import tornado.web


class VersionHandler(tornado.web.RequestHandler):
    def get(self):
        # Tornado serializes a dict passed to write() as JSON.
        response = {'version': '3.5.1',
                    'last_build': date.today().isoformat()}
        self.write(response)


class GetGameByIdHandler(tornado.web.RequestHandler):
    def get(self, id):
        # `id` is the group captured by the route regex below.
        response = {'id': int(id),
                    'name': 'Crazy Game',
                    'release_date': date.today().isoformat()}
        self.write(response)


application = tornado.web.Application([
    (r"/getgamebyid/([0-9]+)", GetGameByIdHandler),
    (r"/version", VersionHandler)
])

if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()
python-re:
[server@localhost ~]$ webbench -c 1000 -t 60 -2 --get http://192.168.1.108:8888/getgamebyid/1
Webbench - Simple Web Benchmark 1.5
Copyright (c) Radim Kolar 1997-2004, GPL Open Source Software.
Benchmarking: GET http://192.168.1.108:8888/getgamebyid/1 (using HTTP/1.1)
1000 clients, running 60 sec.
Speed=80840 pages/min, 349074 bytes/sec.
Requests: 80247 susceed, 593 failed.
python-re2:
Benchmarking: GET http://192.168.1.108:8888/getgamebyid/1 (using HTTP/1.1)
1000 clients, running 60 sec.
Speed=81921 pages/min, 0 bytes/sec.
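It looks like there is a small performance gain, but the re2 pattern object has no groupindex attribute, so Tornado raises an error. I'll probably have to dig into the python-re2 source: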
  File "/usr/lib64/python2.7/site-packages/tornado/web.py", line 1994, in _find_handler
    if spec.regex.groupindex:
AttributeError: 're2.Pattern' object has no attribute 'groupindex'
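One possible workaround, sketched below, is to wrap the compiled pattern in a small proxy that supplies `groupindex` and forwards everything else. `PatternWithGroupIndex` is a hypothetical helper, not part of Tornado or python-re2. It borrows the named-group map from the standard `re` module, so it assumes the route pattern is valid in both engines. The post doesn't show how re2 was wired into Tornado, so where such a wrapper would be applied is left open.

```python
import re


class PatternWithGroupIndex(object):
    """Proxy for a compiled pattern that lacks `groupindex` (hypothetical helper)."""

    def __init__(self, compiled, pattern_text):
        self._compiled = compiled
        # Assumption: the pattern also compiles under the stdlib `re`,
        # so its name -> group-number map can be taken from there.
        self.groupindex = dict(re.compile(pattern_text).groupindex)

    def __getattr__(self, name):
        # Only reached for attributes the proxy itself does not define;
        # match(), search(), findall(), etc. go to the wrapped pattern.
        return getattr(self._compiled, name)
```

With this in place, a check like `if spec.regex.groupindex:` would see an empty dict for routes without named groups, and a name-to-index mapping otherwise. Whether the match objects returned by re2 then behave the way Tornado expects (for example `groupdict()` and `groups()`) is a separate question that would still need testing.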