re: Unicode character range raises an error

V2EX = way to explore

This topic was created 2738 days ago; the information in it may have evolved or changed since.

On https://repl.it/languages/python, running this regex under both python and python3 works fine:

import re;re.findall(u'[\U00010000-\U0001FFFFF]', u'\U0001f61b',re.U)
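A side note on the pattern itself: \U is always followed by exactly eight hex digits, so \U0001FFFFF (nine digits) parses as \U0001FFFF followed by a literal F, and the character class therefore also matches a plain ASCII F. This is not what triggers the errors below, and the intended upper bound was presumably \U0001FFFF. A minimal Python 3 sketch showing the difference (the variable names are only for illustration):

```python
import re

typo_pattern = u'[\U00010000-\U0001FFFFF]'      # as written in the post
intended_pattern = u'[\U00010000-\U0001FFFF]'   # presumably what was meant

# \U takes exactly eight hex digits, so the ninth "F" becomes its own character.
assert u'\U0001FFFFF' == u'\U0001FFFF' + u'F'

# That extra "F" ends up inside the character class.
print(re.findall(typo_pattern, u'F'))       # ['F']
print(re.findall(intended_pattern, u'F'))   # []
```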

But on Ubuntu 14.04 LTS, running it with python and python3.4 gives:

Python 3.4.0 (default, Jun 19 2015, 14:20:21) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re;re.findall(u'[\U00010000-\U0001FFFFF]', u'\U0001f61b',re.U)
['']
>>> 



Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re;re.findall(u'[\U00010000-\U0001FFFFF]', u'\U0001f61b',re.U)
[u'\U0001f61b']
>>>

And on CentOS:

Python 2.7.10 (default, Oct 21 2015, 19:55:03) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re 
>>> re.findall(u'[\U00010000-\U0001FFFFF]', u'\U0001f61b',re.U)  
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/python2.7/lib/python2.7/re.py", line 181, in findall
	return _compile(pattern, flags).findall(string)
  File "/usr/local/python2.7/lib/python2.7/re.py", line 251, in _compile
	raise error, v # invalid expression
sre_constants.error: bad character range
>>> 
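What this traceback most likely reflects (an assumption; the thread itself only links to an answer): on a narrow (UCS-2) Python 2 build, characters above U+FFFF are stored as UTF-16 surrogate pairs, so sre never sees the code points U+10000 and U+1FFFF. U+10000 becomes \ud800\udc00 and U+1FFFF becomes \ud83f\udfff, which turns the class into \ud800, the range \udc00-\ud83f, then \udfff and F. Since 0xDC00 > 0xD83F, that range is reversed, hence "bad character range". The Python 3 sketch below rebuilds the surrogate form by hand and reproduces the same error; to_surrogate_pair, narrow_build_class and compile_error are hypothetical helper names.

```python
import re

def to_surrogate_pair(cp):
    """Split an astral code point (>= U+10000) into its UTF-16 high/low surrogates."""
    cp -= 0x10000
    return 0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)

def narrow_build_class():
    """Rebuild [\\U00010000-\\U0001FFFF F] the way a narrow build stores it."""
    start_hi, start_lo = to_surrogate_pair(0x10000)  # 0xD800, 0xDC00
    end_hi, end_lo = to_surrogate_pair(0x1FFFF)      # 0xD83F, 0xDFFF
    return "[" + chr(start_hi) + chr(start_lo) + "-" + chr(end_hi) + chr(end_lo) + "F]"

def compile_error(pattern):
    """Return the re.error message for pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

# The code-point form compiles; the surrogate form contains the reversed range \udc00-\ud83f.
print(compile_error("[\U00010000-\U0001FFFFF]"))  # None
# ascii() avoids printing lone surrogates, which many terminals cannot encode.
print(ascii(compile_error(narrow_build_class())))
```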

Python 2.6.6 (r266:84292, Jul 23 2015, 15:22:56) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re;re.findall(u'[\U00010000-\U0001FFFFF]', u'\U0001f61b',re.U)  
[u'\U0001f61b']
>>>

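Could anyone explain what's going on here? I compared them: the re source in 2.7 is identical, and the GCC versions clearly differ, yet Python 2.6 on the same CentOS machine works fine.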
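As for why 2.6 and 2.7 behave differently on the same machine: the GCC version and the re source are probably irrelevant; what matters is whether each interpreter was built with narrow (UCS-2) or wide (UCS-4) Unicode storage. The /usr/local/python2.7 path in the traceback suggests a self-compiled interpreter, and Python 2's configure defaults to a narrow build unless --enable-unicode=ucs4 is passed, whereas the distribution's Python 2.6 is presumably wide. That is an inference, not something the thread confirms. A quick check to run under each interpreter (unicode_build_info is a hypothetical helper):

```python
import sys

def unicode_build_info():
    """Report whether this interpreter stores astral characters as surrogate pairs."""
    return {
        # 0xffff on a narrow (UCS-2) build; 0x10ffff on a wide (UCS-4) build or Python 3.3+
        "maxunicode": hex(sys.maxunicode),
        "narrow_build": sys.maxunicode == 0xFFFF,
        # 2 on a narrow build (surrogate pair), 1 otherwise
        "emoji_length": len(u'\U0001f61b'),
    }

print(unicode_build_info())
```

On a narrow build this should report 0xffff, True and 2. Python 3.3 and later no longer have narrow builds, so Python 3 always reports the wide values.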

Addendum 1 · 2017-06-01 15:08:45 +08:00

Found the problem:
https://stackoverflow.com/questions/10798605/warning-raised-by-inserting-4-byte-unicode-to-mysql
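The linked Stack Overflow question is about stripping characters outside the BMP (4 bytes in UTF-8) before inserting text into MySQL. One way to write such a regex portably is to choose the pattern according to sys.maxunicode, matching a surrogate pair on narrow builds and a plain code-point range elsewhere. A sketch under that assumption; ASTRAL_RE and strip_astral are hypothetical names, and replacing with U+FFFD is an arbitrary choice:

```python
import re
import sys

if sys.maxunicode > 0xFFFF:
    # Wide (UCS-4) build or Python 3.3+: each astral code point is a single character.
    ASTRAL_RE = re.compile(u'[\U00010000-\U0010FFFF]')
else:
    # Narrow (UCS-2) build: an astral code point is a high surrogate followed by a low one.
    ASTRAL_RE = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')

def strip_astral(text, replacement=u'\uFFFD'):
    """Replace every character outside the BMP (4 bytes in UTF-8) with replacement."""
    return ASTRAL_RE.sub(replacement, text)

print(strip_astral(u'abc\U0001f61bdef') == u'abc\ufffddef')  # True
```

Matching the full U+10000 to U+10FFFF range, rather than stopping at U+1FFFF, covers every character that needs 4 bytes in UTF-8.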

1 reply • 2017-06-01 15:51:40 +08:00

wwqgtxx

2017-06-01 15:51:40 +08:00

wwq@ubuntu:~$ python3.5
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re;re.findall(u'[\U00010000-\U0001FFFFF]', u'\U0001f61b',re.U)
['😛']
>>>
wwq@ubuntu:~$ python3.6
Python 3.6.1 (default, Apr 22 2017, 20:17:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re;re.findall(u'[\U00010000-\U0001FFFFF]', u'\U0001f61b',re.U)
['😛']
>>>
wwq@ubuntu:~$ python2.7
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re;re.findall(u'[\U00010000-\U0001FFFFF]', u'\U0001f61b',re.U)
[u'\U0001f61b']
>>>