A question about multiprocess parallel computation in Python

    kingmo888 · 2016-05-23 15:02:22 +08:00 · 2712 views

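    I found a demo online and tidied it up, and it runs fine. That is because the code is executed under if __name__ == "__main__".

    If I run the .py file directly as a script, without putting the code under that single-file test guard, multiprocessing crashes badly.

    After running it several times, it looks as if, once the N processes are started, the script is executed from top to bottom another N times?

    PS: unrelated to the example, what I actually want to do is read a large amount of data from somewhere and then run the computation in multiple processes.

    Here is the working script: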

    import multiprocessing
    import time
    import pandas as pd
    data = {}
    
    for i in range(10):
        data[i] = pd.DataFrame(list(range(1000)),columns=['num'])
    
    def tfunc(key, data):
        data['sum'] = data['num'].cumsum()
        #data['ma5'] = pd.rolling_apply()
        for i in range(len(data)):
            a=data.at[i,'sum']
            if a>5:
                pass
            if a>10:
                pass
            time.sleep(0.001)
        return data
    
    
    def func(msg):
        for i in range(3):
            print ('func:',msg)
            #time.sleep(1)
        return "done " + msg
    
    if __name__ == "__main__":
        pool = multiprocessing.Pool(processes=4)
        result = []
        for i in range(10):
            msg = "hello %d" %(i)
            result.append(pool.apply_async(tfunc, (i,data[i] )))
        pool.close()
        pool.join()
        for res in result:
            print('result:', res.get())
        print ("Sub-process(es) done.")
    

    Here is the failing script. The difference between the two scripts is that the if __name__ == "__main__": guard has been removed.

    import multiprocessing
    import time
    import pandas as pd
    data = {}
    
    for i in range(10):
        data[i] = pd.DataFrame(list(range(1000)),columns=['num'])
    
    
    
    
    def tfunc(key, data):
        data['sum'] = data['num'].cumsum()
        #data['ma5'] = pd.rolling_apply()
        for i in range(len(data)):
            a=data.at[i,'sum']
            if a>5:
                pass
            if a>10:
                pass
            #time.sleep(0.001)
        return data
    
    
    def func(msg):
        for i in range(3):
            print ('func:',msg)
            #time.sleep(1)
        return "done " + msg
    
    
    pool = multiprocessing.Pool(processes=4)
    result = []
    for i in range(10):
        msg = "hello %d" %(i)
        result.append(pool.apply_async(tfunc, (i,data[i] )))
    pool.close()
    pool.join()
    for res in result:
        print('result:', res.get())
    print ("Sub-process(es) done.")
    
    4 replies    2016-05-23 16:57:30 +08:00
    SErHo · #1 · 2016-05-23 16:24:23 +08:00
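    kingmo888 (OP) · #2 · 2016-05-23 16:31:42 +08:00
    @SErHo Yes, it is Windows. In the current script, testing under if __name__ == "__main__": works, whether I add it or not. Without if __name__ == "__main__": it does not work, and adding it or not makes no difference.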
    joshz · #3 · 2016-05-23 16:52:12 +08:00
    Due to the way the new processes are started, the child process needs to be able to import the script containing the target function. Wrapping the main part of the application in a check for __main__ ensures that it is not run recursively in each child as the module is imported. Another approach is to import the target function from a separate script.

    Reference: https://pymotw.com/2/multiprocessing/basics.html
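
The quoted explanation can be illustrated with a minimal sketch. It assumes only the standard library; `square` and `run_pool` are hypothetical names chosen for illustration, not part of the original script. On Windows, multiprocessing starts each child as a fresh interpreter that re-imports the main script, so any unguarded top-level code, including the pool creation itself, would run again in every child. Keeping the pool inside the guard avoids that.

```python
import multiprocessing


def square(n):
    """Worker function; child processes must be able to import it from this module."""
    return n * n


def run_pool(values, processes=2):
    """Send the values to a pool of workers and collect results in input order."""
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(square, values)


# Module-level code runs again in every child that re-imports this file
# (the "spawn" start method, which is what Windows uses).
# The guard keeps pool creation out of that re-import.
if __name__ == "__main__":
    print(run_pool([1, 2, 3, 4]))
```

Moving `square` into a separate module and importing it from there is the other approach the quote mentions; the guard is still the simpler fix for a single-file script.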
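    likuku · #4 · 2016-05-23 16:57:30 +08:00
    With if __name__ == "__main__", the code under it runs only when the file is executed as the main script. Without it, everything runs in order from the first line of the file, every time the file is loaded.

    Since that is the case, why not add a main function and move everything under if __name__ == "__main__" into main()?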
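
The main() suggestion could look roughly like the sketch below. It is an illustration under stated assumptions, not the original script: pandas is replaced by `itertools.accumulate` so the example needs only the standard library, and `running_total` and `load_data` are hypothetical names. Loading the data inside main() rather than at module level also means that child processes re-importing the file do not load it again, which matters for the "read a lot of data, then compute" use case.

```python
import multiprocessing
from itertools import accumulate


def running_total(key, numbers):
    """Worker: return the key together with the cumulative sum of its numbers."""
    return key, list(accumulate(numbers))


def load_data(n_series=10, length=1000):
    """Hypothetical stand-in for reading the real data from somewhere."""
    return {i: list(range(length)) for i in range(n_series)}


def main():
    # Data is loaded here, not at module level, so child processes that
    # re-import this file do not load it a second time.
    data = load_data()
    with multiprocessing.Pool(processes=4) as pool:
        pending = [pool.apply_async(running_total, (key, series))
                   for key, series in data.items()]
        # Collect results while the pool is still open.
        results = dict(res.get() for res in pending)
    print("Sub-process(es) done:", len(results), "series processed")
    return results


if __name__ == "__main__":
    main()
```

The guard then shrinks to a single call, and importing the file from elsewhere gives access to the worker without starting any processes.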