Python Concurrency: Using Multithreading and Multiprocessing
One aspect of coding in Python that we have yet to discuss in any great detail is how to optimise the execution performance of our simulations. While NumPy, SciPy and pandas are extremely useful in this regard when considering vectorised code, we aren't able to use these tools effectively when building event-driven systems. Are there any other means available to us to speed up our code? The answer is yes - but with caveats!
In this article we are going to look at the different models of parallelism that can be introduced into our Python programs. These models work particularly well for simulations that do not need to share state. Monte Carlo simulations used for options pricing and backtesting simulations of various parameters for algorithmic trading fall into this category.
In particular we are going to consider the Threading library and the Multiprocessing library.
Concurrency in Python
One of the most frequently asked questions from beginning Python programmers when they explore multithreaded code for optimisation of CPU-bound code is "Why does my program run slower when I use multiple threads?".
The expectation is that on a multi-core machine a multithreaded code should make use of these extra cores and thus increase overall performance. Unfortunately the internals of the main Python interpreter, CPython, negate the possibility of true multi-threading due to a process known as the Global Interpreter Lock (GIL).
The GIL is necessary because the Python interpreter is not thread safe. This means that there is a globally enforced lock when trying to safely access Python objects from within threads. At any one time only a single thread can acquire a lock for a Python object or C API. The interpreter will reacquire this lock for every 100 bytecodes of Python instructions and around (potentially) blocking I/O operations. Because of this lock CPU-bound code will see no gain in performance when using the Threading library, but it will likely gain performance increases if the Multiprocessing library is used.
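The 100-bytecode check interval is a CPython 2 detail (tunable via `sys.setcheckinterval`); CPython 3.2 replaced it with a time-based switch interval. A small sketch of inspecting and tuning it on a modern interpreter:

```python
import sys

# CPython 3.2+ replaced the bytecode-count "check interval" with a
# time-based switch interval (default 5 ms): a thread holding the GIL
# is asked to drop it roughly this often so another thread can run.
print(sys.getswitchinterval())   # 0.005 by default

# Requesting more frequent switching can reduce latency for I/O-bound
# threads, at the cost of more lock churn for CPU-bound ones.
sys.setswitchinterval(0.001)
print(sys.getswitchinterval())
```

Note this only changes how often the GIL changes hands; it does not allow two threads to execute bytecode simultaneously.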
Parallelisation Libraries Implementation
We are now going to utilise the above two separate libraries to attempt a parallel optimisation of a "toy" problem.
Threading Library
Above we alluded to the fact that Python on the CPython interpreter does not support true multi-core execution via multithreading. However, Python DOES have a Threading library. So what is the benefit of using the library if we (supposedly) cannot make use of multiple cores?
Many programs, particularly those relating to network programming or data input/output (I/O) are often network-bound or I/O bound. This means that the Python interpreter is awaiting the result of a function call that is manipulating data from a "remote" source such as a network address or hard disk. Such access is far slower than reading from local memory or a CPU-cache.
Hence, one means of speeding up such code if many data sources are being accessed is to generate a thread for each data item needing to be accessed.
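A minimal sketch of that one-thread-per-item pattern, with `time.sleep` standing in for a slow network read (the item names here are invented for illustration):

```python
import threading
import time

def fetch(item, results, idx):
    # Stand-in for a slow, I/O-bound call such as downloading a URL;
    # sleep releases the GIL just as blocking I/O would.
    time.sleep(0.2)
    results[idx] = "payload-for-%s" % item

items = ["url-a", "url-b", "url-c", "url-d"]
results = [None] * len(items)

start = time.time()
# One thread per data item needing to be accessed
threads = [
    threading.Thread(target=fetch, args=(item, results, i))
    for i, item in enumerate(items)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The four 0.2s "downloads" overlap, so total wall time is
# close to 0.2s rather than 0.8s
print("elapsed: %.2fs" % (time.time() - start))
print(results)
```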
For example, consider a Python code that is scraping many web URLs. Given that each URL will have an associated download time well in excess of the CPU processing capability of the computer, a single-threaded implementation will be significantly I/O bound.
By adding a new thread for each download resource, the code can download multiple data sources in parallel and combine the results at the end of every download. This means that each subsequent download is not waiting on the download of earlier web pages. In this case the program is now bound by the bandwidth limitations of the client/server(s) instead.
However, many financial applications ARE CPU-bound since they are highly numerically intensive. They often involve large-scale numerical linear algebra solutions or random statistical draws, such as in Monte Carlo simulations. Thus as far as Python and the GIL are concerned, there is no benefit to using the Python Threading library for such tasks.
Python Implementation
The following code illustrates a multithreaded implementation for a "toy" code that sequentially adds numbers to lists. Each thread creates a new list and adds random numbers to it. This has been chosen as a toy example since it is CPU heavy.
The following code will outline the interface for the Threading library but it will not grant us any additional speedup beyond that obtainable in a single-threaded implementation. When we come to use the Multiprocessing library below, we will see that it will significantly decrease the overall runtime.
Let's examine how the code works. Firstly we import the threading library. Then we create a function list_append that takes three parameters. The first, count, determines the size of the list to create. The second, id, is the ID of the &job& (which can be useful if we are writing debug info to the console). The third parameter, out_list, is the list to append the random numbers to.
The __main__ function creates a size of 10^7 (ten million random numbers) and uses two threads to carry out the work. It then creates a jobs list, which is used to store the separate threads. The threading.Thread object takes the list_append function and its arguments as parameters, and the resulting thread is appended to the jobs list.
Finally, the jobs are sequentially started and then sequentially &joined&. The join() method blocks the calling thread (i.e. the main Python interpreter thread) until the thread has terminated. This ensures that all of the threads are complete before printing the completion message to the console:
```python
# thread_test.py
import random
import threading


def list_append(count, id, out_list):
    """
    Creates an empty list and then appends a
    random number to the list 'count' number
    of times. A CPU-heavy operation!
    """
    for i in range(count):
        out_list.append(random.random())


if __name__ == "__main__":
    size = 10000000   # Number of random numbers to add
    threads = 2   # Number of threads to create

    # Create a list of jobs and then iterate through
    # the number of threads appending each thread to
    # the job list
    jobs = []
    for i in range(0, threads):
        out_list = list()
        thread = threading.Thread(target=list_append,
                                  args=(size, i, out_list))
        jobs.append(thread)

    # Start the threads (i.e. calculate the random number lists)
    for j in jobs:
        j.start()

    # Ensure all of the threads have finished
    for j in jobs:
        j.join()

    print "List processing complete."
```
We can time this code using the following console call:

```
time python thread_test.py
```

It produces the following output:

```
List processing complete.

real    0m2.003s
user    0m1.838s
sys     0m0.161s
```
Notice that the user and sys both approximately sum to the real time. This is indicative that we gained no benefit from using the Threading library. If we had then we would expect the real time to be significantly less. These concepts within concurrent programming are usually known as CPU-time and wall-clock time respectively.
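Python exposes both clocks directly, so the same distinction can be seen without the shell's `time` command. A minimal sketch using the Python 3 names (`time.perf_counter` for wall-clock time, `time.process_time` for CPU time):

```python
import time

def burn(n):
    # CPU-bound busy work
    total = 0
    for i in range(n):
        total += i * i
    return total

wall_start = time.perf_counter()   # wall-clock time
cpu_start = time.process_time()    # CPU time (user + sys) of this process

burn(500000)

wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

# For single-threaded CPU-bound code the two are roughly equal, which is
# exactly the "user + sys is approximately real" signature seen above.
print("wall-clock: %.4fs  cpu: %.4fs" % (wall, cpu))
```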
Multiprocessing Library
In order to actually make use of the extra cores present in nearly all modern consumer processors we can instead use the Multiprocessing library. This works in a fundamentally different way to the Threading library, even though the syntax of the two is extremely similar.
The Multiprocessing library actually spawns multiple operating system processes for each parallel task. This nicely side-steps the GIL, by giving each process its own Python interpreter and thus own GIL. Hence each process can be fed to a separate processor core and then regrouped at the end once all processes have finished.
There are some drawbacks, however. Spawning extra processes introduces I/O overhead as data is having to be shuffled around between processors. This can add to the overall run-time. However, assuming the data is restricted to each process, it is possible to gain significant speedup. Of course, one must always be aware of Amdahl's Law!
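The overhead caveat is commonly quantified by Amdahl's law: if only a fraction p of the run-time can be parallelised, then n processes can never deliver more than 1 / ((1 - p) + p/n) speed-up. A quick sketch:

```python
def amdahl_speedup(p, n):
    """Best-case speed-up when a fraction p of the work runs on n processes."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the work parallelised, four processes give well under 4x,
# and no number of processes can beat 1 / (1 - 0.95) = 20x.
print(round(amdahl_speedup(0.95, 4), 2))    # 3.48
print(round(amdahl_speedup(0.95, 1000), 2))
```

This is why the near-linear scaling seen in toy examples rarely carries over to larger programs.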
Python Implementation
The only modifications needed for the Multiprocessing implementation include changing the import line and the functional form of the multiprocessing.Process line. In this case the arguments to the target function are passed separately. Beyond that the code is almost identical to the Threading implementation above:
```python
# multiproc_test.py
import random
import multiprocessing


def list_append(count, id, out_list):
    """
    Creates an empty list and then appends a
    random number to the list 'count' number
    of times. A CPU-heavy operation!
    """
    for i in range(count):
        out_list.append(random.random())


if __name__ == "__main__":
    size = 10000000   # Number of random numbers to add
    procs = 2   # Number of processes to create

    # Create a list of jobs and then iterate through
    # the number of processes appending each process to
    # the job list
    jobs = []
    for i in range(0, procs):
        out_list = list()
        process = multiprocessing.Process(target=list_append,
                                          args=(size, i, out_list))
        jobs.append(process)

    # Start the processes (i.e. calculate the random number lists)
    for j in jobs:
        j.start()

    # Ensure all of the processes have finished
    for j in jobs:
        j.join()

    print "List processing complete."
```
We can once again time this code using a similar console call:

```
time python multiproc_test.py
```

We receive the following output:

```
List processing complete.

real    0m1.045s
user    0m1.824s
sys     0m0.231s
```
In this case you can see that while the user and sys times have remained approximately the same, the real time has dropped by a factor of almost two. This makes sense since we're using two processes. Scaling to four processes while halving the list size for comparison gives the following output (under the assumption that you have at least four cores!):
```
List processing complete.

real    0m0.540s
user    0m1.792s
sys     0m0.269s
```
This is an approximate 3.8x speed-up with four processes. However, we must be careful of generalising this to larger, more complex programs. Data transfer, hardware cache-levels and other issues will almost certainly reduce this sort of performance gain in &real& codes.
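One caveat worth noting about the Process-based example above: each out_list is built in the child process's own address space, so the parent never actually sees the random numbers. A sketch (not the article's code) of one common way to get results back, using multiprocessing.Pool in Python 3 syntax, where return values are pickled back to the parent:

```python
import multiprocessing
import random

def make_list(count):
    # Same CPU-heavy toy work as list_append, but the list is returned
    # so Pool can ship it back to the parent process.
    return [random.random() for _ in range(count)]

if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        # Each worker builds its list in its own process; Pool pickles
        # the return values back to the parent.
        lists = pool.map(make_list, [1000, 1000])
    print([len(lst) for lst in lists])  # [1000, 1000]
```

The pickling cost is exactly the inter-process data shuffling mentioned above, so returning large lists this way eats into the speed-up.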
In later articles we will be modifying the Event-Driven Backtester to use parallel techniques in order to improve the ability to carry out multi-dimensional parameter optimisation studies.
Related reading: Why does the global interpreter lock (GIL) affect multithreading? (Baidu Tieba, Python forum)
Q: Why does the global interpreter lock (GIL) affect multithreading?
I've seen articles online saying the GIL hurts multithreaded performance, but the example code in Core Python Programming seems fine: it really did cut the elapsed time. Can anyone point out what the issue is?
For pure computation you will feel the GIL's limits; if there are I/O operations involved, the impact is relatively small.
It simply cannot use multiple CPU cores; you can use multiple processes to speed things up instead.
Use multithreading for I/O-bound work and multiprocessing for CPU-bound work.
Python has plenty of I/O-bound workloads, which should use threading; for CPU-bound work, multiprocessing is best.
Pyston and PyPy should eventually solve the global-lock problem.
Will sustained, heavy I/O-bound operations in Python damage the hard disk?
Can a single Python process use multiple CPU cores? Test conclusions | Vimer's Programming World
A long time ago I read articles online saying:

Python has the GIL, so within a single process, even using multiple threads you cannot exploit multiple cores; at any given moment Python bytecode runs on only one CPU.

I used to take this as gospel, until today, while load-testing my own Python server, I found a single Python process's CPU usage reaching 120%.

When programming in C++, a multithreaded process going over 100% CPU is entirely normal, but for Python this seemed to contradict those articles.

So I decided to test it for myself, and wrote the following code:
```python
from thread import start_new_thread

def worker():
    while 1:
        #print 1
        pass

for it in range(0, 15):
    start_new_thread(worker, ())

raw_input()
```
Test environment: CentOS 6.4 64-bit, Python 2.7.

The results: the Python process with pid 31199 clearly reached 787.9% CPU, close to the theoretical maximum of 800%, and each of the eight CPUs showed close to 100% utilisation.

Going by this test result alone, the conclusion would be that a single Python process using multiple threads really can make use of multiple CPU cores, which is not what is claimed online.

That said, if any readers have studied this area more deeply, criticism and corrections are very welcome. Thanks!
Addendum, August 15

Thanks to la.onger and several other commenters for the discussion. I have now added a test that measures the total time for a pure-CPU computation done by one thread versus multiple threads. The code:
```python
import time
from threading import Thread

LOOPS = 1000000
THREAD_NUM = 10
STEP_SIZE = 1  # original value lost in extraction; any positive integer works


class Test(object):
    num = 1

    def work(self):
        for it in xrange(0, LOOPS):
            if self.num > STEP_SIZE:
                self.num -= STEP_SIZE
            else:
                self.num += STEP_SIZE

    def one_thread_test(self):
        self.num = 1

        begin_time = time.time()
        for v in xrange(0, THREAD_NUM):
            self.work()
        print 'time passed: ', time.time() - begin_time

    def multi_thread_test(self):
        self.num = 1

        t_list = []
        begin_time = time.time()
        for v in xrange(0, THREAD_NUM):
            t = Thread(target=self.work)
            t.start()
            t_list.append(t)

        for it in t_list:
            it.join()
        print 'time passed: ', time.time() - begin_time


t = Test()
t.one_thread_test()
t.multi_thread_test()
```
The output:

```
time passed:  3.
time passed:  7.
```

With multiple threads, it was actually slower than without.
For comparison with C++, I also wrote the equivalent C++ code:
```cpp
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <iostream>
#include <memory>
#include <sstream>
#include <algorithm>
#include <string>
#include <vector>
#include <set>
#include <map>
#include <sys/time.h>
#include <pthread.h>

using namespace std;

#define LOOPS 1000000
#define THREAD_NUM 10
#define STEP_SIZE 1  // original value lost in extraction

class Test
{
public:
    Test() {}
    virtual ~Test() {}

    void one_thread_test() {
        this->num = 1;

        gettimeofday(&m_tpstart, NULL);
        for (size_t i = 0; i < THREAD_NUM; ++i)
        {
            work();
        }
        gettimeofday(&m_tpend, NULL);

        long long timeuse = 1000000 * (long long)(m_tpend.tv_sec - m_tpstart.tv_sec)
                            + m_tpend.tv_usec - m_tpstart.tv_usec;  // microseconds
        printf("time passed: %f\n", ((double)timeuse) / 1000000);
    }

    void multi_thread_test() {
        this->num = 1;
        int ret;

        vector<pthread_t> vecThreadId;  // ids of all threads

        pthread_attr_t attr;
        pthread_attr_init(&attr);
        // threads must be created joinable for pthread_join below to be valid
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

        gettimeofday(&m_tpstart, NULL);

        pthread_t threadId;
        for (int i = 0; i < THREAD_NUM; i++)
        {
            ret = pthread_create(&threadId, &attr, Test::static_run_work, (void*)this);
            if (ret != 0) {
                continue;
            }
            vecThreadId.push_back(threadId);
        }
        pthread_attr_destroy(&attr);

        for (vector<pthread_t>::iterator it = vecThreadId.begin(); it != vecThreadId.end(); ++it)
        {
            pthread_join(*it, NULL);
        }

        gettimeofday(&m_tpend, NULL);

        long long timeuse = 1000000 * (long long)(m_tpend.tv_sec - m_tpstart.tv_sec)
                            + m_tpend.tv_usec - m_tpstart.tv_usec;  // microseconds
        printf("time passed: %f\n", ((double)timeuse) / 1000000);
    }

    void work() {
        for (size_t i = 0; i < LOOPS; ++i) {
            if (this->num > STEP_SIZE) {
                this->num -= STEP_SIZE;
            }
            else {
                this->num += STEP_SIZE;
            }
        }
    }

    static void* static_run_work(void *args) {
        Test* t = (Test*) args;
        t->work();
        return NULL;
    }

public:
    int64_t num;
    struct timeval m_tpstart, m_tpend;
};

int main(int argc, char **argv)
{
    Test test;

    test.one_thread_test();
    test.multi_thread_test();
    return 0;
}
```
The output:

```
time passed: 0.036114
time passed: 0.000513
```

As you can see, the C++ version really does gain a great deal of performance.

From this it is clear that Python multithreading is indeed weaker when it comes to exploiting multi-core CPUs.
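For completeness, a sketch of the same busy-loop test written with multiprocessing (Python 3 syntax, not from the original post): because each process gets its own interpreter and its own GIL, this variant does scale across cores.

```python
import time
from multiprocessing import Process

LOOPS = 1000000
PROC_NUM = 4

def work():
    # Same kind of CPU-bound busy loop, run in a separate process
    total = 0
    for i in range(LOOPS):
        total += i * i
    return total

if __name__ == "__main__":
    start = time.time()
    procs = [Process(target=work) for _ in range(PROC_NUM)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print("multi-process time passed: %.3fs" % (time.time() - start))
```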