This post is a follow-up to my previous post, Building OpenCV with OpenMP. This one is all about analysing the performance of OpenCV with and without OpenMP.
The code used here is the popular N-body simulation, which models gravitational effects in large particle systems. I have used the source code available on Mark Harris's GitHub repo, with a few modifications.
Modifications made are as follows:
- A 2-D coordinate system in place of a 3-D one, for visualisation in OpenCV
- Integer precision for the coordinate calculations used in visualisation
- An integer time step dt, which determines the speed of the simulation
- Circles drawn with radius = 0, giving particles the size of a single pixel
- Dynamic window size declaration and adaptation
- A rewritten random coordinate generation method for the 2-D coordinate system
The modified source code is available on my github profile/visualise-nbody-opencv
Benchmarking:
Approach #1: Total Execution Time
The total execution time for computing N iterations of the particle system is a direct indicator of performance for the N-body simulation. In general terms, if N iterations take time T on a single core, they should ideally take time T/4 on a quad-core CPU. This is not always achieved in practice (for example, parts of a program may remain serial, and managing threads adds overhead), but it is a good parameter to evaluate.
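In symbols (notation introduced here for clarity), with T_1 the time on one core and T_p the time on p cores, the ideal case above and the usual measures of how close a run comes to it are:

```latex
T_p^{\text{ideal}} = \frac{T_1}{p}, \qquad
S_p = \frac{T_1}{T_p}, \qquad
E_p = \frac{S_p}{p}
```

For p = 4 the ideal case gives T_4 = T_1/4, a speedup S_4 = 4 and an efficiency E_4 = 1; any serial work or threading overhead pushes S_p below p and E_p below 1.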
Approach #2: CPU Resource Monitor
Most operating systems ship with a resource monitor that plots CPU utilisation over time. It is an effective tool for visually examining per-core CPU usage.
A combination of approaches #1 and #2 is used to compare OpenCV with and without OpenMP parallelization.
Here is what my N-body simulation looks like with N = 2500 particles:
The following benchmarks were run for 1000 iterations of 2500 particles.
EVALUATION:
- Without OpenMP Parallelization
Time:
time is a Unix command that measures how long a given command takes to run. It reports three figures: real (wall-clock time), user (CPU time spent in user mode) and sys (CPU time spent in the kernel).
time ./nbody
real 2m7.258s
user 1m37.096s
sys 0m0.500s
CPU Core Usage:
Figure: Calculations running on a single core, with occasional core switching.
The CPU usage graph shows that at any given time only a single core is being used for the calculation. The OS also occasionally switches which core is being used.
- With OpenMP Parallelization
Time:
time ./nbody
real 1m23.373s
user 3m31.596s
sys 0m0.696s
CPU Core Usage:
The CPU usage graph with OpenMP enabled shows the computation running in parallel on multiple cores. The time output confirms this: the real (wall-clock) time drops from 2m7.258s to 1m23.373s by running the computations on the 4 logical cores of the CPU (the i5-4200U has 2 physical cores with Hyper-Threading). Meanwhile the user time rises from 1m37.096s to 3m31.596s, because user time sums CPU time over all threads and can therefore exceed real time when several cores work at once. For details on how to interpret the time output, refer to this answer on Stack Overflow.
OpenMP is therefore an easy-to-use framework for cases where code needs to be distributed across multiple cores. Since its inception it has matured considerably and has been adopted by developers looking to leverage improved hardware capabilities.
To know more about OpenMP, visit the official website.
Kudos.
Push Yourself Again and Again. Don't give an inch until the final buzzer sounds.
Stats:
Ubuntu 17.04
Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
8 GB DDR3 RAM
Code::Blocks 16.01
GCC 6.3.0