Discussion of the merits and challenges of using GPUs

Message boards : Number crunching : Discussion of the merits and challenges of using GPUs

Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 96728 - Posted: 22 May 2020, 15:56:52 UTC - in response to Message 96669.  
Last modified: 22 May 2020, 15:58:04 UTC

For those interested only in projects related to medical research, the only choice now appears to be Folding@home, which wasn't set up to be compatible with BOINC projects. It's possible, but difficult, to run it on a computer that has BOINC running at the same time. Their forums currently aren't working.

I run Folding on the GPU on all my machines, with BOINC running CPU work units. It is no more difficult than the usual annoyances with Folding.
That is, you have to set it up and then delete the "CPU" slot, or it will run by default (and check it again - you usually have to do it twice).
And you of course have to reserve a CPU core to support the GPU, as with most setups.
But they released a new version of their app recently, which may ease the setup. It won't take long to get the hang of it.

And their forums are up, and have been for some time. Maybe you were not trying the SSL version?
https://foldingforum.org/index.php


If you are interested in other types of GPU projects, note that Asteroids@home currently has disk space problems interfering with uploads.

I am about to post a comparison of how awful their GPU version is compared to the CPU version in energy efficiency. It will be something like 40 watt-hours per work unit for the GPU (i.e., GTX 1060 or 1070), and about 14 watt-hours for the CPU. They should ban the GPU version to save the planet.
(It has been stated by others before, but should be emphasized again.)
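The watt-hour comparison above comes down to average power draw multiplied by run time. A minimal sketch of that arithmetic follows; only the 40 Wh and 14 Wh totals come from the post, while the wattage and run-time pairs and the function name are hypothetical values chosen to reproduce those totals.

```python
# Energy per work unit = average power draw (W) x run time (h).
# Only the 40 Wh and 14 Wh totals come from the post above; the
# power and run-time pairs below are hypothetical illustrations.

def watt_hours_per_work_unit(avg_power_watts, runtime_hours):
    """Return the energy used by one work unit in watt-hours."""
    return avg_power_watts * runtime_hours

# Hypothetical measurements that would reproduce the quoted totals.
gpu_wh = watt_hours_per_work_unit(avg_power_watts=80.0, runtime_hours=0.5)
cpu_wh = watt_hours_per_work_unit(avg_power_watts=28.0, runtime_hours=0.5)

# How many times more energy the GPU uses per work unit.
energy_ratio = gpu_wh / cpu_wh

print(f"GPU: {gpu_wh:.0f} Wh, CPU: {cpu_wh:.0f} Wh, ratio: {energy_ratio:.2f}x")
```

With these figures the GPU uses a little under three times the energy per work unit, regardless of which one finishes first.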

sgaboinc
Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 96729 - Posted: 22 May 2020, 17:05:36 UTC
Last modified: 22 May 2020, 17:07:04 UTC

I don't like crunching on GPUs. I do play with some Python TensorFlow stuff on the side, and watch how it works.
In the simple cases, such as running a pre-trained convolutional neural network (CNN), it runs for a fraction of a second and one would not notice any difference.
But the other way round, say if you are training a complex CNN with lots of data (say images),
the GPU can run at full speed (loud fans) and maximum load for hours, consuming more than a hundred watts (the top-tier ones probably consume many hundreds of watts).
Electricity isn't cheap, after all, so doing such computation can be expensive in electricity bills.
GPUs are used where their use is relevant and appropriate, e.g. that CNN work. A lot of those CNN models are rather huge, and the training/update process is so data intensive that it would generate terabytes of network data if training were distributed across the network, even for a rather modest CNN model.
So for those it is more appropriate to just run everything on the GPU rather than spill terabytes of data onto conventional networks in minutes, flooding and choking them.
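A rough back-of-the-envelope sketch of the network-traffic point above, assuming a simple scheme in which each worker uploads one full float32 gradient per training step. The parameter count, step count, and function name are assumptions for illustration, not measurements of any real model.

```python
# Rough estimate of gradient traffic for data-parallel training.
# All numbers are assumptions for illustration, not measurements.

BYTES_PER_FLOAT32 = 4

def gradient_traffic_bytes(num_parameters, training_steps):
    """Bytes one worker sends if it uploads a full float32 gradient
    once per training step."""
    return num_parameters * BYTES_PER_FLOAT32 * training_steps

# Hypothetical modest CNN: 25 million parameters, 10,000 steps.
total_bytes = gradient_traffic_bytes(25_000_000, 10_000)
total_terabytes = total_bytes / 1e12

print(f"{total_terabytes:.1f} TB sent per worker")
```

Under these assumptions a single worker already uploads about a terabyte, before counting the updated weights coming back or any additional workers.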

robertmiles
Joined: 16 Jun 08
Posts: 1231
Credit: 14,240,917
RAC: 3,671
Message 96731 - Posted: 22 May 2020, 19:06:59 UTC - in response to Message 96728.  

[snip]

I run Folding on the GPU on all my machines, with BOINC running CPU work units. It is no more difficult than the usual annoyances with Folding.
That is, you have to set it up and then delete the "CPU" slot, or it will run by default (and check it again - you usually have to do it twice).
And you of course have to reserve a CPU core to support the GPU, as with most setups.
But they released a new version of their app recently, which may ease the setup. It won't take long to get the hang of it.

And their forums are up, and have been for some time. Maybe you were not trying the SSL version?
https://foldingforum.org/index.php

I'm not sure if I was or not. However, that link allows me to read the forums, but I still can't log in to post anything there.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1987
Credit: 9,451,551
RAC: 12,647
Message 96734 - Posted: 22 May 2020, 21:10:48 UTC - in response to Message 96669.  

The Open Pandemics subproject at World Community Grid currently does COVID-19 work, using CPUs only, but is thinking of creating a GPU version of their software.

Because the project is based on AutoDock, and AutoDock has a GPU version.
Also, the Quarantine@Home project (it's not a BOINC project) is using GPUs.

robertmiles
Joined: 16 Jun 08
Posts: 1231
Credit: 14,240,917
RAC: 3,671
Message 96735 - Posted: 22 May 2020, 21:51:07 UTC - in response to Message 96734.  

The Open Pandemics subproject at World Community Grid currently does COVID-19 work, using CPUs only, but is thinking of creating a GPU version of their software.

Because the project is based on AutoDock, and AutoDock has a GPU version.
Also, the Quarantine@Home project (it's not a BOINC project) is using GPUs.

I've read that Autodock development has gone in two different directions, producing one version that can use a GPU and another version with the changes needed for COVID-19 work. IF they can find someone who can merge the two sets of changes, THEN Open Pandemics should have a GPU version they can use.

A Google search did not find Quarantine@Home. Can you give me a link to that project? Is it able to share a GPU with Folding@Home?

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 96736 - Posted: 22 May 2020, 22:01:05 UTC - in response to Message 96735.  

A Google search did not find Quarantine@Home. Can you give me a link to that project? Is it able to share a GPU with Folding@Home?

https://quarantine.infino.me/

But the GPU version is only for Linux.
The Windows version is only on the CPU at the moment.

Falconet
Joined: 9 Mar 09
Posts: 353
Credit: 1,187,424
RAC: 4,369
Message 96737 - Posted: 22 May 2020, 22:28:55 UTC - in response to Message 96735.  
Last modified: 22 May 2020, 22:32:33 UTC

The Open Pandemics subproject at World Community Grid currently does COVID-19 work, using CPUs only, but is thinking of creating a GPU version of their software.

Because the project is based on AutoDock, and AutoDock has a GPU version.
Also, the Quarantine@Home project (it's not a BOINC project) is using GPUs.

I've read that Autodock development has gone in two different directions, producing one version that can use a GPU and another version with the changes needed for COVID-19 work. IF they can find someone who can merge the two sets of changes, THEN Open Pandemics should have a GPU version they can use.

A Google search did not find Quarantine@Home. Can you give me a link to that project? Is it able to share a GPU with Folding@Home?


They are working on the GPU version: https://twitter.com/ForliLab/status/1261194223811887109


Edit: One of the people on the OPN research team is a CUDA/OpenCL developer.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1987
Credit: 9,451,551
RAC: 12,647
Message 97456 - Posted: 19 Jun 2020, 8:59:17 UTC

Interesting article about C++/SYCL/OpenCL

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1987
Credit: 9,451,551
RAC: 12,647
Message 97839 - Posted: 30 Jun 2020, 21:42:37 UTC

SYCL 2020 provisional specification
SYCL is a standard C++ based heterogeneous parallel programming framework for accelerating High Performance Computing (HPC), machine learning, embedded computing, and compute-intensive desktop applications on a wide range of processor architectures, including CPUs, GPUs, FPGAs, and AI processors. SYCL 2020 is based on C++17 and includes new programming abstractions, such as unified shared memory, reductions, group algorithms, and sub-groups to enable high-performance applications across diverse hardware architectures.


robertmiles
Joined: 16 Jun 08
Posts: 1231
Credit: 14,240,917
RAC: 3,671
Message 97841 - Posted: 1 Jul 2020, 0:23:08 UTC
Last modified: 1 Jul 2020, 0:42:01 UTC

What we really need for GPU use is a compiler that can automatically identify groups of sections of the program that are running operations that CAN safely run without any of the sections within the group writing to any memory location used by any other section of the group. This would make it possible to just recompile the program with that compiler, with no programmer effort to modify the source code first. This is often, but not always, running the same operations on multiple sets of data.
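A toy sketch of the independence test described above: a group of sections can run in parallel only if no section writes a memory location that another section reads or writes. The section descriptions and function names here are hypothetical, and the read/write sets are simply given as input.

```python
from itertools import combinations

# Each section is described by the memory locations it reads and writes.
# Section contents and variable names are hypothetical.

def conflicts(a, b):
    """True if one section writes a location the other reads or writes."""
    a_used = a["reads"] | a["writes"]
    b_used = b["reads"] | b["writes"]
    return bool(a["writes"] & b_used) or bool(b["writes"] & a_used)

def can_run_in_parallel(sections):
    """True if no pair of sections in the group conflicts."""
    return not any(conflicts(a, b) for a, b in combinations(sections, 2))

# Two loop iterations that touch disjoint array elements: safe.
independent = [
    {"reads": {"x[0]"}, "writes": {"y[0]"}},
    {"reads": {"x[1]"}, "writes": {"y[1]"}},
]

# The second section reads what the first writes: not safe.
dependent = [
    {"reads": {"x[0]"}, "writes": {"total"}},
    {"reads": {"total"}, "writes": {"y[1]"}},
]

print(can_run_in_parallel(independent))  # True
print(can_run_in_parallel(dependent))    # False
```

The hard part for a real compiler is working out those read and write sets from source code in the first place, since pointers and computed array indexes often hide which locations are actually touched.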

A few problems with this: GPU clock speeds are typically about a quarter of the clock speeds of CPUs produced at about the same time. This means that on the average, four threads on the GPU must be running at the same time just to make the GPU do the work as fast as a CPU-only program.

At least for NVIDIA-based GPUs, the GPU cores come in groups (warps for NVIDIA). Within each group, if one core is doing an operation, all of the others must either be doing that same operation (probably on different data), or be doing nothing. That means that if there is an if-then-else in the GPU part of the program, the then part and the else part can only be doing different operations simultaneously if they are in different GPU core groups. I have not checked if this is also true for other brands of GPUs, but I suspect that it is.
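The effect of an if-then-else inside a warp can be illustrated with a simplified cost model, assuming a warp of 32 threads that runs each branch path in turn whenever at least one of its threads needs that path. The branch costs and the function name are hypothetical, and real hardware scheduling is more involved than this.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs

def warp_branch_cost(takes_then, then_cost, else_cost):
    """Steps one warp spends on an if-then-else in this simplified model.

    takes_then holds one boolean per thread in the warp. The warp runs
    the 'then' path if any thread needs it and the 'else' path if any
    thread needs it; threads not on the current path sit idle.
    """
    cost = 0
    if any(takes_then):
        cost += then_cost
    if not all(takes_then):
        cost += else_cost
    return cost

# Hypothetical branch costs, in arbitrary steps.
THEN_COST, ELSE_COST = 10, 10

uniform = [True] * WARP_SIZE                        # every thread takes 'then'
divergent = [i % 2 == 0 for i in range(WARP_SIZE)]  # threads alternate

print(warp_branch_cost(uniform, THEN_COST, ELSE_COST))    # 10
print(warp_branch_cost(divergent, THEN_COST, ELSE_COST))  # 20
```

In this model a warp whose threads disagree on the branch pays for both paths, which is why arranging data so that neighbouring threads take the same path matters.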

BOINC projects normally offer GPU versions of their programs only if those versions will produce the outputs in no more than a tenth of the time required for the CPU versions to do it. Combined with the clock-speed difference above, this means using an average of at least 40 GPU cores at a time, which is impossible for GPUs that have fewer than 40 GPU cores. The last time the Rosetta@home project tried to produce a GPU version, it gave outputs slightly faster than the CPU version for some users, and slightly slower for others. I've seen nothing since then about whether it has been tried again with the more recent versions of their program.
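The 40-core figure follows from multiplying the two estimates in this post. A minimal sketch of that arithmetic, assuming the post's simplified model in which a GPU thread does the same work per clock cycle as a CPU thread (the constant and function names are made up for illustration):

```python
# Simplified model from the post: each GPU thread runs at about a quarter
# of the CPU clock and does the same work per clock as a CPU thread.
CPU_TO_GPU_CLOCK_RATIO = 4   # "about a quarter of the clock speeds"
REQUIRED_SPEEDUP = 10        # "no more than a tenth of the time"

def gpu_threads_needed(clock_ratio, speedup):
    """Concurrent GPU threads needed to beat one CPU thread by `speedup`."""
    return clock_ratio * speedup

break_even_threads = gpu_threads_needed(CPU_TO_GPU_CLOCK_RATIO, 1)
threshold_threads = gpu_threads_needed(CPU_TO_GPU_CLOCK_RATIO, REQUIRED_SPEEDUP)

print(break_even_threads, threshold_threads)  # 4 40
```

Four busy threads only match the CPU version; forty are needed to reach the tenfold threshold, and that assumes none of them sit idle.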

BOINC has a section to allow GPU work written in CUDA, and a section to allow GPU work written in OpenCL. Adding the capability to run GPU work written in any other computer language requires either a compiler that first transforms the source code to CUDA or OpenCL and then compiles that, or major modifications to BOINC to add yet another section to support GPU work written in that computer language. Such major modifications to BOINC have, in the past, taken a few years each. Unless you can hold your breath for a few years at a time, don't hold your breath waiting for such a major modification.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1987
Credit: 9,451,551
RAC: 12,647
Message 97843 - Posted: 1 Jul 2020, 8:08:03 UTC - in response to Message 97841.  

BOINC has a section to allow GPU work written in CUDA, and a section to allow GPU work written in OpenCL. Adding the capability to run GPU work written in any other computer language requires either a compiler that first transforms the source code to CUDA or OpenCL and then compiles that, or major modifications to BOINC to add yet another section to support GPU work written in that computer language. Such major modifications to BOINC have, in the past, taken a few years each. Unless you can hold your breath for a few years at a time, don't hold your breath waiting for such a major modification.


Indeed, the idea of SYCL is to write the app in C++ (which does not need any change to the BOINC infrastructure) and run it on heterogeneous hardware (using CUDA/OpenCL like a sort of "dialect" of C++).
In the meantime, I hold my breath :-P

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1987
Credit: 9,451,551
RAC: 12,647
Message 98264 - Posted: 22 Jul 2020, 9:27:23 UTC - in response to Message 97843.  
Last modified: 22 Jul 2020, 9:33:11 UTC

Indeed, the idea of SYCL is to write the app in C++ (which does not need any change to the BOINC infrastructure) and run it on heterogeneous hardware (using CUDA/OpenCL like a sort of "dialect" of C++).
In the meantime, I hold my breath :-P


And SYCL is often faster than CUDA!!
SYCL and CUDA

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1987
Credit: 9,451,551
RAC: 12,647
Message 98968 - Posted: 11 Sep 2020, 16:48:18 UTC - in response to Message 98264.  

oneAPI with support for SYCL 2020

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,078,438
RAC: 5,608
Message 98985 - Posted: 13 Sep 2020, 3:13:54 UTC - in response to Message 97841.  

What we really need for GPU use is a compiler that can automatically identify groups of sections of the program that are running operations that CAN safely run without any of the sections within the group writing to any memory location used by any other section of the group. This would make it possible to just recompile the program with that compiler, with no programmer effort to modify the source code first. This is often, but not always, running the same operations on multiple sets of data.

A few problems with this: GPU clock speeds are typically about a quarter of the clock speeds of CPUs produced at about the same time. This means that on the average, four threads on the GPU must be running at the same time just to make the GPU do the work as fast as a CPU-only program.

At least for NVIDIA-based GPUs, the GPU cores come in groups (warps for NVIDIA). Within each group, if one core is doing an operation, all of the others must either be doing that same operation (probably on different data), or be doing nothing. That means that if there is an if-then-else in the GPU part of the program, the then part and the else part can only be doing different operations simultaneously if they are in different GPU core groups. I have not checked if this is also true for other brands of GPUs, but I suspect that it is.

BOINC projects normally offer GPU versions of their programs only if those versions will produce the outputs in no more than a tenth of the time required for the CPU versions to do it. Combined with the clock-speed difference above, this means using an average of at least 40 GPU cores at a time, which is impossible for GPUs that have fewer than 40 GPU cores. The last time the Rosetta@home project tried to produce a GPU version, it gave outputs slightly faster than the CPU version for some users, and slightly slower for others. I've seen nothing since then about whether it has been tried again with the more recent versions of their program.

BOINC has a section to allow GPU work written in CUDA, and a section to allow GPU work written in OpenCL. Adding the capability to run GPU work written in any other computer language requires either a compiler that first transforms the source code to CUDA or OpenCL and then compiles that, or major modifications to BOINC to add yet another section to support GPU work written in that computer language. Such major modifications to BOINC have, in the past, taken a few years each. Unless you can hold your breath for a few years at a time, don't hold your breath waiting for such a major modification.


So are you saying they need to break it down into small chunks of work that can then run independently of each other and report back to the core program, which can combine them into the next batch of small chunks of data until the task is complete? Much like committees at workplaces do things? If that could happen, several BOINC projects may be able to benefit from it.

robertmiles
Joined: 16 Jun 08
Posts: 1231
Credit: 14,240,917
RAC: 3,671
Message 98986 - Posted: 13 Sep 2020, 3:34:29 UTC - in response to Message 98985.  
Last modified: 13 Sep 2020, 3:37:12 UTC

[snip]

So are you saying they need to break it down into small chunks of work that can then run independently of each other and report back to the core program, which can combine them into the next batch of small chunks of data until the task is complete? Much like committees at workplaces do things? If that could happen, several BOINC projects may be able to benefit from it.

Not fully independently. The warps in NVIDIA GPUs require an even smaller breakdown within each workunit, where the cores within each warp must USUALLY be doing the same operations on separate sets of data, or expect a major slowdown due to limits on how many GPU cores can be active at once.

What you described is more like the main principle of how BOINC works, whether for CPUs or for GPUs.

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,078,438
RAC: 5,608
Message 98991 - Posted: 13 Sep 2020, 11:18:57 UTC - in response to Message 98986.  

[snip]

So are you saying they need to break it down into small chunks of work that can then run independently of each other and report back to the core program, which can combine them into the next batch of small chunks of data until the task is complete? Much like committees at workplaces do things? If that could happen, several BOINC projects may be able to benefit from it.


Not fully independently. The warps in NVIDIA GPUs require an even smaller breakdown within each workunit, where the cores within each warp must USUALLY be doing the same operations on separate sets of data, or expect a major slowdown due to limits on how many GPU cores can be active at once.

What you described is more like the main principle of how BOINC works, whether for CPUs or for GPUs.


Ok thanks.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1987
Credit: 9,451,551
RAC: 12,647
Message 99003 - Posted: 14 Sep 2020, 6:32:41 UTC - in response to Message 95280.  

The previous attempt at a GPU version gave one that ran at about the SAME speed as the CPU version - a little slower on some computers, and a little faster on others. This was not considered fast enough to make further development worthwhile.

The only previous attempt that I know of was over 5 years ago, and a lot of things have changed since then (hardware, software, etc.)


I'm wrong. The previous attempt was over 7 years ago.
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6475&postid=76916#76916

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1987
Credit: 9,451,551
RAC: 12,647
Message 99058 - Posted: 20 Sep 2020, 9:56:39 UTC - in response to Message 97843.  

Indeed, the idea of SYCL is to write the app in C++ (which does not need any change to the BOINC infrastructure) and run it on heterogeneous hardware (using CUDA/OpenCL like a sort of "dialect" of C++).

As I said, NVIDIA released the CUDA C++ standard library as open source.

works with not only NVIDIA CUDA enabled configurations but also CPUs


[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1987
Credit: 9,451,551
RAC: 12,647
Message 99203 - Posted: 30 Sep 2020, 10:36:00 UTC - in response to Message 97843.  

Indeed, the idea of SYCL is to write the app in C++ (which does not need any change to the BOINC infrastructure) and run it on heterogeneous hardware (using CUDA/OpenCL like a sort of "dialect" of C++).


As I said in Ralph's forum:
Intel, together with Heidelberg University, is working on porting oneAPI/DPC++ to AMD GPUs via hipSYCL.
Codeplay is working on porting oneAPI/DPC++ to NVIDIA GPUs via SYCL.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1987
Credit: 9,451,551
RAC: 12,647
Message 99205 - Posted: 30 Sep 2020, 13:35:46 UTC - in response to Message 95471.  

Because it is a simple rebrand of OpenCL 1.2. They abandoned OpenCL 2.x to its fate.
Simply: OpenCL 3.0 would have been great... if it had been released 5 years ago.
The only sunbeam (a little sunbeam) is C++ for OpenCL.


And today OpenCL 3.0 is finalised, with an initial SDK and C++ kernels.



©2024 University of Washington
https://www.bakerlab.org