Problems and Technical Issues with Rosetta@home

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,716,372
RAC: 18,198
Message 98174 - Posted: 17 Jul 2020, 20:47:17 UTC - in response to Message 98172.  

With hyper threading, each full core is divided into two "virtual" cores, each with its own instruction stream, referred to as a "thread".
That allows the hardware to be used more efficiently, so that it is idle less of the time, but each of the two threads runs more slowly than if only one were used per core.

Typically, you get about 30% greater output using 100% of the cores than when using only 50%, even though each work unit runs faster in the latter case.


True, but I don't want to lose that 30%.

And Boinc calls threads cores. Maybe it can't detect which they are?
ID: 98174 · Rating: 0
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 98176 - Posted: 17 Jul 2020, 20:56:17 UTC - in response to Message 98174.  
Last modified: 17 Jul 2020, 21:12:55 UTC

True, but I don't want to lose that 30%.

And Boinc calls threads cores. Maybe it can't detect which they are?

You ARE losing by using less than the full number. I reserve cores to support a GPU, or to support other desktop use, but for dedicated machines I allow the maximum number possible.
BOINC detects virtual cores, since they appear the same as a real core to the operating system. So your i5-8600K shows up as 12 cores on BOINC, even though it has only 6 real cores.

EDIT: It is sometimes useful to limit the number of cores in order to limit memory use. For example, each Rosetta work unit should be allocated at least 1 GB (preferably more). But if you have only 8 GB of memory in a 12-core machine, then by limiting BOINC to use only 50% of the cores, you only need 6 GB for Rosetta. There are sometimes other reasons, but for maximum output, you use as many cores as possible.
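
As a rough sketch of the arithmetic in the EDIT above: the function names are made up for illustration, the 1 GB per task figure is the rule of thumb from this post rather than a BOINC setting, and it is an assumption that the percentage simply truncates to a whole number of tasks.

```python
def ram_needed_gb(logical_cpus, cpu_pct, gb_per_task):
    """RAM needed if every allowed CPU runs one task of gb_per_task."""
    # Assumption: the CPU percentage truncates to a whole number of tasks.
    tasks = int(logical_cpus * cpu_pct / 100)
    return tasks * gb_per_task


def max_tasks_by_memory(total_ram_gb, gb_per_task):
    """How many whole tasks fit in the given amount of RAM."""
    return int(total_ram_gb // gb_per_task)


# The 12-thread, 8 GB example from the post, at 1 GB per task:
print(ram_needed_gb(12, 100, 1.0))  # all threads busy: more than the 8 GB installed
print(ram_needed_gb(12, 50, 1.0))   # half the threads: fits, with room for the OS
print(max_tasks_by_memory(8, 1.0))  # upper bound if nothing else needed memory
```

In practice the operating system and other programs need memory too, so the real limit is lower than the last figure.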
ID: 98176 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,269,631
RAC: 3,846
Message 98177 - Posted: 17 Jul 2020, 21:14:19 UTC - in response to Message 98174.  
Last modified: 17 Jul 2020, 21:26:25 UTC

[snip]

True, but I don't want to lose that 30%.

And Boinc calls threads cores. Maybe it can't detect which they are?

"Threads" has two meanings for programs. One is for the use of virtual cores. The other is for setting up a list of things to be done, none of which will interfere with any other member of the list, so that if any member of the list is currently running but encounters a reason why it must wait, any other member of the list can take over the CPU core and the program continues to make progress during that wait.

This does not require using multiple CPU cores, but it is still possible for more than one CPU core to each be working on one member of the list at the same time. BOINC applications that do this and allow the program to use more than one CPU core at once are unpopular and therefore seldom used.

BOINC tends to use the second meaning instead.

Hyperthreading means that each physical CPU core has two sets of registers, and can therefore do a very quick switch from working for one program to working for another if the first program needs to wait for access to main memory, which is slower than the CPU in almost all computers these days. This makes each physical core act almost like two cores, except for some timing issues. These two cores are called virtual cores.
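
A minimal sketch of the second (software) meaning of threads described above, using Python's standard library. The work item and the 0.2 second sleep are stand-ins invented for illustration; the sleep plays the role of "a reason why it must wait".

```python
import time
from concurrent.futures import ThreadPoolExecutor


def work_item(n):
    """One independent member of the list; the sleep stands in for a wait."""
    time.sleep(0.2)
    return n * n


def run_all(items, workers=4):
    """Run the items on a pool of software threads and time the whole batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(work_item, items))
    return results, time.perf_counter() - start


results, elapsed = run_all([1, 2, 3, 4])
print(results)
print(f"elapsed: {elapsed:.2f}s (four 0.2s waits overlapped rather than added up)")
```

While one item is waiting, another takes over, so the batch finishes in roughly the time of one wait. None of this needs more than one CPU core.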
ID: 98177 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,716,372
RAC: 18,198
Message 98178 - Posted: 17 Jul 2020, 21:16:43 UTC - in response to Message 98176.  
Last modified: 17 Jul 2020, 21:23:30 UTC

You ARE losing by using less than the full number.


I know, which is why I don't do that. I treat a thread as I would a core. The jiggery pokery inside the CPU is none of my business :-)

I reserve cores to support a GPU


I stopped doing that, as it doesn't make much difference. GPU threads are given higher priority on the CPU, so they always seem to get what they need.

What does help though (especially if your GPU is better than your CPU) is running more than one task on the GPU at once, then it can take two cores to help it out instead of one, and can also work on the task which doesn't need CPU at that point while the other one is stalled from the GPU's point of view.

or to support other desktop use


I pause Boinc completely for games, and the GPU for watching video. Done automatically by exclusive applications. Otherwise, full power. Well er.... except when Tthrottle slows it down for overheating or because I don't like over 50% fan noise in the lounge.

BOINC detects virtual cores, since they appear the same as a real core to the operating system. So your i5-8600K shows up as 12 cores on BOINC, even though it has only 6 real cores.


Pah, my Xeons have 24 :-)

Anyway you're wrong on 2 counts :-P
My i5 has 6 cores and no HT, so 6 threads too.
And Windows can tell if they're threads or cores. On my Xeons, it lists in the task manager "cores 12, logical processors 24".
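
The same distinction is visible on x86 Linux, where /proc/cpuinfo lists a "physical id" and a "core id" for every logical processor. A minimal sketch, assuming that layout; the function name and the sample text are invented for illustration:

```python
def count_cores(cpuinfo_text):
    """Return (physical_cores, logical_processors) from /proc/cpuinfo-style text."""
    logical = 0
    cores = set()
    # Each logical processor is described by one blank-line-separated block.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        if "physical id" in fields and "core id" in fields:
            # Two hyper-threads on one core share the same (socket, core) pair.
            cores.add((fields["physical id"], fields["core id"]))
    # If the ids are missing, fall back to treating every processor as a core.
    physical = len(cores) if cores else logical
    return physical, logical


# Made-up sample: one socket, two cores, two threads per core.
SAMPLE = """processor\t: 0
physical id\t: 0
core id\t: 0

processor\t: 1
physical id\t: 0
core id\t: 1

processor\t: 2
physical id\t: 0
core id\t: 0

processor\t: 3
physical id\t: 0
core id\t: 1
"""

print(count_cores(SAMPLE))
```

On a real machine the text would come from `open("/proc/cpuinfo").read()`.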

It is sometimes useful to limit the number of cores in order to limit memory use. For example, each Rosetta work unit should be allocated at least 1 GB (preferably more). But if you have only 8 GB of memory in a 12-core machine, then by limiting BOINC to use only 50% of the cores, you only need 6 GB for Rosetta. There are sometimes other reasons, but for maximum output, you use as many cores as possible.


I just let Boinc handle it. I tell Boinc to use only 80% RAM. If there isn't enough, there isn't enough, and some tasks sit waiting. If I see that happening regularly, I consider buying more RAM.
ID: 98178 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,716,372
RAC: 18,198
Message 98179 - Posted: 17 Jul 2020, 21:26:16 UTC - in response to Message 98177.  
Last modified: 17 Jul 2020, 21:26:34 UTC

BOINC applications that use the first meaning and allow the program to use more than one CPU core at once are unpopular and therefore seldom used.


Why on earth would they be unpopular? I prefer it: it means I have fewer tasks running at once, but am still using the whole processor. LHC (Atlas) and Milkyway (Nbody) do it.
ID: 98179 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,269,631
RAC: 3,846
Message 98181 - Posted: 17 Jul 2020, 21:35:16 UTC - in response to Message 98179.  

BOINC applications that use the first meaning and allow the program to use more than one CPU core at once are unpopular and therefore seldom used.

Why on earth would they be unpopular? I prefer it: it means I have fewer tasks running at once, but am still using the whole processor. LHC (Atlas) and Milkyway (Nbody) do it.

Unclear, but none of the BOINC projects my computer participates in (over a dozen) use it. One of them is Milkyway, but I don't recall ever getting an Nbody task from them.
ID: 98181 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,716,372
RAC: 18,198
Message 98183 - Posted: 17 Jul 2020, 21:52:17 UTC - in response to Message 98181.  

Unclear, but none of the BOINC projects my computer participates in (over a dozen) use it. One of them is Milkyway, but I don't recall ever getting an Nbody task from them.


Nbody runs only on CPU; if you use CPU for Milkyway, half your tasks should be Nbody. If you only use GPU, you will only get Separation.

I thought the lack of multi-threaded projects was down to the difficulty of coding them (or the impossibility, if everything depends on the result of the last calculation).
ID: 98183 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,118,186
RAC: 6,004
Message 98187 - Posted: 17 Jul 2020, 23:02:09 UTC - in response to Message 98157.  

Any discussion of how much faster HD helps? I see a great difference in RAC between my "old" SATA and new SSD (with the same memory and CPU).


It really depends on the Project and how much it writes to the disk and how often; the more it writes and the more often it does so, the more the SSD drives will help. A Project like Rosetta should benefit more than a project with 30 minute tasks, for example.

I went to all SSD drives a while back because I refuse to pay for A/V stuff on BOINC-only machines, and the free ones want me to reboot every couple of weeks or so. The bootup time is sooo much better that I have better RAC because of it.
ID: 98187 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,716,372
RAC: 18,198
Message 98189 - Posted: 17 Jul 2020, 23:09:52 UTC - in response to Message 98187.  
Last modified: 17 Jul 2020, 23:10:22 UTC

Any discussion of how much faster HD helps? I see a great difference in RAC between my "old" SATA and new SSD (with the same memory and CPU).


It really depends on the Project and how much it writes to the disk and how often; the more it writes and the more often it does so, the more the SSD drives will help. A Project like Rosetta should benefit more than a project with 30 minute tasks, for example.

I went to all SSD drives a while back because I refuse to pay for A/V stuff on BOINC-only machines, and the free ones want me to reboot every couple of weeks or so. The bootup time is sooo much better that I have better RAC because of it.


Some of my Boinc machines have rotary drives, because I had them kicking about anyway. Some have an SSD because the rust spinners were annoyingly slow to reboot. The ones with normal drives seem to run just as fast once they're computing, but take ages to start some tasks, and if one has 24 Theory tasks from LHC to start at once after a reboot, it can take 10 minutes.
ID: 98189 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1671
Credit: 17,527,680
RAC: 23,122
Message 98191 - Posted: 17 Jul 2020, 23:25:58 UTC - in response to Message 98156.  
Last modified: 17 Jul 2020, 23:30:44 UTC

They're also taking up a massive chunk of CPU, despite me having set BOINC to use 30% CPU time.
That's your problem. Limiting the amount of time doesn't limit the number of cores/threads in use. In fact what it does mean is that it will take you more than 3 times as long to process a Task than your Target CPU time.
ie- 8 hour Tasks will take over 24 hours.

CPU time should always be left at 100%.
If you feel it's necessary to limit the number of cores/threads in use (since i paid for them i choose to use them all), set "Use at most 100 % of the CPUs" to less than 100%
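
The "more than 3 times as long" figure above can be checked with a rough sketch. It assumes the CPU time limit stretches run time linearly and ignores any throttling overhead; the function name is invented for illustration.

```python
def wall_clock_hours(target_cpu_hours, cpu_time_pct):
    """Approximate wall-clock time for a task when CPU time is throttled."""
    # Assumption: a task only makes progress cpu_time_pct percent of the time.
    return target_cpu_hours / (cpu_time_pct / 100.0)


for pct in (100, 50, 30):
    hours = wall_clock_hours(8, pct)
    print(f"CPU time {pct:3d}%: an 8 hour Task needs about {hours:.1f} hours")
```

Note that the number of cores/threads kept busy is the same in every row; only the time each Task takes changes.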



Edit- you also need to reduce your cache size as you are missing deadlines. If you have more than 1 project, there is no need for a cache at all. Even with just one project, a cache really isn't necessary unless that project has lots of extended down time.
Grant
Darwin NT
ID: 98191 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1671
Credit: 17,527,680
RAC: 23,122
Message 98192 - Posted: 17 Jul 2020, 23:27:47 UTC - in response to Message 98157.  

Any discussion of how much faster memory helps? About all I've been able to find so far is that at least for Rosetta@home, it does help.
Any discussion of how much faster HD helps? I see a great difference in RAC between my "old" SATA and new SSD (with the same memory and CPU).
Don't know why, as storage performance has no impact at all on processing performance, unless your system is short of RAM and has previously been spending all of its time swapping to the page file.
Grant
Darwin NT
ID: 98192 · Rating: 0
Keith Myers
Joined: 29 Mar 20
Posts: 97
Credit: 332,141
RAC: 1,223
Message 98193 - Posted: 17 Jul 2020, 23:28:59 UTC
Last modified: 17 Jul 2020, 23:30:50 UTC

I'm getting a bit frustrated with the current running task. I have my settings to run for 4 hours. The current running task is going on 10+ hours now and is barely crawling up from 97% complete, ten minutes to go, 1/1000 of a percent at a time.

The thing that irks me no end is that the task wrote exactly ONE checkpoint, 30 minutes or so after it was initially started and nothing since. I want to do some maintenance on the PC yet can't unless I want to throw away 10 hours of crunching on the task because it will restart from basically zero. Why the heck did the developers configure the task to only write one checkpoint? Arggh!

I understand the 10+ hour watchdog will eventually kick in supposedly at the 14 hour mark of runtime. Question is will I have enough patience to wait it out.
ID: 98193 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1671
Credit: 17,527,680
RAC: 23,122
Message 98194 - Posted: 17 Jul 2020, 23:32:49 UTC - in response to Message 98193.  

I'm getting a bit frustrated with the current running task. I have my settings to run for 4 hours. The current running task is going on 10+ hours now and is barely crawling up from 97% complete, ten minutes to go, 1/1000 of a percent at a time.
There have been a few Tasks that run longer than the set time till the watchdog timer kills them off. There were a lot when i first started here, for a while now there haven't been any.
Just recently, there's been a small batch of those longer running Tasks again.
Grant
Darwin NT
ID: 98194 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1671
Credit: 17,527,680
RAC: 23,122
Message 98195 - Posted: 17 Jul 2020, 23:40:28 UTC - in response to Message 98172.  
Last modified: 17 Jul 2020, 23:42:11 UTC

Typically, you get about 30% greater output using 100% of the cores than when using only 50%, even though each work unit runs faster in the latter case.
Yep.
When i started here i didn't have enough RAM to support all the cores/threads on both of my systems.
On one system i turned off hyper-threading, on the other i left it enabled. The system with hyperthreading produced way more work than the one with it turned off, more than 50% more.

It does depend a lot on the software being run. In some instances hyperthreading can reduce the amount of work done. In some cases the increase is just in line with the increase in cores/threads. In most cases the improvement tends to be from 30-60%. In some cases the output can be almost double. It really does depend a lot on the work being done.
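
To put numbers on the trade-off, here is a rough sketch. It assumes two threads per core and that the gain is spread evenly over all threads; the function name and the normalisation (one task on one full core = speed 1.0) are choices made for illustration.

```python
def ht_summary(physical_cores, ht_gain):
    """Return (total_output, per_task_speed) with every thread in use.

    ht_gain is the fractional extra total output from hyperthreading,
    e.g. 0.30 for the typical 30% quoted above.
    """
    threads = 2 * physical_cores              # assumption: 2 threads per core
    total_without = physical_cores * 1.0      # one task per core, speed 1.0 each
    total_with = total_without * (1.0 + ht_gain)
    per_task_speed = total_with / threads     # each task gets an equal share
    return total_with, per_task_speed


for gain in (0.30, 0.60):
    total, speed = ht_summary(6, gain)
    print(f"gain {gain:.0%}: total output {total:.1f}, each task at {speed:.2f}x")
```

So with a 30% gain, each individual task runs at roughly two thirds of its single-thread speed, but twice as many run at once and the total is higher.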
Grant
Darwin NT
ID: 98195 · Rating: 0
Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 98199 - Posted: 18 Jul 2020, 10:47:35 UTC - in response to Message 98193.  

Keith Myers wrote:
I'm getting a bit frustrated with the current running task
This one? Does indeed look like the watchdog stepped in:
BOINC:: CPU time: 50522.6s, 36000s + 14400s[2020- 7-17 19: 6:41:] :: BOINC
(Output like that does not normally appear in task results.)

Tasks do occasionally ‘go rogue’, and checkpointing is known to be difficult and inconsistent in Rosetta. I don’t think there’s anything we can do about it other than leave them alone, switch off when we need to and hope not too much work is lost, or abort overrunning tasks.

Once a task has overrun the timing predictions become meaningless; it seems progress asymptotically approaches 100%, and estimated remaining time never goes below 10 minutes.
ID: 98199 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,716,372
RAC: 18,198
Message 98208 - Posted: 18 Jul 2020, 17:37:40 UTC - in response to Message 98191.  
Last modified: 18 Jul 2020, 17:42:16 UTC

Edit- you also need to reduce your cache size as you are missing deadlines. If you have more than 1 project, there is no need for a cache at all. Even with just one project, a cache really isn't necessary unless that project has lots of extended down time.


Or the project has very small tasks that take 18 seconds to complete! (Milkyway)

There have been a few Tasks that run longer than the set time till the watchdog timer kills them off. There were a lot when i first started here, for a while now there haven't been any.
Just recently, there's been a small batch of those longer running Tasks again.


Actually the watchdog doesn't kill them off, I had one run for 2.5 days. They always finish eventually, and with success, but the lack of checkpoints is a bit annoying - I lost quite a bit when Microsoft illegally restarted my property for a Windows update (which I've since disabled).
ID: 98208 · Rating: 0
Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 98210 - Posted: 18 Jul 2020, 18:06:50 UTC - in response to Message 98157.  

boboviz wrote:
Any discussion of how much faster HD helps? I see a great difference in RAC between my "old" SATA and new SSD (with the same memory and CPU).
The only reason I can think of for the hard disk to make such a difference is that you are short on RAM and the system is spending a lot of time paging. From what I’ve seen, once the application and protein database are loaded, Rosetta itself uses the disk very little. It infrequently saves a few kilobytes of state/checkpoint/results data; nothing more.
ID: 98210 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,716,372
RAC: 18,198
Message 98211 - Posted: 18 Jul 2020, 18:12:45 UTC - in response to Message 98210.  

boboviz wrote:
Any discussion of how much faster HD helps? I see a great difference in RAC between my "old" SATA and new SSD (with the same memory and CPU).
The only reason I can think of for the hard disk to make such a difference is that you are short on RAM and the system is spending a lot of time paging. From what I’ve seen, once the application and protein database are loaded, Rosetta itself uses the disk very little. It infrequently saves a few kilobytes of state/checkpoint/results data; nothing more.


Which will be buffered in RAM anyway and not hold up the program. Only if it needs to read data can it be stalled, even then if it's stuff it or another task from the same project has used recently, that will be cached in RAM.
ID: 98211 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2115
Credit: 41,115,238
RAC: 19,699
Message 98223 - Posted: 19 Jul 2020, 3:19:43 UTC - in response to Message 98169.  

After the task outage last month I guess people re-prioritised other projects, understandably.

I really don't understand why people do that. I have all my computers set to run at least two projects. If one goes wrong or runs out of work, it will run entirely the other one with no intervention from myself. When it's fixed, it'll go back to doing it at the proportion I've set (and in fact tries to make up lost ground by doing more of the one that was broken for a while). You could even have Rosetta at weight 1,000,000 and another project at 1.

I know, but you've been here longer than me, so you ought to know that some people are... weird in their reasoning. There's no explaining that'll help
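
For what the 1,000,000 to 1 weighting quoted above works out to, a minimal sketch. It assumes weights translate into simple long-run proportions and that a project with no work just drops out of the sum; the function and project names are invented for illustration.

```python
def share_fractions(weights, available=None):
    """Long-run fraction of computing per project, given its weight.

    Projects not listed in `available` (e.g. out of work) are left out,
    and the remaining weights are renormalised.
    """
    if available is not None:
        weights = {name: w for name, w in weights.items() if name in available}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


weights = {"Rosetta": 1_000_000, "Backup": 1}
print(share_fractions(weights))                        # Rosetta gets nearly everything
print(share_fractions(weights, available={"Backup"}))  # Rosetta down: backup takes over
```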
ID: 98223 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2115
Credit: 41,115,238
RAC: 19,699
Message 98224 - Posted: 19 Jul 2020, 3:27:34 UTC - in response to Message 98199.  

Keith Myers wrote:
I'm getting a bit frustrated with the current running task
This one? Does indeed look like the watchdog stepped in:
BOINC:: CPU time: 50522.6s, 36000s + 14400s[2020- 7-17 19: 6:41:] :: BOINC
(Output like that does not normally appear in task results.)

Tasks do occasionally ‘go rogue’, and checkpointing is known to be difficult and inconsistent in Rosetta. I don’t think there’s anything we can do about it other than leave them alone, switch off when we need to and hope not too much work is lost, or abort overrunning tasks.

Once a task has overrun the timing predictions become meaningless; it seems progress asymptotically approaches 100%, and estimated remaining time never goes below 10 minutes.

Slightly side-tracking.
That task isn't available to view any more, but if it says "36000s + 14400s" that indicates the watchdog has now been set back to 4hrs rather than 10hrs.
I wasn't aware that'd changed back as I haven't had a long-running task for a very long time
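
A small sketch of reading that output line. It follows the interpretation in this post, which is an assumption rather than documented behaviour: the first figure after the CPU time is the target run time, the second is the watchdog allowance on top of it. The regular expression and field names are invented for illustration.

```python
import re

# The line quoted in the thread, copied as-is.
LINE = "BOINC:: CPU time: 50522.6s, 36000s + 14400s[2020- 7-17 19: 6:41:] :: BOINC"

PATTERN = re.compile(r"CPU time:\s*([\d.]+)s,\s*(\d+)s\s*\+\s*(\d+)s")


def parse_watchdog_line(line):
    """Pull out CPU time, assumed target and assumed watchdog allowance."""
    match = PATTERN.search(line)
    if match is None:
        return None
    cpu_s = float(match.group(1))
    target_s = int(match.group(2))
    allowance_s = int(match.group(3))
    limit_s = target_s + allowance_s
    return {
        "cpu_s": cpu_s,
        "target_h": target_s / 3600,
        "allowance_h": allowance_s / 3600,
        "limit_s": limit_s,
        "over_limit": cpu_s > limit_s,
    }


info = parse_watchdog_line(LINE)
print(info)
```

Read this way, the line shows a 10 hour target plus a 4 hour allowance, and a CPU time just past the combined 14 hour mark.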
ID: 98224 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org