Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Nicholas Hathaway
Joined: 20 Nov 14
Posts: 6
Credit: 791,395
RAC: 0
Message 94808 - Posted: 19 Apr 2020, 1:16:49 UTC

Hi, I am repeatedly getting the following in my event log:

Sat Apr 18 07:39:31 2020 | Rosetta@home | Resetting project
Sat Apr 18 23:48:22 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
Sat Apr 18 23:48:22 2020 | Rosetta@home | If this happens repeatedly you may need to reset the project.
Sat Apr 18 23:53:46 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
Sat Apr 18 23:53:46 2020 | Rosetta@home | If this happens repeatedly you may need to reset the project.
Sat Apr 18 23:54:40 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
Sat Apr 18 23:54:40 2020 | Rosetta@home | If this happens repeatedly you may need to reset the project.
Sun Apr 19 00:46:30 2020 | Rosetta@home | Project requested delay of 7 seconds


What do I need to do?
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1670
Credit: 17,523,845
RAC: 23,480
Message 94810 - Posted: 19 Apr 2020, 1:38:00 UTC - in response to Message 94808.  
Last modified: 19 Apr 2020, 1:43:43 UTC

Hi, I am repeatedly getting the following in my event log:

Sat Apr 18 07:39:31 2020 | Rosetta@home | Resetting project
Sat Apr 18 23:48:22 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
Sat Apr 18 23:48:22 2020 | Rosetta@home | If this happens repeatedly you may need to reset the project.
Sat Apr 18 23:53:46 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
Sat Apr 18 23:53:46 2020 | Rosetta@home | If this happens repeatedly you may need to reset the project.
Sat Apr 18 23:54:40 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
Sat Apr 18 23:54:40 2020 | Rosetta@home | If this happens repeatedly you may need to reset the project.
Sun Apr 19 00:46:30 2020 | Rosetta@home | Project requested delay of 7 seconds


What do I need to do?
How do your Computing preferences compare to these? Particularly the "When to suspend" settings. If they aren't a problem, then as it says in the Event log, you'll probably need to reset the project, but even then it may not fix the problem.
It appears it's because the system is busy doing something else & BOINC can't communicate with the science application. Having "Use at most X% of CPU time" set below 100% can cause it on some systems.
As long as Tasks don't error out as a result, it's not a problem as such, but it does show some contention for resources on the system.


You've also had a lot of Tasks miss the deadline, so a much smaller cache would be a good idea.
Computing
   Usage limits	
                                   Use at most 100% of the CPUs
                                   Use at most 100% of CPU time

   When to suspend	
           Suspend when computer is on battery (not selected)
               Suspend when computer is in use (not selected)
 Suspend GPU computing when computer is in use (not selected)
   'In use' means mouse/keyboard input in last 3 minutes
  Suspend when no mouse/keyboard input in last --- minutes
     Suspend when non-BOINC CPU usage is above --- %
                          Compute only between ---

   Other	
                                Store at least 1 days of work
                     Store up to an additional 0.02 days of work
                    Switch between tasks every 60 minutes
     Request tasks to checkpoint at most every 60 seconds

   Disk
                              Use no more than 20 GB
                                Leave at least 2 GB free
                              Use no more than 60 % of total

   Memory
          When computer is in use, use at most 95 %
      When computer is not in use, use at most 95 %
 Leave non-GPU tasks in memory while suspended (not selected)
                   Page/swap file: use at most 75 %

Grant
Darwin NT
Nicholas Hathaway
Joined: 20 Nov 14
Posts: 6
Credit: 791,395
RAC: 0
Message 94812 - Posted: 19 Apr 2020, 1:46:09 UTC - in response to Message 94810.  

Computing preferences
These settings apply to all computers using this account except computers where you have set preferences locally using the BOINC Manager, and Android devices.
Computing
Usage limits
Use at most 100 % of the CPUs
Use at most 100 % of CPU time
When to suspend
Suspend when computer is on battery
Suspend when computer is in use
Suspend GPU computing when computer is in use
'In use' means mouse/keyboard input in last 3 minutes
Suspend when no mouse/keyboard input in last --- minutes
Suspend when non-BOINC CPU usage is above --- %
Compute only between ---
Other
Store at least 0.5 days of work
Store up to an additional 1 days of work
Switch between tasks every 60 minutes
Request tasks to checkpoint at most every 60 seconds
Disk
Use no more than --- GB
Leave at least --- GB free
Use no more than 90 % of total
Memory
When computer is in use, use at most 90 %
When computer is not in use, use at most 90 %
Leave non-GPU tasks in memory while suspended
Page/swap file: use at most 75 %
Network
Usage limits
Limit download rate to --- KB/second
Limit upload rate to --- KB/second
Limit usage to --- MB every --- days
When to suspend
Transfer files only between ---
Other
Skip data verification for image files
Confirm before connecting to Internet
Disconnect when done

Sid Celery
Joined: 11 Feb 08
Posts: 2115
Credit: 41,112,600
RAC: 19,835
Message 94815 - Posted: 19 Apr 2020, 2:03:04 UTC - in response to Message 94727.  

Not sure if I should report this as a problem, but...

On an Android phone I'm running 4 tasks and have another (varying) 3 or 4 waiting to follow.
I've been reporting and receiving more tasks regularly. All sounds good.

Trouble is, the Server Status page has been reporting no tasks available to download for at least a day.
And the number of in progress tasks has been reducing steadily until a few hours ago and currently reads nil. Right now I have 7.

I've certainly received and reported tasks since both read nil.

Not complaining, obviously. Just reporting.

Still reporting tasks and getting more 24hrs after 0 to send and 0 in progress. Am I one of the 7 who reported in the last 24hrs?

We are deprecating the 'rosetta_for_devices' app. The arm platforms have been added to the 'rosetta' application group. We will also be deprecating the minirosetta app and will soon have just the rosetta app. There are still some minirosetta jobs in our queue.

Oh, maybe this explains it. Thought it was weird.
MarkJ
Joined: 28 Mar 20
Posts: 72
Credit: 25,238,680
RAC: 0
Message 94837 - Posted: 19 Apr 2020, 7:35:19 UTC - in response to Message 94807.  

We are deprecating the 'rosetta_for_devices' app. The arm platforms have been added to the 'rosetta' application group. We will also be deprecating the minirosetta app and will soon have just the rosetta app. There are still some minirosetta jobs in our queue.

Well that should reduce the development effort if you only have the one app (but multiple platforms). I still seem to be getting MiniRosetta and haven't cleared my cache (which is only 0.3 days) yet.
BOINC blog
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1670
Credit: 17,523,845
RAC: 23,480
Message 94838 - Posted: 19 Apr 2020, 7:44:48 UTC - in response to Message 94837.  

We are deprecating the 'rosetta_for_devices' app. The arm platforms have been added to the 'rosetta' application group. We will also be deprecating the minirosetta app and will soon have just the rosetta app. There are still some minirosetta jobs in our queue.
Well that should reduce the development effort if you only have the one app (but multiple platforms). I still seem to be getting MiniRosetta and haven't cleared my cache (which is only 0.3 days) yet.
I figure there will be a few resends after the last of the initial Tasks have been sent out. But with the short deadlines & low replication, it should all be cleared up well within a week.
Grant
Darwin NT
zfp
Joined: 22 Mar 20
Posts: 1
Credit: 114,637
RAC: 0
Message 94840 - Posted: 19 Apr 2020, 7:57:36 UTC

Hello,

After a kernel update I restarted my system. As a result:
    two of the tasks running at the time of the reboot lost all progress after the reboot,
    then all tasks that were running at the time of the reboot ran for an extremely long time,
    and all of them exited with: Exit status 139 (0x0000008B) Unknown error code



The output shows this:

WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1

https://boinc.bakerlab.org/rosetta/result.php?resultid=1152628078
https://boinc.bakerlab.org/rosetta/result.php?resultid=1152627950
https://boinc.bakerlab.org/rosetta/result.php?resultid=1152627464
https://boinc.bakerlab.org/rosetta/result.php?resultid=1152627790
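Exit status 139 is 0x8B in hex. By the common Unix shell convention, a status above 128 means "terminated by signal number (status - 128)", so 139 would point to signal 11, SIGSEGV (a segmentation fault). The minimal sketch below decodes a status that way with Python's standard `signal` module; it assumes the convention applies to the status BOINC reported, which a program exiting with 139 on purpose would break. The function name `describe_exit_status` is hypothetical.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret an exit status using the shell convention 128 + signal number.

    Assumption: the status follows that convention; a program can also
    deliberately exit with a value above 128.
    """
    if 128 < status < 256:
        signum = status - 128
        try:
            return signal.Signals(signum).name
        except ValueError:
            return f"unknown signal {signum}"
    return f"exit code {status}"


if __name__ == "__main__":
    status = 0x8B  # the "(0x0000008B)" shown next to exit status 139
    print(status, describe_exit_status(status))  # 139 SIGSEGV
```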

ww
Joined: 17 Mar 20
Posts: 3
Credit: 455,936
RAC: 0
Message 94847 - Posted: 19 Apr 2020, 8:51:04 UTC
Last modified: 19 Apr 2020, 9:04:20 UTC

Maybe a memory leak

rb_04_16_21806_21365_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_05_08_918009_366

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1037241400

The first attempt (Windows 32-bit) failed at 12 hours of CPU time, RSS 354MB

I have the second attempt on Linux 64-bit. If it actually needs this much memory, 32-bit wouldn't have been able to run it at all. RSS was at 3.09 GB. It seems to be able to free some memory though.


(Or some swapped out. Don't post tired kids.)

RSS has been climbing steadily; it started at 1.8 GB. The task is now at 3.5 hours, and completion is on pace for an 11.9-hour run time. It appears to be checkpointing.

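A steady climb is easier to judge from several samples than from two readings. The sketch below is Linux only and assumes the standard `/proc/<pid>/status` layout, where the `VmRSS:` line gives the resident set size in kB; it reads that value for a process (for example the Process ID shown in the task properties) and fits a least-squares slope to timed samples. The helper names and the sample numbers in the usage line are made up for illustration. A slope that stays positive across a whole run suggests a leak, but does not prove one, since a task can legitimately need more memory as it progresses.

```python
def parse_vmrss_kb(status_text: str) -> int:
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # e.g. "VmRSS:   1887436 kB"
    raise ValueError("no VmRSS line found")


def read_rss_kb(pid: int) -> int:
    """Linux only: read the current resident set size of process `pid`."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read())


def slope_per_hour(hours: list[float], rss_gb: list[float]) -> float:
    """Least-squares slope of RSS against elapsed time, in GB per hour."""
    n = len(hours)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(hours) / n
    mean_y = sum(rss_gb) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, rss_gb))
    den = sum((x - mean_x) ** 2 for x in hours)
    if den == 0:
        raise ValueError("samples must be taken at different times")
    return num / den


if __name__ == "__main__":
    # Hypothetical samples: (elapsed hours, RSS in GB).
    print(slope_per_hour([0, 1, 2, 3], [1.8, 2.1, 2.4, 2.7]))  # about 0.3
```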
Application: Rosetta 4.15
Name: rb_04_16_21392_21290__t000__4_C1_SAVE_ALL_OUT_IGNORE_THE_REST_917949_249
State: Running
Received: Sat 18 Apr 2020 04:59:33 PM EDT
Report deadline: Tue 21 Apr 2020 04:59:33 PM EDT
Estimated computation size: 80,000 GFLOPs
CPU time: 03:38:02
CPU time since checkpoint: 00:02:38
Elapsed time: 03:38:38
Estimated time remaining: 04:56:41
Fraction done: 30.282%
Virtual memory size: 3.09 GB
Working set size: 2.89 GB
Directory: slots/5
Process ID: 24116
Progress rate: 8.280% per hour
Executable: rosetta_4.15_x86_64-pc-linux-gnu
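The "on pace for 11.9 hours" figure can be sanity-checked with simple arithmetic. Assuming progress is roughly linear, total run time is about elapsed time divided by fraction done: 3 h 38 m 38 s / 0.30282 is about 12.0 h, close to the poster's estimate. BOINC's own "Estimated time remaining" (04:56:41) adds up to only about 8.6 h in total, so BOINC is clearly estimating differently from this naive extrapolation. The helper names below are hypothetical.

```python
def hms_to_hours(text: str) -> float:
    """Convert a BOINC Manager time such as "03:38:38" to hours."""
    h, m, s = (int(part) for part in text.split(":"))
    return h + m / 60 + s / 3600


def projected_total_hours(elapsed: str, fraction_done: float) -> float:
    """Naive linear projection: total run time = elapsed / fraction done."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return hms_to_hours(elapsed) / fraction_done


if __name__ == "__main__":
    total = projected_total_hours("03:38:38", 0.30282)
    remaining = total - hms_to_hours("03:38:38")
    print(f"total ~{total:.1f} h, remaining ~{remaining:.1f} h")
    # total ~12.0 h, remaining ~8.4 h
```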
Tom M
Joined: 20 Jun 17
Posts: 87
Credit: 14,880,624
RAC: 117,108
Message 94867 - Posted: 19 Apr 2020, 12:39:18 UTC - in response to Message 94754.  

Hello, I'm a newbie to Rosetta and got things set up and running ok. In the last two days I've noticed my laptop running this app in an odd manner. Instead of running at 100% CPU, it fluctuates between 33% and 100%,


If you have Seti@Home mostly idling I would go to the S@H website and disable the "intel igpu" check box.

Generally running any crunching task on that part of the Intel cpu chip slows the entire system down significantly.

This is generally true for now. If/when Intel delivers the planned upgrades to its iGPU, it should start behaving more like AMD's iGPU, but not yet.

Tom M
Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel.....
Maslo55
Joined: 3 Mar 08
Posts: 1
Credit: 1,029,280
RAC: 0
Message 95417 - Posted: 27 Apr 2020, 10:15:53 UTC

I have some random crashes: once every few days, I find my crunching computer has rebooted when I return to it. I also run Folding@home, which I thought was responsible, but in the Windows Event Viewer the faulting application seems to be Rosetta:

Faulting application name: rosetta_4.15_windows_x86_64.exe, version: 0.0.0.0, time stamp: 0x5e856ed2
Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000
Exception code: 0xc0000005
Fault offset: 0x0000000000000000
Faulting process id: 0x2f68
Faulting application start time: 0x01d61c558f1205cc
Faulting application path: C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\rosetta_4.15_windows_x86_64.exe
Faulting module path: unknown
Report Id: e3f4316a-3112-476b-9f13-f2fcc13a42e3
Faulting package full name:
Faulting package-relative application ID:
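Exception code 0xc0000005 is a Windows NTSTATUS value, STATUS_ACCESS_VIOLATION: the process tried to read, write or execute memory it had no access to. That kind of crash can be caused by memory corruption, which fits the overclocked-RAM suspicion, though the code alone cannot confirm it. The sketch below splits such a code into the fields of the documented NTSTATUS layout (2 severity bits, a customer bit, a reserved bit, a 12-bit facility and a 16-bit code); the helper name and the small lookup table are illustrative, not exhaustive.

```python
# NTSTATUS layout: bits 31-30 severity, bit 29 customer flag,
# bit 28 reserved, bits 27-16 facility, bits 15-0 code.
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}

# A few well-known values; not an exhaustive table.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC0000094: "STATUS_INTEGER_DIVIDE_BY_ZERO",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def decode_ntstatus(value: int) -> dict:
    """Split a 32-bit NTSTATUS exception code into its fields."""
    return {
        "severity": SEVERITY[(value >> 30) & 0x3],
        "customer": bool((value >> 29) & 0x1),
        "facility": (value >> 16) & 0xFFF,
        "code": value & 0xFFFF,
        "name": KNOWN_STATUS.get(value, "unknown"),
    }


if __name__ == "__main__":
    print(decode_ntstatus(0xC0000005))
    # {'severity': 'error', 'customer': False, 'facility': 0,
    #  'code': 5, 'name': 'STATUS_ACCESS_VIOLATION'}
```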

I have a Ryzen 3600 with slightly overclocked RAM; I will probably try default settings, or increasing the voltage. All testing programs show no errors. I do get some computation errors, but very infrequently.

Rosetta seems to be a better RAM tester than Memtest for me.
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1670
Credit: 17,523,845
RAC: 23,480
Message 95418 - Posted: 27 Apr 2020, 10:19:30 UTC - in response to Message 95417.  

I have Ryzen 3600 with slightly overclocked RAM, would probably try default, or increasing voltage. All testing programs show no errors. I get some computation errors, but very infrequently.
Or better yet default clocks & voltage to see if that sorts out the problem.
Grant
Darwin NT
robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,269,631
RAC: 3,846
Message 95423 - Posted: 27 Apr 2020, 14:39:39 UTC - in response to Message 95417.  

I have some random crashes, once every few days, I find my crunching computer rebooted when I return to it. I also run Folding@home which I thought was responsible, but in Windows Event viewer the faulting application seems to be rosetta:

Faulting application name: rosetta_4.15_windows_x86_64.exe, version: 0.0.0.0, time stamp: 0x5e856ed2
Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000
Exception code: 0xc0000005.

[snip]

I'm running Rosetta, some other BOINC projects, and Folding@home on my computer at the same time. Only BOINC currently gets the GPU, since I'm trying to avoid changes in how loud the computer's fan is, and Folding@home doesn't always have GPU WUs available.

This often causes my browser to crash, but not Windows itself.

It tends to make Rosetta tasks take about twice as much clock time to finish, though.

I'm still trying to find out how many virtual CPU cores Folding@home uses at the same time, and how to control this - it appears that the slowdown is due to more background tasks trying to grab CPU time than there are virtual CPU cores to provide such CPU time.

There needs to be a discussion somewhere of how to make BOINC and Folding@home share a computer; I haven't found one at Folding@home.

Changing the Folding@home power setting to light helped reduce the crashes, but has not done much to the slowdown problem.
Millenium
Joined: 20 Sep 05
Posts: 68
Credit: 184,283
RAC: 0
Message 95430 - Posted: 27 Apr 2020, 16:58:05 UTC

Are you running CPU tasks for BOINC and Folding at the same time? If so, it makes no sense. You just slow them all down, as they use more memory (and memory bandwidth, CPU context switches and so on) for no gain. Use the CPU either for Folding or for BOINC. There is no way around that: you can't keep adding CPU-bound threads and expect them all to stay efficient.

GPU is a different thing of course.
robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,269,631
RAC: 3,846
Message 95439 - Posted: 27 Apr 2020, 19:02:51 UTC - in response to Message 95430.  

Are you running together CPU tasks on BOINC and Folding? If yes then it's nonsense. You just slow them all as they use more memory (and memory bandwidth, and CPU context changes and whatever) without need. Either use the CPU for Folding or for BOINC. There is no way to fix that, you can't just add threads that require CPU usage and expect them all to not be inefficient.

GPU is a different thing of course.

The Folding@home method for finishing the current WU and then stopping doesn't work, and I don't want to let a Folding@home workunit time out instead of finishing.

I've already found a way to limit the number of threads BOINC is using. If I can find a similar method for Folding@home, I should be able to stop their contention for virtual cores, but let both continue to run CPU work.
robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,269,631
RAC: 3,846
Message 95443 - Posted: 27 Apr 2020, 20:32:00 UTC - in response to Message 95417.  

Maslo55,

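Exception code 0xc0000005 under Windows 7 or 10 is an access violation: a program tried to read or write memory it was not allowed to access. (The same code is also shown when a program fails to start.)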

You should check if the total amount of memory in use is approaching the total amount of memory that your computer has.

If so, the problem is not specific to Rosetta@home, but a problem with trying to run too many memory-demanding programs at once.

You can either add more memory to your computer, or reduce the amount of work your computer is trying to do at once.
strongboes
Joined: 3 Mar 20
Posts: 27
Credit: 5,394,270
RAC: 0
Message 95445 - Posted: 27 Apr 2020, 20:50:49 UTC - in response to Message 95443.  

In Folding@home you can set the number of cores in the CPU slot: change it from -1 to the value you want.

I have found there is no optimal setting for running both simultaneously.
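One way to stop the two projects from fighting over threads is to give each a fixed share: set the Folding@home CPU slot to a fixed thread count instead of -1, and lower BOINC's "Use at most N% of the CPUs" so the two add up to the number of logical CPUs. The hypothetical helper below picks N. It assumes BOINC converts the percentage to a CPU count by rounding logical_cpus * N / 100 down, so check the number of running tasks after changing the setting.

```python
def boinc_cpu_percent(logical_cpus: int, reserve_for_fah: int) -> int:
    """Smallest whole 'Use at most N% of the CPUs' value that leaves
    `reserve_for_fah` logical CPUs free for a Folding@home CPU slot.

    Assumption: BOINC rounds logical_cpus * N / 100 down to whole CPUs.
    """
    if not 0 <= reserve_for_fah < logical_cpus:
        raise ValueError("reserve between 0 and logical_cpus - 1 CPUs")
    boinc_cpus = logical_cpus - reserve_for_fah
    # Ceiling division, so that rounding down afterwards still gives exactly
    # boinc_cpus (this holds for machines with up to 100 logical CPUs).
    return (boinc_cpus * 100 + logical_cpus - 1) // logical_cpus


if __name__ == "__main__":
    # Example: a 12-thread CPU with 4 threads set aside for Folding@home.
    pct = boinc_cpu_percent(12, 4)
    print(pct, 12 * pct // 100)  # 67 8
```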
Brummit
Joined: 14 Jul 14
Posts: 2
Credit: 30,582
RAC: 0
Message 95454 - Posted: 28 Apr 2020, 1:01:06 UTC

Is there any way you could set up an option to download smaller work units?

My average stats for Rosetta are 159 completed and 78 failed. Optimistically, that's a third of downloaded work (78 of 237 tasks) deemed invalid for running out of time, so someone else has to (re)process the data; pessimistically, it's just under half (78 failures against 159 completions). I run the PC 12-15 hours per day on average.

A waste of processing time for all.

My PC, though not the latest super duper 1000-core gamer extravaganza, was custom built two years ago and is still pretty good.

Thank you,
'Brummit'.
robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,269,631
RAC: 3,846
Message 95455 - Posted: 28 Apr 2020, 1:23:51 UTC - in response to Message 95454.  

Brummit,

Under the advanced interface, Your account, Rosetta@home preferences, you can try reducing the Target CPU run time by about a third of its current value. But note that there's a minimum value you're not allowed to go below.

This should give you workunits that run for shorter times, but need about the same amount of memory.

Does this fit your idea of smaller work units?
Sid Celery
Joined: 11 Feb 08
Posts: 2115
Credit: 41,112,600
RAC: 19,835
Message 95457 - Posted: 28 Apr 2020, 1:53:24 UTC - in response to Message 95454.  

Is there any way you could set up an option to download smaller work units?

My average stats for Rosetta are - 159 completed, and 78 failed. Optimistically that's 1/3 of download work deemed invalid due to running out of time, requiring someone else has to (re) process the data,
and pessimistically, just under half the data fails the deadline. I run the PC 12-15 hours per day average.

A waste of processing time for all.

My PC, though not the latest super duper 1000 core gamer extravaganza, is custom built two years ago, and still pretty good.

Have you only recently returned to this project? It looks like you've had a few days off after receiving tasks and had to abort some.
If you're online 12-15hrs/day you should be able to complete 8hr tasks ok when they have a 3-day deadline. Try to let them run and complete and it should improve the more tasks you complete and return.
It should settle down after a few days. Give it another try.
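The deadline arithmetic can be checked directly. At 12 hours a day, a 3-day deadline gives 36 hours of on-time, enough for four 8-hour tasks per core (five at 15 hours a day); queuing more than that per core means some tasks miss the deadline even when everything runs at full speed. The sketch below uses simplifying assumptions that are not how BOINC actually schedules: all tasks arrive together, the deadline clock starts at download, and each core crunches whenever the PC is on. The function name is hypothetical.

```python
def tasks_per_core_before_deadline(task_hours: float,
                                   hours_on_per_day: float,
                                   deadline_days: float) -> int:
    """How many back-to-back tasks one core can finish before the deadline,
    under the simplifying assumptions described above."""
    usable_hours = hours_on_per_day * deadline_days
    return int(usable_hours // task_hours)


if __name__ == "__main__":
    for hours_on in (12, 15):
        n = tasks_per_core_before_deadline(8, hours_on, 3)
        print(f"{hours_on} h/day: {n} eight-hour tasks per core")
    # 12 h/day: 4 eight-hour tasks per core
    # 15 h/day: 5 eight-hour tasks per core
```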
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1670
Credit: 17,523,845
RAC: 23,480
Message 95462 - Posted: 28 Apr 2020, 5:24:30 UTC - in response to Message 95454.  
Last modified: 28 Apr 2020, 5:40:08 UTC

Is there any way you could set up an option to download smaller work units?
Just set a smaller cache. If you reduce the Target CPU Run time (at this point), there's a high risk that you'll just end up with even more Tasks downloading, missing the deadlines & then erroring out than is already happening.
On your account page, Preferences, When and how BOINC uses your computer, Computing preferences
   Other
                                 Store at least 0.6 days of work
                      Store up to an additional 0.02 days of work
Works for me.
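The risk with a large cache can be illustrated with a rough model. BOINC fills the cache based on each task's estimated run time, so while that estimate is still too low, it fetches more tasks than the machine can actually finish. The functions below are a simplified model, not BOINC's actual work-fetch algorithm, and the numbers are hypothetical: 4 CPUs, a PC that is on half the day, a 1.5-day cache, and tasks estimated at 2 hours that really run for 8. That gives 36 tasks needing about 6 days, well past a 3-day deadline.

```python
import math


def tasks_fetched(cache_days: float, cpus: int, estimated_hours: float,
                  on_fraction: float) -> int:
    """Rough number of tasks needed to fill the cache, using the estimate.

    Simplified model; BOINC's real work-fetch logic is more involved.
    """
    cpu_hours_wanted = cache_days * 24 * on_fraction * cpus
    return math.ceil(cpu_hours_wanted / estimated_hours)


def days_to_finish(tasks: int, cpus: int, actual_hours: float,
                   on_fraction: float) -> float:
    """Wall-clock days needed to crunch `tasks` at their real run time."""
    return tasks * actual_hours / (cpus * 24 * on_fraction)


if __name__ == "__main__":
    n = tasks_fetched(1.5, 4, 2, 0.5)
    print(n, days_to_finish(n, 4, 8, 0.5))  # 36 6.0
```

A small cache like the one above keeps the number of queued tasks low even while the estimates are still wrong.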
It takes an extremely long time for the Estimated completion times to get reasonably close to the actual time (Target CPU Run time).


And even so, it will take a while for BOINC to determine how many hours a day your computer is on, and how much time it is able to process work while it is on (the default settings can mean just browsing sites with heavy graphics/scripts will stop BOINC from processing work).
Grant
Darwin NT



©2024 University of Washington
https://www.bakerlab.org