Problems and Technical Issues with Rosetta@home

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 104846 - Posted: 16 Feb 2022, 19:27:03 UTC - in response to Message 104844.  

For the past few months, Rosetta work units have been unavailable for days on end. No errors are shown, just "got 0 new tasks".

They aren't very good at alerting you to the fact that the work units are mostly the pythons now, which require VirtualBox.
And lots of memory (3 GB per work unit). And lots of disk space.

And they will kill your SSD unless you are careful.
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=14871

The other problems are too painful to mention. You won't have to look far in this thread, once you get past the irrelevant posts.
Without a moderator, there are a lot of them. That is still another problem. Lots of luck.

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 104847 - Posted: 16 Feb 2022, 19:39:29 UTC - in response to Message 104844.  

I have been running Rosetta on multiple computers for years, and it has been a mostly hands-off background task requiring minimal supervision.

For the past few months, Rosetta work units have been unavailable for days on end. No errors are shown, just "got 0 new tasks".

2/16/2022 11:45:16 AM | Rosetta@home | update requested by user
2/16/2022 11:45:20 AM | Rosetta@home | Sending scheduler request: Requested by user.
2/16/2022 11:45:20 AM | Rosetta@home | Requesting new tasks for CPU
2/16/2022 11:45:22 AM | Rosetta@home | Scheduler request completed: got 0 new tasks
2/16/2022 11:45:22 AM | Rosetta@home | No tasks sent
2/16/2022 11:45:22 AM | Rosetta@home | Project requested delay of 31 seconds


When this happens, the online website server status shows approximately 5000 tasks are ready to send, with some large number (~100k) of tasks in progress, and little server side processing occurring.

Computing status
Work
Tasks ready to send                      4992
Tasks in progress                      115529
Workunits waiting for validation            0
Workunits waiting for assimilation          1
Workunits waiting for file deletion         1
Tasks waiting for file deletion             1
Transitioner backlog (hours)             0.00


It seems that whenever the available number of tasks gets down to around 5000, all work units are considered sent, and the server backend is now just waiting for completed work to be returned. I don't recall ever seeing the available tasks go down to zero.

When I do eventually get some tasks, everything runs as expected locally until all tasks are finished. Then I go idle for days waiting for more tasks to become available.

It seems to me that the project is just not generating as much work for all of its users these days. I don't know if that is because the number of work units is down, or because there are many more users available to process the same number of generally available units, or because the type of work has changed and I am unaware of what my system is lacking so it can be sent some of these "new" types of tasks now being made available.

Is there a checklist somewhere that I can use to verify my system is set up correctly? Because my BOINC Manager currently thinks everything is running just fine.

I used to run Rosetta work exclusively. But to keep my computers occupied (non-idle) I have since added other projects so I can pick up other tasks when no Rosetta tasks are available. The downside is that when Rosetta tasks are available, these other projects dilute the amount of resources I can devote to Rosetta in the hands-off processing approach I prefer, as all projects now have to share the available CPU time.

If many Rosetta users are running out of work, but there are still 10s or 100s of thousands of tasks in progress, can Rosetta start limiting the number of tasks sent to individual users (even if they are willing to backlog a large number of tasks locally)?

I have seen other projects where tasks were only generated in large bursts, and the users knew to backlog days or weeks worth of tasks since the server would quickly run out of new tasks to send out. The result was that if you didn't stockpile tasks during the initial big release, you would virtually never see any tasks unless BOINC happened to check in during a new big release of tasks days or weeks in the future.

Limiting the size of individual user backlogs would spread the available work out across all the available users. That would help retain more users, since everyone would feel like they are contributing to the project. At this point, I feel like I'm getting sidelined with no work, while others are sitting on a lot of work units they cannot run immediately. And the rate of results back to Rosetta will be delayed unnecessarily as they wait for the return of backlogged tasks from a few users instead of sending them to idle machines.

My Rosetta@home Statistics graph clearly shows 3 bursts of activity over a total of 8 days within the past 30 days. That leaves me sitting idle for 22 days (or about 75% of that time). My main PC (which the graph comes from) is capable of running 16 concurrent tasks in 32 GB of RAM at ~3.5 GHz CPU speed, so while I can normally complete many concurrent tasks in about 8 hours, 75% of the month Rosetta gets ZERO results from me for lack of tasks to run.
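
As a quick check of the idle-time arithmetic in the paragraph above, here is a minimal sketch. Only the 8-active-days-in-30 figure comes from the post; the variable names are illustrative.

```python
# Figures quoted in the post: 8 active days within a 30-day window.
window_days = 30
active_days = 8

idle_days = window_days - active_days
idle_fraction = idle_days / window_days

print(f"idle days: {idle_days}")
print(f"idle share of the window: {idle_fraction:.1%}")
```

The result is a little under the "about 75%" quoted, which is consistent with that figure being a rounded estimate.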

https://drive.google.com/file/d/1X5aBWy0xj2wgV7DpF9tqjrRg8i8E-XEY/view


It looks like you're just getting 4.2 tasks????
Well, if that is the case, you need to enable virtualization on your MOBO, download Oracle VirtualBox and its extension pack, and enable the option for RAH to send you Python (VirtualBox) tasks: go to your account -> (Computing and credit section) -> Computers on this account: View -> Details -> VirtualBox VM jobs (bottom of the page) and change that button to allow VBox tasks to run.

If you have been running Python and then it goes cold, then check this same page and make sure your computer has not been knocked off by the system for "errors".

Your team is Gridcoin...are your systems run by Gridcoin?
Someone else here is also a Gridcoin member and can maybe help you further.
How many days of work do you store on your system?
Do you run other projects besides RAH?
Often the "no new tasks" message comes up when your system is maxed out for work.

As for task types...you need to look deeper in the server status
Tasks by application
Application               Unsent   In progress
Rosetta                        1         93348
Rosetta Mini                   0             0
rosetta python projects     4995         16897

Feb 16, 8:40 pm CET
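
To make the point of that table concrete, here is a small sketch that computes what share of the unsent queue belongs to each application. The rows are copied from the table above; the function name and layout are illustrative, not anything the server provides.

```python
# Rows copied from the "Tasks by application" table: (application, unsent, in progress).
rows = [
    ("Rosetta", 1, 93348),
    ("Rosetta Mini", 0, 0),
    ("rosetta python projects", 4995, 16897),
]

total_unsent = sum(unsent for _, unsent, _ in rows)


def unsent_share(app_name):
    """Fraction of all unsent tasks that belong to one application."""
    unsent = next(u for name, u, _ in rows if name == app_name)
    return unsent / total_unsent if total_unsent else 0.0


for name, unsent, in_progress in rows:
    print(f"{name:<24} unsent={unsent:>5}  share={unsent_share(name):.2%}")
```

With these numbers nearly the whole "ready to send" queue is Python work, so a host that has not allowed VirtualBox jobs can see "got 0 new tasks" even while the status page shows about 5000 tasks ready.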

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 104852 - Posted: 16 Feb 2022, 21:44:31 UTC

Rosetta@home: Notice from server
rosetta python projects needs 14411.41MB more disk space. You currently have 4662.07 MB available and it needs 19073.49 MB.
2/16/2022 10:25:14 PM


Say what?
500 GB dedicated drive
BOINC set to leave 2 GB free and use the rest
Windows says 97.1 GB used and 358 GB free out of 465 GB available

running 10 python and 4x 4.2

Weird
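
The numbers in the server notice can at least be sanity-checked. The sketch below assumes that the notice's "MB" means 2**20 bytes; under that assumption the 19073.49 MB requirement is exactly 20 * 10**9 bytes, i.e. a flat 20 GB per the project's setting. The 4662.07 MB "available" is presumably what BOINC's disk preferences allow rather than the free space Windows reports, which is what the reply below hints at.

```python
MIB = 1024 * 1024  # assumption: the notice's "MB" is 2**20 bytes

needed_mb = 19073.49     # "it needs 19073.49 MB"
available_mb = 4662.07   # "You currently have 4662.07 MB available"

# Shortfall reported by the notice as 14411.41 MB (differs only by rounding).
shortfall_mb = needed_mb - available_mb
print(f"shortfall: {shortfall_mb:.2f} MB")

# 20 * 10**9 bytes expressed in units of 2**20 bytes.
twenty_gb_in_mb = 20 * 10**9 / MIB
print(f"20e9 bytes = {twenty_gb_in_mb:.2f} MB")
```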

dcdc
Joined: 3 Nov 05
Posts: 1831
Credit: 119,526,853
RAC: 8,525
Message 104853 - Posted: 16 Feb 2022, 22:51:45 UTC - in response to Message 104852.  

If the boxes are unchecked then I believe it uses your web preferences.

Sid Celery
Joined: 11 Feb 08
Posts: 2118
Credit: 41,164,538
RAC: 14,014
Message 104855 - Posted: 16 Feb 2022, 23:19:15 UTC - in response to Message 104842.  

I couldn't even access the home page of Sidock.

Sidock is in maintenance today, but will be back online soon
TN-Grid may be something, but I'm not sure what.

It's a historic BOINC project about gene networks....
http://gene.disi.unitn.it/test/

Sidock: ok
TN-Grid: I got there and looked around a bit, but I didn't really understand what was going on tbh.
If you repeat what they say, I'm not sure I'll understand it any better. I know my limitations.
I'll persist here

Sid Celery
Joined: 11 Feb 08
Posts: 2118
Credit: 41,164,538
RAC: 14,014
Message 104856 - Posted: 16 Feb 2022, 23:36:43 UTC - in response to Message 104844.  

It seems that whenever the available number of tasks gets down to around 5000, all work units are considered sent, and the server backend is now just waiting for completed work to be returned. I don't recall ever seeing the available tasks go down to zero.

Scroll down to the bottom of the server status page. 5000(ish) are Python tasks you'll need VirtualBox for. Anything above that is the Rosetta 4.20 tasks you're set up for and used to.
Basically, you're right.

Is there a checklist somewhere that I can use to verify my system is set up correctly? Because my BOINC Manager currently thinks everything is running just fine.

No reason to think anything is wrong if you successfully ran unattended before. Everyone's in the same position.

Limiting the size of individual user backlogs would spread the available work out across all the available users. That would help retain more users, since everyone would feel like they are contributing to the project. At this point, I feel like I'm getting sidelined with no work, while others are sitting on a lot of work units they cannot run immediately. And the rate of results back to Rosetta will be delayed unnecessarily as they wait for the return of backlogged tasks from a few users instead of sending them to idle machines.

We did that already in 2019. What you're seeing now is the good version.
Yeah, I know...

The project changed recently, prioritising Python/VirtualBox tasks, of which there are many (2.2 million) available.
Plain Rosetta 4.20 tasks only become available in dribs and drabs in the way you're seeing.
Python/VirtualBox tasks have their own problems, which many people are struggling with right now. <Very> high RAM and disk demands.
If you want to give them a try you'll get work, but they won't be so easily runnable, as you'll see from other comments here.
This just seems to be the way things are. Sorry

Sid Celery
Joined: 11 Feb 08
Posts: 2118
Credit: 41,164,538
RAC: 14,014
Message 104857 - Posted: 16 Feb 2022, 23:43:15 UTC - in response to Message 104852.  

Weird

I think I just posted the long version of what you're getting in the "There's a max WU of 8 with Virtualbox" thread.
In short: yes, me too, exactly the same but on a 1 TB drive

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 104861 - Posted: 17 Feb 2022, 6:58:56 UTC - in response to Message 104857.  
Last modified: 17 Feb 2022, 7:17:39 UTC

Weird

I think I just posted the long version of what you're getting in the "There's a max WU of 8 with Virtualbox" thread.
In short: yes, me too, exactly the same but on a 1 TB drive



1 TB and YOU got an error?
I got into it with computer what's-his-name...so what did you and he figure out?
And you said 8 tasks..but 8 python or 8 in total?

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 104862 - Posted: 17 Feb 2022, 7:23:25 UTC - in response to Message 104855.  

Sidock: ok
TN-Grid: I got there and looked around a bit, but I didn't really understand what was going on tbh.
If you repeat what they say, I'm not sure I'll understand it any better. I know my limitations.
I'll persist here

I had a look there as well. It's over my head...but I think it's just a larger cellular form of this.
We are researching on the micro micro scale. They research on the gene level, quite a few steps up from this.
But at first glance it really didn't look that exciting.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,531,739
RAC: 6,525
Message 104866 - Posted: 17 Feb 2022, 13:11:32 UTC - in response to Message 104862.  
Last modified: 17 Feb 2022, 13:14:29 UTC

I had a look there as well. It's over my head...but I think it's just a larger cellular form of this.
We are researching on the micro micro scale. They research on the gene level, quite a few steps up from this.
But at first glance it really didn't look that exciting.


The TN-Grid project is managed by the University of Trento, the Edmund Mach Foundation and CNR (Italian National Research Council).
Here is a description of the project.
In the past they worked with some model organisms and with Vitis vinifera, and now with the human genome, in projects such as colorectal cancer characterization.
Here, for example, is a publication from the project about drug repositioning.

Plus:
- no virtual machine
- optimized app for CPU (SSE-AVX)
- Android app
- admins on forums

kotenok2000
Joined: 22 Feb 11
Posts: 258
Credit: 483,503
RAC: 109
Message 104868 - Posted: 17 Feb 2022, 14:19:53 UTC

Looks like sidock is working again
https://www.sidock.si/sidock/forum_thread.php?id=189

BoredEEdude
Joined: 11 Apr 12
Posts: 11
Credit: 38,954,694
RAC: 4
Message 104872 - Posted: 17 Feb 2022, 19:45:59 UTC - in response to Message 104844.  

Thanks to all for the responses about VirtualBox.

I installed and run BOINC from a mechanical hard disk that is separate from the SSD from which the Windows OS runs, specifically to reduce the constant writes I expected BOINC to be doing. So getting VirtualBox running on that same drive should not be a problem for the SSD.

At this point, what I still don't understand is how to get the VM up and running in VirtualBox to begin with.

VirtualBox was installed at the same time as BOINC, using the combined BOINC + VB installer, so the two program versions should be the approved/compatible pairing. I can start VirtualBox separately, and it looks to be running OK, it just doesn't have any VMs loaded. From the BOINC Event Log, it knows that VirtualBox is available.

2/17/2022 10:44:30 AM | | Host name: DELL-XPS8930
2/17/2022 10:44:30 AM | | Processor: 16 GenuineIntel Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz [Family 6 Model 158 Stepping 12]
2/17/2022 10:44:30 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 pbe fsgsbase bmi1 hle smep bmi2
2/17/2022 10:44:30 AM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.22000.00)
2/17/2022 10:44:30 AM | | Memory: 31.81 GB physical, 40.31 GB virtual
2/17/2022 10:44:30 AM | | Disk: 931.39 GB total, 751.12 GB free
2/17/2022 10:44:30 AM | | Local time is UTC -5 hours
2/17/2022 10:44:30 AM | | No WSL found.
2/17/2022 10:44:30 AM | | VirtualBox version: 6.1.12
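
For anyone wanting to check the same things in their own startup log, here is a minimal sketch that pulls out the three facts relevant to the Python tasks: whether the CPU advertises the vmx flag, which VirtualBox version BOINC detected, and how much physical RAM it sees. The helper names are hypothetical, and the processor-features line is shortened to an illustrative subset of the one above. A vmx flag only shows that the CPU supports VT-x; it does not prove virtualization is enabled in the BIOS.

```python
import re

# Lines in the same "timestamp | project | message" shape as the log above.
log_lines = [
    "2/17/2022 10:44:30 AM | | Processor features: fpu vme de pse sse sse2 avx avx2 vmx smx",
    "2/17/2022 10:44:30 AM | | Memory: 31.81 GB physical, 40.31 GB virtual",
    "2/17/2022 10:44:30 AM | | VirtualBox version: 6.1.12",
]


def message(line):
    """Return the message part of a 'timestamp | project | message' line."""
    return line.split("|", 2)[2].strip()


def summarize(lines):
    """Collect the VirtualBox-related facts from BOINC startup log lines."""
    info = {"vmx": False, "virtualbox": None, "ram_gb": None}
    for line in lines:
        msg = message(line)
        if msg.startswith("Processor features:"):
            info["vmx"] = "vmx" in msg.split(":", 1)[1].split()
        elif msg.startswith("VirtualBox version:"):
            info["virtualbox"] = msg.split(":", 1)[1].strip()
        elif msg.startswith("Memory:"):
            match = re.search(r"([\d.]+) GB physical", msg)
            if match:
                info["ram_gb"] = float(match.group(1))
    return info


print(summarize(log_lines))
```

Even when all three look fine, as they do here, the log says nothing about the per-computer "allow VirtualBox jobs" setting on the project website, which has to be enabled separately.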


Is Rosetta@home supposed to download and start a VM automatically? Or do I have to manually kick off the VM installation somehow? After reading countless forum posts for a few hours, I have seen lots of discussion about trying to get stuff to run correctly, and a few mentions of an image checksum being wrong so the VM apparently won't install, but haven't seen an explanation of how the initial VM installation is supposed to work.

On my system the original tasks will download and run when they are available, but nothing seems to be happening as it relates to VirtualBox.

Falconet
Joined: 9 Mar 09
Posts: 353
Credit: 1,227,479
RAC: 2,728
Message 104873 - Posted: 17 Feb 2022, 19:56:36 UTC - in response to Message 104872.  

If you aren't receiving Python work, go to your Rosetta account, list of computers, click on details on whatever computer you wish to run the Pythons on, scroll down and click allow where it says "allow VirtualBox jobs". Then try again.

Falconet
Joined: 9 Mar 09
Posts: 353
Credit: 1,227,479
RAC: 2,728
Message 104875 - Posted: 17 Feb 2022, 20:17:12 UTC
Last modified: 17 Feb 2022, 20:24:20 UTC

Rosetta 4.2 work is back. Assuming the increase in the queue is 100% due to 4.2 work, it's a sizeable batch with almost 3 million tasks.
Then again, we'll probably plough through it in an instant or it will get pulled by the devs since it has a 100% error rate so far on my laptop.

Edit: OK, got some tasks whose names are all numbers, and those are running well so far.
100% error rate on the movingstub ones.

kotenok2000
Joined: 22 Feb 11
Posts: 258
Credit: 483,503
RAC: 109
Message 104877 - Posted: 17 Feb 2022, 20:32:34 UTC - in response to Message 104875.  

Movingstubs crash on my pc too.

BoredEEdude
Joined: 11 Apr 12
Posts: 11
Credit: 38,954,694
RAC: 4
Message 104881 - Posted: 17 Feb 2022, 22:08:18 UTC - in response to Message 104875.  

My movingstub tasks fail also.

Sorted tasks by name, then suspended the non-movingstub ones. That forced all of the movingstubs to fail immediately, report back as failed (not canceled) and then download more good tasks.
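
The same workaround can be sketched in code. This only splits task names by the "movingstub" marker and builds the command lines; it does not run anything. The task names are hypothetical, and the use of boinccmd's `--task <url> <task_name> suspend` form with this project URL is an assumption about how one might script it, not something described in the thread.

```python
# Project URL as it appears in links in this thread (assumed to match the client's).
PROJECT_URL = "https://boinc.bakerlab.org/rosetta/"


def split_tasks(task_names, marker="movingstub"):
    """Split task names into (suspect, other) by a case-insensitive substring."""
    suspect = sorted(n for n in task_names if marker in n.lower())
    other = sorted(n for n in task_names if marker not in n.lower())
    return suspect, other


def suspend_commands(task_names):
    """Build (but do not run) boinccmd argument lists that suspend each task."""
    return [["boinccmd", "--task", PROJECT_URL, name, "suspend"] for name in task_names]


# Hypothetical task names, for illustration only.
tasks = ["1234567_0001_0", "movingstub_example_42_0", "7654321_0002_1"]
suspect, other = split_tasks(tasks)

# Suspending the good tasks leaves only the suspect ones eligible to start.
for cmd in suspend_commands(other):
    print(" ".join(cmd))
```

Once the failing tasks have been reported, the suspended ones would need the matching `resume` operation.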

kotenok2000
Joined: 22 Feb 11
Posts: 258
Credit: 483,503
RAC: 109
Message 104882 - Posted: 17 Feb 2022, 22:17:37 UTC - in response to Message 104881.  

I had to update the project several times to get working tasks.
Rosetta gave me four good tasks and four bad tasks.

.clair.
Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 104883 - Posted: 17 Feb 2022, 22:18:09 UTC
Last modified: 17 Feb 2022, 22:20:45 UTC

The Pcr work units are getting up me nose
Error at startup 20 seconds
17/02/2022 22:13:33 | Rosetta@home | Output file PcrV11AA_PcrV_111136_000009_extract_A_SAVE_ALL_OUT_2906118_477_1_r1856145221_0 for task PcrV11AA_PcrV_111136_000009_extract_A_SAVE_ALL_OUT_2906118_477_1 absent
Anyone else having problems with them?
edit - had a look back up the thread, it's not just me.

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 104884 - Posted: 17 Feb 2022, 22:27:29 UTC - in response to Message 104883.  

Anyone else having problems with them?
edit - had a look back up the thread, it's not just me.

I don't know if they are the same on Windows as Linux, but I have had my share.
https://boinc.bakerlab.org/rosetta/results.php?hostid=6176524&offset=0&show_names=0&state=6&appid=

I really want only the pythons, but they don't allow you to select them.

BoredEEdude
Joined: 11 Apr 12
Posts: 11
Credit: 38,954,694
RAC: 4
Message 104885 - Posted: 17 Feb 2022, 22:33:52 UTC - in response to Message 104873.  

If you aren't receiving Python work, go to your Rosetta account, list of computers, click on details on whatever computer you wish to run the Pythons on, scroll down and click allow where it says "allow VirtualBox jobs". Then try again.


Thanks! That was the missing step I needed to get a VM to download.

Of the three initial "rosetta python" tasks downloaded, all three failed with a "Computational error" after 15 seconds. So it doesn't look like Python will be running smoothly on my system anytime soon. :(



©2024 University of Washington
https://www.bakerlab.org