Problems and Technical Issues with Rosetta@home

robertmiles

Joined: 16 Jun 08
Posts: 1232
Credit: 14,269,631
RAC: 3,846
Message 101137 - Posted: 7 Apr 2021, 20:30:15 UTC - in response to Message 101135.  

It seems the RAM problem is not yet solved. I just updated Rosetta now and got the same complaint about needing 6 GB of RAM and only having 3 GB.
It's not a problem, it's just some tasks needing more RAM.
It is a problem when a Work Unit says it will need 6+GB of RAM, when it really only needs 300MB (or less), as that results in almost a third of the project's computing resources becoming unavailable.
It's true all the ones that came through to my larger machines all say under 700MB RAM. Maybe they can occasionally need a lot more so they're playing safe and not crashing small machines with them?
[snip]

I allocate 28GB from 32GB total
They don't need the RAM. If they run, they generally use 300MB, not 5 or 6GB each. It's more than a bit crackers

Anyway, new news. I grabbed another few tasks while WCG is mainly running and they all seem new and running normally without crashing.
I think the hint dropped pretty heavily when every task got sent back un-run.
Things are happening whether they say so here or not

Have you considered the possibility that many of those creating workunits are not yet very good at estimating how much RAM they will need to run?

I suspect that many of them are also not yet very good at reading the task log files, recognizing the problems they show, and correcting them.
ID: 101137
Jim Martin

Joined: 9 Oct 05
Posts: 23
Credit: 1,416,797
RAC: 1,374
Message 101138 - Posted: 7 Apr 2021, 22:09:52 UTC

Hello. After approx. 15 years with Baker Lab, I've experienced an interesting problem, under the general category, computer errors.
Sorry to copy the entire err. report, but perhaps it will clarify.

Any ideas? The past three downloads gave, basically, the same results. Unless computing requirements have changed, recently, then I'll
have to change. Otherwise, perhaps, UW's end is with some new problems?

Good luck.

Jim Martin


Task 1364797245


Name ajzjTxIe_YBAABB_ABYBB_AAAAAAXB_AAY_CGGGGGGCCGGGGGCGGGGGGGGCGGGC_1-4_2-5_3-6.pdb_0001_abinitio_1_abinitio_SAVE_ALL_OUT_1389656_916_1
Workunit 1220204735
Created 7 Apr 2021, 15:30:32 UTC
Sent 7 Apr 2021, 15:32:35 UTC
Report deadline 10 Apr 2021, 15:32:35 UTC
Received 7 Apr 2021, 17:59:51 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 1 (0x00000001) Unknown error code
Computer ID 1324493
Run time 39 sec
CPU time 24 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 3.37 GFLOPS
Application version Rosetta v4.20
windows_x86_64
Peak working set size 153.00 MB
Peak swap size 124.98 MB
Peak disk usage 0.01 MB

Stderr output
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
Incorrect function.
(0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -fragA 00001.500.6mers -fragB 00001.500.4mers -in:file:fasta 00001.fasta -abinitio::increase_cycles 10 -mute all -abinitio::fastrelax -relax::default_repeats 15 -abinitio::rsd_wt_helix 0.5 -abinitio::rsd_wt_loop 0.5 -abinitio::use_filters false -ex1 -ex2aro -in:file:boinc_wu_zip cp_ajzjTxIe_YBAABB_ABYBB_AAAAAAXB_AAY_CGGGGGGCCGGGGGCGGGGGGGGCGGGC_1-4_2-5_3-6.pdb_0001_abinitio_1_fold_data.zip -out:file:silent default.out -silent_gz -in:file:native 00001.pdb -out:file:silent_struct_type binary -detect_disulf true -fix_disulf disulf -constraints::cst_file CB_cst -constraints:cst_weight 1 -number_9mer_frags 150 -number_3mer_frags 150 -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1985841
Using database: database_357d5d93529_n_methylminirosetta_database

ERROR: ERROR: FragmentIO: could not open file 00001.500.6mers
ERROR:: Exit from: ..\..\..\src\core\fragment\FragmentIO.cc line: 233
BOINC:: Error reading and gzipping output datafile: default.out
11:41:18 (10892): called boinc_finish(1)

</stderr_txt>
]]>
ID: 101138
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 101141 - Posted: 7 Apr 2021, 22:46:08 UTC - in response to Message 101138.  
Last modified: 7 Apr 2021, 22:47:14 UTC

perhaps, UW's end is with some new problems?
Yes: many people have reported the same issue recently. There’s nothing we can do about it other than let the bad work units fail, or stop running Rosetta until the problem has passed.
ID: 101141
mrhastyrib

Joined: 18 Feb 21
Posts: 90
Credit: 2,541,890
RAC: 0
Message 101142 - Posted: 7 Apr 2021, 22:49:40 UTC - in response to Message 101128.  

When I was a teenager,


If you're THAT old, you shouldn't be getting hot flashes every time someone says "dude."
ID: 101142
mrhastyrib

Joined: 18 Feb 21
Posts: 90
Credit: 2,541,890
RAC: 0
Message 101143 - Posted: 7 Apr 2021, 22:56:08 UTC - in response to Message 101141.  

There’s nothing we can do about it

We could ritualistically sacrifice a chicken, and then sprinkle its blood and entrails on @Peter Hucker.

The best part is, none of the staff of the nursing home would believe that it happened, when it's reported by the other residents.
ID: 101143
Sid Celery

Joined: 11 Feb 08
Posts: 2115
Credit: 41,115,753
RAC: 19,563
Message 101144 - Posted: 8 Apr 2021, 0:02:04 UTC - in response to Message 101137.  
Last modified: 8 Apr 2021, 0:08:45 UTC

It seems the RAM problem is not yet solved. I just updated Rosetta now and got the same complaint about needing 6 GB of RAM and only having 3 GB.
It's not a problem, it's just some tasks needing more RAM.
It is a problem when a Work Unit says it will need 6+GB of RAM, when it really only needs 300MB (or less), as that results in almost a third of the project's computing resources becoming unavailable.
It's true all the ones that came through to my larger machines all say under 700MB RAM. Maybe they can occasionally need a lot more so they're playing safe and not crashing small machines with them?
I allocate 28GB from 32GB total
They don't need the RAM. If they run, they generally use 300MB, not 5 or 6GB each. It's more than a bit crackers

Anyway, new news. I grabbed another few tasks while WCG is mainly running and they all seem new and running normally without crashing.
I think the hint dropped pretty heavily when every task got sent back un-run.
Things are happening whether they say so here or not

Have you considered the possibility that many of those creating work-units are not yet very good at estimating how much RAM they will need to run?

I suspect that many of them are also not yet very good at reading the task log files, recognizing the problems they show, and correcting them.

I hadn't considered it because if someone can code for the kind of work we're getting I wouldn't be so grossly insulting as to suggest they're a bit thick.
I can easily imagine either the slip of a finger or maybe some kind of test that they didn't want to be limited by RAM or disk space to have accidentally been left in.

Honestly, of all the things to suggest... have a word with yourself

Aside from that, it would be nice if we could have a few more of those tasks that it seems I was lucky to pick up. They seem fine on my main PC with plenty of RAM

Edit again: Miraculously picked up 4 tasks on my laptop immediately after posting. None again when I tried on the desktop. They're trying, but hand to mouth.
ID: 101144
Jim Martin

Joined: 9 Oct 05
Posts: 23
Credit: 1,416,797
RAC: 1,374
Message 101145 - Posted: 8 Apr 2021, 0:55:10 UTC - in response to Message 101141.  

Thanks for the reply, Brian. I wonder why some have this problem, and others don't. Nothing has changed (computer) on this end of the line.
So, will just run SiDock@home for a while. Natalia has a cheerful and informative approach to running things.

jm
ID: 101145
robertmiles

Joined: 16 Jun 08
Posts: 1232
Credit: 14,269,631
RAC: 3,846
Message 101146 - Posted: 8 Apr 2021, 1:04:35 UTC - in response to Message 101138.  

Hello. After approx. 15 years with Baker Lab, I've experienced an interesting problem, under the general category, computer errors.
Sorry to copy the entire err. report, but perhaps it will clarify.

Any ideas? The past three downloads gave, basically, the same results. Unless computing requirements have changed, recently, then I'll
have to change. Otherwise, perhaps, UW's end is with some new problems?

[snip]

This line seems to be the important one:

ERROR: ERROR: FragmentIO: could not open file 00001.500.6mers

I've seen such errors in most of my recent failed tasks from R@H, so it is probably an error in what was included in the workunit.
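For anyone sifting through a lot of failed tasks, here is a rough sketch of how that line could be picked out of the stderr text automatically. It is only an illustration: the function name `missing_input_files` is made up for this example, and the pattern is based purely on the wording of the error shown in this thread, so other Rosetta errors would need their own patterns.

```python
import re
from typing import List

# Pattern based on the error wording quoted in this thread (an assumption,
# not an official list of Rosetta error formats).
MISSING_FILE = re.compile(r"could not open file (\S+)")


def missing_input_files(stderr_text: str) -> List[str]:
    """Collect file names from 'could not open file' lines that start with ERROR."""
    files = []
    for line in stderr_text.splitlines():
        line = line.strip()
        if not line.startswith("ERROR"):
            continue  # skip command lines, BOINC messages, timestamps
        match = MISSING_FILE.search(line)
        if match:
            files.append(match.group(1))
    return files


# Abridged excerpt of the stderr output posted above.
SAMPLE_STDERR = """\
Using database: database_357d5d93529_n_methylminirosetta_database

ERROR: ERROR: FragmentIO: could not open file 00001.500.6mers
BOINC:: Error reading and gzipping output datafile: default.out
11:41:18 (10892): called boinc_finish(1)
"""

print(missing_input_files(SAMPLE_STDERR))
```

The file it reports is the same one passed to `-fragA` on the command line in the task's stderr, which fits the idea that the workunit was sent out without one of its input files.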
ID: 101146
robertmiles

Joined: 16 Jun 08
Posts: 1232
Credit: 14,269,631
RAC: 3,846
Message 101147 - Posted: 8 Apr 2021, 1:13:48 UTC - in response to Message 101144.  

[snip]
Have you considered the possibility that many of those creating work-units are not yet very good at estimating how much RAM they will need to run?

I suspect that many of them are also not yet very good at reading the task log files, recognizing the problems they show, and correcting them.

I hadn't considered it because if someone can code for the kind of work we're getting I wouldn't be so grossly insulting as to suggest they're a bit thick.
I can easily imagine either the slip of a finger or maybe some kind of test that they didn't want to be limited by RAM or disk space to have accidentally been left in.

Honestly, of all the things to suggest... have a word with yourself

Aside from that, it would be nice if we could have a few more of those tasks that it seems I was lucky to pick up. They seem fine on my main PC with plenty of RAM

Edit again: Miraculously picked up 4 tasks on my laptop immediately after posting. None again when I tried on the desktop. They're trying, but hand to mouth.

You appear to be assuming that those who write the code for the application are the same ones creating the workunits.

They have to be taught at some time. Which activity do you think they are allowed to do first? Or do you think that they can start both at the same time, with no tests that what they are doing works properly?
ID: 101147
.clair.

Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 101148 - Posted: 8 Apr 2021, 1:50:59 UTC - in response to Message 101138.  

Name ajzjTxIe_YBAABB_ABYBB_AAAAAAXB_AAY_CGGGGGGCCGGGGGCGGGGGGGGCGGGC_1-4_2-5_3-6.pdb_0001_abinitio_1_abinitio_SAVE_ALL_OUT_1389656_916_1

It seems to be the work units that have names that look like someone fell asleep at the keyboard
or a tin o fizzy stuff gon sticky keys that are the worst
with over 800 of the whack job units in my Error list,
that is a lot of duff guff to clog up the data base and turn it into more of a septic tank.
ID: 101148
mikey

Joined: 5 Jan 06
Posts: 1895
Credit: 9,118,186
RAC: 5,220
Message 101151 - Posted: 8 Apr 2021, 2:34:43 UTC - in response to Message 101148.  

Name ajzjTxIe_YBAABB_ABYBB_AAAAAAXB_AAY_CGGGGGGCCGGGGGCGGGGGGGGCGGGC_1-4_2-5_3-6.pdb_0001_abinitio_1_abinitio_SAVE_ALL_OUT_1389656_916_1

It seems to be the work units that have names that look like someone fell asleep at the keyboard
or a tin o fizzy stuff gon sticky keys that are the worst
with over 800 of the whack job units in my Error list,
that is a lot of duff guff to clog up the data base and turn it into more of a septic tank.


Hopefully it means something to someone somewhere.
ID: 101151
Sid Celery

Joined: 11 Feb 08
Posts: 2115
Credit: 41,115,753
RAC: 19,563
Message 101153 - Posted: 8 Apr 2021, 3:02:41 UTC - in response to Message 101147.  
Last modified: 8 Apr 2021, 3:03:53 UTC

Have you considered the possibility that many of those creating work-units are not yet very good at estimating how much RAM they will need to run?

I suspect that many of them are also not yet very good at reading the task log files, recognizing the problems they show, and correcting them.

I hadn't considered it because if someone can code for the kind of work we're getting I wouldn't be so grossly insulting as to suggest they're a bit thick.
I can easily imagine either the slip of a finger or maybe some kind of test that they didn't want to be limited by RAM or disk space to have accidentally been left in.

Honestly, of all the things to suggest... have a word with yourself

Aside from that, it would be nice if we could have a few more of those tasks that it seems I was lucky to pick up. They seem fine on my main PC with plenty of RAM

Edit again: Miraculously picked up 4 tasks on my laptop immediately after posting. None again when I tried on the desktop. They're trying, but hand to mouth.

You appear to be assuming that those who write the code for the application are the same ones creating the workunits.

They have to be taught at some time. Which activity do you think they are allowed to do first? Or do you think that they can start both at the same time, with no tests that what they are doing works properly?

Christ...
ID: 101153
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 101154 - Posted: 8 Apr 2021, 7:39:04 UTC - in response to Message 101151.  
Last modified: 8 Apr 2021, 7:42:39 UTC

Hopefully it means something to someone somewhere.
Ah: divided by a common language… :-)
Perfectly comprehensible this side of the pond.

tin o[f] fizzy stuff = can of soda
duff = defective
guff = nonsense

Also “septic tank” is occasionally used as rhyming slang for “Yank”, but I don’t think that’s what’s meant here.
ID: 101154
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 1993
Credit: 9,520,400
RAC: 11,365
Message 101156 - Posted: 8 Apr 2021, 8:09:25 UTC - in response to Message 101147.  

You appear to be assuming that those who write the code for the application are the same ones creating the workunits.

I hope they work in the same office/building.....or that they keep in touch constantly!

Or do you think that they can start both at the same time, with no tests that what they are doing works properly?

Ralph@home exists for that.
ID: 101156
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 101158 - Posted: 8 Apr 2021, 8:24:58 UTC - in response to Message 101145.  

will just run SiDock@home for a while. Natalia has a cheerful and informative approach to running things.
Yeah – it’s a new project: they’re still keen. I dare say it was like that here in the early days, too? And that 15 years from now users will be moaning that the SiDock admins don’t talk to them any more…
ID: 101158
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 1993
Credit: 9,520,400
RAC: 11,365
Message 101159 - Posted: 8 Apr 2021, 8:33:54 UTC - in response to Message 101158.  

And that 15 years from now users will be moaning that the SiDock admins don’t talk to them any more…

There are "old" projects that have active admins on forums.
ID: 101159
Dougga

Joined: 27 Nov 06
Posts: 28
Credit: 5,248,050
RAC: 0
Message 101165 - Posted: 8 Apr 2021, 18:06:28 UTC - in response to Message 80621.  

My Linux client appears to be stuck, with problems in the messaging from the server.
I removed Rosetta and added it back.
I'm not getting any work units.
It says ...

Requesting new tasks from CPU
Scheduler requests complete: got 0 tasks
No tasks sent
Project requested delay of 31s

After the 31 seconds, no further requests are made.
Manual requests rerun the loop.

I've uninstalled the program, rebooted etc...
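If it helps anyone checking the same thing on their own machine, the request/response lines above can be boiled down to two numbers: how many tasks came back and how long the project asked the client to wait. The sketch below is an assumption-laden illustration, not anything from the BOINC client itself: `summarize_request` is a made-up name, and the patterns follow the wording as typed in this post, which may not match a real event log character for character.

```python
import re
from typing import Iterable, Optional, Tuple

# Patterns follow the wording quoted in this thread; a real client log
# may phrase these messages differently (assumption).
TASKS_RE = re.compile(r"got (\d+) (?:new )?tasks?")
DELAY_RE = re.compile(r"requested delay of (\d+)\s*s")


def summarize_request(log_lines: Iterable[str]) -> Tuple[Optional[int], Optional[int]]:
    """Return (tasks_received, delay_seconds); either is None if not found."""
    tasks = None
    delay = None
    for line in log_lines:
        tasks_match = TASKS_RE.search(line)
        if tasks_match:
            tasks = int(tasks_match.group(1))
        delay_match = DELAY_RE.search(line)
        if delay_match:
            delay = int(delay_match.group(1))
    return tasks, delay


log = [
    "Requesting new tasks from CPU",
    "Scheduler requests complete: got 0 tasks",
    "No tasks sent",
    "Project requested delay of 31s",
]

tasks, delay = summarize_request(log)
print(f"tasks received: {tasks}, project-requested delay: {delay}s")
```

A result of zero tasks with a short delay says the request itself went through and the server answered, so the cause is more likely on the server side (nothing to send) than in the client install.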
ID: 101165
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 101167 - Posted: 8 Apr 2021, 18:17:56 UTC - in response to Message 101165.  

I'm not getting any work units.
Neither is anybody else. There is no work that needs doing at the moment. Try again in a couple of days’ time.
ID: 101167
PorkyPies

Joined: 6 Apr 20
Posts: 45
Credit: 1,650,779
RAC: 0
Message 101173 - Posted: 8 Apr 2021, 23:27:08 UTC

I'm getting new work. My larger x64 machines are only getting 11 per request so it takes a few goes to get enough for all cores.

On the Pi4 4GB I'm still getting:
Rosetta@home 9/04/2021 9:16:50 AM Message from server: Rosetta needs 6675.72 MB RAM but only 3460.72 MB is available for use.

The 6.6GB free memory requirement hasn't changed.
MarksRpiCluster
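To put a number on the gap, the two figures can be pulled straight out of that server message. This is just a small sketch: `memory_shortfall_mb` is a hypothetical helper name, and the pattern assumes the message wording stays exactly as quoted above.

```python
import re
from typing import Optional

# Pattern for the server message quoted above; the wording is taken from
# this thread and is assumed, not guaranteed, to be stable.
MESSAGE_PATTERN = re.compile(
    r"needs (?P<needed>[\d.]+) MB RAM but only (?P<available>[\d.]+) MB is available"
)


def memory_shortfall_mb(message: str) -> Optional[float]:
    """Return how many MB short the host is, or None if the message does not match."""
    match = MESSAGE_PATTERN.search(message)
    if match is None:
        return None
    needed = float(match.group("needed"))
    available = float(match.group("available"))
    return needed - available


line = (
    "Rosetta@home 9/04/2021 9:16:50 AM Message from server: "
    "Rosetta needs 6675.72 MB RAM but only 3460.72 MB is available for use."
)

shortfall = memory_shortfall_mb(line)
print(f"Short by {shortfall:.2f} MB")
```

For the message above that works out to a shortfall of about 3215 MB, so the stated requirement is nearly double what the Pi can offer, whatever the tasks would really use once running.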
ID: 101173
Sid Celery

Joined: 11 Feb 08
Posts: 2115
Credit: 41,115,753
RAC: 19,563
Message 101174 - Posted: 8 Apr 2021, 23:53:25 UTC - in response to Message 101154.  

Hopefully it means something to someone somewhere.
Ah: divided by a common language… :-)
Perfectly comprehensible this side of the pond.

tin o[f] fizzy stuff = can of soda
duff = defective
guff = nonsense

Also “septic tank” is occasionally used as rhyming slang for “Yank”, but I don’t think that’s what’s meant here.

Lol! Too much information.

Wrt whether that's what was meant, I didn't think it was about the slang. I thought it was about the CGCCGGG stuff in the task names

Isn't that genetic code? I've assumed it was. Whether it is or not, it's only important it means something to them, not to us.

Unless they're too dumb to understand what they're naming means - far be it from me to exclude all possibilities...
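One quick way to poke at the "is it genetic code?" guess is to split a task name on underscores and see which pieces use only the four DNA base letters. This is a rough sketch with a made-up helper name (`dna_like_segments`), and it only tests the alphabet; passing the test does not show that a segment really is a DNA sequence.

```python
DNA_LETTERS = set("ACGT")


def dna_like_segments(task_name):
    """Return the underscore-separated segments made up only of A, C, G, T."""
    return [
        segment
        for segment in task_name.split("_")
        if segment and set(segment) <= DNA_LETTERS
    ]


# Task name copied from the failed task posted earlier in the thread.
name = (
    "ajzjTxIe_YBAABB_ABYBB_AAAAAAXB_AAY_"
    "CGGGGGGCCGGGGGCGGGGGGGGCGGGC_1-4_2-5_3-6.pdb_0001_"
    "abinitio_1_abinitio_SAVE_ALL_OUT_1389656_916_1"
)

print(dna_like_segments(name))
```

Only the long C/G run passes. The neighbouring pieces contain Y, B and X, which are not among A, C, G and T, so the name as a whole is probably some other kind of label that means something to whoever generated it.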
ID: 101174
©2024 University of Washington
https://www.bakerlab.org