Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Sid Celery Joined: 11 Feb 08 Posts: 2024 Credit: 39,862,078 RAC: 19,183 |
Quote: "Tasks starting with RosettaVS run for 8 hours for me."
Great, but I'm not talking about the ones that run as expected; I mean all those that don't, of which there seem to be many. Also, I don't recall seeing any RosettaVS tasks myself, so I don't know how they behave. |
Bryn Mawr Joined: 26 Dec 18 Posts: 380 Credit: 11,334,032 RAC: 8,037 |
Quote: "Now out of work new"
I've always figured it's best to leave it at the default, as the project scientists who set the tasks up know their requirements better than I do. |
Joined: 28 Mar 20 Posts: 1551 Credit: 15,933,616 RAC: 17,840 |
New batch of work over at Ralph, with new errors.
RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_f_pred_148_16902_5_1 <core_client_version>8.0.2</core_client_version> <![CDATA[ <message> The access code is invalid. (0xc) - exit code 12 (0xc)</message> <stderr_txt> Traceback (most recent call last): File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aapredict.py", line 8, in <module> import torch File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorch__init__.py", line 124, in <module> raise err OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchlibcaffe2_detectron_ops_gpu.dll" or one of its dependencies. </stderr_txt> ]]>
RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_e_pred_195_16901_6_1 <core_client_version>8.0.2</core_client_version> <![CDATA[ <message> The access code is invalid. (0xc) - exit code 12 (0xc)</message> <stderr_txt> Traceback (most recent call last): File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aapredict.py", line 698, in <module> b.write(base64.b64decode(f.read())) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libbase64.py", line 87, in b64decode return binascii.a2b_base64(s) binascii.Error: Invalid base64-encoded string: number of data characters (65) cannot be 1 more than a multiple of 4 </stderr_txt> ]]>
RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_f_pred_119_16902_6_1 <core_client_version>8.0.2</core_client_version> <![CDATA[ <message> The access code is invalid. 
(0xc) - exit code 12 (0xc)</message> <stderr_txt> Traceback (most recent call last): File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aapredict.py", line 708, in <module> pred.predict(out_name+f'_{n}', File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aapredict.py", line 551, in predict logit_s, logit_aa_s, logit_pae, logit_pde, p_bind, pred_crds, alpha, pred_allatom, pred_lddt_binned, msa_prev, pair_prev, state_prev = self.model( File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aaRoseTTAFoldModel.py", line 358, in forward msa, pair, xyz, alpha_s, xyz_allatom, state, symmsub = self.simulator( File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aaTrack_module.py", line 1106, in forward msa, pair, xyz, state, alpha, symmsub = self.main_block[i_m](msa, pair, File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aaTrack_module.py", line 929, in forward xyz, state, alpha = self.str2str( File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchcudaampautocast_mode.py", line 141, in decorate_autocast return func(*args, **kwargs) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aaTrack_module.py", line 503, in forward shift = self.se3(G, node.reshape(B*L, -1, 1), l1_feats, edge_feats) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl 
return forward_call(*input, **kwargs) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aaSE3_network.py", line 96, in forward return self.se3(G, node_features, edge_features) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aa/SE3Transformerse3_transformermodeltransformer.py", line 185, in forward node_feats = self.graph_modules(node_feats, edge_feats, graph=graph, basis=basis) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aa/SE3Transformerse3_transformermodeltransformer.py", line 47, in forward input = module(input, *args, **kwargs) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aa/SE3Transformerse3_transformermodellayersattention.py", line 162, in forward fused_key_value = self.to_key_value(node_features, edge_features, graph, basis) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aa/SE3Transformerse3_transformermodellayersconvolution.py", line 347, in forward out += self.conv_in[str(degree_in)](feature, invariant_edge_feats, File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aa/SE3Transformerse3_transformermodellayersconvolution.py", line 186, in forward radial_weights = self.radial_func(invariant_edge_feats[e_i:e_j]) File 
"C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aa/SE3Transformerse3_transformermodellayersconvolution.py", line 118, in forward return self.net(features) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulescontainer.py", line 139, in forward input = module(input) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmoduleslinear.py", line 96, in forward return F.linear(input, self.weight, self.bias) File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnfunctional.py", line 1847, in linear return torch._C._nn.linear(input, weight, bias) RuntimeError: [enforce fail at ..c10coreCPUAllocator.cpp:79] data. DefaultCPUAllocator: not enough memory: you tried to allocate 536870912 bytes. </stderr_txt>]]> Grant Darwin NT |
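The second failure above ends in `binascii.Error: ... number of data characters (65) cannot be 1 more than a multiple of 4`. Below is a minimal Python sketch of why that length can never decode, assuming (this isn't confirmed in the thread) that the file `predict.py` reads was simply cut short. The helper names `data_char_count` and `looks_truncated`, and the 49-byte sample payload, are hypothetical and for illustration only.

```python
import base64
import binascii

# The 64 characters that carry data in standard base64 ('=' is only padding).
B64_ALPHABET = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
)


def data_char_count(text: str) -> int:
    """Count base64 data characters, ignoring '=' padding and anything else."""
    return sum(1 for ch in text if ch in B64_ALPHABET)


def looks_truncated(text: str) -> bool:
    """4 characters encode 3 bytes, and 2 or 3 leftover characters still
    encode 1 or 2 bytes, but a single leftover character holds only 6 bits,
    so a count that is 1 more than a multiple of 4 can never be valid."""
    return data_char_count(text) % 4 == 1


# Hypothetical payload: 49 bytes encode to 68 characters (16 full groups + "AA==").
full = base64.b64encode(bytes(49)).decode("ascii")
# Simulate a file cut off after 65 characters, matching the count in the stderr above.
cut = full[:65]

print(len(full), data_char_count(cut), looks_truncated(cut))  # 68 65 True

try:
    base64.b64decode(cut)
except binascii.Error as err:
    print("decode failed:", err)
```

A length check like this only says the input is malformed; it can't tell a truncated download from a file that was never written completely on the host.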
kotenok2000 Joined: 22 Feb 11 Posts: 243 Credit: 435,550 RAC: 1,317 |
Did they port the Rosetta Python projects to native Windows? Try increasing the pagefile size. It helped with the GPUGRID Python project, which even uses the GPU. |
Sid Celery Joined: 11 Feb 08 Posts: 2024 Credit: 39,862,078 RAC: 19,183 |
Quote: "What I'd re-emphasise is that the default runtime for tasks has fallen to 3hrs for some reason, which I believe to be a mistake and contradicts the forced Boinc setting of 8hrs"
While that's generally true, it's clear imo that this 3hr target runtime is an error, as it's inconsistent with what Rosetta tells BOINC. It only ever slips through when a new version of the app comes out. I seem to recall it happened once before and was corrected, in the days when the admins paid more attention to us. If the 8hr default ever changed, I think something would be said, and seeing as no-one's saying anything these days, I doubt it will ever change without a very specific reason. |
Sid Celery Joined: 11 Feb 08 Posts: 2024 Credit: 39,862,078 RAC: 19,183 |
Ooh, 360k tasks. We live to fight another day (or two) |
Joined: 1 Dec 05 Posts: 1912 Credit: 8,818,406 RAC: 9,577 |
Today, a lot of the "classical" error:
ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictApplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT.
ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 2442
BOINC:: Error reading and gzipping output datafile: default.out
08:16:19 (5164): called boinc_finish(1) |
Sid Celery Joined: 11 Feb 08 Posts: 2024 Credit: 39,862,078 RAC: 19,183 |
Quote: "Today a lot of 'classical' error"
Yes, but they fail very quickly, so I'm not too worried by them. More concerning are two validate errors on tasks that ran to completion: hal_8a_i_hal_8aa_2jp5597_d99_0001_SAVE_ALL_OUT_2978378_13_0 and hal_8a_i_hal_8aa_2jp1316_d224_0001_SAVE_ALL_OUT_2978378_13_0 |
Sid Celery Joined: 11 Feb 08 Posts: 2024 Credit: 39,862,078 RAC: 19,183 |
Quote: "Ooh, 360k tasks. We live to fight another day (or two)"
That turned into 3+ days, but we're out again. |
Sid Celery Joined: 11 Feb 08 Posts: 2024 Credit: 39,862,078 RAC: 19,183 |
Quote: "Ooh, 360k tasks. We live to fight another day (or two)"
While I know most people will have finished their outstanding tasks already, I managed to sneak in 4 extra tasks and return them today, and now discover that the validators running under boinc-process are down again. Better now than at other times, I guess. |
Joined: 28 Mar 20 Posts: 1551 Credit: 15,933,616 RAC: 17,840 |
That boinc-process server has developed a habit of regularly falling over; it was well overdue for another crash. Grant Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2024 Credit: 39,862,078 RAC: 19,183 |
Quote: "Ooh, 360k tasks. We live to fight another day (or two)"
Or maybe not better now, as 660k tasks are newly available. |
Joined: 1 Dec 05 Posts: 1912 Credit: 8,818,406 RAC: 9,577 |
Quote: "Or maybe not better now as 660k tasks newly available"
0 WUs, and a lot of daemons are down... |
Sid Celery Joined: 11 Feb 08 Posts: 2024 Credit: 39,862,078 RAC: 19,183 |
Quote: "Or maybe not better now as 660k tasks newly available"
Yup. I would've expected 660k to last at least 2 days, but I'm not sure it lasted much more than 15hrs, unless tasks got pulled. Front page figures borked on top of boinc-process server borked.
Edit: Actually, I'm now thinking tasks did get pulled. Unvalidated tasks were about 20k before the new batch arrived, now 160k. In-progress tasks were about 30k, now 112k. That implies 222k tasks were grabbed. But the front page is locked at 7am with 660k queued, so 440k have gone missing, presumed pulled. |
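The estimate in the edit above can be reproduced as a quick back-of-envelope calculation. This is a hypothetical sketch: the function name `estimate_pulled` is illustrative, the inputs are the approximate figures quoted in the post (in thousands of tasks), and it assumes every task sent out in that window still shows up as either in progress or unvalidated. That seems reasonable while the validators were down, but it isn't confirmed.

```python
def estimate_pulled(queued, unval_before, unval_after, inprog_before, inprog_after):
    """Return (grabbed, unaccounted), in the same units as the inputs.

    Assumption: tasks handed out to hosts appear as growth in either the
    'unvalidated' or the 'in progress' counter, and nothing left both
    counters (e.g. by validating) during the window.
    """
    grabbed = (unval_after - unval_before) + (inprog_after - inprog_before)
    unaccounted = queued - grabbed
    return grabbed, unaccounted


# Approximate front-page figures from the post, in thousands of tasks.
grabbed, unaccounted = estimate_pulled(
    queued=660, unval_before=20, unval_after=160, inprog_before=30, inprog_after=112
)
print(grabbed, unaccounted)  # 222 438
```

The 438k result is what the post rounds to "440k have gone missing". If any tasks did validate during the window, the grabbed figure would be an undercount and the unaccounted figure an overcount.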
©2024 University of Washington
https://www.bakerlab.org