Message boards : Number crunching : Rosetta 4.0+
Author | Message |
---|---|
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1991 Credit: 9,520,400 RAC: 12,860 |
RAM requirements more than 3 GB / WU ... isn't fun anymore And... not always. Sometimes I also get these errors even when I have more than 7 GB of RAM free. I think it's a memory allocation problem. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1991 Credit: 9,520,400 RAC: 12,860 |
After 6 hours of calculation (my default time is 2 hours):
1105183556 Starting watchdog... |
Johannes Kingma Joined: 4 Dec 17 Posts: 33 Credit: 9,453,639 RAC: 0 |
For a couple of months now, I have had a problem where the Windows version of Rosetta 4.07 stalls on uploads that reach 100%. The tasks then move to status Upload: pending (project backoff). All data seems to upload fine. stdoutdae.txt reads:
12-Dec-2019 10:36:29 [Rosetta@home] Sending scheduler request: Requested by user.
12-Dec-2019 10:36:29 [Rosetta@home] Reporting 1 completed tasks
12-Dec-2019 10:36:29 [Rosetta@home] Not requesting tasks: too many uploads in progress
12-Dec-2019 10:36:31 [Rosetta@home] Scheduler request completed
12-Dec-2019 10:36:48 [Rosetta@home] work fetch suspended by user
12-Dec-2019 10:36:49 [Rosetta@home] work fetch resumed by user
|
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
Application version: Rosetta v4.07 windows_intelx86
Device: 3710630, Task: 1112811554, and WU 1002360285.
Name: foldit_2007855_0003_fold_and_dock_SAVE_ALL_OUT_849400_5017_0
Status: Error while computing
Exit status: 1 (0x00000001) Unknown error code
Incorrect function. (0x1) - exit code 1 (0x1)
ERROR: Assertion `std::abs( coordsys_rot.det() - 1.0 ) < 1e-6` failed. |
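For readers wondering what the failed assertion tests: it requires the determinant of `coordsys_rot` to be within 1e-6 of 1.0, which is what a proper rotation matrix (no mirroring, no scaling) satisfies. The sketch below re-creates that check in Python on a plain 3×3 list of rows. The helper names `det3` and `passes_rotation_check` are hypothetical, and how Rosetta actually builds `coordsys_rot` is not shown in the report.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as three rows (cofactor expansion)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def passes_rotation_check(m, tol=1e-6):
    """Same condition as the quoted assertion: |det(m) - 1.0| < tol."""
    return abs(det3(m) - 1.0) < tol


# A rotation about the z axis has determinant cos^2 + sin^2 = 1, so it passes.
t = 0.3
rot_z = [[math.cos(t), -math.sin(t), 0.0],
         [math.sin(t), math.cos(t), 0.0],
         [0.0, 0.0, 1.0]]

# A mirror image (determinant -1) fails, and so does a slightly scaled rotation
# (determinant 1.001**3, about 1.003).
mirror = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]
scaled = [[1.001 * x for x in row] for row in rot_z]

for name, m in [("rot_z", rot_z), ("mirror", mirror), ("scaled", scaled)]:
    print(name, passes_rotation_check(m))
```

A determinant of 1 is necessary but not sufficient for a rotation: a shear such as [[1, 1, 0], [0, 1, 0], [0, 0, 1]] also passes, so the assertion as quoted only catches matrices that have picked up a reflection or a scale error.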
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1991 Credit: 9,520,400 RAC: 12,860 |
|
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
Application version: Rosetta v4.07 windows_intelx86
Device: 3710630, Task: 1113518581, and WU 1002915300.
Name: rb_12_26_12887_12992__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_884505_148
Status: Error while computing
Errors: Too many errors (may have bug) Too many total results
Exit status: -529697949 (0xE06D7363) Unknown error code
<message>(unknown error) - exit code -529697949 (0xe06d7363)</message>
Unhandled Exception Detected...
Note that prior task errored-out due to: Unknown error code Incorrect function. (0x1) - exit code 1 (0x1) |
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
Application version: Rosetta v4.07 windows_intelx86
Device: 3710630, Task: 1118475697, and WU 1007449108.
Name: rb_01_23_14034_14473__t000__0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_886490_64
Status: Error while downloading
Errors: Too many errors (may have bug). Too many total results.
Exit status: -186 (0xFFFFFF46) ERR_RESULT_DOWNLOAD
WU download error: couldn't get input files: |
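The exit statuses in these reports (-529697949 and -186) are 32-bit values printed as signed decimals, with the same bits shown in hex in parentheses. A small conversion sketch follows; the function names are made up for illustration. 0xE06D7363 is the code Windows uses for an unhandled Microsoft Visual C++ exception (its low three bytes spell "msc" in ASCII), which fits the "Unhandled Exception Detected" line in that report.

```python
def to_hex32(status):
    """Show a signed exit status as the 32-bit hex form printed beside it."""
    return f"0x{status & 0xFFFFFFFF:08X}"


def to_signed32(code):
    """Reinterpret a 32-bit hex code as the signed decimal exit status."""
    code &= 0xFFFFFFFF
    return code - (1 << 32) if code & 0x80000000 else code


print(to_hex32(-529697949))    # 0xE06D7363 (task 1113518581)
print(to_hex32(-186))          # 0xFFFFFF46 (ERR_RESULT_DOWNLOAD)
print(to_signed32(0xE06D7363))  # -529697949
# The low three bytes of 0xE06D7363 are ASCII for "msc".
print(bytes.fromhex("6D7363").decode("ascii"))
```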
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
Application version: Rosetta v4.07 windows_intelx86
Device: 1759960, Task: 1121955810, and WU: 1010612613.
Name: rb_02_08_15652_15556__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_891233_2568_0
Status: Error while computing
Exit status: 1 (0x00000001) Unknown error code
<message>Incorrect function. (0x1) - exit code 1 (0x1)</message>
There was also additional info in the Event Log:
2/16/2020 4:33:48 AM | Rosetta@home | Computation for task rb_02_08_15652_15556__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_891233_2568_0 finished
A similar error occurred with Task: 1121955812 and WU: 1010612617. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 4,447 |
Application version: Rosetta v4.07 windows_intelx86
It looks like the important lines are:
Status: Error while computing
<message>Incorrect function. (0x1) - exit code 1 (0x1)</message>
All the rest appears to be the result of that. I'd like to see more information shown about WHAT the error in computing was, though, whenever such an error occurs. |
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
Application version: Rosetta v4.07 windows_intelx86
The 2nd host has now completed crunching this task, with a successful outcome. However, the app it used was Rosetta v4.07 windows_x86_64, not Rosetta v4.07 windows_intelx86. For WU 1010612617, my host and the 2nd host were both using v4.07 windows_intelx86, and that task also failed validation on the 2nd host for the same reason as on mine. It appears the v4.07 windows_intelx86 app needs to be checked with this type of task. Note also that my system is 64-bit. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 4,447 |
Are there multiple upload servers, with at least one not working? I have a 4.07 workunit that is finished, but an output file repeatedly refuses to upload. Another workunit that finished after this one quickly uploaded, and was marked completed. Another attempt to do the upload is scheduled for about 5.5 hours from now. The file size for the upload is 2.53 MB. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
There are multiple upload servers. Yes. The BOINC Manager sets a delay on the next attempt, and I believe the WU has a list of possible upload servers, and BOINC will rotate through them on future attempts. Rosetta Moderator: Mod.Sense |
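The retry behaviour described above (wait a growing delay, then try the next upload server on the task's list) can be pictured with the sketch below. It is not BOINC's actual algorithm: the doubling, the 60-second base, the 4-hour cap, the random jitter, the function name, and the example URLs are all assumptions chosen for illustration.

```python
import random


def next_attempt(urls, attempt, base=60.0, cap=4 * 3600.0, rng=random.random):
    """Pick the server and delay for retry number `attempt` (0-based).

    Hypothetical policy: round-robin over `urls`, and wait a random delay in
    [ceiling/2, ceiling), where ceiling doubles each attempt up to `cap` seconds.
    """
    url = urls[attempt % len(urls)]
    # Clamp the exponent so very large attempt counts cannot overflow a float.
    ceiling = min(cap, base * 2 ** min(attempt, 30))
    delay = ceiling * (0.5 + 0.5 * rng())
    return url, delay


# Hypothetical upload servers; a real task carries its own list.
servers = ["https://upload-a.example.org/cgi", "https://upload-b.example.org/cgi"]
for n in range(5):
    url, delay = next_attempt(servers, n)
    print(f"attempt {n}: wait {delay / 60:.1f} min, then try {url}")
```

Under a policy like this, a task whose listed servers all fail keeps retrying with waits that grow toward the cap, while tasks pointing at a working server finish on the first try.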
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 4,447 |
There are multiple upload servers. Yes. The BOINC Manager sets a delay on the next attempt, and I believe the WU has a list of possible upload servers, and BOINC will rotate through them on future attempts. I see the delays on the next attempts, up to several hours now. If the WU has a list of upload servers, it has failed to upload to many of them over the last few days. Does this mean the list of upload servers included in the WU should be examined to determine if it includes any valid servers? Can users do that, or does it need to be done at the project end of the connections? |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Does this mean the list of upload servers included in the WU should be examined to determine if it includes any valid servers? It means that it is a bad work unit. All your others are going through, and I don't see the upload problem at all. Ditch it. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 4,447 |
Another small download is stalled, which blocks my computer from downloading any tasks.
3/10/2020 11:47:11 AM | Rosetta@home | Not requesting tasks: some download is stalled
3/10/2020 11:47:13 AM | Rosetta@home | Scheduler request completed
3/10/2020 12:12:53 PM | Rosetta@home | Started download of rb_03_01_17261_17076_ab_t000__robetta.zip
3/10/2020 12:18:00 PM | | Project communication failed: attempting access to reference site
3/10/2020 12:18:00 PM | Rosetta@home | Temporarily failed download of rb_03_01_17261_17076_ab_t000__robetta.zip: transient HTTP error
3/10/2020 12:18:00 PM | Rosetta@home | Backing off 04:04:15 on download of rb_03_01_17261_17076_ab_t000__robetta.zip
3/10/2020 12:18:01 PM | | Internet access OK - project servers may be temporarily down.
4.05 KB of a 5.37 KB file was downloaded. The last file I saw this problem on was also an *.zip file. Could this mean that the server gets confused about when it's time to stop sending more of an *.zip file? Or could it mean that when a download of a small file fails, the next attempt always uses the same server? |
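When reporting one of these stalled downloads, the useful facts are which file failed, how often, and how long the client is backing off. A rough Python sketch that pulls those out of event-log lines follows. The regular expressions are written only against the message formats quoted in this thread, so other BOINC client versions may word them differently, and the function name `summarize_downloads` is hypothetical.

```python
import re
from collections import Counter
from datetime import timedelta

# Written against the event-log messages quoted in this thread only.
FAILED_RE = re.compile(r"Temporarily failed download of ([^\s:]+): (.+)$")
BACKOFF_RE = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on download of ([^\s:]+)")


def summarize_downloads(lines):
    """Return (failure count per file, most recent backoff per file)."""
    failures = Counter()
    backoffs = {}
    for line in lines:
        failed = FAILED_RE.search(line)
        if failed:
            failures[failed.group(1)] += 1
            continue
        backoff = BACKOFF_RE.search(line)
        if backoff:
            h, m, s, name = backoff.groups()
            backoffs[name] = timedelta(hours=int(h), minutes=int(m), seconds=int(s))
    return failures, backoffs


log = [
    "3/10/2020 12:18:00 PM | Rosetta@home | Temporarily failed download of "
    "rb_03_01_17261_17076_ab_t000__robetta.zip: transient HTTP error",
    "3/10/2020 12:18:00 PM | Rosetta@home | Backing off 04:04:15 on download of "
    "rb_03_01_17261_17076_ab_t000__robetta.zip",
]
failures, backoffs = summarize_downloads(log)
for name, count in failures.items():
    print(name, count, backoffs.get(name))
```

Feeding it a whole event log instead of two lines would show at a glance whether the same file keeps failing across attempts.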
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 225 |
I had one of these earlier this week, a download stalled, at 90+% complete. I tried restarting the process, and the machine, but ultimately killed the job. It was completed and validated by my wingman. He has Windows 10, I have 8.1, but I doubt that is significant. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 4,447 |
I had one of these earlier this week, a download stalled, at 90+% complete. I tried restarting the process, and the machine, but ultimately killed the job. It was completed and validated by my wingman. He has Windows 10, I have 8.1, but I doubt that is significant. Probably not - I also use Windows 10. Another such failure:
3/11/2020 6:41:34 AM | Rosetta@home | Started download of twc_method_msd_cpp_c580_9mer_gb_000073_msd.zip
3/11/2020 6:46:41 AM | Rosetta@home | Temporarily failed download of twc_method_msd_cpp_c580_9mer_gb_000073_msd.zip: transient HTTP error
3/11/2020 6:46:41 AM | Rosetta@home | Backing off 01:26:54 on download of twc_method_msd_cpp_c580_9mer_gb_000073_msd.zip
3/11/2020 6:46:42 AM | | Project communication failed: attempting access to reference site
3/11/2020 6:46:44 AM | | Internet access OK - project servers may be temporarily down.
Several attempts each failed after getting 2.63 KB of the expected 3.04 KB. I now just abort the download in such cases - nothing else seems to help if several attempts to download the same file have failed. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 4,447 |
Another case of a small input file repeatedly failing to download:
3/13/2020 5:57:03 AM | Rosetta@home | Started download of twc_method_msd_cpp_JC_9_34334_1_msd.zip
3/13/2020 6:02:10 AM | Rosetta@home | Temporarily failed download of twc_method_msd_cpp_JC_9_34334_1_msd.zip: transient HTTP error
3/13/2020 6:02:10 AM | Rosetta@home | Backing off 04:42:28 on download of twc_method_msd_cpp_JC_9_34334_1_msd.zip
3/13/2020 6:02:11 AM | | Project communication failed: attempting access to reference site
3/13/2020 6:02:13 AM | | Internet access OK - project servers may be temporarily down.
3/13/2020 6:43:44 AM | Rosetta@home | Sending scheduler request: To report completed tasks.
3/13/2020 6:43:44 AM | Rosetta@home | Reporting 1 completed tasks
3/13/2020 6:43:44 AM | Rosetta@home | Not requesting tasks: some download is stalled
3/13/2020 6:43:46 AM | Rosetta@home | Scheduler request completed
Only 2.63 KB of the expected 3.03 KB would download. Windows 10 did an automatic update overnight; it is unclear whether this was involved in the download problem. I aborted this download to allow downloading more tasks. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 4,447 |
Another repeatedly failing download of a small input file:
3/16/2020 4:54:30 PM | Rosetta@home | Started download of 9v1nm_gb_c3065_9mer_gb_001680.zip
3/16/2020 4:59:38 PM | Rosetta@home | Temporarily failed download of 9v1nm_gb_c3065_9mer_gb_001680.zip: transient HTTP error
3/16/2020 4:59:38 PM | Rosetta@home | Backing off 00:58:42 on download of 9v1nm_gb_c3065_9mer_gb_001680.zip
3/16/2020 4:59:39 PM | | Project communication failed: attempting access to reference site
3/16/2020 4:59:41 PM | | Internet access OK - project servers may be temporarily down.
Got 2.63 KB of the expected 2.93 KB. Is it worthwhile to report this type of download failure? |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Is it worthwhile to report this type of download failure? I tried. https://boinc.bakerlab.org/rosetta/forum_thread.php?id=1000 |
©2024 University of Washington
https://www.bakerlab.org