Questions and Answers : Windows : No Checkpoints for recent jobs.
Author | Message |
---|---|
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
Over the last couple of weeks I've noticed several types of jobs I haven't seen before (some beginning with "hyb" or "hybred," "cyto," etc.). These jobs are not setting checkpoints, even after crunching for up to 11 hours or so (with checkpointing limited to at most once every 60 seconds in my computing preferences). The jobs starting with "rb_5_17" and other "dates" continue to checkpoint as usual. The problem, as noted in another recent thread, is that if I must shut down or reboot my system (for Windows updates, application updates, etc.), or if I must close BOINC, I lose all the work in these "new" types of jobs without checkpoints. Were these new jobs set up this way, or was this an oversight? I had already lost a number of hours/days of crunching to this issue before I got an idea of what was happening. Is there a reasonable workaround for this problem? Thanks. |
vakobo Joined: 3 Aug 08 Posts: 18 Credit: 13,910,889 RAC: 857 |
I now have the task rb_05_14_38371_73012__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_81049_953_0 running without checkpoints. It is at 71%, but I need to shut down my PC, and next time it will start from 0% again. |
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
"I now have the task rb_05_14_38371_73012__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_81049_953_0 running without checkpoints. It is at 71%, but I need to shut down my PC, and next time it will start from 0% again." The only "remedy" I've found involves my old Windows 98 rig, which crunches SETI one job at a time. I've just backed up the BOINC folder, including the slot with the particular job in question. However, if you're crunching a large number of jobs, this would be a huge backup file. Hopefully whoever wrote this particular type of job will realize there's a bug and work on remedying the situation, since obviously not all instances are without checkpoints. Good luck! |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
When a task has not checkpointed, the slots folder doesn't contain the data required to preserve the uncheckpointed work. So you still lose the work, just as you do when you turn off the computer without doing the backup described. Some of the types of tasks run recently have long-running models. The application will always checkpoint at the end of a model. Some types of tasks checkpoint more frequently. When new protocols are being developed, it is unclear whether the protocol will be used extensively going forward. It is also unclear whether further refinements can be made to eliminate the long-running models or reduce their frequency. There isn't much you can change on your machine to improve the situation. Each time a task starts, a check is made to see if this has previously been a starting point. It would be normal to see this once in a while, as people reboot to install updates to their machines, etc. But if it occurs 5 times on the same task, then the task is cut off and marked as completed. This ensures that such tasks, which are not progressing on your machine, can report back the work they have completed and free up a space for another task, which may have properties that better match your machine and its uptime, etc. Rosetta Moderator: Mod.Sense |
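
The restart check described in the moderator's reply can be sketched roughly as follows. This is a hypothetical illustration, not the project's actual code: the `Task` class, its field names, and the reading of "5 times" as five starts from the same checkpoint are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Threshold quoted in the thread; how the real application counts is an assumption.
MAX_STARTS_FROM_SAME_POINT = 5


@dataclass
class Task:
    """Hypothetical model of a task that only checkpoints when a model finishes."""
    name: str
    last_checkpoint: int = 0                           # models completed and saved
    starts_seen: dict = field(default_factory=dict)    # checkpoint -> start count
    cut_off: bool = False

    def start(self) -> bool:
        """Record a start; return False once the task should be ended early."""
        count = self.starts_seen.get(self.last_checkpoint, 0) + 1
        self.starts_seen[self.last_checkpoint] = count
        if count >= MAX_STARTS_FROM_SAME_POINT:
            # No progress past this point after repeated starts: stop the task
            # so it can report what it has and free the slot.
            self.cut_off = True
        return not self.cut_off

    def finish_model(self) -> None:
        """A model completed, so a checkpoint is written and the start point moves."""
        self.last_checkpoint += 1


# A task that is interrupted before its first model ever finishes.
stuck = Task("stuck_task")
stuck_results = [stuck.start() for _ in range(5)]

# A task that finishes one model between restarts keeps its counter fresh.
progressing = Task("progressing_task")
progressing.start()
progressing.finish_model()
progressing_results = [progressing.start() for _ in range(4)]
```

In this sketch the stuck task is cut off on its fifth start from the same point, while the task that completed a model in between is allowed to continue, because its starting point has moved.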
©2024 University of Washington
https://www.bakerlab.org