Module argmin::core::checkpointing

Checkpointing

Checkpointing mitigates the effects of crashes when software runs in an unstable environment, particularly during long runs. Checkpoints are saved at a user-chosen frequency, and an optimization can be resumed from a saved checkpoint after a crash.

For saving checkpoints to disk, FileCheckpoint is provided in the argmin-checkpointing-file crate. Other checkpointing approaches can be implemented via the Checkpoint trait.
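To illustrate what an alternative checkpointing approach might look like, here is a minimal sketch of an in-memory checkpoint. The trait SimpleCheckpoint and the type InMemoryCheckpoint are hypothetical stand-ins written for this illustration; they do not reproduce the actual signature of argmin's Checkpoint trait, which should be taken from its own documentation.

```rust
use std::cell::RefCell;

/// Simplified, hypothetical stand-in for a checkpointing trait.
/// This is NOT argmin's `Checkpoint` trait; it only shows the save/load idea.
trait SimpleCheckpoint<S> {
    /// Store a snapshot of the current state.
    fn save(&self, state: &S);
    /// Return the most recently stored snapshot, if any.
    fn load(&self) -> Option<S>;
}

/// Keeps the latest snapshot in memory instead of on disk.
struct InMemoryCheckpoint<S> {
    slot: RefCell<Option<S>>,
}

impl<S> InMemoryCheckpoint<S> {
    fn new() -> Self {
        InMemoryCheckpoint {
            slot: RefCell::new(None),
        }
    }
}

impl<S: Clone> SimpleCheckpoint<S> for InMemoryCheckpoint<S> {
    fn save(&self, state: &S) {
        // Overwrite any earlier snapshot with a copy of the new state.
        *self.slot.borrow_mut() = Some(state.clone());
    }

    fn load(&self) -> Option<S> {
        self.slot.borrow().clone()
    }
}
```

An in-memory snapshot does not survive a process crash, so a variant like this would mainly be useful in tests; a backend that writes to a database or remote storage would follow the same save/load shape.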

The CheckpointingFrequency defines how often checkpoints are saved and can be chosen to be either Always (every iteration), Every(u64) (every Nth iteration) or Never.
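The effect of the three variants can be sketched with a small stand-alone model. The enum Frequency and the functions should_save and save_points below are hypothetical names used only for this illustration. The sketch assumes that Every(n) triggers whenever the iteration number is a multiple of n; the exact condition used by argmin may differ.

```rust
/// Local stand-in mirroring the three variants described above
/// (hypothetical; not the argmin type itself).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Frequency {
    Always,
    Every(u64),
    Never,
}

impl Frequency {
    /// Returns true if a checkpoint would be written at iteration `iter`.
    /// Assumption: `Every(n)` fires when `iter` is a multiple of `n`;
    /// `Every(0)` is treated as never firing to avoid a division by zero.
    fn should_save(self, iter: u64) -> bool {
        match self {
            Frequency::Always => true,
            Frequency::Every(n) => n != 0 && iter % n == 0,
            Frequency::Never => false,
        }
    }
}

/// Lists the iterations in `1..=max_iters` at which a checkpoint would be saved.
fn save_points(freq: Frequency, max_iters: u64) -> Vec<u64> {
    (1..=max_iters).filter(|&i| freq.should_save(i)).collect()
}
```

Under these assumptions, save_points(Frequency::Every(20), 50) yields the iterations 20 and 40, so a crash at iteration 45 would lose at most the work done since iteration 40.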

The following example shows how the checkpointing method of Executor is used to activate checkpointing. If no checkpoint is available on disk, the optimization starts from scratch. If a run crashes and a checkpoint is found on disk, the next run resumes from that checkpoint.

Example

use argmin::core::checkpointing::CheckpointingFrequency;
use argmin::core::Executor;
use argmin_checkpointing_file::FileCheckpoint;

// [...]

let checkpoint = FileCheckpoint::new(
    ".checkpoints",
    "optim",
    CheckpointingFrequency::Every(20)
);

let res = Executor::new(my_optimization_problem, solver)
    .configure(|config| config.param(init_param).max_iters(iters))
    .checkpointing(checkpoint)
    .run()?;

// [...]
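The resume-or-start behavior described above can be illustrated with a toy loop that uses only the standard library. Everything here is an assumption made for the sketch: the function names, the crash_at parameter that simulates a crash, and the plain-text file format. FileCheckpoint stores its data differently, and this is not how argmin implements resuming.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Parses a checkpoint of the form "<iter> <x>" (format invented for this sketch).
fn parse_checkpoint(text: &str) -> Option<(u64, f64)> {
    let mut parts = text.split_whitespace();
    let iter = parts.next()?.parse().ok()?;
    let x = parts.next()?.parse().ok()?;
    Some((iter, x))
}

/// Toy "optimization" that halves `x` once per iteration.
/// Resumes from `path` if a checkpoint exists, otherwise starts from scratch.
/// `crash_at` (hypothetical) aborts the run once that iteration count is reached.
fn run_with_checkpoint(
    path: &Path,
    every: u64,
    max_iters: u64,
    crash_at: Option<u64>,
) -> io::Result<(u64, f64)> {
    let (mut iter, mut x) = match fs::read_to_string(path) {
        // Checkpoint found: continue from the stored iteration and value.
        Ok(text) => parse_checkpoint(&text)
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "bad checkpoint"))?,
        // No checkpoint on disk: start from the initial state.
        Err(e) if e.kind() == io::ErrorKind::NotFound => (0, 1024.0),
        Err(e) => return Err(e),
    };

    while iter < max_iters {
        if crash_at == Some(iter) {
            return Err(io::Error::new(io::ErrorKind::Other, "simulated crash"));
        }
        x /= 2.0;
        iter += 1;
        // Save every `every`-th iteration; 0 disables saving.
        if every != 0 && iter % every == 0 {
            fs::write(path, format!("{} {}", iter, x))?;
        }
    }
    Ok((iter, x))
}
```

Calling run_with_checkpoint twice with the same path, first with a simulated crash and then without, lets the second call pick up at the last saved iteration instead of at iteration 0.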

Enums

Traits