HTC Jobs and JobBatch

This module deals with Rescale Jobs and Batch Jobs, which are where the actual compute is performed in Rescale. Jobs are returned as HtcJob or HtcJobBatch objects.

class rescalehtc.htcjobs.HtcJob(json: dict, task: HtcTask)

Class for a single Rescale Job.

json

The raw dictionary describing this Job. The dictionary follows the HTCJob schema in the Rescale HTC API documentation. Example contents:

{
    "jobUUID": "155f18d4",
    "providerJobId": "provider-id-12345",
    "region": "AWS_AP_SOUTHEAST_1",
    "taskId": "task-12345",
    "projectId": "project-12345",
    "status": "FAILED",
    "statusReason": "Completed",
    "container": {
        "exitCode": 3,
        "reason": "Container Exited"
    },
    "createdAt": "2023-10-19T08:05:53.730Z",
    "createdBy": "qWoUF",
    "failureCode": "ErrorTimeout",
    "workspaceId": "04-8098234",
    "group": "sample-group",
    "commands": [
        "python",
        "script.py"
    ],
    "envs": [
        {
            "name": "FOO",
            "value": "bar"
        }
    ],
    "jobExecutionEnvironment": {
        "instanceId": "123456789",
        "instanceType": "c7g.medium",
        "architecture": 2
    },
    "tags": [
        {
            "key": "HOME",
            "value": "/home/users/"
        }
    ],
    "architecture": "A100",
    "maxVCpus": 0,
    "maxMemory": 0,
    "maxDiskGiB": 0,
    "maxSwap": 0,
    "imageName": "string",
    "execTimeoutSeconds": 0,
    "updatedAt": "2023-10-19T08:05:53.730Z",
    "instanceLabels": {
        "csp": "string",
        "priority": "string",
        "instanceType": "string",
        "instanceArchitecture": "string",
        "accountId": "string",
        "region": "string"
    },
    "startedAt": "2023-10-19T08:05:53.730Z",
    "completedAt": "2023-10-19T08:05:53.730Z"
}

Access this member variable to extract information about a job.
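As a minimal sketch of reading this dictionary, the helper below picks out a few fields from the example contents above. The helper name summarize_job is hypothetical and not part of rescalehtc. It uses dict.get() throughout because, depending on how the HtcJob was created, some fields may be missing.

```python
def summarize_job(job_json: dict) -> dict:
    """Pick a few commonly inspected fields out of a job dictionary.

    Hypothetical helper, not part of rescalehtc. Missing fields become None.
    """
    # "container" may be absent for sparse jobs, so fall back to an empty dict.
    container = job_json.get("container") or {}
    return {
        "jobUUID": job_json.get("jobUUID"),
        "status": job_json.get("status"),
        "failureCode": job_json.get("failureCode"),
        "exitCode": container.get("exitCode"),
    }


# A trimmed copy of the example contents shown above.
example = {
    "jobUUID": "155f18d4",
    "status": "FAILED",
    "container": {"exitCode": 3, "reason": "Container Exited"},
    "failureCode": "ErrorTimeout",
}
summary = summarize_job(example)
sparse_summary = summarize_job({"jobUUID": "155f18d4"})
```

With a real job object, the call would be summarize_job(job.json).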

If this HtcJob object was created by rescalehtc.htcjobs.HtcJobBatch.to_jobs() (which is common), then not all fields in the schema may be available. Only the fields shared between all jobs and returned by the /htc/projects/{projectId}/tasks/{taskId}/jobs/batch endpoint are present. Calling rescalehtc.htcjobs.HtcJob.get_update() on this object makes all fields from /htc/projects/{projectId}/tasks/{taskId}/jobs/{jobId} available, but this should be done sparingly, especially with a large number of jobs.

get_update(rescale: HtcSession) dict

Update the job status for this job, getting the latest status. Returns the rescalehtc.htcjobs.HtcJob.json dict after updating.

This function has basic flood prevention on the job status requests. The API never updates more often than every 30 seconds anyway, so calling this function more often than that has no effect.

is_still_running(rescale: HtcSession) bool

Update the status of the job, and return True if the job is still running or pending.

The possible values of the status field in rescalehtc.htcjobs.HtcJob.json are:

Values indicating the job is no longer running
["FAILED", "POD_FAILED", "POD_SUCCEEDED", "SUCCEEDED"]

Values indicating the job is still running
["SUBMITTED_TO_RESCALE", "SUBMITTED_TO_PROVIDER", "RUNNABLE", "STARTING", "RUNNING"]

This function calls rescalehtc.htcjobs.HtcJob.get_update() and checks whether the status field is in the categories shown above.

This function has basic flood prevention on the job status requests. The API never updates more often than every 30 seconds anyway, so calling this function more often than that has no effect.
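The status categories and the 30 second update interval described above suggest a simple polling loop. The following is a sketch only: classify_status, wait_until_finished and the _FakeJob stand-in are hypothetical names that are not part of rescalehtc, and the loop assumes nothing about the job object beyond the is_still_running(rescale) method documented here.

```python
import time

# Status values copied from the two lists above.
FINISHED_STATUSES = {"FAILED", "POD_FAILED", "POD_SUCCEEDED", "SUCCEEDED"}
RUNNING_STATUSES = {
    "SUBMITTED_TO_RESCALE", "SUBMITTED_TO_PROVIDER",
    "RUNNABLE", "STARTING", "RUNNING",
}


def classify_status(status: str) -> str:
    """Map a raw status string to 'finished', 'running' or 'unknown'."""
    if status in FINISHED_STATUSES:
        return "finished"
    if status in RUNNING_STATUSES:
        return "running"
    return "unknown"


def wait_until_finished(job, rescale, poll_seconds: float = 30.0,
                        max_polls: int = 120) -> bool:
    """Poll job.is_still_running(rescale) until it returns False.

    Returns True if the job finished, False if max_polls was exhausted.
    """
    for _ in range(max_polls):
        if not job.is_still_running(rescale):
            return True
        time.sleep(poll_seconds)
    return False


class _FakeJob:
    """Hypothetical stand-in for HtcJob, used only to exercise the loop."""

    def __init__(self, running_polls: int):
        self.remaining = running_polls

    def is_still_running(self, rescale) -> bool:
        self.remaining -= 1
        return self.remaining >= 0


finished = wait_until_finished(_FakeJob(2), rescale=None, poll_seconds=0, max_polls=5)
timed_out = not wait_until_finished(_FakeJob(10), rescale=None, poll_seconds=0, max_polls=3)
```

Polling faster than the default of 30 seconds gains nothing, since the status does not change more often than that.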

get_logs(rescale: HtcSession, last_n_lines: int | None = None) Iterable[str]

Get stdout logs for this job. Returns an iterator of strings, one per line.

Parameters:

last_n_lines – Optional: The number of lines from the tail of the log to fetch, or None to get the entire log. Fewer log lines will fetch faster.

The entire log is kept in memory, as the log from Rescale is fetched with the most recent line first, then reversed by this function into the normal reading order (oldest line first). This function may therefore consume a lot of memory.

For a more memory-efficient way of fetching a large log, consider using rescalehtc.htcjobs.HtcJob.get_logs_to_file().

This function has basic flood prevention on the job log requests. The API never updates more often than every 30 seconds anyway, so calling this function more often than that has no effect.
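Because the return value is an iterable of lines, it can be passed straight to ordinary line-oriented code. The sketch below filters lines with a regular expression; find_matching_lines is a hypothetical helper, not part of rescalehtc, and the sample lines are invented for illustration.

```python
import re
from typing import Iterable, List


def find_matching_lines(lines: Iterable[str], pattern: str) -> List[str]:
    """Return the lines that match the given regular expression, in order."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


# Invented sample input standing in for the output of get_logs().
sample_lines = ["starting up", "ERROR: disk full", "retrying", "Traceback (most recent call last):"]
problems = find_matching_lines(sample_lines, r"ERROR|Traceback")
```

With a real job, the input would be job.get_logs(rescale, last_n_lines=200), which keeps both the request and the memory use small.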

get_logs_to_file(rescale: HtcSession, destination_file_path: str, last_n_lines: int | None = None)

Get stdout logs for this job and write them to a file.

Parameters:

  • destination_file_path – The path of the file to write the log to

  • last_n_lines – Optional: The number of lines from the tail of the log to fetch, or None to get the entire log. Fewer log lines will fetch faster.

This function first writes to a temporary file, in the newest-line-first order in which the Rescale API returns logs. The log is then reversed into the more friendly oldest-line-first order and written to the target file.

This is more memory efficient than rescalehtc.htcjobs.HtcJob.get_logs(), as the whole log is not kept in memory during fetching.
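The two-phase approach described above can be illustrated with a generic sketch. This is not the package's implementation: write_reversed_lines is a hypothetical function that spools newest-first lines to a temporary file, remembers only the byte offset of each line, and then copies the lines to the target in reverse.

```python
import os
import tempfile
from typing import Iterable


def write_reversed_lines(newest_first: Iterable[str], destination_file_path: str) -> int:
    """Write lines that arrive newest-first to a file in oldest-first order.

    Only one integer per line is kept in memory, not the line contents.
    Returns the number of lines written.
    """
    offsets = []  # byte offset at which each line starts in the temporary file
    with tempfile.TemporaryFile() as tmp:
        # Phase 1: spool the lines to disk in the order they arrive.
        for line in newest_first:
            offsets.append(tmp.tell())
            tmp.write(line.rstrip("\n").encode("utf-8") + b"\n")
        end = tmp.tell()
        # Phase 2: walk the offsets backwards, copying one line at a time.
        with open(destination_file_path, "wb") as out:
            for start in reversed(offsets):
                tmp.seek(start)
                out.write(tmp.read(end - start))
                end = start
    return len(offsets)


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "job.log")
    written = write_reversed_lines(["third", "second", "first"], target)
    with open(target, encoding="utf-8") as handle:
        restored = handle.read().splitlines()
```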

class rescalehtc.htcjobs.HtcJobBatch(json: dict, task: HtcTask)

Class for a batch of Rescale jobs.

json: dict

The raw dictionary describing a batch run of jobs. The dictionary follows the HTCJobSubmitRequest schema in the Rescale HTC API documentation. Example contents:

[
    {
        "jobName": "Sample job",
        "taskId": "task-12345",
        "projectId": "project-12345",
        "parentJobId": "job-12345",
        "createdBy": "qWoUF",
        "workspaceId": "04-8234074",
        "group": "sample-group",
        "batchSize": 10,
        "regions": [
            "AWS_AP_SOUTHEAST_1"
        ],
        "rescaleTimeReceived": "2023-10-24T20:34:51.279Z",
        "htcJobDefinition": {
            "imageName": "my-image",
            "maxVCpus": 8,
            "maxMemory": 128,
            "maxDiskGiB": 1,
            "maxSwap": 0,
            "tags": {
                "HOME": "foo_bar"
            },
            "commands": [
                "python",
                "script.py"
            ],
            "envs": [
                {
                    "name": "FOO",
                    "value": "bar"
                }
            ],
            "claims": [
                {
                    "name": "string",
                    "value": "string"
                }
            ],
            "execTimeoutSeconds": 300,
            "architecture": "A100",
            "priority": "ON_DEMAND_ECONOMY"
        },
        "tags": [
            {
                "key": "HOME",
                "value": "/home/users/"
            }
        ],
        "jobDefinitionName": "job-definition-321",
        "cloudProvider": "AWS"
    }
]

Access this member variable to extract information about a batch of jobs.

The dictionary contains the information given to the create_*_job* functions in this package.

get_task_summary(rescale: HtcSession) dict

Gets the task summary for the task of this HtcJobBatch.

Note that this includes all jobs in the task, not just this HtcJobBatch.

See documentation in rescalehtc.htctasks.HtcTask.get_task_summary()

is_still_running(rescale: HtcSession) bool

Check whether any jobs in the task for this HtcJobBatch are still running.

Note that this includes all jobs in the task, not just this HtcJobBatch.

See documentation in rescalehtc.htctasks.HtcTask.is_still_running()

to_jobs() list[HtcJob]

Converts a HtcJobBatch to a list of HtcJob objects. The details of each HtcJob are sparse, as the API has not been queried for the status of each individual job.

The following fields are available without calling job.get_update():

group, projectId, taskId, jobUUID

If you want to monitor the status of a set of jobs, it's more efficient to call get_task_summary() on the task instead of polling the individual HtcJobs.

rescalehtc.htcjobs.get_jobs(rescale: HtcSession, task: HtcTask, job_status: str = 'any') list[HtcJob]

Get all jobs within a task that have a particular job status. By default, jobs with any status are returned. Set job_status to a specific status, e.g. SUCCEEDED, to return only jobs with that status.
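A returned list of jobs can be tallied by status with collections.Counter. The sketch below assumes each job's json dictionary carries a status field, which this page does not guarantee for every job. count_by_status is a hypothetical helper, and SimpleNamespace objects stand in for real HtcJob instances.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_status(jobs) -> Counter:
    """Count jobs per status; jobs without a status field count as 'UNKNOWN'."""
    return Counter(job.json.get("status", "UNKNOWN") for job in jobs)


# Stand-ins for HtcJob objects; only the .json attribute is used.
fake_jobs = [
    SimpleNamespace(json={"jobUUID": "a", "status": "SUCCEEDED"}),
    SimpleNamespace(json={"jobUUID": "b", "status": "FAILED"}),
    SimpleNamespace(json={"jobUUID": "c", "status": "FAILED"}),
    SimpleNamespace(json={"jobUUID": "d"}),
]
counts = count_by_status(fake_jobs)
```

With a real session and task, the input would be get_jobs(rescale, task).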

rescalehtc.htcjobs.get_job_with_id(rescale: HtcSession, task: HtcTask, job_id: str) HtcJob

Get a job with a specific ID within a task.

rescalehtc.htcjobs.create_single_job(rescale: HtcSession, task: HtcTask, priority: str, image_name: str, exec_timeout_seconds: int, job_name: str = 'rescalehtc_default_jobname', max_vcpus: int = 1, max_memory_mib: int = 4000, max_swap_mib: int = 0, max_disk_gib: int = 10, job_tags: dict = {}, batch_tags: list = [], commands: list = [], envs: list = [], claims: list = [], architecture: str = 'AARCH64', region: str = None) HtcJob

This function creates a single Rescale job. Inputs are split into Python arguments, with many arguments having default values.

This function returns a HtcJob.

If you need to run several jobs, use the create_job_batch() function instead.

Parameters:
  • priority – Job priority, one of [ON_DEMAND_ECONOMY, ON_DEMAND_PRIORITY]

  • image_name – The container image name in the Rescale container registry to run

  • exec_timeout_seconds – Number of seconds this container can execute for until it is stopped by timeout

  • job_name – Optional: The name of the job batch

  • max_vcpus – Optional: Number of vCPUs allocated to this container

  • max_memory_mib – Optional: Maximum RAM usage for this container, in MiB

  • max_swap_mib – Optional: Maximum Swap usage for this container, in MiB

  • max_disk_gib – Optional: Maximum disk usage for this container, in GiB

  • job_tags – Optional: Tags given to each job in a batch

  • batch_tags – Optional: Tags given to the job batch

  • commands – Optional: The command to run in the container, in list form: ['bash', '-c', 'echo hello world']

  • envs – Optional: A list of environment variables to use in the run, as a list of dicts: [{"name": "MY_ENV_VAR", "value": "value_of_my_env_var"}, ...]

  • claims – Optional: Custom JWT claims that will be attached to the Rescale JWT Bearer Token, as a list of dicts: [{"name": "my_claim_name", "value": "my_claim_value"}, ...]. Note that the name given here is prefixed by userDefined_ in the actual JWT. Use rescalehtc.bearer_token.BearerToken.get_user_claims() to retrieve custom claims easily.

  • architecture – Optional: The architecture to run the container on, one of [AARCH64, A100, X86]

  • region – Optional: The compute region to run the container in. If the Rescale Project only has a single region this value can remain as None, and the available region will be picked.

If you need more control over the job definition than this function allows, then use the create_job_batch_raw() function instead.
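The envs and claims parameters both take a list of name/value dicts, which is tedious to write by hand. The sketch below converts a plain dict into that form. to_name_value_list is a hypothetical helper, not part of rescalehtc, and converting values to strings is an assumption made here rather than a documented requirement.

```python
def to_name_value_list(mapping: dict) -> list:
    """Turn {"NAME": value} into [{"name": "NAME", "value": "value"}, ...].

    Values are converted with str(); that conversion is an assumption.
    """
    return [{"name": str(name), "value": str(value)} for name, value in mapping.items()]


envs = to_name_value_list({"MY_ENV_VAR": "value_of_my_env_var", "THREADS": 4})

# A call would then look roughly like this (requires a real session and task):
# job = create_single_job(
#     rescale, task,
#     priority="ON_DEMAND_ECONOMY",
#     image_name="my-image",
#     exec_timeout_seconds=300,
#     commands=["bash", "-c", "echo hello world"],
#     envs=envs,
# )
```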

rescalehtc.htcjobs.create_job_batch(rescale: HtcSession, task: HtcTask, batch_size: int, priority: str, image_name: str, exec_timeout_seconds: int, job_name: str = 'rescalehtc_default_jobname', max_vcpus: int = 1, max_memory_mib: int = 4000, max_swap_mib: int = 0, max_disk_gib: int = 10, job_tags: dict = {}, batch_tags: list = [], commands: list = [], envs: list = [], claims: list = [], architecture: str = 'AARCH64', region: str = None) HtcJobBatch

This function creates a batch of Rescale jobs. Inputs are split into Python arguments, with many arguments having default values.

Parameters:
  • batch_size – The number of instances of this job to run

  • priority – Job priority, one of [ON_DEMAND_ECONOMY, ON_DEMAND_PRIORITY]

  • image_name – The container image name in the Rescale container registry to run

  • exec_timeout_seconds – Number of seconds this container can execute for until it is stopped by timeout

  • job_name – Optional: The name of the job batch

  • max_vcpus – Optional: Number of vCPUs allocated to this container

  • max_memory_mib – Optional: Maximum RAM usage for this container, in MiB

  • max_swap_mib – Optional: Maximum Swap usage for this container, in MiB

  • max_disk_gib – Optional: Maximum disk usage for this container, in GiB

  • job_tags – Optional: Tags given to each job in a batch

  • batch_tags – Optional: Tags given to the job batch

  • commands – Optional: The command to run in the container, in list form: ['bash', '-c', 'echo hello world']

  • envs – Optional: A list of environment variables to use in the run, as a list of dicts: [{"name": "MY_ENV_VAR", "value": "value_of_my_env_var"}, ...]

  • claims – Optional: Custom JWT claims that will be attached to the Rescale JWT Bearer Token, as a list of dicts: [{"name": "my_claim_name", "value": "my_claim_value"}, ...]. Note that the name given here is prefixed by userDefined_ in the actual JWT. Use rescalehtc.bearer_token.BearerToken.get_user_claims() to retrieve custom claims easily.

  • architecture – Optional: The architecture to run the container on, one of [AARCH64, A100, X86]

  • region – Optional: The compute region to run the container in. If the Rescale Project only has a single region this value can remain as None, and the available region will be picked.

If you need more control over the job definition than this function allows, then use the create_job_batch_raw() function instead.
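Since priority and architecture accept only the values listed above, it can help to check them locally before submitting a batch. The following is a sketch under assumptions: check_batch_arguments is a hypothetical helper, not part of rescalehtc, and the rule that batch_size must be at least 1 is a guess rather than documented behaviour.

```python
# Allowed values copied from the parameter list above.
VALID_PRIORITIES = ("ON_DEMAND_ECONOMY", "ON_DEMAND_PRIORITY")
VALID_ARCHITECTURES = ("AARCH64", "A100", "X86")


def check_batch_arguments(batch_size: int, priority: str,
                          architecture: str = "AARCH64") -> dict:
    """Validate a few arguments and return them as keyword arguments."""
    if batch_size < 1:  # assumed lower bound, not documented
        raise ValueError(f"batch_size must be at least 1, got {batch_size}")
    if priority not in VALID_PRIORITIES:
        raise ValueError(f"priority must be one of {VALID_PRIORITIES}, got {priority!r}")
    if architecture not in VALID_ARCHITECTURES:
        raise ValueError(f"architecture must be one of {VALID_ARCHITECTURES}, got {architecture!r}")
    return {"batch_size": batch_size, "priority": priority, "architecture": architecture}


kwargs = check_batch_arguments(10, "ON_DEMAND_ECONOMY")

try:
    check_batch_arguments(10, "URGENT")
    rejected = False
except ValueError:
    rejected = True

# The checked values could then be passed on (requires a real session and task):
# batch = create_job_batch(rescale, task, image_name="my-image",
#                          exec_timeout_seconds=300, **kwargs)
```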

rescalehtc.htcjobs.create_job_batch_raw(rescale: HtcSession, task: HtcTask, payload: list[dict]) HtcJobBatch

This function creates a batch of Rescale jobs.

Parameters:

payload – The full JSON payload (a list of dicts, per the signature) expected by the POST /htc/projects/{projectId}/tasks/{taskId}/jobs/batch endpoint

Most users should use create_job_batch() instead of this raw function. That function accepts separate arguments for the individual fields instead of the full JSON dict.
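As a rough sketch of what a raw payload might look like, the builder below assembles a list containing one dict. The field names are copied from the HtcJobBatch.json example contents above; which fields the endpoint actually requires is not stated on this page, so treat the selection as an assumption and consult the Rescale HTC API documentation. build_raw_payload is a hypothetical helper, not part of rescalehtc.

```python
import json


def build_raw_payload(image_name: str, commands: list, batch_size: int,
                      exec_timeout_seconds: int,
                      priority: str = "ON_DEMAND_ECONOMY") -> list:
    """Assemble a minimal batch payload as a list with one job request dict."""
    return [
        {
            "batchSize": batch_size,
            "htcJobDefinition": {
                "imageName": image_name,
                "commands": list(commands),
                "execTimeoutSeconds": exec_timeout_seconds,
                "priority": priority,
            },
        }
    ]


payload = build_raw_payload("my-image", ["python", "script.py"], 10, 300)
payload_text = json.dumps(payload)  # confirms the payload is JSON-serialisable

# Submitting it would look roughly like this (requires a real session and task):
# batch = create_job_batch_raw(rescale, task, payload)
```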