HTC Jobs and JobBatch
This module deals with Rescale Jobs and Batch Jobs, which are where the actual compute is performed in Rescale. Jobs are returned as HtcJob or HtcJobBatch objects.
- class rescalehtc.htcjobs.HtcJob(json: dict, task: HtcTask)
Class for a single Rescale Job.
- json
The raw dictionary describing this Job. The dictionary follows the HTCJob schema in the Rescale HTC API documentation. Example contents:
{ "jobUUID": "155f18d4", "providerJobId": "provider-id-12345", "region": "AWS_AP_SOUTHEAST_1", "taskId": "task-12345", "projectId": "project-12345", "status": "FAILED", "statusReason": "Completed", "container": { "exitCode": 3, "reason": "Container Exited" }, "createdAt": "2023-10-19T08:05:53.730Z", "createdBy": "qWoUF", "failureCode": "ErrorTimeout", "workspaceId": "04-8098234", "group": "sample-group", "commands": [ "python", "script.py" ], "envs": [ { "name": "FOO", "value": "bar" } ], "jobExecutionEnvironment": { "instanceId": "123456789", "instanceType": "c7g.medium", "architecture": 2 }, "tags": [ { "key": "HOME", "value": "/home/users/" } ], "architecture": "A100", "maxVCpus": 0, "maxMemory": 0, "maxDiskGiB": 0, "maxSwap": 0, "imageName": "string", "execTimeoutSeconds": 0, "updatedAt": "2023-10-19T08:05:53.730Z", "instanceLabels": { "csp": "string", "priority": "string", "instanceType": "string", "instanceArchitecture": "string", "accountId": "string", "region": "string" }, "startedAt": "2023-10-19T08:05:53.730Z", "completedAt": "2023-10-19T08:05:53.730Z" }
Access this member variable to extract information about a job.
If this HtcJob object was created from
rescalehtc.htcjobs.HtcJobBatch.to_jobs() (which is common), not all fields in the schema may be available. Only the fields that are shared between all jobs and returned by the /htc/projects/{projectId}/tasks/{taskId}/jobs/batch endpoint are available. Running rescalehtc.htcjobs.HtcJob.get_update() on this object makes all fields from /htc/projects/{projectId}/tasks/{taskId}/jobs/{jobId} available, but this should be done very rarely, especially with a large number of jobs.
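Because some fields may be missing on a sparse HtcJob, it can help to read the dictionary defensively. The following is a minimal sketch that works on a plain dictionary shaped like the example above; the helper name summarize_job and the choice of fields are illustrative assumptions, not part of rescalehtc.

```python
def summarize_job(job_json: dict) -> dict:
    """Pick a few fields out of a job dictionary, tolerating missing keys."""
    # "container" may be absent on sparse jobs, so fall back to an empty dict.
    container = job_json.get("container") or {}
    return {
        "job_uuid": job_json.get("jobUUID"),
        "status": job_json.get("status"),
        "exit_code": container.get("exitCode"),
        "failure_code": job_json.get("failureCode"),
    }


# A sparse dictionary, with only the fields listed for HtcJobBatch.to_jobs().
sparse_json = {
    "jobUUID": "155f18d4",
    "taskId": "task-12345",
    "projectId": "project-12345",
    "group": "sample-group",
}

# A fuller dictionary, using values from the example contents above.
full_json = dict(
    sparse_json,
    status="FAILED",
    failureCode="ErrorTimeout",
    container={"exitCode": 3, "reason": "Container Exited"},
)

sparse_summary = summarize_job(sparse_json)
full_summary = summarize_job(full_json)
```

With a real job object, the same helper would be called as summarize_job(job.json).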
- get_update(rescale: HtcSession) dict
Update the job status for this job, getting the latest status. Returns the
rescalehtc.htcjobs.HtcJob.json dict after updating.
This function has basic flood prevention on the job status requests. The API never updates more often than every 30 seconds anyway, so calling this function more often than that has no effect.
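To illustrate what flood prevention of this kind can look like, here is a minimal sketch of a time-based throttle that reuses a cached result until an interval has passed. This is not the library's actual implementation; the class ThrottledFetcher, its parameters, and the injectable clock are all assumptions made for the example.

```python
import time
from typing import Callable, Optional


class ThrottledFetcher:
    """Call fetch() at most once per min_interval_s, else return the cache."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        min_interval_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._min_interval_s = min_interval_s
        self._clock = clock  # injectable so the behaviour can be tested
        self._last_fetch_time: Optional[float] = None
        self._cached: Optional[dict] = None

    def get(self) -> dict:
        now = self._clock()
        too_soon = (
            self._last_fetch_time is not None
            and now - self._last_fetch_time < self._min_interval_s
        )
        if not too_soon:
            # Enough time has passed (or this is the first call): refresh.
            self._cached = self._fetch()
            self._last_fetch_time = now
        return self._cached
```

Under this scheme, a caller polling every few seconds would trigger only one real request per 30-second window, which matches the behaviour described above.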
- is_still_running(rescale: HtcSession) bool
Update the status of the job, and return True if the job is still running or pending.
The possible values of the status field in
rescalehtc.htcjobs.HtcJob.json are:
Values indicating the job is no longer running: ["FAILED", "POD_FAILED", "POD_SUCCEEDED", "SUCCEEDED"]
Values indicating the job is still running: ["SUBMITTED_TO_RESCALE", "SUBMITTED_TO_PROVIDER", "RUNNABLE", "STARTING", "RUNNING"]
This function calls
rescalehtc.htcjobs.HtcJob.get_update() and checks which of the categories shown above the status field belongs to.
This function has basic flood prevention on the job status requests. The API never updates more often than every 30 seconds anyway, so calling this function more often than that has no effect.
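The status check described above can be sketched in plain Python against an already fetched job dictionary. The two sets use the status values listed above; the function name status_is_running and the choice to raise on an unrecognised status are assumptions for this example, not documented library behaviour.

```python
FINISHED_STATUSES = frozenset(
    {"FAILED", "POD_FAILED", "POD_SUCCEEDED", "SUCCEEDED"}
)
RUNNING_STATUSES = frozenset(
    {
        "SUBMITTED_TO_RESCALE",
        "SUBMITTED_TO_PROVIDER",
        "RUNNABLE",
        "STARTING",
        "RUNNING",
    }
)


def status_is_running(job_json: dict) -> bool:
    """Return True if the job's status is in the running/pending category."""
    status = job_json.get("status")
    if status in RUNNING_STATUSES:
        return True
    if status in FINISHED_STATUSES:
        return False
    # Assumption: treat anything else as an error rather than guessing.
    raise ValueError(f"Unrecognised job status: {status!r}")
```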
- get_logs(rescale: HtcSession, last_n_lines: int | None = None) Iterable[str]
Get stdout logs for this job. Returns an iterator of strings, one per line.
- Parameters:
last_n_lines – Optional: The number of lines from the tail of the log to fetch, or None to get the entire log. Fewer log lines will fetch faster.
The entire log is kept in memory, as the log from Rescale is fetched with the most recent line first, then reversed by this function into the normal reading order (oldest line first). This function may therefore consume a lot of memory.
For a more memory-efficient way of fetching a large log, consider using
rescalehtc.htcjobs.HtcJob.get_logs_to_file().
This function has basic flood prevention on the job log requests. The API never updates more often than every 30 seconds anyway, so calling this function more often than that has no effect.
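Since get_logs() returns an iterable of strings, the result can be passed to ordinary Python code. The sketch below filters log lines for a search term; the helper find_lines is hypothetical and operates on any iterable of strings, so it is shown here with a stand-in list rather than a live job.

```python
from typing import Iterable, List


def find_lines(lines: Iterable[str], needle: str) -> List[str]:
    """Return the lines containing needle (case-insensitive), in order."""
    needle_lower = needle.lower()
    return [
        line.rstrip("\n") for line in lines if needle_lower in line.lower()
    ]


# Stand-in for the lines a job might produce, oldest line first.
sample_log = [
    "starting script.py\n",
    "loading input\n",
    "ERROR: input file missing\n",
    "exiting with code 3\n",
]
error_lines = find_lines(sample_log, "error")
```

With a real job, a call along the lines of find_lines(job.get_logs(rescale, last_n_lines=200), "error") would limit the fetch to the tail of the log, which the parameter description above notes is faster.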
- get_logs_to_file(rescale: HtcSession, destination_file_path: str, last_n_lines: int | None = None)
Get stdout logs for this job and write them to a file.
- Parameters:
last_n_lines – Optional: The number of lines from the tail of the log to fetch, or None to get the entire log. Fewer log lines will fetch faster.
This function writes to a temporary file first, in the newest-line-first order in which the Rescale API returns logs. The log is then reversed into the friendlier oldest-line-first order and written to the target file.
This is more memory efficient than
rescalehtc.htcjobs.HtcJob.get_logs(), as the whole log is not kept in memory during fetching.
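One way to reverse a newest-line-first file without holding its contents in memory is to record where each line starts and then copy the lines in reverse order. The sketch below shows that technique; it is an illustration only and is not claimed to be how get_logs_to_file() is implemented. The function name reverse_lines_file is an assumption.

```python
def reverse_lines_file(src_path: str, dst_path: str) -> None:
    """Write the lines of src_path to dst_path in reverse order."""
    offsets = []  # (start offset, length in bytes) of each line
    with open(src_path, "rb") as src:
        position = 0
        for line in src:
            offsets.append((position, len(line)))
            position += len(line)

        with open(dst_path, "wb") as dst:
            # Only the offsets are kept in memory, not the line contents.
            for start, length in reversed(offsets):
                src.seek(start)
                chunk = src.read(length)
                if not chunk.endswith(b"\n"):
                    # The last source line may lack a trailing newline.
                    chunk += b"\n"
                dst.write(chunk)
```

The list of offsets still grows with the number of lines, but it is far smaller than the log text itself.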
- class rescalehtc.htcjobs.HtcJobBatch(json: dict, task: HtcTask)
Class for a batch of Rescale jobs.
- json: dict
The raw dictionary describing a batch run of jobs. The dictionary follows the HTCJobSubmitRequest schema in the Rescale HTC API documentation. Example contents:
[ { "jobName": "Sample job", "taskId": "task-12345", "projectId": "project-12345", "parentJobId": "job-12345", "createdBy": "qWoUF", "workspaceId": "04-8234074", "group": "sample-group", "batchSize": 10, "regions": [ "AWS_AP_SOUTHEAST_1" ], "rescaleTimeReceived": "2023-10-24T20:34:51.279Z", "htcJobDefinition": { "imageName": "my-image", "maxVCpus": 8, "maxMemory": 128, "maxDiskGiB": 1, "maxSwap": 0, "tags": { "HOME": "foo_bar" }, "commands": [ "python", "script.py" ], "envs": [ { "name": "FOO", "value": "bar" } ], "claims": [ { "name": "string", "value": "string" } ], "execTimeoutSeconds": 300, "architecture": "A100", "priority": "ON_DEMAND_ECONOMY" }, "tags": [ { "key": "HOME", "value": "/home/users/" } ], "jobDefinitionName": "job-definition-321", "cloudProvider": "AWS" } ]
Access this member variable to extract information about a batch of jobs.
The dictionary contains the information given to the create_*_job* functions in this package.
- get_task_summary(rescale: HtcSession) dict
Gets the task summary for the task of this HtcJobBatch.
Note that this includes all jobs in the task, not just this HtcJobBatch.
See documentation in
rescalehtc.htctasks.HtcTask.get_task_summary()
- is_still_running(rescale: HtcSession) bool
Check whether any jobs in the task for this HtcJobBatch are still running.
Note that this includes all jobs in the task, not just this HtcJobBatch.
See documentation in
rescalehtc.htctasks.HtcTask.is_still_running()
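A common use of is_still_running() is a polling loop that waits for the work to finish. The following is a minimal sketch, assuming only the is_still_running(rescale) method documented above; the function wait_until_done, its timeout handling, and the injectable sleep and clock are assumptions for the example. Note that, as stated above, the check covers all jobs in the task.

```python
import time
from typing import Callable, Optional


def wait_until_done(
    batch,
    rescale,
    poll_interval_s: float = 30.0,
    timeout_s: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until nothing is running. Returns False if the timeout expires."""
    deadline = None if timeout_s is None else clock() + timeout_s
    while batch.is_still_running(rescale):
        if deadline is not None and clock() >= deadline:
            return False
        # Polling faster than the API refreshes would not give newer data.
        sleep(poll_interval_s)
    return True
```

The default interval of 30 seconds is chosen to match the update rate mentioned for the job status requests earlier in this module.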
- to_jobs() list[HtcJob]
Converts a HtcJobBatch to a list of HtcJob objects. The details of each HtcJob are a bit sparse, as the API has not been queried for the status of each individual job.
The following fields are available without calling job.get_update():
group, projectId, taskId, jobUUID
If you want to monitor the status of a set of jobs, it's more efficient to call
get_task_summary() on the task instead of polling the individual HtcJobs.
- rescalehtc.htcjobs.get_jobs(rescale: HtcSession, task: HtcTask, job_status: str = 'any') list[HtcJob]
Get all jobs within a task that have a particular job status. By default, jobs with any status are returned. Set job_status to a specific value, e.g. SUCCEEDED, to filter on that status.
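Once a list of jobs has been retrieved, their json dictionaries can be aggregated locally. The sketch below counts jobs per status; count_by_status is a hypothetical helper, and the stand-in objects only mimic the json attribute of HtcJob so that the example runs without a Rescale session.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Iterable


def count_by_status(jobs: Iterable) -> Counter:
    """Count jobs per value of the 'status' field in each job's json dict."""
    # Jobs without a status (e.g. sparse jobs) are counted as "UNKNOWN".
    return Counter(job.json.get("status", "UNKNOWN") for job in jobs)


# Stand-ins for HtcJob objects: only the .json attribute is mimicked.
fake_jobs = [
    SimpleNamespace(json={"jobUUID": "a", "status": "SUCCEEDED"}),
    SimpleNamespace(json={"jobUUID": "b", "status": "FAILED"}),
    SimpleNamespace(json={"jobUUID": "c", "status": "SUCCEEDED"}),
    SimpleNamespace(json={"jobUUID": "d"}),
]
counts = count_by_status(fake_jobs)
```

With real data, the input would be the list returned by get_jobs(rescale, task).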
- rescalehtc.htcjobs.get_job_with_id(rescale: HtcSession, task: HtcTask, job_id: str) HtcJob
Get a job with a specific ID within a task.
- rescalehtc.htcjobs.create_single_job(rescale: HtcSession, task: HtcTask, priority: str, image_name: str, exec_timeout_seconds: int, job_name: str = 'rescalehtc_default_jobname', max_vcpus: int = 1, max_memory_mib: int = 4000, max_swap_mib: int = 0, max_disk_gib: int = 10, job_tags: dict = {}, batch_tags: list = [], commands: list = [], envs: list = [], claims: list = [], architecture: str = 'AARCH64', region: str = None) HtcJob
This function creates a single Rescale job. Inputs are split into Python arguments, with many arguments having default values.
This function returns a HtcJob.
If you need to run several jobs, use the
create_job_batch() function instead.
- Parameters:
priority – Job priority, one of [ON_DEMAND_ECONOMY, ON_DEMAND_PRIORITY]
image_name – The container image name in the Rescale container registry to run
exec_timeout_seconds – Number of seconds this container can execute for until it is stopped by timeout
job_name – Optional: The name of the job batch
max_vcpus – Optional: Number of vCPUs allocated to this container
max_memory_mib – Optional: Maximum RAM usage for this container, in MiB
max_swap_mib – Optional: Maximum Swap usage for this container, in MiB
max_disk_gib – Optional: Maximum disk usage for this container, in GiB
job_tags – Optional: Tags given to each job in a batch
batch_tags – Optional: Tags given to the job batch
commands – Optional: The command to run in the container, in list form: ['bash', '-c', 'echo hello world']
envs – Optional: A list of environment variables to use in the run, as a list of dicts: [{"name": "MY_ENV_VAR", "value": "value_of_my_env_var"}, ...]
claims – Optional: Custom JWT claims that will be attached to the Rescale JWT Bearer Token, as a list of dicts: [{"name": "my_claim_name", "value": "my_claim_value"}, ...]. Note that the name given here is prefixed by userDefined_ in the actual JWT. Use
rescalehtc.bearer_token.BearerToken.get_user_claims() to retrieve custom claims easily.
architecture – Optional: The architecture to run the container on, one of [AARCH64, A100, X86]
region – Optional: The compute region to run the container in. If the Rescale Project only has a single region this value can remain as None, and the available region will be picked.
If you need more control over the job definition than this function allows, use the
create_job_batch_raw() function instead.
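The keyword arguments for create_single_job() can be assembled and checked before the call is made. The sketch below validates priority and architecture against the values listed in the parameters above; the helper single_job_kwargs is hypothetical, and the actual call is left as a comment because it needs a live HtcSession and HtcTask.

```python
VALID_PRIORITIES = ("ON_DEMAND_ECONOMY", "ON_DEMAND_PRIORITY")
VALID_ARCHITECTURES = ("AARCH64", "A100", "X86")


def single_job_kwargs(
    image_name: str,
    exec_timeout_seconds: int,
    priority: str = "ON_DEMAND_ECONOMY",
    architecture: str = "AARCH64",
    **extra,
) -> dict:
    """Build keyword arguments for create_single_job, with basic checks."""
    if priority not in VALID_PRIORITIES:
        raise ValueError(f"priority must be one of {VALID_PRIORITIES}")
    if architecture not in VALID_ARCHITECTURES:
        raise ValueError(f"architecture must be one of {VALID_ARCHITECTURES}")
    kwargs = {
        "priority": priority,
        "image_name": image_name,
        "exec_timeout_seconds": exec_timeout_seconds,
        "architecture": architecture,
    }
    # Any other documented optional argument (commands, envs, ...) passes through.
    kwargs.update(extra)
    return kwargs


kwargs = single_job_kwargs(
    "my-image",
    300,
    commands=["bash", "-c", "echo hello world"],
    envs=[{"name": "MY_ENV_VAR", "value": "value_of_my_env_var"}],
)
# With a session and task available, the call would then be:
# job = rescalehtc.htcjobs.create_single_job(rescale, task, **kwargs)
```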
- rescalehtc.htcjobs.create_job_batch(rescale: HtcSession, task: HtcTask, batch_size: int, priority: str, image_name: str, exec_timeout_seconds: int, job_name: str = 'rescalehtc_default_jobname', max_vcpus: int = 1, max_memory_mib: int = 4000, max_swap_mib: int = 0, max_disk_gib: int = 10, job_tags: dict = {}, batch_tags: list = [], commands: list = [], envs: list = [], claims: list = [], architecture: str = 'AARCH64', region: str = None) HtcJobBatch
This function creates a batch of Rescale jobs. Inputs are split into Python arguments, with many arguments having default values.
- Parameters:
batch_size – The number of instances of this job to run
priority – Job priority, one of [ON_DEMAND_ECONOMY, ON_DEMAND_PRIORITY]
image_name – The container image name in the Rescale container registry to run
exec_timeout_seconds – Number of seconds this container can execute for until it is stopped by timeout
job_name – Optional: The name of the job batch
max_vcpus – Optional: Number of vCPUs allocated to this container
max_memory_mib – Optional: Maximum RAM usage for this container, in MiB
max_swap_mib – Optional: Maximum Swap usage for this container, in MiB
max_disk_gib – Optional: Maximum disk usage for this container, in GiB
job_tags – Optional: Tags given to each job in a batch
batch_tags – Optional: Tags given to the job batch
commands – Optional: The command to run in the container, in list form: ['bash', '-c', 'echo hello world']
envs – Optional: A list of environment variables to use in the run, as a list of dicts: [{"name": "MY_ENV_VAR", "value": "value_of_my_env_var"}, ...]
claims – Optional: Custom JWT claims that will be attached to the Rescale JWT Bearer Token, as a list of dicts: [{"name": "my_claim_name", "value": "my_claim_value"}, ...]. Note that the name given here is prefixed by userDefined_ in the actual JWT. Use
rescalehtc.bearer_token.BearerToken.get_user_claims() to retrieve custom claims easily.
architecture – Optional: The architecture to run the container on, one of [AARCH64, A100, X86]
region – Optional: The compute region to run the container in. If the Rescale Project only has a single region this value can remain as None, and the available region will be picked.
If you need more control over the job definition than this function allows, use the
create_job_batch_raw() function instead.
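Both envs and claims take a list of name/value dicts, which is easy to build from an ordinary mapping. The sketch below shows one way to do that, together with the claim name that would be expected in the JWT given the userDefined_ prefix described above. The helper names to_name_value_list and jwt_claim_name are assumptions for the example.

```python
from typing import Dict, List, Mapping


def to_name_value_list(mapping: Mapping[str, object]) -> List[Dict[str, str]]:
    """Convert {"FOO": "bar"} into [{"name": "FOO", "value": "bar"}]."""
    return [
        {"name": str(name), "value": str(value)}
        for name, value in mapping.items()
    ]


def jwt_claim_name(name: str) -> str:
    """Name under which a custom claim is expected to appear in the JWT."""
    return "userDefined_" + name


envs = to_name_value_list({"MY_ENV_VAR": "value_of_my_env_var", "RETRIES": 3})
claims = to_name_value_list({"my_claim_name": "my_claim_value"})
expected_claim = jwt_claim_name(claims[0]["name"])
```

The resulting lists can be passed as the envs and claims arguments of create_job_batch() or create_single_job().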
- rescalehtc.htcjobs.create_job_batch_raw(rescale: HtcSession, task: HtcTask, payload: list[dict]) HtcJobBatch
This function creates a batch of Rescale jobs.
- Parameters:
payload – The full JSON dict expected by the POST /htc/projects/{projectId}/tasks/{taskId}/jobs/batch endpoint
Most users should use
create_job_batch() instead of this raw function. That function accepts separate arguments for the individual fields instead of the full JSON dict.
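For cases where the raw function is needed, the payload can be built as a list of dicts that follows the HTCJobSubmitRequest example shown under HtcJobBatch.json. The sketch below uses only field names and values that appear in that example; which fields the endpoint actually requires is not stated here, so the selection is an assumption, as is the helper name build_raw_payload.

```python
import json
from typing import List, Optional, Sequence


def build_raw_payload(
    job_name: str,
    batch_size: int,
    job_definition: dict,
    regions: Optional[Sequence[str]] = None,
) -> List[dict]:
    """Build a one-element payload list shaped like the documented example."""
    request = {
        "jobName": job_name,
        "batchSize": batch_size,
        # Copy so later changes to the caller's dict do not leak in.
        "htcJobDefinition": dict(job_definition),
    }
    if regions:
        request["regions"] = list(regions)
    return [request]


# Values taken from the example contents of HtcJobBatch.json.
definition = {
    "imageName": "my-image",
    "maxVCpus": 8,
    "maxMemory": 128,
    "maxDiskGiB": 1,
    "maxSwap": 0,
    "commands": ["python", "script.py"],
    "envs": [{"name": "FOO", "value": "bar"}],
    "execTimeoutSeconds": 300,
    "architecture": "A100",
    "priority": "ON_DEMAND_ECONOMY",
}
payload = build_raw_payload(
    "Sample job", 10, definition, regions=["AWS_AP_SOUTHEAST_1"]
)
payload_text = json.dumps(payload)  # confirms the payload is JSON-serialisable
# With a session and task available, the call would then be:
# batch = rescalehtc.htcjobs.create_job_batch_raw(rescale, task, payload)
```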