The Job Specification
Every HQueue job is defined by a specification – a JSON structure containing the job’s properties. Here is a simple example:
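As a minimal sketch, such a specification can be written as a Python dictionary and serialized to JSON. The job name used here is a placeholder chosen for illustration.

```python
import json

# Only the "name" property is required; the value is a placeholder.
job_spec = {"name": "My Simple Job"}

# Print the JSON form of the specification.
print(json.dumps(job_spec))
```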
The specification defines a job that has exactly one property, the job name. This is a valid specification because only the name property is required. The job does not do anything useful: when assigned to a client machine, it finishes immediately and succeeds without performing any work.
To execute tasks on the client, a command set can be added to the specification using the command property. For example:
When the job is assigned to a client machine, it will output “Hello World!” using the default shell installed with the machine’s operating system.
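A specification of this kind might look like the following sketch, again written as a Python dictionary. The job name is a placeholder; the command simply echoes the message described above.

```python
import json

job_spec = {
    "name": "Hello World Job",        # placeholder name
    "command": 'echo "Hello World!"',  # runs in the client's default shell
}

print(json.dumps(job_spec, indent=4))
```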
To execute the command with a particular shell, add the shell property to the specification like so:
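The sketch below adds the shell property to a job. The value "bash" is an assumption made for illustration; use whichever shell is available on the client machines.

```python
import json

job_spec = {
    "name": "Hello World Job",        # placeholder name
    "shell": "bash",                  # assumed shell name
    "command": 'echo "Hello World!"',
}

print(json.dumps(job_spec, indent=4))
```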
Several commands can be stored in a job and executed in sequence by concatenating them as a single command. For example:
How commands are concatenated depends on the shell, but separating commands with && or ending each command with ; will work in most cases.
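One way to build such a command string is to keep the individual commands in a list and join them with the separator. This is only a sketch: the commands and job name are placeholders, and the && separator assumes a shell such as bash.

```python
# Individual commands to run in sequence (placeholders).
commands = [
    'echo "Step 1"',
    'echo "Step 2"',
    'echo "Step 3"',
]

job_spec = {
    "name": "Multi-Command Job",
    "shell": "bash",
    # With &&, each command runs only if the previous one succeeded.
    "command": " && ".join(commands),
}

print(job_spec["command"])
```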
For complex command sets, it is easier to store the commands in a script file and then point the command property to the script file. For example:
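A sketch of a specification that runs a script is shown below. The script path is hypothetical; it assumes the script lives on the shared network folder so that any client can reach it through $HQROOT.

```python
import json

job_spec = {
    "name": "Scripted Job",  # placeholder name
    "shell": "bash",         # assumed shell name
    # Hypothetical script location on the shared folder.
    "command": "$HQROOT/path/to/myScript.sh",
}

print(json.dumps(job_spec, indent=4))
```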
For a complete list of the job properties that can be defined in the specification, see Job Properties.
Status Changes
An HQueue job passes through a sequence of status changes from when it is submitted to HQueue to when it completes. A basic job like the example mentioned in the previous subsection typically undergoes this sequence:
For a complete list of job statuses, see Job Statuses.
Parent-Child Relationships
A dependency can be established between two jobs so that the first job cannot run until the second job completes. Such a dependency is referred to as a parent-child relationship in HQueue. For example, if job A depends on job B, then job A is a parent of job B and job B is a child of job A.
A Simple Parent-Child Example
Suppose that we want to create an AVI video from X frames rendered from a Houdini scene. Assuming that IFD files have already been generated for the frames, then this work can be performed in 2 steps:
- Render the frames from the IFD files using Mantra.
- Encode the rendered images into a video.
For the first step, we can define X HQueue jobs, one job for every frame to be rendered. The job specification for rendering frame 1 would look like:
The job specifications for the rest of the frames would be similar.
Now suppose that Mantra renders the images to $HQROOT/path/to/output/frame*.png, where $HQROOT is the mount point of the shared network folder registered with HQueue. The job specification for the encoding step would then look like:
To ensure that the encoding job is executed after all the render jobs have completed, we create a dependency by making the encoding job a parent of the render jobs.
To create the dependency, we use the children job property in the encoding job’s specification. The children property accepts a list of child job specifications.
So the final specification for the encoding job would look like:
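Putting the two steps together, the sketch below builds one render job per frame and nests them under the encoding job through the children property. Several details are assumptions: the IFD file locations, the job names, and the frame count are hypothetical, and encode_video is a placeholder for whatever encoder is actually used.

```python
import json

NUM_FRAMES = 3  # stands in for X, the number of frames


def render_job(frame):
    """Build the specification for rendering a single frame."""
    return {
        "name": "Render Frame %d" % frame,
        "shell": "bash",
        # Hypothetical IFD location; Mantra reads the IFD from stdin.
        "command": "mantra < $HQROOT/path/to/ifds/frame%04d.ifd" % frame,
    }


encode_job = {
    "name": "Encode Video",
    "shell": "bash",
    # "encode_video" is a placeholder, not a real command.
    "command": (
        "encode_video $HQROOT/path/to/output/frame*.png "
        "$HQROOT/path/to/output/movie.avi"
    ),
    # The render jobs are children, so they must finish first.
    "children": [render_job(f) for f in range(1, NUM_FRAMES + 1)],
}

print(json.dumps(encode_job, indent=4))
```

Only the encoding job needs to be submitted; the render jobs are submitted with it as its children.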
Parent Status
The parent job’s status depends on the statuses of its child jobs. When the child jobs complete, if at least one child job has failed, then the parent job’s status is set to failed and the parent job’s command set is not executed. Otherwise, if all the child jobs complete successfully, then the parent job’s status is changed to waiting for machine and the parent job is placed into the scheduling queue where it waits for a client machine.
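The rule above can be expressed as a small function. This is an illustrative model of the described behaviour, not HQueue's own code, and it assumes that every child job has already completed.

```python
def parent_status_after_children(child_statuses):
    """Return the parent's next status once all child jobs have completed."""
    # A single failed child fails the parent; its commands are not run.
    if any(status == "failed" for status in child_statuses):
        return "failed"
    # Otherwise the parent is queued and waits for a client machine.
    return "waiting for machine"


print(parent_status_after_children(["succeeded", "succeeded"]))
print(parent_status_after_children(["succeeded", "failed"]))
```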
Submitting Child Jobs from Within The Parent
It is possible to submit child jobs from within a running job. This may be desired if the number of child jobs needed is not known until runtime. In such cases, commands can be added to the parent job that calculate how many children are required and then submit child job specifications to the HQueue server.
To submit child jobs, use the newjob() Python API function (see Python API).
Here is an example of a Python script that creates a new job and assigns it as a child to the currently running job:
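A sketch of such a script follows. It relies on several assumptions that should be checked against the Python API page: that the HQueue server is reached over XML-RPC at the address in the HQSERVER variable, that newjob() accepts the parent job id as its second argument, and that the client runs Python 3 (on Python 2 the module is xmlrpclib). The child job's name and command are placeholders.

```python
import os
import xmlrpc.client


def submit_child(hq_server, parent_id):
    """Submit a child job under the job identified by parent_id."""
    child_spec = {
        "name": "Child Job",  # placeholder name
        "shell": "bash",
        "command": 'echo "Hello from the child job"',
    }
    # Assumption: the second argument assigns the new job to a parent.
    return hq_server.newjob(child_spec, parent_id)


# HQSERVER and JOBID are only defined inside a running HQueue job.
if __name__ == "__main__" and "HQSERVER" in os.environ and "JOBID" in os.environ:
    server = xmlrpc.client.ServerProxy("http://" + os.environ["HQSERVER"])
    submit_child(server, int(os.environ["JOBID"]))
```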
If the script is saved into a file, say createChild.py, then it can be added to the command property of the parent job. So the parent job’s specification would look like:
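A sketch of the parent specification is shown below. The location of createChild.py on the shared folder and the name of the Python executable are assumptions made for illustration.

```python
import json

parent_spec = {
    "name": "Parent Job",  # placeholder name
    "shell": "bash",
    # Hypothetical path; assumes "python" is on the client's PATH.
    "command": "python $HQROOT/path/to/createChild.py",
}

print(json.dumps(parent_spec, indent=4))
```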
Commandless Jobs
It is possible to have a job without the command property defined since the only required property is the name property.
Commandless jobs do not perform any work, but they can be useful at times. For example, if you write a Python script that submits jobs to HQueue, you can use commandless jobs to test calls to newjob() without burdening the farm with any real work. Also, you can use commandless jobs as “containers” for several independent but related jobs. This has an organizational benefit in the HQueue web interface since the work produced by the related child jobs can be viewed under a single job.
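As a sketch of the container pattern, the specification below has no command of its own and only groups two independent child jobs. All names and commands are placeholders.

```python
import json

container_spec = {
    "name": "Nightly Tasks",  # container job: no "command" property
    "children": [
        {"name": "Task A", "command": 'echo "Running task A"'},
        {"name": "Task B", "command": 'echo "Running task B"'},
    ],
}

print(json.dumps(container_spec, indent=4))
```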
Job Properties
Here is a list of properties that can be added to a job specification.
Property Name | Property Type | Property Value |
---|---|---|
children | list/tuple of job specifications | Job specifications that will be submitted and assigned as child jobs. |
childrenIds | list/tuple of integers | Ids for existing jobs that will be assigned as child jobs. |
command | string | The set of shell commands to execute on the assigned client machine. |
conditions | list/tuple of strings | Conditions that the HQueue scheduler must follow when choosing a machine to assign the job to. For more information, please read Job Conditions. |
cpus | integer | The minimum number of CPUs that the job will use. The default is 1. |
description | string | The job description. |
emailReasons | string | A comma separated list of reasons to send emails to the addresses specified by the emailTo property. If this is empty or not specified, no emails will be sent. Valid reasons are ‘abandoned’, ‘cancelled’, ‘ejected’, ‘ejecting’, ‘failed’, ‘paused’, ‘pausing’, ‘priority changed’, ‘queued’, ‘rescheduled’, ‘resumed’, ‘resuming’, ‘runnable’, ‘running’, ‘succeeded’ and ‘waiting’. |
emailTo | string | A comma separated list of addresses to send emails to based on reasons specified by the emailReasons property. |
environment | dictionary | A dictionary of variables to define in the client’s environment when the job’s command set is executed. The keys and values of the dictionary are the variable names and values respectively. |
host | string | The hostname of the machine that the job should execute on. If this property is not set, then the job can execute on any machine. |
onCancel | string | The set of shell commands to execute if the job is cancelled while running on a client machine. |
onError | string | The set of shell commands to execute when the job fails. |
onChildError | string | The set of shell commands to execute if the job has child jobs that failed. The commands are run after all the child jobs finish and after the job executes its command set. Note that if the job already has client machines assigned to it, then the job will hold onto those clients and use them to run the onChildError command. |
onSuccess | string | The set of shell commands to execute when the job completes successfully. |
maxHosts | integer | The maximum number of client machines allowed to be assigned to the job. The default is 1. |
maxTime | integer | The maximum amount of time (in seconds) that the job is permitted to run for. If the job’s running time exceeds the maximum time then it is automatically cancelled by HQueue. Note that if this property is not specified or set to less than zero then the job has no maximum time and can run indefinitely. |
minHosts | integer | The minimum number of client machines that must be assigned to the job. The default is 1. |
name | string | The job’s name. |
priority | integer | The job’s priority. Jobs with higher priorities are scheduled and processed before jobs with lower priorities. 0 is the lowest priority. The default is 0. |
shell | string | The terminal shell to use when executing the job’s command set. |
submittedBy | string | The name of the person that submitted the job. For child jobs, if this value is not specified, then it is inherited from a parent job. |
tags | list/tuple of strings | A list of tags to apply to the job. Tags can be used to control whether the job requires a dedicated machine or whether it can share a machine with other running jobs. For more information, see Job Tags. |
triesLeft | integer | The number of times a job should be automatically rescheduled in an attempt to make it succeed after a failure. If the job still fails after triesLeft attempts, then it is marked as failed. The default value is 0. |
resources | dictionary | A name-value pairing of the HQueue resources used by the job and the amount of each resource used. For example, {"sidefx.license.houdini": 1, "custom_one": 2}. See the Resources help page for more details. |
Job Properties Example
Here is an example of a job specification that demonstrates the use of some properties:
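The sketch below is one specification matching the description that follows. The variable values, the exact command, and the child job's name are assumptions chosen for illustration.

```python
import json

job_spec = {
    "name": "The Main Job",
    "priority": 0,
    "shell": "bash",
    # Variables defined in the client's environment (assumed values).
    "environment": {
        "SHOW_MSG": "1",
        "MSG": "Running the main job",
    },
    # The command set references both environment variables.
    "command": 'if [ "$SHOW_MSG" = "1" ]; then echo "$MSG"; fi',
    # One dedicated machine.
    "tags": ["single"],
    "maxHosts": 1,
    "minHosts": 1,
    "children": [
        {
            "name": "The Child Job",  # placeholder name
            "shell": "bash",
            "command": 'echo "Hello World"',
        },
    ],
}

print(json.dumps(job_spec, indent=4))
```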
The example above defines a job named “The Main Job” which has a priority level of 0. It uses the bash shell to execute its command set and defines two environment variables, “SHOW_MSG” and “MSG”. Its command set directly references these two variables. The job requires one dedicated machine as defined by the “single” tag, and the “maxHosts” and “minHosts” properties. Finally, it has a single child job which prints out “Hello World”.
Job Conditions
Job conditions inform the HQueue scheduler to assign a job to a restricted set of client machines. A job condition is defined by a type, name, operator and value. Together they specify a comparison test that the scheduler uses to determine whether a machine can be assigned to run the job. If a client machine passes ALL of the assigned conditions, then it can run the job.
Note that if a job does not define its own set of conditions then it automatically inherits the conditions of its root job.
Below is a description of each of the condition components:
Component | Description |
---|---|
type | The type specifies what the condition applies to. Since HQueue only supports client conditions at the moment, the type should always be set to “client”. Client conditions determine whether a client machine can be assigned to the job or not. |
name | The name of the client property to be tested. The supported names are: |
op | The comparison operator to use when testing the client’s attribute against the condition’s value. The supported operators are: |
value | The value to test against the requested client attribute. If the condition operator is “any”, then the value can be a list of multiple items where commas are used to separate items. |
Job Condition Examples
Here is an example that demonstrates how to attach a condition to a job specification:
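A sketch of such a specification follows. The component keys (type, name, op, value) come from the table above and the "any" operator with a comma-separated value is described there; the condition name "hostname" is an assumption, as are the job name and command.

```python
import json

job_spec = {
    "name": "Restricted Job",  # placeholder name
    "command": 'echo "Hello World!"',
    "conditions": [
        {
            "type": "client",
            "name": "hostname",  # assumed condition name
            "op": "any",
            "value": "machine1,machine2",
        },
    ],
}

print(json.dumps(job_spec, indent=4))
```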
The example above defines a job which can only be assigned to a client machine named either “machine1” or “machine2”. Note that the conditions property is a list of dictionaries where each dictionary defines a single condition and its 4 components.
The next example demonstrates how to set a condition where the job can only be assigned to client machines that are members of the “Simulation” group:
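A sketch of a group condition is shown below. Both the condition name "group" and the "==" operator are assumptions made for illustration, so verify them against the supported names and operators for your HQueue version.

```python
import json

job_spec = {
    "name": "Simulation Job",  # placeholder name
    "command": 'echo "Simulating"',
    "conditions": [
        {
            "type": "client",
            "name": "group",  # assumed condition name
            "op": "==",       # assumed operator
            "value": "Simulation",
        },
    ],
}

print(json.dumps(job_spec, indent=4))
```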
Job Tags
Job Statuses
Here is a list of job statuses and their descriptions.
Status | Description |
---|---|
abandoned | The job is assigned to a client machine but the machine is not reporting the job’s progress or status. This can happen if the machine becomes unresponsive (e.g. it reboots or hangs). |
cancelled | The job is no longer on the scheduling queue because it was interrupted by a user. |
failed | The job is finished but an error was reported during execution of its command set or during execution of one of its child jobs. |
paused | The job has been paused by a user. The scheduler does not assign a client machine to the job while it is paused. If the job is already running on a machine, then its execution is halted. |
pausing | The job is running on a client machine but has been requested by a user to halt execution. The HQueue server is waiting for a response from the client to confirm that the job has been paused. |
resuming | The job is assigned to a client machine and is currently paused but has been requested by a user to resume execution. The HQueue server is waiting for a response from the client to confirm that the job has been resumed. |
running | The job is executing on a client machine. |
running (X clients assigned) | One or more of the job’s child jobs is running and a total of X clients are assigned to the child jobs. |
succeeded | The job is finished and no errors were reported during command execution. |
waiting for resources | The job is ready for execution but is waiting for a resource. These can be HQueue resources, client machines, or job conditions. If the job does not have a child job then a tooltip will display the exact reason the job is waiting. |
Job Variables
Here is a list of the built-in, runtime variables that are defined in the job’s environment.
Environment Variable | Description |
---|---|
HQCLIENT | The folder path to the client code on the machine running the job. |
HQCLIENTARCH | The platform of the client machine running the job. It consists of the operating system and machine architecture. Here is a quick list of the possible values: |
HQROOT | The folder path to the shared network drive registered with HQueue. Depending on the platform of the machine running the job, $HQROOT is set to one of the HQueue server configuration values. See Configuration. |
HQSERVER | The address of the HQueue server. It consists of the HQueue server’s hostname and the port number that the server is listening on. |
JOBID | The id of the current job. |
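A job's command can read these variables like any other environment variable. The sketch below is a small Python helper that collects them; the function name is hypothetical, and outside a running HQueue job the variables are simply reported as unset.

```python
import os

# The built-in variables listed in the table above.
JOB_VARIABLES = ["HQCLIENT", "HQCLIENTARCH", "HQROOT", "HQSERVER", "JOBID"]


def read_job_variables(env=None):
    """Return a dict of the job variables, with None for unset ones."""
    if env is None:
        env = os.environ
    return {name: env.get(name) for name in JOB_VARIABLES}


for name, value in read_job_variables().items():
    print("%s = %s" % (name, value if value is not None else "<not set>"))
```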