The Queen's University of Belfast

Parallel Computer Centre
Cluster Queuing Systems
DQS
- DQS - Distributed Queuing System
- Batch environment containing different queues based on architecture and group.
- All jobs are submitted to individual queues to await execution.
- Two methods of scheduling are possible:
- Schedule on a first come first serve basis, the first queue in the list receives the job for execution.
- Schedule by weighted load average so that the least busy node is selected to run the job.
- The method used is selected at compile time.
- Popular because it was the first non-commercial system conceptually similar to NQS.
- A graphical user interface (GUI), qmon, is invoked with the qmon command.
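The two scheduling methods can be sketched as follows. This is a minimal Python illustration, not actual DQS code; the queue names and load figures are invented.

```python
# Sketch of DQS's two compile-time scheduling policies (illustrative only).

def first_come_first_serve(queues):
    """Method 1: the first queue in the list receives the job."""
    return queues[0]["name"]

def weighted_load_average(queues):
    """Method 2: select the queue on the least busy node."""
    return min(queues, key=lambda q: q["load_avg"])["name"]

# Hypothetical queues with weighted load averages.
queues = [
    {"name": "sun4.q",  "load_avg": 1.8},
    {"name": "rs6k.q",  "load_avg": 0.4},
    {"name": "hp700.q", "load_avg": 0.9},
]

print(first_come_first_serve(queues))   # sun4.q
print(weighted_load_average(queues))    # rs6k.q
```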
NQE
- Network Queuing Environment, NQE, is a UNIX batch environment consisting of NQS (Network Queuing System) and NLB (Network Load Balancer).
- It attempts to run each job submitted to a network as efficiently as possible on the available resources.
- Load information daemons reside on the various NQS platforms and report to a master network load balancer.
- The scheduling algorithm uses statistics such as:
- idle CPU time,
- amount of free memory,
- number of CPUs,
- amount of temporary disk space,
- queue run lengths,
- swap rate,
- number of currently running processes and amount of disk I/O.
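A network load balancer might combine such statistics into a single per-host score, as sketched below. The weights and host figures are invented for illustration; this is not NQE's actual algorithm.

```python
# Invented sketch: combine load statistics into one score per host,
# then pick the host with the lowest (best) score.

def host_score(stats, weights):
    """Weighted sum of load statistics; lower means a better candidate."""
    return sum(weights[key] * stats[key] for key in weights)

# Assumed weights, chosen only for this example.
weights = {"load": 1.0, "swap_rate": 0.5, "queue_len": 0.8}

hosts = {
    "cray1": {"load": 2.0, "swap_rate": 0.1, "queue_len": 4},
    "cray2": {"load": 0.5, "swap_rate": 0.0, "queue_len": 1},
}

best = min(hosts, key=lambda h: host_score(hosts[h], weights))
print(best)  # cray2
```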
CODINE
- CODINE - COmputing in DIstributed Networked Environments
- Aim - optimal utilization of the compute resources in heterogeneous networked environments.
- Two methods of queue selection:
- 1. A simple first come first serve algorithm, where the first queue in the list receives the job.
- 2. Schedule by weighted load average within a group so that the least busy node is selected to run the job.
- These methods of selection are under the full control of the administrator.
- Resources are managed to ensure that impact on a machine owner is minimal.
- Can suspend jobs if the console becomes active or the load average passes a preset threshold.
- Jobs can be migrated to other, less busy, machines.
- The environment may be controlled via the qmon tool or by the command line interface (CLI) commands.
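The owner-protection behaviour, suspending a job when the console becomes active or the load average passes a threshold, can be sketched as follows. This is an invented illustration, not CODINE code, and the threshold value is assumed.

```python
# Invented sketch of CODINE-style owner protection.

LOAD_THRESHOLD = 1.5  # assumed administrator-configured value

def next_action(console_active, load_avg, threshold=LOAD_THRESHOLD):
    """Decide whether a job keeps running or is suspended.

    A suspended job would later be eligible for migration to a
    less busy machine.
    """
    if console_active or load_avg > threshold:
        return "suspend"
    return "run"

print(next_action(console_active=True,  load_avg=0.2))  # suspend
print(next_action(console_active=False, load_avg=2.0))  # suspend
print(next_action(console_active=False, load_avg=0.3))  # run
```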
LSF: Load Sharing Facility
- LSF distributes the workload around one or more large heterogeneous clusters of workstations.
- Moves jobs so that each machine has an even load.
- Jobs are dispatched to the host with the lightest load that satisfies the job resource requirements.
- Determines the lightest load by examining:
- CPU utilization,
- paging rates,
- number of login sessions,
- interactive idle time,
- available virtual memory,
- available physical memory,
- and available disk space in the /tmp directory.
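The dispatch rule, lightest load among the hosts that satisfy the job's resource requirements, might be sketched as below. This is illustrative only, not LSF's real scheduler; the host names, figures, and requirement fields are invented.

```python
# Invented sketch: filter hosts by the job's resource requirements,
# then dispatch to the one with the lowest CPU utilisation.

hosts = [
    {"name": "hostA", "cpu_util": 0.9, "free_mem_mb": 256, "tmp_mb": 100},
    {"name": "hostB", "cpu_util": 0.2, "free_mem_mb": 64,  "tmp_mb": 500},
    {"name": "hostC", "cpu_util": 0.4, "free_mem_mb": 512, "tmp_mb": 800},
]

def dispatch(job, hosts):
    """Return the lightest-load host meeting the job's requirements."""
    eligible = [h for h in hosts
                if h["free_mem_mb"] >= job["mem_mb"]
                and h["tmp_mb"] >= job["tmp_mb"]]
    if not eligible:
        return None  # no suitable host: the job waits in the queue
    return min(eligible, key=lambda h: h["cpu_util"])["name"]

print(dispatch({"mem_mb": 128, "tmp_mb": 200}, hosts))  # hostC
```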
- Features:
- transparent load sharing,
- distributed batch queuing,
- platform independence,
- fault tolerance,
- system performance information,
- parallel processing,
- an open, scalable, standards-based environment.
Other Systems
- Balans - provides both dynamic load balancing and a distributed batch queuing system.
- Load Balancer- automatically queues and distributes jobs across a heterogeneous network of UNIX workstations.
- NC Toolset - provides the user with a cluster, e.g.
- either as a collection of systems that is used exclusively as a compute server,
- or as a collection of systems that provide spare CPU cycles.
- TaskBroker - built on the client server concept.
- Condor - major features are:
- automatic location and allocation of idle machines,
- checkpointing and migration of processes.
- Connect:Queue - an intelligent batch job scheduling system providing workload balancing across a heterogeneous UNIX environment.
- PBS - separate queues may be created to serve various schedule and resource requirements.
- LoadLeveler - aims to locate, allocate and deliver resources from across the network while attempting to maintain a balanced load, fair scheduling and an optimal use of resources.
All documents are the responsibility of, and copyright, © their authors and do not represent the views of The Parallel Computer Centre, nor of The Queen's University of Belfast.
Maintained by Alan Rea, email A.Rea@qub.ac.uk