The Queen's University of Belfast

Parallel Computer Centre
Cluster Queuing Systems
DQS
- DQS - Distributed Queuing System
- Batch environment containing different queues based on architecture and group.
- All jobs are submitted to individual queues to await execution.
- Two methods of scheduling are possible:
- Schedule on a first come first serve basis, the first queue in the list receives the job for execution.
- Schedule by weighted load average so that the least busy node is selected to run the job.
- The method used is selected at compile time.
- Popular because it was the first non-commercial system conceptually similar to NQS.
- A graphical user interface (GUI), qmon, is invoked with the qmon command.
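The two scheduling methods can be sketched as follows. This is a minimal Python illustration, not actual DQS code; the queue names and load figures are invented.

```python
# Sketch of DQS's two compile-time scheduling policies (illustrative only).

def first_come_first_serve(queues):
    """Method 1: the first queue in the list receives the job."""
    return queues[0]["name"]

def weighted_load_average(queues):
    """Method 2: select the queue on the least busy node."""
    return min(queues, key=lambda q: q["load_avg"])["name"]

# Hypothetical queues with weighted load averages.
queues = [
    {"name": "sun4.q",  "load_avg": 1.8},
    {"name": "rs6k.q",  "load_avg": 0.4},
    {"name": "hp700.q", "load_avg": 0.9},
]

print(first_come_first_serve(queues))   # sun4.q
print(weighted_load_average(queues))    # rs6k.q
```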
NQE
- Network Queuing Environment, NQE, is a UNIX batch environment consisting of NQS (Network Queuing System) and NLB (Network Load Balancer).
- It attempts to run each job submitted to a network as efficiently as possible on the available resources.
- Load information daemons reside on the various NQS platforms and report to a master network load balancer.
- The scheduling algorithm uses statistics such as:
- idle CPU time,
- amount of free memory,
- number of CPUs,
- amount of temporary disk space,
- queue run lengths,
- swap rate,
- number of currently running processes and amount of disk I/O.
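A network load balancer might combine such statistics into a single per-host score, as sketched below. The weights and host figures are invented for illustration; this is not NQE's actual algorithm.

```python
# Invented sketch: combine load statistics into one score per host,
# then pick the host with the lowest (best) score.

def host_score(stats, weights):
    """Weighted sum of load statistics; lower means a better candidate."""
    return sum(weights[key] * stats[key] for key in weights)

# Assumed weights, chosen only for this example.
weights = {"load": 1.0, "swap_rate": 0.5, "queue_len": 0.8}

hosts = {
    "cray1": {"load": 2.0, "swap_rate": 0.1, "queue_len": 4},
    "cray2": {"load": 0.5, "swap_rate": 0.0, "queue_len": 1},
}

best = min(hosts, key=lambda h: host_score(hosts[h], weights))
print(best)  # cray2
```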
CODINE
- CODINE - COmputing in DIstributed Networked Environments
- Aim - optimal utilization of the compute resources in heterogeneous networked environments.
- Two methods of queue selection:
- 1. A simple first come first serve algorithm, where the first queue in the list receives the job.
- 2. Schedule by weighted load average within a group so that the least busy node is selected to run the job.
- These methods of selection are under the full control of the administrator.
- Resources are managed to ensure that impact on a machine owner is minimal.
- Can suspend jobs if the console becomes active or the load average passes a preset threshold.
- Jobs can be migrated to other, less busy, machines.
- The environment may be controlled via the qmon tool or by the command line interface (CLI) commands.
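The owner-protection behaviour, suspending a job when the console becomes active or the load average passes a threshold, can be sketched as follows. This is an invented illustration, not CODINE code, and the threshold value is assumed.

```python
# Invented sketch of CODINE-style owner protection.

LOAD_THRESHOLD = 1.5  # assumed administrator-configured value

def next_action(console_active, load_avg, threshold=LOAD_THRESHOLD):
    """Decide whether a job keeps running or is suspended.

    A suspended job would later be eligible for migration to a
    less busy machine.
    """
    if console_active or load_avg > threshold:
        return "suspend"
    return "run"

print(next_action(console_active=True,  load_avg=0.2))  # suspend
print(next_action(console_active=False, load_avg=2.0))  # suspend
print(next_action(console_active=False, load_avg=0.3))  # run
```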
LSF: Load Sharing Facility
- LSF distributes the workload around one or more large heterogeneous clusters of workstations.
- Moves jobs so that each machine has an even load.
- Jobs are dispatched to the host with the lightest load that satisfies the job resource requirements.
- Determines the lightest load by examining:
- CPU utilization,
- paging rates,
- number of login sessions,
- interactive idle time,
- available virtual memory,
- available physical memory,
- and available disk space in the /tmp directory.
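The dispatch rule, lightest load among the hosts that satisfy the job's resource requirements, might be sketched as below. This is illustrative only, not LSF's real scheduler; the host names, figures, and requirement fields are invented.

```python
# Invented sketch: filter hosts by the job's resource requirements,
# then dispatch to the one with the lowest CPU utilisation.

hosts = [
    {"name": "hostA", "cpu_util": 0.9, "free_mem_mb": 256, "tmp_mb": 100},
    {"name": "hostB", "cpu_util": 0.2, "free_mem_mb": 64,  "tmp_mb": 500},
    {"name": "hostC", "cpu_util": 0.4, "free_mem_mb": 512, "tmp_mb": 800},
]

def dispatch(job, hosts):
    """Return the lightest-load host meeting the job's requirements."""
    eligible = [h for h in hosts
                if h["free_mem_mb"] >= job["mem_mb"]
                and h["tmp_mb"] >= job["tmp_mb"]]
    if not eligible:
        return None  # no suitable host: the job waits in the queue
    return min(eligible, key=lambda h: h["cpu_util"])["name"]

print(dispatch({"mem_mb": 128, "tmp_mb": 200}, hosts))  # hostC
```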
- Features:
- transparent load sharing,
- distributed batch queuing,
- platform independence,
- fault tolerance,
- system performance information,
- parallel processing,
- an open, scalable, standards-based environment.
Other Systems
- Balans - provides both dynamic load balancing and a distributed batch queuing system.
- Load Balancer- automatically queues and distributes jobs across a heterogeneous network of UNIX workstations.
- NC Toolset - provides the user with a cluster, e.g.
- either as a collection of systems that is used exclusively as a compute server,
- or as a collection of systems that provide spare CPU cycles.
- TaskBroker - built on the client server concept.
- Condor - major features are:
- automatic location and allocation of idle machines,
- checkpointing and migration of processes.
- Connect:Queue - an intelligent batch job scheduling system providing workload balancing across a heterogeneous UNIX environment.
- PBS - separate queues may be created to serve various schedule and resource requirements.
- LoadLeveler - aims to locate, allocate and deliver resources from across the network while attempting to maintain a balanced load, fair scheduling and an optimal use of resources.
All documents are the responsibility of, and copyright, © their authors and do not represent the views of The Parallel Computer Centre, nor of The Queen's University of Belfast.
Maintained by Alan Rea, email A.Rea@qub.ac.uk