The Queen's University of Belfast

Parallel Computer Centre
Tutorial/Practical
Example
Clementine
- Supplied by Integral Solutions Limited (ISL), Basingstoke, England
- Visual Programming Interface
- builds a discovery model
- performs learning task
- Uses neural networks and rule induction
- Data sources
- ASCII file format, Oracle, Informix, Sybase and Ingres
- Clementine has many useful facilities:
- Data Manipulation - constructing new data items derived from existing ones, and breaking the data down into meaningful subsets
- Browsing and Visualisation - displaying aspects of the data using interactive graphics
- Statistics - confirming suspected relationships between factors in the data
- Hypothesis testing - constructing models of how the data behaves and verifying them
Clementine Example
Drug trials
- A number of hospital patients all suffering from the same illness were treated with a range of drugs
- Five different drugs were available, and patients responded differently to each of them
- Problem - which drug is appropriate for any given future patient?
Problem solving
Stages
- Accessing the data
- read in the data e.g. from a text file with delimiters
- name the fields
- age
- sex
- BP - High, Normal or Low
- Cholesterol - Normal or High
- Na - blood sodium concentration
- K - blood potassium concentration
- drug - the drug to which the patient responded
- View the records by using the table node e.g.

- Can select fields or filter the data
- Display properties of the data e.g.
- what proportion of cases respond to each drug?

- Answer - drugY, followed by drugX
- Finding a relationship e.g.
- relationship between sodium and potassium levels as displayed in a point plot

- Random scattering - no apparent relationship
- Re-examine with respect to a particular drug, e.g. drugY, and the sodium to potassium ratio
- calculate the Na/K ratio i.e. as a derived field or node

- Patients with a high Na to K ratio respond best to drugY
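The stages above are carried out in Clementine by wiring nodes together visually. As a rough, non-authoritative illustration of the same steps outside Clementine, the Python sketch below reads delimited text, names the fields, computes the proportion of cases responding to each drug, and derives the Na/K ratio. The sample rows, the `Na_to_K` field name and all function names are assumptions made up for this example; they are not the tutorial's data or Clementine's API.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows, invented for illustration only (not the tutorial's data).
# Columns follow the field list above: age, sex, BP, Cholesterol, Na, K, drug.
RAW = """\
23,F,HIGH,HIGH,0.79,0.03,drugY
47,M,LOW,HIGH,0.74,0.06,drugC
61,F,LOW,HIGH,0.56,0.03,drugY
28,F,NORMAL,HIGH,0.56,0.07,drugX
"""

FIELDS = ["age", "sex", "BP", "Cholesterol", "Na", "K", "drug"]


def read_records(text, fieldnames=FIELDS, delimiter=","):
    """Read delimited text and attach a field name to each column."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=fieldnames,
                            delimiter=delimiter)
    records = []
    for row in reader:
        # Convert the numeric fields; the rest stay as strings.
        row["age"] = int(row["age"])
        row["Na"] = float(row["Na"])
        row["K"] = float(row["K"])
        records.append(row)
    return records


def drug_proportions(records):
    """Return the fraction of cases that responded to each drug."""
    counts = Counter(r["drug"] for r in records)
    total = len(records)
    return {drug: n / total for drug, n in counts.items()}


def add_na_to_k(records):
    """Return copies of the records with a derived Na_to_K field."""
    return [dict(r, Na_to_K=r["Na"] / r["K"]) for r in records]


records = add_na_to_k(read_records(RAW))
proportions = drug_proportions(records)
```

Sorting `records` by `Na_to_K` and inspecting the `drug` column is a quick textual stand-in for the point plot described above.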
Machine learning
Clementine
- Which is the best drug for any patient?
- Filter the unwanted fields
- Define types for the fields
- Build rules and train nets by attaching the appropriate nodes
Building rules and training nets

- Net trained, rules built on 200 example cases
Net and rules completed

Rules formulated

- The rules' first decision is based on the same criterion discovered previously, i.e. they allocate drugY to patients with a high Na to K ratio
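To give a feel for how a first decision of this kind can be learned from examples, here is a minimal sketch of a one-level threshold rule (a "decision stump") on the Na/K ratio. This is a deliberately simplified stand-in and is not Clementine's rule induction algorithm; the example pairs, the function names and the resulting threshold are all assumptions invented for illustration.

```python
# Hypothetical (Na/K ratio, drug) pairs, invented for illustration only.
EXAMPLES = [
    (26.3, "drugY"),
    (18.7, "drugY"),
    (15.4, "drugY"),
    (12.3, "drugC"),
    (10.1, "drugX"),
    (8.0, "drugX"),
]


def learn_threshold(examples, target="drugY"):
    """Find the ratio threshold that best separates `target` from other drugs.

    Candidate thresholds are the midpoints between consecutive distinct ratios.
    The rule tested is: ratio > threshold  =>  target.
    Returns (threshold, number_of_misclassified_examples).
    """
    ratios = sorted({ratio for ratio, _ in examples})
    candidates = [(a + b) / 2 for a, b in zip(ratios, ratios[1:])]
    if not candidates:
        raise ValueError("need at least two distinct ratio values")
    best = None
    for threshold in candidates:
        # Count examples where the rule's prediction disagrees with the label.
        errors = sum((ratio > threshold) != (drug == target)
                     for ratio, drug in examples)
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best


threshold, errors = learn_threshold(EXAMPLES)


def predict_is_target(ratio):
    """Apply the learned rule: a high Na/K ratio suggests the target drug."""
    return ratio > threshold
```

A full rule inducer would repeat this kind of search over every field and then recurse on each resulting subset; the stump only shows the first split.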
UUJ Example
House price prediction
- Problem - mass appraisal of property in N.Ireland
- 10 attributes per property including:
- Ward No, Area No, Price, Age of house, Number of bedrooms, Detached/Semi, Type of building - (bungalow, house, chalet etc.), Heating
- Best predictive accuracy was obtained after removal of outliers
Data file - houses.dat


Graphical Output
- Outliers - any property priced over 100,000 had land attached
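The outlier removal step can be sketched as a simple filter on the price field. The records below are invented for illustration and are not taken from houses.dat; the 100,000 cutoff comes from the text above, and the function names are assumptions.

```python
# Hypothetical house records, invented for illustration only.
HOUSES = [
    {"price": 45_000, "bedrooms": 3},
    {"price": 62_000, "bedrooms": 4},
    {"price": 38_500, "bedrooms": 2},
    {"price": 150_000, "bedrooms": 4},  # treated as an outlier (land attached)
]

# Cutoff taken from the text above; the currency unit is not stated there.
PRICE_CUTOFF = 100_000


def remove_outliers(houses, cutoff=PRICE_CUTOFF):
    """Keep only the properties priced at or below the cutoff."""
    return [h for h in houses if h["price"] <= cutoff]


def mean_price(houses):
    """Average price over a list of house records."""
    return sum(h["price"] for h in houses) / len(houses)


kept = remove_outliers(HOUSES)
```

Comparing `mean_price(HOUSES)` with `mean_price(kept)` shows how strongly a single high-priced property can pull a summary statistic, which is one reason a predictive model may do better once such cases are set aside.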

Statistics Produced
Predictions set

Clementine stream

Clementine Output
Rule Browser

All documents are the responsibility of, and copyright, © their authors and do not represent the views of The Parallel Computer Centre, nor of The Queen's University of Belfast.
Maintained by Alan Rea, email A.Rea@qub.ac.uk