SPSS Answer Tree 2

PCQ Bureau

05 Jan 2000 09:25 IST

SPSS

Answer Tree is a statistical analysis tool. It’s handy for analyses that involve classifying large amounts of data into homogeneous groups, and helps you make better decisions. Answer Tree is quite adept at sifting through large numbers of records and establishing meaningful patterns among them. This tool should be useful for market researchers, data analysts, management consultants, and the like.

To understand the specific capabilities of the product, let’s take the case of a consumer finance company disbursing loans to individuals. The foremost question facing the company would be whether a given applicant is likely to repay the loan in full over the repayment period, or is more likely to default.

Like most statistical tools, Answer Tree can’t provide absolutely accurate answers. Instead, based on criteria that you specify as being important for assessing applicants’ ability to repay, it sifts through existing records of past applicants and classifies them into homogeneous groups. This helps the company understand the profile of likely defaulters, and thereby take decisions that minimize the number of defaults.

Installing the software was a snap, the most tedious part being keying in the 38-digit license code. Once installed, it ran smoothly and quickly on a P/200 MMX test machine with 32 MB RAM, a 2.1 GB HDD, and VGA at 800x600 resolution.

The built-in tutorials and help files would be adequate for users well-versed in research methodology, statistics, and decision tree analysis. For the lay user, they are insufficient, since basic concepts are not covered. The documentation makes up for this: a well laid-out and comprehensive book of over 200 pages spread across 14 chapters, it is virtually a textbook on decision tree analysis, and explains basic as well as advanced concepts. So lay users can also get started with decision trees with the help of the documentation.

To begin using Answer Tree, you need to have records of past loan applicants in one of the following file formats:

SPSS file (*.SAV)
SYSTAT file (*.SYD, *.SYS)
Common database formats (*.DBF, etc)
ODBC (MS-Access files, etc)

The process of building the answer tree involves two steps. In the first step, the Minimal Tree is drawn, which classifies data into homogeneous groups. In the second step, the Minimal Tree may be grown to arrive at an even better answer to the question at hand, in this case, who the likely loan defaulters are.

The Minimal Tree may be drawn using one of the following methods.

CHAID: This uses Chi-square or F statistics to select predictors for each homogeneous group
Exhaustive CHAID: This is a modification of the CHAID method that’s more exhaustive and rigorous in selecting predictors. As a result, it also takes longer to run
C&RT: This method identifies homogeneous subsets of data, with each split generating two nodes
QUEST: This method is similar to the C&RT method, with one difference: the target (dependent) variable has to be nominal (that is, its values are category labels with no inherent order, so you can’t do any mathematical calculations on them, for example, type of service). Two nodes are generated at each split, as in the case of the C&RT method.
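As a rough illustration of the idea behind CHAID's predictor selection, the sketch below computes the Pearson Chi-square statistic for each candidate predictor against the target and picks the predictor with the strongest association. It is only a minimal sketch, not Answer Tree's actual procedure: the records and the field names (employment, owns_house, defaulted) are hypothetical, and a fuller CHAID implementation would typically also merge similar categories and compare adjusted p-values rather than raw statistics. Both predictors here have two categories, so their raw statistics are directly comparable.

```python
from collections import Counter

def chi_square(rows, predictor, target):
    """Pearson Chi-square statistic for the predictor-by-target table."""
    observed = Counter((r[predictor], r[target]) for r in rows)
    predictor_totals = Counter(r[predictor] for r in rows)
    target_totals = Counter(r[target] for r in rows)
    n = len(rows)
    stat = 0.0
    for p, p_count in predictor_totals.items():
        for t, t_count in target_totals.items():
            expected = p_count * t_count / n  # count expected if independent
            stat += (observed[(p, t)] - expected) ** 2 / expected
    return stat

# Hypothetical records of past applicants (not taken from the review).
records = [
    {"employment": "salaried", "owns_house": "yes", "defaulted": "no"},
    {"employment": "salaried", "owns_house": "no", "defaulted": "no"},
    {"employment": "salaried", "owns_house": "yes", "defaulted": "no"},
    {"employment": "salaried", "owns_house": "no", "defaulted": "no"},
    {"employment": "self", "owns_house": "yes", "defaulted": "yes"},
    {"employment": "self", "owns_house": "no", "defaulted": "yes"},
    {"employment": "self", "owns_house": "yes", "defaulted": "yes"},
    {"employment": "self", "owns_house": "no", "defaulted": "no"},
]

predictors = ["employment", "owns_house"]
scores = {p: chi_square(records, p, "defaulted") for p in predictors}

# The predictor most strongly associated with default is split on first.
best = max(scores, key=scores.get)
print(best, scores)
```

With this toy data, employment type separates defaulters from non-defaulters far better than house ownership, so it would be chosen for the first split.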

The Minimal Tree

To draw the Minimal Tree, the first step is to specify the variables that serve as predictors for segregating the data into homogeneous groups. In our loan repayment example, variables such as monthly salary, educational qualifications, type of service (government, private service, self-employed, etc), number of dependents, and other possessions of the customer (car, house, etc) may all be predictors of loan repayment behavior.

You can also specify the following tree characteristics:

Maximum tree depth
Minimum number of cases in parent and child nodes
Minimum change in impurity (this is the degree of difference between individual cases within a homogeneous group)
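To make the "minimum change in impurity" setting concrete, here is a minimal sketch that measures how much a two-way split reduces impurity. It assumes Gini impurity as the measure, which is a common choice for C&RT-style trees; the review does not say which measure Answer Tree uses. The labels and the threshold value are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure group, larger for a more mixed group."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children

# Hypothetical node: 5 non-defaulters and 3 defaulters, split into two children.
parent = ["no"] * 5 + ["yes"] * 3
left = ["no"] * 4
right = ["no", "yes", "yes", "yes"]

decrease = impurity_decrease(parent, left, right)

# Hypothetical threshold: the split is kept only if it helps at least this much.
MIN_CHANGE_IN_IMPURITY = 0.01
keep_split = decrease >= MIN_CHANGE_IN_IMPURITY
print(decrease, keep_split)
```

A higher threshold yields a smaller tree, since marginal splits are rejected.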

In addition to these, there’s an option for validating/cross-validating the tree. Validation is achieved by partitioning the data set into training and testing samples, in specified proportions.
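The partitioning step can be sketched as below. This is only an illustration of the general idea, not Answer Tree's own mechanism: the function name, the 70/30 proportion, and the fixed seed are all assumptions made for the example.

```python
import random

def partition(records, training_fraction, seed=0):
    """Randomly split records into training and testing samples."""
    rng = random.Random(seed)  # fixed seed keeps the split repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * training_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical data set of 100 case IDs, split 70/30.
training, testing = partition(range(100), 0.7)
print(len(training), len(testing))
```

The tree is grown on the training sample only, and its misclassification rate is then checked on the testing sample, which it has not seen.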

Growing the Minimal Tree

Since Answer Tree is an exploratory tool, it’s almost always necessary to revisit the initial assumptions once the Minimal Tree is drawn. Extensive facilities for growing/pruning individual branches are available, so that you can best classify the given data into homogeneous groups. Keep in mind that knowledge of the situation at hand is key to arriving at the best grouping. Once the final tree is drawn, its interpretation is quite straightforward.

Risk charts

The misclassification matrix counts up the predicted and actual category values and displays them in a table. Each correct classification, often called a "hit", is added to the count in a diagonal cell of the table, so the diagonal elements represent agreement between the predicted and actual values. An incorrect classification, called a "miss", means that there’s disagreement between predicted and actual values; misses are counted in the off-diagonal elements of the matrix. In this example, 11 applicants with no credit or no debt (NCR/NODEB) were misclassified as having current, up-to-date credit accounts (PD BK). This table is helpful in determining exactly where the model performs well or poorly.
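The counting described above can be sketched in a few lines. The category labels ("good", "bad") and the eight cases are hypothetical, chosen only to show how hits land on the diagonal and misses off it.

```python
from collections import Counter

def misclassification_matrix(actual, predicted):
    """Count cases for every (actual, predicted) pair of categories."""
    return Counter(zip(actual, predicted))

# Hypothetical actual and predicted categories for eight applicants.
actual = ["good", "good", "good", "bad", "bad", "good", "bad", "good"]
predicted = ["good", "good", "bad", "bad", "good", "good", "bad", "good"]

matrix = misclassification_matrix(actual, predicted)

# Diagonal cells (actual == predicted) are hits; all other cells are misses.
hits = sum(n for (a, p), n in matrix.items() if a == p)
misses = sum(n for (a, p), n in matrix.items() if a != p)

# Print the table with actual categories as rows, predicted as columns.
categories = sorted(set(actual) | set(predicted))
for a in categories:
    print(a, [matrix[(a, p)] for p in categories])
print("hits:", hits, "misses:", misses)
```

Reading across a row shows which categories a given actual group tends to be mistaken for.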

The risk estimate and the standard error of the risk estimate indicate how well the tree is performing as a classifier. In this case, the risk estimate for the four-level C&RT tree is 0.2880, and the standard error of the risk estimate is 0.0143. In other words, the tree misclassifies cases 28.8 percent of the time. If necessary, you can look at ways to further improve the model.
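For a categorical target, a risk estimate of this kind is commonly taken as the proportion of misclassified cases, with the usual binomial standard error. The sketch below assumes that formula; the review does not state how Answer Tree computes either figure, nor the sample size. The counts used (288 misses out of 1,000 cases) are hypothetical, picked because they give figures close to those quoted.

```python
import math

def risk_estimate(misclassified, total):
    """Misclassification proportion and its binomial standard error."""
    risk = misclassified / total
    std_error = math.sqrt(risk * (1 - risk) / total)
    return risk, std_error

# Hypothetical counts, not taken from the review.
risk, std_error = risk_estimate(288, 1000)
print(round(risk, 4), round(std_error, 4))
```

The standard error shrinks as the number of cases grows, so the same risk estimate is more trustworthy on a larger sample.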

In conclusion

At a steep license fee of Rs 110,000 per user, Answer Tree is clearly beyond the reach of individual researchers or even the smaller research firms. Its pricing renders it a viable buy only for large organizations.
