Data Mining Final Project
The data mining project involves the application of data mining techniques discussed in class to
one data set. The goal of the project is to go through the full data mining cycle with respect to a
particular data set, including the specification of the business problem to be solved, the
specification of the data mining tasks to be performed, preprocessing and transformation of the
data, application of several data mining methods and the discovery of patterns, evaluation of
patterns, and recommendation of specific actions with respect to relevant findings.
Please download the Bank Marketing data set (bank.csv) from
http://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip and import the data set
into SAS Enterprise Miner (please see Appendix A for instructions on how to download and import
the data set). This data set was collected from a Portuguese bank that used its own contact center
to run direct marketing campaigns aimed at attracting deposit clients. The marketing campaigns
were based on phone calls. Often, more than one contact with the same client was required in order
to assess whether the client would (“yes”) or would not (“no”) subscribe to the bank term deposit.
The data set contains 4521 instances and 17 variables (16 input variables and 1 target variable).
Please refer to the bank-names.txt file (downloadable at
http://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip) for a detailed
description of the data set.
Data Mining Final Report (Submitted through the Assignment Submission Folder on
This is a comprehensive description of your project, which should fully describe the work done
for the whole data mining analysis, not just the end results. The analysis will often iterate
through the data mining processes several times rather than completing them in a single pass.
The process may include data preparation, data exploration and preprocessing, data mining
methodologies, results analysis, conclusions, lessons learned, and so on. A report that merely
presents the SAS Enterprise Miner output will therefore receive a very low score. You should
demonstrate your work using not only textual descriptions but also detailed screen shots in your
report. The project final report should include the following:
1. Cover page: your name, project name, and source of the dataset.
2. Objectives: A clear statement of the objectives of the data mining project, covering the
problem that you are investigating and a summary of your goals for this project.
3. Data preparation: Discussion of the structure and characteristics of the data.
After importing the data set into SAS Enterprise Miner, please set the appropriate roles (e.g.,
Input, Target, Rejected, etc.) and levels (e.g., Interval, Nominal, Binary, Ordinal, etc.) for
the variables in the data set. Please provide both detailed textual descriptions and
relevant screen shots of data importation and the roles and levels of the variables in the data
set.
4. Data exploration and preprocessing: Discussion of the processes and results of any exploratory
data analysis and data visualization performed on the data. Examination of different data
preparation and transformation approaches to improve results for the given analysis
tasks. What data exploration steps were performed? What are the results of data exploration?
What preprocessing and transformation were done to make the data amenable for data mining?
Describe your reasoning behind the performed data exploration, preprocessing and
transformation. Please provide both detailed textual descriptions and relevant screen shots
regarding the issues of data exploration, preprocessing and transformation.
For example, you should perform data exploration to determine whether there are any unusual
values, whether there is any missing data, and whether data transformation is required, and
then perform certain data preprocessing and/or transformation (such as data replacement
and/or filtering, data imputation, variable transformation, etc.) when necessary. More
specifically, you should examine the distributions of the variables by creating histograms for
the variables in your data set (right click the “File Import” node, select “Edit Variables”, select
the variable(s) you want to explore, and click the “Explore” button). Please provide both
detailed textual descriptions and relevant screen shots of the histograms of the variables.
According to the histograms, are there any unusual values in any variables? Do you need to
change the unusual values using the replacement node or remove the unusual cases using the
filter node? If you decide to use the replacement node and/or the filter node, please provide
both detailed textual descriptions and relevant screen shots of data replacement and/or
filtering. After that, please perform data partition. Again, please provide both detailed textual
descriptions and relevant screen shots of data partition. Are there any missing values? Please
provide both detailed textual descriptions and relevant screen shots to show whether there
are missing values or not. Do you need to impute any missing values? Please provide both
detailed textual descriptions and relevant screen shots if data imputation is done. According
to the histograms, do you find any skewed distributions? Do you need to transform any
variables due to skewed distributions? Please provide both detailed textual descriptions and
relevant screen shots if skewed distributions are discovered and variable transformations
are performed.
5. Data mining process: The exploration of multiple data mining methods on the targeted data set.
Experimentation with different parameters to optimize the results of the chosen data mining
techniques. The use of a variety of relevant techniques to determine the best approach to
accomplish the data analysis tasks. Please provide both detailed textual descriptions and
relevant screen shots of the data mining process.
For example, please choose certain data mining techniques, such as Decision Tree,
Regression, Neural Networks, etc., to develop multiple data mining models on the data
set and experiment with different parameters to optimize the model results. Please
provide both detailed textual descriptions and relevant screen shots of model development.
After that, please perform model comparison to select the best performing model using
the model comparison node. Please also provide both detailed textual descriptions and
relevant screen shots of model comparison.
6. Results and conclusions: Thorough discussion and analysis of data mining results, including
an analysis of how the approaches used worked in accomplishing the project objectives. Draw
conclusions from your results. Please provide both detailed textual descriptions and
relevant screen shots of the data mining results and conclusions.
For example, you should explain the results of the data mining models (e.g., the validation
ASEs of the models, the number of leaves in the optimal tree for the decision tree model, the
variables used for the splits in the decision tree model, the significant variables included in
the regression model, etc.) as well as the results of model comparison (e.g., which model
is selected as the best performing model based on which criterion?). Please provide both
detailed textual descriptions and relevant screen shots of the model results and the model
comparison results. After that, please draw conclusions from the results by determining which
factors are the best predictors of bank term deposit subscription and discussing their
implications for successful bank marketing strategies.
7. (Graduate Students Only) Ethical issues in data mining: Explore the impact that data mining
could have on privacy and the laws surrounding the privacy of personal data (4-5 pages,
8. Lessons learned and future work: Discuss what you have learned through the project and what
concepts and techniques you learned in class are used in the project; Discuss potential
extensions and future work.
9. References, if any.
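The exploration checks named in item 4 (missing values, skewed distributions, variable transformation, and data partition) can also be illustrated outside SAS Enterprise Miner. The following is a minimal Python sketch, not a substitute for the required Enterprise Miner nodes and screen shots. The sample rows are made up for illustration, and treating the string "unknown" as a missing value, the log transformation, and the 50/50 split are all assumptions chosen for this example.

```python
import math
import random
import statistics

# Assumption: these tokens mark a missing value; check bank-names.txt for the real coding.
MISSING_TOKENS = {"", "NA", "unknown"}


def count_missing(rows, column):
    """Count rows whose value in `column` is absent or is a missing-value token."""
    total = 0
    for row in rows:
        value = row.get(column)
        if value is None or str(value).strip() in MISSING_TOKENS:
            total += 1
    return total


def skewness(values):
    """Moment-based skewness: the mean of the cubed z-scores."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def stratified_partition(rows, target, train_fraction, seed=12345):
    """Split rows into train/validation, keeping the target class mix in each part."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[target], []).append(row)
    train, validation = [], []
    for members in by_class.values():
        members = members[:]  # copy so the caller's list order is untouched
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        validation.extend(members[cut:])
    return train, validation


# Hypothetical sample rows (NOT taken from bank.csv).
durations = [30, 60, 75, 90, 120, 180, 300, 1500, 45, 200]
contacts = ["cellular", "unknown", "cellular", "telephone", "unknown",
            "cellular", "cellular", "unknown", "telephone", "cellular"]
targets = ["no", "no", "yes", "no", "no", "no", "no", "yes", "no", "no"]
rows = [{"duration": d, "contact": c, "y": t}
        for d, c, t in zip(durations, contacts, targets)]

missing_contact = count_missing(rows, "contact")
skew_before = skewness(durations)
# log(1 + x) is one common transformation for right-skewed, non-negative values.
skew_after = skewness([math.log1p(d) for d in durations])
train, validation = stratified_partition(rows, "y", train_fraction=0.5)

print("missing contact values:", missing_contact)
print(f"skewness before: {skew_before:.2f}, after log1p: {skew_after:.2f}")
print("train rows:", len(train), "validation rows:", len(validation))
```

A skewness value far from zero suggests that a transformation may be worth trying; whether it actually improves a model should still be judged from the validation results.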
1. The writing quality of the report, such as the completeness of the contents with regard to
the above requirements, and the coherence and correctness of the writing.
2. The effort made in data exploration and preprocessing
3. Data mining skills and strategies
4. Comprehensiveness of data analysis results and explanations. Business background related
and in-depth discussions are encouraged.
5. Completeness and timeliness of the report
Issues You Should Address While Completing the Project:
1. How to conduct data exploration and preprocessing
2. How to select the right variables for the models
3. How to combine different data mining skills for the project, such as applying stepwise
regression to select variables for a neural network.
4. How to explain the data mining results
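For issue 4, one quantity the report is asked to explain is the validation ASE (average squared error) used when comparing models. The sketch below assumes that, for a binary target, ASE is the mean squared difference between the 0/1 outcome and the predicted probability of “yes”; please confirm the exact definition in the SAS Enterprise Miner documentation. The outcomes, model names, and scores are hypothetical values chosen for illustration.

```python
def average_squared_error(actual, predicted_prob):
    """Mean of squared differences between 0/1 outcomes and predicted probabilities."""
    if not actual or len(actual) != len(predicted_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - p) ** 2 for y, p in zip(actual, predicted_prob)) / len(actual)


# Hypothetical validation outcomes (1 = "yes", 0 = "no").
validation_actual = [1, 0, 0, 1, 0]

# Hypothetical predicted probabilities of "yes" from three models.
model_scores = {
    "decision_tree": [0.8, 0.2, 0.4, 0.6, 0.1],
    "regression": [0.8, 0.2, 0.2, 0.8, 0.1],
    "neural_network": [0.9, 0.5, 0.2, 0.4, 0.3],
}

validation_ase = {
    name: average_squared_error(validation_actual, scores)
    for name, scores in model_scores.items()
}

# A lower validation ASE means predictions sit closer to the observed outcomes.
best_model = min(validation_ase, key=validation_ase.get)

for name, ase in sorted(validation_ase.items(), key=lambda item: item[1]):
    print(f"{name}: validation ASE = {ase:.4f}")
print("Selected model:", best_model)
```

Selecting the model with the lowest validation ASE is only one possible criterion; the report should state which criterion the model comparison node actually used.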
Appendix A – How to Import the Bank Marketing Data Set into SAS Enterprise Miner
Please go to http://archive.ics.uci.edu/ml/machine-learning-databases/00222/ and click the
“bank.zip” link to download and save the “bank.zip” archive on your computer. Then, please
extract the archive into a regular folder, which will contain three files: the bank.csv file,
the bank-full.csv file, and the bank-names.txt file. The bank-full.csv file will not be used for this
data mining project and can be deleted from your computer. Please open the bank-names.txt file
to read the description of the data set and develop an understanding of the objectives of this project.
After that, please follow the steps below to import the data set from the bank.csv file
into SAS Enterprise Miner for this project.
1. Double click the bank.csv file to open it in Excel.
2. Click the letter “A” at the top of the first column to select the entire column A in Excel, then
click the “Data” tab, and click “Text to Columns”.
3. Select the “Delimited” radio button, and click the “Next” button.
4. Uncheck the “Tab” checkbox, check the “Semicolon” checkbox, select “ " ” as the Text
Qualifier, and click the “Next” button.
5. Select the “General” radio button, enter $A$1 as the Destination, and click the “Finish”
button.
6. Click the “File” menu, select “Save As”, click “Browse”, navigate to the folder where
you want to save the file, select “Excel Workbook (*.xlsx)” in the “Save as type:” box,
enter a file name in the “File name:” box, and click the “Save” button.
7. Start SAS Enterprise Miner, create a new project, and then create a new diagram.
8. Click the “Sample” tab and add a “File Import” node to your new diagram.
9. Select the “File Import” node in the diagram workspace, go to the property panel, click
the button of the “Import File” property in the “Train” section.
10. Select “My Computer” and click on “Browse...”.
11. Select the Excel file you want to import, click “Preview” to make sure the data set will be
properly imported into SAS Enterprise Miner, and click “OK”.
12. Right click the File Import node, and click “Run”.
13. Right click the “File Import” node in the diagram workspace and choose “Edit
Variables” to set the roles and the levels for the variables in the dataset.
14. To view the dataset after the “File Import” node is run, go to the Property panel, click the
button of the “Exported Data” property, select the “Train” data set, and click the
“Browse” button to view the data.
15. To explore the distribution(s) of particular variable(s) in the data set, right click the “File
Import” node, select “Edit Variables”, select the variable(s) you want to explore, click
the “Explore” button, and you will be able to view the variable histogram(s).
16. After you make sure the data set has been properly imported, you can proceed with data
analysis by using the “File Import” node as the Data Source node.
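Steps 2 to 5 above split each line on semicolons and treat the double quote as the text qualifier. The same parsing rule can be sketched in Python as a quick way to check row and column counts before importing; this is an optional cross-check, not part of the required Enterprise Miner workflow. The sample text is a made-up excerpt with only five columns, and the column names in it are assumptions to be verified against bank-names.txt.

```python
import csv
import io

# Made-up excerpt in the same layout: semicolon-delimited, text values in double quotes.
SAMPLE_TEXT = '''"age";"job";"marital";"balance";"y"
41;"technician";"single";250;"no"
35;"management";"married";-120;"yes"
'''


def read_semicolon_csv(handle):
    """Parse a semicolon-delimited file whose text fields are wrapped in double quotes."""
    reader = csv.DictReader(handle, delimiter=";", quotechar='"')
    return list(reader)


rows = read_semicolon_csv(io.StringIO(SAMPLE_TEXT))
columns = list(rows[0].keys())

print("rows:", len(rows))
print("columns:", columns)
```

To run the same check on the extracted file, open it with `open("bank.csv", newline="")` and pass the file handle to `read_semicolon_csv`. The row and column counts can then be compared with the 4521 instances and 17 variables stated in the project description.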