MACHINE LEARNING
Course Project
Group Project
Group size: 1-2 students. (1 means working individually.)
I. Project Description
The goal of the project is to identify and address a domain-specific data
analysis problem by applying machine learning algorithms or models. Topics
may involve prediction, regression, or classification.
In this project, you will need to locate a specific dataset, define the problem(s)
that you want to study, and implement at least one machine-learning algorithm
to solve the problem. Preprocess the data if needed. We recommend using
Python with the scikit-learn library. Repeating an analysis that was
performed in another class is not allowed.
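The workflow described above (load a dataset, preprocess it, train a model,
and measure its performance) can be sketched as follows. This is only a
minimal illustration, not a required approach: the built-in wine dataset, the
logistic regression classifier, the 25% test split, and the fixed random seed
are all assumptions chosen for the example, and your own project should
substitute its own data and methods.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Built-in toy dataset, used here only as a stand-in for your own data.
X, y = load_wine(return_X_y=True)

# Hold out 25% of the samples for testing; stratify keeps the class ratios.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Chain preprocessing (feature scaling) and the classifier so that the
# scaler is fitted on the training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Evaluate on the held-out test set.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

Wrapping the scaler and the model in one pipeline is a design choice that
avoids leaking information from the test set into the preprocessing step.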
Evaluation will be based on the completeness and quality of the problem
definition, the description of the methodology, the implementation, and the
presentation (project report writing).
II. Submission
The project report and source code must be submitted by the due date.
The source code should be in a separate file or zipped folder. Only one
group member should submit on behalf of the whole group. All group
members will receive the same grade unless the work distribution is
extremely unbalanced.
The project report should include, but is not limited to, the following components.
• Title and author(s). The title should capture your project question and
main methodology.
• Abstract. Briefly summarize the questions that the project addresses,
the methods, and the results.
• Background/Introduction. Introduce the background of the problem and
your understanding and analysis of it. Explain the motivation or the
context.
• Problem Statement. Define in detail the problem(s) that the project
addresses.
• Methodology. Describe the organization and patterns of the dataset.
Describe the methodology or algorithms using pseudocode or a flowchart.
Describe the experimental settings (e.g., size of training data and test
data), data preprocessing (if any), and the evaluation metrics used.
• Experimental Results. Present the experimental results using text,
figures, or tables. Describe the patterns observed in the data, create
appropriate visualizations to illustrate the results, and interpret the
findings from the experiment.
• Conclusions. Conclude your course project. Briefly describe what you
have gained in this project.
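As an illustration of the experimental settings and evaluation metrics that
the Methodology and Experimental Results components ask for, the sketch below
reports the training and test set sizes, a cross-validated accuracy, and a
confusion matrix. It is a hedged example only: the iris dataset, the decision
tree classifier, the 70/30 split, and the 5-fold cross-validation are
assumptions made for the example, not requirements of the project.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy dataset standing in for the project's own data.
X, y = load_iris(return_X_y=True)

# Experimental setting: 70% training data, 30% test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)
print(f"Training samples: {len(X_train)}, test samples: {len(X_test)}")

clf = DecisionTreeClassifier(max_depth=3, random_state=42)

# 5-fold cross-validation on the training data gives a more stable
# estimate than a single split.
cv_scores = cross_val_score(clf, X_train, y_train, cv=5)
print(f"5-fold CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Fit on the full training set, then evaluate once on the test set.
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Per-class precision, recall, and F1 score.
print(classification_report(y_test, y_pred))
```

Numbers such as these can be copied into a results table, and the confusion
matrix is a natural candidate for a figure in the report.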
The expected length of the project report is 6 to 8 pages (single column,
using the template provided by the instructor).
III. Resources
Hints on datasets: Here are some possible data sources, but many more exist:
o UC Irvine Machine Learning Repository http://archive.ics.uci.edu/ml/
o Kaggle, "The Home of Data Science & Machine Learning"
https://www.kaggle.com/
o U.S. open government data http://www.data.gov/
The UC Irvine Machine Learning Repository and Kaggle both provide many
featured datasets that are suitable for machine learning tasks such as
classification and clustering. Some datasets require considerable domain
knowledge to interpret, while others are easier to understand. On the
government data website, you can search by keyword (for example,
‘Education’) and then download a specific dataset.
Hints on Problem Definition and Methodology: See the course modules in
Canvas and the textbook.