More Instructions
Project 3
Deadline:
Submit by midnight Monday, 20th of October 2019.
Evaluation:
35% of your final course grade.
Late Submission:
No late submissions will be accepted, since this is the last week of the semester.
Work:
This assignment is to be done in groups of up to three students. You will need to fill out and submit a form (to be provided) indicating your contribution to the project. You will be asked to evaluate your group members’ contributions as well as your own.
Identical grades are not guaranteed for each student in a group.
Purpose:
To work in a group setting and to apply all machine learning and data mining skills learned so far to a real-world problem. Build a software package that demonstrates the application of your work and present it to the class. This project addresses learning outcomes 1-5 from the course outline.
Project outline:
Create a data science artefact/deliverable, which could consist of a notebook or a standalone application. This artefact will apply machine learning and data mining techniques to a chosen real-world problem domain.
Investigate what kind of ongoing research is taking place in the chosen domain, and either replicate parts of this research or attempt something novel by modifying or extending the algorithms, or by augmenting the applications with a richer set of features.
Some possible domains and ideas are:
1. Financial and market data analysis: time-series analysis, time-series forecasting, stock market prediction.
2. Recommender engine: create an application for making recommendations based on user preferences.
3. Fitness data: analysis of your personal or some group’s FitBit data.
4. Twitter: sentiment analysis, text classification, semantic analysis, network visualisation, geospatial visualisation, data storage, etc.
5. Facebook: network visualisation, geospatial visualisation, network analysis, natural language processing, data storage, etc.
6. Data journalism: data visualisation, infographics, text summarisation and classification, natural language semantic analysis.
7. Interesting real-time correlations: Twitter discussions about financial instruments and their shifts in price index etc.
8. A project based on a Kaggle dataset.
9. Process mining.
10. ...or something entirely different.
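As an illustration of idea 2, the following is a minimal sketch of a user-based collaborative-filtering recommender, assuming a small in-memory dictionary of ratings. The user names, item names, ratings, and the function names cosine_similarity and recommend are all hypothetical and chosen only for this example; a real project would likely work with a much larger dataset and an established library.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not real data.
RATINGS = {
    "ana": {"film_a": 5, "film_b": 3, "film_c": 4},
    "ben": {"film_a": 4, "film_b": 2, "film_c": 5, "film_d": 5},
    "cat": {"film_a": 1, "film_b": 5, "film_d": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted mean rating."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with no overlap
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only score items the user has not seen
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [(item, round(score, 2)) for score, item in ranked[:top_n]]


if __name__ == "__main__":
    print(recommend("ana", RATINGS))
```

The weighted mean keeps predicted scores on the same scale as the input ratings, which makes them easier to interpret in a notebook. This is only one of several reasonable baselines.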
This project makes up a considerable proportion of your total mark. Therefore, your final work must be substantial. Form your groups early and come up with topics for your group at the earliest possible stage so that you can commence work on development. You are required to register your project and your team composition on the class Google Doc.
You are encouraged to use Python; however, this is not an absolute prerequisite for all parts of your project. If you are building a GUI-based application, Python does possess libraries that facilitate this; however, you can also use Qt or technologies like .NET, which can call the Python methods that implement your application.
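One possible way for a non-Python front end to call Python code is to run a small script as a separate process and exchange JSON over the command line and standard output. The sketch below assumes that approach; the predict function, the --values flag, and the output key "prediction" are hypothetical placeholders, not part of the assignment.

```python
import argparse
import json
import sys


def predict(values):
    """Hypothetical stand-in for a project's model: returns the mean of the inputs."""
    return sum(values) / len(values) if values else 0.0


def main(argv=None):
    parser = argparse.ArgumentParser(description="JSON bridge for a GUI front end")
    parser.add_argument(
        "--values",
        default="[]",
        help="JSON array of numbers, e.g. '[1, 2, 3]'",
    )
    args = parser.parse_args(argv)
    values = json.loads(args.values)
    # Write a single JSON object to stdout so any calling language can parse it.
    json.dump({"prediction": predict(values)}, sys.stdout)
    sys.stdout.write("\n")
    return 0


if __name__ == "__main__":
    main()
```

A GUI written in another technology could launch this script with its own process API and read the JSON line from standard output. Other bridging options exist, and the right choice depends on the project.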
Project Requirements:
Project details:
· Submit all your application code and experimental code as a mixture of .py and Notebook files, as appropriate for your project. Each group should submit at least one Notebook that contains all the key findings and summaries.
· Present and demonstrate your project to the class in a 15-minute presentation.
· Submit a document outlining the contribution that each person has made to the project. List in detail what each person has done and the percentage of the total contribution. Not all team members will necessarily receive the same mark.
Marking criteria:
Marks will be awarded for different components of the project using the following rubric:
· Project presentation and demonstration: 20%
· Originality: 15%
· Project Python code, Notebooks, application of data science, and the substance and difficulty of the work undertaken: 35%
If you have any questions or concerns about this assignment, please ask the lecturer early rather than close to the submission deadline.