code development environment?
I have feedback from my professor and I need to modify my proposal. The most important part is to find the data.
It is clear you have not looked at the data to understand what it represents. I should not be doing this work for you, but I had a look at the data and can give you a hint: you should research libsvm data format in relation to the data you say you want to use.
Without understanding of the data, you have of course also not defined the research problem: what are your inputs and what output is your ML system going to try and extract from it.
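As a starting point for the libsvm hint, here is a minimal sketch of reading one line in that format. In libsvm format each line is a label followed by sparse `index:value` pairs, so the label would be the output and the pairs the inputs. The function name `parse_libsvm_line` and the sample line are made up for illustration and are not taken from the Kaggle files. The sketch also assumes plain integer feature indices, with no `qid:` tokens.

```python
def parse_libsvm_line(line):
    """Parse one libsvm-format line into (label, {feature_index: value})."""
    # Drop an optional trailing comment and surrounding whitespace.
    body = line.split("#", 1)[0].strip()
    if not body:
        raise ValueError("empty line")
    tokens = body.split()
    # The first token is the target (for example 1 = click, 0 = no click).
    label = float(tokens[0])
    features = {}
    # Every remaining token is a sparse "index:value" pair.
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


# Hypothetical example line, for illustration only.
label, features = parse_libsvm_line("1 3:1 17:0.5 204:1")
print(label, features)
```

Printing a handful of parsed lines from the real file is a quick way to see what the label column contains and how many distinct feature indices there are.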
It would be good to say which approaches you are going to compare in your investigation.
Saying “supervised, unsupervised and reinforcement learning” is too generic and I don’t think you intend to compare all three.
You talk about comparing user-centric and content-centric approaches, but as far as I can see you do not have the data to do this comparison. Once again, analyse what you’ve got first: make sure you understand the data fully and convince yourself it is sufficient for your problem. Otherwise, either change the problem to suit the data or find more suitable data to suit the problem. (Also note that your Kaggle test data (https://www.kaggle.com/c/cs420-2019-ctr-prediction/data) does not appear to have ground truth. I think they may want you to submit the solution to get the performance score. I am not sure it is still possible to submit as the competition is closed. Think about how you would get around this issue.)
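One possible way around the missing test ground truth is to hold out part of the labelled training data and evaluate on that instead. The sketch below shows the idea under stated assumptions: the function name `holdout_split`, the 20% fraction and the toy rows are all made up for illustration. A purely random split may also not be appropriate if the rows are ordered in time.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle labelled rows and split them into (train, validation) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_valid = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_valid:], shuffled[:n_valid]


# Toy rows of (label, row_id), for illustration only.
rows = [(i % 2, i) for i in range(10)]
train, valid = holdout_split(rows, test_fraction=0.2, seed=42)
print(len(train), len(valid))
```

The validation part then plays the role of the unlabelled Kaggle test set, and every method being compared is scored on the same held-out rows.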
In your proposal you talk about measuring things like cost per click and average order value, investigating ways to maximise profit etc. Once again, you don’t have data to investigate all this as far as I can see (if I am wrong, tell me how you want to do this).
Reminder: research is not literature survey.
Your proposal still does not have a list of deliverables.
About your resources:
Jupyter notebook is too generic. Are you planning to use any Python ML libraries?
I am also concerned about you listing GitHub as a resource. You cannot just use other people’s code for this project. I hope you understand.
Finally, the text of your proposal is still not in your own words and consists of plucked out sentences with marginal modifications. You have been warned that this is not acceptable.
There are some points we need to address to avoid problems down the line.
1. Problem definition: make it explicit in the proposal. You say it is targeted advertisement. So, your problem can be defined as that of prediction of some behaviour output X (e.g. click or no click) based on the input of content and user profile (you will need to discuss how you can represent/quantify “user profile” and “content type”).
2. Data: “Kaggle advertisement database” is vague. The availability of specific data will dictate the problem formulation to a large extent.
3. Novelty: it looks like you are doing an investigation/experiment type dissertation where you are comparing different methods to solve a specific well-defined problem (as discussed above). The methods compared should be qualitatively different and not just tweaks of the same pipeline.
Please note: solutions to the Kaggle competitions related to the data are available. However, you must not base your investigation exclusively on the existing implementations. You must devise and implement a solution to the problem yourself or adapt some existing method to the problem. The method at the core of your implemented approach should be different from the implemented approaches presented on Kaggle but not necessarily better performing. Then you compare your solution to the results for the method(s) published on Kaggle and analyse the differences in performance. Always make it explicit when you quote the results obtained by somebody else. If you run existing implementations for comparison yourself, make it explicit that you used somebody else’s solution and implementation.
4. Scope: we will need to narrow down to a set of qualitatively different approaches you are going to test as potential solutions to the defined problem (see above) and make it clear which one(s) are your own design and implementation (and if there is any novelty in the algorithm you propose) and which one(s) are third-party. Don’t try to do too many, rather try to find maybe 3 that are fundamentally different (“quality over quantity”). The set must be exactly defined after you have done your literature survey. At this stage however, you can discuss scope definition in more generic terms in your objectives, similar to the way I describe it here.
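To compare different approaches on the click / no-click problem from point 1, each method needs to be scored with the same metric on the same held-out data. The sketch below is one way this could look, not something the feedback prescribes: the choice of log loss is an assumption, as are the function names `log_loss` and `base_rate_predictions` and the toy labels. The base-rate predictor is only a reference point that any real method should beat.

```python
import math


def log_loss(labels, probabilities, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    if not labels or len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must be non-empty and equal length")
    total = 0.0
    for y, p in zip(labels, probabilities):
        # Clip so that log(0) can never occur.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def base_rate_predictions(train_labels, n):
    """Predict the training click rate for every one of n rows."""
    rate = sum(train_labels) / len(train_labels)
    return [rate] * n


# Toy labels, for illustration only (1 = click, 0 = no click).
train_labels = [0, 0, 0, 1]
valid_labels = [0, 1, 0, 0]
baseline = base_rate_predictions(train_labels, len(valid_labels))
print("baseline log loss:", log_loss(valid_labels, baseline))
```

Each candidate method would produce its own list of probabilities for the validation rows and be passed through the same `log_loss` call, so the numbers are directly comparable.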
Further, I have made comments in the pdf of your proposal. See the document attached (hover the mouse over the highlighted text to see my comments).
NB! Apart from the technical issues listed above, my biggest concern is the way you write by plucking out sentences from different sources with minimal paraphrasing. This is bordering on plagiarism and could result in very serious consequences if you write your dissertation like that. I must warn you about it now because I don’t want you to get problems later on. Believe me, it is immediately clear to the reader when you do it as the style of writing is evidently not your own and very eclectic with references to things not mentioned anywhere else in the text. In the pdf I give some tips how to approach this in a better way while still using the sources. I think you need to re-write your proposal introduction in your own words based on my feedback. Also review your sources as many of them are not academic.
To summarise, using the suggestions above and my comments in the pdf, develop your proposal further. Here are the key points:
Content-wise you need to include points 1-4 listed above in the objectives/deliverables section of the proposal to make it better defined and also write a project plan with a timeline.
Presentation-wise you need to re-write the introduction in your own words.
How about including your code development environment? Are you going to code in Python? Which libraries are you going to use?
Any hardware needs? A GPU?
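One way to answer the environment question concretely is to record the Python version and which libraries are actually installed. The sketch below uses only the standard library. The list in `CANDIDATE_LIBRARIES` is an assumption about what might be used, not a recommendation from the feedback, and the script does not detect a GPU.

```python
import importlib.util
import platform

# Hypothetical shortlist; replace with the libraries the project really uses.
CANDIDATE_LIBRARIES = ["numpy", "pandas", "sklearn", "xgboost", "torch"]


def describe_environment(libraries=CANDIDATE_LIBRARIES):
    """Return the Python version and whether each library can be imported."""
    report = {"python": platform.python_version()}
    for name in libraries:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

The printed list can go straight into the resources section of the proposal in place of "Jupyter notebook".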
A lot of your references are from non-academic sources. You should stick to papers from reputable venues. A random webpage is not an academic source.