MSc-Projects
General Information
A warning.
I have very high expectations and I don't believe in doing work last minute. The projects require you to be good at coding or good at math. If you tick both boxes, then you'll enjoy them a lot. However, if neither applies, then you might not enjoy these projects. I am looking for excellent students and ideally, I'd like to use the research to publish together in case the outcome is extremely good.
What should you do if you are interested?
If you want me to consider supervising you for a particular project, you need to send me an email with the following subject "EOI for project FMT-XX" where FMT-XX is the identifier for the project you are interested in. For example, if you are interested in taking the project "Fake news detection in large graphs" then you would put "EOI for project FMT-FN" in the subject line. In the body of the email you should briefly provide the following information:
why you are interested in the project;
what things about the project you expect would be particularly challenging;
what cool and interesting things you hope to do with the project;
why you would be well suited to the project;
your CV.
Once project booking closes for the students and after thinking about it, I will decide who to offer my projects to. In case there are multiple people interested in a single project, I will use the information above to decide who to offer it to.
When can you come and talk about the project?
If you would like to discuss the suitability of my projects, you can come to see me during my normal office hours:
Tuesdays, 10:30am - 12:30pm (online).
In addition, I will be holding two extra office hours, during which times you can also come to my office to discuss my projects:
17:00 - 18:00 on Thursday 28 November;
10:00 - 11:00 on Wednesday 4 December.
I'm afraid I'm unable to meet with students to talk about my projects outside of these times.
My office is Bush House Centre Block (N)5.12.
What are the expectations?
Your project is an extremely important part of your degree. You should expect to invest a significant amount of time and effort, and so it's important that the project you choose is something that interests you. You should take great care when writing your report. Your report will be marked by examiners who have not supervised your project, and so won't know how wonderful your implementation is unless you tell them in the report. Make notes throughout your project and start working on drafts of the different chapters of your report as soon as possible. Of course, your implementation is also important. You need to implement something interesting in a sophisticated way in order to be able to write a really good report about it.
You should think carefully about how you are going to evaluate your project. This is something students often don't do very well. If your evaluation involves collecting any kind of data from humans then you need to apply for ethical approval. In most cases this will involve completing the Minimal Ethical Risk Registration Process. Details on this are here and here.
You may wish to make use of existing libraries in your project, which is fine but you must make clear in your report where you have used libraries and what exactly you have developed for your project. Keep in mind that to get a good mark we need to be able to see that you have contributed something sophisticated to the system you develop.
I will provide feedback on your report only once. Please keep in mind though that I supervise a lot of students and am also busy with many other commitments. I will aim to get feedback to you as quickly as possible but this may take up to two weeks (especially as we get towards the deadline, which is when many students want feedback). You may prefer to submit early chapters of your report for feedback as soon as they're ready. It's usually the case that receiving feedback early on can help you in writing the rest of your report.
Remember, it's your project! My role is to discuss your ideas with you, help you understand what's expected, and give you feedback on things you've done. I'm not here to tell you what to do or how to do it. You shouldn't expect me to help you with your code or come up with ideas for you. If you decide to do everything last minute, then unfortunately, chances are that I won't have time to help you. So, please, spread your work as evenly as possible.
Project FMT-FN: Fake news detection in large graphs
The influence of fake news is undoubtedly one of the biggest threats to democracy that our era faces. Because the phenomenon is so new, research on fake news is still in its infancy. Nonetheless, there have been a number of important discoveries, among the most insightful being:
Bots have a tremendous impact on spreading fake news [1] and fake news exhibits a community structure [2].
This is the starting point of the project, which consists of two steps.
The first step is to analyse labeled real-world graphs (e.g., Twitter) with the goal of finding patterns common among fake news sites.
In the second part, based on the findings of the analysis, the goal is to develop and evaluate a simple data-driven model to detect fake news.
It is extremely helpful to have a good understanding of statistics and outlier detection.
[1] C. Shao, G. Ciampaglia, O. Varol, K. Yang, A. Flammini, F. Menczer. The spread of low-credibility content by social bots. Nature Communications'18
[2] K. Starbird. Examining the Alternative Media Ecosystem Through the Production of Alternative Narratives of Mass Shooting Events on Twitter. ICWSM'17
Project FMT-ComSearch: Finding communities in large graphs
The goal of this project is to study a powerful algorithm for community detection: the Louvain algorithm.
The goal of the algorithm---and community detection in general---is to determine which nodes of a graph (say Facebook or Twitter) belong together (for example, in the sense that all of the people of the group know each other in real life). Ideally, we would like to be able to do this by just using the link-structure, i.e., only by looking whether nodes are connected by an edge in the graph and not by using any additional information provided (e.g., Facebook profile) since this information is not always available.
The starting point of this project will be to understand an influential paper by Newman: https://journals.aps.org/pre/abstract/10.1103/PhysRevE.94.052315 .
The Louvain algorithm makes its decisions based on an objective function, the modularity, that it tries to optimise. Newman shows that, under strong assumptions, optimising this modularity is a good idea because it gives exactly the same result as maximum likelihood methods. This is very surprising since, typically, maximum likelihood methods are the most reliable methods possible, but are very costly to compute (long runtime etc.). On the other hand, the Louvain algorithm is very efficient and thus this provides a very efficient way of computing the maximum likelihood.
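As a concrete reference point, the modularity of a given partition can be computed directly from an edge list. The following is a minimal Python sketch, not code taken from Newman's paper or from any Louvain implementation. It assumes an undirected, unweighted graph without self-loops, and the function name `modularity`, the input format, and the `resolution` parameter (1.0 gives the standard definition) are illustrative choices.

```python
from collections import defaultdict


def modularity(edges, community, resolution=1.0):
    """Modularity of a partition of an undirected, unweighted graph.

    edges:      iterable of (u, v) pairs, each edge listed once, no self-loops
    community:  dict mapping every node to its community label
    resolution: weight of the null-model term (assumption: 1.0 = standard modularity)
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")

    intra_edges = defaultdict(int)  # edges with both endpoints in the community
    degree_sum = defaultdict(int)   # sum of node degrees in the community

    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra_edges[community[u]] += 1

    # Q = sum over communities c of  L_c / m  -  resolution * (d_c / 2m)^2
    return sum(
        intra_edges[c] / m - resolution * (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_block = {node: "a" for node in range(6)}

print(modularity(edges, two_triangles))  # 5/14, roughly 0.357
print(modularity(edges, one_block))      # 0.0
```

Splitting the example graph into its two triangles scores higher than putting every node into one community, which is the kind of comparison the Louvain algorithm makes when it decides whether to move a node.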
There are two possible directions, both of which require changing the definition of the modularity slightly:
1) Find out how to change the modularity so that the resulting algorithm works better on real-world graphs (better than the original algorithm).
2) Find out how to change the modularity so that one can prove a stronger claim than Newman.
Ideally, the student can work on both parts (which requires very good math skills + good implementation skills). Alternatively, it would also be great if the student understands the modularity function well enough to propose new algorithms and test them on real-world graphs (good math skills + very good implementation skills).
Project FMT-HCluster: Comparing Hierarchical Clustering Algorithms
This project focuses on understanding and characterising the limitations of hierarchical clustering on graphs (see https://towardsdatascience.com/understanding-the-concept-of-hierarchical-clustering-technique-c6e8243758ec).
The goal is to test many different hierarchical clustering algorithms on a wide variety of datasets, ranging from social networks to protein-protein interaction networks. A question that might also be answered is whether it is possible to test whether a dataset has an underlying hierarchy.
Strong implementation skills are required.
Projects:
Implement Louvain (https://en.wikipedia.org/wiki/Louvain_method) and test it using arbitrary datasets
UCI datasets, arbitrary clustering algorithms
Implement PCA/SVD + linkage algorithm (see https://dl.acm.org/doi/10.1145/3321386) using arbitrary datasets
Protein-protein interaction networks (e.g., https://snap.stanford.edu/biodata/datasets/10000/10000-PP-Pathways.html), arbitrary clustering algorithms
Stochastic Block Model (see https://dl.acm.org/doi/10.1145/3321386). Try different values for p and q and different community sizes. This is synthetic data and understanding the model requires a good math background.
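For the Stochastic Block Model item, synthetic graphs can be generated in a few lines. The sketch below assumes the common symmetric variant, in which p is the probability of an edge between two nodes in the same community and q is the probability of an edge between nodes in different communities. That reading of p and q, and the function name `sample_sbm`, are assumptions for illustration and are not taken from the linked paper.

```python
import random


def sample_sbm(sizes, p, q, seed=None):
    """Sample an undirected graph from a symmetric stochastic block model.

    sizes: list of community sizes, e.g. [50, 50, 20]
    p:     edge probability inside a community (assumed meaning of p)
    q:     edge probability between communities (assumed meaning of q)
    Returns (labels, edges): labels[i] is the community of node i,
    edges is a list of (u, v) pairs with u < v.
    """
    rng = random.Random(seed)
    labels = [block for block, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            prob = p if labels[u] == labels[v] else q
            if rng.random() < prob:
                edges.append((u, v))
    return labels, edges


# Two communities of different sizes; community structure gets harder
# to recover as q approaches p.
labels, edges = sample_sbm([30, 20], p=0.5, q=0.05, seed=0)
print(len(labels), len(edges))
```

The double loop takes time quadratic in the number of nodes, which is fine for small experiments but would need a smarter sampler for large sparse graphs.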
Project FMT-BRAIN: Learning Hierarchical Concepts in the Brain
This project focuses on understanding how the brain works. Specifically, it addresses the question of how concepts that have structure are represented in the brain. In https://arxiv.org/pdf/1909.04559.pdf a model for hierarchically structured concepts is introduced and it is shown how a biologically plausible neural network can recognise these concepts, and how it can learn them in the first place. The goal is to simulate the network in various settings and to extend the model in various ways.
The second goal of the project is a data analysis of different brain topologies (e.g., http://networkrepository.com/bn.php).
Project FMT-FIN: Predicting The Stock Market
UPDATE: No spots left, please have a look at the Fake News topic.
The goal is to predict the behaviour of the stock market. You can use ARMA type models (time series), neural networks and other means.
There is a lot of data available, e.g., finance.yahoo.com. See also: https://www.quantshare.com/sa-620-10-new-ways-to-download-historical-stock-quotes-for-free
You can, for example, predict the Dow Jones (DJI). You do not need a very profound understanding of the stock market for this project!
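To give a feel for the time-series approach, the sketch below fits an AR(1) model, the simplest member of the ARMA family, by ordinary least squares and produces a one-step-ahead forecast. It is only an illustration under stated assumptions: a full ARMA model would also include moving-average terms and is usually fitted with a statistics library, the function names `fit_ar1` and `forecast_next` are hypothetical, and the series is made-up data, not real market prices.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] + noise. Returns (c, phi)."""
    x = series[:-1]  # predictors: previous values
    y = series[1:]   # targets: next values
    n = len(x)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x)
    if var_x == 0:
        raise ValueError("series is constant; phi is not identifiable")
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = cov_xy / var_x
    c = mean_y - phi * mean_x
    return c, phi


def forecast_next(series, c, phi):
    """One-step-ahead forecast from the last observed value."""
    return c + phi * series[-1]


# Made-up, noise-free series generated by x[t] = 1 + 0.5 * x[t-1].
series = [0.0, 1.0, 1.5, 1.75, 1.875]
c, phi = fit_ar1(series)
print(c, phi)                          # close to 1.0 and 0.5
print(forecast_next(series, c, phi))   # close to 1.9375
```

On real price data it is common to model returns or differences instead of raw prices, and to evaluate forecasts on data that was not used for fitting.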
Projects:
DJI
gold price
Brent crude oil
FTSE
Apple