Spotted Unipd: machine-learn your sweet half


What is this? Another Facebook-sentiment analysis?

What can you do with ~5000 posts just downloaded from one of the most visited Facebook pages where Unipd students chat about (true) love? The obvious answer would be: simply sentiment-analyze them! Yet there are a few drawbacks:

  1. a few posts are not in Italian: some are written in dialects, in English …
  2. the meaning is very easy to understand, and the content does not vary much among posts
  3. the typical post is 45 words long, and each word is about 5 characters long, so there is not much data from which to learn anything really interesting about the writers' opinions
  4. furthermore, the style differs a lot among posts: one post may be all sad and depressed while the next is extremely happy and joyful
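
The two figures in point 3 (words per post and characters per word) are simple to compute. Here is a minimal sketch, assuming the posts are already available as plain strings; the function name `post_stats` and the sample posts are made up for illustration and are not taken from the repository.

```python
from statistics import mean


def post_stats(posts):
    """Return (average words per post, average characters per word)."""
    # Split every post on whitespace; punctuation stays attached to words.
    words_per_post = [post.split() for post in posts]
    all_words = [word for words in words_per_post for word in words]
    avg_words = mean(len(words) for words in words_per_post)
    avg_chars = mean(len(word) for word in all_words)
    return avg_words, avg_chars


# Hypothetical sample data, not real posts from the page.
sample_posts = [
    "I saw you today in the library",
    "You smiled at me on the bus",
]
avg_words, avg_chars = post_stats(sample_posts)
print(f"{avg_words:.1f} words per post, {avg_chars:.1f} chars per word")
```

Splitting on whitespace is the crudest possible tokenizer; a real analysis would probably strip punctuation first.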

So what is this?

PySpottedUnipd is simply an attempt to get analytical insights into the most successful and popular posts on the official Spotted Unipd Facebook page. As you can imagine, the repository is divided into 2 (+ 1) main blocks:

  • the scraper, which downloads all the data, given a Facebook API token
  • the ml-analyzer, which carries out sample analyses on the data
  • the n-gram generator, which generates sample n-grams based on the posts' text content
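
To give an idea of what the third block deals with, here is a minimal sketch of extracting and counting word n-grams from a list of posts. It is not the repository's implementation: the names `ngrams` and `top_ngrams`, the lowercasing and the whitespace tokenization are all assumptions made for this example.

```python
from collections import Counter


def ngrams(text, n):
    """Return the list of word n-grams (as tuples) found in text."""
    tokens = text.lower().split()
    # A text with fewer than n tokens yields an empty list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(posts, n, k):
    """Return the k most frequent n-grams across all posts."""
    counts = Counter()
    for post in posts:
        counts.update(ngrams(post, n))
    return counts.most_common(k)


# Hypothetical sample data, not real posts from the page.
sample_posts = ["your eyes are blue", "Your eyes shine"]
print(top_ngrams(sample_posts, n=2, k=1))
```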

Ok, I understand … but what have you found?

  1. Timing is not relevant. Other research has found that a Facebook post published in a specific time frame is more likely to become popular. In reality, because the page is managed by admins, you can’t post in some time frames (say, from 23:00 to 7:00), simply because the admins do not publish at those times. As a consequence, the timing of a post becomes a less relevant factor in determining how many likes (or comments, or shares) it gets.

     [Figure: fitting likes count]

  2. Describe in detail, but not as a poet. As you can see from the image below, there are 3 main categories a post can fall into:

    • you can write as many details about your sweet half as you want, but the post will probably never get liked (or shared), thereby reducing your chances of finding your sweet half

    • but if you decide to write many long words about your love, and you choose them very carefully, the post may get liked and shared, although certainly not enough to repay the work of finding those words

    • simply letting your feelings flow about those eyes, those legs, and those … whatever you liked the most about him/her can be very profitable: you get the most likes and shares this way!

    So, finally: be yourself, use funny descriptions and use hashtags (or words longer than 11 chars).

    [Figure: 3 clusters]

  3. Do not underestimate the power of long words. Together with the number of likes and the number of shares, the numbers of words longer than 9, 11 and 7 characters are the top 5 factors in determining the number of comments, which are useful if you want to get in touch with your future love. So don’t be scared to use descriptive words (and a few hashtags): the important thing is that your post looks like a love declaration rather than an excerpt from the Divina Commedia.

     [Figure: k best comments]
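
The "words longer than N characters" features mentioned in the findings above are cheap to derive from the raw text. Here is a minimal sketch, assuming that "longer than" means strictly more than N characters and that hashtags count as single words; the function name `long_word_counts` and the sample post are invented for illustration and may differ from what the ml-analyzer actually does.

```python
def long_word_counts(text, thresholds=(7, 9, 11)):
    """Count, for each threshold, the words strictly longer than it."""
    # Whitespace tokenization: a hashtag such as "#love" is one word.
    words = text.split()
    return {t: sum(1 for w in words if len(w) > t) for t in thresholds}


# Hypothetical sample post.
post = "those unforgettable eyes #bibliotecadelbo"
print(long_word_counts(post))
```

Each count can then be used as a numeric column next to the likes and shares counts when ranking features against the number of comments.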

Understood, where can I try it?

I hope the repository guide is self-explanatory, but essentially what you do is:

mkdir github-projects && cd github-projects  # create tmp directory
git clone  # or wget
cd spotted-unipd  # enter local repo
pip3 install . --upgrade --force-reinstall  # install
  -t <access token to use Facebook API>
  -m <min number of posts to fetch>
  -f <format of output file [json, csv]>
  -o <path to output file>  # run script
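
For readers curious how options like the four above are typically parsed in Python, here is a minimal sketch using `argparse`. Only the flag letters and their meanings come from the usage listed above; the destination names, the defaults and the choice of which flags are required are assumptions, and the real script may be organized differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the -t/-m/-f/-o options (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Sketch of a post scraper CLI")
    parser.add_argument("-t", dest="token", required=True,
                        help="access token to use the Facebook API")
    parser.add_argument("-m", dest="min_posts", type=int, required=True,
                        help="min number of posts to fetch")
    parser.add_argument("-f", dest="fmt", choices=["json", "csv"], default="json",
                        help="format of the output file")
    parser.add_argument("-o", dest="output", required=True,
                        help="path to the output file")
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(
    ["-t", "TOKEN", "-m", "100", "-f", "csv", "-o", "posts.csv"]
)
print(args.token, args.min_posts, args.fmt, args.output)
```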

Stuff todo