Project completed and this blog text edited by: Matteo Tegnenti, Emanuele Marisio and Marco Compagnoni.

We wrote a Python program that can (1) infer a user’s profile data from their friends’ data and (2) use this data to compare the profile similarity of two users.

We had problems with some complex attributes, such as education, which are lists of dictionaries of lists. We managed to generalize the similarity algorithm so that it now works with these complex attributes.

For the inference part, the program can only infer single values, such as gender or age.

Below are the two algorithms. Fig. 2 and Fig. 3 show example results obtained by running the program on the rawData graph.

We used the formulas from the papers.

1) Infer value:

Algorithm to determine the minimum number of votes f and the minimum percentage of votes t for a user U:

  1. Infer the item values for each of U’s friends, with incrementally increasing f and t (blue circles in Fig. 1);

  2. Compare the inferred values with the friends’ existing values (i.e., compute the modularity);

  3. Keep the maximum modularity together with the f and t with which it was computed.
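The search above can be sketched as a small grid search. The function and argument names here are our own, and we stand in for the paper’s modularity with a generic scoring callable:

```python
def best_thresholds(friends, infer, score, f_values, t_values):
    """Grid-search over (f, t), keeping the pair with the highest score.

    friends:  list of friend ids whose values we try to infer
    infer:    callable (friend, f, t) -> inferred value or None
    score:    callable (dict of friend -> inferred value) -> float
              (the paper scores with modularity; any agreement
              measure slots in here)
    """
    best_f, best_t, best_score = None, None, float("-inf")
    for f in f_values:
        for t in t_values:
            inferred = {u: infer(u, f, t) for u in friends}
            s = score(inferred)
            if s > best_score:
                best_f, best_t, best_score = f, t, s
    return best_f, best_t, best_score
```

The same loop works for any attribute, as long as `infer` and `score` agree on what an inferred value looks like.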


Fig. 1: Simple graph with the gender attribute on each vertex.


Suppose we want to determine the gender of U. First, we infer the gender for each of U’s friends F:

  1. Set f and t;

  2. For each friend F, we infer the value with the current f and t:

    1. We consider only the mutual friends of U and F (the green circles connected to U in Fig. 1);

    2. Count the votes for each value of the gender attribute among these mutual friends;

    3. Compute the percentage of votes for each value;

    4. The value with the highest count is the inferred value if its vote count is greater than f and its vote percentage is greater than t; otherwise the value is discarded;

  3. After inferring the values, we compare them with the friends’ existing values and compute the modularity.

  4. Compare this modularity with the maximum modularity found so far and keep the greater, then return to step 2 with incremented f and t;

  5. The optimal f and t are those associated with the maximum modularity.
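A minimal sketch of the vote-counting inference in steps 2.2–2.4, assuming the mutual friends’ known attribute values are already collected in a list (the function name and the None-for-unknown convention are ours):

```python
from collections import Counter

def infer_value(mutual_values, f, t):
    """Infer an attribute value from mutual friends' known values.

    mutual_values: attribute values of the mutual friends (None = unknown)
    f: minimum number of votes; t: minimum fraction of votes
    Returns the winning value, or None if the thresholds are not met.
    """
    votes = Counter(v for v in mutual_values if v is not None)
    total = sum(votes.values())
    if total == 0:
        return None
    value, count = votes.most_common(1)[0]
    if count > f and count / total > t:
        return value
    return None
```

For example, `infer_value(["M", "M", "F", "M"], f=2, t=0.5)` accepts "M" (3 votes out of 4), while stricter thresholds make the same input return None.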

2) Similarity:

Consideration: according to homophily theory, we assume that the friends of a target user are similar to him/her, so the profile items of friends can guide us in computing the similarity of a stranger.

OF (Occurrence Frequency): “mismatches on more frequent values are assigned higher similarity”. That is, if two values are non-identical but both highly frequent in the dataset, their similarity is higher.

We chose OF, a measure that gives 1 for two identical profile item values and, for non-identical values, gives a non-zero value computed from the frequency of each value in the dataset.
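As a sketch, the OF measure can be written as follows. The `counts` dictionary and variable names are our own; the mismatch formula follows the standard OF definition from the categorical-similarity literature:

```python
import math

def of_similarity(x, y, counts, n):
    """Occurrence-frequency similarity between two categorical values.

    counts: value -> number of occurrences in the dataset
    n: total number of records in the dataset
    Identical values score 1; mismatches on frequent values score
    closer to 1 than mismatches on rare values.
    """
    if x == y:
        return 1.0
    return 1.0 / (1.0 + math.log(n / counts[x]) * math.log(n / counts[y]))

# A mismatch on two frequent values scores higher than one on rare values:
counts = {"rock": 40, "pop": 50, "noise": 2, "drone": 3}
frequent_mismatch = of_similarity("rock", "pop", counts, 100)
rare_mismatch = of_similarity("noise", "drone", counts, 100)
```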

Classification of items: items on a social network profile are classified along two orthogonal dimensions.

  • First classification: whether the item is single-valued (gender) or multi-valued (hobbies);
  • Second classification: whether the item contains multiple sub-fields (education contains degree, school, type, etc.) or not.
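For illustration, a hypothetical profile covering the combinations of the two dimensions (all field names and values here are made up):

```python
# Hypothetical profile illustrating the two classification dimensions:
profile = {
    "gender": "F",                      # single value, no sub-fields
    "hobbies": ["chess", "hiking"],     # multiple values, no sub-fields
    "education": [                      # multiple values, each with sub-fields
        {"school": "Politecnico", "degree": "BSc",
         "type": ["computer science"]},
    ],
}
```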

Single value similarity for a given attribute is computed as follows:

  • It is 1 if U’s item value is equal to X’s item value;

  • Otherwise we count the number of U’s friends sharing U’s value as RIU and the number of U’s friends sharing X’s value as RIX;

  • If RIU or RIX is zero, the single value similarity is not computed. Otherwise we compute A and B and return a non-zero value.
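A hedged sketch of this step. Since the exact formulas for A and B are not reproduced here, we stand in with an OF-style score over the friend counts RIU and RIX; treat the final line as a placeholder, not the paper’s formula:

```python
import math

def single_value_similarity(u_value, x_value, friend_values):
    """Single-value item similarity between users U and X.

    friend_values: known values of this item among U's friends.
    The paper combines two quantities A and B derived from RIU and
    RIX; their exact formula is not reproduced here, so the last
    line is an OF-style stand-in over the friend counts.
    """
    if u_value == x_value:
        return 1.0
    n = len(friend_values)
    riu = friend_values.count(u_value)   # friends sharing U's value
    rix = friend_values.count(x_value)   # friends sharing X's value
    if riu == 0 or rix == 0:
        return None                      # similarity not computed
    return 1.0 / (1.0 + math.log(n / riu) * math.log(n / rix))
```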

Multiple value similarity for a given attribute is computed as follows:

  • For each value h in U’s item, we call the SimItem method, which returns the maximum OF between h and each value k in X’s item.

  • These SimItem values are then aggregated, and the itemSimilarity is returned.
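A minimal sketch of this aggregation, with the OF measure passed in as a callable and a plain average as the aggregation (the paper’s aggregation may differ):

```python
def multi_value_similarity(u_items, x_items, of):
    """Multiple-value item similarity: for each value in U's item
    take the best OF match in X's item, then average the best
    matches (the average is our choice for this sketch)."""
    if not u_items or not x_items:
        return None
    best_matches = [max(of(h, k) for k in x_items) for h in u_items]
    return sum(best_matches) / len(best_matches)
```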

Multiple sub-fields similarity for a given attribute is computed as follows:

  • For each field in U’s item:

    • If the field value is a single value item, we compute the single value similarity;

    • If the field value is a multiple value item, we compute the multiple value similarity;

    • If the field value itself has multiple sub-fields, we recursively compute the multiple sub-fields similarity;

  • Then we aggregate the returned values and return a non-zero value.
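The three cases above suggest a recursive function. This sketch dispatches on the Python type of the field (scalar, list, or dict) and averages the sub-scores; the averaging is our choice, not necessarily the paper’s:

```python
def item_similarity(u_item, x_item, of):
    """Recursive item similarity: scalars via OF, lists via the best
    match per value, dicts field by field. Returns None when a side
    is missing or no sub-score could be computed."""
    if u_item is None or x_item is None:
        return None
    if isinstance(u_item, dict) and isinstance(x_item, dict):
        scores = [item_similarity(u_item[k], x_item.get(k), of)
                  for k in u_item]
        scores = [s for s in scores if s is not None]
        return sum(scores) / len(scores) if scores else None
    if isinstance(u_item, list) and isinstance(x_item, list):
        if not u_item or not x_item:
            return None
        scores = [max(item_similarity(h, k, of) or 0.0 for k in x_item)
                  for h in u_item]
        return sum(scores) / len(scores)
    return of(u_item, x_item)
```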

Profile similarity:

To compute the profile similarity between U’s and X’s profiles, we compute the similarity of each item in U’s profile and then take the weighted average, with an importance weight for each item.
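A sketch of this weighted average, assuming a per-item similarity callable and an importance weight per item name (the names and the skip-on-None convention are ours):

```python
def profile_similarity(u_profile, x_profile, weights, item_sim):
    """Weighted average of per-item similarities over U's profile.

    weights:  item name -> importance (the blog uses random weights)
    item_sim: callable (u_value, x_value) -> similarity, or None
              when the similarity cannot be computed
    Items with no computable similarity are left out of the average.
    """
    total, weight_sum = 0.0, 0.0
    for item, u_value in u_profile.items():
        s = item_sim(u_value, x_profile.get(item))
        if s is None:
            continue
        w = weights.get(item, 1.0)
        total += w * s
        weight_sum += w
    return total / weight_sum if weight_sum else None
```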


Fig. 2: Example of similarity between two users.

Attribute -> similarity of that attribute.

Number: the total similarity value.

(We used random weights for the attributes.)


Fig. 3: the graph with 1000 users for 1 referrer (using the plot() method).

These profile similarities were computed with random importance weights.