Word Count: 2,460

1 Executive Summary

In this paper we use negative binomial distribution (NBD) count models to examine the behavior of Wharton MBA students on the messaging platform GroupMe. With data from the Wharton 2018 GroupMe, in which nearly all class members are users, we fit NBD models to three activities: posting a message, being mentioned in a message, and liking a message. Over the observation period, users post about 3.4 times as often as they are mentioned and like posts about 7.7 times as often as they post. Being mentioned is the most concentrated activity (a small number of students account for most mentions), while liking messages is the least concentrated. Finally, we fit separate NBD models for each gender and find no statistically significant difference between the behavior of females and males in the Wharton 2018 GroupMe.

2 Background

2.1 Objective

Our objective with this Wharton 2018 GroupMe analysis is to answer the following questions:

  1. Are the rates of posting, being mentioned, and liking posts different?
  2. Is there variation in the concentration of posting, being mentioned, and liking posts?
  3. Do females and males interact differently with the platform through the lens of posting, being mentioned, and liking posts?

2.2 GroupMe Platform

You have just decided to get your MBA at Wharton. After paying your deposit and joining the Facebook group, the next thing you do is join the class GroupMe. GroupMe is a messaging service created in 2010 and later acquired by Skype (and thus now a Microsoft holding). Unlike WhatsApp or iMessage, GroupMe is designed for group messaging rather than one-on-one conversations. As such, it has become the messaging platform du jour for university students, as it supports groups with hundreds of users. Below is a screenshot of the Wharton 2018 GroupMe that shows the following three primary actions:

  1. Posts - messages sent by users
  2. Mentions - @’ing another user, which sends them an alert
  3. Likes - heart-ing a post to show you like it

Figure 2.1: Screenshot showing posts, mentions, and likes

The data in this analysis is from the “Wharton - 2018” GroupMe group (often just referred to as the Wharton 2018 GroupMe). GroupMe has an API that allows developers to access groups and messages. After creating an access token, we built a pipeline to acquire and process the users and messages from GroupMe for this group (see this data processing documentation for details). After parsing the JSON responses and cleaning the data, we created a dataset of simple tables, illustrated in the diagram below:

Figure 2.2: GroupMe data organization

2.3 Wharton 2018 GroupMe

There are 811 users in the Wharton 2018 GroupMe, covering the approximately 850 members of the Wharton 2018 MBA class. Though the group was created in January 2016, we trimmed the dataset to start on August 8, 2016 (first day of pre-term) to provide an accurate window in which to observe the actions of the users. In other words, all users have the same observation period. We removed users from the dataset that have left the group and discuss the possibility of late joiners in the Limitations section. The last post in our dataset is 2017-02-22 07:17:45, thus covering 198 days or about 28 weeks. There have been 4,921 posts by 570 distinct users. Below is a time series of the posts:

From the plot above we see a great deal of daily volatility. Below is a plot of a 7-day rolling average that helps smooth out spikes and exhibit the trend.

2.4 Count Datasets

2.4.1 Three Events

The three actions that we will investigate (posts, mentions, and likes) each arise from count processes and thus deserve a count model (i.e. NBD).

Count datasets arising from the Wharton 2018 GroupMe

Posts
  Individual-level story: Users in the Wharton 2018 GroupMe can post as many times as they would like - there is no upper bound. Thus we can think of each user as having a post rate, \(\lambda\), in the observed time window.
  Source of heterogeneity: Users interact with GroupMe differently. Some post a lot, some have never posted. However, all users have the same opportunity to post.

Mentions
  Individual-level story: Users in the Wharton 2018 GroupMe can be mentioned any number of times - there is no upper bound. Other users can create a new post and mention them. Unlike posting, being mentioned is not within the individual’s control. Nevertheless, we can think of each user as having a mention rate, \(\lambda\), during the observed time window that determines how many times they will be mentioned.
  Source of heterogeneity: Popularity. In all seriousness, some users of the group will be mentioned more than others. Some will not be mentioned at all. Heterogeneity arises from the social construct.

Likes
  Individual-level story: The number of posts a user has liked is, strictly speaking, a choice dataset, as there is a finite number of opportunities to like a post (i.e. the number of posts). However, given the high upper bound, we can reasonably view this dataset as a count process. As such, each user has some like rate, \(\lambda\), during the observed time window that determines how many posts they like. An individual can be someone who likes every post or who has never liked a post.
  Source of heterogeneity: Users have different levels of engagement on the Wharton 2018 GroupMe. Thus, it follows that there will be variation in like rates within the user population.

We might expect to observe differences in heterogeneity for each of the three events. For example, we would presume that there is more heterogeneity in like rate than in post rate, as liking is less visible and less risky to one’s reputation than posting in a group of 811.

2.4.2 Gender

In addition to the three behaviors that are the primary interest of this analysis, we included an attribute of the user: gender. We will use this to determine whether there are differences in posting, being mentioned, or liking between male and female Wharton students.

3 NBD Model

3.1 Posts

In the plot below we show the distribution of posts per user. The distribution is positively skewed with a long right tail. There are a few users who have posted more than 50 times, but the majority are less active. The median number of posts per user is 2, though the mean is 6.07 posts (sd = 11.3).

The data is of the form:

Number of users for count of posts in period (bottom 10)
posts users
0 241
1 108
2 81
3 67
4 43
5 39
6 36
7 20
8 32
9 8

We fit an NBD model, and also a zero-inflated NBD given the notable spike at 0. We estimate the parameters using maximum likelihood (MLE), method of moments, and means and zeros. The MLE for the zero-inflated model gives \(\pi = 0\), so an extra spike at zero does not help describe the data.

NBD parameter estimates for posts by estimation method
model r alpha pi
MLE 0.4200 0.0692
MLE (Zero-Inflated) 0.4200 0.0692 0
Method of Moments 0.3026 0.0499
Means and Zeros 0.4561 0.0752

We note the divergence between the method of moments estimates and the MLE / means and zeros estimates. The large sample standard deviation, 11.3, shrinks the method of moments estimate of alpha, since \(\hat{\alpha} = \frac{\bar{x}}{s^2-\bar{x}}\), and this in turn yields a smaller \(\hat{r} = \hat{\alpha}\bar{x}\).
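The two non-MLE estimators can be sketched in a few lines to see where the divergence comes from. This is a minimal sketch rather than the code used for the analysis: the function names are hypothetical, the inputs are the rounded summary statistics reported in this paper (mean 6.07 and sd 11.3; 241 zero-posters and 4,921 posts among 811 users), and the means-and-zeros equation is solved here by plain bisection, which is an assumption about the solver.

```python
import math

def nbd_method_of_moments(mean, sd):
    """Match the sample mean and variance: alpha = mean / (var - mean), r = alpha * mean."""
    var = sd ** 2
    if var <= mean:
        raise ValueError("the NBD needs a variance larger than the mean")
    alpha = mean / (var - mean)
    return alpha * mean, alpha

def nbd_means_and_zeros(mean, p_zero, iters=200):
    """Match the sample mean (r / alpha) and the share of zeros, (alpha / (alpha + 1)) ** r."""
    if not math.exp(-mean) < p_zero < 1.0:
        raise ValueError("needs more zeros than a Poisson with the same mean")
    target = math.log(p_zero)
    lo, hi = 1e-9, 1e3  # assumed search bracket for alpha
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # log P(0) with r = alpha * mean substituted in; it decreases as alpha grows
        log_p0 = mid * mean * math.log(mid / (1.0 + mid))
        if log_p0 > target:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    return alpha * mean, alpha

r_mom, alpha_mom = nbd_method_of_moments(mean=6.07, sd=11.3)
r_mz, alpha_mz = nbd_means_and_zeros(mean=4921 / 811, p_zero=241 / 811)
print(f"method of moments: r = {r_mom:.4f}, alpha = {alpha_mom:.4f}")
print(f"means and zeros:   r = {r_mz:.4f}, alpha = {alpha_mz:.4f}")
```

Because the inputs are rounded, the output should agree with the table only to about three decimal places.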

Below is a table that shows the estimated number of users for post counts of five or fewer under each of the three parameter estimation techniques. A plot showing all post counts follows. We see that the methods are not that different, but the method of moments performs the worst, overestimating the number of zero-posters by 82 users (323 vs. 241).

Estimated number of users for posts (<= 5) by different estimation methods
posts Actual MLE Method of Moments Means and Zeros
0 241 257 323 241
1 108 101 93 102
2 81 67 58 69
3 67 51 42 53
4 43 40 33 42
5 39 33 27 35
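The MLE column above can be approximately reproduced from the fitted parameters using the standard forward recursion for the NBD, \(P(0) = \left(\frac{\alpha}{\alpha+1}\right)^r\) and \(P(x) = P(x-1)\,\frac{r+x-1}{x(\alpha+1)}\). The sketch below is illustrative only: the function name is hypothetical and the inputs are the rounded MLE estimates reported for posts.

```python
def nbd_expected_counts(r, alpha, n_users, max_count):
    """Expected number of users with 0..max_count events under an NBD(r, alpha)."""
    p = (alpha / (alpha + 1.0)) ** r  # P(X = 0)
    expected = [n_users * p]
    for x in range(1, max_count + 1):
        # forward recursion: P(x) = P(x - 1) * (r + x - 1) / (x * (alpha + 1))
        p *= (r + x - 1.0) / (x * (alpha + 1.0))
        expected.append(n_users * p)
    return expected

# Rounded MLE estimates for posts and the 811 users in the group
expected_posts = nbd_expected_counts(r=0.4200, alpha=0.0692, n_users=811, max_count=5)
for x, e in enumerate(expected_posts):
    print(x, round(e, 1))
```

The recursion avoids gamma functions entirely, which is convenient for small counts, though it accumulates one multiplication per step.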

In order to perform the \(\chi^2\) goodness-of-fit test for the NBD model, we need to roll up the right tail so that at least 80% of the cells have an expected count greater than 5. We create a 25+ bucket so that 84.6% of the cells have an expected count greater than 5. We calculate the \(\chi^2\) test statistic and \(p\)-value for each parameter estimation method using 25 - 2 - 1 = 22 degrees of freedom. Based on the very small \(p\)-values shown below, we reject the hypothesis that the data came from the NBD model for every estimation method. Nevertheless, the plot above shows a relatively good fit, at least for the estimates from MLE and means and zeros.

Goodness of fit test for posts
model chisq p.value
MLE 55.92 0.000088
Method of Moments 102.61 0.000000
Means and Zeros 53.72 0.000180
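The mechanics of the test can be sketched as follows. This is a minimal standard-library sketch, not the code behind the table: all function names are hypothetical, and because the full frequency table is not reproduced in this paper, the example feeds in the reported \(\chi^2\) statistics rather than recomputing them. The \(p\)-value uses the closed-form recurrence for the chi-square upper tail with integer degrees of freedom.

```python
import math

def roll_up_tail(values, cutoff):
    """Collapse the cells cutoff, cutoff + 1, ... into a single 'cutoff+' cell."""
    return list(values[:cutoff]) + [sum(values[cutoff:])]

def chi_square_stat(observed, expected):
    """Pearson statistic: sum over cells of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    for _ in range((df - 1) // 2):
        # Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1)
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

# p-values for the chi-square statistics reported for posts (22 degrees of freedom)
for method, chisq in [("MLE", 55.92), ("Method of Moments", 102.61), ("Means and Zeros", 53.72)]:
    print(method, f"{chi2_sf(chisq, 22):.6f}")
```

With observed and expected counts in hand, `chi_square_stat(roll_up_tail(observed, 25), roll_up_tail(expected, 25))` would give the statistic for a 25+ bucket.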

3.2 Mentions

As with posts, we start by looking at the distribution of the number of times a user has been mentioned, both in graphic form and in the table below. Like posts, mentions are positively skewed with a long right tail: one user has 40 mentions. The median number of mentions for a user is 0, though the mean is 1.76 mentions (sd = 3.66).

Number of users for count of mentions in period (bottom 10)
mentions users
0 416
1 151
2 91
3 38
4 24
5 23
6 11
7 9
8 12
9 6

We perform the parameter estimation using the same techniques and again find that the zero-inflated model adds nothing (\(\pi = 0\)). As with posts, the method of moments estimates for mentions are quite different from the estimates by MLE and means and zeros.

NBD parameter estimates for mentions by estimation method
model r alpha pi
MLE 0.3651 0.2073
MLE (Zero-Inflated) 0.3651 0.2073 0
Method of Moments 0.2660 0.1511
Means and Zeros 0.3919 0.2226

Estimated number of users for mentions (<= 8) by different estimation methods
mentions Actual MLE Method of Moments Means and Zeros
0 416 426 473 416
1 151 129 109 133
2 91 73 60 76
3 38 48 39 50
4 24 33 28 34
5 23 24 21 25
6 11 18 16 18
7 9 13 12 14
8 12 10 10 10

The plot below shows that the parameter estimates by MLE and means and zeros fit quite well.

As before, to perform the \(\chi^2\) goodness-of-fit test for the NBD model, we need to roll up the right tail so that at least 80% of the cells have an expected count greater than 5. We create a 10+ bucket so that 100% of the cells have an expected count greater than 5. We calculate the \(\chi^2\) test statistic and \(p\)-value for each parameter estimation method using 10 - 2 - 1 = 7 degrees of freedom. Though the plot above looked quite good, the \(p\)-values shown below are all under 0.05, so we again reject the hypothesis that the data came from the NBD model at the 5% level. The method of moments fit is by far the worst.

Goodness of fit test for mentions received
model chisq p.value
MLE 17.83 0.01275
Method of Moments 43.69 0.00000
Means and Zeros 16.52 0.02079

3.3 Likes

We rinse and repeat, following the same process for likes as we did for posts and mentions. We note that the tail is a bit longer for likes as some users do a lot of post-liking. The median number of likes given is 20 likes though the mean is 46.83 likes (sd = 78.51).

Number of users for count of likes given in period (bottom 10)
likes users
0 61
1 39
2 25
3 28
4 23
5 19
6 25
7 12
8 16
9 17

A careful observer of the plot above may have noted that the magnitude of the counts is quite large. This is problematic when calculating gamma functions. For example, \(\Gamma(100) \approx 9.3 \times 10^{155}\). Now imagine \(\Gamma(600)\). To handle this, we used log-gamma and log-factorial functions and restated the first term of the NBD equation as

\[\begin{equation} \frac{\Gamma(r + x)}{\Gamma(r)\, x!} = \exp\left\{ \mathrm{lgamma}(r + x) - \left[ \mathrm{lgamma}(r) + \mathrm{lfactorial}(x) \right] \right\} \end{equation}\]
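The same idea can be illustrated in Python, where `math.lgamma` plays the role of the log-gamma function and `math.lgamma(x + 1)` stands in for the log-factorial. This is a hedged sketch with hypothetical function names, using the rounded MLE estimates reported for likes; it keeps the whole probability on the log scale and exponentiates only at the end.

```python
import math

def nbd_log_pmf(x, r, alpha):
    """log P(X = x) for an NBD(r, alpha), computed entirely on the log scale."""
    log_coef = math.lgamma(r + x) - (math.lgamma(r) + math.lgamma(x + 1))
    # log of (alpha / (alpha + 1))^r * (1 / (alpha + 1))^x
    log_probs = r * (math.log(alpha) - math.log1p(alpha)) - x * math.log1p(alpha)
    return log_coef + log_probs

def nbd_pmf(x, r, alpha):
    return math.exp(nbd_log_pmf(x, r, alpha))

# Rounded MLE estimates for likes
r_likes, alpha_likes = 0.5358, 0.0114
print(round(811 * nbd_pmf(0, r_likes, alpha_likes), 1))  # expected number of users with 0 likes
print(nbd_pmf(600, r_likes, alpha_likes))                # finite even for very large counts
```

Working with the logarithm of each term means no intermediate quantity ever comes near the size of \(\Gamma(600)\).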

We estimate the parameters using each of the three methods as before. Again the zero-inflated model adds nothing (\(\pi = 0\)), and the method of moments estimates are quite different from the MLE and means and zeros estimates.

NBD parameter estimates for likes by estimation method
model r alpha pi
MLE 0.5358 0.0114
MLE (Zero-Inflated) 0.5358 0.0114 0
Method of Moments 0.3585 0.0077
Means and Zeros 0.5898 0.0126

Below is a comparison of the expected counts for the left-end of the likes distribution:

Estimated number of users for likes given (<= 8) by different estimation methods
likes Actual MLE Method of Moments Means and Zeros
0 61 73 141 61
1 39 39 50 36
2 25 30 34 28
3 28 25 26 24
4 23 22 22 21
5 19 19 19 19
6 25 18 17 18
7 12 16 15 16
8 16 15 14 15

Aside from the large spike at zero for the method of moments, the MLE and means and zeros models do not look too bad. However, we can see quite a few gray spikes above the blue and green lines in the 10-30 range, indicating poor fit there.

Finally, we perform the \(\chi^2\) goodness-of-fit test, first rolling up the right tail so that at least 80% of the cells have an expected count greater than 5. We create a 35+ bucket so that 88.9% of the cells have an expected count greater than 5. We calculate the \(\chi^2\) test statistic and \(p\)-value for each parameter estimation method using 35 - 2 - 1 = 32 degrees of freedom. Based on the \(p\)-values shown below (both above 0.05), we cannot reject the hypothesis that the data came from the NBD model for the MLE and means and zeros estimation methods. The model estimated by the method of moments fits poorly.

Goodness of fit test for likes given
model chisq p.value
MLE 41.64 0.1183
Method of Moments 115.11 0.0000
Means and Zeros 39.12 0.1806

4 Results

4.1 Activity Rates

Let’s get to the answers to our questions. Below is a summary of the parameter estimates (using MLE) for the three behaviors in question. We see that the rates do in fact differ across activities. On average, a user posts about 3.4 times as often as they are mentioned, and likes posts about 7.7 times as often as they post. So, for the 198 days thus far, you have on average liked about 47 posts, posted about 6 times, and been mentioned about twice. The magnitudes of the variance (and of the standard deviation shown below) follow the same hierarchy and mirror the observed values.

Summary of model parameters, mean, and standard deviation
variable r alpha E[X] sd[X]
posts 0.4200 0.0692 6.068 9.682
mentions 0.3651 0.2073 1.761 3.202
likes 0.5358 0.0114 46.834 64.347
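The last two columns follow directly from the NBD moments, \(E[X] = r/\alpha\) and \(\mathrm{var}[X] = r/\alpha + r/\alpha^2\). A small sketch with a hypothetical helper name, using the rounded parameters from the table (so the likes row in particular matches only approximately):

```python
import math

def nbd_mean_sd(r, alpha):
    """Mean and standard deviation of an NBD(r, alpha) count."""
    mean = r / alpha
    variance = mean + r / alpha ** 2
    return mean, math.sqrt(variance)

# Rounded MLE estimates from the summary table
fits = {
    "posts": (0.4200, 0.0692),
    "mentions": (0.3651, 0.2073),
    "likes": (0.5358, 0.0114),
}
for activity, (r, alpha) in fits.items():
    mean, sd = nbd_mean_sd(r, alpha)
    print(f"{activity:9s} E[X] = {mean:7.3f}  sd[X] = {sd:7.3f}")
```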

We can also look at the distributions of the three rates, identified as \(\lambda\) in our NBD model, to understand user heterogeneity. In the plot below we see that there is the most heterogeneity in like rate, the least heterogeneity in mention rate, and the post rate is in the middle. As \(r < 1\) for all three distributions, none has an interior mode: each gamma density is highest near zero and goes to \(\infty\) as \(\lambda \to 0\). At this point we have answered question 1: there are differences in post, mention, and like rates.

4.2 Concentration

Now we can answer question 2: is there variation in the concentration of each of the GroupMe activities? Using the Lorenz curves and the 80/20 rule highlighted below, we see that being mentioned is concentrated in the fewest users (20% of users account for 1 - 28% = 72% of the mentions). This follows intuitively from the histogram in the NBD model section. In contrast, likes are the least concentrated (20% of users account for 1 - 36% = 64% of the likes). So, we find that mentions are more concentrated than likes, with posts in between. However, the differences are not substantial: the share held by the top 20% of users ranges only from 64% to 72%.
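For readers unfamiliar with the 80/20 reading of a Lorenz curve, the sketch below shows the calculation on raw per-user counts. It is illustrative only: the function names and the toy data are hypothetical, and it works on observed counts, which is not necessarily how the curves in this paper were produced.

```python
def lorenz_curve(counts):
    """Points (share of users, share of events), with users sorted from least to most active."""
    ordered = sorted(counts)
    total = sum(ordered)
    if total == 0:
        raise ValueError("need at least one event")
    points = [(0.0, 0.0)]
    running = 0
    for i, c in enumerate(ordered, start=1):
        running += c
        points.append((i / len(ordered), running / total))
    return points

def top_share(counts, top_fraction=0.2):
    """Share of all events accounted for by the most active top_fraction of users."""
    ordered = sorted(counts, reverse=True)
    k = max(1, round(top_fraction * len(ordered)))
    return sum(ordered[:k]) / sum(ordered)

# Toy data (hypothetical): ten users and their event counts
toy_counts = [0, 0, 0, 0, 1, 1, 2, 3, 5, 8]
print(lorenz_curve(toy_counts)[8])  # bottom 80% of users hold 35% of events
print(top_share(toy_counts))        # so the top 20% hold 65%
```

The top-20% share is simply one minus the height of the Lorenz curve at 80% of users, which is the "1 - 28% = 72%" arithmetic used above.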

4.3 Gender Differences

We move into treacherous waters: asking if there are differences between the genders. To answer question 3 we start with side-by-side histograms of each of the three activities, scaled for differences in the number of females (375) and males (436).

Next, we fit an NBD model for each activity, for each gender and combined, using MLE with and without a spike at zero. Based on the (large) table below we immediately see that none of the zero-inflated models is needed: every one estimates \(\pi = 0\) and leaves the log-likelihood unchanged. We also note that the parameter estimates are quite similar between the genders.

Estimated model parameters and log-likelihood for each model by gender
activity gender model r alpha pi ll
posts Female MLE 0.4425 0.0676 -1051.3
posts Male MLE 0.4022 0.0711 -1154.0
posts Combined MLE 0.4200 0.0692 -2206.5
mentions Female MLE 0.3799 0.2092 -649.9
mentions Male MLE 0.3528 0.2059 -729.9
mentions Combined MLE 0.3651 0.2073 -1380.1
likes Female MLE 0.5785 0.0113 -1815.1
likes Male MLE 0.5049 0.0117 -2012.0
likes Combined MLE 0.5358 0.0114 -3829.8
posts Female MLE (Zero-Inflated) 0.4425 0.0676 0 -1051.3
posts Male MLE (Zero-Inflated) 0.4022 0.0711 0 -1154.0
posts Combined MLE (Zero-Inflated) 0.4200 0.0692 0 -2206.5
mentions Female MLE (Zero-Inflated) 0.3799 0.2092 0 -649.9
mentions Male MLE (Zero-Inflated) 0.3528 0.2059 0 -729.9
mentions Combined MLE (Zero-Inflated) 0.3651 0.2073 0 -1380.1
likes Female MLE (Zero-Inflated) 0.5785 0.0113 0 -1815.1
likes Male MLE (Zero-Inflated) 0.5049 0.0117 0 -2012.0
likes Combined MLE (Zero-Inflated) 0.5358 0.0114 0 -3829.8

So, we move to plotting the expected counts of each activity for females and males based on the NBD model. We see that the expected counts are remarkably similar for each activity, though the \(r\) and \(\alpha\) parameters are slightly different.

Using the Lorenz curves below we see that the activities are a bit more concentrated for males than for females. We can see this in the histogram at the beginning of this section. For each activity, there are more males than females who are hardcore non-posters, not-mentioned, and non-likers. However, these differences are minimal.

Lastly, we use the likelihood ratio test (with \(4 - 2 = 2\) degrees of freedom) to determine whether separate female and male models explain the behavior better than a combined model. The \(p\)-values below all exceed 0.05, so for every activity the two separate models do not fit significantly better than the combined model. So, we have answered question 3: females and males use the platform in a similar fashion.

Likelihood ratio test to determine whether separate female and male models are appropriate
activity Female Male Combined chisq p.value
posts -1051.3 -1154.0 -2206.5 2.396 0.3019
mentions -649.9 -729.9 -1380.1 0.443 0.8013
likes -1815.1 -2012.0 -3829.8 5.346 0.0690
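The test statistic is \(2\left[(LL_{female} + LL_{male}) - LL_{combined}\right]\), and with 2 degrees of freedom the chi-square upper tail has the closed form \(p = e^{-\chi^2/2}\). The sketch below uses hypothetical function names; note that the log-likelihoods printed in this paper are rounded to one decimal, so the statistic recomputed from them only approximates the reported one.

```python
import math

def lr_statistic(ll_female, ll_male, ll_combined):
    """Twice the log-likelihood gain from fitting the two genders separately."""
    return 2.0 * ((ll_female + ll_male) - ll_combined)

def p_value_df2(chisq):
    """Chi-square upper-tail probability for exactly 2 degrees of freedom."""
    return math.exp(-chisq / 2.0)

# Statistic recomputed from the rounded log-likelihoods for posts
chisq_posts = lr_statistic(-1051.3, -1154.0, -2206.5)
print(round(chisq_posts, 2), round(p_value_df2(chisq_posts), 3))

# p-values from the reported statistics
for activity, chisq in [("posts", 2.396), ("mentions", 0.443), ("likes", 5.346)]:
    print(activity, round(p_value_df2(chisq), 4))
```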

5 Limitations

  1. Users that joined after August 8, 2016 or left and rejoined. While we removed users that left during the observation period, this analysis assumes that all 811 users were in the Wharton 2018 GroupMe for the entire duration of the observation period. We know of a few cases where this is not true, but they are rare. Unfortunately, the GroupMe API does not expose the date a user joined a group. There are system-created posts when users join or when existing users add new users, but the way that GroupMe has stored this data has changed over time. The source of truth is GroupMe-generated, human-readable text, which would need to be extensively parsed; we did not do so here.
  2. Interdependence of activities. Not discussed in this paper is the interdependence of posts, mentions, and likes. An extension of this analysis would explore the interaction between the three.