Word Count: 2,460
In this paper we use NBD count models to examine the behavior of Wharton MBA students on the messaging platform GroupMe. With data from the Wharton 2018 GroupMe, in which nearly all class members are users, we fit NBD models to the activities of posting a message, being mentioned in a message, and liking a message. In the observation period we find that users post approximately 3.4 times as often as they are mentioned and like posts nearly 8 times as often as they post. Moreover, we find that being mentioned is the most concentrated activity (a small number of students account for most mentions) while liking messages is the least concentrated. Finally, we build NBD models for each gender and find no significant difference between the behaviors of females and males in the Wharton 2018 GroupMe.
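Our objective with this Wharton 2018 GroupMe analysis is to answer the following questions:
1. Are there differences in post, mention, and like rates?
2. Is there variation in the concentration of each of the GroupMe activities?
3. Are there differences between the genders?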
You have just decided to get your MBA at Wharton. After paying your deposit and joining the Facebook group, the next thing you do is join the class GroupMe. GroupMe is a messaging service created in 2010 and later acquired by Skype (and thus a Microsoft holding). Unlike WhatsApp or iMessage, GroupMe is designed for group messaging rather than one-on-one conversations. As such, it has become the messaging platform du jour for university students, as it supports groups with hundreds of users. Below is a screenshot of the Wharton 2018 GroupMe that shows the three primary actions: posting a message, mentioning another user in a message, and liking a message.
The data in this analysis is from the “Wharton - 2018” GroupMe group (often just referred to as the Wharton 2018 GroupMe). GroupMe has an API that allows developers to access groups and messages. After creating an access token, we built a pipeline to acquire and process the users and messages from GroupMe for this group (see this data processing documentation for details). After parsing the JSON and cleaning the data, we created a dataset of simple tables illustrated in the diagram below:
There are 811 users in the Wharton 2018 GroupMe, covering most of the approximately 850 members of the Wharton 2018 MBA class. Though the group was created in January 2016, we trimmed the dataset to start on August 8, 2016 (the first day of pre-term) so that all users share the same observation period. We removed users who have left the group from the dataset and discuss the possibility of late joiners in the Limitations section. The last post in our dataset is at 2017-02-22 07:17:45, so the window covers 198 days, or about 28 weeks. There have been 4,921 posts by 570 distinct users. Below is a time series of the posts:
From the plot above we see a great deal of daily volatility. Below is a plot of a 7-day rolling average that helps smooth out spikes and exhibit the trend.
The three actions that we will investigate (posts, mentions, and likes) each arise from count processes and thus deserve a count model (i.e. NBD).
Event | Individual-level Story | Source of Heterogeneity |
---|---|---|
Posts | Users in the Wharton 2018 GroupMe can post as many times as they would like - there is no upper bound. Thus we can think of each user as having a post rate, \(\lambda\), in the observed time window. | Users interact with GroupMe differently. Some post a lot, some have never posted. However, all users have the same opportunity to post. |
Mentions | Users in the Wharton 2018 GroupMe can be mentioned any number of times - there is no upper bound. Other users can create a new post and mention them. Unlike the posts event, being mentioned is not within the individual's control. Nevertheless, we can think of each user as having a mention rate, \(\lambda\), during the observed time window that determines how many times they will be mentioned. | Popularity. In all seriousness, some users of the group will be mentioned more than others. Some will not be mentioned at all. Heterogeneity arises from the social construct. |
Likes | The number of posts a user has liked is a choice dataset, as there is a finite number of opportunities to like a post (i.e. the number of posts). However, given the high upper bound, we can reasonably view this dataset as a count process. As such, each user has some like rate, \(\lambda\), during the observed time window that determines how many posts they like. An individual can be someone that likes every post or has never liked a post. | Users have different levels of engagement on the Wharton 2018 GroupMe. Thus, it follows there will be variation in like rates within the user population. |
We might expect to observe differences in heterogeneity for each of the three events. For example, we would presume that there is more heterogeneity in like rate than in post rate as liking is less visible and risky than posting (to one’s reputation) in a group of 811.
In addition to the three behaviors that are the primary interest of this analysis, we included an attribute of the user: gender. We will use this to identify whether there are differences in posting, being mentioned, or liking between male and female Wharton students.
In the plot below we show the distribution of posts per user. The distribution is positively skewed with a long right tail. There are a few users that have posted more than 50 times, but the majority are less active. The median number of posts per user is 2, though the mean is 6.07 posts per user (sd = 11.3).
The data is of the form:
posts | users |
---|---|
0 | 241 |
1 | 108 |
2 | 81 |
3 | 67 |
4 | 43 |
5 | 39 |
6 | 36 |
7 | 20 |
8 | 32 |
9 | 8 |
We fit an NBD model, as well as a zero-inflated NBD given the notable spike at 0, estimating the parameters by MLE, method of moments, and means and zeros. We find through MLE that the zero-inflated model does not help describe the data, as \(\pi = 0\).
model | r | alpha | pi |
---|---|---|---|
MLE | 0.4200 | 0.0692 | |
MLE (Zero-Inflated) | 0.4200 | 0.0692 | 0 |
Method of Moments | 0.3026 | 0.0499 | |
Means and Zeros | 0.4561 | 0.0752 | |
We note the divergence between the method of moments estimates and the MLE / means and zeros estimates. The large sample standard deviation, 11.3, shrinks the method of moments estimate of alpha, since \(\hat{\alpha} = \frac{\bar{x}}{s^2-\bar{x}}\), and in turn shrinks \(r\), since \(\hat{r} = \hat{\alpha}\bar{x}\).
Below is a table that shows the estimated number of users for post counts of five or fewer under the three parameter estimation techniques. A plot showing all post counts follows. We see that the methods are not that different, but method of moments clearly performs the worst, overestimating the number of users with zero posts by 82.
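The two closed-form style estimators can be sketched in a few lines. This is a minimal illustration under assumptions, not the code used for the paper: the function names are hypothetical, and the means and zeros equation \((\alpha / (\alpha + 1))^{\alpha \bar{x}} = P(0)\) is solved here by simple bisection.

```python
import math

def nbd_method_of_moments(mean, sd):
    """Match the sample mean and variance (hypothetical helper)."""
    var = sd ** 2
    if var <= mean:
        raise ValueError("NBD requires overdispersion (variance > mean)")
    alpha = mean / (var - mean)
    r = alpha * mean
    return r, alpha

def nbd_means_and_zeros(mean, p_zero, lo=1e-9, hi=1e6, iters=200):
    """Match the sample mean and the observed share of zeros.

    Solves (alpha / (alpha + 1)) ** (alpha * mean) = p_zero by bisection;
    the left-hand side is decreasing in alpha, so the root is unique.
    """
    if not math.exp(-mean) < p_zero < 1:
        raise ValueError("need exp(-mean) < p_zero < 1 for a solution")

    def gap(alpha):
        return alpha * mean * math.log(alpha / (alpha + 1)) - math.log(p_zero)

    for _ in range(iters):
        mid = (lo + hi) / 2
        if gap(mid) > 0:
            lo = mid   # root lies above mid
        else:
            hi = mid   # root lies at or below mid
    alpha = (lo + hi) / 2
    return alpha * mean, alpha

# Posts: mean 6.07, sd 11.3, and 241 of 811 users with zero posts.
print(nbd_method_of_moments(6.07, 11.3))
print(nbd_means_and_zeros(6.068, 241 / 811))
```

With the rounded summary statistics for posts, both calls land close to the corresponding rows of the estimates table; small differences come from rounding the inputs.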
posts | Actual | MLE | Method of Moments | Means and Zeros |
---|---|---|---|---|
0 | 241 | 257 | 323 | 241 |
1 | 108 | 101 | 93 | 102 |
2 | 81 | 67 | 58 | 69 |
3 | 67 | 51 | 42 | 53 |
4 | 43 | 40 | 33 | 42 |
5 | 39 | 33 | 27 | 35 |
In order to perform the \(\chi^2\) goodness-of-fit test for the NBD model, we need to roll up the right tail so that at least 80% of the cells have an expected count greater than 5. We create a 25+ bucket so that 84.6% of the cells have an expected count greater than 5. We calculate the \(\chi^2\) test statistic and \(p\)-value for each parameter estimation method using 25 - 2 - 1 = 22 degrees of freedom. Based on the \(p\)-values shown below, we reject the hypothesis that the data came from the NBD model under every estimation method. Nevertheless, the plot above shows a relatively good fit, at least for the estimates from MLE and means and zeros.
model | chisq | p.value |
---|---|---|
MLE | 55.92 | 0.000088 |
Method of Moments | 102.61 | 0.000000 |
Means and Zeros | 53.72 | 0.000180 |
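The roll-up and test statistic described above can be sketched as follows. This is an illustrative sketch only: `pooled_chisq` is a hypothetical name, and the toy geometric model and toy counts in the usage lines are made up for the example, not taken from the GroupMe data. The \(p\)-value would then come from a \(\chi^2\) survival function (for example `scipy.stats.chi2.sf`), which is left out to keep the block dependency-free.

```python
def pooled_chisq(observed, pmf, tail_start):
    """Chi-square statistic with counts >= tail_start pooled into one cell.

    observed[x] is the number of users with count x (x = 0, 1, 2, ...),
    pmf(x) is the fitted model probability of count x.
    Returns (statistic, share of cells with expected count > 5).
    """
    n = sum(observed)
    # Observed cells: 0 .. tail_start - 1, then the pooled right tail.
    obs = list(observed[:tail_start]) + [sum(observed[tail_start:])]
    probs = [pmf(x) for x in range(tail_start)]
    # The tail cell gets whatever probability mass is left over.
    expected = [n * p for p in probs] + [n * (1 - sum(probs))]
    stat = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
    share_ok = sum(e > 5 for e in expected) / len(expected)
    return stat, share_ok

# Toy example: 16 users, geometric model P(x) = 0.5 ** (x + 1), tail pooled at 2+.
stat, share_ok = pooled_chisq([10, 2, 3, 1], lambda x: 0.5 ** (x + 1), tail_start=2)
print(stat, share_ok)
```

Returning the share of cells with an expected count above 5 makes it easy to move `tail_start` until the 80% rule of thumb is met.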
As with posts, we start by looking at the distribution of the number of times a user has been mentioned, both in graphic form and in the table below. Like posts, mentions are positively skewed with a long right tail - one user has 40 mentions. The median number of mentions for a user is 0, though the mean is 1.76 mentions (sd = 3.66).
mentions | users |
---|---|
0 | 416 |
1 | 151 |
2 | 91 |
3 | 38 |
4 | 24 |
5 | 23 |
6 | 11 |
7 | 9 |
8 | 12 |
9 | 6 |
We perform the parameter estimation using the same techniques and find that the zero-inflated model again adds nothing (\(\pi = 0\)). As with posts, the method of moments estimates for mentions are quite different from the estimates by MLE and means and zeros.
model | r | alpha | pi |
---|---|---|---|
MLE | 0.3651 | 0.2073 | |
MLE (Zero-Inflated) | 0.3651 | 0.2073 | 0 |
Method of Moments | 0.2660 | 0.1511 | |
Means and Zeros | 0.3919 | 0.2226 | |
mentions | Actual | MLE | Method of Moments | Means and Zeros |
---|---|---|---|---|
0 | 416 | 426 | 473 | 416 |
1 | 151 | 129 | 109 | 133 |
2 | 91 | 73 | 60 | 76 |
3 | 38 | 48 | 39 | 50 |
4 | 24 | 33 | 28 | 34 |
5 | 23 | 24 | 21 | 25 |
6 | 11 | 18 | 16 | 18 |
7 | 9 | 13 | 12 | 14 |
8 | 12 | 10 | 10 | 10 |
The plot below shows that the parameter estimates by MLE and means and zeros fit quite well.
As before, to perform the \(\chi^2\) goodness-of-fit test for the NBD model, we need to roll up the right tail so that at least 80% of the cells have an expected count greater than 5. We create a 10+ bucket so that 100% of the cells have an expected count greater than 5. We calculate the \(\chi^2\) test statistic and \(p\)-value for each parameter estimation method using 10 - 2 - 1 = 7 degrees of freedom. Though the plot above looked quite good, based on the \(p\)-values shown below we again reject, at the 5% level, the hypothesis that the data came from the NBD model, even setting aside the method of moments as a poor fit.
model | chisq | p.value |
---|---|---|
MLE | 17.83 | 0.01275 |
Method of Moments | 43.69 | 0.00000 |
Means and Zeros | 16.52 | 0.02079 |
We rinse and repeat, following the same process for likes as we did for posts and mentions. We note that the tail is a bit longer for likes as some users do a lot of post-liking. The median number of likes given is 20 likes though the mean is 46.83 likes (sd = 78.51).
likes | users |
---|---|
0 | 61 |
1 | 39 |
2 | 25 |
3 | 28 |
4 | 23 |
5 | 19 |
6 | 25 |
7 | 12 |
8 | 16 |
9 | 17 |
A careful observer of the plot above may have noted that the magnitude of the counts is quite large. This is problematic when calculating gamma functions. For example, \(\Gamma(100) \approx 9.3 \times 10^{155}\). Now imagine \(\Gamma(600)\). To handle this, we used log-gamma and log-factorial functions and restated the first term of the NBD equation as
\[ \frac{\Gamma(r + x)}{\Gamma(r)\, x!} = e^{\text{lgamma}(r + x) - (\text{lgamma}(r) + \text{lfactorial}(x))} \]
We estimate the parameters using each of the three methods as before and again find that the zero-inflated model adds nothing (\(\pi = 0\)) and that the method of moments estimates are quite different from the MLE and means and zeros estimates.
model | r | alpha | pi |
---|---|---|---|
MLE | 0.5358 | 0.0114 | |
MLE (Zero-Inflated) | 0.5358 | 0.0114 | 0 |
Method of Moments | 0.3585 | 0.0077 | |
Means and Zeros | 0.5898 | 0.0126 | |
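The log-gamma restatement used for the likes model can be sketched in Python, where `math.lgamma` plays the role of the log-gamma function and `lgamma(x + 1)` stands in for the log-factorial. The function name `nbd_pmf` is a hypothetical choice, and the whole probability is assembled on the log scale as an added precaution rather than something the text above prescribes.

```python
import math

def nbd_pmf(x, r, alpha):
    """NBD probability of observing count x, computed on the log scale.

    P(X = x) = Gamma(r + x) / (Gamma(r) x!) * (alpha / (alpha + 1))**r * (1 / (alpha + 1))**x
    """
    # log of the first term: lgamma(r + x) - (lgamma(r) + log(x!))
    log_coef = math.lgamma(r + x) - (math.lgamma(r) + math.lgamma(x + 1))
    log_p = log_coef + r * math.log(alpha / (alpha + 1)) - x * math.log(alpha + 1)
    return math.exp(log_p)

# Expected number of the 811 users with zero likes under the MLE estimates.
print(811 * nbd_pmf(0, 0.5358, 0.0114))
# A count of 600 is no problem, even though Gamma(600) itself overflows a float.
print(nbd_pmf(600, 0.5358, 0.0114))
```

Multiplying the probability by the number of users (811) gives expected counts comparable to the MLE columns of the comparison tables.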
Below is a comparison of the expected counts for the left-end of the likes distribution:
likes | Actual | MLE | Method of Moments | Means and Zeros |
---|---|---|---|---|
0 | 61 | 73 | 141 | 61 |
1 | 39 | 39 | 50 | 36 |
2 | 25 | 30 | 34 | 28 |
3 | 28 | 25 | 26 | 24 |
4 | 23 | 22 | 22 | 21 |
5 | 19 | 19 | 19 | 19 |
6 | 25 | 18 | 17 | 18 |
7 | 12 | 16 | 15 | 16 |
8 | 16 | 15 | 14 | 15 |
Aside from the large spike at zero for the method of moments, the MLE and means and zeros models do not look too bad. However, we can see quite a few gray spikes above the blue and green lines in the 10-30 range, indicating a poorer fit there.
Finally, we perform the \(\chi^2\) goodness-of-fit test, first rolling up the right tail so that at least 80% of the cells have an expected count greater than 5. We create a 35+ bucket so that 88.9% of the cells have an expected count greater than 5. We calculate the \(\chi^2\) test statistic and \(p\)-value for each parameter estimation method using 35 - 2 - 1 = 32 degrees of freedom. Based on the \(p\)-values shown below, we cannot reject the hypothesis that the data came from the NBD model for the MLE and means and zeros estimation methods. The model created by the method of moments fits poorly.
model | chisq | p.value |
---|---|---|
MLE | 41.64 | 0.1183 |
Method of Moments | 115.11 | 0.0000 |
Means and Zeros | 39.12 | 0.1806 |
Let’s get to the answers to our questions. Below is a summary of the parameter estimates (using MLE) for the three behaviors in question. We see that there are in fact different rates for each activity. On average, a user posts about 3.4 times as often as they are mentioned (6.068 / 1.761). Users also like posts about 7.7 times as often as they post (46.834 / 6.068). So, for the 198 days thus far, you have on average liked about 47 posts, posted about 6 times, and been mentioned about twice. The magnitudes of the variance (and of the standard deviation shown below) follow this hierarchy and mirror the observed values.
variable | r | alpha | E[X] | sd[X] |
---|---|---|---|---|
posts | 0.4200 | 0.0692 | 6.068 | 9.682 |
mentions | 0.3651 | 0.2073 | 1.761 | 3.202 |
likes | 0.5358 | 0.0114 | 46.834 | 64.347 |
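The E[X] and sd[X] columns follow from the NBD moments, \(E[X] = r / \alpha\) and \(Var[X] = r / \alpha + r / \alpha^2\). A small sketch, with `nbd_mean_sd` as a hypothetical helper name:

```python
import math

def nbd_mean_sd(r, alpha):
    """Mean and standard deviation of an NBD with shape r and rate alpha."""
    mean = r / alpha
    variance = r / alpha + r / alpha ** 2
    return mean, math.sqrt(variance)

for name, r, alpha in [("posts", 0.4200, 0.0692),
                       ("mentions", 0.3651, 0.2073),
                       ("likes", 0.5358, 0.0114)]:
    print(name, nbd_mean_sd(r, alpha))
```

Because the table reports \(r\) and \(\alpha\) to four decimals, values recomputed this way differ slightly from the printed columns, most visibly for likes, where \(\alpha\) is small.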
We can also look at the distributions of the three rates, identified as \(\lambda\) in our NBD model, to understand user heterogeneity. In the plot below we see that there is the most heterogeneity in like rate, the least heterogeneity in mention rate, and the post rate is in the middle. As \(r < 1\) for all three distributions, none has an interior mode: each density decreases monotonically and goes to \(\infty\) as \(\lambda\) approaches zero. At this point we have answered question 1: there are differences in post, mention, and like rates.
Now we can answer question 2: is there variation in the concentration of each of the GroupMe activities? Using the Lorenz curve and the 80/20 rule highlighted below, we see there are some differences, but the differences are not stark. We see that being mentioned is concentrated in the fewest users (20% of users account for 100% - 28% = 72% of the mentions). This follows intuitively from the histogram in the NBD model section. In contrast, likes are the least concentrated (20% of users account for 100% - 36% = 64% of the likes). So, we find that mentions are more concentrated than likes, with posts in between. However, the differences are not substantial.
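The 80/20 reading of a Lorenz curve can also be computed directly from per-user counts. The sketch below is an empirical version under assumptions: `top_share` is a hypothetical helper, and the ten counts in the usage line are invented toy data, not the GroupMe counts. The paper's figures may instead come from the fitted model.

```python
def top_share(counts, top_frac=0.2):
    """Share of all activity accounted for by the most active top_frac of users."""
    ordered = sorted(counts, reverse=True)      # most active users first
    k = round(top_frac * len(ordered))          # number of users in the top group
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0

# Toy data: ten users; the two most active account for 14 of 20 events.
print(top_share([0, 0, 0, 0, 1, 1, 2, 2, 4, 10]))
```

A higher value means the activity is more concentrated in a few users, which is how the 72% for mentions and 64% for likes are compared above.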
We move into treacherous waters: asking if there are differences between the genders. To answer question 3 we start with side-by-side histograms of each of the three activities, scaled for differences in the number of females (375) and males (436).
Next, we fit an NBD model for each activity, for each gender and combined, using MLE without and with a spike at zero. Based on the (large) table below we immediately see that none of the zero-inflated models are appropriate. However, we note that the parameter estimates are quite similar between the genders.
activity | gender | model | r | alpha | pi | ll |
---|---|---|---|---|---|---|
posts | Female | MLE | 0.4425 | 0.0676 | | -1051.3 |
posts | Male | MLE | 0.4022 | 0.0711 | | -1154.0 |
posts | Combined | MLE | 0.4200 | 0.0692 | | -2206.5 |
mentions | Female | MLE | 0.3799 | 0.2092 | | -649.9 |
mentions | Male | MLE | 0.3528 | 0.2059 | | -729.9 |
mentions | Combined | MLE | 0.3651 | 0.2073 | | -1380.1 |
likes | Female | MLE | 0.5785 | 0.0113 | | -1815.1 |
likes | Male | MLE | 0.5049 | 0.0117 | | -2012.0 |
likes | Combined | MLE | 0.5358 | 0.0114 | | -3829.8 |
posts | Female | MLE (Zero-Inflated) | 0.4425 | 0.0676 | 0 | -1051.3 |
posts | Male | MLE (Zero-Inflated) | 0.4022 | 0.0711 | 0 | -1154.0 |
posts | Combined | MLE (Zero-Inflated) | 0.4200 | 0.0692 | 0 | -2206.5 |
mentions | Female | MLE (Zero-Inflated) | 0.3799 | 0.2092 | 0 | -649.9 |
mentions | Male | MLE (Zero-Inflated) | 0.3528 | 0.2059 | 0 | -729.9 |
mentions | Combined | MLE (Zero-Inflated) | 0.3651 | 0.2073 | 0 | -1380.1 |
likes | Female | MLE (Zero-Inflated) | 0.5785 | 0.0113 | 0 | -1815.1 |
likes | Male | MLE (Zero-Inflated) | 0.5049 | 0.0117 | 0 | -2012.0 |
likes | Combined | MLE (Zero-Inflated) | 0.5358 | 0.0114 | 0 | -3829.8 |
So, we move to plotting the expected counts of each activity for females and males based on the NBD model. We see that the expected counts are remarkably similar for each activity, though the \(r\) and \(\alpha\) parameters are slightly different.
Using the Lorenz curves below we see that the activities are a bit more concentrated for males than for females. We can see this in the histograms at the beginning of this section. For each activity, there are more males than females who are hardcore non-posters, never-mentioned, and non-likers. However, these differences are minimal.
Lastly, we use the likelihood ratio test (with degrees of freedom \(4 - 2 = 2\)) to identify whether the two gender-specific models explain the behavior better than a combined model. We see from the \(p\)-values below that, for every activity, the two separate models do not fit significantly better than the combined model at the 5% level. So, we have answered question 3: females and males use the platform in a similar fashion.
activity | Female | Male | Combined | chisq | p.value |
---|---|---|---|---|---|
posts | -1051.3 | -1154.0 | -2206.5 | 2.396 | 0.3019 |
mentions | -649.9 | -729.9 | -1380.1 | 0.443 | 0.8013 |
likes | -1815.1 | -2012.0 | -3829.8 | 5.346 | 0.0690 |
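The likelihood ratio test in the table above can be sketched as follows. The function name is hypothetical, and the \(p\)-value uses the closed form \(e^{-x/2}\), which is the \(\chi^2\) survival function for exactly 2 degrees of freedom; any other degrees of freedom would need a general \(\chi^2\) routine.

```python
import math

def lrt_two_groups(ll_a, ll_b, ll_combined):
    """Likelihood ratio test of two separate NBD models vs. one combined model.

    The separate models have 4 parameters in total and the combined model has 2,
    so the statistic is compared to a chi-square with 4 - 2 = 2 degrees of freedom.
    """
    stat = 2 * ((ll_a + ll_b) - ll_combined)
    p_value = math.exp(-stat / 2)   # valid for df = 2 only
    return stat, p_value

# Posts: female, male, and combined log-likelihoods from the table.
print(lrt_two_groups(-1051.3, -1154.0, -2206.5))
```

Feeding in the log-likelihoods rounded to one decimal reproduces the table's statistics only approximately, since the reported values were presumably computed from unrounded log-likelihoods.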