CI 2019 Conference Retrospective

In early 2018, I was at the Organization Science Winter Conference when a friend of mine suggested I submit to the Collective Intelligence Conference, which would be held that year in Zurich. I figured why not, sent part of my dissertation that we had been revising (and still are), and was accepted to the conference. Though it was an "exciting" trip (storms led me to arrive in Zurich about an hour before the conference started), I had a positive overall experience. It was held that year in combination with HCOMP (Human Computation), so there was a machine learning / "computers as cognition" flair to the conference. I had been exposed to this type of work before as part of a grant and found it interesting. One thing that was evident there, and at the same conference this year (held at my alma mater, Carnegie Mellon University), is that Collective Intelligence (CI) is many things to many people.

I was expecting this year to be heavy on machine learning, simulations, and micro-tasking, as last year's conference was. Instead, there was a strong focus on "ghost workers" (the workers behind micro-tasks), the rise of AI, and the parallel progress many fields have made on problems of collective organization. Though there was still a computer science flair to the conference, I felt that much of the work was traditional social science. When some of the presenters commented on the crowd being programmers (as opposed to management/psych/soc scholars), I felt that was probably less the case this year than last.

So what is Collective Intelligence? A minority of the research presented was on the 'Collective Intelligence Quotient,' a score that teams receive that captures the extent to which they have developed "Collective Intelligence." I honestly thought this concept was what CI would be about (that groups with high CI do better than we would expect), though I was proven wrong in Zurich. This work was pioneered in large part by Anita Woolley (who was on my thesis committee), who presented some of her new work; several posters from her and others covered related research on the same topic. Most of the work presented, however, was on collective action or collective aggregation writ large. Thus, a discussion of Uber drivers or Amazon Mechanical Turk workers as a collective, brought together by a system to accomplish work, can still be thought of as demonstrating intelligent work done by a group. CI (the conference) is in large part about the most effective (in terms of performance and social responsibility) ways to bring people together to create positive outcomes. Like INGRoup, the conference is organized around these ideas rather than around a particular process or outcome, which I think is great.

But, I left this year's conference feeling a bit unsatisfied. The majority of the conference was composed of plenary sessions where stars talked about big ideas. I frequently felt like these sessions were intended to motivate the audience to see the world or its problems in a particular way rather than to present research. This left me with the sense that we were the congregation being preached at, with equal parts motivation (about the new problems the world faces that we can solve) and despair (that only policy makers could address the main issues). This isn't inherently bad, but it felt very different from last year. There was still interesting work discussed, and I got to see many professors and colleagues I hadn't seen in a while (and make some new connections), which was great.

My comment above also isn't intended to be critical of the conference's presenters or organizers; it is merely a statement of my own feelings, born in large part from my personal career position. There was very little room for someone like me as a worker (a teacher whose job will soon be automated) or as a researcher (one not yet looking at the problems of the future, and using dated tools). This in large part left me feeling like I need to be faster, pivot harder, and improve more, which isn't a bad thing. But some of the work presented also seemed to offer re-discoveries of existing work as novel, or re-namings of existing constructs in sexy ways for no evident purpose. This disconnect was perplexing, and could in part have been driven by my misunderstanding of presenters (time allocations were very short). So, I am left with very complicated feelings about myself and the conference. Not inherently bad, but perplexing.

Companion to Indicators of TMS paper

One of my interests that has persisted throughout my PhD is the measurement of transactive memory systems (TMS). I have a paper now available in advance at Group Dynamics: Theory, Research, and Practice that describes one process I have gone through over the last several years to assess and refine our measurement strategies for TMS. This post is a companion to that paper, where I'll share some tips and my code for doing similar assessments of TMS in your own work.

So what is TMS? A transactive memory system is colloquially called the 'knowledge of who knows what in a team.' It is the characteristic of teams whereby they can implicitly coordinate their work because it is clear who is good at what. If we zoom in on the individuals, a TMS contains a few layers. I know what I know (though the accuracy of that knowledge is always debatable; see the Dunning-Kruger effect). And I know what my teammates know. I may learn about a teammate's knowledge in a few ways: I guess based on titles or stereotypes (see Yoon and Hollingshead, 2010), they tell me, or I see the knowledge demonstrated. A team can take advantage of a TMS to improve its work if everyone has a mental map of their own and others' knowledge.

If we look at how TMS is assessed, however, we typically look at the behavioral indicators that a team has developed a TMS: specialization, credibility, and coordination. Other work has assessed TMS through the creation and use of knowledge maps (Austin, 2003). I wanted to determine the extent to which these different ways of assessing a TMS capture something similar or different. You can read the full paper for the full story.

For the rest of this post, I want to describe how I assessed the Knowledge and Meta-Knowledge indicators of TMS. These are the steps to follow if you want to do the same in your own work.

Step 1: The survey

Lewis's (2003) measure of TMS is the most common currently in use. This survey contains 15 Likert-style questions asking about the extent to which the group has specialization, credibility, and coordination. I encourage authors to continue using this scale, though, importantly, the Coordination sub-scale appears to do the heavy lifting in predicting performance in my paper and in most of the studies I have run. This is evidently not always the case; see Bachrach et al. (2018) for a meta-analysis. As you will likely already be including a survey, measuring knowledge and meta-knowledge should be pretty easy to add on.

First, determine your expertise areas. Austin (2003) describes a process for determining the relevant areas of expertise in field settings. I was doing lab studies, so I chose the main knowledge elements of my task as expertise areas. There was only one person with information about the "filter" component of the graphical programming task, for example, so I asked a question about knowledge of the "filter." I had multiple tasks or areas of work in Studies 1 and 3, so I captured knowledge in both areas. If you capture more than one area, you'll want to use factor analysis to confirm the areas are distinct, and you'll have to handle some calculations differently later on.
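As a minimal sketch of that factor-analysis check (the item names are hypothetical, and you'll need several survey items per hypothesized knowledge area for the model to be identified):

#A sketch assuming self-rating items Filter1-Filter3 and Sampler1-Sampler3 (hypothetical names)
items <- Data[, c("Filter1", "Filter2", "Filter3", "Sampler1", "Sampler2", "Sampler3")]
fit <- factanal(items, factors = 2, rotation = "varimax") #factanal() is in base R's stats package
print(fit, cutoff = 0.3) #items should load cleanly on their own area's factor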

There are a few options for wording your survey questions. In Studies 1 and 3, I asked a question for each expertise area: "For the statements below, rate each member on how much you agree or disagree that the statement describes the member. 1. Knows what the Filter module does." I then had the participants respond. If you are only measuring Knowledge, just have one such question for each expertise area. If you also want to capture meta-knowledge, you might structure your question this way as a matrix:

Example of Knowledge/Meta-knowledge question from Study 2

I encourage you to give participants the opportunity to select "Don't know"; I'll show you how to deal with that later.

For Study 2 in the paper, I structured this question differently, with a question per participant and then a matrix for each expertise area:

Example of a Knowledge question posed to one member in Study 3

Again, you'd have only one question like this if you want to measure knowledge alone, or you'd repeat the question once for each member to capture meta-knowledge.

I did see some small differences in the effects of the underlying calculated variables between Studies 2 and 3, so it is possible that this difference in presentation drove some of those differences. I don't think either presentation is superior, so I encourage other authors to use whichever they prefer.

Step 2: Data Structure

Depending on which question type you used, whether you captured meta-knowledge, and so on, your data may be in a different structure. Be sure to get it into a structure where Group, Member, and Expertise Category are each variables.

IF YOU ASKED ABOUT META KNOWLEDGE

It is easiest if you have a different variable for each expertise area by person. So, for Member A in Group 10, there may be a column for each member who rated them on each expertise area. If you have only two expertise areas, for example, you'd have these columns: "Expertise1_A, Expertise2_A, Expertise1_B, Expertise2_B, etc." I encourage you to group these columns by member. Create a new set of columns for the member's self-ratings. IF YOU ARE MEASURING ACCURACY, create a column for the average of each member's alter-rated expertise scores for each category at the group level. That's what I did, and I think it simplifies some things later on. Thus you'd have the following columns from the above example: "Expertise1_A, Expertise2_A, Expertise1_B, Expertise2_B, Expertise1_Self, Expertise2_Self, Expertise1_A_Group, Expertise2_A_Group, Expertise1_B_Group, Expertise2_B_Group"
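To make that concrete, here is a minimal sketch of what one group's data might look like (the values, and the assumption that each row is one rater, are illustrative only):

#Hypothetical wide-format data for Group 10: one row per rater, with NA for "Don't know"
Data <- data.frame(
  GroupNumber = c(10, 10),
  Member = c("A", "B"),
  Expertise1_A = c(NA, 4), Expertise2_A = c(NA, 5), #ratings of Member A (A's own ratings go in the Self columns)
  Expertise1_B = c(3, NA), Expertise2_B = c(2, NA), #ratings of Member B
  Expertise1_Self = c(5, 3), Expertise2_Self = c(4, 2), #each rater's self-ratings
  Expertise1_A_Group = c(4, 4), Expertise2_A_Group = c(5, 5), #group-average ratings of A (for accuracy)
  Expertise1_B_Group = c(3, 3), Expertise2_B_Group = c(2, 2)  #group-average ratings of B
)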

Step 3: Data Cleaning

Once you have your data, we can start cleaning things up.

IF YOU ALLOWED FOR “DON’T KNOW”

You've got a few options here. If individuals rate themselves "Don't know," I think it's reasonable to put them at the lowest score. I didn't do that, but I think it's justifiable (I tested it, and the choice didn't matter much in my studies). Especially for ratings of others, it will be necessary to impute the median for some calculations we'll do in a minute. You could make choices other than the median, and the code would be pretty similar.

The easiest way I found to do this is:

library(simputation) #impute_median() with this formula interface comes from the simputation package, not Hmisc

Data_noNA <- impute_median(Data, Expertise1_A + Expertise2_A + Expertise1_B + Expertise2_B + Expertise1_Self + Expertise2_Self ~ GroupNumber)

The ~ GroupNumber means that the group's median score will be imputed for that variable. This means that if I said I didn't know someone's expertise, the median score that the other teammates gave that member is substituted in. Importantly, this command just fills in NA values, and once those data are filled in, they are indistinguishable from the true values, so be sure you don't overwrite your original data.
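For example, you might keep Data untouched and confirm that the imputation only filled in the NAs:

sum(is.na(Data$Expertise1_A)) #how many "Don't know" responses came in
sum(is.na(Data_noNA$Expertise1_A)) #should now be 0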

Step 4: Data Crunching

Now we get into the fun stuff.

We want to add a new variable which is the standard deviation of an individual’s ratings of their own expertise. Super easy:

Data_noNA$IndSpec <- apply(Data_noNA[,X:Y],1,sd) #X:Y in this case are the columns that represent a member’s self-rating.

#Now, things get extra fun. The next steps use the Hmisc, distances, and dplyr packages:

library(Hmisc) #for asNumericMatrix()
library(distances) #for distances() and distance_matrix()
library(dplyr) #for bind_cols()

Data_split <- split(Data_noNA, Data_noNA$GroupNumber) #allows us to do a series of calculations within each group separately

#Now we create some empty data.frames to hold the group-level variables we’re about to create

Data_KnowStock <- as.data.frame(NA) #Our Knowledge stock variable, create as many as you need if you have more than one knowledge area.
Data_IndSpec <- as.data.frame(NA) #The individual specialization score from before. Again, if you want to calculate specialization within more than one knowledge area, create more.
Data_KnowDifferentiation <- as.data.frame(NA) #Knowledge Differentiation is a major contribution of the paper and also the reason we needed to impute NAs earlier.
Data_UnConsensus <- as.data.frame(NA) #The major meta-knowledge variable we have here.

#Now we will loop through the split datasets

for (i in 1:Z) {  #Z is the number of groups that there are in your data
    group_i <- Data_split[[i]] #the data frame for one group
    Data_KnowStock[i,] <- mean(asNumericMatrix(group_i[, X:Y])) #X:Y in this case are the columns that represent a member's self-rating.
    Data_IndSpec[i,] <- mean(group_i[, K]) #K is the column that was added for the Individual Specialization variable we calculated earlier
    Data_KnowDifferentiation[i,] <- mean(distance_matrix(distances(group_i[, X:Y]))) #average distance between members' self-ratings
    Data_UnConsensus[i,] <- mean(distance_matrix(distances(group_i[, L:M]))) #L:M are the columns that represent all of the ratings of self and others (don't double count the self ratings)
}
Data_Kush <- bind_cols(Data_KnowStock, Data_IndSpec, Data_KnowDifferentiation, Data_UnConsensus)
names(Data_Kush) <- c("KnowStock", "IndSpec", "KnowDifferentiation", "UnConsensus")
write.csv(Data_Kush,"Data_Kush.csv")

Step 4.5: Accuracy

I calculated accuracy well before I calculated some of these other variables and thus I wrote the original code in SPSS.

The accuracy variable is calculated as the sum of the differences in members' perceptions of each other.

Original SPSS code:

COMPUTE DifferencesInA = MEAN(ABS(Expertise1_A - Expertise1_A_Group), ABS(Expertise2_A - Expertise2_A_Group), etc…). Execute.

Once you've calculated the differences for each member, you sum these differences and then subtract from the maximum to determine accuracy.
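If you'd rather stay in R, here is a minimal sketch of the same idea, using the hypothetical column names from Step 2 (MaxScore stands in for the largest possible summed difference on your scale):

#Mean absolute difference between ratings of Member A and the group's average view of A
Data$DifferencesInA <- rowMeans(abs(Data[, c("Expertise1_A", "Expertise2_A")] - Data[, c("Expertise1_A_Group", "Expertise2_A_Group")]))
#...repeat for each member (DifferencesInB, etc.), then sum and subtract from the maximum:
Data$Accuracy <- MaxScore - (Data$DifferencesInA + Data$DifferencesInB)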

As I stated in the paper, my goal here is to add to the toolbox of ways to calculate TMS and hopefully to capture the totality of its effects. If you have any questions or issues with this tutorial, please let me know, as I want to make this as easy to do as possible (jkush@umassd.edu).

Names and Research Topics

I could make this piece heavily researched and exhaustive, but I’m not going to. This is just for fun, well, kind of.

One of my jobs on many of the projects I have worked on has been putting together the reference list. As you get deeper and deeper into a topic, the names of the researchers whose work you cite the most become more and more familiar. I've now met a number of these people as well. My area of research has primarily been topics around TMS. So let's look at some of the top articles from a Google Scholar search of TMS (plus some more that I like):

Hollingshead (1998) x2

Hollingshead and Brandon (2003)

Brandon and Hollingshead (2004)

Lewis (2003)

Lewis (2004)

Lewis et al. (2005)

Lewis et al. (2007)

Lee et al. (2014)

Liang et al. (1995)

Moreland (1999)

Mell et al. (2014)

Ren and Argote (2012)

Obviously, we have a few people who are very prolific in the literature. But we also have a range of letters in the alphabet that seem somewhat over-represented. Why are there so many important researchers in TMS whose names begin with an L? Lee, Lewis, Lange, Liang, Levine, etc. There are also several Ms. My last name is Kush, so there's a K if I want to throw my name in too, and a Keller on Lewis et al. (2007). Probably not actually weird if you look at common names, but it has always amused me.