Brandeis and Hugo discuss people of color and under-represented groups in data science.

[This article was first published on DataCamp Community - r programming, and kindly contributed to R-bloggers].

Hugo Bowne-Anderson, the host of DataFramed, the DataCamp podcast, recently interviewed Brandeis Marshall, Associate Professor of Computer Science in the Computer and Information Sciences Department at Spelman College.

Here is the podcast link.

Introducing Brandeis Marshall

Hugo: Hi there, Brandeis, and welcome to DataFramed.

Brandeis: Well, thank you. Wonderful to be here.

Hugo: It’s such a pleasure to have you on the show. And I’m really excited to have you here today to discuss the representation and lack thereof in data science of people of color, along with your work in data science education at Spelman College in particular, about the Black Twitter hashtag, among many other things.

What type of work do you do?

Hugo: Before we get into all of these issues, I’d like to find out a bit about you. I thought maybe you could start by telling us what you’re known for in the data science community and what type of work you do?

Brandeis: Well, I guess it started in graduate school when I took database. So I’m really in the database realm of the data science continuum. I’m what they call now a data engineer, because I love the organization of data, and the ability in order to retrieve it in structured ways, and sometimes even unstructured. Then I moved into information retrieval because I love the web, and understanding this information and noise was part of my PhD work. Then I kind of continued on the continuum there with loving data. I’m really a computer scientist who loves data, and I kind of stumbled into data science about five years ago.

Brandeis: Information retrieval is one area of data analysis. So I liked algorithms, I liked databases, and so that led me to information retrieval. Then there was this notion of “How do you make good decisions based upon that data?” So you’re trying to move data to information, and for me, that gap between data and information is credibility. That’s where you have these algorithms and these processes. How do you narrate what’s happening in front of the data to give you the information to make the good decisions? So that’s how I got into data science.

Hugo: Great. What type of use cases do you think about in terms of these questions?

Brandeis: I think about what is the end-client. So what are the stakeholders going to do with the information that they receive from the data? And what is the quality of the data so that they can make those good decisions?

Brandeis: I’m more exploratory than explanatory in my data analysis. I want to make sure that there is good value in the data. Of course I want to make sure that the information that they receive is sound. That’s very hard to validate. Very, very difficult to validate, because you can always use more data, you can always use more processing time, you can always do another algorithm on that data. But it really is about what story you’re trying to tell, and what’s that narrative that you’re trying to expose the stakeholders to.

Hugo: I was just going to say, this is such an important part of data science which is often overlooked when the conversation happens in general. Right, these kind of foundational, thinking about the data generating processes, getting your data into a form that is actually usable in order to impact decision-making, or research, right?

Brandeis: Yes, very much so. I mean, I consider data science to be five lanes. The first one is, as you mentioned, the data collection and cleaning. The second one is the storage and management. The third one is the analysis. The fourth one is the vis, and the fifth one is the storytelling.

Brandeis: I think everyone needs to understand that storytelling is, to me, the biggest crux. If you do, if you ask these questions, you go through the full five lanes, if your story doesn’t make sense to the end-user, you fall completely flat. It makes no sense even in order to be turning, to be wrangling, to be manipulating that data.

Data Science Education

Hugo: Absolutely. You’re also heavily involved in data science education, right?

Brandeis: Very much so. I think it’s imperative that as many people know about how to handle their own personal data, as well as how to handle data within their own domain. I think every domain uses data in some aspect. Either you’re collecting it, either you’re manipulating it, or you’re using it. It doesn’t matter what discipline you’re in. You need to know how to manage it.

Brandeis: You need to know when you have enough data to make sound decisions. You need to know when you don’t have enough data to make sound decisions. You have to know how to ask questions. It’s about being curious. It’s about trying to understand what that data is trying to tell you, not what you’re trying to force the data to tell you.

Hugo: Right. Currently we live in a world where some people have data skills and they’re heralded as data scientists or data analysts. Other people, who may be their colleagues, don’t. There’s kind of a gap there existing. I’m wondering if you see a future where this gap still exists, or one in which data skills and data literacy are spread throughout organizations?

Brandeis: Oh, I think we’re almost at this point where data skills are essential. It’s as important as reading, writing, and computer programming. I mean, I know, coding. Everyone calls it just coding, but I say computer programming. Data skills are so essential. We carry around computers all day long with our cell phones and our mobile devices. We need to know how much data we’re ingesting. We need to understand what that data means to us, individually, and what does that mean to everyone else?

Brandeis: For instance, you get an email, it has an attachment. That attachment is, let’s say it’s 17 megs. Why do you have an attachment that’s 17 megs? How does that impact the rest of your system? How many times has that email been sent out? How many people has it been sent out to? That is putting strain on your system, that is putting strain on yourself because you have to download this file to even be able to view what’s inside that particular file of that size.

Brandeis: That’s just a small example. Data skills need to be universal. It needs to be taught from the time a child can pick up and use a device.

Hugo: I love it. That was going to be my next question. How early do we need to start educating people around this? You just answered that. My next is, how long will it take for the education system to catch up with these needs?

Brandeis: Well, that’s a tough question, Hugo.

Hugo: Yeah, I know. I’m glad you didn’t ask me that one.

Brandeis: I want to say, “How long do you think it will take?” Right now, we’re still in CS for All, where we’re trying to infuse computer science skills in K-12 education. That is something that we have seen to be a bit of a challenge, to say it nicely. There is a lot of momentum in getting that done.

Brandeis: How can we make sure that it’s equitable? How can we make sure that there’s participation of all students in computer science education? I think a lot of the lessons learned from that environment can therefore translate into the data science environment.

Brandeis: I want to make a particular point here. A lot of people consider data science and computer science to be essentially the same thing. I beg to differ. Data science is a combination of several different disciplines that includes mathematics, statistics, also computer science. But then there is a domain context that needs to be considered here, infused. You have to consider data science completely separate from computer science, just as computer science is completely separate from mathematics, and statistics is separate from mathematics as well. Does that make sense?

Hugo: That makes perfect sense. I think one thing you’re speaking to there is developing data analytic skills, and data intuition, and statistical intuition, and all of these types of things, which you don’t necessarily need to write any code to do, for example. Learning how to, like, the touch of data, the feel of data, the smell of it, how to work with it, right?

Brandeis: Exactly. If some student is going to build a YouTube video, have their own YouTube channel, create these videos, they have to know how that information is going to be stored and who has access to it-

Hugo: Absolutely.

Brandeis: … and what the analytics are around that video. How many views? How many likes? How many comments? What is the nature of those comments? Those are the analytic skills that that individual has to understand. They can’t just put up a video. There’s so much richer and robust information that needs to be understood by that particular individual.

Diversity, Equity, and Inclusion in Data Science

Hugo: Agreed. I know something else you’re heavily invested in is practical solutions when thinking about and working towards diversity, equity, and inclusion in data science. In particular, for example, you educate at Spelman College, which is historically a black liberal arts college for women in Atlanta, right?

Brandeis: Correct.

Hugo: I thought maybe you could say some words just about how you think about this in general.

Brandeis: Well, educating black women at Spelman College is a whole different experience. Not only do these students look like me, I am in some respects a model, in other respects a mentor. I think it’s important to understand since I’m in computer science, a lot of the computer science literature is centered around a very homogeneous type of background. What I try to infuse within the classroom is the fact that people of color, particularly black women, are part of the conversations, have made advances and innovations in computing.

Brandeis: Therefore, the students, I feel, now have a better attachment to why they are represented and why they are welcomed inside of this discipline. Maybe they don’t feel welcome in all spaces, but at least they have a certain amount of context, so that they can feel as though they can do anything they want within the discipline. Make sense?

Hugo: That makes perfect sense. I’m sure places like Spelman and the deep, foundational tradition there they’re built upon, it’s interesting having new disciplines such as data science being taught at places such as Spelman, because you can build on the old traditions of educating, in this case black women, but in a totally new discipline, right?

Brandeis: Exactly. I think it’s important that people of color are first to be thought of inside of a new discipline. They’re not an afterthought. It’s not going to be scrubbed without people of color. I want to make sure that when we talk about this emerging field of data science, that we continue to be inclusive and continue to make sure there’s representation in all areas of the discipline.

Brandeis: For the students that I teach, I want to make it very clear, here are some examples on how representation is lacking in the discipline, and how there is an opportunity for representation to therefore get elevated. Hopefully, with the classes that I teach and the lectures and the conversations that I have with students, that is something that they have a takeaway from.

Hugo: Interesting. What changes in your pedagogical approach when teaching, for example, black women of Spelman?

Brandeis: What I did in the very beginning, I joined Spelman in 2014. In the beginning, I would skip over the history of computing, because it was very homogeneous. All of the innovators and all of those type of individuals. I used to skip it, then I decided to create my own little hashtag. I called it #blackcomputing. It was once a week I would find a black person, sometimes seasoned, sometimes a little younger, sometimes with advanced degrees, sometimes without, and have the students learn about that particular individual. Just for a day or so, and then they would share what they learned over Twitter.

Brandeis: Then I decided that it was important not to just make this a once a week type of engagement. That it was important that as topics were introduced, that they would see people of color. I would take time in order to find people of color, black people, women, black women, who added to the innovation of the discipline, which in the topic I was talking about, in computing.

Brandeis: I tried to do the same thing within data science, as well. I think it’s very important to have the cultural environment so that students know that they, too, can add to this particular discipline. Data science is a bit more difficult because it’s so new. Definitely, in computer science, I was able to make that happen.

Hugo: Yeah. Absolutely. I do know that if we consider data science as the confluence of statistics, data analysis, and computer science, which is one zeroth-order approximation to it, I think, particularly for women, the statistics part of it is incredibly helpful. To see how many women are active in the statistics community.

Brandeis: Yes, very much so. I think it’s important to connect students, connect myself, even. I’m still a learner within data science, to continue to look for those individuals who have made such huge strides, before it was deemed data science, before it was really deemed statistics, or before it was deemed computational math. That we look back, know our history, and be able to bring that history forward and give it new life.

What are the biggest barriers to entry in data science for people of color?

Hugo: Absolutely. So then through the lens of all the work you’ve been doing to broaden participation in data science, what are the biggest barriers to entry in data science for people of color?

Brandeis: I would say there are two. The first one is awareness and the second one is access.

Hugo: Tell me about awareness.

Brandeis: With awareness, it’s unconsciously using data without necessarily knowing how to use it. So not being really aware of the power of the information. Let me provide a little bit of an example.

Brandeis: We’re in the day and age where everything has a credential attached to it. You have to give a user name and a password. With that user name and password, tends to come along with that, is your demographics. Whether you’re male, whether you’re female, what happens to be your ethnicity or your race.

Brandeis: My students are in this generation where they’re used to having everything behind a user name and a password. I’m not of that generation, so when I have conversations with them about, “Well, you’re downloading this app and you’re providing these credentials and you’re attaching it to some other social media, do you know that they now are being able to get all this other information about you? Or even to understand how you might speak? Or how you might use your language in order to figure out whether you’re male or female, or possibly what demographic?”

Brandeis: That type of awareness is something that students, a lot of students that I speak to and a lot of people that I speak to, don’t really understand. Having that conversation is very important. That’s what I mean by awareness. What are you actually giving up? Do you find it to be giving up, or is it just a normal mode of transaction?

Hugo: Yeah. Why is this, for you, a barrier to entry for people of color in data science?

Brandeis: I consider it a barrier because if you don’t understand or don’t have a conversation about the context of what information you’re providing, you don’t understand its value. If you don’t understand its value, you have no idea that it’s important for you to dig deeper. If you don’t understand it’s important for you to dig deeper, then you won’t dig deeper.

Brandeis: It’s almost as if there are things that just happen to you, versus you being proactive. This whole reactionary versus proactive kind of thought path. If, for data science, if you don’t know how important data is, and how you being who you are is important to whomever that business or organization is, you therefore don’t have the questions to ask.

Hugo: Right. That makes perfect sense. Do you think this is a bigger barrier for people of color than other demographics?

Brandeis: I think it’s a barrier for everyone. I think historically people of color have unfortunately had some very difficult incidents, recently and in recent past, that have made having these types of conversations a little bit more difficult. I think people of color, especially black people, because that’s the community I come from, needs to really think about how their transactions are happening. What does that mean for themselves? For their families? For the next generation?

Hugo: Absolutely. I think there’s such a trade-off there, right? The example that springs to mind when you talk about giving access to certain aspects of your data and social media is when I sign in with … A lot of websites offer me to log in via Google.

Brandeis: Right.

Hugo: I can just use my Google account. I honestly should probably do a bit more due diligence and figure out what I’m giving away there.

Brandeis: Right. Google is one example.

Hugo: Of many.

Brandeis: If you use Gmail, then of course, all of your email exchanges are therefore, in some way, being processed and analyzed. There’s just a lot of information that you’re giving that you don’t realize that you’re giving. You just think it’s just data, but it’s really information about you.

Hugo: We’re starting to see the downstream effects of that more and more. I get emails from recruiters every now and then, and say no. Gmail will make suggestions for you, now. I got an email from a recruiter saying, “Would you be interested in a chat?”

Hugo: The two things Gmail suggested were, “No I’m not interested,” and, “No thank you.” I was like, “Oh, you’ve learnt pretty well.” In fact, I joke that if one of the options had been yes, I would have been like, “Oh, Google thinks I might be interested, so perhaps I should.”

Brandeis: Right. It’s this notion of power of suggestion that is very subtle. That subtlety could be dangerous. Also the fact that you can write an email now and there is an auto complete, based upon hundreds of millions of scrubbed correspondence in order to figure out what you’re going to say. Is that really you saying it, or is it that power of suggestion?

Brandeis: So more philosophical questions, more on the right creative brain, than my analytical left brain. I think that’s what data science does for me is that it sits at that intersection and lets those synapses between the left and the right brain fire. What happens with that logic part, and what happens with the creative part, and what does that mean?


Hugo: Fantastic. That’s a very nice introduction to thinking about awareness in data science. The second point you mentioned was access. Can you tell me a bit about access?

Brandeis: A bit about access. There is this conversation that’s been had over the past probably decade or so about the digital divide, is that people of color don’t necessarily have access to the internet as people not of color. Okay, I will take that point that yes, there is a digital divide, there is a gap.

Brandeis: However, I will say, that there is newer research that’s coming out that says, “Actually, African-Americans do have access to the internet, but it tends to be on their mobile devices.” That access on a mobile device is very different than the access on a laptop, very different than the access on an iPad, or any other type of tablet device.

Brandeis: When it comes to access, you want to be able not only to access the internet, but you want to be able to access the tools. You want to be able to then create and innovate. The access that I’m speaking of is in the creation and innovation component. Data science is a field that is very open. It has many different definitions, and threads, and educational pedagogies, instructional strategies. When it comes to access, who is delivering that information and how are those learners supposed to then get expertise and grow their expertise?

Brandeis: For example, there is a surge of boot camps sprouting around the country. These boot camps are sometimes a week, sometimes two weeks. Sometimes they’re 12 weeks. These boot camps tend to cost money. These boot camps tend-

Hugo: A lot of money, some of them, right?

Brandeis: A lot of money, yes.

Hugo: I mean, yeah, we’re talking like 10s, 20s, 30s of thousands of dollars, potentially. There are different models, too.

Brandeis: There are many different models, and they tend to be all day, and 9 to 5. If you’re going to say, “Let’s do a two-week boot camp that’s 9 to 5 every day,” who is going to have time in order to deal with that boot camp, deal with all the exercises, learn what they have to learn, as well as work, as well as possibly taking care of a family? That doesn’t provide access, correct?

Hugo: Absolutely correct.

Brandeis: It’s part of the problem. We want to create data professionals, but creating data professionals takes time. Every company realizes they need data professionals, call them data scientists, data analysts, data engineers, data visualizers, but there’s no time. So then someone is going to have to quit their job in order to take a boot camp, and then hopefully get a job thereafter and make a lot of money. That’s one possibility. The other one is, they still have to feed their family, they got to feed themselves, they still need a roof over their head, so how do they do it?

Brandeis: Then if you go to that particular boot camp, do you have the machinery that you need? The computing devices that you need in order to completely absorb all of that content? Can you download Anaconda? Can you make sure that you understand what Python or R is? Do you need some pre-learning before you get to the boot camp in order to therefore be successful? On top of then having the money in order to pay for it, and the time in order to dedicate in order to invest in yourself in order to become a data professional.

Brandeis: There’s a lot of different stumbling blocks that I believe, really, are opportunities. That access is really providing that access to the information and the time and the investment and the money in order to actually pursue this particular track. This track doesn’t have to be all computer coding. It can be on the vis side and the data storytelling side, without a problem, but there needs to be a better mechanism for access.

Hugo: You mentioning that these stumbling blocks can be turned into opportunities, one takeaway or one thing that comes to mind there, is that as educators, we need to meet learners and potential learners where they are, as opposed to where we want to teach from.

Brandeis: Exactly. That’s very difficult, because there’s a certain strand where people believe that you have to know how to computer program in order for you to enter into data science. I don’t believe that’s true. I think that there’s a lot of methodology and practice that does not include computer programming.

Brandeis: But if you want to computer program, I think it’s important that you have that foundation, which now means, for an educator, they have to choose a programming language. Is it going to be Python? Is it going to be R? Is it going to be something else? Is it going to be Julia, or something else? You have to choose a programming language. To switch between programming languages, as a computer scientist, I am not a fan of.

Hugo: Me neither. I’m not even a computer scientist.

Brandeis: I’m just not a fan of it so switching from one language, going from Python, then going to R, then going back to Python, then going to Java, it’s just not something that I think builds competency and confidence in the learner. Stick with one language. Let the learner get one language in for a full year, and then you can introduce new languages. It makes more sense.

Brandeis: Then, of course, the learners could be non-traditional. You can have people who have been working in the tech industry or the non-tech industry for a decade or more. There are different types of learners. You have to meet them where they are. What are their objectives? What do they want to accomplish? What jobs do they actually want at the end of the day?

Brandeis: Some people just want to be in data visualization, so sculpt the learning around data visualization. They don’t need to know all the things about data acquisition, and collection, and cleanup. They’re going to be pulling data down from an Excel spreadsheet, or from some database somewhere, and they’re going to be visualizing it and writing reports. It’s just so important to meet them where they are.

Hugo: I just couldn’t agree more. This idea of access is something that we think a lot about at DataCamp. Actually, it doesn’t solve all the problems by any measure, but one of the many reasons I first joined DataCamp was the fact that I used to do a lot of in-person training. One of the toughest things, when training people, as you know, is installation at the start of a workshop.

Hugo: The fact that at DataCamp, when I met the DataCamp co-founders, in their exercises, we spin up images where people can get coding straightaway. Of course, they’ll need to do installs on their own systems downstream, but you can get people being functional immediately, without getting them bogged down for three hours in getting it up and running.

Brandeis: Right. I think it’s important to have the ability to ramp up quickly, and then go back and, “Okay, now let’s take three hours at the end of week 2 of a 10 week program. Let’s now try to put it on your own machine, so that you are able to work independently, post-training.” That’s very important. I think that piece doesn’t happen. I think there’s a lot of pieces that will ramp people up very quickly. They will be two hours of code, and then it falls flat.

Brandeis: There needs to be a concentration on making sure someone can work independently. That’s part of being in data science. You will work in teams, because data science really is done as a team. It’s a team sport. And it really is something that is done individually, just like mathematics, just like computer science, just like every other discipline that exists on the face of this earth. There will be individual work and there will be group work.

Brandeis: So how do you make sure that you translate those learners from the group think, and ramping them up quickly to get them excited about what they’re learning, to being able to work individually? That particular gap analysis is where an in-person educator comes into play. That’s why I think it’s so important to have people face-to-face, at least for some segment of time. It’s very, very important. Some things you just cannot quite do online that can be done much more seamlessly in person.

Hugo: Agreed, because we’re inherently social animals that are part of a community as well, right?

Brandeis: Exactly.

What initiatives exist for under-represented groups?

Hugo: So these issues of awareness and access, I find so interesting. I’m wondering what initiatives exist for under-represented groups, in particular, people of color and black people, that can get them up and running, coding, and doing data science and AI, and all of these types of … ?

Brandeis: Yeah, well I know of two initiatives that have been around for a little bit. A little bit meaning like two or three years. Maybe a little bit more for the second one. I know there’s a number of them sprouting up. The first one I know of is Black in AI. This is a couple of individuals, I think mostly are graduate students and newly minted PhDs who’ve come together and put together several workshops.

Brandeis: One of their workshops, I believe this past month, was actually at NIPS.

Hugo: That’s right.

Brandeis: The other one is Women in Machine Learning, and those workshops have been going on for about five, six years now, I believe. These are women of all different backgrounds, all different pedigrees. They’re coming together to talk about machine learning and their work, and their work in many different domains.

Brandeis: Those are two places I would definitely go, for the listeners, to just get a start on what exists within the data science community, for at least people of color, as well as for women in this discipline.

Hugo: So we’ll definitely include those links in the show notes, as well. Anyone listening, if you want to go there right now to check it out. I’ve also seen on Black in AI’s website, they’ve got related organizations, such as Black Girls Code, which looks incredibly interesting.

Brandeis: Yes, so Black Girls Code has been around, I don’t know exactly how long. They provide learning camps in the summer and during the school year at different schools, all over the country. I believe they have an engagement in New York that’s pretty heavy, as well as in the California area. I believe it is in the Southern California area.

Brandeis: They have been around trying to get students, especially black girls, in order to understand how to code. As far as I know, they look at Ruby as the introductory language, and continue to try to add new curriculum and add new trainings. They have students that have been part of their trainings to come back in order to help out with the new girls, to actually teach them something of how to code.

Brandeis: I think it’s a good initiative. Now translating what Black Girls Code does into something that is more formalized in the classroom and making sure the students can therefore continue their learning, well that’s where I come in.

How can we do better, with respect to being as inclusive and diverse?

Hugo: Fantastic. Generally as a community of data scientists and people who talk about it in public, how can we do better, with respect to being as inclusive and diverse, with respect to people of color in particular?

Brandeis: This is actually a hard question. It’s hard because we are trying to right a wrong from generations ago. As a community, we have to be intentional about our inclusion. It has to be at the core of every engagement. I think there’s four principles. I call it PAIR. P is for participation, A is for access, I is for inclusion, and R is for representation.

Brandeis: I think as a community, we have to take these four principles and we have to ask the question, “Who is participating? Who has access?” We also have to ask, “Who is being included, and therefore who is being excluded?” Then, the last one is, “Who is being represented?” And that corollary, “Who’s not being represented?”

Brandeis: If we take a moment to think about this PAIR principle, then I think we will get to a place where we will be intentional about including individuals from all demographics when it comes to our data sets, when it comes to our teams, when it comes to our conversations, when it comes to our algorithms, when it comes to our testing of these algorithms and of these systems. We have to be very, very intentional and purposeful, and a bit resourceful.

Brandeis: The conversation is always had, “Well, we can’t seem to find people of color in X or Y discipline.” It depends on where you look and if you’re really looking.

Hugo: That once again brings it back to intentionality.

Brandeis: Yes, yes, yes.

Hugo: You actually spoke to something else there, not only about how we as a community of data scientists can be more inclusive and diverse, but also in terms of thinking about end-users and the stakeholders in all types of algorithms. Of course, algorithmic bias is a very huge conversation at the moment. One of the most public examples of algorithmic bias was ProPublica’s exposé of the recidivism risk model that was biased against blacks and used for parole hearings, right?

Brandeis: Right. Yes. There’s several other examples. Amazon had to step away from their algorithm when it came to hiring practices, because their algorithm had some bias within it. Yeah, I think that there’s a lot of examples of where we need to really think about who’s in the room, what are the results saying, and what the results may be excluding, intentionally or unintentionally.

Brandeis: I think a lot of things happen unintentionally, just because of who happens to be sitting in the room. They don’t necessarily think about certain perspectives, and I think all those perspectives need to, at the very least, have some type of representation.

Hugo: Absolutely. Actually, at the end of season one, I had Cathy O’Neil, author of Weapons of Math Destruction, on the podcast. She discussed this idea that she’s been developing with other people, of an ethical matrix for algorithms, whereby you have a row for each stakeholder and columns for things from efficiency to false positives to false negatives and that type of stuff.

Hugo: She gives the COMPAS recidivism model as an example where you’d have certain demographics as stakeholders, and the company as well, and society and all of these things. I think that’s one step in the right direction.
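The ethical matrix Hugo describes, with a row for each stakeholder and a column for each concern, could be sketched as a small pandas table. This is only an illustration of the layout: the stakeholder labels, the ratings, and the "unrated" placeholder are assumptions made for the example, not content from the book or from any real audit.

```python
import pandas as pd

# Hypothetical stakeholders (rows) and concerns (columns), loosely following
# the description in the conversation. All labels are placeholders.
stakeholders = ["demographic_group_a", "demographic_group_b", "company", "society"]
concerns = ["efficiency", "false_positives", "false_negatives"]

# Start with every cell unrated so that gaps stay visible.
ethical_matrix = pd.DataFrame(
    {concern: ["unrated"] * len(stakeholders) for concern in concerns},
    index=stakeholders,
)

# Illustrative ratings only; a real matrix is filled in through discussion
# with the people each row represents.
ethical_matrix.loc["demographic_group_a", "false_positives"] = "high concern"
ethical_matrix.loc["company", "efficiency"] = "high concern"
ethical_matrix.loc["society", "false_negatives"] = "high concern"

# Count the cells nobody has looked at yet.
unrated = int((ethical_matrix == "unrated").sum().sum())

print(ethical_matrix)
print(f"{unrated} cells still need a stakeholder conversation")
```

Keeping the unrated cells explicit is a design choice for the sketch: it makes it easy to see which perspectives have not been represented yet.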

Brandeis: Yeah, I completely agree. If you have not read the book, get it today. Take a little time.

Hugo: Absolutely.

Brandeis: Just read a little bit about it. Just start leafing through it. Just pick a chapter and start reading. I think all of it is good.

Hugo: Very much so. Pivoting slightly, one of your research interests is the spread of social movements on social networks.

Brandeis: Yeah.

Black Twitter Hashtag

Hugo: I was wondering if you could tell me a bit about the Black Twitter hashtag and what you’ve learnt from it?

Brandeis: Yes. So about 2015, I started to really get more engaged within Twitter. As a way to engage my students, I developed a little project that I wanted them to work on, which is to delve into some of the social movements that were happening at the time. There was a lot of celebrity disagreements, they call them beefs, back and forth. There was a lot of conversation about social injustice, with the death and murder of many black people and people of color.

Brandeis: As a result of that, I had the class put together a number of different hashtags that they found within some of these social movements, and wanted them to look at Twitter as a way to collect and sift through that particular data. It was part of an information retrieval course.

Brandeis: Long story short, the class did pretty good on the project. I decided that I think it should really be a research area of mine. The social movements were continuing and the predominance of black people on Twitter was growing. In about 2014, 2015, about 23 to 24% of African-Americans were on Twitter. Then it increased, as of last year, to about 26 to 27%. I think in 2018, it’s about at 26% as well. Black people tend to be on Twitter and other social media, more than any other demographic. I think Hispanic Americans might be up there as well.

Brandeis: Anyway, when it came to this particular work, I had some students and we were talking about the work of André Brock, and his conversation about black Twitter and Twitter as a cultural conversation. That’s one of his works, as well as having conversation about the work of black tags, or racialized hashtags coming out of Sarah Florini.

Brandeis: Then, in late 2015 to early 2016, the #OscarsSoWhite hashtag came up as a conversation about how the Oscars, the Academy of Motion Picture Arts and Sciences, did not have any black nominees. My students, understanding this conversation also around the time of the Grammys, decided, “Well, this is crazy. Why are there no black people nominated for movies? Why are there not that many black people nominated for Grammys?”

Brandeis: We put together a little Python script in order to collect some data. We found some keywords and some hashtags that we used ourselves. Then that’s where the Black Twitter project was born. For the past two and a half, now almost three years, it has been an ongoing project in order to collect in real time the tweets when the Oscars are being broadcast, in 2016, 2017, and 2018.
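The interview does not show the collection script, so the following is only a minimal sketch of the filtering idea: keep a tweet if it contains a tracked hashtag or keyword. The keyword list, the sample tweets, and the function name `matches_tracked_terms` are hypothetical, and the sketch works on plain strings rather than a live Twitter stream.

```python
import re

# Hypothetical tracked terms; the project's actual lists are not given.
TRACKED_HASHTAGS = {"#oscarssowhite", "#oscars"}
TRACKED_KEYWORDS = {"nominee", "academy"}

HASHTAG_PATTERN = re.compile(r"#\w+")
WORD_PATTERN = re.compile(r"\w+")


def matches_tracked_terms(text):
    """Return True if the tweet text contains a tracked hashtag or keyword."""
    lowered = text.lower()
    hashtags = set(HASHTAG_PATTERN.findall(lowered))
    if hashtags & TRACKED_HASHTAGS:
        return True
    words = set(WORD_PATTERN.findall(lowered))
    return bool(words & TRACKED_KEYWORDS)


# Made-up sample tweets standing in for a real-time feed.
sample_tweets = [
    "Watching the red carpet tonight #OscarsSoWhite",
    "Who is your favorite nominee this year?",
    "Making dinner and ignoring the TV",
]

collected = [tweet for tweet in sample_tweets if matches_tracked_terms(tweet)]
print(collected)
```

Matching is case-insensitive here because hashtags are commonly typed with varying capitalization.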

Brandeis: We recently got a paper published and we talked about this movement, this movement from #OscarsSoWhite, where there were no people of color nominated for the Oscars, to the #MeToo Movement, to #OscarsLessWhite, all of these different hashtags in between. The predominance of who was the host, to what was the conversation happening within the Oscars, and what the commentary was and how the language was.

Brandeis: That’s what we’re looking at. We’re looking at the language of Black Twitter in respect to this social movement within the Oscars, and then more broadly, that’s happening around the time of the Oscars. It’s been a wonderful engagement.

Brandeis: As a little bit of background to Black Twitter, I forgot to even mention that. Black Twitter, the conversation started in 2008 with a little blog post by Anil Dash. We can put that in the notes.

Hugo: Will do.

Brandeis: Then there was another post, I believe on Medium, by Chris Wilson. “You know you’re black when … ” was the name of this. Then the published work of André Brock happened in about 2013, or so. 2012, 2013. Since then, there’s been a lot of conversation about what is the cultural and the racial undertones of Twitter. What does it mean? How do groups identify themselves? What are the characteristics and behaviors of the group?

Brandeis: Understanding that the group doesn’t necessarily represent the whole culture, or the whole people, and that’s the very interesting component. Twitter, as it was created, and as an undertone, is something that is very much a white space. Black Twitter is a subculture that’s very much concerned with black culture, with the conversations, with the social justice, injustice. The black appreciation that exists within the black narrative and what that narrative looks like, and how the conversation amongst that community is something that is very much akin to conversations that would happen in the analog world that is represented within the digital world.

Brandeis: It’s a very interesting construct. It’s a natural evolution of Twitter. It’s something that was not necessarily the intent of Twitter, but has definitely been a wonderful platform for at least black people and advocates and allies of black culture, to have conversations, to have serious conversations, and sometimes not serious conversations. How hashtags and trending topics, therefore, have been elevated with the conversations that are happening within Black Twitter.

Brandeis: There’s actually a lot of work about how the conversations happening within the Black Twitter space is somehow changing the language that’s happening within the non-Black Twitter space. I know I’ve talked a lot, but ask any question.

Hugo: There’s so much in there. One thing that sprung to mind was it must be so rewarding to work on this with your students and see their responses and have them learn all the techniques of network analysis, sentiment analysis, all of these things, in a context that actually matters to them.

Brandeis: Yes. That’s one of the wonderful things is that because algorithms and computer science is something that’s highly theoretical and pretty abstract, and some students have difficulty in understanding it, I can easily now put together a little talk about graph theory that talks to them about users and followers and edges, that makes it so concrete for them. They then understand, “Okay, now I know every time I send a tweet what happens.” Or “Anytime I’m on Snapchat, this is what happens.”

Brandeis: Then how the information, or that tweet, propagates, and then what does that mean when it propagates? How do you deal with re-tweets? How do you deal with likes? How do you deal with comments and what does that mean in this context of analysis? How do you rate the importance of each of these elements on Twitter, in order to then be able to say, “Okay, this particular tweet is what you would classify as something in Black Twitter”? Black Twitter is very much a nebulous environment. It doesn’t have boundary lines. It doesn’t have a list of influencers. It is within the digital ethos. So how do you then say something is part of Black Twitter versus something is not part of Black Twitter?
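The graph-theory framing above, with users as nodes and follower relationships as edges, can be made concrete with a small sketch of how a tweet propagates through retweets. The user names, the follower lists, and the assumption that certain users always retweet are all hypothetical simplifications; real propagation also involves likes, comments, and ranking.

```python
from collections import deque

# Hypothetical follower graph: an edge author -> follower means the
# follower sees what the author posts or retweets.
followers = {
    "ana": ["ben", "cy"],
    "ben": ["fay"],
    "cy": ["dee", "eli"],
    "dee": [],
    "eli": [],
    "fay": [],
}

# Hypothetical simplification: these users retweet everything they see.
retweeters = {"cy"}


def reach(author):
    """Return the set of users who see the author's tweet, directly or via retweets."""
    seen = set()
    queue = deque([author])
    while queue:
        user = queue.popleft()
        for follower in followers.get(user, []):
            if follower not in seen:
                seen.add(follower)
                # Only retweeters pass the tweet on to their own followers.
                if follower in retweeters:
                    queue.append(follower)
    seen.discard(author)
    return seen


print(sorted(reach("ana")))
```

In this toy graph the tweet reaches the followers of the one user who retweets it, but not the followers of the user who only reads it, which is the kind of distinction the retweet question in the conversation points at.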

Brandeis: That’s what my students and I are talking about and having these conversations about. And it really helps them, because they are using their creative brain as well as their analytical brain in order to answer these questions. We read things that are very technical on the computing end. Then we also read things that are very much on the social science end, as well, which I think is a wonderful blend.

Hugo: Is there anything in the use of Black Twitter that you’d like to see change?

Brandeis: For right now, I don’t want to see anything changed. For right now. If I were to hope for one thing in the next few years, that would be that we would use Black Twitter in a way to be proactive. There’s a lot of situations that have come up, where Black Twitter has been reactionary.

Brandeis: There’s a #BlackLivesMatter movement, #SayHerName, #ICantBreathe, a number of hashtags and social movements that have come out of the Black Twitter space and the users of Black Twitter. I want it to be more than that, I want it to push success of black excellence, push education, push geekhood and geekiness as something that’s championed and valued. I want it to push healthy living and self-care. I want it to be something that… Black Twitter can really be part of changing certain aspects of the culture to be a lot more positive in taking care of ourselves, as black people.

What is your favorite data science technique?

Hugo: We’re going to have to wrap up in a minute. I’d love to know, because you have a very technical background as well, what one of your favorite data science-y techniques or methodologies is?

Brandeis: I have a lot of them. I’m very much an exploratory type of data analysis person. I always try to ask myself questions that I hope to be answered by a data set, whatever data set that I’m collecting. Of course, I have to ask questions before I create a data set or look for a data set. After I ask those questions, I then want to perform some rudimentary, elementary statistics. I want to know min-max and things like that. That’s an easy method call in Python, right? Python pandas.
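As a minimal sketch of the "easy method call" mentioned above, pandas exposes `describe()` for a summary table and `min()` / `max()` for individual columns. The column names and values below are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for a collected data set.
tweets = pd.DataFrame(
    {
        "retweets": [0, 3, 12, 7, 1],
        "likes": [2, 10, 40, 18, 5],
    }
)

# describe() returns count, mean, std, min, quartiles, and max per numeric column.
summary = tweets.describe()
print(summary.loc[["min", "max", "mean"]])

# The same values are available column by column.
lowest_retweets = tweets["retweets"].min()
highest_retweets = tweets["retweets"].max()
print(lowest_retweets, highest_retweets)
```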

Brandeis: The other thing I love to do is to use Seaborn’s pair-plot.

Hugo: I love pair-plot!

Brandeis: I love it because the diagonal will give you a histogram of, “Okay, now what is the span of these values for each element?” Then I like the correlations that are on the off-diagonal triangles. Then I can see if there’s clusters that are forming. It gives me a good visual to know what to do next. That’s when I’m able to re-evaluate whether or not my questions can even be answered by this data set. Then I always spur off into different questions.

Brandeis: I think that is very important, as someone that works with data, is that you’re never locked in to the original question that you asked. You have to be able to answer those original questions, and then be able to migrate and pivot to new questions, based upon what the data is telling you.

Hugo: As you suggested, or alluded to, the amount of information in seaborn’s pair-plot, being able to see all those things and see different qualities jump out at you, is really so much fun.

Brandeis: Yes, it’s a lot of fun. Then you can, of course, do different visualizations. Then you know whether or not you need to do a linear analysis. You know whether or not you need to do some type of a sentiment analysis, or a semantic analysis. You can then figure out where to go next. If you see a correlation, okay, maybe you want to do a quick least squares linear analysis. Maybe you need a multi-regression. You don’t really know where to go until you see what the data is revealing to you. A pair-plot is my go-to. It’s just my go-to method, so then I know what to do next, and know what questions I really should be asking of this data.
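The "quick least squares linear analysis" mentioned above could look something like the following sketch, which fits a straight line with `numpy.polyfit` and checks the correlation with `numpy.corrcoef`. The five data points are invented for the example.

```python
import numpy as np

# Hypothetical pair of variables that looked correlated in a pair-plot.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Degree-1 polynomial fit is ordinary least squares for a straight line;
# coefficients come back highest degree first.
slope, intercept = np.polyfit(x, y, deg=1)

# Pearson correlation coefficient between the two variables.
r = np.corrcoef(x, y)[0, 1]

print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.3f}")
```

If several explanatory variables seem to matter, the same idea extends to multiple regression, for example with `numpy.linalg.lstsq` on a matrix of predictors.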

Call to Action

Hugo: My final question, Brandeis, is do you have a final call-to-action for our listeners out there?

Brandeis: Oh, final call is be intentional and be resourceful. I think it’s so important that for anyone that’s in the data world, if you can call yourself a data scientist or any of its variations, is that you look around the room and see who is represented and how they’re represented, and include other people as part of that conversation. Asking questions is a necessary component of being in the data space. Grab hold of that and continue to ask questions.

Brandeis: The call to action really is, yeah, be intentional. Ask the questions. Get the answers and ask more questions. That’s the only way we’re going to broaden participation in data science is to do the work, do the good work, in data science, and make sure that who’s doing the work is as diverse as possible.

Hugo: Thank you, Brandeis. It’s been such a pleasure having you on the show.

Brandeis: Thank you for having me. This has been a lot of fun.

Hugo: It really has.
