How good are you at turning dry data into compelling stories? How diverse is your data—and your team? In our latest episode, Bobby talks with Ahna Girshick, researcher and former innovator at Ancestry who’s also an artist who’s collaborated with Philip Glass and Bjork. Ahna's unique background paves the way for a deep dive on topics like data storytelling, perceptual biases, ethics in AI and how COVID drove a more honest dialogue about racial disparities in machine learning. Have a listen. We think the lessons learned can inspire startups to handle data in new ways.
Bobby: Welcome to Loka’s podcast, What Fascinates You?—conversations with entrepreneurs, engineers, and visionaries who are driven to bring innovations to life. I'm Bobby Mukherjee and today's podcast is about the power of telling stories with your data and how more diversity in our data—and our teams—can uncover much greater outcomes.
Here to share her story through video chat is Ahna Girshick, lifelong researcher, creator, and former manager of computational genomics at Ancestry.com. Like many of our guests, Ahna’s career journey is as fascinating as the innovations she pursues.
Her research has spanned the worlds of science and data – from neuroscience to machine learning and computational storytelling.
She has received over 3,000 citations and press from the New York Times and NPR, and has been featured at the Museum of Modern Art for her work with Philip Glass and Bjork.
If you work with data in any capacity, if you want to connect better with your customers, or if you want to see more diversity in the tech world, I think you'll come away with something valuable from this interview.
Hello. Welcome to the show.
Ahna: Hi, thank you. It's nice to be here.
Bobby: Pleasure to have you here. So, the first thing I wanted to talk about was when you and I were talking previously, you had mentioned that your dad was a scientist and your mom was an artist, both at Stanford, which made me wonder what were conversations like at your dinner table growing up?
Ahna: I really grew up surrounded by both arts and sciences, from both my parents. From my father, I saw his scientific journey. It was very academic. It felt very innovative and stimulating. And then I was also surrounded by the arts. So my mother was painting inside the house and in her home studio.
So that felt very creative and beautiful. And my parents had quite a bit of appreciation for each other's professions, but at the same time, even as a kid, I could see how different they were. And so, as a kid, I loved the arts, I loved the sciences, and it didn't really dawn on me until probably fairly late in high school, and then increasingly in college, that most of our society is kind of set up in this... it's very specialized, right? So our educational systems, our professional tracks, where people tend to self-segregate, it could be scientific, it could be artistic, or it could be many other things, but there's deeper and deeper specialization.
So that's great. I love going deep, but it also creates some limitations on diversity of ways of thinking, working, and communicating. And so that was a little bit disappointing, I think, for me as a young adult, and confusing.
Bobby: Do you remember, at an early age, if you found yourself drawn to one over the other?
Ahna: I think I was probably a little bit more drawn to math and science. But I'm not sure how natural that was. It might've been because I internalized this kind of story from my mother that, you know, you don't want to become a starving artist. Like, I'm going to get a good, get a computer science degree or something like that.
I'm not quite sure, but I was always looking for ways to connect them. And I still have been, my whole career.
Bobby: So one of the things that you had taught me when we were just talking the last time was you provided this lens on how to look at data, and you talked about the power of stories and how data can be very dry and not very meaningful in and of itself, and how it can become so much more powerful if you can craft a powerful story around it.
So before we dive into the specific notion of creating stories with data, I was just curious, at a higher level, what role do stories play in your life?
Ahna: Stories are this uniquely human format for communicating information. So we can trace stories back, like, almost 5,000 years. And, you know, I'm a mother, and so, quickly, I discovered that even for very young children, storytelling has this huge power. I think we're wired to be captivated by stories because they give us this framework for interpreting our lives. And they're also very connecting, which is why they run strong in families and communities, and get passed down over centuries.
So I'm not a natural storyteller. I'm not a published author of literature. Most of us aren't. But I think when I was at Ancestry, during my time as a research scientist there, I became really interested in this idea of data-driven stories and saw that my work there was really kind of like computational storytelling, which was a way for me to fit... you know, maybe it's the story I was telling myself, but it connected into my career quest to connect AI and technology into the human experience.
Bobby: Let's dig into that a bit. So you worked at Ancestry as a manager of computational genomics research for a little under five years. What were your major roles during that time?
Ahna: I started as a research scientist, and then later on I was managing a research team. Ancestry is, uh, about a 40-year-old company. Their main focus is family history, and users can create family trees. And when I joined, my association with building a family tree, which is called genealogy, you know, it was a hobby for me. And I think a lot of people have that association, but, you know, I'm also a data geek. And then I quickly understood that family trees, especially at the scale Ancestry has (they have over a hundred million family trees), are this very rich data source.
And they're spatial, because usually in family trees, people say so-and-so was born in this place and they've been in this place and died in this place. So that covers the whole globe. And they're also temporal, because they say my mother was born this year, my grandfather was born in this year, going back: the birth dates, death dates, marriage dates, you know, all the significant events, when the children were born, et cetera.
So you can think of them as this raw material for creating a computational history. And then when you aggregate them, you can discover historical trends. So, you know, we have textbook history, which was written by historians, their version of the story. And then there's this kind of computational, aggregated family tree story, which is potentially the same and possibly a little different, right?
So, uh, hold on to that thought, because I'm going to return to it. But another way to look at your past is genetics. And I was in the AncestryDNA science research labs.
So we were very focused on genetics, and genetics tell you about your past because all the DNA you have, you inherited from your parents, who inherited it from their parents, all the way back. And, you know, it's funny because in school, you don't get to combine or even major in, like, history and genetics, or combine those data sets.
But that's what we did. So we were combining historical data from the family trees and the genetics data, which I think is something pretty awesome in data science, if you find these disparate data sources and you get to work sort of cross-disciplinarily. So I had the opportunity to do that research project.
We called it, you know, I think of it as data storytelling. So the way we did this is first we built a social network. So think of it like Facebook, but instead of friendships connecting two individuals, it's determined based on genetic relatedness, how much DNA you share. So siblings are going to be directly connected and very close in that network.
Whereas people from opposite parts of the world are going to be very far apart in that network. So think of this as a massive genetic social network. And then you can use clustering algorithms (you don't really need to know what those are) to find clusters of individuals in that social network. They're large clusters, in the tens of thousands or hundreds of thousands, and these clusters are representing a group of people who share more DNA with one another than they do with others.
And generally when you share DNA, especially when we're going back, like, eight generations, it means you share a common history. You know, it was only in the past few hundred years that people have been traveling across the world to meet their mate, right?
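[Editor's sketch: the "genetic social network" idea above can be illustrated with a few lines of Python. This is not Ancestry's method. It assumes a hypothetical list of pairwise matches with invented shared-DNA values in centimorgans, keeps only links above an arbitrary threshold, and uses plain connected components as a simple stand-in for the clustering algorithms Ahna mentions.]

```python
from collections import defaultdict

# Hypothetical pairwise matches: (person_a, person_b, shared DNA in centimorgans).
# Names and values are invented purely for illustration.
MATCHES = [
    ("ana", "ben", 2600.0),  # close relatives: strong link
    ("ben", "cal", 850.0),
    ("dee", "eli", 1700.0),
    ("cal", "dee", 8.0),     # very weak link, below the default threshold
]


def cluster_by_shared_dna(matches, min_shared_cm=20.0):
    """Group people into connected components of the match network.

    An edge is kept only when two people share at least min_shared_cm.
    Connected components are a deliberately simple stand-in for a real
    community-detection or clustering algorithm.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b, shared_cm in matches:
        # Register both people even if their link is too weak to keep.
        find(a)
        find(b)
        if shared_cm >= min_shared_cm:
            union(a, b)

    clusters = defaultdict(set)
    for person in parent:
        clusters[find(person)].add(person)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    for members in cluster_by_shared_dna(MATCHES):
        print(members)
```

[With the default threshold, the weak link between "cal" and "dee" is dropped, so the five people fall into two groups. Lowering the threshold merges them into one, which shows how much the resulting "story" depends on that choice.]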
Bobby: Right. So I definitely believe in the power of storytelling. And then you take something like data, which untouched can be really dry and about the furthest thing away from a story. So if I'm trying to create better stories with data, one of the things that I just picked up is first you have to have a character or characters in your story that people can empathize with and relate to.
Ahna: Yeah. I mean, I think connecting to people is a powerful technique. It's probably not the only technique, but we empathize with people. We empathize with other people, especially if we can relate to them.
There are data products out there that are geared towards customers, and so there, the people are there already. And then there's also data journalism in the news, where it's looking at large populations of people. So it's not necessarily targeted towards individuals, but there's still that human connection.
Bobby: Right. So I think that seems to be, at least one of the key ingredients in trying to make a more compelling story.
Ahna: Right. So if you're making a story, even if it's about climate change or something like that, and you're looking at the data on that, how does that affect people? Because we care about people. Oh, you could talk about how it affects animals too, I suppose, right?
Bobby: No, no. I mean, exactly. But I think, again, the key ingredient is: have you created a character, whether it's a person, animal, or whatever form, that people will relate to and empathize with? That's the key thing.
So switching gears a bit, something that I would really love your perspective on is, having had this tremendous journey in the field of AI, being a practitioner, you know, just your perspective, is it different for women in the field of AI and machine learning than for men?
Ahna: I have anecdotal answers to that question for myself, but I think what I know is that, you know, the more diversity we can bring to teams building AI systems, the more algorithms can reflect that diversity, which I think is a very good thing. You know, diversity comes in many forms.
It's not just gender, right? It's race, but it's also a diversity of educational institutions and diversity of training, right? So matching computer scientists with anthropologists or artists or journalists, or matching a Stanford PhD to someone who is self-educated.
Cognitive diversity. There are many forms, but I think when we are willing to create a more diverse team, sometimes that might feel hard or different, but it ends up creating different types of solutions.
I definitely have seen that happen with us and with other teams, and I'm constantly, constantly looking for opportunities to make that happen because the outcomes are fantastic.
Bobby: So here we are, hopefully in the tail end of the pandemic. In your mind, did 2020 change things for data, AI, and storytelling?
Ahna: Yeah. I mean, 2020 changed the world, right? It was just a moment, right? I mean, two things stand out to me. One, we didn't talk about COVID research, but because of COVID, I noticed that self-reported health data became more mainstream and more accessible. So addressing the COVID pandemic while we're all remote, it kind of forced the healthcare industry to be more accepting of that self-reported data.
And also people were motivated to help. So one of the things I did at Ancestry was to help coordinate a COVID research study. We had nearly a million volunteers sign up to anonymously participate, you know, to contribute to the scientific understanding of the genetics and other risk factors underlying COVID.
And I saw many other efforts like this, where organizations were collecting data in cell phone apps that, like, ping you every day, and you could enter your zip code and what symptoms you were feeling, that sort of thing. And the healthcare industry, which has traditionally been rather low tech, was, you know, looking for this data, because it's only en masse.
So that was one big change. And I'm hoping that trajectory will continue, because self-reported data is really valuable, really powerful, in what it can do for science and healthcare. And then the other big issue, of course, of 2020 was, you know, the racial justice movements that sort of engulfed the country.
But I think they also helped drive this kind of more honest dialogue about racial disparities in the workplace and in machine learning. Last week on PBS, I saw the Coded Bias documentary, which, if you haven't seen it, is amazing. And it's about bias in AI. It's disturbing, but it didn't tell me anything I didn't already know. And it kind of highlights bias in training data, the lack of diversity of engineers designing the algorithms, but also the business forces dominating that conversation. And so my hope is that those AI ethics organizations that you mentioned will be applying pressure and working with the big AI companies.
There's only a few really big ones right now, you know, just to prioritize transparency, for example. That would be a great start: how the algorithms are working and where the data is coming from. And so, you know, this goes back to data storytelling, that people who want to use that product want to understand why this algorithm is, you know, maybe something serious, like telling them they have cancer or something, right? Like, how did it make that decision, based on what data, and how was that learned? How do you contextualize it within a population to understand? Because if you're making a big decision about your healthcare or getting a job, you know, you want to understand that whole context behind it.
Bobby: Where my optimism comes from for '21, I think, is that those observations you shared lay the groundwork for some momentum in that direction. So you see better outcomes with things like model explainability and less confusion about how algorithms are making these weird decisions that we don't believe they should be making.
So I think that is great cause for optimism.
Well, I know this has been fantastically useful and engaging. I've learned a ton. I have many more things to ask you, but I really, really appreciate your time. Thank you so much for being on the show.
Ahna: Yeah, it was a pleasure.
Bobby: That was Ahna Girshick, researcher, creator, former manager of computational genomics at Ancestry.com, and barrier breaker in AI. If you're interested in learning more about Ahna and her research, you can visit her website at lightdark.org.