Briana McClain, Senior Data Scientist at TUVA, has worked on projects ranging from making city transit safer to work for the Department of Defense. But today, she breaks down how small businesses can use data science as a competitive edge.
Want to read instead? We’ve got you.
Syd Roe 0:00
Alright, how’s wedding planning going?
Briana McClain 0:04
It’s good. It’s, it’s a little crazy, but it’s good. Like we’re moving along.
Briana McClain 0:10
My fiancé’s got this week, not necessarily off, but she’s her boss is out of town. So, she gets to kind of work from home and get some more stuff done than she would normally get to. So that part of it is pretty cool.
Syd Roe 0:23
Yeah, yeah. I’m not speaking from experience here. I’m not married, haven’t been married. I know it’ll happen at some point in my life. But I’ve heard a lot of stories. And I feel like there are three phases of wedding planning. There’s like, pure, ignorant bliss, where you just don’t know what you don’t know. You kind of know it’s gonna be some work, but you’re like, ah, it’s so exciting, and it’ll just kind of happen. Like, you know, you’re right in the heat of the engagement and everything’s good. Then there’s the reality hits, where you’re like, oh my God, this is way more work than I ever thought it would be. And then there’s the like, I can’t wait for it to be over.
Briana McClain 0:10
Um, all the things, like all of those feelings you’ve experienced, and fortunately not to great degrees. It’s been pretty solid. I think we kind of knew what we were setting ourselves up for. It was just a matter of like, when are we going to get around to doing it? And so of course, procrastination kicks in, and that makes, you know, the month before a little crazy, but that’s probably the fun in it. Um, so no worries there.
Syd Roe 1:33
Well, that’s good. That’s good. I’m glad it’s going well.
Syd Roe 1:38
I mean, I guess to kick this thing off. I’d really love to hear a little bit about you like how you got into data science. And I mean, one of the things that I’ve thought about recently is just how hot that industry is right now. I like it. Yeah. I think that there’s like waves different industries kind of peak and then Valley and peak and valley. And I think right now just I keep hearing about data science and that skill set so I’m guessing when you first got into it, it wasn’t as popular as it is now, but I could be mistaken. You know, like, what was that journey? What? How’d you get into it?
Briana McClain 2:24
Yeah, for sure. So, I mean, I would say when I got into it, it actually was pretty hot. Not nearly as hot as it is now, but it was definitely a thing. Um, but when I went to school originally, I studied math, and I definitely wasn’t like, I’m gonna study math so that I can do data science. Like, now that totally makes sense for kids that are doing it, like that’s the way to go, right? Because there’s so much money and opportunity there. When I went to school, I just studied math because I knew it was productive. I knew it would be helpful in whatever industry I went into. But at the time, the only real opportunity was like research and teaching. When I was looking at math, like, you could go down a research or clinical chemistry path. But there was nothing that was like math, numbers based. So, I went to school. And then I got a job straight out of grad school at Accenture. And they kind of pushed me into the software development realm, which is, you know, the way that these technical jobs often work. It’s like, well, if you have a technical mind, like if you can get through a math degree, you can probably do this. And so, they kind of just put me into software development. And I didn’t really love it. It wasn’t really what I wanted to do, because I wasn’t really doing math. I was doing some coding and some difficult things, but it wasn’t math based. So, I sought to get to something a little bit more math based, which was kind of in the analytics realm. And so, I kept asking them to let me work in a data analytics capacity, and they kind of were like, no, we do software development here, like just go do more technical architecture and more engineering and so on. So eventually, I left Accenture because of that, because I spent a couple years there asking to do analytics and I couldn’t.
And they kind of shot me down and said, no, just focus on this, you know, if you get into security, cyber security, you’ll make a lot of money, this is a good place for you to be. But that wasn’t what I wanted to do. And I’m so thankful that I stood my ground and followed what I wanted to, because at the time that that was happening, I knew analytics was going to pop off. I knew that it was going to be a big part of the industry. And I knew how powerful it was. And so, I made that switch, and now look, data science has, in fact, popped off and I’m so glad I did.
Syd Roe 4:36
How did you have that foresight to know that that it was going to be something that that a lot of businesses were going to need and you know, I mean, were you watching the industry, or was it like a gut? Was it just frustration in your current role, like what were what was pointing you in that direction?
Briana McClain 4:56
I mean, so I would say it’s a mix of two. On one hand, as an analyst and just as a technical resource, you know what you’re competent at. I knew I was good at math. And I knew that what I was doing was not math based. And if I moved toward something that was math based, I would be more successful. And I would have less to learn because I’d just spent six years in college learning it. So that was one of the initial things. But I also knew that applied math, which is essentially data science, a portion of data science, I knew that there was power in that because I’ve solved problems in my own life with that. So I knew that if people caught on to the power in analytics, and the power in data science and applied mathematics, that this would be a push that everybody would want to do once they saw what it could do for them. So it was kind of on both ends. On one hand, I personally wanted to do analytics, because that’s what makes me happy. But I also knew that there were companies that were starting to do things that were leading them in a direction that was making them powerful, right? Like, this is when things like spam filters were coming out, easy ways where you could find spam in your inbox. And this is when people were doing different things to detect what you may find in an airport. Like these small things, I started to see how helpful they could be to the industry. And I knew that it was a push that was gonna come.
Syd Roe 6:22
Yeah, that makes sense. I mean, I guess I want to ask a two part question here. A is the current project you’re working on your favorite project? If not, or if so, I mean, you know, what have been some of your favorite projects you’ve gotten to work on and then also, I’d be curious as to like, maybe some things you haven’t worked on maybe, you know, something you’ve seen worked on in the industry or in the marketplace, right? Or maybe a friend you’ve overheard a friend, you know, working on it, like, what are what are just some of the cool things going on right now. Both that you’ve done, and maybe Heard about?
Briana McClain 7:03
Yeah, sure. So what I’m doing now, I wouldn’t say is the coolest project I’ve worked on. But some of the stuff I did with the DoD, I thought was pretty cool. And some of the stuff that I’ve, I don’t know if you’ve heard of Kaggle. But Kaggle is like, basically a data science competition. It’s open source, so it’s free to anyone. So if you go to kaggle.com, you’ll see a series of competitions that you can do, and you can go in there and just try to solve it. They’ll give you a problem, they’ll give you data, and they’ll say, tell me this. And there’s prizes you can actually win if you get the best solution. And there’s actually some companies that will put a competition on to solve their problem. So instead of hiring a team, they’ll put it on Kaggle and then give a prize to the winner. So it’s a really, really smart
Briana McClain 7:50
industry, and that might be something that you might want to look into because it’s something really cool. So I’ve done some Kaggle competitions on my own, and that’s helped me kind of learn some different things. And that’s improved my skills. I’ve also done some hackathons, which are not really hackathons. They’re named that, which makes it sound cool, but they’re really just like, someone sponsors it in the area. I’ve done them in DC. So like, council members or someone in the area will provide a problem and say, hey, can you help us figure this out? And a bunch of women in tech will show up, there’s actually some men that come too, but a bunch of people will basically come to this free event and code all day and figure out the problem. So if I’m being honest in answering your question, some of the best projects I’ve done have been free work that I’ve done on Kaggle or volunteering at a hackathon. Not necessarily the stuff I’ve done for the DoD or for the FBI, but
Briana McClain 8:49
I can speak to either one if you
Syd Roe 8:51
Yeah, well, I mean, you have so much more creative freedom. It’s kind of like wide open fields, here, come help us. So I mean, obviously, they might not have a clue on how to solve it either. So they’re really like, here, just come on, and let’s figure this out. And then you get all these different minds kind of on the same projects. You’re probably not working alone on it either. You can kind of bounce things off of people. Like that’s cool.
Briana McClain 9:16
Yeah. And it’s also like a very casual environment. You go in, and you guys hang out, eat, drink, and solve data science problems all day. And like, it’s a very laid back atmosphere. So that is helpful. But it’s also cool because you’re solving a real world problem. Sometimes, in these problems for these contracting companies that you work for, you know, they come to you and say, hey, help us figure this out, help us like, improve our numbers here. And then you give them something and you send it off to them and you never see it again. And you don’t really understand how the work you’re doing really fits into the real world. But in one of the ones that I’ve done at the hackathon, we looked through all of the bike paths, and there was a problem with bikers getting hit by cars. And so we used geo mapping to look through the city and find areas where there were not protected bike paths, and how one could be put in, or how a bike lane could be put in, based on the landscape of the city. And so what’s the funding that it’s going to take to do this? And how do we get there so that we can make bikers safer? So that’s a really cool problem, because now if you see some change in what the city is funding, then you’re like, man, I had
Briana McClain 10:34
a hand in figuring out the right mix of bike lanes there. We also did another one on gun violence, trying to detect where gun violence was happening in the city and what the age groups were, what the demographics of it were, the economics and all of that. And that was a cool problem too, because that’s a real issue that I deal with every single day. Stuff that’s for the DoD or the DOJ or whatever, you know, you get how it’s powerful, but you don’t necessarily see the end result of the work.
Syd Roe 11:06
So, I mean, clearly this is something that, A, is very technically heavy. I mean, it doesn’t seem like anybody can be a data scientist. And you’ve got to have, I mean, a ton of background experience in applied math, which, by the way, I was never good at math. So kudos to you, dude. You know, plus, it’s something that can make a real difference, a real impact in the world. I mean, let’s just like step back for a sec. If you had to give a definition of what data science is, what it does, for people who are just coming into this and are like, data science, I have no idea what you guys are talking about. What is this thing? Why are we putting these two words together? What would you tell them? If you were explaining to someone in the desert what grass is, what would you say?
Briana McClain 11:58
Sure. So, um, I guess I would start by saying data science is essentially making use of your data to solve problems. So there’s one side of it that’s data storage. And that might be, you know, if you have customers coming into your store every day, and you want to record what it is that they bought in January versus March versus whatever, what quantities they bought, or what demographics bought what, saving that data somewhere, because you’re getting so much of it every day. Saving that, storing that, that’s like a data engineering portion of it. And data engineering is a part of data science. The other part is taking that data and basically trying to put it in the best format possible so that you can view it, right? So if I give you a table, and it’s all messy and you can’t really understand what it is, some data science might be me just formatting it to get it into a more useful, readable data table for you. Data science could also be taking data and using what you have to predict the future, or to answer certain questions. So in general, data science is taking data and answering questions or solving problems.
Syd Roe 13:48
Yeah, I mean, the predictability piece is the part that always confuses me. Because, and I guess here’s what I struggle with, do you know what you’re looking for when you start? Like, because I think we all come into certain things with biases, right? And so you’ve got this big pile of data, and you’re trying to determine what happens in the future, but you’re coming at it from your own. Like, I wonder, if you have data scientist A and data scientist B, who are looking at this data and saying, well, based off of this, I think this is going to happen next. Do both of them come up with different conclusions because they have different biases, or do both of them come up with the same thing? Do they know what they’re looking for when they start?
Briana McClain 14:32
So, and that’s, I mean, bias is a part of the science, bias is a part of the statistics. So there are a ton of bias models that you bake into your model to make sure that those biases aren’t coming about. So it’s not necessarily a bias that the analyst has. But it’s a bias that’s in the code. And you would call that, say, skewed data. So
when you talk about what you’re predicting, oftentimes what you’re predicting is what you have, but replicating it in 10 years, right? So if I look at Amazon data, and I say, this is what everybody bought in March of 2019, can I predict what everybody’s going to buy in March of 2020? Now, you’re going to use all the data that you saw in 2019. You’re going to use refund data, you’re going to use purchasing data, you’re going to use data from clicks, you’re going to use all this stuff to predict what people are going to buy in 2020. But when we talk about biases, you might have a bias because you don’t have all the data. So say you only have the data for Amazon purchases that were made and sent to Arkansas, or a certain part of Arkansas. And that happens to be a certain demographic and a certain race, a certain age, in comparison to the rest of the country. So your data is now skewed. Now you have a bias in there, because what people buy there isn’t what people are going to buy across the whole country. So your numbers for 2020 are going to be different than what they would be if you didn’t have that bias in there. Or say you had not just data from Arkansas, but you had a high amount of data from the southern states versus the northern states. So there’s a bias in there, and what you can do is distort your data to make it more equal across the board, so that you can get a better view of what would happen in 2020. But that may involve taking some of the southern states’ data away, so it equals out, right? And then you’re asking, was that helpful? Because now I’m taking some of my data out, and you always want the most amount of data you can have. So that’s kind of how it works. As for what question it’s asking, usually that’s provided.
I mean, there is an exploratory portion of data science, and particularly in machine learning, where before you even answer a question, before you look at anything, you start exploring the data and you start seeing what’s in here. Is the data clean? Is the data valid? Is the data skewed? Does it seem like there’s more males than females in this data? Or does it seem like there’s more data from this area? Like, there’s definitely a portion of that, and you may find other questions. You may find the answer to other questions in your data. But there’s typically a question before you even start. Because there’s a reason why they’re saving all of it, right? A lot of these agencies aren’t just saving data to save it. They’re typically saving it because they have a goal in sight that they want to predict in the future.
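A minimal sketch of the skew check and rebalancing idea described above, assuming hypothetical purchase records that each carry a `region` field. The function names and sample data are illustrative only, and random downsampling is just one simple way to even out over-represented groups.

```python
import random
from collections import Counter


def region_counts(records):
    """Count how many records come from each region (a quick skew check)."""
    return Counter(r["region"] for r in records)


def downsample_to_balance(records, seed=0):
    """Randomly drop records from over-represented regions so that every
    region ends up with as many records as the smallest one."""
    rng = random.Random(seed)
    by_region = {}
    for r in records:
        by_region.setdefault(r["region"], []).append(r)
    target = min(len(rows) for rows in by_region.values())
    balanced = []
    for rows in by_region.values():
        balanced.extend(rng.sample(rows, target))
    return balanced


# Hypothetical data: 8 purchases from the South, 2 from the North.
purchases = (
    [{"region": "south", "item": "fan"} for _ in range(8)]
    + [{"region": "north", "item": "heater"} for _ in range(2)]
)

print(region_counts(purchases))   # the South dominates the raw data
balanced = downsample_to_balance(purchases)
print(region_counts(balanced))    # both regions now have equal counts
```

The cost is the one raised in the conversation: balancing this way throws data away, so the balanced set here has 4 records instead of 10.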
Syd Roe 17:17
Yeah, no, that makes sense. And it’s interesting that, I mean, you guys are coming at it with an understanding of that bias and with a process to maybe not erase the bias, but acknowledge the bias, it seems like. Yes, okay. Yes, that makes sense. So, I mean, the example that you just gave, you know, with Amazon and trying to say, okay, you know, here are our sales from this year, let’s predict what, you know, it is for next year. I mean, what are some ways that data science, I mean, I guess, some hot ways it’s being used today? And you mentioned machine learning. Like when businesses say, okay, I want to invest in data science, and I’m sure it’s expensive, right? I mean, to go through the steps of, and this is going to totally test my knowledge of data science here, but right, like looking at the data, cleaning it up, you know, storing it, and then hiring someone to come in and analyze it and then act off of it. I mean, the whole implementation process of like, here’s what we’ve learned, now we’ve got to act off it, like that’s a very expensive process. So do you see any trends in what businesses are looking to use data science for? Or is it just like all over the map, totally depends on the business?
Briana McClain 18:47
Yeah, I mean, I’d say it’s all over the map because there’s so much power in it. There’s so much applicability to everything, right? So on one hand, you can predict a company’s sales, sure. You can also predict, you know, a terrorist attack. You can also use image detection to determine something that you may not be in front of. If you have a camera overseas or if you have a camera over a specific location, you can then use image detection to determine what you think that is, and you’re using machine learning because machine learning is now looking at those pixels and trying to determine what that image is. So, I mean, it just goes beyond me. There’s predictive maintenance, where like, maybe we’ll never have to go take our cars in. Like, maybe cars can have maintenance baked into them, because it now can read, based on how far you’ve driven, what potholes you drove over, how fast, how bad of a driver you are, it can read what maintenance you need, and it can do it there without you having to take it anywhere. So I mean, it applies to everything. Anything that can save data can be reworked and changed and improved.
Syd Roe 19:57
Do you, I mean, do you think it’s like a matter of time before, in a way, every business starts to take this into their company and make it a part of the process?
Briana McClain 20:08
I mean, part of me feels that way, because as other companies and other things start to improve from it, you’re going to have to in order to be competitive. That’s my thought. I think it’s going to take a lot longer than is reflected in how hot the career field is now. I think how hot the industry is now is coming off like it’s going to take over everything in the next two years. I don’t think that’s the case, because of the process you just talked about. So, um, you know, a lot of these companies want to go from zero to machine learning. They want to go from zero to predict my future so I can have better numbers, without realizing that a lot of them aren’t committed to taking all of the steps, right? So one thing that you have to have is a lot of data. So if a company wanted to do machine learning, they need to create processes in their company now, so that they can save data now. It’s gonna take them 10 years to save enough data for them to be able to predict, right? So like they need to commit to that now, knowing that they’re not going to see results for years, and a lot of companies are not willing to do that. A lot of government facilities or government agencies aren’t willing to do that. So that’s the first thing. But the other thing is, even once you commit to that, you could find that the way that you’re saving data is not conducive to you using it in the future. You’re not saving it in a useful form. So some of it also goes into like, now you have to change your practices. Now you have to make sure that your employees have a focus on data so that they’re saving it accurately, right? Some of these people that are saving data, they’re like, oh, who cares, because it’s now being saved for 10 years and nobody’s using it. They could enter anything, right?
They could fat finger anything in, because it’s not being used. So there’s also a process of making data a priority in companies from the top down. That is another thing that companies aren’t willing to do. So I think it’s gonna take some time because these companies are not committed to it. They like throwing around the buzzwords and talking about it, but they’re not actually committed to the process.
Syd Roe 22:32
So you can’t see my face right now. But I am cheesing so hard that you just said that. And here’s why. Because my partner Seth and I, we have talked a lot about, I mean, part of the problem is the current technology in the insurance industry doesn’t allow for the correct storage of data, and we could get into all sorts of reasons why that is. So that’s part of the problem. But the other part of the problem is,
Syd Roe 23:03
this idea of making or creating data. Like, data doesn’t just appear. It’s not like you can just say, well, I want to know all this stuff about my business, snap my fingers, hire this amazing data scientist, and then, you know, have them spit out all these immediate, amazing, you know, foresights about what’s going to happen, or insight into my business. Like, you’ve got to spend the time, like you said, to actually create or make that data. And so I guess just getting into the idea of, okay, this data, this is something you have to create, right? Like, from what I understand, there’s two main buckets, and you correct me where I’m wrong here because, like I said, I will probably botch this up. But from what I understand, there’s two buckets, structured and unstructured. First of all, are those the two buckets? That’s all the reading I’m doing on, you know, data scientists dot com, right? And then could you give me like a quick explanation of what those two buckets are?
Briana McClain 24:07
Sure. So structured and unstructured data, yeah, I would say those are two solid buckets. And I’ll try to explain this the best way possible.
Briana McClain 24:19
So a resume and an application are pretty similar, right? Like you’re basically filling out the same stuff, right? Does that make sense? Like if I provide my resume and then I fill out an application on a company’s website, I’m usually filling out the same exact stuff: my name, my email address, all the places I worked, where I went to school. But one is structured and one isn’t. So when I go to this company’s application, they have specific lines, and typically they have these specific things to fill out, and then they have dropdowns, right? So like, when I put my education, I can only select between boxes. There’s high school, master’s, PhD. Whereas when I put it in my resume, I could write anything. I could write bachelor’s, I could write master’s, I could write PhD, I could spell out all those things. I could write MBA, I could write whatever I want. So the difference between structured and unstructured is, you know what’s going to come from structured data. You know that when the answer for date of birth comes up, you know what that is, because you know it’s going to be in 11 dash 13 dash 1988. You know it’s going to be month, month, day, day, year, year. But with unstructured data, it could be anything. It could be November 13 spelled out. It could be the 13th of November 1988. It could be however you want. Like, there’s no way to filter what that says. You don’t know how to bring it in, because you don’t know how it’s gonna be written. So imagine trying to parse through a written application, where you know what’s coming, you know what they’re sending you, versus me writing you a letter. And I’m answering the same questions, right?
I’m still telling you where I went to school and what my email is and what my name is and what my address is, but one is in specific, spelled out fields, and one is in the form of a paragraph. Does that make sense?
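A small sketch of the date-of-birth contrast above, assuming the structured form always delivers `MM-DD-YYYY`. The list of guessed free-text formats is hypothetical and deliberately incomplete, which is the point: some way of writing a date will always fall outside it. Month names are assumed to be English (the default locale).

```python
from datetime import datetime


def parse_structured(value):
    """A form field that always arrives as MM-DD-YYYY needs exactly one format."""
    return datetime.strptime(value, "%m-%d-%Y").date()


# Formats we guess a person might type; any real list would be incomplete.
GUESSED_FORMATS = ["%m-%d-%Y", "%m/%d/%Y", "%B %d, %Y", "%d %B %Y"]


def parse_free_text(value):
    """Try each guessed format in turn; return None if none of them match."""
    for fmt in GUESSED_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None


print(parse_structured("11-13-1988"))                # 1988-11-13
print(parse_free_text("November 13, 1988"))          # 1988-11-13
print(parse_free_text("13 November 1988"))           # 1988-11-13
print(parse_free_text("the 13th of November 1988"))  # None: no guessed format matches
```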
Syd Roe 26:30
It does. That was amazing. Yes. No, that totally makes sense. And I mean,
Syd Roe 23:37
I guess my question being, so it seems like structure is subjective, or relative to the business who’s trying to understand the data. And so I guess, what does that mean? You know, let’s just say, for example, you have a business that’s got a big pile of data, and 40% of it is structured, 60% of it is unstructured. What does that mean for the business’s ability to make sense of or predict things from this data that they’ve made?
Briana McClain 27:10
So it leads to making it easier to use, for sure. It also leads to it being more accurate. So if I am allowing free text, so, we’ll go back to the date of birth, or consider a purchase date for something that you’re buying. So when I go on to Amazon, and say I’m filling out some refund information, and I’m putting the purchase date in, and I have to select from a calendar that basically puts it in the format that you want. So I go on, I select from their little, you know, little calendar buttons they have on there, and then you select your date. Well, then it puts it in the format that it wants. So now every time Amazon gets a return date, it’s going to be in that format. But say they didn’t have it like that. Say they had a process where they just had an open field, because it’s easier for a programmer to just create an empty space for you to type in than it is for them to put the calendar on there and create a button and format. So instead, they still have a legacy system where it’s just an open field. And I decided that I’m going to put in 11/13/88, but I wasn’t paying attention, and I accidentally put 11/13/38, and now they’ve got a year that doesn’t make any sense. So what do you do? Like, how do they correct that? They just guess, or they drop that data and they don’t use it, because the data is not accurate. So same thing with, like, say you have a system that allows you to autofill someone’s social security number. So this person puts in all their information, and then it goes and gets their social and then matches it to them, puts it on their application. Okay, cool. That’s great.
Now we have all of this user’s information. But if it’s a situation where I have to put my own social security number in, and I don’t really care about the end result and accidentally leave a number out, or I accidentally put a B in there instead of a number, now at the end of the process, when you go get my information, you have garbage, because it’s not a valid social security number. So structured data leads to the ability to validate your data. It also leads to the ability to read it correctly. So say you’re looking at complaints from someone at your store, or whatever, and they write up a letter, or a paragraph. Like, for the human eye, it’s easy to read that paragraph. But a computer doesn’t know how to connect things. Like, say you’re talking about multiple purchases. They don’t know, when you said it was a really bad product, if it’s referring to the item you bought last month or the item that you bought this month. Like, they don’t know. But if you had a structured way, where you had boxes that you filled out and you answered questions that way, now it’s probably easier to track that data and retain it and make use of it. But it’s really hard to make computers understand free text and handwritten notes. That’s called unstructured data. And, I mean, the industry is catching up and they’re learning it. They’re basically figuring out how to write code, it’s called natural language processing, which is a lot of what I do now, how to write code to basically understand linguistics and language. But it’s hard, and we’re not there yet.
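A rough sketch of the kind of validation a structured field makes possible, using the two examples above: a mistyped two-digit year and a social security number with a letter in it. The function names, the `MM/DD/YY` format, and the fixed "today" are assumptions for illustration. The SSN check only tests the shape of the value (digits and dashes), not whether the number was ever issued.

```python
import re
from datetime import date, datetime

# Shape of a US social security number: three digits, two digits, four digits.
SSN_SHAPE = re.compile(r"\d{3}-\d{2}-\d{4}")


def has_ssn_shape(value):
    """True only if the whole value looks like ddd-dd-dddd."""
    return SSN_SHAPE.fullmatch(value) is not None


def is_plausible_purchase_date(value, today):
    """True if the value parses as MM/DD/YY and is not in the future.

    Python's %y maps 69-99 to 1969-1999 and 00-68 to 2000-2068,
    so a mistyped '38' becomes 2038 and is rejected as a future date.
    """
    try:
        parsed = datetime.strptime(value, "%m/%d/%y").date()
    except ValueError:
        return False
    return parsed <= today


today = date(2020, 1, 1)  # fixed so the example is repeatable

print(has_ssn_shape("123-45-6789"))                      # True
print(has_ssn_shape("123-45-678B"))                      # False: letter typed by mistake
print(has_ssn_shape("123-45-678"))                       # False: a digit was left out
print(is_plausible_purchase_date("11/13/88", today))     # True
print(is_plausible_purchase_date("11/13/38", today))     # False: year 2038 is in the future
```

A check like this can flag the bad row, but it cannot recover what the person meant to type, so the choice is still between guessing and dropping the record.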
Syd Roe 30:52
What’s up, guys. Unfortunately, technology sucks sometimes. Our audio went out right in the middle of our conversation, but
Syd Roe 31:01
we did save the day. We switched to another platform. The audio might not be as crisp and fresh. So I do apologize for that. But I wanted you guys to be able to get the download on the rest from the awesome, the one and only Briana McClain. So here you go. So with like the natural language processing, and this is just like personal curiosity. You know, we had some builders in the office like two weeks ago, and we’re interviewing them to build this piece of tech on top of neon. And they had mentioned natural language processing. So I guess my question being, is this the ability, in layman’s terms, for technology to help turn unstructured data into structured data?
Briana McClain 34:58
So, in a way, yeah.
Briana McClain 32:03
So say you wrote me a letter. Or think about this: say someone’s writing a report on something that happened, and they write it up on a piece of paper, in paragraphs. Well, I can’t really use that data in a computer, because the computer doesn’t know how to read English. So say the important things that you want are location data, birth dates, social security numbers, et cetera. One thing natural language processing uses is something called regular expressions. You can use a regular expression to look for something specific, because a lot of these things have specific forms, right? An address typically has numbers, then a word, then some type of suffix like Boulevard or Street, and then a city, state, and zip code. An even simpler one: when you’re looking for a date, you can tell your regular expression to look for two numbers, a forward slash, two numbers, a forward slash, and four numbers, and it will only find that pattern. It’s a way to search for patterns in your data. And once you have, say, all the social security numbers saved, you can put them in a table, which can be exported into Excel or CSV or something. Now you have a structured data set of addresses, social security numbers, dates of birth, or whatever it is. So it’s a way to pull out things that have a specific pattern.
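The date pattern described above (two numbers, a slash, two numbers, a slash, four numbers) can be written out as a short Python sketch. The sample report text, the function names `extract_rows` and `to_csv`, and the two-column table layout are all assumptions made for illustration; they are not from the conversation.

```python
import csv
import io
import re

# Two digits / two digits / four digits, e.g. 03/14/2019.
DATE_PATTERN = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")
# Three digits - two digits - four digits, e.g. 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def extract_rows(report: str) -> list:
    """Pull every date and social security number out of free text."""
    rows = []
    for kind, pattern in (("date", DATE_PATTERN), ("ssn", SSN_PATTERN)):
        for match in pattern.finditer(report):
            rows.append({"kind": kind, "value": match.group()})
    return rows

def to_csv(rows: list) -> str:
    """Turn the extracted rows into CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["kind", "value"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical free-text report.
report = (
    "The incident was reported on 03/14/2019. The customer, "
    "born 07/02/1985, gave the number 123-45-6789 at the desk."
)
rows = extract_rows(report)
print(to_csv(rows))
```

The patterns only find text that has exactly this shape. A date written as "March 14th" would be missed, which is the limitation of pattern matching on its own.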
Keep in mind, though: what about the things that don’t have a specific pattern, like names? How do I distinguish your name from a city or a state or a location or a thing? A newer thing that’s happening now, that I’m trying to get a little more information on and don’t know very well, is called named entity recognition. It’s basically a more advanced part of natural language processing. It can look at the sentence and find the verbs and nouns and subjects, and use that context to pull out more information. So say there’s a sentence in a report that says, Briana McClain committed this crime on this day, and afterwards she went to see Sydney. When I pull out the date, am I connecting the crime to Sydney or to Briana? The computer doesn’t know that at all if you’re looking at just straight regular expressions. All they know is to pull out words like crime, date of birth, and location. What named entity recognition can do is look at the context and say, I think Briana is the subject of this sentence, because her name is at the beginning. She’s the subject, meaning she’s the one that committed the crime, and the date goes with her name. So NER basically allows you to use the context of a sentence to better read the language.
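The linking problem above can be made concrete with a toy Python sketch. This is not named entity recognition, which in practice relies on trained language models. It is a deliberately naive stand-in that treats the first name in the sentence as the subject, purely to show what "using context" adds over plain pattern matching. The name list, the function names, and the sample sentence are hypothetical.

```python
import re

DATE_PATTERN = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")

# Hypothetical list of people a report might mention.
KNOWN_NAMES = ["Briana", "Sydney"]

def names_in_order(sentence: str) -> list:
    """Return the known names found in the sentence, in order of appearance."""
    found = [(sentence.find(name), name) for name in KNOWN_NAMES if name in sentence]
    return [name for _, name in sorted(found)]

def link_dates_to_subject(sentence: str) -> list:
    """Toy rule: attach every date to the first name mentioned in the sentence."""
    names = names_in_order(sentence)
    if not names:
        return []
    subject = names[0]
    return [(subject, date) for date in DATE_PATTERN.findall(sentence)]

sentence = (
    "Briana committed this crime on 03/14/2019, "
    "and afterwards she went to see Sydney."
)
# Pattern matching alone yields two names and one date, with nothing
# connecting them; the toy rule ties the date to the first name.
print(names_in_order(sentence))
print(link_dates_to_subject(sentence))
```

A rule this simple breaks as soon as the sentence is phrased differently, for example in the passive voice, which is why real systems look at the grammar of the sentence rather than word order alone.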
Syd Roe 35:17
That makes sense. So I’m going to throw an example out, and I’m curious if my understanding of this is right, in terms of why natural language processing has been created. We were talking with these builders about this, and by the way, it’s like three years down the road; this isn’t going to be in version one of this technology. But insurance is a risk-based business. If you can determine whether someone is more risky or less risky, that might increase their premium, or that might make them someone you don’t want to do business with if they’re beyond a certain risk threshold. So understanding the nature of risk as it relates to individuals is incredibly important. And what’s crazy is that the amount of data we’re putting out on the internet about ourselves, in aggregate, could give you a hint of that risk profile. But the problem is that the information we put out about ourselves, if someone isn’t, like we talked about before, creating it with a certain standardized process, is going to be completely unstructured, right? Stuff we put on social media, stuff we text back and forth, phone calls that we put in to businesses, whatever. These are all forms of unstructured data. So I guess it’s interesting that as the world moves forward, that volume of unstructured data just gets higher and higher. And the ability and opportunity to use some of that inside our own businesses could be incredibly valuable. So I’m just wondering what your thoughts are on that. There are so many different angles you could take. Is it right? Is it wrong? Are we going to keep creating data? What does that mean for the world? But there’s just a lot of unstructured data out there.
What does it mean for businesses? And I guess, is that why natural language processing has evolved, because we do have this volume of unstructured data out in the world?
Syd Roe 37:56
That was a lot by the way.
Briana McClain 37:58
Yeah. Essentially, if you look at it from afar, unstructured data is typically free. Structured data costs, because someone put effort and thought, a previous thought process, into that data, right? They thought about how we’re going to save it and how we’re going to have someone collect it, and they paid for that. So that’s usually something they will hold close to their chest. But free text data, unstructured data, is typically like a note that somebody put in the trash that they didn’t care about, or, look, files. Nobody’s using files that are in a file folder anymore. That stuff people just give away. I don’t care what you read on my Facebook. But you should start to care. Because now, when you go to get insurance and you lie and say, well, I’ve never smoked a cigarette before, I can go through your Facebook and use image detection to see a picture of you smoking a cigarette, or I can look at your posts where you’re talking about cigarettes. Now the insurance company would say, we don’t want to take on that guy. Even though he says he didn’t smoke, I feel like, based on these things we found on his profile, which was free, that he did. So that type of stuff is huge. It basically is going to create a situation where there’s no privacy for anyone. And it’s also going to create a situation where data is king, where the more data you have, the more it’s going to become like currency, like we talked about before. If you have it, you can sell it to people. And that type of thing is already happening, right? Think about your credit card company or your bank. If you have online banking, they can basically see everything that you bought and everywhere you went.
And that’s their data, because you’re using their system. So don’t you think that they could sell all that data? Everything that Sydney’s bought for the last year, we could sell to Amazon. Now Amazon can improve their algorithm to sell you things, by knowing what to show you, because they have that data. And you don’t even know that it’s happening. But it’s there. It’s no longer yours. That creates the privacy issue. So data, I think, is going to control everything. It’s going to really eliminate privacy. And it’s going to allow these companies to become
Syd Roe 40:23
Do you ever just get depressed about thinking about this? I mean, like, just sitting back, and I think it’s cool, super cool, but it’s also depressing. I mean, what do you think?
Briana McClain 40:38
Um, on one hand, sure, I think it’s a little invasive and maybe depressing. But I also think the fact that we can move the needle on things that we need to, like terrorism and car accidents and all of these things that we could help, is somewhat inspiring. And I think they kind of cancel each other out. Do I think it’s dangerous, how intrusive it can be into people’s lives? Sure. But as this continues to happen, there are going to be more strongholds, ways that you can keep your data close to your chest and not allow them to take advantage of it. As more companies get more power, more policies are going to come out about how to protect people from these types of things, in my opinion. The same way that Facebook isn’t allowing certain things to be taken by Google or the internet or whatever, there are certain things that companies will do to protect people, because that’s also going to become something that people are interested in. In the same way that companies and agencies are interested in the data, individuals are going to be interested in only using sources that allow them to keep their lives private. So, you know, there might be a bank that pops up that announces, we’re going to be the only bank that doesn’t sell your data, or we’re going to be the only bank that doesn’t retain your bank statement data, or something like that, you know?
Syd Roe 42:02
Dude, yes. Yes. So I read the CCPA, the California Consumer Privacy Act, a couple months ago, and did a video on it like a month ago. And I was just thinking about how your data privacy practices as a business, how you use someone’s data, are going to be a competitive edge in the future. Because the CCPA doesn’t really put any restrictions on businesses. It just says you have to be more transparent in the way that you use data, so that consumers can determine whether or not they want to do business with you, which is, you know, a very American way to think, right? Freedom of the marketplace. And so it does make me think that for the businesses who are staying in tune with what’s going on, especially small businesses, this is where they kind of carve out a bit of a niche. These big guys are basically like, we can’t not use data. Amazon can’t not use it. There’s no way that they’re going to be able to separate themselves from using data in the future. They need it in every piece of their business. But a small business can say, yeah, well, you could buy from us, and it’ll be a completely private transaction.
Syd Roe 43:28
I just, I 100% agree. I think that’s such a fascinating concept, to use it as a competitive edge.
Briana McClain 43:37
Yeah, for sure. On both ends. Yeah.
Syd Roe 43:40
Well, so, um, in terms of what small businesses can do.
Syd Roe 43:48
Data is a huge topic. I mean, everything we just talked about, I feel like, is barely scratching the surface. But it feels like one actionable takeaway is to think about the processes you have in your business for making data. Like, if you were talking to a small business, and they’re like, Bri, where do I start with this? I don’t even know how to begin. Would a successful first step be to just say, look at your business process and think about the way you’re creating data?
Briana McClain 44:23
Yeah, yeah. And I think the one edge that a small business has over a big one is that they can change those processes a lot easier than a large business can. So, you know, a small insurance agency or a small hospital or a small doctor’s office can change the way that they keep their data, the way that their clerks save and retain things, a lot easier than Kaiser Permanente, that’s all around the world, everywhere. It is going to be really, really difficult for them to make changes to their legacy system that’s been running for years and years, and they have to retrain all these people. But a small company that hasn’t even created these processes yet is able to make some changes and start out fresh. And I think that is one edge that they could have over large companies.
Syd Roe 45:12
do that. Yeah, no, go ahead. Go ahead. Go. I was
Briana McClain 45:16
gonna say, but I mean, you have to have the data. That’s the issue with some of these small companies. They don’t necessarily have the data, but they could. And it’s a really super cool, important idea, once you start getting it, to ask that question: well, now that I’m getting it, what can I do? How should we get it? What’s the best way, so that we can utilize it? Because even a smaller amount of data, like a percentage of, say, Kaiser Permanente’s data, is way better than having all of Kaiser’s data if it’s crap because it wasn’t done with the right process.
Syd Roe 45:52
Well, I hope, I hope everybody who is listening perked up over that because that’s
Syd Roe 45:58
That’s powerful. I mean, that seriously is, because I think when small businesses hear about this data talk, they immediately think Amazon, Google, Facebook, and it’s, you can’t compete there, right? It’s basically, oh, data is gonna come along and it’s gonna put everybody out of business, because, you know, big companies have a leg up. And I just don’t know that that’s necessarily true. I think this whole thing is going to evolve, and if you’re paying attention and you’re taking the right steps and being intentional and, you know, preparing yourself, I think there’s actually a lot of room for small businesses to play.