Vince: Anyone played around with that stuff? Natalia, you've heard Claude is the best. Monique, you have ideas but would like to try it. Jacqueline, what's funny about that? Two ways. Like the prospect. But the other thing that's funny is like-- maybe any relationship with AI is unhealthy. Right, like I still don't know if-- are these tools good for my brain and my emotions? I still don't know. And Alex, you used an agent that made a slide deck? Was it compelling? Or like a roller coaster? Alex: It was awesome. I worked on the structure of the presentation and stuck it in Gamma, and it did all the slides and created animations. It is like a brave new world. Vince: Amazing. When they work, it feels like magic. And when they don't work it feels apocalyptic. Alex says you trust Copilot with your files-- yeah, e-mail is like-- if I ever allow it to use my e-mail, I feel like I'm years away from that. You've mentioned trust. You saw Claude in the news-- don't tell Claude things that you're also telling your lawyer. Not protected by attorney-client privilege. Right. That speaks maybe to the unhealthy relationship that Jacqueline is talking about as well. Important and underestimated. You know, I say this as if I'm talking to all of you. Maybe some of you are using Claude Code right now. And one of you is going to write back with a bunch of em dashes. That's a great point, Vince. Certainly. Absolutely. I think we're able to get started. Welcome back everyone. I'm so happy to be here in BI 107. This is usually a pretty fun place to reconnect, because last time we talked, BI 106 week three, the data was mostly imagined, or barely glimpsed. And now most of you are in the data collection phase. The project has launched for most of you. Maybe just a quick shoutout in the chat. Is that the case? Are most of you in the stage where things are launched and data is being collected, maybe you've seen it? Jacqueline, yeah-- I guess you could have AI do your capstone work. 
I don't know what's the best way to build a relationship with the client. Oh yeah, for months, Kate, that's great. Isaac has data. Not to brag. Is it good data, Isaac? Conrad, Air Force team, yes, is collecting data. Andrea got the data back yesterday. That's like a-- kind of like a Christmas or birthday, it feels like, when I get the data back. Ooh, what do I see? In the collection phase, haven't done anything with it yet. Great. Data back in the drive, great. Awesome. Hopefully BI 106 was helpful. You're now transitioning from 106 to what's next. I've got my data. I can calculate a mean or proportion. Or standard deviation. Create box plots. What do I do next? That's what BI 107 is all about, so this is a good time for us to have the class. It has the very catchy subtitle: inferential tests of difference. Comparison is at the heart of behavioural insights, and that point is not as obvious as it sounds, but mostly what we do in behavioural insights is comparison. Compare treatment to control, pre to post, site one to site two. The idea of difference is really key. That part's probably intuitive. Inference. We'll be talking a little bit about inference over the next two weeks. We're going to think about it more precisely and maybe critically: what we might mean, and what inference is. Let's start with a land acknowledgment. At UBC Vancouver we're located on the traditional and ancestral unceded territory of the Musqueam and Tsleil-Waututh peoples. Very honoured to live, work and play on their territory. In BI 106 we talked about data planning. We had a written assignment at the end that was all about that. Data processing: okay, the data has come in, what's the first step that I do? How do I describe it? We talked about histograms, box plots and measures of central tendency. Descriptive statistics. And we did sort of the basics of data visualization. How would you at least look for patterns in the data? Or how would you visualize the kind of data that you have? 
BI 107 is, I think, a very interesting, rich, cool step forward. We're going to think first about samples versus populations. This is an idea I'm going to turn to a few times in BI 107 because it's a pretty important one. We'll talk about inferential tests of difference. As I say, comparison and difference are at the heart of most behavioural insights projects. We'll talk about inferential tests of association: when we have two variables and we want to see if they are correlated, or if there's a trend between the two of them. You folks have probably heard about statistical significance before. Very technical term. Substantive significance is another term that gets used a lot. We'll talk a little bit about the difference between these two things. You can have results that are statistically significant and substantively insignificant. You can in theory have the opposite: statistically non-significant results that are still substantively important. We can talk about that edge case. We'll talk more about how you report some of the results. A lot is about testing. So if you've done the readings for this week and next: testing different models and different assumptions of models. How do you present and talk about that so other people in the audience, other behavioural practitioners, know what to expect and can understand what you're seeing? That gets to your credibility as behavioural insights practitioners. Ultimately that's all that we've got. We'll talk about how you bundle all of this together to report results and hopefully visualize results so you can establish your credibility as researchers. I love BI 107 because it's the closest to what I do in my day-to-day. I spent part of today at a departmental meeting, and when it got kind of boring I opened up my data and I was doing inferential tests. Always happy to talk about it. And you do need a foundation in BI 106 to think about the stuff in BI 107. So I'm always excited here. 
When you're done BI 107 there's a few things that I think you should probably be able to do. One of them is figure out-- given the data that I have and the design of my study, what's the right test? How would I look at a test of difference or association? Once I know what the right test is, how would I go about and do it? Once I've done the thing, how do I interpret the thing? Once I interpret the thing, how do I report the thing? And because visualization is really important, once I report the thing, how do I visualize the thing? So we'll talk a bit about the thing and we'll talk about all these others. Someone says our days are very different. I feel very lucky. But I recognize that some people would not think that's a great day, to be like, oh-- I'll just do stats and data. I do love it. My high school self would be shocked, but yeah, I definitely do. We have-- okay, it kind of looks like a big agenda. Twelve items is pretty big. A lot of these things are nested. t-tests: paired, independent. We'll talk about ANOVA, which is pretty closely related to a t-test. Degrees of freedom is a concept that comes up throughout. My point is it's a packed agenda, but not that packed, and the activities are really designed to focus on the things that you need to get from week one to week two. Mostly on tests of difference like t-tests, and we'll just touch on ANOVA. And we'll talk about what it means. It's in capital letters, so you're probably thinking that's an acronym. You would be correct. Analysis of variance. We'll talk about that when we get to it. So you can think about today as laying the foundation. We'll return to some of these concepts next week and the week after. In other words, if in an hour and ten minutes you're like, I am so lost, so confused, that's okay. We have office hours, we can chat. You can send me e-mails, and we'll return to these same concepts next week. So we'll go slow and together. Let's talk about descriptive and inferential stats. 
Descriptive stats are probably a little familiar because that's largely what you did in BI 106. Think about it this way. Most data problems are missing data problems. We want to take the data we have and make inferences about data we don't have. We'll call that the population. So if you want to know whether people prefer chocolate to vanilla-- which maybe feels intuitive, maybe vanilla is the standard classic-- you would sample and make an inference from the sample to the population. Most behavioural analysis is inference. You want to go from data we have to learn things about data we don't have. Descriptive statistics are how we describe the sample, so that readers and different audience members know, okay, these are the boundary conditions on what we can actually talk about. So let's say we look at people who like ice cream and we want to make an inference to people who we didn't talk to about ice cream. The average age in our sample is 40 years old. So the first question you might ask is-- does that line up with the population? Is the average age in the population 40 years old? Not necessarily, right? And so how confident we are is partly a function of sample size. It's a function of a few other things, but sample size is really important. You can think of it this way. If we had the full population and we calculated the mean, we would know the average age, because we have the full population. As we go from the full population to a smaller sample, we see the missing data problem in action. If you have a large sample-- we can talk about what a large sample is. A thousand people, surveyed in a reasonably random way. We haven't surveyed a thousand in old folks' homes or at day cares. We have a reasonably random sample. It might be close to the population average. And that basic inference from sample to population is at the core of BI 107. So in 106 we talked about how to describe the sample. We talked about things like looking at frequency tables. Calculating histograms to visualize things. 
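The sample-versus-population idea above can be sketched with a quick simulation. Everything here is invented for illustration (a made-up population of ages, not any course data set): we build a "full population" we happen to know, then see how a small sample and a large sample estimate its mean.

```python
import random
import statistics

random.seed(42)

# A made-up "full population": 100,000 ages, roughly uniform between 18 and 80
population = [random.uniform(18, 80) for _ in range(100_000)]
true_mean = statistics.mean(population)  # we only know this because we built the population

def sample_mean(n):
    """Mean age of a random sample of size n -- the 'data we have'."""
    return statistics.mean(random.sample(population, n))

small = sample_mean(20)     # a small sample: can land far from the truth
large = sample_mean(5_000)  # a large, reasonably random sample: usually lands close

print(f"population mean: {true_mean:.1f}")
print(f"n=20 sample:     {small:.1f}")
print(f"n=5000 sample:   {large:.1f}")
```

Running this a few times with different seeds makes the point concrete: the n=20 estimate jumps around, while the n=5,000 estimate stays close to the population mean.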
Different kinds of outcome measures that we might have. Binary: we might not look at the mean, but the proportion or the percentage. If we do have a continuous outcome measure we might look at the mean, the median, the mode. If you are curious about dispersion or distribution we might calculate the variance, the standard deviation. We could create the box plot, which would give us the interquartile range: where most values in the distribution were found. And we found the full range itself, min to max. We also, and maybe it goes without saying, would have the sample size. The n. We talked about how the letter n is the way to connote the size of our sample. We had another-- two variables where when one goes up, the other goes up: there's a strong correlation, either positive or negative. We talked about the correlation coefficient. All of these concepts are building blocks for inference. So in 107, we're going to talk about how we make these inferences. We'll talk about things like the standard error, which is often abbreviated to just SE. It's one way of thinking about the confidence that we have that the data we observe in our sample lines up with the data in the full population. We'll talk about a 95% confidence interval: a range of numbers within which we have some confidence, or some belief, that observations would typically fall if we were to collect data again and again and again. We'll also test whether the differences we observe-- between the proportion of people who like chocolate and the proportion who like vanilla, between those over the age of 40 and under the age of 40-- whether these kinds of differences are significant. And we'll talk about different t-tests, differences of proportion, analysis of variance and chi-squared tests. And at the end we'll push on this idea of association between variables: correlation, linear regression, and the chi-squared statistic. So in other words, BI 106 lays the building blocks, the grammar, for 107. 
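The standard error and the 95% confidence interval previewed here can be computed by hand. A minimal sketch with invented ages (note the 1.96 normal critical value is only an approximation; for a sample this small, JASP would use the t distribution instead):

```python
import math
import statistics

# Hypothetical sample of ages (invented numbers for illustration)
ages = [34, 41, 29, 52, 38, 45, 31, 47, 40, 43]

n = len(ages)
mean = statistics.mean(ages)
sd = statistics.stdev(ages)   # sample standard deviation (n-1 denominator)
se = sd / math.sqrt(n)        # standard error of the mean: sd / sqrt(n)

# Approximate 95% confidence interval: mean plus or minus 1.96 standard errors
ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se

print(f"mean = {mean:.1f}, SE = {se:.2f}")
print(f"95% CI is roughly ({ci_low:.1f}, {ci_high:.1f})")
```

The interval is the "range of numbers within which observations would typically fall" reading: if we re-sampled again and again, intervals built this way would capture the population mean about 95% of the time.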
To create poetry-- I don't want to twist the metaphor, but we build on those building blocks that we talked about in 106. Okay. Let's say we collect data from our sample. We want to compare it to see whether the difference that we observe between chocolate and vanilla eaters holds in the full population itself. If you think about it more, like if you step back, this idea about comparing differences in samples-- the proportion of people who like chocolate over the age of 40 and under-- maybe you want to look at the same individuals. The same group: whether they like chocolate before and after a rich dinner, and we think yeah, something about rich, salty, fatty foods changes the kind of ice cream preferences people have. Now we're not comparing two different groups. We're comparing one group over time. A paired sample. A sample before and after some intervention. And we want to know, yeah, people who have a really rich dinner, they tend to like vanilla. Maybe they think it's a palate cleanser, and in the sample we observe, that's what we found. And we also want to know how much evidence we have for what we see. We're going to think about whether a difference of the same size would show up again. Has anyone heard of this p value before? Maybe everybody, in a stats paper? p less than .05. Isaac says yeah, you've heard of it. It's kind of confusing. But the p here really just stands for probability. Yes-- thank you-- and now I know someone did the readings. Yeah, p is just probability. In this case it's the probability that a difference of the same size we observe in the sample would be observed if there was no such difference in the population. So sampling this larger population, if in fact there's no correlation between rich, salty, fatty dinners and ice cream preferences, what are the odds that we would then observe that same difference in our sample itself? There's a few things that go into this probability value. One is the size of the difference. 
If we find in our sample that a huge number of rich-fatty-food eaters like vanilla ice cream, okay, maybe that's real. The effect is so strong that even in a smaller sample, a flawed sample, not even a perfectly random sample, we still pick up on this more general true difference in the population. Variability. It makes less sense if we think about it as a proportion, but think about a large standard deviation-- a big standard deviation in the sample. One way we can tame variability is sample size. We talked about that in BI 106: if the sample size goes up, we have more power. In this case, more sample size will shape what the p value is. These are all in some weird mathematical dance with one another. If there really is a difference in the population, what are the odds that we observe it in the sample? Or if there is no difference in the population, what are the odds that we still observe one in the sample? These are determined by these three factors. So far so good. But the truth is, p values are pretty arbitrary. For reasons that have more to do with history and sociology, scientists have picked an arbitrary cut-off of p less than .05. That is, we've arbitrarily said that if we were to draw a sample from this population, 95% of the time we would find the true difference between these groups-- between the averages or the proportions or whatever it is. 95% of the time we would get it right. 5% of the time our estimate would be wrong. That has led us to think about p values as either significant or non-significant. It was above .05? We say this is not significant. And this is very arbitrary. This is definitely debated among scientists. Should we keep doing this? The truth is it's a convention. A normal expectation. If you show a plot and you use JASP to calculate p values, people will expect you to use a cut-off of .05. 
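The definition of the p value given here -- the odds of seeing a difference this big in the sample when there is no difference in the population -- can be made concrete with a brute-force simulation. All numbers are invented for illustration (a pretend 20-point observed gap, 50 people per group), not any real study:

```python
import random

random.seed(1)

# "Null world": ice cream preference does NOT depend on dinner. Everyone is
# 50/50 chocolate vs vanilla no matter how rich their meal was.
def simulated_difference(n_per_group):
    rich = sum(random.random() < 0.5 for _ in range(n_per_group))   # vanilla fans after rich dinner
    light = sum(random.random() < 0.5 for _ in range(n_per_group))  # vanilla fans after light dinner
    return abs(rich - light) / n_per_group                          # gap in proportions

observed = 0.20   # pretend our real sample showed a 20-point gap
n = 50            # 50 people per group
trials = 10_000

# p value by brute force: how often does a null world, where nothing is going
# on, still generate a gap at least as big as the one we observed?
extreme = sum(simulated_difference(n) >= observed for _ in range(trials))
p_value = extreme / trials
print(f"simulated p is roughly {p_value:.3f}")
```

With these made-up numbers the simulated p lands in the neighbourhood of .05, which is exactly the borderline zone the .05 convention draws a line through.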
You can talk to your audience about whether this makes sense in your case, but generally this is the expectation. Whenever possible though, JASP will give you exact p values, and that's what you should report. What's behind the norm-- is it just status quo bias? Yeah, it could be that people have coordinated on a norm. We've just decided, well, if we are ambiguous about what a good cut-off value is, it's going to lead researchers to do shoddy practices. So we're just going to pick an arbitrary threshold because it's a binding constraint on future behaviour. There's lots of behaviour in there. A surprising number of academics have written about this. A collective expectation. The behavioural practice around p values. If you want to estimate the average age, the chocolate preference, the proportion in your data, JASP will give you a p value. Sometimes that p value will be so small it just says .000. Other times you're going to get a p value: .04, .32, .5, .01. The best practice is to report the exact number, rather than just saying it was above .05 or below. In my own research it's a little annoying-- I always have to look at the p values and say it equals .032, and sometimes it's very small, less than .05 because it's out at the fifth decimal point. Your mileage may vary. When our p values are above .05, it doesn't mean there's no difference. In fact the language we sometimes use, that I use, is: there's no evidence of a difference. In other words it's kind of like saying, I don't really know. There are techniques to estimate more precise null effects-- to say there is no effect and I'm confident about that. But generally we talk about whether there's evidence in favour or no evidence in favour. A non-significant result, a p value above .05, doesn't mean there's no effect. It means there's no evidence of an effect. What if there's no difference in the actual population? The one we don't observe. We don't get data on them. But there really is no difference there. 
That's going to generate, in expectation, over multiple samples, high p values, because there's going to be no real difference in the population, and drawing a good random sample over and over is going to show that. But there might be other things that can generate a non-significant result from an inferential test. What might that be? Natalia is hinting at some of the behaviour here. Other than a true null effect-- a true non-difference in the population-- why else might one of our inferential tests generate high p values? Conrad? >> I think if you have a low sample size, then basically the way the test is structured, the sample size raises the standard for when it's considered statistically significant. >> Yeah, we talked about this in BI 106. I gave you an example of a telescope far away and one that was close, right? If you have a small sample size it's like you have a really tiny telescope, and you are trying to see a star that's far away. You don't have a lot of power. The result might just be due to random sampling: because you have a very small sample size you'll get more volatility. Your sample might just be a quirk-- you'd get a high p value and no effect. What else? Other than there being no difference? Aaron is saying it could be a flaw in the experimental design. Yeah, I could see that. What kind of flaw? What are you thinking? >> Just maybe something where you thought you had successfully done a random sample, but you didn't. Or something kind of went wrong in your data collection, potentially, too. Vince: Yeah, and you can see how this gets you to Conrad's comment. Like-- you have a sample of a thousand businesses, and you send the e-mail out, but then you get 800 bounce-back e-mails. It isn't really a thousand, right? You think you have a big sample size. Actually you have a smaller sample size, and now you have that same small-sample variability that Conrad was talking about. Your sample isn't really reflective of the population. 
That's essentially it, yes-- when you have a small sample, that's what's happening. You just might by random chance have drawn a sample that doesn't reflect the underlying population. And Kate, you're onto something here. The effect is small. Not that there's no difference in the population-- it's that there's a very wee little difference. In fact I would say this is very common. This is one of the most common things. You can control sample size to some extent: you may be able to sample over a greater time period, or you might be able to get a bigger sample. But you can't do anything about the effect size. That's given to us by the statistical powers that be. And if you are just studying something that has a very weak effect size-- and honestly I'm thinking something like whether my dinner is rich, which might not shape my ice cream preferences all that much. That to me sounds like a pretty weak effect. I have roast beef one night so I have vanilla. Roast beef the next night so I have vanilla, and soup the next night so I have chocolate? I don't know. Sounds like a small effect. A lot of behavioural science work involves small effects. We actually kind of bake it into how we talk about behavioural science: small tweaks, big impact. Yes, that's the best case scenario, but there are other scenarios where we have small tweaks, small impact. These are the cases where we might get a non-significant result from an inferential test. People might see a low p value and say this is highly significant. And then if someone's p value is .07 they'll say it's trending to significance, or it's marginally significant, depending who you are with. My advice: the p value is below .05, we see evidence of an effect. Above, we see no evidence. The minute we get into edge cases, or "extremely significant," "highly significant," you're going to lose some members of your audience and you are not gaining that much. So my advice is join the club. 
We accept the expectation and norm that .05 is an arbitrary but useful cut-off. It has huge problems, but many use it. We generally think about it as a fairly firm cut-off. Kate's asking about whether there's an issue of a large effect on a small number. Yeah, if you have a really big effect size, you can get away with a smaller sample. Kate: No, I mean like it only has effects-- a large effect only on a few people. Like my mom won't have ice cream if she's had a rich meal, but I will, and so will everybody I know. Vince: Such a great point, Kate. So two things. One is this: let's imagine our sample doesn't have a lot of moms, but the effect is strongest among moms, or parents. Maybe I did that-- my overall ice cream preference has gone up since I became a parent. Always craving carbs. So like, yeah, okay. There was this larger effect for parents, but our sample doesn't have a lot of parents. What do we think that's going to do when we calculate the mean or the difference or some kind of inference? Is it going to make the effect overall bigger or weaker if we don't have a lot of moms or parents in the sample? Like, the effect is only there for moms, and only 2% of our sample includes moms. Weaker, exactly. This has sparked a big literature on heterogeneous treatment effects. A more advanced concept, but yes, Kate, you are exactly right. When we get a small effect, we often don't know: is that because in the population the effect is small? Or is it because in the population there are heterogeneous treatment effects, and for some people the effect is really big? And we don't know. I've read through your BI 106 plans and several of your groups think the effect will be largest for particular groups-- we think effects will be largest for people that are younger. Some of you are already thinking about these heterogeneous treatment effects, and that's where the literature is going, so you can have more power and detect the smaller group differences. 
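The dilution Kate is describing is just weighted-average arithmetic. A tiny sketch with invented numbers (2% moms, an effect of 0.8 in that subgroup, zero elsewhere -- all hypothetical):

```python
# Invented illustration: the treatment effect exists only for "moms", a small
# subgroup, so the whole-sample average effect looks much weaker.
share_moms = 0.02        # moms are 2% of the sample
effect_for_moms = 0.80   # a big effect, but only in the subgroup
effect_for_others = 0.0  # no effect for everyone else

# The whole-sample average effect is the share-weighted mixture of the two
average_effect = share_moms * effect_for_moms + (1 - share_moms) * effect_for_others
print(f"average effect across the whole sample: {average_effect:.3f}")
```

A strong subgroup effect of 0.8 shrinks to a whole-sample average of 0.016, which is exactly why a small estimated effect can mean either "truly small everywhere" or "big for a few people we barely sampled."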
And if you go to a tech company, they'll use machine learning to do this at a very precise level. Users between the ages of 30 and 32 on a mobile device and blah blah blah-- these people really respond to coupons. In that world they use machine learning to do exactly what you're talking about, Kate. Great comment. Okay. I'm going to return to p values later in BI 107. Hold up, Hopkins-- questions about p values? We are going to return to p values. Let's lay the foundations. How do we figure out, from the difference in our sample, whether there is a difference in the population? To make the inference to the population. These are what are called t-tests. They are extremely common, and they are the go-to first approximation to figure out if a difference in means between any two groups is due to chance or not-- to get that p value for a difference between two mean values. And as it turns out, you can use them pretty darn well with proportions too. t-tests are incredibly flexible. Super robust. You'll see them in the most advanced statistical papers, and you'll see them in first-year introductory quantitative methods courses. They are extremely robust and flexible. We won't get into the math too much. There are different kinds of t-tests. First, the one-sample t-test, where you have one mean and you want to compare it to some benchmark value. We don't know if it is higher or lower. Then we'll talk about independent samples t-tests, where you have two groups like treatment and control. And we'll also talk about a paired samples t-test: the same group before and after, or the same group on two different variables. And to end, we'll talk about ANOVA. It is more like a family of tests. Or even an extended member of a larger family of tests-- close cousins, siblings separated at birth. We'll talk a bit about ANOVA and when you would want to use it. In fact, in one of the activities for class today I'll have you think about: is this a case for ANOVA or not? The one-sample t-test, again, is not the most common. 
But it's a really basic one, so it's where we're going to start. The formula is pretty simple. We take the mean that we observe in our sample, subtract the benchmark value, and divide by the standard error of the sample mean. That lets us compare the mean that we get to some fixed value. We want to know, okay, the proportion of women in my sample is .5. But I know in the population-- I look at the census or something like that-- the true proportion is .52. Is that difference between my sample and the population significant or meaningful? A one-sample t-test is going to let us do that. In JASP it's super easy. Here we'll just compare the estimated savings from an energy-efficient television. We'll see if it's different from zero. So in the variable box I add the TV estimate, and in the test value box I put zero. And you see the p value, which is so small that JASP just says, I don't know, it's less than .001. Now I'm going to do the same thing for a different value. Let's call it $150. You'll see all that changes is the test value. In the circle I've now put 150, and the p value is still less than .001. In other words, in these cases the estimated savings from the television are different from 0 and different from $150. Probably positive, if you think about how much money people might save. People generally think you save more than nothing and more than $150. That can be interesting as a basic descriptive inference. One sample? Yes. A one-sample test is different from a one-tailed test, yes. One-tailed is when you have a directional assumption, so it's higher or lower. Independent samples t-test: when you have two groups that you know are different. And the most common case here is experiments. So we have randomly assigned treatment and control. We know they are different. We want to see, for example, whether this BC Hydro sticker, this decal, actually decreases household energy use. And we measure a continuous outcome, like energy usage for the week. 
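The one-sample t-test JASP runs here can be written out directly from the formula: t = (sample mean minus test value) / (sd / sqrt(n)). A hand-rolled sketch with invented savings numbers (not the course data set):

```python
import math
import statistics

def one_sample_t(data, test_value):
    """t = (sample mean - test value) / (s / sqrt(n))."""
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    return (mean - test_value) / se

# Invented estimated TV savings in dollars, for illustration only
savings = [180, 220, 150, 300, 250, 190, 210, 170, 260, 230]

# Same data, two different benchmark test values -- just like changing
# the "test value" box in JASP from 0 to 150
print(f"t vs $0:   {one_sample_t(savings, 0):.2f}")
print(f"t vs $150: {one_sample_t(savings, 150):.2f}")
```

With 9 degrees of freedom the two-tailed critical t at .05 is about 2.26, so both of these made-up t values would come back significant: the sample mean is credibly different from $0 and from $150, the same qualitative story as the JASP demo.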
Some get the decal and some do not. The t-test formula is very simple: subtract the two means and divide by the pooled standard error across both groups. And this will align with the p value. When people talk about a t-test, this is usually what they mean. If people say, oh, we did a t-test, they are probably talking about an independent samples t-test. In JASP, also super easy. Under t-tests, click down to independent samples. You get the estimate-- in this case, I think this is the vacuum one. Savings from a vacuum cleaner. We're going to compare by gender. Not randomly assigned, but we know there are independent samples because we've coded them as male and female. And we'll standardize the effect. We talked about Cohen's d and standardized effect sizes in BI 106. We do that here: we see a Cohen's d and a p value that comes with it. The effect size is negative 0.277, a moderate difference. The p value is .151. Would you say there's a significant difference in the savings estimated by men and women, judging by these numbers? No-- what's the tip-off? Why no? And such a quick no. The p value is too high, yeah. What would the p value have to be for us to think that yeah, there is evidence of a significant difference in how men and women estimate the savings from vacuums? Isaac's on it. You can sort of see why researchers might start to use words like marginally significant. Because if the p value came back at .051, you might be tempted to be like, oh, it's almost, marginally, it's close to, nearly-- yeah. And Nathalia: hey, wait a minute, negative 0.277 is less than .05, right? You're exactly right that we're looking at two values here. One is the Cohen's d, the standardized effect size of the difference in vacuum cost estimates, and the other is the p value that goes with that effect size. But Nathalia, you're flagging a really good point here. Can my p value ever be negative? A probability value. Can my p value ever be negative? Aaron says no. No with a question mark, my favorite kind of no. No, it can't be. It's a probability. p, a probability value, is always bounded by 0 and 1. So there were two flags. 
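The independent samples t-test and Cohen's d can also be written out by hand. A sketch assuming equal variances (Student's test, the "Student" checkbox in JASP), with invented savings data chosen so the result is non-significant, like the vacuum example:

```python
import math
import statistics

def independent_t_and_cohens_d(a, b):
    """Equal-variance (Student) independent samples t statistic plus Cohen's d."""
    na, nb = len(a), len(b)
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)   # sample variances
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    se = pooled_sd * math.sqrt(1 / na + 1 / nb)  # standard error of the difference
    t = (ma - mb) / se
    d = (ma - mb) / pooled_sd                    # standardized effect size
    return t, d

# Invented vacuum-savings estimates by group (hypothetical, not the course data)
men   = [300, 280, 350, 260, 310, 330]
women = [315, 295, 365, 275, 325, 345]

t, d = independent_t_and_cohens_d(men, women)
print(f"t = {t:.2f}, Cohen's d = {d:.2f}")
```

Both come out negative because the first group's mean is lower, which is the same sign situation Nathalia asked about: the effect size can be negative, the p value never can. And with a t this close to zero on 10 degrees of freedom, the p value would sit well above .05 -- "no evidence of a difference."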
One of which was that minus sign-- but hey, that negative belongs to the effect size, not the p value. Your intuition is right. Okay. But you might ask me, wait a minute, Hopkins. Standardized effect size? I don't speak your fancy math language. What's the difference here in dollars? We have a continuous measure of vacuum savings. Why don't you use actual dollars, a meaningful unit? He doesn't know what .277 is-- can we give him a dollar value? Yes, in JASP it's called the location parameter. Not what I would have named it. You will get a mean difference: $208 between how men and women estimate savings. We can talk about the name a little bit. I don't know-- difference in groups? Maybe group difference point estimate? Group difference? Maybe I would have asked ChatGPT: what should I name this? When you see the ice cream difference, call it the ice cream factor? The i factor. You can look up here and see the mean difference and the SE difference. We'll talk about standard error later. What is a standard error of the difference? It's a measure of uncertainty about the mean difference itself. When the standard error is big relative to the mean difference, that's a pretty good indication that you don't have a significant effect. We'll talk about why that is later on in BI 107-- I love that stuff, we'll get into it. That brings us to activity number one, tests of difference. We'll put you folks into breakout groups. Two slides per group. It should go pretty quick. Activity two is a bit more complicated, though I've given you everything you need to answer it. I also uninstalled JASP with prejudice and then re-installed JASP, to make sure I don't make the same mistake as last time, where we had a version control error. You folks were finding weird buttons in JASP that I wasn't, because I had an outdated version of JASP. I've removed it and reinstalled it, so I'm pretty confident now the instructions and the visuals should line up with what you see. You tell me if I'm wrong. Okay, the instructions are in the chat-- that'll be shared. 
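The "location parameter" (mean difference) and the SE of the difference that JASP reports can be reproduced directly. Invented numbers again; this sketch uses a Welch-style SE (each group's variance over its own n), one common way the SE of a difference is computed:

```python
import math
import statistics

# Hypothetical savings estimates for two groups (illustration only)
group_a = [150, 210, 180, 240, 170, 190]
group_b = [160, 230, 190, 260, 180, 210]

# JASP's "location parameter": the raw mean difference, in dollars
diff = statistics.mean(group_a) - statistics.mean(group_b)

# Welch-style standard error of the difference between the two means
se_diff = math.sqrt(statistics.variance(group_a) / len(group_a)
                    + statistics.variance(group_b) / len(group_b))

print(f"mean difference = {diff:.1f}, SE of difference = {se_diff:.1f}")
```

Here the SE (about 19.6) is larger than the difference itself (15 in absolute value), so the t ratio (difference divided by SE) is well under 1 -- exactly the "standard error big relative to the mean difference" warning sign that the test won't come back significant.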
You know, we'll give it, let's say, 10 to 15 minutes, but we'll play it by ear. Any questions before we create the breakout groups? t-tests. Samples, inference, population. Yeah, it will get shared in a second. No questions. Great, Shakti will share the link to the slide deck. Jacqueline apologizes in advance. A few folks have questions. Raise your hands otherwise. Thanks everyone. Kate: You're reverbing a lot. Isaac: Yeah, is that just me? I'll take this off. Kate: Are we supposed to be existing-- yes. Isaac: It's echoing for me too. I don't know why. Jacqueline: Can everybody mute and then unmute one at a time? Isaac: So let's see. Okay. There you go. That was weird. Good idea. Okay-- I have it open. I can share my screen. Are we group one? Okay. [ Ann speaking ] Isaac: I think so, because there's two groups. Right? Jacqueline: Wouldn't it be paired sample because there's two variables? Kate: We're on activity two. Aren't we supposed to be on one? Isaac: Oh, that's weird. Okay. Why is it all-- okay, there you go. Okay, different questions. Jacqueline: For this one I agree, independent samples test. Isaac: Yeah, so that'll be for the second part. So he wants us to put it in JASP. Oh hey, Vince. Vince: Jacqueline apologized, so I felt like I had to jump in here first. Okay, cool. I can open that as well. I think I have-- cool. Jacqueline: An independent t-test, so we jump in and do our own and come back and compare, correct? Vince: Love it. Sort of military efficiency with that. Kate: I'm British, not Canadian. That's what that is. Vince: Right, there's no long passive-aggressive Canadian "how would you feel about--," "what did you think about--," "shall we?," "what if we--" yeah, okay, great. Any questions? Or are you folks making your way through? Kate: So the lightbulb-- the cost estimate for incandescent lightbulb electricity would be L estimate? Okay. Kate: Ann, it's your system causing the reverb. It's something on your side. It only happens when you're off mic. 
Ann: Well, I don't know what happened with how we set it up today. Vince: Any other questions that I can help with? Kate: That's good, right? Vince: Okay. Just raise your hand if you have questions. Isaac: Okay, thanks Vince. Okay, so are we going to do this ourselves and then just share? Kate: Okay, so go away and do it and come back and say what we found. Jacqueline: So Isaac, we all got p = .0146. Isaac: So it's the same? Okay, we'll go to the next part. Kate: [ Reading the question ]. Isaac: I think it's the other one. It's just saying people. It doesn't specify groups. Just people in general. Like the population. [ Ann speaking ] Jacqueline: I have no idea how to calculate this one, so if we wanted to do it as a group I would appreciate it. Kate: Me too. I don't know where you put everything. I'm totally confused. Like, where would you put it? Jacqueline: Okay, I think it just got easier. I was in the wrong t-test. So it's the L_estimate one, right? Is the variable? Kate: L_estimate is the variable, but then--. Jacqueline: This is an independent sample. Isaac: So the variable is L estimate. So is it just that? Kate: But then how do you compare it to what--. [ Ann speaking ]. Kate: Where the hell are you all looking? Jacqueline: Highlight L estimate in blue, then check. Checkmark Student. Test values. Isn't that where it's showing us you would type in 240? Just type in 240. One sample test, yeah. Isaac: The test value is 240? [ Reading ]. So yeah, the test value should be 240, if I'm doing this right. Isaac: So then the p value changes. .529 now. Kate: So if you put in 240, it's .529. It changes, yeah, you're a genius. We're all geniuses. Thank you. Okay, perfect, test value. Okay, yeah. And .529. Isaac: Is that it? Is it that simple? Vince: Hey, group one. Jacqueline: Either we're doing it right, or we really screwed up and Vince will laugh at us later. Kate: You're awesome. Everything about you is wonderful. So I think we just have to do those questions, right? Yeah.
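What the group just did in JASP, a one sample t-test against a test value of 240, looks like this under the hood. A sketch in plain Python with toy data (the class dataset isn't reproduced here); the p value uses a large-sample normal approximation rather than the exact t distribution:

```python
import math
from statistics import NormalDist, mean, stdev

def one_sample_t(xs, test_value):
    """t statistic for H0: the population mean equals test_value.
    t = (sample mean - test value) / (sd / sqrt(n))."""
    n = len(xs)
    t = (mean(xs) - test_value) / (stdev(xs) / math.sqrt(n))
    # Two-sided p value, normal approximation (fine for large n).
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p
```

A high p, like the group's .529, just means the sample mean is close to 240 relative to the noise, so there's no evidence the population mean differs from it.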
Vince: Awesome. Let's take a ten minute break. When we come back we'll talk about assumptions and another group activity, and that'll be the end of class one. So we said t-tests are part of a broader family of tests. We'll talk about it more in this course. We should talk about some of the assumptions that go in. Correlation, regression: they are all part of the family of models called linear models. Some assumptions are important for us. One is assuming observations are independent. What we mean by this is basically we're not asking the same person the same question ten times, or sampling the same ten individual people ten different times. Another one is that the dependent variable is continuous. It's either interval or continuous. Maybe a Likert scale with a bunch of different values, but not binary or categorical. The truth is these models work really well in certain situations when you have binary outcomes, but one assumption is that outcomes are continuous. We've talked a little about distributions, which we'll continue to talk about in BI107. The normal distribution, the bell curve. One assumption is that our data are normally distributed. In other words, the sample is big enough that when we estimate the mean, it looks as if it's normally distributed in the real population. We don't want really big outliers. Really strong skew. One or two observations that throw everything off. I have a survey measure-- I have a paper where I ask people with different partisan identities in the United States about different occupations. I ask them: how much would you charge to work in that occupation? Some of these answers are sort of conservative. One respondent gave a one, a five, and a one with a hundred zeros after it. And you can imagine what I saw when I got the data back: gazillions of dollars. There are a bunch of other assumptions. You can ask me about the Gauss-Markov theorem in office hours if you want. When you violate these assumptions, the results aren't quite reliable. This matters here, thinking about inferences from the sample to the population, when the sample has really big outliers or skew.
If we violate these assumptions, the p value that JASP gives us might not be reliable. Some violations matter a lot: outliers. Some don't matter quite as much; it's not quite as bad as the others. Now, paired samples. An independent samples t-test has two groups: men and women, treatment and control. A paired samples t-test is the same group, either across time or within the same survey. Mostly, for your projects, this means things like pre-post designs. Kiosks and their effect on some outcome measure before and after. The comparison: was the outcome measure higher or lower before versus after the intervention? This is the kind of analysis with a single group and multiple variables. Compare that to an independent samples test, where you have two groups and one outcome. Independent and paired samples. Independent samples: two groups, one dependent variable. And then there's a little note here. Think about what's going on in a paired samples t-test. You are comparing, for however many people showed up in person, whether there was a sign or not, the after score against the before score. It's like a one sample t-test asking: is that difference different from zero? Because the null I'm testing is that the difference between before and after is zero. That's what I want to know. Did my kiosk and coupon intervention change things? A change would be something different from zero. Subtracting before from after and seeing whether the actual result you get in the sample is different from zero. Paired sample t-tests are very easy in JASP, like the others. You've already seen it. When you click down t-tests: independent, paired, one sample. You select paired samples t-test. In this case we're going to look at the pair of the estimated savings from a television versus the estimated savings from a vacuum. Did people think you save more money from one or the other? We're going to click effect size and descriptives. Helpful, because if you don't click that, JASP won't tell you what the means going into the comparison are. That's usually important, at least for me.
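The point above, that a paired samples t-test is just a one sample t-test on the per-person differences, can be made concrete. A sketch with hypothetical before/after scores:

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired samples t: compute each person's after-minus-before
    difference, then test whether the mean difference is zero,
    exactly like a one sample t-test on the differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))
```

If the intervention did nothing, the differences hover around zero and t is small; a real before/after change pushes t away from zero.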
I always want to know: what am I comparing? The television, $919; the vacuum, $416. And if we hadn't clicked descriptives I wouldn't have seen that. We're also going to check the 95% confidence interval. We'll talk more about this idea in BI 107. If we were to draw samples from the population again and again and look at the difference between t_estimate and v_estimate, how often would we get the true difference in the population? How would we solve the missing data problem? 95% of the time the true value would be within this range. We'll talk more about that later in this class. The key takeaway here is that in this paired samples t-test we can see that there is a difference. What is the difference? How much more money on average do people think you save with an energy-efficient television versus a vacuum, just by looking at the screen? >> Like $503 or something. Vince: Yeah, something like that. 919 minus 416. That's a pretty big number. And these sample sizes aren't huge, but not tiny: 810 observations. My first indication is that's a big effect size. People think you're going to save more money from the television than from the vacuum. It's kind of a big effect size, and the sample sizes are not that small. My mind is thinking: I bet this is going to have a low p value, because we have a big effect size and a small-to-moderate sample size. And indeed, that's what JASP says. Here the t statistic is 44.99. The p value is less than .05. So small. The standardized effect is somewhere between .28 and .68. Somewhere in that range. In other words, we would see that there's evidence of a difference in how much people think you're going to save from an energy-efficient television versus a vacuum. And maybe that makes sense. People watch a lot of television and it's a big screen. Maybe that kind of makes sense. Erin: Maybe you mentioned this and I forgot, but when you're talking about the effect size, what kind of range-- where you said this is a large effect.
What would be a medium effect? Vince: This goes back to 106, lecture three. This is a pretty big effect. The .48 would be a moderate to large effect. The closer you get to one, the bigger the effect; the closer to 0, the smaller. But yeah, it's also pretty arbitrary. I don't know that Cohen imagined that 60 years later people would be quoting his arbitrary numbers as truth. Oh, that's a .2, that's small. This is a .5, that's moderate to big. It's a little arbitrary. And for me personally, if I have meaningful units, in this case dollars, I would lead with that. Right? People think you save $500 over a ten-year period, and that seems large. It has a Cohen's d of .48. Whenever possible I calculate Cohen's d as a behavioural scientist, but I probably would not present that to a decision maker if I had units like dollars. The dollars are what I'm going to present. Cool. Erin: Just to clarify, .48. That's moderate? Vince: You can see the community thresholds. Substantive significance, which is something we'll get to later in 107. I don't know if we'll return to this concept, but I'm a little less convinced by effect sizes alone. I need to know a little about the behavioural context. A small effect size, a Cohen's d of .1, sounds small. But if the outcome measure is infant mortality, that's to me a very substantively important effect. And so I try not to let these arbitrary cut-offs of .2, .5, .8 for small, moderate, large drive my thinking too much. What is the outcome? What is small, medium, large? What I'd say is: don't stop there. Think about the units of the outcome measure. What are we trying to measure here? Does that make sense? Erin: It does. I think my question came from it being between 0 and 1. Is that a large effect? Obviously you'd probably never get a one on that. Vince: These are the same individuals that we've asked one question at one point in time and another question at a different point in time.
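Cohen's d for a paired comparison like the television-versus-vacuum example is the mean difference divided by the standard deviation of the differences. A sketch with made-up numbers; the .2/.5/.8 labels discussed above are Cohen's rough conventions, not hard rules:

```python
from statistics import mean, stdev

def cohens_d_paired(before, after):
    """Standardized effect size for paired data:
    mean of the differences / SD of the differences.
    Rough convention: ~.2 small, ~.5 medium, ~.8 large."""
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs) / stdev(diffs)
```

Because d is unitless, it lets you compare effects across studies, but as Vince says, when the raw units are already meaningful (dollars saved), lead with those.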
Depending on what the outcome measure is, you might get a treatment effect. I need to plug in my computer here. Vince: Yeah, .48 is generally pretty big. If I had the sample size, it would be like: nice, that's a big effect. In practice. Nathalia? Nathalia: You were showing the click here, the dependent t-test. My assumption is that I need to go to descriptives, check-- then go to select. Is there another way to do it? Or does it have to be a two-step thing? Because if the means are showing. Vince: In JASP you could just do this in descriptives. Or in JASP you could go into t-test and click the descriptives box; it will give you a little summary table like it does here, with the sample size and the means. Vince: I think it's there for independent samples too. Vince: For independent samples it gives you the location parameter, the difference between the two groups. To calculate the mean of each group independently, you'd probably have to do it in descriptives first. Any questions about this paired sample t-test? When we use it, why, how it's different from the other t-tests? Someone says she's using her vacuum more than her tv. It's true. I used to have a whole life where I literally never vacuumed. Didn't own a vacuum. Watched a lot of tv. Life was good back then. It's good now, but it was good back then too. What about the three kinds of t-tests: one sample, paired and independent? What if I said in life you can only pick one? What would be your go-to t-test? Your ride-or-die t-test? Paired, one sample or independent samples? In most cases, independent. Why is that? Kate: It just seems more useful. Vince: I would have the same answer. I'd generally say independent. We usually try to intervene in the world. I want to make it better and change the world. I want to know whether my intervention works or not. And when I say works, I mean the difference between treatment and control is different from 0. I told you this one is commonly used, thinking about interventions, and thinking about fair comparisons, and all the things randomization does for us.
That's what pulls me toward an independent samples t-test. There's a lot of pre-post stuff too. You want to know people when they arrive and when they leave. Fortunately, you don't have to choose, but sometimes students can get a little overwhelmed by the t-tests, so I just want to stress: independent samples is what you're typically using in applied, real-world behavioural science. When you're out there at some conference, your big Difference 2026, and people are talking, they are probably talking about independent samples t-tests. And when you are thinking about your presentation-- should we talk about our t-test?-- you're probably going to be thinking about independent samples t-tests. So cool, okay. ANOVA, or analysis of variance. So we use a one sample t-test when we have one mean. Two means: paired and independent samples t-tests. But what about when we have three or more? Did anyone wind up doing a factorial experiment? High and low. Alex says yeah, your group did a factorial experiment. In behavioural science more generally, we do a lot of experiments. What happens when we have three means? Multiple t-tests? Control minus treatment one. Treatment one minus treatment two. Yeah, we can do that, but it starts to get complicated. This is where ANOVA, analysis of variance, and other techniques like linear regression become very handy. So let's say we wanted to test the effect of these BC Hydro stickers that we sent out. But we tested different kinds of stickers: two, compared to a control group that got no sticker. Now we have three groups. Things get a bit more complicated here. Now, there are lots of different kinds of ANOVA and, as I've stressed, it is just one flavour of a whole family of models called linear models. Linear regression. What is common is that they generate p values. ANOVA is special. Repeated measures ANOVA. All the kinds give p-values. So people often use it as a first approximation. And people say: I have three groups. Control, treatment one, treatment two.
Is there any significant difference between these groups? Any at all? Is there a difference? Then follow-up tests. This is where ANOVA can be very helpful. In JASP, ANOVA is super easy. We pull over the estimated savings from televisions. We pull over what JASP calls a fixed factor; in this case it's education. We calculate descriptive stats. And when we do this in the ANOVA pane, it generates a p value automatically. So, all the values of education: some elementary, some completed elementary, some high school, completed high school, some college, advanced training. It's comparing all of them. In this case the p value says no. No-- I'll ask you. What do you think is going on? This is only the energy-efficient television over time, and we're looking at the mean estimated savings across levels of education. JASP is saying your p value is .82. No systematic difference. No evidence of a difference between these groups. Is there truly no difference in the population among people who have different levels of education? Or do you think there's something else going on that's leading to this p value? Kate's already hit on it. The ns are tiny. Kate: It would almost be impossible to say. Maybe there is-- maybe it is actually about the population. Vince: And this is one of the unfortunate things about underpowered experiments. We can't adjudicate between what Kate and Conrad are saying versus what Nathalia is saying. These are both equally plausible answers. We can't say whether the difference is because these sample sizes are really small, or whether in this country, in this context, there is no difference in how people of different levels of education estimate the energy savings. Erin? Erin: Just a question. When you do this analysis with these seven groups, I'm just curious how it compares them, right? Because imagine, I don't know, some groups were very high and some were low. Looking at it more like this. How does it compare them? Vince: This is something we can chat about in office hours.
Analysis of variance is going to use the variance, the standard deviation, in part, across these different groups as a key way to figure out which differences in the mean estimates matter. As the means vary, ANOVA exploits the fact that some of these groups have higher or lower variance in relation to the difference in the means. That's why we call it ANOVA at all. Oh sorry, yeah. Erin: That makes sense. Thanks. Jacqueline: So just thinking about it in practical terms, it's just reflecting that education may not be a criterion where you are seeing variance? When you're looking at something like tv, which is so culturally embedded, it doesn't matter if you are 12 or 22 or 82. We've all been exposed to tv, so we know how it works. To your point, a five-year-old isn't maybe out there vacuuming. You know, a 21-year-old is maybe not vacuuming to the same extent as a parent in a family who's 40, right? Is that part of why the p is high? Because it's letting you know that that is not-- it helps you narrow down what criteria are at play? Vince: It's getting at something-- what Nathalia is saying. Maybe just in this context, in this population, it doesn't matter for this comparison. The problem is, because the sample sizes are so small, we don't have a lot of confidence one way or the other. Look at some of these mean values. People at the lowest level say 240 bucks. People at the level eight education think you save $1400. Seven times more, but there are only three people in the first group and only seven in the next group. So it could be exactly what you're saying: education is not salient, these group differences aren't real. That could totally be it. It could also be that we're looking at ten people. That's just a very small sample size. It doesn't solve our missing data problem. We want to know the group differences for millions of people. We've only asked ten of them. Three have little education. It's just not enough sample size. So this is why underpowered experiments are so heartbreaking.
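The "variance" in analysis of variance works as described above: ANOVA compares how much the group means vary (between-group) against how much noise there is inside each group (within-group). A minimal one-way F statistic sketch with toy groups:

```python
from statistics import mean

def one_way_F(groups):
    """One-way ANOVA F statistic: between-group mean square over
    within-group mean square. A large F means the group means
    differ more than within-group noise would predict."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    # How much do group means spread around the grand mean?
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # How much do individuals spread around their own group mean?
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within)
```

With tiny per-group ns like the three-person education groups above, even a sevenfold difference in means can leave the p value high, which is exactly the underpowered-experiment problem.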
Because you started-- you did the whole thing to answer questions, and at the end you have more questions than you did at the start. It's a very unsatisfying feeling. I say that from personal experience. You can also do a repeated measures ANOVA, like a paired samples t-test except this time you have multiple variables that you're comparing. So here we're going to look at the different planning variables. We're going to compare all four. And then we see the within-subjects effects here. The p value: .001. ANOVA is helpful when we have multiple means; it's the first thing we do before we compare those means. A first approximation. We'll also talk about regression and chi-square tests. Regression is also handy, because it lets you do a lot of this stuff all at once. The results are harder to interpret, but these models are almost interchangeable. And someone is asking: knowing that, could I sort of use selection bias and design a sample around-- yeah, I mean-- I wouldn't say it's a good research practice. But yeah, it's all one problem. You generate other problems, but yeah. If you look at a lot of these outputs, you'll see this df. This is called degrees of freedom, and it's a little bit hard to get your head around at the start. We'll return to it in 107, but you can think of it as how much wiggle room you have in estimation. And I don't mean your ability to fudge the results. What I mean is how many bits of information went in to calculate the estimate in the first place. So if we ask three people how much money they think they'll save from a television, and we pull out the mean from that group, we've removed one degree of freedom just to estimate the mean. In practice, degrees of freedom are helpful because as a researcher they can sometimes tell you what the sample size was. Use it to think about how many people are actually being compared; researchers don't always give you the sample size.
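The df figures in test output follow simple rules for these t-tests, which is why you can often back out the sample size from a report. A sketch of the bookkeeping (a hypothetical helper, not a JASP function):

```python
def t_test_df(n1, n2=None):
    """Degrees of freedom for Student t-tests.
    One-sample / paired: n - 1 (one mean was estimated).
    Independent samples: n1 + n2 - 2 (two means were estimated)."""
    return n1 - 1 if n2 is None else n1 + n2 - 2
```

So if a paper reports an independent samples t-test with df = 808, the study had about 810 observations in total, even if the sample size is never stated outright.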
And when you have simple models like these t-tests, you tend to always have the same degrees of freedom. It's just something to report, which you can find more information about at the end of these slides. It's not the most relevant thing; we'll talk about it more. It has to do with the numbers we use to calculate estimates, and how those relate to the samples that we draw. Which brings us to class activity number two. Slide two of the Google Slides file. This time I want to ask you to create a computed or calculated variable. And since it might have been a few months since you've done that, I've given you instructions on how to do it. In those instructions you'll see to click a black plus sign in JASP. Unfortunately, JASP will only show that sign when you load data fresh. When you start to do analysis, JASP will make the plus sign go away. There are other ways, but that's one of the easiest. The instructions give you exactly what to do, step by step. Jumping into the groups-- that might be the trickiest part of the activity. As always, raise your hand if you have questions. Take about 15 minutes for this. We'll come back with the key takeaways and then I'll have office hours. Any questions before we jump into activity number two? Kate: So if you look at the descriptives box, it's 7,000 and change for the mean of F estimate and 200 for the L estimate. So the furnace is clearly higher, and we know it's significant-- don't say very significant-- because it's less than .001. Perfect, okay. Isaac: Also, first class back, it's hard. I'm a bit tired today. Jacqueline: And it's building off of the last one with this really long gap. Pretty sure Vince is just laughing at us now. Saying ha ha, this is a behavioural insights test. There is going to be a report on this. Isaac: So for the second part, he just mentioned this. [ Ann speaking ]. Kate: Is it repeated measures then? [ Ann speaking ] Kate: Oh, yeah, all right, okay, so--. Isaac: So what are we calling it? Shakti: It comes when you load the data freshly.
There are other ways to do it. If you'd like my help I can help you with it. Otherwise you can just reload all of the data. Isaac: I'm confused. Yeah. Shakti: Go to insert at the top. Isaac: Sorry, I can't hear because of the echo. Can everyone mute themselves? Thank you. Sorry, can you say that again? Shakti: Yeah, can you go to insert? Column after. Just double click on it. And then you can paste here. Isaac: Okay, then copy and paste from the slide. Yes. Can you scroll to the bottom? Okay, that's it then? But if you want to change the column. Yeah, it should be good now to see the issues. Yeah. Just come back. Kate: I can't copy and paste. Jacqueline: I'm thinking, is this ANOVA? Isaac: We can try. Or is it just normal ANOVA? Kate: Three different measures. I think it's repeated measures. It has to be. ANOVA on its own is multiple groups, but one--. What was the name of that new column? Yeah, column 18. Isaac: Just looking at the slides too. Do I put it here? Vince: Hey, group one. Okay, so it's actually not a repeated measures ANOVA. Kate: That's my fault. My bad. Vince: Because each panelist has only given one number. Panelist ID number one was in week one. A different panelist was contacted in week two. So though it looks like it's repeating because it's time, they are different people, because they were recruited in week one, then week two, then week three. So, column 18. Isaac: We didn't make it continuous. Is it supposed to be? Vince: It doesn't matter. And then-- the dependent variable is-- did you recruit people who are younger in one week than another week? Awesome. Exactly. Kate: And the p is high. Vince: I'll leave you to it. Kate: Sorry, that was me. Isaac: No, my brain is not working today. Three different means compared across three separate weeks. Kate: Also different people. That's what he was saying; the reason why it wasn't repeated is because they were different groups of people. Isaac: Okay, so if the groups were the same, it would be repeated?
Jacqueline: Repeated is doing multiple touch points with the same population. This is just a snapshot: you have your general population, and when you do your intervention you have a sample. Click on the descriptive statistics. Isaac: Okay. Is that all we need to answer, though? Compare the age of each group, equivalent or not, in each of the three groups. So the average is the mean. It's fairly close, so that's what the p is saying, and that's confirming it. The p is confirming it, if I'm reading that right. Isaac: It's not statistically significant. Yeah. And then, okay. You can also look at the means; they are very close, right? Jacqueline: We can't always go by that. Sometimes there's a small mean difference because our population is so small. Yeah. Isaac: I have to go get my charger. But that's it, right? Okay. Jacqueline: I was just looking at the slides. It's like: surprise, work on it on the weekend, it's due on Sunday. Kate: I'll do it tomorrow instead. Kate: The main thing. Jacqueline: So what I've learned is make sure you have a statistician on the team. Kate: I swear all this stuff is asking to be AI generated. Jacqueline: Apparently Claude will do it for us. Kate: Show you what Claude says. Jacqueline: Claude went on Easter vacation. I'll get back to you. Kate: They used to do these quizzes. And he said, I'd ask Adrian. And he got the right answer. And I think that should have been a tick. Vince: We're back. I wasn't sure about adding the computed column with pressure and time, but I felt like it'd been a few months since we'd done a computed variable, so my pleasure for giving you the formula. I think someone asked how we come up with these formulas in the future. You can always book a one-on-one chat with me; we can talk about that. The basic idea is they function like an Excel formula. So if you are used to looking up Excel formulas-- you can always reach out to me. Always happy to talk about how that works if you have complicated computations. That is true.
So, I framed BI data analysis as a problem of missing data. We want to know things about the population. How do we make an inference about that population using this sample? How do we do that? We gave it even more precision: we're actually most interested in comparing averages. We want to compare the means from one sample to another. Does the difference between the means of group one and group two also exist in the full population, the data we don't have? Linear models are the family. The t-test is the most common statistical technique to make this kind of group comparison: whether a difference in the sample would plausibly exist in the population. We gave you some language around that. As you probably picked up, it gets complicated. Independent samples t-tests, one-way ANOVA, repeated measures, linear regression, correlation. Lots of different tests. Hopefully we can simplify the decision for you-- and reach out to the BI community, people like me: I have this data, I have a question, how do I figure it out? That's what we'll talk about in BI 107 a bit more: how we pick the right model at the end of our design. At the end of these slides you'll see some additional information about how to report results using APA format. Also more advanced stuff on chi-square tests. Just saying, this week's slides have a lot, plus some additional bonus material, some of which will be helpful for the problem set, which you should have access to. The video will be posted on Moodle. I'll have office hours right after class, so if you want to chat, please stick around. The problem set is due soon: a bunch of multiple choice and fill-in-the-blank questions. Later on there's another written set, kind of like 106, applying all the concepts here. This is our first introduction and first class of BI 107. Thank you so much for taking the time. Enthusiasm and intelligence and patience and camaraderie. I'll be sticking around for office hours a little bit. If you are unable to stay, please feel free to reach out by e-mail and we can book time to chat.
Tell me about your data situation. Happy to offer insights. Thank you so much everyone. Isaac: Thanks Vince. Vince: Bye Isaac. Thank you. Nathalia: Yeah, when I was reading the previews-- I was stuck thinking whether I would be doing something wrong in how I use it. Because I don't want to make inferences against the whole population. My inference is mainly about hospital workers. Is that okay, or am I doing this against everyone every time? I got confused-- then it's not going to be true, because they are not all health care workers. I don't know, I got stuck on that. I don't know if it matters, but I thought I would ask. Vince: Well, on one level it doesn't matter. As in, whether you think you are making inferences to the population of hospital workers in British Columbia, hospital workers in the Vancouver Coastal Health authority, or workers over the age of 40 in the Coastal Health authority-- it doesn't really matter. Everything we talked about today applies. You don't do anything differently. What does matter is how you think about sampling. If you want to make inferences about hospital workers, you probably don't want to do a general survey. And so you have to convince your audience in two ways. One is sample size: I have a big enough sample to make inferences to my population, and you will believe me. But the other one is much less scientific and much more about: let me tell you how I recruited these individuals and why I think they are representative of a population that's very specific. This is hard to do. I study immigration, and one thing I love about studying migration is how diverse it is. It is extremely hard to talk about my population. Right now I'm in the field with newcomers to British Columbia, and I can think about my results being representative in terms of visa status. I know in British Columbia 77% of newcomers are permanent residents and naturalized citizens, and about 23% are non-permanent residents. Okay, so I recruit a sample and I tell you I have 77% permanent residents, 23% non-permanent.
And you might think, wow, that's really representative. I think, yeah, good job, that's really representative. And then I look at country of birth. And I'm like, oh, damn. This isn't representative at all. I have way too many Europeans in my sample. Vastly overrepresented in my data. And so this is not about sample size. It's about sampling, and me telling you: hey, my sample is representative on visa status, and in terms of education level, and in terms of occupation. It's not representative in terms of country of origin and language. This is more like me building my credibility with you. So on the one hand it doesn't matter, because everything we talked about is the exact same. On the other hand it really matters: your results, your inferences, are only as good as your sampling. Does that make sense? Awesome. >> I would be sampling hospitals-- more about the unit I work in. But some of the tests from the previous class I know I can use; for this one I don't know. Vince: So you can think about representativeness in every sense here. Let's say that you surveyed hospital workers specifically. Do you work with multiple hospitals? Nathalia: Children's Hospital, in-patient unit. Vince: So I'm going to sample Children's Hospital, and there's 400 employees, and you sample a hundred of them. 25%. Is that good? I don't know-- maybe they are all managers, or all first year residents. Or all people who have ten years of experience, because you did the sampling during business hours and seniority predicts who gets the day and night shift. You have really good representation of day shift workers. It's the right unit, the right hospital. But they are not representative of the full unit. Nat: Oh, I should consider the unit, okay. Vince: So there's some flexibility here. My sample is representative in these dimensions and not in these dimensions. Andrea: Would you highlight that in both scenarios, the reasons for that?
So say, in that example you just gave to Nathalia, more managers filled out her survey because they sit at a desk more during the day than students, who aren't at physical devices. They would have to go to their phone and look at the survey through their phone. Maybe they don't do that throughout the day. Vince: Yeah, it's even worse. Say I want to study burnout in in-patient unit care-- a very specific population. I surveyed 400 employees. No evidence of burnout. Really low. Now, I asked a hundred people and the average salary was $250,000 a year. I did my recruitment at 1:30 P.M. and I asked for 60 minutes of people's time. And the audience would be like, well, wait a minute. The people who can afford to take 60 minutes out of their day in the middle of the afternoon, and who earn that kind of money, might be the people who are least likely to experience burnout. That tells you one thing about representativeness in outcomes and comparisons. And so yes-- I would communicate all of that. But I'd also do the thinking on behalf of my audience and say: here's why that's a problem. In my case, the survey of newcomers-- we launched it, everyone did it in a three-month period. The highlight: it's been great. It's been a huge headache, though, mostly around country of origin, because what it is showing me is how little I know about these individuals. One variable, complex lives, and I'm getting these huge differences. I can at least tell you how they are non-representative on country of origin. But what are the things not representative about them that I can't measure? Andrea: When you take that out-- remove country of origin-- so it doesn't skew what people thought? Vince: In this case I did a much more complicated thing. I worked with Claude Code a lot to do something called survey weighting. So-- I said my population is newcomers to British Columbia. And now I want to find out: what is the average education level of newcomers to British Columbia?
What is the percentage of people who are permanent and non-permanent? What is the average age? What is the average region of birth? And then I use complex math, magic hand wave, to adjust and weight my survey responses, so that groups that are over-represented count less and groups that are under-represented count more. This is a very well-established technique, and if you ever read survey data in the newspaper, people are using weighting. So I have a statistical solution, and then I tell my audience: these are representative by visa status, region of birth, age, education and gender. They are not representative for family size, or in terms of whether people have experienced discrimination. If I don't measure something in the survey, I don't know what the population distribution is. What percentage have experienced discrimination? That's an unknown quantity; we don't have that the way we have census data. It's a communication thing. I'm trying my best and thinking about this really hard so that you don't have to. I'll tell you all my limitations, and hopefully at the end of the day you trust me, and I don't lie to you. That's my approach. Melanie: Vince, I just have a quick question because I have to get back to work, unfortunately. It's only 3:15 for me, so my company is very good at giving me a couple of hours off, but the emails are building up. I'm struggling with our capstone project slightly, because it's WorkSafeBC and apparently we're doing open rates and click rates of emails. And my brain just can't get there. If it was people or something... And apparently very few people open emails from WorkSafeBC. But we have a high click rate. So that's good, isn't it? But where can I learn more... Unfortunately, if it had been patients... I'm a risk management specialist in health and safety. But email is probably one of the bad channels. I don't open emails either.
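The weighting Vince hand-waves over can be sketched as simple post-stratification on a single dimension: each respondent's weight is their group's population share divided by its sample share. Real survey weighting across several dimensions at once usually uses iterative raking, but the one-dimensional idea is the same. All numbers below are hypothetical:

```python
def poststrat_weights(sample_counts, pop_shares):
    """Post-stratification weights on one dimension.

    weight = population share / sample share, so over-represented
    groups get weights below 1 and under-represented groups above 1.
    """
    n = sum(sample_counts.values())
    return {k: pop_shares[k] / (sample_counts[k] / n) for k in sample_counts}

def weighted_mean(values, groups, weights):
    """Weighted mean of an outcome, each respondent weighted by group."""
    w = [weights[g] for g in groups]
    return sum(wi * v for wi, v in zip(w, values)) / sum(w)

# Hypothetical: 4 respondents from Europe (badly over-sampled), 1 from Asia.
pop = {"Europe": 0.2, "Asia": 0.8}
counts = {"Europe": 4, "Asia": 1}
w = poststrat_weights(counts, pop)   # Europe: 0.25, Asia: 4.0

outcome = [1, 1, 1, 1, 0]            # some yes/no survey answer
groups = ["Europe"] * 4 + ["Asia"]
print(weighted_mean(outcome, groups, w))  # 0.2, versus an unweighted 0.8
```

The unweighted mean (0.8) is dominated by the over-sampled Europeans; the weighted mean (0.2) restores the population balance, which is exactly the "count less / count more" adjustment described above.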
Vince: You don't have a statistical problem, you have a data problem. Okay, here's what I tell myself. I work on the most boring projects, and sometimes I feel, why did I go to school so long? Why did I work so hard, to work on these stupid projects? Let me tell you how I motivate myself. A project that feels stupid, and I've felt that way, is a great project to learn from. Where actual health matters, where illness or morbidity is a plausible outcome, or income assistance received, those are scary projects to learn on. I learn by making mistakes. So I motivate myself, one, by saying: hey, this boring problem is a great place to learn. I can make a mistake and I won't feel bad. That's one thing. Another motivation: ever heard of a conversion funnel? A very common phrase in tech, and you've experienced it before. When you log onto Netflix, it says create an account, it lets you choose an avatar, here are the top movies, and then you scroll down and it shows you comedies. And Netflix is tracking everything that you're doing. They are counting when you open the app, how long it takes you to create an account, what avatar you choose, the way a movie is portrayed, how long you scroll. It understands human behaviour and operates it as a funnel: the focusing of attention, which can even be measured by pupil dilation. Going from that to clicking is a very systematic, general pattern of human behaviour: something grabs our attention and triggers a motor response, which makes your muscles click the mouse. This matters for life-and-death things: whether people show up for prostate cancer screening, whether people get a pediatric consult. Pupils dilate and fingers move. Attention allocation and effortful behaviour. What you are learning is the conversion funnel: attention triggers a motor response. This is a great way to learn it, so you can do it some day with health and safety. You want to have learned all that stuff, and thought about all that stuff, now.
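The funnel Vince describes is just a chain of step-to-step conversion rates. A minimal sketch, with hypothetical counts shaped like an email campaign (sent, opened, clicked):

```python
def funnel_rates(stage_counts):
    """Step-to-step conversion rates through a funnel.

    stage_counts: ordered (stage, count) pairs, e.g. sent -> opened
    -> clicked. Returns each stage's conversion from the previous one.
    """
    rates = {}
    for (_, prev_n), (stage, n) in zip(stage_counts, stage_counts[1:]):
        rates[stage] = n / prev_n if prev_n else 0.0
    return rates

# Hypothetical email funnel: 160 sent, 80 opened, 3 clicked.
funnel = [("sent", 160), ("opened", 80), ("clicked", 3)]
print(funnel_rates(funnel))  # {'opened': 0.5, 'clicked': 0.0375}
```

Expressing each stage as a rate from the previous one shows where attention is lost: here the open step converts well and the click step is where the funnel narrows.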
So it's a great way to learn, and the things you learn are very general: how to grab attention, and how effort can change people's lives. In time I've come to believe there are no bad projects. There are maybe boring projects, but they're not a waste of time. They're a very good use of time, because I'm a lifelong learner and I always want to get better at what I do. So that's my pep talk. Melanie: Well, I'll try. I guess it took the wind out of my sails. Jacqueline was like, oh well, I wasn't expecting it. I was expecting we'd get some click-throughs with all the work that we did. And we have three. People have opened it, just about 50%. WorkSafeBC tends to get 30%. So 50% of the people opened it, great. Of the 50%, we've done three different emails. We had one click on one. We don't have a huge sample size, 160 in each group, and on the other one we got three. So we are not getting a lot of people clicking through, and I don't know whether it's related to the fact that people don't open WorkSafeBC emails. We were told that's the best way to communicate with our group. Vince: Honestly, this sounds great, because you have a huge open rate, and observed open rates are always lower than reality. Some people use tools like Gmail, which blocks images from being downloaded onto your machine; it asks whether you want to download images from the sender, because it's blocking trackers. So depending on what people use, you are underestimating the open rate. So you tell me a 50% open rate, I tell you you have a minimum 50% open rate. So you have a very high open rate. That's interesting. Something about your email is getting people's attention. It's dilating pupils. They are drawn to it. No data is still data. Melanie: I say we don't make mistakes. So maybe that's what I have to keep thinking. We kind of thought we were going to get more opens. It's really well established that government open rates are higher; you can imagine why. Vince: Night life? Melanie: Pubs and bars.
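Vince's point that a pixel-tracked open rate is a floor, not the true rate, can be put in numbers. The adjustment function below is a hypothetical back-of-envelope correction that assumes you somehow knew what fraction of recipients block images; in practice that fraction is unknown, which is why only the lower bound is safe to report:

```python
def open_rate_bounds(sent, tracked_opens):
    """Pixel-tracked opens miss recipients whose mail client blocks
    images, so the observed rate is a floor. Without knowing the
    blocking rate, the ceiling is 100%."""
    return tracked_opens / sent, 1.0

def adjusted_open_rate(sent, tracked_opens, image_block_rate):
    """Hypothetical back-of-envelope adjustment: if a known fraction
    of recipients block images, scale the tracked count up."""
    return (tracked_opens / (1 - image_block_rate)) / sent

# Hypothetical: 160 sent, 80 tracked opens.
print(open_rate_bounds(160, 80))        # (0.5, 1.0)
print(adjusted_open_rate(160, 80, 0.25))
```

If a quarter of recipients blocked images, the measured 50% would correspond to a true open rate of about 67%, which is the sense in which "50% is a minimum."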
Because they are the ones with hearing loss that nobody is doing anything about. I believe it's because they don't understand the risk. Vince: Risk is great, because it drives attention allocation and effortful behaviour, and thinking strategically and cleverly about risk can help you design even better. But yeah, you're right, you want to get that click rate higher. Let's make experiments part of how we do everyday business: we launch an experiment, write a report, and learn from it. Let's do experiments all the time. Melanie: That's the good thing. Jacqueline is at WorkSafeBC, so she'll continue the project. This is really cool. Vince: It is. And Nathalia here is talking about keeping expectations low. Yeah, I hope you boost that up. Eventually you start to think, yeah. Nathalia: We can move the kiosk to be clearly visible, where it's in front of the door. Would we see a difference then? Vince: I definitely think so. The ambiguity is gone once I no longer believe my interventions work the way they should; then it's like, oh, that's cool, that's interesting. Melanie: The company I work for is amazing, and I'm planning on using this for safety: how can we educate people in the workplace, improve things, and keep them from getting hurt? Vince: If you have an injury prevention project, you have something to reduce workplace injuries: falls, accidents. And you tell me an effect size of 0.1, a p-value of .000, and a sample size of 5,000? That's awesome. Probably worth the intervention. Preventing one life-changing injury is incredible. What an important thing to... Melanie: We have to graduate first, Vince. Vince: But you will be making pitches. Good for you. You're also part of a community; that's the nice thing about this program. Part of the BI community. There's the forum and the wiki: hey, here's this project, I wonder what they are working on. I love it. Melanie: I'm hoping I can get the company to do a capstone project. Anyway, back to the exciting data. Ann: I was going to ask by email. [ Ann speaking ]
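The kind of result Vince describes, an effect of 0.1 with a sample of 5,000, can be sketched as a two-proportion z-test using only the standard library. The counts are hypothetical (injury rates of 30% versus 20%, 2,500 per arm):

```python
from math import erf, sqrt

def two_prop_z(x1, n1, x2, n2):
    """Two-sided two-proportion z-test with a pooled standard error.

    x1/n1 and x2/n2 are successes/trials in each group.
    Returns (z statistic, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p via the normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    p_two_sided = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_two_sided

# Hypothetical: 750/2500 injuries in control vs 500/2500 under the
# intervention -- a 0.1 difference in proportions.
z, p = two_prop_z(750, 2500, 500, 2500)
print(f"z = {z:.2f}, p = {p:.3g}")
```

With these numbers the z statistic is around 8 and the p-value is effectively zero, which is how a 0.1 effect at n = 5,000 ends up with "a p-value of .000."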
Vince: The written assignment is at the end of the class, so hopefully you will have learned a lot of material before you get there. [ Ann speaking ] You can still do ANOVA by hand, the calculations, if you want. [ Ann speaking ] Which equation? I don't know which. [ Ann speaking ] Yeah, that's fine. What about the written assignment, right? The BI one written assignment worksheet? [ Ann speaking ]
Pam Heggie, CSR(A) RPR, Accurate Realtime Reporting Inc. Uncertified (draft) Verbatim Transcript