Vince: Anyone played around with that stuff? Natalia, you've heard Claude is the best. Monique, you have ideas but would like to try it. Jacqueline, what's funny about that? Two ways. Like the prospect. But the other thing that's funny is like-- maybe any relationship with AI is unhealthy. Right, like I still don't know if-- are these tools good for my brain and my emotions? I still don't know. And Alex, you used an agent that made a slide deck? Was it compelling? Or like a roller coaster? Alex: It was awesome. I worked on the structure of the presentation and stuck it in Gamma, and it did all the slides and created animations. It is like a brave new world. Vince: Amazing. When they work, it feels like magic. And when they don't work it feels apocalyptic. Alex says you trust Copilot with your files-- yeah, e-mail is like-- if I ever allow it to use my e-mail, I feel like I'm years away from that. You've mentioned trust. You saw Claude in the news-- don't tell Claude things that you're also telling your lawyer. Not protected by attorney-client privilege. Right. That speaks maybe to the unhealthy relationship that Jacqueline is talking about as well. Important and underestimated. You know, I say this as if I'm talking to all of you. Maybe some of you are using Claude Code right now. And one of you is going to write back with a bunch of em dashes. That's a great point, Vince. Certainly. Absolutely. I think we're able to get started. Welcome back everyone. I'm so happy to be here in BI 107. This is usually a pretty fun place to reconnect, because last time we talked, BI 106 week three, the data was mostly imagined, or barely glimpsed. And now most of you are in the data collection phase. The project has launched for most of you. Maybe just a quick shoutout in the chat. Is that the case? Are most of you in the stage where things are launched and data is being collected, maybe you've seen it? Jacqueline, yeah-- I guess you could have AI do your capstone work. 
I don't know what's the best way to build a relationship with the client. Oh yeah, for months, Kate, that's great. Isaac has data. Not to brag. Is it good data, Isaac? Conrad, Air Force team, yes, is collecting data. Andrea got the data back yesterday. That's like a-- kind of like a Christmas or birthday, it feels like, when I get the data back. Ooh, what do I see? In the collection phase, haven't done anything with it yet. Great. Data back in the drive, great. Awesome. Hopefully BI 106 was helpful. You're now transitioning from 106 to what's next. I've got my data. I can calculate a mean or proportion. Or standard deviation. Create box plots. What do I do next? That's what BI 107 is all about, so this is a good time for us to have the class. It has the very catchy subtitle: inferential tests of difference. Comparison is at the heart of behavioural insights, and that point is not as obvious as it sounds, but mostly what we do in behavioural insights is comparison. Compare treatment to control, pre to post, site one to site two. The idea of difference is really key. That part's probably intuitive. Inference. We'll be talking a little bit about inference over the next two weeks. We're going to think about it more precisely and maybe critically: what we might mean, and what inference is. Let's start with a land acknowledgment. At UBC Vancouver we're located on the traditional and ancestral unceded territory of the Musqueam and Tsleil-Waututh peoples. Very honoured to live, work and play on their territory. In BI 106 we talked about data planning. We had a written assignment at the end that was all about that. Data processing: okay, the data has come in, what's the first step that I do? How do I describe it? We talked about histograms, box plots and measures of central tendency. Descriptive statistics. And we did sort of the basics of data visualization. How would you at least look for patterns in the data? Or how would you visualize the kind of data that you have? 
BI 107 is, I think, a very interesting, rich, cool step forward. We're going to think first about samples versus populations. This is an idea I'm going to turn to a few times in BI 107 because it's a pretty important one. We'll talk about inferential tests of difference. As I say, comparison and difference are at the heart of most behavioural insights projects. We'll talk about inferential tests of association: when we have two variables and we want to see if they are correlated, or if there's a trend between the two of them. You folks have probably heard about statistical significance before. Very technical term. Substantive significance is another term that gets used a lot. We'll talk a little bit about the difference between these two things. You can have results that are statistically significant and substantively insignificant. You can in theory have the opposite: statistically non-significant results that are still substantively important. We can talk about that edge case. We'll talk more about how you report some of the results. A lot is about testing. So if you've done the readings for this week and next: testing different models and different assumptions of models. How do you present and talk about that so other people in the audience, other behavioural practitioners, know what to expect and can understand what you're seeing? That gets to your credibility as behavioural insights practitioners. Ultimately that's all that we've got. We'll talk about how you bundle all of this together to report results and hopefully visualize results so you can establish your credibility as researchers. I love BI 107 because it's the closest to what I do in my day-to-day. I spent part of today at a departmental meeting, and when it got kind of boring I opened up my data and I was doing inferential tests. Always happy to talk about it. And you do need a foundation in BI 106 to think about the stuff in BI 107. So I'm always excited here. 
When you're done BI 107 there's a few things that I think you should probably be able to do. One of them is figure out-- given the data that I have and the design of my study, what's the right test? How would I look at a test of difference or association? Once I know what the right test is, how would I go about and do it? Once I've done the thing, how do I interpret the thing? Once I interpret the thing, how do I report the thing? And because visualization is really important, once I report the thing, how do I visualize the thing? So we'll talk a bit about the thing and we'll talk about all these others. Someone says our days are very different. I feel very lucky. But I recognize that some people would not think that's a great day, to be like, oh-- I'll just do stats and data. I do love it. My high school self would be shocked, but yeah, I definitely do. We have-- okay, it kind of looks like a big agenda. Twelve items is pretty big. A lot of these things are nested. t-tests: paired, independent. We'll talk about ANOVA, which is pretty closely related to a t-test. Degrees of freedom is a concept that comes up throughout. My point is it's a packed agenda, but not that packed, and the activities are really designed to focus on the things that you need to get from week one to week two. Mostly on tests of difference like t-tests, and we'll just touch on ANOVA. And we'll talk about what it means. It's in capital letters, so you're probably thinking that's an acronym. You would be correct. Analysis of variance. We'll talk about that when we get to it. So you can think about today as laying the foundation. We'll return to some of these concepts next week and the week after. In other words, if in an hour and ten minutes you're like, I am so lost, so confused, that's okay. We have office hours, we can chat. You can send me e-mails, and we'll return to these same concepts next week. So we'll go slow and together. Let's talk about descriptive and inferential stats. 
Descriptive stats are probably a little familiar because that's largely what you did in BI 106. Think about it this way. Most data problems are missing data problems. We want to take the data we have and make inferences about data we don't have. We'll call that the population. So if you want to know whether people prefer chocolate to vanilla-- which maybe feels intuitive, maybe vanilla is the standard classic-- you would sample and make an inference from the sample to the population. Most behavioural analysis is inference. You want to go from data we have to learn things about data we don't have. Descriptive statistics are how we describe the sample, so that readers and different audience members know, okay, these are the boundary conditions on what we can actually talk about. So let's say we look at people who like ice cream and we want to make an inference to people who we didn't talk to about ice cream. The average age in our sample is 40 years old. So the first question you might ask is-- does that line up with the population? Is the average age in the population 40 years old? Not necessarily, right? And so how confident we are is partly a function of sample size. It's a function of a few other things, but sample size is really important. You can think of it this way. If we had the full population and we calculated the mean, we would know the average age, because we have the full population. As we go from the full population to a smaller sample, we see the missing data problem in action. If you have a large sample-- we can talk about what a large sample is. A thousand people, surveyed in a reasonably random way. We haven't surveyed a thousand in old folks' homes or at day cares. We have a reasonably random sample. It might be close to the population average. And that basic inference from sample to population is at the core of BI 107. So in 106 we talked about how to describe the sample. We talked about things like looking at frequency tables. Calculating histograms to visualize things. 
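The sample-versus-population idea above can be sketched with a quick simulation. Everything here is invented for illustration (a made-up population of ages, not any course data set): we build a "full population" we happen to know, then see how a small sample and a large sample estimate its mean.

```python
import random
import statistics

random.seed(42)

# A made-up "full population": 100,000 ages, roughly uniform between 18 and 80
population = [random.uniform(18, 80) for _ in range(100_000)]
true_mean = statistics.mean(population)  # we only know this because we built the population

def sample_mean(n):
    """Mean age of a random sample of size n -- the 'data we have'."""
    return statistics.mean(random.sample(population, n))

small = sample_mean(20)     # a small sample: can land far from the truth
large = sample_mean(5_000)  # a large, reasonably random sample: usually lands close

print(f"population mean: {true_mean:.1f}")
print(f"n=20 sample:     {small:.1f}")
print(f"n=5000 sample:   {large:.1f}")
```

Running this a few times with different seeds makes the point concrete: the n=20 estimate jumps around, while the n=5,000 estimate stays close to the population mean.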
Different kinds of outcome measures that we might have. Binary: we might not look at the mean, but the proportion or the percentage. If we do have a continuous outcome measure we might look at the mean, the median, the mode. If you are curious about dispersion or distribution we might calculate the variance, the standard deviation. We could create the box plot, which would give us the interquartile range: where most values in the distribution were found. And we found the full range itself, min to max. We also, and maybe it goes without saying, would have the sample size. The n. We talked about how the letter n is the way to connote the size of our sample. We had another-- two variables where when one goes up, the other goes up: there's a strong correlation, either positive or negative. We talked about the correlation coefficient. All of these concepts are building blocks for inference. So in 107, we're going to talk about how we make these inferences. We'll talk about things like the standard error, which is often abbreviated to just SE. It's one way of thinking about the confidence that we have that the data we observe in our sample lines up with the data in the full population. We'll talk about a 95% confidence interval: a range of numbers within which we have some confidence, or some belief, that observations would typically fall if we were to collect data again and again and again. We'll also test whether the differences we observe-- between the proportion of people who like chocolate and the proportion who like vanilla, between those over the age of 40 and under the age of 40-- whether these kinds of differences are significant. And we'll talk about different t-tests, differences of proportion, analysis of variance and chi-squared tests. And at the end we'll push on this idea of association between variables: correlation, linear regression, and the chi-squared statistic. So in other words, BI 106 lays the building blocks, the grammar, for 107. 
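The standard error and the 95% confidence interval previewed here can be computed by hand. A minimal sketch with invented ages (note the 1.96 normal critical value is only an approximation; for a sample this small, JASP would use the t distribution instead):

```python
import math
import statistics

# Hypothetical sample of ages (invented numbers for illustration)
ages = [34, 41, 29, 52, 38, 45, 31, 47, 40, 43]

n = len(ages)
mean = statistics.mean(ages)
sd = statistics.stdev(ages)   # sample standard deviation (n-1 denominator)
se = sd / math.sqrt(n)        # standard error of the mean: sd / sqrt(n)

# Approximate 95% confidence interval: mean plus or minus 1.96 standard errors
ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se

print(f"mean = {mean:.1f}, SE = {se:.2f}")
print(f"95% CI is roughly ({ci_low:.1f}, {ci_high:.1f})")
```

The interval is the "range of numbers within which observations would typically fall" reading: if we re-sampled again and again, intervals built this way would capture the population mean about 95% of the time.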
To create poetry-- I don't want to twist the metaphor, but we build on those building blocks that we talked about in 106. Okay. Let's say we collect data from our sample. We want to compare it to see whether the difference that we observe between chocolate and vanilla eaters holds in the full population itself. If you think about it more, like if you step back, this idea about comparing differences in samples-- the proportion of people who like chocolate over the age of 40 and under-- maybe you want to look at the same individuals. The same group: whether they like chocolate before and after a rich dinner, and we think yeah, something about rich, salty, fatty foods changes the kind of ice cream preferences people have. Now we're not comparing two different groups. We're comparing one group over time. A paired sample. A sample before and after some intervention. And we want to know, yeah, people who have a really rich dinner, they tend to like vanilla. Maybe they think it's a palate cleanser, and in the sample we observe, that's what we found. And we also want to know how much evidence we have for what we see. We're going to think about whether a difference of the same size would show up again. Has anyone heard of this p value before? Maybe everybody, in a stats paper? p less than .05. Isaac says yeah, you've heard of it. It's kind of confusing. But the p here really just stands for probability. Yes-- thank you-- and now I know someone did the readings. Yeah, p is just probability. In this case it's the probability that a difference of the same size we observe in the sample would be observed if there was no such difference in the population. So sampling this larger population, if in fact there's no correlation between rich, salty, fatty dinners and ice cream preferences, what are the odds that we would then observe that same difference in our sample itself? There's a few things that go into this probability value. One is the size of the difference. 
If we find in our sample that a huge number of rich-fatty-food eaters like vanilla ice cream, okay, maybe that's real. The effect is so strong that even in a smaller sample, a flawed sample, not even a perfectly random sample, we still pick up on this more general true difference in the population. Variability. It makes less sense if we think about it as a proportion, but think about a large standard deviation-- a big standard deviation in the sample. One way we can tame variability is sample size. We talked about that in BI 106: if the sample size goes up, we have more power. In this case, more sample size will shape what the p value is. These are all in some weird mathematical dance with one another. If there really is a difference in the population, what are the odds that we observe it in the sample? Or if there is no difference in the population, what are the odds that we still observe one in the sample? These are determined by these three factors. So far so good. But the truth is, p values are pretty arbitrary. For reasons that have more to do with history and sociology, scientists have picked an arbitrary cut-off of p less than .05. That is, we've arbitrarily said that if we were to draw a sample from this population, 95% of the time we would find the true difference between these groups-- between the averages or the proportions or whatever it is. 95% of the time we would get it right. 5% of the time our estimate would be wrong. That has led us to think about p values as either significant or non-significant. It was above .05? We say this is not significant. And this is very arbitrary. This is definitely debated among scientists. Should we keep doing this? The truth is it's a convention. A normal expectation. If you show a plot and you use JASP to calculate p values, people will expect you to use a cut-off of .05. 
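The definition of the p value given here -- the odds of seeing a difference this big in the sample when there is no difference in the population -- can be made concrete with a brute-force simulation. All numbers are invented for illustration (a pretend 20-point observed gap, 50 people per group), not any real study:

```python
import random

random.seed(1)

# "Null world": ice cream preference does NOT depend on dinner. Everyone is
# 50/50 chocolate vs vanilla no matter how rich their meal was.
def simulated_difference(n_per_group):
    rich = sum(random.random() < 0.5 for _ in range(n_per_group))   # vanilla fans after rich dinner
    light = sum(random.random() < 0.5 for _ in range(n_per_group))  # vanilla fans after light dinner
    return abs(rich - light) / n_per_group                          # gap in proportions

observed = 0.20   # pretend our real sample showed a 20-point gap
n = 50            # 50 people per group
trials = 10_000

# p value by brute force: how often does a null world, where nothing is going
# on, still generate a gap at least as big as the one we observed?
extreme = sum(simulated_difference(n) >= observed for _ in range(trials))
p_value = extreme / trials
print(f"simulated p is roughly {p_value:.3f}")
```

With these made-up numbers the simulated p lands in the neighbourhood of .05, which is exactly the borderline zone the .05 convention draws a line through.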
You can talk to your audience about whether this makes sense in your case, but generally this is the expectation. Whenever possible though, JASP will give you exact p values, and that's what you should report. What's behind the norm-- is it just status quo bias? Yeah, it could be that people have coordinated on a norm. We've just decided, well, if we are ambiguous about what a good cut-off value is, it's going to lead researchers to do shoddy practices. So we're just going to pick an arbitrary threshold because it's a binding constraint on future behaviour. There's lots of behaviour in there. A surprising number of academics have written about this. A collective expectation. The behavioural practice around p values. If you want to estimate the average age, the chocolate preference, the proportion in your data, JASP will give you a p value. Sometimes that p value will be so small it just says .000. Other times you're going to get a p value: .04, .32, .5, .01. The best practice is to report the exact number, rather than just saying it was above .05 or below. In my own research it's a little annoying-- I always have to look at the p values and say it equals .032, and sometimes it's very small, less than .05 because it's out at the fifth decimal point. Your mileage may vary. When our p values are above .05, it doesn't mean there's no difference. In fact the language we sometimes use, that I use, is: there's no evidence of a difference. In other words it's kind of like saying, I don't really know. There are techniques to estimate more precise null effects-- to say there is no effect and I'm confident about that. But generally we talk about whether there's evidence in favour or no evidence in favour. A non-significant result, a p value above .05, doesn't mean there's no effect. It means there's no evidence of an effect. What if there's no difference in the actual population? The one we don't observe. We don't get data on them. But there really is no difference there. 
That's going to generate, in expectation, over multiple samples, high p values, because there's going to be no real difference in the population, and drawing a good random sample over and over is going to show that. But there might be other things that can generate a non-significant result from an inferential test. What might that be? Natalia is hinting at some of the behaviour here. Other than a true null effect-- a true non-difference in the population-- why else might one of our inferential tests generate high p values? Conrad? >> I think if you have a low sample size, then basically the way the test is structured, the sample size raises the standard for when it's considered statistically significant. >> Yeah, we talked about this in BI 106. I gave you an example of a telescope far away and one that was close, right? If you have a small sample size it's like you have a really tiny telescope, and you are trying to see a star that's far away. You don't have a lot of power. The result might just be due to random sampling: because you have a very small sample size you'll get more volatility. Your sample might just be a quirk-- you'd get a high p value and no effect. What else? Other than there being no difference? Aaron is saying it could be a flaw in the experimental design. Yeah, I could see that. What kind of flaw? What are you thinking? >> Just maybe something where you thought you had successfully done a random sample, but you didn't. Or something kind of went wrong in your data collection, potentially, too. Vince: Yeah, and you can see how this gets you to Conrad's comment. Like-- you have a sample of a thousand businesses, and you send the e-mail out, but then you get 800 bounce-back e-mails. It isn't really a thousand, right? You think you have a big sample size. Actually you have a smaller sample size, and now you have that same small-sample variability that Conrad was talking about. Your sample isn't really reflective of the population. 
That's essentially it, yes-- when you have a small sample, that's what's happening. You just might by random chance have drawn a sample that doesn't reflect the underlying population. And Kate, you're onto something here. The effect is small. Not that there's no difference in the population-- it's that there's a very wee little difference. In fact I would say this is very common. This is one of the most common things. You can control sample size to some extent: you may be able to sample over a greater time period, or you might be able to get a bigger sample. But you can't do anything about the effect size. That's given to us by the statistical powers that be. And if you are just studying something that has a very weak effect size-- and honestly I'm thinking something like whether my dinner is rich, which might not shape my ice cream preferences all that much. That to me sounds like a pretty weak effect. I have roast beef one night so I have vanilla. Roast beef the next night so I have vanilla, and soup the next night so I have chocolate? I don't know. Sounds like a small effect. A lot of behavioural science work involves small effects. We actually kind of bake it into how we talk about behavioural science: small tweaks, big impact. Yes, that's the best case scenario, but there are other scenarios where we have small tweaks, small impact. These are the cases where we might get a non-significant result from an inferential test. People might see a low p value and say this is highly significant. And then if someone's p value is .07 they'll say it's trending to significance, or it's marginally significant, depending who you are with. My advice: the p value is below .05, we see evidence of an effect. Above, we see no evidence. The minute we get into edge cases, or "extremely significant," "highly significant," you're going to lose some members of your audience and you are not gaining that much. So my advice is join the club. 
We accept the expectation and norm that .05 is an arbitrary but useful cut-off. It has huge problems, but many use it. We generally think about it as a fairly firm cut-off. Kate's asking about whether there's an issue of a large effect on a small number. Yeah, if you have a really big effect size, you can get away with a smaller sample. Kate: No, I mean like it only has effects-- a large effect only on a few people. Like my mom won't have ice cream if she's had a rich meal, but I will, and so will everybody I know. Vince: Such a great point, Kate. So two things. One is this: let's imagine our sample doesn't have a lot of moms, but the effect is strongest among moms, or parents. Maybe I did that-- my overall ice cream preference has gone up since I became a parent. Always craving carbs. So like, yeah, okay. There was this larger effect for parents, but our sample doesn't have a lot of parents. What do we think that's going to do when we calculate the mean or the difference or some kind of inference? Is it going to make the effect overall bigger or weaker if we don't have a lot of moms or parents in the sample? Like, the effect is only there for moms, and only 2% of our sample includes moms. Weaker, exactly. This has sparked a big literature on heterogeneous treatment effects. A more advanced concept, but yes, Kate, you are exactly right. When we get a small effect, we often don't know: is that because in the population the effect is small? Or is it because in the population there are heterogeneous treatment effects, and for some people the effect is really big? And we don't know. I've read through your BI 106 plans and several of your groups think the effect will be largest for particular groups-- we think effects will be largest for people that are younger. Some of you are already thinking about these heterogeneous treatment effects, and that's where the literature is going, so you can have more power and detect the smaller group differences. 
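The dilution Kate is describing is just weighted-average arithmetic. A tiny sketch with invented numbers (2% moms, an effect of 0.8 in that subgroup, zero elsewhere -- all hypothetical):

```python
# Invented illustration: the treatment effect exists only for "moms", a small
# subgroup, so the whole-sample average effect looks much weaker.
share_moms = 0.02        # moms are 2% of the sample
effect_for_moms = 0.80   # a big effect, but only in the subgroup
effect_for_others = 0.0  # no effect for everyone else

# The whole-sample average effect is the share-weighted mixture of the two
average_effect = share_moms * effect_for_moms + (1 - share_moms) * effect_for_others
print(f"average effect across the whole sample: {average_effect:.3f}")
```

A strong subgroup effect of 0.8 shrinks to a whole-sample average of 0.016, which is exactly why a small estimated effect can mean either "truly small everywhere" or "big for a few people we barely sampled."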
And if you go to a tech company, they'll use machine learning to do this at a very precise level. Users between the ages of 30 and 32 on a mobile device and blah blah blah-- these people really respond to coupons. In that world they use machine learning to do exactly what you're talking about, Kate. Great comment. Okay. I'm going to return to p values later in BI 107. Hold up, Hopkins-- questions about p values? We are going to return to p values. Let's lay the foundations. How do we figure out, from the difference in our sample, whether there is a difference in the population? To make the inference to the population. These are what are called t-tests. They are extremely common, and they are the go-to first approximation to figure out if a difference in means between any two groups is due to chance or not-- to get that p value for a difference between two mean values. And as it turns out, you can use them pretty darn well with proportions too. t-tests are incredibly flexible. Super robust. You'll see them in the most advanced statistical papers, and you'll see them in first-year introductory quantitative methods courses. They are extremely robust and flexible. We won't get into the math too much. There are different kinds of t-tests. First, the one-sample t-test, where you have one mean and you want to compare it to some benchmark value. We don't know if it is higher or lower. Then we'll talk about independent samples t-tests, where you have two groups like treatment and control. And we'll also talk about a paired samples t-test: the same group before and after, or the same group on two different variables. And to end, we'll talk about ANOVA. It is more like a family of tests. Or even an extended member of a larger family of tests-- close cousins, siblings separated at birth. We'll talk a bit about ANOVA and when you would want to use it. In fact, in one of the activities for class today I'll have you think about: is this a case for ANOVA or not? The one-sample t-test, again, is not the most common. 
But it's a really basic one, so it's where we're going to start. The formula is pretty simple. We take the mean that we observe in our sample, subtract the benchmark value, and divide by the standard error of the sample mean. That lets us compare the mean that we get to some fixed value. We want to know, okay, the proportion of women in my sample is .5. But I know in the population-- I look at the census or something like that-- the true proportion is .52. Is that difference between my sample and the population significant or meaningful? A one-sample t-test is going to let us do that. In JASP it's super easy. Here we'll just compare the estimated savings from an energy-efficient television. We'll see if it's different from zero. So in the variable box I add the TV estimate, and in the test value box I put zero. And you see the p value, which is so small that JASP just says, I don't know, it's less than .001. Now I'm going to do the same thing for a different value. Let's call it $150. You'll see all that changes is the test value. In the circle I've now put 150, and the p value is still less than .001. In other words, in these cases the estimated savings from the television are different from 0 and different from $150. Probably positive, if you think about how much money people might save. People generally think you save more than nothing and more than $150. That can be interesting as a basic descriptive inference. One sample? Yes. A one-sample test is different from a one-tailed test, yes. One-tailed is when you have a directional assumption, so it's higher or lower. Independent samples t-test: when you have two groups that you know are different. And the most common case here is experiments. So we have randomly assigned treatment and control. We know they are different. We want to see, for example, whether this BC Hydro sticker, this decal, actually decreases household energy use. And we measure a continuous outcome, like energy usage for the week. 
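The one-sample t-test JASP runs here can be written out directly from the formula: t = (sample mean minus test value) / (sd / sqrt(n)). A hand-rolled sketch with invented savings numbers (not the course data set):

```python
import math
import statistics

def one_sample_t(data, test_value):
    """t = (sample mean - test value) / (s / sqrt(n))."""
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    return (mean - test_value) / se

# Invented estimated TV savings in dollars, for illustration only
savings = [180, 220, 150, 300, 250, 190, 210, 170, 260, 230]

# Same data, two different benchmark test values -- just like changing
# the "test value" box in JASP from 0 to 150
print(f"t vs $0:   {one_sample_t(savings, 0):.2f}")
print(f"t vs $150: {one_sample_t(savings, 150):.2f}")
```

With 9 degrees of freedom the two-tailed critical t at .05 is about 2.26, so both of these made-up t values would come back significant: the sample mean is credibly different from $0 and from $150, the same qualitative story as the JASP demo.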
Some get the decal and some do not. The t-test formula is very simple: subtract the two means and divide by the pooled standard error across both groups. And this will align with the p value. When people talk about a t-test, this is usually what they mean. If people say, oh, we did a t-test, they are probably talking about an independent samples t-test. In JASP, also super easy. Under t-tests, click down to independent samples. You get the estimate-- in this case, I think this is the vacuum one. Savings from a vacuum cleaner. We're going to compare by gender. Not randomly assigned, but we know there are independent samples because we've coded them as male and female. And we'll standardize the effect. We talked about Cohen's d and standardized effect sizes in BI 106. We do that here: we see a Cohen's d and a p value that comes with it. The effect size is negative 0.277, a moderate difference. The p value is .151. Would you say there's a significant difference in the savings estimated by men and women, judging by these numbers? No-- what's the tip-off? Why no? And such a quick no. The p value is too high, yeah. What would the p value have to be for us to think that yeah, there is evidence of a significant difference in how men and women estimate the savings from vacuums? Isaac's on it. You can sort of see why researchers might start to use words like marginally significant. Because if the p value came back at .051, you might be tempted to be like, oh, it's almost, marginally, it's close to, nearly-- yeah. And Nathalia: hey, wait a minute, negative 0.277 is less than .05, right? You're exactly right that we're looking at two values here. One is the Cohen's d, the standardized effect size of the difference in vacuum cost estimates, and the other is the p value that goes with that effect size. But Nathalia, you're flagging a really good point here. Can my p value ever be negative? A probability value. Can my p value ever be negative? Aaron says no. No with a question mark, my favorite kind of no. No, it can't be. It's a probability. p, a probability value, is always bounded by 0 and 1. So there were two flags. 
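The independent samples t-test and Cohen's d can also be written out by hand. A sketch assuming equal variances (Student's test, the "Student" checkbox in JASP), with invented savings data chosen so the result is non-significant, like the vacuum example:

```python
import math
import statistics

def independent_t_and_cohens_d(a, b):
    """Equal-variance (Student) independent samples t statistic plus Cohen's d."""
    na, nb = len(a), len(b)
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)   # sample variances
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    se = pooled_sd * math.sqrt(1 / na + 1 / nb)  # standard error of the difference
    t = (ma - mb) / se
    d = (ma - mb) / pooled_sd                    # standardized effect size
    return t, d

# Invented vacuum-savings estimates by group (hypothetical, not the course data)
men   = [300, 280, 350, 260, 310, 330]
women = [315, 295, 365, 275, 325, 345]

t, d = independent_t_and_cohens_d(men, women)
print(f"t = {t:.2f}, Cohen's d = {d:.2f}")
```

Both come out negative because the first group's mean is lower, which is the same sign situation Nathalia asked about: the effect size can be negative, the p value never can. And with a t this close to zero on 10 degrees of freedom, the p value would sit well above .05 -- "no evidence of a difference."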
One of which was that minus sign-- but hey, that negative belongs to the effect size, not the p value. Your intuition is right. Okay. But you might ask me, wait a minute, Hopkins. Standardized effect size? I don't speak your fancy math language. What's the difference here in dollars? We have a continuous measure of vacuum savings. Why don't you use actual dollars, a meaningful unit? He doesn't know what .277 is-- can we give him a dollar value? Yes, in JASP it's called the location parameter. Not what I would have named it. You will get a mean difference: $208 between how men and women estimate savings. We can talk about the name a little bit. I don't know-- difference in groups? Maybe group difference point estimate? Group difference? Maybe I would have asked ChatGPT: what should I name this? When you see the ice cream difference, call it the ice cream factor? The i factor. You can look up here and see the mean difference and the SE difference. We'll talk about standard error later. What is a standard error of the difference? It's a measure of uncertainty about the mean difference itself. When the standard error is big relative to the mean difference, that's a pretty good indication that you don't have a significant effect. We'll talk about why that is later on in BI 107-- I love that stuff, we'll get into it. That brings us to activity number one, tests of difference. We'll put you folks into breakout groups. Two slides per group. It should go pretty quick. Activity two is a bit more complicated, though I've given you everything you need to answer it. I also uninstalled JASP with prejudice and then re-installed JASP, to make sure I don't make the same mistake as last time, where we had a version control error. You folks were finding weird buttons in JASP that I wasn't, because I had an outdated version of JASP. I've removed it and reinstalled it, so I'm pretty confident now the instructions and the visuals should line up with what you see. You tell me if I'm wrong. Okay, the instructions are in the chat-- that'll be shared. 
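The "location parameter" (mean difference) and the SE of the difference that JASP reports can be reproduced directly. Invented numbers again; this sketch uses a Welch-style SE (each group's variance over its own n), one common way the SE of a difference is computed:

```python
import math
import statistics

# Hypothetical savings estimates for two groups (illustration only)
group_a = [150, 210, 180, 240, 170, 190]
group_b = [160, 230, 190, 260, 180, 210]

# JASP's "location parameter": the raw mean difference, in dollars
diff = statistics.mean(group_a) - statistics.mean(group_b)

# Welch-style standard error of the difference between the two means
se_diff = math.sqrt(statistics.variance(group_a) / len(group_a)
                    + statistics.variance(group_b) / len(group_b))

print(f"mean difference = {diff:.1f}, SE of difference = {se_diff:.1f}")
```

Here the SE (about 19.6) is larger than the difference itself (15 in absolute value), so the t ratio (difference divided by SE) is well under 1 -- exactly the "standard error big relative to the mean difference" warning sign that the test won't come back significant.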
You know, we'll give it, let's say, 10 to 15 minutes, but we'll play it by ear. Any questions before we create the breakout groups? t-tests. Samples, inference, population. Yeah, it will get shared in a second. No questions. Great, Shakti will share the link to the slide deck. Jacqueline apologizes in advance. A few folks have questions. Raise your hands otherwise. Thanks everyone. Kate: You're reverbing a lot. Isaac: Yeah, is that just me? I'll take this off. Kate: Are we supposed to be existing-- yes. Isaac: It's echoing for me too. I don't know why. Jacqueline: Can everybody mute and then unmute one at a time? Isaac: So let's see. Okay. There you go. That was weird. Good idea. Okay-- I have it open. I can share my screen. Are we group one? Okay. [ Ann speaking ] Isaac: I think so, because there's two groups. Right? Jacqueline: Wouldn't it be paired sample because there's two variables? Kate: We're on activity two. Aren't we supposed to be on one? Isaac: Oh, that's weird. Okay. Why is it all-- okay, there you go. Okay, different questions. Jacqueline: For this one I agree, independent samples test. Isaac: Yeah, so that'll be for the second part. So he wants us to put it in JASP. Oh hey, Vince. Vince: Jacqueline apologized, so I felt like I had to jump in here first. Okay, cool. I can open that as well. I think I have-- cool. Jacqueline: An independent t-test, so we jump in and do our own and come back and compare, correct? Vince: Love it. Sort of military efficiency with that. Kate: I'm British, not Canadian. That's what that is. Vince: Right, there's no long passive-aggressive Canadian "how would you feel about--," "what did you think about--," "shall we?," "what if we--" yeah, okay, great. Any questions? Or are you folks making your way through? Kate: So the lightbulb-- the cost estimate for incandescent lightbulb electricity would be L estimate? Okay. Kate: Ann, it's your system causing the reverb. It's something on your side. It only happens when you're off mic. 
Ann: Well, I don't know what happened with how we set it up today. Vince: Any other questions that I can help with? Kate: That's good, right? Vince: Okay. Just raise your hand if you have questions. Isaac: Okay, thanks Vince. Okay, so are we going to do this ourselves and then just share? Kate: Okay, so go away and do it and come back and say what we found. Jacqueline: So Isaac, we all got p = .0146. Isaac: So it's the same? Okay, we'll go to the next part. Kate: [ Reading the question ]. Isaac: I think it's the other one. It's just saying people. It doesn't specify groups. Just people in general. Like the population. [ Ann speaking ] Jacqueline: I have no idea how to calculate this one, so if we wanted to do it as a group I would appreciate it. Kate: Me too. I don't know where you put everything. I'm totally confused. Like, where would you put it? Jacqueline: Okay, I think it just got easier. I was in the wrong t-test. So it's the L_estimate one, right? Is the variable? Kate: L_estimate is the variable, but then--. Jacqueline: This is an independent sample. Isaac: So the variable is L estimate. So is it just that? Kate: But then how do you compare it to what--. [ Ann speaking ]. Kate: Where the hell are you all looking? Jacqueline: Highlight L estimate in blue, then check. Checkmark Student. Test values. Isn't that where it's showing us you would type in 240? Just type in 240. One sample test, yeah. Isaac: The test value is 240? [ Reading ]. So yeah, the test value should be 240, if I'm doing this right. Isaac: So then the p value changes. .529 now. Kate: So if you put in 240, it's .529. It changes, yeah, you're a genius. We're all geniuses. Thank you. Okay, perfect, test value. Okay, yeah. And .529. Isaac: Is that it? Is it that simple? Vince: Hey, group one. Jacqueline: Either we're doing it right, or we really screwed up and Vince will laugh at us later. Kate: You're awesome. Everything about you is wonderful. So I think we just have to do those questions, right? Yeah.
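What the group just did in JASP, a one sample t-test against a test value of 240, looks like this under the hood. A sketch in plain Python with toy data (the class dataset isn't reproduced here); the p value uses a large-sample normal approximation rather than the exact t distribution:

```python
import math
from statistics import NormalDist, mean, stdev

def one_sample_t(xs, test_value):
    """t statistic for H0: the population mean equals test_value.
    t = (sample mean - test value) / (sd / sqrt(n))."""
    n = len(xs)
    t = (mean(xs) - test_value) / (stdev(xs) / math.sqrt(n))
    # Two-sided p value, normal approximation (fine for large n).
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p
```

A high p, like the group's .529, just means the sample mean is close to 240 relative to the noise, so there's no evidence the population mean differs from it.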
Vince: Awesome. Let's take a ten minute break. When we come back we'll talk about assumptions and another group activity, and that'll be the end of class one. So we said t-tests are part of a broader family of tests. We'll talk about it more in this course. We should talk about some of the assumptions that go in. Correlation, regression: they are all part of the family of models called linear models. Some assumptions are important for us. One is assuming observations are independent. What we mean by this is basically we're not asking the same person the same question ten times, or sampling the same ten individual people ten different times. Another one is that the dependent variable is continuous. It's either interval or continuous. Maybe a Likert scale with a bunch of different values, but not binary or categorical. The truth is these models work really well in certain situations when you have binary outcomes, but one assumption is that outcomes are continuous. We've talked a little about distributions, which we'll continue to talk about in BI107. The normal distribution, the bell curve. One assumption is that our data are normally distributed. In other words, the sample is big enough that when we estimate the mean, it looks as if it's normally distributed in the real population. We don't want really big outliers. Really strong skew. One or two observations that throw everything off. I have a survey measure-- I have a paper where I ask people with different partisan identities in the United States about different occupations. I ask them: how much would you charge to work in that occupation? Some of these answers are sort of conservative. One respondent gave a one, a five, and a one with a hundred zeros after it. And you can imagine what I saw when I got the data back: gazillions of dollars. There are a bunch of other assumptions. You can ask me about the Gauss-Markov theorem in office hours if you want. When you violate these assumptions, the results aren't quite reliable. This matters here, thinking about inferences from the sample to the population, when the sample has really big outliers or skew.
If we violate these assumptions, the p value that JASP gives us might not be reliable. Some violations matter a lot: outliers. Some don't matter quite as much; it's not quite as bad as the others. Now, paired samples. An independent samples t-test has two groups: men and women, treatment and control. A paired samples t-test is the same group, either across time or within the same survey. Mostly, for your projects, this means things like pre-post designs. Kiosks and their effect on some outcome measure before and after. The comparison: was the outcome measure higher or lower before versus after the intervention? This is the kind of analysis with a single group and multiple variables. Compare that to an independent samples test, where you have two groups and one outcome. Independent and paired samples. Independent samples: two groups, one dependent variable. And then there's a little note here. Think about what's going on in a paired samples t-test. You are comparing, for however many people showed up in person, whether there was a sign or not, the after score against the before score. It's like a one sample t-test asking: is that difference different from zero? Because the null I'm testing is that the difference between before and after is zero. That's what I want to know. Did my kiosk and coupon intervention change things? A change would be something different from zero. Subtracting before from after and seeing whether the actual result you get in the sample is different from zero. Paired sample t-tests are very easy in JASP, like the others. You've already seen it. When you click down t-tests: independent, paired, one sample. You select paired samples t-test. In this case we're going to look at the pair of the estimated savings from a television versus the estimated savings from a vacuum. Did people think you save more money from one or the other? We're going to click effect size and descriptives. Helpful, because if you don't click that, JASP won't tell you what the means going into the comparison are. That's usually important, at least for me.
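The point above, that a paired samples t-test is just a one sample t-test on the per-person differences, can be made concrete. A sketch with hypothetical before/after scores:

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired samples t: compute each person's after-minus-before
    difference, then test whether the mean difference is zero,
    exactly like a one sample t-test on the differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))
```

If the intervention did nothing, the differences hover around zero and t is small; a real before/after change pushes t away from zero.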
I always want to know: what am I comparing? The television, $919; the vacuum, $416. And if we hadn't clicked descriptives I wouldn't have seen that. We're also going to check the 95% confidence interval. We'll talk more about this idea in BI 107. If we were to draw samples from the population again and again and look at the difference between t_estimate and v_estimate, how often would we get the true difference in the population? How would we solve the missing data problem? 95% of the time the true value would be within this range. We'll talk more about that later in this class. The key takeaway here is that in this paired samples t-test we can see that there is a difference. What is the difference? How much more money on average do people think you save with an energy-efficient television versus a vacuum, just by looking at the screen? >> Like $503 or something. Vince: Yeah, something like that. 919 minus 416. That's a pretty big number. And these sample sizes aren't huge, but not tiny: 810 observations. My first indication is that's a big effect size. People think you're going to save more money from the television than from the vacuum. It's kind of a big effect size, and the sample sizes are not that small. My mind is thinking: I bet this is going to have a low p value, because we have a big effect size and a small-to-moderate sample size. And indeed, that's what JASP says. Here the t statistic is 44.99. The p value is less than .05. So small. The standardized effect is somewhere between .28 and .68. Somewhere in that range. In other words, we would see that there's evidence of a difference in how much people think you're going to save from an energy-efficient television versus a vacuum. And maybe that makes sense. People watch a lot of television and it's a big screen. Maybe that kind of makes sense. Erin: Maybe you mentioned this and I forgot, but when you're talking about the effect size, what kind of range-- where you said this is a large effect.
What would be a medium effect? Vince: This goes back to 106, lecture three. This is a pretty big effect. The .48 would be a moderate to large effect. The closer you get to one, the bigger the effect; the closer to 0, the smaller. But yeah, it's also pretty arbitrary. I don't know that Cohen imagined that 60 years later people would be quoting his arbitrary numbers as truth. Oh, that's a .2, that's small. This is a .5, that's moderate to big. It's a little arbitrary. And for me personally, if I have meaningful units, in this case dollars, I would lead with that. Right? People think you save $500 over a ten-year period, and that seems large. It has a Cohen's d of .48. Whenever possible I calculate Cohen's d as a behavioural scientist, but I probably would not present that to a decision maker if I had units like dollars. The dollars are what I'm going to present. Cool. Erin: Just to clarify, .48. That's moderate? Vince: You can see the community thresholds. Substantive significance, which is something we'll get to later in 107. I don't know if we'll return to this concept, but I'm a little less convinced by effect sizes alone. I need to know a little about the behavioural context. A small effect size, a Cohen's d of .1, sounds small. But if the outcome measure is infant mortality, that's to me a very substantively important effect. And so I try not to let these arbitrary cut-offs of .2, .5, .8 for small, moderate, large drive my thinking too much. What is the outcome? What is small, medium, large? What I'd say is: don't stop there. Think about the units of the outcome measure. What are we trying to measure here? Does that make sense? Erin: It does. I think my question came from it being between 0 and 1. Is that a large effect? Obviously you'd probably never get a one on that. Vince: These are the same individuals that we've asked one question at one point in time and another question at a different point in time.
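Cohen's d for a paired comparison like the television-versus-vacuum example is the mean difference divided by the standard deviation of the differences. A sketch with made-up numbers; the .2/.5/.8 labels discussed above are Cohen's rough conventions, not hard rules:

```python
from statistics import mean, stdev

def cohens_d_paired(before, after):
    """Standardized effect size for paired data:
    mean of the differences / SD of the differences.
    Rough convention: ~.2 small, ~.5 medium, ~.8 large."""
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs) / stdev(diffs)
```

Because d is unitless, it lets you compare effects across studies, but as Vince says, when the raw units are already meaningful (dollars saved), lead with those.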
Depending on what the outcome measure is, you might get a treatment effect. I need to plug in my computer here. Vince: Yeah, .48 is generally pretty big. If I had the sample size, it would be like: nice, that's a big effect. In practice. Nathalia? Nathalia: You were showing the click here, the dependent t-test. My assumption is that I need to go to descriptives, check-- then go to select. Is there another way to do it? Or does it have to be a two-step thing? Because if the means are showing. Vince: In JASP you could just do this in descriptives. Or in JASP you could go into t-test and click the descriptives box; it will give you a little summary table like it does here, with the sample size and the means. Vince: I think it's there for independent samples too. Vince: For independent samples it gives you the location parameter, the difference between the two groups. To calculate the mean of each group independently, you'd probably have to do it in descriptives first. Any questions about this paired sample t-test? When we use it, why, how it's different from the other t-tests? Someone says she's using her vacuum more than her tv. It's true. I used to have a whole life where I literally never vacuumed. Didn't own a vacuum. Watched a lot of tv. Life was good back then. It's good now, but it was good back then too. What about the three kinds of t-tests: one sample, paired and independent? What if I said in life you can only pick one? What would be your go-to t-test? Your ride-or-die t-test? Paired, one sample or independent samples? In most cases, independent. Why is that? Kate: It just seems more useful. Vince: I would have the same answer. I'd generally say independent. We usually try to intervene in the world. I want to make it better and change the world. I want to know whether my intervention works or not. And when I say works, I mean the difference between treatment and control is different from 0. I told you this one is commonly used, thinking about interventions, and thinking about fair comparisons, and all the things randomization does for us.
That's what pulls me toward an independent samples t-test. There's a lot of pre-post stuff too. You want to know people when they arrive and when they leave. Fortunately, you don't have to choose, but sometimes students can get a little overwhelmed by the t-tests, so I just want to stress: independent samples is what you're typically using in applied, real-world behavioural science. When you're out there at some conference, your big Difference 2026, and people are talking, they are probably talking about independent samples t-tests. And when you are thinking about your presentation-- should we talk about our t-test?-- you're probably going to be thinking about independent samples t-tests. So cool, okay. ANOVA, or analysis of variance. So we use a one sample t-test when we have one mean. Two means: paired and independent samples t-tests. But what about when we have three or more? Did anyone wind up doing a factorial experiment? High and low. Alex says yeah, your group did a factorial experiment. In behavioural science more generally, we do a lot of experiments. What happens when we have three means? Multiple t-tests? Control minus treatment one. Treatment one minus treatment two. Yeah, we can do that, but it starts to get complicated. This is where ANOVA, analysis of variance, and other techniques like linear regression become very handy. So let's say we wanted to test the effect of these BC Hydro stickers that we sent out. But we tested different kinds of stickers: two, compared to a control group that got no sticker. Now we have three groups. Things get a bit more complicated here. Now, there are lots of different kinds of ANOVA and, as I've stressed, it is just one flavour of a whole family of models called linear models. Linear regression. What is common is that they generate p values. ANOVA is special. Repeated measures ANOVA. All the kinds give p-values. So people often use it as a first approximation. And people say: I have three groups. Control, treatment one, treatment two.
Is there any significant difference between these groups? Any at all? Is there a difference? Then follow-up tests. This is where ANOVA can be very helpful. In JASP, ANOVA is super easy. We pull over the estimated savings from televisions. We pull over what JASP calls a fixed factor; in this case it's education. We calculate descriptive stats. And when we do this in the ANOVA pane, it generates a p value automatically. So, all the values of education: some elementary, some completed elementary, some high school, completed high school, some college, advanced training. It's comparing all of them. In this case the p value says no. No-- I'll ask you. What do you think is going on? This is only the energy-efficient television over time, and we're looking at the mean estimated savings across levels of education. JASP is saying your p value is .82. No systematic difference. No evidence of a difference between these groups. Is there truly no difference in the population among people who have different levels of education? Or do you think there's something else going on that's leading to this p value? Kate's already hit on it. The ns are tiny. Kate: It would almost be impossible to say. Maybe there is-- maybe it is actually about the population. Vince: And this is one of the unfortunate things about underpowered experiments. We can't adjudicate between what Kate and Conrad are saying versus what Nathalia is saying. These are both equally plausible answers. We can't say whether the difference is because these sample sizes are really small, or whether in this country, in this context, there is no difference in how people of different levels of education estimate the energy savings. Erin? Erin: Just a question. When you do this analysis with these seven groups, I'm just curious how it compares them, right? Because imagine, I don't know, some groups were very high and some were low. Looking at it more like this. How does it compare them? Vince: This is something we can chat about in office hours.
Analysis of variance is going to use the variance, the standard deviation, in part, across these different groups as a key way to figure out which differences in the mean estimates matter. As the means vary, ANOVA exploits the fact that some of these groups have higher or lower variance in relation to the difference in the means. That's why we call it ANOVA at all. Oh sorry, yeah. Erin: That makes sense. Thanks. Jacqueline: So just thinking about it in practical terms, it's just reflecting that education may not be a criterion where you are seeing variance? When you're looking at something like tv, which is so culturally embedded, it doesn't matter if you are 12 or 22 or 82. We've all been exposed to tv, so we know how it works. To your point, a five-year-old isn't maybe out there vacuuming. You know, a 21-year-old is maybe not vacuuming to the same extent as a parent in a family who's 40, right? Is that part of why the p is high? Because it's letting you know that that is not-- it helps you narrow down what criteria are at play? Vince: It's getting at something-- what Nathalia is saying. Maybe just in this context, in this population, it doesn't matter for this comparison. The problem is, because the sample sizes are so small, we don't have a lot of confidence one way or the other. Look at some of these mean values. People at the lowest level say 240 bucks. People at the level eight education think you save $1400. Seven times more, but there are only three people in the first group and only seven in the next group. So it could be exactly what you're saying: education is not salient, these group differences aren't real. That could totally be it. It could also be that we're looking at ten people. That's just a very small sample size. It doesn't solve our missing data problem. We want to know the group differences for millions of people. We've only asked ten of them. Three have little education. It's just not enough sample size. So this is why underpowered experiments are so heartbreaking.
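The "variance" in analysis of variance works as described above: ANOVA compares how much the group means vary (between-group) against how much noise there is inside each group (within-group). A minimal one-way F statistic sketch with toy groups:

```python
from statistics import mean

def one_way_F(groups):
    """One-way ANOVA F statistic: between-group mean square over
    within-group mean square. A large F means the group means
    differ more than within-group noise would predict."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    # How much do group means spread around the grand mean?
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # How much do individuals spread around their own group mean?
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within)
```

With tiny per-group ns like the three-person education groups above, even a sevenfold difference in means can leave the p value high, which is exactly the underpowered-experiment problem.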
Because you started-- you did the whole thing to answer questions, and at the end you have more questions than you did at the start. It's a very unsatisfying feeling. I say that from personal experience. You can also do a repeated measures ANOVA, like a paired samples t-test except this time you have multiple variables that you're comparing. So here we're going to look at the different planning variables. We're going to compare all four. And then we see the within-subjects effects here. The p value: .001. ANOVA is helpful when we have multiple means; it's the first thing we do before we compare those means. A first approximation. We'll also talk about regression and chi-square tests. Regression is also handy, because it lets you do a lot of this stuff all at once. The results are harder to interpret, but these models are almost interchangeable. And someone is asking: knowing that, could I sort of use selection bias and design a sample around-- yeah, I mean-- I wouldn't say it's a good research practice. But yeah, it's all one problem. You generate other problems, but yeah. If you look at a lot of these outputs, you'll see this df. This is called degrees of freedom, and it's a little bit hard to get your head around at the start. We'll return to it in 107, but you can think of it as how much wiggle room you have in estimation. And I don't mean your ability to fudge the results. What I mean is how many bits of information went in to calculate the estimate in the first place. So if we ask three people how much money they think they'll save from a television, and we pull out the mean from that group, we've removed one degree of freedom just to estimate the mean. In practice, degrees of freedom are helpful because as a researcher they can sometimes tell you what the sample size was. Use it to think about how many people are actually being compared; researchers don't always give you the sample size.
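The df figures in test output follow simple rules for these t-tests, which is why you can often back out the sample size from a report. A sketch of the bookkeeping (a hypothetical helper, not a JASP function):

```python
def t_test_df(n1, n2=None):
    """Degrees of freedom for Student t-tests.
    One-sample / paired: n - 1 (one mean was estimated).
    Independent samples: n1 + n2 - 2 (two means were estimated)."""
    return n1 - 1 if n2 is None else n1 + n2 - 2
```

So if a paper reports an independent samples t-test with df = 808, the study had about 810 observations in total, even if the sample size is never stated outright.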
And when you have simple models like these t-tests, you tend to always have the same degrees of freedom. It's just something to report, which you can find more information about at the end of these slides. It's not the most relevant thing; we'll talk about it more. It has to do with the numbers we use to calculate estimates, and how those relate to the samples that we draw. Which brings us to class activity number two. Slide two of the Google Slides file. This time I want to ask you to create a computed or calculated variable. And since it might have been a few months since you've done that, I've given you instructions on how to do it. In those instructions you'll see to click a black plus sign in JASP. Unfortunately, JASP will only show that sign when you load data fresh. When you start to do analysis, JASP will make the plus sign go away. There are other ways, but that's one of the easiest. The instructions give you exactly what to do, step by step. Jumping into the groups-- that might be the trickiest part of the activity. As always, raise your hand if you have questions. Take about 15 minutes for this. We'll come back with the key takeaways and then I'll have office hours. Any questions before we jump into activity number two? Kate: So if you look at the descriptives box, it's 7,000 and change for the mean of F estimate and 200 for the L estimate. So the furnace is clearly higher, and we know it's significant-- don't say very significant-- because it's less than .001. Perfect, okay. Isaac: Also, first class back, it's hard. I'm a bit tired today. Jacqueline: And it's building off of the last one with this really long gap. Pretty sure Vince is just laughing at us now. Saying ha ha, this is a behavioural insights test. There is going to be a report on this. Isaac: So for the second part, he just mentioned this. [ Ann speaking ]. Kate: Is it repeated measures then? [ Ann speaking ] Kate: Oh, yeah, all right, okay, so--. Isaac: So what are we calling it? Shakti: It comes when you load the data freshly.
There are other ways to do it. If you'd like my help I can help you with it. Otherwise you can just reload all of the data. Isaac: I'm confused. Yeah. Shakti: Go to insert at the top. Isaac: Sorry, I can't hear because of the echo. Can everyone mute themselves? Thank you. Sorry, can you say that again? Shakti: Yeah, can you go to insert? Column after. Just double click on it. And then you can paste here. Isaac: Okay, then copy and paste from the slide. Yes. Can you scroll to the bottom? Okay, that's it then? But if you want to change the column. Yeah, it should be good now to see the issues. Yeah. Just come back. Kate: I can't copy and paste. Jacqueline: I'm thinking, is this ANOVA? Isaac: We can try. Or is it just normal ANOVA? Kate: Three different measures. I think it's repeated measures. It has to be. ANOVA on its own is multiple groups, but one--. What was the name of that new column? Yeah, column 18. Isaac: Just looking at the slides too. Do I put it here? Vince: Hey, group one. Okay, so it's actually not a repeated measures ANOVA. Kate: That's my fault. My bad. Vince: Because each panelist has only given one number. Panelist ID number one was in week one. A different panelist was contacted in week two. So though it looks like it's repeating because it's time, they are different people, because they were recruited in week one, then week two, then week three. So, column 18. Isaac: We didn't make it continuous. Is it supposed to be? Vince: It doesn't matter. And then-- the dependent variable is-- did you recruit people who are younger in one week than another week? Awesome. Exactly. Kate: And the p is high. Vince: I'll leave you to it. Kate: Sorry, that was me. Isaac: No, my brain is not working today. Three different means compared across three separate weeks. Kate: Also different people. That's what he was saying; the reason why it wasn't repeated is because they were different groups of people. Isaac: Okay, so if the groups were the same, it would be repeated?
Jacqueline: Repeated is doing multiple touch points with the same population. This is just a snapshot: you have your general population, and when you do your intervention you have a sample. Click on the descriptive statistics. Isaac: Okay. Is that all we need to answer, though? Compare the age of each group, equivalent or not, in each of the three groups. So the average is the mean. It's fairly close, so that's what the p is saying, and that's confirming it. The p is confirming it, if I'm reading that right. Isaac: It's not statistically significant. Yeah. And then, okay. You can also look at the means; they are very close, right? Jacqueline: We can't always go by that. Sometimes there's a small mean difference because our population is so small. Yeah. Isaac: I have to go get my charger. But that's it, right? Okay. Jacqueline: I was just looking at the slides. It's like: surprise, work on it on the weekend, it's due on Sunday. Kate: I'll do it tomorrow instead. Kate: The main thing. Jacqueline: So what I've learned is make sure you have a statistician on the team. Kate: I swear all this stuff is asking to be AI generated. Jacqueline: Apparently Claude will do it for us. Kate: Show you what Claude says. Jacqueline: Claude went on Easter vacation. I'll get back to you. Kate: They used to do these quizzes. And he said, I'd ask Adrian. And he got the right answer. And I think that should have been a tick. Vince: We're back. I wasn't sure about adding the computed column with pressure and time, but I felt like it'd been a few months since we'd done a computed variable, so my pleasure for giving you the formula. I think someone asked how we come up with these formulas in the future. You can always book a one-on-one chat with me; we can talk about that. The basic idea is they function like an Excel formula. So if you are used to looking up Excel formulas-- you can always reach out to me. Always happy to talk about how that works if you have complicated computations. That is true.
So, I framed BI data analysis as a problem of missing data. We want to know things about the population. How do we make an inference about that population using this sample? How do we do that? We gave it even more precision: we're actually most interested in comparing averages. We want to compare the means from one sample to another. Does the difference between the means of group one and group two also exist in the full population, the data we don't have? Linear models are the family. The t-test is the most common statistical technique to make this kind of group comparison: whether a difference in the sample would plausibly exist in the population. We gave you some language around that. As you probably picked up, it gets complicated. Independent samples t-tests, one-way ANOVA, repeated measures, linear regression, correlation. Lots of different tests. Hopefully we can simplify the decision for you-- and reach out to the BI community, people like me: I have this data, I have a question, how do I figure it out? That's what we'll talk about in BI 107 a bit more: how we pick the right model at the end of our design. At the end of these slides you'll see some additional information about how to report results using APA format. Also more advanced stuff on chi-square tests. Just saying, this week's slides have a lot, plus some additional bonus material, some of which will be helpful for the problem set, which you should have access to. The video will be posted on Moodle. I'll have office hours right after class, so if you want to chat, please stick around. The problem set is due soon: a bunch of multiple choice and fill-in-the-blank questions. Later on there's another written set, kind of like 106, applying all the concepts here. This is our first introduction and first class of BI 107. Thank you so much for taking the time. Enthusiasm and intelligence and patience and camaraderie. I'll be sticking around for office hours a little bit. If you are unable to stay, please feel free to reach out by e-mail and we can book time to chat.
Tell me about your data situation. Happy to offer insights. Thank you so much everyone. Isaac: Thanks Vince. Vince: Bye Isaac. Thank you. Nathalia: Yeah, when I was reading the previews-- I was stuck thinking whether I would be doing something wrong in how I use it. Because I don't want to make inferences against the whole population. My inference is mainly about hospital workers. Is that okay, or am I doing this against everyone every time? I got confused-- then it's not going to be true, because they are not all health care workers. I don't know, I got stuck on that. I don't know if it matters, but I thought I would ask. Vince: Well, on one level it doesn't matter. As in, whether you think you are making inferences to the population of hospital workers in British Columbia, hospital workers in the Vancouver Coastal Health authority, or workers over the age of 40 in the Coastal Health authority-- it doesn't really matter. Everything we talked about today applies. You don't do anything differently. What does matter is how you think about sampling. If you want to make inferences about hospital workers, you probably don't want to do a general survey. And so you have to convince your audience in two ways. One is sample size: I have a big enough sample to make inferences to my population, and you will believe me. But the other one is much less scientific and much more about: let me tell you how I recruited these individuals and why I think they are representative of a population that's very specific. This is hard to do. I study immigration, and one thing I love about studying migration is how diverse it is. It is extremely hard to talk about my population. Right now I'm in the field with newcomers to British Columbia, and I can think about my results being representative in terms of visa status. I know in British Columbia 77% of newcomers are permanent residents and naturalized citizens, and about 23% are non-permanent residents. Okay, so I recruit a sample and I tell you I have 77% permanent residents, 23% non-permanent.
And you might think, wow, that's really representative. I think, yeah, good job, that's really representative. And then I look at country of birth. And I'm like, oh, damn. This isn't representative at all. I have way too many Europeans in my sample. Vastly overrepresented in my data. And so this is not about sample size. It's about sampling, and me telling you: hey, my sample is representative on visa status, and in terms of education level, and in terms of occupation. It's not representative in terms of country of origin and language. This is more like me building my credibility with you. So on the one hand it doesn't matter, because everything we talked about is the exact same. On the other hand it really matters: your results, your inferences, are only as good as your sampling. Does that make sense? Awesome. >> I would be sampling hospitals-- more about the unit I work in. But some of the tests from the previous class I know I can use; for this one I don't know. Vince: So you can think about representativeness in every sense here. Let's say that you surveyed hospital workers specifically. Do you work with multiple hospitals? Nathalia: Children's Hospital, in-patient unit. Vince: So I'm going to sample Children's Hospital, and there's 400 employees, and you sample a hundred of them. 25%. Is that good? I don't know-- maybe they are all managers, or all first year residents. Or all people who have ten years of experience, because you did the sampling during business hours and seniority predicts who gets the day and night shift. You have really good representation of day shift workers. It's the right unit, the right hospital. But they are not representative of the full unit. Nat: Oh, I should consider the unit, okay. Vince: So there's some flexibility here. My sample is representative in these dimensions and not in these dimensions. Andrea: Would you highlight that in both scenarios, the reasons for that?
So say, in that example you just gave to Nathalia, more managers filled out her survey because they sit at a desk more during the day than students, who aren't at physical devices. They would have to go to their phone and look at the survey through their phone. Maybe they don't do that throughout the day. Vince: Yeah, it's even worse. Say I want to study burnout in in-patient unit care-- a very specific population. I surveyed 400 employees. No evidence of burnout. Really low. Now, I asked a hundred people and the average salary was $250,000 a year. I did my recruitment at 1:30 P.M. and I asked for 60 minutes of people's time. And the audience would be like, well, wait a minute. The people who can afford to take 60 minutes out of their day in the middle of the afternoon, and who earn that kind of money, might be the people who are least likely to experience burnout. That tells you one thing about representativeness in outcomes and comparisons. And so yes-- I would communicate all of that. But I'd also do the thinking on behalf of my audience and say: here's why that's a problem. In my case, the survey of newcomers-- we launched it, everyone did it in a three-month period. The highlight: it's been great. It's been a huge headache, though, mostly around country of origin, because what it is showing me is how little I know about these individuals. One variable, complex lives, and I'm getting these huge differences. I can at least tell you how they are non-representative on country of origin. But what are the things not representative about them that I can't measure? Andrea: When you take that out-- remove country of origin-- so it doesn't skew what people thought? Vince: In this case I did a much more complicated thing. I worked with Claude Code a lot to do something called survey weighting. So-- I said my population is newcomers to British Columbia. And now I want to find out: what is the average education level of newcomers to British Columbia?
What is the percentage of people who are permanent and non-permanent? What is the average age? What is the average region of birth? And then I use complex math, magic hand wave, to adjust and weight my survey responses, so that groups that are over-represented count less and groups that are under-represented count more. This is a very well-established technique, and if you ever read survey data in the newspaper, people are using weighting. So I have a statistical solution, and then I tell my audience: these are representative by visa status, region of birth, age, education and gender. They are not representative for family size, or in terms of whether people have experienced discrimination. If I don't measure something in the survey, I don't know what the population distribution is. What percentage have experienced discrimination? That's an unknown quantity; we don't have that the way we have census data. It's a communication thing. I'm trying my best and thinking about this really hard so that you don't have to. I'll tell you all my limitations, and hopefully at the end of the day you trust me, and I don't lie to you. That's my approach. Melanie: Vince, I just have a quick question because I have to get back to work, unfortunately. It's only 3:15 for me, so my company is very good at giving me a couple of hours off, but the emails are building up. I'm struggling with our capstone project slightly, because it's WorkSafeBC and apparently we're doing open rates and click rates of emails. And my brain just can't get there. If it was people or something... And apparently very few people open emails from WorkSafeBC. But we have a high click rate. So that's good, isn't it? But where can I learn more... Unfortunately, if it had been patients... I'm a risk management specialist in health and safety. But email is probably one of the bad channels. I don't open emails either.
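The weighting Vince hand-waves over can be sketched as simple post-stratification on a single dimension: each respondent's weight is their group's population share divided by its sample share. Real survey weighting across several dimensions at once usually uses iterative raking, but the one-dimensional idea is the same. All numbers below are hypothetical:

```python
def poststrat_weights(sample_counts, pop_shares):
    """Post-stratification weights on one dimension.

    weight = population share / sample share, so over-represented
    groups get weights below 1 and under-represented groups above 1.
    """
    n = sum(sample_counts.values())
    return {k: pop_shares[k] / (sample_counts[k] / n) for k in sample_counts}

def weighted_mean(values, groups, weights):
    """Weighted mean of an outcome, each respondent weighted by group."""
    w = [weights[g] for g in groups]
    return sum(wi * v for wi, v in zip(w, values)) / sum(w)

# Hypothetical: 4 respondents from Europe (badly over-sampled), 1 from Asia.
pop = {"Europe": 0.2, "Asia": 0.8}
counts = {"Europe": 4, "Asia": 1}
w = poststrat_weights(counts, pop)   # Europe: 0.25, Asia: 4.0

outcome = [1, 1, 1, 1, 0]            # some yes/no survey answer
groups = ["Europe"] * 4 + ["Asia"]
print(weighted_mean(outcome, groups, w))  # 0.2, versus an unweighted 0.8
```

The unweighted mean (0.8) is dominated by the over-sampled Europeans; the weighted mean (0.2) restores the population balance, which is exactly the "count less / count more" adjustment described above.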
Vince: You don't have a statistical problem, you have a data problem. Okay, here's what I tell myself. I work on the most boring projects, and sometimes I feel, why did I go to school so long? Why did I work so hard, to work on these stupid projects? Let me tell you how I motivate myself. A project that feels stupid, and I've felt that way, is a great project to learn from. Where actual health matters, where illness or morbidity is a plausible outcome, or income assistance received, those are scary projects to learn on. I learn by making mistakes. So I motivate myself, one, by saying: hey, this boring problem is a great place to learn. I can make a mistake and I won't feel bad. That's one thing. Another motivation: ever heard of a conversion funnel? A very common phrase in tech, and you've experienced it before. When you log onto Netflix, it says create an account, it lets you choose an avatar, here are the top movies, and then you scroll down and it shows you comedies. And Netflix is tracking everything that you're doing. They are counting when you open the app, how long it takes you to create an account, what avatar you choose, the way a movie is portrayed, how long you scroll. It understands human behaviour and operates it as a funnel: the focusing of attention, which can even be measured by pupil dilation. Going from that to clicking is a very systematic, general pattern of human behaviour: something grabs our attention and triggers a motor response, which makes your muscles click the mouse. This matters for life-and-death things: whether people show up for prostate cancer screening, whether people get a pediatric consult. Pupils dilate and fingers move. Attention allocation and effortful behaviour. What you are learning is the conversion funnel: attention triggers a motor response. This is a great way to learn it, so you can do it some day with health and safety. You want to have learned all that stuff, and thought about all that stuff, now.
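The funnel Vince describes is just a chain of step-to-step conversion rates. A minimal sketch, with hypothetical counts shaped like an email campaign (sent, opened, clicked):

```python
def funnel_rates(stage_counts):
    """Step-to-step conversion rates through a funnel.

    stage_counts: ordered (stage, count) pairs, e.g. sent -> opened
    -> clicked. Returns each stage's conversion from the previous one.
    """
    rates = {}
    for (_, prev_n), (stage, n) in zip(stage_counts, stage_counts[1:]):
        rates[stage] = n / prev_n if prev_n else 0.0
    return rates

# Hypothetical email funnel: 160 sent, 80 opened, 3 clicked.
funnel = [("sent", 160), ("opened", 80), ("clicked", 3)]
print(funnel_rates(funnel))  # {'opened': 0.5, 'clicked': 0.0375}
```

Expressing each stage as a rate from the previous one shows where attention is lost: here the open step converts well and the click step is where the funnel narrows.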
So it's a great way to learn, and the things you learn are very general: how to grab attention, and how effort can change people's lives. In time I've come to believe there are no bad projects. There are maybe boring projects, but they're not a waste of time. They're a very good use of time, because I'm a lifelong learner and I always want to get better at what I do. So that's my pep talk. Melanie: Well, I'll try. I guess it took the wind out of my sails. Jacqueline was like, oh well, I wasn't expecting it. I was expecting we'd get some click-throughs with all the work that we did. And we have three. People have opened it, just about 50%. WorkSafeBC tends to get 30%. So 50% of the people opened it, great. Of the 50%, we've done three different emails. We had one click on one. We don't have a huge sample size, 160 in each group, and on the other one we got three. So we are not getting a lot of people clicking through, and I don't know whether it's related to the fact that people don't open WorkSafeBC emails. We were told that's the best way to communicate with our group. Vince: Honestly, this sounds great, because you have a huge open rate, and observed open rates are always lower than reality. Some people use tools like Gmail, which blocks images from being downloaded onto your machine; it asks whether you want to download images from the sender, because it's blocking trackers. So depending on what people use, you are underestimating the open rate. So you tell me a 50% open rate, I tell you you have a minimum 50% open rate. So you have a very high open rate. That's interesting. Something about your email is getting people's attention. It's dilating pupils. They are drawn to it. No data is still data. Melanie: I say we don't make mistakes. So maybe that's what I have to keep thinking. We kind of thought we were going to get more opens. It's really well established that government open rates are higher; you can imagine why. Vince: Night life? Melanie: Pubs and bars.
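Vince's point that a pixel-tracked open rate is a floor, not the true rate, can be put in numbers. The adjustment function below is a hypothetical back-of-envelope correction that assumes you somehow knew what fraction of recipients block images; in practice that fraction is unknown, which is why only the lower bound is safe to report:

```python
def open_rate_bounds(sent, tracked_opens):
    """Pixel-tracked opens miss recipients whose mail client blocks
    images, so the observed rate is a floor. Without knowing the
    blocking rate, the ceiling is 100%."""
    return tracked_opens / sent, 1.0

def adjusted_open_rate(sent, tracked_opens, image_block_rate):
    """Hypothetical back-of-envelope adjustment: if a known fraction
    of recipients block images, scale the tracked count up."""
    return (tracked_opens / (1 - image_block_rate)) / sent

# Hypothetical: 160 sent, 80 tracked opens.
print(open_rate_bounds(160, 80))        # (0.5, 1.0)
print(adjusted_open_rate(160, 80, 0.25))
```

If a quarter of recipients blocked images, the measured 50% would correspond to a true open rate of about 67%, which is the sense in which "50% is a minimum."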
Because they are the ones with hearing loss that nobody is doing anything about. I believe it's because they don't understand the risk. Vince: Risk is great, because it drives attention allocation and effortful behaviour, and thinking strategically and cleverly about risk can help you design even better. But yeah, you're right, you want to get that click rate higher. Let's make experiments part of how we do everyday business: we launch an experiment, write a report, and learn from it. Let's do experiments all the time. Melanie: That's the good thing. Jacqueline is at WorkSafeBC, so she'll continue the project. This is really cool. Vince: It is. And Nathalia here is talking about keeping expectations low. Yeah, I hope you boost that up. Eventually you start to think, yeah. Nathalia: We can move the kiosk to be clearly visible, where it's in front of the door. Would we see a difference then? Vince: I definitely think so. The ambiguity is gone once I no longer believe my interventions work the way they should; then it's like, oh, that's cool, that's interesting. Melanie: The company I work for is amazing, and I'm planning on using this for safety: how can we educate people in the workplace, improve things, and keep them from getting hurt? Vince: If you have an injury prevention project, you have something to reduce workplace injuries: falls, accidents. And you tell me an effect size of 0.1, a p-value of .000, and a sample size of 5,000? That's awesome. Probably worth the intervention. Preventing one life-changing injury is incredible. What an important thing to... Melanie: We have to graduate first, Vince. Vince: But you will be making pitches. Good for you. You're also part of a community; that's the nice thing about this program. Part of the BI community. There's the forum and the wiki: hey, here's this project, I wonder what they are working on. I love it. Melanie: I'm hoping I can get the company to do a capstone project. Anyway, back to the exciting data. Ann: I was going to ask by email. [ Ann speaking ]
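The kind of result Vince describes, an effect of 0.1 with a sample of 5,000, can be sketched as a two-proportion z-test using only the standard library. The counts are hypothetical (injury rates of 30% versus 20%, 2,500 per arm):

```python
from math import erf, sqrt

def two_prop_z(x1, n1, x2, n2):
    """Two-sided two-proportion z-test with a pooled standard error.

    x1/n1 and x2/n2 are successes/trials in each group.
    Returns (z statistic, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p via the normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    p_two_sided = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_two_sided

# Hypothetical: 750/2500 injuries in control vs 500/2500 under the
# intervention -- a 0.1 difference in proportions.
z, p = two_prop_z(750, 2500, 500, 2500)
print(f"z = {z:.2f}, p = {p:.3g}")
```

With these numbers the z statistic is around 8 and the p-value is effectively zero, which is how a 0.1 effect at n = 5,000 ends up with "a p-value of .000."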
Vince: The written assignment is at the end of the class, so hopefully you will have learned a lot of material before you get there. [ Ann speaking ] You can still do ANOVA by hand, the calculations, if you want. [ Ann speaking ] Which equation? I don't know which. [ Ann speaking ] Yeah, that's fine. What about the written assignment, right? The BI one written assignment worksheet? [ Ann speaking ]
Pam Heggie, CSR(A) RPR, Accurate Realtime Reporting Inc. Uncertified (draft) Verbatim Transcript